TEACH MATCHES Revised A.Sloman Oct 1996 Revised Nov 2011 (With help from Will Price) matches -> This file introduces the use of the Pop-11 pattern matcher. Readers who are familiar with pattern matching will find a more complete and concise summary in HELP * MATCHES. (Note for experts: In Sept 1996 this file was extended to include the use of the pattern prefix "!" which allows matcher variables to be lexically scoped.) PATTERN MATCHING IN POP11 ========================= CONTENTS - (Use g to access required sections) -- Introduction -- Using the matcher in conditions: matching expressions -- A pattern includes one or more pattern elements -- What is a list? -- A pattern is a list used as a template for matching -- Pattern elements with a memory -- Some examples -- How MATCHES works: a first approximation -- Matching arbitrary elements -- Segment pattern elements -- Digging out elements of a list using ?? -- Using list constructors/insertion elements in a created list -- Improving the rule -- More detailed explanation of segment pattern elements using ?? -- More examples -- Getting practice with pattern variables -- Repeated pattern variables -- Using the same pattern variable more than once in a pattern -- Constraining a variable to match only one item, using "?" -- Revision questions -- Note on "!" with patterns -- Further reading -- Introduction ------------------------------------------------------- POP11 has a built in 'pattern matcher' which can be used for checking the correspondence of a list with a pattern. The pattern matcher provides a powerful tool for operating on lists, and makes it much easier to write list-processing programs than if you use the "lower level" facilities such as "hd" and "tl", sometimes referred to as "first" and "rest", or "car" and "cdr". This TEACH file introduces the use of the matcher, but without mentioning all of its capabilities, since that would be too confusing at this stage. At the end of this file there are pointers to additional things to read, giving information about the more sophisticated facilities, and some uses of the matcher. If you look at TEACH * RESPOND after this you will see how to use the MATCHER to write a simple chatbot program. In AI programs it is very common to use lists to represent information. Solving problems can then require programs to examine various lists to find information that is relevant to the problem and then make use of that information. Writing programs to dig into structures of varying complexity to find information can be tedious and error prone. Use of the matcher simplifies many commonly occurring cases. However it is not claimed that this is only way to store, find and reuse information in AI systems. For example use of neural nets is another. There is no single right way to do it (even brains seem to use varied ways of storing and using information). The pattern matcher described below, and other procedures defined in terms of it, can be used for checking whether some information is present in a list or list of lists, and also for extracting components from memory stores. It can also be used for comparing a sentence typed in by the user with various templates and choosing an action depending on which template the input matches. This file introduces those mechanisms and some additional syntax for easily constructing new information structures on the basis of what was previously found. -- Using the matcher in conditions: matching expressions -------------- Checking whether some required information is in a stored structure can take many forms, but the basic form we explain in this tutorial is use of an expression in which the operator "matches" is used to compare two items, a "datum" and a "pattern". This is a matching expression: matches A matching expression, when run by pop11, will produce a boolean (true or false) result, so that it can be used as a test in conditional expressions, such as if matches then endif The use of patterns allows many complex tests to be collapsed into a simple form, producing shorter, clearer, programs, that are easier to write than more common alternatives, as will be demonstrated below. -- A pattern includes one or more pattern elements ----------- The can be any list, including a list of lists. The can also be a list, but in addition to the sorts of items normally found in lists it can contain "pattern elements" of various sorts. These pattern elements control how the various items in the datum are grouped and compared with items in the . In addition to pattern elements a list may include constant items, e.g. numbers, words, strings, vectors, procedures, and lists (among others). The pattern expression [the = is ?x on == and ??y] contains four constants, the words "the", "is", "on", "and", and also four pattern elements "=", "==", "?x" and "??y", whose uses will be explained below. If there are no pattern elements in the pattern, then the comparison is the same as that done by the pop11 equality operator "=". I.e. [a b c] matches [a b c] has the same effect as [a b c] = [a b c] [Note: later you will meet a stronger equality operator "==" which requires strict identity rather than similarity of structure and contents. These two uses of "=" and "==" as infix operators and as pattern elements are quite distinct. The context should make clear which is being used.] So a pattern containing only constants, without any pattern elements, can be thought of as a "limiting case" of a pattern. When a matching expressing uses a pattern containing only constants, it produces a boolean (true or false) result. But when the pattern contains pattern elements there can also be important side effects. In particular, the matching process can discover and remember components of the datum, for use in subsequent actions. This "data extraction" capability, combined with the economy of expression mentioned above, makes writing some of the kinds of programs required for intelligent systems much faster and easier than use of a conventional programming language. Examples will be given below, and in many other teach files. The Pop11 matcher is at the heart of the Poprulebase system and the SimAgent toolkit, which are powerful extensions to the core of pop11. There is a more general and more powerful version of the matcher which was added much later by John Gibson. It is described in HELP EQUAL and may be used by advanced programmers. However, it would be a major job to rewrite all the libraries and documentation to use the new version. -- What is a list? ---------------------------------------------------- The in a matching expression is a list. In POP-11 lists are created using square brackets. Thus the following is a list of words: [the cat sat on the mat] this is a list of numbers: [1 2 3 99 98 97] and this is a list containing three strings: ['the cat' 'my dog' 'your boat'] This is a list containing a number, a word and a string [ 99 hundred 'a hundred and one'] To be more precise, those are list expressions, which instruct Pop-11 to create the appropriate lists, which are special structures that can be stored in the computer's memory, then combined, searched, changed, etc. Pop-11 allows lists that contain a mixture of items of different sorts. The following list contains words, numbers, lists of words and a list of lists of numbers. [1 cat 2 dog [a b c] [d e f] [[66 67 68] [200 205 210 215]] 99] In other words, Pop-11 allows lists to contain any mixture of objects of arbitrary types, unlike some languages that require all elements of a list to be of the same type. A list expression can extend over several lines: spaces and newlines are treated as equivalent. E.g. this is one list: [a list expression containing several words] -- A pattern is a list used as a template for matching -------------- In a matching expression matches The , like the is a list. But that list may contain not only Pop11 items such as words, numbers and lists, but also special items called 'pattern elements'. These are components of the pattern that are matched against individual items or groups of items in the datum. An example of a pattern is: [== is a ==] This can be used as a template which would match the lists: [fred is a student] [the oldest person in the room is a very happy woman] [[joe bloggs] is a [champion golfer] or even: [ sad asdf gobbledigook is a junkrubbish asdfasd sadf] The pattern element "==" will "match" any sequence of items in a list, a bit like a joker, or "wild card" in a card game, which can be used to stand for any card. The difference is that "==" can stand for any sub-sequence of list elements, even an empty sub-sequence, so that these lists will also match the above pattern: [fred bloggs is a] [is a merry old soul] [is a] -- Pattern elements with a memory ------------------------------------- The pattern element "==" does not remember what it matched. It is therefore a memory-less pattern element. However when we want a matching expression not only to produce a true or false result, but also to remember what was found, then we can replace the memory-less element with an element of the for ??, to remember an arbitrary sub-sequence of items in the datum, or ? to remember exactly one matching item in the datum. For example if we declare the variables role1, role2; vars role1, role2; and then try these two matching expressions, [Fred Bloggs is a good teacher] matches [== is a ??role1] => [Mary is a doctor] matches [== is a ?role2] => Then both should produce the result true. The pattern elements "==" will not remember what they were matched against. But the variables role1 and role2 will remember. The first will remember that it was matched against a list containing two words: role1 => ** [good teacher] The second will remember that it was matched against a single item, a word (not a list containing that word): role2 => ** doctor The differences between "??" and "?" as prefixes for pattern elements, and their uses, will become clearer with practice, provided below. Summary so far: The normal format for using the matcher is: matches That is a POP-11 instruction that produces a result that is either TRUE or FALSE, and if the pattern contains "??" or "?" pattern elements, the associated variables will remember what they were matched against. -- Some examples ------------------------------------------------------ To get a feel for how you use the matcher, try out the commands below, using the Load This Line or Mark and load commands. The result will be printed in your 'output.p' file. Before loading each line see if you can tell whether the result will be TRUE or FALSE. [ 1 2 3] matches [1 2 3] => [1 2 3] matches [3 2 1] => [steve is a teacher ] matches [ == is a == ] => [fred is not a computer] matches [ == is a == ] => Experiment with different lists and different patterns to see which ones produce TRUE and which FALSE. Insert and delete words and pattern elements in the above lists and patterns. Compare the last one with [fred is not a computer] matches [ == is == a == ] => -- How MATCHES works: a first approximation --------------------------- The operation called "matches" takes as its inputs two lists, one written on each side. The first is the "datum", the second the "pattern". It decides whether the list on the left (the datum) matches the pattern on the right, by comparing the two element by element. If it does match, the operation produces the result TRUE, otherwise the result FALSE. It always produces a "BOOLEAN" result, so it can be used in conditional expressions in the form if matches then .... If it finds an embedded list on one side there must be an embedded list in the corresponding place on the other side, and it will compare them, as in this example: [[the old mayor] of [new york]] matches [[the ==] of [== york]] => If the pattern element "==" is found in the pattern, the matcher will try matching different numbers of items from the datum list, to see if it can find a way to make everything else match. Pattern elements with a memory, ie. pattern elements containing a variable name, are interpreted differently, as described below. This sort of ability can be used in programs like Chatbots (e.g. Eliza) to sort sentences typed by the user into different categories, to be treated differently, as explained in TEACH RESPOND. For example you can write programs of the form: if matches then elseif matches then elseif matches then elseif matches then .... etc ... -- Matching arbitrary elements ---------------------------------------- You can use '==' to represent any number of 'don't care' elements, including NO elements. Try the following and see which are true, which false (in each case, try to work out the result for yourself before you get POP-11 to obey the command, using the Load This Line command. [1 2 3 4] matches [1 == 4] => [1 2 3 4] matches [1 == ] => [1 2 3 4] matches [2 == 4] => [1 2 3 4] matches [== 4] => [1 2 3 4] matches [1 == 2 3 4] => [1 and also 2 3 4] matches [1 == 2 3 4] => [1 2 3] matches [== 2 ==] => In each case where the result is FALSE try changing either the list on the left or the pattern to make it TRUE, and vice versa. Notice the different ways in which == will match an empty sublist [ 1 2 3] matches [== 1 2 3] => [ 1 2] matches [ 1 == 2] => [ 1 2 3] matches [== 3 ==] => Will these come out true or false? [1 2 3] matches [== 1 == 2 == 3 == ] => [1 2 3 4] matches [ == ] => Work out your answer then try them. Try constructing a list containing the numbers 1 to 6 that will match this pattern: [== 3 == 2 == 1 == ] There are many ways of doing that. How many? Then create a list that will not match the pattern. Test your answers. -- Segment pattern elements ------------------------------------------- "==" is called a "segment" pattern element because it will match a segment (i.e. a sublist) of arbitrary length in the datum. It is an anonymous segment pattern element because it has no name, and you therefore cannot refer to the segment of the list that was matched. Later we'll see how to use ?? with a name, as a segment pattern element with a memory, to store the item or items that were matched. "=" is another type of anonymous pattern element, which will match only ONE item at a time in the datum. See which of these is true, and which false: [1 2 3 4 5] matches [ 1 = 3 = 5] => [1 2 3 4 5 6] matches [ = 2 = = 5 =] => [1 2 3 4 5 6] matches [ = 2 = 5 =] => [the cat is on the mat] matches [= is on =] => [[the cat] is on [the mat]] matches [= is on =] => To understand the last example you have to realise that an embedded list, like [the cat], is treated as ONE element of the list, so it can match an occurrence of "=". -- Digging out elements of a list using ?? ---------------------------- Often you don't merely want to see if something is recognised as fitting a pattern. You also want to get at the components of the list, and use them in a subsequent command. For that Pop11 allows use of pattern elements with a memory, represented by a Pop11 program variable. For instance, consider a conversational program with the following strategy: user types program responds FRED IS VERY HAPPY SUPPOSE FRED WERE NOT VERY HAPPY SMOKING IS BAD FOR YOU SUPPOSE SMOKING WERE NOT BAD FOR YOU EVERY COMPUTER IS STUPID SUPPOSE EVERY COMPUTER WERE NOT STUPID. The rule can be expressed in English something like this. Rule1 (English): If the input contains some words followed by "is" followed by some more words, then the output must contain the word "suppose" followed by the first set of words, then "were not" followed by the second set of words. In order to express that in Pop-11 we shall need to use a conditional instruction of the form: IF THEN ELSE ENDIF This says test the result of . If it is true, then perform the actions in otherwise perform the actions in We can use that in the following translation (notice that we are using four variables "input", "first", "second" and "output", which have not yet been declared): For example, this could be part of a rule-based program: vars input, output; [Uncle fred is often very impatient] -> input; if input matches [??first is ??second] then [suppose ^^first were not ^^second] -> output; else [Sorry I do not understand] -> output; endif; output => Mark all of that and then run the marked range (Ctrl-d) to see what is printed out. Alter either the list assigned to input, or the pattern after "matches", then see what happens when you mark and re-compile the whole range. [Note, Pop11 uses 'endif' as a closing bracket for 'if', 'endwhile' as a closing bracket for 'endwhile', 'endfor' as a closing bracket for 'for', and so on.] Here "??" is used, in combination with a variable name, e.g. ??first or ??second (though spaces can be used between them) to define a PATTERN ELEMENT used in matching and remembering what is matched. -- Using list constructors/insertion elements in a created list ------- In contrast, after the "then" of the conditional instruction, "^^" is used, in combination with a variable name (as in "^^first"), as a LIST CONSTRUCTOR ELEMENT, or an "insertion element", to produce a new list starting with the word "suppose", that is assigned to the variable "output". [suppose ^^first were not ^^second] -> output; [NB: Output here does not refer to the computer's output, such as a printer or screen. It is just programming variable, to which the list can be assigned. It could have been "x", or "y", or "supposition" for example.] In general a "list constructor element" (or "insertion element") is a portion of a list that functions as a program command to insert something in the list, instead of being itself inserted in the list. So in the above examples, neither the pop11 word "^^" (which is a two character word made of symbols) nor the word "first" is included in the list assigned to output. But together they tell pop11 to splice in the contents of the list that is the value of "first". Pop11 allows an alternative syntax for that, using the append operator "<>" which can join two objects of the same type make a new object of that type containing all the elements of the two objects. So this [suppose ^^first were not ^^second] -> output; is equivalent to [suppose] <> first <> [were not] <> second -> output; One of the advantages of the previous notation is that it shows the correspondence with the original pattern structure more clearly (though that is not always important), and for reasons that will not be explained here it can sometimes be more efficient to use list constructor elements than to use the list append operator. Note, some Pop11 teach files invite you to read ?var and ??var as "set the value of var" "set the value of var(a list)" and suggest reading ^var and ^^var as "get the value of var" "splice in the value of var(a list)" -- Improving the rule ------------------------------------------------- The above rule would sometimes behave rather stupidly, for instance responding to: I DONT THINK THAT IS VERY SENSIBLE with SUPPOSE I DONT THINK THAT WERE NOT VERY SENSIBLE Getting round that stupidity would require giving the program an understanding of English grammar, introduced in another teach file (TEACH GRAMMAR). Never mind the stupidity for now. Let's look at the important programming concepts. -- More detailed explanation of segment pattern elements using ?? ----- Let's look more closely at the pattern elements used in the first line of the conditional: if input matches [??first is ??second] then Instead of the anonymous pattern element "==" we now have two new pattern elements ??first and ??second. These ensure that if "matches" returns the result TRUE, i.e. if the match succeeds, then the variables first and second will remember which segments of the input list were matched against the two pattern elements before and after "is". The variable first will then contain a list of items from the input which were before "is", and the variable second will contain a list of the items that came after "is". (Either or both could be empty lists). If the value of input were the list [fred is very happy], matches would do the following assignments: [fred] -> first; [very happy] -> second; Test that as follows vars first, second; [fred is very happy] matches [ ??first is ??second ] => Then find out the values of first and second thus: first => second => The new values of the variables FIRST and SECOND can then be used in conjunction with the double-up-arrow symbol '^^' to build up the response [suppose ^^first were not ^^second] which in this case would evaluate to [suppose fred were not very happy] Try it: [suppose ^^first were not ^^second] => You can think of "^^" as meaning, roughly, 'use the value of the following variable to insert elements into the list'. If the value of the variable following "^^" is not a list, a mishap will occur. So "??" in a pattern expression means, roughly, "set the value of...." and "^^" means "Use the value of. It can also be used in more complex ways to insert found or constructed components into a list. Later you will learn also to use "?" to set the value but without constructing a new list, and "^" to insert the value, but without first putting it in a list. For now, we shall not pursue the use of "^^". (For more on "^^" see TEACH ARROW, and TEACH RESPOND). -- More examples ------------------------------------------------------ To see how the matcher assigns lists to pattern variables in the above examples, try getting POP-11 to obey the following. Note that we start by telling POP11 that we are going to use FIRST and SECOND as variables: vars first, second; [fred is sad] matches [ ??first is ??second] => first => second => Then try: [father christmas is a very old man] matches [ ??first is ??second] => first => second => Try further variations of your own. You can use more than two "pattern variables", e.g. vars a, b, c; [alice andrews stood between bertha butlin and cathy carter] matches [??a stood between ??b and ??c ] => a=> b=> c=> Notice that an instruction using matches can extend over more than one line, as can other Pop-11 expressions such as an addition: 3 + 4 => -- Getting practice with pattern variables ---------------------------- What will be the values of X and Y after the following matches tests? First try to work out the answers, then get POP-11 to tell you. vars x, y; [a b c] matches [a ??x] => x => [a b c] matches [??x c] => x => [a b c] matches [??x ??y] => x => y => [1 2 3 4] matches [??y] => y => Use the "Compile This Line" (ESC-d) command to find out the answers. Notice that the items not preceded by '??' are 'constants', i.e. they only match themselves. A pattern variable may get the empty list as its value if there are no corresponding elements in the other list. Try: [a b] matches [a ??x b] => x => [1 2 3] matches [??x 1 2 3 ??y] => x => y => When a match fails, the new values of the variables are unpredictable. Try undef -> x; undef -> y; [1 2 3 4] matches [??x 3 5 ??y] => x => y => -- Repeated pattern variables ----------------------------------------- If you use the same variable twice over in a pattern, the match will be TRUE ONLY if there is a corresponding repetition in the list on the left. Try the following: vars x, y; [war is war] matches [??x is ??x] => x=> [war is very nasty] matches [??x is ??x] => [war is very nasty] matches [??x is ??y] => x => y => In the next example the use of "=" before "and" and at the end of the list plays a crucial role: vars sublist; [he is a teacher and he is a father] matches [??sublist = and ??sublist =] => ** sublist => ** [he is a] -- Using the same pattern variable more than once in a pattern -------- In some cases you may find it a little unobvious how the use of repeated variables affects the matcher. Try the following, and see if you can anticipate how the match will work. [1 2 3 1 2 3] matches [??x ??x] => x=> [1 2 3 4 5 6] matches [??x ??x] => contrast: [1 2 3 4 5 6] matches [== ==] => Thus, repeating a variable makes the pattern more restrictive than repeating the "joker" (or "wild card") symbol '=='. Repeated matching of the empty list is allowed. Try: [1 2 3 4 5 6] matches [??x ??y ??x] => x=> y => -- Constraining a variable to match only one item, using "?" ---------- So far we have mostly prefixed variables with the double query symbol '??', which tells POP-11 to match the variable with any number of elements in the left hand list. We sometimes want to make sure we get only one item from the list on the left, and assign that item to a variable. Try: vars first rest; [a b c d] matches [?first ??rest] => first=> rest => Notice how the use of '?' restricts ?FIRST to match ONE item which is assigned to it (the head of the list), whereas '??' allows ??REST to match any number of items (the tail of the list). The items matching REST are put together into a new list and assigned to the variable REST. Compare the following: [a b c d ] matches [?first ?rest] => [a b c d] matches [??first ?rest] => first => rest => In the second case it might have been better to use "last" rather than "rest". Try writing a matches command that will assign the second element of a list to the variable second, e.g. complete the following instruction in such a way as to assign "b" to "second" [a b c d e] matches ........ => Try making it assign the second last element to "x". I.e. You can use the single query "?" to ensure that the matcher digs out just the first or second or or the third or .... the last element of the list. As explained above you can use "=" as an "anonymous" pattern element to match exactly one item, to contrast with "==" which will match any number. Thus if we are interested only in the second, or second last element of a list, we can use a combination of "=", "==" and "?" to extract it, thus vars x; [a b c d e] matches [ = ?x ==] => x => [a b c d e] matches [ == ?x =] => x => Try those out with different lists on the left hand side. How would you use the matcher to dig out the third last element of a list? Try it out and make sure it works. You can mix 'queried' variables, with the 'anonymous' pattern elements '=' and '=='. So the following will make x stand for the object occurring immediately after c : [a b c d e] matches [== c ?x ==] => x => [a 1 b 2 c 3 d 4] matches [ == c ?x ==] => x => Having done that, try a version which will set x to be whatever comes just before the number 3, in various lists. E.g. complete the following so as to make x get the value 2. [a 1 b 2 c 3 d 4 ] matches ....... -- Revision questions ------------------------------------------------- 1. What does MATCHES do? What inputs does it take, and what sort of result does it produce? 2. Explain how '==' works with MATCHES. 3. Explain how '??' works with MATCHES. 4. What is the difference between '??' and '?'. 5. What is the difference between '==' and '='. 6. What are anonymous pattern elements? 7. What are segment pattern elements? -- Note on "!" with patterns ------------------------------------------ There is an extension to Pop-11 available at Birmingham which allows patterns to be preceded by "!" which has the effect that if the pattern is used inside a procedure, then the pattern variables preceded by "?" or "??" can be made "lvars", so that they do not need to be declared as "vars". This reduces bugs in programs due to unintended interactions between global variables, and also allows variables declared as "lvars" to be used in patterns. Example: define next_after(item, list) -> next; ;;; given item and list, return the first thing after item in list ;;; or false if there isn't one. ;;; Note: input and output variables (item, list and next) are ;;; automatically declared as lvars. lvars found; ;;; the pattern variable if list matches ! [ == ^item ?found == ] then found -> next else false -> next endif enddefine; /* ;;; Test that: next_after(3, [1 2 3 4 5]) => ** 4 next_after(6, [1 2 3 4 5]) => ** next_after("cat", [the cat [did sit] on mat] ) => ** [did sit] */ That procedure would not work without the "!" list prefix. The space between "!" and "[" is not required. It was inserted for clarity. Strictly speaking, preceding a pattern with "!" inside a procedure does TWO things. First it treats all the pattern variables in the list, i.e. all those following "?" or "??", as lvars local to the current procedure. If you have not declared those variables as lvars you will get a warning message. Try removing the lvars declaration from the above procedure and recompiling it. Secondly it changes the pattern so that instead of words following "?" or "??" it contains identifier records, which the pattern matcher can use to "communicate with" the local lvars in the procedure. This is possible only since Poplog Version 15.0 The change to the pattern contents may show up if you use the Pop-11 tracer. This is how patterns print out without and with "!". [ == cat ?next == ] => ** [== cat ? next ==] ! [ == cat ?next == ] => ** [== cat ? > ==] The tells you that a word has been replaced with an identifier, and the tells you that this is an undefined global variable. If you give the identifier a value, its value shows when the pattern is printed out: "dog" -> next; ! [ == cat ?next == ] => ** [== cat ? ==] Inside a procedure, the default undefined value is likely to be 0, though that depends on the implementation. -- Further reading ---------------------------------------------------- TEACH * RESPOND shows how to make use of the matcher in designing an Eliza-like program. TEACH * ARROW gives more instruction in using '^^' and other constructs to build lists in POP-11. If you'd like some additional exercises giving more features of the matcher try TEACH * MATCHES2 The Pop-11 PRIMER gives more information about the matcher, with examples. TEACH * LISTS introduces some of the other procedures provided in POP-11 for operating on lists. For more experienced users, HELP * MATCHES gives a summary of the facilities of the matcher, including several more sophisticated than those shown here. Advanced applications of the matcher are used in the Poprulebase library and in the SimAgent toolkit. Adventurous learners wishing to find out more about those two can do the following: First make the libraries available uses newkit then you can access these: TEACH * RULEBASE HELP * POPRULEBASE TEACH * SIM_AGENT HELP * SIM_AGENT (long) --- $usepop/pop/teach/matches --- Copyright University of Sussex 2011. All rights reserved. ---------- --- Copyright University of Birmingham 2011. All rights reserved. ------