TEACH GRAMMAR                                           Aaron Sloman
                                                        Updated Aug 1994

GENERATING AND ANALYSING SENTENCES
----------------------------------

CONTENTS - (Use <ENTER> g to access required sections)

 -- You need to know a little about grammar
 -- A sentence is an 'np' followed by a 'vp'
 -- There are other syntactic categories
 -- Types of sentences can be described with grammars
 -- Words have categories too
 -- Now print out the grammar and the lexicon
 -- How to generate sentences
 -- Exercise 1: add more words to the lexicon
 -- Exercise 2: add adjectives to the grammar
 -- Exercise 3: railway station announcements
 -- A complex grammar is provided for you
 -- Look at the library grammar
 -- Look at the library lexicon
 -- Analysing sentences: the procedure setup
 -- Seeing if your grammar 'accepts' a sentence
 -- Try more sentences, using the macro "---"
 -- Some examples with preposition phrases
 -- Using 'donouns'
 -- Use "trace" to find out more
 -- Tracing gives more information if the sentence is unacceptable
 -- How to turn off tracing
 -- Using "showtree" to get a picture of a parse tree
 -- Exercise: extend the grammar to allow several adjectives
 -- Try to stop the production or acceptance of "stupid" sentences
 -- Summary

The LIB GRAMMAR library package makes available some programs which you
can use to explore sentence structures. This teach file introduces you
to the notion of a formal grammar for a simplified subset of English and
shows you how to use the library program to generate or analyse
sentences according to a grammar.

-- You need to know a little about grammar -----------------------------

The following sections assume you know a bit about grammar.

1. Nouns are words like "cat" "dog" "car" "number".

2. Noun phrases are expressions like "the old man", "the dog that bit
   the cat", "each little lady who likes programming". We use NP as an
   abbreviation for "noun phrase". Each noun phrase typically refers to
   something.

3. Verbs are words like "hit", "look", "choose", "put", "turn".
   Some verbs are transitive, and are followed by a noun or noun
   phrase, whereas others are intransitive. In "Bill hit Fred", the
   verb "hit" is transitive and has "Fred" as its direct object,
   whereas in "Bill smiled" the verb "smiled" is intransitive: it has
   no object.

4. Verb phrases (abbreviated as VP) include things like "hit me",
   "looked at the old man", "chose the book in the car", "put the dog
   in the car in the box". A verb phrase typically says something about
   an object referred to in a noun phrase. So it normally needs to be
   preceded by a noun phrase to form a complete assertive sentence, as
   in

        [[the big boy] [hit me]]
        [[Mary] [looked at the old man]]
        [[the dog in the box in the corner][barked at the big grey cat]]

5. Prepositions are words like "at", "in", "inside", "over", "under",
   which are typically followed by a noun phrase to form a
   prepositional phrase, such as "at the house", "in the box", "inside
   the garden".

-- A sentence is an 'np' followed by a 'vp' ----------------------------

By joining an NP to a VP (which may contain embedded NPs) we can form a
sentence (abbreviated S), for example:

    [NP the cat] [VP bit the girl]
    [NP the man in the car] [VP looked at the little dog]
    [NP he] [VP jumped]
    [NP Mary] [VP smiled at John]
    [NP Every student] [VP enjoys programming]

Try swapping those round with different combinations of NPs and VPs.

Try creating some more example sentences of your own, and dividing them
into noun phrases and verb phrases.

-- There are other syntactic categories --------------------------------

In addition to nouns and verbs those examples used other types of
words, including

    determiners (abbreviated DET), like: "the", "each", "some"
    adjectives (ADJ) like: "old" "little"
    prepositions (PREP) like: "in" "at" "on" "under"

We've also used different sorts of verbs - transitive verbs which
require a following NP (e.g. "bit") and intransitive verbs which don't
(e.g. "jumped").
Some verbs may allow or even require the use of an associated
preposition, e.g.

    jump over NP
    put NP in NP
    look at NP

Note that even if there were only one rule for forming sentences,
namely to combine an NP with a VP, there could still be many different
types of sentences because NPs can have different forms and VPs can
have different forms.

In fact there are many different ways of forming sentences. E.g. if S
is a sentence so is

    "It is not the case that S"

If S1 and S2 are sentences so are:

    S1 and S2
    S1 or S2
    if S1 then S2

and so on. These are compound or molecular sentences, which contain
sentences as components. Atomic sentences do not contain sentences as
components.

-- Types of sentences can be described with grammars -------------------

The different forms of sentences can be described economically by means
of a grammar and a lexicon. The grammar specifies the types of
sentences, noun phrases, verb phrases, prepositional phrases, and so on
that are allowed by the language, and the lexicon indicates which words
can be used in different roles in the sentence.

If you give the computer a set of rules defining a grammar and a
lexicon, it can show you what sorts of sentences it generates. First
you have to define a grammar.

Type in the following definition of a grammar, preferably using the
editor to store it in a file called 'mygram.p'. (You can leave out the
"comments" following three semicolons ";;;" - the three semicolons tell
POP11 to ignore the rest of that line. They are included here simply to
help you see what's going on.)

We declare a variable MYGRAM which will be given a list of rules as its
value. After the declaration, type in the list of rules (in this case a
list of lists of lists!). NB don't forget '.p' in the file name.

    ;;; declare a variable mygram, and initialise it with a list of
    ;;; rules for sentence components
    vars mygram =
        [                       ;;; start a list of rules
            ;;; a sentence is a NP then a VP
            [s [np vp]]
            [np [snp] [snp pp]] ;;; a NP is either a simple NP
                                ;;; or a simple NP followed by
                                ;;; a prepositional phrase PP
            [snp [det noun]]    ;;; a simple NP is a determiner
                                ;;; followed by a noun
            [pp [prep snp]]     ;;; a PP is a preposition
                                ;;; followed by a simple NP.
            [vp [verb np]]      ;;; verbphrase = verb followed by NP
        ] ;

Type or copy that lot into your file mygram.p, taking care to get the
square brackets right. You don't need to include the "end of line"
comments following three semicolons ";;;".

-- Words have categories too -------------------------------------------

Now you can type in a lexicon, to tell the system about nouns, verbs,
prepositions, and determiners.

    ;;; Declare a variable mylex, and initialise it with a list of
    ;;; rules specifying lexical categories, i.e. types of words
    vars mylex =
        [       ;;; start a list of lexical categories
            [noun  man girl number computer cup battle room car garage]
            [verb  hated stroked kissed teased married taught added]
            [prep  in into on above under beside]
            [det   the a every each one some]
        ];

Put that in your file. Again, take care to get the brackets right.

-- Now print out the grammar and the lexicon --------------------------

Compile those two definitions.

You can now print out your grammar, using the 'pretty-print' arrow
thus:

    mygram ==>

and print out your lexicon

    mylex ==>

The "pretty print" arrow "==>" makes long lists print out neatly.

-- How to generate sentences -------------------------------------------

Add the following line to the top of your file to load LIB GRAMMAR.

    uses grammar;

Now compile that line.

Here's how you get the computer to generate sentences according to your
grammar and lexicon (you will find that not all the sentences generated
actually look like good English. That is a sign that the grammar is
inadequate).
Try this command:

    generate(mygram, mylex) ==>

You can make the computer generate many sentences, by using the Pop-11
"repeat" construction.

    repeat 20 times generate(mygram, mylex) ==> endrepeat;

-- Exercise 1: add more words to the lexicon --------------------------

You can now try extending the grammar and lexicon to generate a wider
range of sentences. Edit the file containing the grammar and the
lexicon.

Exercise 1.

Add more words in each category to the lexicon, and redo the above
"repeat....endrepeat" command to generate sentences.

Some of the sentences will make no sense, and many will appear quite
ungrammatical. Try to analyse what exactly is wrong with them.

-- Exercise 2: add adjectives to the grammar --------------------------

Exercise 2.

Try adding adjectives (e.g. "big", "pretty", "clever", "square", "red")
to the grammar.

o First you will need to extend the lexicon (i.e. the list called
  "mylex") with a new category. Call the new category "adj". So you
  will have to add a new sublist of the form

        [adj ..... ]

  containing some adjectives.

o Second you will need to extend the definition of the grammar so as to
  give a role for adjectives. The most plausible way to do this is to
  add a new sub-type of simple noun phrases (SNP), to cover noun
  phrases like

        "the red ball"  "every big car"  "some angry girl".

  The way to do this is to add a new sub-list to the "snp" list.

Try doing that, and then generate 20 more sentences, looking to see
which ones now include adjectives based on your extension.

When you've had enough of generating sentences continue reading this
file.

-- Exercise 3: railway station announcements --------------------------

The sentences generated by your grammar and lexicon include a lot of
junk. Try starting again, and define a grammar and a lexicon that will
generate only sensible sentences suited to a particular context. E.g.
you could try one of these contexts:

    Railway train departure announcements
    Statements about tomorrow's weather
    A doctor describing a patient's health

Choose one of those and produce a new grammar and a lexicon and test
them with the "generate" procedure. You can introduce new types of
grammatical or lexical categories to suit the context. The only
absolute requirement is that the "top level" grammatical category
should be "s", as the procedure generate starts working down from the
rules for "s".

HINT: in order to prevent nonsensical combinations of categories you
may find it useful to break the categories down into matched
sub-categories. For example you could distinguish animate nouns
("mary", "man"), and inanimate nouns ("train", "platform"), and
distinguish different sorts of noun phrases depending on whether the
nouns are animate or inanimate. Then you could distinguish verbs that
need an animate subject (ani_verbs) from verbs that don't
(inani_verbs). Then your rules could specify that only animate nouns
can go with animate verbs and vice versa.

You could make further subdivisions of the verbs according to what
sorts of prepositions are allowed to follow them. If you do all this
the labels for the types of words and phrases in the grammar could get
quite long and complicated. Instead of just "verb" you might have
something like "ani_verb_np_prep_np" to label a verb that requires an
animate subject and is followed by a direct object (the first "np")
then a preposition and an indirect object (the second "np"). Such a
word could be "put" in a sentence like:

    [mary put the fish into the oven]

    mary        - animate subject proper noun
    put         - verb of type ani_verb_np_prep_np
    the fish    - direct object np
    into        - preposition
    the oven    - indirect object np

Steps towards an answer to a railway announcement program.
Try the following grammar and lexicon:

    vars traingram =
        [
            [s [subject pred]]
            [subject [np] [np participle prepp]]
            [np [det train_noun] [det adj train_noun]]
            [prepp [prep np]]
            [pred [is adjphrase] [will verbphrase]
                  [will verbphrase prepp]]
            [adjphrase [adv adj] [adj adv] [adj]]
            [verbphrase [verb_intransitive] [verb_intransitive adv]
                        [verb_trans np ]]
        ];

    vars trainlex =
        [
            [participle arriving departing standing loading stopping
                        waiting]
            [det every each the a some]
            [train_noun train carriage express sleeper service shuttle]
            [adj late early fast slow expensive reckless comfortable
                 delayed overloaded expected regular special next
                 previous]
            [prep at from near alongside above]
            [adv slowly quickly soon punctually tardily]
            [verb_trans overtake follow delay obstruct carry]
            [verb_intransitive wait explode depart leave pause arrive
                               start crash finish]
        ];

Try this out:

    generate(traingram, trainlex) =>

This is still not a very sensible grammar. But it generates slightly
better sentences than the original. See if you can improve it to
generate good railway announcements.

-- A complex grammar is provided for you -------------------------------

Poplog includes a more complex grammar and lexicon, which can be used
to produce some weird sentences. You first have to load them by typing:

    lib grammar1;
    lib lexicon;

Then generate 20 sentences at random:

    repeat 20 times generate(grammar1, lexicon) ==> endrepeat;

-- Look at the library grammar -----------------------------------------

If you want to see the library grammar, you can type

    grammar1 ==>

Notice that some of the rules are defined in terms of themselves, i.e.
the grammar is recursive - e.g. the rule for NP.

-- Look at the library lexicon -----------------------------------------

You can also type out the lexicon provided in lib lexicon

    lexicon ==>

Note the different kinds of verbs.
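
The recursion noted above in the library grammar's NP rule can also be
tried on a small scale. The following is only a sketch: the name
"recgram" is made up for this illustration, and it assumes that mylex
is still defined as earlier in this file and that generate picks among
the alternatives for a rule at random. The single change from mygram is
that a PP now contains an NP rather than an SNP, so an NP can contain a
PP which contains another NP, and so on.

```pop11
;;; "recgram" is a hypothetical name used only for this sketch.
;;; It is mygram with one change: pp now contains np instead of snp.
vars recgram =
    [
        [s [np vp]]
        [np [snp] [snp pp]]
        [snp [det noun]]
        [pp [prep np]]      ;;; np inside pp makes np recursive
        [vp [verb np]]
    ];

;;; Look for noun phrases with more than one prepositional phrase,
;;; of the form: det noun prep det noun prep det noun
repeat 10 times generate(recgram, mylex) ==> endrepeat;
```

With mygram itself a noun phrase can have at most one prepositional
phrase, because its PP rule uses SNP. Comparing the output of the two
grammars should make the effect of the recursive rule visible.
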
-- Analysing sentences: the procedure setup ---------------------------

LIB GRAMMAR also provides a program called SETUP which enables you to
transform your grammar and lexicon into a program which will analyse
sentences to see if they are legal, according to the grammar.

Start by typing

    setup(mygram, mylex);

You could put that in your file, after defining mygram and mylex.

When you get the colon prompt, check that analysing procedures have
been created, corresponding to the rules in mygram. E.g. type:

    s =>
    vp =>
    np =>

-- Seeing if your grammar 'accepts' a sentence -------------------------

Here is how you check whether your grammar will accept a sentence: put
the sentence in a list, then apply the procedure S to it (remember, S
was created by SETUP). Here is an example:

    s([the girl kissed the man]) ==>
    s([the computer added each number]) ==>

Use words and phrases which fit your own grammar and lexicon. Notice
how the procedure S creates a list of lists showing how the sentence is
broken down into parts.

-- Try more sentences, using the macro "---" --------------------------

Try some different sentences. It's a chore to keep on having to type:

    s([    ]) ==>

LIB GRAMMAR defines a 'macro' abbreviation of three hyphens. To use it
just type the words of the sentence after the three hyphens, that is
you can type:

    --- the computer kissed the girl

instead of:

    s([the computer kissed the girl]) ==>

Here are some more examples of the use of the three hyphens:

    --- the big number added the computer
    --- the number added every computer

It's important that you put a space after ---.

The response <false> means that the sentence was not acceptable to the
analysing procedures.
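
To make the difference between acceptable and unacceptable sentences
concrete, here is a small sketch that checks three word lists against
the rules of mygram. It assumes mygram and mylex are exactly as defined
earlier in this file (no "adj" category added yet) and that donouns has
not been set. The comments say how each list relates to the grammar
rules; run the commands to see what is actually printed.

```pop11
;;; Assumes mygram and mylex as given earlier in this file.
setup(mygram, mylex);

;;; Fits the rules: np (det noun) followed by vp (verb np)
s([the girl kissed the man]) ==>

;;; The only vp rule is [verb np], and no np follows the verb here
s([the girl kissed]) ==>

;;; The only s rule is [np vp], and this list does not start with an np
s([kissed the girl the man]) ==>
```

For each list that is not accepted, try to say which rule of the
grammar could not be satisfied, and at which word the analysis broke
down.
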
-- Some examples with preposition phrases ------------------------------

Try some examples with prepositional phrases, for example:

    --- the man in the car kissed the cup
    --- the computer hated every number in the room

-- Using 'donouns' -----------------------------------------------------

To extend the power of the system type:

    true -> donouns;

You can then use unknown nouns and they will be accepted by the program
if the context is suitable, e.g.

    --- the fozzle teased every grumpet

-- Use "trace" to find out more ---------------------------------------

You can see in more detail what is going on if you trace your
procedures. This will show you that in order to get an analysis, the
computer tries all sorts of guesses as to what should be coming next,
which fail (result is <false>), before it gets the right one. So this
is not a very intelligent language analysing system. Try:

    trace s vp np snp pp verb noun prep det;

    --- the girl hated the man in the car

Beware, you'll get a lot of printing! If you need to interrupt you can
do so using CTRL-c (i.e. hold down the button marked "Control", and
while still holding it, tap the "C" key).

-- Tracing gives more information if the sentence is unacceptable ------

You can also see what happens when an unacceptable sentence is
provided. The program makes lots of attempts, but they don't lead
anywhere:

    --- the big cup in the room hated the computer
    --- each number added the car in the green room

The trace printing will show how the process of trying to find a
suitable analysis of the sentence in accordance with the grammar is a
process of SEARCH, i.e. the computer has to search for a suitable way
of dividing up the sentence and linking its components to the various
rules of the grammar. Later you will learn a lot more about programs
that search for the solution to a problem.
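
If the full trace is too long to follow, one possible way to make the
search easier to see is to trace only the phrase-level procedures. This
is a sketch, assuming setup(mygram, mylex) has been run and that the
procedures were traced as shown above. It uses only the trace and
untrace commands with explicit procedure names.

```pop11
;;; Remove the earlier, more detailed tracing
untrace s vp np snp pp verb noun prep det;

;;; Trace only the phrase-level procedures, leaving out the
;;; word-level checks (verb noun prep det)
trace np snp pp vp;

--- the girl hated the man in the car

;;; Remove the tracing again when done
untrace np snp pp vp;
```

Each traced call is one attempt by the parser to find a phrase of that
type at some point in the sentence, so the shorter printout shows the
order in which the alternatives in the grammar rules are tried.
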
-- How to turn off tracing ----------------------------------------------

You can reduce the amount of printout by untracing the procedures

    untrace ;

-- Using "showtree" to get a picture of a parse tree ------------------

The Pop-11 library includes a rather clever procedure called showtree
that can be used to display a parsed sentence as a tree diagram. To
make it available do

    uses showtree;

That may take a little while to compile. You can then use showtree to
print out the result of parsing a sentence. Try these:

    showtree(s([the cup in the room hated the computer]));

    showtree(s([each number added the car in the room]));

I.e. apply "s" to a list of words to get a list of lists showing the
sentence structure, and apply showtree to the result of "s". Compare
that with what was previously printed out to show the tree structure.

-- Exercise: extend the grammar to allow several adjectives -----------

Can you also extend the grammar so that it will accept several
adjectives in a row before a noun, e.g.

    the happy little man
    the old old old car
    the big clever old old blue tree

Hint:

1. Introduce a grammatical category for adjectival phrases
   ("adjphrase") and extend the definition of "snp" to include an
   adjphrase between "det" and "noun".

2. Define a rule for adjphrase which allows an adjphrase to be

        Either just an adjective
        OR an adjphrase followed by an adjective

That recursive definition should allow an adjphrase to have an
arbitrary number of adjectives. Why?

Try that and use "generate" to produce examples of your extended
grammar. Also use "setup" to create a parser based on your new grammar
and use the "---" macro to test sentences with several adjectives in a
row. Use "showtree" to see what the parse tree looks like.

-- Try to stop the production or acceptance of "stupid" sentences -----

The grammar and lexicon given above allow some quite silly sentences to
be produced, using -generate-, or to be analysed after using -setup- to
create parsing programs.
Try modifying the grammar and lexicon so as to constrain the
"grammatical" sentences to be more sensible.

-- Summary -------------------------------------------------------------

LIB GRAMMAR makes available the procedures SETUP and GENERATE.
LIB LEXICON defines the lexicon LEXICON, and LIB GRAMMAR1 defines the
grammar GRAMMAR1.

You can parse some more complex sentences if you do

    lib grammar2;
    lib lexicon;    ;;; unless done previously
    setup(grammar2, lexicon);

Print out GRAMMAR2 and LEXICON (using "==>") and then see if you can
work out what sorts of sentences will be accepted and parsed, e.g.

    true -> donouns;
    setup(grammar2, lexicon);
    --- he put a big dog into each car
    --- he smiled at every man who thought he liked her

Try more of your own.

Further TEACH files: WHYSYNTAX, ISASENT.

Note for advanced programmers:

There is an extension to LIB GRAMMAR called LIB FACETS which enables
you to associate semantic rules (i.e. meaning rules) with a grammar.
For details see HELP * FACETS.

There is another extension, called LIB * TPARSE, which uses the Poplog
"process" mechanism to overcome a serious limitation of lib grammar,
namely that parsers produced by setup can only find a single parse for
each sentence, whereas in fact a sentence may have several different
parses. Give an example.

[Modified by A.Sloman at Birmingham 1994]

--- C.all/teach/grammar ------------------------------------------------
--- Copyright University of Sussex 1987. All rights reserved. ----------
--- $poplocal/local/teach/grammar
--- Copyright University of Birmingham 1994. All rights reserved. ------