TEACH GRAMMAR                            Aaron Sloman Updated Aug 1994


                   GENERATING AND ANALYSING SENTENCES
                   ----------------------------------

         CONTENTS - (Use <ENTER> g to access required sections)

 -- You need to know a little about grammar
 -- A sentence is an 'np' followed by a 'vp'
 -- There are other syntactic categories
 -- Types of sentences can be described with grammars
 -- Words have categories too
 -- Now print out the grammar and the lexicon
 -- How to generate sentences
 -- Exercise 1: add more words to the lexicon
 -- Exercise 2: add adjectives to the grammar
 -- Exercise 3: railway station announcements
 -- A complex grammar is provided for you
 -- Look at the library grammar
 -- Look at the library lexicon
 -- Analysing sentences: the procedure setup
 -- Seeing if your grammar 'accepts' a sentence
 -- Try more sentences, using the macro "---"
 -- Some examples with preposition phrases
 -- Using 'donouns'
 -- Use "trace" to find out more
 -- Tracing gives more information if the sentence is unacceptable
 -- How to turn off tracing
 -- Using "showtree" to get a picture of a parse tree
 -- Exercise: extend the grammar to allow several adjectives
 -- Try to stop the production or acceptance of "stupid" sentences
 -- Summary


The LIB GRAMMAR library package makes available some programs which you
can use to explore sentence structures.

This teach file introduces you to the notion of a formal grammar for a
simplified subset of English and shows you how to use the library
program to generate or analyse sentences according to a grammar.

-- You need to know a little about grammar -----------------------------

The following sections assume you know a bit about grammar.

1. Nouns are words like "cat", "dog", "car", "number".

2. Noun phrases are expressions like "the old man", "the dog that bit the
cat", "each little lady who likes programming". We use NP as an
abbreviation for "noun phrase". Each noun phrase typically refers to
something.

3. Verbs are words like "hit", "look", "choose", "put", "turn". Some
verbs are transitive, and are followed by a noun or noun phrase, whereas
others are intransitive. In "Bill hit Fred", the verb "hit" is
transitive and has "Fred" as its direct object, whereas in "Bill smiled"
the verb "smiled" is intransitive: it has no object.

4. Verb phrases (abbreviated as VP), include things like "hit me",
"looked at the old man", "chose the book in the car", "put the dog in
the car in the box". A verb phrase typically says something about an
object referred to in a noun phrase. So it normally needs to be preceded
by a noun phrase to form a complete assertive sentence, as in

    [[the big boy] [hit me]]
    [[Mary] [looked at the old man]]
    [[the dog in the box in the corner][barked at the big grey cat]]


5. Prepositions are words like "at", "in", "inside", "over", "under",
which are typically followed by a noun phrase to form a prepositional
phrase, such as "at the house", "in the box", "inside the garden".


-- A sentence is an 'np' followed by a 'vp' ----------------------------

By joining an NP to a VP (which may contain embedded NPs) we can form a
sentence (abbreviated S), for example:

    [NP the cat] [VP bit the girl]
    [NP the man in the car] [VP looked at the little dog]
    [NP he] [VP jumped]
    [NP Mary] [VP smiled at John]
    [NP Every student] [VP enjoys programming]


Try swapping those round with different combinations of NPs and VPs.
Try creating some more example sentences of your own, and dividing
them into noun phrases and verb phrases.

-- There are other syntactic categories --------------------------------

In addition to nouns and verbs, those examples used other types of
words, including:

    determiners (abbreviated DET) like: "the", "each", "some"
    adjectives (ADJ) like: "old", "little"
    prepositions (PREP) like: "in", "at", "on", "under"

We've also used different sorts of verbs - transitive verbs which
require a following NP (e.g. "bit") and intransitive verbs which don't
(e.g. "jumped"). Some verbs may allow or even require the use of an
associated preposition, e.g.

    jump over NP
    put NP in NP
    look at NP

Note that even if there were only one rule for forming sentences, namely
to combine an NP with a VP, there could still be many different types of
sentences because NPs can have different forms and VPs can have
different forms.

In fact there are many different ways of forming sentences. E.g. if S is
a sentence so is

    "It is not the case that S"

If S1 and S2 are sentences so are:

    S1 and S2
    S1 or S2
    if S1 then S2

and so on. These are compound or molecular sentences, which contain
sentences as components. Atomic sentences do not contain sentences as
components.


-- Types of sentences can be described with grammars -------------------

The different forms of sentences can be described economically by means
of a grammar and a lexicon. The grammar specifies the types of
sentences, noun phrases, verb phrases, prepositional phrases, and so on
that are allowed by the language, and the lexicon indicates which words
can be used in different roles in the sentence.

If you give the computer a set of rules defining a grammar and a
lexicon, it can show you what sorts of sentences it generates. First you
have to define a grammar. Type in the following definition of a grammar,
preferably using the editor to store it in a file called 'mygram.p'.

(You can leave out the "comments" following three semicolons ";;;". The
three semicolons tell Pop-11 to ignore the rest of that line. They are
included here simply to help you see what's going on.) We declare a
variable MYGRAM which will be given a list of rules as its value. After
the declaration, type in the list of rules (in this case a list of lists
of lists!). NB don't forget '.p' in the file name.


    ;;; declare a variable mygram, and initialise it with a list of
    ;;; rules for sentence components
    vars mygram =
    [
        ;;; start a list of rules
        ;;; a sentence is a NP then a VP
        [s [np vp]]
        [np [snp] [snp pp]]     ;;; a NP is either a simple NP
                                ;;; or a simple NP followed by
                                ;;; a prepositional phrase PP
        [snp [det noun]]        ;;; a simple NP is a determiner followed by
                                ;;; a noun
        [pp [prep snp]]         ;;; a PP is a preposition
                                ;;; followed by a simple NP.
        [vp [verb np]]          ;;; verbphrase = verb followed by NP
    ] ;

Type or copy that lot into your file mygram.p, taking care to get the
square brackets right. You don't need to include the "end of line"
comments following three semi-colons ";;;".
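
If you want to check that the rules were typed in with the intended
structure, a short loop such as the following may help. It is only a
sketch: it assumes mygram has been compiled as shown above, and the
variable name "rule" is made up for this illustration. Each rule is a
list whose head is the category name and whose tail is the list of
alternative expansions.

    vars rule;
    for rule in mygram do
        ;;; hd(rule) is the category, tl(rule) holds its alternatives
        [^(hd(rule)) has ^(length(tl(rule))) alternative forms] =>
    endfor;

For the grammar above this should report two alternative forms for np
and one for each of the other categories.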


-- Words have categories too -------------------------------------------

Now you can type in a lexicon, to tell the system about nouns, verbs,
prepositions, and determiners.

    ;;; Declare a variable mylex, and initialise it with a list of rules
    ;;; specifying lexical categories, i.e. types of words

    vars mylex =
      [       ;;; start a list of lexical categories
        [noun  man girl number computer cup battle room car garage]
        [verb  hated stroked kissed teased married taught added]
        [prep  in into on above under beside]
        [det   the a every each one some]
     ];

Put that in your file. Again, take care to get the brackets right.
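
To see that the lexicon is just a list of lists, you could look up the
categories of a word yourself once mylex has been compiled. The
procedure below is an illustrative sketch, not part of LIB GRAMMAR; the
name categories_of is invented here.

    define categories_of(word, lex) -> result;
        lvars entry;
        [] -> result;
        for entry in lex do
            ;;; hd(entry) is the category name, tl(entry) its words
            if member(word, tl(entry)) then
                [^^result ^(hd(entry))] -> result;
            endif;
        endfor;
    enddefine;

    categories_of("girl", mylex) =>
    categories_of("fozzle", mylex) =>

With the lexicon above the first call should print [noun] and the second
an empty list, since "fozzle" is not in any category.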


-- Now print out the grammar and the lexicon --------------------------

Compile those two definitions. You can now print out your grammar, using
the 'pretty-print' arrow thus:

    mygram ==>

and print out your lexicon

    mylex ==>

The "pretty print" arrow "==>" makes long lists print out neatly.


-- How to generate sentences -------------------------------------------

Add the following line to the top of your file to load LIB GRAMMAR.

    uses grammar;

Now compile that line.

Here's how you get the computer to generate sentences according to your
grammar and lexicon. You will find that not all the sentences generated
look like good English; that is a sign that the grammar is inadequate.
Try this command:

    generate(mygram, mylex) ==>

You can make the computer generate many sentences, by using the Pop-11
"repeat" construction.


    repeat 20 times generate(mygram, mylex) ==> endrepeat;
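
To get a feel for what a generator has to do, here is a minimal sketch
of one way to expand a category at random. It is not the code used by
LIB GRAMMAR, whose internal details are not described in this file. The
procedure name random_expansion is invented for this illustration; the
built-in procedure oneof picks a random element of a list, and dl puts
the elements of a list on the stack.

    define random_expansion(cat, gram, lex) -> words;
        lvars rule, entry, part;
        ;;; grammatical category: choose one alternative at random
        ;;; and expand each of its parts in turn
        for rule in gram do
            if hd(rule) == cat then
                [% for part in oneof(tl(rule)) do
                       dl(random_expansion(part, gram, lex))
                   endfor %] -> words;
                return;
            endif;
        endfor;
        ;;; lexical category: choose one of its words at random
        for entry in lex do
            if hd(entry) == cat then
                [% oneof(tl(entry)) %] -> words;
                return;
            endif;
        endfor;
        ;;; anything else is treated as a literal word
        [^cat] -> words;
    enddefine;

    random_expansion("s", mygram, mylex) ==>

Starting from "s" the sketch works down through the rules until only
words are left, which is why the top level category matters.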



-- Exercise 1: add more words to the lexicon --------------------------

You can now try extending the grammar and lexicon to generate a wider
range of sentences. Edit the file containing the grammar and the
lexicon.

Exercise 1. Add more words in each category to the lexicon, and redo the
above "repeat....endrepeat" command to generate sentences. Some of the
sentences will make no sense, and many will appear quite ungrammatical.
Try to analyse what exactly is wrong with them.

-- Exercise 2: add adjectives to the grammar --------------------------

Exercise 2. Try adding adjectives (e.g. "big", "pretty", "clever",
"square", "red") to the grammar.

o    First you will need to extend the lexicon (i.e. the list called
    "mylex") with a new category. Call the new category "adj". So you
    will have to add a new sublist of the form


        [adj ..... ]


    containing some adjectives.

o    Second you will need to extend the definition of the grammar so as
    to give a role for adjectives. The most plausible way to do this is to
    add a new sub-type of simple noun phrases (SNP), to cover noun
    phrases like

        "the red ball"
        "every big car"
        "some angry girl".

    The way to do this is to add a new sub-list to the "snp" list. Try
    doing that, and then generate 20 more sentences looking to see which
    ones now include adjectives based on your extension.
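
If you get stuck, here is one possible way of doing it, given only as a
sketch. The names adjgram and adjlex are invented for this example so
that your original mygram and mylex are left unchanged.

    vars adjgram =
    [
        [s [np vp]]
        [np [snp] [snp pp]]
        [snp [det noun] [det adj noun]]     ;;; new alternative
        [pp [prep snp]]
        [vp [verb np]]
    ];

    vars adjlex =
    [
        [noun  man girl number computer cup battle room car garage]
        [verb  hated stroked kissed teased married taught added]
        [prep  in into on above under beside]
        [det   the a every each one some]
        [adj   big pretty clever square red]    ;;; new category
    ];

    repeat 20 times generate(adjgram, adjlex) ==> endrepeat;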

When you've had enough of generating sentences continue reading this
file.

-- Exercise 3: railway station announcements --------------------------

The sentences generated by your grammar and lexicon include a lot of
junk. Try starting again, and define a grammar and a lexicon that will
generate only sensible sentences suited to a particular context. E.g.
you could try one of these contexts:

    Railway train departure announcements
    Statements about tomorrow's weather
    A doctor describing a patient's health

Choose one of those and produce a new grammar and a lexicon and test
them with the "generate" procedure. You can introduce new types of
grammatical or lexical categories to suit the context. The only absolute
requirement is that the "top level" grammatical category should be "s",
as the procedure generate starts working down from the rules for "s".

HINT: in order to prevent nonsensical combinations of categories you may
find it useful to break the categories down into matched sub-categories.
For example you could distinguish animate nouns ("mary", "man"), and
inanimate nouns ("train", "platform"), and distinguish different sorts
of noun phrases depending on whether the nouns are animate or inanimate.
Then you could distinguish verbs that need an animate subject
(ani_verbs) from verbs that don't (inani_verbs). Then your rules could
specify that only animate nouns can go with animate verbs and vice
versa.

You could make further subdivisions of the verbs according to what sorts
of prepositions are allowed to follow them. If you do all this the
labels for the types of words and phrases in the grammar could get quite
long and complicated. Instead of just "verb" you might have something
like "ani_verb_np_prep_np" to label a verb that requires an animate
subject and is followed by a direct object (the first "np") then a
preposition and an indirect object (the second "np"). Such a word could
be "put" in a sentence like:

    [mary put the fish into the oven]

    mary        - animate subject proper noun

    put         - verb of type ani_verb_np_prep_np

    the fish    - direct object np

    into        - preposition

    the oven    - indirect object np
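
The hint can be turned into a small grammar along the following lines.
This is a sketch with invented names (anigram, anilex, ani_np, inani_np
and so on) and a deliberately tiny vocabulary. Animate subjects are
only ever paired with verbs that need an animate subject, and inanimate
subjects with the other verbs.

    vars anigram =
    [
        [s [ani_np ani_vp] [inani_np inani_vp]]
        [ani_np [ani_name] [det ani_noun]]
        [inani_np [det inani_noun]]
        [np [ani_np] [inani_np]]
        [ani_vp [ani_verb] [ani_verb_np_prep_np np prep np]]
        [inani_vp [inani_verb]]
    ];

    vars anilex =
    [
        [ani_name mary]
        [ani_noun man passenger driver]
        [inani_noun train platform fish oven]
        [det the a every]
        [prep into onto]
        [ani_verb smiled waited]
        [ani_verb_np_prep_np put]
        [inani_verb departed arrived]
    ];

    repeat 10 times generate(anigram, anilex) ==> endrepeat;

Among the possible outputs is [mary put the fish into the oven]. Some
of the others will still be odd, which shows how much further the
categories would need to be subdivided.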



Here are some steps towards an answer to the railway announcement
exercise. Try the following grammar and lexicon:


  vars traingram =
    [
        [s [subject pred]]
        [subject [np] [np participle prepp]]
        [np [det train_noun] [det adj train_noun]]
        [prepp [prep np]]
        [pred [is adjphrase] [will verbphrase] [will verbphrase prepp]]
        [adjphrase [adv adj] [adj adv] [adj]]
        [verbphrase
            [verb_intransitive]
            [verb_intransitive adv]
            [verb_trans np ]]
    ];

  vars trainlex =
    [
        [participle arriving departing standing loading stopping
            waiting]
        [det every each the a some]
        [train_noun train carriage express sleeper service shuttle]
        [adj late early fast slow expensive reckless comfortable
            delayed overloaded expected regular special next previous]
        [prep at from near alongside above]
        [adv slowly quickly soon punctually tardily]
        [verb_trans overtake follow delay obstruct carry]
        [verb_intransitive wait explode depart leave pause arrive
            start crash finish]
    ];


Try this out:


    generate(traingram, trainlex) =>


This is still not a very sensible grammar. But it generates slightly
better sentences than the original. See if you can improve it to
generate good railway announcements.

-- A complex grammar is provided for you -------------------------------

Poplog includes a more complex grammar and lexicon, which can be used to
produce some weird sentences. You first have to load them by typing:

    lib grammar1;
    lib lexicon;


Then generate 20 sentences at random:

    repeat 20 times generate(grammar1, lexicon) ==> endrepeat;


-- Look at the library grammar -----------------------------------------

If you want to see the library grammar, you can type

    grammar1 ==>



Notice that some of the rules are defined in terms of themselves, i.e.
the grammar is recursive - e.g. the rule for NP.
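
As an illustration of what "recursive" means here (this small grammar
is made up for the purpose and is not the library grammar), compare the
rules for np and pp below. An np may contain a pp, and a pp contains an
np, so noun phrases can be nested inside noun phrases to any depth. The
sketch assumes mylex from earlier in this file has been compiled.

    vars recgram =
    [
        [s [np vp]]
        [np [snp] [snp pp]]     ;;; an np may contain a pp ...
        [snp [det noun]]
        [pp [prep np]]          ;;; ... and a pp contains an np
        [vp [verb np]]
    ];

    repeat 5 times generate(recgram, mylex) ==> endrepeat;

This grammar allows noun phrases such as "the cup in the room on a car".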


-- Look at the library lexicon -----------------------------------------

You can also print out the lexicon provided in lib lexicon:

    lexicon ==>


Note the different kinds of verbs.


-- Analysing sentences: the procedure setup ---------------------------

LIB GRAMMAR also provides a program called SETUP which enables you to
transform your grammar and lexicon into a program which will analyse
sentences to see if they are legal, according to the grammar. Start by
typing

    setup(mygram, mylex);

You could put that in your file, after defining mygram and mylex. When
you get the colon prompt, check that analysing procedures have been
created, corresponding to the rules in mygram. E.g. type:

    s =>
    vp =>
    np =>


-- Seeing if your grammar 'accepts' a sentence -------------------------

Here is how you check whether your grammar will accept a sentence: put
the sentence in a list, then apply the procedure S to it (remember, S
was created by SETUP). Here are two examples:

    s([the girl kissed the man]) ==>
    s([the computer added each number]) ==>


Use words and phrases which fit your own grammar and lexicon. Notice how
the procedure S creates a list of lists showing how the sentence is
broken down into parts.


-- Try more sentences, using the macro "---" --------------------------

Try some different sentences.

It's a chore to keep on having to type:

    s([    ]) ==>

LIB GRAMMAR defines a 'macro' abbreviation of three hyphens. To use it
just type the words of the sentence after the three hyphens, that is you
can type:

    --- the computer kissed the girl

instead of:

    s([the computer kissed the girl]) ==>

Here are some more examples of the use of the three hyphens:

    --- the big number added the computer
    --- the number added every computer


It's important that you put a space after ---.

The response <false> means that the sentence was not acceptable to the
analysing procedures.
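
Since an unacceptable sentence gives <false>, you can also test several
sentences in one go without the macro. This is only a sketch: the
procedure name "accepts" and the variable "sentence" are invented here,
and it assumes setup(mygram, mylex) has already been run.

    define accepts(sentence) -> ok;
        ;;; true if the parser s returns anything other than false
        s(sentence) /== false -> ok;
    enddefine;

    vars sentence;
    for sentence in [[the computer kissed the girl]
                     [the big number added the computer]
                     [the number added every computer]] do
        ;;; print each sentence followed by <true> or <false>
        [^sentence ^(accepts(sentence))] =>
    endfor;

With the original mygram the second sentence should give <false>,
because that grammar has no rule for adjectives.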


-- Some examples with preposition phrases ------------------------------

Try some examples with prepositional phrases, for example:

    --- the man in the car kissed the cup
    --- the computer hated every number in the room



-- Using 'donouns' -----------------------------------------------------

To extend the power of the system type:

    true -> donouns;


You can then use unknown nouns and they will be accepted by the program if
the context is suitable, e.g.

    --- the fozzle teased every grumpet
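
Since donouns is an ordinary variable, you can presumably switch the
feature off again by assigning false to it. The following sketch
assumes that this is how donouns behaves; this file only shows it being
set to true.

    false -> donouns;
    --- the fozzle teased every grumpet
    true -> donouns;
    --- the fozzle teased every grumpet

If that assumption holds, the first attempt should be rejected and the
second accepted.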



-- Use "trace" to find out more ---------------------------------------

You can see in more detail what is going on if you trace your
procedures. This will show you that in order to get an analysis, the
computer tries all sorts of guesses as to what should be coming next.
Many of them fail (the result is <false>) before it finds the right one.
So this is not a very intelligent language analysing system. Try:

    trace s vp np snp pp verb noun prep det;
    --- the girl hated the man in the car


Beware, you'll get a lot of printing! If you need to interrupt you can
do so using CTRL-c (i.e. hold down the button marked "Control", and
while still holding it, tap the "C" key).

-- Tracing gives more information if the sentence is unacceptable ------

You can also see what happens when an unacceptable sentence is provided. The
program makes lots of attempts, but they don't lead anywhere:

    --- the big cup in the room hated the computer

    --- each number added the car in the green room


The trace printing will show how the process of trying to find a
suitable analysis of the sentence in accordance with the grammar is a
process of SEARCH, i.e. the computer has to search for a suitable way
of dividing up the sentence and linking its components to the various
rules of the grammar. Later you will learn a lot more about programs
that search for the solution to a problem.
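
To make the idea of search concrete, here is a minimal sketch of a
recogniser that tries every alternative. It is not the code produced by
setup; the procedure names remainders and accepted are invented for
this illustration. The sketch ignores literal words in rules and would
loop forever on a rule that begins with its own category. For a
category and a list of words, remainders returns every list of words
that could be left over after the category has used up some words from
the front.

    define remainders(cat, words, gram, lex) -> results;
        lvars entry, rule, alt, part, rest, rests;
        [] -> results;
        ;;; lexical category: the first word must belong to it
        for entry in lex do
            if hd(entry) == cat then
                if words /== [] and member(hd(words), tl(entry)) then
                    [^(tl(words))] -> results;
                endif;
                return;
            endif;
        endfor;
        ;;; grammatical category: try every alternative in turn
        for rule in gram do
            if hd(rule) == cat then
                for alt in tl(rule) do
                    [^words] -> rests;
                    for part in alt do
                        ;;; carry on from each possible left-over list
                        [% for rest in rests do
                               dl(remainders(part, rest, gram, lex))
                           endfor %] -> rests;
                    endfor;
                    results <> rests -> results;
                endfor;
                return;
            endif;
        endfor;
    enddefine;

    define accepted(words, gram, lex) -> ok;
        ;;; accepted if some analysis uses up all the words
        member([], remainders("s", words, gram, lex)) -> ok;
    enddefine;

    accepted([the girl hated the man in the car], mygram, mylex) =>

Most of the alternatives tried lead nowhere and contribute nothing to
the result, which is the same wasted effort that the trace output
reveals.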

-- How to turn off tracing ----------------------------------------------

You can reduce the amount of printout by untracing the procedures:

    untrace ;


-- Using "showtree" to get a picture of a parse tree ------------------

The Pop-11 library includes a rather clever procedure called showtree
that can be used to display a parsed sentence as a tree diagram. To
make it available do

    uses showtree;


That may take a little while to compile. You can then use showtree to
print out the result of parsing a sentence. Try these:

    showtree(s([the cup in the room hated the computer]));

    showtree(s([each number added the car in the room]));


I.e. apply "s" to a list of words to get a list of lists showing the
sentence structure, and apply showtree to the result of "s". Compare
that with what was previously printed out to show the tree structure.
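
If a sentence is not accepted, s returns <false> rather than a tree, so
it may be worth checking the result before displaying it. A small
sketch, using a variable name "tree" chosen for this example:

    vars tree = s([the cup in the room hated the computer]);
    if tree then
        showtree(tree);
    else
        [no parse found] =>
    endif;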


-- Exercise: extend the grammar to allow several adjectives -----------

Can you also extend the grammar so that it will accept several
adjectives in a row before a noun, e.g.

    the happy little man
    the old old old car
    the big clever old old blue tree


Hint:

    1. Introduce a grammatical category adjectival phrases ("adjphrase")
    and extend the definition of "snp" to include an adjphrase between
    "det" and "noun".

    2. Define a rule for adjphrase which allows an adjphrase to be
        Either  just an adjective
        OR      an adjphrase followed by an adjective

    That recursive definition should allow an adjphrase to have an
    arbitrary number of adjectives. Why?
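
    Written in the notation used for mygram, the rules described in the
    hint might look like the fragment below. Treat it as a sketch to
    adapt rather than a finished answer; it assumes your lexicon
    already has an "adj" category.

        [snp [det noun] [det adjphrase noun]]
        [adjphrase [adj] [adjphrase adj]]   ;;; adjphrase inside itself

    Each time the second alternative for adjphrase is chosen one more
    adjective is added, and the inner adjphrase can again be expanded
    either way.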

Try that and use "generate" to produce examples of your extended
grammar. Also use "setup" to create a parser based on your new grammar
and use the "---" macro to test sentences with several adjectives in
a row. Use "showtree" to see what the parse tree looks like.


-- Try to stop the production or acceptance of "stupid" sentences -----

The grammar and lexicon given above allow some quite silly sentences to
be produced, using -generate-, or to be analysed after using -setup- to
create parsing programs. Try modifying the grammar and lexicon so as
to constrain the "grammatical" sentences to be more sensible.

-- Summary -------------------------------------------------------------

LIB GRAMMAR makes available the procedures SETUP and GENERATE.
LIB LEXICON defines the lexicon LEXICON, and LIB GRAMMAR1 defines the
grammar GRAMMAR1.

You can parse some more complex sentences if you do

    lib grammar2;
    lib lexicon;    ;;; unless done previously
    setup(grammar2, lexicon);


Print out GRAMMAR2 and LEXICON (using "==>") and then see if you can
work out what sorts of sentences will be accepted and parsed, e.g.

    true -> donouns;
    setup(grammar2, lexicon);
    --- he put a big dog into each car
    --- he smiled at every man who thought he liked her


Try more of your own.

Further TEACH files: WHYSYNTAX, ISASENT.

Note for advanced programmers:

There is an extension to LIB GRAMMAR called LIB FACETS which enables you
to associate semantic rules (i.e. meaning rules) with a grammar. For
details see HELP * FACETS.

There is another extension, called LIB * TPARSE, which uses the Poplog
"process" mechanism to overcome a serious limitation of lib grammar,
namely that parsers produced by setup can only find a single parse for
each sentence, whereas in fact a sentence may have several different
parses. Give an example.
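
One possible answer, offered as a sketch: if the verb phrase rule is
given a second alternative that ends in a prepositional phrase, then a
trailing "in the car" can be attached in two ways. The grammar name
ambigram is invented for this example.

    vars ambigram =
    [
        [s [np vp]]
        [np [snp] [snp pp]]
        [snp [det noun]]
        [pp [prep snp]]
        [vp [verb np] [verb np pp]]     ;;; second alternative is new
    ];

    ;;; [the man teased the girl in the car] has two analyses:
    ;;;   the man [teased [the girl in the car]]
    ;;;   the man [teased [the girl] [in the car]]

In the first analysis the girl is in the car; in the second the teasing
takes place in the car.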

[Modified by A.Sloman at Birmingham 1994]

--- C.all/teach/grammar ------------------------------------------------
--- Copyright University of Sussex 1987. All rights reserved. ----------

--- $poplocal/local/teach/grammar
--- Copyright University of Birmingham 1994. All rights reserved. ------