Itemisation rules in Pop-11

We have so far assumed that Pop-11 programs are made of numbers and words, which can be combined to form expressions, imperatives, or sequences of these. But what is actually typed in, or read in from a file, is a sequence of characters. For instance, the following is a sequence of six characters, which has to be broken up into four items: the number 3, the word "+", the number 55 and the word "=>":

    3+55=>

The Pop-11 `itemiser' applies quite complex rules to decide how to divide up the stream of characters into meaningful chunks. For instance, if you type:

    [a little list,and,6*5]

this is read as 11 items:

    [  a  little  list  ,  and  ,  6  *  5  ]

and these are interpreted as an instruction to build a list containing nine items: seven words and two numbers. (Non-alphabetic characters can also be used to form words, e.g. "+++", "##@##".)
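
One way to check these counts is to ask Pop-11 about the list it has built. The following is only a small sketch: the variable name chunks is chosen purely for illustration, and the procedures length, hd, last, isword and isinteger are standard Pop-11 procedures not otherwise discussed in this section.

    vars chunks = [a little list,and,6*5];

    chunks =>               ;;; items are printed separated by spaces
    ** [a little list , and , 6 * 5]

    length(chunks) =>       ;;; the two brackets are not counted
    ** 9

    isword(hd(chunks)) =>       ;;; the first item is the word "a"
    ** <true>

    isinteger(last(chunks)) =>  ;;; the last item is the number 5
    ** <true>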

To do this Pop-11 needs `lexical' rules saying which sorts of characters can be joined up with which, since you do not have to use spaces to separate things. Besides things like spaces, tabs and newlines, which are normally ignored by Pop-11, there are the following types of characters:

Numeric:        0 1 2 3 4 5 6 7 8 9

Alphabetic:     a b c d e f g  ...  z
                A B C D E F G  ...  Z

Signs:          ! # $ & + - : < = > ? @ \ ^ | ~ / *

Underscore:     _

Separators:     ; " % ( ) , .  [  ]  {  }

String quote:   '

Character quote: `

Word formation in Pop-11

Unfortunately, Pop-11 has fairly complex rules for grouping characters in the text input stream into words, although words created by programs can contain arbitrary characters.

During program compilation, a letter followed by a series of letters and digits will be formed into a single word, e.g. list1, list2. But if a text item starts with a digit, then as soon as a character is reached that cannot be part of the number (e.g. the "l" in "1list") Pop-11 assumes that it should insert a break, i.e. the text is separated into a number followed by a word. This can be shown by typing in the following instructions to create and print out lists:

    [list3] =>
    ** [list3]

    [3list] =>
    ** [3 list]

The second list is taken to have two elements, a number and a word. The first has a single word "list3".

A numeric character may be buried in the middle of a word which starts with letters, e.g. "list3a". Thus a word that starts with a letter can be followed by any combination of numbers and letters.

The word quote symbol (the double quote character ") can be used to tell Pop-11 that you wish to refer to a word itself, instead of using it as the name of something else (i.e. as a variable):

    "list3" =>
    ** list3

But if you give it an illegal combination of characters you will get an error:

    "3list" =>
    ;;; MISHAP: IQW INCORRECT QUOTED WORD
    ;;; INVOLVING: 3

You can also make a word out of certain non-alpha-numeric characters, i.e. sign characters:

    "*+*+*::\/^" =>
    ** *+*+*::\/^

But you cannot mix letters and sign characters:

    "+x" =>
    ;;; MISHAP ...

and in a list they will be separated into two:

    [+x] =>
    ** [+ x]

However, the underscore character can be used to join alphanumeric type characters to sign characters, e.g. here are two lists each containing only one word:

    [+_x] =>
    ** [+_x]

    [apple_#@$=>] =>
    ** [apple_#@$=>]

The underscore can also be used as a convenient way of producing long names which are readable. E.g. the following is the name of a system variable:

    pop_readline_prompt

Using the underscore to join letters and sign characters

In general a sequence of characters made of "sign" characters and letters will be broken at the point where the two sorts of characters meet, unless they are joined by an underscore symbol "_", e.g.

    fast_++         ++_lists_++
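
The effect of the underscore can be checked by counting the items in a list. This is a minimal sketch using the standard procedure length; the words themselves are the ones shown above and have no special meaning here.

    length([fast_++ ++_lists_++]) =>    ;;; underscores join the parts
    ** 2

    length([fast++ ++lists++]) =>       ;;; fast ++ ++ lists ++
    ** 5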

Two of the sign characters `/` and `*` play a special role in that they can be combined to form the `comment brackets', explained above. So "/*" and "*/" cannot be used as ordinary Pop-11 words.

The separator characters cannot be used to join up with anything else, except for the use of `.` in decimal numbers (e.g. 66.35). This is because separators play a special role in the syntax of Pop-11. E.g. the following is a list of seven items:

    [(.,)a"!] =>
    ** [( . , ) a " !]

The semicolon, though normally a separator which marks the end of an imperative, has a special role if repeated three times without anything between: it marks an `end of line' comment, as explained previously. E.g.

    6 * 6 =>        ;;; this bit on the right is ignored!
    ** 36

Some of the characters have special roles which will not be explained fully till later. In particular `%' can be used both in creating procedure closures by `partial application' and in `unquoting' part of a list expression. (The file TEACH PERCENT gives a tutorial introduction to both.)

Strings can contain arbitrary characters

Strings, created using the string quote character, can contain arbitrary characters:

    'this is a *+*+*+* string %&$%$ of rubbish!!!'

except that if you wish to include the string quote itself in the string it must be preceded by the backslash character \ to indicate that it does not mark the end of the string. Here is a string containing the string quote:

    'isn\'t it' =>
    ** isn't it
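
To confirm that the backslash is not itself stored in the string, one can count the characters and inspect the fourth one. This sketch assumes the standard procedures length and subscrs (which takes a position and a string and returns the character code at that position); 39 is the ASCII code of the string quote character.

    length('isn\'t it') =>      ;;; i s n ' t space i t
    ** 8

    subscrs(4, 'isn\'t it') =>  ;;; the quote itself, as an integer
    ** 39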

Note that strings are normally printed without the outer quotes. To make the quotes appear, do

    true -> pop_pr_quotes;

Character quotes and string quotes

Characters themselves are represented by positive integers less than 256. Since it is difficult to remember which number represents which character (the so-called `ASCII code'), the character quote can be used to tell Pop-11 to read a character as representing the number. The character quote, sometimes referred to as the "backquote", is the backward sloping single quote character. It should not be confused with the forward sloping (sometimes displayed as vertical) single quote character used to begin and end string expressions. Depending on the printer used, the string and character quotes in this document may have different appearances.

    Here is the character quote symbol:     `
    Here is the string quote symbol:        '

Unfortunately neither symbol has a predictable location on keyboards: they appear in different places on different keyboards.

Here are some examples using the character quote to represent characters (as integers) without remembering their integer values. The letter `A` has the code 65 and the numerals start from 48:

    `A` =>
    ** 65
    `B` =>
    ** 66
    `a` =>          ;;; lower case codes are different
    ** 97
    `0` =>
    ** 48
    `5` =>
    ** 53
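
Because character quotes simply denote integers, they can be used in ordinary arithmetic. The following sketch illustrates this, and also uses the standard procedure consstring (not otherwise covered in this section), which takes some character codes followed by a count and builds a string from them.

    `a` - `A` =>        ;;; gap between lower and upper case codes
    ** 32

    `7` - `0` =>        ;;; turn a numeral character into its value
    ** 7

    consstring(`c`, `a`, `t`, 3) =>
    ** cat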

If you wish to include non-printing characters in a string, see the details in HELP ASCII. In particular you can use the following:

    \s  = a space
    \t  = a tab
    \n  = a newline
    \r  = the return character (ascii 13)

    '\nA string\n\twith text\n\t\s\son three lines' =>
    **
    A string
        with text
          on three lines
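
The same backslash codes can also be used between character quotes, which gives a simple way to see the integer behind each one. This is a small sketch; the values printed are the usual ASCII codes.

    `\s` =>
    ** 32
    `\t` =>
    ** 9
    `\n` =>
    ** 10
    `\r` =>
    ** 13

    length('\n\t\s') =>     ;;; each code counts as one character
    ** 3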

Double quotes with single quotes can form arbitrary words

If you really need to have a word containing arbitrary characters you can create it by putting word quotes around the corresponding string.

For example

    vars funny_word = "'A word with spaces and junk:*&*=%][)))'";

    isword(funny_word) =>
    ** <true>

    funny_word =>
    ** A word with spaces and junk:*&*=%][)))

However, you would not be able to use such a word as the name of a variable, since typing something like

    vars 'A word with spaces and junk:*&*=%][)))' = 999;

will produce an error.

    ;;; MISHAP - vars STATEMENT: IDENTIFIER NAME EXPECTED
    ;;; INVOLVING:  'A word with spaces and junk:*&*=%][)))'
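
A running program can build words of this kind too, starting from a string. The sketch below assumes the standard procedure consword, which is not otherwise discussed in this section; the variable name built_word is purely illustrative. Since each word is held only once in Pop-11's dictionary, the word built by the program is the very same object as the one typed in using word quotes around a string.

    vars built_word = consword('two words');

    isword(built_word) =>
    ** <true>

    built_word =>
    ** two words

    built_word == "'two words'" =>
    ** <true>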

Changing Pop-11's "itemiser" rules

To complicate matters further, it is possible to tell Pop-11 that you wish to alter its rules, by using the procedure item_chartype. This is especially useful when defining a new language in terms of Pop-11. Details will not be given in this introduction. The on-line documentation file REF ITEMISE gives more information, as does HELP ITEM_CHARTYPE.

Aaron Sloman
Fri Jan 2 03:17:44 GMT 1998