Lexical analysis and the Pop-11 itemiser

A program file read in by Pop-11, or a command typed in, is a stream of characters. The Pop-11 system has to break this input stream into meaningful components that it can recognize and then translate into machine code instructions of various sorts. The analysis happens in two stages, followed by a third stage that generates the code.

The first stage is called `lexical analysis', `tokenising' or `itemising'. It breaks the stream of characters into separate `text' items that form the basic building blocks of programs. These building blocks are:

o words, including syntax words, procedure names, user variables, etc.

o strings, which are delimited by the string quote symbol "'"

o numbers, including integers, decimals, ratios and complex numbers.

The rules for breaking text up into these text items are explained fully in REF ITEMISE. Only a subset will be explained here.
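
To make the idea of itemising concrete, here is a minimal sketch in Python (not Pop-11) of a toy itemiser that splits a character stream into words, strings and numbers. It is an illustration under simplifying assumptions, not the actual Pop-11 rules: the names `itemise` and `TOKEN_PATTERN` are hypothetical, only integers and decimals are recognised as numbers, and the character classes are deliberately reduced. The full rules are those in REF ITEMISE.

```python
import re

# Simplified item classes (an assumption for illustration only).
# Alternatives are tried left to right at each position in the text.
TOKEN_PATTERN = re.compile(
    r"(?P<space>\s+)"                    # white space separates items
    r"|(?P<string>'[^']*')"              # text between string quotes
    r"|(?P<number>\d+(?:\.\d+)?)"        # integers and decimals only
    r"|(?P<word>[A-Za-z_][A-Za-z0-9_]*)" # alphabetic words
    r"|(?P<sign>[-+*/<>=!@&^|~:?]+)"     # words built from sign characters
    r"|(?P<separator>[;,()\[\]{}])"      # single-character separators
)


def itemise(text):
    """Return a list of (kind, characters) pairs for the given text."""
    items = []
    pos = 0
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot itemise {text[pos]!r} at position {pos}")
        kind = match.lastgroup
        if kind != "space":              # white space is discarded
            items.append((kind, match.group()))
        pos = match.end()
    return items


if __name__ == "__main__":
    for item in itemise("x + 3.5 -> y;"):
        print(item)
```

In this sketch the sign characters `-` and `>` are joined into the single item `->`, while `;` always forms an item on its own, which is the kind of distinction the itemisation rules have to settle.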

The second stage of analysis is more complicated. It groups these text items into recognizable syntactic forms, such as procedure calls, assignments, loops, conditionals, declarations, definitions of procedures, and so on. This is usually called `parsing', though Pop-11 does not create a parse tree.

There is a third stage which involves translating these recognizable forms into machine code instructions. This is sometimes called code-planting.

All the above occur at compile time. Later on, at run time, the compiled instructions can be executed.
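
The distinction between compile time and run time can be sketched in Python (not Pop-11) as follows. This is a toy model under stated assumptions: it recognises only the form `operand operator operand -> variable ;`, it emits instructions as the form is recognised rather than building a tree first, and the instruction names `push`, `call` and `pop`, like the functions `compile_assignment` and `run`, are hypothetical and do not correspond to Pop-11's real code-planting procedures.

```python
import operator

# Operators the toy compiler understands (an assumption for illustration).
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def compile_assignment(items):
    """Compile time: recognise one assignment and emit instructions for it."""
    if len(items) != 6 or items[3] != "->" or items[5] != ";":
        raise SyntaxError("expected: operand operator operand -> variable ;")
    left, op, right, _, target, _ = items
    if op not in OPERATORS:
        raise SyntaxError(f"unknown operator {op!r}")
    # Instructions are emitted directly; no intermediate tree is kept.
    return [("push", left), ("push", right), ("call", op), ("pop", target)]


def run(code, variables):
    """Run time: execute previously compiled instructions on a stack."""
    stack = []
    for instruction, arg in code:
        if instruction == "push":
            # A string operand names a variable; anything else is a constant.
            stack.append(variables[arg] if isinstance(arg, str) else arg)
        elif instruction == "call":
            second = stack.pop()
            first = stack.pop()
            stack.append(OPERATORS[arg](first, second))
        elif instruction == "pop":
            variables[arg] = stack.pop()
    return variables


if __name__ == "__main__":
    code = compile_assignment(["x", "+", 3, "->", "y", ";"])  # compile time
    print(run(code, {"x": 4}))                                # run time
```

The compiled instruction list can be kept and executed many times with different variable values, which is the point of separating the two phases.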

The following sections will describe the first stage of compilation, i.e. the processes involving the itemiser.

Aaron Sloman
Fri Jan 2 03:17:44 GMT 1998