Lecture 2: An Introduction to the SML Language

Information about ML

This document provides a basic introduction to ML as implemented in POPLOG. Below is an annotated list of the available HELP files (on-line documentation) relating to POPLOG ML. The rest of this document is a HTML version of help PML and help EXAMPLES. Note that the name of the help file can be typed in upper or lower case.

ARRAY Describes the predeclared structure Array, which implements the abstract type array.
ASSOC Describes the structure Assoc, which defines an implementation of association lists.
COMMANDS Describes the commands available from the PML top level.
EXAMPLES Some simple examples demonstrating how to use the language ML.
HELPFILES This file.
LOAD Describes both the LOAD command in particular, and ways of compiling files in general in the PML system. It includes a description of the autoloading mechanism for signatures, structures and functors.
MLVED Describes how to use the POPLOG editor VED with PML, and ways in which VED is tailored for ML.
PERVASIVE Describes the declaration form pervasive, which is an extension to Standard ML.
PML A basic introduction to the POPLOG ML subsystem.
QUEUE Describes the structure Queue, which defines an implementation of queues.
REALCONSTANTS Describes a set of real constants defined in the predeclared structure RealConstants.
SHOWDEF A command for displaying information about names defined in the current environment.
STACK Describes the structure Stack, which defines an implementation of stacks.
SYNTAX Describes the syntax and lexical conventions of PML using an informal grammar.
SYSTEM Describes the predeclared structure System, which defines a number of functions and variables which affect the behaviour of the PML system and provide information about the system.
USEFUL Describes the set of basic functions defined in the predeclared structure Useful. [Derived from "HELP PML", by Simon Nichols and Robert Duncan, Sept 1987]

A basic introduction to the POPLOG ML subsystem.

CONTENTS

Introduction
Invoking PML
Initialisation Using PML
Error reporting
Integration with POPLOG
Compatibility with Standard ML
Further Documentation
References

Introduction

POPLOG ML (PML) is an implementation of the programming language Standard ML (SML). SML can be characterised as a strongly-typed, statically-scoped functional language with a strict semantics. "Functional" means that functions are first-class data objects - they may be passed as parameters, returned as results and embedded in data structures - and that the principal mode of computation is recursive function application. "Strongly typed" means that every expression has a type which is determined automatically by the compiler without the need for type declarations. Types can be polymorphic so that functions and data constructors can be defined which work uniformly on arguments of many types; the type system guarantees that any expression which can be typed will not generate type errors at run time. Other features of the language include a type-secure exception mechanism (for handling run-time errors) and a powerful module facility for the support of large-scale programming projects. For a proper definition of the Standard ML language and some guides to ML programming see the references cited below.

Invoking PML

PML is provided as a POPLOG saved-image file; it can be run by typing to the shell

     pml        (on UNIX systems)

     pml  %x    (on UNIX systems with X windows)

      pop11 /pml        (on VMS systems)

After a short delay, PML will print up a version banner to announce its presence, followed by the word "Setml" and the primary prompt "- ". For example

    POPLOG ML (Version 1.0) Wed Sep  9 10:08:17 BST 1987

    Setml
    -

"Setml" is printed whenever the PML system is reset - on start-up, or after an error or an interrupt. The primary prompt indicates that PML is waiting for new input from the terminal; a secondary prompt, "= ", is used whenever the system is expecting something extra, i.e. when an input line is incomplete. You can quit from PML by typing the "end-of-file" character to the primary prompt; on most systems this will be either CTRL-Z or CTRL-D. You can interrupt it by typing the interrupt or break character, usually a CTRL-C. During an evaluation, this generates an interrupt exception which can be caught and handled by an ML program. The default behaviour if the exception is not caught is to abandon the evaluation, reset the system and return to top-level; outside of an evaluation (during compilation) this will always be the case as it is not possible to catch interrupts arising at compile time.

Initialisation

The PML environment can be tailored by means of two initialisation files which are compiled whenever PML is run. The two files provide for both system-wide and per-user initialisation. Both files have the name init.ml but are sought in different directories. The system-wide file should be placed in the directory $popsys; this is compiled first. The per-user file should be placed either in the directory $poplib (which is distinct for each user) or else in the user's home directory; PML will look for the file in $poplib first, but will go to the home directory if the file can't be found there. This initialisation behaviour is the same as that of other parts of the POPLOG system - VED and POP-11 itself for example. It is described in more detail (with an explanation of the variables $popsys and $poplib) in the file HELP * INITIAL.

You should also copy ~pop/vedinit.p to your home directory to give you emacs compatibility.

Using PML

PML is an interactive language. Declarations terminated by semicolons are typed to the prompt; these are read, analysed, compiled and evaluated and any resultant bindings are printed out together with their types. A useful form of input allows an expression

    exp;

to be entered as shorthand for the binding

    val it = exp;

In this case, just the value of the expression is printed with its type. HELP * SYNTAX gives a semi-formal description of the syntax of ML declarations and expressions. The PML top-level will also accept any of a small set of commands in place of a declaration; these commands give access to some of the extra features provided by the POPLOG system. The commands currently available are

    cd  help  im  load  pop11  pwd  showdef  teach  ved

They are described in the file HELP * COMMANDS.

Error reporting

Errors occurring at compilation time fall into two main categories: syntax errors, which cover all violations of the grammar rules, and type errors which cover all errors arising from the static analysis of the program done before evaluation starts. The principal type error arises when an expression or declaration is badly typed and the type-checker is unable to unify the types of two of its sub-components, but other faults (such as undefined names) will also give rise to type errors. A third (unnamed) category of errors arises from miscellaneous events such as trying to load a non-existent file. Error messages will identify the category of the error and its context. Wherever possible, the context will include the name of the file being compiled and the line number. Once the parsing phase is over, however, this information is lost and the context is given instead as the chain of declarations in which the error occurred, starting with the most local. Static type checking means that the majority of programming errors will be caught at compile time. Most of the unavoidable run-time errors (such as division by zero) and external interrupts manifest themselves as built-in exceptions which (if not caught by the user program) will escape to top-level and print an EXCEPTION message. There are also other errors which arise from deeper inside the POPLOG system and which cannot be handled by the user: when the heap space becomes exhausted, for example, or the call stack overflows. Such errors appear as standard POPLOG mishaps.

Integration with POPLOG

PML runs as a subsystem of the POPLOG system with the same status as the other POPLOG languages: POP-11, Common Lisp and Prolog. The PML command ved gives access to the VED editor which supports the incremental compilation and immediate mode evaluation of ML programs; on-line documentation can be read with the commands help and teach. Access to the language POP-11 (and from there to other languages) is provided by the command pop11; it is not at the moment possible to write mixed language programs which include ML, but this facility should be available soon. Refer to HELP * MLVED and HELP * COMMANDS for more details.

Compatibility with Standard ML

PML can be considered as a full implementation of the ML standard as described in reference [1] below. Inevitably, however, there are certain minor differences and extensions. These are as follows: (1) Equality: PML's equality function

            val op = : 'a * 'a -> bool

is a true polymorphic function, i.e. it can be applied to items of any type. This is for convenience. Its meaning on functions is an accident of implementation; on abstract data types it is structural equality. (Future versions of PML should allow abstract types to include their own definitions of equality.) (2) Variables in type constraints: These respect the scoping rules of section 9 of the standard, but may be constrained by the type-checker. (3) The structure ExtendedIO Only part of this structure is provided by PML. The names missing are:

            val execute : string -> instream * outstream
            val can_input : instream * int -> bool

These should be provided in a later release. (4) Pervasiveness Only those primitive names which are built-in to SML are normally visible inside the bodies of structures and signatures. PML includes a new declaration form pervasive which allows this set of primitive names to be extended. (See HELP * PERVASIVE). (5) Comments In addition to the SML comment brackets, PML also supports the standard POPLOG end-of-line comment introduced by the symbol ";;;" i.e. all text after this symbol up to the end of the same line is ignored by the lexical analyser.

Further Documentation

HELP * HELPFILES A list of all the PML help files currently available HELP * EXAMPLES Some simple examples of programming in ML

References

[1] Robert Harper, David MacQueen, Robin Milner STANDARD ML University of Edinburgh technical report, LFCS-86-2, March 1986

[2] Robert Harper INTRODUCTION TO STANDARD ML University of Edinburgh technical report, LFCS-86-14, November 1986

[3] A. Wikstrom FUNCTIONAL PROGRAMMING USING ML Prentice-Hall, 1987

%HELP EXAMPLES Robert Duncan, Sept 1987. Examples of the use of ML

Some simple examples demonstrating how to use the language ML. The examples illustrate only uses of the core language; examples of modules are not yet available.

For an introduction to the PML system see HELP * PML and for a full grammar see HELP * SYNTAX. CONTENTS of CHAPTER

An Introduction to the PML Top Level
The Basic Types of ML
Functions
Overloaded Functions
Pattern Matching
Defining New Types
The Exception Mechanism
Further Reading

An Introduction to the PML Top Level

PML is an interactive language. It prompts for input initially with a single dash, "-", but if input spreads over more than one line a secondary prompt, "=", is used on lines after the first. Everything typed in to the prompt is a declaration. For example, the name x can be declared to have the integer value 3 by typing:

        val x = 3;

Note the keyword val indicating that this is a value declaration (rather than a type or exception declaration - see below) and the terminating semicolon. For each such declaration, PML will respond with a list of the bindings it has made: the response from the above input would be:

    val x = 3 : int

This confirms the nature of the declaration (val), and indicates the name bound, its value and its type. In the examples which follow, the output produced by PML will be shown immediately after the example, just as it would appear in a real session. Thus:

    val x = 3;
    val x = 3 : int

You can confirm that the output shown is correct by marking and loading the example using the load marked range facility provided by VED (and described in HELP * MLVED). Once a name has been bound to a value (such as x above) it can then be used to stand for that value in subsequent declarations. So we could declare the value y as being twice that of x by doing

    val y = 2 * x;
    val y = 6 : int

Often it is convenient not to have to give explicit names to values computed at top-level, and a shorthand form of declaration is provided for that: an expression typed in on its own, E say, with no introductory val, is interpreted by PML as being short for the declaration

    val it = E;

i.e. a binding to the name it. So, for example, the input

    (x + y) * (x + y);

is exactly equivalent to

    val it = (x + y) * (x + y);

but the output produced by this kind of binding is shortened too, with only the value being printed. The result of the example above is just

    81 : int

A consequence of this is that the name it is always bound to the result of the last expression evaluated:

    it;
    81 : int

The Basic Types of ML

Every expression in ML has a type. If the type of some expression E is ty say, then we write

    E : ty

The colon symbol (:) is pronounced "has type" (thus: "E has type ty"). We have seen some examples of this already in the output from the declarations above, as PML displays the type of every value it computes. The type int (short for integer of course) is one of the basic (or primitive) types of ML and is built in to the language. The complete set of basic types is as follows:

int: Integer numbers. Examples are:
```
            1  ~1  256  ~300000
```
The prefix "~" indicates a negative number. Integers in PML can be of any magnitude.
real: Real numbers. Examples are:
```
            1.0  ~1.0  0.3333  1E5  2.35E~4
```
Real numbers are always distinct from integers as they must include either a decimal point or the exponent symbol E (or both). The two types can't be mixed directly in expressions either; the built-in functions real and floor must be used to convert between them. E.g.
```
            floor(22.0 / 7.0);
            3 : int

            real it;
            3.0 : real
```

string Character strings. There is no simple character type in PML; single character strings are used instead. Examples:

            ""  "a"  "More matter, with less art"  "2 + 2 = 4"

Operators on strings include size, which returns the length of a string, and "@", which joins two strings:

            val message = "Hello world\n";
            val message = "Hello world
" : string

            size(message ^ message ^ message);
            36 : int

bool Booleans (truth values). There are only two values in this type:
```
            true  false
```
Built-in predicates such as "=" (general equality) and "<" (less-than ordering) return values of this type, and there is a conditional expression form which can choose between alternative values based on a boolean test. For example:
```
            if x = y then "yes" else "no";
            "no" : string
```
unit The unit type. This type contains only a single value (hence its name) written as "()" (with no spaces between the parentheses):
```
            ();
            () : unit
```
This type is analagous to the void types which exist in other languages; its principal use is to provide a value in those cases where no value is really required. For example, the built-in function output, which prints on an output stream, is called purely for its side-effect and so returns "()" as a result:
```
            output(std_out, message);
            Hello world
            () : unit
```

These are all constant (or nullary) types, but ML also provides a set of built-in type constructors for building complex types from more basic ones. The first example of these is the list type constructor. A list, as a value, is some arbitrary-length sequence of values of a given type. The list type constructor thus takes a single type as an argument (the type alpha, say) and constructs a new type which is the type of lists of values of type alpha. For every type alpha in ML, there is a type of lists of alphas. There are an infinite number of such types, but here are some particular examples:

    string list         (lists of strings)
    unit list           (lists of units)
    int list list       (lists of lists of integers)

List values of known length can be written with a special syntax using brackets ("[" and "]") and commas:

    ["More", "matter", "with", "less", "art"] : string list
    [(), (), (), ()] : unit list
    [[1, 2], [3, 4], [5, 6, 7]] : int list list

Lists can also be constructed dynamically using the operator "::" (pronounced cons, short for construct) which adds an item to the front of a list. For example:

    val numlist = [3, 4, 5];
    val numlist = [3, 4, 5] : int list

    1 :: 2 :: numlist;
    [1, 2, 3, 4, 5] : int list

In fact, if we were to start always with an empty list, written as "[]" (or sometimes called "nil"), then we could construct every list using "::", as in:

    "More" :: "matter" :: "with" :: "less" :: "art " :: [];
    ["More", "matter", "with", "less", "art "] : string list

    1 :: 2 :: 3 :: 4 :: 5 :: nil;
    [1, 2, 3, 4, 5] : int list

Just as the type constructor list constructs new types from more basic ones, the so-called value constructors nil and "::" construct complex lists from more basic values. This relationship between type constructors and value constructors becomes more apparent with user-defined types discussed below. The type of the empty list is of interest:

    [];
    [] : 'a list

The identifier 'a (pronounced "alpha") is a type variable which can stand for any ML type. Because there are an infinite number of types constructed with list but only one empty list, all the list types must share the same empty list. The type of the empty list is thus a general type which can be instantiated to any particular list type. For non-empty lists, every item in the list must be of the same type. There is no legal type which could be attached to an expression such as

    [1, "elephant"] : ???

Any expression which tries to mix types in the same list will cause a type error:

    val bad_list = [1, "elephant"];

    ;;; ML TYPE ERROR - Cannot unify the following types
    ;;; FOUND    :  string
    ;;; WANTED   :  int
    ;;; CONTEXT  :  val bad_list

This error message explains that in the context of compiling the declaration of bad_list, PML has discovered a wrongly-typed expression. Specifically, it has encountered a string where it was expecting another integer (since a list starting with the value 1 should contain nothing else but integers). This is no less of an error than (for example) trying to add together a string and an integer:

    val bad_sum = 1 + "elephant";

    ;;; ML TYPE ERROR - Cannot unify the following types
    ;;; FOUND    :  string
    ;;; WANTED   :  int
    ;;; CONTEXT  :  val bad_sum

To construct aggregate types whose components are of different types, ML provides the tuple type constructor, written as an infix "*". This constructor takes two or more types as arguments. For example:

    int * real                          (a 2-tuple)
    int * real * string                 (a 3-tuple)
    int * real * string * bool          (a 4-tuple)

The syntax for writing tuple-typed values uses parentheses ("(" and ")") and commas:

    (1, 1.0) : int * real
    (1, 1.0, "a") : int * real * string
    (1, 1.0, "a", true) : int * real * string * bool

There is no mechanism by which tuples can be constructed dynamically (i.e. there are no value constructors for tuples): lists can be dynamic only because they contain objects of a single type; tuples may contain different types but must be of fixed length. Other type constructors built in to ML are:

ref: short for reference. This type adds the concept of updatable structures to the language, but won't be considered further in this file;
labelled record: an extension of the tuple type which allows fields in a structure to be labelled with identifiers. This type is described in the section on user-defined types below;
function: this is an extremely important type in ML, and will be discussed in detail in the next section.

The type of a value is normally deduced by the compiler (as in all the above examples) without the need for explicit declarations. An exception to this arises in the case of functions which are overloaded (this concept is explained later). Where a type declaration is needed, it can be attached to any expression or binding in the manner we have seen already. For example:

    val pair : bool * int = (true, 0) and n : int = 1;
    val pair = (true, 0) : bool * int
    val n = 1 : int

Such type declarations (or constraints as they are usually called) can be useful too for documentation purposes, particularly where expressions involve complex, user-defined types. Well-placed type constraints can add useful information for the reader of a program.

Functions

Functions in ML are simply values, with the same status as values of other types. The most general type of a function is

    'a -> 'b

(pronounced alpha arrow beta): this represents a function which takes an argument of some type alpha and returns a result of some (possibly different) type beta. Particular functions will have the type variables 'a and 'b instantiated in different ways. Thus:

    real;
    fn : int -> real

    floor;
    fn : real -> int

Because functional values have no meaningful printed representation, they are always displayed by PML just as the word fn. There are no value constructors for function types, so a special syntax must be used to create new functional values. Here is an example of a function which doubles its argument (the function has a single formal parameter x and computes a result which is 2 times the value of x):

    fn x => 2 * x;
    fn : int -> int

    it 4;
    8 : int

We could give this a name using a standard value declaration:

    val double = fn x => 2 * x;
    val double = fn : int -> int

    double 4;
    8 : int

Named functions can be made recursive by use of the keyword rec:

    val rec fac = fn n => if n = 0 then 1 else n * fac(n-1);
    val fac = fn : int -> int

    fac 6;
    720 : int

We can see from this last example (and from some earlier ones) how function application is represented simply by the juxtaposition of function and argument. Parentheses aren't necessary except to change the order of evaluation: function application binds more tightly than any other expression construct in ML and so will always be evaluated first unless parentheses are used to change this. Compare:

    double 4 + 4;
    12 : int

    double (4 + 4);
    16 : int

An important point to note about functions is that every function takes only a single argument (this is apparent from the function type, which specifies only one argument and one result). Functions which are to operate on more than one value may be written in either of two ways. Firstly, the two (or more) values needed may be provided to the function as a tuple. Here, for example, is the maximum function on integer pairs:

    val max : int * int -> int = fn (n, m) => if n >= m then n else m;
    val max = fn : int * int -> int

    max(0, 1);
    1 : int

This concept can cause confusion to those used to programming in other languages, as max here looks very much like a two-argument function would if declared in, say, POP-11 or Pascal. However, the following example emphasises that max does indeed take a single argument, a tuple, by constructing the tuple separately. This example also demonstrates the use of a let expression which localises a declaration to an expression; the binding of arg is not visible at top-level.

    let val arg = (0, 1) in max arg end;
    1 : int

Many of the built-in functions we have seen already (such as +, *, ^) are actually defined in this way. Their applications are normally written differently however, as they are all declared to be infix operators which means that they can be written between the two components of their arguments. Such operators can only be written in the usual way if the keyword op is used to turn off their infix status. For example:

    op ^;
    fn : string * string -> string

    op *(4, 2);
    8 : int

We could declare max as infix too if we wished:

    infix max;

    0 max 1;
    1 : int

An alternative solution for functions which require multiple arguments is to have functions which return functions as results. Here's a function which adds an item to the end of a list (rev is a built-in function which reverses the order of items in a list):

    val add_to_end = fn item => fn list => rev (item :: (rev list));
    val add_to_end = fn : 'a -> 'a list -> 'a list

Note the type: the argument to add_to_end is an alpha-value and the result is a function from alpha-lists to alpha-lists (the function arrow -> associates to the right). Using the function requires two applications:

    add_to_end 4 [1, 2, 3];
    [1, 2, 3, 4] : int list

but the advantage of this is that we can apply it just once, and then save the resulting function for use later.

    val add_4_to_end = add_to_end 4;
    val add_4_to_end = fn : int list -> int list

    add_4_to_end [1, 2, 3];
    [1, 2, 3, 4] : int list

    add_4_to_end it;
    [1, 2, 3, 4, 4] : int list

ML provides a more concise syntax for function declarations, introduced by the keyword fun. We would normally define add_to_end as follows:

    fun add_to_end item list = rev (item :: (rev list));
    val add_to_end = fn : 'a -> 'a list -> 'a list

The effect is exactly the same as with the previous definition, but the declaration is shorter and clearer and the function created will be more efficient to run. Also, functions declared with fun are implicitly recursive, so there is no need for the rec keyword. As a further example of a simple function declaration, we will define the function iota which, when applied to an integer argument n, constructs a list of all the integers from 1 to n. (The name iota is taken from the language APL.) The definition makes use of another let expression to define a local function Iota which generates the list of integers from n1 to n2.

    fun iota n =
    let
        fun Iota n1 n2 = if n1 > n2 then [] else n1 :: Iota (n1+1) n2
    in
        Iota 1 n
    end;
    val iota = fn : int -> int list

    iota 10;
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] : int list

Overloaded Functions

A complexity in the ML type system arises when the same function name is used to stand for functions of different types. This concept is called overloading. Many of the built-in maths functions have this property. For example:

    2 + 3;
    5 : int

    2.0 + 3.0;
    5.0 : real

Here the name "+" is used for both the integer and real addition functions. The relational operators are like this too:

    "a" < "b";
    true : bool

    1 < 2;
    true : bool

Such overloaded names must always be used at a particular type, and type constraints are sometimes necessary to indicate which type is intended. For example, an alternative definition of double might be:

        fun double x = x + x;

but this will fail with the message:

    ;;; ML TYPE ERROR - Cannot determine a type for overloaded operator
    ;;; INVOLVING:  +
    ;;; CONTEXT  :  fun double

as there is no indication of whether the parameter x is meant to be integer or real. A type constraint is necessary:

    fun double x : int = x + x;
    val double = fn : int -> int

This concept is quite distinct from that of polymorphism, where the same function may be used on objects of different types. List concatenation (written "@") is an example:

    fun dup x = x @ x;
    val dup = fn : 'a list -> 'a list

    dup [2];
    [2, 2] : int list

    dup [2.0];
    [2.0, 2.0] : real list

Polymorphism is the rule and overloading the exception in ML.

`Pattern Matching`

Function declarations may be written with more than one clause, where alternative clauses are separated by the keyword "|". An equivalent declaration of the factorial function would be:

    fun fac 0 = 1
    |   fac n = n * fac(n-1);
    val fac = fn : int -> int

    fac 6;
    720 : int

In any particular application of the function only one of the alternative clauses can be used; the process by which a particular alternative is chosen is called pattern matching. The formal parameters in each clause (0 and n in this case) are called patterns. Patterns mirror the structure of values, and their function is to decompose values into their constituent parts. We say that a pattern matches a value if the pattern has the same structure as the value; whenever a pattern matches a value, the value can be decomposed according to the pattern. For example, the pattern 0 matches the integer value 0 only; the pattern n however, being a variable, matches any value (although in the context of the definition of fac, the value must be an integer). A pattern such as (0, n) would match any 2-tuple whose first component was 0. Variables which occur in patterns (such as n in fac) are so-called binding occurrences, i.e. the match, if it succeeds, binds the variables in the pattern to the corresponding components of the argument value, and the bindings remain visible throughout the right hand side of the chosen clause. Thus matching n against, say, 3, would bind n to the value 3, as would matching (0, n) against (0, 3). Applying fac to the value 0 we get:

    fac 0;
    1 : int

The argument 0 matches the pattern in the first clause of fac, and so that clause is chosen. The right hand side of the clause, 1, is returned straight away. Conversely, the application:

    fac 3;
    6 : int

matches the second rule of fac, which, by the binding rules described above, expands the right hand side of the rule to

    3 * fac(3-1);

By following through the recursive call (and a further two calls) the answer

    3 * 2 * 1 * 1 = 6

is computed. It's not only constants, variables and tuples which can be used to build patterns: the value constructors of types can be used too. This means that we can pattern-match on lists (but not on functions, which have no value constructors). In a context where we have defined:

    a : 'a
    x : 'a list

then we can construct the value

    (a :: x) : 'a list

Conversely, we can use the pattern (a :: x) to do the opposite task: to decompose any non-empty list into two parts a and x, a being the fist item in the list (the head of the list) and x being the list of all items but the first (the tail of the list). The function length uses pattern matching on lists to measure the length of a list: the length of the empty list is obviously zero, while the length of any non-empty list must be 1 greater than the length of its tail.

    fun length [] = 0
    |   length (a::x) = 1 + length x;
    val length = fn : 'a list -> int

    length [];
    0 : int

    length numlist;
    3 : int

It's important that for every function definition, at least one clause should match for every possible value to which the function might be applied. If this were not the case, the function would be undefined for some values and a run-time error could arise. The ML compiler will give a warning about any declaration which is not complete in this sense. For example, this definition of sumlists which adds together two integer lists component-wise is incomplete:

    fun sumlists [] [] : int list = []
    |   sumlists (n :: ns) (m :: ms) = (n + m) :: sumlists ns ms;

    ;;; ML WARNING - Clauses of function declaration are non-exhaustive
    ;;; CONTEXT  :  fun sumlists
    val sumlists = fn : int list -> int list -> int list

The definition leaves an expression such as

    sumlists [1] [];

undefined. Some additional constructs are available for patterns which make possible some sophisticated effects. The next function, removedups, removes duplicate items from a sorted list; the restriction to sorted lists allows us to assume that duplicate items will always be adjacent to one another in the list. Two new pattern constructs are used: the wildcard pattern, written using the underscore as "_", matches any value, but creates no new bindings so that the matched value is effectively discarded; the layered pattern, written var as pattern, both binds the matching value to the variable var, but also matches it against the subsidiary pattern which may decompose it further. With this construct, both the whole value and its component parts can be made accessible.

    fun removedups (a :: (l as b :: _)) =
        if a = b then
            removedups l
        else
            a :: removedups l
    |   removedups l = l;

    val removedups = fn : 'a list -> 'a list

    removedups [1, 1, 2, 3, 4, 4, 5, 6, 6, 6, 7];
    [1, 2, 3, 4, 5, 6, 7] : int list

Defining New Types

The type system of ML is extendable: the keyword datatype introduces declarations of new types. This single declaration form encompasses all the sorts of user defined types found in languages such as Pascal - enumerated types, records and unions. Here, for example, is an enumerated type representing the days of the week:

    datatype day = Sun | Mon | Tue | Wed | Thu | Fri | Sat;
    datatype day
    constructor Fri : day
    constructor Mon : day
    constructor Sat : day
    constructor Sun : day
    constructor Thu : day
    constructor Tue : day
    constructor Wed : day

This declaration creates a new type day together with a set of seven value constructors, each of which is a value of the new type (these are in fact the only values of the new type). The value constructors may be used in both patterns and expressions:

    fun weekend Sat = true
    |   weekend Sun = true
    |   weekend _ = false;
    val weekend = fn : day -> bool

    weekend Wed;
    false : bool

    weekend Sun;
    true : bool

Value constructors may be declared to take arguments, in which case they become functions from their argument types to the newly created type. This supports the creation of new record types such as complex numbers:

    datatype complex = Complex of (real * real);
    datatype complex
    constructor Complex : real * real -> complex

The keyword of introduces the argument type of the constructor; the resulting function Complex converts 2-tuples of real numbers into complex numbers. The Complex constructor can be used for pattern matching also:

    fun realpart (Complex(r, _)) = r
    and imagpart (Complex(_, i)) = i;
    val realpart = fn : complex -> real
    val imagpart = fn : complex -> real

    realpart (Complex(1.0, 0.0));
    1.0 : real

    imagpart (Complex(~1.5, 0.5));
    0.5 : real

For more sophisticated structures, we might choose to use a labelled record as the argument to a value constructor. This is similar to a tuple, in that it has a fixed number of fields of mixed type, but each field is labelled with an identifier; the labels are made explicit both in the type itself and in every value of the type. With large structures this can make programs much clearer, and there is a special pattern matching construct available for selecting particular labelled fields. Labelled records are written inside braces, "\{" ... "\}"; all the labels in a record must be distinct, but as a consequence the fields may be written in any order. This next datatype uses a labelled record to describe a book:

    datatype book = Book of {
        title : string,
        author : string,
        date : int
    };
    datatype book
    constructor Book : {author:string, date:int, title:string} -> book

    val good_book = Book {
        title = "Standard ML",
        author = "Harper et.al.",
        date = 1986
    };
    val good_book = Book({author = "Harper et.al.", date = 1986, title =
        "Standard ML"}) : book

Note how the colon symbol ":" is used in the type specification to separate field names and their types while the equality symbol "=" is used in expressions to associate values with fields. Fields are sorted internally into a standard order, so the output doesn't always match the input exactly. Unwanted fields of the record can be ignored in pattern matches by use of the record wildcardpattern, written "..." (this is another keyword of the language):

    fun written_by name (Book {author, ...}) = author = name;
    val written_by = fn : string -> book -> bool

    written_by "Harper et.al." good_book;
    true : bool

    written_by "Henry James" good_book;
    false : bool

As a further demonstration of the flexibility of datatype declarations, here is a declaration of the type of simple binary trees (we can define a simple binary tree as EITHER an empty (Null) tree OR a Node consisting of two binary trees):

    datatype tree =
        Null
    |   Node of (tree * tree);
    datatype tree
    constructor Node : tree * tree -> tree
    constructor Null : tree

An example tree might be:

    val tree = Node(Node(Null, Null), Null);
    val tree = Node(Node(Null, Null), Null) : tree

The function height measures the height of a tree:

    fun height Null = 0
    |   height (Node(left, right)) =
        let val hl = height left and hr = height right
        in  (if hl > hr then hl else hr) + 1
        end;
    val height = fn : tree -> int

    height tree;
    2 : int

It is also possible to create parameterised type constructors. For example, a more useful binary tree would include some kind of information, perhaps at each interior node. We can parameterise the definition of trees as follows:

    datatype 'a tree =
        Null
    |   Node of ('a tree * 'a * 'a tree);
    datatype 'a tree
    constructor Node : 'a tree * 'a * 'a tree -> 'a tree
    constructor Null : 'a tree

Now the Node constructor takes a 3-tuple as an argument: a pair of trees plus some information of type alpha. We can instantiate alpha in any way we choose:

    Node(Null, 0, Null);
    Node(Null, 0, Null) : int tree

    Node(Node(Null, "a", Null), "d", Null);
    Node(Node(Null, "a", Null), "d", Null) : string tree

but, as ever, we cannot mix integers and strings in the same tree. Some of the built-in types of ML are no more than datatypes. Lists, for example:

    datatype 'a list = nil | op :: of ('a * 'a list);

This declaration won't actually compile however, as it is forbidden to redefine built-in types. The datatype declaration creates new, unique types which match only themselves; even if a declaration is repeated word for word, the two types thus created are distinct. An alternative declaration form, the type declaration, creates what is called a type synonym or a type abbreviation, i.e. it defines a new name for some existing type. This offers no more type security, as types with the new name will match correctly with the original type, but it can be extremely useful in allowing more descriptive names to be given to types for documentation purposes. Thus in geometry, we might declare:

    type point = real * real;
    type point

    type line = point * point;
    type line

There are no value constructors associated with such types: the values of type point are precisely those values of type real * real. Because of this equivalence, the ML compiler can never deduce for itself that any particular pair of real numbers is meant to represent a point. Values must be explicitly constrained with the new types to show any effect. For example,

    val origin : point = (0.0, 0.0);
    val origin = (0.0, 0.0) : point

    val x_axis : line = (origin, (1.0, 0.0))
    and y_axis : line = (origin, (0.0, 1.0));
    val x_axis = ((0.0, 0.0), (1.0, 0.0)) : line
    val y_axis = ((0.0, 0.0), (0.0, 1.0)) : line

    fun gradient (((x1, y1), (x2, y2)) : line) : real = (y2-y1)/(x2-x1);
    val gradient = fn : line -> real

    gradient x_axis;
    0.0 : real

It's worth stressing again that although a function such as gradient has been typed to take an argument of type line, it will in fact work on any type whose underlying structure is

    (real * real) * (real * real)

For example, we might choose to represent a circle as two points (its centre, plus an arbitrary point on the circumference):

    type circle = point * point;
    type circle

    val unit_circle : circle = (origin, (1.0, 0.0));
    val unit_circle = ((0.0, 0.0), (1.0, 0.0)) : circle

We can now say (without getting a type error):

    gradient unit_circle;

the result of which is probably meaningless. This would be impossible had we defined points, lines and circles as datatypes (in a manner analagous to the definition of complex numbers above). A third form of type declaration provided by ML, the abstype declaration, allows the creation of abstract data types. This will not be covered here.

The Exception Mechanism

ML run-time errors (i.e. those not caught by the type checker) manifest themselves as exceptions. It's possible to provoke one of these with a bad usage of a built in operator such as div:

    1 div 0;
    ;;; ML EXCEPTION: div (Division by zero)

or, more realistically, by using the gradient function defined above:

    gradient y_axis;
    ;;; ML EXCEPTION: / (Real overflow)

The exception message prints the name of the exception which was raised (such as div or /); this is normally the same as the name of the function which failed, so as to give an indication of what has gone wrong. Some exceptions will also print an extra message to provide more information, as with Real overflow above. The default effect of an exception is to abort the current evaluation and return to the top-level prompt. This need not be the case however, as exceptions may be caught by means of the handle construct:

    1 div 0 handle div => 999;
    999 : int

The exception raised by the bad div expression is trapped by the exception handler

    div => 999

This tries matching the name of the raised exception against the name specified in the handler; in this case they are the same, so the value of the handler is returned as the result of the expression. In the case where the exception names don't match, the raised exception passes through the handler either up to the next handler or to top level. This next example tries catching the wrong exception, without success:

    gradient y_axis handle div => 999.0;
    ;;; ML EXCEPTION: /

The / exception raised by the application of gradient passes through the div handler and up to top-level. Note how the handler has been changed to return 999.0 (a real) instead of 999 (an integer): this is because we have

    (gradient y_axis) : real

and the result returned by an exception handler must have the same type as that of the expression being handled. New exceptions can be declared with the exception keyword and then raised to signal error conditions in user functions. Here the function subscript picks the nth item from a list, but raises an exception if the list is not long enough:

    exception subscript;
    exception subscript : unit

    fun subscript 1 (item :: _) = item
    |   subscript n (_ :: list) = subscript (n-1) list
    |   subscript n [] = raise subscript;
    val subscript = fn : int -> 'a list -> 'a

    val days = [Sun, Mon, Tue, Wed, Thu, Fri, Sat];
    val days = [Sun, Mon, Tue, Wed, Thu, Fri, Sat] : day list

    subscript 5 days;
    Thu : day

    subscript 8 days;
    ;;; ML EXCEPTION: subscript

    subscript 8 days handle subscript => Sun;
    Sun : day

Exceptions can also be used to communicate information between the point at which they are raised and the point at which they are handled. In this case the exception has to be declared with a type; when it is raised, it is raised with a value of that type, and the handler for it must use a pattern-matching construct to determine a course of action depending on the value.

    exception errcond : string;
    exception errcond : string

    fun try x =
        if x < 10 then raise errcond with "too low"
        else if x > 19 then raise errcond with "too high"
        else x * x
    and keep_trying x = try x handle errcond with
            "too low" => keep_trying (x+1)
        |   "too high" => keep_trying (x-1)
        |   msg => raise errcond with msg;
    val try = fn : int -> int
    val keep_trying = fn : int -> int

    keep_trying 15;
    225 : int

    keep_trying 1;
    100 : int

    keep_trying 20;
    361 : int

The simple declaration

    exception subscript;

is actually just a contracted form of

    exception subscript : unit;

so that exceptions default to the unit type. Similarly, the handler

    subscript => Sun;

is equivalent to the full handler form

    subscript with () => Sun;

the superfluous "with ()" being elided. A special built-in exception interrupt is raised by interrupt signals generated from the keyboard. This can be caught in the same way as any other exception, allowing ML programs to keep control after interrupts.