Extending our Language to Include Types

Introduction

In this lecture we will consider how we could extend Scheme to be a typed language. The mechanism for type-checking is similar to the mechanism for evaluation. In effect, instead of type-checking an expression to arrive at a value, we type-check an expression to arrive at a type.

To do this in the conventional way we need to provide a way of defining type-constraints which say what type our variables have. Let's adopt the convention that we'll use the name of the function that recognises types. So we might say:

    (define (integer? x) 45)

To mean that x is a variable restricted to having values that satisfy the integer? predicate.

We'll use environments to hold the current type of variables, rather than their value, for example:

    (type_check '(+ x 45) the-type-environment)

will return the symbol integer?. Type errors will be discovered by the function type_check (type_check '(+ x 'hello)) will produce an error.

Cartesian Product Types

If type₁...type_n are types, then the type (>< type₁...type_n) denotes the set of all Scheme lists (x₁...x_n) where each x_i belongs to the type type_i.

This apparatus for describing cartesian product types is close to that employed by the SML language. Note that while the domains of these types intersect, we will provide no mechanism for viewing them as sharing members (just as, for example, Java integers and Java objects are commonly BOTH represented as 32-bit sequences, but are treated by the Java type system as being entirely separate).

Note also that we have no means of describing certain sets of objects that one might create in a Scheme program, for example the set of all lists which contain alternating strings and integers. Sound type systems that are decidable will inevitably constrain the ability of the user to characterise sets of computational objects using the type-language.

Function Types

We need a way to refer to the types of functions. We'll use the convention that:

  (--> (>< type₁...type_n) type)

is the type of a function that takes n arguments of type type₁...type_n and returns result of type type. Thus, for example, the type of

    (lambda ((integer? x)) (+ x 5))

will be (--> (>< integer?) integer?)

Environments for Type-checking

Recall that an environment is simply a list of association lists. For type-checking each association list will hold the types of variables for one function-call, except for the last, which is called the global environment, and which holds the types of system variables such as car, cons, cdr and of user-defined global variables.

3 Implementing (type_check expr env) to type-check an expression

Let us now define our top-level function for type-checking expressions, type_check. This function takes two parameters. (type_check expr env)

means "type-check the expression expr in the environment env". So, we would wish to support a call such as

        (type_check '(+ x (>< 2 y))  '(((x . integer?) (y . integer?)) '()))

which would return integer?.

The code for type_check

This differs from that for eval_env! only in the case of a constant like 1, in which case we need to ascribe a specific type to it.

If the expression is a pair [1] it means that we have a function applied to arguments, or a special form. We call the function type_compound [2] to handle this case.

If the expression is a symbol [3] then we look up its type in the environment [4]. Otherwise we call type_constant to give it type.[5]


(define (type_check expr env)
    (if (pair? expr)                                  ;[1]
        (type_compound (car expr) (cdr expr) env)     ;[2]
        (if (symbol? expr)                            ;[3]
            (lookup_type_chk expr env)                         ;[4]
            (type_constant expr) )                    ;[5]
        )
    )


(define (type_constant c)
    (cond
        ((integer? c) 'integer?)
        ((string? c) 'string?)
        ((symbol? c) 'symbol?)
        ((boolean? c) 'boolean?)
        (else (error "constant " c "of unknown type"))
    ) )

So, as before, merely having defined this function, we can type-check a constant. We give the empty environment '() as second argument.


(example '(type_check 34 '()) 'integer?)

3.1 Implementing the environment - the lookup_type function.

We'll need a slightly different version of the function lookup, lookup_type which we'll use to find the type of a variable in an environment. It uses assoc to examine each frame until it finds the required one. The main difference is that it requires that every variable have a type.


(define (lookup_type var env)
    (if (null? env) #f
        (let ((pair (assoc var (car env)))) ; look it up in current frame
            (if pair
                (cdr pair)                  ; found a binding for var in frame
                (lookup_type var (cdr env)))     ; no binding in this frame
            )
        )
    )

(define (lookup_type_chk var env)
    (let ((t (lookup_type var env)))
        (if t t (error "variable " var " is undeclared"))
    ))

Having defined lookup_type it is now possible to apply type_check to type-check a variable in an environment.


(example '(type_check
    'x                           ; the variable x
     '(
       ((a . boolean?) (b . boolean?)) ; first frame of environment
       ((x . integer?) (y . integer?))        ; second frame
       ((+ --> (integer? integer?) --> integer?))         ; global frame
      ))
    'integer?)

So, if we try to type-check an undeclared variable we'll get an error message [note that this code is in effect commented out]

    (example '(type_check 'x '()) 'x)

    Error: variable  x  is not declared

3.2 Methods for special forms

We can keep our old definition of method, although the methods themselves will be different.


(define alist_method '())

(define (method f)
    (let ((pair (assoc f alist_method)))
        (if pair (cdr pair) pair )
        )
    )

For convenience, let us define a mutating procedure to add a method so that we can extend our repetoire of special forms on the fly.


(define (add_method! name m)
    (set! alist_method (cons (cons name m) alist_method))
    )

3.3 type_compound type-checks special forms, and any application

The function type_compound is modelled on eval_compound!. We see if there is a method to handle a special form. If there is, it will take the environment as an extra argument. If there is not we type the arguments and type-check the application of the function. Again lambda expression in the map to ensure that type_check has its env argument.


(define (type_compound f args env)
    (let ((m (method f)))               ; get method if there is one
        (if m
            (m args env)                 ; use method if there is one
            (type_application f
                (map (lambda (e)
                        (type_check e env))
                    args)
                env
                )                       ; otherwise type-check the arguments
            ; and apply the function to them
            )                           ; end if
        )                               ; end let
    )

... Defining the method to type-check the special form `lambda`

Type-checking a lambda- expression is quite complicated. What we do is rather like applying a lambda-expression when we were doing evaluation, that is to say, we bind the formal parameters to types, and type-check the body. Consider:

    (lambda ((integer? x)) (+  x b))

type-checked with the environment

    '(
        ((b  . integer?))
        ((+  --> (integer? integer?) integer?)
    )

. We'll have to create an environment

    '(
        ((x  . integer?))
        ((b  . integer?))
        ((+  --> (integer? integer?) integer?)
    )

in which to type-check the body of the lambda-expression.

(define (method_lambda args_n_body env)
    (let* (
            (args    (car args_n_body))
            (body    (cdr args_n_body))
            (ty_args (cons '>< (map car args)))
            (frame   (map (lambda (a) (cons (cadr a) (car a))) args))
            (ty_ret  (type_sequence body (cons frame env)))
          )
       (list '--> ty_args ty_ret)
    ))

Here type_check_seq is like our eval_sequence


(define (type_sequence seq env)
    (cond
        ((null? seq) (error "cannot type-check null seqence"))
        ((null? (cdr seq)) (type_check (car seq) env))
        (else
            (type_check (car seq) env)
            (type_sequence (cdr seq) env)
            )
        )
    )

We can't type-check a very complicated lambda expression yet, but let's try this:



(add_method! 'lambda method_lambda)


(example '(type_check '(lambda ((integer? x)) x) '())
    '(--> (>< integer?) integer?)
         )

... Defining the method to type-check the special form if

The method for if Type-checking conditional expressions can be quite complicated, as you will see if you look at the Java language specification. What type might we ascribe to the expression (if (> b 0) (- acct b) "overdrawn") ?

There are a number of possible treatments of this. We'll adopt the simplest, and require that both branches of the conditional have the same type, thus effectively ruling out the above expression. Some languages, such as Java, have the idea of a type-lattice in which an expression would be given the least-upper-bound of types.

To type-check a conditional expression of the form (if bool expr₁ expr₂) we first type-check bool [1]. If it has type boolean then we type-check expr₁ [2]. If bool type-checks to #f then we type-check expr₂ [3]. Otherwise we arrange that the (if...) expression type-checks to itself. Note that here again we have a rule for evaluation that differs from standard Scheme.


(define (method_if args env)
    (let ((bool (type_check (car args) env)))               ;[1]
        (cond                                             ;
            ((eq? bool 'boolean?)
                (unify_types
                    (type_check (cadr args) env)    ;[2]
                    (type_check (caddr args) env)    ;[2]
                ))
            (else (error
                     "first argument of conditional must be boolean")))))

(add_method! 'if method_if)

For some applications, we will need to use our old friend unify to unify-types, but for the nonce we'll just require exact equality.

(define (unify_types ty1 ty2)
    (if (equal? ty1 ty2) ty1
         (error "type " ty1 "is not the same as " ty2)
    ))



(example

    '(type_check '(lambda ((boolean? b)) (if b 3 4)) '())

    '(--> (>< boolean?) integer?)
 )

Let's Postpone Type Checking Quoted constants

How to handle the type of data-structures such as lists raises quite a few problems. Basically, the question is "a list of what?".

Type checking set!

Type-checking set! will not be a big deal in our type-system, for we'll require that both sides have the same type. (define (method_set! args env) (let ((ty_left (type_check (car args) env)) (ty_right (type_check (cadr args) env)) ) (unify_types ty_left ty_right) ) )



(add_method! 'set! method_set!)

(example

    '(type_check '(set! x 23) '(((x . integer?))))

    'integer?
 )

3.4 Use type_application for finding the result-type

We now come to the definition of the function type_application which implements our call of (type_application f args env) back i

... We put types of primitives in the global type-environment

Let's restrict ourselves to arithmetic operators and relations, because we are going to have some difficulty in talking about the type of car, cdr and cons.


(define the-type-environment
    (list(list
      (cons '+ '( --> (>< integer? integer?) integer?))
      (cons '- '( --> (>< integer? integer?) integer?))
      (cons '>< '( --> (>< integer? integer?) integer?))
      (cons '>= '( --> (>< integer? integer?) boolean?))
      (cons '> '( --> (>< integer? integer?) boolean?))
      (cons '< '( --> (>< integer? integer?) boolean?))
      (cons '<= '( --> (>< integer? integer?) boolean?))

     ))
    )

In defining type_application, we use type_check to to find the ty of f


(define (type_application f ty_args env)
    (let ((ty (type_check f env)))
        (cond
            ((type_procedure? ty)    ; is f a procedure?
                (begin
                    (unify_args (type_arg_list ty) ty_args)
                    (type_result ty)
                ))
            (else (error "applying non-function " f " of type " ty))
            )
        )
    )


(define (type_procedure? obj)
    (and (pair? obj) (eq? (car obj) '-->)))

(define type_arg_list cdadr)
(define type_result   caddr)



(define (unify_args types1 types2)
    (if (null? types1)
        (if (null? types2) #t
            (error "wrong number of arguments"))
        (if (null? types2)
            (error "wrong number of arguments")
      (begin (unify_types (car types1) (car types2))
      (unify_args (cdr types1) (cdr types2)) ))
    ))

We can now try applying a primitive function to arguments:


(example
    '(type_application
        '+
        '(integer? integer?)
        the-type-environment)

    'integer?)

Indeed, we can type-check an expression involving primitives:


(example '(type_check '(+ (>< 4 5) 3) the-type-environment) 'integer?)


(example '(type_check
                '(lambda ((integer? x)) (+ (>< 4 5) 3))
                the-type-environment)

         '(--> (>< integer?) integer?)
)

But this will give an error, if you give it a chance.

   (type_check
                '(lambda ((integer? x)) (+ (> 4 5) 3))
                the-type-environment) )

Error: type  integer? is not the same as  boolean?

The define special form

Since define always binds a variable to have a value, and we can determine the type of the value, it might seem that we don't need to change the format of define. However in order to type-check recursive functions, we have got to have a type associated with the function-name before we can work out the type of the body. Consider

    (define fact
        (lambda ((integer? n)) (if (= n 0) 1 (>< n (fact (- n 1))))))

We can't work out the type of the lambda expression without knowing the type of fact, and conversely we can't work out the type of the fact without knowing the type of the lambda expression. We can solve this problem, rather clumsily, by requiring that we specify the type of fact when defining it.

    (define ((--> (>< integer?) integer?) fact)
        (lambda (n) (if (= n 0) 1 (>< n (fact (- n 1))))))

Again, will not implement the "shorthand" form exemplified by: (define (fred (integer? x)) (+ x 1)).

Suppose we are type-checking

      (define VAR EXPR)

We must first make a null entry for VAR in the global type environment, and then type-check the expression EXPR in the modified global environment.

In detail, we define the function method_define which [1] performs an initial check that the form of the define statement is valid. Next [2] it unpacks the specification qvar of the variable being defined and the value expr to be given to that variable. We further unpack the specification into the type ty of the variable, and its name var.

Now follows [3] a check that var is truly a variable.

Having checked the legality of the construct, we proceed [4] to add var as a variable to the global type environment with its assigned type if necessary, which leaves us free [5] to update the type of that variable with the type of the expression expr type-checkd in the global environment.

All that is left to us [7-8] is to provide suitable error reports about malformed define statements.


(define (method_define args env)
    (if (= (length args) 2)                                         ; [1]
        (let* ((qvar (car args)) (expr (cadr args))                 ; [2]
               (ty   (car qvar)) (var (cadr qvar)))

            (if (symbol? var)                                       ; [3]
                (begin
                    (set! the-type-environment                    ; [4]
                        (add_tvariable var ty the-type-environment))
                    ( unify_types
                        ty
                        (type_check expr the-type-environment)     ; [6]
                        )
                    )
                (error "variable needed in define statement"        ; [7]
                    (cons 'define args))
                )
            ) ; end let

        (error "wrong form for define statement" (cons 'define args)) ;[8]
        ) ; end if
    )

The function add_tvariable makes the new entry for a variable in an environment if one is required.



(define (add_tvariable var ty env)
    (let ((ty1 (lookup_type var env)))
        (if ty1
            (begin (unify_types ty ty1) env)
            (cons
                (cons ; extend the first frame
                (cons var ty) (car env))
            (cdr env))
        )
    ))

We can now incorporate the define special form into our interpreter.


(add_method! 'define method_define)

(type_check '(define ((--> (>< integer?) integer?) add1) (lambda ((integer? x)) (+ x 1))) the-type-environment )

4.3 Implementing the begin special form.

We also need to provide a method for begin. We simply use type_sequence to type-check the sequence of expressions forming the body of a begin expression.


(add_method! 'begin
    (lambda (seq env)
        (type_sequence  seq env)
        )
    )

4.4 Testing out the "banking" example.

For convenience, let us define a function evalg which type-checks an expression in the global environment. This will allow us to implement the equivalent of the "read-eval" loop of the Scheme system, whereby it continually reads an expression, type-checks it, and writes out the type. Without learning more about the input-output aspects of the Scheme language we can't make our interpreter behave exactly like the built-in system, but we can make it tolerably convenient to use.


(define (evalg expr)
    (writeln "=:> " (type_check expr the-type-environment))
    )

Note that we have only treated the most basic kind of definition - we have to have an explicit lambda if we are going to define functions. To define a function fred to our interpreter, we need to do:


(evalg
    '(define ((--> (>< integer?) integer?) fred)
        (lambda ((integer? x)) (+ x 2)))
    )

Now, we can check whether this definition has "taken", by looking fred up in the global environment.

For convenience, let us define evals which allows us to type-check a sequence of expressions in the global environment.


(define (evals seq) (for-each evalg seq))

We can now do our most of our banking example and see what happens in the environment model when we have a set!. Let us use evals to define the make_account function, and to make an account for Jane. What we can't do is to return the string "insufficient funds", or a list ('balance balance), because our type-system is not rich enough to support this.


(evals
    '(
     (define ((--> (>< integer?) (--> (>< integer?) integer?)) make_account)
         (lambda ((integer? balance))
             (lambda ((integer? amount))
                 (if (>= balance amount)
                     (begin (set! balance (- balance amount))
                          balance
                         )
                     balance
                     )
                 )
             )
         )
     (define ((--> (>< integer?) integer?) jane)  (make_account 100))
     jane
     )
    )

Types for Data Structures

So far we have not treated lists at all. This is because of an important problem. A Scheme list can contain objects of any type whatsoever. So how are we to talk about its type. There are two main approaches

Parametric Polymorphism: This is the SML approach, which involves the idea of a type-constructor, that is to say a function from types to types. In fact we have already met type-constructors, namely the operators >< and --> which create the a cartesian product type and a function type respectively.
It type is a type, then (list type) denotes the type of all Scheme lists (x₁...x_n) (for any n) where each x_i belongs to the type type.
However, the most convenient approach to these types involves introducing type-variables, so that for example (list (? a)) is used to mean a list of whatever type the variable (? a) happens to be bound to.
In this approach, the function car for example has the type (--> (>< (list (? a))) (? a)) . So, for example if a variable l has type (list integer?) then the expression (car l) is ascribed the type integer
A Type Hierarchy: This is the Java approach. It is less precise than parametric polymorphism, but allows you to build a wider range of structures (for example lists with alternating integers and strings). We might have a hierarchy something like this:
```
                    object?

    boolean? char? null? integer? pair?  string? symbol? vector?
```
We will still need a kind of parametric polymorphism for functions, which rather complicates the issue (and let's note that Java doesn't have functions, but methods, which are not objects in their own right).
With the type-hierarchy approach car has the type (--> (>< pair?) object?) , saying in effect that a pair can hold anything as its first component. To make this work, we have to introduce a notion of type-casting - the expression
```
    (cast (car list1) integer?)
```
will make sure at run time that (car list1) actually is an integer.
```
(define (method_cast args env)
    (let ((type (type_check (car args) env))
          (type_res (cadr args)))

        (cond                                             ;
            ((castable? type type_res) type_res)

            (else (error
                     "cannot cast type " type " to type " type_res )))))

(add_method! 'cast method_cast)
```
One type is castable? to another if it is either above it or below it in the hierarchy.