Computer Science 591i
Evaluation

For this work, we'll write (C) for the -calculus with a particular constant set C.

So far, we've looked at the -calculus as a formalism which we can manipulate according to rules that do (I hope) seem reasonable. Now, we could take the -calculus itself as a basis for computation - it's fairly easy to implement what we've defined as a computational structure, particularly using a LISP derivative (Scheme, POP-11, SML...).

However, from a computational point of view, to restrict ourselves to a literal implementation of -reduction as a means of computation is problematic, because substitution as defined involves extensive copying of structures if it is implemented in the most obvious way. So, it's desirable to think of ways that we can map the -calculus onto various kinds of structure some of which will be more efficient computationally.

Mathematically, there is also a problem with the unvarnished -calculus. Since we are interested in using it a basis for specifying computations that we can prove correct. Unfortunately the -calculus, as we currently know it, is paradoxical when used as the basis for a logic. So, we'd like to see how we can eliminate the possibility of paradox, by restricting ourselves to forms which are seen as legal.

To illustrate this problem, let's adapt an example from Meyer[1982]. Consider the term

for which T g means, in effect, that function which applies the function g twice to its argument. So, for example,

x. (( z. z³) x³)

x. (x³)³

x. x⁹

Now, consider the term (T T). Can we can put an interpretation upon it, despite the fact that it would not be legal if we think of -expressions as being mathematical functions, because in the standard mathematical concept of functions, a function can't lie in its own domain. But if (T T) is legal, and T is seen as a mathematical function, then T must lie in its own domain, a contradiction.

Nevertheless, there is a commonsense interpretation that can bu put upon (T T). Consider the expression:

(( x. T(T x)) ) g

T(T g)

T(( f x. f(f x)) g)

T( x. g(g x) )

( f x. f(f x))( x. g(g x) )

( x. ( x. g(g x) ) (( x. g(g x) ) x)))

( x. ( x. g(g x) ) ( g(g x) ))

( x. (g (g (g (g x)))))

So, we could conclude that

where o denotes function product.

However allowing self-application, as in (T T), can lead to contradiction. For example, suppose we define a (higher-order) function P by the rule

(P f) = 0

(f f) 0

(P f) = 1 otherwise

Then the expression (P P) is paradoxical. An Equational View

Note that if we regard the above "definition" of P as being instead an equation for P, the paradox disappears, just as the equation x + 1 = x is not paradoxical, merely insoluble.

Now there is a long tradition in mathematics of characterising functions by equations. The elementary trancendental functions sin, cos.. can be characterised by differential equations, and indeed more advanced functions, such as the Bessel functions are primarily characterised by differential equations.

Similarly, we can regard recursive functions, such as the factorial function, as solutions of recursion equations. Thus if we solve

we obtain the factorial function. On the other hand, an equation such as

has no solution.

Applicative Algebras

Bearing in mind the above discussion, from both the computational and mathematical point of view, it's useful to proceed abstractly by focussing on the idea of mapping the -calculus to some kind of domain in which computation takes place. By considering what should be the essential properties of such a map we can hope to characterise a useful variety of domains.

Such abstraction is second-nature to mathematicians raised in the modern tradition; computer scientists can reflect that what we are trying to do is to characterise an abstract data type which will allow us to consider a range of possible implementations of the -calculus.

Perhaps the simplest possible mathematical representation of an abstract concept of computation is focus on the idea of the application of a program-object to a data-object. If we want to be faithful to the functional paradigm (and indeed to the idea of the von Neumann architecture), we will require that program-objects and data-objects inhabit the same applicative algebra.

Definition

An applicative algebra is a structure = <D,> on which is defined a binary operation, called application. For any f, x D, we write the application of f to x as

We will write D = | |.

Now our concept of applicative algebra seems rather austere. There is no apparent apparatus, for example, for defining functions. The one thing we don't want to do is to re-introduce the -apparatus directly into the algebra, since that would prohibit us investigating other means of realising its capabilities. Instead, let's consider a map from the -calculus to our applicative algebra. Such a map, which we'll call an evaluation, is going to be built up from a simple map from the variables and constants,C, of (C) to . Such a map is commonly called an environment.

So, the idea is, given an environment which tells us what are the values in of variables and constants of (C), how can we define an evaluation which maps the whole of (C) to . Such an evaluation ought to preserve certain properties of the -calculus, so that we can feel that the -calculus is faithfully represented in .

Since we don't want explicitly to reproduce the -apparatus in , we can proceed indirectly to develop our concept of faithfulness by requiring that -calculus terms which are alpha- and beta- equivalent be mapped onto identical members of .

Definition

₁

₂

₁

₂

An Approach to Realising Evaluation

. Can the characterisation of evaluation given in EV1-5 help us actually construct an evaluation? Clearly axioms EV1-EV3 are helpful in this respect - they say how the applicative structure of the

(C) can be mapped directly into operations on an applicative algebra. But EV4 and EV5 are less directly helpful - EV4 merely relates two abstractions, which is not a manifest construct of an applicative algebra, while EV5 relates abstraction to substitution, which is again something we don't know how to do.

Rescue from our predicament arrives in the the form of our old friends the S and K combinators.

S = f g x . (f x) (g x)

Let's investigate how an evaluation maps these combinators into an applicative domain.

Now suppose we have an evaluation that, for some , maps (C) onto . Let d₁,d₂,d₃ . Let D₁, D₂, D₃ be the inverse-images of d₁,d₂,d₃ under Consider the expression

₁

₂

= (K) (D₁) (D₂)

= (K D₁ D₂) = (D₁) = d₁

And also:

₁

₂

₃

₁

₂

₃

= (S D₁ D₂ D₃) = ((D₁ D₃) (D₂ D₃))

= (d₁ d₃) (d₂ d₃)

Thus any applicative domain that is the image of (C) under an evaluation contains analogs of the S and K combinators. This turns out, as we shall see, to be an important aspect of what it is to be a domain which is a model of the (C) calculus.

Definition

An applicative algebra is said to be a combinatory algebra if there are elements K, S || such that

₁

₂

₁

CA2 S d₁ d₂ d₃ = (d₁ d₃) (d₂ d₃)

(from Meyer[1982], originally from Curry.)

One way in which we can extend an environment into being an evaluation is to make use of the S and K elements of a combinatory algebra. To do this, we will need an additional element I, in effect the identity function, which is shown necessarily to exist by the following lemma.

Lemma

Let be a combinatory algebra. Let I = SKK then for all d D

I d = d

Proof

For any d in |

|, we have I d = S K K d = (K d) (K d) = d

Now let us show that, in

(C), every term is

-equivalent to a term in which the only

-abstractions are the S, K, I combinators. If we do this, it will then be fairly obvious how we can create an evaluation mapping to a combinatory algebra.

Definition

A combinatory lambda term is defined inductively as follows.

CT1 If v (C) is a variable then v is a combinatory lambda term.

CT2 If c C is a constant then c is a combinatory lambda term.

CT3 If F,G are combinatory lambda terms, then so is (F G)

CT4 S,K are combinatory lambda terms (and hence so is I).

Now let's start finding out how to transform any term in

(C) to an equivalent combinatory term. To do this, we introduce a piece of notation that I found rather baffling when first I saw it. In essence, the notation <x> E means "the combinatory lambda equivalent of x . E". We could, of course, have used some more conventional notation such as comb(x,E) in defining the transformation to combinatory terms, but the syntactic similarity of <x> E to the

form plays a useful role in helping us to understand what is happening.

Definition

If x (C) then, for any combinatory lambda term E (C), the term denoted by <x> E is defined as follows:

CL1 <x> x = I

CL2 <x> E = (K E) if x is not free in E

CL3 <x>(F G) = (S (<x>F) (<x>G)) if x is free in F or G.

[Note - Meyer uses the notation <<x>> E, since he is being meticulous about distinguishing between this operation and an similar one defined over a combinatory algebra.]

Now we can define the transformation from a term E to an equivalent combinatory term by specifying that we systematically replace any abstraction
v. F occurring in E by <v> F

Definition

Let E be a term of

(C). Then comb(E) of E is defined by

CM1 comb(c) = c for c C

CM2 comb(x) = x for any variable x

CM3 comb((F G)) = (comb(F) comb(G))

CM4 comb( x. F) = <x>comb(F)

Lemma (the Combinatory Lambda Term Lemma)

If E is a term of (C) then

comb(E) is a combinatory lambda term.

E comb(E) where denotes - equivalence.

Proof

By induction on the height of E.

Base Case

For any term of height 0, the result follows from CM2 and CM2.

Inductive Step

Suppose for any term E for which height(E) n we have that comb(E) is a combinatory lambda term and E comb(E).

Consider a term E for which height(E) = n+1

Case 1: E = (F G). Here, by CM3
and so by the inductive hypothesis and CT3, comb((F G)) is a combinatory lambda term. Moreover, by the definition of - equivalence, since we have, from the inductive hypothesis, comb(F) F, comb(G) G, that
Case 2: E = x. F. Here, by CM4
Now by the inductive hypothesis, comb(F) is a combinatory lambda term equivalent to F.
- Sub case 2.1, F = x. Here <x>comb(F) = I. Hence comb(E) is a combinatory lambda term, and
- Sub case 2.2, x is not free in comb(F).
  applying the definition of K and equivalence. So, by the inductive hypothesis, and -conversion this is
  since x is not free in comb(F)
- Sub case 2.3, F = (G H) and x is free in comb(G) or in comb(H)

An Example

The following shows the transformation process at work. Consider:

If we rewrite this using CM1 as

<x> . ((+ x) 1) the Combinatory Lambda Term Lemma assures us that the result of this rewriting will be a new form which is alpha-beta-equivalent to the original one. Now, x occurs free in ((+ x ) 1) so, using CL3 we can rewrite this to

((S (<x> (+ x))) (<x> 1)) This can be rewritten further as

((S (S (<x> +)) (<x> x)) (K 1)) and finally as

((S (S (K +)) (I x)) (K 1))

As a reality check, if we transform back any one line by replacing the <x>.. form by the corresponding lambda abstraction, we should obtain a function equivalent to the initial one. Consider

((S (<x> (+ x))) (<x> 1)) Transform this back to

Substituting an alpha-equivalent version of the definition of S we obtain:

By beta-reduction we obtain:

Which, by further beta-reduction, yields

which is alpha-equivalent to our original function.

Enhancing the Efficiency of the Combinator Approach

The combinatorial approach is conceptually simple, which is an advantage for mathematicians. However we find that a modest lambda expression is converted to a big combinatorial expression. We can go some way towards mitigating this problem by introducing some new (redundant) combinators. Consider

CL3 <x>(F G) = (S (<x>F) (<x>G)) if x is free in F or G.

Using S in this case in effect is piping x into both the transformed versions of F and G. This only makes sense if x is free in both. So, let's introduce two new combinators, defined as follows

B = f g x. f (g x)

C = f g x. (f x) g

And now we can modify rule CL3, obtaining 3 new rules

CL3' <x>(F G) = (S (<x>F) (<x>G)) if x is free in F and G.

CL4 <x>(F G) = (B F (<x>G)) if x is only free in G.

CL5 <x>(F G) = (C (<x>F) G) if x is only free in F.

The example reworked

We can now rework our earlier example.

If we rewrite this using CM1 as

<x> . ((+ x) 1) Now, x occurs free in ((+ x ) 1), but only occurs free in (+ x) so, using CL5 we can rewrite this to

((C (<x> (+ x))) 1) This can be rewritten further as

(C (B + (<x> x)) 1) and finally as

(C (B + I) 1) Which is quite a decent condensation. However there is more to come, since we still have not dealt with the major obstacle to the practical use of combinators: multiple nested

-abstractions give rise to expressions that grow too rapidly. [Peyton-Jones has a good discussion on this - see section 16.2]

Consider an expression of the form F G. A single lambda-abstraction translates, in the most general case, ( z. F G) into

Abstracting again, even with the B optimisation, converts (y z. F G) into

Abstracting again, converts (x y z. F G) into

= S(B S (B (B S) (<x><y><z>F))) (<x><y><z>G))

If we think of the commonest circumstance in which the B combinator finds itself, we have

    <x>((B E) F)   =  (B (B E) <x>F)

That is, every abstraction replaces each such occurrence B combinator by 2 occurrences of a B. So we'd seem to have an exponential dependence on the number of abstractions.

The Turner Combinators

We can remedy this problem by introducing yet another family of combinators, the Turner combinators (have faith - we're not going to do this indefinitely). The most important of these is S', and its definition is:

And we introduce a new rule:

CL2.9 <x>(H F G) = (S' H (<x>F) (<x>G)) if x is not free in H

but is free in F or G.

The use of S' combinator is a great improvement because it gives rise to an expression that is linear in the number of abstractions.

Reverting to our consideration of abstraction over an expression of the form F G, we see that a single lambda-abstraction translates, as before, ( z. F G) into

But the next abstraction shows a marked improvement since it converts (y z. F G) into

S' S

Abstracting again, we convert (x y z. F G) into

S' S

= (S' (S' S) (<x><y><z>F) (<x><y><z>G))

It's fairly clear that each successive abstraction will simply have the effect of nesting the first combinator one deeper, with the addition of one S' combinator. The same process will take place inside F and G, but the number of additional combinators will be related to the product of the number that arose in the first abstraction and the number of abstractions.

Peyton Jones discusses the use of the B' and C' combinators, which bear the same relation to S' as S' does to S. He points out that the use of B' is not necessarily beneficial, and discusses an alternative B* combinator.

The following lemma shows that it's correct to use these new combinators and their rules in the comb function.

Lemma (the S,K,I,B,C,S' Combinatory Lambda Term Lemma)

If E is a term of (C) and comb is defined using rules CL1,CL2,CL2.9,CL3',CL4,CL5, then

comb(E) is a combinatory lambda term.

E comb(E) where denotes - equivalence.

Proof

By induction on the height of E.

Base Case

For any term of height 0, the result follows from CM2 and CM2.

Inductive Step

Suppose for any term E for which height(E) n we have that comb(E) is a combinatory lambda term and E comb(E).

Consider a term E for which height(E) = n+1

Case 1: E = (F G). Here, by CM3
and so by the inductive hypothesis and CT3, comb((F G)) is a combinatory lambda term. Moreover, by the definition of - equivalence, since we have, from the inductive hypothesis, comb(F) F, comb(G) G, that
Case 2: E = x. F. Here, by CM4
Now by the inductive hypothesis, comb(F) is a combinatory lambda term equivalent to F.
- Sub case 2.1, F = x. Here <x>comb(F) = I. Hence comb(E) is a combinatory lambda term, and
- Sub case 2.2, x is not free in comb(F).
  applying the definition of K and equivalence. So, by the inductive hypothesis, and -conversion this is
  since x is not free in comb(F)
- Sub case 2.29, F = (J G H) and x is free in comb(G) or in comb(H) but x is not free in comb(J)
- Sub case 2.3', F = (G H) and x is free in comb(G) and in comb(H)
Sub case 2.4, F = (G H) and x is not free in comb(G) but is free comb(H)
Sub case 2.5, F = (G H) and x is free in comb(G) but is not free in comb(H) This case is similar to 2.4.

Summary

The first combinatory term lemma provides one approach to constructing an evaluation of an arbitrary term E of (C). All we need to do is to transform E into an equivalent combinatory term E', which can then be mapped into an arbitrary combinatory algebra using rules EV1-EV3.

The second combinatory term lemma suggests a more efficient approach in which we use the additional combinators B,C,S' to obtain relatively condensed combinatory terms. Since the new combinators can be defined in terms of S and K, we can again map a combinatory term E'' obtained from E by use of the second lemma, into using rules EV1-EV3.

Computationally, of course, it would be important to implement the new combinators as part of any implementation of a combinatory algebra - mathematically however they're already there.

Mathematically, however, there is a snag. While we've shown that the concept of evaluation requires that an applicative algebra must be a combinatory algebra if it is the image of (C) under evaluation, and we've shown one way to construct a mapping from (C) to a combinatory algebra, we haven't shown that the mapping so constructed is an evaluation. In fact, we need a stronger concept than the combinatory algebra.

This situation will be familiar to computer scientists. An abstract data-type may be characterised by some kind of type-signature which partly specifies what the functions acting on the data-type should do, and what kinds of objects the data-type should have. But a program that meets the type-signature is not necessarily a correct implementation of the data-type. Even the equations we have put upon combinatory algebras, in CA1 and CA2, are too weak.

There are two approaches to remedying this problem. The algebraic approach is to add additional conditions beyond CA1 and CA2 which will characterise a structure called a Lambda Algebra. The non-algebraic approach is to add non-algebraic conditions beyond CA1 and CA2 which will characterise a structure called a Combinatory Model. D. A. Turner ``Another Algorithm for Bracket Abstraction'', Journal of Symbolic Logic, vol 44, no 2, pp 267-270 (June 1979).

Computer Science 591i Evaluation

Applicative Algebras

Definition

Definition

An Approach to Realising Evaluation

Definition

Lemma

Proof

Definition

Definition

Definition

Lemma (the Combinatory Lambda Term Lemma)

Proof

Base Case

Inductive Step

An Example

Enhancing the Efficiency of the Combinator Approach

The example reworked

The Turner Combinators

Lemma (the S,K,I,B,C,S' Combinatory Lambda Term Lemma)

Proof

Base Case

Inductive Step

Summary

Computer Science 591i
Evaluation