In the last lecture we introduced the
-calculus as
providing not only theoretical characterisation of computation but as being
the basis a rigorous characterisation of a class of programming languages called
functional programming languages. It was, we alleged, the case that
it is possible to verify programs written in such languages in a simple way
related to ordinary mathematical proof.
We can make our understanding of the role of the
-calculus more precise as follows:
-calculus
-calculus.
-calculus perspective, there is no
essential distinction between program and data - it is only a matter of
point of view.
This means in particular that if we want to prove that a
program is correct, then we must prove that the corresponding
-calculus expression has a particular property. Such a
"proof" does not guarantee that the program does in fact behave as required
- but it does provide a basis for determining whether the error is in the
user's program or in the language implementation.
-calculus and variable x
that E[x:=x] = E. Since we are characterising
substitution with regard to being able to program it, the proofs
we do in this section are examples of the kind of proof we would like to be
able to do with our mechanised proof-methods for the Theorem
datatype.
-calculus always "give
the same result" when applied to any data-expression. In practice we have to
qualify "always" to "always when in certain specified
circumstances". For example we may say that
the abstractions
x . (+ 5 x)
x. (+ x 5)
-calculus?". We would like, say, to regard
x . (+ 5 x) and
y . (+ 5 y) as being equal - to be able to write,
with a good conscience,
x . (+ 5 x) =
y . (+ 5 y)
is a great convenience, because it means that we can harness a whole load of mathematical expectations about equality to our understanding of program-verification. But the two expressions above are manifestly different, so developing an appropriate concept of equality is necessary.
For example, suppose we want to prove that a sorting function that we
have written, say merge_sort,
is correct. Then, within the rigorous
functional-paradigm, merge_sort should be exactly equivalent to an
expression of the
-calculus. What does it mean to say
that merge_sort is correct? There are two criteria:
(merge_sort l)
is sorted. How do we render this English requirement into a formalism?
Well it's fairly easy to write a function sorted that determines
if a list is sorted - a list is sorted if it is empty, or contains
one element, or if the first two elements are in order and tail of the list
(everything but the first element) is sorted. So our requirement can be
expressed as
l .(sorted (merge_sort l)) =
l . true
In order to understand how to implement software that can reliably
support proof of facts like these we need to characterise exactly
many important aspects of the
-calculus. Let's start by
defining
-reduction so precisely that we
can write a program to implement it.
-calculus we mean syntactic identity
That is to say, for the present we are restricting ourselves to the very simplest, most basic, idea of equality.
Informally, an expression E1 is said to occur in an expression E2 if it is a sub-expression of E2.
-calculus, then we say that E occurs in F if
OCC1 E occurs in F if E = F
OCC2 If E occurs in F, and G is a term, then E occurs in FG
OCC3 If E occurs in F, and G is a term, then E occurs in GF
OCC4 If E occurs in F, and x is a variable,
then E occurs in
x . F
[Note Hindley allows that the bound variable x of a
abstraction
x. E occurs in
the abstraction even if x does not occur in E. Barendregt
does not. We have followed Barendregt.]
There
can be several occurrences of E1 in
E2. Note that if E1 is a variable
v, say, then if v is the variable bound in a
-abstraction, this does not constitute an occurrence. For
example v does not occur in (
v. 2),
while x has two occurrences in (+ x x).
When we defined
-reduction, we were not very precise
about what we meant by substitution. Now is the time to remedy this,
Now we need to define the idea of a bound variable. This is essential to giving a precise sense to the scope of a variable, and thereby being able to avoid confusing what are essentially two different variables which happen to have the same name.
Let E be an expression of the
-calculus,
and let v be a variable. We say that an occurrence of v
in E is bound if it is inside a sub-expression of
E of the form
v.E2. An
occurrence is said to be free otherwise.
Thus v occurs
bound in
v. x v and in
(y
v. v) but it occurs free in
x.
v x.
Note that we are speaking of an occurrence of a variable as being
bound - a variable can occur both bound and free in a given
expression. For example, in v
v. v, the first occurrence of v
is free, and the last is bound.
We can define a function FV which forms the free-variables of an expression as:
FV(E2)
v .E) = FV(E) - {v}
Here FV1 says that the set of free variables of an expression that consists of a
single
variable is the set consisting of just that variable,
while FV2 says that the set
of free variables of a constant expression is empty. FV3 says that the free
variables of an application E1E2 is the union of
the free variables of the two expressions E1
and E2,
while FV4 says that the set of free variables of a
-abstraction is the set of free variables of its body minus the
variable bound by the abstraction.
For example FV (
x. (f x y)) = {f,y}
-conversion It is clear that the variable used in a
-abstraction ought to be regarded as arbitrary.
Thus
x. + x 2 and
y. + y 2 are, intuitively, the
same function.
There is indeed a rule of the
calculus,
called
-conversion, which allows the above two
expressions to be treated as equivalent. It is a little tricky however,
since one does not want to convert
x. y
x to
y. y y - the rule is that we may
only replace the variable bound in a
-abstraction by one that does not occur free in the
body.
The
conversion rule is thus:
x.E
y.E[x:=y]
FV(E)
-calculus as being
a tree, we define the notion of the height of a term as:
x.E) = 1+height(E) Now let's return to the problem of defining substitution.
Consider for example:
x. + x (
x.(+ x 5) 4) 3)
-abstraction, since that is essentially a different x - the
outer x is out-of-scope in the inner
-abstraction.
So our beta-reduction is to
x.(+ x 5) 4))
-abstraction whose variable
is different
x. + x (
y.(+ y x) 4) 3)
-abstraction, obtaining:
y.(+ y 3) 4)
So we have to make sure that, as we explore the expression we are
substituting inside, if we encounter an inner
-abstraction
that has the same variable as the one we're replacing, that we don't
continue substituting inside the expression. This is embodied in rule S5
below.
However, there is yet another pitfall for the naive implementor
of
-reduction.
Suppose we want to
-reduce
x. E1) E2
-abstraction inside E1,
then we are facing a problem called variable capture. The problem
is exemplified in
x. (
y. + x y)) (+ y 3)
y. + (+ y 3) y)) ------- WRONG
-conversion on the inner
-expression so that the bound-variable does not occur free in the
argument of the beta-redex.
x. (
z. + x z)) (+ y 3)
z. + (+ y 3) z)
To summarise our discussion above, we may say that substitution,
forming E1[v:=E2],
`E1 with E2 substituted for v'
is a straightforward matter of rewriting sub-terms except when we come to a
-abstraction which binds the variable v or a variable free in
E2.
We can define it by cases, using v,u for variables,
E,E1... for expressions, with C for the
set of constants:
S2 - u[v:=E] = u, when u
v
S3 - c[v:=E] = c, for any c in C
S4 - (E1E2)[v:=E]
=(E1[v:=E]E2[v:=E])
S5 -
v.
E [v:=E1] =
v. E
S6 - (
u. E1) [v:=E]
=
u. (E1 [v:=E]),
v and
FV(E)
or
v
FV(E1)
S7 - (
u. E1) [v:=E]
=
w. ((E1[u:=w]) [v:=E]),
v and u
FV(E)
and v
FV(E1), where we choose
w so that
FV(E)
FV(E1)
Firstly let's note that S6 does follows the definition given in
Hindley & Seldin in incorporating the
condition v
FV(E1). We could
envisate leaving out the condition v
FV(E1) from S6, forcing a change of variable (S7) in that
case. This would slightly simplify the definition of substitution, though
it does not appear to simplify the proofs given below.
Cases S1-S4 need no comment. Case S5 is the one we discussed earlier in
which the variable we are substituting for is
rebound in a
-abstraction. Thus, inside the
expression it no longer `means' the same
thing - in some sense it is actually a different variable, so we should not
substitute for it.
In case S6, the
-abstraction introduces a
new variable u different from v, but,
there is no problem of confusing it with any variable
occurring in E either when u does not occur in E
or when v does not occur in E1.
However in case S7 there is a real problem - the new variable u
introduced in the
-abstraction is the same as
a variable occurring free in E and v occurs in
E1 (so that we will actually make a change in
E1.
The solution is to perform an
-conversion, replacing it throughout the
-abstraction by a variable w that does not
occur in either E or in E1.
We can always choose a w for S7 because we have an infinite
supply of variables to choose from and any
-calculus
expression only contains finitely many.
We now have our first opportunity to prove a results relevant to
developing a theory of equality in the
-calculus. If we
take nothing for granted we find that some rather "obvious" properties of
substitution take quite a lot of effort to prove. This arises from the fact
that there are seven clauses in the definition of substitution, so
that if we are starting from the definition we have to consider all seven
cases in our proof. Later, with a decent repetoire of lemmas under our
belt, we'll have less labour in dealing with substitution.
Our proofs are by induction on the height of terms, and, since
S6 requires us to change the variable of a
-abstraction, we'll need a little lemma which says that
height is unchanged thereby.
-calculus,
and E is a term of that calculus, then
x is
a variable. Then height(E[x:=y]) = height(u) = 0 = height(E).
Inductive Step
Suppose for a given n we have for any term E for which
height(E)
n, for all x',y'
height(E) = height(E[x':=y'])
n
height(E2)
n.
So by the inductive hypothesis,
Thus, using H3 and S4, we have
x . G.
Then E[x:=y] = E by S5. Hence height(E[x:=y]) = height(E)
u . G
where u
x and
either u
FV(y) or
x
FV(G).
Note that by H4, height(G) = n.
u . G[x:=y])
u . G
where u
x and
u
FV(y) and
x
FV(G). Again, height(G)=n.
Let us choose a variable
w for which w
FV(G) and
w
FV(y).
u . G)[x:=y]
= (
w . G[u:=w][x:=y])
w . G[u:=w][x:=y]) [A] If x
FV(E) then
FV(F)
[B] If x
FV(E) then
Base Case n=0
FV(E),
that is we have case [A].
Now, by S1, E[x:=F] = F.
FV(F) = ({x}-{x})
FV(F)
= FV(F) = FV(E[x:=F])
So the result is satisfied for this case.
x, where u is a
variable. Then FV(E) = FV(u) = {u} by FV1.
FV(E) , that is we have case [B].
Now E[x:=F] = u[x:=F] = u by S2, so we obtain:
Suppose E = c
C.
Then FV(E) = FV(c) = {} by FV1,
and hence x
FV(E) , that is we have case [B].
Now E[x:=F] = c[x:=F] = c by S3.
E[x:=F] = c = E
Inductive Step
Suppose for a given n we have for any term E for which
height(E)
n
[A] If x
FV(E) then
FV(F)
[B] If x
FV(E) then
FV(E2)),
Note that by H3, height(E1)
n ,
height(E2)
n .
There are 4 sub-cases
FV(E). This is case [B]
of the theorem.
In this sub-case, it follows that
x
FV(E1),
x
FV(E2).
Then
by the inductive hypothesis. Also
by the inductive hypothesis.
FV(E),
x
FV(E1),
x
FV(E2)
In this sub-case, applying S4, we have:
= FV(E1[x:=F])
FV(E2[x:=F])
by FV3. And now, applying the inductive-hypothesis and the associativity of union
FV(F)
(FV(E2)-{x})
FV(F)
= ((FV(E1)
FV(E2))-{x})
FV(F)
by a little elementary set-theory. We now use FV3 again:
FV(F) = (FV(E)-{x})
FV(F)
FV(E),
x
FV(E1),
x
FV(E2)
In this sub-case, applying S4, we have:
= FV(E1[x:=F])
FV(E2[x:=F])
by FV3. And now, applying the inductive-hypothesis, remembering
x
FV(E1)
(FV(E2)-{x})
FV(F))
= ((FV(E1)
FV(E2))-{x})
FV(F))
by a little elementary set-theory, remembering again that
x
FV(E1).
We now use FV3 again:
FV(F)) = (FV(E)-{x})
FV(F))
FV(E),
x
FV(E1),
x
FV(E2)
This case is symmetric with case 3.
x . G.
In this case FV(E) = FV(G)-{x}, so
x
FV(E) .
Using S5 we have
x . F)[x:=F]
= (
x . F) = E Also
thus the lemma holds for this case.
u . G
where u
x and
either u
FV(F) or
x
FV(G).
Note that by H4, height(G) = n.
We have, by S6
u . G)[x:=F]
= (
u . G[x:=F])
and by FV4
u . G[x:=F]) = FV(G[x:=F]) - {u}
There are 2 sub-cases
FV(G)
In this sub-case, by the inductive hypothesis,
FV(G[x:=F]) = FV(G)
and G[x:=F] = G
So,
by FV4. Moreover E[x:=F] = E
So, since we x
FV(E)=FV(G)-{u}, the result
holds in this case.
FV(G) ,
u
FV(F).
In this sub-case, by the inductive hypothesis,
FV(F)So,
FV(F)) - {u}
by FV4. But {u}
FV(F) , so
FV(F)-{u} = FV(F). Hence
FV(F))
= ((FV(E)-{x})
FV(F))
So our result holds for this sub-case.
u . G
where u
x and
u
FV(F) and
x
FV(G).
Let us choose a variable
w for which w
FV(G) and
w
FV(F). We have
u . G)[x:=F]
= (
w . G[u:=w][x:=F]) _______________[1]
and by FV4 and [1]
w . G[u:=w][x:=F])There are 2 sub-cases.
FV(G) ,
In this sub-case, by the inductive hypothesis,
FV(G[u:=w])=FV(G). So, again by the inductive hypothesis,
since x
FV(G)
FV(F)
= (FV(G)-{x})
FV(F)
Since u
FV(G) . So that, substituting in
[2] we obtain
FV(F)) -{w}
= (FV(G)-{x})
FV(F)
Since w has been chosen not to be in either set of free-variables.
On the other hand,
FV(E) = FV(G)-{u} = FV(G), since in this sub-case
u
FV(G) . So we have
FV(F)
Proving the result for this sub-case, since
x
FV(G) and
x
u implies
x
FV(E)
FV(G) ,
In this sub-case, by the inductive hypothesis and FV1,
{w}
Thus x
FV(G[u:=w]). Applying the inductive
hypothesis twice again we obtain
FV(F)
= ((FV(G)-{u})
{w}) -{x})
FV(F)
Now w
x , since x in this
case is a free variable of G, while w is chosen not
to be. Hence we have by [2]
FV(F)
= (FV(E) - {x})
FV(F)
establishing the result in this sub-case.
-calculus,
and x is a variable, then E[x:=x] = E
Base Case n=0
x, where u is a
variable. Then, by S2, E[x:=x] = u = E.
So the result is satisfied for this case.
Suppose E = c
C. Then, by S3,
E[x:=x] = c = E.
So the result is satisfied for this case.
Inductive Step
Suppose for a given n we have for any term E for which
height(E)
n
x . G.
Then E[x:=x] = E by S5. Hence the result holds in this case.
u . G
where u
x. Then necessarily
u
FV(x). So we can use S6 and the inductive
hypothesis,
noting that by H4,
height(G) = n.
to obtain
u . G[x:=x] =
u .
G = E
and we see the result holds for this case.
-calculus. However it should be stressed that
doing this can change the mathematical properties. For example, a simple
change in the definition of substitution (as discussed above
with respect to S6 and S7) can invalidate all the proofs, which, you
will observe, are quite long. In particular, it's necessary to maintain a
firm distinction between the pure-calculus (with or without
-reduction) and the calculus with
-reduction.
-Calculus"
Cambridge University Press.