Computer Science 591i
Substitution, Free and Bound Variables

The function FV finds the free variables of a term
A Formal Definition of Substitution
Some Lemmas Relating to Substitution

In the last lecture we introduced the λ-calculus as providing not only a theoretical characterisation of computation but also the basis of a rigorous characterisation of a class of programming languages called functional programming languages. We alleged that programs written in such languages can be verified in a simple way related to ordinary mathematical proof.

We can make our understanding of the role of the λ-calculus more precise as follows:

A program is an expression E_prg, say, of the λ-calculus.
A data-set is also an expression E_data of the λ-calculus.
Running a program E_prg with a given data-set E_data corresponds to forming the application (E_prg E_data) and then using reduction rules to transform that application into a simple form that can reasonably be considered as the result of the computation. We gave a preliminary definition of beta-reduction, and mentioned delta-reduction as "working out" applications involving constants.

Thus, from the λ-calculus perspective, there is no essential distinction between program and data - it is only a matter of point of view.

This means in particular that if we want to prove that a program is correct, then we must prove that the corresponding λ-calculus expression has a particular property. Such a "proof" does not guarantee that the program does in fact behave as required - but it does provide a basis for determining whether the error is in the user's program or in the language implementation.

In this lecture we will develop these ideas further.

We will make our definition of beta-reduction precise enough to be programmed.
Moreover we will prove (at considerable length) that substitution has some apparently simple properties. For example we will show that, for any term E of the λ-calculus and any variable x, E[x:=x] = E. Since we are characterising substitution with a view to programming it, the proofs we do in this section are examples of the kind of proof we would like to be able to do with our mechanised proof-methods for the Theorem datatype.
The essential basis of our proving programs "correct" is to show that two apparently different expressions E₁ and E₂ of the λ-calculus always "give the same result" when applied to any data-expression. In practice we have to qualify "always" to "always in certain specified circumstances". For example we may say that the abstractions
always give the same result when applied to an integer.
Consequently, an issue we have to tackle is "what do we mean by equality in the λ-calculus?". We would like, say, to regard λx . (+ 5 x) and λy . (+ 5 y) as being equal. To be able to write, with a good conscience,

        λx . (+ 5 x) = λy . (+ 5 y)

is a great convenience, because it means that we can harness a whole load of mathematical expectations about equality to our understanding of program-verification. But the two expressions above are manifestly different, so developing an appropriate concept of equality is necessary.

For example, suppose we want to prove that a sorting function that we have written, say merge_sort, is correct. Then, within the rigorous functional paradigm, merge_sort should be exactly equivalent to an expression of the λ-calculus. What does it mean to say that merge_sort is correct? There are two criteria:

For any list l of data (possibly a list satisfying some criterion that is part of the specification of merge_sort) the result of evaluating
```
        (merge_sort l) 
```
is sorted. How do we render this English requirement into a formalism? Well, it's fairly easy to write a function sorted that determines if a list is sorted: a list is sorted if it is empty, or contains one element, or if the first two elements are in order and the tail of the list (everything but the first element) is sorted. So our requirement can be expressed as
The other requirement is that (merge_sort l) contain exactly the same elements as l itself.
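The two criteria can be sketched as executable checks. This is only an illustration in Python rather than in the λ-calculus: the names is_sorted, same_elements and meets_spec are hypothetical, "exactly the same elements" is read here as "the same elements with the same multiplicities", and Python's built-in sorted stands in for merge_sort.

```python
from collections import Counter

def is_sorted(xs):
    """Empty and one-element lists are sorted; otherwise the first two
    elements must be in order and the tail must be sorted."""
    if len(xs) <= 1:
        return True
    return xs[0] <= xs[1] and is_sorted(xs[1:])

def same_elements(xs, ys):
    """True when ys holds exactly the elements of xs, counting repeats
    (an assumed reading of the second criterion)."""
    return Counter(xs) == Counter(ys)

def meets_spec(sort_fn, xs):
    """Check both correctness criteria for sort_fn on one input list."""
    result = sort_fn(list(xs))
    return is_sorted(result) and same_elements(xs, result)

# Python's built-in sorted is used here only as a stand-in for merge_sort.
print(meets_spec(sorted, [3, 1, 2, 1]))
```

Such a check only tests the criteria on particular lists; the point of the lecture is to prove them for every list.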

In order to understand how to implement software that can reliably support proof of facts like these we need to characterise exactly many important aspects of the λ-calculus. Let's start by defining β-reduction so precisely that we can write a program to implement it.

Until further notice, when we speak of "equality" of terms of the λ-calculus we mean syntactic identity.

That is to say, for the present we are restricting ourselves to the very simplest, most basic, idea of equality.

Sub-Expressions and Occurrences

Informally, an expression E₁ is said to occur in an expression E₂ if it is a sub-expression of E₂.

Definition of occurrence

If E and F are terms of the λ-calculus, then we say that E occurs in F if

OCC1 E occurs in F if E = F

OCC2 If E occurs in F, and G is a term, then E occurs in FG

OCC3 If E occurs in F, and G is a term, then E occurs in GF

OCC4 If E occurs in F, and x is a variable, then E occurs in λx . F

[Note: Hindley allows that the bound variable x of a λ-abstraction λx. E occurs in the abstraction even if x does not occur in E. Barendregt does not. We have followed Barendregt.]

There can be several occurrences of E₁ in E₂. Note that if E₁ is a variable v, say, then the appearance of v as the variable bound in a λ-abstraction does not constitute an occurrence. For example v does not occur in (λv. 2), while x has two occurrences in (+ x x).
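Rules OCC1-OCC4 translate directly into a recursive test. The sketch below assumes a tagged-tuple encoding of terms that is not part of these notes: ("var", name), ("const", name), ("app", fun, arg) and ("lam", bound_name, body). The function names are likewise only illustrative.

```python
def occurs(e, f):
    """True when term e occurs in term f, following OCC1-OCC4."""
    if e == f:                        # OCC1
        return True
    if f[0] == "app":                 # OCC2 and OCC3
        return occurs(e, f[1]) or occurs(e, f[2])
    if f[0] == "lam":                 # OCC4: only the body is searched
        return occurs(e, f[2])
    return False

def count_occurrences(e, f):
    """Number of occurrences of e in f."""
    if e == f:
        return 1                      # a term cannot also occur inside itself
    if f[0] == "app":
        return count_occurrences(e, f[1]) + count_occurrences(e, f[2])
    if f[0] == "lam":
        return count_occurrences(e, f[2])
    return 0

v, x = ("var", "v"), ("var", "x")
lam_v_2 = ("lam", "v", ("const", "2"))               # (λv. 2)
plus_x_x = ("app", ("app", ("const", "+"), x), x)    # (+ x x)

print(occurs(v, lam_v_2))              # the binder alone is not an occurrence
print(count_occurrences(x, plus_x_x))
```

Because OCC4 looks only at the body, the bound variable of an abstraction is not counted, which is the Barendregt convention adopted above.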

When we defined β-reduction, we were not very precise about what we meant by substitution. Now is the time to remedy this.

Bound and Free variables

Now we need to define the idea of a bound variable. This is essential to giving a precise sense to the scope of a variable, and thereby being able to avoid confusing what are essentially two different variables which happen to have the same name.

Bound and Free Occurrences of a Variable

Let E be an expression of the λ-calculus, and let v be a variable. We say that an occurrence of v in E is bound if it is inside a sub-expression of E of the form λv.E₂. An occurrence is said to be free otherwise. Thus v occurs bound in λv. x v and in (y λv. v) but it occurs free in λx. v x.

Note that we are speaking of an occurrence of a variable as being bound - a variable can occur both bound and free in a given expression. For example, in v λv. v, the first occurrence of v is free, and the last is bound.

The FV function finds the free variables of an expression

In order to define substitution we will need to be able to operate on the set of free variables of an expression.

We can define a function FV which forms the set of free variables of an expression as:

FV1 - FV(v) = {v} for a variable v

FV2 - FV(c) = {} for a constant c

FV3 - FV(E₁E₂) = FV(E₁) ∪ FV(E₂)

FV4 - FV(λv . E) = FV(E) - {v}

Here FV1 says that the set of free variables of an expression that consists of a single variable is the set consisting of just that variable, while FV2 says that the set of free variables of a constant expression is empty. FV3 says that the set of free variables of an application E₁E₂ is the union of the free variables of the two expressions E₁ and E₂, while FV4 says that the set of free variables of a λ-abstraction is the set of free variables of its body minus the variable bound by the abstraction.

For example FV(λx. (f x y)) = {f,y}

An expression E is said to be closed if it has no free variables, i.e. FV(E) = {}.
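The four clauses for FV can be sketched as one recursive function. The tagged-tuple term encoding ("var", "const", "app", "lam") and the names fv and is_closed are assumptions made for this illustration only.

```python
def fv(e):
    """Set of free variables of a term, one branch per clause FV1-FV4.
    Terms are ("var", name), ("const", name), ("app", fun, arg) or
    ("lam", bound_name, body)."""
    tag = e[0]
    if tag == "var":                  # FV1
        return {e[1]}
    if tag == "const":                # FV2
        return set()
    if tag == "app":                  # FV3
        return fv(e[1]) | fv(e[2])
    if tag == "lam":                  # FV4
        return fv(e[2]) - {e[1]}
    raise ValueError("not a term: %r" % (e,))

def is_closed(e):
    """A term is closed when it has no free variables."""
    return not fv(e)

# λx. (f x y), with f and y treated as variables as in the example above
f_x_y = ("app", ("app", ("var", "f"), ("var", "x")), ("var", "y"))
example = ("lam", "x", f_x_y)
print(fv(example))
```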

α-conversion

It is clear that the variable used in a λ-abstraction ought to be regarded as arbitrary. Thus λx. + x 2 and λy. + y 2 are, intuitively, the same function.

There is indeed a rule of the calculus, called α-conversion, which allows the above two expressions to be treated as equivalent. It is a little tricky however, since one does not want to convert λx. y x to λy. y y - the rule is that we may only replace the variable bound in a λ-abstraction by one that does not occur free in the body. The conversion rule is thus:

        λx. E  α-converts to  λy. E[x:=y]

provided y ∉ FV(E).

The Height of a Term.

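We can regard a term of the λ-calculus as being a tree. We define the notion of the height of a term as:

H1 - height(v) = 0 for a variable v

H2 - height(c) = 0 for a constant c

H3 - height(E₁E₂) = 1 + max(height(E₁), height(E₂))

H4 - height(λv . E) = 1 + height(E)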
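A minimal sketch of a height function follows. It assumes that variables and constants have height 0, that an application is one higher than the taller of its two parts, and that an abstraction is one higher than its body, which is the reading that the later uses of H3 and H4 in the proofs rely on. The tagged-tuple term encoding is an illustrative choice, not part of the notes.

```python
def height(e):
    """Height of a term viewed as a tree.
    Terms are ("var", name), ("const", name), ("app", fun, arg) or
    ("lam", bound_name, body)."""
    tag = e[0]
    if tag in ("var", "const"):       # leaves of the tree
        return 0
    if tag == "app":                  # one more than the taller sub-term
        return 1 + max(height(e[1]), height(e[2]))
    if tag == "lam":                  # one more than the body
        return 1 + height(e[2])
    raise ValueError("not a term: %r" % (e,))

f_x = ("app", ("var", "f"), ("var", "x"))   # (f x)
print(height(("lam", "x", f_x)))            # λx. (f x)
```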

The Pitfalls of Substitution

Now let's return to the problem of defining substitution.

Consider for example:

clearly we shouldn't substitute 3 for x in the inner λ-abstraction, since that is essentially a different x - the outer x is out-of-scope in the inner λ-abstraction. So our beta-reduction is to

On the other hand if we have an inner λ-abstraction whose variable is different

it is correct to substitute inside the inner λ-abstraction, obtaining:

So we have to make sure that, as we explore the expression we are substituting inside, if we encounter an inner λ-abstraction that binds the same variable as the one we're replacing, we do not continue substituting inside that abstraction. This is embodied in rule S5 below.

However, there is yet another pitfall for the naive implementor of β-reduction. Suppose we want to β-reduce

(λx. E₁) E₂

and one of the free variables of E₂ is the bound variable of a λ-abstraction inside E₁; then we are facing a problem called variable capture. The problem is exemplified in

which does not reduce to

WRONG

Instead, we need to perform an α-conversion on the inner λ-expression so that the bound variable does not occur free in the argument of the beta-redex.

and now we can go ahead, obtaining (λz. + (+ y 3) z)

The definition of substitution

To summarise our discussion above, we may say that substitution, forming E₁[v:=E₂], `E₁ with E₂ substituted for v', is a straightforward matter of rewriting sub-terms except when we come to a λ-abstraction which binds the variable v or a variable free in E₂. We can define it by cases, using v,u,w for variables, E,E₁... for expressions, with C for the set of constants:

S1 - v[v:=E] = E

S2 - u[v:=E] = u, when u ≠ v

S3 - c[v:=E] = c, for any c in C

S4 - (E₁E₂)[v:=E] = (E₁[v:=E] E₂[v:=E])

S5 - (λv. E)[v:=E₁] = λv. E

S6 - (λu. E₁)[v:=E] = λu. (E₁[v:=E]), when u ≠ v and either u ∉ FV(E) or v ∉ FV(E₁)

S7 - (λu. E₁)[v:=E] = λw. ((E₁[u:=w])[v:=E]), when u ≠ v and u ∈ FV(E) and v ∈ FV(E₁), where w ∉ FV(E) ∪ FV(E₁)

Comments on S1-S7

Firstly let's note that S6 follows the definition given in Hindley & Seldin in incorporating the condition v ∉ FV(E₁). We could envisage leaving out the condition v ∉ FV(E₁) from S6, forcing a change of variable (S7) in that case. This would slightly simplify the definition of substitution, though it does not appear to simplify the proofs given below.

Cases S1-S4 need no comment. Case S5 is the one we discussed earlier in which the variable we are substituting for is rebound in a λ-abstraction. Thus, inside the expression it no longer `means' the same thing - in some sense it is actually a different variable, so we should not substitute for it.

In case S6, the λ-abstraction introduces a new variable u different from v, but there is no problem of confusing it with any variable occurring in E, either when u does not occur free in E or when v does not occur free in E₁.

However in case S7 there is a real problem - the new variable u introduced in the λ-abstraction is the same as a variable occurring free in E, and v occurs free in E₁ (so that we will actually make a change in E₁). The solution is to perform an α-conversion, replacing u throughout the λ-abstraction by a variable w that does not occur free in either E or E₁. We can always choose a w for S7 because we have an infinite supply of variables to choose from and any λ-calculus expression contains only finitely many.
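The case analysis for substitution can be sketched as a program. In this illustration terms are tagged tuples ("var", "const", "app", "lam"), fresh variables are drawn from a hypothetical supply w0, w1, ..., and the redex used in the example is an assumed one, chosen so that its result has the same shape as the result quoted in the discussion of variable capture. None of these choices comes from the notes themselves.

```python
from itertools import count

def fv(e):
    """Free variables of a term (clauses FV1-FV4)."""
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag == "const":
        return set()
    if tag == "app":
        return fv(e[1]) | fv(e[2])
    return fv(e[2]) - {e[1]}                      # "lam"

def fresh(avoid):
    """First name of the form w0, w1, ... that is not in avoid."""
    return next(f"w{i}" for i in count() if f"w{i}" not in avoid)

def subst(e1, v, e):
    """Return e1[v:=e]: e substituted for the variable named v in e1."""
    tag = e1[0]
    if tag == "var":
        return e if e1[1] == v else e1            # S1, S2
    if tag == "const":
        return e1                                 # S3
    if tag == "app":                              # S4
        return ("app", subst(e1[1], v, e), subst(e1[2], v, e))
    u, body = e1[1], e1[2]
    if u == v:
        return e1                                 # S5: v is rebound, stop
    if u not in fv(e) or v not in fv(body):
        return ("lam", u, subst(body, v, e))      # S6: no capture possible
    w = fresh(fv(e) | fv(body))                   # S7: rename u to w first
    return ("lam", w, subst(subst(body, u, ("var", w)), v, e))

plus = ("const", "+")
arg = ("app", ("app", plus, ("var", "y")), ("const", "3"))          # (+ y 3)
inner = ("lam", "y", ("app", ("app", plus, ("var", "x")), ("var", "y")))  # λy. + x y

# Body of the assumed redex (λx. λy. + x y) (+ y 3), with (+ y 3) put for x.
print(subst(inner, "x", arg))
```

The printed term is λw0. + (+ y 3) w0: the inner bound variable has been renamed, so the free y of the argument is not captured.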

Lemmas Relating to Substitution

We now have our first opportunity to prove results relevant to developing a theory of equality in the λ-calculus. If we take nothing for granted we find that some rather "obvious" properties of substitution take quite a lot of effort to prove. This arises from the fact that there are seven clauses in the definition of substitution, so that if we are starting from the definition we have to consider all seven cases in our proof. Later, with a decent repertoire of lemmas under our belt, we'll have less labour in dealing with substitution.

Our proofs are by induction on the height of terms, and, since S7 requires us to change the variable of a λ-abstraction, we'll need a little lemma which says that height is unchanged thereby.

Lemma Height

If x,y are variables of the λ-calculus, and E is a term of that calculus, then

height(E[x:=y]) = height(E)

Proof

We proceed by induction on height(E).

Base case n=0

Suppose E = x. Then height(E[x:=y]) = height(y) = 0 = height(E)
Suppose E = u, where u ≠ x is a variable. Then height(E[x:=y]) = height(u) = 0 = height(E).
Suppose E = c, where c is a constant. Then height(E[x:=y]) = height(c) = 0 = height(E).

Inductive Step

Suppose for a given n we have for any term E for which height(E) ≤ n, for all x',y'

height(E) = height(E[x':=y'])

Consider an expression E of height n+1.

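Suppose E=(E₁E₂). Then, from the definition of height, we have that height(E₁) ≤ n and height(E₂) ≤ n. So by the inductive hypothesis, height(E₁[x:=y]) = height(E₁) and height(E₂[x:=y]) = height(E₂).
Thus, using H3 and S4, we have height(E[x:=y]) = height((E₁[x:=y] E₂[x:=y])) = height((E₁E₂)) = height(E).
Suppose E = λx . G.
Then E[x:=y] = E by S5. Hence height(E[x:=y]) = height(E).
Suppose E = λu . G where u ≠ x and either u ∉ FV(y) or x ∉ FV(G). Note that by H4, height(G) = n. By S6, E[x:=y] = λu . (G[x:=y]), and by the inductive hypothesis height(G[x:=y]) = height(G). Hence, by H4, height(E[x:=y]) = height(E).
Suppose E = λu . G where u ≠ x and u ∈ FV(y) and x ∈ FV(G). Again, height(G)=n.
Let us choose a variable w for which w ∉ FV(G) and w ∉ FV(y). By S7, E[x:=y] = λw . ((G[u:=w])[x:=y]). Applying the inductive hypothesis twice, height((G[u:=w])[x:=y]) = height(G[u:=w]) = height(G). Hence, by H4, height(E[x:=y]) = height(E).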

Lemma Sub.1

[A] If x ∈ FV(E) then

FV(E[x:=F]) = (FV(E) - {x}) ∪ FV(F)

[B] If x ∉ FV(E) then

FV(E[x:=F]) = FV(E)

and

E[x:=F] = E

Proof

We proceed by induction on n, the height of E.

Base Case n=0

Suppose E = x. Then FV(E) = FV(x) = {x} by FV1. So x ∈ FV(E), that is we have case [A]. Now, by S1, E[x:=F] = F, so FV(E[x:=F]) = FV(F) = ({x} - {x}) ∪ FV(F) = (FV(E) - {x}) ∪ FV(F).
So the result is satisfied for this case.
Suppose E = u ≠ x, where u is a variable. Then FV(E) = FV(u) = {u} by FV1.
So x ∉ FV(E), that is we have case [B].

Now E[x:=F] = u[x:=F] = u by S2, so we obtain: E[x:=F] = E and FV(E[x:=F]) = FV(E).
Suppose E = c ∈ C. Then FV(E) = FV(c) = {} by FV2, and hence x ∉ FV(E), that is we have case [B].
Now E[x:=F] = c[x:=F] = c by S3, so E[x:=F] = E and FV(E[x:=F]) = FV(E).

Inductive Step

Suppose for a given n we have for any term E for which height(E) ≤ n

[A] If x ∈ FV(E) then

FV(E[x:=F]) = (FV(E) - {x}) ∪ FV(F)

[B] If x ∉ FV(E) then

FV(E[x:=F]) = FV(E)

and

E[x:=F] = E

Consider an expression E of height n+1. We must show our result holds for E.

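Suppose E=(E₁E₂). Recall that from FV3, FV(E) = FV(E₁) ∪ FV(E₂).
Note that by H3, height(E₁) ≤ n, height(E₂) ≤ n.
There are 4 sub-cases
- Sub-case 1: x ∉ FV(E). This is case [B] of the theorem.
  In this sub-case, it follows that x ∉ FV(E₁), x ∉ FV(E₂). Then
  E[x:=F] = (E₁[x:=F] E₂[x:=F]) = (E₁E₂) = E
  by S4 and the inductive hypothesis. Also
  FV(E[x:=F]) = FV(E₁[x:=F]) ∪ FV(E₂[x:=F]) = FV(E₁) ∪ FV(E₂) = FV(E)
  by the inductive hypothesis.
- Sub-case 2: x ∈ FV(E), x ∈ FV(E₁), x ∈ FV(E₂)
  
  In this sub-case, applying S4, we have:
  FV(E[x:=F]) = FV((E₁[x:=F] E₂[x:=F])) = FV(E₁[x:=F]) ∪ FV(E₂[x:=F])
  by FV3. And now, applying the inductive hypothesis and the associativity of union
  FV(E[x:=F]) = ((FV(E₁) - {x}) ∪ FV(F)) ∪ ((FV(E₂) - {x}) ∪ FV(F)) = ((FV(E₁) ∪ FV(E₂)) - {x}) ∪ FV(F)
  by a little elementary set-theory. We now use FV3 again:
  FV(E[x:=F]) = (FV(E) - {x}) ∪ FV(F)
- Sub-case 3: x ∈ FV(E), x ∉ FV(E₁), x ∈ FV(E₂)
  
  In this sub-case, applying S4, we have:
  FV(E[x:=F]) = FV((E₁[x:=F] E₂[x:=F])) = FV(E₁[x:=F]) ∪ FV(E₂[x:=F])
  by FV3. And now, applying the inductive hypothesis, remembering x ∉ FV(E₁)
  FV(E[x:=F]) = FV(E₁) ∪ ((FV(E₂) - {x}) ∪ FV(F)) = ((FV(E₁) ∪ FV(E₂)) - {x}) ∪ FV(F)
  by a little elementary set-theory, remembering again that x ∉ FV(E₁). We now use FV3 again:
  FV(E[x:=F]) = (FV(E) - {x}) ∪ FV(F)
- Sub-case 4: x ∈ FV(E), x ∈ FV(E₁), x ∉ FV(E₂)
  
  This case is symmetric with sub-case 3.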
Suppose E = λx . G.

In this case FV(E) = FV(G)-{x}, so x ∉ FV(E). Using S5 we have E[x:=F] = E.
Also FV(E[x:=F]) = FV(E),
thus the lemma holds for this case.
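Suppose E = λu . G where u ≠ x and either u ∉ FV(F) or x ∉ FV(G). Note that by H4, height(G) = n.
We have, by S6, E[x:=F] = λu . (G[x:=F])
and by FV4, FV(E[x:=F]) = FV(G[x:=F]) - {u}.
There are 2 sub-cases
- Sub-case 1: x ∉ FV(G)
  
  In this sub-case, by the inductive hypothesis, FV(G[x:=F]) = FV(G)
  and G[x:=F] = G
  So, FV(E[x:=F]) = FV(G) - {u} = FV(E)
  by FV4. Moreover E[x:=F] = E
  
  So, since x ∉ FV(E) = FV(G)-{u}, the result holds in this case.
- Sub-case 2: x ∈ FV(G), u ∉ FV(F).
  In this sub-case, by the inductive hypothesis, FV(G[x:=F]) = (FV(G) - {x}) ∪ FV(F).
  So, FV(E[x:=F]) = ((FV(G) - {x}) ∪ FV(F)) - {u} = ((FV(G) - {u}) - {x}) ∪ (FV(F) - {u}) = (FV(E) - {x}) ∪ (FV(F) - {u})
  by FV4. But u ∉ FV(F), so FV(F)-{u} = FV(F). Hence FV(E[x:=F]) = (FV(E) - {x}) ∪ FV(F).
  So our result holds for this sub-case, since x ∈ FV(G) and x ≠ u implies x ∈ FV(E).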
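Suppose E = λu . G where u ≠ x and u ∈ FV(F) and x ∈ FV(G). By H4, height(G) = n, and by Lemma Height, height(G[u:=w]) = n for any variable w, so the inductive hypothesis applies to G[u:=w] as well as to G.
Let us choose a variable w for which w ∉ FV(G) and w ∉ FV(F). We have, by S7,
E[x:=F] = λw . ((G[u:=w])[x:=F])   [1]
and by FV4 and [1]
FV(E[x:=F]) = FV((G[u:=w])[x:=F]) - {w}   [2]
There are 2 sub-cases.
- Sub-case 1: u ∉ FV(G),
  In this sub-case, by the inductive hypothesis, FV(G[u:=w])=FV(G). So, again by the inductive hypothesis, since x ∈ FV(G),
  FV((G[u:=w])[x:=F]) = (FV(G[u:=w]) - {x}) ∪ FV(F) = (FV(G) - {x}) ∪ FV(F)
  since u ∉ FV(G). So that, substituting in [2] we obtain
  FV(E[x:=F]) = ((FV(G) - {x}) ∪ FV(F)) - {w} = (FV(G) - {x}) ∪ FV(F)
  since w has been chosen not to be in either set of free variables. On the other hand, FV(E) = FV(G)-{u} = FV(G), since in this sub-case u ∉ FV(G). So we have
  FV(E[x:=F]) = (FV(E) - {x}) ∪ FV(F)
  proving the result for this sub-case, since x ∈ FV(G) and x ≠ u implies x ∈ FV(E).
- Sub-case 2: u ∈ FV(G),
  In this sub-case, by the inductive hypothesis and FV1,
  FV(G[u:=w]) = (FV(G) - {u}) ∪ FV(w) = (FV(G) - {u}) ∪ {w}
  Thus x ∈ FV(G[u:=w]), since x ∈ FV(G) and x ≠ u. Applying the inductive hypothesis again we obtain
  FV((G[u:=w])[x:=F]) = (FV(G[u:=w]) - {x}) ∪ FV(F) = (((FV(G) - {u}) ∪ {w}) - {x}) ∪ FV(F)
  Now w ≠ x, since x in this case is a free variable of G, while w is chosen not to be. Hence we have by [2]
  FV(E[x:=F]) = ((((FV(G) - {u}) ∪ {w}) - {x}) ∪ FV(F)) - {w} = ((FV(G) - {u}) - {x}) ∪ FV(F) = (FV(E) - {x}) ∪ FV(F)
  establishing the result in this sub-case.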

Lemma Sub.2

If E is a term of the λ-calculus, and x is a variable, then E[x:=x] = E

Proof

We proceed by induction on n, the height of E.

Base Case n=0

Suppose E = x. Then, by S1, E[x:=x] = x = E. So the result is satisfied for this case.
Suppose E = u ≠ x, where u is a variable. Then, by S2, E[x:=x] = u = E. So the result is satisfied for this case.
Suppose E = c ∈ C. Then, by S3, E[x:=x] = c = E. So the result is satisfied for this case.

Inductive Step

Suppose for a given n we have for any term E for which height(E) ≤ n

E[x:=x] = E

Consider an expression E of height n+1. We must show our result holds for E.

Suppose E=(E₁E₂). Then, by S4 we have E[x:=x] = (E₁[x:=x] E₂[x:=x]). Using the inductive hypothesis, we have E[x:=x] = (E₁E₂) = E. So the result holds in this case.
Suppose E = λx . G.
Then E[x:=x] = E by S5. Hence the result holds in this case.
Suppose E = λu . G where u ≠ x. Then necessarily u ∉ FV(x), since FV(x) = {x}. So we can use S6 and the inductive hypothesis, noting that by H4, height(G) = n, to obtain E[x:=x] = λu . (G[x:=x]) = λu . G = E
and we see the result holds for this case.
In these circumstances we don't have occasion to apply rule S7.
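The three lemmas can also be spot-checked mechanically on small terms. The sketch below is a test, not a proof: it assumes the tagged-tuple term encoding used for illustration, a height function in which leaves have height 0, a hypothetical fresh-variable supply w0, w1, ..., and it enumerates only terms of height at most 2 over the variables x, y and one constant c.

```python
from itertools import count

def fv(e):
    if e[0] == "var":
        return {e[1]}
    if e[0] == "const":
        return set()
    if e[0] == "app":
        return fv(e[1]) | fv(e[2])
    return fv(e[2]) - {e[1]}                      # "lam"

def height(e):
    if e[0] in ("var", "const"):
        return 0
    if e[0] == "app":
        return 1 + max(height(e[1]), height(e[2]))
    return 1 + height(e[2])                       # "lam"

def subst(e1, v, e):
    """e1[v:=e] by clauses S1-S7."""
    if e1[0] == "var":
        return e if e1[1] == v else e1
    if e1[0] == "const":
        return e1
    if e1[0] == "app":
        return ("app", subst(e1[1], v, e), subst(e1[2], v, e))
    u, body = e1[1], e1[2]
    if u == v:
        return e1
    if u not in fv(e) or v not in fv(body):
        return ("lam", u, subst(body, v, e))
    avoid = fv(e) | fv(body)
    w = next(f"w{i}" for i in count() if f"w{i}" not in avoid)
    return ("lam", w, subst(subst(body, u, ("var", w)), v, e))

def terms(depth, names=("x", "y")):
    """All terms of height <= depth over the given variables and constant c."""
    leaves = [("var", n) for n in names] + [("const", "c")]
    if depth == 0:
        return leaves
    smaller = terms(depth - 1, names)
    apps = [("app", a, b) for a in smaller for b in smaller]
    lams = [("lam", n, b) for n in names for b in smaller]
    return smaller + apps + lams

def lemmas_hold(depth=2, names=("x", "y")):
    for e in terms(depth, names):
        for x in names:
            if subst(e, x, ("var", x)) != e:                      # Lemma Sub.2
                return False
            for y in names:
                if height(subst(e, x, ("var", y))) != height(e):  # Lemma Height
                    return False
            for f in terms(1, names):
                result = subst(e, x, f)
                if x in fv(e):                                    # Sub.1 [A]
                    if fv(result) != (fv(e) - {x}) | fv(f):
                        return False
                elif result != e or fv(result) != fv(e):          # Sub.1 [B]
                    return False
    return True

print(lemmas_hold())
```

A check of this kind can expose a slip in an implementation of S1-S7, but only the inductive proofs above cover terms of every height.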