1.1 Perfectly Balanced Trees: left & right branches are same size
1.2 Well balanced trees
1.3 AVL trees are adequately balanced
2 Implementing sets as trees.
2.1 Tree-nodes have four entries
2.2 Implementing the empty set representing sets as trees
2.3 We need the height of trees, empty or non-empty.
2.4 (mk_tree entry left right) makes a tree.
2.5 Making a balanced tree with make_tree ... Rotation is required to balance a tree
2.6 Implementing set->list representing sets as trees
2.7 Implementing member_set? representing sets as trees
2.8 Implementing adjoin representing sets as trees
3 Other representations of sets.
If we represent a set as a balanced tree we can achieve a significant speed up in evaluating the member_set? and adjoin functions.
We speak of the tree as being composed of nodes, each of which contains an entry which is a member of the set being represented, a left branch and a right branch. A tree is a binary search tree with respect to a given total ordering relation if the entry at any node is greater than all entries occurring in the left branch, and less than all entries occurring in the right branch. This is the first data invariant for our representation of sets as trees.
A tree can also be an empty-tree, which has no entry, left branch or right branch.
The height of a tree is defined as being 0 for the empty tree, and one more than the maximum of the height of the left branch and the height of the right branch for a non-empty tree.
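The definition of height can be sketched directly in plain Scheme. This sketch does not use the class-based representation developed later in these notes. It assumes a hypothetical list representation in which the empty tree is '() and a node is (entry left right). The names sk_node, sk_left, sk_right and sk_height are illustrative only.

```scheme
; Sketch only: trees as plain lists, not the lecture's class_set_tree.
; Empty tree: '()    Node: (entry left right)
(define (sk_node entry left right) (list entry left right))
(define (sk_left t) (cadr t))
(define (sk_right t) (caddr t))

; Height is 0 for the empty tree, otherwise one more than the
; height of the taller branch.
(define (sk_height t)
  (if (null? t)
      0
      (+ 1 (max (sk_height (sk_left t))
                (sk_height (sk_right t))))))

; A small binary search tree: 4 at the root, 2 on the left,
; 6 on the right with 7 below it.
(define example
  (sk_node 4
           (sk_node 2 '() '())
           (sk_node 6 '() (sk_node 7 '() '()))))

(sk_height example)   ; => 3
```

Computed this way the height costs a full walk of the tree, which is one reason to store it in the node instead.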
The great advantage of a binary search tree is that, if we are looking for a given entry x in a tree, we can compare x with the entry at the root, y say, to find which sub-tree x must lie in. If x=y then x is in the tree, and we have found it. If x<y then we know that it must lie in the left branch of the tree if it is in the tree at all, and conversely, if x>y then it must lie in the right branch. If the tree is adequately balanced, then at each stage we roughly halve the set of values in which we are searching, which means that our search for x will terminate in time logarithmic in the size of the set.
A tree, such as the one on the left above, for which the left and right branches of all subtrees contain the same number of elements, is said to be perfectly balanced.
In a perfectly balanced tree, each entry is the median of the set of entries in the whole subtree headed by that entry.
A perfectly balanced tree of height h contains 2^h - 1 entries. Proof by induction on h:
Base case: h=0. A perfectly balanced tree of height 0, that is the empty tree, contains 0 entries. 2^0 - 1 = 1 - 1 = 0. So the result holds.
Inductive step:
Now suppose that for some h, all perfectly balanced trees of height h have 2^h - 1 entries. Consider a perfectly balanced tree of height h+1. It has two sub-trees of height h, each, by the inductive hypothesis, containing 2^h - 1 entries, plus the entry at its root. So the total number of entries for our tree of height h+1 is 2*(2^h - 1) + 1 = 2^(h+1) - 2 + 1 = 2^(h+1) - 1.
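The counting result can be checked on small cases. The following is a minimal sketch in plain Scheme, assuming a hypothetical list representation (node = (entry left right), empty tree = '()). The helpers perfect_tree and count_entries are illustrative names, not part of the lecture's code.

```scheme
; Sketch only: hypothetical list representation, node = (entry left right).
; Build a perfectly balanced tree of height h holding the integers
; lo, lo+1, ..., lo + 2^h - 2 (the particular entries are arbitrary).
(define (perfect_tree h lo)
  (if (= h 0)
      '()
      (let* ((half (- (expt 2 (- h 1)) 1))   ; entries in each branch
             (mid (+ lo half)))               ; the median goes at the root
        (list mid
              (perfect_tree (- h 1) lo)
              (perfect_tree (- h 1) (+ mid 1))))))

; Count the entries by walking the whole tree.
(define (count_entries t)
  (if (null? t)
      0
      (+ 1 (count_entries (cadr t)) (count_entries (caddr t)))))

(count_entries (perfect_tree 0 1))   ; => 0  = 2^0 - 1
(count_entries (perfect_tree 3 1))   ; => 7  = 2^3 - 1
(count_entries (perfect_tree 5 1))   ; => 31 = 2^5 - 1
```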
Suppose we have a perfectly balanced tree containing n entries. Then n = 2^h - 1, that is, taking logarithms to the base 2, h = log2(n+1). Thus we can get to any entry in worst case time O(log n), provided we know which branch to take at every node.
However we may not have a set of exactly 2^h - 1 elements for any h. Such a set cannot be represented by a perfectly balanced tree, but we can limit the amount of unbalance to maintain logarithmic time access.
Any set can be represented by a tree in which the disparity between the number of entries in the left branch and the number in the right branch is never more than 1. The algorithm to do this is obvious enough:
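One such algorithm might look like the following minimal sketch. It is not taken from the lecture: it assumes the set is supplied as a sorted list without duplicates, uses a hypothetical list representation (node = (entry left right), empty tree = '()), and the names first_k and well_balanced are illustrative.

```scheme
; Sketch only: hypothetical list representation, node = (entry left right).
; first_k returns the first k elements of a list.
(define (first_k lst k)
  (if (= k 0)
      '()
      (cons (car lst) (first_k (cdr lst) (- k 1)))))

; Build a tree from a sorted list without duplicates: put the middle
; element at the root and build each branch from one half.
; The two halves differ in length by at most 1, at every level.
(define (well_balanced sorted)
  (if (null? sorted)
      '()
      (let* ((n (length sorted))
             (k (quotient n 2))            ; size of the left branch
             (rest (list-tail sorted k)))  ; middle element onwards
        (list (car rest)
              (well_balanced (first_k sorted k))
              (well_balanced (cdr rest))))))

(well_balanced '(1 2 3 4 5 6))
; => (4 (2 (1 () ()) (3 () ())) (6 (5 () ()) ()))
```

This rebuilds the whole tree from scratch, which is acceptable for a one-off construction but not as a step inside every adjoin.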
Let us call such a tree "well-balanced". If we kept our trees well-balanced, then this would give us the shortest worst-case time to find a given element in a member_set? operation.
However, maintaining trees in a well-balanced form is not practicable if we want an efficient implementation of adjoin. Imagine a well-balanced tree
where A and B are subtrees, and size(B) = size(A) + 1. If we now adjoin an element y > x, y < b for all b in B, to the tree, B becomes one larger, but the rearrangement required to maintain the well-balanced condition can be quite expensive.
Consider, for example the tree:
Adjoin 7 - a simple algorithm gives:
Adjust to make it well balanced:
It is clear from this example that we have a significant amount of work to do. The 7 has moved from a tip of the tree to the root-node, while the 6 has moved from the root-node to a tip. This kind of re-arrangement could take place in many circumstances in which an entry which "belongs" between the left and right branches is adjoined to a tree of any depth.
So, we need to look for a compromise in our idea of balance that will make the adjoin operation cheaper. If we are less particular about balance, we can adjust balance by a local operation as we rebuild a tree. Let us say that a tree is adequately balanced if the two branches of every sub-tree differ in height by no more than one. Such trees are more commonly known as AVL trees.
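The definition of adequate balance can be written down as a predicate. This is a minimal sketch in plain Scheme, assuming a hypothetical list representation (node = (entry left right), empty tree = '()). The names sk_height and adequately_balanced? are illustrative.

```scheme
; Sketch only: hypothetical list representation, node = (entry left right).
(define (sk_height t)
  (if (null? t)
      0
      (+ 1 (max (sk_height (cadr t)) (sk_height (caddr t))))))

; A tree is adequately balanced if, at every node, the heights of
; the two branches differ by at most 1.
(define (adequately_balanced? t)
  (or (null? t)
      (and (<= (abs (- (sk_height (cadr t)) (sk_height (caddr t)))) 1)
           (adequately_balanced? (cadr t))
           (adequately_balanced? (caddr t)))))

; Branch sizes differ by 2 at the root (1 entry against 3), but the
; heights differ by only 1, so this tree is adequately balanced.
(adequately_balanced? '(2 (1 () ()) (4 (3 () ()) (5 () ()))))   ; => #t
; A chain of three entries is not: heights 0 and 2 at the root.
(adequately_balanced? '(1 () (2 () (3 () ()))))                 ; => #f
```

The first example is adequately balanced without being well-balanced, which is exactly the slack that makes local repair possible.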
Consider an adequately balanced tree T, with top-level entry x, left and right sub-trees A and B, which is converted into a tree T' by adjoining an element s > x. The right sub-tree B of T will be replaced in T' by B'. Now if height(B) = height(A) + 1, then T' will no longer be adequately balanced if height(B') > height(B).
However we can restore adequate balance by a local transformation of T', which moves some material into the left branch. This requires us to analyse four distinct cases:
If the tree B' has height 2 greater than A, it must have height at least 2. So we can expand B' as a sub-tree, obtaining the following tree, which is annotated with the heights.
Since the original tree T was adequately balanced, the unbalanced nature of T' must arise from either C or D having height h-2, but not both, since we have only adjoined one element.
CASE 1: If D has height h-2 then C must have height h' = h-3 or h'= h-4, so we can move it to the left branch as follows:
This new tree is adequately balanced, and is a binary search tree, since every entry in the left sub-tree is less than y, every entry in the right sub-tree is greater than y and every entry in C is greater than x and every entry in A is less than x.
We call this operation on a tree a left rotation.
CASE 2: This is symmetric to CASE 1: the left branch becomes too long. It is cured by a right rotation.
CASE 3: However, if C has height h-2 then D must have height h''', where h''' = h-3 or h''' = h-4. We can split C into its root entry z, its left branch E of height h' and its right branch F of height h'', where h' = h-3 or h' = h-4 and h'' = h-3 or h'' = h-4.
Here the new tree is adequately balanced, and is a binary search tree, since every entry in the left sub-tree is less than z, every entry in the right sub-tree is greater than z, every entry in A is less than x, every entry in E is greater than x, every entry in F is less than y, every entry in D is greater than y.
This tree-transformation can be achieved by a right rotation of the right sub-tree B' followed by a left rotation of the whole tree.
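CASE 4: This is the symmetric condition to CASE 3, in which the left branch becomes too long. It is cured by a left rotation of the left sub-tree followed by a right rotation of the whole tree.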
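The smallest instance of the double rotation can be traced by hand. The sketch below is plain Scheme on a hypothetical list representation (node = (entry left right), empty tree = '()), not the class-based code of these notes. All sk_ names are illustrative.

```scheme
; Sketch only: hypothetical list representation, node = (entry left right).
(define (sk_node e l r) (list e l r))
(define (sk_entry t) (car t))
(define (sk_left t) (cadr t))
(define (sk_right t) (caddr t))

; x with branches A and (y C D)  becomes  y with branches (x A C) and D
(define (sk_rotate_left t)
  (let ((r (sk_right t)))
    (sk_node (sk_entry r)
             (sk_node (sk_entry t) (sk_left t) (sk_left r))
             (sk_right r))))

; y with branches (x A C) and D  becomes  x with branches A and (y C D)
(define (sk_rotate_right t)
  (let ((l (sk_left t)))
    (sk_node (sk_entry l)
             (sk_left l)
             (sk_node (sk_entry t) (sk_right l) (sk_right t)))))

; The double-rotation case in miniature: x = 1, y = 3, z = 2,
; with A, D, E and F all empty.
(define unbalanced '(1 () (3 (2 () ()) ())))

; A left rotation alone only mirrors the problem:
(sk_rotate_left unbalanced)
; => (3 (1 () (2 () ())) ())

; Rotating the right branch to the right first, then the whole tree left:
(sk_rotate_left
  (sk_node (sk_entry unbalanced)
           (sk_left unbalanced)
           (sk_rotate_right (sk_right unbalanced))))
; => (2 (1 () ()) (3 () ()))
```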
Let us now implement sets as AVL trees. We must first design the concrete data-structures to represent the nodes of a tree, and decide how to represent the empty set.
We will need to be able to decide quickly whether a tree is balanced, so it is convenient to have a "slot" in our representation of a node which holds the height of the tree. Thus a node is represented as a record having components
entry left right height
We will require that these 4-member records preserve the data-invariant that the contents of the height-slot are actually the height of the tree represented by the node.
We can use the record-class function of UMASS Scheme to create suitable records for our nodes.
(define class_set_tree
(make_class 'tree class_set '()
'(entry ; value at node
left ; left branch
right ; right branch
height ; should be the height.
)
)
)
We'll define the tree representation of the empty-set as being a tree-node with its components set to false, apart from the height, which is 0
(insert_static_method
class_set_tree
'empty
(lambda () (send class_set_tree 'new #f #f #f 0))
)
We will also use the method:
(insert_instance_method
class_set_tree
'null?
(lambda (this) (not (send this 'left)))
)
We define the method make to preserve the data-invariant for height:
(insert_static_method
class_set_tree
'make
(lambda (entry left right)
(send class_set_tree 'new
entry left right
(+ 1 (max (send left 'height) (send right 'height))))) )
[Note that if we were writing in Java we could embody the above code in
a user-defined constructor method]
Now we need a function to measure the degree of balance of a tree:
(define (balance T)
(let* (
(L (send T 'left))
(R (send T 'right))
(diff (- (send R 'height) (send L 'height)))
)
diff
)
)
Given these capabilities, we can define a make_balanced method which, given an entry and two AVL trees, makes a new AVL tree by adjusting the balance as discussed above. It is a static method, since adjoin sends it to the class itself.
(insert_static_method
class_set_tree
'make_balanced
(lambda (x L R)
(let* (
(T (send class_set_tree 'make x L R))
(B (balance T))
)
(cond
( (> B 1) ; right tree is too deep
(if (> (balance R) 0)
(send T 'rotate_left) ; CASE 1
(send ; CASE 3
(send class_set_tree 'make x L (send R 'rotate_right))
'rotate_left
)
)
)
( (< B -1) ; left tree is too deep
(if (< (balance L) 0)
(send T 'rotate_right) ; CASE 2
(send ; CASE 4
(send class_set_tree 'make x (send L 'rotate_left) R)
'rotate_right
)
)
)
(else T) ; balance is adequate anyway
);end cond
); end let
)
)
We can readily define the rotation operations. Let us recall our picture of a tree which is to be rotated left:
This is to be converted into a tree:
and we can do this as follows:
(insert_instance_method
class_set_tree
'rotate_left
(lambda (T)
(let* (
(R (send T 'right))
(x (send T 'entry))
(y (send R 'entry ))
(A (send T 'left))
(C (send R 'left))
(D (send R 'right))
)
(send class_set_tree 'make y
(send class_set_tree 'make x A C) D)
)
)
)
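We can use the same pictures to guide our definition of right-rotation. Since make_balanced invokes it with send, it must also be installed as an instance method.
(define (rotate_right T)
(let* (
(L (send T 'left))
(y (send T 'entry))
(x (send L 'entry ))
(A (send L 'left ))
(C (send L 'right))
(D (send T 'right))
)
(send class_set_tree 'make x A
(send class_set_tree 'make y C D)
)
)
)
(insert_instance_method
class_set_tree
'rotate_right
rotate_right)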
Having managed to deal with writing a function for making adequately balanced trees, we can now define our functions to represent sets as trees. The set->list function will require us to walk the tree with an accumulator, so we need an auxiliary function help_stol.
(define (help_stol s acc)
(if (send s 'null?) acc ; empty set? use the accumulated elements
(help_stol ; collect elements
(send s 'left) ; in the left branch
(cons ; having already accumulated..
(send s 'entry) ; the current entry and ..
(help_stol ; all elements in the right branch
(send s 'right)
acc)
)
)
) ; end if
)
Now the ->list method requires us to call the auxiliary function with a null accumulator.
(insert_instance_method
class_set_tree
'->list
(lambda (s)
(help_stol s '())
)
)
We can write member_set?:
(define (member_set? x s)
(cond
((send s 'null?) #f) ; nothing belongs to the empty set
((= x (send s 'entry)) #t) ; we have found the entry for x
((< x (send s 'entry )) ; is x less than the current entry?
(member_set? x ; if so, go down the left branch
(send s 'left)))
(else (member_set? x ; otherwise go down the right branch
(send s 'right)))
)
)
(insert_instance_method
class_set_tree
'member?
member_set?)
We write the adjoin function using make_balanced, which will maintain balance. Essentially, it rebuilds the tree down a path; to the left of this path every entry is less than x, to the right every entry is greater.
(define nt (send class_set_tree 'empty))
(define (adjoin x s)
(cond
((send s 'null?) ; to adjoin x to the empty set
(send class_set_tree 'make
x nt nt) ; we make a tree with x as the only
) ; entry. [end of null case]
((< x (send s 'entry)) ; if x less than the current entry
(send class_set_tree
'make_balanced ; we make a balanced tree, starting
(send s 'entry) ; with one whose entry is the current
(adjoin x (send s 'left)) ; whose left branch has x adjoined
(send s 'right) ; and with the same right branch.
)
) ; end < entry case
((> x (send s 'entry)) ; if x is greater than the current
(send class_set_tree
'make_balanced ; entry, we similarly rebuild ...
(send s 'entry)
(send s 'left)
(adjoin x (send s 'right)) ; the right branch
)
) ; end > entry case
(else s) ; otherwise x is equal to current entry
) ; x is already in the tree - use it
)
(insert_instance_method
class_set_tree
'adjoin
(lambda (this x) (adjoin x this))
)
The above adjoin function takes O(log n) time, because we only have to call make_balanced at each node down a path in the tree, and make_balanced takes constant time, since it only rearranges the nodes of the tree to a depth of 3.
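The effect on the height can be observed with a self-contained sketch. The code below re-expresses the same case analysis in plain Scheme on a hypothetical list representation with a stored height (node = (entry left right height), empty tree = '()), so that it runs without the class system. All sk_ names are illustrative and are not the lecture's methods.

```scheme
; Sketch only: hypothetical list representation with a stored height,
; node = (entry left right height), empty tree = '().
(define (sk_height t) (if (null? t) 0 (cadddr t)))
(define (sk_make e l r)
  (list e l r (+ 1 (max (sk_height l) (sk_height r)))))
(define (sk_balance t) (- (sk_height (caddr t)) (sk_height (cadr t))))

(define (sk_rotate_left t)
  (let ((r (caddr t)))
    (sk_make (car r) (sk_make (car t) (cadr t) (cadr r)) (caddr r))))
(define (sk_rotate_right t)
  (let ((l (cadr t)))
    (sk_make (car l) (cadr l) (sk_make (car t) (caddr l) (caddr t)))))

; Same four cases as in the text, with constant work per call.
(define (sk_make_balanced e l r)
  (let* ((t (sk_make e l r))
         (b (sk_balance t)))
    (cond ((> b 1)                       ; right branch too deep
           (if (> (sk_balance r) 0)
               (sk_rotate_left t)
               (sk_rotate_left (sk_make e l (sk_rotate_right r)))))
          ((< b -1)                      ; left branch too deep
           (if (< (sk_balance l) 0)
               (sk_rotate_right t)
               (sk_rotate_right (sk_make e (sk_rotate_left l) r))))
          (else t))))

; One call to sk_make_balanced per node on the search path.
(define (sk_adjoin x t)
  (cond ((null? t) (sk_make x '() '()))
        ((< x (car t))
         (sk_make_balanced (car t) (sk_adjoin x (cadr t)) (caddr t)))
        ((> x (car t))
         (sk_make_balanced (car t) (cadr t) (sk_adjoin x (caddr t))))
        (else t)))

; Adjoin lo, lo+1, ..., hi in increasing order, which is the worst
; case for an unbalanced binary search tree.
(define (sk_adjoin_range lo hi t)
  (if (> lo hi)
      t
      (sk_adjoin_range (+ lo 1) hi (sk_adjoin lo t))))

(sk_height (sk_adjoin_range 1 7 '()))      ; => 3
(sk_height (sk_adjoin_range 1 1000 '()))   ; => 10
```

Without rebalancing, adjoining 1 to 1000 in increasing order would produce a chain of height 1000.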
We can use the generic function for intersection that already exists. This now takes time n log(n) because member_set? now takes log time.
The rest of the implementation can use the generic functions we defined in the previous lecture.
We can summarise the computational complexity of the chosen functions for given representations of sets as follows:
If you are compiling this whole lecture, we can stop at this point, because what's below doesn't form an integral part of the code embedded in the lecture.
(error "Ignore this - it's just to stop compilation at this point")
If we are representing an infinite set the set->list function cannot be implemented. It is possible to represent a countably infinite set as a stream, which can be thought of as an extension of the list concept, with a "lazy cdr" usually called tail.
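A stream with a lazy tail can be sketched with the standard Scheme forms delay and force. The following is a minimal illustration rather than the representation used elsewhere in this course. The names stream_head, stream_tail, evens_from and stream_first are assumptions made for this example.

```scheme
; Sketch only: a stream is either '() or a pair whose cdr is a promise.
; delay and force are standard Scheme; the names below are illustrative.
(define (stream_head s) (car s))
(define (stream_tail s) (force (cdr s)))   ; the "lazy cdr"

; The countably infinite set of even integers n, n+2, n+4, ...
; The recursive call is delayed, so building the stream terminates.
(define (evens_from n)
  (cons n (delay (evens_from (+ n 2)))))

; set->list is impossible for the whole set, but any finite prefix
; can still be listed.
(define (stream_first s k)
  (if (= k 0)
      '()
      (cons (stream_head s)
            (stream_first (stream_tail s) (- k 1)))))

(stream_first (evens_from 0) 5)   ; => (0 2 4 6 8)
```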
Generally for infinite sets the equal_set? function is hard to implement, and will often be undecidable.
We could define an infinite set by a predicate which recognises whether an object is a member of it. For example the set of even integers could be defined by:
(define (even x)
(= (remainder x 2) 0)
)
Given this representation, it is easy to define member_set?
(define (member_set? x s) (s x))
(:- (member_set? 2 even))
It is equally easy to define adjoin (though this is less useful for infinite sets), union and intersection. However, the equal_set? function requires us to determine the equality of two functions, which is known to be undecidable.
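With sets represented by predicates, these operations simply build new predicates from old ones. The sketch below is one possible formulation. The names union_set, intersection_set, adjoin_pred and multiple_of_3 are hypothetical and chosen so as not to clash with the generic functions defined earlier.

```scheme
; Sketch only: a set is represented by its membership predicate.
(define (member_set? x s) (s x))

(define (even x) (= (remainder x 2) 0))
(define (multiple_of_3 x) (= (remainder x 3) 0))   ; hypothetical second set

; Each operation returns a new predicate built from the old ones.
(define (union_set s1 s2)
  (lambda (x) (or (s1 x) (s2 x))))
(define (intersection_set s1 s2)
  (lambda (x) (and (s1 x) (s2 x))))
(define (adjoin_pred y s)
  (lambda (x) (or (equal? x y) (s x))))

(member_set? 9 (union_set even multiple_of_3))          ; => #t
(member_set? 9 (intersection_set even multiple_of_3))   ; => #f
(member_set? 6 (intersection_set even multiple_of_3))   ; => #t
(member_set? 7 (adjoin_pred 7 even))                    ; => #t
```

Each operation takes constant time to construct its result. The cost is deferred to membership tests, which run every predicate involved.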
Russell's Paradox, due to Bertrand Russell, shows that allowing a set to be defined just by a predicate is problematic. The main difficulty is that it allows one to have sets that are members of themselves. For example, one might speak of the set of all abstract concepts, which is surely an abstract concept and so is a member of itself. Now let us call a set normal if it is not a member of itself. Is the set of all normal sets a normal set? If it is, then it is not a member of itself, but, being a normal set it must be a member of itself, a contradiction. If it is not, then it is a member of itself, so it must be one of the normal sets, again a contradiction.
We can try out this paradox in Scheme! We can define a normal set to be one which is not a member of itself.
(define (normal x)
(not (member_set? x x)))
Now consider whether the set of all normal sets is normal. If you paste the line below into a file test.scm
(normal normal)
and execute it you will get
Error: rle: RECURSION LIMIT (pop_callstack_lim) EXCEEDED
Incidentally, this raises the question of the soundness of the lambda-calculus itself, since the lambda calculus allows us to write dangerous looking formulae like (x x). Is the calculus a formalism that can be given any consistent interpretation? - Scott and Strachey showed that it can be, but the construction is not easy.
A language is an infinite set of sequences of tokens drawn from an alphabet. A parser for a language is in effect a helper function for the member_set? function. It is easy to see how we can implement the union and intersection of languages represented by their parsers.
However the problem of performing the equal_set? computation for languages is much harder. Indeed, for general languages, it is undecidable.