LECTURE 14 Representation of Sets


1   Operations on sets and the representations we'll consider
2   The Abstract Class class_set
      2.1   The abstract implementation of  '->set
      2.2   The abstract implementation of  'intersect
      2.3   The abstract implementation of  '<=
      2.4   The abstract implementation of  '=
3   Methods for Sets as unordered lists
      3.1   Implementing the empty set as an unordered list
      3.2   Implementing '->list representing sets as unordered lists
      3.3   Implementing 'member? representing sets as unordered lists
      3.4   Implementing 'adjoin  representing sets as unordered lists
      3.5   What's inherited
4   Methods for Sets as ordered lists
      4.1   Implementing the empty set as an ordered list
      4.2   Implementing '->list representing sets as ordered lists
      4.3   Implementing 'member?  representing sets as ordered lists
      4.4   Implementing 'adjoin  representing sets as ordered lists
      4.5   We have a big win implementing intersect on ordered lists
5   Sets as Trees - an Introduction

1 Operations on sets and the representations we'll consider

A set is a fundamental concept of mathematics. Unfortunately, there is no single uniform representation of set that meets all our needs as computer scientists. The most important distinction is between finite and infinite sets. A finite set can be represented by some kind of explicit enumeration in a data-structure, whereas an infinite set must be represented by some kind of description that does not explicitly enumerate the elements. Of course, it is not always practicable to enumerate large finite sets.

We shall study three representations of finite sets from the point of view of a small number of basic operations on sets. Let's use cs for our implementation of the current class of sets.

    (send cs 'empty)         The representation of the empty set.

    (send cs '->set l)       Creates a set which consists of the elements of a
                             list.

    (send s '->list)         Creates a list of the elements of the set in an
                             undefined order. This will be the identity function
                             for representations of sets as lists.

    (send s 'member? x)      Computes whether a given object x is a member of a
                             set s.

    (send s1 '<= s2)         Computes whether each member of s1 is a
                             member of s2.

    (send s1 '= s2)          Computes whether two sets s1 s2 are the same set.

    (send s 'adjoin x)       Makes a new set by adding the element x to the set s.

    (send s1 'intersect s2)  Computes the intersection of the two sets s1, s2.

In particular we need to study the relationship between the representation and how fast we can make these basic operations run - their time complexity.

The three representations are:

    Unordered lists: A set {1,2,3} may be represented as the list (3 1 2)
    Ordered lists:   A set {1,2,3} will be represented as the list (1 2 3)
    Binary trees:    A set {1,2,3,4,5} may be represented as the tree:

All of these representations require that we be able to compare for equality elements which occur in sets. The ordered list and tree representations require that an ordering relation be defined on the elements. For simplicity, we shall confine ourselves to sets of numbers, where <= is an ordering relation.

2 The Abstract Class for Sets

An abstract class of objects is one that can have no instances, typically because it isn't fully implemented. It's a useful place to hang methods that are quite generic - they will work for all sets, for example.

We can define an abstract class for sets


(define class_set
    (make_class 'set class_object '() #f)
    )


We can regard the methods ->list, member? and adjoin, together with empty, as the basic operations which we have to define for every representation of sets. [If we had implemented interfaces, we would specify these as methods in the interface for sets.] Using these, we can provide abstract implementations of the methods ->set, <=, = and intersect. While these abstract implementations will always work, they will not always be the fastest possible implementation for a given representation, since we may be able to exploit the special properties of that representation.

2.1 The abstract implementation of ->set

We can convert a list to a set by repeated application of the adjoin operation, giving us the function:


(insert_static_method
    class_set
    '->set
    (lambda (l)
        (if (null? l)
            (send class_set 'empty)
            (send
                (send class_set '->set (cdr l))  ; the set of the remaining elements
                'adjoin
                (car l)
                )
            ) ; end if
        )     ; end lambda
    )         ; end insert

2.2 The abstract implementation of intersect

We can conveniently make use of the reduce function that we defined earlier in the course to save us writing some explicit recursions.


(define (reduce f acc base l)
   (if (null? l)
       base
       (acc (f (car l)) (reduce f acc base (cdr l)))))
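
To see how the three arguments of reduce cooperate, here is a small sketch on plain lists. The helper names sum_of_squares and copy_list are made up for this illustration and are not part of the course library.

```scheme
;; reduce as defined in the lecture: apply f to each element,
;; then combine the results from the right with acc, starting from base.
(define (reduce f acc base l)
  (if (null? l)
      base
      (acc (f (car l)) (reduce f acc base (cdr l)))))

;; Hypothetical example 1: f squares each element, acc adds, base is 0.
(define (sum_of_squares l)
  (reduce (lambda (x) (* x x)) + 0 l))

;; Hypothetical example 2: f is the identity, acc is cons, base is '().
;; This rebuilds the list element by element.
(define (copy_list l)
  (reduce (lambda (x) x) cons '() l))

(display (sum_of_squares '(1 2 3)))   ; 1 + 4 + 9
(newline)
(display (copy_list '(a b c)))
(newline)
```

Note that acc receives the processed element first and the result for the rest of the list second.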

Using reduce we can write an abstract intersect method. This converts one of the sets to a list, and then uses an accumulator function in which member? is used to determine whether each member of the list is a member of the other set. If it is, it is adjoined to the result; if not, the result is passed on unchanged. The base is simply the empty set.


(insert_instance_method class_set
    'intersect
    (lambda (this s2)
        (reduce
            (lambda (x) x)                   ;f
            (lambda (x s)                    ;acc
                (if (send s2 'member? x )
                    (send s  'adjoin x)
                    s)
                )
            (send this 'empty)               ;base
            (send this '->list)              ;list
            )
        )
    )
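
The same idea can be tried out without the object system. The following is only a sketch, assuming sets are plain duplicate-free lists; member_list?, adjoin_list and intersect_list are hypothetical stand-ins for the 'member?, 'adjoin and 'intersect methods.

```scheme
(define (reduce f acc base l)
  (if (null? l)
      base
      (acc (f (car l)) (reduce f acc base (cdr l)))))

;; Hypothetical list-level stand-ins for the basic operations.
(define (member_list? x s)
  (if (member x s) #t #f))

(define (adjoin_list x s)
  (if (member_list? x s) s (cons x s)))

;; Keep exactly those elements of s1 that also occur in s2.
(define (intersect_list s1 s2)
  (reduce
    (lambda (x) x)                      ; f: identity
    (lambda (x s)                       ; acc: adjoin x only if it is in s2
      (if (member_list? x s2)
          (adjoin_list x s)
          s))
    '()                                 ; base: the empty set
    s1))                                ; list: the elements of s1

(display (intersect_list '(3 1 2) '(2 3 4)))
(newline)
```

Each element of s1 costs one scan of s2, which is where the quadratic running time of the abstract method comes from.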

2.3 The abstract implementation of <=

Likewise, we can define <= (set inclusion) with reduce. Here the base is #t, the accumulator function is the "and" operation, and the mapping function tests each element with member?.

(insert_instance_method
    class_set
    '<=
    (lambda (this s)
        (reduce
            (lambda (x) (send s 'member? x))   ;f
            andf                               ;acc
            #t                                 ;base
            (send this '->list))               ;list
        )
    )

We need to define andf as a proper function, since and is a special form which can't be passed as an argument.

(define (andf b1 b2)
    (and b1 b2)
    )

2.4 The abstract implementation of =

We can define = in terms of <=, since two sets are equal exactly when each is included in the other:

(insert_instance_method
    class_set
    '=
    (lambda (this s)
        (and (send this '<= s) (send s '<= this))
        )
    )
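
As a rough check of the inclusion and equality definitions, here is a sketch on plain lists. It assumes sets are duplicate-free lists; subset_list? and equal_set_list? are names invented for this illustration.

```scheme
(define (reduce f acc base l)
  (if (null? l)
      base
      (acc (f (car l)) (reduce f acc base (cdr l)))))

;; and is a special form, so we wrap it in an ordinary function.
(define (andf b1 b2)
  (and b1 b2))

;; Hypothetical list-level version of '<= : is every element of s1 in s2?
(define (subset_list? s1 s2)
  (reduce
    (lambda (x) (if (member x s2) #t #f))   ; f: membership test
    andf                                    ; acc
    #t                                      ; base
    s1))                                    ; list

;; Hypothetical list-level version of '= : inclusion in both directions.
(define (equal_set_list? s1 s2)
  (and (subset_list? s1 s2) (subset_list? s2 s1)))

(display (subset_list? '(1 2) '(3 2 1)))
(newline)
(display (equal_set_list? '(3 1 2) '(1 2 3)))
(newline)
```

Because andf is an ordinary function, both of its arguments are evaluated before it is called, so this version always walks the whole list even after a membership test has failed.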

3 Sets as unordered lists

A set can be represented as a list with no duplicates. The fact that the list contains no duplicates can be regarded as an invariant for this representation.

3.1 Implementing the empty set as an unordered list

3.2 Implementing ->list representing sets as unordered lists

(define class_set_uol
    (make_class 'set_uol class_set '() '(->list)))

The empty set is simply implemented as the empty list.


(insert_static_method
    class_set_uol
    'empty
    (lambda () (send class_set_uol 'new '()))
    )
(example
    '(send  (send class_set_uol 'empty) '->list)
    '())

3.3 Implementing member? representing sets as unordered lists

To implement set membership, we can use the built-in member function, but ensure that an actual boolean value is returned.


(insert_instance_method
    class_set_uol
    'member?
    (lambda (this x)
        (if (member x (send this '->list)) #t #f)
        )
    )

This takes O(n) time, since member takes O(n) time to go through the list and compare each element for equality with x.

3.4 Implementing adjoin representing sets as unordered lists

For (send s 'adjoin x) we need to test membership, and cons x onto the list representing s only if it is not already there. This preserves the "no duplicates" invariant.


(insert_instance_method
    class_set_uol
    'adjoin
    (lambda (this x)
        (if (send this 'member? x)
            this
            (send class_set_uol 'new (cons x (send this '->list)))
            )
        )
    )

(send class_set_uol '->set '(1 2 3))

This takes O(n) time, since member? takes O(n) time.
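
The effect of the "no duplicates" invariant can be seen in a sketch that works directly on lists. The names member_uol?, adjoin_uol and list->set_uol are hypothetical list-level counterparts of the methods above, not part of the course library.

```scheme
;; Membership: scan the list, returning a proper boolean.
(define (member_uol? x s)
  (if (member x s) #t #f))

;; Adjoin: cons x on only if it is not already present.
(define (adjoin_uol x s)
  (if (member_uol? x s)
      s
      (cons x s)))

;; Build a set from an arbitrary list by repeated adjoin,
;; mirroring the abstract '->set method.
(define (list->set_uol l)
  (if (null? l)
      '()
      (adjoin_uol (car l) (list->set_uol (cdr l)))))

(display (list->set_uol '(1 2 3 2 1)))
(newline)
```

Repeated elements of the input are dropped, because each adjoin first checks membership in the set built so far.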

3.5 We inherit the abstract methods for intersect, <= and =

We can use the abstract definitions of intersect, <= and =. Each of these takes O(n^2) time, since each performs an O(n) member? test for every one of the n elements.

4 Sets as ordered lists

If we add the additional requirement (invariant) that our sets be represented as lists with the elements placed in order, we find that intersection can be done more efficiently.

4.1 Implementing the empty set as an ordered list

4.2 Implementing ->list representing sets as ordered lists

As before, the empty set is represented by the empty list.


(define class_set_ol
    (make_class 'set_ol class_set '() '(->list)))

4.3 Implementing the 'member? method for the ordered list representation

We can make the member? method rather more efficient. Assuming a uniform distribution of values of x, we can halve the expected time for an evaluation of (send s 'member? x) in the cases in which x is not a member of s, by using the fact that if the first member of s is larger than x we cannot possibly find x in s (see (1) below). When x is a member of s we already expect to stop about halfway through the list, so nothing is gained in that case. However, member? still remains O(n).


(insert_instance_method
    class_set_ol 'member?
    (lambda (this x)
        (let ((s (send this '->list)))
            (cond
                ((null? s) #f)
                ((= x (car s)) #t)
                ((> (car s) x) #f)                  ; (1)
                (else (send
                        (send class_set_ol 'new (cdr s))
                        'member?
                        x)
                    )
                )
            )
        )
    )
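
The early exit at (1) is easiest to see on plain lists. This is a sketch only: member_ol? is a hypothetical list-level version of the method, and steps_ol is an invented helper that counts how many elements are inspected before the search stops.

```scheme
;; Membership in an ordered list of numbers, stopping as soon as
;; we meet an element larger than x.
(define (member_ol? x s)
  (cond
    ((null? s) #f)
    ((= x (car s)) #t)
    ((> (car s) x) #f)                     ; x cannot occur further on
    (else (member_ol? x (cdr s)))))

;; Number of elements inspected by the search above.
(define (steps_ol x s)
  (cond
    ((null? s) 0)
    ((>= (car s) x) 1)                     ; found x, or passed its place
    (else (+ 1 (steps_ol x (cdr s))))))

(display (member_ol? 4 '(1 3 5 7 9)))
(newline)
(display (steps_ol 4 '(1 3 5 7 9)))
(newline)
```

Here the search for 4 gives up on reaching 5, whereas a search of an unordered list would have to inspect all five elements before answering #f.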

4.4 Implementing adjoin representing sets as ordered lists

In this representation, adjoin still takes O(n) time, since we have in the worst case to examine the entire list. For example:

(example
    '(send
      (send class_set_ol '->set '(1 2 3 4))
      'adjoin
      5
     )
    (send class_set_ol 'new '(1 2 3 4 5))
)

But we can achieve a small improvement if we recognise that if the first member of the list representing the set is greater than the element we are adjoining, then we don't have to look any further: the new element belongs at the front. Let's, for a change, write our adjoin method as a Scheme function which we then make into a method.


(define (adjoin_ol x s)
    (cond
        ( (null? s) (list x))
        ( (> x (car s)) (cons (car s) (adjoin_ol x (cdr s))))
        ((= x (car s)) s)
        (else (cons x s))       ; x is smaller than (car s): put it in front
        )
    )

(insert_instance_method
    class_set_ol
    'adjoin
    (lambda (this x)
        (send class_set_ol 'new (adjoin_ol x (send this '->list)))
        )
    )
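
To check that adjoining keeps the list in order, we can try the idea on plain lists. This sketch assumes that an element smaller than the head of the list is placed in front of it; list->set_ol is a name made up here to mirror the abstract '->set method.

```scheme
;; Insert x into an ordered, duplicate-free list of numbers.
(define (adjoin_ol x s)
  (cond
    ((null? s) (list x))                                  ; past the end
    ((> x (car s)) (cons (car s) (adjoin_ol x (cdr s))))  ; keep looking
    ((= x (car s)) s)                                     ; already there
    (else (cons x s))))                                   ; x goes in front

;; Hypothetical helper: build an ordered set by repeated adjoin.
(define (list->set_ol l)
  (if (null? l)
      '()
      (adjoin_ol (car l) (list->set_ol (cdr l)))))

(display (adjoin_ol 5 '(1 2 3 4)))
(newline)
(display (list->set_ol '(3 1 2 3)))
(newline)
```

Building a set this way is in effect an insertion sort that also discards duplicates, so converting a list of n elements takes O(n^2) time in the worst case.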


Exercise: Use letrec for the above definition of the 'adjoin method.

4.5 We have a big win implementing intersect on ordered lists

However we can improve our implementation of intersect significantly by exploiting the fact that the two sets are represented as ordered lists. To do this we employ a kind of algorithm known as merging.

The function below, based on merging, takes O(n) time, where n is the maximum of the sizes (cardinalities) of the two sets: each step removes at least one element from one of the lists, so there are at most 2n steps. The idea is that we go through the ordered lists in "lock step", successively comparing the first elements and deciding on the basis of the comparison whether to incorporate them in the result, always taking the cdr of the list with the smaller first element.

     '(2 3 4 6 7)
     '(1 3 5 6)         First element not in the intersection, take cdr

     '(2 3 4 6 7)       First element not in the intersection, take cdr
     '(3 5 6)

     '(3 4 6 7)         First elements are in the intersection, take cdr
     '(3 5 6)           of both, incorporate car's in the intersection.

     '(4 6 7)           First element not in the intersection, take cdr
     '(5 6)

     '(6 7)
     '(5 6)             First element not in the intersection, take cdr


     '(6 7)             First elements are in the intersection, take cdr
     '(6)               of both, incorporate car's in the intersection.

     '(7)
     '()                One list is empty, so we stop. The result is (3 6).

(define (intersect_ol s1 s2)
    (if (or (null? s1) (null? s2)) '()
        (let (
             (x1 (car s1))
             (x2 (car s2))
             ); end let binding
            (cond
                ((= x1 x2) (cons x1 (intersect_ol (cdr s1) (cdr s2))))
                ((< x1 x2) (intersect_ol (cdr s1) s2))
                (else (intersect_ol s1 (cdr s2)))
                ) ;end cond
            ) ; end let
        ) ;end if
    ) ;end define

(insert_instance_method
    class_set_ol
    'intersect
    (lambda (this s)
        (send class_set_ol 'new
            (intersect_ol
                (send this '->list)
                (send s    '->list)
                ) ; end intersect
            ) ; end send
        ) ; end lambda
    ) ; end insert

We can test for the equality of sets under the ordered list representation very simply - if they are equal as sets they must be equal as lists.


(insert_instance_method
    class_set_ol
    '=
    (lambda (this s)
        (equal?
            (send this '->list)
            (send s    '->list)
            )
        )
    )

Note that intersect_ol is an example of a general kind of operation, the merge, in which two ordered sequences are compared in lock-step to produce a result derived from both of them. This is a very important kind of algorithm in cases in which you have large sets of data and only have sequential access to them. In past years, the only way that large data-sets could be stored was on magnetic tape, and all commercial data-processing depended on the use of merging operations. For example, a bank would have records of the balance of customer accounts on one (or more than one!) tape, kept in order of account-number. The transactions for the day would be put on another tape, also in order of account-number. Then the two tapes would be merged, thereby updating the balances to allow for the transactions. Even the process of preparing the sorted tape for merging would take place using a merge-based sorting operation.
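
To illustrate that merging is more general than intersection, here is a sketch of set union written in the same lock-step style. The function union_ol is not defined in these notes; it is a hypothetical example that assumes both arguments are ordered, duplicate-free lists of numbers.

```scheme
;; Union of two ordered lists by merging: always emit the smaller
;; first element, and emit a shared element only once.
(define (union_ol s1 s2)
  (cond
    ((null? s1) s2)
    ((null? s2) s1)
    (else
      (let ((x1 (car s1))
            (x2 (car s2)))
        (cond
          ((= x1 x2) (cons x1 (union_ol (cdr s1) (cdr s2))))
          ((< x1 x2) (cons x1 (union_ol (cdr s1) s2)))
          (else      (cons x2 (union_ol s1 (cdr s2)))))))))

(display (union_ol '(2 3 4 6 7) '(1 3 5 6)))
(newline)
```

As with intersection, each step consumes at least one element from one of the lists, so the running time is linear in the combined length of the two lists.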

5 Sets as Trees - an Introduction

If we represent a set as a balanced tree we can achieve a significant speed-up in evaluating the member? and adjoin methods. The idea of a balanced tree is illustrated below - essentially the idea is that we want to equalise the number of entries to the left and right of each node as far as practicable.

If a tree is balanced, we can get to any given node in a rather small number of steps, in fact in a number of steps logarithmic in the cardinality (size) of the set represented in the tree. The details of how we can achieve this are discussed in Lecture 15.
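
As a foretaste, here is a minimal sketch of membership in a binary search tree. The representation is an assumption made purely for this illustration (a node is a list of an entry, a left subtree and a right subtree, and the empty tree is '()), and the names entry, left, right, member_tree? and example_tree are hypothetical.

```scheme
;; Assumed representation: (entry left-subtree right-subtree), '() if empty.
(define (entry t) (car t))
(define (left t)  (cadr t))
(define (right t) (caddr t))

;; Each comparison discards one whole subtree.
(define (member_tree? x t)
  (cond
    ((null? t) #f)
    ((= x (entry t)) #t)
    ((< x (entry t)) (member_tree? x (left t)))
    (else (member_tree? x (right t)))))

;; A balanced tree holding the set {1,2,3,4,5,6,7}.
(define example_tree
  '(4 (2 (1 () ()) (3 () ()))
      (6 (5 () ()) (7 () ()))))

(display (member_tree? 5 example_tree))
(newline)
(display (member_tree? 8 example_tree))
(newline)
```

In this seven-element tree no search inspects more than three nodes, which is the logarithmic behaviour described above.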