Course 287: Lecture 6 Procedural and Data Abstraction

The need for abstraction
Using record-class to create opaque records.
Starting to write an object-oriented capability for Scheme.

Abelson and Sussman

Chapter 1 of Abelson and Sussman is entitled "Building Abstractions with Procedures". Chapter 2 of Abelson and Sussman is entitled "Building Abstractions with Data". We shall discuss many of the ideas covered in these chapters during the course, but not in exactly the same order. You will find a discussion of data abstraction in section 2.1 of Abelson and Sussman (p. 83 ff.). See also section 1.1.8, "Procedures as Black-Box Abstractions" (p. 26 ff.).

The need for abstraction

In abstraction we draw out an essential aspect of an idea, allowing it to be applied to more than the particular set of circumstances in which we first encountered it. (The word comes from the Latin verb traho, "I pull" (cf. tractor), and the preposition ab, meaning "from" or "away from".) We have already seen this at work when we considered the sum function and abstracted it to obtain the reduce function.
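
As a reminder of that step, here is a minimal sketch in Scheme. The name my_reduce and its argument order are assumptions made for illustration; the reduce function used earlier in the course may differ in detail.

```scheme
; sum adds up the numbers in a list.
(define (sum l)
  (if (null? l)
      0
      (+ (car l) (sum (cdr l)))))

; my_reduce (a hypothetical name) abstracts the pattern of sum:
; the function f replaces +, and base replaces 0.
(define (my_reduce f base l)
  (if (null? l)
      base
      (f (car l) (my_reduce f base (cdr l)))))

; sum is now just one use of the abstraction; product is another.
(define (sum_by_reduce l) (my_reduce + 0 l))
(define (product l)       (my_reduce * 1 l))

(sum '(1 2 3 4))            ; 10
(sum_by_reduce '(1 2 3 4))  ; 10
(product '(1 2 3 4))        ; 24
```

Only the combining function and the base value differ between sum and product; everything else has been drawn out into my_reduce.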

An important principle in the design of computational systems is to provide a measure of isolation of the implementation of a capability from its users. Thus a user is required to employ some kind of standard interface in accessing a capability. In doing this we are abstracting the essence of the capability from the point of view of its users.

For example, in computer hardware, a standard bus such as the VME bus can be used to connect modules. In operating systems, access to backing store is mediated via system-calls.

This isolation offers two primary advantages:

The implementation can be improved or changed without affecting how it is used. Provided the user adheres to the standard interface, (s)he need not alter how (s)he uses the capability.

Thus for example, in hardware, a larger memory module can be plugged into the standard bus and can be immediately usable. In operating systems, a file system local to a particular machine can be replaced by a distributed file system with minimal disruption to users.

Safety features can be built into the implementation. Generally it is true that not all states of a resource are legal. For example, in an operating system, each block on the disc should either belong to one named file, or should be known to be free. Ensuring that this remains true can remain the responsibility of the operating system (OS) provided that the user only accesses the disc via the abstraction that the OS provides, namely the file.

Sometimes a mechanism is provided to police the safety features. For example in the Unix operating system, it is impossible for a user program to issue an input-output instruction to access the disc directly. Any such instruction will be trapped by the machine hardware and referred to the kernel of the OS. On the other hand, in the DOS operating system, there is no such protection, so that correct usage of the disc is dependent on programmer discipline.

Likewise, in the Scheme language, any access to the machine's store is mediated by the car, cdr and cons functions. This prevents certain kinds of illegality from occurring. For example, it is impossible in a Scheme system for a piece of store to be regarded as free when in fact it forms part of a user's data-structure.

By contrast, in the C language, the user has free access to her entire virtual machine, so that it is possible for a piece of store to be used in two contradictory ways by a single program.

However, there is often a performance penalty associated with using a standard interface. During the evolution of computer hardware, many bus standards have become obsolete as technology has advanced. For example, memory is now supplied as SIMMs which plug directly into a processor board. Likewise the writers of computer games are notorious for employing direct access to graphics hardware, rather than employing the standard interface, in order to achieve the necessary speed.

Likewise, the use of the car, cdr and cons functions in Scheme may carry a performance penalty compared with the more direct access offered by C. Not all storage configurations that can be created by C can be created by Scheme. Moreover, the storage integrity demanded by Scheme can require that these primitive functions check that car and cdr are being applied to lists. The issue of efficiency is a complex one, and languages like Scheme are not necessarily less efficient than C, especially for large programs.
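
To make the cost of that check concrete, here is a rough sketch of what it amounts to, written in Scheme itself. The name checked_car is hypothetical. A real Scheme system performs an equivalent test inside the primitive car, usually in machine code, so this is an illustration and not how any particular implementation is written.

```scheme
; checked_car (a hypothetical name) spells out the test that the
; primitive car must make: its argument has to be a pair, that is,
; a non-empty list or other cons cell.
(define (checked_car x)
  (if (pair? x)
      (car x)
      (error "checked_car: argument is not a pair" x)))

(checked_car '(1 2 3))   ; 1
```

The test costs a few instructions on every access, which is the penalty discussed above; in exchange, a call such as (checked_car 5) is reported as an error instead of reading an arbitrary piece of store.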

Levels of Abstraction

Any engineered system of any complexity exhibits abstraction layered into levels. In computer hardware, levels of abstraction are imposed by technology - there is at least a chip-level, a board-level and a system-level. Within the chip-level, there are further levels of abstraction - at least a device level and a register-level. Within a processor chip there will be larger functional units, e.g. an ALU or a cache.

For each level of abstraction there are two separate problems to be addressed:

Implementing that level of abstraction.
Using that level of abstraction.

These are very different activities. Using ought to be very much easier than implementing if the abstraction is worthwhile.

Building Levels of Abstraction within Scheme.

While car, cdr and cons provide a base abstract view of the machine store, they operate at a low conceptual level, not related to what most programs are about. We can use them, and other facilities of Scheme, as building blocks to provide abstractions appropriate to the requirements of a given program by

Defining functions which implement the concepts which are needed for the program. For example, if we are interested in geometric computations, we will need functions to implement points, lines, planes, circles, spheres...
Writing the rest of the program using only this basic set of functions.

Within a purely functional approach, implementation requires us to define constructor functions to build representations of the objects and selector functions to access such representations. At the base level, cons is the constructor function, and car and cdr are the selectors.

Example 1 A point

Implementation of points


(define (mk_point x y)   ; the basic constructor function
    (list x y))

(define x_point car)     ; two selector functions.
(define y_point cadr)

Use of points

From now on we use only mk_point, x_point and y_point to construct and access points. For example:


(define (mid_point p1 p2)
    (mk_point
        (/(+ (x_point p1) (x_point p2)) 2)
        (/(+ (y_point p1) (y_point p2)) 2)
        )
    )

(example '(mid_point (mk_point 1 2) (mk_point 5 8)) (mk_point 3 5))


(define (diff_point p1 p2)
    (mk_point
        (- (x_point p1) (x_point p2))
        (- (y_point p1) (y_point p2))
        )
    )

So any code we write to manipulate points is quite independent of the implementation of points.

New implementation of points

If we make the representation of points more mnemonic:


(define (mk_point x y)
    (list 'point x y))

(define x_point cadr)
(define y_point caddr)

our mid_point function will still work!


(example '(mid_point (mk_point 1 2) (mk_point 5 8)) (mk_point 3 5))

example: (mid_point (mk_point 1 2) (mk_point 5 8)) = (point 3 5),  ok!

Note, however, that the printed result is different here: it is now (point 3 5), where the first implementation gave (3 5).
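
By contrast, code that reaches past the selectors does not survive the change. The sketch below repeats the new implementation so that it stands alone; sum_coords and bad_x are hypothetical names introduced only for illustration.

```scheme
; The new, tagged implementation of points.
(define (mk_point x y)
    (list 'point x y))

(define x_point cadr)
(define y_point caddr)

; Respects the abstraction: uses only the selectors.
(define (sum_coords p)
    (+ (x_point p) (y_point p)))

; Violates the abstraction: assumes a point is the list (x y),
; which was true only of the first implementation.
(define (bad_x p)
    (car p))

(sum_coords (mk_point 1 2))   ; 3
(bad_x (mk_point 1 2))        ; the symbol point, not 1
```

Under the first implementation bad_x happened to return the x coordinate; after the change it silently returns the tag. Functions written like sum_coords need no attention at all.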

Layers of Abstraction

We can build one abstraction on top of another. For example, we can use points to define lines:

Example 2 A line

Implementation of lines


(define (mk_line x0 y0 x1 y1)
    (let (
         (p_0 (mk_point x0 y0))
         (p_1 (mk_point x1 y1))
         )
        (list p_0 p_1))
    )


(define point_0_line car)
(define point_1_line cadr)

Use of lines


(define (length_line l)
    (let* (
         (p0 (point_0_line l))
         (p1 (point_1_line l))
         (p  (diff_point p0 p1))
         (x  (x_point p))
         (y  (y_point p)))
        (sqrt (+ (* x x) (* y y)))
        )
    )

(define l1 (mk_line 0 1 6 7))

(define l2 (mk_line 0 4 7 8))

(length_line l1)
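
Operations already defined on points can be reused one level up in the same way. As a further sketch, the hypothetical function mid_point_line below finds the mid-point of a line using only the line selectors and mid_point. The definitions it relies on are repeated, using the first implementation of points, so that the example stands alone.

```scheme
; Points (first implementation) and the mid_point function.
(define (mk_point x y) (list x y))
(define x_point car)
(define y_point cadr)

(define (mid_point p1 p2)
    (mk_point
        (/ (+ (x_point p1) (x_point p2)) 2)
        (/ (+ (y_point p1) (y_point p2)) 2)))

; Lines, built from points.
(define (mk_line x0 y0 x1 y1)
    (list (mk_point x0 y0) (mk_point x1 y1)))

(define point_0_line car)
(define point_1_line cadr)

; mid_point_line (a hypothetical name) never looks inside a point
; or a line except through their selectors.
(define (mid_point_line l)
    (mid_point (point_0_line l) (point_1_line l)))

(mid_point_line (mk_line 0 1 6 7))   ; the point with x = 3 and y = 4
```

Because mid_point_line touches neither car nor cdr directly, it would keep working if either the representation of points or the representation of lines were changed.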

Using `record-class` to create opaque records.

UMASS Scheme provides opaque records as an option. The function-call

    (record-class class_info spec)

will return a list-structure containing record access functions. Here class_info is a symbol or other structure that is common to all members of the class. The parameter spec is a list of field specifiers. A field specifier says what kind of data can be held in a field. The only kind of specifier we will use is the symbol 'object, which creates a field able to hold any Scheme object. (Historical note: the record-class capability is non-standard for Scheme. It is derived from the recordfns capability of POP-2 [Burstall and Popplestone 1968], as modified for POP-11 [Barrett, Ramsay and Sloman 1985].)

The record-class function returns a list of four items:

The first is a constructor function for the record-class. This takes as many arguments as there are field specifiers in the spec and builds a record containing them.
The second is a destructor function for the record-class. This takes a record created by the constructor and makes a list of the objects out of which the record was constructed.
The third is a list of selector functions for the record-class, one for each field-specifier in the spec. Each selector function extracts the contents of the appropriate field.
The fourth is a function for recognising members of the record-class.

For example we might do:


(define class_point (record-class 'point '(object object)))

(define mk_point  (car class_point))
(define sel_point (caddr class_point))
(define x_point   (car sel_point))
(define y_point   (cadr sel_point))

You may also use


(define dest_point  (cadr class_point))
(define point?      (cadddr class_point))
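
Since record-class is specific to UMASS Scheme, the definitions above will not run in other Scheme systems. As a rough sketch only, the function below, hypothetically named my_record_class, returns a four-item list of the same shape in portable Scheme. It is an assumption-laden stand-in, not the UMASS implementation: it ignores the field specifiers apart from counting them, and its records are tagged vectors, so they are not truly opaque.

```scheme
; my_record_class: a hypothetical, portable stand-in for record-class.
; A record is a vector whose slot 0 holds a tag unique to the class.
(define (my_record_class class_info spec)
  (let* ((n   (length spec))
         (tag (list class_info)))   ; a fresh pair, unique under eq?
    ; Recogniser: is obj a record of this class?
    (define (recognise? obj)
      (and (vector? obj)
           (= (vector-length obj) (+ n 1))
           (eq? (vector-ref obj 0) tag)))
    ; Constructor: takes one argument per field specifier.
    (define (construct . fields)
      (if (= (length fields) n)
          (list->vector (cons tag fields))
          (error "wrong number of fields for class" class_info)))
    ; Destructor: returns the list of field contents.
    (define (destruct rec)
      (if (recognise? rec)
          (cdr (vector->list rec))
          (error "not a record of class" class_info)))
    ; Selector for field i (fields are numbered from 1).
    (define (make_selector i)
      (lambda (rec)
        (if (recognise? rec)
            (vector-ref rec i)
            (error "not a record of class" class_info))))
    (define (selectors i)
      (if (> i n)
          '()
          (cons (make_selector i) (selectors (+ i 1)))))
    (list construct destruct (selectors 1) recognise?)))

; Used exactly as record-class is used above.
(define class_point (my_record_class 'point '(object object)))

(define mk_point   (car class_point))
(define dest_point (cadr class_point))
(define sel_point  (caddr class_point))
(define x_point    (car sel_point))
(define y_point    (cadr sel_point))
(define point?     (cadddr class_point))

(x_point (mk_point 3 4))      ; 3
(dest_point (mk_point 3 4))   ; (3 4)
(point? '(3 4))               ; #f
```

The selectors check the tag before reading a field, so applying x_point to something that is not a point is reported as an error.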

Starting to write an object-oriented capability for Scheme.

The record-class capability allows us to create opaque data-structures which can only be accessed by the appropriate selector functions. However, the selector functions as we have used them just live in the global name-space. This is a problem if we try to build a big system out of software components written by disparate authors, since we can't be sure that some people won't use the same names for different functions. This, of course, is a problem for the C language, which also has a big global name-space.

Another problem is that we may want to have a class of objects that in some way extends another class. For example, if we were modelling a university, then we would want to have a basic person class that was extended to a student class. That is a student has all the attributes of a person (name, age, sex let's say), plus some others, for example a list of courses that he or she is taking.

And finally we may want to say that a particular class implements some kind of abstractly-defined capability. For example, we might have a notion of what software to manipulate sets ought to provide - membership, union and intersection operations, say.

The record-class facility of UMASS Scheme provides a basic tool-kit for addressing the above issues; however "spilling out" the package of capabilities provided by record-class into global name-space is not a good basis for maintaining encapsulation. One paradigm that supports encapsulation is the usual object-oriented paradigm, in which the capabilities associated with a class of objects remain encapsulated in a class-structure which is accessible primarily via objects of the class. One Scheme view of this might be to implement object-orientation in terms of a call to a function "send" which passes a message to an object. So, instead of writing (x_point p), we would instead write:

    (send p 'x)

Implementing this kind of capability is something of a doddle using record-class, since we can use the first argument of the call to provide class-common information. However, actually doing this will have to wait until we know about the imperative paradigm in Scheme.
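
In the meantime, the flavour of send can be had with a different and purely functional technique: message passing in the style of Abelson and Sussman, in which an object is simply a function that dispatches on the message it is given. The sketch below does not use record-class, and the name mk_point_obj is hypothetical.

```scheme
; An object is represented as a function from messages to values.
(define (mk_point_obj x y)
  (lambda (message)
    (cond ((eq? message 'x) x)
          ((eq? message 'y) y)
          (else (error "point: unknown message" message)))))

; send just applies the object to the message.
(define (send object message)
  (object message))

(define p (mk_point_obj 3 4))

(send p 'x)   ; 3
(send p 'y)   ; 4
```

Nothing but send and mk_point_obj needs to live in the global name-space here; the messages 'x and 'y are interpreted by the object itself, so another class is free to use the same message names.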

References

Burstall, R.M. and Popplestone, R.J. [1968] The POP-2 Reference Manual, Machine Intelligence 2, pp. 205-46, eds Dale, E. and Michie, D., Oliver and Boyd, Edinburgh, Scotland.

Barrett, R., Ramsay, A. and Sloman, A. [1985] POP-11: A Practical Language for Artificial Intelligence, Ellis Horwood, Chichester, England and John Wiley, N.Y., USA.

EXTRACT FROM POPLOG ON-LINE MANUAL - ref keys

The material below is not required reading. It explains more fully how to use field specifiers.

6 Field Specifiers for Poplog Data

This section lists the permissible field type specifiers for Poplog data, i.e. that can appear in the SPEC or SPEC_LIST argument to conskey (SPEC_LIST is a list of type specifiers for a record class, and SPEC is a single one for a vector class).

Full Poplog Item

This is the quickest field type to access or update, since it requires no conversion to and from Poplog representation, and no check on values assigned into it, etc (see 'Representation of Data in Poplog' in REF*DATA).

Type	Meaning
"full"	Holds a single Poplog item, and occupies one 'natural' machine word (32 bits in all current implementations - except the DEC ALPHA).

'Packed' Integers

These fields can contain integers only, either signed or unsigned, represented in a fixed number of binary bits. When accessed, such a field produces a Poplog simple integer or biginteger (the latter only for a field having more bits than a simple integer - 29 bits unsigned or 30 bits signed in current implementations - and the value overflows this range). Similarly, a simple integer or biginteger within the range allowed can be assigned into the field. The named types correspond to standard sizes on the host machine (and are always aligned on appropriate boundaries to be efficiently accessible), whereas fields specified as a specific number of bits (i.e. N or -N) are 'bitfields' and are generally slower to access/update (and in a structure, simply occupy the next N bits).

Type	Meaning
"int"	Signed integer of the 'natural' machine wordsize (32 bits in all current implementations, range -2**31 <= I < 2**31 ).
"uint"	Unsigned integer of the 'natural' machine wordsize (32 bits in all current implementations, range 0 <= I < 2**32 ).
"long"	Signed 'long' integer (same as "int" in all current implementations).
"ulong"	Unsigned 'long' integer (same as "uint" in all current implementations).
"short"	Signed 'short' integer (16 bits in all current implementations, range -2**15 <= I < 2**15 ).
"ushort"	Unsigned 'short' integer (16 bits in all current implementations, range 0 <= I < 2**16 ).
"sbyte"	Signed byte (8 bits in all current implementations, range -2**7 <= I < 2**7 ).
"byte"	Unsigned byte (8 bits in all current implementations, range 0 <= I < 2**8 ).
-N	Signed bitfield of N bits, where 1 <= N <= 32. (Range -2**(N-1) <= I < 2**(N-1) ).
N	Unsigned bitfield of N bits, where 1 <= N <= 32. (Range 0 <= I < 2**N ).
"pint"	Same as "int", but declares the field as holding only values within the range of a Poplog simple integer (pop_min_int <= I <= pop_max_int). When this is known for an "int" field, using "pint" instead gives faster access/update.

(N.B. All vector classes constructed on the types "byte" and "sbyte" are special insofar as they are guaranteed to be null-terminated, i.e. to have a 0 byte following the last actual byte of the vector. This costs on average an extra byte per vector, but allows data such as standard strings to be passed to external C functions without modification.)
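
The integer ranges quoted for the packed integer types all follow one rule: an unsigned field of N bits holds 0 <= I < 2**N, and a signed field of N bits holds -2**(N-1) <= I < 2**(N-1). As a small sketch in Scheme (the course language, not Poplog), with hypothetical function names:

```scheme
; Each function returns a list (low high), meaning low <= I < high.
; expt gives exact results for exact integer arguments.
(define (unsigned_range n)
  (list 0 (expt 2 n)))

(define (signed_range n)
  (list (- (expt 2 (- n 1))) (expt 2 (- n 1))))

(signed_range 8)      ; (-128 128), the range of an "sbyte"
(unsigned_range 16)   ; (0 65536), the range of a "ushort"
```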

Floating Point

Any non-complex number, including integers and ratios, can be assigned into these fields, conversion and/or rounding being done where necessary (but a mishap will occur if the input value is outside the range of the field). Accessing an "sfloat" or "float" field always produces a Poplog decimal; accessing a "dfloat" field produces a decimal if popdprecision is false and a decimal can contain the value, or a ddecimal otherwise (see REF*popdprecision).

Type	Meaning
"dfloat"	A double-length floating-point number in machine format, occupying two 'natural' machine words (64 bits in all current implementations).
"sfloat"	A single-length floating-point number in machine format, occupying one 'natural' machine word (32 bits in all current implementations).
"float"	Identical to "sfloat", EXCEPT when specified as an external function result - see 'Additional Field Specifiers for External Data' below.

(N.B. For upward compatibility with earlier versions of Poplog, the words "ddecimal" and "decimal" are also allowed, and are synonymous with "dfloat" and "float" respectively. Note that "decimal" equals "float", NOT "sfloat".)