TEACH WALTZ                                       Aaron Sloman Jan 1981
                                                  Updated October 1994

AN INTRODUCTION TO WALTZ FILTERING AND CONSTRAINT PROPAGATION

CONTENTS - (Use g to access required sections)

 -- Introduction
 -- Some background reading
 -- Waltz filtering
 -- -- The problem
 -- -- Waltz's contribution
 -- An overview of the algorithm
 -- -- What Waltz found
 -- Generalized constraint propagation
 -- Two-way consistency
 -- A programming task
 -- -- LIB SEEPICTURE
 -- A simple example picture
 -- -- Paper exercise
 -- Using the library programs
 -- -- The notation for individual junctions
 -- -- The notation for interpretation rules
 -- -- The (incomplete) set of interpretation rules
 -- -- Making your own copy of the program
 -- -- Varying trace output when running the program
 -- Filtering Interpretations
 -- The library subroutines
 -- -- filter
 -- -- prune
 -- -- somebadlabel
 -- -- nointerp
 -- -- labelfitsinterp
 -- Possible exercises
 -- -- Exercise 1
 -- -- Exercise 2 (Harder)
 -- -- Exercise 3 (possible mini-project)
 -- -- Others (less programming)
 -- See also:

-- Introduction -------------------------------------------------------

This file describes a possible mini-project concerned with the process
of filtering out impossible interpretations of junctions in an image,
using the sort of technique described by David Waltz in his PhD thesis
at MIT over 20 years ago. A Pop-11 library is introduced, some examples
are given, and a mini-project of variable extent is described.

-- Some background reading --------------------------------------------

In order to understand this document and tackle the exercise, you will
need to be familiar with the line-labelling approach to interpreting
line drawings. For introductory reading see:

    TEACH * LABELLING
        Poplog teach file

    D. Waltz, PhD Thesis, reprinted in
        P.H. Winston (ed), The Psychology of Computer Vision,
        McGraw Hill, 1975

    M. Boden, AI and Natural Man
        (look up Clowes, Huffman and Waltz in the index).
        You'll get only a very "qualitative" explanation.

    A. Bundy et al., AI, an Introductory Course,
        read the chapter by Weir on Vision, especially pp. 147-153
        (May be hard to get hold of now)

    P.H. Winston, Artificial Intelligence,
        First, second, or third edition.

    E. Rich and K. Knight, Artificial Intelligence,
        Second edition, section 14.3

Other text books on AI and on Vision are likely to have sections on
line-labelling (sometimes indexed under Huffman or Clowes, who
independently invented the idea), and on Waltz filtering and constraint
manipulation, which is a very general and powerful idea. The original
line-labelling approach was first described independently in papers by
Huffman (in Machine Intelligence 6) and Clowes (in the Artificial
Intelligence Journal, 1971).

-- Waltz filtering ----------------------------------------------------

-- -- The problem

The key ideas of line labelling are described in TEACH LABELLING. A
picture made of straight lines depicting 3-D polyhedra can be
interpreted by attaching labels to the lines saying what sorts of scene
edges they depict. The problem is that you cannot work out, from the
local context of a line in a simple black and white line drawing, what
sort of edge it represents: e.g. concave, convex, occluding, abutting
(a crack), a shadow edge, etc. Yet the human visual system usually
immediately assigns a unique interpretation (except in the case of some
nasty pictures). How does our brain do this?

From knowledge of possible scene structures you can tell, locally, that
a particular line may have only three possible interpretations, another
line only two, another line only four, etc.
Also, because of the ways the lines meet at junctions in the picture
(representing corners in the scene) there are CONSTRAINTS, such as
"this line cannot be concave if that one is convex", etc.

The earliest work, by Huffman and by Clowes, did a depth-first search
for an assignment of interpretations to the lines that was globally
consistent, i.e. did not violate any of the constraints. This could be
horrendously slow, and in fact the time required typically grew
exponentially with the size of the picture (numbers of lines and
junctions).

-- -- Waltz's contribution

The program by Waltz, mentioned in the above books, and described in
far more detail in his thesis (reprinted in P.H. Winston 1975), both
extended the sets of possible line labels for polyhedral scenes, by
introducing cracks and shadows, and also addressed the problem of
reducing the combinatorial explosion encountered when searching for
sets of consistent labellings of a complex picture.

He did the latter by showing that by analysing the constraints and the
possible alternatives we could "prune" some of the alternatives as
impossible. Eliminating some alternatives then meant that the
constraints ruled out other alternatives which no longer had consistent
neighbours in the network of lines and junctions. Repeated pruning
sometimes led to unique solutions. Waltz and others were very
surprised.

-- An overview of the algorithm ---------------------------------------

The essential idea is this. Assume you are given:

= a set of data D (e.g. junctions or lines in a picture),

= a set of rules R, specifying for each element d of D, given its
  context, one or more possible interpretations for d. I.e. if d is an
  element of D then R generates a set Int(d) containing possible
  interpretations of d. (If d is a line then Int(d) is a set of
  possible labels for d, e.g. [convex concave left-occluding ...])

= a set of consistency rules C specifying conditions under which
  interpretations of neighbouring items of D are consistent. I.e. if I1
  is one possible interpretation of a data item d1, and I2 is one
  possible interpretation of a neighbouring item d2, then the rules C
  can be checked to see whether I1 and I2 are consistent with each
  other or not.

THEN, for each interpretation I of an item d in D (that is, for each I
in Int(d)), look at each of the neighbours n of d, and ask:

    "does n have at least one possible interpretation (according to R)
    which is consistent with I, i.e. does not violate any rule in C?"

If the answer is NO, then remove I from the possible interpretations of
d: for it can never be part of a globally consistent interpretation.
Keep on doing this until nothing more can be removed.

-- -- What Waltz found

Waltz demonstrated the surprising result that even in quite complex
pictures, with his extended labelling set, this filtering process could
sometimes reduce the set of possible interpretations for each element
of D (i.e. for each junction) down to one. In that case, the whole
picture has only one interpretation consistent with C. In some cases
the remaining possible interpretations are not unique, but since the
sets are reduced in size, the search space of possible combinations of
interpretations is much reduced.
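To make the abstract description concrete, here is a minimal Pop-11
sketch of the generic filtering loop. It is an illustration only, not
the library code described later. It assumes (hypothetically) that
interps is a property mapping each data item to its current list of
interpretations (e.g. built with newproperty), that neighbours(d)
returns a list of the items adjacent to d, and that consistent(i1, i2)
decides whether two interpretations of neighbouring items can co-exist:

    define some_ok(i, others, consistent) -> boolean;
        ;;; true if at least one interpretation in OTHERS is
        ;;; consistent with interpretation I
        lvars j;
        false -> boolean;
        for j in others do
            if consistent(i, j) then true -> boolean; return; endif;
        endfor;
    enddefine;

    define filter_all(items, interps, neighbours, consistent);
        lvars d, i, n, keep, ok, changed;
        true -> changed;
        while changed do
            false -> changed;
            for d in items do
                ;;; keep only those interpretations of d which every
                ;;; neighbour can agree with
                [% for i in interps(d) do
                    true -> ok;
                    for n in neighbours(d) do
                        unless some_ok(i, interps(n), consistent) then
                            false -> ok; quitloop;
                        endunless;
                    endfor;
                    if ok then i endif;
                endfor %] -> keep;
                unless keep = interps(d) then
                    true -> changed;
                    keep -> interps(d);
                endunless;
            endfor;
        endwhile;
    enddefine;

Since interpretations are only ever removed, never added, the loop must
terminate. LIB WALTZ, introduced below, embodies the same idea,
specialised to junctions and line labels.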
-- Generalized constraint propagation ---------------------------------

In principle, the consistency rules C may refer to constraints on two,
three, or more elements of D at a time. But Waltz considered rules
dealing with only two at a time, and they were all reduced to the
single consistency rule that if two junctions are joined by a line,
then the interpretations of those junctions are consistent only if they
assign the same label to that line.

Later work by Mackworth and others extended the investigation to
three-way, four-way and N-way consistency rules.

Work by Hinton (in his PhD thesis at Edinburgh University, 1977) and
others extended the ideas to systems where violations of consistency
could be tolerated, but some violations were worse than others, and the
task of the system was to minimise the total "cost" of violations
according to some agreed cost function. This leads into some ideas
found in neural networks, and also into a technique known as
"relaxation", which is widely used for gradually homing in on a
solution to a complex problem by computing a succession of more and
more accurate estimates. (As far as I know, relaxation was first used
by Ivan Sutherland in the 60s, e.g. in trying to find the stresses in
the girders of a bridge.)

-- Two-way consistency ------------------------------------------------

We shall restrict ourselves to consistency rules that mention only two
nodes at a time, as Waltz did. We also restrict ourselves to the
following sorts of labels for lines in an image:

    [cnvx point1 point2]
        - the edge from point1 to point2 is convex

    [cncv point1 point2]
        - the edge from point1 to point2 is concave

    [occ point1 point2]
        - the edge is occluding, and attached to the surface which is
          on the right as you go from point1 to point2. The surface on
          the other side is then in the background and goes behind the
          visible attached surface.
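With these labels, the single two-way consistency rule is that two
neighbouring junctions must assign "the same label" to the line joining
them. Note that each junction may write the line's end points in its
own order: cnvx and cncv labels do not depend on the direction, whereas
occ labels do, since they say which side the attached surface is on.
Here is a sketch of such an agreement test in Pop-11 (an illustration
only, not the library's CONSISTENT procedure; note that pattern
variables used with the matcher arrow must be declared with vars, not
lvars):

    vars type1, type2, p1, q1, p2, q2;

    define labels_agree(label1, label2) -> boolean;
        label1 --> [?type1 ?p1 ?q1];
        label2 --> [?type2 ?p2 ?q2];
        if type1 /= type2 then
            false -> boolean;
        elseif type1 == "occ" then
            ;;; occluding labels are directional: the attached
            ;;; surface must be on the same side
            (p1 = p2 and q1 = q2) -> boolean;
        else
            ;;; cnvx and cncv labels agree whichever way round the
            ;;; end points are written
            ((p1 = p2 and q1 = q2) or (p1 = q2 and q1 = p2)) -> boolean;
        endif;
    enddefine;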
-- A programming task -------------------------------------------------

There are two main parts to a programming exercise exploring these
ideas.

FIRST, you need a program which, given a picture, or at least a
database D of information about lines and junctions, and a set of rules
R, will produce a new database associating possible interpretations
with each element of D. This is provided, in an incomplete form, in
LIB LABELS. The set of rules R given there is incomplete. Your task is
to complete it. NB: make sure that if your database includes a type of
junction, then that type of junction is catered for in your set of
rules. Otherwise you'll get errors.

SECOND, you need a program which will filter the sets of
interpretations, removing those for which there are no consistent
neighbouring interpretations. This is provided in LIB WALTZ. A possible
task is to complete an incomplete version of this, or re-implement it
all.

There is a simulated SEEPICTURE database available in LIB TETRA, which
you can use to test the library programs and your own programs. You can
see it by looking at

    SHOWLIB * TETRA

-- -- LIB SEEPICTURE

LIB SEEPICTURE is a Pop-11 library program that is capable of finding
lines and junctions in a simple binary image. LIB TETRA shows the sort
of output it can produce.

For now you don't need to know anything about LIB SEEPICTURE, but if
you want to be able to draw some images and get the database of lines
and junctions generated automatically, then see TEACH VTURTLE, for
drawing pictures in a VED buffer, and TEACH SEEPICTURE, for a library
that will analyse such pictures, provided that all the lines are
vertical, horizontal or diagonal (at 45 degrees). SEEPICTURE is divided
into findlines, findjuncs, and paintpicture, the last of which provides
visual feedback.

-- A simple example picture -------------------------------------------

Here is a simple picture of a 3-D object, included in LIB TETRA, which
represents the picture by a list of junctions found in the picture (not
edges in the scene). The list simulates the result of running the
seepicture program on a picture of a tetrahedron with a corner of
another object visible behind it:

             2
            /|\
           / | \__6
          /  | 5\ |
         /   |   \|7
       1/    |    \
        \    |    / 3
         \   |   /
          \  |  /
           \ | /
            \|/4

Using the sets of possible junction interpretations described in the
text books (and below), see how many interpretations each junction has
without taking context into account.

Note: there are three ell junctions, two tee junctions and two arrow
junctions here. Make sure you can identify each type. These are 2-D
characterisations of configurations of lines in the picture. The
question is what sorts of 3-D configurations of surfaces, edges and
corners these might represent.

Assume that the whole thing is seen against an infinite flat background
surface, to which things may be stuck, or not. Not all the 3-D
surfaces, edges, and corners will actually be VISIBLE: e.g. the far
side of a tetrahedron is not visible. Some edges and surfaces may be
PARTIALLY visible if they are "occluded" (or "occulted" as some prefer
to say).

-- -- Paper exercise

Try to work out what sort of 3-D configuration of intersecting surfaces
each junction could possibly represent, if the scene is made only of
polyhedra with no more than three surfaces meeting at each point
(allowing the corners of a cube, but not a six-sided pyramid). Then see
how many consistent interpretations there are for the above picture.

See how the number of interpretations increases if you remove the
junctions labelled 5, 6 and 7, and the lines (5-6) and (6-7). Work all
this out with pencil and paper.

-- Using the library programs -----------------------------------------

You will probably understand the issues better if you play with the
library programs involved in this exercise, as follows. The result of
doing the following will be hard to understand unless you have first
read the later sections describing the notations used.

    lib tetra;       ;;; get the list of junctions in the picture

    junctions ==>    ;;; the notation for junctions is explained below

To look at the picture from which they were derived (reproduced above),
do:

    ENTER showlib tetra

Note that since none of the programs uses the actual locations of the
junctions, they are here called J1, J2, etc., instead of being given
coordinates. I.e. the constraints are all topological (to do with
connectivity and ordering), not geometric (to do with actual lengths
and angles, except for the difference between obtuse and acute angles).

Now get the interpretation procedures and rules thus:

    lib labels;

    rules ==>        ;;; print out interpretation rules (explained below)

    interpall(junctions) -> interps;   ;;; get interpretations

    interps ==>      ;;; print them out

INTERPALL uses the list RULES globally. The rules have the following
format.

-- -- The notation for individual junctions

Each junction type is represented by a type label and a set of numbers
indicating the points in the order in which they would be produced by
seepicture. (See the SEEPICTURE demo for details of the conventions.)
E.g.

    [tee 1 2 3 4]

with lines 1-2 and 1-4 forming the crossbar, and 1-3 the stem, as in

    2--------1--------4
             |
             |
             3

Similarly, an arrow junction might be

    [arw 1 2 3 4]

      /2
     /
    1-----------3
     \
      \
       \4

Note that the first number always refers to the junction being
described. The others represent neighbouring junctions, in a fixed
clockwise order round the main junction. Here a possible interpretation
for the line [1 2] could be that it is concave, or occluding with the
attached surface on the left as you go from 1 to 2 (i.e. [occ 2 1]).
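Since the junctions and rules are just Pop-11 list structures, you can
use the pattern matcher to explore them. For instance, assuming the
junction entries have the form shown above (with actual junction names
such as J1 in place of the numbers), this sketch prints out every tee
junction in the list:

    vars junc;
    for junc in junctions do
        ;;; [tee = = = =] matches any five element list starting
        ;;; with the word "tee"
        if junc matches [tee = = = =] then
            junc =>
        endif;
    endfor;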
-- -- The notation for interpretation rules

Associated with each junction type is a list of possible
interpretations, where a possible interpretation assigns a label to
each line at the junction. Each line is represented by its end points.
So a labelled line might be

    [occ 4 1]

meaning the line from 4 to 1 depicts an occluding edge belonging to the
surface on the right as you go from point 4 to point 1.

-- -- The (incomplete) set of interpretation rules

Here is the set of rules provided by LIB LABELS. Try to draw pictures
for each case to convince yourself that you agree with the rules, and
see if there are other cases left out:

    [
        ;;; Rules for possible interpretations of ell junctions
        [[ell 1 2 3]
            ;;; first possible interpretation, type ell1
            [ell1 [occ 2 1] [occ 1 3]]
            ;;; second possible interpretation, type ell2
            [ell2 [occ 3 1] [occ 1 2]]
            ;;; rules for further possible ell labellings to be
            ;;; completed by you, either by thinking about polyhedra,
            ;;; or reading about line labelling
        ]
        ;;; Rules for possible interpretations of tee junctions
        [[tee 1 2 3 4]
            ;;; first possible interpretation, type tee1
            [tee1 [occ 1 2] [occ 4 1] [cnvx 1 3]]
            ;;; second possible interpretation, type tee2, etc.
            [tee2 [occ 1 2] [occ 4 1] [cncv 1 3]]
            [tee3 [occ 1 2] [occ 4 1] [occ 1 3]]
            [tee4 [occ 1 2] [occ 4 1] [occ 3 1]]
        ]
        ;;; Rules for possible interpretations of arrow junctions
        [[arw 1 2 3 4]
            [arw1 [occ 2 1] [cnvx 1 3] [occ 1 4]]
            ;;; other possible arrow labellings to be completed by you
        ]
        ;;; Rules for possible interpretations of fork junctions
        [[frk 1 2 3 4]
            ;;; fork labellings to be provided by you
        ]
    ] -> rules;

If you wished to include ends in your pictures you'd have to add a rule

    [[end 1 2] ........]

with the possible interpretations for the ends. The list of
interpretations could be empty. (See below.)

Note that each interpretation (set of labels) is given a name for easy
reference, e.g. tee1, ell2, etc., though the programs don't actually
use the names.
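The fork entry in the rules above has no interpretations yet. As an
illustration of the format (and a start which you should check by
drawing pictures), one labelling from the standard Huffman/Clowes
catalogue has all three edges convex, as at the nearest corner of a
cube seen with three faces visible:

    ;;; A sample fork labelling (verify it yourself; the remaining
    ;;; fork labellings are left for you to find): at the nearest
    ;;; corner of a cube, with three faces visible, all three edges
    ;;; are convex.
    [[frk 1 2 3 4]
        [frk1 [cnvx 1 2] [cnvx 1 3] [cnvx 1 4]]
    ]

Remember that to edit the rules you must first make your own copy of
the library file, as described next.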
-- -- Making your own copy of the program

You can make a copy of a library program, in order to edit the rules,
as follows. Examine

    SHOWLIB * LABELS

then do

    ENTER name labels.p

You will then have your own version of the library file, called
'labels.p'.

-- -- Varying trace output when running the program

Try all the above commands, including this:

    interpall(junctions) -> interps;   ;;; get interpretations

If you wish to reduce the verbosity of INTERPALL, do

    false -> chatty;

If you wish to see more of what's going on inside the procedure
INTERPALL you can do:

    trace getinterps buildinterp labelline;

    interpall(junctions) -> interps;

You can try the procedure INTERPALL on a different list of junctions.
For instance, use the turtle to make a picture of a cube, or a square
with a diagonal across it. Then:

    findlines();
    findjuncs();
    flush([line ==]);

    database ==>

    interpall(database) ==>

-- Filtering Interpretations ------------------------------------------

We can now explore how to get rid of interpretations that cannot be
part of a globally consistent set, by repeatedly checking whether any
interpretations have no consistent neighbour according to the
consistency rules for 2-D pictures of 3-D polyhedral scenes. (They may
have other kinds of 3-D interpretations.) To load the filtering
programs do

    lib waltz;

Note that this makes CHATTY true again.

You can now apply the procedure FILTER to the list INTERPS, produced by
INTERPALL, thus:

    filter(interps) ==>

As before, making CHATTY FALSE will reduce the printout.

-- The library subroutines --------------------------------------------

-- -- filter

The procedure FILTER repeatedly calls PRUNE with different junctions as
input, until PRUNE cannot reduce the interpretation sets any more.

-- -- prune

PRUNE takes a set of interpretations of the sort produced by the
procedure BUILDINTERP in LIB LABELS, and gives each of the
interpretations to the procedure SOMEBADLABEL to see if it can find a
labelling in the interpretation which is no good.

-- -- somebadlabel

SOMEBADLABEL uses the points at the junction to access all the linked
junctions in the list of all interpretations, and for each of them in
turn sees whether its set of interpretations contains something
consistent with the given interpretation. It uses NOINTERP for this.
SOMEBADLABEL returns TRUE if at least one of the labels cannot be
accepted, otherwise FALSE. If the result is TRUE, then some pruning is
possible.

-- -- nointerp

NOINTERP takes a labelled line, a neighbouring junction, and the set of
all possible interpretations. It looks to see if there is no
interpretation of the neighbour consistent with the label. If so it
returns TRUE, but FALSE if there is at least one consistent
neighbouring interpretation. To check whether the label is consistent
with a neighbouring interpretation it uses LABELFITSINTERP.

-- -- labelfitsinterp

LABELFITSINTERP takes a label and an interpretation of a neighbouring
junction, and sees if they are consistent, using the procedure
CONSISTENT to compare the label with the individual labels in the
interpretation.
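To make the division of labour concrete, here is a sketch of the logic
of NOINTERP in Pop-11. This is an illustration only: the actual
argument conventions in LIB WALTZ may differ, and interps_of is a
hypothetical helper which extracts the neighbour's current list of
interpretations from the list of all interpretations:

    define nointerp(label, junction, allinterps) -> boolean;
        lvars interp;
        for interp in interps_of(junction, allinterps) do
            if labelfitsinterp(label, interp) then
                ;;; found a consistent neighbouring interpretation,
                ;;; so the label cannot be rejected on this evidence
                false -> boolean;
                return;
            endif;
        endfor;
        ;;; no interpretation of the neighbour fits the label
        true -> boolean;
    enddefine;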
-- Possible exercises -------------------------------------------------

-- -- Exercise 1

1. Least ambitious: produce a report of about 1000 words showing how
LIB LABELS and LIB WALTZ work, with examples of pictures other than the
one provided in LIB TETRA. If possible extend the set of rules in LIB
LABELS, and include a picture with a FORK junction.

Possible extension: show what happens if, before you give a list of
possible interpretations to FILTER, you remove some of the information
about one of the junctions (as might happen in a picture with noise).
The result can be that some other junctions end up with no possible
interpretations. This process, whereby a flaw in part of a picture is
propagated so as to ruin the interpretation of the whole picture, is
described as 'gangrene' by Hinton, in his paper in the AISB conference
proceedings, 1976. (This is why he recommended a less strict form of
constraint manipulation, which allowed some constraints to be violated
while still trying to find the best interpretation.)

You can also demonstrate this by using pictures with junctions for
which the list of possible interpretations in the list RULES is empty.
E.g. in a 'noisy' picture where some lines don't meet as they should,
but have free ends, you can produce gangrene. (But you must then have
an entry for 'end' in the list RULES.)

Is there any way of modifying the filtering process to deal with this?
(Hard question.)

-- -- Exercise 2 (Harder)

2. There is an incomplete version of LIB WALTZ in the library, called

    TEACH * WALTZ2

You can make a copy of this in your directory by first examining

    TEACH * WALTZ2

and then changing the file to have the name waltz.p, in your directory,
by doing:

    ENTER name waltz.p

Your task is to complete this version of the program so that it works
like LIB WALTZ, and to write a report on the exercise.

-- -- Exercise 3 (possible mini-project)

3. Most ambitious: produce your own version of LIB WALTZ (and LIB
LABELS?)!

The library version does not use the most efficient control strategy.
It repeatedly looks through the WHOLE list of junctions to see if any
possibilities can be pruned, until a complete scan yields no pruning.
You could try to avoid the repeated examination of junctions by doing a
single scan and subsequently only looking at neighbours of 'pruned'
junctions. (That is what Waltz did.)

-- -- Others (less programming)

4. Read the account of the 'depth-first' lamina-labelling program
provided in the LABELLING demo, and play with the program. Then write a
description of the relationship between the two approaches (depth-first
search and Waltz filtering), illustrated with printout from both
programs.

5. The technique of filtering out possibilities using 'local'
considerations is not restricted to the process of analysing pictures.
Discuss its relationship to various search strategies used in problem
solving, and show with an example how it might be applied to a
non-visual problem.

(A still more general form of local 'co-operative' computation used for
some kinds of problem solving is called relaxation, used by Hinton in
his Edinburgh 1977 PhD thesis, and described in pages 420-430 of
Ballard and Brown (see below). In particular, Hinton claims that use of
relaxation is a good way to avoid the 'gangrene' problem sketched in
exercise 1.)

6. More difficult exercise (optional): the filter procedure sketched in
TEACH WALTZ2 is very inefficient. It repeatedly cycles through the
database looking to see if anything can be filtered out, until there is
'no change'. Instead, it could be done in a single pass through the
list of vertices looking to see if something can be filtered, and each
time a vertex has interpretations filtered out, all the neighbouring
vertices could be put on a list of vertices to be looked at. Before
moving to the next database entry, process the vertices on this list.
For any one of them which can be pruned, put ITS neighbours on the
list, unless they are there already.

(a) Try programming this.
(b) Analyse the difference between this method and the one in
    TEACH WALTZ2.
(c) Prove that a single pass through the database is all that is
    needed.

-- See also:

Ballard, D.H. & Brown, C.M. (1982). Computer Vision. Englewood Cliffs,
NJ: Prentice-Hall.

--- $poplocal/local/teach/waltz
--- The University of Birmingham 1994. --------------------------------