Having identified salient image features, an object is recognized by matching these with a stored model in which a number of required features are listed together with necessary properties and relationships between the features. Models will be expressed using a semantic net, where the nodes are the required salient features and the arcs are the relationships between features as well as the properties required of individual features (represented by an arc from a node to itself). Figure 8.11 shows a model for a London Underground sign, in which the required features are regions labelled hring1, hring2, hdisc1, hdisc2, corresponding to the upper and lower half-rings and the upper and lower half-discs of the sign.
[IMAGE ]
Figure 8.11: A model for an Underground sign.
We test for blob-like regions and thin regions by looking at the ratio of region boundary length to area enclosed (assuming the boundaries are reasonably smooth) and define two regions as `nearto' one another when the closest distance between them is less than some threshold (which may depend on the sizes of the regions).
One way of matching the model would be to find all combinations of assignments of regions to nodes, and then, for each combination, to check that all properties and relationships are satisfied. This would be very time-consuming since there will in general be a large number of possible combinations of assignments. However, there is a much better way of finding consistent matches, using a depth-first search strategy (chapter 4) in which large parts of the space of possible matches are eliminated by testing partial matches for consistency.
The depth-first matching strategy works as follows. One of the model nodes is selected and all regions are assigned to this node in turn. If a region does not satisfy all necessary properties associated with the node, the region is skipped and we move on to the next one. A second node is now selected, and for each successful region assignment to the first node, all regions are now assigned in turn to this second node. Again, region assignments to the second node are skipped if they do not satisfy all necessary properties associated with this node. Now, for each remaining assignment to the second node and the current assignment to the first node, any necessary relationships between these nodes are checked, and if not satisfied, the region assignment to the second node is skipped and the next one selected. For each assignment to the first and second nodes which satisfies all properties and relationships between the nodes, all possible regions are assigned to a third node. Again the required properties and relationships between this assignment and the first two are checked, and so the process continues until all nodes have been assigned a region and all properties and relationships are satisfied.
The depth-first search can be represented by a tree in which each level corresponds to a model node and each of the branches corresponds to a region assignment to that node which satisfies all properties and relationships with assignments of regions to nodes on the path to the root of the tree (figure 8.12).
[IMAGE ]
Figure 8.12: The search tree for matching model of Underground sign.
Returning to our example, we have already observed in figure 8.10 that the segmentation process extracts a number of regions which correspond well with meaningful parts of the Underground sign. In particular, the top and bottom segments of the ring, and the top and bottom half-discs within the sign, have been identified as single regions: call these a, b, c, and d, respectively. The search tree in figure 8.12 shows the successful match in which these regions are assigned to nodes hring1, hring2, hdisc1, and hdisc2.
As with method 1, a satisfactory match locates the Underground sign in the image, but for much less effort. However, there are problems. First, all properties and relationships must be satisfied; there is no room for error. A more sophisticated matching procedure is needed to get around this.
A second important problem with the kind of model described is that it does not take account of variations in the appearance of an object from different viewing angles. For example, when an Underground sign is looked at obliquely, the semi-circular regions within the sign become more like thin regions. Consequently, in practice, it is necessary to have a separate model corresponding to each substantially different viewpoint. Nevertheless, it is worth noting that there are relationships, like `nextto', which are not affected by viewing angle.
A more effective solution to this problem is to use stored models of the 3-dimensional structure of objects from which the characteristic arrangement of salient features from a particular viewing angle can be predicted. An object is recognized by looking for a collection of salient features which is consistent with the appropriate model for some viewing angle (see, for example, Brooks, 1981).