It has long been known that the human visual system is sensitive to certain salient spatial features of an image, so that, for example, a white blob on a dark background stands out as a single entity. This process by which the visual system segments images into collections of salient features seems to go on no matter what is being looked at and appears to be fundamental to visual perception. The Gestalt psychologists documented many examples of this phenomenon, and, in recent years, computational mechanisms have been proposed to account for it (e.g., Marr, 1976).
The close relationship between those features salient to people and the images of objects in a scene leads us to believe that such features play a central role in recognizing objects. Could it be that our visual systems detect salient features as a precursor to recognizing the objects in a scene? One piece of evidence for this may be found in the way in which we describe images using language. If you were asked to describe the image of the Underground sign in figure 8.2 as concisely as possible, you might say something like ``There is a ring intersected by a bar.''
In general, the images of many everyday objects, from a limited range of viewpoints, can be characterized by listing a number of salient features and their properties (e.g., shape and size), together with the topological and spatial relationships between them. We call such a characterization a model. Notice that models are a kind of structural description, with which we are already familiar from earlier chapters.
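As a minimal sketch of what such a model might look like when written down explicitly, the Underground sign can be represented as a list of features with properties plus a list of relationships between them. The class names (`Feature`, `Relation`, `Model`), the feature names `f1` and `f2`, and the property labels are illustrative assumptions, not notation used elsewhere in this book.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    """A salient feature with its properties (labels are hypothetical)."""
    name: str
    shape: str
    size: str


@dataclass(frozen=True)
class Relation:
    """A spatial or topological relationship between two named features."""
    kind: str
    first: str
    second: str


@dataclass
class Model:
    """A structural description: features plus relationships between them."""
    features: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def describe(self):
        """Render each relationship as a short English phrase."""
        shapes = {f.name: f.shape for f in self.features}
        return "; ".join(
            f"a {shapes[r.first]} {r.kind} a {shapes[r.second]}"
            for r in self.relations
        )


# Assumed model of the Underground sign: two features, one relationship.
underground_sign = Model(
    features=[Feature("f1", "ring", "large"), Feature("f2", "bar", "large")],
    relations=[Relation("intersected by", "f1", "f2")],
)
print(underground_sign.describe())
```

Reading the relationships back out in this way gives much the same concise description a person might offer for the sign.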
The existence of a model for a familiar object leads to a powerful method for recognizing that object from an image -- namely, to look for a collection of salient features of the image having the necessary properties and satisfying all of the spatial and topological relationships required by the model.
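The recognition method just described can be sketched as a search over assignments of image features to the parts of a model. The example below uses a plain brute-force search, chosen for brevity rather than efficiency, and assumes that an earlier stage has already extracted labelled features and relationships from the image. The identifiers `blob1` to `blob3`, the shape labels, and the `intersects` relation are all invented for illustration.

```python
from itertools import permutations

# Assumed model: the shape required of each part, plus required relations.
MODEL_PARTS = {"ring": "ring", "bar": "bar"}
MODEL_RELATIONS = {("intersects", "ring", "bar")}


def recognize(image_features, image_relations):
    """Return a mapping from model parts to image feature ids, or None.

    image_features: dict of feature id -> shape label
    image_relations: set of (kind, id_a, id_b) tuples found in the image
    """
    part_names = list(MODEL_PARTS)
    # Try every ordered choice of distinct image features for the model parts.
    for candidate in permutations(image_features, len(part_names)):
        binding = dict(zip(part_names, candidate))
        # Each chosen image feature must have the property the model requires.
        if any(image_features[binding[p]] != MODEL_PARTS[p] for p in part_names):
            continue
        # Every model relation must hold between the chosen image features.
        if all((kind, binding[a], binding[b]) in image_relations
               for kind, a, b in MODEL_RELATIONS):
            return binding
    return None


# Hypothetical features and relationships extracted from an image.
features = {"blob1": "bar", "blob2": "ring", "blob3": "bar"}
relations = {("intersects", "blob2", "blob3")}
print(recognize(features, relations))
```

Here `blob1` has the right shape to be the bar but is rejected because it does not stand in the required relationship to the ring, so a match requires both the properties and the relationships to be satisfied.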