The advent of the computer has introduced a new way of finding out about human vision: build a machine that can 'see' and investigate that machine as a model of human vision. But what does this mean in practice? If we are to learn anything about human vision, what should the machine vision system be made to do?
Figure 8.1: An image of the forecourt at Victoria Station.
A plausible answer is to select a task performed routinely by people, and then to build a machine to do the same thing. With this in mind, we consider the problem of building a mobile tourist guide (MTG), and, in particular, the task of recognizing and moving towards a visible Underground sign. If we succeed in building a machine to perform this task, then we have a possible model (which may of course be wrong) for the way in which people do the same task.
Figure 8.2: The London Underground sign.
The MTG is equipped with a TV camera that feeds a stream of snapshots of the scene into an on-board computer. In practice, the 'intelligence' of a mobile robot is often situated off-board and communicates with the on-board sensors and actuators over a radio link or a cable, but this does not alter the task of the visual sub-system with which we are concerned.
The basic problem is how to turn images from the TV camera into commands that control the MTG's transport mechanism. We have already seen, in earlier parts of this book, that we need an internal representation of the world to answer questions intelligently. It is hard to imagine how the MTG could perform correctly without forming some kind of internal representation of the scene before its 'eye', although the form this representation should take is unclear. We therefore suppose that the role of the MTG's visual sub-system is to produce an internal representation of the scene that is updated as changes occur in the world (e.g., when the MTG moves to another position). We further suppose that a second sub-system, which controls the vehicle guidance mechanism, interrogates this representation to decide how to move the MTG and so guide it towards the Underground sign. The function of the visual sub-system is summarized by the following mapping:
TV images → Internal Representation of Scene
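The division of labour supposed above, with a visual sub-system that maintains the representation and a guidance sub-system that only interrogates it, can be illustrated with a small Python sketch. It is a toy under loudly stated assumptions, not a description of how a real MTG would work: an "image" is a hypothetical grid of characters in which 'U' marks pixels of the Underground sign, the representation records only whether the sign is visible and how far it lies from the centre of view, and the class names and commands (`search`, `turn_left`, `turn_right`, `forward`) are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy image: a grid of characters in which 'U' marks pixels
# belonging to the Underground sign and any other character is background.
Image = List[str]


@dataclass
class SceneRepresentation:
    """Internal representation of the scene (deliberately minimal)."""
    sign_visible: bool = False
    # Horizontal position of the sign: -1.0 = far left, 0.0 = centre, +1.0 = far right.
    sign_offset: Optional[float] = None


class VisualSubsystem:
    """Maps TV images to an internal representation of the scene."""

    def __init__(self) -> None:
        self.scene = SceneRepresentation()

    def update(self, image: Image) -> SceneRepresentation:
        # Collect the column index of every sign pixel in the snapshot.
        cols = [c for row in image for c, px in enumerate(row) if px == "U"]
        if not cols:
            self.scene = SceneRepresentation()  # sign not in view
        else:
            half = (len(image[0]) - 1) / 2      # column index of the image centre
            centre = sum(cols) / len(cols)      # mean column of the sign pixels
            offset = (centre - half) / half if half else 0.0
            self.scene = SceneRepresentation(sign_visible=True, sign_offset=offset)
        return self.scene


class GuidanceSubsystem:
    """Interrogates the representation (never the image) to choose a command."""

    def __init__(self, tolerance: float = 0.2) -> None:
        self.tolerance = tolerance  # how far off-centre the sign may be before turning

    def decide(self, scene: SceneRepresentation) -> str:
        if not scene.sign_visible or scene.sign_offset is None:
            return "search"
        if scene.sign_offset < -self.tolerance:
            return "turn_left"
        if scene.sign_offset > self.tolerance:
            return "turn_right"
        return "forward"


if __name__ == "__main__":
    vision, guidance = VisualSubsystem(), GuidanceSubsystem()
    snapshots = [
        ["....", "...."],  # no sign in view        -> search
        ["U...", "U..."],  # sign at the far left   -> turn_left
        [".UU.", ".UU."],  # sign centred           -> forward
    ]
    for image in snapshots:
        print(guidance.decide(vision.update(image)))
```

Here the representation is simply rebuilt from each new snapshot, which is the crudest way of keeping it "updated as changes occur"; a more realistic visual sub-system might merge each snapshot with what it already knows about the scene. The point of the sketch is only that the guidance code depends on the representation, not on the raw TV images.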