WHAT ARE WE TRYING TO DO, AND HOW DO LOGIC AND PROBABILITY FIT INTO THE BIGGER PICTURE?
Understanding the Functions of Animal Vision

Aaron Sloman
Draft 29 Jan 2008

As I said when I received the original invitation, I don't have expertise regarding *probabilistic* approaches. It seems to me that insofar as manipulation of probabilities has a role in connection with uncertainty due to noise, poor resolution, occlusion, aperture problems, etc., we have no hope of producing good mechanisms unless we have very clear and effective ideas about what needs to be represented when there is NO uncertainty, and about how that information can be represented, transformed, and used. Putting in probabilistic mechanisms too soon is like building a repair kit for an engine before you have designed the engine.

As far as the use of logic is concerned, I think it is merely one kind of representation, very useful because of its generality. But for many problems involving spatial structures, processes and causal interactions it can be more useful to use spatial (geometric and topological) representations, though these need not be isomorphic with what they represent. I pointed this out in my IJCAI 1971 discussion of the importance of both Fregean and analogical representations, now online here:
http://www.cs.bham.ac.uk/research/projects/cogaff/04.html#200407

However, it has proved very difficult to design computer-based virtual machines with the required properties. Perhaps that is because we are still not clear enough about the requirements. My work is mostly about requirements, but I have some sketchy design ideas.
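The contrast between Fregean and analogical representations can be illustrated with a deliberately tiny example. The Python sketch below is a hypothetical illustration, not a proposal taken from the 1971 paper: it assumes a scene consisting of a single row of objects. In the analogical version the order of items in a list mirrors their left-to-right order in the scene, so "is A left of B?" is answered by comparing positions. In the Fregean version only explicit left_of facts about adjacent pairs are stored, so the same question requires chaining inferences. The object names and function names are invented for the example.

```python
# Toy illustration (hypothetical): one row of objects, left to right.

# Analogical representation: the order of the list mirrors the spatial order,
# so the relation between any two objects can be read off from their positions.
scene_analogical = ["cup", "box", "twig", "ball"]

def left_of_analogical(a, b, scene):
    """True if a is left of b, read directly from list positions."""
    return scene.index(a) < scene.index(b)

# Fregean representation: explicit relational facts, here only for adjacent pairs.
scene_fregean = {("cup", "box"), ("box", "twig"), ("twig", "ball")}

def left_of_fregean(a, b, facts):
    """True if left_of(a, b) follows from the facts by transitive chaining."""
    frontier, seen = [a], set()
    while frontier:
        x = frontier.pop()
        for p, q in facts:
            if p == x and q not in seen:
                if q == b:
                    return True
                seen.add(q)
                frontier.append(q)
    return False

print(left_of_analogical("cup", "ball", scene_analogical))  # True
print(left_of_fregean("cup", "ball", scene_fregean))        # True
```

Both representations give the same answers here. The difference lies in which relations are explicit and which must be derived, and in how much work an update (e.g. inserting a new object into the row) requires.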
Keywords: analogical & Fregean representation, animal vision, causation, geometry, processes, proto-affordances, representation, structures, topology

Depending on the opportunities and the interests of others, I could talk about a collection of related issues that interest me:

o I am not interested in solving specific engineering problems using TV cameras, though I am interested in trying to understand what the functions of animal vision (including human vision) are, and what sorts of information-processing mechanisms and architectures can implement those functions.

o Vision is not primarily about recognition, since you cannot learn to recognise anything that you cannot already see.

o Recognition is a secondary function, and there are many different kinds of recognition, which differ both in the content of the visual information (e.g. perception of processes, functions, causal interactions, affordances, dangers, etc.; see below) and in the uses to which the information is put (e.g. continuous servoing, testing generalisations, explaining, answering questions, generating new goals, predicting, designing, communicating, ...).

o Vision has been widely interpreted as being concerned with acquiring information about spatial structures in the environment, whereas, for organisms, perception of static structures is a special case of perception of *processes*, i.e. static structures are processes in which nothing is changing. (Structures are also parts of processes. And structures in the environment can be inferred from perceived processes.)

o J.J. Gibson's revolution (a bit like the Copernican revolution) was to point out that the role of animal vision is not to provide (viewer-independent, though not viewpoint-independent) information about what is 'out there' in the world, as assumed by Marr and many others, but rather to acquire information about what the viewer can and cannot do, and with what consequences. These are positive and negative affordances, described using an ontology that depends on the possible actions and possible goals of the animal.

o I think this may be all that *some* species can do, and in many of them the ability to acquire and use information about affordances is 'compiled in' by evolution. Other animals (e.g. humans, hunting mammals, primates, nest-building birds, animals that have to dismember prey in order to eat) seem to have to acquire visual competences that are more general and allow novel situations to be dealt with. I suggest that in those cases perception of affordances depends on the ability to perceive lower-level structures: proto-affordances.

o The ability to perceive not only processes that are occurring in the environment but also processes that *could* occur, and constraints on those possibilities, is a key function of vision in intelligent animals, and is the foundation for the perception of affordances.

o Perceiving proto-affordances involves perceiving what processes are *possible* in, or *constrained* by, a particular situation, independently of whether those processes are produced by or useful for the viewer or any other agent. E.g. seeing that a twig can fit into an opening, or that a surface of one object could move closer to or further from a surface of another object, or that a flexible or articulated object has a shape that could change in certain ways.

o The ability to see proto-affordances may underlie both the ability to see affordances for the viewer and the ability to see affordances for others: *vicarious* affordances. The others could be predators, prey, or conspecifics, e.g. infants needing help or protection. (There is much confusion about mirror neurones.)
o The ability to perceive processes requires the ability to *represent* processes of many kinds, including multi-strand processes in which multiple relationships (including metrical, topological, causal and functional relationships) at different levels of abstraction change concurrently, though not all of them admit temporal description at the same granularity.

o As far as I know, there are as yet no good proposals regarding how such processes can be represented, either while they are being perceived or when the information about what has been perceived is used later, except in very simple cases (e.g. visual servoing of simple actions). [E.g. is a perceived process, such as someone walking across the room, recorded as a repeated process, like auditory memories that use rehearsal, or as a description that can generate a process when required (e.g. a specification for simulations), or as a static information structure that can be put to various uses, or ...?]

o The ability to perceive empty spaces is part of the ability to perceive possibilities. [How is empty space represented by a visual system? How do you see a blank sheet of paper? What would Picasso have seen there?]

o The ability to perceive and use information about actual and possible processes (and the structures involved in those processes) is one of the foundations of human mathematical capabilities. So any theory of vision that does not contribute to the explanation of those mathematical capabilities, e.g. reasoning in Euclidean geometry and topology, is a mistaken or at best a partial theory of vision. [This is connected with perceiving Kantian structure-based causal relationships, as opposed to probabilistic Humean causal relationships.]
http://www.cs.bham.ac.uk/research/projects/cogaff/talks/#math-robot
(How could a child robot grow up to be a mathematician or a philosopher? PDF presentation.)
o Vision can work together with other forms of perception, including auditory, haptic, proprioceptive and vestibular information processing, and we need to understand what forms of representation facilitate the integration of information from those various sources. [E.g. contrast the economy of a-modal exosomatic representations and ontologies with multi-modal somatic representations and ontologies, where the latter are only concerned with relations between sensory and motor signals at various levels of abstraction, whereas the former are concerned with what exists outside and independently of the agent.]

o One of the facts about vision seems to be that people born blind can still make use of important brain mechanisms that evolved in part to serve the functions of vision. A good theory of vision should explain how that works. (E.g. can haptic information processing in congenitally blind people make use of visual forms of representation and processing mechanisms?)

o Likewise, there are things to be explained about how people with various physical abnormalities or deformities (e.g. people born limbless) use visual and other brain mechanisms that evolved to support normal bodies.

o One of the under-rated functions of vision is provision of information not about the physical environment, or about affordances for action, but about *epistemic affordances*: e.g. information about what information is available and what you have to do to get it.

o Closely related is the role of vision in predicting how affordances (both action affordances and epistemic affordances) can change if various possibilities are realised. [Aspect graphs capture a special case of this.]
http://www.cs.bham.ac.uk/research/projects/cosy/papers/#dp0702
Predicting Affordance Changes (HTML)

o Any good theory about the use of probabilities, or of any other form of representation for handling uncertainty in visual processing, should, as a special case, determine good ways to deal with visual information when there is NO uncertainty and everything is deterministic (e.g. seeing how an old-fashioned clock works when everything is very clearly visible).

o In humans (and probably some other altricial species) a visual system does not have a fixed set of functions but can extend itself to cope with new ontologies that are not definable in terms of the initial ontologies, and with new functions: e.g. learning to see new kinds of causal relations, learning to see new kinds of functional roles, learning to read music, learning to see intentions in actions, learning to see threats in board games, learning to see dances, and learning to see computer program structures, including learning to detect bugs. Understanding how that architectural growth can occur seems to be one of the major unsolved problems of vision.

o The speed with which many visual functions can be performed, along with the complexity of the information processed and the variety of cases that an individual can handle from one moment to the next, suggests that none of the currently available AI mechanisms for processing information can do the job. None of the neural theories I have heard presented provide adequate explanations either.
http://www.cs.bham.ac.uk/research/projects/cogaff/talks/#compmod07
Architectural and representational requirements for seeing processes and affordances (PDF)

o How brains do it probably cannot be determined by bottom-up neuroscientific investigations: some deep new theory may be required to direct research into what brains actually do.

I have more questions than answers, alas. I have tried to be provocative.
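The point that a probabilistic theory should handle the deterministic case as a special case can be made concrete with discrete Bayesian updating. The Python sketch below assumes a hypothetical clock mechanism in which a motor turning clockwise drives the hand through an unknown number of meshing gear pairs, each of which reverses the direction of rotation. With a deterministic likelihood (1 if a hypothesis predicts the observation, 0 otherwise), Bayes' rule reduces to eliminating the hypotheses inconsistent with the observation and renormalising the prior over the rest, so all the substantive work is done by the structural model (`hand_direction`), not by the probability machinery. The function names and the gear model are assumptions made for illustration only.

```python
from fractions import Fraction

def bayes_update(prior, likelihood, observation):
    """Return the posterior P(h | observation) over a finite set of hypotheses."""
    unnorm = {h: p * likelihood(observation, h) for h, p in prior.items()}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("observation is impossible under every hypothesis")
    return {h: w / total for h, w in unnorm.items()}

def deterministic_likelihood(predict):
    """Likelihood that is 1 when predict(h) equals the observation, 0 otherwise."""
    return lambda obs, h: Fraction(1) if predict(h) == obs else Fraction(0)

# Hypothetical structural model: a clockwise motor drives the hand through
# n meshing gear pairs, and each mesh reverses the direction of rotation.
def hand_direction(n_meshes):
    return "clockwise" if n_meshes % 2 == 0 else "anticlockwise"

# Uniform prior over 1, 2 or 3 meshes; then observe the hand turning anticlockwise.
prior = {n: Fraction(1, 3) for n in (1, 2, 3)}
posterior = bayes_update(prior, deterministic_likelihood(hand_direction), "anticlockwise")
for n, p in posterior.items():
    print(n, p)
```

Observing the hand turn anticlockwise leaves the one-mesh and three-mesh hypotheses with probability 1/2 each and rules out the two-mesh hypothesis entirely: the result is just logical elimination, weighted by the prior.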
There is a lot more related material in presentations and papers here:
http://www.cs.bham.ac.uk/research/projects/cogaff/talks/
http://www.cs.bham.ac.uk/research/projects/cosy/papers/

Aaron
http://www.cs.bham.ac.uk/~axs/

http://www.cs.bham.ac.uk/research/projects/cogaff/dag08