Neural Networks and Robot Attention

From IPRE Wiki


Treating Robot ADD
Kerstin Baer and Priscy Pais
Mentor: Dr. Douglas Blank

Developmental Robotics is a relatively new approach to artificial intelligence. Instead of programming robots to perform pre-specified tasks, Developmental Robotics aims to equip them with a learning routine that allows them to make sense of their sensorimotor capabilities, discover their environment and perform self-motivated actions. Artificial neural networks have emerged as a promising learning and control mechanism. Modeled after biological neural networks, artificial networks consist of a set of interconnected units that perform calculations on a set of inputs (e.g. the sensor readings of a robot) to produce an output that is appropriate for the given situation (e.g. motor commands for the robot). By adjusting the connections between units and adding more units when necessary, a neural network can adapt to new situations and learn more complex behaviors.

When a network-controlled robot is getting familiar with its environment, that is, learning to predict its environment, it has to decide where to move next to collect more information. Ideally the robot would pay attention to an object long enough to understand its basic features, but not so long that it forgets everything else around it. In this context we want to explore what it means for a robot to 'pay attention' to something. We intend to build an architecture around the neural network that might serve three purposes: make appropriate abstractions from the input data to make the robot “aware” of what it is paying attention to, guide the actions of the robot to keep it focused on the object of interest, and pre-process the input data so that the network learns faster.


07/08/08: We have written a program that simulates an "attention window". It goes through a raw list of sensor values (e.g. a camera image or sonar readings) and looks for clusters of points with similar values, each of which is assumed to be one object. Then it removes the background (everything besides the chosen cluster) and rescales the cluster to fit the size of the network input layer before feeding it into the network. In an early test, a network equipped with this additional architecture quickly learned to distinguish between geometric shapes (circles, squares, triangles) regardless of their size or their position on the screen. Such a network can also easily handle images containing several objects. We assume that it would be an extremely hard task for a network alone to learn to distinguish a random number of objects of random sizes at random positions on the screen.
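
The attention window described above could be sketched roughly as follows for a one-dimensional list of sensor values. This is not the program we wrote, only a minimal illustration. The function names, the `tolerance` threshold, the known `background` value, the choice of the largest cluster, and the nearest-neighbour rescaling are all assumptions made for the example.

```python
def find_clusters(values, tolerance):
    """Split a list into runs of neighbouring points with similar values.

    Returns (start, end) index pairs with end exclusive.
    """
    if not values:
        return []
    clusters = []
    start = 0
    for i in range(1, len(values)):
        # A jump larger than the tolerance starts a new cluster.
        if abs(values[i] - values[i - 1]) > tolerance:
            clusters.append((start, i))
            start = i
    clusters.append((start, len(values)))
    return clusters


def rescale(segment, size):
    """Nearest-neighbour resampling of a segment to exactly `size` values."""
    n = len(segment)
    return [segment[(i * n) // size] for i in range(size)]


def attention_window(values, input_size, tolerance=0.1, background=0.0):
    """Return the largest non-background cluster, rescaled to input_size.

    Assumes the background value is known in advance (hypothetical).
    """
    candidates = [
        (start, end)
        for start, end in find_clusters(values, tolerance)
        if abs(sum(values[start:end]) / (end - start) - background) > tolerance
    ]
    if not candidates:
        # Nothing stands out from the background.
        return [background] * input_size
    start, end = max(candidates, key=lambda c: c[1] - c[0])
    return rescale(values[start:end], input_size)
```

Because the cluster is cut out and stretched to the input layer size, the network sees the same pattern wherever the object sits in the raw readings and however large it is, which is the effect described in the early test.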

Initially we worked with multilayer feedforward networks, but these networks have their limitations. We would like to work with cascade correlation networks instead, but the current Python implementation of the cascade correlation simulator is too slow to be practical. We are currently rewriting ConX in C# and we hope that it will significantly reduce the computing time so that we can efficiently use cascade correlation networks.

07/21/08: We decided to go a step further than the previous experiment and train a network with pictures taken by a scribbler. The aim of this experiment was to have the robot take a picture, 'decide' what was most interesting in the picture and then turn towards it.
We constructed a circular 'playpen' made from white paper to place the scribbler in. The playpen was situated directly under a light so that there would be a uniform background with minimal shadows. A number of small wooden objects, such as triangles, cylinders, cubes and cuboids, of various dimensions were placed around the circumference of the pen. The scribbler was fixed on a wooden stick in the center of the playpen so that it would not move away from the center while it rotated and took pictures of the objects.

[Image: Top view of the 'playpen' with the scribbler in the center]
[Image: Close up of the pen]
[Image: Inside the pen]
[Image: Close up of the scribbler]

We had the robot take 500 pictures, which we then modified and fed to the network as input. Each picture was initially in color and measured 256 * 192 pixels. We wrote a program to convert each color picture to grayscale and reduce its dimensions to 256 * 100 pixels.
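
The preprocessing step might look something like the sketch below. It is an illustration under stated assumptions, not our actual program: the image is represented as rows of (r, g, b) tuples, gray values are a plain average of the three channels, and the height is reduced by cropping a band of rows starting at a hypothetical `top` offset (the reduction could equally be done by rescaling).

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) tuples to rows of (v, v, v) gray tuples."""
    gray = []
    for row in image:
        gray_row = []
        for r, g, b in row:
            v = (r + g + b) // 3  # plain channel average (assumed weighting)
            gray_row.append((v, v, v))
        gray.append(gray_row)
    return gray


def crop_rows(image, top, height):
    """Keep `height` rows starting at row `top` (cropping is an assumption)."""
    return image[top:top + height]


# Example: a uniform 256 * 192 test picture reduced to 256 * 100.
picture = [[(30, 60, 90)] * 256 for _ in range(192)]
reduced = crop_rows(to_grayscale(picture), top=46, height=100)
```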

[Image: Sample RGB picture taken by the scribbler. Dimensions: 256 * 192]
[Image: Corresponding grayscale image. Dimensions: 256 * 100]

When an image is converted from RGB (color) to grayscale, the red, green and blue components of each pixel all take the same value, so a single channel is enough to describe the image. Hence, the red values of all 25600 pixels (256 * 100) in the image were added to a list and fed into the network as input. To determine which of the 4 or 5 objects in a picture was most 'interesting', we wrote out an arbitrary list of the objects in descending order of interest. Based on this list, we chose the most interesting object in each picture and created the target output. The target was a 5-digit number consisting of four 0s and one 1. The position of the 1 told the scribbler which direction to turn:

 10000 - turn hard left
 01000 - turn left
 00100 - stay in the same position
 00010 - turn right
 00001 - turn hard right

According to the target output, the scribbler turns in a given direction.
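
The input and target encoding described above can be sketched as follows. The function names are hypothetical, and reading the network output by picking the strongest of the five units is an assumption for illustration, not necessarily how our code decodes it.

```python
# Direction for each position of the 1, in the order listed above.
DIRECTIONS = [
    "turn hard left",
    "turn left",
    "stay in the same position",
    "turn right",
    "turn hard right",
]


def image_to_input(gray_image):
    """Flatten a grayscale image into one list using the red channel only."""
    return [pixel[0] for row in gray_image for pixel in row]


def make_target(direction_index, size=5):
    """Build a target of `size` values: all 0 except a 1 at direction_index."""
    target = [0] * size
    target[direction_index] = 1
    return target


def decode_output(output):
    """Map a network output to the direction of its strongest unit (assumed)."""
    strongest = max(range(len(output)), key=lambda i: output[i])
    return DIRECTIONS[strongest]
```

For a 256 * 100 grayscale image, `image_to_input` yields the 25600 input values, and `make_target(1)` yields the target for "turn left".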

As of today, we have only been able to write out 160 of the 500 targets and hence could train the network on only 160 pictures. This network is a backpropagation network written in Python and running on a Windows machine.

07/22/08: Upon completing the list of 500 targets, the Python network on the Windows machine started training again.

07/25/08: Kerstin finished writing, compiling and running a backpropagation network in C#, so we started training the C# network with our input pictures and targets. We ran two C# networks in training mode, one on Windows and the other on Linux. The objective was to compare training times and see whether the programming language the network was written in and/or the operating system it was running on could make it train faster.

07/29/08: We saw that the networks were hovering around 80% accuracy, which suggested that they were probably outputting 0 most of the time: four of the five target values are 0, so an all-zero output already matches 80% of them. To help the networks learn, we changed the targets to floating-point numbers instead of the integers 0 and 1. The 1 became 0.9, and each 0 became 0.5 if it was right next to the 1 and 0.1 otherwise. Thus

  01000 became 0.5 0.9 0.5 0.1 0.1
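
The conversion from the old targets to the new ones can be written as a short function. The names are hypothetical. The second helper illustrates the 80% figure, assuming accuracy is counted per output unit.

```python
def soften_target(target, peak=0.9, neighbour=0.5, floor=0.1):
    """Replace the 1 with `peak`, adjacent 0s with `neighbour`, others with `floor`."""
    hot = target.index(1)
    soft = []
    for i in range(len(target)):
        if i == hot:
            soft.append(peak)
        elif abs(i - hot) == 1:
            soft.append(neighbour)
        else:
            soft.append(floor)
    return soft


def all_zero_unit_accuracy(target):
    """Fraction of target values that a constant all-zero output gets right."""
    return target.count(0) / len(target)
```

With the softened targets, an output that points one position away from the correct direction is penalized less than one that is far off.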

Initially we had saved the weights of the connections every few epochs, but then we started saving the networks themselves after every epoch.

07/30/08: The Python network running on Windows could no longer run because saving so many networks had taken up most of the hard drive space. So we deleted most of the earlier networks and kept only networks at intervals of 5 epochs.
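
Choosing which saved networks to keep could be done with a helper along these lines. It is a sketch only: the function name is hypothetical, and it assumes each saved network is identified by its epoch number.

```python
def split_checkpoints(epochs, interval=5):
    """Partition saved epoch numbers into those to keep and those to delete."""
    keep = [e for e in epochs if e % interval == 0]
    delete = [e for e in epochs if e % interval != 0]
    return keep, delete


# Example: networks saved after epochs 1 to 12.
keep, delete = split_checkpoints(range(1, 13))
```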