Emergence of Communication

From IPRE Wiki

Machine Learning Experiences in Artificial Intelligence

Author: Douglas Blank, Bryn Mawr College

Project: Evolution of Language and Intelligence

Overview

The aim of this project is to explore the self-organized language and behavior that can develop between and among simple robots in a simulated environment. The project will utilize two general tools from machine learning: the genetic algorithm, and the artificial neural network. You will be provided with the infrastructure for evolving your own robot behaviors. You will then design your own neural network and design a simulated world for which your agents will evolve. Finally, you will analyze the language which results.

Objectives

The goal of this project is to explore self-organization and evolutionary systems on autonomous robotic agents.

The learning objectives are:

  1. Learning the basics of the genetic algorithm
  2. Learning the basics of the artificial neural network
  3. Exploring the factors in evolving neural networks on mobile robots
  4. Gaining experience in analyzing self-organized systems

Prerequisites

Students should have a basic knowledge of programming, data structures, and statistics. The project will be implemented in Python, but no previous Python experience is needed. Before beginning the project, students may wish to read the recommended readings.

This project will use Pyro, the Python Robotics environment available at http://PyroRobotics.org/ for Linux, Macintosh, and Windows. A "Live CD" can be found at the website for running experiments without installing any software.

Background

The project is largely based on the paper:

Marocco, D., and Nolfi, S. (2006). Self-Organization of Communication in Evolving Robots. In Rocha, L. M., et al. (eds.), Proceedings of the Tenth International Conference on Artificial Life (ALife X), Bloomington: MIT Press, pp. 199-205. Preprint retrieved on December 17, 2006 from http://laral.istc.cnr.it/marocco/Marocco_alife.pdf

This paper describes four robots that are able to evolve signals to help them solve a particular task. Although the robots evolve with simulated audio signals, these signals can be turned into real audio signals and listened to while observing the robots' behavior. In addition, students can produce simulated audio signals themselves and observe the robots' reactions to them.

To gain a proper grounding in genetic algorithms and neural networks, students may wish to read some background material. For example, chapters from:

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach, 2nd edition. Prentice Hall, Upper Saddle River, NJ, USA, 2003.

In addition, students may wish to become familiar with the Python Robotics system. There are on-line materials at:

http://PyroRobotics.org/?page=PyroCurriculum

Students may wish to read the materials on:

  1. Introduction to Pyro
  2. Introduction to Python
  3. Neural Networks
  4. Evolutionary Algorithms

Description

Genetic Algorithm

A Genetic Algorithm (GA) is used to evolve a list of 90 floating-point numbers. The numbers are arranged left to right, and each has an initial value close to zero, but may be negative or positive.

First, a population of 30 lists is created. Each of these can be thought of as a genome.

    +-----------------------------------------------------------------------------------------+
 1  |  1.2 |  0.0 | -1.0 | 0.8 | -0.2 |  0.5 |  0.5 | -1.2 |  0.1 |  0.8 | -0.1 | -0.9 | -0.3 |
    +-----------------------------------------------------------------------------------------+
 2  | -0.9 |  1.2 | -0.1 | 0.2 | -0.5 |  0.7 | -0.2 |  1.3 |  1.1 |  0.7 | -1.1 | -0.2 |  1.2 |
    +-----------------------------------------------------------------------------------------+
 3  |  0.1 |  0.3 |  1.1 |-0.1 |  0.6 | -0.8 |  0.0 |  0.2 | -0.9 | -0.6 |  0.8 | -0.1 | -0.9 |
    +-----------------------------------------------------------------------------------------+
 4  |  0.3 | -0.4 | -0.0 | 0.6 | -0.1 | -0.9 |  0.4 | -1.1 |  0.9 |  1.4 | -0.9 | -0.0 |  0.7 |
    +-----------------------------------------------------------------------------------------+
    ...
    +-----------------------------------------------------------------------------------------+
30  |  0.5 |  0.3 |  0.3 | 0.7 |  0.4 | -0.8 |  0.1 | -0.7 |  0.4 |  0.5 | -0.5 |  1.0 |  0.3 |
    +-----------------------------------------------------------------------------------------+
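
The setup above can be sketched in plain Python. The Gaussian spread of 0.3 is an assumption for illustration; the actual initialization is handled inside Pyro's GA code:

```python
import random

GENOME_LENGTH = 90    # weights + biases for one network (see below)
POPULATION_SIZE = 30

def random_genome(length=GENOME_LENGTH, spread=0.3):
    """One genome: floating-point genes near zero, negative or positive."""
    return [random.gauss(0.0, spread) for _ in range(length)]

population = [random_genome() for _ in range(POPULATION_SIZE)]
```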

Neural Network

Each floating-point number ("gene") is interpreted as one weight (or bias) of a neural network with the following structure:

                                 +--------------+
                                 |   3 Output   |    Translate, Rotate, Sound                             Output Layer
                                 +--------------+
                                   ^          ^
                                   |          |
                                   |     +--------------+
                                   |     |   2 Hidden   | <-----------------------------------+           Hidden Layer
                                   |     +--------------+                                     |
                                   |          ^                                               |
                                   |          |                                               |
 +--------------------------------------------------------------------------------------+     |
 |+---------+  +---------------+  +------------------+  +-----------------+  +---------+|  +-------------+
 || 8 Sonar |  | 4 Directional |  | 1 Previous Sound |  | 1 Light Sensor  |  | 1 Stall ||  |  2 Context  |  Input Layer
 |+---------+  +---------------+  +------------------+  +-----------------+  +---------+|  +-------------+
 +--------------------------------------------------------------------------------------+

Each arrow represents a matrix of numbers (called "weights"), for a total of:

(8 + 4 + 1 + 1 + 1) * 2 + (8 + 4 + 1 + 1 + 1) * 3 + (2 * 2) + (2 * 3) = 85 weights
(2 + 3)                                                               =  5 bias/threshold
                                                                       ----
                                                                         90 numbers, one genome
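
The tally above can be checked with a few lines of arithmetic:

```python
inputs = 8 + 4 + 1 + 1 + 1          # sonar, directional mics, prev. sound, light, stall
hidden, context, outputs = 2, 2, 3

weights = (inputs * hidden +        # input   -> hidden
           inputs * outputs +       # input   -> output (short-cut)
           context * hidden +       # context -> hidden
           hidden * outputs)        # hidden  -> output
biases = hidden + outputs           # one bias per hidden and output unit
print(weights, biases, weights + biases)  # 85 5 90
```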

Activation flows through the network, bottom to top, like so:

Net1.jpg

Propagating the activation, from bottom to top:

O1 = (I1 * W1) + (I2 * W2)
A1 = sigmoid(O1)
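
For a single unit, this propagation step can be written out in plain Python. The input and weight values below are made up; a bias term is included since the genome carries 5 bias values:

```python
import math

def sigmoid(x):
    """Squash a net input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def activate(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus bias, passed through the sigmoid."""
    net = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(net)

a1 = activate([0.5, 1.0], [0.8, -0.2])  # O1 = 0.5*0.8 + 1.0*(-0.2) = 0.2
```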

The network is a Simple Recurrent Network (SRN; see http://pyrorobotics.org/?page=Autoassociative_20and_20Recurrent_20Networks). There are short-cut connections directly from the input layer to the output layer. The input contains three types of sensor data: sonar, sound, and light.

Evolution

  1. Each genome in the population is loaded into the 4 neural networks controlling the four robots, and run
  2. A run is a series of steps:
    1. put each of the robots in a random place
    2. get sensor readings (which includes distances, stall sensor, target detection, and 4 directional microphones)
    3. put the readings into the bottom of each neural network
    4. propagate the activations to the top
    5. interpret the output activations as commands:
      1. rotation - left or right
      2. translation - forward or backward
      3. speech - signal between 0 and 1
    6. at each step, figure out a reward score, and sum to a total
  3. The sum of these step scores is the group's fitness
  4. Rank all of the genomes by their fitness score
  5. Select them for the next generation based on how well they did
    1. select and mutate
    2. no crossover
  6. Repeat
+-----------------------------------------------------------------------------------------+
| -0.8 | -0.3 | -0.2 | 0.1 |  0.9 | -0.8 |  1.7 |  0.1 | -1.2 | -0.2 |  0.6 | -0.1 |  0.1 |
+-----------------------------------------------------------------------------------------+
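
Steps 4 and 5 above (rank, select, mutate, no crossover) can be sketched as follows; the survivor fraction, mutation rate, and mutation spread are illustrative parameters, not the program's actual settings:

```python
import random

def next_generation(population, fitnesses, keep=0.5, rate=0.05, spread=0.3):
    """Rank genomes by fitness, keep the top fraction, and refill the
    population with mutated copies of the survivors (no crossover)."""
    ranked = [g for _, g in sorted(zip(fitnesses, population),
                                   key=lambda pair: pair[0], reverse=True)]
    survivors = ranked[:max(1, int(len(ranked) * keep))]
    new_pop = []
    while len(new_pop) < len(population):
        parent = random.choice(survivors)
        # each gene mutates with probability `rate` by a Gaussian nudge
        child = [g + random.gauss(0.0, spread) if random.random() < rate else g
                 for g in parent]
        new_pop.append(child)
    return new_pop
```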

The Environment

The fitness of a genome is based on the performance of the 4 robots in a square room with two lights. If a robot is in a yellow circle area, the score is increased by 0.25 for that step. However, if more than 2 robots are in a yellow area, then 1.0 point is removed for each extra robot (but the score never becomes negative).
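
One step of this scoring rule might be sketched as below. Whether the never-negative floor applies per step or to the running total is not stated, so this sketch floors each step's score:

```python
def step_score(counts):
    """counts: number of robots in each feeding area at this time step."""
    score = 0.0
    for n in counts:
        score += 0.25 * n            # +0.25 per robot in a feeding area
        if n > 2:
            score -= 1.0 * (n - 2)   # -1.0 per robot beyond the second
    return max(score, 0.0)           # the score never goes negative
```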

Inaction.jpg Robot.jpg

The robots can send signals to each other through one of their output units; these signals are heard through their inputs. The inputs are "directional microphones" arranged like so:

         0
     \       /
      \     /
       \   /
        \ /
   3     +     1
        / \
       /   \
      /     \
     /       \
         2

Output from the closest robot in each of the four directions becomes input in this bank. The yellow spots are the feeding areas and can only be detected when a robot is directly over one.
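
Routing the nearest robot's signal into one of the four microphone inputs amounts to binning its bearing into quadrants. A sketch, assuming the bearing is measured clockwise from the robot's front (microphone 0 in the diagram above):

```python
import math

def microphone_bank(bearing, signal):
    """Return the 4 directional-microphone inputs, given the bearing
    (radians, clockwise from front) to the closest robot and the signal
    value it is emitting. Quadrants: 0 front, 1 right, 2 back, 3 left."""
    quadrant = int(((bearing + math.pi / 4) % (2 * math.pi)) // (math.pi / 2))
    ears = [0.0, 0.0, 0.0, 0.0]
    ears[quadrant] = signal
    return ears
```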

Evolang.gif

The User Interface (UI) allows one to turn off any of the drawing components. You might want to turn off "speech" and "sonar" to increase drawing speed. Also, you might want to turn on "trail" to compare the paths of the robots.

The program is divided into two parts, the brain and a controlling program. However, unlike brains that run in their own thread, these will be run from the controlling program. This allows a tighter coupling than normal, which is needed to make the simulation run faster than real time.

Here is the brain for each robot:

# Brain and SRN come from the Pyro (pyrobot) library; math is used below
import math

class NNBrain(Brain):
    def setup(self):
        self.robot.range.units = "scaled"
        self.net = SRN()
        self.sequenceType = "ordered-continuous"
        # INPUT: ir, ears, mouth[t-1]
        #        sonar, stall, ears, eyes, speech[t-1]
        self.net.addLayer("input", len(self.robot.range) + 1 + 4 + 1 + 1) 
        self.net.addContextLayer("context", 5, "hidden")
        self.net.addLayer("hidden", 5)
        # OUTPUT: trans, rotate, say
        self.net.addLayer("output", 3)
        # ----------------------------------
        self.net.connect("input", "output")
        self.net.connect("input", "hidden")
        self.net.connect("context", "hidden")
        self.net.connect("hidden", "output")
        self.net["context"].setActivations(.5)
        self.net.learning = 0

    def step(self, ot1, or1):
        t, r = [((v * 2) - 1) for v in [ot1, or1]]
        self.robot.move(t, r)
        
    def propagate(self, sounds):
        light = max(math.floor(v) for v in self.robot.light[0].values())
        # light is a single value, so wrap it in a list before concatenating
        inputs = (self.robot.range.distance() + [self.robot.stall] +
                  sounds + [light] + [self.net["output"].activation[2]])
        self.net.propagate(input=inputs)
        self.net.copyHiddenToContext()
        return [v for v in self.net["output"].activation] # t, r, speech

The program for running these experiments is here: evolang.py (click on download).

To run the program:

If you have pyrobot installed in the Python system:

export PYROBOT=/usr/lib/python2.5/site-packages/pyrobot
python -i evolang.py [OPTIONS]

Or in a stand-alone directory (for example, in /usr/local/pyrobot):

export PYTHONPATH=/usr/local
export PYROBOT=/usr/local/pyrobot
python -i evolang.py [OPTIONS]

The program has the following options (which can be seen using the -h flag):

python evolang.py command line:

   -g 2d|3d|none  (graphics, default 2d)
   -n N           (robot count, default 4)
   -a             (automatic restart, default off)
   -e             (start evolving, default off)
   -p /dev/dsp    (sound device or none, default /dev/dsp)
   -l file.pop    (load a population of genes)
   -t T           (fitness function uses T trials, default 5)
   -s S           (sim seconds per trial, default 20)
   -z Z           (population size, default 100)
   -m M           (max generations, default 100)
   -c 0|1         (can hear?, default 1)

 CONTROL+c to stop at next end of generation
 CONTROL+c CONTROL+c to stop now
  • -g GRAPHICSTYPE allows you to set the graphics display type; default = 2d
  • -n N allows you to set the number of robots; default of N is 4
  • -a automatically will find the highest gen-*.pop file, load it, and set the current generation correctly
  • -e start evolving
  • -p SOUNDDEVICE turn sounds on and use the given sound device
  • -l file.pop loads a population of genes from the file file.pop
  • -t T fitness function uses T trials where each trial starts in a random location
  • -s S simulated seconds per trial; default = 20
  • -z Z population size, default 100
  • -m M maximum generations, default 100
  • -c 0|1 can the robots hear? Default 1 (yes)

You can get a basic run started with one of the following:

ipython evolang.py -- -e
python -i evolang.py -e

NOTE: When you run a program, it will try to use as much of the system's resources as it can get. With a highly computational program like this one (especially when you log in to another machine and run a long process) it is polite to be nice to the other users of that computer. You can lower the program's priority with the renice command. A typical usage might look like renice 19 22456, where 22456 is the process id. You can get the process id from top or ps aux | grep dblank.

Audio

To play sounds:

 aumix

adjust volume levels; 'q' for quit.

Then:

 python evolang.py -p /dev/dsp -r 0

where 0 can be any number less than the number of robots.

The project is customizable to accommodate different approaches to teaching and different implementations.

Analysis

The following two videos show evolution at 40 generations and at 100 generations.

Videos were made with xvidcap and uploaded to http://YouTube.com

Results from the Paper

The paper reports 5 types of "signals":

Arobot.jpgBrobot.jpgCrobot.jpgDrobot.jpgErobot.jpg

Can you find signals like that in your evolved language?

Additional exercises

  1. Given that each robot initially in a group has the same random "brain", and there is absolutely no randomness in the simulation, what accounts for variation in behaviors?
  2. How could you scientifically show that light, walls, or sounds affect behavior? How fine-grained can you be in determining which sense determines which behavior?
  3. How does behavior change over time? Plot the best, and average fitness over time.
  4. What are the robots talking about?
  5. How is it possible to evolve talking and hearing (effectively evolving meaning) simultaneously? Is this akin to the argument "what good is half an eye?"? Where does meaning come from in this experiment?
  6. Marocco and Nolfi collapse their robots' evolved communication into symbols (A through E). Is this appropriate? Explain.
  7. Marocco and Nolfi compare the average fitness over time. Why do you think they do that?
  8. Marocco and Nolfi compute the fitness for 4 individuals acting together rather than each individual having its own genome and fitness. Why might they do this?
  9. There are many possible alternate experiments to try. Propose a variation and try it.
    1. What differences do you see?
  10. There are probably limits to what these robots will evolve to do and say. Why? What could be done towards more open ended evolution and development?
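
For exercise 3, the best and average fitness curves can be computed from logged per-generation scores before plotting. This helper and its input format are illustrative, not part of evolang.py:

```python
def summarize(history):
    """history: a list of per-generation fitness lists.
    Returns (best, average) curves, one value per generation."""
    best = [max(gen) for gen in history]
    avg = [sum(gen) / len(gen) for gen in history]
    return best, avg
```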

Syllabus

Sample syllabi available at:

Additional readings are included in the Background section above.

500 Generation Weights: Media:Weights500.pop