Training and Learning in the Perceptron
For simple problems, the traditional computer obeying fixed rules can accomplish this task, but as the complexity of the decision making increases the process becomes too lengthy.
In contrast, the perceptron network can easily cope with this increased complexity. The method of ensuring it captures the right images is not controlled by a set of rules but by a learning process. In many respects this learning process is rather similar to the way the brain learns to distinguish certain patterns from others.
This learning process proceeds by way of presenting the network with a training set composed of input patterns together with the required response pattern. By `present' we mean that a certain pattern is fed into the input layer of the network. The net will then produce some firing activity on its output layer which can be compared with a `target' output. By comparing the output of the network with the target output for that pattern we can measure the error the network is making. This error can then be used to alter the connection strengths between layers so that the network's response to the same input pattern will be better the next time around.
We start off with a network which gives a random output for a given input. We then train the network by presenting it with successive patterns drawn from an example set which is typical of the problem we want the network to work on. For each of these patterns, we look at the output pattern the network gives us and compare it with the output we would ideally like.
For example, suppose we want to train the network to learn about triangles in the input. We show it a picture which contains a triangle. The output (i.e. the "on" lights on the second screen) will in general be nothing like a triangle. However, we can measure how far off the output is from the desired output. This error may be reduced by a judicious alteration of the connection strengths (hidden in our "black box"). The precise way this is achieved is beyond the scope of these lectures. It is sufficient to know that a well-defined mathematical procedure can be applied which changes the connections between layers in such a way that the error is steadily reduced. (For those interested, the technical name for this procedure is back-propagation.) This process is somewhat similar to the way a memory is recalled in the Hopfield network. We can imagine a ball rolling on a surface. As the training proceeds the ball moves downhill until eventually it reaches a well or low point at which it stops. In this case the ball represents the set of network connections and the height of the surface represents the network error on an output pattern.
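The downhill picture can be made concrete with a deliberately tiny case. The sketch below is not back-propagation itself: it assumes a single connection of strength `weight`, a single input `x` and a squared error, and the names `descend`, `rate` and `steps` are illustrative choices rather than anything taken from these lectures. It only shows the "ball" (the connection strength) moving downhill on the error surface.

```python
def descend(weight, x, target, rate=0.1, steps=20):
    """Roll one connection strength downhill on a squared-error surface."""
    errors = []
    for _ in range(steps):
        output = weight * x                    # response of the one-connection "network"
        errors.append((output - target) ** 2)  # height of the surface at this point
        slope = 2 * (output - target) * x      # how the error changes with the weight
        weight -= rate * slope                 # take a small step downhill
    return weight, errors


final_weight, errors = descend(weight=0.0, x=1.0, target=2.0)
print(round(final_weight, 3))
print([round(e, 4) for e in errors[:5]])
```

With a small enough step size the error shrinks at every step in this example; a step that is too large can overshoot the low point, which is one reason real training procedures need some care.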
Thus, to train the network on spotting triangles, we repeatedly show the network a picture containing a triangle, measure how "wrong" the response of the network is and then change the connection strengths accordingly. Eventually, we will have made the network's response to a triangle picture as close as we wish to the "ideal" response. The perceptron network now knows about triangles! We can then continue this process adding more types of shapes for the network to recognize and building in whatever responses we care to.
We give an example of this in another neural network simulation. You can access this by clicking on Perceptron Pattern Demonstration. The goal of this simulation is to teach the network to recognize various geometrical shapes.
Other applications of the Perceptron
The range of tasks that the perceptron can handle is much larger than just decisions concerning simple shapes and pattern recognition. For example, one could train the network to form the past tense of English verbs, to read English text and handwriting, and to tackle a whole variety of other problems.
Neural networks have also been used to predict financial markets and to make medical diagnoses. All that is required to use networks in this way is a "code" which allows us to write problems in one field in terms of pattern classification problems.
For example, NETTalk is a perceptron network which is capable of transforming a written English text into its individual sound types (phonetic representations) and then pronouncing it using a voice synthesizer. In essence, this works by associating a given pattern of node activity (flashing of lights on the input screen of our simple model) with a given English word, for example "fish". However, on the output screen a given node activity is now tied to a given type of sound in English. Training the network then means ensuring that whenever the text "fish" is presented to the network input (in the form of a pattern of lights), the output will be a pattern of lights which codes for the sound of "fish", which is finally produced by the voice synthesizer. So, in broad terms, we are using the network to associate a given word (written in English text) with a particular sound (English pronunciation of the word).
NETTalk has around 300 neurons (nodes), 80 of them in the hidden layer, and 20,000 individual connections. The network was trained with isolated words and with continuous text. After twelve hours of learning, a 95% success rate was found on the learning data. With a new text, a success rate of about 80% was achieved. In each case the errors the network made were quite close to the correct pronunciation. Indeed, many of the errors made by NETTalk were strikingly similar to those made by young children.
Towards thinking - networks can generalize
We shall show you two further examples of neural networks which expose some of the power of these systems. The first illustrates that the presence of a hidden layer is a crucial feature which allows the network to make generalizations from the training data. The network can learn about common features in the input patterns such that when new patterns are presented which possess some of these common features the network can `sense' these features and produce sensible and useful output. An example will clarify this somewhat.
Consider a picture of an ellipse. This can be specified as a black and white image on the input layer by representing black portions of the image by firing nodes and white space by `off' nodes. However, the ellipse can also be represented in shorthand by giving the position of its center, its height and width and its angle of tilt. This compact description of the ellipse may be stored on the hidden layer using far fewer nodes than are required by the image of the ellipse on the input layer. For example, a particular hidden layer node being `on' can represent an ellipse with zero angle of tilt, another being `on' might mean an angle of tilt of ninety degrees, etc.
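To get a feel for how much shorter the shorthand is, the sketch below turns five numbers (center, width, height and tilt) into a grid of `on' and `off' nodes. The function name `ellipse_image`, the 16 by 16 grid and the choice of a filled ellipse are all assumptions made for illustration; the lectures do not specify how the pictures are drawn.

```python
import math


def ellipse_image(cx, cy, width, height, tilt_degrees, size=16):
    """Return a size x size grid of 1 ("on") and 0 ("off") nodes for a filled ellipse."""
    tilt = math.radians(tilt_degrees)
    a, b = width / 2.0, height / 2.0           # semi-axes of the ellipse
    image = []
    for row in range(size):
        line = []
        for col in range(size):
            dx, dy = col - cx, row - cy
            # Rotate the point into the ellipse's own axes before testing it.
            u = dx * math.cos(tilt) + dy * math.sin(tilt)
            v = -dx * math.sin(tilt) + dy * math.cos(tilt)
            line.append(1 if (u / a) ** 2 + (v / b) ** 2 <= 1.0 else 0)
        image.append(line)
    return image


upright = ellipse_image(8, 8, 12, 6, 0)        # zero angle of tilt
tilted = ellipse_image(8, 8, 12, 6, 90)        # tilted by ninety degrees

for line in upright:
    print("".join("#" if node else "." for node in line))
print("input nodes:", sum(len(line) for line in upright), "- shorthand numbers: 5")
```

The image needs one input node per pixel (256 here), while the shorthand needs only five numbers, which is the kind of compression the hidden layer is being asked to find.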
We can train the network on a host of examples of ellipses and, by adjusting the connections to the output layer, ensure that it produces a letter E on the output layer for all ellipses in the training set. Then if we present it with a new picture of an ellipse - not one it has ever encountered before - it should be able to generalize from its earlier examples and produce the correct `E' response.
So, for input patterns which are "ordered" in some sense, the network can learn how to recognize this order and act upon it. The ability of the frog's brain to extract the most important features of what it receives from its visual senses, and to produce appropriate responses is based on this.
The Perceptron as an encoder network
To illustrate this in a simple example, consider a very simple neural network consisting of four input and four output nodes with a hidden layer containing just two nodes. We use four input patterns, which are just the four patterns obtained by setting three of the nodes to be "off" and the other "on". The task of the network is just to produce the same output as input! We shall call this network an encoder net since the goal will be to store four input patterns, each needing four input nodes, with just the two hidden layer nodes. The four patterns will be `encoded' with just two hidden layer nodes.
This task is at first sight somewhat tricky. There are four input patterns, each composed of the firing activities of four nodes, which must be reproduced on the output layer. But the activities on the output layer are determined by the activities of the hidden layer nodes, of which there are only two! How can this be done? For a general set of four patterns it cannot, but notice that there is a certain structure to the input patterns - there is only ever one `firing' node. So to be able to recognize any one of these input patterns and reproduce it on the output layer, the network must just be able to distinguish which is the `on' node in the input. There are just four possible nodes to be `on', so the hidden layer must be able to produce at least four distinct firing patterns to be able to distinguish between these four nodes. Of course, with two nodes there are precisely four different patterns of activity: (on,on), (on,off), (off,on) and (off,off).
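The argument can be checked with a hand-wired encoder. The sketch below does no learning at all: the connection strengths are chosen by hand, the nodes are assumed to be simple threshold units that are either on (1) or off (0), and the particular assignment of hidden codes in `CODES` is an arbitrary illustrative choice. It only shows that two hidden nodes are enough to carry the four patterns through.

```python
def step(z):
    """A threshold node: fires (1) when its total input is positive."""
    return 1 if z > 0 else 0


# Hidden-layer code assigned (by hand) to each of the four input nodes.
CODES = [(1, 1), (1, 0), (0, 1), (0, 0)]


def encode(pattern):
    """Input layer -> hidden layer: hidden node j listens to inputs whose code has j on."""
    return tuple(
        step(sum(x * code[j] for x, code in zip(pattern, CODES)) - 0.5)
        for j in range(2)
    )


def decode(hidden):
    """Hidden layer -> output layer: output node k fires only for its own code."""
    output = []
    for code in CODES:
        weights = [2 * c - 1 for c in code]    # +1 where the code is on, -1 where off
        threshold = sum(code) - 0.5
        output.append(step(sum(w * h for w, h in zip(weights, hidden)) - threshold))
    return output


patterns = [[1 if i == k else 0 for i in range(4)] for k in range(4)]
for pattern in patterns:
    print(pattern, "->", encode(pattern), "->", decode(encode(pattern)))
```

Each input pattern lands on a different hidden code and is reproduced exactly on the output layer.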
These are precisely the patterns of activity that are seen in the simulation as we cycle through all the input patterns! What we have spent a whole paragraph trying to explain, the network has discovered for itself without any careful programming! Furthermore, it has discovered these rather complex relationships by a simple `learning' process in which it is simply shown the patterns and desired responses a number of times. Between successive `viewings' it has adjusted its connections to try to improve its responses to the input patterns - to put them closer to the desired output patterns. It is quite remarkable that in doing so it has inadvertently done what we might describe as some quite complicated thinking!
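For readers who want to see such a learning process spelled out, here is a minimal sketch of training a four-two-four encoder by back-propagation. It is not the code behind the demonstration: the smooth (sigmoid) nodes, the squared error, the learning `rate`, the number of `epochs` and the random starting connections are all assumptions made for illustration, and a different random start may need more viewings or settle on a poorer solution.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_encoder(epochs=5000, rate=0.5, seed=0):
    rng = random.Random(seed)
    n_io, n_hid = 4, 2
    patterns = [[1.0 if i == k else 0.0 for i in range(n_io)] for k in range(n_io)]
    # Each row holds the connections into one node; the last entry is a bias.
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_io + 1)] for _ in range(n_hid)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)] for _ in range(n_io)]

    def forward(x):
        h = [sigmoid(sum(w[i] * x[i] for i in range(n_io)) + w[n_io]) for w in w_hid]
        y = [sigmoid(sum(w[j] * h[j] for j in range(n_hid)) + w[n_hid]) for w in w_out]
        return h, y

    def total_error():
        return sum((y - t) ** 2 for x in patterns for y, t in zip(forward(x)[1], x))

    history = [total_error()]
    for _ in range(epochs):
        for x in patterns:                     # one "viewing" of each pattern
            h, y = forward(x)
            # Error signals (a constant factor of 2 is absorbed into the rate).
            d_out = [(y[k] - x[k]) * y[k] * (1 - y[k]) for k in range(n_io)]
            d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * w_out[k][j] for k in range(n_io))
                     for j in range(n_hid)]
            for k in range(n_io):
                for j in range(n_hid):
                    w_out[k][j] -= rate * d_out[k] * h[j]
                w_out[k][n_hid] -= rate * d_out[k]
            for j in range(n_hid):
                for i in range(n_io):
                    w_hid[j][i] -= rate * d_hid[j] * x[i]
                w_hid[j][n_io] -= rate * d_hid[j]
        history.append(total_error())
    return history, forward, patterns


history, forward, patterns = train_encoder()
print("error before:", round(history[0], 3), "after:", round(history[-1], 3))
for x in patterns:
    print(x, "hidden:", [round(h, 2) for h in forward(x)[0]])
```

Printing the hidden activities after training lets you compare what the network settled on with the four on/off combinations discussed above.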
To try out this simulation click here on Perceptron Encoder Demonstration.
To recap, the network must learn about the special features in the input data (in this example that there is only ever one `on' node in the input) which allow it to be represented on a hidden layer which has fewer nodes. Notice that because the network adjusts its connections to learn about these features of the input data, we can then expect it to generalize - if we had trained it on only three of the patterns it would have produced the correct response for the fourth anyhow. Notice also that if we had only allowed it one hidden node it could never have been able to classify four patterns - the moral being that as we increase the number of patterns to be classified we must increase the size of the hidden layer.
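The moral about hidden layer size is a counting argument, sketched below under the assumption that each hidden node is strictly `on' or `off'. The helper names `hidden_codes` and `min_hidden_nodes` are made up for this illustration.

```python
from itertools import product


def hidden_codes(n_hidden):
    """All distinct on/off activity patterns available to n_hidden nodes."""
    return list(product(("on", "off"), repeat=n_hidden))


def min_hidden_nodes(n_patterns):
    """Smallest number of on/off hidden nodes with at least n_patterns distinct codes."""
    n = 0
    while 2 ** n < n_patterns:
        n += 1
    return n


print(hidden_codes(1))       # one node: only two codes, too few for four patterns
print(hidden_codes(2))       # two nodes: exactly four codes
print(min_hidden_nodes(4))
```

Nodes that can take graded values between on and off blur this count somewhat, but the trend is the same: more patterns to tell apart call for more hidden nodes.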
For enthusiasts only: actually if we think of the input patterns as representing numbers we see that the network has learned about their binary representation!
The Perceptron learns to add integers!
As our final example consider training a network to do simple integer arithmetic. In order to treat this problem using neural networks, we need to translate it into a pattern classification problem. In other words, we ascribe some unique pattern of activities of the input nodes to a given pair of numbers to be added. The patterns are a "code" for the original numbers. We then require that the network's response be another pattern which under "decoding" yields the correct sum. In this way, we can train the network to learn that the sum of, say, 2 plus 3 is 5. In fact, in the example we give, we train the network on a sample of some sixty-four sums of integers in the range zero through seven. The network we will use will have 6 input nodes, 4 output nodes and 15 nodes in the hidden layer.
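One natural "code" that fits 6 input nodes and 4 output nodes is plain binary: three nodes for each integer from zero through seven, and four nodes for a sum of at most fourteen. The lectures do not say which code the demonstration uses, so the binary choice and the helper names `to_bits`, `from_bits` and `encode_sum` below are assumptions.

```python
def to_bits(value, width):
    """Code an integer as a list of on (1) / off (0) nodes, most significant first."""
    return [(value >> shift) & 1 for shift in reversed(range(width))]


def from_bits(bits):
    """Decode a list of on/off nodes back into an integer."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value


def encode_sum(a, b):
    """Return (input pattern, target pattern) for the sum a + b."""
    return to_bits(a, 3) + to_bits(b, 3), to_bits(a + b, 4)


# All sixty-four sums of integers in the range zero through seven.
training_set = [encode_sum(a, b) for a in range(8) for b in range(8)]

inputs, target = encode_sum(2, 3)
print(inputs, "->", target, "which decodes to", from_bits(target))
```

An output pattern that "doesn't look like a number" would, in this code, be one whose node activities are not cleanly on or off.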
As input, we present the two integers and the output will correspond to their sum. As you will see, when we first start to train the network it will not be able to respond with any number at all when we ask it for any sum in the training set - its output doesn't "look" like a number at all! However, after a while some of the sums will start to produce a valid number (which sometimes will be wrong!). After even longer, the network will begin to respond correctly to essentially all the sums in the training set. It is also possible to train the network on just a subset of all the possible sums of integers between zero and seven, say 60 rather than the full 64 sums. Then the network will never have seen certain sums during its training, but the fascinating thing is that it is still able to produce the correct responses!
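Setting aside a few sums for this kind of test might look like the sketch below. The random choice of which four sums to hold back, the fixed `seed` and the name `split_sums` are assumptions; the demonstration may select its subset differently.

```python
import random


def split_sums(n_train=60, seed=0):
    """Split the 64 possible sums into a training subset and unseen test sums."""
    sums = [(a, b) for a in range(8) for b in range(8)]
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    rng.shuffle(sums)
    return sums[:n_train], sums[n_train:]


seen, unseen = split_sums()
print(len(seen), "sums used for training")
print("never shown during training:", unseen)
```

After training on `seen` only, the network's answers for the pairs in `unseen` are the ones that reveal whether it has generalized.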
Again the network has been able to generalize (by forming some internal representation of the data) and supply a correct response to a new question - not one it had ever been asked during the training process. This decision making ability is built into the network in a highly non-trivial way and is perhaps illustrative of the way the brain's neural networks are able to structure themselves to learn by example and to extract generalizations from experience.
To access the simulation read the Perceptron Arithmetic Demonstration.
The perceptron model is impressive in its versatility, power and
flexibility. It is already commercially very successful and
this surely will only increase with time as it finds
application in wider and more diverse areas. The method
of programming is very different from conventional computers
and much closer to the learning process used by the human
brain. This is encouraging but it must be emphasised that
the learning process is strictly `supervised' - a teacher must
teach the network what is important and must manually
change the network connections to achieve this goal. In nature,
much of the brain must learn its own learning procedures - it
must be self-organizing.
This is what we shall turn to next.