Programming a Perceptron in Python

At HSR, I’m currently enrolled in a course about neural networks and machine
learning. One of the simplest forms of a neural network model is the
perceptron.

Background Information

A perceptron classifier is a simple model of a neuron. It has different inputs
($x_1$…$x_n$) with different weights ($w_1$…$w_n$).

$$s = \sum_{i=0}^{n} w_i \cdot x_i$$

The weighted sum $s$ of these inputs is then passed through a step function $f$
(usually a Heaviside step function).

$$f(s) = \begin{cases} 1 & \textrm{if } s \ge 0 \\ 0 & \textrm{otherwise} \end{cases}$$
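
As a quick worked example with made-up numbers: take two inputs $x_1 = 1$ and $x_2 = 1$ with weights $w_1 = 0.5$ and $w_2 = -0.3$. The weighted sum is

$$s = 0.5 \cdot 1 + (-0.3) \cdot 1 = 0.2 \ge 0 \quad\Rightarrow\quad f(s) = 1$$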

To make things cleaner, here’s a little diagram:

The mathematical model of a perceptron.

Python!

Here’s a simple version of such a perceptron using Python and NumPy. It will
take two inputs and learn to act like the logical OR function. First, let’s
import some libraries we need:

from random import choice
from numpy import array, dot, random

Then let’s create the step function. In reference to Mathematica, I’ll call
this function unit_step.

unit_step = lambda x: 0 if x < 0 else 1

Next we need to map the possible inputs to the expected outputs. The first two
entries of the NumPy array in each tuple are the two input values. The third
entry of the array is a “dummy” input (also called the bias), which is needed to
move the threshold (also known as the decision boundary) up or down as needed by
the step function. Its value is always 1, so that its influence on the result
can be controlled by its weight. The second element of each tuple is the
expected result.
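
To see why this works: since $f(s) = 1$ exactly when $s \ge 0$, the weight of this dummy input acts as an adjustable threshold. Calling that weight $w_3$ (my notation for the third entry of the weight vector), the perceptron outputs 1 whenever

$$w_1 x_1 + w_2 x_2 + w_3 \cdot 1 \ge 0 \quad\Longleftrightarrow\quad w_1 x_1 + w_2 x_2 \ge -w_3$$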

training_data = [
    (array([0,0,1]), 0),
    (array([0,1,1]), 1),
    (array([1,0,1]), 1),
    (array([1,1,1]), 1),
]

As you can see, this training sequence maps exactly to the definition of the OR
function:

A   B   A OR B
0   0   0
0   1   1
1   0   1
1   1   1

Next we’ll choose three random numbers between 0 and 1 as the initial weights:

w = random.rand(3)

Now on to some variable initializations. The errors list is only used to
store the error values so that they can be plotted later on; if you don’t want
to do any plotting, you can simply leave it out. The eta variable controls
the learning rate, and n specifies the number of learning iterations.

errors = []
eta = 0.2
n = 100

In order to find the ideal values for the weights w, we try to reduce the
error magnitude to zero. In this simple case, n = 100 iterations are enough;
for a bigger and possibly “noisier” set of input data, a much larger number of
iterations should be used.

First we get a random input set from the training data. Then we calculate the
dot product (sometimes also called the scalar product or inner product) of the
input and weight vectors. This is our (scalar) result, which we can compare to
the expected value. If the expected value is bigger, we need to increase the
weights; if it’s smaller, we need to decrease them. This correction is
calculated in the last line of the loop, where the error is multiplied by the
learning rate (eta) and the input vector (x) and then added to the weight
vector, in order to improve the results in the next iteration.

for i in range(n):
    # pick a random input/output pair from the training data
    x, expected = choice(training_data)
    # weighted sum of the inputs and the current weights
    result = dot(w, x)
    # difference between expected and actual output
    error = expected - unit_step(result)
    errors.append(error)
    # nudge the weights to reduce the error on this example
    w += eta * error * x

And that’s already everything we need in order to train the perceptron! It has
now “learned” to act like a logical OR function:

for x, _ in training_data:
    result = dot(x, w)
    print("{}: {} -> {}".format(x[:2], result, unit_step(result)))

[0 0]: -0.0714566687173 -> 0
[0 1]: 0.829739696273 -> 1
[1 0]: 0.345454042997 -> 1
[1 1]: 1.24665040799 -> 1

If you’re interested, you can also plot the errors, which is a great way to
visualize the learning process:

from pylab import plot, ylim, show
ylim([-1, 1])
plot(errors)
show()

It’s easy to see that the errors stabilize around the 60th iteration. If you
doubt that the errors really stay at zero, you can re-run the training with an
iteration count of 500 or more and plot the errors again.
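
For example, something like this (it simply repeats the training loop from above with fresh weights and a larger iteration count, reusing the imports and variables already defined):

# start over with fresh random weights and an empty error list
w = random.rand(3)
errors = []

for i in range(500):
    x, expected = choice(training_data)
    result = dot(w, x)
    error = expected - unit_step(result)
    errors.append(error)
    w += eta * error * x

plot(errors)
show()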

You could also try to change the training sequence in order to model an AND, NOR
or NOT function. Note that it’s not possible to model an XOR function using a
single perceptron like this, because the two classes (0 and 1) of an XOR
function are not linearly separable. In that case you would have to use multiple
layers of perceptrons (which is basically a small neural network).
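
As a starting point, here’s what the training data for an AND function could look like (only the expected outputs change compared to the OR data above; the variable name is just for illustration):

# example training data for the AND function
training_data_and = [
    (array([0,0,1]), 0),
    (array([0,1,1]), 0),
    (array([1,0,1]), 0),
    (array([1,1,1]), 1),
]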

Wrap Up

Here’s the entire code:

from random import choice
from numpy import array, dot, random

unit_step = lambda x: 0 if x < 0 else 1

training_data = [
    (array([0,0,1]), 0),
    (array([0,1,1]), 1),
    (array([1,0,1]), 1),
    (array([1,1,1]), 1),
]

w = random.rand(3)
errors = []
eta = 0.2
n = 100

for i in range(n):
    x, expected = choice(training_data)
    result = dot(w, x)
    error = expected - unit_step(result)
    errors.append(error)
    w += eta * error * x

for x, _ in training_data:
    result = dot(x, w)
    print("{}: {} -> {}".format(x[:2], result, unit_step(result)))

If you have any questions, or if you’ve discovered an error (which is easily
possible as I’ve just learned about this stuff), feel free to leave a comment
below.