Neural Networks Demystified [Part 2: Forward Propagation]

Last time we set up our neural network on paper. This time we'll implement it
in the programming language Python.
We'll build our network as a Python class. In our init method we'll take care of
instantiating important constants and variables.
We'll make these values accessible to the whole class by placing a "self."
in front of each variable name. Our network has two inputs,
three hidden units, and one output. These are examples of hyperparameters.
Hyperparameters are constants that establish the structure and behaviour
of our network, but are not
updated as we train the network. Our learning algorithm is not capable of,
for example,
deciding that it needs another hidden unit.
This is something that we must decide on before training.
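
As a minimal sketch of the init method described above: the class name NeuralNetwork and the attribute names are assumptions chosen for illustration, and only the layer sizes (2, 3, 1) come from the text.

```python
class NeuralNetwork:
    """Sketch only: class and attribute names are illustrative assumptions."""

    def __init__(self):
        # Hyperparameters: fixed before training and never updated by learning.
        self.inputLayerSize = 2    # two inputs
        self.hiddenLayerSize = 3   # three hidden units
        self.outputLayerSize = 1   # one output


net = NeuralNetwork()
# The "self." prefix makes these values available on every instance.
print(net.inputLayerSize, net.hiddenLayerSize, net.outputLayerSize)
```
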
What neural networks do learn are the parameters, specifically the weights
on the synapses. We'll take care of moving our data through our network in a method called
forward. Rather than pass inputs through the network one at a time,
we're going to use matrices to pass through multiple inputs at once.
Doing this allows for big computational speed ups, especially when using tools
like MATLAB or NumPy.
Our input data matrix X is of dimension 3 by 2
because we have three two-dimensional examples. Our corresponding output data
Y is of dimension 3 by 1.
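
To make the shapes concrete, here is a small NumPy sketch. The numbers are placeholder values assumed for illustration (hours of sleep and hours of study per row of X, one test score per row of Y). Only the dimensions come from the text.

```python
import numpy as np

# Placeholder data, assumed for illustration: one row per example.
# Columns of X: (hours of sleep, hours of study).
X = np.array([[3.0, 5.0],
              [5.0, 1.0],
              [10.0, 2.0]])

# One test score per example, again placeholder values.
Y = np.array([[75.0],
              [82.0],
              [93.0]])

print(X.shape)  # (3, 2): three examples, two inputs each
print(Y.shape)  # (3, 1): three examples, one output each
```
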
Each input value or element in matrix X needs to be multiplied by a
corresponding weight
and then added together with all the other results for each neuron.
This is a complicated operation, but if we take the three outputs we're looking for
as a single row of the matrix
and place all our individual weights into a matrix of weights
we can create the exact behavior we need by multiplying our
input data matrix by our weight matrix. Using matrix multiplication allows us to pass
multiple inputs through at once
by simply adding rows to the matrix X. From here on out we'll refer to these matrices as
X, W1, and Z2,
where Z2 is the activity of our second layer.
Notice that each entry in Z2 is a sum of weighted inputs to each neuron.
Z2 is of size 3 by 3: one row for each example
and one column for each hidden unit. We now have our first official
formula: Z2 = X * W1.
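
The first formula can be checked with a short NumPy sketch. The input values and the random seed are assumptions for illustration. Note that the product in the formula is a matrix product, which in NumPy is np.dot (or the @ operator), not the element-wise * operator.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen arbitrarily
X = np.array([[3.0, 5.0],
              [5.0, 1.0],
              [10.0, 2.0]])               # placeholder inputs, 3 by 2
W1 = rng.standard_normal((2, 3))          # one weight per input-to-hidden synapse

# Matrix product: every example is passed through every hidden unit at once.
Z2 = np.dot(X, W1)

# Hand-check a single entry: example 0, hidden unit 1 is a sum of weighted inputs.
manual = X[0, 0] * W1[0, 1] + X[0, 1] * W1[1, 1]

print(Z2.shape)                           # (3, 3)
print(np.isclose(Z2[0, 1], manual))       # True
```
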
Matrix notation is really nice here because it allows us to express the
complex underlying process
in a single line. Now that we have the activities of our second layer, Z2,
we need to apply the activation function. We'll independently apply the
function to each
entry in matrix Z2, using a Python method called sigmoid
because we're using a sigmoid as our activation function. Using NumPy is
really nice here
because we can pass in a scalar, vector, or matrix. NumPy will apply the
activation function
element-wise and return a result with the same dimensions as it was given.
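
A minimal sketch of such a sigmoid function, assuming the standard logistic form 1 / (1 + e^(-z)), shows the element-wise behaviour for a scalar, a vector, and a matrix:

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid; np.exp works element-wise on scalars and arrays."""
    return 1.0 / (1.0 + np.exp(-z))


scalar_out = sigmoid(0.0)
vector_out = sigmoid(np.array([-1.0, 0.0, 1.0]))
matrix_out = sigmoid(np.zeros((3, 3)))

print(scalar_out)          # 0.5
print(vector_out.shape)    # (3,)
print(matrix_out.shape)    # (3, 3)
```
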
We now have our second formula for forward propagation.
Using f to denote our activation function,
we can write that a2 (our second layer activity)
is equal to f(Z2).
a2 will be a matrix the same size as Z2,
in this case 3x3.
To finish forward propagation,
we need to propagate a2 all the way to our output, y-hat.
We've already done the heavy lifting in our previous layer,
so all we have to do now is multiply a2
by our second layer weights W2
and apply one more activation function.
W2 is of size 3x1,
one weight for each synapse.
Multiplying a2 (a 3x3)
by W2 (a 3x1)
results in a 3x1 matrix Z3
(the activity of our third layer).
Z3 has 3 activity values,
one for each example.
Last but not least, we'll apply our activation function to Z3,
yielding our official estimate of test scores:
ŷ.
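
The output-layer step can be sketched in the same way. Here a2 is a random stand-in for the hidden-layer activations, and the seed and the variable name yHat are assumptions for illustration. The point is only to confirm the shapes described above.

```python
import numpy as np


def sigmoid(z):
    # Assumed standard logistic sigmoid, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)                 # seed chosen arbitrarily
a2 = sigmoid(rng.standard_normal((3, 3)))      # stand-in hidden activations, 3x3
W2 = rng.standard_normal((3, 1))               # one weight per hidden-to-output synapse

Z3 = np.dot(a2, W2)                            # (3x3) times (3x1) gives 3x1
yHat = sigmoid(Z3)                             # one estimate per example

print(Z3.shape)     # (3, 1)
print(yHat.shape)   # (3, 1)
```
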
We need to implement our forward propagation formulas in Python.
First we'll initialize our weight matrices within our __init__ method.
For starting values, we'll use random numbers.
We'll implement forward propagation in our forward method,
using NumPy's built-in dot method for matrix multiplication and our own sigmoid method.
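
Putting the pieces together, one possible sketch of the whole class follows. The class name, the attribute names, the use of np.random.randn for the starting weights, and the sample inputs are all assumptions for illustration. The layer sizes, the use of np.dot, and the sigmoid activation come from the description above.

```python
import numpy as np


class NeuralNetwork:
    """Sketch of the network described above; names are illustrative."""

    def __init__(self):
        # Hyperparameters
        self.inputLayerSize = 2
        self.hiddenLayerSize = 3
        self.outputLayerSize = 1

        # Parameters (weights), started at random values
        self.W1 = np.random.randn(self.inputLayerSize, self.hiddenLayerSize)
        self.W2 = np.random.randn(self.hiddenLayerSize, self.outputLayerSize)

    def sigmoid(self, z):
        # Assumed logistic sigmoid, applied element-wise
        return 1.0 / (1.0 + np.exp(-z))

    def forward(self, X):
        # Propagate a whole batch of inputs through the network
        self.Z2 = np.dot(X, self.W1)        # hidden-layer activity
        self.a2 = self.sigmoid(self.Z2)     # hidden-layer activation
        self.Z3 = np.dot(self.a2, self.W2)  # output-layer activity
        yHat = self.sigmoid(self.Z3)        # estimate for each example
        return yHat


# Placeholder inputs: (hours of sleep, hours of study) per row
X = np.array([[3.0, 5.0],
              [5.0, 1.0],
              [10.0, 2.0]])

net = NeuralNetwork()
yHat = net.forward(X)
print(yHat.shape)  # (3, 1)
```

Because the weights are random, the values in yHat differ from run to run. The final sigmoid also keeps every estimate between 0 and 1, so the outputs are only comparable to scores expressed on that scale.
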
And there you have it.
A Python class capable of estimating your test score
given how many hours you sleep and how many hours you study.
We can pass in our own input data and get real output.
Now, you may be noticing that our estimates are quite terrible.
That's because we have not yet trained our network,
and that's what we're going to work on next time.