Last time we set up our neural network on paper. This time we'll implement it in the programming language Python.

We'll build our network as a Python class. In our __init__ method, we'll take care of instantiating important constants and variables.

We'll make these values accessible to the whole class by placing "self."

in front of each variable name. Our network has two inputs,

three hidden units, and one output. These are examples of hyperparameters.

Hyperparameters are constants that establish the structure and behaviour of our network, but they are not updated as we train the network. Our learning algorithm is not capable of, for example, deciding that it needs another hidden unit.

This is something that we must decide on before training.
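A minimal sketch of what this looks like in code, keeping the hyperparameters in __init__ (the class name is illustrative):

```python
class NeuralNetwork:
    def __init__(self):
        # Hyperparameters: chosen before training, never updated by learning.
        self.inputLayerSize = 2    # two inputs: hours of sleep, hours of study
        self.hiddenLayerSize = 3   # three hidden units
        self.outputLayerSize = 1   # one output: predicted test score
```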

What neural networks do learn are their parameters, specifically the weights on the synapses. We'll take care of moving our data through the network in a method called forward. Rather than pass inputs through the network one at a time,

we're going to use matrices to pass through multiple inputs at once.

Doing this allows for big computational speed ups, especially when using tools

like MATLAB or NumPy.

Our input data matrix X has dimension 3 by 2,

because we have three two-dimensional examples. Our corresponding output data matrix

Y has dimension 3 by 1.
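As a concrete sketch, the data matrices might look like this (the specific hours and scores are hypothetical example values):

```python
import numpy as np

# Hypothetical training data: each row is one example.
X = np.array([[3., 5.],     # hours of sleep, hours of study
              [5., 1.],
              [10., 2.]])   # shape (3, 2): three two-dimensional examples

y = np.array([[75.],        # test score for each example
              [82.],
              [93.]])       # shape (3, 1)
```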

Each input value or element in matrix X needs to be multiplied by a

corresponding weight

and then added together with all the other results for each neuron.

This is a complicated operation, but if we take the three outputs we're looking for

as a single row of the matrix

and place all our individual weights into a matrix of weights

we can create the exact behavior we need by multiplying our

input data matrix by our weight matrix. Using matrix multiplication allows us to pass

multiple inputs through at once

by simply adding rows to the matrix X. From here on out, we'll refer to these matrices as

X, W1, and Z2,

where Z2 is the activity of our second layer.

Notice that each entry in Z2 is a sum of weighted inputs to a neuron.

Z2 is of size 3 by 3: one row for each example

and one column for each hidden unit. We now have our first official

formula: Z2 = X * W1.
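In NumPy, that one line of matrix notation is a single call (weights here are random placeholders, since we haven't trained anything yet):

```python
import numpy as np

X = np.random.randn(3, 2)    # three 2-dimensional examples
W1 = np.random.randn(2, 3)   # one weight per input-to-hidden synapse

# Z2 = X * W1: each entry is the weighted sum of inputs to one hidden unit
# for one example.
Z2 = np.dot(X, W1)           # shape (3, 3)
```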

Matrix notation is really nice here because it allows us to express the

complex underlying process

in a single line. Now that we have the activities of our second layer, Z2,

we need to apply the activation function. We'll independently apply the

function to each

entry in matrix Z2, using a Python method called sigmoid,

because we're using a sigmoid as our activation function. Using NumPy is

really nice here

because we can pass in a scalar, vector or matrix. NumPy will apply the

activation function

element-wise and return a result with the same dimensions as its input.

We now have our second formula for forward propagation.

Using f to denote our activation function,

we can write that a2 (our second layer activity)

is equal to f(Z2).

a2 will be a matrix of the same size as Z2,

in this case 3 by 3.
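A short sketch of the sigmoid method, showing NumPy's element-wise behavior on a scalar and on a matrix:

```python
import numpy as np

def sigmoid(z):
    # Applies the sigmoid activation element-wise; z can be a
    # scalar, vector, or matrix, and the result has the same shape.
    return 1 / (1 + np.exp(-z))

sigmoid(0)                                  # scalar -> 0.5
a2 = sigmoid(np.random.randn(3, 3))         # matrix -> 3x3 matrix of activities
```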

To finish forward propagation

we need to propagate a2 all the way to our output y-hat.

We've already done the heavy lifting in our previous layer

So all we have to do now is to multiply a2

by our second layer weights W2

and apply one more activation function.

W2 is of size 3x1,

one weight for each synapse.

Multiplying a2 (a 3x3 matrix)

by W2 (a 3x1 matrix)

results in a 3x1 matrix Z3

(the activity of our third layer).

Z3 has three activity values,

one for each example.

Last but not least, we'll apply our activation function to Z3,

yielding our official estimate of test scores:

ŷ = f(Z3).

We need to implement our forward propagation formulas in Python.

First we'll initialize our weight matrices within our __init__ method.

For starting values, we'll use random numbers.

We'll implement forward propagation in our forward method,

using NumPy's built-in dot method for matrix multiplication and our own sigmoid method.
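Putting the pieces together, a sketch of the complete class might look like this (class and attribute names are illustrative):

```python
import numpy as np

class NeuralNetwork:
    def __init__(self):
        # Hyperparameters: fixed before training.
        self.inputLayerSize = 2
        self.hiddenLayerSize = 3
        self.outputLayerSize = 1

        # Weights (parameters): initialized to random starting values.
        self.W1 = np.random.randn(self.inputLayerSize, self.hiddenLayerSize)
        self.W2 = np.random.randn(self.hiddenLayerSize, self.outputLayerSize)

    def sigmoid(self, z):
        # Element-wise sigmoid activation.
        return 1 / (1 + np.exp(-z))

    def forward(self, X):
        # Propagate inputs through the network.
        self.z2 = np.dot(X, self.W1)      # Z2 = X * W1
        self.a2 = self.sigmoid(self.z2)   # a2 = f(Z2)
        self.z3 = np.dot(self.a2, self.W2)  # Z3 = a2 * W2
        yHat = self.sigmoid(self.z3)      # ŷ = f(Z3)
        return yHat
```

Passing in a 3x2 matrix of examples returns a 3x1 matrix of estimates, one per example.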

And there you have it.

A Python class capable of estimating your test score

given how many hours you sleep and how many hours you study.

We can pass in our own input data and get real output.

Now, you may be noticing that our estimates are quite terrible.

That's because we have not yet trained our network

and that's what we're going to work on next time.