# How To Implement A Neural Network In Octave And What Is Back-propagation?

*Abstract: This post is for readers who have a basic idea of what a neural network is but are stuck implementing one because they are not crystal clear about what is happening under the hood. If you are brand new to Machine Learning, I suggest you bookmark this post and return to it in the future.*

Trying to answer these three questions may help you decide whether the rest of this post will help you:

- What are a, z and big Theta?
- Does a or z include the bias node? How about the big Theta?
- What are small delta and big Delta? Are they in the same order?

If you can't answer these questions clearly, please continue reading; otherwise, feel free to stop here.

## What is a Neural Network, really?

It is nothing more than something like a logistic regression classifier: training finds the Theta sets that minimize the cost function, and the trained network outputs a result telling you which class your input belongs to.

For instance, given a picture of a handwritten "one", the program should output a result saying that the input belongs to the One class.
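To make "outputs a class" concrete, here is a minimal Octave sketch. It assumes the output layer produces one activation per class and that the prediction is simply the index of the largest activation. The values in `a3` are made up for illustration.

```octave
% Hypothetical output-layer activations for one input, one value per class
a3 = [0.91; 0.07; 0.02];

% The predicted class is the index of the largest activation
[confidence, predicted_class] = max(a3);

printf("predicted class: %d (activation %.2f)\n", predicted_class, confidence);
```

With three classes, index 1 would stand for "one", index 2 for "two", and so on. That mapping is an assumption of this sketch.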

As you might expect, I will post the core part first.

This example shows a simple three-layer neural network with 3 nodes in the input layer, 5 nodes in the hidden layer and 3 nodes in the output layer.

I drew only two theta relationships in each big Theta group for simplicity.

Study this Neural Network "Guidelines" picture with the questions below in mind:

- Is there a layer 0?
- Which layer contains a bias unit?
- Is there a node called a0 in each layer?
- What is z? Is it a vector that starts at z1?
- What is the size difference between a and z in the same layer?
- Does the big Theta belong to the layer on its left or on its right?

These questions are about the Feedforward procedure, and the answers are easy to read off the drawing. This part is also not hard to implement.
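As a rough illustration of the Feedforward pass, here is a minimal Octave sketch for the 3-5-3 network. It assumes a sigmoid activation, a bias unit with value 1 prepended to every layer except the output layer, and placeholder random weights. The input `x` is made up. The comments track the size of each a and z.

```octave
% Sigmoid activation, applied element-wise (an assumption of this sketch)
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Layer sizes from the example: 3 input, 5 hidden and 3 output nodes
input_size = 3; hidden_size = 5; output_size = 3;

% Placeholder weights; each Theta has one extra column for the bias unit
Theta1 = rand(hidden_size, input_size + 1) - 0.5;   % 5 x 4, maps layer 1 to layer 2
Theta2 = rand(output_size, hidden_size + 1) - 0.5;  % 3 x 6, maps layer 2 to layer 3

x = [0.2; 0.7; 0.1];      % one made-up input example, 3 x 1

a1 = [1; x];              % 4 x 1, bias unit added on top
z2 = Theta1 * a1;         % 5 x 1, z has no bias entry
a2 = [1; sigmoid(z2)];    % 6 x 1, bias unit added on top
z3 = Theta2 * a2;         % 3 x 1
a3 = sigmoid(z3);         % 3 x 1, the output layer has no bias unit

printf("size(z2) = %d x %d, size(a2) = %d x %d\n", size(z2), size(a2));
```

In this sketch a is always one element longer than z in the same hidden layer, because the bias unit is only added after the activation is computed.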

## Back-Propagation

The key to getting this right in code is knowing where to strip off the bias unit that we used in Feedforward. So what is Back-propagation? It is a way to calculate the gradients of the cost with respect to big Theta faster. It measures the error at the output layer and propagates its effect backwards through the network.

For example, back-propagating theta1^(3) from a1^(3) should affect all the node paths that connect layer 2 to a1^(3).

Note, and this is important: when you propagate from layer 2 to layer 1, you should not include the theta from the bias node!
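One way to sketch this step in Octave for a single training example is shown below. It assumes sigmoid activations and a cost function for which the output error is simply `a3 - y` (as with the logistic cost). The weights, the input and the label are made up. Dropping the bias column of `Theta2` means `delta2` has no entry for the bias node, so nothing from the bias node is carried on to layer 1.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

% Placeholder weights for the 3-5-3 network (bias column included)
Theta1 = rand(5, 4) - 0.5;
Theta2 = rand(3, 6) - 0.5;

x = [0.2; 0.7; 0.1];   % made-up input, 3 x 1
y = [1; 0; 0];         % made-up label, one entry per class

% Feedforward
a1 = [1; x];
z2 = Theta1 * a1;
a2 = [1; sigmoid(z2)];
z3 = Theta2 * a2;
a3 = sigmoid(z3);

% Back-propagation of the error (small delta)
delta3 = a3 - y;                                               % 3 x 1
delta2 = (Theta2(:, 2:end)' * delta3) .* sigmoidGradient(z2);  % 5 x 1, bias column stripped

% Gradient contribution of this example (big Delta), same size as each Theta
Delta2 = delta3 * a2';   % 3 x 6
Delta1 = delta2 * a1';   % 5 x 4
```

Small delta is a vector with one entry per non-bias node, while big Delta is a matrix with the same size as the matching Theta. In a full implementation the big Deltas would be accumulated over all training examples and divided by the number of examples.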

I wrote down the details of the matrix dimensions used in calculating the whole network. Checking them step by step will be helpful.

Just in case you forget how to do matrix multiplication in a row:
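As a quick refresher, here is a small Octave sketch with made-up numbers. The only rule to keep in mind is that the inner dimensions of neighbouring matrices must match, and the result takes the outer dimensions.

```octave
A = [1 2 3;
     4 5 6];     % 2 x 3
B = [1 0;
     0 1;
     2 2];       % 3 x 2
v = [1; -1];     % 2 x 1

% (2 x 3) * (3 x 2) * (2 x 1) gives a 2 x 1 result
C = A * B;       % 2 x 2
r = C * v;       % 2 x 1

disp(C);
disp(r);
```

Here `C` is `[7 8; 16 17]` and `r` is `[-1; -1]`. Grouping the product as `A * (B * v)` gives the same result.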

## Now Implement It! Please Think Carefully If You Want To Continue!

Here are some quick steps for your reference. The pictures are snapped from Andrew's class. I also recommend using their code framework so you can skip the less important part: building the testing code yourself takes the fun out of the coding experience.