Deep Learning From Scratch - Theory and Implementation
Multi-layer perceptrons
Motivation
So now we are able to train linear classifiers of arbitrary dimensionality automatically. However, many real-world classes are not linearly separable. This means that there is no straight line (or, in higher dimensions, hyperplane) that puts all the points of one class on one side and all the points of the other class on the other side. Let's illustrate this with an example.
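The original plotting code is not reproduced here, but the idea can be recreated with a short sketch. The concrete dataset below (a red inner disc surrounded by a blue ring) is an illustrative assumption, as is the use of NumPy and matplotlib; any two classes that cannot be split by a straight line would serve.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 100

# Two classes that are not linearly separable (assumed toy data):
# red points fill an inner disc, blue points form a surrounding ring.
radii_red = np.random.uniform(0, 1, n)
radii_blue = np.random.uniform(2, 3, n)
angles_red = np.random.uniform(0, 2 * np.pi, n)
angles_blue = np.random.uniform(0, 2 * np.pi, n)

red_points = np.stack([radii_red * np.cos(angles_red),
                       radii_red * np.sin(angles_red)], axis=1)
blue_points = np.stack([radii_blue * np.cos(angles_blue),
                        radii_blue * np.sin(angles_blue)], axis=1)

plt.scatter(red_points[:, 0], red_points[:, 1], color='red')
plt.scatter(blue_points[:, 0], blue_points[:, 1], color='blue')
plt.show()
```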
As we can see, it is impossible to draw a straight line that separates the blue points from the red points. Instead, our decision boundary has to have a rather complex shape. This is where multi-layer perceptrons come into play: they allow us to learn decision boundaries with more complex shapes than a straight line.
Computational graph
As their name suggests, multi-layer perceptrons (MLPs) are composed of multiple perceptrons stacked one after the other in a layer-wise fashion. Let's look at a visualization of the computational graph:
As we can see, the input is fed into the first layer, which is a multidimensional perceptron with a weight matrix $W_1$ and bias vector $b_1$. The output of that layer is then fed into the next layer, which is again a perceptron with its own weight matrix and bias vector, and so on until we reach the output layer. Writing $\sigma$ for the squashing function applied by each layer (for example a sigmoid, or a softmax for the output layer), an MLP with one hidden layer computes the function

$$\sigma(W_2 \, \sigma(W_1 x + b_1) + b_2),$$

an MLP with two hidden layers computes the function

$$\sigma(W_3 \, \sigma(W_2 \, \sigma(W_1 x + b_1) + b_2) + b_3),$$

and, generally, an MLP with $L$ hidden layers computes the function

$$\sigma(W_{L+1} \, \sigma(W_L \, \cdots \, \sigma(W_1 x + b_1) \cdots + b_L) + b_{L+1}).$$
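To make the formula concrete, here is a minimal sketch of the one-hidden-layer case in plain NumPy (using the row-vector convention `X @ W`). The layer sizes and the choice of a sigmoid for the hidden layer and a softmax for the output are illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

# Illustrative sizes: 2 input features, 4 hidden units, 2 output classes.
X = np.random.randn(5, 2)            # a batch of 5 input points
W1, b1 = np.random.randn(2, 4), np.zeros(4)
W2, b2 = np.random.randn(4, 2), np.zeros(2)

hidden = sigmoid(X @ W1 + b1)        # hidden layer: sigma(W1 x + b1)
output = softmax(hidden @ W2 + b2)   # output layer: sigma(W2 h + b2)
print(output)                        # each row is a vector of class probabilities
```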
Implementation
Using the library we have built, we can now easily implement multi-layer perceptrons without further work.
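The original code for this step, built on top of our graph library, is not reproduced above. As a rough stand-in, the following self-contained NumPy sketch trains a two-layer MLP with gradient descent on a toy dataset like the one from the motivation section; the hidden-layer size, learning rate, and number of steps are arbitrary assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy data: class 0 is an inner disc, class 1 a surrounding ring (not linearly separable).
np.random.seed(0)
n = 100
r = np.concatenate([np.random.uniform(0, 1, n), np.random.uniform(2, 3, n)])
t = np.random.uniform(0, 2 * np.pi, 2 * n)
X = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
Y = np.eye(2)[y]                                          # one-hot labels

# Parameters of a 2 -> 8 -> 2 MLP (sizes chosen arbitrarily).
W1, b1 = 0.1 * np.random.randn(2, 8), np.zeros(8)
W2, b2 = 0.1 * np.random.randn(8, 2), np.zeros(2)
lr = 0.5

for step in range(2000):
    # Forward pass.
    H = sigmoid(X @ W1 + b1)
    P = softmax(H @ W2 + b2)
    loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))  # cross-entropy

    # Backward pass: gradients of the cross-entropy loss.
    dZ2 = (P - Y) / len(X)
    dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
    dH = dZ2 @ W2.T
    dZ1 = dH * H * (1 - H)                                   # sigmoid derivative
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)

    # Gradient descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

    if step % 500 == 0:
        acc = (P.argmax(axis=1) == y).mean()
        print(f"step {step}: loss {loss:.3f}, accuracy {acc:.2f}")
```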
As we can see, we have learned a rather complex decision boundary. If we use more layers, the decision boundary can become arbitrarily complex, allowing us to learn classification patterns that would be impossible for a human being to spot, especially in higher dimensions.
Recap
Congratulations on making it this far! You have learned the foundations of building neural networks from scratch, and unlike many machine learning practitioners, you now know how it all works under the hood and why it is done the way it is.
Let's recap what we have learned. We started out by considering computational graphs in general, and we saw how to build them and how to compute their output. We then moved on to describe perceptrons, which are linear classifiers that assign a probability to each output class by squashing the output of an affine transformation through a squashing function such as the softmax, and we saw how to train such classifiers automatically. Finally, we stacked perceptrons into multi-layer perceptrons, which can learn decision boundaries that are not straight lines.
Next steps
You now know all the fundamentals for training arbitrary neural networks. As a next step, you should learn about the following topics (Google is your friend):
- The difference between training loss and test loss
- Overfitting and underfitting
- Regularization and early stopping
- Dropout
- Convolutional neural networks
- Recurrent neural networks
- Autoencoders
- Deep generative models
All of these topics are covered in the book "Deep Learning" by Ian Goodfellow, Yoshua Bengio and Aaron Courville, which I highly recommend to everyone. A free online version of the book is available at http://www.deeplearningbook.org/.