Today, neural networks are an incredibly accessible way of leveraging artificial intelligence (AI), thanks to an abundance of toolsets that facilitate virtually every phase of building and training. There are even tools that require no programming at all, such as Liner or Teachable Machines. But relying on these tools, or even on popular frameworks such as PyTorch and TensorFlow, can keep users from tackling unsolved problems or from properly understanding what is going on under the hood.
At least, the latter was exactly the case for me.
There are millions of tutorials and courses on neural networks that teach how to train them using popular frameworks, but none of the ones I encountered delved into how they work and why they work that way. Even resources that cover the theory often failed to propagate the a-ha! moment in my brain.
Something about that felt wrong; it may very well be my ADHD, but I could never wrap my head around why neural networks work the way they do. I felt like I didn’t get the bigger picture. I was really stressed out by the fact that I couldn’t get the full picture without scraping bits of information from every corner of the internet and other academic resources, and even then I was still clueless (frankly, my introductory AI class at uni only scratched the surface).
But that is no longer the case—or at least, I think I have a better idea of what neural networks are. And now I think I also see why neural networks work like magic. So as an attempt to double-check my understanding, I shall explain here how neural networks work.
What is a Neural Network?
An artificial neural network, or neural network for short, is a machine learning method characterized by its similarity to, and inspiration from, biological neural networks (for example, the human brain and its neurons). It consists of layers upon layers of individual neurons that activate based on the calculation of a function from a certain input (or batches of inputs).
Weights and Biases
Each input has a weight associated with it, and each neuron has a bias associated with it.
Weights and biases are merely parameters of the function that shape its output. This function is characterized by:
$$f(x) = wx + b$$
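where x is the input, w is the weight and b is the bias. As one can observe, the function that determines the output of a neuron is a linear function, like the ones we learned in secondary school. Evidently, w is the slope and b is the y-intercept. When a neuron has several inputs, each input is multiplied by its own weight, the products are summed, and the bias is added once at the end.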
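To make the formula concrete, here is a minimal sketch of a single neuron in plain Python. The function name `neuron_output` and the example numbers are my own illustrative choices, not something taken from the resources listed at the end.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias: sum(w_i * x_i) + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# One input: exactly f(x) = wx + b with w = 2, b = 1 and x = 3
single = neuron_output([3.0], [2.0], 1.0)

# Three inputs, each with its own weight, and one bias for the neuron
multi = neuron_output([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 0.5)

print(single)  # 7.0
print(multi)   # 5.0
```

With a single input this is the straight line from the equation; with several inputs it is the same idea, just with one slope per input.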
A neuron “fires” when a certain output is achieved. Combinations of particular neurons “firing” result in specific outputs. To determine whether a single neuron in a neural network will activate (like the neurons in the human brain), another function is applied to its output. These functions are called activation functions, and there are several of them, each suited to a different use.
Neuron layers residing in between the input layer and the output layer (“hidden layers”) have activation functions that indicate the “rate” of activation. Because these functions are nonlinear, they also help the network map the function that will produce the desired outputs, which a stack of purely linear neurons could not do unless that function happened to be linear. Rectified Linear Unit (ReLU) is a typical example of an activation function used for the hidden layers.
$$ y = \begin{cases} x &\text{if } x > 0 \\ 0 &\text{if } x \leq 0 \end{cases} $$
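ReLU is about as simple as an activation function gets. The sketch below is a direct transcription of the piecewise definition; the sample values are made up for illustration.

```python
def relu(x):
    """Rectified Linear Unit: pass positive values through, clamp the rest to 0."""
    return x if x > 0 else 0.0


# Hypothetical raw neuron outputs, before activation
raw_outputs = [-2.0, -0.5, 0.0, 1.5, 3.0]
activated = [relu(v) for v in raw_outputs]

print(activated)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

Negative outputs are silenced entirely, while positive ones pass through unchanged, which is where the “rate” of activation comes from.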
As for the output layer, we use a different kind of activation function, typically something like Softmax, to calculate a confidence score for each class (in the case of a classification neural network). The confidence scores and the accuracy (how often the highest confidence is actually the correct answer) are used to evaluate the performance of the neural network.
$$ S_{i,j} = \frac{e^{z_{i,j}}}{\sum_{l=1}^L e^{z_{i,l}}} $$
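Here is a rough sketch of Softmax and accuracy in plain Python. The batch of raw outputs and the true class labels are invented for illustration. Subtracting the largest value before exponentiating is a common numerical-stability trick that I am assuming here; it does not change the result, because the same factor cancels in the numerator and the denominator.

```python
import math


def softmax(z):
    """Turn one sample's raw outputs into confidence scores that sum to 1."""
    largest = max(z)
    # Assumption: shift by the max so math.exp never overflows
    exps = [math.exp(v - largest) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def accuracy(batch_scores, true_classes):
    """Fraction of samples whose highest-confidence class is the correct one."""
    predictions = [scores.index(max(scores)) for scores in batch_scores]
    correct = sum(p == t for p, t in zip(predictions, true_classes))
    return correct / len(true_classes)


# Hypothetical raw outputs for a batch of 3 samples and 3 classes
batch_outputs = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3], [1.2, 0.9, 1.1]]
true_classes = [0, 1, 2]

batch_scores = [softmax(z) for z in batch_outputs]
acc = accuracy(batch_scores, true_classes)

print(batch_scores[0])  # roughly [0.66, 0.24, 0.10]
print(acc)              # 2 of 3 predictions are right
```

The third sample shows why confidence and accuracy are separate measurements: the network's top pick is class 0, but only by a small margin, and the true class is 2.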
Loss and Optimization
But how does a neural network make itself accurate? To determine “how wrong” the NN is, we calculate the loss of the neural network, for example a categorical cross-entropy loss (used with Softmax). Based on this loss, the neural network gets optimized: backpropagation works out how much each weight and bias contributed to the loss, and an optimizer such as gradient descent nudges them in the direction that reduces it.
$$ L_{i} = -\sum_{j} y_{i,j}\ln{(\hat{y}_{i,j})} $$
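The loss formula can be sketched for a single sample as follows. The one-hot target and the two predicted distributions are made-up numbers, and clipping the predictions away from 0 is an assumption I am adding so that `math.log` never receives 0.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(y_j * ln(y_hat_j))."""
    # Assumption: clip predictions to avoid ln(0)
    clipped = [min(max(p, eps), 1 - eps) for p in y_pred]
    return -sum(t * math.log(p) for t, p in zip(y_true, clipped))


target = [1, 0, 0]  # one-hot: the correct class is class 0

loss_good = categorical_cross_entropy(target, [0.7, 0.2, 0.1])
loss_bad = categorical_cross_entropy(target, [0.1, 0.2, 0.7])

print(loss_good)  # about 0.357, i.e. -ln(0.7)
print(loss_bad)   # about 2.303, i.e. -ln(0.1)
```

Because the target is one-hot, only the confidence assigned to the correct class matters: the lower it is, the larger the loss.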
“Training” or “fitting” a neural network, therefore, means tuning the weights and biases of the individual neurons in a network so that it produces the desired outputs for a given scenario.
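To see what that tuning looks like in practice, here is a toy sketch that fits a single neuron f(x) = wx + b with plain gradient descent. To keep it short I am swapping in a mean squared error loss instead of categorical cross-entropy, and the data, learning rate and number of epochs are all arbitrary choices of mine. A real network repeats the same idea across many neurons, with backpropagation supplying the gradients.

```python
# Toy data generated from the hypothetical target function y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0        # start with an untrained neuron
learning_rate = 0.05   # arbitrary step size for this sketch


def mean_squared_error(weight, bias):
    """Average squared difference between the neuron's output and the target."""
    return sum((weight * x + bias - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


initial_loss = mean_squared_error(w, b)

for epoch in range(2000):
    # Gradients of the loss with respect to w and b (chain rule by hand)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Step against the gradient to reduce the loss
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mean_squared_error(w, b)

print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
print(initial_loss, final_loss)  # the loss shrinks towards 0
```

Nothing told the neuron that the slope was 2 and the intercept was 1; it got there purely by repeatedly nudging w and b in whichever direction lowered the loss.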
This is a summary of what I’ve learned so far using the following (amazing) resources:
Kinsley, H., & Kukieła, D. (2020). Neural Networks from Scratch in Python. https://nnfs.io
Nielsen, M. A. (2015). Neural Networks and Deep Learning. Determination Press. http://neuralnetworksanddeeplearning.com/index.html