Neural Networks 101

Our human brain can instantly recognise a handwritten digit, like a '3', even if it's poorly written or presented in different styles. We see the abstract concept of "three" regardless of the specific pixel data hitting our eyes.

Now imagine getting a computer to do the same. This is where neural networks come in. But before going into that, you need to know what neurons are…

What are Neurons?

A neuron is simply a container that holds a number between 0 and 1. This number is called its activation.

An activation of 0 means the neuron is "off" (inactive).
An activation of 1 means the neuron is "on" (fully active).

The Layers of the Network

Now, to understand what exactly this “network” is, you need to know that it consists of three kinds of layers.

Input Layer: This is where the data comes in. For a digit image, each input neuron holds the brightness of one pixel.
Hidden Layers: These are the intermediate layers between the input and output. Their job is to process the information in stages. The structure (how many layers and how many neurons in each) is flexible. The example below uses two hidden layers, each with 16 neurons.
Output Layer: This layer gives the final result. It has one neuron per possible digit, and the activation of each neuron represents the network's confidence that the input image matches that neuron's digit. The neuron with the highest activation is the network's final guess.
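
Here is a rough sketch of that last step, assuming ten output neurons (one per digit, 0 to 9). The function name predict_digit and the activation values are made up for illustration.

```python
def predict_digit(output_activations):
    """Return the index of the most active output neuron."""
    return max(range(len(output_activations)),
               key=lambda i: output_activations[i])

# Made-up activations for the digits 0..9 (illustrative values only).
activations = [0.02, 0.01, 0.10, 0.85, 0.03, 0.05, 0.01, 0.04, 0.30, 0.07]

guess = predict_digit(activations)
print("The network's guess:", guess)
```

Here the neuron at index 3 has the highest activation, so the guess is the digit 3.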

The Logic behind this Layered Structure

While reading this, you might be wondering: why not just connect the input directly to the output?

You could, but the real power of hidden layers comes from breaking down a complex problem into simpler, hierarchical steps.

The human analogy would be that when you see a '9', your brain doesn't just see some 784 pixels. It sees a loop on top of a vertical line. These components are, in turn, made up of smaller edges.

Layer 1 (Input) → Layer 2 (Hidden): We hope the first hidden layer learns to recognise basic components, like small edges, from the raw pixel data.
Layer 2 → Layer 3 (Hidden): The second hidden layer could then learn to combine these edges into more complex shapes, like loops or long lines.
Layer 3 → Layer 4 (Output): The final output layer learns which combination of these shapes corresponds to which digit. For example, (Upper Loop + Vertical Line) = 9.

This layered abstraction allows the network to build complex concepts from simple building blocks, a strategy that is useful for many tasks beyond image recognition, such as parsing speech.

But, how does one layer influence the next?

It’s all about the math that combines the activations from the previous layer. There are two kinds of tuneable parameters: every connection has a weight, and every neuron has a bias.

Weights

Each connection between neurons in adjacent layers has a weight, which is just a number. These weights tell a neuron in the next layer how much importance to give to the activation of each neuron from the previous layer.

Positive weights (excitatory): A high activation in the input neuron will increase the weighted sum for the next neuron.
Negative weights (inhibitory): A high activation in the input neuron will decrease the weighted sum for the next neuron.
Zero weight: The input neuron has no influence.
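
To make the three cases concrete, here is a small sketch of a weighted sum. The helper name weighted_sum and all the numbers are hypothetical, chosen only so that one weight is positive, one negative, and one zero.

```python
def weighted_sum(activations, weights):
    """Multiply each activation by its connection weight and add them up."""
    return sum(a * w for a, w in zip(activations, weights))

# Three neurons in the previous layer and the weights on their connections.
previous_layer = [1.0, 1.0, 0.5]
weights = [2.0, -1.0, 0.0]   # positive, negative, zero

total = weighted_sum(previous_layer, weights)
print(total)
```

The first neuron pushes the sum up by 2.0, the second pulls it down by 1.0, and the third contributes nothing, whatever its activation is.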

Example: Edge Detection

Imagine we want a neuron in the first hidden layer to detect a vertical edge in a specific spot. We could set the weights for the connections as follows:

Positive weights for pixels in the area where the edge should be.
Negative weights for the pixels immediately surrounding that area.

This way, the neuron's weighted sum will be highest when the target pixels are white (high activation) and the surrounding pixels are black (low activation).
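
A toy version of this idea, assuming a tiny 3x3 patch of pixels instead of the full image. The weight values and the name patch_response are invented for the example; a real network would learn its own weights.

```python
# Hypothetical weights: positive down the middle column (where the edge
# should be), negative in the columns on either side.
edge_weights = [
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0],
]

def patch_response(patch, weights):
    """Weighted sum of a 2D patch of pixel activations."""
    return sum(p * w
               for patch_row, weight_row in zip(patch, weights)
               for p, w in zip(patch_row, weight_row))

# 1.0 = white pixel (high activation), 0.0 = black pixel (low activation).
vertical_line = [[0.0, 1.0, 0.0]] * 3
all_white = [[1.0, 1.0, 1.0]] * 3

line_score = patch_response(vertical_line, edge_weights)
white_score = patch_response(all_white, edge_weights)
print(line_score, white_score)
```

The vertical line gets a much higher score than the all-white patch, because the negative weights cancel out the bright surroundings.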

Bias

After calculating the weighted sum of all input activations, we add a final number called the bias. The bias acts as a threshold, determining how high the weighted sum needs to be before the neuron starts to become meaningfully active.

A large negative bias means the weighted sum must be very large for the neuron to activate.
A bias near zero means the neuron activates more easily.
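
As a quick illustration with made-up numbers, the same weighted sum ends up on opposite sides of zero depending on the bias. The name pre_activation is just a label for "weighted sum plus bias" in this sketch.

```python
def pre_activation(weighted_sum, bias):
    """The value a neuron computes before it is turned into an activation."""
    return weighted_sum + bias

same_weighted_sum = 4.0

with_large_negative_bias = pre_activation(same_weighted_sum, bias=-10.0)
with_zero_bias = pre_activation(same_weighted_sum, bias=0.0)

print(with_large_negative_bias)  # below zero: the neuron stays mostly inactive
print(with_zero_bias)            # above zero: the neuron is meaningfully active
```

With a bias of -10, the weighted sum would have to exceed 10 before the result turns positive.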

There is one problem left: the weighted sum plus the bias can be any number, but a neuron's activation must be between 0 and 1. To achieve this, we pass the result through an activation function.

Activation Function (Sigmoid)

The sigmoid function "squishes" the entire number line into the range between 0 and 1.
Large negative numbers become close to 0.
Large positive numbers become close to 1.
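
The standard sigmoid formula is 1 / (1 + e^(-x)). A minimal sketch in Python, written in two branches so that very large negative inputs do not overflow:

```python
import math

def sigmoid(x):
    """Squish any real number into the range between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use an equivalent form that avoids a huge exp(-x).
    e = math.exp(x)
    return e / (1.0 + e)

for x in (-10.0, 0.0, 10.0):
    print(x, sigmoid(x))
```

An input of 0 lands exactly in the middle, at 0.5, while -10 and 10 land very close to 0 and 1 respectively.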

So, the full process for a single neuron is:

Calculate the weighted sum of all activations from the previous layer.
Add the bias.
Apply the sigmoid function to the result.
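
Those three steps can be sketched in a few lines. The function names and the example weights are hypothetical, and a layer here is simply the same computation repeated for every neuron, each with its own row of weights and its own bias.

```python
import math

def sigmoid(x):
    """Squish any real number into the range between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def neuron_activation(prev_activations, weights, bias):
    # Step 1: weighted sum of the previous layer's activations.
    total = sum(a * w for a, w in zip(prev_activations, weights))
    # Step 2: add the bias.  Step 3: apply the sigmoid.
    return sigmoid(total + bias)

def layer_activations(prev_activations, weight_rows, biases):
    """Run every neuron of a layer on the same previous-layer activations."""
    return [neuron_activation(prev_activations, row, b)
            for row, b in zip(weight_rows, biases)]

# A tiny made-up layer: 3 inputs feeding 2 neurons.
prev = [0.0, 1.0, 0.5]
weight_rows = [[1.0, 2.0, -4.0],
               [0.5, -3.0, 0.0]]
biases = [0.0, 1.0]

hidden = layer_activations(prev, weight_rows, biases)
print(hidden)
```

Feeding the output of one layer in as the input of the next is all it takes to run the whole network forward.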

So how does a network made of all these tiny neurons learn? Learning is an automated process of improving performance, and it needs three things:

Get Training Data: We need a large dataset of examples with correct labels.
Define a "Cost Function": We need a way to measure how wrong the network is. This is called the cost function or "loss function."
Minimise the Cost: We use an algorithm called gradient descent to systematically "nudge" the network's roughly 13,000 weights and biases in a direction that makes the cost lower.
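
Where does a number like 13,000 come from? A quick count, assuming 784 input neurons (one per pixel), the two hidden layers of 16 neurons from the example, and 10 output neurons (one per digit). The output size is an assumption of this sketch.

```python
# Assumed layer sizes: input, hidden, hidden, output.
layer_sizes = [784, 16, 16, 10]

# One weight per connection between each pair of adjacent layers.
weight_count = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# One bias per neuron in every layer except the input layer.
bias_count = sum(layer_sizes[1:])

total_parameters = weight_count + bias_count
print(weight_count, bias_count, total_parameters)
```

That gives 12,960 weights and 42 biases, or 13,002 tuneable numbers in total.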

We repeat this process thousands of times, and the network slowly converges towards a state where it performs the task well.
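
To see the loop in action, here is a deliberately tiny sketch: a single sigmoid neuron with one weight and one bias, two made-up training examples, and a mean squared error as the cost. All of these are assumptions chosen to keep the example short. A real network has thousands of parameters and works out the nudges with backpropagation, which is not shown here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # inputs stay small in this toy example

# Made-up training data: (input, correct label).
data = [(0.0, 0.0), (1.0, 1.0)]

def cost(w, b):
    """Mean squared error between the neuron's output and the labels."""
    return sum((sigmoid(w * x + b) - y) ** 2 for x, y in data) / len(data)

def gradient(w, b):
    """Slope of the cost with respect to w and b, via the chain rule."""
    dw = db = 0.0
    for x, y in data:
        a = sigmoid(w * x + b)
        common = 2.0 * (a - y) * a * (1.0 - a)  # d(cost)/d(weighted sum + bias)
        dw += common * x
        db += common
    return dw / len(data), db / len(data)

w, b = 0.0, 0.0
learning_rate = 0.5
initial_cost = cost(w, b)

for _ in range(5000):
    dw, db = gradient(w, b)
    # Nudge each parameter a little in the direction that lowers the cost.
    w -= learning_rate * dw
    b -= learning_rate * db

final_cost = cost(w, b)
print(initial_cost, final_cost)
```

The cost starts at 0.25, since the untrained neuron outputs 0.5 for everything, and shrinks as the weight and bias are nudged step by step.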
