The Story: A Whisper Telephone Through Many Rooms
By the last room, the original pixels have been transformed into something much more abstract: a sentence of probabilities, such as "90% cat, 7% fox, 3% dog."
That journey (input → weighted sums → activations → output) is forward propagation. Nothing learns yet. It is pure, deterministic arithmetic flowing in one direction.
Forward propagation is the process of passing an input through every layer of a neural network, computing weighted sums and applying activations, to produce a final prediction. No weights change during the forward pass.
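The claim that the forward pass is deterministic and leaves the weights alone can be checked directly. The following is a minimal sketch, not a definitive implementation: the helper `forward`, the tuple `params`, and the weight values are illustrative assumptions. It runs the same input through a small two-layer network twice and confirms that the output is identical and the parameters are untouched.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, params):
    """Hypothetical helper: forward pass through a 2-layer network."""
    W1, b1, W2, b2 = params
    a1 = relu(W1 @ x + b1)          # hidden layer
    return softmax(W2 @ a1 + b2)    # output layer

# Illustrative parameter values (assumed for this sketch)
params = (
    np.array([[0.1, 0.2], [0.3, 0.4]]), np.zeros(2),
    np.array([[0.5, -0.3], [-0.1, 0.6]]), np.zeros(2),
)
snapshot = [p.copy() for p in params]   # copy taken before any forward pass
x = np.array([1.0, 2.0])

first = forward(x, params)
second = forward(x, params)

same_output = np.array_equal(first, second)
weights_unchanged = all(np.array_equal(p, s) for p, s in zip(params, snapshot))
print(same_output, weights_unchanged)
```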
The Computation Graph: Animated Flow
Each layer is a station. Data flows strictly left → right. Every station performs two operations: an affine transformation and an activation. The graph below animates the full forward pass.
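The two operations performed at each station can be sketched in a few lines. This is an illustrative example only: the function name `dense_layer` and the layer sizes and values are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def dense_layer(a_prev, W, b, activation):
    """One 'station': affine transformation followed by an activation."""
    z = W @ a_prev + b      # affine: weighted sum plus bias
    a = activation(z)       # element-wise nonlinearity
    return z, a

# Hypothetical layer with 3 inputs and 2 neurons
W = np.array([[1.0, -1.0,  0.5],
              [0.0,  2.0, -1.0]])
b = np.array([0.5, -2.0])
x = np.array([1.0, 2.0, 3.0])

z, a = dense_layer(x, W, b, relu)
print(z)   # pre-activations: the second one is negative
print(a)   # ReLU clamps the negative pre-activation to zero
```

Chaining several such calls, feeding each `a` into the next layer, gives the full left-to-right pass.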
The Four Core Operations
Numerical 1: Single Neuron, One Layer
A neuron receives inputs x = [2, 3]ᵀ, weights W = [0.5, −0.4], bias b = 1. Activation: ReLU.
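The single-neuron setup above can be worked through in code. This is a small sketch using the stated values: the weighted sum is 0.5×2 + (−0.4)×3 + 1 = 0.8, and since that is positive, ReLU passes it through unchanged.

```python
import numpy as np

x = np.array([2.0, 3.0])     # inputs
W = np.array([0.5, -0.4])    # weights
b = 1.0                      # bias

z = W @ x + b                # 0.5*2 + (-0.4)*3 + 1 = 1.0 - 1.2 + 1.0
a = max(0.0, z)              # ReLU: keep z if positive, else 0

print(f"z = {z:.2f}, a = {a:.2f}")
```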
Numerical 2: Full 2-Layer Network + Softmax
Input: 2 neurons | Hidden: 2 neurons (ReLU) | Output: 2 neurons (Softmax), binary classification.
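Given x = [1, 2]ᵀ, W¹ = [[0.1, 0.2], [0.3, 0.4]], W² = [[0.5, −0.3], [−0.1, 0.6]], and b¹ = b² = 0 (the values used in the Python implementation):
z¹₁ = 0.1×1 + 0.2×2 = 0.1 + 0.4 = 0.5
z¹₂ = 0.3×1 + 0.4×2 = 0.3 + 0.8 = 1.1
∴ z¹ = [0.5, 1.1]ᵀ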
a¹₁ = ReLU(0.5) = 0.5
a¹₂ = ReLU(1.1) = 1.1
∴ a¹ = [0.5, 1.1]ᵀ (both positive, unchanged)
z²₁ = 0.5×0.5 + (−0.3)×1.1 = 0.25 − 0.33 = −0.08
z²₂ = (−0.1)×0.5 + 0.6×1.1 = −0.05 + 0.66 = 0.61
∴ z² = [−0.08, 0.61]ᵀ
e^(−0.08) ≈ 0.923 and e^(0.61) ≈ 1.840, so the denominator is 0.923 + 1.840 = 2.763
ŷ₁ = 0.923 ÷ 2.763 ≈ 0.334 → 33.4%
ŷ₂ = 1.840 ÷ 2.763 ≈ 0.666 → 66.6%
✓ Sum = 1.000 → valid probability distribution
The network predicts Class 1 (the second output, ŷ₂, counting from zero as the code does) with 66.6% confidence. These are arbitrary, untrained weights; no learning has happened yet. Backpropagation will later adjust W¹, W², b¹, b² to improve this output.
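The softmax arithmetic in this example can be checked without any libraries beyond the standard `math` module. The sketch below starts from z² = [−0.08, 0.61] and repeats the exponentiate-then-normalize steps. Small differences in the third decimal place are expected, because the hand calculation rounds each exponential before summing.

```python
import math

z2 = [-0.08, 0.61]                       # output-layer pre-activations

exps = [math.exp(v) for v in z2]         # exponentiate each logit
total = sum(exps)                        # normalizing constant
y_hat = [e / total for e in exps]        # probabilities

print([round(e, 3) for e in exps])
print([round(p, 3) for p in y_hat])
print(round(sum(y_hat), 3))
```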
Python Implementation
import numpy as np

# ── Inputs and Weights ──────────────────────────────
x = np.array([1, 2], dtype=float)
W1 = np.array([[0.1, 0.2],
               [0.3, 0.4]])
b1 = np.zeros(2)
W2 = np.array([[ 0.5, -0.3],
               [-0.1,  0.6]])
b2 = np.zeros(2)

# ── Activation helpers ──────────────────────────────
def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

# ── Forward Propagation ─────────────────────────────
z1 = W1 @ x + b1       # Layer 1 affine
a1 = relu(z1)          # Layer 1 activation
z2 = W2 @ a1 + b2      # Layer 2 affine
y_hat = softmax(z2)    # Softmax output
print(f"z1 = {z1}")
print(f"a1 = {a1}")
print(f"z2 = {z2}")
print(f"y_hat = {y_hat}")
print(f"Pred = Class {np.argmax(y_hat)}")
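The implementation above processes one input vector. As a possible extension, the same forward pass can be applied to several inputs at once by stacking them as rows of a matrix. This is a sketch under assumptions: the names `forward_batch` and `softmax_rows` are hypothetical, and the second input row is made up for illustration. The first row reproduces the worked example.

```python
import numpy as np

W1 = np.array([[0.1, 0.2], [0.3, 0.4]])
b1 = np.zeros(2)
W2 = np.array([[0.5, -0.3], [-0.1, 0.6]])
b2 = np.zeros(2)

def relu(z):
    return np.maximum(0, z)

def softmax_rows(Z):
    """Softmax applied independently to each row of Z."""
    Z = Z - Z.max(axis=1, keepdims=True)    # per-row shift for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def forward_batch(X):
    """Forward pass for a batch: each row of X is one input vector."""
    A1 = relu(X @ W1.T + b1)                # shape (n, 2)
    return softmax_rows(A1 @ W2.T + b2)     # shape (n, 2)

X = np.array([[1.0, 2.0],    # the input from the worked example
              [2.0, 1.0]])   # a second, made-up input
Y = forward_batch(X)

print(Y)
print(Y.argmax(axis=1))      # predicted class per row
```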
Golden Rules
In softmax, exponentiate the shifted logits: e^(z − max(z)). This prevents numerical overflow with zero effect on the final probabilities.
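Both halves of this rule can be seen in a short experiment. The sketch below compares a naive softmax with the max-shifted version. The function names and the large test logits are assumptions chosen to trigger overflow. On small inputs the two agree. On large inputs the naive version overflows to `inf` and returns `nan`, while the shifted version stays finite.

```python
import numpy as np

def softmax_naive(z):
    e = np.exp(z)                    # may overflow for large z
    return e / e.sum()

def softmax_stable(z):
    e = np.exp(z - np.max(z))        # largest exponent is exactly 0
    return e / e.sum()

small = np.array([-0.08, 0.61])
large = np.array([1000.0, 1001.0])   # exp(1000) exceeds float64 range

agree_on_small = np.allclose(softmax_naive(small), softmax_stable(small))

# Silence the overflow/invalid warnings so the nan result can be inspected
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = softmax_naive(large)

stable_large = softmax_stable(large)

print(agree_on_small)
print(naive_large)
print(stable_large)
```

Subtracting the maximum multiplies numerator and denominator by the same constant, e^(−max(z)), which is why the probabilities are unchanged.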