The Network: Read the Diagram
The image above shows a neural network with 2 inputs → 2 hidden neurons → 1 output. Below is an exact reproduction with all weights labelled. We will solve this network completely (forward pass, loss, then full backpropagation), and the animated player lets you step through each computation exactly as you would on paper.
Inputs: x₁ = 0.35, x₂ = 0.7
Layer 1 weights: w₁,₁ = 0.2 (x₁→h₁), w₂,₁ = 0.2 (x₂→h₁),
w₁,₂ = 0.3 (x₁→h₂), w₂,₂ = 0.3 (x₂→h₂)
Layer 2 weights: w₁,₃ = 0.3 (h₁→o₃), w₂,₃ = 0.9 (h₂→o₃)
Activation: Sigmoid | Target y = 1.0 | Loss: MSE
Interactive Animated Step-Through
Press ▶ Auto Play to watch the computation animate, or use ← → to step through manually at your own pace. Each step shows the exact formula and result as you would write it on paper.
Forward Pass: Complete Paper-Style Solution
Here is every calculation written out exactly as you would show it in an exam or on paper. No shortcuts. No skipping. Every intermediate value stated explicitly.
① Hidden Layer: Neuron h₁
z_h1 = w₁,₁·x₁ + w₂,₁·x₂ = (0.2)(0.35) + (0.2)(0.70)
z_h1 = 0.0700 + 0.1400 = 0.2100
a_h1 = σ(z_h1) = 1 / (1 + e^(−0.2100)) = 1 / (1 + 0.8106) = 1 / 1.8106 = 0.5523
② Hidden Layer: Neuron h₂
z_h2 = w₁,₂·x₁ + w₂,₂·x₂ = (0.3)(0.35) + (0.3)(0.70)
z_h2 = 0.1050 + 0.2100 = 0.3150
a_h2 = σ(z_h2) = 1 / (1 + e^(−0.3150)) = 1 / (1 + 0.7298) = 1 / 1.7298 = 0.5781
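The two hidden-neuron calculations above follow the same pattern, so they can also be written as a single matrix-vector product. Here is a minimal NumPy sketch, assuming the weights are arranged one row per hidden neuron; the names `W1`, `x`, `z_h`, and `a_h` are illustrative and do not come from the diagram.

```python
import numpy as np

# Row i holds the weights feeding hidden neuron i (this layout is an assumption).
W1 = np.array([[0.2, 0.2],    # w11, w21 -> h1
               [0.3, 0.3]])   # w12, w22 -> h2
x = np.array([0.35, 0.70])

z_h = W1 @ x                        # pre-activations of h1 and h2
a_h = 1.0 / (1.0 + np.exp(-z_h))    # element-wise sigmoid

print("z_h =", np.round(z_h, 4))
print("a_h =", np.round(a_h, 4))
```

The rounded values should agree with the hand calculation: 0.2100 and 0.3150 for z, 0.5523 and 0.5781 for the activations.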
③ Output Layer: Neuron o₃
z_o3 = w₁,₃·a_h1 + w₂,₃·a_h2 = (0.3)(0.5523) + (0.9)(0.5781)
z_o3 = 0.1657 + 0.5203 = 0.6860
ŷ = σ(z_o3) = 1 / (1 + e^(−0.6860)) = 1 / (1 + 0.5036) = 1 / 1.5036 = 0.6651
L = ½(ŷ − y)² = ½(0.6651 − 1.0)² = ½(−0.3349)² = ½ × 0.1122 = 0.0561
| Neuron | Input Sum (z) | Activation σ(z) | Note |
|---|---|---|---|
| h₁ | 0.2100 | 0.5523 | First hidden neuron |
| h₂ | 0.3150 | 0.5781 | Second hidden neuron |
| o₃ | 0.6860 | 0.6651 | Prediction ŷ |
| Loss L | 0.0561 | ½(ŷ − 1.0)² | |
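The forward pass summarised in the table can be reproduced with a few lines of standard-library Python. This is a sketch that assumes the network has no bias terms (none appear in the calculations above); the function names `sigmoid` and `forward` are illustrative.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x1, x2, w, y):
    """Return (a_h1, a_h2, y_hat, loss) for the 2-2-1 network without biases."""
    a_h1 = sigmoid(w["w11"] * x1 + w["w21"] * x2)
    a_h2 = sigmoid(w["w12"] * x1 + w["w22"] * x2)
    y_hat = sigmoid(w["w13"] * a_h1 + w["w23"] * a_h2)
    loss = 0.5 * (y_hat - y) ** 2
    return a_h1, a_h2, y_hat, loss

weights = {"w11": 0.2, "w21": 0.2, "w12": 0.3, "w22": 0.3, "w13": 0.3, "w23": 0.9}
a_h1, a_h2, y_hat, loss = forward(0.35, 0.70, weights, y=1.0)
print(f"a_h1={a_h1:.4f}  a_h2={a_h2:.4f}  y_hat={y_hat:.4f}  loss={loss:.4f}")
```

Keeping the weights in a dictionary keyed by name is only one possible design; it makes it easy to loop over all six weights later.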
Backward Pass: Full Chain Rule Derivation
The sigmoid derivative can be written in terms of the sigmoid itself:
σ'(z) = σ(z) × (1 − σ(z))
This means you never need to recompute e⁻ᶻ; just reuse the stored activation value.
For any neuron with activation a: σ'(z) = a × (1 − a)
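One way to convince yourself of this identity is to compare a × (1 − a) against a numerical derivative. The sketch below uses a central finite difference with a step size `h` chosen purely for illustration; it is a sanity check, not part of backpropagation itself, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime_from_activation(a):
    """Sigmoid derivative expressed through the stored activation a = sigmoid(z)."""
    return a * (1.0 - a)

h = 1e-6  # finite-difference step (illustrative choice)
for z in (0.21, 0.315, 0.686):
    a = sigmoid(z)
    analytic = sigmoid_prime_from_activation(a)
    numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
    print(f"z={z:.3f}  analytic={analytic:.6f}  numeric={numeric:.6f}")
```

The two columns should agree to the printed precision for each of the three pre-activations used in this network.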
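① Output Error Signal δ_o3
dL/dŷ = ŷ − y = 0.6651 − 1.0 = −0.3349
σ'(z_o3) = a_o3 × (1 − a_o3) = 0.6651 × (1 − 0.6651)
= 0.6651 × 0.3349 ≈ 0.2228
δ_o3 = dL/dŷ × σ'(z_o3) = (−0.3349) × 0.2228 ≈ −0.074617
Note: these steps use intermediates rounded to 4 decimals. Carried at full precision, δ_o3 ≈ −0.074605, so hand-computed gradients can differ from a script's output in the fifth or sixth decimal place.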
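② Output-Layer Weight Gradients
dL/dw₁,₃ = δ_o3 × a_h1
dL/dw₂,₃ = δ_o3 × a_h2
③ Propagate Error to Hidden Layer
δ_h1 = δ_o3 × w₁,₃ × σ'(z_h1), with σ'(z_h1) = a_h1 × (1 − a_h1)
δ_h2 = δ_o3 × w₂,₃ × σ'(z_h2), with σ'(z_h2) = a_h2 × (1 − a_h2)
④ Input-Layer Weight Gradients (All 4 weights)
dL/dw₁,₁ = δ_h1 × x₁, dL/dw₂,₁ = δ_h1 × x₂
dL/dw₁,₂ = δ_h2 × x₁, dL/dw₂,₂ = δ_h2 × x₂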
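Steps ② to ④ apply the chain rule layer by layer, starting from the output error signal. The following sketch carries them out at full floating-point precision, so its results can differ from hand-rounded values in the last digits. Variable names such as `delta_o3` and `grads` are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x1, x2, y = 0.35, 0.70, 1.0
w11, w21, w12, w22, w13, w23 = 0.2, 0.2, 0.3, 0.3, 0.3, 0.9

# Forward pass (the stored activations are reused below).
a_h1 = sigmoid(w11 * x1 + w21 * x2)
a_h2 = sigmoid(w12 * x1 + w22 * x2)
y_hat = sigmoid(w13 * a_h1 + w23 * a_h2)

# Step 1: output error signal, dL/dy_hat times sigmoid'(z_o3).
delta_o3 = (y_hat - y) * y_hat * (1 - y_hat)

# Step 2: output-layer weight gradients.
grads = {"w13": delta_o3 * a_h1, "w23": delta_o3 * a_h2}

# Step 3: propagate the error to the hidden layer.
delta_h1 = delta_o3 * w13 * a_h1 * (1 - a_h1)
delta_h2 = delta_o3 * w23 * a_h2 * (1 - a_h2)

# Step 4: input-layer weight gradients.
grads.update({"w11": delta_h1 * x1, "w21": delta_h1 * x2,
              "w12": delta_h2 * x1, "w22": delta_h2 * x2})

for name in sorted(grads):
    print(f"dL/d{name} = {grads[name]:.6f}")
```

Note that each hidden delta uses the old value of the outgoing weight (`w13` or `w23`), which is why all gradients are computed before any weight is changed.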
Weight Update: Before & After (η = 0.5)
Rule: W_new = W_old − η × (dL/dW). Applied to all 6 weights simultaneously.
| Weight | Connection | Old Value | Gradient | η × Gradient | New Value | Change |
|---|---|---|---|---|---|---|
| w₁,₁ | x₁ → h₁ | 0.2000 | −0.001938 | −0.000969 | 0.2010 | ↑ +0.0010 |
| w₂,₁ | x₂ → h₁ | 0.2000 | −0.003875 | −0.001938 | 0.2019 | ↑ +0.0019 |
| w₁,₂ | x₁ → h₂ | 0.3000 | −0.005733 | −0.002867 | 0.3029 | ↑ +0.0029 |
| w₂,₂ | x₂ → h₂ | 0.3000 | −0.011466 | −0.005733 | 0.3057 | ↑ +0.0057 |
| w₁,₃ | h₁ → o₃ | 0.3000 | −0.041212 | −0.020606 | 0.3206 | ↑ +0.0206 |
| w₂,₃ | h₂ → o₃ | 0.9000 | −0.043140 | −0.021570 | 0.9216 | ↑ +0.0216 |
Since ŷ = 0.665 was below the target y = 1.0, the network needed to predict higher. All gradients are negative, so subtracting them (W − η × negative) makes all weights increase. Because every input and activation in this network is positive, larger weights produce a larger network output on the next forward pass, which is exactly what we needed. Gradient descent is working correctly.
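That claim is easy to check numerically: apply one update and run the forward pass again. Here is a minimal sketch that recomputes the gradients at full precision instead of copying the rounded table values; the helper names `forward` and `gradients` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, x, y):
    """Return hidden activations, prediction, and loss for the 2-2-1 network."""
    a_h1 = sigmoid(w["w11"] * x[0] + w["w21"] * x[1])
    a_h2 = sigmoid(w["w12"] * x[0] + w["w22"] * x[1])
    y_hat = sigmoid(w["w13"] * a_h1 + w["w23"] * a_h2)
    return a_h1, a_h2, y_hat, 0.5 * (y_hat - y) ** 2

def gradients(w, x, y):
    """Backpropagate once and return dL/dw for all six weights."""
    a_h1, a_h2, y_hat, _ = forward(w, x, y)
    d_o3 = (y_hat - y) * y_hat * (1 - y_hat)
    d_h1 = d_o3 * w["w13"] * a_h1 * (1 - a_h1)
    d_h2 = d_o3 * w["w23"] * a_h2 * (1 - a_h2)
    return {"w11": d_h1 * x[0], "w21": d_h1 * x[1],
            "w12": d_h2 * x[0], "w22": d_h2 * x[1],
            "w13": d_o3 * a_h1, "w23": d_o3 * a_h2}

x, y, lr = (0.35, 0.70), 1.0, 0.5
old_w = {"w11": 0.2, "w21": 0.2, "w12": 0.3, "w22": 0.3, "w13": 0.3, "w23": 0.9}
g = gradients(old_w, x, y)
new_w = {k: old_w[k] - lr * g[k] for k in old_w}  # all six updated together

_, _, y_before, loss_before = forward(old_w, x, y)
_, _, y_after, loss_after = forward(new_w, x, y)
print(f"y_hat: {y_before:.4f} -> {y_after:.4f}")
print(f"loss:  {loss_before:.4f} -> {loss_after:.4f}")
```

If the update is correct, the second prediction is closer to 1.0 and the second loss is smaller than the first.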
Python Verification: All Numbers Confirmed
import numpy as np

# --- Network from the diagram ---
x1, x2 = 0.35, 0.70
w11, w21 = 0.2, 0.2  # to h1
w12, w22 = 0.3, 0.3  # to h2
w13, w23 = 0.3, 0.9  # to o3
y = 1.0
lr = 0.5

def sig(z):
    return 1 / (1 + np.exp(-z))

def sigD(z):
    s = sig(z)
    return s * (1 - s)

# --- FORWARD PASS ---
z_h1 = w11*x1 + w21*x2  # 0.21
a_h1 = sig(z_h1)
z_h2 = w12*x1 + w22*x2  # 0.315
a_h2 = sig(z_h2)
z_o3 = w13*a_h1 + w23*a_h2
a_o3 = sig(z_o3)  # ŷ
loss = 0.5 * (a_o3 - y)**2
print("=== FORWARD PASS ===")
print(f"z_h1 = {z_h1:.4f} a_h1 = {a_h1:.4f}")
print(f"z_h2 = {z_h2:.4f} a_h2 = {a_h2:.4f}")
print(f"z_o3 = {z_o3:.4f} y_hat = {a_o3:.4f}")
print(f"Loss = {loss:.4f}")

# --- BACKWARD PASS ---
dL_do3 = a_o3 - y  # dL/dŷ
d_o3 = dL_do3 * sigD(z_o3)  # δ_o3
dW13 = d_o3 * a_h1
dW23 = d_o3 * a_h2
dL_ah1 = d_o3 * w13
dL_ah2 = d_o3 * w23
d_h1 = dL_ah1 * sigD(z_h1)  # δ_h1
d_h2 = dL_ah2 * sigD(z_h2)  # δ_h2
dW11 = d_h1 * x1
dW21 = d_h1 * x2
dW12 = d_h2 * x1
dW22 = d_h2 * x2
print("\n=== BACKWARD PASS ===")
print(f"δ_o3 = {d_o3:.6f}")
print(f"dW13 = {dW13:.6f} dW23 = {dW23:.6f}")
print(f"δ_h1 = {d_h1:.6f} δ_h2 = {d_h2:.6f}")
print(f"dW11 = {dW11:.6f} dW21 = {dW21:.6f}")
print(f"dW12 = {dW12:.6f} dW22 = {dW22:.6f}")

# --- WEIGHT UPDATES (η = 0.5) ---
print("\n=== UPDATED WEIGHTS ===")
print(f"w11: {w11:.4f} → {w11 - lr*dW11:.4f}")
print(f"w21: {w21:.4f} → {w21 - lr*dW21:.4f}")
print(f"w12: {w12:.4f} → {w12 - lr*dW12:.4f}")
print(f"w22: {w22:.4f} → {w22 - lr*dW22:.4f}")
print(f"w13: {w13:.4f} → {w13 - lr*dW13:.4f}")
print(f"w23: {w23:.4f} → {w23 - lr*dW23:.4f}")
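Beyond matching hand calculations, backpropagated gradients can be cross-checked against numerical derivatives of the loss. The sketch below is an independent check using central differences; the perturbation size `eps` and the helper names `loss_fn` and `numeric_gradient` are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_fn(w, x=(0.35, 0.70), y=1.0):
    """Forward pass of the 2-2-1 network; returns 0.5 * (y_hat - y)^2."""
    a_h1 = sigmoid(w["w11"] * x[0] + w["w21"] * x[1])
    a_h2 = sigmoid(w["w12"] * x[0] + w["w22"] * x[1])
    y_hat = sigmoid(w["w13"] * a_h1 + w["w23"] * a_h2)
    return 0.5 * (y_hat - y) ** 2

def numeric_gradient(w, name, eps=1e-6):
    """Central-difference estimate of dL/dw[name]."""
    up = dict(w)
    up[name] += eps
    down = dict(w)
    down[name] -= eps
    return (loss_fn(up) - loss_fn(down)) / (2 * eps)

w = {"w11": 0.2, "w21": 0.2, "w12": 0.3, "w22": 0.3, "w13": 0.3, "w23": 0.9}
numeric = {name: numeric_gradient(w, name) for name in w}
for name, value in numeric.items():
    print(f"dL/d{name} ~ {value:.6f}")
```

These estimates should match the `dW` values printed by the script above to roughly six decimal places; a larger mismatch usually points to a sign or indexing error in the backward pass.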
Paper-Exam Cheat-Sheet: The 8-Step Recipe
1. Z: compute pre-activation z
2. A: activate → get a
3. S: sum across layer
4. A: again for next layer
5. Δ: delta at output
6. W: weight gradients
7. W: propagate delta back
8. U: update weights

Say the sequence out loud (Z, A, S, A, Δ, W, W, U) and you will never forget the order of operations.
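The eight steps map directly onto one training iteration. As a sketch, the loop below repeats the recipe a few times on the same single example; the iteration count of 5 is arbitrary and the function name `train_step` is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, x, y, lr):
    """One pass of the recipe; returns (updated weights, loss before the update)."""
    # Z, A: hidden pre-activations and activations
    a_h1 = sigmoid(w["w11"] * x[0] + w["w21"] * x[1])
    a_h2 = sigmoid(w["w12"] * x[0] + w["w22"] * x[1])
    # S, A: weighted sum into the output neuron, then activate
    y_hat = sigmoid(w["w13"] * a_h1 + w["w23"] * a_h2)
    loss = 0.5 * (y_hat - y) ** 2
    # Delta: error signal at the output
    d_o3 = (y_hat - y) * y_hat * (1 - y_hat)
    # W: output-layer weight gradients
    grads = {"w13": d_o3 * a_h1, "w23": d_o3 * a_h2}
    # W: propagate delta back, then form the input-layer gradients
    d_h1 = d_o3 * w["w13"] * a_h1 * (1 - a_h1)
    d_h2 = d_o3 * w["w23"] * a_h2 * (1 - a_h2)
    grads.update({"w11": d_h1 * x[0], "w21": d_h1 * x[1],
                  "w12": d_h2 * x[0], "w22": d_h2 * x[1]})
    # U: update every weight using gradients computed from the old weights
    return {k: w[k] - lr * grads[k] for k in w}, loss

w = {"w11": 0.2, "w21": 0.2, "w12": 0.3, "w22": 0.3, "w13": 0.3, "w23": 0.9}
losses = []
for step in range(5):
    w, loss = train_step(w, (0.35, 0.70), 1.0, lr=0.5)
    losses.append(loss)
    print(f"step {step}: loss = {loss:.4f}")
```

The first printed loss corresponds to the forward pass worked out by hand, and each later value should be slightly smaller as the weights keep moving in the same direction.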