The Full Pipeline: Every CNN Block in Order
That is exactly what happens in one CNN block, in that order, every time.
Two complete numericals. Each one goes through Step 1: Convolution (every dot product, position by position), Step 2: ReLU (zero out every negative), and Step 3: Max Pooling (slide a 2×2 window and keep the maximum). Nothing skipped, nothing assumed.
Numerical 1: Full Pipeline (Conv → ReLU → Max Pool)
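Given: A 5×5 input image and a 3×3 kernel. No padding. Stride 1 for both the convolution and the 2×2 max pool.
Input: [1,2,3,0,1 / 4,5,6,1,2 / 7,8,9,0,3 / 2,1,0,4,5 / 6,3,2,1,0]
Kernel: [1,0,−1 / 1,0,−1 / 1,0,−1]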
This kernel has +1 in the left column, 0 in the middle, −1 in the right column. It subtracts the right side from the left side of every 3×3 patch, which makes it a classic vertical edge detector. Bright on the left, dark on the right → large positive output.
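To make the edge-detector claim concrete, here is a small sketch that applies the kernel to three made-up 3×3 patches. The patch values and the `response` helper are assumptions invented for illustration; they are not part of the worked example.

```python
import numpy as np

# Vertical edge kernel from the text: +1 left column, 0 middle, -1 right column.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

def response(patch):
    """Dot product of one 3x3 patch with the kernel (hypothetical helper)."""
    return int(np.sum(patch * kernel))

# Illustrative patches, invented for this sketch.
bright_left = np.array([[9, 5, 1]] * 3)   # bright on the left, dark on the right
bright_right = np.array([[1, 5, 9]] * 3)  # the mirror image
flat = np.full((3, 3), 5)                 # no edge at all

print(response(bright_left))   # 24
print(response(bright_right))  # -24
print(response(flat))          # 0
```

A bright-to-dark transition gives a large positive value, the reverse transition gives the same magnitude with a negative sign, and a flat patch gives zero.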
① Step 1: Convolution (9 dot products)
The 3×3 kernel slides across the 5×5 input with stride 1. Every position produces one value. Here are all 9:
| Position | Products, row by row | Row sums | Result |
|---|---|---|---|
| FM[0,0] | (1+0−3) + (4+0−6) + (7+0−9) | −2 + (−2) + (−2) | −6 |
| FM[0,1] | (2+0+0) + (5+0−1) + (8+0+0) | 2 + 4 + 8 | 14 |
| FM[0,2] | (3+0−1) + (6+0−2) + (9+0−3) | 2 + 4 + 6 | 12 |
| FM[1,0] | (4+0−6) + (7+0−9) + (2+0+0) | −2 + (−2) + 2 | −2 |
| FM[1,1] | (5+0−1) + (8+0+0) + (1+0−4) | 4 + 8 + (−3) | 9 |
| FM[1,2] | (6+0−2) + (9+0−3) + (0+0−5) | 4 + 6 + (−5) | 5 |
| FM[2,0] | (7+0−9) + (2+0+0) + (6+0−2) | −2 + 2 + 4 | 4 |
| FM[2,1] | (8+0+0) + (1+0−4) + (3+0−1) | 8 + (−3) + 2 | 7 |
| FM[2,2] | (9+0−3) + (0+0−5) + (2+0+0) | 6 + (−5) + 2 | 3 |
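The nine hand-computed values can be cross-checked in one shot. The sketch below is one possible vectorised check, assuming NumPy 1.20 or newer for `sliding_window_view`; the variable names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

inp = np.array([[1, 2, 3, 0, 1],
                [4, 5, 6, 1, 2],
                [7, 8, 9, 0, 3],
                [2, 1, 0, 4, 5],
                [6, 3, 2, 1, 0]])
kernel = np.array([[1, 0, -1]] * 3)

# windows[i, j] is the 3x3 patch whose top-left corner sits at (i, j).
windows = sliding_window_view(inp, (3, 3))   # shape (3, 3, 3, 3)

# Multiply every patch by the kernel and sum over the two patch axes.
feature_map = np.einsum("ijkl,kl->ij", windows, kernel)
print(feature_map.tolist())  # [[-6, 14, 12], [-2, 9, 5], [4, 7, 3]]
```

Each entry of `feature_map` is the same dot product worked out by hand for the corresponding position.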
② Step 2: ReLU Activation, max(0, x)
Apply ReLU element-wise. Every negative value becomes 0. Every positive value stays unchanged.
| Position | Conv Output | ReLU Rule | Result |
|---|---|---|---|
| [0,0] | −6 | max(0, −6) | 0 |
| [0,1] | 14 | max(0, 14) | 14 |
| [0,2] | 12 | max(0, 12) | 12 |
| [1,0] | −2 | max(0, −2) | 0 |
| [1,1] | 9 | max(0, 9) | 9 |
| [1,2] | 5 | max(0, 5) | 5 |
| [2,0] | 4 | max(0, 4) | 4 |
| [2,1] | 7 | max(0, 7) | 7 |
| [2,2] | 3 | max(0, 3) | 3 |
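③ Step 3: Max Pooling (2×2 window, stride 1)
Output size: O = ⌊(3 − 2)/1⌋ + 1 = 2, so the output is 2×2. Slide the 2×2 window over the ReLU map and keep the maximum of each window:
| Output position | Values in window | Max |
|---|---|---|
| [0,0] | 0, 14, 0, 9 | 14 |
| [0,1] | 14, 12, 9, 5 | 14 |
| [1,0] | 0, 9, 4, 7 | 9 |
| [1,1] | 9, 5, 7, 3 | 9 |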
Input 5×5 → Conv (3×3 kernel, S=1, P=0)
→ Feature Map 3×3 [−6,14,12 / −2,9,5 / 4,7,3]
→ ReLU → [0,14,12 / 0,9,5 / 4,7,3]
→ MaxPool (2×2, S=1) → Final [[14,14],[9,9]].
The two negatives (−6 and −2) were zeroed by ReLU. Because the stride-1 pooling windows overlap, the strongest response (14, the edge response at FM[0,1]) falls inside both top-row windows, so max pooling copies it into both top cells.
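The output sizes used in this numerical (3×3 after the convolution, 2×2 after pooling) both follow the general rule O = ⌊(N − K + 2P)/S⌋ + 1. A minimal sketch, assuming a hypothetical helper named `out_size`; the `padding` argument is included for completeness, although this example uses P = 0 throughout:

```python
def out_size(n, k, stride=1, padding=0):
    """Output length along one axis: floor((n - k + 2*padding) / stride) + 1."""
    return (n - k + 2 * padding) // stride + 1

print(out_size(5, 3))             # 3x3 conv on a 5x5 input      -> 3
print(out_size(3, 2, stride=1))   # 2x2 pool, stride 1, on 3x3   -> 2
print(out_size(3, 2, stride=2))   # same pool with stride 2      -> 1
```

The last call shows what changes if the same pool is run with stride 2: the 3×3 map collapses to a single value instead of a 2×2 grid.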
Numerical 2: Different Kernel, Stride 2 Pool
Given: A 4×4 input image and a 3×3 sharpening kernel. No padding. Stride 1 conv, then 2×2 MaxPool with stride 2 (non-overlapping).
Input: [2,4,1,3 / 5,8,2,6 / 1,3,7,4 / 0,2,5,9]
Kernel: [0,−1,0 / −1,5,−1 / 0,−1,0]
① Step 1: Convolution (4 dot products on the 4×4 input)
| Position | Products, row by row | Row sums | Result |
|---|---|---|---|
| FM[0,0] | (0−4+0) + (−5+40−2) + (0−3+0) | −4 + 33 + (−3) | 26 |
| FM[0,1] | (0−1+0) + (−8+10−6) + (0−7+0) | −1 + (−4) + (−7) | −12 |
| FM[1,0] | (0−8+0) + (−1+15−7) + (0−2+0) | −8 + 7 + (−2) | −3 |
| FM[1,1] | (0−2+0) + (−3+35−4) + (0−5+0) | −2 + 28 + (−5) | 21 |
② Step 2: ReLU Activation
| Position | Conv Output | ReLU Rule | Result |
|---|---|---|---|
| [0,0] | 26 | max(0, 26) | 26 |
| [0,1] | −12 | max(0, −12) | 0 |
| [1,0] | −3 | max(0, −3) | 0 |
| [1,1] | 21 | max(0, 21) | 21 |
③ Step 3: Max Pooling (2×2 window, stride 2)
Output size: O = ⌊(2 − 2)/2⌋ + 1 = 1, a single scalar output. There is only one window, and it covers the entire 2×2 ReLU map: max(26, 0, 0, 21) = 26.
Input 4×4 → Conv (sharpening 3×3, S=1, P=0)
→ Feature Map 2×2 [26,−12 / −3,21]
→ ReLU → [26,0 / 0,21]
→ MaxPool (2×2, S=2) → single value 26.
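The sharpening kernel computes 5 × the centre pixel minus its four direct neighbours. It therefore responds strongly where the centre is bright relative to its neighbours (centres 8 and 7 give 26 and 21) and goes negative where the neighbours dominate (centres 2 and 3 give −12 and −3). ReLU removed the two negative responses. Max pool selected the strongest: 26.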
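The four sharpening responses can also be read as "5 × centre minus the four direct neighbours", since the kernel's corner weights are zero. A short sketch of that view, assuming a hypothetical helper `sharpen_at` written only for this illustration:

```python
import numpy as np

inp = np.array([[2, 4, 1, 3],
                [5, 8, 2, 6],
                [1, 3, 7, 4],
                [0, 2, 5, 9]])

def sharpen_at(x, r, c):
    """5 * centre minus the up, down, left and right neighbours."""
    neighbours = x[r - 1, c] + x[r + 1, c] + x[r, c - 1] + x[r, c + 1]
    return int(5 * x[r, c] - neighbours)

# With no padding, the kernel can only be centred on the inner 2x2 pixels.
fm = [[sharpen_at(inp, r, c) for c in (1, 2)] for r in (1, 2)]
print(fm)  # [[26, -12], [-3, 21]]
```

The centres 8 and 7 outweigh their neighbours (sum 14 in both cases), while the centres 2 and 3 are outweighed by neighbour sums of 22 and 18.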
Side-by-Side Pipeline Summary
| Stage | Numerical 1 (5×5 input) | Numerical 2 (4×4 input) |
|---|---|---|
| Input | 5×5 = 25 values | 4×4 = 16 values |
| Kernel | 3×3 vertical edge detector [1,0,−1 / 1,0,−1 / 1,0,−1] | 3×3 sharpening [0,−1,0 / −1,5,−1 / 0,−1,0] |
| After Conv | 3×3 feature map [−6,14,12 / −2,9,5 / 4,7,3] | 2×2 feature map [26,−12 / −3,21] |
| Negatives | 2 values (−6, −2) | 2 values (−12, −3) |
| After ReLU | [0,14,12 / 0,9,5 / 4,7,3] | [26,0 / 0,21] |
| Pool Config | 2×2, Stride 1 (overlapping) | 2×2, Stride 2 (non-overlapping) |
| Final Output | [[14,14],[9,9]] (2×2) | [[26]] (single scalar) |
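The difference between the two pool configurations in the table can be seen by running both strides on the same ReLU map. This is only a sketch: `max_pool_windows` is a hypothetical helper, and it assumes NumPy 1.20 or newer for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_pool_windows(x, pool=2, stride=1):
    """Max pooling built from windowed views (hypothetical helper)."""
    windows = sliding_window_view(x, (pool, pool))   # every stride-1 window
    # Keep every `stride`-th window, then take the max inside each one.
    return windows[::stride, ::stride].max(axis=(2, 3))

relu1 = np.array([[0, 14, 12],
                  [0,  9,  5],
                  [4,  7,  3]])   # ReLU map from Numerical 1
relu2 = np.array([[26, 0],
                  [0, 21]])       # ReLU map from Numerical 2

print(max_pool_windows(relu1, stride=1).tolist())  # [[14, 14], [9, 9]]
print(max_pool_windows(relu1, stride=2).tolist())  # [[14]]
print(max_pool_windows(relu2, stride=2).tolist())  # [[26]]
```

With stride 2 on the 3×3 map, only the top-left window fits, so the third row and third column never enter a window. With stride 1 every cell is visited, at the cost of the same maximum appearing in more than one output cell.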
Python: Verify Both Pipelines
```python
import numpy as np

# -- Convolution (cross-correlation, no kernel flip) --
def conv2d(x, k):
    """No padding, stride 1."""
    KH, KW = k.shape
    OH, OW = x.shape[0] - KH + 1, x.shape[1] - KW + 1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            # Element-wise product of the current patch with the kernel, then sum
            out[i, j] = np.sum(x[i:i + KH, j:j + KW] * k)
    return out

# -- ReLU --
def relu(x):
    return np.maximum(0, x)

# -- Max Pool --
def max_pool(x, pool=2, stride=2):
    OH = (x.shape[0] - pool) // stride + 1
    OW = (x.shape[1] - pool) // stride + 1
    out = np.zeros((OH, OW))
    for i in range(OH):
        for j in range(OW):
            out[i, j] = x[i * stride:i * stride + pool,
                          j * stride:j * stride + pool].max()
    return out

# ------------------------------------------------
# NUMERICAL 1: 5x5 input, vertical edge kernel
# ------------------------------------------------
inp1 = np.array([
    [1, 2, 3, 0, 1],
    [4, 5, 6, 1, 2],
    [7, 8, 9, 0, 3],
    [2, 1, 0, 4, 5],
    [6, 3, 2, 1, 0]
])
k1 = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])
fm1 = conv2d(inp1, k1)
r1 = relu(fm1)
p1 = max_pool(r1, pool=2, stride=1)
print("N1 Feature Map:\n", fm1)
print("N1 After ReLU:\n", r1)
print("N1 Max Pool output:\n", p1)

# ------------------------------------------------
# NUMERICAL 2: 4x4 input, sharpening kernel
# ------------------------------------------------
inp2 = np.array([
    [2, 4, 1, 3],
    [5, 8, 2, 6],
    [1, 3, 7, 4],
    [0, 2, 5, 9]
])
k2 = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
fm2 = conv2d(inp2, k2)
r2 = relu(fm2)
p2 = max_pool(r2, pool=2, stride=2)
print("N2 Feature Map:\n", fm2)
print("N2 After ReLU:\n", r2)
print("N2 Max Pool output:\n", p2)
```