Why Neural Networks Need Activation Functions
Without activation functions, a neural network can only learn linear relationships. A linear function is a straight line, for example:
y = 2x + 3
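As a minimal sketch, this line can be written as a one-argument Python function (the names here are illustrative):

```python
# The linear function y = 2x + 3: slope 2, intercept 3
def f(x):
    return 2 * x + 3

print(f(0))  # 3 (the intercept)
print(f(1))  # 5
```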
Stacking Linear Layers
If we have two layers without activation functions:
Layer 1: z₁ = w₁ * x + b₁

Layer 2: z₂ = w₂ * z₁ + b₂ = w₂ * (w₁ * x + b₁) + b₂

So: z₂ = (w₂ * w₁) * x + (w₂ * b₁ + b₂)
Notice that z₂ is still linear. Stacking more linear layers does not create curves or complex patterns.
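We can check this collapse numerically. The sketch below uses arbitrary example weights (w₁, b₁, w₂, b₂ are not from the text) and verifies that two stacked linear layers give the same output as the single equivalent linear layer derived above:

```python
# Example parameters (arbitrary values for illustration)
w1, b1 = 2.0, 1.0   # layer 1
w2, b2 = 3.0, -4.0  # layer 2

def two_layers(x):
    z1 = w1 * x + b1      # layer 1: z₁ = w₁x + b₁
    return w2 * z1 + b2   # layer 2: z₂ = w₂z₁ + b₂

# The equivalent single layer: w = w₂·w₁, b = w₂·b₁ + b₂
w, b = w2 * w1, w2 * b1 + b2

for x in [-1.0, 0.0, 2.5]:
    assert two_layers(x) == w * x + b
print("two stacked linear layers == one linear layer")
```

No matter how many linear layers we stack, the same collapse applies, so the network's expressive power never grows.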
Introducing Activation Functions
Activation functions are non-linear functions applied to each layer’s output, allowing the network to learn curves:
- Sigmoid: σ(x) = 1 / (1 + e⁻ˣ)
- ReLU: ReLU(x) = max(0, x)
- Tanh: tanh(x) = (eˣ - e⁻ˣ) / (eˣ + e⁻ˣ)
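The three activations above translate directly into Python using only the standard library; this is a plain sketch of the formulas, not an optimized implementation:

```python
import math

def sigmoid(x):
    # σ(x) = 1 / (1 + e⁻ˣ): squashes any input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # ReLU(x) = max(0, x): zero for negative inputs, identity otherwise
    return max(0.0, x)

def tanh(x):
    # tanh(x) = (eˣ - e⁻ˣ) / (eˣ + e⁻ˣ): squashes any input into (-1, 1)
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

print(sigmoid(0))   # 0.5
print(relu(-2.0))   # 0.0
print(tanh(0))      # 0.0
```

Applying any of these after a linear layer breaks the collapse shown earlier: the composed function is no longer a straight line.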
Summary
Layers without activation = stacking flat cardboard sheets → still flat.
Layers with activation = stacking flexible rubber sheets → can form curves and complex patterns.
Activation functions introduce non-linearity to neural networks, enabling them to approximate complex patterns that linear layers alone cannot capture.