Softmax and Neural Network Classification
1. Softmax Function: Definition
The softmax function converts a vector of real numbers into a probability distribution.
For a vector z = [z1, z2, ..., zn], softmax is defined as:
softmax(z_i) = e^(z_i) / Σ_j e^(z_j)
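The definition can be checked directly with a minimal pure-Python sketch (the max-subtraction is a standard numerical-stability trick, not part of the mathematical definition):

```python
import math

def softmax(z):
    # Exponentiate each element, then normalize so the outputs sum to 1.
    # Subtracting max(z) first avoids overflow for large inputs and does
    # not change the result.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # three probabilities, largest for the input 3.0
print(sum(probs))  # 1.0 (up to floating-point rounding)
```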
2. Properties of Softmax
(a) Monotonicity
- If an input increases, its softmax probability increases.
- If a > b, then softmax(a) > softmax(b).
(b) Not Scale Invariant
Softmax does not preserve input ratios.
Example, for inputs [1.0, 2.0]:
- Input ratio: 1 / 2 = 0.5
- Softmax output ratio: e^1 / e^2 = 1/e ≈ 0.3679
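The ratio claim can be verified with a small pure-Python sketch:

```python
import math

def softmax(z):
    # Plain softmax: exponentiate, then normalize to sum to 1.
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

p = softmax([1.0, 2.0])
# The input ratio is 1/2 = 0.5, but the output ratio is
# e^1 / e^2 = 1/e ≈ 0.3679 — softmax does not preserve ratios.
print(p[0] / p[1])
```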
(c) Outputs Sum to 1
The output of softmax is always a probability distribution.
3. Softmax in PyTorch
PyTorch implements softmax as:
softmax = nn.Softmax(dim=1)
The dim argument specifies the dimension along which softmax is applied.
For batched data, dim=1 applies softmax to each row (each sample).
x = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0]])
softmax(x)
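Run end to end, this snippet can also be used to contrast dim=1 with dim=0 (a self-contained sketch, including the imports the fragment above assumes):

```python
import torch
import torch.nn as nn

softmax = nn.Softmax(dim=1)

x = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0]])

out = softmax(x)
print(out)             # each row is a probability distribution
print(out.sum(dim=1))  # tensor([1., 1.])

# dim=0 would instead normalize down each column; here the two rows
# are identical, so every entry becomes 0.5:
print(nn.Softmax(dim=0)(x))
```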
4. Using Softmax in a Neural Network
A simple binary classifier (e.g., airplane vs bird):
model = nn.Sequential(
    nn.Linear(3072, 512),  # 3072 input features -> 512 hidden units
    nn.Tanh(),
    nn.Linear(512, 2),     # 2 output classes
    nn.Softmax(dim=1)
)
Softmax at the end converts the model's raw outputs (logits) into class probabilities.
5. Preparing Image Data
CIFAR-10 images have shape 3 × 32 × 32 (channels × height × width), i.e. 3 · 32 · 32 = 3072 values when flattened.
Neural networks expect:
- Flattened input vectors
- A batch dimension
To prepare the image:
img_batch = img.view(-1).unsqueeze(0)
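A quick shape check, using a random tensor as a stand-in for a real CIFAR-10 image:

```python
import torch

# A fake CIFAR-10 image: 3 channels × 32 × 32 pixels (random values
# as a stand-in for real pixel data).
img = torch.randn(3, 32, 32)

# Flatten to a 3072-element vector, then add a batch dimension in front.
img_batch = img.view(-1).unsqueeze(0)
print(img.shape)        # torch.Size([3, 32, 32])
print(img_batch.shape)  # torch.Size([1, 3072])
```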
6. Running an Untrained Model
out = model(img_batch)
Example output:
tensor([[0.4784, 0.5216]])
These are probabilities but meaningless until the model is trained.
7. Class Predictions
The predicted class is the index of the highest probability.
_, index = torch.max(out, dim=1)
Index 0 might be "airplane"; index 1 might be "bird". This mapping is learned through training labels.
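Applied to the example output above, torch.max returns both the maximum probability and its index along dim=1:

```python
import torch

out = torch.tensor([[0.4784, 0.5216]])  # the example output from above

# torch.max along dim=1 returns (max value, index of max) per row.
values, index = torch.max(out, dim=1)
print(values)  # tensor([0.5216])
print(index)   # tensor([1])  -> predicted class 1 ("bird" in this example)
```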
Neural networks often become overconfident after training. Bayesian neural networks can estimate uncertainty better.
Summary
- Softmax converts logits into probabilities.
- It is monotonic but not scale-invariant.
- Use nn.Softmax(dim=1) for batched PyTorch data.
- Images must be flattened and batched before feeding into a model.
- The highest softmax value gives the class prediction.
- Training assigns meaning to output units (class labels).