Softmax and Neural Network Classification
1. Softmax Function: Definition
The softmax function converts a vector of real numbers into a probability distribution.
For a vector z = [z1, z2, ..., zn], softmax is defined as:
softmax(z_i) = e^(z_i) / Σ_j e^(z_j)
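The definition can be checked directly with a minimal pure-Python sketch (the max-subtraction is a standard numerical-stability trick, not part of the mathematical definition):

```python
import math

def softmax(z):
    # Exponentiate each element, then normalize so the outputs sum to 1.
    # Subtracting max(z) first avoids overflow for large inputs and does
    # not change the result.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # three probabilities, largest for the input 3.0
print(sum(probs))  # 1.0 (up to floating-point rounding)
```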
2. Properties of Softmax
(a) Monotonicity
- If an input increases, its softmax probability increases.
- If a > b, then softmax(a) > softmax(b).
(b) Not Scale Invariant
Softmax does not preserve input ratios.
Example, for inputs [1.0, 2.0]:
- Input ratio: 1 / 2 = 0.5
- Softmax output ratio: e^1 / e^2 = 1/e ≈ 0.3679
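The ratio claim can be verified with a small pure-Python sketch:

```python
import math

def softmax(z):
    # Plain softmax: exponentiate, then normalize to sum to 1.
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

p = softmax([1.0, 2.0])
# The input ratio is 1/2 = 0.5, but the output ratio is
# e^1 / e^2 = 1/e ≈ 0.3679 — softmax does not preserve ratios.
print(p[0] / p[1])
```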
(c) Outputs Sum to 1
The output of softmax is always a probability distribution.
3. Softmax in PyTorch
PyTorch implements softmax as:
softmax = nn.Softmax(dim=1)
The dim argument specifies the dimension along which softmax is applied.
For batched data, dim=1 applies softmax to each row (each sample).
x = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0]])
softmax(x)
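Run end to end, this snippet can also be used to contrast dim=1 with dim=0 (a self-contained sketch, including the imports the fragment above assumes):

```python
import torch
import torch.nn as nn

softmax = nn.Softmax(dim=1)

x = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0]])

out = softmax(x)
print(out)             # each row is a probability distribution
print(out.sum(dim=1))  # tensor([1., 1.])

# dim=0 would instead normalize down each column; here the two rows
# are identical, so every entry becomes 0.5:
print(nn.Softmax(dim=0)(x))
```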
4. Using Softmax in a Neural Network
A simple binary classifier (e.g., airplane vs bird):
model = nn.Sequential(
    nn.Linear(3072, 512),  # 3072 input features -> 512 hidden units
    nn.Tanh(),
    nn.Linear(512, 2),     # 2 output classes
    nn.Softmax(dim=1)
)
Softmax at the end converts the model's raw outputs (logits) into class probabilities.
5. Preparing Image Data
CIFAR-10 images have shape 3 × 32 × 32 (channels × height × width), i.e. 3 · 32 · 32 = 3072 values when flattened.
Neural networks expect:
- Flattened input vectors
- A batch dimension
To prepare the image:
img_batch = img.view(-1).unsqueeze(0)
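A quick shape check, using a random tensor as a stand-in for a real CIFAR-10 image:

```python
import torch

# A fake CIFAR-10 image: 3 channels × 32 × 32 pixels (random values
# as a stand-in for real pixel data).
img = torch.randn(3, 32, 32)

# Flatten to a 3072-element vector, then add a batch dimension in front.
img_batch = img.view(-1).unsqueeze(0)
print(img.shape)        # torch.Size([3, 32, 32])
print(img_batch.shape)  # torch.Size([1, 3072])
```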
6. Running an Untrained Model
out = model(img_batch)
Example output:
tensor([[0.4784, 0.5216]])
These are probabilities but meaningless until the model is trained.
7. Class Predictions
The predicted class is the index of the highest probability.
_, index = torch.max(out, dim=1)
Index 0 might be "airplane"; index 1 might be "bird". This mapping is learned through training labels.
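Applied to the example output above, torch.max returns both the maximum probability and its index along dim=1:

```python
import torch

out = torch.tensor([[0.4784, 0.5216]])  # the example output from above

# torch.max along dim=1 returns (max value, index of max) per row.
values, index = torch.max(out, dim=1)
print(values)  # tensor([0.5216])
print(index)   # tensor([1])  -> predicted class 1 ("bird" in this example)
```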
Neural networks often become overconfident after training. Bayesian neural networks can estimate uncertainty better.
Summary
- Softmax converts logits into probabilities.
- It is monotonic but not scale-invariant.
- Use nn.Softmax(dim=1) for batched PyTorch data.
- Images must be flattened and batched before feeding into a model.
- The highest softmax value gives the class prediction.
- Training assigns meaning to output units (class labels).