1. What is Categorical Cross-Entropy (CCE)?
Categorical Cross-Entropy is used when the model predicts more than two classes (multi-class classification).
Examples:
- Image classification (cat, dog, horse)
- Digit recognition (0–9)
- News category classification
Formula:
Loss = - Σ yi log(pi)
Where:
- yi = 1 if i is the true class, 0 otherwise (one-hot encoded label)
- pi = predicted probability for class i
Example:

Actual label (one-hot):

| Cat | Dog | Horse |
|---|---|---|
| 0 | 1 | 0 |

Model prediction:

| Cat | Dog | Horse |
|---|---|---|
| 0.1 | 0.8 | 0.1 |

Loss = -log(0.8) ≈ 0.223, which is small because the correct class (Dog) receives a high probability.
Usually used with Softmax activation.
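The formula above can be sketched directly in NumPy (the function name and the epsilon clamp are ours, added to avoid log(0)):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """CCE for one sample: -sum(y_i * log(p_i)).
    y_true is one-hot; y_pred are softmax probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # clamp so log(0) never occurs
    return -np.sum(y_true * np.log(y_pred))

# Example from above: true class is Dog, model assigns it 0.8
y_true = np.array([0, 1, 0])          # Cat, Dog, Horse
y_pred = np.array([0.1, 0.8, 0.1])
print(categorical_cross_entropy(y_true, y_pred))  # ≈ 0.223, i.e. -log(0.8)
```

Because y_true is one-hot, only the term for the correct class survives the sum, so the loss reduces to -log(p_correct).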
2. What is Binary Cross-Entropy (BCE)?
Binary Cross-Entropy is a loss function used when the model predicts two classes only (binary classification).
Examples:
- Spam vs Not Spam
- Fraud vs Not Fraud
- Disease vs No Disease
Formula:
Loss = -[y log(p) + (1-y) log(1-p)]
Where:
- y = actual label (0 or 1)
- p = predicted probability
Explanation:
It measures how far the predicted probability is from the true binary label.
Example:
Actual = 1 (spam)
Model predicts = 0.9
Loss = -log(0.9) ≈ 0.105, which is small because the prediction is close to the true label.
Commonly used with Sigmoid activation.
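A minimal NumPy sketch of the BCE formula (the function name and epsilon clamp are ours):

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """BCE for one sample: -[y*log(p) + (1-y)*log(1-p)]."""
    p = np.clip(p, eps, 1.0 - eps)  # clamp so log(0) never occurs
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Example from above: actual label 1 (spam), model predicts 0.9
print(binary_cross_entropy(1, 0.9))  # ≈ 0.105 (confident and correct -> small loss)
print(binary_cross_entropy(1, 0.1))  # ≈ 2.303 (confident and wrong -> large loss)
```

Note how the loss grows sharply as a confident prediction moves away from the true label; this is what penalizes overconfident mistakes during training.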
3. Key Differences
| Feature | Binary Cross-Entropy | Categorical Cross-Entropy |
|---|---|---|
| Number of classes | 2 classes | 3 or more classes |
| Output neurons | 1 | Multiple (equal to number of classes) |
| Activation function | Sigmoid | Softmax |
| Label format | 0 or 1 | One-hot encoded |
| Example | Spam detection | Image classification |
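The activation-function row of the table can be illustrated with a short sketch: sigmoid maps one logit to one probability, while softmax maps one logit per class to a probability distribution (function names are ours):

```python
import numpy as np

def sigmoid(z):
    """One output neuron -> probability of the positive class."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """K output neurons -> K probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

# Binary head: a single logit
print(sigmoid(2.0))                         # one probability, ≈ 0.881
# Multi-class head: one logit per class (Cat, Dog, Horse)
print(softmax(np.array([2.0, 1.0, 0.1])))   # three probabilities summing to 1
```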
4. Summary
Binary Cross-Entropy is used for binary classification problems and works with sigmoid output.
Categorical Cross-Entropy is used for multi-class classification problems and typically works with softmax output where labels are one-hot encoded.
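In practice, training minimizes the loss averaged over a batch of samples rather than a single one; a vectorized sketch for the categorical case (names and shapes are ours):

```python
import numpy as np

def batch_cce(Y_true, Y_pred, eps=1e-12):
    """Mean categorical cross-entropy over a batch.
    Y_true: (N, K) one-hot labels; Y_pred: (N, K) softmax probabilities."""
    Y_pred = np.clip(Y_pred, eps, 1.0)               # avoid log(0)
    per_sample = -np.sum(Y_true * np.log(Y_pred), axis=1)  # one loss per row
    return np.mean(per_sample)

Y_true = np.array([[0, 1, 0], [1, 0, 0]])
Y_pred = np.array([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])
print(batch_cce(Y_true, Y_pred))  # mean of -log(0.8) and -log(0.7), ≈ 0.290
```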