We input a new fruit with Color = Red and Size = Small. Random Forest predicts its class (here, Apple) by sending it through multiple decision trees and taking a majority vote of their outputs.
How Random Forest Works Internally
How Random Forest Differs from General ML Models
- General ML: Learns a complex function from all training data (e.g., neural network weights, linear regression coefficients).
- Random Forest: Builds multiple simple decision trees on random subsets of data and features.
- General ML models often have continuous parameters and internal layers/tensors.
- Random Forest has no layers or tensors; its “learning” is simple rules at each node (splits) and aggregation by majority vote or averaging.
- Random Forest uses bootstrap sampling and random feature selection to reduce overfitting and improve generalization.
- Prediction in Random Forest = aggregate of simple tree predictions, while general ML models compute a complex mapping function.
1. Overview
Random Forest is an ensemble learning method that builds multiple decision trees and combines their outputs to improve predictive accuracy and reduce overfitting.
- For classification: output is the majority vote of the trees.
- For regression: output is the average of the trees’ predictions.
Mathematically, the forest's prediction is an aggregation of the individual trees' outputs (formalized in Section 3).
2. Decision Tree Basics
A single decision tree partitions the feature space recursively:
- Let the input vector be x = (x₁, x₂, ..., xₚ).
- Tree splits create regions Rₘ in feature space.
- Prediction in region Rₘ:
Regression: ŷ = (1 / Nₘ) Σi ∈ Rₘ yᵢ
Classification: ŷ = mode({yᵢ: i ∈ Rₘ})
Where Nₘ is the number of samples in region Rₘ.
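To make the two leaf rules concrete, here is a minimal Python sketch; the helper names region_mean and region_mode and the sample values are illustrative only, not from any library:

```python
from collections import Counter

def region_mean(y_values):
    """Regression leaf: average of the targets that fall in the region."""
    return sum(y_values) / len(y_values)

def region_mode(y_labels):
    """Classification leaf: most common class label in the region."""
    return Counter(y_labels).most_common(1)[0][0]

# Targets of the samples that landed in one region R_m
y_regression = [3.0, 5.0, 4.0]                    # numeric targets
y_classification = ["Apple", "Apple", "Orange"]   # class labels

print(region_mean(y_regression))        # 4.0
print(region_mode(y_classification))    # Apple
```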
3. Random Forest Prediction
Assume we have B trees {T₁, T₂, ..., T_B}:
Regression:
ŷ_RF(x) = (1 / B) Σb=1 to B T_b(x)
Classification:
ŷ_RF(x) = majority_vote { T₁(x), T₂(x), ..., T_B(x) }
Where T_b(x) is the prediction of the b-th tree.
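A small sketch of the two aggregation rules, applied to made-up outputs T_b(x) from B = 5 hypothetical trees for a single input x:

```python
from collections import Counter

# Hypothetical per-tree outputs T_b(x) for one input x
regression_outputs = [2.1, 2.4, 1.9, 2.2, 2.0]
classification_outputs = ["Apple", "Apple", "Orange", "Apple", "Orange"]

# Regression: average the tree predictions
y_rf_regression = sum(regression_outputs) / len(regression_outputs)

# Classification: majority vote over the tree predictions
y_rf_classification = Counter(classification_outputs).most_common(1)[0][0]

print(y_rf_regression)       # 2.12
print(y_rf_classification)   # Apple
```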
4. Bootstrap Aggregation (Bagging)
- Each tree is trained on a bootstrap sample (random sample with replacement).
- Reduces variance by decorrelating trees.
If the training set is D = { (xᵢ, yᵢ) }, i = 1..N, the bootstrap sample for tree b is
D_b = N points drawn uniformly, with replacement, from D
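A minimal NumPy sketch of drawing one bootstrap sample D_b; the toy data and the seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Toy training set D with N = 6 samples and 2 features
X = np.array([[0, 0], [0, 1], [1, 1], [1, 0], [0, 0], [1, 1]])
y = np.array(["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"])
N = len(X)

# Bootstrap sample for tree b: N indices drawn uniformly with replacement
idx = rng.integers(low=0, high=N, size=N)
X_b, y_b = X[idx], y[idx]

print(idx)              # some rows repeat, others are missing
print(np.unique(idx))   # the distinct rows tree b actually sees
```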
5. Random Feature Selection
At each split, instead of using all p features, a random subset of m ≪ p features is considered (commonly m ≈ √p for classification and m ≈ p/3 for regression):
- Reduces correlation between trees.
- Split selection:
Choose feature j* = argmax (Information Gain or Gini Reduction)
over the random subset of m features
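Below is one way this split search could look in code; the gini and best_split_from_subset helpers, and the simplification of trying a single median threshold per feature, are my own illustrative choices rather than any library's implementation:

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split_from_subset(X, y, m, rng):
    """Pick the best (feature, threshold) among m randomly chosen features."""
    n, p = X.shape
    features = rng.choice(p, size=m, replace=False)   # random subset of m features
    parent = gini(y)
    best = None
    for j in features:
        thr = np.median(X[:, j])                      # one candidate threshold per feature
        left, right = X[:, j] <= thr, X[:, j] > thr
        if left.sum() == 0 or right.sum() == 0:
            continue
        child = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n
        reduction = parent - child                    # Gini reduction of this split
        if best is None or reduction > best[0]:
            best = (reduction, j, thr)
    return best   # (gini_reduction, feature_index, threshold)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                 # toy data: 20 samples, p = 5 features
y = (X[:, 0] + X[:, 3] > 0).astype(int)      # labels depend on features 0 and 3
print(best_split_from_subset(X, y, m=2, rng=rng))
```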
6. Out-of-Bag (OOB) Error
- For each sample, some trees did not include it in their bootstrap set.
- OOB prediction for sample i:
ŷ_OOB,i = (1 / |B_i|) Σb ∈ B_i T_b(x_i)
Where B_i is the set of trees whose bootstrap sample did not include sample i. (For classification, the OOB prediction is the majority vote over the trees in B_i instead of the average.)
OOB error estimates generalization error without a separate validation set.
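A sketch of the OOB bookkeeping, written out by hand with scikit-learn decision trees as the base learners on a toy dataset; since this is classification, the OOB prediction is a majority vote rather than the average in the formula above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
N, B = len(X), 50

votes = np.zeros((N, 2))                   # OOB vote counts per sample and class
for b in range(B):
    idx = rng.integers(0, N, size=N)       # bootstrap sample for tree b
    oob = np.setdiff1d(np.arange(N), idx)  # samples tree b never saw
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=b)
    tree.fit(X[idx], y[idx])
    pred = tree.predict(X[oob])
    votes[oob, pred] += 1                  # record each tree's OOB vote

has_oob = votes.sum(axis=1) > 0            # samples that were OOB at least once
y_oob = votes[has_oob].argmax(axis=1)      # majority vote over the OOB trees
oob_error = np.mean(y_oob != y[has_oob])
print(f"OOB error estimate: {oob_error:.3f}")
```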
Random Forest = Bagging + Random Feature Selection
- Build B trees on bootstrap samples.
- At each split, select the best split from a random subset of features.
- Predict by averaging (regression) or majority vote (classification).
- OOB samples estimate generalization error.
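In practice all three ingredients are available off the shelf; below is a minimal scikit-learn sketch on synthetic data (the hyperparameter values are arbitrary) that maps onto the bullets above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,      # B trees, each grown on a bootstrap sample
    max_features="sqrt",   # random subset of features considered at each split
    oob_score=True,        # estimate generalization error from OOB samples
    random_state=0,
)
rf.fit(X, y)

print(rf.predict(X[:3]))   # majority vote over the 100 trees
print(rf.oob_score_)       # OOB accuracy (1 - OOB error)
```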
Random Forest Example with Dummy Dataset
Let’s break Random Forest down with a small, simple dataset for classification.
1. Dummy Dataset
Suppose we have a dataset of fruits with features Color and Size, and we want to predict if the fruit is Apple or Orange.
| Fruit | Color | Size | Target |
|---|---|---|---|
| 1 | Red | Small | Apple |
| 2 | Red | Large | Apple |
| 3 | Orange | Large | Orange |
| 4 | Orange | Small | Orange |
| 5 | Red | Small | Apple |
| 6 | Orange | Large | Orange |
Notes: Color → Red/Orange, Size → Small/Large, Target → Apple or Orange (every Red fruit in this toy data is an Apple).
2. Step 1: Bootstrap Sampling
Random Forest trains each tree on a random sample with replacement. Example Tree 1 bootstrap sample:
| Fruit | Color | Size | Target |
|---|---|---|---|
| 1 | Red | Small | Apple |
| 2 | Red | Large | Apple |
| 5 | Red | Small | Apple |
| 6 | Orange | Large | Orange |
| 3 | Orange | Large | Orange |
Some rows may be left out and some may repeat.
3. Step 2: Random Feature Selection
At each split, Random Forest considers only a random subset of the features instead of all of them.
- Suppose Tree 1 chooses Color first → split Red vs Orange.
- Next, Tree 1 might consider Size in each branch.
4. Step 3: Build the Tree
             Color?
            /      \
         Red        Orange
        /   \       /    \
    Small   Large Large   Small
    Apple   Apple Orange  Orange
Each leaf predicts the majority class of the training samples that reach it.
5. Step 4: Build More Trees
Random Forest builds multiple trees with different bootstrap samples and random features.
Example predictions for new fruit (Red, Small):
| Tree | Prediction |
|---|---|
| 1 | Apple |
| 2 | Apple |
| 3 | Orange |
6. Step 5: Aggregate Predictions
- Classification: majority vote → Apple.
- Regression: average of tree outputs.
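The fruit example can also be run end to end; the numeric encoding (Red = 0, Orange = 1, Small = 0, Large = 1) and the tiny forest of 3 trees are arbitrary illustrative choices, so the individual tree votes may differ from the table above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Encode the dummy dataset: Color (Red=0, Orange=1), Size (Small=0, Large=1)
X = np.array([[0, 0], [0, 1], [1, 1], [1, 0], [0, 0], [1, 1]])
y = np.array(["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"])

# A tiny forest: 3 trees, each on a bootstrap sample, 1 random feature per split
rf = RandomForestClassifier(n_estimators=3, max_features=1, random_state=0)
rf.fit(X, y)

new_fruit = np.array([[0, 0]])   # Color = Red, Size = Small

# Each tree's individual vote (sub-estimators return class indices)
votes = [rf.classes_[int(t.predict(new_fruit)[0])] for t in rf.estimators_]
print(votes)
print(rf.predict(new_fruit)[0])  # forest prediction by majority vote
```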
Another Example
Key Points
- Random Forest uses multiple decision trees.
- Each tree sees a different random sample.
- Each tree considers a random subset of features at each split.
- Predictions are aggregated: majority vote (classification) or average (regression).
- This reduces overfitting compared to a single tree.
Dataset
| Fruit | Color | Size | Target |
|---|---|---|---|
| 1 | Red | Small | Apple |
| 2 | Red | Large | Apple |
| 3 | Orange | Large | Orange |
| 4 | Orange | Small | Orange |
| 5 | Red | Small | Apple |
| 6 | Orange | Large | Orange |
We want to predict the fruit type based on Color and Size.
Step 1: Bootstrap Sampling
- Tree 1 sample: 1, 2, 5, 6, 3
- Tree 2 sample: 2, 3, 4, 5, 6
- Tree 3 sample: 1, 3, 3, 4, 5
Notice some fruits repeat and some are missing.
Step 2: Build Trees with Random Feature Selection
At each split, Random Forest chooses a random subset of features.
Tree 1
             Color?
            /      \
         Red        Orange
        /   \       /    \
    Small   Large Large   Small
    Apple   Apple Orange  Orange
Tree 2
              Size?
             /     \
        Small       Large
        /   \       /    \
     Red    Orange Red    Orange
    Apple   Orange Apple  Orange
Tree 3
             Color?
            /      \
      Orange        Red
       /    \       /   \
   Small   Large Small   Large
   Orange  Orange Apple  Apple
Step 3: Predict a New Sample
New fruit: Color = Red, Size = Small
- Tree 1 predicts: Apple
- Tree 2 predicts: Apple
- Tree 3 predicts: Apple
Majority vote → Apple
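To see where the three votes come from, the trees above can be transcribed directly into rule functions (these helpers are hand-written transcriptions of the diagrams, not output of any library):

```python
from collections import Counter

def tree_1(color, size):
    # Split on Color, then Size (both Size leaves agree in this toy data)
    if color == "Red":
        return {"Small": "Apple", "Large": "Apple"}[size]
    return {"Large": "Orange", "Small": "Orange"}[size]

def tree_2(color, size):
    # Split on Size, then Color
    if size == "Small":
        return {"Red": "Apple", "Orange": "Orange"}[color]
    return {"Red": "Apple", "Orange": "Orange"}[color]

def tree_3(color, size):
    # Split on Color (Orange branch drawn first), then Size
    if color == "Orange":
        return {"Small": "Orange", "Large": "Orange"}[size]
    return {"Small": "Apple", "Large": "Apple"}[size]

votes = [t("Red", "Small") for t in (tree_1, tree_2, tree_3)]
print(votes)                                  # ['Apple', 'Apple', 'Apple']
print(Counter(votes).most_common(1)[0][0])    # majority vote -> Apple
```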
Step 4: How Randomness Helps
- Bootstrap: Trees see different samples → reduces overfitting.
- Random features: Trees are less correlated → improves generalization.
Step 5: Aggregate Prediction
- Classification: majority vote → Apple
- Regression: average of tree outputs