Random Forest Optimization


Suppose we input a fruit with Color = Red and Size = Small. A Random Forest predicts its class (Apple or Orange) by sending it through multiple decision trees and taking a majority vote of their outputs.

How Random Forest Works Internally

[Diagram: three decision trees. Tree 1 splits on Color (Red → Apple, Orange → Orange); Tree 2 splits on Size (Small → Apple, Large → Orange); Tree 3 splits on Color (Red → Apple, Orange → Orange). Majority vote → Apple.]

How Random Forest Differs from General ML Models

  • General ML: Learns complex function from all training data (e.g., neural network weights, linear regression coefficients).
  • Random Forest: Builds multiple simple decision trees on random subsets of data and features.
  • General ML models often have continuous parameters and internal layers/tensors.
  • Random Forest has no layers or tensors; its “learning” is simple rules at each node (splits) and aggregation by majority vote or averaging.
  • Random Forest uses bootstrap sampling and random feature selection to reduce overfitting and improve generalization.
  • Prediction in Random Forest = aggregate of simple tree predictions, while general ML models compute a complex mapping function.

1. Overview

Random Forest is an ensemble learning method that builds multiple decision trees and combines their outputs to improve predictive accuracy and reduce overfitting.

  • For classification: output is the majority vote of the trees.
  • For regression: output is the average of the trees’ predictions.

Mathematically, the forest's prediction is an aggregation of the individual trees' outputs, as formalized in Section 3 below.
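As a quick illustration, here is a minimal sketch using scikit-learn's RandomForestClassifier on the fruit dataset from the worked example later in this article; the numeric feature encoding (Red = 0, Orange = 1; Small = 0, Large = 1) is our own choice for this sketch.

# Minimal Random Forest classification sketch (requires scikit-learn)
from sklearn.ensemble import RandomForestClassifier

# Fruit dataset: features are (Color, Size) with Red=0/Orange=1, Small=0/Large=1
X = [[0, 0], [0, 1], [1, 1], [1, 0], [0, 0], [1, 1]]
y = ["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"]

clf = RandomForestClassifier(n_estimators=10, random_state=42)
clf.fit(X, y)

# New fruit: Red, Small -> each tree votes, the majority wins
print(clf.predict([[0, 0]]))  # expected: ['Apple']

For regression, RandomForestRegressor is used the same way and returns the average of the trees' outputs instead of a majority vote.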

2. Decision Tree Basics

A single decision tree partitions the feature space recursively:

  • Let the input vector be x = (x₁, x₂, ..., xₚ).
  • Tree splits create regions Rₘ in feature space.
  • Prediction in region Rₘ:
Regression: ŷ = (1 / Nₘ) Σ_{i ∈ Rₘ} yᵢ
Classification: ŷ = mode({yᵢ : i ∈ Rₘ})

Where Nₘ is the number of samples in region Rₘ.
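To make the two region rules concrete, here is a small sketch with made-up target values for the samples falling in one region Rₘ:

import numpy as np
from statistics import mode

# Hypothetical samples that fall in one region R_m
y_regression = [1.0, 1.5, 1.2]                   # numeric targets
y_classification = ["Apple", "Apple", "Orange"]  # class labels

print(np.mean(y_regression))   # regression rule: (1 / N_m) * sum of y_i
print(mode(y_classification))  # classification rule: most frequent class -> 'Apple'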

3. Random Forest Prediction

Assume we have B trees {T₁, T₂, ..., T_B}:

Regression:
ŷ_RF(x) = (1 / B) Σ_{b=1}^{B} T_b(x)

Classification:
ŷ_RF(x) = majority_vote{ T₁(x), T₂(x), ..., T_B(x) }

Where T_b(x) is the prediction of the b-th tree.
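The aggregation itself is one line in either case; the sketch below applies both formulas to hypothetical per-tree outputs T_b(x) for B = 5 trees.

import numpy as np
from collections import Counter

# Hypothetical per-tree outputs T_b(x) for a single input x
regression_outputs = [2.1, 1.9, 2.3, 2.0, 2.2]
classification_outputs = ["Apple", "Apple", "Orange", "Apple", "Orange"]

# Regression: y_RF(x) = (1 / B) * sum over b of T_b(x)
print(np.mean(regression_outputs))  # 2.1

# Classification: majority vote over the B trees
print(Counter(classification_outputs).most_common(1)[0][0])  # 'Apple'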

4. Bootstrap Aggregation (Bagging)

  • Each tree is trained on a bootstrap sample (random sample with replacement).
  • Reduces variance by decorrelating trees.
If the training set is D = { (xᵢ, yᵢ) } for i = 1..N,
Bootstrap sample for tree b: D_b ~ uniform sample of size N, drawn with replacement from D
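Drawing a bootstrap sample is a one-liner in NumPy; the sketch below draws the row indices for a single tree (the seed is arbitrary).

import numpy as np

rng = np.random.default_rng(0)
N = 6  # training-set size

# N row indices drawn uniformly with replacement:
# some indices repeat, others never appear
bootstrap_indices = rng.choice(N, size=N, replace=True)
print(bootstrap_indices)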

5. Random Feature Selection

At each split, instead of searching all p features, only a random subset of m ≪ p features is considered (common defaults are m ≈ √p for classification and m ≈ p/3 for regression):

  • Reduces correlation between trees.
  • Split selection: choose feature j* = argmax (information gain or Gini reduction) over the random subset of m features.
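The sketch below implements this split rule for the binary-valued fruit features used later in this article; gini and best_split are our own illustrative helpers, not library functions.

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, feature_subset):
    # Among the m randomly chosen (binary) features, pick the one whose
    # split yields the largest Gini reduction
    best_j, best_gain = None, -1.0
    for j in feature_subset:
        left, right = y[X[:, j] == 0], y[X[:, j] == 1]
        if len(left) == 0 or len(right) == 0:
            continue  # split puts every sample on one side; skip it
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        gain = gini(y) - child
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j, best_gain

X = np.array([[0, 0], [0, 1], [1, 1], [1, 0], [0, 0], [1, 1]])  # Color, Size
y = np.array(["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"])

rng = np.random.default_rng(1)
m = 1  # size of the random feature subset (m << p; here p = 2)
subset = rng.choice(X.shape[1], size=m, replace=False)
print(subset, best_split(X, y, subset))

With m = 1 the tree is forced to consider a single randomly chosen feature at this split, which is exactly what decorrelates the trees across the forest.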

6. Out-of-Bag (OOB) Error

  • For each sample, some trees did not include it in their bootstrap set.
  • OOB prediction for sample i (regression form; for classification, take a majority vote over the same trees):
ŷ_OOB,i = (1 / |B_i|) Σ_{b ∈ B_i} T_b(xᵢ)

Where B_i is the set of trees whose bootstrap samples did not include sample i.

OOB error estimates generalization error without a separate validation set.
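scikit-learn computes this automatically when oob_score=True is set; a minimal sketch on the fruit data:

from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [0, 1], [1, 1], [1, 0], [0, 0], [1, 1]]
y = ["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"]

# oob_score=True evaluates each sample only on trees that did not see it
clf = RandomForestClassifier(n_estimators=50, oob_score=True, random_state=0)
clf.fit(X, y)
print(clf.oob_score_)  # OOB accuracy: a generalization estimate with no validation set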

Random Forest = Bagging + Random Feature Selection

  1. Build B trees on bootstrap samples.
  2. At each split, select the best split from a random subset of features.
  3. Predict by averaging (regression) or majority vote (classification).
  4. OOB samples estimate generalization error.
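Putting these four steps together, here is a compact from-scratch sketch that bags scikit-learn decision trees by hand; max_features="sqrt" supplies the per-split random feature subset, and the data encoding matches the earlier sketches.

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 1], [1, 0], [0, 0], [1, 1]])
y = np.array(["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"])

B, N = 10, len(X)
trees = []
for b in range(B):
    idx = rng.choice(N, size=N, replace=True)           # step 1: bootstrap sample
    tree = DecisionTreeClassifier(max_features="sqrt",  # step 2: random feature subset per split
                                  random_state=b)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step 3: aggregate the trees' votes for a new fruit (Red, Small)
votes = [tree.predict([[0, 0]])[0] for tree in trees]
print(Counter(votes).most_common(1)[0][0])  # majority vote -> 'Apple'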

Random Forest Example with Dummy Dataset

Let’s break Random Forest down with a small, simple dataset for classification.

1. Dummy Dataset

Suppose we have a dataset of fruits with features Color and Size, and we want to predict if the fruit is Apple or Orange.

Fruit   Color    Size
1       Red      Small
2       Red      Large
3       Orange   Large
4       Orange   Small
5       Red      Small
6       Orange   Large

Notes: Color ∈ {Red, Orange}; Size ∈ {Small, Large}; the target is Apple (for Red fruits) or Orange (for Orange fruits).

2. Step 1: Bootstrap Sampling

Random Forest trains each tree on a random sample with replacement. Example Tree 1 bootstrap sample:

Fruit   Color    Size
1       Red      Small
2       Red      Large
5       Red      Small
6       Orange   Large
3       Orange   Large

Because sampling is with replacement, some rows are repeated while others are left out entirely.
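A bootstrap draw like the table above can be reproduced with pandas (the seed is arbitrary):

import pandas as pd

fruits = pd.DataFrame({
    "Color": ["Red", "Red", "Orange", "Orange", "Red", "Orange"],
    "Size":  ["Small", "Large", "Large", "Small", "Small", "Large"],
}, index=[1, 2, 3, 4, 5, 6])

# One bootstrap sample: same size as the dataset, drawn with replacement
print(fruits.sample(n=len(fruits), replace=True, random_state=1))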

3. Step 2: Random Feature Selection

At each split, Random Forest considers only a random subset of the features rather than all of them, then picks the best split within that subset.

  • Suppose Tree 1 chooses Color first → split Red vs Orange.
  • Next, Tree 1 might consider Size in each branch.

4. Step 3: Build the Tree

         Color?
       /       \
     Red       Orange
    /   \       /    \
  Small Large Large Small
  Apple Apple Orange Orange

Leaves give predicted class based on majority vote.

5. Step 4: Build More Trees

Random Forest builds multiple trees with different bootstrap samples and random features.

Example predictions for new fruit (Red, Small):

Tree   Prediction
1      Apple
2      Apple
3      Orange

6. Step 5: Aggregate Predictions

  • Classification: majority vote → Apple
  • Regression: average of tree outputs.

Another Example

Key Points

  1. Random Forest uses multiple decision trees.
  2. Each tree sees a different random sample.
  3. Each tree considers a random subset of features at each split.
  4. Predictions are aggregated: majority vote (classification) or average (regression).
  5. This reduces overfitting compared to a single tree.

Dataset

Fruit   Color    Size     Target
1       Red      Small    Apple
2       Red      Large    Apple
3       Orange   Large    Orange
4       Orange   Small    Orange
5       Red      Small    Apple
6       Orange   Large    Orange

We want to predict the fruit type based on Color and Size.

Step 1: Bootstrap Sampling

  • Tree 1 sample: 1, 2, 5, 6, 3
  • Tree 2 sample: 2, 3, 4, 5, 6
  • Tree 3 sample: 1, 3, 3, 4, 5

Notice some fruits repeat and some are missing.

Step 2: Build Trees with Random Feature Selection

At each split, Random Forest chooses a random subset of features.

Tree 1

         Color?
       /       \
     Red       Orange
    /   \       /    \
  Small Large Large Small
 Apple  Apple Orange Orange

Tree 2

         Size?
       /       \
     Small     Large
    /   \       /    \
  Red Orange Red  Orange
 Apple Orange Apple Orange

Tree 3

         Color?
       /       \
     Orange     Red
    /    \      /   \
 Small Large Small Large
 Orange Orange Apple Apple

Step 3: Predict a New Sample

New fruit: Color = Red, Size = Small

  • Tree 1 predicts: Apple
  • Tree 2 predicts: Apple
  • Tree 3 predicts: Apple

Majority vote → Apple

Step 4: How Randomness Helps

  • Bootstrap: Trees see different samples → reduces overfitting.
  • Random features: Trees are less correlated → improves generalization.

Step 5: Aggregate Prediction

  • Classification: majority vote → Apple
  • Regression: average of tree outputs
