
Random Forest Optimization


We input a fruit with Color = Red and Size = Small. The Random Forest predicts its class (here, Apple) by sending it through multiple decision trees and taking a majority vote.

How Random Forest Works Internally

[Diagram: Random Forest decision trees] Tree 1 splits on Color (Red → Apple, Orange → Orange), Tree 2 splits on Size (Small → Apple, Large → Orange), Tree 3 splits on Color (Red → Apple, Orange → Orange). Majority vote → Apple.

How Random Forest Differs from General ML Models

  • General ML: Learns complex function from all training data (e.g., neural network weights, linear regression coefficients).
  • Random Forest: Builds multiple simple decision trees on random subsets of data and features.
  • General ML models often have continuous parameters and internal layers/tensors.
  • Random Forest has no layers or tensors; its “learning” is simple rules at each node (splits) and aggregation by majority vote or averaging.
  • Random Forest uses bootstrap sampling and random feature selection to reduce overfitting and improve generalization.
  • Prediction in Random Forest = aggregate of simple tree predictions, while general ML models compute a complex mapping function.

1. Overview

Random Forest is an ensemble learning method that builds multiple decision trees and combines their outputs to improve predictive accuracy and reduce overfitting.

  • For classification: output is the majority vote of the trees.
  • For regression: output is the average of the trees’ predictions.

Mathematically, the forest's prediction is an aggregate (average or majority vote) of the individual tree outputs; the exact formulas are given in Section 3 below.

2. Decision Tree Basics

A single decision tree partitions the feature space recursively:

  • Let the input vector be x = (x₁, x₂, ..., xₚ)
  • Tree splits create regions Rₘ in the feature space.
  • Prediction in region Rₘ:
Regression: ŷ = (1 / Nₘ) Σ_{i ∈ Rₘ} yᵢ
Classification: ŷ = mode({yᵢ : i ∈ Rₘ})

Where Nₘ is the number of training samples in region Rₘ.
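These leaf-region rules fit in a few lines of Python; the targets below are made-up values for one hypothetical region Rₘ, purely for illustration:

```python
from statistics import mean, mode

# Hypothetical targets of the training samples that fall in one region R_m.
y_in_region = [2.0, 3.0, 4.0]                    # regression targets
labels_in_region = ["Apple", "Apple", "Orange"]  # classification labels

y_hat_reg = mean(y_in_region)        # ŷ = (1 / N_m) Σ y_i over the region
y_hat_cls = mode(labels_in_region)   # ŷ = most common label in the region
```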

3. Random Forest Prediction

Assume we have B trees {T₁, T₂, ..., T_B}:

Regression:
ŷ_RF(x) = (1 / B) Σ_{b=1}^{B} T_b(x)

Classification:
ŷ_RF(x) = majority_vote{ T₁(x), T₂(x), ..., T_B(x) }

Where T_b(x) is the prediction of the b-th tree.
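Both aggregation rules can be sketched directly in Python; the "trees" here are stand-in constant functions rather than trained models, used only to show the aggregation itself:

```python
from collections import Counter

def rf_predict_regression(trees, x):
    # ŷ_RF(x) = (1 / B) Σ_b T_b(x): average the B tree outputs
    return sum(t(x) for t in trees) / len(trees)

def rf_predict_classification(trees, x):
    # ŷ_RF(x) = majority vote over T_1(x), ..., T_B(x)
    votes = [t(x) for t in trees]
    return Counter(votes).most_common(1)[0][0]

# Stand-in "trees": constant functions, for illustration only.
trees_cls = [lambda x: "Apple", lambda x: "Apple", lambda x: "Orange"]
trees_reg = [lambda x: 2.0, lambda x: 3.0, lambda x: 4.0]

pred_cls = rf_predict_classification(trees_cls, None)
pred_reg = rf_predict_regression(trees_reg, None)
```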

4. Bootstrap Aggregation (Bagging)

  • Each tree is trained on a bootstrap sample (random sample with replacement).
  • Reduces variance by decorrelating trees.
If the training set is D = {(xᵢ, yᵢ)} for i = 1..N,
the bootstrap sample for tree b is D_b ~ uniform sample with replacement from D.
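The bootstrap step is a one-liner in Python; the rows of D below are illustrative fruit samples, not real data:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Training set D = {(x_i, y_i)}: (features, label) pairs.
D = [(("Red", "Small"), "Apple"),
     (("Red", "Large"), "Apple"),
     (("Orange", "Large"), "Orange"),
     (("Orange", "Small"), "Orange")]
N = len(D)

# Bootstrap sample for one tree: N uniform draws *with replacement* from D.
D_b = [D[random.randrange(N)] for _ in range(N)]
```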

5. Random Feature Selection

At each split, instead of using all p features, a random subset of m << p features is considered:

  • Reduces correlation between trees.
  • Split selection:
Choose feature j* = argmax (Information Gain or Gini Reduction) 
from random subset of m features
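A minimal sketch of this split search, assuming categorical features and Gini reduction as the score (the rows are the fruit examples used later in this article; with only p = 2 features, m = 2 means both are candidates):

```python
import random
from collections import Counter

def gini(labels):
    # Gini impurity: 1 - Σ p_k² over the class proportions p_k
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, m, seed=0):
    # Score only a random subset of m of the p features, as Random Forest does.
    random.seed(seed)
    p = len(rows[0])
    candidates = random.sample(range(p), m)
    parent = gini(labels)
    best_j, best_gain = None, -1.0
    for j in candidates:
        # Group labels by the value of feature j, then compute the
        # weighted child impurity and the resulting Gini reduction.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[j], []).append(label)
        child = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        gain = parent - child
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j, best_gain

# Fruit rows: (Color, Size). Color separates the classes perfectly,
# so it should win with a Gini reduction of 0.5 - 0.0 = 0.5.
rows = [("Red", "Small"), ("Red", "Large"), ("Orange", "Large"), ("Orange", "Small")]
labels = ["Apple", "Apple", "Orange", "Orange"]
j_star, gain = best_split(rows, labels, m=2)
```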

6. Out-of-Bag (OOB) Error

  • For each sample, some trees did not include it in their bootstrap set.
  • OOB prediction for sample i:
ŷ_OOB,i = (1 / |B_i|) Σ_{b ∈ B_i} T_b(xᵢ)

Where B_i is the set of trees whose bootstrap sample did not include sample i (for classification, the average is replaced by a majority vote over those trees).

OOB error estimates generalization error without a separate validation set.
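The OOB bookkeeping can be sketched as follows. To keep the example short, each "tree" is reduced to a one-rule stump on Color (an assumption for illustration, not the real CART procedure); since Color determines the class exactly in this toy data, the OOB error comes out as zero:

```python
import random
from collections import Counter

random.seed(1)
X = [("Red", "Small"), ("Red", "Large"), ("Orange", "Large"),
     ("Orange", "Small"), ("Red", "Small"), ("Orange", "Large")]
y = ["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"]
N, B = len(X), 25

def fit_stump(sample_idx):
    # One-rule "tree": majority label per Color value in its bootstrap sample.
    by_color = {}
    for i in sample_idx:
        by_color.setdefault(X[i][0], []).append(y[i])
    return {c: Counter(ys).most_common(1)[0][0] for c, ys in by_color.items()}

trees, in_bag = [], []
for b in range(B):
    idx = [random.randrange(N) for _ in range(N)]   # bootstrap sample
    trees.append(fit_stump(idx))
    in_bag.append(set(idx))

# OOB prediction for sample i: vote only among trees that never saw i.
errors, counted = 0, 0
for i in range(N):
    votes = [t[X[i][0]] for t, bag in zip(trees, in_bag)
             if i not in bag and X[i][0] in t]
    if votes:
        counted += 1
        if Counter(votes).most_common(1)[0][0] != y[i]:
            errors += 1

oob_error = errors / counted if counted else 0.0
```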

Random Forest = Bagging + Random Feature Selection

  1. Build B trees on bootstrap samples.
  2. At each split, select the best split from a random subset of features.
  3. Predict by averaging (regression) or majority vote (classification).
  4. OOB samples estimate generalization error.
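Putting the four steps together, here is a deliberately tiny from-scratch sketch. The "trees" are depth-1 stumps that split on one randomly chosen feature (so m = 1 of p = 2), which is enough to show bagging, random feature selection, and voting, but is not a full CART implementation:

```python
import random
from collections import Counter

random.seed(0)

# Toy fruit data: rows are (Color, Size); Red fruits are Apples.
X = [("Red", "Small"), ("Red", "Large"), ("Orange", "Large"),
     ("Orange", "Small"), ("Red", "Small"), ("Orange", "Large")]
y = ["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"]
N, B = len(X), 25          # B trees in the forest

def fit_tree(idx):
    # Step 2: consider a random subset of features (here just 1 of the 2).
    j = random.randrange(2)
    branch = {}
    for i in idx:
        branch.setdefault(X[i][j], []).append(y[i])
    leaves = {v: Counter(ys).most_common(1)[0][0] for v, ys in branch.items()}
    default = Counter(y[i] for i in idx).most_common(1)[0][0]
    return lambda x: leaves.get(x[j], default)

# Step 1: each tree gets its own bootstrap sample.
forest = []
for b in range(B):
    idx = [random.randrange(N) for _ in range(N)]
    forest.append(fit_tree(idx))

# Step 3: predict by majority vote across the forest.
def predict(x):
    return Counter(t(x) for t in forest).most_common(1)[0][0]

pred = predict(("Red", "Small"))
```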

Random Forest Example with Dummy Dataset

Let’s break Random Forest down with a small, simple dataset for classification.

1. Dummy Dataset

Suppose we have a dataset of fruits with features Color and Size, and we want to predict if the fruit is Apple or Orange.

Fruit   Color    Size
1       Red      Small
2       Red      Large
3       Orange   Large
4       Orange   Small
5       Red      Small
6       Orange   Large

Notes: Color is Red or Orange, Size is Small or Large, and the target class is Apple (for Red fruits) or Orange (for Orange fruits).
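For reference in the later steps, the same dummy dataset can be written as plain Python lists:

```python
# The six fruits above as (Color, Size, Target) tuples.
colors  = ["Red", "Red", "Orange", "Orange", "Red", "Orange"]
sizes   = ["Small", "Large", "Large", "Small", "Small", "Large"]
targets = ["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"]
dataset = list(zip(colors, sizes, targets))
```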

2. Step 1: Bootstrap Sampling

Random Forest trains each tree on a random sample with replacement. Example Tree 1 bootstrap sample:

Fruit   Color    Size
1       Red      Small
2       Red      Large
5       Red      Small
6       Orange   Large
3       Orange   Large

Because sampling is with replacement, some rows repeat and others are left out entirely.

3. Step 2: Random Feature Selection

At each split, Random Forest considers only a random subset of the features instead of evaluating all of them.

  • Suppose Tree 1 chooses Color first → split Red vs Orange.
  • Next, Tree 1 might consider Size in each branch.

4. Step 3: Build the Tree

         Color?
       /       \
     Red       Orange
    /   \       /    \
  Small Large Large Small
Apple Apple Orange Orange

Leaves give predicted class based on majority vote.

5. Step 4: Build More Trees

Random Forest builds multiple trees with different bootstrap samples and random features.

Example predictions for new fruit (Red, Small):

Tree   Prediction
1      Apple
2      Apple
3      Orange

6. Step 5: Aggregate Predictions

  • Classification: majority vote → Apple
  • Regression: average of tree outputs.

Another Example

Key Points

  1. Random Forest uses multiple decision trees.
  2. Each tree sees a different random sample.
  3. Each tree considers a random subset of features at each split.
  4. Predictions are aggregated: majority vote (classification) or average (regression).
  5. This reduces overfitting compared to a single tree.

Dataset

Fruit   Color    Size    Target
1       Red      Small   Apple
2       Red      Large   Apple
3       Orange   Large   Orange
4       Orange   Small   Orange
5       Red      Small   Apple
6       Orange   Large   Orange

We want to predict the fruit type based on Color and Size.

Step 1: Bootstrap Sampling

  • Tree 1 sample: 1, 2, 5, 6, 3
  • Tree 2 sample: 2, 3, 4, 5, 6
  • Tree 3 sample: 1, 3, 3, 4, 5

Notice some fruits repeat and some are missing.

Step 2: Build Trees with Random Feature Selection

At each split, Random Forest chooses a random subset of features.

Tree 1

         Color?
       /       \
     Red       Orange
    /   \       /    \
  Small Large Large Small
 Apple  Apple Orange Orange

Tree 2

         Size?
       /       \
     Small     Large
    /   \       /    \
  Red Orange Red  Orange
 Apple Orange Apple Orange

Tree 3

         Color?
       /       \
     Orange     Red
    /    \      /   \
 Small Large Small Large
 Orange Orange Apple Apple

Step 3: Predict a New Sample

New fruit: Color = Red, Size = Small

  • Tree 1 predicts: Apple
  • Tree 2 predicts: Apple
  • Tree 3 predicts: Apple

Majority vote → Apple
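The three hand-built trees above can be transcribed directly as Python functions to verify the vote:

```python
from collections import Counter

def tree1(color, size):
    # Splits on Color; both Red leaves say Apple, both Orange leaves say Orange.
    return "Apple" if color == "Red" else "Orange"

def tree2(color, size):
    # Splits on Size first; each Size branch then splits on Color.
    if size == "Small":
        return "Apple" if color == "Red" else "Orange"
    return "Apple" if color == "Red" else "Orange"

def tree3(color, size):
    # Splits on Color with the Orange branch drawn first; same leaf labels.
    return "Orange" if color == "Orange" else "Apple"

votes = [t("Red", "Small") for t in (tree1, tree2, tree3)]
prediction = Counter(votes).most_common(1)[0][0]
```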

Step 4: How Randomness Helps

  • Bootstrap: Trees see different samples → reduces overfitting.
  • Random features: Trees are less correlated → improves generalization.

Step 5: Aggregate Prediction

  • Classification: majority vote → Apple
  • Regression: average of tree outputs
