Random Forest Optimization


Suppose we input a fruit with Color = Red and Size = Small. A Random Forest predicts its class (Apple or Orange) by sending it through multiple decision trees and taking a majority vote of their outputs.

How Random Forest Works Internally

[Diagram: three decision trees. Tree 1 splits on Color (Red → Apple, Orange → Orange); Tree 2 splits on Size (Small → Apple, Large → Orange); Tree 3 splits on Color (Red → Apple, Orange → Orange). Majority vote → Apple.]

How Random Forest Differs from General ML Models

  • General ML: Learns complex function from all training data (e.g., neural network weights, linear regression coefficients).
  • Random Forest: Builds multiple simple decision trees on random subsets of data and features.
  • General ML models often have continuous parameters and internal layers/tensors.
  • Random Forest has no layers or tensors; its “learning” is simple rules at each node (splits) and aggregation by majority vote or averaging.
  • Random Forest uses bootstrap sampling and random feature selection to reduce overfitting and improve generalization.
  • Prediction in Random Forest = aggregate of simple tree predictions, while general ML models compute a complex mapping function.

1. Overview

Random Forest is an ensemble learning method that builds multiple decision trees and combines their outputs to improve predictive accuracy and reduce overfitting.

  • For classification: output is the majority vote of the trees.
  • For regression: output is the average of the trees’ predictions.

Mathematically, the forest's prediction is an aggregation of the individual trees' outputs, as formalized in Section 3 below.
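As a quick illustration, here is a minimal sketch using scikit-learn's RandomForestClassifier on the fruit dataset from the worked example later in this article; the numeric feature encoding (Red = 0, Orange = 1; Small = 0, Large = 1) is our own choice for this sketch.

# Minimal Random Forest classification sketch (requires scikit-learn)
from sklearn.ensemble import RandomForestClassifier

# Fruit dataset: features are (Color, Size) with Red=0/Orange=1, Small=0/Large=1
X = [[0, 0], [0, 1], [1, 1], [1, 0], [0, 0], [1, 1]]
y = ["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"]

clf = RandomForestClassifier(n_estimators=10, random_state=42)
clf.fit(X, y)

# New fruit: Red, Small -> each tree votes, the majority wins
print(clf.predict([[0, 0]]))  # expected: ['Apple']

For regression, RandomForestRegressor is used the same way and returns the average of the trees' outputs instead of a majority vote.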

2. Decision Tree Basics

A single decision tree partitions the feature space recursively:

  • Let the input vector be x = (x₁, x₂, ..., xₚ).
  • Tree splits create regions Rₘ in feature space.
  • Prediction in region Rₘ:
Regression: ŷ = (1 / Nₘ) Σ_{i ∈ Rₘ} yᵢ
Classification: ŷ = mode({yᵢ : i ∈ Rₘ})

Where Nₘ is the number of samples in region Rₘ.
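To make the two region rules concrete, here is a small sketch with made-up target values for the samples falling in one region Rₘ:

import numpy as np
from statistics import mode

# Hypothetical samples that fall in one region R_m
y_regression = [1.0, 1.5, 1.2]                   # numeric targets
y_classification = ["Apple", "Apple", "Orange"]  # class labels

print(np.mean(y_regression))   # regression rule: (1 / N_m) * sum of y_i
print(mode(y_classification))  # classification rule: most frequent class -> 'Apple'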

3. Random Forest Prediction

Assume we have B trees {T₁, T₂, ..., T_B}:

Regression:
ŷ_RF(x) = (1 / B) Σ_{b=1}^{B} T_b(x)

Classification:
ŷ_RF(x) = majority_vote{ T₁(x), T₂(x), ..., T_B(x) }

Where T_b(x) is the prediction of the b-th tree.
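The aggregation itself is one line in either case; the sketch below applies both formulas to hypothetical per-tree outputs T_b(x) for B = 5 trees.

import numpy as np
from collections import Counter

# Hypothetical per-tree outputs T_b(x) for a single input x
regression_outputs = [2.1, 1.9, 2.3, 2.0, 2.2]
classification_outputs = ["Apple", "Apple", "Orange", "Apple", "Orange"]

# Regression: y_RF(x) = (1 / B) * sum over b of T_b(x)
print(np.mean(regression_outputs))  # 2.1

# Classification: majority vote over the B trees
print(Counter(classification_outputs).most_common(1)[0][0])  # 'Apple'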

4. Bootstrap Aggregation (Bagging)

  • Each tree is trained on a bootstrap sample (random sample with replacement).
  • Reduces variance by decorrelating trees.
If the training set is D = { (xᵢ, yᵢ) } for i = 1..N,
Bootstrap sample for tree b: D_b ~ uniform sample of size N, drawn with replacement from D
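Drawing a bootstrap sample is a one-liner in NumPy; the sketch below draws the row indices for a single tree (the seed is arbitrary).

import numpy as np

rng = np.random.default_rng(0)
N = 6  # training-set size

# N row indices drawn uniformly with replacement:
# some indices repeat, others never appear
bootstrap_indices = rng.choice(N, size=N, replace=True)
print(bootstrap_indices)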

5. Random Feature Selection

At each split, instead of searching all p features, only a random subset of m ≪ p features is considered (common defaults are m ≈ √p for classification and m ≈ p/3 for regression):

  • Reduces correlation between trees.
  • Split selection: choose feature j* = argmax (information gain or Gini reduction) over the random subset of m features.
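The sketch below implements this split rule for the binary-valued fruit features used later in this article; gini and best_split are our own illustrative helpers, not library functions.

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, feature_subset):
    # Among the m randomly chosen (binary) features, pick the one whose
    # split yields the largest Gini reduction
    best_j, best_gain = None, -1.0
    for j in feature_subset:
        left, right = y[X[:, j] == 0], y[X[:, j] == 1]
        if len(left) == 0 or len(right) == 0:
            continue  # split puts every sample on one side; skip it
        child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        gain = gini(y) - child
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j, best_gain

X = np.array([[0, 0], [0, 1], [1, 1], [1, 0], [0, 0], [1, 1]])  # Color, Size
y = np.array(["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"])

rng = np.random.default_rng(1)
m = 1  # size of the random feature subset (m << p; here p = 2)
subset = rng.choice(X.shape[1], size=m, replace=False)
print(subset, best_split(X, y, subset))

With m = 1 the tree is forced to consider a single randomly chosen feature at this split, which is exactly what decorrelates the trees across the forest.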

6. Out-of-Bag (OOB) Error

  • For each sample, some trees did not include it in their bootstrap set.
  • OOB prediction for sample i (regression form; for classification, take a majority vote over the same trees):
ŷ_OOB,i = (1 / |B_i|) Σ_{b ∈ B_i} T_b(xᵢ)

Where B_i is the set of trees whose bootstrap samples did not include sample i.

OOB error estimates generalization error without a separate validation set.
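scikit-learn computes this automatically when oob_score=True is set; a minimal sketch on the fruit data:

from sklearn.ensemble import RandomForestClassifier

X = [[0, 0], [0, 1], [1, 1], [1, 0], [0, 0], [1, 1]]
y = ["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"]

# oob_score=True evaluates each sample only on trees that did not see it
clf = RandomForestClassifier(n_estimators=50, oob_score=True, random_state=0)
clf.fit(X, y)
print(clf.oob_score_)  # OOB accuracy: a generalization estimate with no validation set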

Random Forest = Bagging + Random Feature Selection

  1. Build B trees on bootstrap samples.
  2. At each split, select the best split from a random subset of features.
  3. Predict by averaging (regression) or majority vote (classification).
  4. OOB samples estimate generalization error.
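Putting these four steps together, here is a compact from-scratch sketch that bags scikit-learn decision trees by hand; max_features="sqrt" supplies the per-split random feature subset, and the data encoding matches the earlier sketches.

import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 1], [1, 0], [0, 0], [1, 1]])
y = np.array(["Apple", "Apple", "Orange", "Orange", "Apple", "Orange"])

B, N = 10, len(X)
trees = []
for b in range(B):
    idx = rng.choice(N, size=N, replace=True)           # step 1: bootstrap sample
    tree = DecisionTreeClassifier(max_features="sqrt",  # step 2: random feature subset per split
                                  random_state=b)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step 3: aggregate the trees' votes for a new fruit (Red, Small)
votes = [tree.predict([[0, 0]])[0] for tree in trees]
print(Counter(votes).most_common(1)[0][0])  # majority vote -> 'Apple'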

Random Forest Example with Dummy Dataset

Let’s break Random Forest down with a small, simple dataset for classification.

1. Dummy Dataset

Suppose we have a dataset of fruits with features Color and Size, and we want to predict if the fruit is Apple or Orange.

Fruit   Color    Size
1       Red      Small
2       Red      Large
3       Orange   Large
4       Orange   Small
5       Red      Small
6       Orange   Large

Notes: Color ∈ {Red, Orange}; Size ∈ {Small, Large}; the target is Apple (for Red fruits) or Orange (for Orange fruits).

2. Step 1: Bootstrap Sampling

Random Forest trains each tree on a random sample with replacement. Example Tree 1 bootstrap sample:

Fruit   Color    Size
1       Red      Small
2       Red      Large
5       Red      Small
6       Orange   Large
3       Orange   Large

Because sampling is with replacement, some rows are repeated while others are left out entirely.
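A bootstrap draw like the table above can be reproduced with pandas (the seed is arbitrary):

import pandas as pd

fruits = pd.DataFrame({
    "Color": ["Red", "Red", "Orange", "Orange", "Red", "Orange"],
    "Size":  ["Small", "Large", "Large", "Small", "Small", "Large"],
}, index=[1, 2, 3, 4, 5, 6])

# One bootstrap sample: same size as the dataset, drawn with replacement
print(fruits.sample(n=len(fruits), replace=True, random_state=1))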

3. Step 2: Random Feature Selection

At each split, Random Forest considers only a random subset of the features rather than all of them, then picks the best split within that subset.

  • Suppose Tree 1 chooses Color first → split Red vs Orange.
  • Next, Tree 1 might consider Size in each branch.

4. Step 3: Build the Tree

         Color?
       /       \
     Red       Orange
    /   \       /    \
  Small Large Large Small
  Apple Apple Orange Orange

Leaves give predicted class based on majority vote.

5. Step 4: Build More Trees

Random Forest builds multiple trees with different bootstrap samples and random features.

Example predictions for new fruit (Red, Small):

Tree   Prediction
1      Apple
2      Apple
3      Orange

6. Step 5: Aggregate Predictions

  • Classification: majority vote → Apple
  • Regression: average of tree outputs.

Another Example

Key Points

  1. Random Forest uses multiple decision trees.
  2. Each tree sees a different random sample.
  3. Each tree considers a random subset of features at each split.
  4. Predictions are aggregated: majority vote (classification) or average (regression).
  5. This reduces overfitting compared to a single tree.

Dataset

Fruit   Color    Size     Target
1       Red      Small    Apple
2       Red      Large    Apple
3       Orange   Large    Orange
4       Orange   Small    Orange
5       Red      Small    Apple
6       Orange   Large    Orange

We want to predict the fruit type based on Color and Size.

Step 1: Bootstrap Sampling

  • Tree 1 sample: 1, 2, 5, 6, 3
  • Tree 2 sample: 2, 3, 4, 5, 6
  • Tree 3 sample: 1, 3, 3, 4, 5

Notice some fruits repeat and some are missing.

Step 2: Build Trees with Random Feature Selection

At each split, Random Forest chooses a random subset of features.

Tree 1

         Color?
       /       \
     Red       Orange
    /   \       /    \
  Small Large Large Small
 Apple  Apple Orange Orange

Tree 2

         Size?
       /       \
     Small     Large
    /   \       /    \
  Red Orange Red  Orange
 Apple Orange Apple Orange

Tree 3

         Color?
       /       \
     Orange     Red
    /    \      /   \
 Small Large Small Large
 Orange Orange Apple Apple

Step 3: Predict a New Sample

New fruit: Color = Red, Size = Small

  • Tree 1 predicts: Apple
  • Tree 2 predicts: Apple
  • Tree 3 predicts: Apple

Majority vote → Apple

Step 4: How Randomness Helps

  • Bootstrap: Trees see different samples → reduces overfitting.
  • Random features: Trees are less correlated → improves generalization.

Step 5: Aggregate Prediction

  • Classification: majority vote → Apple
  • Regression: average of tree outputs
