Model Validation Explained


In machine learning, model validation is the process of checking how good your model is at making predictions on new, unseen data. The goal is simple: we want our model to perform well in the real world, not just on the data it was trained on.

Why Do We Need Model Validation?

A common mistake is evaluating a model using the same data it was trained on. This is called an "in-sample evaluation".

This can be misleading because the model may simply memorize the training data instead of learning real patterns.

Simple Example

Imagine your dataset shows that houses with green doors are expensive. The model may learn this pattern and assume all green-door houses are expensive.

  • This pattern may only exist in your training data
  • It may not be true in real-world data
  • So the model may make poor predictions when used in practice
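
The green-door trap can be reproduced in a few lines. The sketch below uses scikit-learn with entirely made-up numbers: the has_green_door feature and all prices are assumptions chosen for illustration, not real market data. It only shows how a model repeats a coincidence it saw during training.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up training data: one feature, has_green_door (1 = yes, 0 = no).
# By coincidence, every green-door house in this sample is expensive.
train_X = np.array([[1], [1], [1], [0], [0], [0]])
train_y = np.array([500, 520, 510, 200, 210, 190])  # prices in thousands

model = DecisionTreeRegressor(random_state=0)
model.fit(train_X, train_y)

# Made-up new houses: green doors, but ordinary prices.
new_X = np.array([[1], [1]])
new_y = np.array([205, 195])

new_predictions = model.predict(new_X)
print("Predicted:", new_predictions)  # the model repeats the green-door average it saw
print("Actual:   ", new_y)
```

The model predicts a high price for every green-door house because that is all it saw in training, even though the new houses are priced far lower.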

The Solution: Train–Validation Split

To fix this, we split the dataset into two parts:

Training Data: Used to build the model

Validation Data: Used to test the model on unseen data
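
As a rough sketch of what the split does, the snippet below runs scikit-learn's train_test_split on placeholder arrays (the 100-row X and y are stand-ins invented for this example). When no test_size is given, the function holds out 25% of the rows for validation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows, 3 feature columns, one target value per row.
X = np.arange(300).reshape(100, 3)
y = np.arange(100)

# With no test_size given, scikit-learn holds out 25% of the rows.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

print(train_X.shape, val_X.shape)  # 75 training rows, 25 validation rows

# No row appears in both sets, so the validation rows are unseen by the model.
print(set(train_y) & set(val_y))
```

Fixing random_state makes the split reproducible, so repeated runs compare models on the same validation rows.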

Measuring Accuracy: Mean Absolute Error (MAE)

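One common way to measure model performance is Mean Absolute Error (MAE). For each prediction, the error is the absolute difference between the actual and the predicted value:

error = |actual − predicted|

MAE is the average of these errors over all predictions.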

MAE tells us:

"On average, how far off are our predictions?"
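
A small worked example may help. The three actual and predicted values below are invented for illustration. The snippet computes MAE by hand and then checks it against scikit-learn's mean_absolute_error.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Invented values for illustration only.
actual = np.array([200, 150, 300])
predicted = np.array([210, 140, 340])

errors = np.abs(actual - predicted)  # per-prediction absolute errors: 10, 10, 40
mae_by_hand = errors.mean()          # (10 + 10 + 40) / 3 = 20
mae_sklearn = mean_absolute_error(actual, predicted)

print(mae_by_hand, mae_sklearn)
```

Both approaches give an MAE of 20: on average, these predictions miss the actual value by 20 units.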

Python Example


from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

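# Split data into training and validation sets
# (X is the feature table and y the target column; both are assumed to be loaded already)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# Create model (a fixed random_state keeps results reproducible)
model = DecisionTreeRegressor(random_state=0)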

# Train model
model.fit(train_X, train_y)

# Predict on validation data
val_predictions = model.predict(val_X)

# Calculate the mean absolute error on the validation data
mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE:", mae)
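
To see why in-sample evaluation is misleading, the same steps can be run on data where both errors are measured. The sketch below is self-contained and uses synthetic data (the formula for y and the noise level are assumptions made up for this example). It compares the MAE on the training rows with the MAE on the validation rows.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: the target depends on the first feature plus random noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3 * X[:, 0] + rng.normal(0, 2, size=200)

train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# An unrestricted tree can memorize every training row, noise included.
model = DecisionTreeRegressor(random_state=0)
model.fit(train_X, train_y)

train_mae = mean_absolute_error(train_y, model.predict(train_X))
val_mae = mean_absolute_error(val_y, model.predict(val_X))

print("In-sample MAE: ", train_mae)
print("Validation MAE:", val_mae)
```

The in-sample MAE comes out at essentially zero because the tree has memorized the training rows. The validation MAE is clearly larger and is the more honest estimate of real-world error.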
    

Summary

A model may perform extremely well on training data but very poorly on validation data.

When that happens, it is not reliable for real-world predictions. Keep these points in mind:
  • Always evaluate models on unseen data
  • Never trust training accuracy alone
  • Use validation data to estimate real-world performance
  • Lower MAE means better predictions
