Skip to main content

Model Validation Explained


Model Validation Explained

In machine learning, model validation is the process of checking how good your model is at making predictions on new, unseen data. The goal is simple: we want our model to perform well in the real world, not just on the data it was trained on.

Why Do We Need Model Validation?

A common mistake is evaluating a model using the same data it was trained on. This is called an "in-sample evaluation".

This can be misleading because the model may simply memorize the training data instead of learning real patterns.

Simple Example

Imagine your dataset shows that houses with green doors are expensive. The model may learn this pattern and assume all green-door houses are expensive.

  • This pattern may only exist in your training data
  • It may not be true in real-world data
  • So the model will fail when used in practice

The Solution: Train–Validation Split

To fix this, we split the dataset into two parts:

Training Data: Used to build the model

Validation Data: Used to test the model on unseen data

Measuring Accuracy: Mean Absolute Error (MAE)

One common way to measure model performance is Mean Absolute Error (MAE).

error = |actual − predicted|

MAE tells us:

"On average, how far off are our predictions?"

Python Example


from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Split data into training and validation sets
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# Create model
model = DecisionTreeRegressor()

# Train model
model.fit(train_X, train_y)

# Predict on validation data
val_predictions = model.predict(val_X)

# Calculate error
mae = mean_absolute_error(val_y, val_predictions)
print(mae)
    

Summary

A model may perform extremely well on training data but perform very poorly on validation data.

This means it is not reliable for real-world predictions.
  • Always evaluate models on unseen data
  • Never trust training accuracy alone
  • Use validation data to estimate real-world performance
  • Lower MAE means better predictions

People are good at skipping over material they already know!

View Related Topics to







Contact Us

Name

Email *

Message *

Popular Posts

How to use MATLAB Simulink

Introduction to MATLAB Simulink MATLAB Simulink is a popular add-on of MATLAB. Here, you can use different blocks like modulator, demodulator, AWGN channel, etc. And you can do experiments on your own. Steps to Get Started 1. Go to the 'Simulink' tab at the top navbar of MATLAB. If not found, click on the add-on tab, search 'Simulink,' and then click on it to add. 2. Once you installed the simulation, click the 'new' tap at the top left corner. 3. Then, search the required blocks in the 'Simulink library.' Then, drag it to the editor space. 4. You can double-click on the blocks to see the input parameters. 5. Then, connect the blocks by dragging a line from one block's output terminal to another block's input. 6. If the connection is complete, click the 'run' tab in the middle of the top navbar. 7. After clicking on the run ...

BER vs SNR for M-ary QAM, M-ary PSK, QPSK, BPSK, ...(MATLAB Code + Simulator)

Bit Error Rate (BER) & SNR Guide Analyze communication system performance with our interactive simulators and MATLAB tools. 📘 Theory 🧮 Simulators 💻 MATLAB Code 📚 Resources BER Definition SNR Formula BER Calculator MATLAB Comparison 📂 Explore M-ary QAM, PSK, and QPSK Topics ▼ 🧮 Constellation Simulator: M-ary QAM 🧮 Constellation Simulator: M-ary PSK 🧮 BER calculation for ASK, FSK, and PSK 🧮 Approaches to BER vs SNR What is Bit Error Rate (BER)? The BER indicates how many corrupted bits are received compared to the total number of bits sent. It is the primary figure of merit for a...

Theoretical vs. simulated BER vs. SNR for ASK, FSK, and PSK (MATLAB Code + Simulator)

📘 Overview 🧮 Simulator 💻 Theoretical Code 📊 Simulated Code 📚 Resources Overview BER vs. SNR denotes how many bits in error are received for a given signal-to-noise ratio, typically measured in dB. Common noise types in wireless systems: 🚀 1. Additive White Gaussian Noise (AWGN) 🌊 2. Rayleigh Fading AWGN adds random noise; Rayleigh fading attenuates the signal variably. A good SNR helps reduce these effects. Bit Error Rate (BER) Equations BER formulas for ASK, FSK, and PSK modulation schemes. ASK BER = 0.5 × erfc(0.5 × √SNR) FSK BER = 0.5 × erfc(√(SNR / 2)) PSK BER = 0.5 × erfc(√SNR) erfc / Q-function (Click here) Live BER S...

Rayleigh vs Rician Fading (with MATLAB + Simulator)

  In Rayleigh fading , the channel coefficients tend to have a Rayleigh distribution, which is characterized by a random phase and magnitude with an exponential distribution. This means the magnitude of the channel coefficient follows an exponential distribution with a mean of 1. In Rician fading , there is a dominant line-of-sight component in addition to the scattered components. The channel coefficients in Rician fading can indeed tend towards 1, especially when the line-of-sight component is strong. When the line-of-sight component dominates, the Rician fading channel behaves more deterministically, and the channel coefficients may tend towards the value of the line-of-sight component, which could be close to 1.   MATLAB Script clc; clear all; close all; % Define parameters numSamples = 1000; % Number of samples K_factor = 5; % K-factor for Rician fading SNR_dB = 20; % Signal-to-noise ratio (in dB) % Generate complex Gaussian random variable for Rayleigh fading channel h_r...

Constellation Diagrams of ASK, PSK, and FSK (with MATLAB Code + Simulator)

Constellation Diagrams: ASK, FSK, and PSK Comprehensive guide to signal space representation, including interactive simulators and MATLAB implementations. 📘 Overview 🧮 Simulator ⚖️ Theory 📚 Resources Definitions Constellation Tool Key Points MATLAB Code 📂 Other Topics: M-ary PSK & QAM Diagrams ▼ 🧮 Simulator for M-ary PSK Constellation 🧮 Simulator for M-ary QAM Constellation BASK (Binary ASK) Modulation Transmits one of two signals: 0 or -√Eb, where Eb​ is the energy per bit. These signals represent binary 0 and 1. BFSK (Binary FSK) Modulation Transmits on...

UGC NET Electronic Science Previous Year Question Papers

Home / Engineering & Other Exams / UGC NET 2022 PYQ 📥 Download UGC NET Electronics PDFs Complete collection of previous year question papers, answer keys and explanations for Subject Code 88. Start Downloading UGC-NET (Electronics Science, Subject code: 88) Subject_Code : 88; Department : Electronic Science; 📂 View All Question Papers Q. UGC Net Electronic Science Question Paper [June 2025] A. UGC Net Electronic Science Question Paper With Answer Key Download Pdf [June 2025] with full explanation Q. UGC Net Electronic Science Question Paper [December 2024] A. UGC Net Electronic Science Question Paper With Answer Key Download Pdf [December 2024] Q. UGC Net Electronic Science Question Paper [Aug 2024] A. UGC Net Electronic Scien...

UGC-NET Electronic Science Question Paper With Answer Key and Full Explanation [Dec 2023]

    UGC-NET Electronic Science Question Paper With Answer Key Download Pdf [Dec 2023] Download Question Paper               See Answers   2025 | 2024 | 2023 | 2022 | 2021 | 2020 UGC-NET Electronic Science  2023 Answers with Explanations 51. (A): The stacking fault is the most common area defect found in silicon. These faults typically occur along the 111 plane. In the crystalline structure of silicon, atoms are arranged in a specific pattern known as a diamond lattice. A stacking fault refers to a disruption in the normal order of atomic layers within this lattice, which usually occurs in the 111 plane due to the geometric arrangement of the atoms. This type of defect can affect the electrical and mechanical properties of the material, such as the mobility of charge carriers and mechanical strength. 52. (C): The important figure of merit for the microwave application of a Schot...

OFDM Waveform with MATLAB Code (with Simulator)

  In OFDM (Orthogonal Frequency Division Multiplexing) , we transmit multiple orthogonal subcarriers simultaneously. Since the subcarriers are orthogonal , they do not interfere with each other, which is one of the main advantages of OFDM. Practically, OFDM converts a wideband signal into multiple narrowband orthogonal subcarriers. For typical wireless communication, if the signal bandwidth (or symbol duration) exceeds the coherence bandwidth of the channel, the signal experiences frequency-selective fading . Fading distorts the signal, making it difficult to recover the original information. By using OFDM, we transmit the same wideband signal across multiple orthogonal narrowband subcarriers, reducing the effect of fading. For example, if we want to transmit a signal of bandwidth 1024 kHz , we can divide it into N = 8 subcarriers . Each subcarrier is then spaced by: Δf = Total Bandwidth N = 1024 8 kHz...