Model Validation Explained
In machine learning, model validation is the process of checking how well your model makes predictions on new, unseen data. The goal is simple: we want the model to perform well in the real world, not just on the data it was trained on.
Why Do We Need Model Validation?
A common mistake is evaluating a model using the same data it was trained on. This is called an "in-sample evaluation". Because the model has already seen those examples, its score will be overly optimistic: it may simply have memorized patterns in the training data rather than learned patterns that generalize.
Simple Example
Imagine your dataset shows that houses with green doors are expensive. The model may learn this pattern and assume all green-door houses are expensive.
- This pattern may only exist in your training data
- It may not be true in real-world data
- So the model will fail when used in practice
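The gap described above can be shown with a small synthetic sketch (the data, the noise level, and the model settings here are made up purely for illustration). An unrestricted decision tree memorizes its training rows, so its in-sample error is essentially zero, yet its error on fresh rows drawn from the same process is far larger:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Training data: a noisy linear relationship (illustrative only)
X = rng.uniform(0, 10, size=(100, 1))
y = X.ravel() + rng.normal(0, 2, size=100)

# An unrestricted tree can memorize every training row
model = DecisionTreeRegressor()
model.fit(X, y)

# In-sample error: essentially zero, because the tree memorized the data
train_mae = np.mean(np.abs(y - model.predict(X)))

# Fresh data from the same process: the error is much larger
X_new = rng.uniform(0, 10, size=(100, 1))
y_new = X_new.ravel() + rng.normal(0, 2, size=100)
new_mae = np.mean(np.abs(y_new - model.predict(X_new)))

print(train_mae, new_mae)
```

The perfect in-sample score tells us nothing useful; only the error on unseen rows reflects real-world performance.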
The Solution: Train–Validation Split
To fix this, we split the dataset into two parts:
- Training Data: Used to build the model
- Validation Data: Used to test the model on unseen data
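As a quick sketch of the split itself (the tiny arrays here are made up): scikit-learn's train_test_split holds out 25% of the rows for validation by default, and random_state makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 rows of features (illustrative)
y = np.arange(10)                 # 10 matching target values

# By default, 25% of the rows go to the validation set
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

print(len(train_X), len(val_X))  # 7 training rows, 3 validation rows
```

Passing a different test_size (e.g. test_size=0.2) changes the proportion if a 75/25 split does not suit your dataset.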
Measuring Accuracy: Mean Absolute Error (MAE)
One common way to measure model performance is Mean Absolute Error (MAE).
MAE tells us:
"On average, how far off are our predictions?"
Python Example
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor
# X holds the features and y the target; they are assumed to be loaded already
# Split data into training and validation sets
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
# Create model (random_state makes the result reproducible)
model = DecisionTreeRegressor(random_state=1)
# Train model on the training data only
model.fit(train_X, train_y)
# Predict on validation data the model has never seen
val_predictions = model.predict(val_X)
# Calculate error
mae = mean_absolute_error(val_y, val_predictions)
print(mae)
Summary
A model that scores well only on its training data may simply have memorized it. This means it is not reliable for real-world predictions.
- Always evaluate models on unseen data
- Never trust training accuracy alone
- Use validation data to estimate real-world performance
- Lower MAE means better predictions