K-Nearest Neighbors (KNN) Algorithm: Simple Math and Example
Let's break down the mathematical concept behind the KNN algorithm and walk through two simple examples: one for classification and one for regression.
1. Mathematical Concept
KNN classifies or predicts a new data point by finding the K closest points to it in the feature space, measured with a distance metric such as Euclidean distance. The basic procedure (illustrated in code right after the list below) is:
- Step 1: Choose a number K (the number of nearest neighbors).
- Step 2: Calculate the distance between the new data point and every point in the training dataset.
- Step 3: Sort the distances and pick the K smallest distances.
- Step 4: For classification, assign the class label based on the majority of the K neighbors.
- Step 5: For regression, calculate the average value of the K neighbors and assign it as the prediction.
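In practice these steps are rarely coded by hand; a library such as scikit-learn wraps them behind a fit/predict interface. The sketch below is only an illustration of that workflow and assumes scikit-learn is installed; the small arrays are made up for demonstration.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Classification: Steps 1-4 (choose K, measure distances, pick K nearest, majority vote)
X_train = [[1, 2], [2, 3], [6, 6], [7, 8]]   # made-up 2D feature vectors
y_labels = ["Red", "Red", "Blue", "Blue"]
clf = KNeighborsClassifier(n_neighbors=3)    # K = 3; Euclidean distance by default
clf.fit(X_train, y_labels)
print(clf.predict([[5, 5]]))                 # label decided by majority vote

# Regression: Step 5 (average the K nearest neighbors' target values)
X_vals = [[1], [2], [3], [5], [9]]           # made-up 1D inputs
y_vals = [2, 3, 4, 7, 10]
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_vals, y_vals)
print(reg.predict([[4]]))                    # mean of the 3 nearest Y values
```

The from-scratch snippets later in this section show the same computation carried out by hand.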
2. Distance Metric: Euclidean Distance
For classification and regression, KNN typically uses the Euclidean distance to measure how "far" a point is from others.
The Euclidean distance between two points (x₁, y₁) and (x₂, y₂) in 2D space is given by:
d = √((x₁ - x₂)² + (y₁ - y₂)²)
For higher-dimensional spaces (e.g., 3D, 4D), it’s generalized as:
d = √((x₁ - x₂)² + (y₁ - y₂)² + (z₁ - z₂)² + ...)
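As a minimal sketch in plain Python (the helper name `euclidean_distance` is just an illustrative choice):

```python
import math

def euclidean_distance(p, q):
    # Sum the squared differences over every coordinate, then take the square root;
    # this works in 2D, 3D, or any number of dimensions.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((1, 2), (4, 6)))         # sqrt(3² + 4²) = 5.0
print(euclidean_distance((1, 2, 0), (4, 6, 12)))  # sqrt(9 + 16 + 144) = 13.0
```

From Python 3.8 onward, the standard library's `math.dist` computes the same quantity.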
3. Example of KNN for Classification (2D Space)
Imagine you have a simple 2D dataset with two classes: Red and Blue. Your goal is to classify a new point based on its K nearest neighbors.
Step-by-Step Example:
Dataset:
We have a set of 6 points in 2D space with labels:
| X₁ | X₂ | Label |
|---|---|---|
| 1 | 2 | Red |
| 2 | 3 | Red |
| 3 | 3 | Blue |
| 6 | 6 | Blue |
| 7 | 8 | Blue |
| 8 | 8 | Red |
New Point:
Let's classify the new point P = (5, 5).
Distance Calculation (using Euclidean distance):
d(P, (1, 2)) = √((5 - 1)² + (5 - 2)²) = √(16 + 9) = √25 = 5
d(P, (2, 3)) = √((5 - 2)² + (5 - 3)²) = √(9 + 4) = √13 ≈ 3.61
d(P, (3, 3)) = √((5 - 3)² + (5 - 3)²) = √(4 + 4) = √8 ≈ 2.83
d(P, (6, 6)) = √((5 - 6)² + (5 - 6)²) = √(1 + 1) = √2 ≈ 1.41
d(P, (7, 8)) = √((5 - 7)² + (5 - 8)²) = √(4 + 9) = √13 ≈ 3.61
d(P, (8, 8)) = √((5 - 8)² + (5 - 8)²) = √(9 + 9) = √18 ≈ 4.24
Find K Nearest Neighbors:
Let’s choose K = 3. The 3 closest points to P = (5, 5) are:
- (6, 6) with distance 1.41 (Label: Blue)
- (3, 3) with distance 2.83 (Label: Blue)
- (2, 3) with distance 3.61 (Label: Red)
Note that (7, 8) is also at distance 3.61 (Label: Blue); the two are tied for third place. Whichever one we keep, the majority vote below still comes out Blue, so the tie does not change the result.
Majority Voting:
Among the 3 nearest neighbors, 2 are Blue and 1 is Red. According to majority voting, we classify the point P = (5, 5) as Blue.
Final Classification:
The new point P = (5, 5) is classified as Blue based on the majority label of its K nearest neighbors.
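For readers who want to verify the numbers, here is a small self-contained Python sketch of the same calculation (variable names are illustrative; `math.dist` from the standard library, Python 3.8+, computes the Euclidean distance):

```python
import math
from collections import Counter

points = [((1, 2), "Red"), ((2, 3), "Red"), ((3, 3), "Blue"),
          ((6, 6), "Blue"), ((7, 8), "Blue"), ((8, 8), "Red")]
P = (5, 5)
K = 3

# Step 2: Euclidean distance from P to every training point
distances = [(math.dist(P, xy), label) for xy, label in points]

# Step 3: sort by distance and keep the K smallest.
# Python's sort is stable, so the tie between (2, 3) and (7, 8) at ≈3.61
# is broken by dataset order, matching the choice made above.
nearest = sorted(distances, key=lambda d: d[0])[:K]
print(nearest)  # three nearest: ≈1.41 Blue, ≈2.83 Blue, ≈3.61 Red

# Step 4: majority vote among the K labels
votes = Counter(label for _, label in nearest)
print(votes.most_common(1)[0][0])  # Blue
```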
4. Example of KNN for Regression (1D Space)
Let’s now use KNN for regression to predict a value instead of a class label.
Step-by-Step Example:
Dataset:
We have a dataset of 5 data points:
| X | Y |
|---|---|
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 5 | 7 |
| 6 | 8 |
New Point:
The goal is to predict the value of Y for a new point X = 4.
Distance Calculation:
d(4, 1) = |4 - 1| = 3
d(4, 2) = |4 - 2| = 2
d(4, 3) = |4 - 3| = 1
d(4, 5) = |4 - 5| = 1
d(4, 6) = |4 - 6| = 2
Find K Nearest Neighbors:
Let’s choose K = 3. The 3 closest points to X = 4 are:
- (3, 4) with distance 1
- (5, 7) with distance 1
- (2, 3) with distance 2
Note that (6, 8) is also at distance 2, tied for third place. Here we break the tie in favor of (2, 3), which appears earlier in the dataset; including (6, 8) instead would give a slightly different prediction ((4 + 7 + 8) / 3 ≈ 6.33).
Prediction (Regression):
The predicted value of Y is the average of the 3 nearest neighbors' Y-values:
Y_pred = (4 + 7 + 3) / 3 = 14 / 3 ≈ 4.67
Final Prediction:
The predicted value for X = 4 is approximately 4.67.
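A matching Python sketch verifies the regression prediction (again, the names are illustrative):

```python
data = [(1, 2), (2, 3), (3, 4), (5, 7), (6, 8)]
x_new = 4
K = 3

# Step 2: absolute (1D Euclidean) distance from x_new to every X in the dataset
distances = [(abs(x - x_new), y) for x, y in data]

# Step 3: sort by distance and keep the K smallest.
# Python's stable sort breaks the tie at distance 2 in favor of (2, 3),
# which appears earlier in the dataset, matching the choice made above.
nearest = sorted(distances, key=lambda d: d[0])[:K]

# Step 5: average of the K neighbors' Y values
y_pred = sum(y for _, y in nearest) / K
print(round(y_pred, 2))  # (4 + 7 + 3) / 3 ≈ 4.67
```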