Skip to main content

K-Nearest Neighbors (KNN)


K-Nearest Neighbors (KNN) Algorithm: Simple Math and Example

Let's break down the mathematical concept behind the KNN algorithm and go through a simple example.

1. Mathematical Concept

KNN works by finding the K closest points in the feature space (based on distance metrics such as Euclidean distance) to classify or predict a new data point.

  • Step 1: Choose a number K (the number of nearest neighbors).
  • Step 2: Calculate the distance between the new data point and all other points in the training dataset.
  • Step 3: Sort the distances and pick the K smallest distances.
  • Step 4: For classification, assign the class label based on the majority of the K neighbors.
  • Step 5: For regression, calculate the average value of the K neighbors and assign it as the prediction.

2. Distance Metric: Euclidean Distance

For classification and regression, KNN typically uses the Euclidean distance to measure how "far" a point is from others.

The Euclidean distance between two points (x₁, y₁) and (x₂, y₂) in 2D space is given by:

            d = √((x₁ - x₂)² + (y₁ - y₂)²)
        

For higher-dimensional spaces (e.g., 3D, 4D), it’s generalized as:

            d = √((x₁ - x₂)² + (y₁ - y₂)² + (z₁ - z₂)² + ...)
        

3. Example of KNN for Classification (2D Space)

Imagine you have a simple 2D dataset with two classes: Red and Blue. Your goal is to classify a new point based on its K nearest neighbors.

Step-by-Step Example:

Dataset:

We have a set of 6 points in 2D space with labels:

X₁ X₂ Label
1 2 Red
2 3 Red
3 3 Blue
6 6 Blue
7 8 Blue
8 8 Red

New Point:

Let's classify the new point P = (5, 5).

Distance Calculation (using Euclidean distance):

            d(P, (1, 2)) = √((5 - 1)² + (5 - 2)²) = √(16 + 9) = √25 = 5
        
            d(P, (2, 3)) = √((5 - 2)² + (5 - 3)²) = √(9 + 4) = √13 ≈ 3.61
        
            d(P, (3, 3)) = √((5 - 3)² + (5 - 3)²) = √(4 + 4) = √8 ≈ 2.83
        
            d(P, (6, 6)) = √((5 - 6)² + (5 - 6)²) = √(1 + 1) = √2 ≈ 1.41
        
            d(P, (7, 8)) = √((5 - 7)² + (5 - 8)²) = √(4 + 9) = √13 ≈ 3.61
        
            d(P, (8, 8)) = √((5 - 8)² + (5 - 8)²) = √(9 + 9) = √18 ≈ 4.24
        

Find K Nearest Neighbors:

Let’s choose K = 3. The 3 closest points to P = (5, 5) are:

  • (6, 6) with distance 1.41 (Label: Blue)
  • (3, 3) with distance 2.83 (Label: Blue)
  • (2, 3) with distance 3.61 (Label: Red)

Majority Voting:

Among the 3 nearest neighbors, 2 are Blue and 1 is Red. According to majority voting, we classify the point P = (5, 5) as Blue.

Final Classification:

The new point P = (5, 5) is classified as Blue based on the majority label of its K nearest neighbors.

4. Example of KNN for Regression (1D Space)

Let’s now use KNN for regression to predict a value instead of a class label.

Step-by-Step Example:

Dataset:

We have a dataset of 5 data points:

X Y
1 2
2 3
3 4
5 7
6 8

New Point:

The goal is to predict the value of Y for a new point X = 4.

Distance Calculation:

            d(4, 1) = |4 - 1| = 3
        
            d(4, 2) = |4 - 2| = 2
        
            d(4, 3) = |4 - 3| = 1
        
            d(4, 5) = |4 - 5| = 1
        
            d(4, 6) = |4 - 6| = 2
        

Find K Nearest Neighbors:

Let’s choose K = 3. The 3 closest points to X = 4 are:

  • (3, 4) with distance 1
  • (5, 7) with distance 1
  • (2, 3) with distance 2

Prediction (Regression):

The predicted value of Y is the average of the 3 nearest neighbors' Y-values:

            Ypred = (4 + 7 + 3) / 3 = 14 / 3 ≈ 4.67
        

Final Prediction:

The predicted value for X = 4 is approximately 4.67.

Further Reading


People are good at skipping over material they already know!

View Related Topics to







Contact Us

Name

Email *

Message *

Popular Posts

Simulation of ASK, FSK, and PSK using MATLAB Simulink (with Online Simulator)

📘 Overview 🧮 How to use MATLAB Simulink 🧮 Simulation of ASK using MATLAB Simulink 🧮 Simulation of FSK using MATLAB Simulink 🧮 Simulation of PSK using MATLAB Simulink 🧮 Simulator for ASK, FSK, and PSK 🧮 Digital Signal Processing Simulator 📚 Further Reading ASK, FSK & PSK HomePage MATLAB Simulation Simulation of Amplitude Shift Keying (ASK) using MATLAB Simulink In Simulink, we pick different components/elements from MATLAB Simulink Library. Then we connect the components and perform a particular operation. Result A sine wave source, a pulse generator, a product block, a mux, and a scope are shown in the diagram above. The pulse generator generates the '1' and '0' bit sequences. Sine wave sources produce a specific amplitude and frequency. The scope displays the modulated signal as well as the original bit sequence created by the pulse generator. Mux i...

BER vs SNR for M-ary QAM, M-ary PSK, QPSK, BPSK, ...(MATLAB Code + Simulator)

Bit Error Rate (BER) & SNR Guide Analyze communication system performance with our interactive simulators and MATLAB tools. 📘 Theory 🧮 Simulators 💻 MATLAB Code 📚 Resources BER Definition SNR Formula BER Calculator MATLAB Comparison 📂 Explore M-ary QAM, PSK, and QPSK Topics ▼ 🧮 Constellation Simulator: M-ary QAM 🧮 Constellation Simulator: M-ary PSK 🧮 BER calculation for ASK, FSK, and PSK 🧮 Approaches to BER vs SNR What is Bit Error Rate (BER)? The BER indicates how many corrupted bits are received compared to the total number of bits sent. It is the primary figure of merit for a...

Antenna Gain-Combining Methods - EGC, MRC, SC, and RMSGC

📘 Overview 🧮 Equal gain combining (EGC) 🧮 Maximum ratio combining (MRC) 🧮 Selective combining (SC) 🧮 Root mean square gain combining (RMSGC) 🧮 Zero-Forcing (ZF) Combining 🧮 MATLAB Code 📚 Further Reading  There are different antenna gain-combining methods. They are as follows. 1. Equal gain combining (EGC) 2. Maximum ratio combining (MRC) 3. Selective combining (SC) 4. Root mean square gain combining (RMSGC) 5. Zero-Forcing (ZF) Combining  1. Equal gain combining method Equal Gain Combining (EGC) is a diversity combining technique in which the receiver aligns the phase of the received signals from multiple antennas (or channels) but gives them equal amplitude weight before summing. This means each received signal is phase-corrected to be coherent with others, but no scaling is applied based on signal strength or channel quality (unlike MRC). Mathematically, for received signa...

Constellation Diagrams of ASK, PSK, and FSK (with MATLAB Code + Simulator)

Constellation Diagrams: ASK, FSK, and PSK Comprehensive guide to signal space representation, including interactive simulators and MATLAB implementations. 📘 Overview 🧮 Simulator ⚖️ Theory 📚 Resources Definitions Constellation Tool Key Points MATLAB Code 📂 Other Topics: M-ary PSK & QAM Diagrams ▼ 🧮 Simulator for M-ary PSK Constellation 🧮 Simulator for M-ary QAM Constellation BASK (Binary ASK) Modulation Transmits one of two signals: 0 or -√Eb, where Eb​ is the energy per bit. These signals represent binary 0 and 1. BFSK (Binary FSK) Modulation Transmits one ...

Coherence Bandwidth and Coherence Time (with MATLAB + Simulator)

🧮 Coherence Bandwidth 🧮 Coherence Time 🧮 MATLAB Code s 📚 Further Reading For Doppler Delay or Multi-path Delay Coherence time T coh ∝ 1 / v max (For slow fading, coherence time T coh is greater than the signaling interval.) Coherence bandwidth W coh ∝ 1 / Ï„ max (For frequency-flat fading, coherence bandwidth W coh is greater than the signaling bandwidth.) Where: T coh = coherence time W coh = coherence bandwidth v max = maximum Doppler frequency (or maximum Doppler shift) Ï„ max = maximum excess delay (maximum time delay spread) Notes: The notation v max −1 and Ï„ max −1 indicate inverse proportionality. Doppler spread refers to the range of frequency shifts caused by relative motion, determining T coh . Delay spread (or multipath delay spread) determines W coh . Frequency-flat fading occurs when W coh is greater than the signaling bandwidth. Coherence Bandwidth Coherence bandwidth is...

OFDM Symbols and Subcarriers Explained

This article explains how OFDM (Orthogonal Frequency Division Multiplexing) symbols and subcarriers work. It covers modulation, mapping symbols to subcarriers, subcarrier frequency spacing, IFFT synthesis, cyclic prefix, and transmission. Step 1: Modulation First, modulate the input bitstream. For example, with 16-QAM , each group of 4 bits maps to one QAM symbol. Suppose we generate a sequence of QAM symbols: s0, s1, s2, s3, s4, s5, …, s63 Step 2: Mapping Symbols to Subcarriers Assume N sub = 8 subcarriers. Each OFDM symbol in the frequency domain contains 8 QAM symbols (one per subcarrier): Mapping (example) OFDM symbol 1 → s0, s1, s2, s3, s4, s5, s6, s7 OFDM symbol 2 → s8, s9, s10, s11, s12, s13, s14, s15 … OFDM sym...

BER performance of QPSK with BPSK, 4-QAM, 16-QAM, 64-QAM, 256-QAM, etc (MATLAB + Simulator)

📘 Overview 📚 QPSK vs BPSK and QAM: A Comparison of Modulation Schemes in Wireless Communication 📚 Real-World Example 🧮 MATLAB Code 📚 Further Reading   QPSK provides twice the data rate compared to BPSK. However, the bit error rate (BER) is approximately the same as BPSK at low SNR values when gray coding is used. On the other hand, QPSK exhibits similar spectral efficiency to 4-QAM and 16-QAM under low SNR conditions. In very noisy channels, QPSK can sometimes achieve better spectral efficiency than 4-QAM or 16-QAM. In practical wireless communication scenarios, QPSK is commonly used along with QAM techniques, especially where adaptive modulation is applied. Modulation Bits/Symbol Points in Constellation Usage Notes BPSK 1 2 Very robust, used in weak signals QPSK 2 4 Balanced speed & reliability 4-QAM ...

ASK, FSK, and PSK (with MATLAB + Online Simulator)

📘 ASK Theory 📘 FSK Theory 📘 PSK Theory 📊 Comparison 🧮 MATLAB Codes 🎮 Simulator ASK or OFF ON Keying ASK is a simple (less complex) Digital Modulation Scheme where we vary the modulation signal's amplitude or voltage by the message signal's amplitude or voltage. We select two levels (two different voltage levels) for transmitting modulated message signals. Example: "+5 Volt" (upper level) and "0 Volt" (lower level). To transmit binary bit "1", the transmitter sends "+5 Volts", and for bit "0", it sends no power. The receiver uses filters to detect whether a binary "1" or "0" was transmitted. Fig 1: Output of ASK, FSK, and PSK modulation using MATLAB for a data stream "1 1 0 0 1 0 1 0" ( Get MATLAB Code ) ...