What is K-Means Clustering?
K-Means clustering is an unsupervised learning algorithm that partitions data into K clusters, placing each point in the cluster whose center is nearest (typically measured by Euclidean distance).
- K = number of groups you want
- Means = average (center of each group)
Intuition
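You tell the algorithm: “Divide this data into K groups.” The algorithm then:
- Places K centers (centroids) at random
- Assigns each point to its nearest center
- Recalculates each center as the mean of its assigned points
- Repeats the last two steps until the centers stop moving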
Step-by-Step
- Choose the number of clusters (K)
- Initialize K centroids randomly
- Assign each data point to its nearest centroid
- Update each centroid to the mean of the points assigned to it
- Repeat the assignment and update steps until the centroids no longer change
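The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `k_means`, `squared_distance`, `max_iters`, and `seed` are made up for this example, the sample points are invented, and two choices are assumptions rather than part of the description above: the initial centroids are drawn from the data points themselves, and an empty cluster keeps its previous centroid.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iters=100, seed=0):
    """Cluster a list of tuples into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize: pick k distinct data points as starting centroids (assumption).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assign: each point gets the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop: no point changed cluster, so the centroids cannot move either.
        if new_labels == labels:
            break
        labels = new_labels
        # Update: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its old centroid (assumption)
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

Which cluster ends up with index 0 or 1 depends on the random starting centroids, so only the grouping itself is meaningful, not the label numbers.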
Example
Student Marks:
[35, 40, 45, 70, 75, 80]
K = 2
Initial Centroids:
C1 = 40, C2 = 75
Cluster Assignment:
- Cluster 1 → 35, 40, 45
- Cluster 2 → 70, 75, 80
Updated Centroids:
C1 = (35 + 40 + 45) / 3 = 40, C2 = (70 + 75 + 80) / 3 = 75
The algorithm stops because the centroids did not change.
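The same example can be checked with a few lines of Python. This sketch runs one assignment step and one update step on the marks above; the helper names `assign` and `update` are made up for illustration, and `update` assumes no cluster is empty, which holds for this data.

```python
marks = [35, 40, 45, 70, 75, 80]
centroids = [40.0, 75.0]  # C1 and C2 from the example


def assign(values, centers):
    """Put each value in the list belonging to its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
        clusters[nearest].append(v)
    return clusters


def update(clusters):
    """Return the mean of each cluster (assumes no cluster is empty)."""
    return [sum(c) / len(c) for c in clusters]


clusters = assign(marks, centroids)
new_centroids = update(clusters)

print(clusters)                    # [[35, 40, 45], [70, 75, 80]]
print(new_centroids)               # [40.0, 75.0]
print(new_centroids == centroids)  # True, so the algorithm stops
```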
Final Clusters
- Cluster 1 → Low scores
- Cluster 2 → High scores
Real-Life Use Cases
- Customer segmentation
- Image compression
- Document grouping
- Market analysis
Advantages
- Simple to implement and fast: the work in each iteration grows linearly with the number of data points
- Scales well to large datasets
Limitations
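- K must be chosen before running the algorithm
- Sensitive to initial centroids: different starting positions can produce different final clusters
- Performs poorly on clusters that are not roughly round and similar in size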
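The sensitivity to initial centroids can be seen on a tiny one-dimensional dataset. The sketch below is illustrative only: the function `k_means_1d` and the data are invented for this example, and keeping the old centroid for an empty cluster is an assumption. The same six values are clustered twice with K = 2, changing only the starting centroids.

```python
def k_means_1d(values, centroids, max_iters=100):
    """Run K-Means on numbers from the given starting centroids; returns the clusters."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iters):
        # Assign each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update each centroid; an empty cluster keeps its old centroid (assumption).
        updated = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return clusters


data = [0, 1, 10, 11, 20, 21]

result_a = k_means_1d(data, [0, 1])    # both starting centroids at the low end
result_b = k_means_1d(data, [20, 21])  # both starting centroids at the high end

print(result_a)  # [[0, 1], [10, 11, 20, 21]]
print(result_b)  # [[0, 1, 10, 11], [20, 21]]
```

Both runs converge, but to different groupings of the middle values. A common way to reduce this effect is to run the algorithm several times from different starting points and keep the best result.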
Summary
K-Means groups similar data into K clusters by repeatedly assigning points to the nearest center and updating each center to the mean of its points.