Clustering Analysis in Excel: Grouping Data Points Using K-Means Algorithm

Introduction
Data analysis plays a prominent role in business intelligence, helping organisations extract valuable insights from large datasets. One of the most widely used techniques in data analysis is clustering, which groups similar data points together based on specific characteristics. Among various clustering algorithms, K-Means is one of the most popular due to its simplicity, efficiency, and ability to handle large datasets.
While advanced data science tools such as Python and R are commonly used for clustering, Excel can also be leveraged to perform K-Means clustering effectively. Many professionals taking a Data Analyst Course learn to apply K-Means clustering in Excel as part of their training to analyse and segment data efficiently.
This article explores how to conduct clustering analysis in Excel using the K-Means algorithm, covering key concepts, step-by-step implementation, advantages, and best practices.
Understanding Clustering Analysis
Clustering analysis is an unsupervised machine learning technique used to group similar data points together based on patterns and characteristics. It helps businesses and analysts:
- Segment customers based on purchasing behaviour.
- Identify patterns in large datasets.
- Optimise marketing strategies by targeting similar customer groups.
- Detect anomalies or outliers in data.
Clustering does not require labelled data, making it highly useful for exploratory data analysis. Students in a well-rounded data analyst learning program, such as a Data Analytics Course in Mumbai, often learn clustering techniques to analyse business, financial, and customer datasets effectively.
What is the K-Means Algorithm?
The K-Means algorithm is a partition-based clustering method that divides a dataset into K clusters, where K is a user-defined number. The algorithm works as follows:
1. Initialise K cluster centres randomly.
2. Assign each data point to the nearest cluster centre based on the Euclidean distance.
3. Recalculate the centroid of each cluster as the mean of all data points assigned to it.
4. Repeat steps 2 and 3 until cluster assignments remain stable or a stopping criterion is met.
Each pass of this iterative process groups similar data points together and reduces intra-cluster variance, although the final clusters can depend on the initial centres. Many Data Analyst Course curricula introduce K-Means clustering as a fundamental technique in machine learning and data segmentation.
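The "intra-cluster variance" mentioned here is commonly written as the within-cluster sum of squares. As a sketch, with $C_k$ denoting the set of points assigned to cluster $k$ and $\mu_k$ its centroid (notation introduced here for illustration, not taken from the steps above):

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```

Neither the assignment step nor the centroid update can increase $J$, which is why the assignments eventually stop changing. The result may still be a local rather than a global minimum.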
Preparing Data for K-Means in Excel
Before performing K-Means clustering in Excel, it is essential to prepare and format the data correctly:
- Ensure data is numerical – K-Means relies on distance calculations, which require numerical values.
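- Standardise data (recommended when scales differ) – If variables have different scales, as with income in tens of thousands of dollars and a spending score from 1 to 100, the larger-scale variable dominates the distance calculation. Normalising them can improve clustering accuracy.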
- Remove duplicates and missing values – Incomplete or duplicate data can affect cluster assignments.
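The standardisation step can be cross-checked outside Excel. The following is a minimal Python sketch, assuming z-score scaling with the population standard deviation (the text does not say which scaling to use). The function name `standardise` and the two lists are illustrative, filled with the five sample values used later in this article.

```python
from statistics import mean, pstdev

def standardise(column):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant column carries no information for distances.
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]

incomes = [15000, 40000, 55000, 22000, 85000]
scores = [45, 80, 30, 55, 70]

z_incomes = standardise(incomes)
z_scores = standardise(scores)

# After scaling, both columns have mean 0 and standard deviation 1,
# so neither dominates the Euclidean distance.
for raw, z in zip(incomes, z_incomes):
    print(f"{raw:>6} -> {z:+.2f}")
```

In a worksheet, Excel's STANDARDIZE function performs the same calculation when given a value, a mean, and a standard deviation.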
Example Dataset for K-Means Clustering
Assume we have a dataset containing customer purchase behaviour:
Customer ID | Annual Income ($) | Spending score (1-100) |
1 | 15,000 | 45 |
2 | 40,000 | 80 |
3 | 55,000 | 30 |
4 | 22,000 | 55 |
5 | 85,000 | 70 |
… | … | … |
This dataset will be used to segment customers based on income and spending patterns. Customer segmentation of this kind is a common K-Means exercise in a Data Analyst Course because it ties clustering directly to data-driven marketing strategies.
Implementing K-Means Clustering in Excel
Excel does not have a built-in K-Means clustering function, but the algorithm can be implemented using formulas and iterative calculations. The procedure below is the one generally taught in a standard data course, for example a Data Analytics Course in Mumbai.
Step 1: Choose the Number of Clusters (K)
The number of clusters (K) should be determined based on the dataset.
A common approach is the Elbow Method, which involves plotting within-cluster variance vs. K and identifying the point where adding more clusters no longer significantly reduces variance.
For this example, assume K = 3.
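The quantity plotted in the Elbow Method can be sketched in Python as the within-cluster sum of squared distances. The helper names `centroid` and `wcss` and the three labellings below are hypothetical. In a real run, each labelling would come from carrying out the full procedure for that value of K, and five rows are far too few for a meaningful elbow.

```python
def centroid(points):
    """Mean position of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def wcss(points, labels):
    """Within-cluster sum of squared distances for a given labelling."""
    total = 0.0
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        cx, cy = centroid(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total

# The five sample rows: (annual income, spending score).
points = [(15000, 45), (40000, 80), (55000, 30), (22000, 55), (85000, 70)]

# Hypothetical labellings of those rows for K = 1, 3 and 5.
labellings = {
    1: [0, 0, 0, 0, 0],
    3: [0, 0, 1, 0, 2],
    5: [0, 1, 2, 3, 4],
}

curve = {k: wcss(points, labels) for k, labels in labellings.items()}
for k, value in curve.items():
    print(k, round(value))
```

Plotting these values against K gives the elbow chart. The value falls as K grows and reaches zero when every point is its own cluster, so the useful K is where the drop levels off, not where the value is smallest.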
Step 2: Initialise Cluster Centroids
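Select K data points at random as initial centroids, or choose K starting values that span the range of the data, as in the illustrative example below.
Place these values in separate cells for reference. The distance formula in Step 3 assumes that the Cluster 1 centroid is in cells E2 (income) and F2 (spending score).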
Example:
Cluster | Annual Income ($) | Spending score (1-100) |
1 | 20,000 | 50 |
2 | 60,000 | 40 |
3 | 90,000 | 80 |
Step 3: Assign Each Data Point to the Nearest Cluster
Use the Euclidean distance formula to calculate the distance of each data point to all cluster centroids:
$$(\text{Distance})^2 = (X_1 - X_2)^2 + (Y_1 - Y_2)^2$$
In Excel, the formula for distance between a customer and Cluster 1 (assuming Annual Income in Column B and Spending Score in Column C) is:
excel
=SQRT((B2 - $E$2)^2 + (C2 - $F$2)^2)
Repeat this for each of the other centroids, then label each data point with its nearest cluster (for example, "Cluster 1") in Column D, which the Step 4 formulas read.
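The assignment step can be cross-checked with a short Python sketch. It uses the five sample rows and the example starting centroids from Step 2. The names `customers`, `centroids`, and `nearest_cluster` are illustrative, and the data is left unscaled to mirror the worksheet formula.

```python
from math import hypot

# Customer ID -> (annual income, spending score)
customers = {
    1: (15000, 45),
    2: (40000, 80),
    3: (55000, 30),
    4: (22000, 55),
    5: (85000, 70),
}

# Example starting centroids from Step 2.
centroids = {
    "Cluster 1": (20000, 50),
    "Cluster 2": (60000, 40),
    "Cluster 3": (90000, 80),
}

def nearest_cluster(point, centroids):
    """Return the name of the centroid with the smallest Euclidean distance."""
    return min(
        centroids,
        key=lambda name: hypot(point[0] - centroids[name][0],
                               point[1] - centroids[name][1]),
    )

assignments = {cid: nearest_cluster(p, centroids) for cid, p in customers.items()}
for cid, cluster in assignments.items():
    print(cid, cluster)
```

With these values, customers 1, 2, and 4 fall in Cluster 1, customer 3 in Cluster 2, and customer 5 in Cluster 3. Customer 2 is 20,000 away from both Cluster 1 and Cluster 2 on income, so the small difference in spending score decides the assignment. This illustrates how strongly the unscaled income column drives the distances.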
Step 4: Compute New Cluster Centroids
After assigning data points, calculate the new centroid of each cluster as the mean of all points assigned to it:
excel
=AVERAGEIFS(B:B, D:D, "Cluster 1")
=AVERAGEIFS(C:C, D:D, "Cluster 1")
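Replace the old centroids with the new values. Paste them as values rather than linking the centroid cells to these formulas, which would create a circular reference.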
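The centroid update is a conditional average, which the following Python sketch mirrors. The row labels are an illustrative assignment of the five sample customers, and `new_centroid` is a hypothetical helper name.

```python
from statistics import mean

# (annual income, spending score, assigned cluster): illustrative labels
rows = [
    (15000, 45, "Cluster 1"),
    (40000, 80, "Cluster 1"),
    (55000, 30, "Cluster 2"),
    (22000, 55, "Cluster 1"),
    (85000, 70, "Cluster 3"),
]

def new_centroid(rows, cluster):
    """Average income and score over the rows labelled with `cluster`."""
    members = [(income, score) for income, score, label in rows if label == cluster]
    if not members:
        return None  # empty cluster: nothing to average
    return (mean(income for income, _ in members),
            mean(score for _, score in members))

for name in ("Cluster 1", "Cluster 2", "Cluster 3"):
    print(name, new_centroid(rows, name))
```

For Cluster 1, this averages three rows to an income of about 25,667 and a spending score of 60. If a cluster ends up with no points, AVERAGEIFS returns a #DIV/0! error, so that case needs handling in the worksheet as well, for instance by keeping the previous centroid.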
Step 5: Repeat Until Convergence
Recalculate distances and reassign points to clusters.
Repeat the centroid recalculation process.
Stop when cluster assignments no longer change or a set number of iterations (e.g., 10) is reached.
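Putting the steps together, the whole loop can be sketched in Python as a cross-check for the worksheet. This is a minimal version under stated assumptions: two numeric columns, unscaled data, starting centroids supplied by the caller, and an empty cluster keeping its previous centroid (a choice made here, not something the procedure above specifies). The function name `kmeans` and its parameters are illustrative.

```python
from math import hypot

def kmeans(points, centroids, max_iter=10):
    """Repeat assignment and update until labels stop changing or max_iter is hit."""
    labels = None
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: hypot(p[0] - centroids[k][0], p[1] - centroids[k][1]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels

        # Update step: each centroid becomes the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(old)  # assumption: keep an empty cluster's centroid
        centroids = updated
    return labels, centroids, iterations

points = [(15000, 45), (40000, 80), (55000, 30), (22000, 55), (85000, 70)]
start = [(20000, 50), (60000, 40), (90000, 80)]

labels, final_centroids, iterations = kmeans(points, start)
print(labels, final_centroids, iterations)
```

On the five sample rows with the Step 2 starting centroids, the assignments are already stable on the second pass, so the loop stops well before the 10-iteration cap.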
Visualising Clusters in Excel
Once the clusters are determined, use Excel’s scatter plot to visualise them:
- Select the data (Annual Income and Spending Score).
- Insert a Scatter Plot (Excel > Insert > Scatter Chart).
- Use different colours for each cluster (Format Data Series).
- Plot centroids separately for clarity.
This visualisation helps understand how well the clusters are formed.
Advantages of K-Means Clustering in Excel
- No additional software is required. It can be implemented using built-in Excel functions.
- Easy visualisation – Scatter plots help illustrate clusters effectively.
- Quick exploratory analysis – Useful for segmenting data without advanced coding knowledge.
- Works on small to medium datasets – As a rough guide, the formula-based approach stays manageable in Excel for datasets under about 50,000 rows.
Many professionals enrol in a reputed, intermediate-level data course such as a Data Analytics Course in Mumbai to acquire the skills needed to perform clustering in Excel before transitioning to more advanced tools like Python and R.
Conclusion
K-Means clustering is a powerful method for grouping similar data points. While Excel lacks built-in clustering functions, it can still be used effectively for K-Means analysis. Analysts can segment datasets into meaningful clusters using only Excel formulas by leveraging distance calculations, iterative assignments, and centroid updates.
While Excel is suitable for small-scale clustering, businesses dealing with large datasets may benefit from more advanced tools like Python, R, or Power BI for faster and more scalable clustering analysis.
By mastering K-Means clustering in Excel, professionals, including those taking a Data Analyst Course, can unlock valuable insights for customer segmentation, market analysis, and data-driven decision-making.