K-Means Algorithm for Clustering: Unsupervised Learning

What is the K-Means algorithm? How does it work? What are some limitations of K-Means clustering?

Manik Soni
5 min read · Oct 1, 2020

What is the K-Means Algorithm?

K-Means is an unsupervised learning algorithm. It groups the points of an unlabeled dataset into K clusters so that similar points end up together, which tells us something about the structure of the dataset that we did not specify in advance.

How does it work?

The algorithm proceeds through a sequence of steps. Let us consider a dataset with some features, plotted on the x and y axes:

Dataset

Step 1: Choose the number of clusters, K. The algorithm does not find K for us; we pick it up front. Let K = 2.

Step 2: Select K points at random to serve as the initial centroids (they need not be points from your dataset).
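As a rough illustration of Steps 1 and 2, here is a minimal sketch in plain Python. The sample `points`, the `random_centroids` helper, and the decision to draw centroids from inside the bounding box of the data are all assumptions made for this example; the article only says the centroids are picked at random.

```python
import random

# Hypothetical 2-D dataset, used only for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0), (3.5, 5.0), (4.5, 5.0)]

def random_centroids(points, k, seed=0):
    """Draw k random centroids inside the bounding box of the data."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [
        (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        for _ in range(k)
    ]

k = 2  # Step 1: we choose K ourselves
centroids = random_centroids(points, k)  # Step 2: random starting centroids
print(centroids)
```

Changing `seed` gives a different starting position, which matters later when we look at the limitations of K-Means.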

Step 3: Assign each data point to its closest centroid. This forms K clusters.
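The assignment step could be sketched like this. It assumes "closest" means ordinary straight-line (Euclidean) distance; the `assign` function and the sample values are hypothetical names and numbers chosen for the example.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

# Hypothetical points and centroids, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 7.0), (4.5, 5.0)]
centroids = [(1.0, 1.5), (5.0, 6.0)]

labels = assign(points, centroids)
print(labels)  # [0, 0, 1, 1]
```

Each entry of `labels` says which cluster the matching point belongs to, so the first two points form cluster 0 and the last two form cluster 1.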

Now join the two centroids with a line and draw its perpendicular bisector. Every point on one side of the bisector is closer to one centroid, and every point on the other side is closer to the other, so this line separates the two clusters.
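To see why the perpendicular line works for K = 2, the sketch below checks a few points both ways: by which side of the bisector they fall on, and by which centroid is nearer. The `side_of_bisector` helper and the sample coordinates are assumptions made for this example.

```python
import math

def side_of_bisector(p, c0, c1):
    """Negative if p is on c0's side of the perpendicular bisector, positive if on c1's."""
    mx, my = (c0[0] + c1[0]) / 2, (c0[1] + c1[1]) / 2  # midpoint of the joining line
    # Project (p - midpoint) onto the direction from c0 to c1.
    return (p[0] - mx) * (c1[0] - c0[0]) + (p[1] - my) * (c1[1] - c0[1])

c0, c1 = (1.0, 1.5), (5.0, 6.0)  # hypothetical centroids
checks = []
for p in [(1.0, 1.0), (4.5, 5.0), (2.0, 6.0)]:
    by_line = 0 if side_of_bisector(p, c0, c1) < 0 else 1
    by_distance = 0 if math.dist(p, c0) < math.dist(p, c1) else 1
    checks.append((by_line, by_distance))
    print(p, by_line, by_distance)
```

Both columns agree for every point, which is why drawing the bisector gives the same clusters as measuring distances.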

Now give each cluster its own color so the result is easy to visualize.

Step 4: Compute and place the new centroid of each cluster. To find it, take the average of the x coordinates and the average of the y coordinates of all the points in the cluster; the new centroid sits at that average (x, y) position.
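The averaging in Step 4 might look like the following sketch. The `update` function and the sample data are hypothetical, and the code assumes every cluster has at least one point, which the article does not discuss.

```python
def update(points, labels, k):
    """Return the average (x, y) of the points assigned to each cluster."""
    centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        # Assumption of this sketch: no cluster is empty.
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        centroids.append((cx, cy))
    return centroids

# Hypothetical points and cluster labels, for illustration only.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 7.0), (4.5, 5.0)]
labels = [0, 0, 1, 1]

new_centroids = update(points, labels, k=2)
print(new_centroids)  # [(1.25, 1.5), (4.75, 6.0)]
```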

Step 5: Reassign each data point to the new closest centroid.

If any reassignment took place, go to Step 4; otherwise, stop (FIN): the model is ready.

In this example, 2 points are reassigned.

So we go back to Step 4 and recalculate the centroids.

Then we perform Step 5 again.

The data points are reassigned.

Back to Step 4 to recalculate the centroids.

Then Step 5 to reassign each data point.

Reassignment takes place again, so we return to Step 4.

Then we go to Step 5.

Now we can see that no reassignment takes place, which means our model is ready.

This is how the K-Means clustering algorithm works.
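Putting Steps 3 to 5 together, the loop can be sketched roughly as follows. This is a minimal plain-Python sketch rather than a reference implementation: the `kmeans` function, the sample points, and the starting centroids are made up for illustration, and leaving an empty cluster's centroid where it is is a choice of this sketch, not something the steps above specify.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Repeat assignment and centroid update until no point changes cluster."""
    centroids = list(centroids)  # work on a copy of the starting centroids
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its closest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no reassignment took place: FIN
            break
        labels = new_labels
        # Step 4: move each centroid to the average of its points.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # sketch assumption: an empty cluster keeps its centroid
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids

# Hypothetical data: two visible groups, with both starting centroids in the first.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (5.0, 7.0), (4.5, 5.0), (5.5, 6.0)]
labels, centroids = kmeans(points, [(1.0, 1.0), (1.5, 2.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this run the point (1.5, 2.0) starts in cluster 1 and is reassigned to cluster 0 on the second pass, after which nothing changes and the loop stops.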

What are some limitations of k-means clustering?

For all its usefulness in clustering, K-Means has a limitation known as the Random Initialization Trap: the final clusters can depend on where the initial centroids happen to be placed.

Let's understand this concept with an example.

The above is a scatter plot, and we need to group its points into clusters.

If we do the grouping by eye, we can see that there are 3 clusters.

Now we place one initial centroid in each of these clusters manually.

These are the clusters that are formed:

But now suppose we perform the steps of the K-Means algorithm with randomly chosen initial centroids instead:

Steps 1 to 5 (one figure per step, not shown here)

Now we have performed all the steps of the algorithm.

If we compare this with our previous result, we can see that the clusters are different: the random starting centroids led K-Means to a different final grouping. This is the limitation of the K-Means algorithm.
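The trap can be reproduced on a small made-up dataset. In the sketch below, the three groups of points, the `kmeans` helper, and both sets of starting centroids are hypothetical; the "unlucky" start is hand-picked to stand in for a bad random draw so that the run is repeatable.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Assign points and update centroids until no point changes cluster."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # sketch assumption: an empty cluster keeps its centroid
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids

# Three well-separated groups of four points each (hypothetical data).
points = [(x + dx, dy) for x in (0.0, 10.0, 20.0)
          for dx in (0.0, 1.0) for dy in (0.0, 1.0)]

manual_start = [(0.5, 0.5), (10.5, 0.5), (20.5, 0.5)]   # one centroid per visible group
unlucky_start = [(0.0, 0.5), (1.0, 0.5), (15.5, 0.5)]   # two centroids in the first group

manual_labels, _ = kmeans(points, manual_start)
unlucky_labels, _ = kmeans(points, unlucky_start)
print(manual_labels)   # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
print(unlucky_labels)  # [0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
```

With the unlucky start, the first group is split in two and the other two groups are merged, and the algorithm still stops because no point is reassigned. Same data, same steps, different result.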
