Difference between K Means Clustering and Hierarchical Clustering

Cluster analysis, or simply clustering, is the process of partitioning a set of data objects into subsets. Each subset is a cluster such that the similarity within a cluster is high and the similarity between clusters is low. Cluster analysis is used in many applications such as business intelligence, image pattern recognition, Web search etc. To realize the importance of clustering, let us look at one of its applications.

For example, take the case of business intelligence. In business intelligence, clustering can be used to group customers so that the similarity within a group is greater than the similarity between groups, which facilitates the development of business strategies for enhancing customer relationship management. For grouping, we can choose among several methods such as Partitioning Methods, Hierarchical Methods, Density-Based Methods and Grid-Based Methods.

In Partitioning methods, we find mutually exclusive clusters of spherical shape based on distance. In this case, we can use a mean or a medoid as the cluster centre that represents each cluster. These methods work well for small and medium-sized data sets.

In Hierarchical methods, we create a hierarchical decomposition of the given set of data. The decomposition can be built in two ways: from the bottom to the top or from the top down. Based on how the hierarchical decomposition is built, the method is divided into two approaches: the agglomerative approach and the divisive approach.

The main problem with this process is that once a step is done it can never be undone. In Density-Based methods, we find arbitrarily shaped clusters, which are dense regions of objects in space separated by low-density regions. In this method, each point of a cluster must belong to a dense neighbourhood, so points in sparse regions are easily detected as outliers. Grid-based methods quantize the object space into a finite number of cells that form a grid structure, independent of the number of data objects.

So far we have discussed what clustering is, why clustering is so important in data analysis, and the different methods of clustering. Now our target is to discuss the first two methods in detail and also the difference between them.

Before discussing Partitioning methods, let me tell you a funny story. Suppose you and your ten friends plan that when your parents fall asleep, all of you will go to the playground to play cricket. After lunch all the parents fall asleep, but suddenly it starts raining. The rain stops after thirty minutes, and as you all run along the road your footprints form clusters on it.

Suddenly your mother wakes up, sees that you are not in the house, calls the other parents, and all of them come out and reach the playground by following the clusters on the road. The same situation occurs in data analysis: if you find the clusters in the data, then you catch the pattern of the data. Now we start discussing the first two methods in detail.

In Partitioning Methods, we organize the objects of a set into several exclusive groups or clusters. Formally, given a data set D of n objects and k, the number of clusters to form, a partitioning algorithm organizes the objects into k partitions (k ≤ n), where each partition represents a cluster.

Suppose a data set D contains n objects in Euclidean space. Using a partitioning method we distribute the n data points into k non-overlapping clusters c[1], c[2], …, c[k], where c[i] ∩ c[j] = ∅ for i ≠ j. Our objective is then to find a representative for each of the k clusters. Two kinds of representative can be used: one is the centroid, which is the centre of a cluster, and the other is the most centrally located object of a cluster (the medoid), chosen so that the variation within the cluster is minimal.

If we use the mean as the representative of each cluster, the clustering method is called K-means clustering, and if we use the medoid as the representative of each cluster, the method is called K-medoid clustering. Mathematically, let p[i] be the centre of the ith cluster c[i], let c[i,j] be the jth point of that cluster, and let the cluster contain m[i] points. We judge how well p[i] represents its cluster by minimizing the sum of squared errors (SSE) or the sum of absolute errors (SAE):

SSE = Σ (j = 1 to m[i]) dist(p[i], c[i,j])²
SAE = Σ (j = 1 to m[i]) |p[i] − c[i,j]|

We know that the SSE is minimum when p[i] = mean(c[i]), while the SAE is minimum when p[i] = median(c[i]).
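These two facts are easy to verify numerically. The sketch below uses made-up points (not from the article), scans candidate centres on a grid, and checks that none beats the mean on SSE or the median on SAE:

```python
# Illustrative one-dimensional cluster (made-up data).
points = [2, 3, 4, 10, 11]

def sse(centre):
    # Sum of squared errors of a candidate centre.
    return sum((p - centre) ** 2 for p in points)

def sae(centre):
    # Sum of absolute errors of a candidate centre.
    return sum(abs(p - centre) for p in points)

mean = sum(points) / len(points)            # 6.0
median = sorted(points)[len(points) // 2]   # 4

# Scan candidate centres 0.0, 0.1, ..., 15.0: the mean is never
# beaten on SSE, the median is never beaten on SAE.
candidates = [c / 10 for c in range(151)]
assert all(sse(mean) <= sse(c) + 1e-9 for c in candidates)
assert all(sae(median) <= sae(c) + 1e-9 for c in candidates)
```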

To perform k-means clustering, we follow this algorithm.


Input:
k: the number of clusters
D: a data set containing n objects
Output: a set of k clusters


  • Arbitrarily choose k objects from D as the initial cluster centres;
  • Repeat;
  • Reassign each object to the cluster to which the object is most similar, based on the mean of the objects in the cluster;
  • Update the cluster means, that is, recalculate the mean value of the objects in each cluster;
  • Until no change.
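The steps above can be sketched in Python as follows. This is a minimal one-dimensional illustration: the data set and initial centres are made-up values, and empty-cluster handling is omitted for brevity.

```python
# Minimal k-means sketch for one-dimensional data (illustrative only;
# a real implementation would handle empty clusters and use a
# multi-dimensional distance).

def kmeans(data, centres):
    while True:
        # Reassignment step: each object joins its closest centre.
        clusters = [[] for _ in centres]
        for x in data:
            nearest = min(range(len(centres)), key=lambda j: abs(x - centres[j]))
            clusters[nearest].append(x)
        # Update step: recompute each centre as its cluster mean.
        new_centres = [sum(c) / len(c) for c in clusters]
        if new_centres == centres:   # "until no change"
            return centres, clusters
        centres = new_centres

centres, clusters = kmeans([1, 2, 9, 10, 11], [1.0, 2.0])
print(centres)  # [1.5, 10.0]
```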

For example, suppose the data points are

D = {2, 4, 10, 12, 3, 20, 30, 11, 25}

Let p[1] = 2, p[2] = 4 (the tie for point 3 is broken toward p[1])
C[1] = {2, 3}, C[2] = {4, 10, 12, 20, 30, 11, 25}

Repeat the process
p[1] = 2.5, p[2] = 16
C[1] = {2, 3, 4}, C[2] = {10, 12, 20, 30, 11, 25}

Repeat then
p[1] = 3, p[2] = 18
C[1] = {2, 3, 4, 10}, C[2] = {12, 20, 30, 11, 25}

Repeat then
p[1] = 4.75, p[2] = 19.6
C[1] = {2, 3, 4, 10, 11, 12}, C[2] = {20, 30, 25}

Repeat then
p[1] = 7, p[2] = 25
C[1] = {2, 3, 4, 10, 11, 12}, C[2] = {20, 30, 25}

Repeat again
p[1] = 7, p[2] = 25

Since p[1] and p[2] in the last step are equal to those of the previous step, the clusters no longer change, so we stop the iteration.
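As a check, the worked example can be reproduced with the same assignment-and-update loop; ties are broken toward the first centre, as in the example:

```python
# Reproducing the worked example: same data, initial centres 2 and 4.
data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
centres = [2.0, 4.0]

while True:
    clusters = [[], []]
    for x in data:
        # Ties go to the first (lower) centre, as in the example.
        nearer = 0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1
        clusters[nearer].append(x)
    new_centres = [sum(c) / len(c) for c in clusters]
    if new_centres == centres:   # p[1] and p[2] stopped changing
        break
    centres = new_centres

print(centres)                                   # [7.0, 25.0]
print(sorted(clusters[0]), sorted(clusters[1]))  # [2, 3, 4, 10, 11, 12] [20, 25, 30]
```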

Now we discuss Hierarchical clustering. While partitioning methods meet basic clustering requirements of organizing the set of objects into a number of exclusive groups, in some situations we may want to partition our data into groups at different levels such as in a hierarchy.

This clustering method works by grouping data objects into a hierarchy or "tree" of clusters. For example, suppose a company wants to organize information about all of its people. It first divides the employees into three groups: executives, managers and staff. Then it divides the staff into three subgroups: senior officers, officers and trainees. This is the idea behind Hierarchical Clustering.

Now we concentrate on how to do Hierarchical clustering. There are two approaches to this clustering: one is Agglomerative and the other is Divisive. In the Agglomerative approach, we start with the points as individual clusters and at each step merge the closest pair of clusters, working from the bottom to the top.

In the Divisive approach, we need to decide which cluster to split at each step and how to do the splitting. We start with one all-inclusive cluster and then split it step by step into sub-clusters until only singleton clusters of individual points remain. This approach works from the top down. Hierarchical clustering is often displayed graphically using a tree-like diagram called a dendrogram.
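A toy bottom-up (agglomerative) pass can be sketched on the earlier example data. The sketch assumes single-linkage distance (the distance between two clusters is the gap between their closest members) and merges the closest pair until k clusters remain:

```python
# Toy agglomerative clustering with single-linkage distance
# (one choice among several linkage criteria).

def single_link(a, b):
    # Distance between the closest members of two clusters.
    return min(abs(x - y) for x in a for y in b)

def agglomerate(data, k):
    clusters = [[x] for x in data]   # every point starts as its own cluster
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # Find the closest pair of clusters and merge them (bottom-up step).
        i, j = min(pairs, key=lambda p: single_link(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters.pop(j)
    return clusters

data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
result = sorted((sorted(c) for c in agglomerate(data, 2)), key=min)
print(result)  # [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
```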

As an illustration, here is the result of running k-means clustering (with k = 3) on the wine1 data set in R (the per-observation clustering vector is not shown):

Within cluster sum of squares by cluster:
[1] 1343168.5
[2] 443166.7
[3] 566572.5
(between_SS / total_SS = 86.5 %)

Available components:
[1] "cluster"     [2] "centers"   [3] "totss"   [4] "withinss"    [5] "tot.withinss"
[6] "betweenss"   [7] "size"    [8] "iter"      [9] "ifault"

From the output above, we can see that there are differences between the two methods of clustering. Now we discuss these differences in detail. Hierarchical clustering builds clusters within clusters, and it does not require a pre-specified number of clusters the way k-means does. The most important difference is the hierarchy itself. Actually, two different approaches fall under this name: top-down and bottom-up.

In top-down hierarchical clustering, we divide the data into 2 clusters (using k-means with k = 2, for example). Then, for each cluster, we repeat this process until all the clusters are too small or too similar for further clustering to make sense, or until we reach a preset number of clusters.
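The top-down idea can be sketched as code. The `two_means` helper below is a hypothetical one-dimensional 2-means seeded with the two extreme points, and the rule of always splitting the largest cluster is a simplifying assumption:

```python
# Rough top-down (divisive) sketch: repeatedly split a cluster
# with 2-means until a preset number of clusters is reached.

def two_means(points):
    # One-dimensional 2-means seeded with the extreme points
    # (an assumption for this sketch, not a standard seeding rule).
    centres = [min(points), max(points)]
    while True:
        groups = [[], []]
        for x in points:
            groups[0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1].append(x)
        new = [sum(g) / len(g) for g in groups]
        if new == centres:
            return groups
        centres = new

def top_down(points, k):
    clusters = [list(points)]
    while len(clusters) < k:
        big = max(clusters, key=len)   # simple rule: split the largest cluster
        clusters.remove(big)
        clusters.extend(two_means(big))
    return clusters

parts = top_down([2, 4, 10, 12, 3, 20, 30, 11, 25], 3)
print(sorted((sorted(c) for c in parts), key=min))
# [[2, 3, 4], [10, 11, 12], [20, 25, 30]]
```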

In bottom-up hierarchical clustering, we start with each data item having its own cluster. We then look for the two items that are most similar and combine them in a larger cluster. We keep repeating until all the clusters we have left are too dissimilar to be gathered together, or until we reach a preset number of clusters.

In k-means clustering, we try to identify the best way to divide the data into k sets simultaneously. A good approach is to take k items from the data set as initial cluster representatives, assign all items to the cluster whose representative is closest, and then recalculate each cluster mean as the new representative; we repeat this until it converges (all clusters stay the same).
