Difference between K-Means Clustering and Hierarchical Clustering

Cluster analysis, or simply clustering, is the process of partitioning a set of data objects into subsets. Each subset is a cluster, such that the similarity of objects within a cluster is high and the similarity between clusters is low. Cluster analysis is used in many applications such as business intelligence, image pattern recognition and Web search. To appreciate the importance of clustering, consider one of its applications: business intelligence. There, clustering can be used to group customers so that the similarity within a group is greater than the similarity between groups, which helps in developing business strategies for customer relationship management. To form such groups we can use several families of methods: Partitioning methods, Hierarchical methods, Density-based methods and Grid-based methods.

In Partitioning methods, we find mutually exclusive clusters of spherical shape based on distance, and we can use the mean or a medoid (a representative object) as the cluster center representing each cluster. These methods work well for small to medium-sized data sets. In Hierarchical methods we create a hierarchical decomposition of the given set of data, built either from the bottom up or from the top down. Based on how the decomposition is formed, we divide these methods into two approaches: the agglomerative approach and the divisive approach. Their main limitation is that once a merge or split is done, it can never be undone. In Density-based methods we find arbitrarily shaped clusters, which are dense regions of objects in space separated by regions of low density. Because each point of a cluster must lie in a sufficiently dense neighbourhood, these methods detect outliers easily. Grid-based methods quantize the object space into a finite number of cells that form a grid structure, so their running time does not depend on the number of data objects.

So far we have discussed what clustering is, why it is so important in data analysis, and the different families of clustering methods. Our goal now is to discuss the first two methods in detail, and the differences between them.

Before discussing partitioning methods, let me tell you a funny story. Suppose you and your ten friends plan that, when your parents fall asleep, you will all sneak out to the playground to play cricket. After lunch your parents do fall asleep, but suddenly it starts to rain; thirty minutes later the rain stops, and as you all run down the road your footprints form clusters on the muddy road. When your mother wakes up and sees that you are not in the house, she calls the other parents, and all of them follow the clusters of footprints straight to the playground. The same thing happens in data analysis: if you find the clusters in the data, you catch the pattern of the data. Now we discuss the first two methods in detail.

In Partitioning methods, we organize the objects of a data set into several exclusive groups, called clusters. Formally, given a data set D of n objects and k, the number of clusters to form, a partitioning algorithm organizes the objects into k partitions (k ≤ n), where each partition represents a cluster. Suppose the data set D contains n objects in Euclidean space. Using a partitioning method we distribute the n data points into k non-overlapping clusters c[1], c[2], …, c[k], where c[i] ∩ c[j] = ∅ for i ≠ j. Our objective is then to find a representative for each of the k clusters. Two kinds of representative can be used: the centroid, which is the centre (mean) of a cluster, and the medoid, which is the most centrally located object of the cluster, chosen so that the variation within the cluster is minimal. If we use the mean as the representative of each cluster, the method is called k-means clustering; if we use a medoid, it is called k-medoids clustering. Mathematically, let p[i] be the centre of the ith cluster c[i] and c[i,j] be the jth point of that cluster; p[i] is the point for which the total distance dist(p[i], c[i,j]) over the points of the cluster is minimal, and we measure this total either by the sum of squared errors (SSE) or by the sum of absolute errors (SAE). Let the ith cluster c[i] consist of m[i] points, where

p[i] be the centre of the ith cluster.
c[i,j] be the jth point of the ith cluster.

The two error measures are

SSE = Σ[i=1..k] Σ[j=1..m[i]] dist(p[i], c[i,j])²
SAE = Σ[i=1..k] Σ[j=1..m[i]] dist(p[i], c[i,j])

We know that SSE is minimized when p[i] = mean(c[i]), and SAE is minimized when p[i] = median(c[i]); this is why the mean is the natural centre for k-means and the median (or medoid) for k-medoids.
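As a quick numerical check of this fact, here is a small R sketch (the vector x is an arbitrary toy cluster chosen only for illustration):

x <- c(2, 3, 4, 10, 30)             # a toy one-dimensional cluster

sse <- function(p) sum((x - p)^2)   # sum of squared errors for a centre p
sae <- function(p) sum(abs(x - p))  # sum of absolute errors for a centre p

optimize(sse, range(x))$minimum     # about 9.8, which is mean(x)
optimize(sae, range(x))$minimum     # about 4,   which is median(x)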

To perform k-means clustering we follow this algorithm.

Input:

k: The number of clusters
D: A data set containing n objects
Output: A set of k clusters

Method:

  • Arbitrarily choose k objects from D as the initial cluster centers;
  • Repeat:
  • (Re)assign each object to the cluster to which the object is most similar, based on the mean value of the objects in the cluster;
  • Update the cluster means, that is, calculate the mean value of the objects for each cluster;
  • Until no change.
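The following is a minimal sketch of this loop in R for one-dimensional data, written out by hand for illustration (simple_kmeans is a hypothetical helper and assumes no cluster ever becomes empty); in practice one would simply call the built-in kmeans() function:

# A bare-bones k-means (Lloyd-style) loop for a numeric vector x.
# `centers` holds the current cluster centers; the loop alternates between
# reassigning points and recomputing the means until nothing changes.
simple_kmeans <- function(x, centers) {
  repeat {
    cl <- apply(abs(outer(x, centers, "-")), 1, which.min)  # nearest center
    new_centers <- tapply(x, cl, mean)                       # recompute means
    if (all(new_centers == centers)) break                   # until no change
    centers <- new_centers
  }
  list(centers = centers, cluster = cl)
}

simple_kmeans(c(2, 4, 10, 12, 3, 20, 30, 11, 25), centers = c(2, 4))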

For example, suppose the data points are

D = {2,4,10,12,3,20,30,11,25}
k=2
Let p[1]=2, p[2]=4 (the initial centers)
C[1]={2,3}, C[2]={4,10,11,12,20,25,30}
New means: p[1]=2.5, p[2]=16

Repeat the process:
C[1]={2,3,4}, C[2]={10,11,12,20,25,30}
New means: p[1]=3, p[2]=18

Repeat:
C[1]={2,3,4,10}, C[2]={11,12,20,25,30}
New means: p[1]=4.75, p[2]=19.6

Repeat:
C[1]={2,3,4,10,11,12}, C[2]={20,25,30}
New means: p[1]=7, p[2]=25

Repeat:
C[1]={2,3,4,10,11,12}, C[2]={20,25,30}
New means: p[1]=7, p[2]=25

In the last step the clusters and the means p[1], p[2] are the same as in the previous step, so nothing changes any more and we stop the iteration.
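We can check this hand calculation with R's built-in kmeans() function. The options below are chosen to mirror the calculation above: the classical Lloyd algorithm and the same starting centers 2 and 4 (by default kmeans() uses the Hartigan-Wong algorithm with random starts, so its intermediate steps can differ, but it should reach the same final centers here):

D <- c(2, 4, 10, 12, 3, 20, 30, 11, 25)

# one-column matrix of initial centers, because the data is one-dimensional
km <- kmeans(D, centers = matrix(c(2, 4), ncol = 1), algorithm = "Lloyd")

km$centers   # final cluster centers: 7 and 25
km$cluster   # which cluster each of the nine points ends up in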

Now we discuss hierarchical clustering. While partitioning methods meet the basic clustering requirement of organizing a set of objects into a number of exclusive groups, in some situations we may want to partition our data into groups at different levels, as in a hierarchy. Hierarchical clustering works by grouping data objects into a hierarchy, or “tree”, of clusters. For example, suppose a company wants to organize information about all of its people. It divides the employees into three groups: executives, managers and staff, and then divides the staff into three subgroups: senior officers, officers and trainees. This is the idea behind hierarchical clustering.

Now we look at how hierarchical clustering is done. There are two approaches: agglomerative and divisive. In the agglomerative approach we start with the individual points as singleton clusters and, at each step, merge the closest pair of clusters, working from the bottom to the top. In the divisive approach we start with one all-inclusive cluster and, at each step, decide which cluster to split and how to split it, continuing until only singleton clusters of individual points remain; this works from the top down. A hierarchical clustering is often displayed graphically using a tree-like diagram called a dendrogram.
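A minimal agglomerative example in R, reusing the data points from the k-means example above (the "average" linkage rule is just one of several possible choices for measuring the distance between clusters):

D <- c(2, 4, 10, 12, 3, 20, 30, 11, 25)

hc <- hclust(dist(D), method = "average")  # bottom-up merging of nearest clusters

plot(hc, labels = D)   # draws the dendrogram
cutree(hc, k = 2)      # cut the tree to obtain 2 flat clusters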

As a larger illustration, k-means clustering with k = 3 was run in R on the wine1 data set. The within-cluster sums of squares of the three clusters came out as 1343168.5, 443166.7 and 566572.5, with between_SS / total_SS = 86.5 %, and the fitted kmeans object makes the components “cluster”, “centers”, “totss”, “withinss”, “tot.withinss”, “betweenss”, “size”, “iter” and “ifault” available for further inspection.
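Output of this kind could be produced with a few lines of R such as the following; this is only a sketch, since wine1 is assumed here to be a numeric data frame already loaded in the session, and the exact numbers depend on the data and on the random starting centers:

set.seed(1)                        # k-means uses random starts, so fix the seed
km <- kmeans(wine1, centers = 3)   # wine1: assumed numeric data frame

km$withinss                # within-cluster sum of squares, one value per cluster
km$betweenss / km$totss    # the between_SS / total_SS ratio reported above
names(km)                  # the available components of a kmeans object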

From these examples we can see that there are differences between the two methods of clustering, which we now discuss. Hierarchical clustering builds clusters within clusters and, unlike k-means, does not require the number of clusters to be specified in advance. The most important difference is the hierarchy itself. Actually, there are two different approaches that fall under this name: top-down and bottom-up.

In top-down hierarchical clustering, we divide the data into 2 clusters (using k-means with k = 2, for example). Then, for each cluster, we repeat this process until all the clusters are too small or too similar for further splitting to make sense, or until we reach a preset number of clusters.
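A rough sketch of this top-down idea in R, using repeated 2-means splits (sometimes called bisecting k-means; bisect_kmeans is a hypothetical helper, and splitting the cluster with the largest within-cluster sum of squares is just one reasonable choice of rule):

# Top-down clustering: keep bisecting one cluster with 2-means until k clusters
# remain. Assumes every cluster chosen for splitting still has at least 2 points.
bisect_kmeans <- function(x, k) {
  clusters <- list(x)
  while (length(clusters) < k) {
    wss <- sapply(clusters, function(ci) sum((ci - mean(ci))^2))
    i <- which.max(wss)                       # split the "loosest" cluster
    km <- kmeans(clusters[[i]], centers = 2)  # 2-means on that cluster only
    clusters <- c(clusters[-i], split(clusters[[i]], km$cluster))
  }
  clusters
}

bisect_kmeans(c(2, 4, 10, 12, 3, 20, 30, 11, 25), k = 3)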

In bottom-up hierarchical clustering, we start with each data item as its own cluster. We then look for the two clusters that are most similar and combine them into a larger cluster. We keep repeating until all the clusters left are too dissimilar to be merged, or until we reach a preset number of clusters.

In k-means clustering, we try to identify the best way to divide the data into k sets simultaneously. A good approach is to take k items from the data set as initial cluster representatives, assign every item to the cluster whose representative is closest, and then recompute each cluster's mean as its new representative, repeating until the algorithm converges (all clusters stay the same).
