# Difference between K Means Clustering and Hierarchical Clustering

Cluster analysis, or simply clustering, is the process of partitioning a set of data objects into subsets. Each subset is a cluster, such that the similarity within a cluster is high and the similarity between clusters is low. Cluster analysis is used in many applications such as business intelligence, image pattern recognition and Web search. To appreciate the importance of clustering, let us look at an application.

For example, take the case of business intelligence. In business intelligence, clustering can be used to group customers so that the similarity within a group is greater than the similarity between groups, which facilitates the development of business strategies for enhancing customer relationship management. For grouping, we can choose among several families of methods: Partitioning Methods, Hierarchical Methods, Density-Based Methods and Grid-Based Methods.

In partitioning methods, we find mutually exclusive clusters of roughly spherical shape based on distance. We can use the mean or the medoid as the cluster centre that represents each cluster. These methods work well on small and medium-sized data sets.

In hierarchical methods, we create a hierarchical decomposition of the given set of data. The decomposition can be built in two ways: from the bottom up or from the top down. Based on how the decomposition is built, we divide hierarchical methods into two approaches: the agglomerative approach and the divisive approach.

The main limitation of hierarchical methods is that once a merge or split is done, it can never be undone. In density-based methods, we find arbitrarily shaped clusters as dense regions of objects in space separated by low-density regions. Because every cluster point must lie in a dense neighbourhood, these methods detect outliers easily. Grid-based methods quantize the object space into a finite number of cells that form a grid structure, with processing time independent of the number of data objects.

So far we have discussed what clustering is, why it is so important in data analysis, and the different families of clustering methods. Our target now is to discuss the first two methods in detail, along with the differences between them.

Before discussing partitioning methods, let me tell you a funny story. Suppose you and your ten friends plan that when your parents fall asleep, all of you will go to the playground to play cricket. After lunch your parents doze off, but suddenly it starts to rain. The rain stops after thirty minutes, and as all of you run along the road, your footsteps leave a cluster of footprints on it.

Suddenly your mother wakes up and sees that you are not in the house, so she calls the other parents, and all of them come out and reach the playground by following the cluster of footprints on the road. The same situation occurs in data analysis: if you find the clusters in the data, you catch the pattern of the data. Now we start the detailed discussion of the first two methods.

In the partitioning method, we organize the objects of a set into several exclusive groups, or clusters. Formally, given a data set D of n objects and k, the number of clusters to form, a partitioning algorithm organizes the objects into k partitions (k ≤ n), where each partition represents a cluster.

Suppose a data set D contains n objects in Euclidean space. Using a partitioning method, we distribute the **n** data points into **k** non-overlapping clusters c[1], c[2], …, c[k], where c[i] ∩ c[j] = ∅ for i ≠ j. Our objective is then to find a representative for each of the k clusters. Two kinds of representatives are common: the centroid, which is the centre (mean) of a cluster, and the medoid, the most centrally located object of a cluster, chosen so that the variation within the cluster is minimal.

If we use the mean as the representative of each cluster, the clustering method is called k-means clustering, and if we use the medoid as the representative of each cluster, the method is called k-medoids clustering. Mathematically, we judge a candidate centre p[i] of the i^{th} cluster c[i] by minimizing the sum of squared errors (SSE) or the sum of absolute errors (SAE). Let the i^{th} cluster c[i] consist of m_{i} points, where

p[i] is the centre of the i^{th} cluster, and

c[i,j] is the j^{th} point of the i^{th} cluster. Then

SSE = Σ_{i=1}^{k} Σ_{j=1}^{m_{i}} dist(p[i], c[i,j])² and SAE = Σ_{i=1}^{k} Σ_{j=1}^{m_{i}} dist(p[i], c[i,j]).

The SSE is minimized when p[i] = mean(c[i]), and the SAE is minimized when p[i] = median(c[i]).

To perform k-means clustering, we follow this algorithm.

**Input:**

**k:** The number of clusters

**D:** A data set containing n objects

**Output:** A set of k clusters

Method:

- Arbitrarily choose k objects from D as the initial cluster centres;
- Repeat:
- Reassign each object to the cluster to which the object is most similar, based on the mean of the objects in the cluster;
- Update the cluster means, i.e. recalculate the mean value of the objects in each cluster;
- Until no change.

For example, suppose the data points are

D = {2, 4, 10, 12, 3, 20, 30, 11, 25}

k = 2

Arbitrarily choose p[1] = 2, p[2] = 4 as the initial centres and assign each point to its nearest centre:

C[1] = {2, 3}, C[2] = {4, 10, 11, 12, 20, 25, 30}

Update the means: p[1] = 2.5, p[2] = 16

Repeat the process:

C[1] = {2, 3, 4}, C[2] = {10, 11, 12, 20, 25, 30}

p[1] = 3, p[2] = 18

Repeat:

C[1] = {2, 3, 4, 10}, C[2] = {11, 12, 20, 25, 30}

p[1] = 4.75, p[2] = 19.6

Repeat:

C[1] = {2, 3, 4, 10, 11, 12}, C[2] = {20, 25, 30}

p[1] = 7, p[2] = 25

Repeat once more:

C[1] = {2, 3, 4, 10, 11, 12}, C[2] = {20, 25, 30}

p[1] = 7, p[2] = 25

Since p[1] and p[2] (and the clusters) are the same as in the previous step, we stop the iteration.
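The hand-worked trace above can be reproduced with a short script. Below is a minimal sketch of the k-means loop for one-dimensional data in Python; the function and variable names are my own, not from any library:

```python
def kmeans_1d(points, k, max_iters=100):
    """Minimal 1-D k-means: assign each point to its nearest centre,
    then move each centre to the mean of its cluster, until no change."""
    centres = points[:k]  # arbitrarily take the first k objects as centres
    clusters = []
    for _ in range(max_iters):
        # assignment step: each point goes to its closest centre
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda i: abs(x - centres[i]))
            clusters[nearest].append(x)
        # update step: each centre becomes the mean of its cluster
        new_centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        if new_centres == centres:  # "until no change"
            break
        centres = new_centres
    return centres, clusters

centres, clusters = kmeans_1d([2, 4, 10, 12, 3, 20, 30, 11, 25], k=2)
print(centres)              # [7.0, 25.0]
print(sorted(clusters[0]))  # [2, 3, 4, 10, 11, 12]
print(sorted(clusters[1]))  # [20, 25, 30]
```

Starting from the centres 2 and 4, the loop converges to the same final clusters and means as the trace.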

Now we discuss hierarchical clustering. While partitioning methods meet the basic clustering requirement of organizing a set of objects into a number of exclusive groups, in some situations we may want to partition our data into groups at different levels, as in a hierarchy.

This clustering method works by grouping data objects into a hierarchy, or "tree", of clusters. For example, a company decides to organize the information about its employees. It divides the employees into three groups: executives, managers and staff. It then divides the staff into three subgroups: senior officers, officers and trainees. This is hierarchical clustering.

Now we concentrate on how hierarchical clustering is done. There are two approaches: agglomerative and divisive. In the agglomerative approach, we start with each point as an individual cluster and at each step merge the closest pair of clusters, working from the bottom up.

In the divisive approach, we need to decide which cluster to split at each step and how to do the splitting. We start with one all-inclusive cluster and divide it step by step into sub-clusters until only singleton clusters of individual points remain. This is a top-down approach. Hierarchical clustering is often displayed graphically using a tree-like diagram called a dendrogram.
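The agglomerative (bottom-up) idea can be sketched in a few lines of Python. This is an illustrative single-linkage version for one-dimensional data, not production code, and the names are my own:

```python
def agglomerative_1d(points, num_clusters):
    """Bottom-up clustering sketch: start with one cluster per point and
    repeatedly merge the closest pair of clusters (single linkage: the
    distance between two clusters is the gap between their nearest
    members) until num_clusters remain."""
    clusters = [[x] for x in points]
    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters

print(sorted(sorted(c) for c in agglomerative_1d([2, 4, 10, 12, 3, 20, 30, 11, 25], 2)))
# [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
```

On the same toy data as before, stopping at two clusters reproduces the k-means partition, because the widest gap in the data (between 12 and 20) is the last one left unmerged.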

We can try k-means in R on the UCI wine data set. Note that wine.data has no header row, so we supply the column names ourselves:

```r
wineurl <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"
wine1 <- read.csv(wineurl, header = FALSE,
                  col.names = c("cultivar", "alcohol", "malicacid", "ash",
                                "alcalinity", "magnesium", "totalphenols",
                                "flavanoids", "nonflavanoidphenols",
                                "proanthocynin", "colorintensity", "hue",
                                "od280od315ofdilutedwines", "proline"))
head(wine1)
# drop the class label so it does not influence the clustering
wine2 <- wine1[, which(names(wine1) != "cultivar")]
winek3 <- kmeans(wine2, centers = 3)
winek3
```

```r
plot(winek3$cluster)
head(wine1)
```

```
  cultivar alcohol malicacid  ash alcalinity magnesium totalphenols flavanoids nonflavanoidphenols proanthocynin
1        1   13.20      1.78 2.14       11.2       100         2.65       2.76                0.26          1.28
2        1   13.16      2.36 2.67       18.6       101         2.80       3.24                0.30          2.81
3        1   14.37      1.95 2.50       16.8       113         3.85       3.49                0.24          2.18
4        1   13.24      2.59 2.87       21.0       118         2.80       2.69                0.39          1.82
5        1   14.20      1.76 2.45       15.2       112         3.27       3.39                0.34          1.97
6        1   14.39      1.87 2.45       14.6        96         2.50       2.52                0.30          1.98
  colorintensity  hue od280od315ofdilutedwines proline
1           4.38 1.05                     3.40    1050
2           5.68 1.03                     3.17    1185
3           7.80 0.86                     3.45    1480
4           4.32 1.04                     2.93     735
5           6.75 1.05                     2.85    1450
6           5.25 1.02                     3.58    1290
```

To check the entire data set in wine1:

```
wine1
  cultivar alcohol malicacid  ash alcalinity magnesium totalphenols flavanoids nonflavanoidphenols proanthocynin
1        1   13.20      1.78 2.14       11.2       100         2.65       2.76                0.26          1.28
2        1   13.16      2.36 2.67       18.6       101         2.80       3.24                0.30          2.81
...
```

**Adding a filter:**

```r
wine2 <- wine1[, which(names(wine1) != "cultivar")]
wine2
```

Running k-means clustering on the above data set:

```r
winek3 <- kmeans(wine2, centers = 3)
winek3
```

Output:

```
K-means clustering with 3 clusters of sizes 46, 69, 62

Cluster means:
   alcohol malicacid      ash alcalinity magnesium totalphenols flavanoids nonflavanoidphenols proanthocynin
1 13.79522  1.887174 2.426087   17.05435 105.04348     2.868696   3.013261           0.2854348      1.902174
2 12.51667  2.494203 2.288551   20.82319  92.34783     2.070725   1.758406           0.3901449      1.451884
3 12.92984  2.504032 2.408065   19.89032 103.59677     2.111129   1.584032           0.3883871      1.503387
  colorintensity       hue od280od315ofdilutedwines   proline
1       5.703913 1.0791304                 3.096522 1197.9783
2       4.086957 0.9411594                 2.490725  458.2319
3       5.650323 0.8839677                 2.365484  728.3387
```

**Clustering vector:**

```
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
  1   1   1   3   1   1   1   1   1   1   1   1   1   1   1   1   1   1   3   3   3   1   1   3   3   1   1   3   1   1
 31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
  1   1   1   1   3   3   1   1   3   3   1   1   3   3   1   1   1   1   1   1   1   1   1   1   1   1   1   1   2   3
 61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
  2   3   2   2   3   2   2   3   3   3   2   2   1   3   2   2   2   3   2   2   3   3   2   2   2   2   2   3   3   2
 91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
  2   2   2   2   3   3   2   3   2   3   2   2   2   3   2   2   2   2   3   2   2   3   2   2   2   2   2   2   2   3
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
  2   2   2   2   2   2   2   2   2   3   2   2   3   3   3   3   2   2   2   3   3   2   2   3   3   2   3   3   2   2
151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177
  2   2   3   3   3   2   3   3   3   2   3   2   3   3   2   3   3   3   3   2   2   3   3   3   3   3   2
```

```
Within cluster sum of squares by cluster:
[1] 1343168.5  443166.7  566572.5
 (between_SS / total_SS =  86.5 %)

Available components:
[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"
```

```r
plot(winek3$cluster)
```

Now we run hierarchical clustering on the same data and plot the dendrogram:

```r
wine_H <- hclust(d = dist(wine2))
plot(wine_H)
```

From the above two plots, we can see that there are differences between the two clustering methods. Hierarchical clustering builds clusters within clusters and, unlike k-means, does not require the number of clusters to be specified in advance. Now we discuss these differences in detail. The most important difference is the **hierarchy**. There are two different approaches that fall under this name: top-down and bottom-up.

In top-down hierarchical clustering, we divide the data into 2 clusters (using k-means with k = 2, for example). Then, for each cluster, we repeat this process until all the clusters are too small or too similar for further clustering to make sense, or until we reach a preset number of clusters.
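As a sketch of the top-down idea, here is a toy divisive routine in Python for one-dimensional data. Instead of an inner k-means with k = 2, it splits a cluster at its widest internal gap, which is a simple stand-in for the 2-means split; all names here are my own:

```python
def divisive_1d(points, num_clusters):
    """Top-down clustering sketch: start with one all-inclusive cluster
    and repeatedly split the cluster containing the widest internal gap,
    cutting at that gap, until num_clusters remain."""
    def widest_gap(c):
        # (size of the widest gap, index after which to cut) for a sorted cluster
        return max((c[i + 1] - c[i], i) for i in range(len(c) - 1))
    clusters = [sorted(points)]
    while len(clusters) < num_clusters:
        splittable = [c for c in clusters if len(c) > 1]
        target = max(splittable, key=lambda c: widest_gap(c)[0])
        _, cut = widest_gap(target)
        clusters.remove(target)
        clusters += [target[:cut + 1], target[cut + 1:]]
    return clusters

print(divisive_1d([2, 4, 10, 12, 3, 20, 30, 11, 25], 2))
# [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
```

The first split cuts at the widest gap (between 12 and 20), immediately recovering the same two clusters that k-means found on this data.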

In bottom-up hierarchical clustering, we start with each data item in its own cluster. We then look for the two clusters that are most similar and combine them into a larger cluster. We keep repeating until all the remaining clusters are too dissimilar to be merged, or until we reach a preset number of clusters.

In **k-means** clustering, we try to identify the best way to divide the data into k sets simultaneously. A good approach is to take k items from the data set as initial cluster representatives, assign every item to the cluster whose representative is closest, and then recalculate each cluster mean as the new representative; we repeat until the algorithm converges (all clusters stay the same).