# Find Clusters From Dendrogram in Hierarchical Clustering Using Python

You might have studied various tutorials on hierarchical clustering that teach how to plot a dendrogram. This article discusses how to find clusters from a dendrogram in Python.

## What are Dendrograms?

In hierarchical clustering, a dendrogram is a visual tool that illustrates how each cluster is composed by drawing links between clusters based on their similarity. I have already discussed how to create a dendrogram from a given dataset using different linkage methods in this article on how to plot a dendrogram in Python.

Dendrograms are a great tool for visualizing clustering when there is a hierarchy present in the dataset. We use dendrograms in hierarchical clustering, and I have discussed them in detail in this article on an agglomerative clustering numerical example.

## Find Clusters From Dendrograms using Python in Hierarchical Clustering

**You can find clusters from a dendrogram in Python using its linkage matrix. To create a dendrogram, we first compute the pairwise distances between the data points (the distance matrix). Using these distances, we then create the linkage matrix from which the dendrogram is drawn.**
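As a minimal sketch of these two steps, the snippet below uses SciPy's `pdist()` to compute the pairwise Euclidean distances in condensed (1-D) form and `linkage()` with Ward's method to build the linkage matrix. The six points are the same ones used in the example later in this article; the variable names `X`, `condensed`, `square`, and `Z` are only illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

# Six 2-D points (A to F)
X = np.array([[1, 1], [2, 3], [3, 5], [4, 5], [6, 6], [7, 5]], dtype=float)

# Step 1: pairwise Euclidean distances in SciPy's condensed (1-D) form
condensed = pdist(X)
# squareform() converts them to the familiar n x n distance matrix for inspection
square = squareform(condensed)

# Step 2: build the linkage matrix from the condensed distances
Z = linkage(condensed, method="ward")

# Each of the n - 1 rows records one merge:
# [index of first cluster, index of second cluster, merge distance, points in new cluster]
for left, right, dist, count in Z:
    print(f"merge {int(left)} + {int(right)} at {dist:.3f} -> {int(count)} points")
```

Each row of `Z` describes one merge in the dendrogram, so a dataset with n points produces n - 1 rows, and the third column holds the heights at which the dendrogram's branches join.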

Using the linkage matrix, we can find the clusters from the dendrogram in Python. For this, we can use the `fcluster()` function defined in the `scipy.cluster.hierarchy` module. The `fcluster()` function has the following syntax.

`fcluster(Z, t, criterion='inconsistent', depth=2, R=None)`

Here,

- The `Z` parameter takes the linkage matrix as its input argument.
- The parameter `t` takes the number of clusters or the threshold to apply when forming clusters, depending on the `criterion` parameter.
- The `criterion` parameter specifies how the clusters are formed.
  - If `criterion` is set to `"inconsistent"` (the default), and a cluster node and all its descendants have an inconsistency value less than or equal to `t`, then all its leaf descendants belong to the same cluster. When no non-singleton cluster meets this criterion, every node is assigned to its own cluster.
  - If `criterion` is set to `"distance"`, the clusters are formed so that the original observations in each cluster have no greater cophenetic distance than `t`.
  - If `criterion` is set to `"maxclust"`, the function finds a minimum threshold `r` so that the cophenetic distance between any two original observations in the same cluster is no more than `r` and no more than `t` clusters are formed.
  - There are other values for the `criterion` parameter, such as `"monocrit"` and `"maxclust_monocrit"`, which you can read about in the SciPy documentation for `fcluster()`.
- The `depth` parameter is used only when `criterion` is set to `"inconsistent"`. It specifies the maximum depth for the inconsistency calculation and has no meaning for the other criteria.
- The parameter `R` takes the inconsistency matrix to use for the `"inconsistent"` criterion. This matrix is computed if not provided.
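To make the `distance` and `inconsistent` criteria more concrete, here is a small sketch on the six points used in the example below. A `distance` cut corresponds to drawing a horizontal line across the dendrogram at height `t`. The thresholds `0.5`, `3.0`, `5.0`, and `10.0` are arbitrary values chosen for illustration, and `inconsistent()` is the SciPy helper that computes the matrix accepted by the `R` parameter.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster, inconsistent

X = np.array([[1, 1], [2, 3], [3, 5], [4, 5], [6, 6], [7, 5]], dtype=float)
Z = linkage(pdist(X), method="ward")

# criterion="distance": t is a height on the dendrogram; observations whose
# cophenetic distance is at most t end up in the same flat cluster
counts = {}
for t in (0.5, 3.0, 5.0, 10.0):
    labels = fcluster(Z, t, criterion="distance")
    counts[t] = int(labels.max())  # labels run from 1 to the number of clusters
    print(f"distance cut at {t}: {counts[t]} clusters -> {labels}")

# criterion="inconsistent": t is a threshold on the inconsistency coefficient.
# inconsistent(Z, d) builds the matrix that fcluster() would otherwise compute
# itself from the same depth, so passing it as R is optional.
R = inconsistent(Z, d=2)
labels_inc = fcluster(Z, 1.0, criterion="inconsistent", depth=2, R=R)
print("inconsistent criterion, t=1.0:", labels_inc)
```

Larger distance thresholds cut the dendrogram higher up, so the number of clusters can only stay the same or decrease as `t` grows.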

After execution, the `fcluster()` function returns a NumPy array of length n, where the i-th element is the cluster number assigned to the i-th observation. You can find the clusters from the linkage matrix of the dendrogram using the `fcluster()` function as shown below.

```
import pandas as pd
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

# Six 2-D points labelled A to F
data = [[1, 1], [2, 3], [3, 5], [4, 5], [6, 6], [7, 5]]
points = ["A", "B", "C", "D", "E", "F"]
df = pd.DataFrame(data, columns=["xcord", "ycord"], index=points)

# linkage() expects a condensed (1-D) distance matrix or the raw observations.
# A square distance matrix would be misread as observation vectors,
# so compute the pairwise Euclidean distances with pdist().
distances = pdist(df.values)
linkage_matrix = linkage(distances, "ward")

# Cut the dendrogram so that at most 3 flat clusters are formed
cluster_labels = fcluster(linkage_matrix, 3, criterion="maxclust")
print("The data points are:")
print(data)
print("cluster labels are:")
print(cluster_labels)
```

Output:

```
The data points are:
[[1, 1], [2, 3], [3, 5], [4, 5], [6, 6], [7, 5]]
cluster labels are:
[1 1 2 2 3 3]
```
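If you want the clusters as named groups rather than as a label array, a short sketch like the following attaches the labels to the DataFrame and groups the point names by label. It rebuilds the same `df` as above so that it runs on its own; the `cluster` column and the `clusters` dictionary are illustrative names.

```python
import pandas as pd
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

data = [[1, 1], [2, 3], [3, 5], [4, 5], [6, 6], [7, 5]]
points = ["A", "B", "C", "D", "E", "F"]
df = pd.DataFrame(data, columns=["xcord", "ycord"], index=points)

linkage_matrix = linkage(pdist(df.values), "ward")
df["cluster"] = fcluster(linkage_matrix, 3, criterion="maxclust")

# Group the point names (the DataFrame index) by their flat-cluster label
clusters = {int(label): list(group.index) for label, group in df.groupby("cluster")}
print(clusters)
```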

In this example, we have a dataset of 6 points. We have divided the dataset into three clusters using the `fcluster()` function. For this,

- We passed the linkage matrix as the first input argument, which is assigned to the parameter `Z`.
- We passed the number of clusters as the second input argument, which is assigned to the parameter `t`.
- We passed the literal `"maxclust"` as a keyword argument to the `criterion` parameter.

Instead of using the above approach, you can also find the clusters in hierarchical clustering with the `AgglomerativeClustering` class defined in the `sklearn.cluster` module of scikit-learn. I have discussed this approach in the article on agglomerative clustering in Python.
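For comparison, here is a hedged sketch of the scikit-learn route on the same six points, assuming scikit-learn is installed. `AgglomerativeClustering` with `linkage="ward"` and `n_clusters=3` should group these points the same way as `fcluster()` with `criterion="maxclust"`, although its labels start at 0 and may be numbered in a different order.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 1], [2, 3], [3, 5], [4, 5], [6, 6], [7, 5]], dtype=float)

# scikit-learn: Ward linkage, tree cut into 3 clusters (labels start at 0)
sk_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# SciPy: the fcluster() approach from this article (labels start at 1)
sp_labels = fcluster(linkage(pdist(X), "ward"), 3, criterion="maxclust")

print("scikit-learn:", sk_labels)
print("SciPy:       ", sp_labels)
```

The scikit-learn estimator is convenient when you only need the labels; the SciPy route keeps the linkage matrix around, so you can also plot the dendrogram and try different cuts without refitting.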

## Conclusion

In this article, we have discussed how to find clusters from a dendrogram using the linkage matrix in Python. I hope you enjoyed reading this article.

