# K-Medoids Clustering Algorithm With Numerical Example

Clustering is an unsupervised machine learning technique that is used in many applications. In this article, we will discuss K-Medoids clustering algorithm with a numerical example.

## What is K-Medoids Clustering?

K-Medoids clustering is an unsupervised machine learning algorithm used to group data into different clusters. It is an iterative algorithm that starts by selecting k data points as medoids from a dataset. After this, the distance between each data point and the medoids is calculated. Then, each data point is assigned to the cluster of the medoid at the minimum distance from it. Here, the medoid is the most centrally located point in the cluster. Once we assign all the data points to clusters, we calculate the sum of the distances of all the non-medoid data points to the medoid of each cluster. We term this sum of distances the cost.

After calculating the cost, we randomly select a non-medoid point from the dataset and swap a medoid with it. Then, we recalculate the cost, treating the selected point as a temporary medoid of its cluster. If the newly calculated cost is less than the previous cost, we make the temporary point a permanent medoid. If the new cost is greater than the previous cost, we undo the swap. We then select another non-medoid point and repeat the process until the cost is minimized.

K-Medoids clustering is a partitioning clustering algorithm. Its most popular implementation is Partitioning Around Medoids (PAM). In this article, we will discuss the PAM algorithm for K-Medoids clustering with a numerical example.

## K-Medoids Clustering Algorithm

Now that we have an overview of K-Medoids clustering, let us discuss the algorithm itself.

1. First, we select K random data points from the dataset and use them as medoids.
2. Now, we will calculate the distance of each data point from the medoids. You can use the Euclidean, Manhattan, or squared Euclidean distance as the distance measure.
3. Once we find the distance of each data point from the medoids, we will assign the data points to clusters. Each data point is assigned to the cluster of its closest medoid.
4. After determining the clusters, we will calculate the sum of the distances of all the non-medoid data points to the medoid of each cluster. Let this cost be Ci.
5. Now, we will select a random non-medoid data point Dj from the dataset and swap it with a medoid Mi. Here, Dj becomes a temporary medoid. After swapping, we will again calculate the distances of all the non-medoid data points to the current medoid of each cluster. Let this cost be Cj.
6. If Cj < Ci, the swap is kept and Dj becomes a permanent medoid. Otherwise, we undo the swap, and Mi is reinstated as the medoid.
7. Repeat steps 4 to 6 until no swap reduces the cost.
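The steps above can be sketched in Python. This is a minimal illustration under our own naming (not a library implementation), using the Manhattan distance:

```python
import random

def manhattan(p, q):
    """Manhattan (city-block) distance between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))

def total_cost(points, medoids):
    """Step 4: sum of each point's distance to its nearest medoid."""
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

def assign_clusters(points, medoids):
    """Step 3: assign each point to the cluster of its closest medoid."""
    clusters = {m: [] for m in medoids}
    for p in points:
        closest = min(medoids, key=lambda m: manhattan(p, m))
        clusters[closest].append(p)
    return clusters

def pam(points, k, seed=0):
    """Steps 1 and 5-7: start from random medoids, then keep any
    swap with a non-medoid point that lowers the total cost."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    cost = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for p in points:
                if p in medoids:
                    continue
                candidate = medoids[:i] + [p] + medoids[i + 1:]
                new_cost = total_cost(points, candidate)
                if new_cost < cost:  # step 6: keep the swap only if cheaper
                    medoids, cost = candidate, new_cost
                    improved = True
    return medoids, assign_clusters(points, medoids), cost
```

Calling `pam(data, 2)` on the ten points used in the numerical example below returns two medoids and a cost no worse than the ones derived by hand.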

## K-Medoids Clustering Numerical Example With Solution

Now that we have discussed the algorithm, let us discuss a numerical example of k-medoids clustering.

The dataset for clustering is as follows.

| Point | Coordinates |
| --- | --- |
| P1 | (2, 6) |
| P2 | (3, 8) |
| P3 | (4, 7) |
| P4 | (6, 2) |
| P5 | (6, 4) |
| P6 | (7, 3) |
| P7 | (7, 4) |
| P8 | (8, 5) |
| P9 | (7, 6) |
| P10 | (3, 4) |

### Iteration 1

Suppose that we want to group the above dataset into two clusters. So, we will randomly choose two medoids.

Here, the choice of medoids is important for efficient execution. Hence, we have selected two points from the dataset that can be potential medoids for the final clusters. The following two points from the dataset have been selected as medoids.

• M1 = (3, 4)
• M2 = (7, 3)

Now, we will calculate the distance between each data point and the medoids using the Manhattan distance measure. The results are tabulated as follows.

| Point | Distance from M1 (3, 4) | Distance from M2 (7, 3) | Assigned cluster |
| --- | --- | --- | --- |
| (2, 6) | 3 | 8 | 1 |
| (3, 8) | 4 | 9 | 1 |
| (4, 7) | 4 | 7 | 1 |
| (6, 2) | 5 | 2 | 2 |
| (6, 4) | 3 | 2 | 2 |
| (7, 3) | 5 | 0 | 2 |
| (7, 4) | 4 | 1 | 2 |
| (8, 5) | 6 | 3 | 2 |
| (7, 6) | 6 | 3 | 2 |
| (3, 4) | 0 | 5 | 1 |

The clusters made with medoids (3, 4) and (7, 3) are as follows.

• Points in cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
• Points in cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}

After assigning clusters, we will calculate the cost for each cluster and find their sum. The cost is nothing but the sum of distances of all the data points from the medoid of the cluster they belong to.

Hence, the cost for the current clusters will be 3+4+4+2+2+0+1+3+3+0 = 22.
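As a quick check, the iteration-1 cost can be reproduced in a few lines of Python (a sketch; the variable names are ours):

```python
# The ten data points and the two medoids chosen in iteration 1
data = [(2, 6), (3, 8), (4, 7), (6, 2), (6, 4), (7, 3), (7, 4), (8, 5), (7, 6), (3, 4)]
medoids = [(3, 4), (7, 3)]

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# Each point contributes its Manhattan distance to the nearest medoid
cost = sum(min(manhattan(p, m) for m in medoids) for p in data)
print(cost)  # 22
```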

### Iteration 2

Now, we will select a non-medoid point (7, 4) and make it a temporary medoid for the second cluster. Hence,

• M1 = (3, 4)
• M2 = (7, 4)

Now, let us calculate the distance between all the data points and the current medoids.

The cluster assignments haven’t changed after updating the medoid. Hence, the clusters are:

• Points in cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
• Points in cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}

Now, let us again calculate the cost for each cluster and find their sum. The total cost this time will be 3+4+4+3+1+1+0+2+2+0=20.

Here, the current cost is less than the cost calculated in the previous iteration. Hence, we will make the swap permanent and make (7, 4) the medoid for cluster 2. If the cost this time had been greater than the previous cost, i.e., 22, we would have reverted the change. The new medoids after this iteration are (3, 4) and (7, 4), with no change in the clusters.

### Iteration 3

Now, let us again change the medoid of cluster 2 to (6, 4). Hence, the new medoids for the clusters are M1 = (3, 4) and M2 = (6, 4).

Let us calculate the distance between the data points and the above medoids to find the new clusters.

Again, the clusters haven’t changed. Hence, clusters are:

• Points in cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
• Points in cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}

Now, let us again calculate the cost for each cluster and find their sum. The total cost this time will be 3+4+4+2+0+2+1+3+3+0=22.

The current cost is 22 which is greater than the cost in the previous iteration i.e. 20. Hence, we will revert the change and the point (7, 4) will again be made the medoid for cluster 2.

So, the clusters after this iteration will be cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)} and cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}. The medoids are (3, 4) and (7, 4).

We keep replacing the medoids with non-medoid data points in this manner. The set of medoids for which the cost is the least, along with the associated clusters, is made permanent. So, after all the iterations, you will get the final clusters and their medoids.

The K-Medoids clustering algorithm is a computation-intensive algorithm that requires many iterations. In each iteration, we need to calculate the distance between the medoids and the data points, assign clusters, and compute the cost. Hence, K-Medoids clustering is not well suited for large data sets.

## Different Implementations of K-Medoids Clustering

K-Medoids clustering algorithm has different implementations such as PAM, CLARA, and CLARANS. Let us discuss their properties one by one.

### PAM Implementation

The PAM implementation of the K-Medoids clustering algorithm is the most basic one. The numerical example on K-Medoids clustering discussed above uses the PAM implementation.

The PAM implementation performs iterative clustering on the entire dataset and is computationally intensive. Hence, it is not very efficient for large datasets.

### CLARA Implementation

CLARA implementation of k-medoids clustering is an improvement over the PAM implementation.

• In the CLARA implementation, the algorithm first selects samples of data points from the dataset. Then, it performs PAM clustering on each sample. After applying PAM to the samples, CLARA outputs the best set of clusters found across the samples.
• CLARA performs clustering on samples of a given dataset. Hence, we can use it to perform clustering on large datasets too. The effectiveness of the CLARA implementation of k-medoids clustering increases with the dataset size. For small datasets, CLARA doesn’t work efficiently, so you should prefer it for large datasets.
• In the case of unbalanced datasets, if the samples are also biased, then the results of the clustering algorithm won’t be good. Hence, you need to take measures during data preprocessing to make the data as balanced as possible.
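The CLARA idea can be sketched as follows: draw random samples, run a PAM-style search on each sample, and keep the medoids that score best on the full dataset. This is an illustrative sketch under our own naming, not a reference implementation; `n_samples` and `sample_size` are tuning parameters we have assumed.

```python
import random

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def total_cost(points, medoids):
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

def pam(points, k, rng):
    """Minimal PAM used as a subroutine: keep any cost-reducing swap."""
    medoids = rng.sample(points, k)
    cost = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for p in points:
                if p in medoids:
                    continue
                cand = medoids[:i] + [p] + medoids[i + 1:]
                c = total_cost(points, cand)
                if c < cost:
                    medoids, cost, improved = cand, c, True
    return medoids

def clara(points, k, n_samples=5, sample_size=40, seed=0):
    """CLARA sketch: run PAM on samples, score the medoids on the FULL data."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k, rng)
        cost = total_cost(points, medoids)  # evaluate on all points, not the sample
        if cost < best_cost:
            best, best_cost = medoids, cost
    return best, best_cost
```

Because each PAM run touches only a sample, the expensive swap search stays cheap even when the full dataset is large; only the final scoring pass visits every point.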

### CLARANS Implementation

CLARANS implementation of the k-medoids clustering algorithm is the most efficient implementation.

• While clustering, CLARANS examines a random sample of neighboring solutions at each step. Due to this, it doesn’t restrict the search to a particular region of the data and returns the best result found across all the samples.
• CLARANS examines neighboring solutions at every step. Hence, for a large dataset with a large number of clusters, the execution time of the CLARANS implementation of k-medoids clustering becomes high.
• We can define the number of local optima to examine while clustering using CLARANS. In PAM or CLARA, once the algorithm finds a local optimum, it randomly picks another starting data point and searches for a better one. In CLARANS, we can limit the number of times this process is repeated.
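The CLARANS search strategy can be sketched as follows. The parameter names `numlocal` (number of local optima to examine) and `maxneighbor` (number of random neighbors to try before declaring a local optimum) are our own, modeled on the parameters of the original CLARANS algorithm; this is an illustration, not a reference implementation.

```python
import random

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def total_cost(points, medoids):
    return sum(min(manhattan(p, m) for m in medoids) for p in points)

def clarans(points, k, numlocal=3, maxneighbor=20, seed=0):
    """CLARANS sketch: from a random set of medoids, examine random swap
    neighbors; move whenever a swap lowers the cost. A local optimum is
    declared after `maxneighbor` consecutive failed tries. Restart
    `numlocal` times and keep the best local optimum found."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(numlocal):
        medoids = rng.sample(points, k)
        cost = total_cost(points, medoids)
        tried = 0
        while tried < maxneighbor:
            i = rng.randrange(k)          # pick a medoid to swap out
            p = rng.choice(points)        # and a random replacement
            if p in medoids:
                continue
            cand = medoids[:i] + [p] + medoids[i + 1:]
            c = total_cost(points, cand)
            if c < cost:                  # move to the better neighbor
                medoids, cost, tried = cand, c, 0
            else:
                tried += 1
        if cost < best_cost:
            best, best_cost = medoids, cost
    return best, best_cost
```

Unlike PAM, which checks every possible swap, this search checks only a bounded number of random neighbors per step, which is what keeps CLARANS cheaper on large datasets.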

## Applications of K-Medoids Clustering in Machine Learning

K-Medoids clustering is an alternative to the K-means clustering algorithm. Hence, you can use k-medoids clustering wherever k-means is applied.

• You can use k-medoids clustering for document classification. The documents are first converted to vectors called word embeddings. Then, you can use k-medoids clustering to group the documents into clusters using their embeddings.
• E-commerce websites, supermarkets, and malls use transaction data for customer segmentation based on buying behavior. It helps them to increase sales by recommending products according to user behavior using recommendation systems.
• K-Medoids clustering can also be used in cyber profiling. It involves grouping people based on their connection to each other. By collecting usage data from individual users, you can cluster them into groups for profiling.
• By grouping similar pixels into clusters, you can also perform image segmentation using k-medoids clustering.
• You can also perform fraud detection on banking and insurance data using k-medoids clustering. By using historical data on frauds, you can find probable fraudulent transactions from a given dataset.
• You can also use k-medoids clustering in various other tasks such as social media profiling, ride-share data analysis, identification of localities with a high rate of crime, etc.

## Advantages of K-Medoids Clustering

• K-Medoids clustering is a simple iterative algorithm and is very easy to implement.
• K-Medoids clustering is guaranteed to converge. Hence, we are guaranteed to get results when we perform k-medoids clustering on any dataset.
• K-Medoids clustering is not tied to a specific domain. Owing to this generality, we can use k-medoids clustering in machine learning applications ranging from text data to geospatial data, and from financial data to e-commerce data.
• The initial medoids in k-medoids clustering are selected randomly. We can choose them in a way that minimizes the number of iterations. To improve the performance of the algorithm, you can warm-start the choice of medoids by selecting specific data points after analyzing the data.
• Compared to other partitioning clustering algorithms such as K-median and k-modes clustering, the k-medoids clustering algorithm is faster in execution.
• K-Medoids clustering algorithm is very robust and it effectively deals with outliers and noise in the dataset. Compared to k-means clustering, the k-medoids clustering algorithm is a better choice for analyzing data with significant noise and outliers.

## Disadvantages of K-Medoids Clustering

• K-Medoids clustering is not a scalable algorithm. For large datasets, the execution time of the algorithm becomes very high. Due to this, you cannot use the K-Medoids algorithm for very large datasets.
• In K-Medoids clustering, we don’t know the optimal number of clusters. Hence, we need to perform clustering using different values of k to find the optimal number of clusters.
• When the dimensionality of the dataset increases, the distances between the data points start to become similar. Due to this, the distance between a data point and the various medoids becomes almost the same, which makes the execution inefficient. To overcome this problem, you can use advanced clustering algorithms like spectral clustering. Alternatively, you can try to reduce the dimensionality of the dataset during data preprocessing.
• The K-Medoids clustering algorithm chooses the initial medoids randomly, and the output clusters depend primarily on the initial medoids. Due to this, every time you run the k-medoids clustering algorithm, you may get different clusters.
• K-Medoids clustering algorithm uses the distance from a central point (medoid). Due to this, the k-medoids algorithm gives circular/spherical clusters. This algorithm is not useful for clustering data into arbitrarily-shaped clusters.
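Since the optimal number of clusters is not known in advance, one common heuristic is to compute the total cost for several values of k and look for the point where the improvement levels off (the "elbow"). The cost always decreases as k grows, so the elbow, not the minimum, is what matters. The sketch below illustrates this on the ten points from the numerical example above, using brute-force search over medoid subsets (feasible only because the dataset is tiny):

```python
from itertools import combinations

# The ten points from the numerical example above
data = [(2, 6), (3, 8), (4, 7), (6, 2), (6, 4), (7, 3), (7, 4), (8, 5), (7, 6), (3, 4)]

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def total_cost(medoids):
    return sum(min(manhattan(p, m) for m in medoids) for p in data)

# Best achievable cost for each k; exhaustive search over all medoid subsets
for k in range(1, 5):
    best = min(total_cost(c) for c in combinations(data, k))
    print(k, best)
```

On real datasets you would run k-medoids itself for each k rather than enumerating subsets, but the shape of the resulting cost-versus-k curve is read the same way.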

## Conclusion

In this article, we have discussed k-medoids clustering with a numerical example. We have also briefly discussed the different implementations of k-medoids clustering, its applications, advantages, and disadvantages.