ECLAT Algorithm Numerical Example

We use different algorithms in market basket analysis to find frequent itemsets. In this article, we will discuss the Equivalence Class Clustering and bottom-up Lattice Traversal (ECLAT) algorithm. We will cover the basics of the ECLAT algorithm with a numerical example, along with its advantages and disadvantages compared to other algorithms such as Apriori and FP-Growth.

What is The ECLAT Algorithm?

ECLAT is an acronym for Equivalence Class Clustering and bottom-up Lattice Traversal. The ECLAT algorithm is a frequent pattern mining algorithm, just like the Apriori algorithm. We can say that the ECLAT algorithm is an efficient and scalable variant of the Apriori algorithm with the following improvements.

  • The Apriori and FP-Growth algorithms work on a horizontal transaction dataset. In contrast, the ECLAT algorithm works on a vertical data format, in which each item is stored with the IDs of the transactions that contain it. 
  • The ECLAT algorithm uses a depth-first search to traverse the itemset lattice. The Apriori algorithm uses a breadth-first search instead.

Due to the above changes, the ECLAT algorithm is generally faster than the Apriori algorithm on dense datasets with a small number of distinct items and a large number of transactions. However, it can be slower compared to the FP-Growth algorithm on sparse datasets with a large number of distinct items.

What is The Itemset Lattice in The ECLAT Algorithm?

In the ECLAT algorithm, we build an itemset lattice, just as in the Apriori algorithm, to represent the search space for frequent itemsets. The itemset lattice consists of all the frequent itemsets and their support counts.

To construct the itemset lattice, we recursively generate frequent itemsets of increasing size. At each level of the recursion, we generate candidate itemsets by joining the frequent itemsets found at the previous level. If a candidate itemset meets the minimum support threshold, we add it to the itemset lattice.

The itemset lattice is stored in memory as a tree data structure. To retrieve the frequent itemsets, we scan this tree.

Step-By-Step ECLAT Algorithm Explanation

To perform association rule mining using the ECLAT algorithm, we first define the minimum support, confidence, and lift thresholds. After this, we convert the transaction dataset to vertical format if it is not in that format already. Next, we perform candidate generation, pruning, database scan, and rule generation to create association rules. These steps are broadly similar to those of the Apriori algorithm.

Step 1: Convert Transaction Data to Vertical Format

Normally, the transactions in a dataset are stored in horizontal format. It means that each row in the dataset contains a transaction ID and the corresponding items in the transaction as shown below. 

| Transaction ID | Items |
| --- | --- |
| T1 | I1, I3, I4 |
| T2 | I2, I3, I5, I6 |
| T3 | I1, I2, I3, I5 |
| T4 | I2, I5 |
| T5 | I1, I3, I5 |

The dataset in horizontal format

In vertical format, the rows in the transaction data contain an item and the corresponding transactions in which the item is present. The dataset in the vertical format looks as follows.

| Item | Transaction IDs |
| --- | --- |
| I1 | T1, T3, T5 |
| I2 | T2, T3, T4 |
| I3 | T1, T2, T3, T5 |
| I4 | T1 |
| I5 | T2, T3, T4, T5 |
| I6 | T2 |

The dataset in vertical format
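The conversion above can be sketched in a few lines of Python (a minimal illustration, assuming the transactions are held in a dict keyed by transaction ID; all names are illustrative):

```python
# Transactions in horizontal format: transaction ID -> list of items.
horizontal = {
    "T1": ["I1", "I3", "I4"],
    "T2": ["I2", "I3", "I5", "I6"],
    "T3": ["I1", "I2", "I3", "I5"],
    "T4": ["I2", "I5"],
    "T5": ["I1", "I3", "I5"],
}

# Vertical format: item -> set of IDs of the transactions that contain it.
vertical = {}
for tid, items in horizontal.items():
    for item in items:
        vertical.setdefault(item, set()).add(tid)

print(sorted(vertical["I1"]))  # ['T1', 'T3', 'T5']
```

Storing the transaction IDs as sets makes the later support computations cheap, since the support of an itemset is just the size of an intersection of these sets.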

Step 2: Candidate Generation From the Dataset

After transforming the dataset into the vertical format, we use the candidate generation step to generate itemsets that can possibly be frequent itemsets. For this, we start by creating sets containing single items. If there are N items in the dataset, we create N candidate sets. 

After creating the candidate sets, we use the minimum support count to select frequent itemsets containing one item. Once we get the frequent itemsets with one item, we iteratively join them to create larger sets containing 2, 3, 4, 5, or more items. 

In the candidate generation process, we generate candidate itemsets containing k items by joining pairs of frequent itemsets with k-1 items that share k-2 items. This process is repeated until no new frequent itemsets can be generated.
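The join step can be sketched as follows (a minimal illustration with itemsets stored as Python frozensets; `generate_candidates` is an illustrative helper name). Two frequent itemsets of size k-1 are joined only when their union has exactly k items, i.e. when they share k-2 items:

```python
from itertools import combinations

def generate_candidates(frequent_km1):
    """Join pairs of frequent (k-1)-itemsets that share k-2 items
    to form candidate k-itemsets."""
    candidates = set()
    for a, b in combinations(frequent_km1, 2):
        union = a | b
        if len(union) == len(a) + 1:  # the pair shares all but one item
            candidates.add(union)
    return candidates

# Frequent 1-itemsets from the numerical example in this article.
level1 = [frozenset(s) for s in ({"I1"}, {"I2"}, {"I3"}, {"I5"})]
candidates2 = generate_candidates(level1)
print(len(candidates2))  # 6 two-item candidates
```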

Step 3: Pruning the Candidate Itemsets

The pruning step in the ECLAT algorithm is derived from the Apriori algorithm. It is based on the principle that every subset of a frequent itemset must itself be a frequent itemset. In other words, if an itemset has a subset that is not a frequent itemset, the itemset cannot be a frequent itemset.

We use pruning to remove candidate itemsets before scanning the dataset to calculate their support counts, which minimizes the time taken to execute the algorithm. After creating candidate itemsets of k items, we use the following steps to prune them.

For each candidate itemset with k items, we check whether each of its subsets with k-1 items is a frequent itemset. If yes, the candidate itemset is considered for generating frequent itemsets. If even a single subset of the candidate itemset is not a frequent itemset, we reject or prune the candidate.
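This check can be sketched as follows (itemsets as frozensets; `prune` is an illustrative helper name):

```python
from itertools import combinations

def prune(candidates, frequent_km1):
    """Keep only the candidates whose every (k-1)-item subset is a
    frequent itemset; reject a candidate as soon as one subset is not."""
    kept = set()
    for cand in candidates:
        subsets = combinations(cand, len(cand) - 1)
        if all(frozenset(s) in frequent_km1 for s in subsets):
            kept.add(cand)
    return kept

# Example taken from the numerical walkthrough in this article: {I1, I2}
# is not frequent, so {I1, I2, I3} is pruned while {I1, I3, I5} survives.
frequent_2 = {frozenset(s) for s in
              ({"I1", "I3"}, {"I1", "I5"}, {"I2", "I3"},
               {"I2", "I5"}, {"I3", "I5"})}
candidates_3 = {frozenset({"I1", "I3", "I5"}), frozenset({"I1", "I2", "I3"})}
print(prune(candidates_3, frequent_2))
```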

Step 4: Frequent Itemset Generation

After pruning, we check the support count of the remaining candidate itemsets. For this, we scan the transaction dataset to find the support of each candidate itemset.

After calculating the support count of each candidate itemset, we drop the itemsets having a support count less than the minimum support count from the candidate list. The rest of the itemsets are considered frequent itemsets. 
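In the vertical format, this database scan reduces to set intersections: the support count of a candidate itemset is the number of transaction IDs shared by all of its items. A minimal sketch (the tidsets below come from the vertical table in Step 1; `support` is an illustrative helper name):

```python
def support(itemset, vertical):
    """Support count of an itemset: the size of the intersection of the
    transaction-ID sets of its items."""
    tids = None
    for item in itemset:
        tids = vertical[item] if tids is None else tids & vertical[item]
    return len(tids)

vertical = {
    "I1": {"T1", "T3", "T5"},
    "I2": {"T2", "T3", "T4"},
    "I3": {"T1", "T2", "T3", "T5"},
    "I5": {"T2", "T3", "T4", "T5"},
}
print(support({"I3", "I5"}, vertical))  # 3 (transactions T2, T3, T5)
```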

After generating the frequent itemsets having k items, we create candidate itemsets having k+1 items, perform pruning, database scan, and then frequent itemset generation to generate frequent itemsets having k+1 items. 

We iterate through steps 2 to 4 until we cannot generate more frequent itemsets.

Step 5: Association Rule Generation

After creating frequent itemsets, we generate association rules. If we have a frequent itemset {I}, we can create association rules of the form {S} -> {I - S}, where {S} is a non-empty proper subset of the frequent itemset {I}. We keep only the rules that meet the minimum confidence threshold.
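The rule-generation step can be sketched as follows (support counts are assumed to be already available from the mining phase; `rules_from_itemset` is an illustrative helper name). The confidence of {S} -> {I - S} is support(I) / support(S):

```python
from itertools import combinations

def rules_from_itemset(itemset, support_counts, min_conf):
    """Generate rules {S} -> {I - S} from a frequent itemset I,
    keeping those whose confidence meets min_conf."""
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            conf = support_counts[itemset] / support_counts[antecedent]
            if conf >= min_conf:
                rules.append((antecedent, itemset - antecedent, conf))
    return rules

# Support counts taken from the numerical example in this article.
support_counts = {
    frozenset({"I2"}): 3,
    frozenset({"I5"}): 4,
    frozenset({"I2", "I5"}): 3,
}
rules = rules_from_itemset({"I2", "I5"}, support_counts, 0.75)
print(len(rules))  # 2: {I2} -> {I5} (conf 1.0) and {I5} -> {I2} (conf 0.75)
```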

ECLAT Algorithm Numerical Example

To explain the ECLAT algorithm using the numerical example, we will use the following dataset.

| Transaction ID | Items |
| --- | --- |
| T1 | I1, I3, I4 |
| T2 | I2, I3, I5, I6 |
| T3 | I1, I2, I3, I5 |
| T4 | I2, I5 |
| T5 | I1, I3, I5 |

The dataset in horizontal format

The above transaction dataset is in horizontal format. It contains five transactions having transaction IDs T1, T2, T3, T4, and T5. The dataset contains six different items namely I1, I2, I3, I4, I5, and I6.  

Convert Transaction Data to Vertical Format

To proceed with the explanation of the ECLAT algorithm using numerical examples, we need to represent the dataset in vertical format. In vertical format, each row of the dataset represents an item and all the transactions in which the item is present. The transformed dataset looks as follows.

| Item | Transaction IDs |
| --- | --- |
| I1 | T1, T3, T5 |
| I2 | T2, T3, T4 |
| I3 | T1, T2, T3, T5 |
| I4 | T1 |
| I5 | T2, T3, T4, T5 |
| I6 | T2 |

The dataset in vertical format


Let us now use the ECLAT algorithm to find association rules from the above dataset. For our numerical example, we will use a minimum support count of 2 and a minimum confidence of 75 percent. To help us calculate the support of the itemsets, we will create a matrix representing the presence of items in the transactions, as shown below.

| Item | T1 | T2 | T3 | T4 | T5 |
| --- | --- | --- | --- | --- | --- |
| I1 | 1 | 0 | 1 | 0 | 1 |
| I2 | 0 | 1 | 1 | 1 | 0 |
| I3 | 1 | 1 | 1 | 0 | 1 |
| I4 | 1 | 0 | 0 | 0 | 0 |
| I5 | 0 | 1 | 1 | 1 | 1 |
| I6 | 0 | 1 | 0 | 0 | 0 |

The transaction matrix

The above matrix contains Items on the vertical axis and transaction IDs on the horizontal axis. If an item is present in a transaction, the corresponding cell is set to 1. Otherwise, it is set to 0. We will use this matrix to calculate the support count of itemsets as it is easier to scan this matrix compared to the transaction dataset.

To calculate the support count of any given itemset, we will find the number of columns in which all the items in the given itemset are set to 1 in the above matrix. 
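This column count can be sketched directly on the matrix (a minimal illustration with the matrix stored as a dict of 0/1 lists, one position per transaction T1 to T5; `support` is an illustrative helper name):

```python
# The transaction matrix from above: item -> presence in T1..T5.
matrix = {
    "I1": [1, 0, 1, 0, 1],
    "I2": [0, 1, 1, 1, 0],
    "I3": [1, 1, 1, 0, 1],
    "I4": [1, 0, 0, 0, 0],
    "I5": [0, 1, 1, 1, 1],
    "I6": [0, 1, 0, 0, 0],
}

def support(itemset):
    """Count the columns (transactions) in which every item of the
    itemset has a 1."""
    return sum(all(matrix[item][col] for item in itemset) for col in range(5))

print(support({"I1", "I3", "I5"}))  # 2 (columns T3 and T5)
```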

Create Frequent Itemsets With 1 Item

The ECLAT algorithm starts by creating candidate itemsets with one item. For this, let us calculate the support count of each item.

| Itemset | Transactions | Support Count |
| --- | --- | --- |
| {I1} | T1, T3, T5 | 3 |
| {I2} | T2, T3, T4 | 3 |
| {I3} | T1, T2, T3, T5 | 4 |
| {I4} | T1 | 1 |
| {I5} | T2, T3, T4, T5 | 4 |
| {I6} | T2 | 1 |

Candidate itemsets with one item

The above table contains the support count of candidate itemsets with one item. Here, you can observe that the itemsets {I4} and {I6} have support count 1 which is less than the minimum support count 2. Hence, we will omit these itemsets from the candidate table. After this, we will get the table containing frequent itemsets with a single item as shown below.

| Itemset | Transactions | Support Count |
| --- | --- | --- |
| {I1} | T1, T3, T5 | 3 |
| {I2} | T2, T3, T4 | 3 |
| {I3} | T1, T2, T3, T5 | 4 |
| {I5} | T2, T3, T4, T5 | 4 |

Frequent itemsets with one item

In the above table, we have created frequent itemsets containing a single item. Now, we will calculate the frequent itemsets with two items.

Create Frequent Itemsets With 2 Items

To create frequent itemsets with two items, we will first create the candidate itemsets with two items. For this, we will join the frequent itemsets with one item with each other. After joining, we will get the following itemsets.

{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I5}, and {I3, I5}

After creating the itemsets with two items, we need to prune the itemsets that have subsets that are not frequent itemsets. As {I1}, {I2}, {I3}, and {I5} are all frequent itemsets, no itemsets will be removed from the above list.

As the next step, we will calculate the support count of each itemset with two items to create the candidate itemset table. The result is tabulated below.

| Itemset | Transactions | Support Count |
| --- | --- | --- |
| {I1, I2} | T3 | 1 |
| {I1, I3} | T1, T3, T5 | 3 |
| {I1, I5} | T3, T5 | 2 |
| {I2, I3} | T2, T3 | 2 |
| {I2, I5} | T2, T3, T4 | 3 |
| {I3, I5} | T2, T3, T5 | 3 |

Candidate itemsets with two items

In the above candidate itemsets, you can observe that the itemset {I1, I2} has a support count of 1, which is less than the minimum support count. Hence, we will remove this itemset from the table and obtain the table of frequent itemsets with two items, as shown below.

| Itemset | Transactions | Support Count |
| --- | --- | --- |
| {I1, I3} | T1, T3, T5 | 3 |
| {I1, I5} | T3, T5 | 2 |
| {I2, I3} | T2, T3 | 2 |
| {I2, I5} | T2, T3, T4 | 3 |
| {I3, I5} | T2, T3, T5 | 3 |

Frequent itemsets with two items

Here, we have obtained frequent itemsets with two items. Let us now calculate the frequent itemsets with three items.

Calculate Frequent Itemsets With Three Items

To calculate the frequent itemsets with three items, we first need to calculate the candidate set. For this, let us first join the frequent itemsets with two items and create the following itemsets with three items.

{I1, I3, I5}, {I1, I2, I3}, {I1, I2, I5}, {I2, I3, I5}

On the above itemsets, we will perform pruning to remove any itemset that has a subset that is not a frequent itemset. For this, we will create subsets of 2 items for each itemset and check if they are frequent itemsets or not. All the subsets of the above itemsets are tabulated below.

| Itemset | Subsets with two items | All subsets frequent? |
| --- | --- | --- |
| {I1, I3, I5} | {I1, I3}, {I1, I5}, {I3, I5} | Yes |
| {I1, I2, I3} | {I1, I2}, {I1, I3}, {I2, I3} | No |
| {I1, I2, I5} | {I1, I2}, {I1, I5}, {I2, I5} | No |
| {I2, I3, I5} | {I2, I3}, {I2, I5}, {I3, I5} | Yes |

Pruning the itemsets with three items

In the table, you can observe that the itemsets {I1, I2, I3} and {I1, I2, I5} contain the subset {I1, I2}, which is not a frequent itemset. Hence, we will prune the itemsets {I1, I2, I3} and {I1, I2, I5}. After this, we are left with {I1, I3, I5} and {I2, I3, I5} as the candidate itemsets with three items. Let us calculate their support counts.

| Itemset | Transactions | Support Count |
| --- | --- | --- |
| {I2, I3, I5} | T2, T3 | 2 |
| {I1, I3, I5} | T3, T5 | 2 |

Candidate itemsets with three items

In the above table, both itemsets have a support count of 2 which is equal to the minimum support count. Hence, both itemsets will be considered frequent itemsets. 

| Itemset | Transactions | Support Count |
| --- | --- | --- |
| {I2, I3, I5} | T2, T3 | 2 |
| {I1, I3, I5} | T3, T5 | 2 |

Frequent itemsets with three items

Calculate Frequent Itemsets With Four Items

Now, we will calculate the frequent itemsets with four items. For this, we will first join the frequent itemsets with three items to create itemsets with four items. We will get only one itemset, as shown below.

{I1, I2, I3, I5}

The above itemset has four subsets with three items: {I2, I3, I5}, {I1, I3, I5}, {I1, I2, I5}, and {I1, I2, I3}. Among these, {I1, I2, I5} and {I1, I2, I3} are not frequent itemsets. Hence, we will prune the itemset {I1, I2, I3, I5}. Thus, we have no candidate itemsets with four items, and the process of frequent itemset generation stops here.

Now, let us tabulate all the frequent itemsets created in this numerical example on the ECLAT algorithm.

| Itemset | Support Count |
| --- | --- |
| {I1} | 3 |
| {I2} | 3 |
| {I3} | 4 |
| {I5} | 4 |
| {I1, I3} | 3 |
| {I1, I5} | 2 |
| {I2, I3} | 2 |
| {I2, I5} | 3 |
| {I3, I5} | 3 |
| {I2, I3, I5} | 2 |
| {I1, I3, I5} | 2 |

The frequent itemset lattice

The above table contains all the frequent itemsets in the given transaction data. This table is also called the itemset lattice. We often store this itemset lattice in the form of a tree in the memory. Then, the tree is used to generate the association rules.
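Putting the pieces together, the whole example can be reproduced by a short depth-first search over the vertical format, extending itemsets by intersecting tidsets. This is a sketch of the recursive ECLAT traversal rather than the tabular walkthrough above (infrequent branches are simply not extended, so no explicit pruning tables are needed), but with a minimum support count of 2 it yields the same 11 frequent itemsets:

```python
def eclat(prefix, items, min_support, out):
    """Depth-first ECLAT: `items` is a list of (item, tidset) pairs that
    all extend `prefix`; frequent extensions are recorded in `out`."""
    for i, (item, tids) in enumerate(items):
        if len(tids) >= min_support:
            itemset = prefix | {item}
            out[frozenset(itemset)] = len(tids)
            # Extend further with the remaining items, intersecting tidsets.
            suffix = [(other, tids & otids) for other, otids in items[i + 1:]]
            eclat(itemset, suffix, min_support, out)

# The vertical-format dataset from this article's numerical example.
vertical = {
    "I1": {"T1", "T3", "T5"}, "I2": {"T2", "T3", "T4"},
    "I3": {"T1", "T2", "T3", "T5"}, "I4": {"T1"},
    "I5": {"T2", "T3", "T4", "T5"}, "I6": {"T2"},
}
lattice = {}
eclat(set(), sorted(vertical.items()), 2, lattice)
print(len(lattice))  # 11 frequent itemsets, as in the table
```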

Generate Association Rules

I have already discussed how to generate association rules from the frequent itemsets in this article.

Conclusion

In this article, we discussed the ECLAT algorithm with a numerical example. We went through all the steps of the ECLAT algorithm and solved a numerical example. We also discussed how the ECLAT algorithm outperforms the Apriori algorithm on dense datasets with a small number of distinct items and a large number of transactions.

To read more about machine learning, you can read this article on k-means clustering numerical examples. You might also like this article on KNN regression numerical examples.

I hope you enjoyed reading this article. Stay tuned for more informative articles. 

Happy Learning!
