Apriori Algorithm Numerical Example

The Apriori algorithm is one of the earliest algorithms used for association rule mining. In this article, we will discuss the basics of the Apriori algorithm using a numerical example. Here, we will use a sample transaction dataset to obtain frequent itemsets and create association rules using the Apriori algorithm. We will also discuss the advantages and disadvantages of the algorithm.

Before proceeding with this article, I suggest you read this article on association rule mining. It will help you understand the basic terms used here. You can also read this article on market basket analysis for an overview of how association rule mining is used in different industries.

What is Apriori Algorithm?

The Apriori algorithm is a frequent pattern mining algorithm used in market basket analysis. We use the Apriori algorithm to generate frequent itemsets in a transaction dataset. It is an iterative algorithm that starts with itemsets containing a single item and progressively joins them to create bigger frequent itemsets.

To generate association rules using the apriori algorithm we use the following steps.

  • First, we generate frequent itemsets. 
  • Next, we use the frequent itemsets to generate association rules. 
  • Finally, we select the important association rules by filtering the generated rules using confidence and lift.

A Step-by-Step Explanation of The Apriori Algorithm

To use the apriori algorithm for association rule mining, we first define the minimum support, confidence, and lift. After this, we use candidate generation, pruning, database scan, and rule generation to create association rules. All the steps are discussed below.

Step 1: Candidate Generation in Apriori Algorithm

We use the candidate generation step to generate itemsets that can possibly be frequent itemsets. For this, we start by creating sets containing single items. If there are N items in the dataset, we create N candidate sets. 

After creating the candidate sets, we use the minimum support count to select frequent itemsets containing one item. Once we get the frequent itemsets with one item, we iteratively join them to create larger sets containing 2, 3, 4, 5, or more items. Here, we generate the candidate itemsets containing k items by joining pairs of frequent itemsets with k-1 items that have k-2 items in common. This process is repeated until no new frequent itemsets can be generated.
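To make the join step concrete, here is a minimal Python sketch of candidate generation. The function name `generate_candidates` is illustrative, not from any specific library:

```python
from itertools import combinations

def generate_candidates(frequent_itemsets, k):
    """Join pairs of frequent (k-1)-itemsets to produce
    candidate itemsets of size k."""
    candidates = set()
    for a, b in combinations(frequent_itemsets, 2):
        union = a | b
        # The union has size k exactly when a and b share k-2 items.
        if len(union) == k:
            candidates.add(union)
    return candidates

# Example: joining the frequent one-item itemsets yields all six
# two-item candidates {I1, I2}, {I1, I3}, ..., {I3, I5}.
frequent_1 = [frozenset({"I1"}), frozenset({"I2"}),
              frozenset({"I3"}), frozenset({"I5"})]
print(generate_candidates(frequent_1, 2))
```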

Step 2: Pruning in Apriori Algorithm

The pruning step in the Apriori algorithm is based on the concept that every subset of a frequent itemset must also be a frequent itemset. In other words, if an itemset has a subset that is not a frequent itemset, the itemset itself cannot be a frequent itemset.

We use pruning to minimize the time taken in executing the Apriori algorithm. After creating candidate itemsets of k items, we prune the candidate set as follows.

For each candidate itemset having k items, we check if each of its subsets having k-1 items is a frequent itemset. If yes, the candidate itemset is considered for generating frequent itemsets. Otherwise, we reject or prune the itemsets having subsets that are not frequent itemsets.
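Continuing the illustrative helpers from above, the pruning check can be sketched as follows:

```python
from itertools import combinations

def prune(candidates, frequent_prev):
    """Keep only the candidates of size k whose (k-1)-item
    subsets are all frequent itemsets."""
    pruned = set()
    for candidate in candidates:
        subsets = combinations(candidate, len(candidate) - 1)
        if all(frozenset(s) in frequent_prev for s in subsets):
            pruned.add(candidate)
    return pruned
```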

Step 3: Database Scan and Frequent Itemset Generation

After pruning, we check the support count of the remaining candidate itemsets. For this, we scan the transaction dataset to find the support count of each candidate itemset.

After calculating the support count of each candidate itemset, we reject the itemsets having a support count less than the minimum support count. The rest of the itemsets are considered frequent itemsets. 

After generating the frequent itemsets having k items, we create candidate itemsets having k+1 items, perform pruning, scan the database, and generate the frequent itemsets having k+1 items. We repeat this process until no more frequent itemsets can be generated.
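Putting the database scan and the iteration together, a minimal sketch of the full frequent-itemset loop might look like this, reusing the illustrative `generate_candidates` and `prune` helpers defined above:

```python
def support_count(itemset, transactions):
    """Scan the transactions and count those that contain the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori_frequent_itemsets(transactions, min_support_count):
    """Iteratively generate frequent itemsets of size 1, 2, 3, ..."""
    items = {frozenset({i}) for t in transactions for i in t}
    frequent = {s for s in items
                if support_count(s, transactions) >= min_support_count}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        candidates = prune(generate_candidates(frequent, k), frequent)
        frequent = {c for c in candidates
                    if support_count(c, transactions) >= min_support_count}
        all_frequent |= frequent
        k += 1
    return all_frequent
```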

Step 4: Association Rule Generation in Apriori Algorithm

After creating frequent itemsets, we will create association rules. If we have a frequent itemset {I}, we can create association rules of the form {S} -> {I - S}. Here, {S} is a non-empty proper subset of the frequent itemset {I}.

After generating association rules, we use the minimum confidence and lift thresholds to identify the association rules that are useful for business decisions.
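A minimal sketch of rule generation from a single frequent itemset, filtering by minimum confidence (`generate_rules` is an illustrative name; `support_count` is the helper sketched earlier):

```python
from itertools import combinations

def generate_rules(itemset, transactions, min_confidence):
    """Emit rules S -> (itemset - S) whose confidence meets the threshold.
    itemset is expected to be a frozenset."""
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(itemset, r):
            s = frozenset(antecedent)
            confidence = (support_count(itemset, transactions)
                          / support_count(s, transactions))
            if confidence >= min_confidence:
                rules.append((set(s), set(itemset - s), confidence))
    return rules
```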

Apriori Algorithm Numerical Example

By now, we have discussed the definition and procedure of the Apriori algorithm. Let us now work through a numerical example to understand it better. For the numerical example, we will use the following dataset.

Transaction ID | Items
T1 | I1, I3, I4
T2 | I2, I3, I5, I6
T3 | I1, I2, I3, I5
T4 | I2, I5
T5 | I1, I3, I5
Dataset for Apriori Algorithm

The above dataset for the Apriori algorithm numerical example contains five transactions with transaction IDs T1, T2, T3, T4, and T5. The transactions contain six different items, namely I1, I2, I3, I4, I5, and I6. Let us now use the Apriori algorithm to find association rules in this dataset. For our numerical example, we will use a minimum support count of 2 and a minimum confidence of 75 percent.
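If you want to follow along in Python, the dataset and thresholds can be encoded as follows (a minimal sketch matching the table above):

```python
# The five transactions from the dataset above
transactions = [
    frozenset({"I1", "I3", "I4"}),        # T1
    frozenset({"I2", "I3", "I5", "I6"}),  # T2
    frozenset({"I1", "I2", "I3", "I5"}),  # T3
    frozenset({"I2", "I5"}),              # T4
    frozenset({"I1", "I3", "I5"}),        # T5
]
min_support_count = 2
min_confidence = 0.75
```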

To help us calculate the support of the itemsets, we will create a matrix representing the presence of items in a transaction as shown below.

Transaction ID | I1 | I2 | I3 | I4 | I5 | I6
T1 | 1 | 0 | 1 | 1 | 0 | 0
T2 | 0 | 1 | 1 | 0 | 1 | 1
T3 | 1 | 1 | 1 | 0 | 1 | 0
T4 | 0 | 1 | 0 | 0 | 1 | 0
T5 | 1 | 0 | 1 | 0 | 1 | 0
Transaction Matrix

The above matrix contains items on the horizontal axis and transaction IDs on the vertical axis. If an item is present in a transaction, the corresponding cell is set to 1. Otherwise, it is set to 0. We will use this matrix to calculate the support count of itemsets, as it is easier to scan this matrix than the transaction dataset.

To calculate the support count of any given itemset using the above matrix, we will find the number of rows in which all the items in the given itemset are set to 1. 
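If the matrix is stored as a pandas DataFrame of 0/1 values, this row count becomes a one-liner. A minimal sketch, assuming pandas is installed:

```python
import pandas as pd

# The transaction matrix from the table above
matrix = pd.DataFrame(
    [[1, 0, 1, 1, 0, 0],
     [0, 1, 1, 0, 1, 1],
     [1, 1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1, 0],
     [1, 0, 1, 0, 1, 0]],
    index=["T1", "T2", "T3", "T4", "T5"],
    columns=["I1", "I2", "I3", "I4", "I5", "I6"],
)

def matrix_support(itemset):
    """Count the rows in which every item of the itemset is 1."""
    return int(matrix[list(itemset)].all(axis=1).sum())

print(matrix_support({"I3", "I5"}))  # prints 3
```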

Create Frequent Itemsets With One Item

The Apriori algorithm starts by creating candidate itemsets with one item. For this, let us calculate the support count of each item.

Itemset | Support Count
{I1} | 3
{I2} | 3
{I3} | 4
{I4} | 1
{I5} | 4
{I6} | 1
Candidate itemsets with one item

The above table contains the support count of each candidate itemset with one item. Here, you can observe that the itemsets {I4} and {I6} have a support count of 1, which is less than the minimum support count of 2. Hence, we will omit these itemsets. After this, we get the table of frequent itemsets with a single item shown below.

Itemset | Support Count
{I1} | 3
{I2} | 3
{I3} | 4
{I5} | 4
Frequent itemsets with one item

Now that we have created the frequent itemsets containing a single item, let us move on to the frequent itemsets with two items.

Create Frequent Itemsets With Two Items

To create frequent itemsets with two items, we will first create the candidate itemsets with two items. For this, we will join the frequent one-item itemsets with each other. After joining, we will get the following itemsets.

{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I5}, and {I3, I5}

After creating the itemsets with two items, we need to prune the itemsets having subsets that are not frequent itemsets. As {I1}, {I2}, {I3}, and {I5} are all frequent itemsets, no itemsets will be removed from the above list during pruning.

As the next step, we will scan the transaction dataset and calculate the support count of each candidate itemset with two items. The result is tabulated below.

Itemset | Support Count
{I1, I2} | 1
{I1, I3} | 3
{I1, I5} | 2
{I2, I3} | 2
{I2, I5} | 3
{I3, I5} | 3
Candidate itemsets with two items

In the above candidate itemsets, you can observe that the itemset {I1, I2} has a support count of 1, which is less than the minimum support count. Hence, we will remove this itemset and obtain the table of frequent itemsets with two items shown below.

Itemset | Support Count
{I1, I3} | 3
{I1, I5} | 2
{I2, I3} | 2
{I2, I5} | 3
{I3, I5} | 3
Frequent itemsets with two items

Now, we have calculated frequent itemsets with two items. Let us calculate the frequent itemsets with three items.

Calculate Frequent Itemsets With Three Items

To calculate the frequent itemsets with three items, we first need to create the candidate itemsets. For this, we join pairs of frequent itemsets with two items that share one item, which creates the itemsets with three items shown below.

{I1, I3, I5}, {I1, I2, I3}, {I1, I2, I5}, {I2, I3, I5}

On the above itemsets, we will perform pruning to remove any itemset that has a subset that is not a frequent itemset. For this, we will create subsets of 2 items for each itemset and check if they are frequent itemsets or not. All the subsets of the above itemsets are tabulated below.

Itemset | Subsets with two items | All subsets frequent?
{I1, I3, I5} | {I1, I3}, {I1, I5}, {I3, I5} | Yes
{I1, I2, I3} | {I1, I2}, {I1, I3}, {I2, I3} | No
{I1, I2, I5} | {I1, I2}, {I1, I5}, {I2, I5} | No
{I2, I3, I5} | {I2, I3}, {I2, I5}, {I3, I5} | Yes
Pruning itemsets with three items

In the table, you can observe that the itemsets {I1, I2, I3} and {I1, I2, I5} contain the itemset {I1, I2}, which is not a frequent itemset. Hence, we will prune the itemsets {I1, I2, I3} and {I1, I2, I5}. This leaves {I1, I3, I5} and {I2, I3, I5} as the candidate itemsets with three items. Let us calculate their support count.

Itemset | Support Count
{I2, I3, I5} | 2
{I1, I3, I5} | 2
Candidate itemsets with three items

In the above table, both itemsets have a support count of 2 which is equal to the minimum support count. Hence, both itemsets will be considered frequent itemsets. 

Itemset | Support Count
{I2, I3, I5} | 2
{I1, I3, I5} | 2
Frequent itemsets with three items

Calculate Frequent Itemsets With Four Items

Now, we will calculate the frequent itemsets with four items. For this, we will first join the frequent itemsets with three items to create itemsets with four items. We get only one itemset, as shown below.

{I1, I2, I3, I5}

The above itemset has four subsets with three items, i.e., {I2, I3, I5}, {I1, I3, I5}, {I1, I2, I5}, and {I1, I2, I3}. Among these, {I1, I2, I5} and {I1, I2, I3} are not frequent itemsets. Hence, we will prune the itemset {I1, I2, I3, I5}. Thus, we have no candidate itemsets with four items, and the process of frequent itemset generation stops here.

Now, let us tabulate all the frequent itemsets created in this numerical example on the apriori algorithm.

Itemset | Support Count
{I1} | 3
{I2} | 3
{I3} | 4
{I5} | 4
{I1, I3} | 3
{I1, I5} | 2
{I2, I3} | 2
{I2, I5} | 3
{I3, I5} | 3
{I2, I3, I5} | 2
{I1, I3, I5} | 2
Frequent itemsets in the dataset

The above table contains all the frequent itemsets in the given transaction data. We will now create association rules using the frequent itemsets.

Generate Association Rules From Frequent Itemsets

To create association rules from the frequent itemsets, we take each frequent itemset, select a non-empty proper subset of it as the antecedent of the association rule, and make the remaining items the consequent. All the association rules that can be generated from the frequent itemsets are shown in the following table.

Itemset | Association Rules
{I1} | None
{I2} | None
{I3} | None
{I5} | None
{I1, I3} | {I1}->{I3}, {I3}->{I1}
{I1, I5} | {I1}->{I5}, {I5}->{I1}
{I2, I3} | {I2}->{I3}, {I3}->{I2}
{I2, I5} | {I2}->{I5}, {I5}->{I2}
{I3, I5} | {I3}->{I5}, {I5}->{I3}
{I2, I3, I5} | {I2}->{I3, I5}, {I3}->{I2, I5}, {I5}->{I2, I3}, {I2, I3}->{I5}, {I2, I5}->{I3}, {I3, I5}->{I2}
{I1, I3, I5} | {I1}->{I3, I5}, {I3}->{I1, I5}, {I5}->{I1, I3}, {I1, I3}->{I5}, {I1, I5}->{I3}, {I3, I5}->{I1}
Association rules
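These antecedent/consequent splits can be enumerated mechanically. For example, a short sketch that prints the six rules listed above for {I2, I3, I5}:

```python
from itertools import combinations

itemset = frozenset({"I2", "I3", "I5"})
# Every non-empty proper subset becomes an antecedent;
# the remaining items form the consequent.
for r in range(1, len(itemset)):
    for antecedent in combinations(sorted(itemset), r):
        consequent = itemset - set(antecedent)
        print(set(antecedent), "->", set(consequent))
```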

You can observe that we got 22 association rules from the frequent itemsets. Now, we will calculate the confidence of each association rule to find the most important ones. Recall that the confidence of a rule {S} -> {I - S} is support({I}) divided by support({S}); for example, confidence({I3} -> {I1}) = support({I1, I3}) / support({I3}) = 3/4 = 75%. I have already discussed how to calculate the confidence of a given association rule in the article on association rule mining. The confidence of each rule is tabulated below.

Association Rule | Confidence
{I1}->{I3} | 100%
{I3}->{I1} | 75%
{I2}->{I3} | 66.67%
{I3}->{I2} | 50%
{I1}->{I5} | 66.67%
{I5}->{I1} | 50%
{I2}->{I5} | 100%
{I5}->{I2} | 75%
{I3}->{I5} | 75%
{I5}->{I3} | 75%
{I2}->{I3, I5} | 66.67%
{I3}->{I2, I5} | 50%
{I5}->{I2, I3} | 50%
{I2, I3}->{I5} | 100%
{I2, I5}->{I3} | 66.67%
{I3, I5}->{I2} | 66.67%
{I1}->{I3, I5} | 66.67%
{I3}->{I1, I5} | 50%
{I5}->{I1, I3} | 50%
{I1, I3}->{I5} | 66.67%
{I1, I5}->{I3} | 100%
{I3, I5}->{I1} | 66.67%
Association rules with confidence

In the above table, we have calculated the confidence of all the association rules. Recall that we set the minimum confidence threshold to 75%. Hence, we will discard all the association rules with a confidence of less than 75%. Finally, we get the following association rules.

Association Rule | Confidence
{I1}->{I3} | 100%
{I3}->{I1} | 75%
{I2}->{I5} | 100%
{I5}->{I2} | 75%
{I3}->{I5} | 75%
{I5}->{I3} | 75%
{I2, I3}->{I5} | 100%
{I1, I5}->{I3} | 100%
Association rules with a confidence of at least 75%

Thus, we have obtained the final association rules with their confidence in this Apriori algorithm numerical example.
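As a cross-check, this example can be reproduced with the mlxtend library (a sketch assuming mlxtend is installed; note that a minimum support count of 2 out of 5 transactions corresponds to min_support=0.4, and exact signatures may vary slightly between mlxtend versions):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot transaction matrix, as built earlier in this article
df = pd.DataFrame(
    [[1, 0, 1, 1, 0, 0],
     [0, 1, 1, 0, 1, 1],
     [1, 1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1, 0],
     [1, 0, 1, 0, 1, 0]],
    columns=["I1", "I2", "I3", "I4", "I5", "I6"],
).astype(bool)

frequent = apriori(df, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.75)
print(rules[["antecedents", "consequents", "confidence"]])
```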

Advantages and Disadvantages of the Apriori Algorithm

The Apriori algorithm is an easy-to-understand algorithm. It is an iterative algorithm and you can easily learn and implement it. However, it has very high computational complexity due to repeated scans of the entire database. 

Also, you need to select the minimum support of the itemsets and the minimum confidence of the association rules very carefully. If you choose a very low minimum support, the algorithm will calculate too many frequent itemsets and execution will be very slow. On the other hand, if you choose a very high minimum support count, you might miss itemsets that could produce useful rules.

Conclusion

In this article, we discussed the Apriori algorithm in data mining. We worked through a numerical example using the Apriori algorithm to find frequent itemsets and association rules in a transaction dataset. To learn more data mining concepts, you can read this article on a KNN classification numerical example. You might also like this article on how to find clusters from a dendrogram in Python.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!
