
Association Rule Mining (Apriori) in Python: Finding Patterns in Pune’s E-commerce Data

In the bustling city of Pune, where e-commerce is thriving, businesses leverage data to uncover hidden patterns in customer behaviour. One powerful technique for this is Association Rule Mining, specifically using the Apriori algorithm. This method helps identify relationships between purchased items, enabling businesses to optimise marketing strategies, improve product placements, and boost sales. For those looking to master such techniques, enrolling in a data analyst course in Pune can provide the skills needed to analyse and interpret complex datasets effectively. In this blog, we’ll explore Association Rule Mining, dive into the Apriori algorithm, and demonstrate how to implement it in Python using a hypothetical e-commerce dataset from Pune.

What is Association Rule Mining?

Association Rule Mining is a data mining technique used to discover interesting relationships, or associations, between items in large datasets. It’s widely used in market basket analysis, where retailers analyse customer transactions to find items frequently purchased together. For example, if customers in Pune often buy laptops and laptop bags together, this insight can inform cross-selling strategies.

An association rule is expressed as {A} → {B}, meaning that item B is likely to be purchased if item A is bought. These rules are evaluated using three key metrics (a small worked example follows the list):

  • Support: The frequency of occurrence of the itemset (e.g., {laptop, laptop bag}) in the dataset.
  • Confidence: The probability that a transaction containing A also contains B.
  • Lift: A measure of the strength of a rule, indicating how much more often A and B occur together compared to if they were independent.
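
To make these metrics concrete, here is a small worked computation in Python for a hypothetical rule {laptop bag} → {laptop}. The counts are illustrative assumptions, not taken from a real dataset:

n_transactions = 10   # total transactions (illustrative)
n_laptop = 5          # transactions containing a laptop
n_laptop_bag = 3      # transactions containing a laptop bag
n_both = 3            # transactions containing both items

support = n_both / n_transactions                  # 0.3
confidence = n_both / n_laptop_bag                 # P(laptop | laptop bag) = 1.0
lift = confidence / (n_laptop / n_transactions)    # 1.0 / 0.5 = 2.0
print(support, confidence, lift)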

The Apriori algorithm is a popular method for mining these rules efficiently.

Understanding the Apriori Algorithm

The Apriori algorithm, developed by R. Agrawal and R. Srikant in 1994, is based on the principle that all subsets of a frequent itemset must also be frequent. Equivalently, any itemset that contains an infrequent subset cannot itself be frequent, so such candidates can be pruned early, keeping the search computationally efficient. Here’s how it works (a short code sketch follows the steps):

  1. Identify frequent itemsets: Start by finding frequent items (above a minimum support threshold) in the dataset.
  2. Generate candidate itemsets: Use frequent itemsets to create larger combinations (e.g., pairs, triplets).
  3. Prune infrequent itemsets: Eliminate combinations that don’t meet the minimum support threshold.
  4. Generate association rules: From the frequent itemsets, create rules that meet minimum confidence and lift thresholds.
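
Before reaching for a library, here is a minimal, self-contained sketch of this level-wise search in plain Python. It is illustrative only, and not how mlxtend implements the algorithm internally:

from itertools import combinations

def apriori_sketch(transactions, min_support=0.2):
    # Level-wise search for frequent itemsets (illustrative sketch)
    n = len(transactions)
    transactions = [set(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in items]   # level-1 candidates
    frequent = {}
    while current:
        # Count support for each candidate and keep those above the threshold
        counts = {c: sum(c <= t for t in transactions) for c in current}
        kept = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(kept)
        # Join frequent itemsets into next-level candidates and prune any
        # candidate that has an infrequent subset (the Apriori property)
        keys = list(kept)
        size = len(keys[0]) + 1 if keys else 0
        candidates = {a | b for a in keys for b in keys if len(a | b) == size}
        current = [c for c in candidates
                   if all(frozenset(s) in kept for s in combinations(c, size - 1))]
    return frequent

With the transaction list built in the next section, apriori_sketch(data, min_support=0.2) returns a dictionary mapping each frequent itemset (as a frozenset) to its support.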

Let’s see how we can apply this to Pune’s e-commerce data.

Setting Up the Environment

We’ll use the mlxtend library, which provides efficient implementations of Apriori and association rule mining, along with pandas for data manipulation. Install the required libraries using pip:

pip install mlxtend pandas

For this example, we’ll create a hypothetical dataset representing e-commerce transactions in Pune. Each transaction contains a list of items purchased by a customer.

Creating a Sample E-commerce Dataset

Imagine an e-commerce platform in Pune selling electronics, clothing, and accessories. We’ll simulate a dataset of customer transactions.

import pandas as pd

# Sample dataset of transactions
data = [
    ['laptop', 'laptop bag', 'mouse'],
    ['smartphone', 'earphones', 'screen protector'],
    ['laptop', 'mouse', 'keyboard'],
    ['smartphone', 'charger', 'earphones'],
    ['laptop', 'laptop bag', 'charger'],
    ['smartphone', 'screen protector'],
    ['laptop', 'keyboard', 'mouse'],
    ['earphones', 'charger'],
    ['laptop', 'laptop bag'],
    ['smartphone', 'earphones', 'charger']
]

# Convert to a DataFrame (shorter transactions are padded with None)
transactions = pd.DataFrame(data, columns=['item_1', 'item_2', 'item_3'])

This dataset represents 10 transactions, with each row listing up to three items purchased together.

Preprocessing the Data

The Apriori algorithm requires the data in a one-hot encoded format, where each column represents an item, and each row indicates whether the item was purchased (1) or not (0). We’ll use mlxtend’s TransactionEncoder to transform the data.

from mlxtend.preprocessing import TransactionEncoder

# Convert transactions to a list of lists
transaction_list = transactions.values.tolist()

# Remove None values (if any)
transaction_list = [[item for item in transaction if pd.notna(item)] for transaction in transaction_list]

# One-hot encode the transactions using TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(transaction_list).transform(transaction_list)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)

The resulting df_encoded is a DataFrame with one column per item; each row represents a transaction, with True/False values (equivalently 1/0) indicating whether the item was purchased.
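
A quick way to sanity-check the encoding (column order typically follows the sorted item names):

print(df_encoded.shape)   # (10, 8): 10 transactions, 8 distinct items
print(df_encoded.head())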

Applying the Apriori Algorithm

Now, we’ll use the apriori function from mlxtend to find frequent itemsets.

from mlxtend.frequent_patterns import apriori

# Apply the Apriori algorithm
frequent_itemsets = apriori(df_encoded, min_support=0.2, use_colnames=True)

# Display frequent itemsets
print(frequent_itemsets)

Here, min_support=0.2 means we’re interested in itemsets that appear in at least 20% of transactions (i.e., 2 out of 10 transactions). The output will list frequent itemsets and their support values, such as:

   support              itemsets
0      0.4             (charger)
1      0.4           (earphones)
2      0.2            (keyboard)
3      0.5              (laptop)
4      0.3          (laptop bag)
5      0.3               (mouse)
6      0.2    (screen protector)
7      0.4          (smartphone)
8      0.3  (laptop, laptop bag)
9      0.3       (laptop, mouse)

(Output abridged: other pairs such as (smartphone, earphones) and (earphones, charger) also clear the 20% threshold.)

Generating Association Rules

Next, we’ll generate association rules from the frequent itemsets using the association_rules function.

from mlxtend.frequent_patterns import association_rules

# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)

# Display rules
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

Here, min_threshold=0.6 ensures that only rules with at least 60% confidence are included. The output might look like:

    antecedents   consequents  support  confidence   lift
0  (laptop bag)      (laptop)      0.3        1.00  2.000
1       (mouse)      (laptop)      0.3        1.00  2.000
2   (earphones)  (smartphone)      0.3        0.75  1.875

(Again abridged: several other rules, such as (keyboard) → (laptop), also exceed the 60% confidence threshold.)

Interpreting the Results

  • The laptop bag → laptop rule: every transaction containing a laptop bag also contains a laptop (confidence = 1.0). The lift of 2.0 indicates this combination occurs twice as often as it would if the items were purchased independently.
  • The earphones → smartphone rule: if a customer buys earphones, there’s a 75% chance they’ll also buy a smartphone, and the lift of 1.875 suggests a strong association.

These insights can help Pune’s e-commerce businesses create targeted promotions, such as bundling laptops with laptop bags or offering discounts on earphones with smartphones.
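
In practice it helps to filter and rank the rule table before acting on it, for example keeping only rules whose lift exceeds 1 and sorting by confidence. A small sketch using the rules DataFrame from above:

# Keep mutually reinforcing rules (lift > 1) and rank the strongest first
actionable = rules[rules['lift'] > 1].sort_values(['confidence', 'lift'], ascending=False)
print(actionable[['antecedents', 'consequents', 'confidence', 'lift']].head())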

Visualising the Results

We can visualise the association rules using a scatter plot to make the results more engaging.

import matplotlib.pyplot as plt

plt.scatter(rules['support'], rules['confidence'], alpha=0.5)
plt.xlabel('Support')
plt.ylabel('Confidence')
plt.title('Association Rules: Support vs Confidence')
plt.savefig('association_rules_plot.png')

This plot helps identify rules with high support and confidence, which are the most actionable for businesses.
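
As an optional refinement, lift can be encoded as the point colour so the strongest associations stand out. A sketch reusing the pyplot import and rules DataFrame from above:

# Colour each rule by its lift value
plt.figure()
sc = plt.scatter(rules['support'], rules['confidence'], c=rules['lift'], cmap='viridis', alpha=0.7)
plt.colorbar(sc, label='Lift')
plt.xlabel('Support')
plt.ylabel('Confidence')
plt.title('Association Rules coloured by Lift')
plt.savefig('association_rules_lift_plot.png')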

Practical Applications in Pune’s E-commerce

The insights from association rule mining can transform e-commerce strategies in Pune:

  • Product Bundling: Offer discounts on frequently purchased combinations, like laptops and laptop bags.
  • Recommendation Systems: Suggest complementary products (e.g., earphones with smartphones) during checkout; a simple sketch of such a lookup follows this list.
  • Inventory Management: Stock items that are frequently bought together close to one another to streamline operations.
  • Targeted Marketing: Create campaigns targeting customers likely to buy specific item combinations.
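
As an illustration of the recommendation idea above, here is a hypothetical helper that suggests the consequents of mined rules whose antecedents are already in a customer’s cart. The function name and structure are assumptions for this sketch, not part of mlxtend:

def recommend(cart_items, rules, top_n=3):
    # Suggest items from rule consequents whose antecedents are already in the cart
    cart = frozenset(cart_items)
    matches = rules[rules['antecedents'].apply(lambda a: a.issubset(cart))]
    matches = matches.sort_values('confidence', ascending=False)
    suggestions = []
    for consequent in matches['consequents']:
        for item in consequent:
            if item not in cart and item not in suggestions:
                suggestions.append(item)
    return suggestions[:top_n]

print(recommend(['laptop bag'], rules))   # e.g. ['laptop']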

Challenges and Considerations

While the Apriori algorithm is powerful, it has limitations:

  • Scalability: Large datasets with many unique items can be computationally expensive.
  • Sparsity: E-commerce datasets may have many items with low support, making it hard to find meaningful rules.
  • Interpretability: Too many rules can overwhelm analysts, requiring careful filtering.

To address these, you can adjust the min_support and min_confidence thresholds or use advanced algorithms like FP-Growth for larger datasets.
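
For larger datasets, mlxtend also provides an FP-Growth implementation with the same interface, so switching is typically a one-line change. A sketch reusing df_encoded from earlier:

from mlxtend.frequent_patterns import fpgrowth

# FP-Growth returns the same kind of frequent-itemset DataFrame as apriori,
# but avoids repeated candidate generation, which scales better
frequent_itemsets_fp = fpgrowth(df_encoded, min_support=0.2, use_colnames=True)
print(frequent_itemsets_fp)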

Conclusion

Association Rule Mining with the Apriori algorithm is a game-changer for uncovering patterns in Pune’s e-commerce data. By identifying frequently purchased item combinations, businesses can make data-driven decisions that enhance customer experiences and boost profitability. Whether you’re a retailer or an aspiring data professional, mastering these techniques is essential in today’s data-driven world. Enrolling in a data analyst course can equip you with the skills to implement such algorithms and drive impactful business outcomes. Start exploring your data today and unlock the hidden patterns waiting to be discovered!

Business Name: ExcelR – Data Science, Data Analyst Course Training

Address: 1st Floor, East Court Phoenix Market City, F-02, Clover Park, Viman Nagar, Pune, Maharashtra 411014

Phone Number: 096997 53213

Email Id: enquiry@excelr.com
