Exploratory Data Analysis and Market Basket Analysis with Python

In retail, understanding customer behavior helps businesses optimize their product offerings. In this blog post, we’ll explore how to perform Exploratory Data Analysis (EDA) and Market Basket Analysis in Python on a dataset of retail transactions.

Introduction

The dataset we’re working with contains information about retail transactions. It includes details such as InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID, and Country. Our goal is to explore customer purchase patterns and uncover associations between different products.

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

# Load the dataset
data = pd.read_excel('data_path')

Let’s take a first look at the column types and the first few rows:

data.info()
data.head()

Data Overview

Before diving into the analysis, let’s get an overview of the dataset. The output of data.info() shows over 500,000 entries, and some columns, such as CustomerID and Description, have missing values.

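data.Country.value_counts()
len(data.CustomerID.unique())

We have transactions from various countries, with the majority coming from the United Kingdom. The CustomerID column has 4,373 unique values. Because unique() counts the missing value NaN as its own entry, that corresponds to 4,372 identified customers, which is the figure data.CustomerID.nunique() would return.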
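When a column has missing values, it is worth checking how they affect the counts. The snippet below is a minimal sketch on a toy CustomerID column (the values are made up for illustration, not taken from the dataset) showing the difference between unique() and nunique():

```python
import numpy as np
import pandas as pd

# Toy CustomerID column with two missing values (hypothetical data).
customer_ids = pd.Series([17850.0, 13047.0, np.nan, 17850.0, np.nan])

n_missing = customer_ids.isna().sum()    # number of rows without a CustomerID
with_nan = len(customer_ids.unique())    # NaN is counted as one distinct value
without_nan = customer_ids.nunique()     # NaN is excluded by default

print(n_missing, with_nan, without_nan)
```

On the real data, the same three calls on data.CustomerID show how many transactions cannot be tied to a customer.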

Customer Insights

Let’s identify the top customers based on their total purchase amount:

data['TotalPrice'] = data['Quantity'] * data['UnitPrice']
customer_purchase = data.groupby(['CustomerID']).TotalPrice.sum().sort_values(ascending=False)

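Sorting in descending order puts the biggest spenders first, with CustomerID 14646.0 leading the pack. Because this step runs before the credit records are removed, any credited amounts are still included in these totals.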
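To see how the ranking is built, here is a small sketch on a made-up table. The customer IDs, quantities, and prices are hypothetical, and the negative Quantity is an assumption standing in for a credited line:

```python
import pandas as pd

# Hypothetical transactions; the row with Quantity -4 plays the role of a credit.
toy = pd.DataFrame({
    "CustomerID": [1.0, 1.0, 2.0, 2.0, 3.0],
    "Quantity":   [2, 1, 10, -4, 2],
    "UnitPrice":  [5.0, 2.0, 1.5, 1.5, 4.0],
})

# Line total per row, then total per customer, largest first.
toy["TotalPrice"] = toy["Quantity"] * toy["UnitPrice"]
top = toy.groupby("CustomerID")["TotalPrice"].sum().sort_values(ascending=False)

print(top.head(3))
```

Customer 2 bought 15.0 worth of goods, but the credited line brings the total down to 9.0, which is why the order of the cleaning and aggregation steps matters.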

Data Cleaning

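To ensure accurate analysis, we clean the data by removing credit records (invoices whose InvoiceNo starts with 'C') and stripping stray whitespace from the product descriptions:

# Remove credit records; .copy() avoids a SettingWithCopyWarning on the assignment below
data = data[~data.InvoiceNo.astype('str').str.startswith('C')].copy()
# Strip whitespace from Description
data['Description'] = data.Description.str.strip()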
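The effect of the two cleaning steps is easy to check on a tiny example. The rows below are invented for illustration; only the column names come from the dataset:

```python
import pandas as pd

# Hypothetical rows: one regular invoice, one credit note, one with padded text.
toy = pd.DataFrame({
    "InvoiceNo": [536365, "C536379", 536370],
    "Description": [" WHITE METAL LANTERN", "Discount", "POSTAGE  "],
})

# InvoiceNo mixes integers and strings, so cast to str before the prefix test.
is_credit = toy["InvoiceNo"].astype(str).str.startswith("C")

clean = toy.loc[~is_credit].copy()
clean["Description"] = clean["Description"].str.strip()

print(clean)
```

The cast to string matters: without it, the .str accessor would return NaN for the integer invoice numbers instead of False.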

Basket Creation

Now, we focus on transactions from a specific country, for instance, Germany:

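# Keep only German transactions
data_Germany = data[data.Country == 'Germany']
# One row per invoice, one column per product, values = total quantity bought
basket_Germany = (
    data_Germany.groupby(['InvoiceNo', 'Description'])['Quantity']
    .sum()
    .unstack()
    .fillna(0)
)

Each row of the basket is now one invoice and each column one product. Apriori only needs to know whether a product appeared on an invoice, not how many units were bought, so we encode the quantities as boolean flags:

# True if the product appears on the invoice, False otherwise.
# DataFrame.applymap is deprecated in recent pandas, and mlxtend recommends boolean input.
basket_encoded = basket_Germany > 0
basket_Germany = basket_encoded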
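The reshaping step can be hard to picture, so here is a minimal sketch on three made-up invoices. The product names and quantities are hypothetical:

```python
import pandas as pd

# Hypothetical line items; invoice 2 lists TEA twice.
toy = pd.DataFrame({
    "InvoiceNo":   [1, 1, 2, 2, 3],
    "Description": ["TEA", "MUG", "TEA", "TEA", "MUG"],
    "Quantity":    [6, 2, 1, 3, 4],
})

# Sum quantities per (invoice, product), then pivot products into columns.
basket = (
    toy.groupby(["InvoiceNo", "Description"])["Quantity"]
    .sum()
    .unstack(fill_value=0)
)

# Presence/absence encoding: quantity > 0 means the product was on the invoice.
encoded = basket > 0

print(basket)
print(encoded)
```

The duplicate TEA lines on invoice 2 collapse into a single cell with quantity 4, and products missing from an invoice become 0 before encoding.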

Market Basket Analysis

Using the Apriori algorithm, we identify frequent itemsets (item combinations that appear in at least 5% of invoices) and generate association rules with a confidence of at least 0.1:

frq_items = apriori(basket_Germany, min_support=0.05, use_colnames=True)
rules = association_rules(frq_items, metric="confidence", min_threshold=.1)
rules = rules.sort_values(['confidence', 'lift'], ascending=[False, False])

The association rules provide insights into item relationships. Let’s explore some of the interesting rules:

rules.head(20)

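Among the top rules, we find associations like “JUMBO BAG WOODLAND ANIMALS” being associated with “POSTAGE” with high confidence. POSTAGE appears to be a shipping charge rather than a product, so it may well show up on most invoices. In that case high confidence alone says little, and the lift column is the better guide to whether a rule is genuinely interesting.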
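To read the rule table, it helps to recall how the three main metrics are defined. For a rule A -> C, support is the share of invoices containing both items, confidence is support(A and C) / support(A), and lift is confidence / support(C). The sketch below computes them by hand on five made-up invoices; the item names and values are hypothetical and chosen only to keep the arithmetic simple:

```python
import pandas as pd

# Five hypothetical invoices, already one-hot encoded.
basket = pd.DataFrame({
    "BAG":     [True, True, True, False, False],
    "POSTAGE": [True, True, True, True, False],
})

# Rule under test: BAG -> POSTAGE
support_a = basket["BAG"].mean()                             # 3 of 5 invoices
support_c = basket["POSTAGE"].mean()                         # 4 of 5 invoices
support_both = (basket["BAG"] & basket["POSTAGE"]).mean()    # 3 of 5 invoices

confidence = support_both / support_a   # how often C follows when A is present
lift = confidence / support_c           # confidence relative to C's base rate

print(f"support={support_both:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Here the confidence is 1.0, yet the lift is only 1.25, because the consequent is already present on 80% of invoices. A lift close to 1 means the antecedent adds little information, however high the confidence.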

Conclusion

In this blog post, we explored retail transaction data, identified top customers, cleaned the data, and performed Market Basket Analysis. Understanding customer behavior and product associations helps businesses make informed decisions.

This is just a glimpse of what data analysis can do in the retail domain. Further exploration, such as trying other countries or tuning min_support and the confidence threshold, can reveal deeper insights and support data-driven strategies.

By leveraging Python and its rich ecosystem of libraries, businesses can unlock valuable information hidden within their data, driving growth and enhancing customer satisfaction.

Feel free to experiment with your own datasets and adapt the code to suit your specific business needs. Happy analyzing!
