## Explore the Depths of Buy-till-You-Die (BTYD) Modeling and Practical Coding Techniques

**TL;DR:** The Customer Lifetime Value (CLV) model is a key technique in customer analytics that helps companies identify their most valuable customers. Neglecting CLV can lead to overinvestment in short-term customers who may make only a single purchase. ‘Buy Till You Die’ (BTYD) modeling, which combines the BG/NBD and Gamma-Gamma models, can estimate CLV. Although best practices vary with data size and modeling priorities, PyMC-Marketing is a recommended Python library for those looking to implement CLV modeling quickly.

**The definition of CLV is the total net revenue a company can expect from a single customer throughout their relationship.** Some of you might be more familiar with the term ‘LTV’ (Lifetime Value). Yes, CLV and LTV are interchangeable.

- The first goal is to calculate and predict future CLV, which will help you find out how much money can be expected from each customer.
- The second objective is to identify profitable customers. The model will tell you who those valuable customers are by analyzing the characteristics of the high CLV customers.
- The third goal is to take marketing actions based on the analysis, such as optimizing how you allocate your marketing budget.

Take the e-commerce site of a fashion brand like Nike, for example, which might use advertisements and coupons to attract new customers. Let’s assume that college students and working professionals are its two major customer segments. To win a first-time purchase, the company spends $10 on advertising per college student and $20 per working professional, and customers in both segments spend around $100 on that purchase.

If you were in charge of marketing, which segment would you want to invest more in? You might naturally think it’s more logical to invest more in the college students segment, considering their lower cost and higher ROI.

So, what if you knew this information?

The college student segment tends to have a high churn rate: most students never purchase again after their first order, so each spends only about $100 on average. The working professionals segment, on the other hand, has a higher rate of repeat purchases, resulting in an average of $400 per customer.

In that case, you would likely prefer to invest more in the working professionals segment, as it promises a higher ROI. This may seem obvious. Surprisingly, however, most marketers focus on hitting a Cost Per Acquisition (CPA) target without considering which customers are profitable in the long run.
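To make the comparison concrete, here is the arithmetic from the example above, computing ROI on the first purchase alone and on lifetime revenue. It is a simplified sketch that treats revenue as the return and ignores margins and any other costs; the `roi` helper is illustrative only.

```python
def roi(revenue, acquisition_cost):
    """Simple return on investment: net gain per dollar spent on acquisition."""
    return (revenue - acquisition_cost) / acquisition_cost

# Figures from the example: acquisition cost, first-purchase revenue, lifetime revenue
segments = {
    "college_students": {"cost": 10, "first_purchase": 100, "lifetime": 100},
    "working_professionals": {"cost": 20, "first_purchase": 100, "lifetime": 400},
}

for name, s in segments.items():
    first = roi(s["first_purchase"], s["cost"])
    lifetime = roi(s["lifetime"], s["cost"])
    print(f"{name}: first-purchase ROI = {first:.0f}x, lifetime ROI = {lifetime:.0f}x")
```

Judged on the first purchase, students look better (9x vs. 4x); judged on lifetime revenue, professionals win (19x vs. 9x).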

By adjusting the cost per acquisition (CPA), we can attract more high-value customers and improve our ROI. The graph on the left represents the approach without considering CLV. The red line represents the CPA, which is the maximum cost we can spend to acquire a new customer. Spending the same amount to acquire every customer leads to overinvestment in low-value customers and underinvestment in high-value customers.

Now, the graph on the right side shows the ideal spending allocation when utilizing CLV. We set a higher CPA for high-value customers, and a lower CPA for low-value customers.

It’s similar to the hiring process: if you aim to hire ex-Googlers, offering a competitive salary is essential, right? Likewise, by setting CPA according to expected CLV, we can acquire more high-value customers without changing the total marketing budget.
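A minimal sketch of a CLV-based CPA ceiling, assuming a hypothetical policy of spending at most one third of a customer's expected CLV on acquisition. The ratio of 3 and the `max_cpa` helper are illustrative assumptions, not a recommended value; the segment CLVs are the lifetime figures from the example above.

```python
# Hypothetical policy: spend at most one third of a customer's expected CLV on acquisition
TARGET_CLV_TO_CPA_RATIO = 3.0

def max_cpa(expected_clv, ratio=TARGET_CLV_TO_CPA_RATIO):
    """CPA ceiling that scales with expected CLV instead of one flat cap for everyone."""
    return expected_clv / ratio

# Lifetime revenue per customer from the segment example
segment_clv = {"college_students": 100, "working_professionals": 400}
caps = {segment: max_cpa(value) for segment, value in segment_clv.items()}

for segment, cap in caps.items():
    print(f"{segment}: max CPA = ${cap:.2f}")
```

Because the ceiling is proportional to CLV, the high-value segment gets four times the acquisition budget per customer, which is the shape of the right-hand graph.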

The CLV model I’m introducing uses only sales transaction data. As you can see, it needs just three data columns: customer_id, transaction date, and transaction value. In terms of data volume, CLV models typically require two to three years of transaction data.

**4.1 Approaches for CLV Modeling**

Let’s start with the two broad approaches to calculating CLV: the Historical Approach and the Predictive Approach. The Predictive Approach in turn covers two families of models: Probabilistic Models and Machine Learning Models.

**4.2 Traditional CLV Formula**

First, let’s consider a traditional CLV formula, which breaks CLV down into three components: average order value, purchase frequency, and customer lifespan.

Let’s consider a fashion company, for example. On average:

- Customers spend $100 per order
- They shop 4 times per year
- They stay loyal for 3 years

In this case, the CLV is $100 × 4 × 3 = $1,200 per customer. This formula is simple and straightforward, right? However, it has some limitations.
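The traditional formula is a single product of three averages, as this small sketch shows (the function name is illustrative):

```python
def traditional_clv(avg_order_value, purchases_per_year, lifespan_years):
    """Traditional CLV: average order value x purchase frequency x customer lifespan."""
    return avg_order_value * purchases_per_year * lifespan_years

print(traditional_clv(100, 4, 3))  # 1200
```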

**4.3 Limitations of Traditional CLV Formula**

**Limitation #1: Not All Customers Are The Same**

This traditional formula assumes that all customers are homogeneous by assigning a single average to each component. When some customers make exceptionally large purchases, that average no longer represents a typical customer.
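A quick illustration with made-up order values: one big spender pulls the average far away from what a typical customer spends. The point is not to switch to the median, but that a single number hides the differences between customers, which the models below handle by giving each customer their own parameters.

```python
import numpy as np

# Hypothetical order values: four typical customers and one big spender
order_values = np.array([80, 90, 100, 110, 1620])

print(order_values.mean())      # 400.0, inflated by the single large purchase
print(np.median(order_values))  # 100.0, closer to a typical customer
```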

**Limitation #2: Differences in First Purchase Timing**

Let’s say we use the last 12 months as our data collection period.

This customer made his first purchase about a year ago, so we can accurately measure his purchase frequency over the full window: 8 purchases per year.

Now consider two more customers: one started purchasing 6 months ago, and the other 3 months ago. All three buy at the same pace, yet their total purchase counts over the past year differ (about 8, 4, and 2). The key point is that we need to account for each customer’s tenure, meaning the time since their first purchase.
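An idealized sketch of the tenure problem, assuming all three customers buy at exactly 8 purchases per year: raw counts over the window differ, but dividing by each customer's tenure recovers the same underlying rate.

```python
# Same purchase pace (8 per year), different tenure within a 12-month window
rate_per_year = 8
tenure_months = {"customer_a": 12, "customer_b": 6, "customer_c": 3}

# Purchases observed in the window depend on how long each customer has been around
observed = {c: rate_per_year * m / 12 for c, m in tenure_months.items()}

# Dividing by tenure (in years) recovers the shared underlying rate
rate_estimate = {c: observed[c] / (tenure_months[c] / 12) for c in tenure_months}

print(observed)       # {'customer_a': 8.0, 'customer_b': 4.0, 'customer_c': 2.0}
print(rate_estimate)  # every customer: 8.0
```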

**Limitation #3: Dead or Alive?**

Determining when a customer is considered “churned” is tricky. For subscription services like Netflix, we can consider a customer to have churned once they unsubscribe. However, in the case of retail or E-commerce, whether a customer is ‘Alive’ or ‘Dead’ is ambiguous.

A customer’s ‘Probability of Being Alive’ depends on their past purchasing patterns. For example, if someone who normally buys every month doesn’t make a purchase in the next three months, they may have switched to a different brand. However, there’s no need to worry if a person who typically shops only once every six months doesn’t buy anything in the next three months.

To address these challenges, we often turn to ‘Buy Till You Die’ (BTYD) modeling. This approach comprises two sub-models:

- BG/NBD model: This predicts the likelihood of a customer being active and their transaction frequency.
- Gamma-Gamma model: This estimates the average order value.

By combining the results from these sub-models, we can effectively forecast the Customer Lifetime Value (CLV).

**5.1 BG/NBD model**

The BG/NBD model assumes that two processes govern a customer’s status: the ‘Purchase Process,’ while the customer is actively buying, and the ‘Dropout Process,’ through which the customer stops purchasing for good.

While a customer is active, the model describes their purchases as a “Poisson process”.

After each purchase, there is a chance that the customer drops out. The BG/NBD model denotes this dropout probability by ‘p’.

Consider the image below for illustration. The data indicates this customer made five purchases. Under the model’s assumptions, had the customer remained active, they would have made eight purchases in total. But because the customer dropped out at some point, we observe only five.
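The two processes can be simulated for a single customer. This is a minimal sketch with hypothetical, fixed values of λ (purchases per week) and p; counting starts after the first purchase, and dropout can happen only right after a purchase, as in BG/NBD. `simulate_customer` is an illustrative helper, not part of PyMC-Marketing.

```python
import numpy as np

def simulate_customer(lam, p, horizon, rng):
    """Return (potential, observed) purchase times over [0, horizon].

    potential: purchases the customer would make if they never dropped out
               (a Poisson process with rate `lam`, i.e. exponential gaps).
    observed:  purchases actually seen, since after each purchase the
               customer drops out with probability `p`.
    """
    potential = []
    t = rng.exponential(1.0 / lam)
    while t <= horizon:
        potential.append(t)
        t += rng.exponential(1.0 / lam)

    observed = []
    for purchase_time in potential:
        observed.append(purchase_time)
        if rng.random() < p:  # the customer "dies" right after this purchase
            break
    return potential, observed

rng = np.random.default_rng(42)
potential, observed = simulate_customer(lam=0.2, p=0.15, horizon=52, rng=rng)
print(len(potential), "potential purchases,", len(observed), "observed")
```

The observed purchases are always a prefix of the potential ones, which mirrors the eight-versus-five picture.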

While a customer is ‘active’, their purchases follow a Poisson process. The Poisson distribution typically describes the count of randomly occurring events. Here, ‘λ’ is each customer’s purchase rate (expected purchases per unit of time). Even with a fixed λ, the actual number of purchases in any given period fluctuates, and the Poisson distribution accounts for that variability.

The graph below illustrates how a customer’s probability of being alive changes over time. As the time since the last purchase increases (T=31), the probability of being ‘alive’ decreases. When a repurchase occurs (around T=36), you’ll notice that it jumps back up.
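The BG/NBD paper by Fader, Hardie, and Lee (2005) gives a closed form for this probability. The sketch below implements it with hypothetical population parameters r, alpha, a, and b (not fitted to any data); PyMC-Marketing works with posterior samples of these parameters instead of single values, so treat this as an illustration of the shape of the curve.

```python
def prob_alive(x, t_x, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still alive at age T.

    x: number of repeat purchases, t_x: time of the last purchase,
    T: customer age (all measured from the first purchase).
    """
    if x == 0:
        return 1.0  # under BG/NBD, dropout can only happen after a repeat purchase
    return 1.0 / (1.0 + (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x))

# Hypothetical population parameters for illustration only
params = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)

# 3 repeat purchases, the last at week 30: P(alive) decays as time passes
for T in [30, 31, 33, 36]:
    print(T, round(prob_alive(3, 30, T, **params), 3))

# A repurchase at week 36 (x=4, t_x=36) pushes P(alive) back up
print(36, round(prob_alive(4, 36, 36, **params), 3))
```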

This is the graphical model. As mentioned earlier, it includes λ and p, both of which vary from person to person. To account for this diversity, we assume that heterogeneity in λ follows a gamma distribution and heterogeneity in p follows a beta distribution. In other words, the model is layered: customer-level parameters are drawn from population-level distributions and estimated with Bayes’ theorem, an approach known as Bayesian hierarchical modeling.
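To see what the hierarchical layer means, the sketch below draws λ and p for many hypothetical customers. The names r, alpha, a, and b follow the BG/NBD notation; the values are made up. NumPy parameterizes the gamma distribution by shape and scale, so a rate of alpha becomes a scale of 1/alpha.

```python
import numpy as np

rng = np.random.default_rng(0)
n_customers = 100_000

# Hypothetical population-level parameters
r, alpha = 0.25, 4.0  # lambda ~ Gamma(shape=r, rate=alpha)
a, b = 0.8, 2.5       # p ~ Beta(a, b)

# Each customer gets their own purchase rate and dropout probability
lam = rng.gamma(shape=r, scale=1.0 / alpha, size=n_customers)
p = rng.beta(a, b, size=n_customers)

print(lam.mean(), r / alpha)  # sample mean vs. theoretical mean of lambda
print(p.mean(), a / (a + b))  # sample mean vs. theoretical mean of p
```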

**5.2 Gamma-Gamma model**

The Gamma-Gamma model assumes that a customer’s transaction values, and hence their average order value, follow a Gamma distribution. The Gamma distribution is defined by two parameters: the shape parameter and the scale parameter. As this graph shows, the form of the Gamma distribution can change considerably as these two parameters change.
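A quick way to build intuition: a Gamma distribution with shape k and scale θ has mean kθ and variance kθ². The sketch samples two hypothetical settings with the same mean (100) but very different spread; `gamma_moments` is an illustrative helper.

```python
import numpy as np

def gamma_moments(shape, scale, n, rng):
    """Sample Gamma(shape, scale) and return the sample mean and variance."""
    samples = rng.gamma(shape, scale, size=n)
    return samples.mean(), samples.var()

rng = np.random.default_rng(1)
# Hypothetical settings: same mean (shape * scale = 100), different spread
for shape, scale in [(2.0, 50.0), (20.0, 5.0)]:
    mean, var = gamma_moments(shape, scale, 200_000, rng)
    print(f"shape={shape}, scale={scale}: mean={mean:.1f} (theory {shape * scale:.0f}), "
          f"variance={var:.0f} (theory {shape * scale ** 2:.0f})")
```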

This diagram illustrates the graphical model. The model employs two Gamma distributions within a Bayesian hierarchical approach. The first Gamma distribution describes each customer’s transaction values around their own average order value. Since that average differs among customers, the second Gamma distribution captures the variation in average order value across the entire customer base. The hyperparameters p, q, and γ (gamma) are assigned Half-flat priors.
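Given the population parameters, the Gamma-Gamma model's expected average order value for a customer is a weighted average of the population mean, pγ/(q−1), and the customer's own observed average, with more weight on the customer as they make more transactions. The sketch below uses hypothetical values p=6, q=4, γ=15 (population mean 30); `gg_expected_spend` is an illustrative helper, and in PyMC-Marketing the `frequency` column plays the role of x.

```python
def gg_expected_spend(mean_value, x, p, q, gamma):
    """Gamma-Gamma conditional expected average transaction value.

    mean_value: customer's observed average transaction value
    x: number of transactions behind that average
    p, q, gamma: population parameters (q > 1 so the population mean exists)
    """
    population_mean = p * gamma / (q - 1)
    weight = p * x / (p * x + q - 1)  # more transactions -> trust the customer's own average more
    return (1 - weight) * population_mean + weight * mean_value

# A customer averaging $100 per order, after 1, 5, and 20 transactions
for x in [1, 5, 20]:
    print(x, round(gg_expected_spend(100.0, x, p=6.0, q=4.0, gamma=15.0), 2))
```

The estimate starts near the population mean and moves toward $100 as evidence accumulates, which is the shrinkage behavior a hierarchical model is designed to produce.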

**Useful CLV libraries**

Here, let me introduce two great open-source libraries for CLV modeling: PyMC-Marketing and CLVTools. Both implement Buy-Till-You-Die modeling. The most significant difference is that PyMC-Marketing is a Python library, while CLVTools is written for R. PyMC-Marketing is built on PyMC, a popular Bayesian modeling library. Previously, there was a well-known library called ‘Lifetimes’. However, ‘Lifetimes’ is now in maintenance mode, and its functionality has carried over into PyMC-Marketing.

**Full code**

The full code can be found on my GitHub below. My sample code is based on PyMC-Marketing’s official quick start.

**Code Walkthrough**

First, you will need to import pymc_marketing and other libraries.

```python
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
from arviz.labels import MapLabeller
from IPython.display import Image

from pymc_marketing import clv
```

You will need to download the “Online Retail Dataset” from the “UCI Machine Learning Repository”. This dataset contains transactional data from a UK-based online retailer and is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

```python
import os
import zipfile

import requests

# Download the zip file
url = "https://archive.ics.uci.edu/static/public/352/online+retail.zip"
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed
filename = "online_retail.zip"
with open(filename, 'wb') as file:
    file.write(response.content)

# Unzip the file
with zipfile.ZipFile(filename, 'r') as zip_ref:
    zip_ref.extractall("online_retail_data")

# Find the Excel file name
for file in os.listdir("online_retail_data"):
    if file.endswith(".xlsx"):
        excel_file = os.path.join("online_retail_data", file)
        break

# Load the Excel file into a DataFrame (reading .xlsx requires the openpyxl package)
data_raw = pd.read_excel(excel_file)
data_raw.head()
```

**Data Cleansing**

A quick data cleansing is needed. For instance, we need to handle return orders, filter out records without a customer ID, and create a ‘total sales’ column by multiplying the quantity and unit price together.

```python
# Handle return orders
# Extract rows where InvoiceNo starts with "C" (cancellations); .copy() avoids SettingWithCopyWarning
cancelled_orders = data_raw[data_raw['InvoiceNo'].astype(str).str.startswith("C")].copy()

# Negate 'Quantity' so each cancellation matches the original purchase line
cancelled_orders['Quantity'] = -cancelled_orders['Quantity']

# Merge the original DataFrame with the cancellations on the columns we want to match
merged_data = pd.merge(
    data_raw,
    cancelled_orders[['CustomerID', 'StockCode', 'Quantity', 'UnitPrice']],
    on=['CustomerID', 'StockCode', 'Quantity', 'UnitPrice'],
    how='left',
    indicator=True,
)

# Keep rows with no matching cancellation, and drop the cancellation rows themselves
data_raw = merged_data[
    (merged_data['_merge'] == 'left_only')
    & (~merged_data['InvoiceNo'].astype(str).str.startswith("C"))
]

# Drop the indicator column
data_raw = data_raw.drop(columns=['_merge'])

# Select relevant features and calculate total sales
features = ['CustomerID', 'InvoiceNo', 'InvoiceDate', 'Quantity', 'UnitPrice', 'Country']
data = data_raw[features].copy()
data['TotalSales'] = data['Quantity'].multiply(data['UnitPrice'])

# Remove transactions with missing customer IDs, as they don't contribute to individual customer behavior
data = data[data['CustomerID'].notna()].copy()
data['CustomerID'] = data['CustomerID'].astype(int).astype(str)
data.head()
```

Then, we need to create a summary table using the `clv_summary` function, which returns a DataFrame in RFM-T format. RFM-T stands for the Recency, Frequency, Monetary value, and Tenure (T) of each customer, metrics that are popular in shopper analysis.

```python
data_summary_rfm = clv.utils.clv_summary(data, 'CustomerID', 'InvoiceDate', 'TotalSales')
data_summary_rfm = data_summary_rfm.rename(columns={'CustomerID': 'customer_id'})
data_summary_rfm.index = data_summary_rfm['customer_id']
data_summary_rfm.head()
```
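To see what these columns mean, here is a hand-rolled RFM-T summary on a tiny, made-up dataset. It follows the convention popularized by the Lifetimes library (daily granularity; the first purchase is excluded from frequency and monetary_value). I'm assuming `clv_summary` uses the same convention, so check the documentation for your PyMC-Marketing version; `rfm_t_summary` is an illustrative helper, not a library function.

```python
import pandas as pd

def rfm_t_summary(df, customer_col, date_col, value_col, observation_end):
    """Hand-rolled RFM-T summary at daily granularity (Lifetimes-style convention)."""
    df = df.copy()
    # Truncate timestamps to the day so same-day orders count as one purchase occasion
    df["day"] = pd.to_datetime(df[date_col]).dt.normalize()
    daily = df.groupby([customer_col, "day"], as_index=False)[value_col].sum()
    end = pd.Timestamp(observation_end)

    rows = []
    for customer, g in daily.groupby(customer_col):
        g = g.sort_values("day")
        first, last = g["day"].iloc[0], g["day"].iloc[-1]
        frequency = len(g) - 1                 # repeat purchase days only
        repeat_values = g[value_col].iloc[1:]  # exclude the first purchase
        rows.append({
            customer_col: customer,
            "frequency": frequency,
            "recency": (last - first).days,    # age at last purchase
            "T": (end - first).days,           # age at end of observation
            "monetary_value": float(repeat_values.mean()) if frequency > 0 else 0.0,
        })
    return pd.DataFrame(rows)

# Made-up transactions: customer A buys twice on Jan 1, then on Jan 11 and Feb 1
transactions = pd.DataFrame({
    "customer_id": ["A", "A", "A", "A", "B"],
    "date": ["2011-01-01", "2011-01-01", "2011-01-11", "2011-02-01", "2011-02-15"],
    "value": [10.0, 5.0, 20.0, 40.0, 50.0],
})
summary = rfm_t_summary(transactions, "customer_id", "date", "value", observation_end="2011-03-01")
print(summary)
```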

**BG/NBD model**

The BG/NBD model is available as the `BetaGeoModel` class in this library. Calling `bgm.fit()` trains the model.

Calling `bgm.fit_summary()` returns a statistical summary of the fitted parameters. For example, this table shows the mean, standard deviation, and Highest Density Interval (HDI) of each parameter. We can also check the r_hat value, which helps assess whether the Markov Chain Monte Carlo (MCMC) chains have converged. A common rule of thumb is that R-hat should be 1.1 or less, and stricter current guidance recommends staying below 1.01.

```python
bgm = clv.BetaGeoModel(
    data=data_summary_rfm,
)
bgm.build_model()
bgm.fit()
bgm.fit_summary()
```
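To check convergence programmatically, a small helper can flag parameters whose r_hat exceeds a threshold. This sketch assumes `fit_summary()` returns an ArviZ-style summary DataFrame with an `r_hat` column (as `az.summary` does); `unconverged_parameters` is a hypothetical helper, and the demo table is made up.

```python
import pandas as pd

def unconverged_parameters(summary: pd.DataFrame, threshold: float = 1.01) -> list:
    """Return parameter names whose r_hat exceeds `threshold` in an ArviZ-style summary."""
    if "r_hat" not in summary.columns:
        raise KeyError("summary has no 'r_hat' column")
    return summary.loc[summary["r_hat"] > threshold].index.tolist()

# Made-up summary table for illustration
demo = pd.DataFrame({"r_hat": [1.00, 1.02, 1.00, 1.20]}, index=["a", "b", "alpha", "r"])
print(unconverged_parameters(demo))                 # ['b', 'r']
print(unconverged_parameters(demo, threshold=1.1))  # ['r']

# With the fitted model (assumption about the return type):
# unconverged_parameters(bgm.fit_summary())
```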

The matrix below is called the Probability Alive Matrix. With it, we can infer which users are likely to return and which are unlikely to return. The x-axis represents each customer’s historical purchase frequency, the y-axis represents their recency, and the color shows the probability of being alive. Our new customers are in the bottom-left corner: low frequency and high recency. These customers have a high probability of being alive. Our loyal customers are in the bottom-right: high frequency and high recency. If loyal customers don’t purchase for a long time, they become at-risk customers, who have a low probability of being alive.

```python
clv.plot_probability_alive_matrix(bgm);
```

Next, we can predict each customer’s future transactions with the `expected_num_purchases` method: having fit the model, we ask for the expected number of purchases in the next period. Here, t=365 covers the next 365 time units, which are days in this summary.

```python
num_purchases = bgm.expected_num_purchases(
    customer_id=data_summary_rfm["customer_id"],
    t=365,
    frequency=data_summary_rfm["frequency"],
    recency=data_summary_rfm["recency"],
    T=data_summary_rfm["T"],
)

sdata = data_summary_rfm.copy()
# Average over posterior chains and draws to get one point estimate per customer
sdata["expected_purchases"] = num_purchases.mean(("chain", "draw")).values
sdata.sort_values(by="expected_purchases").tail(4)
```

**Gamma-Gamma model**

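Next, we will move on to the Gamma-Gamma model to predict the average order value. The model is fit only on customers with at least one repeat purchase (frequency > 0), because one-time buyers have no repeat transactions from which to compute monetary_value. We can then predict each customer’s expected average order value with the `expected_customer_spend` method.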

```python
# Fit only on repeat customers
nonzero_data = data_summary_rfm.query("frequency > 0")
dataset = pd.DataFrame({
    'customer_id': nonzero_data.customer_id,
    'mean_transaction_value': nonzero_data["monetary_value"],
    'frequency': nonzero_data["frequency"],
})

gg = clv.GammaGammaModel(
    data=dataset,
)
gg.build_model()
gg.fit();

expected_spend = gg.expected_customer_spend(
    customer_id=data_summary_rfm["customer_id"],
    mean_transaction_value=data_summary_rfm["monetary_value"],
    frequency=data_summary_rfm["frequency"],
)
```

The graph below shows the expected average order value of 5 customers. Two of them have an expected average order value of more than $500, while the other three are at around $350.

```python
labeller = MapLabeller(var_name_map={"x": "customer"})
az.plot_forest(expected_spend.isel(customer_id=range(5)), combined=True, labeller=labeller)
plt.xlabel("Expected average order value");
```

**Outcomes**

Finally, we can combine the two sub-models to estimate the CLV of each customer. One parameter worth mentioning here is `discount_rate`. The function uses the DCF (discounted cash flow) method: with a monthly discount rate of 1%, $100 received one month from now is worth about $99 today ($100 / 1.01 ≈ $99.01).

```python
clv_estimate = gg.expected_customer_lifetime_value(
    transaction_model=bgm,
    customer_id=data_summary_rfm['customer_id'],
    mean_transaction_value=data_summary_rfm["monetary_value"],
    frequency=data_summary_rfm["frequency"],
    recency=data_summary_rfm["recency"],
    T=data_summary_rfm["T"],
    time=120,  # prediction horizon in months: 120 months = 10 years
    discount_rate=0.01,  # monthly discount rate
    freq="D",  # time unit of recency and T in the input data (days)
)

clv_df = az.summary(clv_estimate, kind="stats").reset_index()
# Summary index labels contain the customer ID (e.g. "x[12346]"); extract it
clv_df['customer_id'] = clv_df['index'].str.extract(r'(\d+)')[0]
clv_df = clv_df[['customer_id', 'mean', 'hdi_3%', 'hdi_97%']].rename(columns={
    'mean': 'clv_estimate',
    'hdi_3%': 'clv_estimate_hdi_3%',
    'hdi_97%': 'clv_estimate_hdi_97%',
})

# Attach each customer's observed average order value
monetary_values = data_summary_rfm.set_index('customer_id').loc[clv_df['customer_id'], 'monetary_value']
clv_df['monetary_value'] = monetary_values.values

clv_df.to_csv('clv_estimates_output.csv', index=False)
```
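The `discount_rate` argument above applies discounted cash flow month by month. This is a minimal sketch of that arithmetic, not PyMC-Marketing's internal implementation: cash received in month m is divided by (1 + rate)^m. The $10-per-month customer is a hypothetical example.

```python
def present_value(monthly_cash_flows, monthly_discount_rate):
    """Discounted cash flow: cash received in month m is divided by (1 + rate) ** m."""
    return sum(
        cash / (1 + monthly_discount_rate) ** month
        for month, cash in enumerate(monthly_cash_flows, start=1)
    )

print(round(present_value([100], 0.01), 2))  # 99.01

# A hypothetical customer spending $10 every month over the same 120-month horizon
print(round(present_value([10] * 120, 0.01), 2))
```

Over a 10-year horizon, discounting matters: the 120 monthly payments of $10 are worth noticeably less than the undiscounted $1,200 today.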

Now, I am going to show you how we can improve our marketing actions. The graph below shows an estimated CLV by Country.

```python
# Calculate total sales per transaction
data['TotalSales'] = data['Quantity'] * data['UnitPrice']

customer_sales = data.groupby('CustomerID').agg({
    'TotalSales': 'sum',
    'Country': 'first',  # Assumes each customer is associated with only one country
})
customer_countries = customer_sales.reset_index()[['CustomerID', 'Country']]

clv_with_country = pd.merge(clv_df, customer_countries, left_on='customer_id', right_on='CustomerID', how='left')

average_clv_by_country = clv_with_country.groupby('Country')['clv_estimate'].mean()
customer_count_by_country = data.groupby('Country')['CustomerID'].nunique()

country_clv_summary = pd.DataFrame({
    'AverageCLV': average_clv_by_country,
    'CustomerCount': customer_count_by_country,
})

# Average the per-customer HDI bounds by country
# (a rough uncertainty band, not an HDI of the country-level mean)
average_clv_lower_by_country = clv_with_country.groupby('Country')['clv_estimate_hdi_3%'].mean()
average_clv_upper_by_country = clv_with_country.groupby('Country')['clv_estimate_hdi_97%'].mean()

# Add these averages to the country_clv_summary DataFrame
country_clv_summary['AverageCLVLower'] = average_clv_lower_by_country
country_clv_summary['AverageCLVUpper'] = average_clv_upper_by_country

# Keep countries with at least 20 customers
filtered_countries = country_clv_summary[country_clv_summary['CustomerCount'] >= 20]

# Sort in descending order by AverageCLV
sorted_countries = filtered_countries.sort_values(by='AverageCLV', ascending=False)

# Prepare the asymmetric error bars
lower_error = sorted_countries['AverageCLV'] - sorted_countries['AverageCLVLower']
upper_error = sorted_countries['AverageCLVUpper'] - sorted_countries['AverageCLV']
asymmetric_error = [lower_error, upper_error]

# Create a new figure with a specified size
plt.figure(figsize=(12, 8))

# Plot the average CLV with error bars showing the averaged HDI bounds.
# Convert the index to a plain list to avoid issues with matplotlib's handling of pandas Index objects
plt.errorbar(x=sorted_countries['AverageCLV'], y=sorted_countries.index.tolist(),
             xerr=asymmetric_error, fmt='o', color='black', ecolor='lightgray', capsize=5, markeredgewidth=2)

# Set labels and title
plt.xlabel('Average CLV')
plt.ylabel('Country')
plt.title('Average Customer Lifetime Value (CLV) by Country with Confidence Intervals')

# Display countries from the top down
plt.gca().invert_yaxis()

# Show grid lines and display the plot
plt.grid(True, linestyle='--', alpha=0.7)
plt.show()
```

Customers in France tend to have a high CLV, while customers in Belgium tend to have a lower CLV. Based on this output, I recommend increasing the marketing budget for acquiring customers in France and reducing it for Belgium. With U.S.-based data, we would use states instead of countries.

You might be wondering:

- Can we utilize additional types of data, such as access logs?
- Is it possible to incorporate more features like demographic information or marketing activity into the model?

Basically, the BTYD model requires only transaction data. If you want to use other data sources or features, a machine learning approach might be an option. You can then compare the Bayesian and ML models and choose the one that offers the better balance of accuracy and interpretability.

The flowchart below shows a guideline for better CLV modeling.

First, consider your data size. If your data isn’t large enough or you only have transaction data, BTYD modeling with PyMC-Marketing might be the best choice. Even if your data is large enough, I think a good approach is to start with a BTYD model and try a different approach only if it underperforms. Specifically, if your priority is accuracy over interpretability, neural networks, XGBoost, LightGBM, or ensemble techniques could be beneficial. If interpretability still matters to you, consider methods like Random Forest or explainable AI techniques.

In summary, starting with PyMC-Marketing is a good first step in any case!

Here are some key takeaways.

- Customer lifetime value (CLV) is the total net profit a company can expect from a single customer throughout their relationship.
- We can build a Probabilistic model (BTYD) using the BG/NBD model and the Gamma-Gamma model.
- If you are familiar with Python, PyMC-Marketing is where you can start.

Thank you for reading! If you have any questions or suggestions, feel free to contact me on LinkedIn! Also, I would be happy if you follow me on Towards Data Science.

This post originally appeared on TechToday.