2019-05-31 12:52:41.441911
As Data Scientists as Warply, all our projects are driven by a relationship commerce imperative. Our experience shows that metrics, such as broad averages, are usually not enough to detect hidden patterns on customer level. Such broad metrics makes it difficult to gain a full picture of what makes each customer unique, and therefore difficult to take effective and profitable bussiness decitions.
In order to address this problem, the models that our Data Science team build, focus mainly on the purchasing behaviour of each individual customer. Such an approach is particularly useful when we try to predict the Customer Lifetime Value (abbreviated CLV) of customers, which is defined as the total amount that customers will spend over their lifetime.
Unfortunately, most companies that estimate the CLV of their customers, are forecasting customer spending in aggregate, faceting by acquisition channel or other broad categories based on geography, sales channels, or demographics. Our experience shows that such an approach is completely wrong and can lead to largely unseen biases. Companies that ignore individual level variation in CLV are likely to over- or underestimate their customers’ future spending. This can lead to an incorrect picture of the health of the business, resulting in misallocated resources, misguided strategies, and missed revenue opportunities.
The ultimate goal is to undertand the potential spending patterns of individual customers in order to unlock the full potential of relationship-building and loyalty programs.
The approach that we describe in this blogpost, leverages the latest advances in probabilistic programming, in order to estimate the full distributions of potential value for every single customer in our dataset. We will use the Pythonic package PyStan (a package for Bayesian Inference that uses the No-U-Turn sampler, a faster variant of Hamiltonian Monte Carlo) to estimate the CLV with individual-level granularity.
from IPython.display import Image
Image(filename='/Users/warply/Desktop/CLV.png', width = 1000, height = 1000)
Customers exhibit a wide range of spending behavior, from one-and-done transactions to long-term relationships or subscriptions (loyal customers). Understanding and leveraging those differences lets companies build and protect customer relationships and increase market share. For example, if two companies X and Y have the same average customer churn rate, but company X has a mix of very loyal customers and one-and-done customers, and all of company Y’s customers are homogenous, company X can be much more valuable — if they recognize this and act upon it to their advantage.
At Warply, we provide Loayalty solutions to hundreds of customers, so taking into account the customer-level heterogeneity is of paramount importance!
To illustrate our approach to CLV modeling, we use a simplified dataset, which includes only customer id's, purchase amounts and dates. The dataset comes from a Big Player in the Retail Industry that uses Warply's Loyalty Solutions. All the data has been cleaned and anonymized, due to a confidentiality agreement. The first purchase in the dataset occured on 7th of March, 2019 and the last on 10th of May.
We won't dive into the math behind the model, as it can get very tricky (For those interesed in the inner-workings of the CLV model, a worth reading paper that explains the mathematics, is that of Peter S. Fader and Bruce G. S. Hardie: "here"). Instead, we will focus on the intuition, which is very simple but equally important!
We break each customer’s spending into three component parts:
1) Transaction rate: which is the number of transactions a customer makes in a given time period.
2) Dropout rate: is the probability that a customer stops shopping over a given time period, and
3) Average spending: which is the customer’s average transaction amount.
The steps that we follow bellow, are the typical of a Data Science pipeline, and involves: Data Cleaning/Preprocessing, Data Visualization, Model Selection, Model Evaluation and a lot of iterations!
As you can see in the cells below, first we import all the required libraries for this project, then we load the dataset into a Pandas DataFrame, convert the transaction_date column to a datetime column and finally check the final dataframe for missing values.
#import all libraries
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import datetime
from datetime import datetime
import os
import sys
import numpy as np
import pystan
import time
import arviz as az
import joypy
%matplotlib inline
#load data
df = pd.read_csv(filepath_or_buffer = "/Users/warply/Desktop/transactional_data.txt")
#display the first 5 rows of the dataframe
df.head(n = 5)
#display the last 5 rows of the dataframe
df.tail(n = 5)
print("Number of rows in the dataframe: {:,}".format(df.shape[0]))
print("Date of First Transaction: {}".format(df["transaction_date"].min()))
print("Date of Last Transaction: {}".format(df["transaction_date"].max()))
#get the data type of each columns
df.dtypes.to_frame(name = "data_types")
#convert dates into a datetime format
df["transaction_date"] = pd.to_datetime(df["transaction_date"])
#check for missing values
df.isnull().sum(axis = 0).to_frame(name = "number_of_missing_values")
Using a scatterplot, we will plot the purchasing behaviour of the first onehundred customers in the dataset for the time period between 7th of March, 2019 and the 10th of May.
Each dot in the plot represents a transaction, the x-axis shows when it occurred in time, and the size of the dot shows its relative amount. Some customers have been making transactions regularly throughout the entire period; others have made few transactions sparsely. Some customers tend to spend a lot in each transaction, others generally spend little.
The most interesting part here is the purchasing pattern a few days before 7th of April, where the purchase frequency increses for almost all customers.
#get the first 100 customers
consumer_mask = df["customer_id"] < 100
onehundred_customers = df[consumer_mask].sort_values(by = "customer_id", ascending = True)
print("The dataframe contains: {} rows.".format(onehundred_customers.shape[0]))
print("The dataframe contains: {} customers.".format(onehundred_customers["customer_id"].nunique()))
print("The first transaction happened on: {}".format(str(onehundred_customers["transaction_date"].min()).split(" ")[0]))
print("The last transaction happened on: {}".format(str(onehundred_customers["transaction_date"].max()).split(" ")[0]))
#get the labels and indexes for the first 100 customers
onehundred_customer_indexes = onehundred_customers["customer_id"].unique().tolist()
onehundred_customer_labels = list(map(lambda e: "Customer Id: {}".format(e), onehundred_customer_indexes))
#matplotlib.rcdefaults()
#register matplotlib converters
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
from matplotlib.ticker import MultipleLocator
import matplotlib.dates as dt
fig, ax = plt.subplots(figsize = (60, 45),
dpi = 100)
scatter = ax.scatter(onehundred_customers["transaction_date"],
onehundred_customers["customer_id"],
s = 10*onehundred_customers["value"],
alpha=0.5,
linewidths = 2.,
edgecolors = "r")
#set the grid for each minor tick
minor_ticks = np.arange(0, 100, 1)
ax.set_yticks(minor_ticks, minor=True)
ax.grid(which='minor', alpha=0.5)
ax.xaxis.grid(linewidth=1.)
#delete the ticks/labels on the y-axis and plot the labels on the y-axis
ax.yaxis.set_major_locator(plt.NullLocator())
ax.set_yticklabels(onehundred_customer_labels, minor = True, size = 30)
#set the limits of the x-axis
ax.set_xlim((datetime(2019, 3, 6), datetime(2019, 5, 20)))
#set the ticks and labels on the x-axis
ax.xaxis.set_minor_locator(dt.DayLocator())
ax.set_xticks([datetime(2019, 3, 7)] + [datetime(2019, month, 7) for month in range(4, 6)] + [datetime(2019, 5, 10)])
tick_labels = [datetime(2019, 3, 7)] + [datetime(2019, month, 7) for month in range(4, 6)] + [datetime(2019, 5, 10)]
ax.set_xticklabels(list(map(lambda e: e.strftime("%Y/%m/%d"), tick_labels)), size = 35)
#rotate the last label
ax.get_xticklabels()[-1].set_rotation(90)
#set the title name
ax.set_title("Customer Purchases - First 100 Customers", size = 65, y = 1.004)
#set the vertical line
ax.vlines(x = datetime(2019, 5, 10),
ymin = 0,
ymax = 100,
colors = "k",
linestyles = "dashed",
**{"linewidths": 5})
#set the text right to the vertical line
ax.text(s = '?',
x = 0.935,
y = 0.5,
horizontalalignment='center',
verticalalignment='center',
wrap = True,
transform=ax.transAxes,
fontsize = 130,
**{"clip_on":True})
ax.text(s = 'Forecast Period',
x = 0.935,
y = 0.019,
horizontalalignment='center',
verticalalignment='center',
wrap = True,
transform=ax.transAxes,
fontsize = 50,
**{"clip_on":True})
In order to use the Bayesian model, we transform the data to a RFM format by calculating the following columns/quantities: days_since_first_transaction: the days that have passed since the first purchase of a customer, days_since_last_transaction: the days that have passed since the last purchase of a customer, average_money_spent: the average amount of money a customer spend during his "lifetime" up to date and the numer_of_repeated_transactions: which inludes the number of repeated transactions a customer made since his first purchase.
#display the first 5 rows of the original dataframe
df.head(n = 5)
#sort the dataframe in a descending order by the customer_id column
df = df.sort_values(by = "customer_id", ascending = True)
#find the first transaction of each customer
first_transaction_df = df.groupby(["customer_id"])["transaction_date"].min().to_frame(name = "first_transaction")
#find the last transaction of each customer
last_transaction_df = df.groupby(["customer_id"])["transaction_date"].max().to_frame(name = "last_transaction")
#find the number of days since the first transaction for each customer
days_since_first_transaction = first_transaction_df.apply(lambda t: datetime.now() - t)\
.loc[:,"first_transaction"].dt.days\
.to_frame(name = "days_since_first_transaction")
#find the number of days since the last transaction for each customer
days_since_last_transaction = last_transaction_df.apply(lambda t: datetime.now() - t)\
.loc[:,"last_transaction"].dt.days\
.to_frame(name = "days_since_last_transaction")
#find the average amount each customer spent
average_money_spent = df.groupby(["customer_id"])["value"].mean().to_frame(name = "average_money_spent")
#find the number of repeated transactions for each customer
numer_of_repeated_transactions = df.groupby(["customer_id"])["transaction_date"].count().to_frame(name = "number_of_repeated_transactions")
#get the customer id as a separate column
customer_id = numer_of_repeated_transactions.reset_index(drop = False).loc[:,["customer_id"]]
customer_id = customer_id.set_index("customer_id", drop = False)
#concatenate all the tables above in a final dataframe
final_df = pd.concat([customer_id,
days_since_first_transaction,
days_since_last_transaction,
average_money_spent,
numer_of_repeated_transactions],
axis = "columns")
#display the first 5 rows in the dataframe
final_df.head(n = 5)
print("The number of unique customers is equal to: {:,}".format(final_df["customer_id"].nunique()))
We rename all the columns above and finally get the following data on each customer:
#rename the columns according to the following scheme:
# number_of_repeated_transactions : x
# days_since_last_transaction : t_x (most recent transaction)
# days_since_first_transaction : t_cal (oldest transaction)
# average_transaction_amount: mx (total amount of spent money)
final_df.rename(columns = {"number_of_repeated_transactions":"x",
"days_since_last_transaction":"t_x",
"days_since_first_transaction":"t_cal",
"average_money_spent":"mx"},
inplace = True)
Now, let's create the PyStan CLV model.
In order to estimate distributions for each customers’ transaction rate, churn rate and lifetime value, we use the following Bayesian model of customer lifetime value based on Fader and Hardie work (2003 and 2013).
For each customer we define the following probabilistic quantities:
For all customers we define the following quantities:
We take advantage of the PysTAN blocks of code to declare and transform our data and parameters, define our prior and likelihood functions.
In the functions block, we combine the transaction rate and dropout rate distributions into a custom likelihood function.
#define the PyStan model
line_code = """
// Likelihood function
functions {
vector llh(vector x, vector t_x, vector t_cal,
vector lambda1, vector mu1,
vector log_lambda, vector log_mu,
vector log_mu_lambda, int N) {
vector[N] p_1;
vector[N] p_2;
p_1 = x .* log_lambda + log_mu - log_mu_lambda - t_x .* mu1 - t_x .* lambda1;
p_2 = x .* log_lambda + log_lambda - log_mu_lambda - t_cal .* mu1 - t_cal .* lambda1;
return(log(exp(p_1) + exp(p_2)));
}
}
data {
int<lower = 1> N; // Number of customers
int<lower = 0> N_months; // Number of months for ltv calibration
vector<lower = 0>[N] x; // Repeat transactions per customer (frequency)
vector<lower = 0>[N] t_x; // Time of most recent transation (recency)
vector<lower = 0>[N] t_cal; // Time since first transaction
vector[N] mx; // Average transaction amount
}
transformed data {
vector<lower = 0>[N] x_tot; // Total number of transactions per cust
x_tot = x + 1;
}
parameters {
vector<lower = 0>[N] lambda; // Transaction rate
vector<lower = 0>[N] mu; // Dropout probability
real<lower = 0> rr; // Transaction shape parameter
real<lower = 0> alpha; // Transaction scale parameter
real<lower = 0> ss; // Dropout shape parameter
real<lower = 0> beta; // Dropout scale parameter
real <lower=0> p; // Shape of trans amt gamma
vector<lower=0>[N] v; // Scale of trans amt gamma (cust specific)
real <lower=0> q; // Shape of scale dist gamma
real <lower=0> y; // Scale of scale dist gamma
}
transformed parameters {
vector[N] log_lambda;
vector[N] log_mu;
vector[N] log_mu_lambda;
vector[N] log_lik;
vector<lower=0> [N] px; // Shape of total spend distribution
vector<lower=0> [N] nx; // Scale of total spend distribution
px = p * x_tot;
nx = v .* x_tot;
log_lambda = log(lambda);
log_mu = log(mu);
log_mu_lambda = log(mu + lambda);
log_lik = llh(x, t_x, t_cal, lambda, mu, log_lambda, log_mu, log_mu_lambda, N);
}
model {
// Priors for rates
lambda ~ gamma(rr, alpha);
mu ~ gamma(ss, beta);
rr ~ exponential(1);
alpha ~ exponential(1);
ss ~ exponential(0.1);
beta ~ exponential(0.1);
// Likelihood for rate
target += log_lik;
// Priors for spend
p ~ exponential(0.1);
q ~ exponential(0.1);
y ~ exponential(0.1);
v ~ gamma(q, y);
// Likelihood for spend
mx ~ gamma(px, nx);
}
generated quantities {
vector[N] p_alive; // Probability that they are still "alive"
vector[N] exp_trans; // Expected number of transactions
vector[N] mx_pred; // Per transaction spend
vector[N] lt_val; // Lifetime value
for(i in 1:N) {
p_alive[i] = 1/(1+mu[i]/(mu[i]+lambda[i])*(exp((lambda[i]+mu[i])*(t_cal[i]-t_x[i]))-1));
exp_trans[i] = (lambda[i]/mu[i])*(1 - exp(-mu[i]*N_months));
mx_pred[i] = gamma_rng(px[i], nx[i]);
}
lt_val = exp_trans .* mx_pred;
}
"""
#check the PyStan version
print("PyStan version: {}".format(pystan.__version__))
Next, we compile our probabilistic model:
#compile the PyStan model
start = time.time()
sm = pystan.StanModel(model_code = line_code, model_name = 'CVL')
end = time.time()
print("PyStan model compiled in: {0:.2f}sec".format(end - start))
We can see that the model compiles in about 42.65 second, which is a reasonable time.
The dataset is too big - it contains 18.004 unique customers! To reduce the model inference time, we sample our initial dataset to create a new small dataset of about 90 customers.
#get a sample from the data
final_df_sampled = final_df.sample(frac = 0.005)
#save all the sampled data to a dictionary
CLV_data = dict(x = final_df_sampled.loc[:,"x"],
t_x = final_df_sampled.loc[:,"t_x"],
t_cal = final_df_sampled.loc[:,"t_cal"],
mx = final_df_sampled.loc[:,"mx"],
N = final_df_sampled.shape[0],
N_months = 2)
Finally, we hit the MCMC button!
#perform sampling
fit = sm.sampling(data = CLV_data,
chains = 10,
iter = 3000,
warmup = 1000)
Once we have performed MCMC sampling, it's time to assess the convergence of our simulation.
az.style.use('arviz-darkgrid')
#initialize the plot
fig, ax = plt.subplots(figsize = (12,7))
#plot marginal energy
inference_data = az.convert_to_inference_data(fit)
az.plot_energy(inference_data, ax = ax)
ax.set_title("Marginal Energy", size = 15)
Now, we are ready to plot, for each customer, the estimated full distributions for their feature transactions rates, churn probability and total spending.
The plots below display the distributions of feature transactions, churn probabilities and lifetime value for all of our customers.
#load samples into a pandas dataframe
exp_trans_distribution = pd.DataFrame(data = fit.extract()["exp_trans"],
columns = final_df_sampled["customer_id"].values)
#transform data into a specific format for visualization
exp_trans_distribution_unstacked = exp_trans_distribution.unstack()
exp_trans_distribution_unstacked = exp_trans_distribution_unstacked.to_frame(name = "exp_trans_distribution")
exp_trans_distribution_final = exp_trans_distribution_unstacked.reset_index(level = 0)\
.rename(columns = {"level_0":"customer_id"})
#get customer indexes
customer_indexes = exp_trans_distribution_final["customer_id"].unique()
#get the customer Id's in a nice format
customers = list(map(lambda e: "{:,}".format(e), customer_indexes))
#get the labels on the y-axis
labels = list(map(lambda e: "Customer Id: " + e ,customers))
fig, axes = joypy.joyplot(exp_trans_distribution_final,
by = "customer_id",
column = "exp_trans_distribution",
range_style = "own",
figsize = (10,15),
legend = False,
grid = "both",
fade = True,
linewidth = 0.5,
kind = "counts",
tails = 0.01,
x_range = [-0.07,1],
labels = labels)
axes[-1].set_title("Expected Transactions - Next Two Months)", size = 20)
axes[1].set_ylabel("Customers", size = 19)
axes[-1].set_xlabel("Days", size = 19)
axes[-1].set_xticklabels(["0d","15d","30d","45d", "60d"])
#set the y-label position on the y-axis
axes[1].yaxis.set_label_coords(-0.295, -10)
#set the label size on the y-axis
for i, axis in enumerate(axes):
axes[i].tick_params(axis = 'y', which = 'major', labelsize = 12.5)
#set the label size on the x-axis
axes[-1].tick_params(axis = 'x', which = 'major', labelsize = 14.5)
#get the churn_rate probability = 1 - p_alive
churn_p_distribution = pd.DataFrame(data = 1 - fit.extract()["p_alive"],
columns = final_df_sampled["customer_id"].values)
#get the churn distribution per customer
churn_p_dist = churn_p_distribution.unstack()\
.to_frame(name = "churn_rate_dist")\
.reset_index(level = 0, drop = False)\
.rename(columns = {"level_0":"customer_id"})
#churn_p_dist.describe()
#churn_p_dist["churn_rate_dist"] = churn_p_dist["churn_rate_dist"].apply(lambda e: 1 if e > 1 else 0)
fig, axes = joypy.joyplot(churn_p_dist,
by = "customer_id",
column = "churn_rate_dist",
range_style = "own",
figsize = (10,15),
legend = False,
grid = "both",
fade = True,
linewidth = 0.5,
kind = "counts",
x_range = [-0.2,1.25], #-0.15
labels = labels)
axes[-1].set_title("Expected Churn - Next Two Months", size = 20)
axes[1].set_ylabel("Customers", size = 19)
axes[-1].set_xlabel(" ", size = 19)
axes[-1].set_xlabel("Churn Probability", size = 19)
#set the y-label position on the y-axis
axes[1].yaxis.set_label_coords(-0.25, -9)
#set the label size on the y-axis
for i, axis in enumerate(axes):
axes[i].tick_params(axis = 'y', which = 'major', labelsize = 12.5)
#set the label size on the x-axis
axes[-1].tick_params(axis = 'x', which = 'major', labelsize = 14.5)
lt_val_distribution = pd.DataFrame(data = fit.extract()["lt_val"],
columns = final_df_sampled["customer_id"].values)
#get the churn distribution per customer
lt_val_dist = lt_val_distribution.unstack()\
.to_frame(name = "lt_val_dist")\
.reset_index(level = 0)\
.rename(columns = {"level_0":"customer_id"})
fig, axes = joypy.joyplot(lt_val_dist,
by = "customer_id",
column = "lt_val_dist",
range_style = "own",
figsize = (30,25),
legend = False,
grid = "both",
fade = True,
kind = "kde",
labels = labels,
x_range = [-0.07,5],
tails = 0.01,)
axes[-1].set_title("Customer Lifetime Value - Next Two Months", size = 20)
axes[1].set_ylabel("Customers", size = 19)
axes[-1].set_xlabel("Revenue (€)", size = 19)
axes[-1].set_xticklabels(["0€","100€","200€","300€", "400€"])
#set the y-label position on the y-axis
axes[1].yaxis.set_label_coords(-0.25, -5)
#set the label size on the y-axis
for i, axis in enumerate(axes):
axes[i].tick_params(axis = 'y', which = 'major', labelsize = 12.5)
#set the label size on the x-axis
axes[-1].tick_params(axis = 'x', which = 'major', labelsize = 14.5)
The customer lifetime value (CLV) metric might not sound very important, but failing to calculate it can put you behind your competitors.
CLV tells you how well you’re resonating with your audience, how much your customers like your products or services, and what you’re doing right — as well as how you can improve. It is essentially a measurement of how much a business’s customers are worth over their lifetime.
Despite its importance, most companies are doing it wrong. At face value, CLV is an easy concept to understand. In practice though, it’s deceptively hard to implement in a way that accurately captures the variation in customer behavior.
Most companies are calculating their customers CLV by aggregate, using segments such as acquisition channel or other broad categories based on geography, sales channels, or demographics. From our experience, such an approach can lead to large unseen biases and therefore, to an incorrect picture of the health of the business, which can subsequently result in misallocated resources, misguided strategies, and missed revenue opportunities.
In this blogpost we showed how by using a simple Bayesian model, our DS team managed to overcome all the problems mentioned above and deliver an easy-to-deploy and highly interpretable analytics solution to one of the biggest players in the Retail Industry. And the best af all: by estimating the CLV with individual-level granularity, you automatically get bonus two additional metrics: the expected transactions and the churn probability of each customer.
In the end, I think we all agree that CLV is so valuable to every business, that putting in the time and study to estimate it properly it’s definitely worth it.
2019-01-21 12:49:52.633478
2018-10-19 08:22:01.419442
2018-09-10 11:25:54.616537
2018-09-03 13:50:05.468625