Which Customers Should Starbucks Attract and How?

Laila M
19 min read · Dec 15, 2020

A Capstone Project for the Udacity Data Science NanoDegree

Photo: Business Wire

1. Project Definition

Project Overview

Starbucks creates offers and shares them with its customers frequently to improve sales. Some of those offers are more effective than others: customers leverage them to get the best price for their favorite coffee. Others are less effective because customers don’t see them or don’t use them, and some are unnecessary, because those customers will buy their favorite coffee regardless of the price or the offer!

In this article, I will analyze a data set that contains simulated data that mimics customer behavior on the Starbucks rewards mobile app to answer the following questions:

  • Which offer is the most effective?
  • Which demographic groups respond best to offers?

This is a capstone project of the Udacity Data Science Nanodegree Program.

Problem Statement

The goal of this project is to combine transaction, demographic, and offer data to determine which demographic groups respond best to which offer type. This dataset is a simplified version of the real Starbucks app because the underlying simulator only has one product whereas Starbucks sells dozens of products.

This goal is achieved by following this strategy:

  • Explore and visualize the data set
  • Pre-process and clean the data
  • Analyze the data
  • Create a supervised machine learning model
  • Evaluate the model using the performance metric accuracy
  • Enhance the model if needed

Metrics

To evaluate the ML model, I’ll use accuracy as the metric. It tells us how well the model is predicting by calculating the ratio of the number of correct predictions to the total number of input samples. Since the problem at hand is a simple classification problem, accuracy should be sufficient.
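For reference, this is how accuracy can be computed with scikit-learn (a toy example, not data from the project):

from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1]  # actual labels
y_pred = [1, 0, 0, 1]  # model predictions
print(accuracy_score(y_true, y_pred))  # 3 of 4 correct -> 0.75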

2. Analysis

Data Exploration and Visualization

“Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks.

Not all users receive the same offer, and that is the challenge to solve with this data set. In addition, someone using the app might make a purchase through the app without having received an offer or seen an offer.

Every offer has a validity period before the offer expires. As an example, a BOGO offer might be valid for only 5 days. You’ll see in the data set that informational offers have a validity period even though these ads are merely providing information about a product; for example, if an informational offer has 7 days of validity, you can assume the customer is feeling the influence of the offer for 7 days after receiving the advertisement.”

This data set has three files:

  1. portfolio: It contains information about each offer.

It has 6 columns:

  • id (string) — offer id
  • offer_type (string) — type of offer, i.e. BOGO, discount, informational
  • difficulty (int) — minimum required spend to complete an offer
  • reward (int) — reward given for completing an offer
  • duration (int) — time for offer to be open, in days
  • channels (list of strings) — four possible channels are used: email, mobile, social and/or web
These are all 10 records in the portfolio data set

Exploration of portfolio:

  1. There are 10 different offers. Each one has a unique id.
  2. The data frame has no null values
  3. It has no duplicates
  4. There are 4 different channels through which offers can be sent to customers: email, mobile, web, and social. Not all offers are sent through the same channels
  5. There are 3 types of offers: BOGO (buy one, get one), discount, and informational. Four of the offers are BOGO, four are discount, and two are informational
  6. The offers carry different reward amounts; informational offers have no reward
  7. Every offer has a validity duration in days; the offer expires after the duration is over
  8. Difficulty is measured in dollars: it is the minimum amount of money a customer must spend to complete an offer

2. profile: It has the demographic data of 17,000 unique customers. The columns are:

  • age (int) — age of the customer
  • became_member_on (int) — date when customer created an app account
  • gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
  • id (str) — customer id
  • income (float) — customer’s income
These are 10 records from the profile data set. There are 2,175 records missing data such as age and gender; those customers will be excluded from this analysis.

Exploration of profile:

  • It has no duplicates
  • It has the demographic data of 17,000 unique customers. However, there are 2,175 records with no income or gender, and those records are also missing age (the value recorded was 118 for all 2,175 of them; the project overview states that missing values in the age column were encoded as 118). These will be dropped in the preprocessing step, leaving 14,825 unique customers
  • Gender of Starbucks customers:

There are 3 gender categories, Female, Male, and Other, as shown in the bar chart below:

Gender    Customers
F         6,129
M         8,484
O         212

More males than females get their drinks from Starbucks.

  • Age of Starbucks Customers:

Ages range from 18 to 101.

The average age of the customers is 54.4. Age distribution is shown in the histogram below:

The average age of the customers is 54.4

  • Income of Starbucks Customers:

Income of customers ranges from $30,000 to $120,000

The average income of customers is $65,405. The distribution is shown in the histogram below:

The average income of customers is $65,405
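As a sketch, the two histograms could be reproduced like this (the file name is assumed; the age-118 placeholder rows are excluded, as noted above):

import pandas as pd
import matplotlib.pyplot as plt

profile = pd.read_json('profile.json', orient='records', lines=True)
valid = profile[profile['age'] != 118]  # drop the placeholder rows

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
valid['age'].hist(ax=axes[0], bins=20)
axes[0].set(title='Age distribution', xlabel='age')
valid['income'].hist(ax=axes[1], bins=20)
axes[1].set(title='Income distribution', xlabel='income ($)')
plt.show()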

3. transcript: It has 306,534 records for transactions, offers received, offers viewed, and offers completed. The 4 columns are:

  • event (str) — record description (i.e. transaction, offer received, offer viewed, etc.)
  • person (str) — customer id
  • time (int) — time in hours since start of test. The data begins at time t=0
  • value — (dict of strings) — either an offer id or transaction amount depending on the record
These are 10 records from the transcript data set

Exploration of transcript:

  • It has no duplicates or missing (null) values
  • There are 4 events: offer received, offer viewed, transaction, and offer completed
  • The events are distributed as shown in the table and graph below:

Event              Records
offer completed    33,579
offer received     76,277
offer viewed       57,725
transaction        138,953

As mentioned above, there are three types of offers, shown below:

As for the completed offers:


Event            Offer Type       Records
offer_completed  bogo             15,100
offer_completed  discount         16,970
offer_received   bogo             26,537
offer_received   discount         26,664
offer_received   informational    13,300
offer_viewed     bogo             22,039
offer_viewed     discount         18,461
offer_viewed     informational    9,360

The value column is tricky, as it holds a dictionary whose keys depend on the value in the event column:

When event == offer received:

{'offer id': '9b98b8c7a33c4b65b9aebfe6a799e6d9'}

When event == offer viewed:

{'offer id': 'f19421c1d4aa40978ebb69ca19b0e20d'}

When event == transaction:

{'amount': 34.56}

When event == offer completed:

{'offer_id': '2906b810c7d4411798c6938adc9daaa5', 'reward': 2}

Feature Engineering will be applied on this column to extract useful information in the preprocessing step.

3. Methodology:

Data Preprocessing

portfolio:

  1. Copy the portfolio data frame to portfolio_preprocessed_data. I’ll keep the original file, portfolio, untouched so I can refer back to it to understand the data, and use the copy for the preprocessing and analysis
  2. Rename columns:
  • “id” to “offer_id”. This is to prepare for joining this data set with transcript
  • “duration” to “duration_days”. This is because I’ll create another column “duration_hours”. This step is to prepare for possible analysis of validating the expiration of the offers

3. Replace the offer ids with more descriptive ids:

  • The 4 BOGO offers will start with B followed by a number
  • The 2 Informational offers will start with I followed by a number
  • The 4 Discount offers will start with D followed by a number

offer_id_map = {
    'ae264e3637204a6fb9bb56bc8210ddfd': 'B1',
    '4d5c57ea9a6940dd891ad53e9dbe8da0': 'B2',
    '3f207df678b143eea3cee63160fa8bed': 'I1',
    '9b98b8c7a33c4b65b9aebfe6a799e6d9': 'B3',
    '0b1e1539f2cc45b7b9fa7c272da2e1d7': 'D1',
    '2298d6c36e964ae4a3e7e9706d1fb8c2': 'D2',
    'fafdcd668e3743c1bb461111dcafc2a4': 'D3',
    '5a8bc65990b245e5a138643cd4eb9837': 'I2',
    'f19421c1d4aa40978ebb69ca19b0e20d': 'B4',
    '2906b810c7d4411798c6938adc9daaa5': 'D4',
}
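With the mapping stored in a dict (I’m calling it offer_id_map; the name is my own), applying it is a single replace call:

portfolio_preprocessed_data['offer_id'] = (
    portfolio_preprocessed_data['offer_id'].replace(offer_id_map)
)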

4. Create new columns:

  • “duration_hours” = duration_days * 24 (converting the duration from days to hours)

5. Feature Engineering to handle Categorical Data:

  • Convert them to Numerical Data (binary/dummy variables) using One-Hot encoding:

The channels column will be expanded into 4 columns (with a channel_ prefix to enhance readability), then dropped:

channel_email, channel_web, channel_social, channel_mobile

offer_type will be expanded into 3 columns (with an offer_type_ prefix). However, I’m going to keep the original column, as I will use it in the analysis:

offer_type_bogo, offer_type_discount, offer_type_informational
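A minimal sketch of this one-hot encoding step, assuming pandas and the data frame name above (channels holds lists of strings, so it needs a little more work than a plain get_dummies call):

import pandas as pd

# Build the channel_* dummies from the channel lists.
channel_dummies = (
    portfolio_preprocessed_data['channels']
    .apply(lambda chs: pd.Series({f'channel_{c}': 1 for c in chs}))
    .fillna(0)
    .astype(int)
)

# offer_type is a plain string column, so get_dummies is enough.
offer_type_dummies = pd.get_dummies(
    portfolio_preprocessed_data['offer_type'], prefix='offer_type')

portfolio_preprocessed_data = pd.concat(
    [portfolio_preprocessed_data.drop(columns='channels'),
     channel_dummies,
     offer_type_dummies],
    axis=1,
)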

6. Re-order the columns to enhance the readability of the data frame

Results of data preprocessing:

The original portfolio data
The preprocessed portfolio data

profile:

  1. Copy the profile data frame to profile_preprocessed_data. I’ll keep the original file, profile, untouched so I can refer back to it to understand the data, and use the copy for the preprocessing and analysis
  2. Drop the rows of null values. All rows where age is 118 will be dropped.
  3. Rename columns:
  • “id” to “customer_id”. This is to prepare for joining this data set with transcript
  • “became_member_on” to “membership_date”.

4. Change membership_date to a proper date type using the format %Y%m%d (a sketch follows below)
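A minimal sketch of steps 4 and 5, assuming pandas and the renamed columns above:

# Parse the int dates (e.g. 20170815) with the %Y%m%d format.
dates = pd.to_datetime(profile_preprocessed_data['membership_date'], format='%Y%m%d')
profile_preprocessed_data['membership_date'] = dates

# The .dt accessor yields the columns created in step 5 below.
profile_preprocessed_data['membership_year'] = dates.dt.year
profile_preprocessed_data['membership_month'] = dates.dt.month
profile_preprocessed_data['membership_day'] = dates.dt.day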

5. Create new columns:

  • ‘membership_year’, ‘membership_month’, ‘membership_day’
  • ‘age_group’: to enhance the analysis, I grouped the customer based on their age into groups/categories:
Age groups of customers

As shown here, Adults (customers between the ages of 25 and 64) make up the majority of Starbucks customers.

  • ‘income_class’: to enhance the analysis, I grouped the customers based on their income into classes:
Reference: https://www.thebalance.com/ (I added an additional class, Lower Middle Class, to zoom in on the middle class, since the majority of the customers fall there)

The Lower Middle Class makes up the largest share of customers, as the graph below shows:

6. Feature Engineering to handle Categorical Data:

  • Convert them to Numerical Data (binary/dummy variables) using One-Hot encoding:

The age_group column will be expanded into columns (with an age_group_ prefix to enhance readability). However, I’ll keep the original column, as I will use it in the analysis:

age_group_Adult, age_group_Senior, age_group_Youth

income_class will be expanded into columns (with an income_class_ prefix); the original is kept for the analysis as well:

income_class_Middle class, etc.

gender will be expanded into columns (with a gender_ prefix); again, the original is kept for the analysis:

gender_F, gender_M, gender_O

7. Re-order the columns to enhance the readability of the data frame

I will not drop the columns with categorical data yet, as I might use them in the straightforward analysis moving forward.

Results of data preprocessing:

The original profile data frame
The preprocessed profile data frame (first half of the columns)
The preprocessed profile data frame (2nd half of the columns)

transcript:

  1. Copy the transcript data frame to transcript_preprocessed_data. I’ll keep the original file, transcript, untouched so I can refer back to it to understand the data, and use the copy for the preprocessing and analysis
  2. Rename columns:
  • “person” to “customer_id”. This is to prepare for joining this data set with the previous data frames
  • “time” to “time_hours”
  • ‘reward’ to ’rewarded_amount’

3. Drop the records of customers who don’t have profiles (the ones removed earlier because their records were incomplete, with null values)

4. Feature Engineering to handle Categorical Data:

  • value column: It was given in the project overview that the value column is a dictionary of different values depending on the event type. However, there are 4 distinct keys rather than 3:
['offer id', 'amount', 'offer_id', 'reward']
  • Expand the column into these 4 columns, as listed above

5. Copy the data from column offer id to offer_id before dropping it

6. Drop columns:

  • “offer id” and “value” (a sketch of steps 4 to 6 follows below)
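A minimal sketch of steps 4 to 6, assuming the column names above (apply(pd.Series) keeps the expanded columns aligned with the original rows):

import pandas as pd

# Step 4: expand the value dicts into the 4 columns listed above.
value_expanded = transcript_preprocessed_data['value'].apply(pd.Series)
transcript_preprocessed_data = pd.concat(
    [transcript_preprocessed_data, value_expanded], axis=1)

# Steps 5-6: consolidate the two spellings of the offer id, then drop the originals.
transcript_preprocessed_data['offer_id'] = (
    transcript_preprocessed_data['offer_id']
    .fillna(transcript_preprocessed_data['offer id'])
)
transcript_preprocessed_data = transcript_preprocessed_data.drop(
    columns=['offer id', 'value'])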

7. Handle Categorical Data:

  • Convert them to Numerical Data (binary/dummy variables) using One-Hot encoding:

event column will be expanded to 4 columns:

offer_received, offer_viewed, transaction, offer_completed

8. Replace the offer ids with more descriptive ids (as shown in the portfolio preprocessing)

9. Re-order the columns to enhance the readability of the data frame

Results of data preprocessing:

The original transcript data frame
The preprocessed transcript data frame

Implementation and Refinement

Now we have 3 preprocessed data frames ready to be merged in a new data frame: df.

df has 272,762 rows and 40 columns.

There are 374 duplicated records; dropping them leaves 272,388 records.

We also have null values in 14 columns:

Handling missing values:

A total of 14 columns have missing values:

  • Two columns, ‘amount’ and ‘rewarded_amount’, have missing values. As explained above when preprocessing the transcript data frame:

when event == transaction, there will be an amount value, which is what the customer paid.

when event == offer_completed, there will be a rewarded_amount, which is what the customer was rewarded for utilizing the offer.

Therefore, we will have missing values in the column ‘amount’ whenever event != transaction, and missing values in the column ‘rewarded_amount’ whenever event != offer_completed. It makes sense to replace the missing values in these columns with 0.
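A quick sketch of these cleanup steps on the merged data frame:

# Drop the 374 duplicated records, then zero-fill the two amount columns.
df = df.drop_duplicates()
df[['amount', 'rewarded_amount']] = df[['amount', 'rewarded_amount']].fillna(0)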

  • 12 columns have missing values related to the missing offer_id when event == ‘transaction’. I will try to fill in the offer_id where possible: go through every record where event == offer_completed and every record where event == transaction, check for each customer whether there is a transaction at the same time an offer is completed (time_hours is the same for both records), and copy that offer_id to the transaction record. The code for that is:
Refined solution

This took many trials and different approaches. Here is one attempt that took many days to run, and I eventually had to stop it!

An attempt

I was satisfied with the final results and performance, as it took almost 6 hours to run. Therefore, I saved the results to a CSV file so I can read the file whenever I need it instead of running the code again.
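For reference, a vectorized sketch of the same matching idea (my reconstruction, not the notebook’s code; it assumes the one-hot event columns and names introduced above):

# Completed offers and transactions, keyed by customer and hour.
completed = df.loc[df['offer_completed'] == 1,
                   ['customer_id', 'time_hours', 'offer_id']]
transactions = df.loc[df['transaction'] == 1,
                      ['customer_id', 'time_hours']]

# offer_id_1 holds the matched id; NaN where no completion coincides.
matched = transactions.merge(
    completed.rename(columns={'offer_id': 'offer_id_1'}),
    on=['customer_id', 'time_hours'],
    how='left',
)
matched.to_csv('transaction_offer_ids.csv', index=False)  # file name assumed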

Now I have an additional data frame that holds an offer_id for the records where event == transaction. It will be merged with the main data frame df, and the offer_id from the additional column offer_id_1 will be copied into offer_id, filling as many records as possible.

Then, df and portfolio are merged again to get the offer information for the updated records. The merge duplicated some columns, renaming the old and new versions with the suffixes _x and _y; I’ll delete the extras manually.

All of this was done to eliminate the null values. However, I still have 94,37 records with null values in 12 columns; those are the ones where event == transaction. All of these records should be dropped before I build the model!

The resulting df information:

Refined DF

4. Results:

a. Analysis

Sending Out Offers:

# Is there a difference in how many times Starbucks sent the 10 offers?

Starbucks sent out all 10 offers in roughly equal amounts, as shown in the chart below:

The mean is 6,650.

These are the records of event == offer received

Customer Interaction with Received Offers:

# Were all offers viewed by the customers in the same amount?

Some offers were viewed more than others:

These are the records of event == offer viewed

Most Effective Channels:

# Is there a better channel for sending offers, one that makes customers more likely to view them?

The most effective channels are those that generated the most views of each offer. We have 4 channels: email, mobile, social, and web.

B2, B4, D2, and D3 were the most viewed offers, followed by I2 and B1, then D4, B3, and I1; the least viewed is D1.

Looking at the details of all offers, we can see that email, mobile, and social media are the most effective channels for sending offers to customers when all are used at the same time. At the same time, as fewer people visit websites, the web is becoming less effective. More details are shown in the tables below:

The top 4 offers viewed by customers were sent via all 4 channels
The second top group of offers viewed by customers were sent via 3 channels
The third group of offers, which received about 50% of the views, was also sent via 3 channels
D1 was the only offer sent by 2 channels: email and web.

Membership Importance:

# How often does Starbucks attract new members?

The graph below shows how many new members join the membership program yearly:

Year    New Members
2013    274
2014    662
2015    1,597
2016    3,024
2017    5,599
2018    3,669

In 2017, the number of new members almost doubled compared to the previous year.

This graph caught my attention: in 2017, more customers became members. Drilling down into the 2017 data in the following graph, the period from August to December stands out as “special”: more than 1,000 new members joined.

More data would help us understand this spike and replicate it if possible. Was there a marketing campaign? Were some branches more active than others? Are those branches near universities? The available data doesn’t tell us much about this spike.

Most Utilized (Completed) Offers:

# What are the most completed/utilized offers?

Both D2 and D3 were completed more than others:

They are both discount offers and were sent via all 4 channels:

Details on the top 2 offers

# Is there a relationship between offer_viewed and offer_completed?

There is a correlation between offer_viewed and offer_completed: the more an offer is viewed by customers, the more likely it is to be used.

However, there is a gap of more than 1,600 between the means of offer_viewed and offer_received. This tells us there is room for improvement in how the offers are crafted and channeled to customers.
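One way to check this relationship (a sketch; I’m assuming per-offer totals of the one-hot event columns):

# Per-offer totals of views and completions, and their correlation.
per_offer = df.groupby('offer_id')[['offer_viewed', 'offer_completed']].sum()
print(per_offer['offer_viewed'].corr(per_offer['offer_completed']))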

Most Effective Offers:

Approach 1: Measure the amount of money made from the offers; the offer that brought in the most money is the most effective.

# Which offers helped Starbucks make more money?

Looking at the amount vs. the rewarded amount, Starbucks made the most money from our 2 top offers, D2 and D3, which makes them the most effective offers.

Offer Info         Mean
amount             60,635.575
rewarded_amount    15,863.000
offer_received     6,650.100
offer_viewed       4,986.000
transaction        2,958.100
offer_completed    3,207.000

The ratio of the rewarded amount to the full amount should be small. In the table below, our top 2 offers have ratios of 17.7 and 11.5. There is also an interesting offer with a ratio of 10.5; it could have great potential if analyzed further.

However, offers B1 and B2 seem to be less effective. Knowing that these are BOGO (buy one get one) offers, we understand why their ratio is almost 50%, while the 2 top offers are discount offers. Starbucks made more money from the discount offers than from the BOGO offers.

Approach 2:

I came up with 3 values for effectiveness:

  • 1 means offer is effective: if offer_viewed == offer_completed. This means the customer saw the offer and felt motivated to use it.
  • 0 means offer is ineffective: if offer_viewed > offer_completed. This means that the customer saw it but didn’t care for using it.
  • -1 means offer is unnecessary: if offer_viewed < offer_completed. This means the customer didn’t know about it and was willing to pay the full price.

The offers can be grouped into these 3 categories, and the accumulated information is shown in the table below:

At the beginning of the analysis, I decided to follow this approach. Preparing the data was a lengthy process that took a lot of effort, and I faced many challenges. In the end, the results weren’t very useful for deciding which offer is more effective. What I had in mind was to get one of these 3 values (1, 0, -1) for each offer to tell me how effective it is.

However, after doing more analysis, I decided to sum all the values in the offer_effectiveness column to rank the offers. This approach again confirms that D2 and D3 have the highest offer_effectiveness values.
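A sketch of the scoring (the exact grouping used in the notebook isn’t shown here; I’m assuming the comparison happens per customer-offer pair):

import numpy as np

# Views and completions per customer-offer pair.
pair = (df.groupby(['customer_id', 'offer_id'])[['offer_viewed', 'offer_completed']]
          .sum()
          .reset_index())

# +1 effective, 0 ineffective, -1 unnecessary, per the rule above.
pair['offer_effectiveness'] = np.select(
    [pair['offer_viewed'] == pair['offer_completed'],
     pair['offer_viewed'] > pair['offer_completed']],
    [1, 0],
    default=-1,
)

# Rank the offers by total effectiveness.
print(pair.groupby('offer_id')['offer_effectiveness'].sum().sort_values())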

offer_id    offer_effectiveness
D1          -362
I1          0
I2          0
B3          745
D4          878
B1          2,183
B2          2,453
B4          3,038
D2          3,647
D3          3,805

Now we move to the customers…

Customers who Interacted Well with the Offers:

The maximum number of offers received per customer is 6. The following grouped charts show the characteristics of the customers who viewed and completed all 6 offers:

Gender-wise, there is no big difference. However, the majority of those customers are Adults from the Lower Middle Class who became members in 2016.

Loyal Customers:

These customers will buy their favorite drink no matter what! They don’t really need offers!

Let’s take the first example:

The first person in the list received 5 offers and opened all of them, but used only 3. On the other hand, he had 36 transactions, the largest number of transactions per person in this data set!

The grouped charts below show the customers who had more than 20 transactions:

They are mostly male (many more males than females), adults, from the lower middle class, and became members in 2017.

Note that a large number of transactions doesn’t necessarily mean a large amount spent.

Customers Who Spent the Most:

The table below shows the customers who spent the most money:

Most Rewarded Customers:

The most rewarded customers are those who completed the offer and were rewarded:

More detailed information on the most rewarded customers:

b. Modeling

I built a simple supervised learning model that predicts whether a customer will complete an offer. To prepare the data frame for modeling, I dropped all rows with no offer_id value, since they have missing values in some of the columns.

I also changed the offer_id values to numerical values (1 to 10).
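As a sketch (the exact assignment of codes 1 to 10 is my assumption):

# Keep only rows with an offer_id, then encode the 10 ids as 1-10.
model_df = df.dropna(subset=['offer_id']).copy()
offer_codes = {oid: i + 1 for i, oid in enumerate(sorted(model_df['offer_id'].unique()))}
model_df['offer_id'] = model_df['offer_id'].map(offer_codes)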

I split the data into X and y:

X is the Features: ['offer_id', 'time_hours', 'amount', 'rewarded_amount', 'offer_viewed', 'age_group_Adult', 'age_group_Senior', 'age_group_Youth', 'gender_F', 'gender_M', 'gender_O', 'membership_year', 'duration_days', 'offer_reward', 'channel_email', 'channel_mobile', 'channel_social', 'channel_web', 'offer_type_bogo', 'offer_type_discount', 'offer_type_informational']

y is the Target: offer_completed

It has one of 2 numerical values:

1: offer is completed

0: offer is not completed

I assigned the train_size to 0.6 (so the test size is 0.4):
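A minimal sketch of the split (random_state is my addition, for reproducibility):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, random_state=42)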

The size of each data frame is:

The Correlation of Features with Target:

Since this is a binary classification problem, RandomForestRegressor, DecisionTreeClassifier, and LogisticRegression look the most promising. However, I’ll try more models to see if there is any difference.

For building the model, I used the following classifiers:

  1. DecisionTreeClassifier:

The accuracy of the model on the training set and on the predictions was 100%.

2. LogisticRegression:

It gave a warning:

I enhanced the model and resolved this warning with 2 changes:

  • Scaled some of the features
  • Increased the value of max_iter to 500
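Put together, the fix might look like this (a sketch; the notebook scaled only some of the features, but for simplicity I scale them all here):

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Scale the features: fit the scaler on the training set only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Raise max_iter so the solver converges.
logreg = LogisticRegression(max_iter=500)
logreg.fit(X_train_scaled, y_train)
print(logreg.score(X_test_scaled, y_test))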

When I ran the model again, I got these results:

3. RandomForestRegressor:

4. KNeighborsClassifier:

All of these models have high accuracy scores, since the dataset is small and the text values were easily transformed into numerical ones.

5. Conclusion:

The data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. It has 10 offers, 14,825 complete customer records, and 306,534 records in the event log.

As a result of the analysis, we found that offers D2 and D3 were utilized more than the other offers and that more money was made from them. In addition, they were both communicated to customers via all 4 channels. We also saw that there is a correlation between offer_viewed and offer_completed: the more an offer is viewed by customers, the more likely it is to be used.

On the customer side, the majority of the customers who viewed and completed all offers received were adults (both genders) from the lower middle class who became members in 2016.

The loyal customers, who will buy their favorite drink no matter what, are mostly male, adults, from the lower middle class, and became members in 2017.

We also saw that a senior male spent $1,608 on coffee that month!

Reflection

  • Understanding the data took so much time. I kept learning and I still think there is so much to learn.
  • There is no end for the possibilities for the analysis. Having a clear idea at the beginning is important to avoid being stuck in analysis paralysis.
  • Having a plan for how the data will be used is critical when collecting it. Some records are outliers; for example, there are records of completed offers with no related transactions. These and other issues harm the analysis and decrease its accuracy.

Improvement

Data Collection: When running the experiment, add a primary key to connect an offer’s different events, especially transaction and offer completed. An offer_expired value can be calculated on the spot and added to the data instead of having to reverse-engineer it.

Performance: I have re-written some of the logic to enhance performance. However, I haven’t measured the time it takes to run the entire workspace; more enhancements could be made.

Model: More complicated models can be built to do the following:

  • Predict the age_group, income_class, and gender of the customers who will most likely view and complete offers.
  • Predict the maximum time within which a specific customer will respond to an offer.
  • Predict the best offer to send to a particular customer.
  • and many more…

Project Source Code

The source code of this project can be found on GitHub: https://github.com/se-Laila/capstone/tree/master
