Dependencies

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.4     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(rms)
## Loading required package: Hmisc
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## The following objects are masked from 'package:base':
## 
##     format.pval, units
## Loading required package: SparseM
## 
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
## 
##     backsolve

Abstract

Nameplates for Industries (NFI) is a global leader in the production of custom, high-performing graphic solutions including labels, nameplates, faceplates and membrane switches.

The company has been operating for over 40 years, however, the recent COVID surge requires the company to review their future strategy. Therefore, they would like to have a better understanding of their current sales to understand what works best and what does not as well as how they can better address and adjust to their customer demand. In order to do so, the company has included a third party, made of 2 analyst: Ines CERDAN and Raidhy HERRERA.

According to the current managerial problem and based on primary research, the following research question has been designed:

“How can NFI sales strategy be improved working on customer retention, to reduce close lost deals and maximize closed win deals?”

We are expecting that, by decreasing the number of inactivity and by focusing on distributors as well as by prioritizing sales on Doming and membrane switch, we will have greater sales in the future.

In terms of limitation, despite having a large dataset, most of the data are considered as categorical therefore we will need to create a dummy variable that will enable us to identify why the close losses impact the most NFI sales activity. The dataset also includes various outliers. We found that “Die Cut Only,” “Doming,” “Overlays” were the top 3 with most outliers. Something exciting that these three categories explain almost half of our data’s variation, which means that our customers are sensitive to the price.

An introduction to NFI

NFI Industry

Nameplates for Industries (NFI) is a global leader in the production of custom, high-performing graphic solutions including labels, nameplates, faceplates and membrane switches.

For the last five years, the US digital printing industry, on average, has increased 2.5% its revenue achieving a value of 10.9 Billion by 2019. Due to COVID-19, the digital printing industry dropped by 1.83% (10.7 Billion) in 2020 only, enabling a slow recovery for the next five years. IBIS World predicts that by 2022 the digital printing industry will recover. However, by 2025, the annual growth will not be as usual compared to the previous years. (Ibis World, 2020)

For that reason, this project will allow us to prepare a strategy for NFI that enables them to respond feasibly and quickly to changes in the industry.

NFI problem and primary research

The company has been operating for over 40 years, however, the recent COVID surge requires the company to review their future strategy. Therefore, they would like to have a better understanding of their current sales to understand what works best and what does not as well as how they can better address and adjust to their customer.

In order to better identify the managerial problem, we decided to look at the driver of sales growth as we believe this approach allows to better tackle the company problem. Back to the literature, we find out that customer retention is one of the most critical drivers as it captures how likely a customer is willing to do business with NFI. (Cross, 2015)

Moving forward, something to consider is: why a person buys/does not buy a product, which can tell us how the company is driving its sales strategy. (Cross, 2015)

Therefore, our first objective will be to understand what products and reasons combined benefit to NFI positively and negatively for close wins and close losses purchases in order to purpose a future strategy to the company.

NFI Data

cwcl <- read.csv("C:/Users/Raidhy Herrera/OneDrive/Escritorio/Pair Project/csv/cwcl.csv")
  view(cwcl)

With the company agreement, a dataset including all the deals made by the sales people and available on the company Hubspot crm has been collected.

The dataset covers deals from January 1st, 2018 to August 31st, 2020. (The company did not have crm records prior 2018)

The Dataset includes 13339 observations of 9 variables. The data types are either numerical or text.

str(cwcl) ## display a summary of the data
## 'data.frame':    13339 obs. of  9 variables:
##  $ Company_ID       : num  1.32e+08 4.09e+08 1.29e+08 1.29e+08 1.29e+08 ...
##  $ Deal_ID          : num  2.81e+09 2.78e+09 2.79e+09 2.79e+09 2.79e+09 ...
##  $ Deal_Name        : chr  "Ecotech Refrigeration - 51436 P/N ECOTECH NP" "Flextronics  Mexico - 51000 P/N 178173" "Aristocrat Technologies (NV) - 32906-1 PN 09-25183 SET OF 11" "Aristocrat Technologies (NV) - 51000 pn 178159-77" ...
##  $ Amount           : num  2132 2035 1992 3607 3002 ...
##  $ Deal_Stage       : chr  "Closed Won" "Closed Won" "Closed Won" "Closed Won" ...
##  $ NFI_Industries   : chr  "Business & Service" "Business & Service" "Business & Service" "Business & Service" ...
##  $ Close_Date       : chr  "2020-08-14 14:31" "2020-08-14 11:40" "2020-08-14 10:28" "2020-08-14 10:28" ...
##  $ Product          : chr  "OSP Misc" "OSP Misc" "OSP Misc" "Doming" ...
##  $ Close_Lost_reason: chr  "N/A" "N/A" "N/A" "N/A" ...

Below is an extract of the 5 first rows:

head(cwcl, n = 5) # view the first five rows

All columns are relevant as they describe a deals on different aspect that match our managerial problem. Here are more details:

  1. Company_ID correspond to one customer
  2. Deal_ID identify a single deal
  3. The Amount is the amount the customer paid for the deal.
  4. The Deal_Stage specify either the company manage or not to close the deal successfully.
  5. The NFI_industry indicates to which industry the deal was related to.
  6. The Close_Date indicates the date and time when the deal was closed. 5
  7. The Product refers to the type of product that has been sold to the client.
  8. The Close_Lost_reason specify the reason in a case of a Close Lost deal.

From the above data it is possible to understand how a deal went as well as to get a broader picture of the company sales performance.

Exploratory data analysis and reseach question

Graph and interpretation

How much sales per product has the company made for each closing stage ?

average_sales_product <- cwcl %>%
  group_by(Product, Deal_Stage) %>% 
  summarise(Amount=mean(Amount)) ## summarise data to get average product price
ggplot(data=average_sales_product, aes(x=Product, y=Amount)) + 
  geom_bar(stat = "identity", fill="steelblue") + guides(fill = guide_legend(reverse = TRUE)) +
  coord_flip() +facet_wrap(~Deal_Stage) + labs(title = "Product average selling price per Deal Stage ", x = "Product", y = "Amount ($)") ## Plot the graph + transform it to vertical with color + seperate per stage

The above bar chart represents the average sales per product for each closing stage. We can observe that most closed lost have an higher average selling price than their close won equivalent. Accordingly, these closed lost deals might impact the closed won result if the company manage to capture them. Therefore to better understand how to capture them, it appears interesting to investigate the average sales per industry for each closing stage to determine which industry has the greatest average selling price and therefore would be interesting to focus on first.

How much sales per industry has the company made for each closing stage ?

average_sales_industry <- cwcl %>%
  group_by(NFI_Industries, Deal_Stage) %>%
  summarise(Amount=mean(Amount)) 
## summarise data to get average product price for each industry for each stage
ggplot(data=average_sales_industry, aes(x=NFI_Industries, y=Amount)) + 
  geom_bar(stat = "identity", fill="forest green") + guides(fill = guide_legend(reverse = TRUE)) +
  coord_flip() + facet_wrap(~Deal_Stage) + labs(title = "Industry average product selling price per Deal Stage ", x = "Industry", y = "Amount ($)")  ## Plot the graph + transform it to vertical with color + seperate per stage

The above bar chart represents the average selling product price per industry for each closing stage. We can observe that most closed lost have a similar average selling product price, however some difference appears when looking at the transportation, the trade, the military/ government, and technology and communications categories. Knowing the general average selling price for each product and industry, lets determine the top 5 products with most closed won deals to understand which product has most deals and later be able to contrast it with the average selling price.

What are the top 5 best performing products?

top_five_product_won <- cwcl %>%
  filter(Deal_Stage == "Closed Won") %>%
  group_by(Product) %>%
  summarise(Count_Amount=n())%>%
  head(n=5) ## summarize top 5 product price Close won
  ggplot(data=top_five_product_won, aes(x=reorder(Product, -Count_Amount), y=Count_Amount, fill= Product)) + 
  geom_bar(stat = "identity") + labs(title = "Top 5 performing product for close won deals ", x = "Product", y = "Count") +theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) ## Plot the graph + transform it to vertical with color + seperate per stage + remove x axis text

This bar chart represents the top 5 products for closed won deals. Die Cut appears to be the most sold product by the company, however back to our first graph, we can see that is also the last one in terms of average selling price. Doming ranks second of our top five however doming ranked 4th of the top 5 average selling price it is therefore, a good performing product. It is now essential to identify what are the top 5 products with the most closed lost deals to understand where the company could improve.

What are the top 5 products with the most close lost deals?

top_five_product_lost <- cwcl %>%
  filter(Deal_Stage == "Closed Lost") %>%
  group_by(Product) %>%
  summarise(Count_Amount=n())%>%
  head(n=5) ## summarize top 5 product price Close Lost
  ggplot(data=top_five_product_lost, aes(x=reorder(Product, -Count_Amount), y=Count_Amount, fill=Product)) + 
  geom_bar(stat = "identity") + labs(title = "Top 5 products with most close lost deals ", x = "Product", y = "Count")+theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) ## Plot the graph + transform it to vertical with color + seperate per stage + remove x axis text

The top 5 products most closed lost deals reveals that most close deals includes labels which despite not having the greatest selling product price might have a positive impact on NFI Total sales if closed won. Membrane Switches are also interesting to highlight has they have the greatest average selling price of all NFI closed lost product. Both with doming might be good products to focus on. At the same time, it appears important to look at the top 5 industries with most closed won deals to know where most of the total sales are made and be able to compare it with the top 5 industries with the most closed lost deal to figure out how the company can improve their customer retention for these segment.

What are the top 5 best performing industries?

top_five_industry_won <- cwcl %>%
  filter(Deal_Stage == "Closed Won") %>%
  group_by(NFI_Industries) %>%
  summarise(Count_Amount=n())%>%
  head(n=5) ## summarize top 5 Industries Close Won
  ggplot(data=top_five_industry_won, aes(x=reorder(NFI_Industries, -Count_Amount), y=Count_Amount, fill= NFI_Industries)) + 
  geom_bar(stat = "identity") + labs(title = "Top 5 industries with most closed won deals ", x = "Industry", y = "Count") +theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) ## Plot the graph + transform it to vertical with color + seperate per stage + remove x axis text

The top 5 industries bar chat reveal a different scheme than the average selling product price per industry. Indeed distributors have a greater average selling product price but only ranked 4 in term of number of close won deals. On the other hand Industrial manufacturing ranked first in the number of close won deals while being the last in terms of average selling product price per industry. By balancing the number of deals, it might result in a positive impact for the total NFI sales. Looking at closed lost deal is now also important as it can allow to identify where improvement could also be made there.

What are the top 5 industries with the most close lost deals?

top_five_industry_lost <- cwcl %>%
  filter(Deal_Stage == "Closed Lost") %>%
  group_by(NFI_Industries) %>%
  summarise(Count_Amount=n())%>%
  head(n=5) ## summarize top 5 industries Close Lost
  ggplot(data=top_five_industry_lost, aes(x=reorder(NFI_Industries, -Count_Amount), y=Count_Amount, fill= NFI_Industries)) + 
  geom_bar(stat = "identity") + labs(title = "Top 5 industries with most close lost deals ", x = "Industry", y = "Count")+ theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) ## Plot the graph + transform it to vertical with color + seperate per stage + remove x axis text

When looking at the above bar chart, distributor appears third on the ranking of the most close deal industry. When comparing with the average product price per industry, distributors arrive at the first place. Therefore adding this category to the close won could have a great impact on the total sales. Now that we understand which product and industry might be the most performing as well as the categories that need to be work on. It is interesting to look at the reason behind the close lost.

What are the top 5 reason for closed lost deals?

top_five_lost_reason <- cwcl %>%
  filter(Deal_Stage == "Closed Lost") %>%
  group_by(Close_Lost_reason) %>%
  summarise(Count_Amount=n())%>%
  head(n=5)## summarize top 5 Closed Lost reasons
  ggplot(data=top_five_lost_reason, aes(x=reorder(Close_Lost_reason, -Count_Amount), y=Count_Amount, fill=Close_Lost_reason)) + 
  geom_bar(stat = "identity") + labs(title = "Top 5 closed lost deals reasons ", x = "Closed lost reasons", y = "Count")+theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) ## Plot the graph + transform it to vertical with color + seperate per stage + remove x axis text

From the above chart, inactivity apprears to be the main cause of close lost. According to the company variable definiton, inactivity means that customers have never been followed during the deal process; either when submitting a form or after the initial quote. Therefore by working on improving the inactivity, more closed won could be made.

Analysis and research question

From the above exploratory analysis, we found out that:

  1. Closed lost deals might impact the closed won result if the company manage to capture them.

  2. Transportation, trade, military/ government, and technology and communications categories have greater average selling price.

  3. Doming ranks second of our top five however doming ranked 4th of the top 5 average selling price it is therefore, a good performing product.

  4. When looking at close lost deals, membrane Switches are interesting to highlight as they have the greatest average selling price of all NFI closed lost product. Both with doming might be good products to focus on.

  5. Distributors have a greater average selling product price but only ranked 4 in term of number of close won deals while, industrial manufacturing ranked first in the number of close won deals while being the last in terms of average selling product price per industry.

  6. Inactivity appears to be the main cause of close lost.

From the primary research we also know that customer retention is one of the most critical drivers as it captures how likely a customer is willing to do business with NFI.

If we go back to the company point of view we learn that they would like to have a better understanding of their current sales to understand what works best and what does not as well as how they can better address and adjust to their customer demand.

Therefore, we following final research questions was draw:

“How can NFI sales strategy be improved working on customer retention, to reduce close lost deals and maximize closed win deals?”

We are expecting that, by decreasing the number of inactivity, by focusing on distributors and by prioritizing sales on Doming and membrane switch, we will have greater sales in the future.

Limitations and methodology

Despite having a large dataset, most of the data are considered as categorical therefore we will need to create a dummy variable that will enable us to identify why the close losses impact the most NFI sales activity.

On the other hand, the dataset includes various outliers. From the amount column, we identified the average price ($1089.27) and how we determine the 1,200. When we first created the boxplot, we only had one outlier after the 2,000, and it didn’t make sense to keep it because this didn’t tell anything about the data.

We utilize it to establish this as a parameter to our code. Then we observed that the majority were within the average price, and it helped us identify which three categories had the most outliers.

cwcl_no <-cwcl%>%
  filter(Amount <1200)
cwcl_no %>%
  filter(Product %in% c("Die Cut Only", "Doming", "Overlays")) %>%
  ggplot(., aes(x=Product, y=Amount, color=Product)) +
geom_boxplot() + theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank())

As a result, we found that “Die Cut Only,” “Doming,” “Overlays” were the top 3 with most outliers. Something exciting that these three categories explain almost half of our data’s variation, which means that our customers are sensitive to the price.

Customers might also compare NFI prices with the competition, which may be why they do not proceed. Considering that in our previous chart on closed lost deal reasons, we identified that inactivity was the main reason people didn’t move with a purchase.

A high level of customer inactivity also tells us that if the sales team put more effort towards reaching out to customers, that may be a differentiate, so customers consider purchasing.

Our next analytic step will include:

  1. To create two dummy variables; one to evaluate the impact of the close lost reason and more specifically inactivity, on the product sales and one to evaluate the customer retention by comparing close lost reason with companies IDs.
cwcl_dummy<- cwcl_no%>%
  mutate(P_Doming_Membrane = ifelse(Product == "Doming", 1, ifelse(Product == "Membrane Switches", 1,0))) %>%
  mutate(NFI_Distributors =ifelse(NFI_Industries == "Distributors", 1, 0))%>%
  mutate(CLR_Inactivity =ifelse(Close_Lost_reason == "Inactivity", 1, 0))
  1. To run our regression between our dummy

  2. To determine the company’s customer retention and determine how to reduce close lost sales by identifying better ways to reduce customer inactivity and increase customer retention.

Model and Recommendation

Model

The model aims to answer our above hypothesis, by identifying the impact of the sales amount, the doming and membrane switches, the distributors, and the inactivity on the deal stage.

lrm(cwcl_dummy,formula=Deal_Stage~Amount+P_Doming_Membrane+NFI_Distributors+CLR_Inactivity)
## Logistic Regression Model
##  
##  lrm(formula = Deal_Stage ~ Amount + P_Doming_Membrane + NFI_Distributors + 
##      CLR_Inactivity, data = cwcl_dummy)
##  
##                        Model Likelihood    Discrimination    Rank Discrim.    
##                              Ratio Test           Indexes          Indexes    
##  Obs         9911    LR chi2    2643.38    R2       0.891    C       0.739    
##   Closed Lost1684    d.f.             4    g        2.188    Dxy     0.478    
##   Closed Won 8227    Pr(> chi2) <0.0001    gr       8.919    gamma   0.478    
##  max |deriv|  0.7                          gp       0.139    tau-a   0.135    
##                                            Brier    0.092                     
##  
##                    Coef     S.E.    Wald Z Pr(>|Z|)
##  Intercept           1.3907  0.0832 16.71  <0.0001 
##  Amount              0.0011  0.0001  7.89  <0.0001 
##  P_Doming_Membrane   0.5166  0.1166  4.43  <0.0001 
##  NFI_Distributors   -5.6010  1.0172 -5.51  <0.0001 
##  CLR_Inactivity    -15.2248 29.4455 -0.52  0.04051  
## 

From the above result, we can say that only 89.1% of the data can be explained by our model. On the other hand, the really low p_values observed show that the amount, the product type as doming or membrane switches, and the client industry are good predictors for such a model. The inactivity also does appears as a good predictor for this model.

Therefore we came up with the below interpretation

  1. Consumers are price sensitive

  2. Doming and membranes switches are a product to focus on

  3. Distributors are a missed market by the company

  4. Inactivity has a clear negative impact on sales

Now that our model and hypothesis have been verifying, let’s identify the customer lifetime value to determine which type of customer (repetitive or not) would be interesting for the company to focus on.

Customer lifetime value

Repetitive Customer

repetitive_customer <- cwcl_dummy %>%
  group_by(Company_ID) %>%
  summarise(Count_RC=n(),Amount_RC=mean(Amount)) %>%
  filter(Count_RC>1) 

mean(repetitive_customer$Amount_RC)
## [1] 631.3309

Non repetitive Customer

non_repetitive_customer <- cwcl_dummy %>%
  group_by(Company_ID) %>%
  summarise(Count_Amount_NRC=n(),Amount_NRC=mean(Amount)) %>%
  filter(Count_Amount_NRC==1)
  
mean(non_repetitive_customer$Amount_NRC)
## [1] 612.1049

From the above filtering, we find out that the average purchasing amount for a non-repetitive customer is 612.1 with an average number of purchases equals 4, while the mean for the repetitive customer is 631.33 for a single purchase. From the company, we know that their average profit margin is equal to 8.21%. Accordingly, we calculate both repetitive and non-repetitive average customer lifetime value, and we found out the below results:

NFI_PM=0.821
AVSRC= mean(repetitive_customer$Amount_RC)
AVSNRC=mean(non_repetitive_customer$Amount_NRC)
ANTransactionRC=mean(repetitive_customer$Count_RC)

lifetime_valueRC= ((AVSRC *ANTransactionRC*2))
lifetime_valueNRC=((AVSNRC *1))

ALTVRC=lifetime_valueRC*NFI_PM
ALTVNRC=lifetime_valueNRC*NFI_PM

ALTVNRC
## [1] 502.5381
ALTVRC
## [1] 8430.059

As a non-repetitive customer, we found an average customer lifetime value equal to 502.54, while we found an average customer lifetime value equals 8,430.06. Based on this result, it would make sense for the company to focus on repetitive customers as they will create greater returns than a non repetitive customer.

Conclusion and References

Conclusion

From the recommendation, we unlocked great potential revenues for the company by reorienting sales from new to existing customer. Focusing on distributors, membrane switches and doming will be the opportunity for the company to reach out to under-estimated markets while working on empowering salespeople will also enable them to close win more deals and therefore improve customer retention as well as to create more revenues for the company.

However, this analysis lacks direct customer opinion and feedback, allowing the recommendations to be more customer expectation and satisfaction orientated. Such information is essential to allow for planning for the future. Accordingly, our final recommendation would be to conduct a customer interview and survey regarding satisfaction.

References

Cross, S. (2015, June 25). Morgan Cross. Retrieved from https://morgancross.co.uk/growth/the-7-drivers-of-new-sales-growth/

Jaura, R. (2020, August). IBIS World. Retrieved from https://my-ibisworld-com.hult.idm.oclc.org/us/en/industry-specialized/od5754/industry-performance