Retailers should make a concerted effort to build predictive infrastructures whose validation frameworks, modeling approaches, and maintenance protocols align with tangible business goals.
Welcome to Part Two of our three-post series on the major steps you must take between deciding you want action-oriented predictive capabilities and actually getting them.
As we covered in Part One, retailers who want to deliver meaningful messaging to customers who have seemingly endless choices are discovering predictive modeling as a means of pulling away from their competition in the race for retail gold.
But before you even start to think about putting a predictive model to work, you need to do some organizational soul-searching to clearly identify the problems you want to solve and what metrics you’ll use to gauge your progress along the way.
Then, you need to meticulously prepare your data: there’s data standardization, validation, deduplication, transformation, and restructuring, all covered in more detail in Part One and even more detail in our webinar on the topic here.
This is all-important prep work, but as soon as you’ve cleared this first hurdle, you’d better believe that another hurdle is right around the corner: building your actual model.
The Second Challenge: Building the Model
Having gathered and prepared the right raw ingredients, your data analytics team is ready to start assembling the main course.
While there is a wide range of factors to consider when cooking up a predictive model, data analytics teams can ensure their development processes proceed according to plan by keeping some best practices top of mind.
1. Develop a Benchmarking and Validation Framework
Many marketers come to us seeking clarity about the best methodology for quantifying the “goodness” of a model. Should you judge it by out-of-time validation or in-sample fit? Are there techniques you haven’t even heard of? First things first: don’t panic.
Before embarking on a scattershot research-and-development mission, refocus your resources. First, define what job you would like your model to perform. The way you measure your model’s effectiveness—i.e., your benchmarking procedures—will vary based on what the model seeks to predict and what kinds of outputs it’s designed to produce.
Start by distinguishing the granularity of your model’s concerns.
Will your proposed model forecast the behavior of individual customers or clustered groups? The answer will significantly influence your next steps.
You might use root mean squared logarithmic error (RMSLE) as your error metric when engineering an individual-level predictive model, whereas you might make use of a confusion matrix when you’re aiming for categorical predictions.
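As an illustrative sketch of these two evaluation tools, here are minimal plain-Python implementations applied to hypothetical values (real pipelines would typically lean on a library such as scikit-learn instead):

```python
import math

def rmsle(actual, predicted):
    """Root mean squared logarithmic error: penalizes relative error,
    which suits individual-level targets (e.g., spend) spanning magnitudes."""
    return math.sqrt(sum(
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ) / len(actual))

def confusion_matrix(actual, predicted):
    """2x2 counts for a binary (categorical) prediction task."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p and a:
            counts["tp"] += 1
        elif p and not a:
            counts["fp"] += 1
        elif a and not p:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

# Individual-level spend forecast (hypothetical values)
print(round(rmsle([100, 250, 40], [110, 230, 55]), 4))

# Categorical churn prediction (1 = churned)
print(confusion_matrix([1, 0, 1, 0], [1, 0, 0, 0]))
# → {'tp': 1, 'fp': 0, 'fn': 1, 'tn': 2}
```

Note that the two metrics answer different questions: RMSLE scores how far off a continuous forecast is in relative terms, while the confusion matrix breaks a classifier’s hits and misses into the four cells stakeholders actually care about.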
Regardless of your model’s specs, don’t let your modeling approach dictate how you evaluate your model’s utility. Instead, tailor your benchmarking and validation processes to the actual business outcomes your model is intended to facilitate.
2. Experiment with Different Models
As with benchmarking techniques, there’s no scarcity of approaches to predictive modeling itself, and experimentation will help you identify which one is best-suited to the analyses you’d like to perform. It’s incredibly important to make sure you’ve thoroughly explored the wide range of forms your model could take before you finalize it.
Consider customer lifetime value (CLV) as an example. If you want to segment customers based on their revenue-driving potential, you’ll need to build a predictive model that can determine CLV at scale. There are plenty of approaches you could try: straight-line extrapolation, supervised learning algorithms, probabilistic models, RFM analysis, or hidden Markov models.
When weighing these options, remember the trade-offs between interpretability and model power. While it may be tempting to throw more explanatory variables into the regression, increasing complexity can lead to overfitting your model. Beyond a certain point (let’s call it the sweet spot), you’re going to see diminishing returns on increased model power—at the expense of comprehensibility, to boot.
As your model becomes more convoluted, you decrease the probability that you can cogently explain how your model operates, why it runs as it does, and why—and in what ways—its output is significant to the concrete business problem(s) at hand. If the key stakeholders within your org don’t embrace the model, you’re going to have a hard time leveraging the insights it produces.
In lieu of a hyper-robust but arcane model, choose a model that strikes the right balance between predictive power (how accurately and meaningfully it represents the future) and interpretability (how easily users can understand the model’s mechanisms and extract consequential, shareable insights).
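The power-versus-interpretability trade-off can be demonstrated with synthetic data. In this sketch (using NumPy, with a degree-1 and a degree-7 polynomial standing in for a simple and an over-complex model), the complex model fits the training data better but generalizes no better, and is far harder to explain:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: spend as a noisy linear function of customer tenure
x = np.linspace(0, 1, 20)
y = 2.0 * x + rng.normal(0, 0.1, size=20)
x_train, y_train = x[::2], y[::2]     # 10 points used for fitting
x_test, y_test = x[1::2], y[1::2]     # 10 held-out points

def rmse(coeffs, xs, ys):
    return float(np.sqrt(np.mean((np.polyval(coeffs, xs) - ys) ** 2)))

simple = np.polyfit(x_train, y_train, 1)     # interpretable: slope + intercept
complex_ = np.polyfit(x_train, y_train, 7)   # chases noise in the training set

print("simple  train/test RMSE:",
      round(rmse(simple, x_train, y_train), 3),
      round(rmse(simple, x_test, y_test), 3))
print("complex train/test RMSE:",
      round(rmse(complex_, x_train, y_train), 3),
      round(rmse(complex_, x_test, y_test), 3))
```

Because the degree-7 model nests the degree-1 model, its training error is guaranteed to be at least as low, yet that in-sample “power” tells you nothing about out-of-sample utility, which is why validation must be done on held-out data.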
3. Establish Automated Monitoring and Maintenance
The last step in building your model is to incorporate triggers that will sound the alarm when performance degradation has reached a designated threshold. Think of these as bumpers in the bowling alley of predictive modeling. Without firmly established protocols to keep your efforts in the right lane, you’ll soon be rolling gutterballs in every frame.
No matter how well you build your model, its predictive capabilities will naturally degrade over time—an analytical aging, of sorts. This happens for a variety of reasons that are entirely unrelated to high-quality data science, including internal changes to business strategy and unforeseen external changes to the marketplace.
As a hypothetical, imagine you’re a retailer who has decided to rebrand to appeal to up-market shoppers. In one fell swoop, you’ve changed your brand’s identity, shifted your target customer base, and altered your existing customers’ relationships with your brand. Suddenly, the data sample you’ve based your model on is much less representative of the current state of your business, and the model’s predictive performance will understandably suffer.
Further, data changes as your business grows (hopefully in response to your excellent predictive modeling). Let’s say data originally tagged as “blue” is now sorted into sixteen shades of blue. This kind of splintering or restructuring of data will inevitably impact how effective your model is.
By integrating monitoring and maintenance triggers into your model, you can head off mistakes stemming from such splintering or restructuring. These triggers will alert your data science team when it’s time to conduct routine check-ups or immediate recalibrations of your model, helping to conserve resources and improve both the longevity and accuracy of your predictive infrastructure.
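A minimal sketch of such a trigger, assuming the live metric is a rolling-window average of prediction error and that the window size and tolerance threshold are hypothetical tuning choices:

```python
class DriftMonitor:
    """Degradation trigger sketch: track a rolling window of a live error
    metric and alert when it drifts past the baseline by a set tolerance."""

    def __init__(self, baseline_error, tolerance=0.25, window=5):
        self.baseline = baseline_error
        self.tolerance = tolerance
        self.window = window
        self.errors = []

    def record(self, error):
        """Record one prediction's error; flag an alert once the window is
        full and its mean exceeds baseline * (1 + tolerance)."""
        self.errors.append(error)
        recent = self.errors[-self.window:]
        mean = sum(recent) / len(recent)
        alert = (len(recent) == self.window
                 and mean > self.baseline * (1 + self.tolerance))
        return {"alert": alert, "rolling_error": mean}

monitor = DriftMonitor(baseline_error=10.0, tolerance=0.25, window=5)

for e in [9, 10, 11, 10, 9]:       # healthy period: mean 9.8 vs limit 12.5
    status = monitor.record(e)
print(status["alert"])             # → False

for e in [14, 15, 16, 15, 14]:     # drifted period: mean 14.8 vs limit 12.5
    status = monitor.record(e)
print(status["alert"])             # → True
```

In practice the same check would run on a schedule against logged predictions and realized outcomes, paging the data science team for a check-up or recalibration rather than printing a boolean.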
Once you’ve progressed through these three phases of model-building, you’re ready for the final challenge: making the outputs useful.
This is Part Two of a three-part series. Check out Part One if you missed it, and Part Three too! If you’d like to learn about this topic in more detail, check out our webinar of the same name, Jumping the 3 Big Hurdles to Predictive Modeling.