# Competition: Projecting Pebble Sales

Two weeks into the month-long campaign, the Pebble has become the highest-grossing Kickstarter project of all time. One of us finally broke down and kicked in, and we started wondering: how many will they sell by the time the campaign finishes?

We decided to take a stab at predicting the final number of backers. Once we got started, we thought it would be even more fun to challenge anyone else out there to come up with a better prediction. (Details at the bottom of the post.)

Here we discuss several different ways to predict new product launches, which should serve as a starting point for your models. We'll see which method works best on May 18th, when the project is over. We also encourage readers to contribute guesses and send us their solutions. We'll do a follow-up post in May to see who gets the closest.

The Pebble project launched on Kickstarter on April 11. It initially got a lot of traction, then picked up a second wave of backers on April 17-19. One simple approach is to assume a constant sales velocity: with 44,643 backers in 15 days, we can project that they will sell 110,119 over the full 37 days.
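In R, that back-of-the-envelope projection is one line of arithmetic:

```
# Constant-velocity projection: scale the observed backers
# linearly from 15 observed days to the full 37-day campaign
backers_so_far <- 44643
days_elapsed <- 15
campaign_days <- 37

projected <- backers_so_far / days_elapsed * campaign_days
round(projected)  # 110119
```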

Clearly the constant-velocity assumption won't hold, as the initial wave of sales has likely already passed.

So let's investigate some additional models. We'll evaluate a spline-based regression model, and then look at three probability models.

We can fit a natural spline to the cumulative number of orders. The natural spline fits local polynomials internally and enforces the constraint that the function is linear outside the boundary knots. This is equivalent to extrapolating based on the last portion of the sales data.

You can fit it with the following R code (the `ns` function comes from the `splines` package):

```
library(splines)
fit <- lm(cumulative ~ ns(day, df = 3), data = pebble)
```

However, extrapolation is a very difficult problem: we are trying to use the data we have observed to project into the future. Regression-based approaches are often very good at interpolation, i.e. predicting a response from predictors within the observed range, but they are of limited utility when extrapolating much beyond that range.
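A quick way to see why the natural spline is at least a tame extrapolator: fit one to a toy cumulative series (made up here, not the Pebble data) and check that predictions beyond the boundary knot fall on a straight line.

```
library(splines)

# toy cumulative series standing in for the real data
toy <- data.frame(day = 1:15, cumulative = cumsum(1:15))
fit <- lm(cumulative ~ ns(day, df = 3), data = toy)

# beyond the last boundary knot the natural spline basis is linear,
# so equal steps in day give equal steps in the prediction
future <- predict(fit, newdata = data.frame(day = c(20, 21, 22)))
diff(future)  # two (nearly) identical increments
```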

As an alternative to regression, we can use probability models. Probability models make relatively strong, yet robust, assumptions about customer-level behavior, and can then extrapolate on the basis of those assumptions.

Here we'll evaluate three successively more complicated probability models and display the results.

#### Exponential Gamma

In this model we assume that individual customers have exponentially distributed interpurchase times, and that these interpurchase times are spread across the population according to a gamma distribution. Breaking this down a little more: an exponential interpurchase time essentially means that a customer who has not yet bought a Pebble has a constant probability of purchasing at any moment. There is also heterogeneity across the population: some people are more likely to make a purchase, while the opposite is true for others. This is accounted for by the gamma distribution, which characterizes the spread of purchase rates across the customer base.

```
# Exponential-gamma CDF: probability that a randomly chosen
# customer has purchased by time x, given gamma parameters r and a
peg <- function(x, r, a) {
  return(1 - (a / (a + x))^r)
}
```
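As a quick sanity check on `peg`: with made-up parameter values (not fitted to the Pebble data), it behaves like a proper CDF and converts to an expected backer count when multiplied by an assumed market size.

```
peg <- function(x, r, a) {
  1 - (a / (a + x))^r
}

# hypothetical parameters and market size, purely illustrative
r <- 0.5; a <- 10; market <- 100000

peg(0, r, a)                               # 0: no one has purchased at launch
market * peg(37, r, a)                     # expected backers by day 37
market * (peg(37, r, a) - peg(15, r, a))   # expected new backers, days 15-37
```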

#### Weibull Gamma

This is similar to the previous model, but we replace the exponential distribution with a Weibull distribution. The Weibull is similar to the exponential, but it has an additional term that introduces duration dependence: a customer who has not made a purchase after a certain time becomes either more likely or less likely to make a purchase. Again, we assume the scale parameter of the Weibull distribution is spread across the population according to a gamma distribution.

```
# Weibull-gamma CDF: c is the Weibull shape parameter
# (c = 1 reduces this to the exponential-gamma model)
pwg <- function(x, r, a, c) {
  1 - (a / (a + x^c))^r
}
```
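One consistency check worth running: with `c = 1` the Weibull collapses to the exponential, so `pwg` should agree with `peg` everywhere.

```
peg <- function(x, r, a) 1 - (a / (a + x))^r
pwg <- function(x, r, a, c) 1 - (a / (a + x^c))^r

# with c = 1, the Weibull-gamma and exponential-gamma CDFs coincide
x <- 0:37
all.equal(pwg(x, r = 0.5, a = 10, c = 1), peg(x, r = 0.5, a = 10))  # TRUE
```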

#### Weibull Latent Class

Here we again use the Weibull distribution as our underlying probability distribution, but rather than characterizing the customer base with a continuous mixing distribution, we assume there are two classes of customers. These classes are latent (unobserved), and we infer them from the data. Using this distribution we can start to capture the second peak in the data.

```
# Mixture of per-class Weibull CDFs; p holds logit weights for all
# but the last class (weibull.trunc is defined elsewhere)
weibull.latentclass <- function(x, r, a, p, trans = F) {
  if (!(length(r) == length(a) && length(p) == length(r) - 1))
    stop(paste("r and a must be same length, p must be one",
               "less than number of classes"))
  classes <- 1:length(r)
  p.c <- c(p, 0)
  # softmax: convert logit weights to class probabilities
  weights <- exp(p.c) / sum(exp(p.c))
  unweighted <- sapply(classes, function(y)
    weibull.trunc(x, r[y], a[y], trans))
  weighted <- unweighted %*% weights
  return(weighted)
}
```
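Since `weibull.trunc` isn't shown in the post, here is the mixing logic exercised with a stand-in per-class CDF (a placeholder purely so the snippet runs; the parameter values are made up):

```
# Stand-in for the post's (unshown) weibull.trunc: a Weibull-gamma
# CDF with the shape parameter fixed at c = 2 for illustration
weibull.trunc <- function(x, r, a, trans = F) {
  1 - (a / (a + x^2))^r
}

weibull.latentclass <- function(x, r, a, p, trans = F) {
  if (!(length(r) == length(a) && length(p) == length(r) - 1))
    stop("r and a must be same length, p must be one less than number of classes")
  classes <- 1:length(r)
  p.c <- c(p, 0)
  weights <- exp(p.c) / sum(exp(p.c))  # softmax over class weights
  unweighted <- sapply(classes, function(y) weibull.trunc(x, r[y], a[y], trans))
  unweighted %*% weights
}

# two hypothetical classes: one fast-adopting, one slow
weibull.latentclass(c(1, 5, 10), r = c(2, 0.2), a = c(5, 50), p = 0.5)
```

Because each class CDF is increasing and the softmax weights sum to one, the mixture is itself a valid, increasing CDF.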

#### Projections

We can visualize the daily and cumulative projections of the different models. Each of the four models comes up with a slightly different prediction for the final number of backers:

| Method | Prediction |
| --- | --- |
| Spline | 63,854 |
| Exponential Gamma | 53,865 |
| Weibull Gamma | 63,854 |
| Weibull Latent Class | 45,421 |

Averaging these together, we predict that the Pebble will have 55,583 backers by the time the funding closes on May 18th. Assuming that the average backing stays constant at \$147 per backer, this yields a total of \$8.19MM raised.

Just for fun, here are the guesses of the rest of the team:

| Person | Backers | Amount Raised |
| --- | --- | --- |
| Aaron | 55,583 | \$8,191,220 |
| Corey | 60,000 | \$8,842,800 |
| Jon | 65,231 | \$9,613,744.38 |
| David | 66,840.5 | \$9,850,230.12 |

#### Competition

These are some pretty basic models, and none of them fit the data exceptionally well. We could try to improve the models by aggregating data from other Kickstarter projects, or by layering in covariates such as media coverage of the Pebble's success or its promotion as a featured Kickstarter project. You could also come up with a more sophisticated model for how the average contribution size changes over time.

Interested in making a prediction? Send your predictions to pebble@custora.com by 11:59 EST Monday, April 30. The winner gets some serious street cred, a beer on us if they're in town, and a free black Pebble. We'll also invite you to write a guest post on our blog describing your solution. The winner is the person who comes closest to predicting the total amount raised and who provides the methodology behind their solution (Excel, MATLAB, R, or whatever tools you want to use are fine). To get you started, here is some code you can use to pull data from Kickstarter.

```
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use Date::Format;
use DateTime::Format::Strptime;

my $maxpage = 901;
my $strp = DateTime::Format::Strptime->new(pattern => "%b %d %Y", on_error => 'croak');
my $strf = DateTime::Format::Strptime->new(pattern => "%Y-%m-%d", on_error => 'croak');

for (my $i = 1; $i <= $maxpage; $i++) {
    my $url = "http://www.kickstarter.com/projects/597507018/pebble-e-paper-watch-for-iphone-and-android/backers?page=$i";
    my $content = get $url;
    die "Couldn't get $url" unless defined $content;

    # NOTE: the class name below is a guess at the backer-list date
    # markup; adjust the pattern to match the page's actual HTML
    while ($content =~ m/<div class="date">\s*([^<]*)/g) {
        my $date = $1;
        if ($date =~ m/^\d/) {
            # relative dates like "3 hours ago" resolve to today
            my @dt = localtime(time);
            print strftime("%Y-%m-%d\n", @dt);
        }
        else {
            my $formatted = $strp->parse_datetime("$date 2012");
            print $strf->format_datetime($formatted) . "\n";
        }
    }
}
```

Methodologies: data was pulled from Kickstarter to find the number of backers on each day. We fit the probability models by maximum likelihood, using the truncated models and taking differences of the CDFs as the daily purchase probabilities.
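For the exponential-gamma model, that fitting procedure can be sketched as follows: treat each day's new backers as draws from the differences of consecutive CDF values (plus a never-buy remainder), and maximize the resulting log-likelihood with `optim`. The daily counts below are simulated, not the real Pebble data:

```
peg <- function(x, r, a) 1 - (a / (a + x))^r

# simulate daily new-backer counts for days 1..15 (not real data)
set.seed(1)
true_r <- 0.8; true_a <- 6; market <- 50000
daily <- rpois(15, market * diff(peg(0:15, true_r, true_a)))

negll <- function(par) {
  r <- exp(par[1]); a <- exp(par[2])   # log scale keeps r, a positive
  p <- diff(peg(0:15, r, a))           # daily purchase probabilities
  p.never <- 1 - peg(15, r, a)         # right-censored: not yet bought
  -(sum(daily * log(p)) + (market - sum(daily)) * log(p.never))
}

fit <- optim(c(0, 0), negll)
exp(fit$par)  # estimates of (r, a)
```

With the fitted parameters in hand, `market * (peg(37, r, a) - peg(15, r, a))` gives the projected number of additional backers through the end of the campaign.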

References:

Hardie, Bruce G. S., Peter S. Fader, and Michael Wisniewski (1998), "An Empirical Comparison of New Product Trial Forecasting Models," *Journal of Forecasting*, 17 (June–July), 209–229.

Morrison, Donald G. and David C. Schmittlein (1980), "Jobs, Strikes, and Wars: Probability Models for Duration," *Organizational Behavior and Human Performance*, 25 (April), 224–251.
