What up humans?! Bradley Voytek here once again to drop some #uberdata on you all.
As I’m sure many of you know by now, we recently announced that we’re running a price experiment. While we explained the basics in that post, Uber Team Science wanted to go into a little more detail about how the experiment is being conducted. We ♥ data, and we ♥ talking about data, so we weren’t about to run this without explaining what we’re doing.
Here at Uber HQ, we’re all up in the twitterverse, the facebooks, and the interwebs because we like to hear what you, our users, say about us. And you like us, you really like us!
- @TheClayFox: The line for a cab at 4th and King was 20 people long. @uber_sf picked us up in 2 minutes. Literally. I. Love. @uber.
- @ellentupman: Customer service awesomeness from the good people @Uber_SF. Again. If you haven’t tried @Uber, you are missing out.
- @gina_oreilly: @Uber_SF I just can’t resist you. Hands down the fastest and easiest way to get around this city.
- @dariusmiranda: @Uber_SF Car was prompt. Immaculate. Driver was courteous. Awesome ride. I moved the Uber app to front page of my iPhone. Thanks!
We’ve argued in our Hidden Cost of Cabs post that our San Francisco pickup times are so much lower than taxis’ that our premium is worth the time you save by riding with us. Our reliability, speed, and quality user experience count for something, right?
::sigh::
Okay. Fine. Cost.
We hear you. You want lower prices. That’s why we’re doing Science.
What’s even better than Science? The foundation of the mathematics of our experiment basically comes down to beer. Seriously. But more on that in a bit…
We’re always looking for ways to make Uber Uberer. We do a lot of work on the tech side to make this happen: we’ve already talked about what Uber Team Science does here. In collaboration with uberengineering, we do a lot of behind-the-scenes work to improve both the tech and your Uber experience.
If you haven’t read it yet, check out Henry’s post about how we’re building city-specific models of estimated travel times. This is a great illustration of what I’m talking about.
Back to the matter at hand: how much can we reasonably drop our price?
To answer this, we turn to our trusted friend #uberdata (aka: MATH!). So, where, exactly, will we get the data we need? From you! Or, at least, from some of you.
We’re running a test for an extended period of time in San Francisco, offering reduced rates to a small group of randomly selected users. If you’re one of these people, you’ve already received an email from us with the details.
There are a few important things to note:
- This test is only running in San Francisco.
- Random = random. Sorry, we can’t add you to the test group. It’ll break Science. Don’t break Science.
You may not know this, but I’m actually an academic. A neuroscientist. And my summer with Uber is making me miss writing papers like I used to. So although a blog post isn’t up to the standards of peer review, I’m just gonna geek out a bit and write a “science” paper about our testing.
Actually, that sounds boring. Let’s do this like a high school science experiment instead.
QUESTION
WILL PEOPLE USE UBER MORE IF IT’S CHEAPER?
VARIABLES
Independent:
- Percent discount
Dependent:
- Number of rides taken
HYPOTHESIS
Okay, seriously. We’re conducting what’s known as a “price elasticity test”. The idea is simple: if we decrease our price by 10% and that results in a 20% increase in ridership, then overall it’s better for business if we are 10% cheaper. The goal is to find the cost/demand point that’s best for everyone.
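Since revenue is just price × rides, the arithmetic behind that claim is easy to check. Here’s a toy sketch in Python (the numbers are illustrative, not our actual fares or results):

```python
# Toy price-elasticity arithmetic; all numbers are illustrative.
base_price = 1.00   # normalized fare
base_rides = 100    # rides taken at the normal price

discount  = 0.10    # a 10% price cut...
ride_lift = 0.20    # ...that produces a 20% increase in ridership

old_revenue = base_price * base_rides
new_revenue = base_price * (1 - discount) * base_rides * (1 + ride_lift)

print(old_revenue)  # 100.0
print(new_revenue)  # 0.90 * 100 * 1.20 = 108.0 -> revenue is up 8%
```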
Our hypothesis is that lower price will result in more rides.
GROUPS
We’re conducting a randomized experiment. We have four total experimental groups, with each group getting a different discount rate.
These groups are random, but they are also relatively homogeneous. (That means they’re pretty similar in all the metrics we’re tracking.) This gives us a good starting point, statistically speaking, from which to launch our experiment.
Each group will have the same number of participants. We determined the number of people using what’s known in experimental research as a “power analysis”. This lets us calculate the number of data points we need in order to detect a statistical effect at a given level of certainty.
Importantly, in statistical hypothesis testing, you have two main sources of error: the false positive rate (α) and the false negative rate (β). The α is the same as the significance level, a number in the interval [0, 1] that’s chosen before the research begins. In science this is usually 0.05.
The β is the probability of failing to detect an effect that truly exists. “Power” is equal to 1-β, so the higher the power, the better your chances of detecting the effect of interest.
After taking into account the desired power, the a priori significance level, and other internal factors such as the proportion of our users that we can expect to take a ride within the given timeframe of the experiment, we settled on 200 users. With this number, we have both a specificity and power greater than 0.99. This means that any changes in rider behavior have a high probability of being detected.
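If you want to play along at home, here’s roughly what an a priori power analysis looks like in Python with statsmodels. The effect size below is an assumed placeholder (our actual inputs are internal), so don’t expect it to reproduce our exact 200-user figure:

```python
# Sketch of an a priori power analysis for a 4-group one-way ANOVA.
# The effect size is an assumed placeholder, not our internal estimate.
from statsmodels.stats.power import FTestAnovaPower

analysis = FTestAnovaPower()
total_n = analysis.solve_power(
    effect_size=0.3,  # Cohen's f (assumed)
    alpha=0.05,       # a priori significance level (false positive rate)
    power=0.99,       # desired power, i.e. 1 - beta
    k_groups=4,       # four groups in total
)
print(total_n / 4)    # approximate number of users needed per group
```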
Also important to an experiment is the control group. Because of normal company growth and other factors, we need to make sure that any changes we see in ridership behavior are due specifically to the change in cost. While in principle anyone not getting a discount could be part of the control group, to make sure we’re comparing like with like, we’re segmenting out a group of 200 users whose ridership profile is similar to that of our three discount groups. This lets us compare our experimental groups against those 200 lucky 0%-off users! Hooray you!
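For the curious, the random assignment itself is nothing exotic. A minimal sketch, assuming made-up user IDs and illustrative discount rates:

```python
# Minimal sketch of random assignment into a control group plus three
# discount groups. User IDs and discount rates are made up.
import random

random.seed(42)  # reproducible shuffling
users = [f"user_{i}" for i in range(10_000)]  # hypothetical eligible riders
random.shuffle(users)

group_size = 200
discounts = [0.00, 0.10, 0.20, 0.30]  # 0% = the lucky control group
groups = {
    rate: users[i * group_size:(i + 1) * group_size]
    for i, rate in enumerate(discounts)
}
```

(In practice you’d also check that the groups really are homogeneous on the metrics you’re tracking before unleashing the discounts.)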
METHODS
First and foremost, we’re observing the ridership behavior of our experimental guinea pigs (er, users). We’re taking into account a lot of factors that, in the end, will inform an optimal pricing strategy.
Here’s where the beer comes in.
In the early 1900s, Guinness had the amazing foresight to recognize that math and science people are an invaluable resource for any good company. ::cough, cough::
So they hired a statistician: William Gosset, the original home-brewing, mustachioed, tiny-bespectacled, industry math geek.
Gosset wanted to test the quality of the beer they were brewing. But he couldn’t test all of the beer. He could only test small batches.
He had an incomplete knowledge of the system.
So he did something quite clever: he randomly sampled small amounts of beer and created a statistical formula that allowed him to quantify the probability that these small samples were representative of the rest of the beer.
This method became the backbone of experimental research, giving researchers a simple tool to infer the behavior of a population based upon a small, random sample: the Student’s t-test.
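Here’s the idea in miniature, with made-up quality measurements standing in for Gosset’s beer:

```python
# Gosset's insight in miniature: use a small random sample to make an
# inference about the whole population. Measurements are made up.
from scipy import stats

batch_sample = [4.8, 5.1, 4.9, 5.3, 4.7, 5.0]  # a small random sample
target = 5.0                                    # the brewery-wide standard

t_stat, p_value = stats.ttest_1samp(batch_sample, popmean=target)
print(t_stat, p_value)  # a large p-value: no evidence the batch is off
```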
So not only did beer help Uber Team Science write the code to conduct our testing, but it also gave birth to the very testing technique we’re using to analyze our data!
Beer, math, and programming: BFFs!
Eventually Gosset’s methods were expanded upon, but to this day the simple t-test is commonly used in experimental research. The variant we’re using is called the one-way ANOVA, which can be loosely thought of as a t-test across many groups at once: it compares all four groups in a single test, so we don’t have to run a pile of pairwise t-tests and pay the correction penalty for multiple comparisons.
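In code, the test itself is almost anticlimactic. A sketch on simulated ride counts (randomly generated placeholders, not our actual results):

```python
# One-way ANOVA across four groups of simulated rides-per-user counts.
# The data are random placeholders, not experiment results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.poisson(lam=3.0, size=200)  # 0% discount
disc_a  = rng.poisson(lam=3.3, size=200)
disc_b  = rng.poisson(lam=3.6, size=200)
disc_c  = rng.poisson(lam=4.0, size=200)

f_stat, p_value = stats.f_oneway(control, disc_a, disc_b, disc_c)
print(f_stat, p_value)  # a small p-value suggests the group means differ
```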
So what’s the end result?
RESULTS
Cheaper Uber FTW!
Our ultimate goal is to minimize your cost while providing our service at a price that maximizes business. The data we collect from this experiment will ultimately benefit all of our riders.
If you’re sad that you didn’t get to take part in this experiment, stay strong — we’re likely going to be lowering prices for everyone.
And rest assured, this is only our first, simple foray into Uber experiments! Uber Team Science has a lot more planned for the future!