Tuesday, February 4, 2014

The $1 Billion Bracket - Part Two

Bra-ket notation
This is part two of a two-part post on the probability of a perfect bracket, motivated by Quicken Loans' announcement that they will pay one billion dollars to anybody who submits said perfect bracket. Part one used information from the betting markets, the moneyline to be exact, to get an order-of-magnitude estimate of the probability. For part two I will calculate the probability more directly, finding that the odds of picking a perfect bracket are about 1 in 50 billion when you have a reliable ranking methodology.

That high-level analysis from part one resulted in a probability range of 1 in 500 trillion to 1 in 20,000 trillion. As I pointed out in part one, this is a much higher probability than the naive "50/50" estimate of 1 in 9.2 million trillion, but still well below estimates from math professor Jeff Bergen and "Numbers Guy" Carl Bialik.

The main drawback of my previous analysis is that it estimated the probability of a particular bracket, not an optimal one. In order to calculate the probability of an optimal bracket getting every game right, you need a way to rank every team in the NCAA and calculate the odds of winning for a matchup between any two teams.

A Reverse-Engineered Vegas Power Ranking

Fortunately, I don't have to look far for such a ranking system. I publish rankings for various sports on this site, including college basketball. See my methodology page for an overview of my approach. In a nutshell, I'm using linear regression techniques to derive a power ranking implied by the Vegas point spreads.

The internet is lousy with power rankings, and I'm just one egghead among many that is attaching a number to each NCAA basketball team. The bracket probabilities I calculate here are only as valid as the ranking system from which they spring. I could be ranking teams by fierceness of mascot, or where their school falls in the latest Playboy "party school" list. So, the last part of this post will focus on the accuracy of my model. For example, when my rankings say a favorite has an 85% win probability, do those favorites win about 85% of the time?

Converting Point Spreads to Probabilities

Each tournament is different, with a different mix of matchups, so my analysis will look back over the past four years (2010-2013) and calculate the perfect bracket odds for each of those four tournaments. I use my betting market rankings to select the winner of each game. To calculate the probability of each game prediction being correct, I need an additional bit of math.

The key metric from my ranking system is what I call "Generic Points Favored", or GPF. It's the number of points by which you would expect a team to be favored against an average team. I can readily calculate the predicted point spread of any matchup by simply subtracting the GPFs of the two teams. So, I need a way to convert a point spread into a game probability. Here is the formula I use:

  • win probability = 1 / (1 + exp((point spread)/6))

Here I am using the standard convention that a negative point spread indicates a team is favored. As an example, if a team is favored by 7 points (a point spread of -7), their implied win probability is 76%. I derived the factor of 6 by running a logistic regression against the Vegas point spread over multiple NCAA seasons.
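As a minimal Python sketch of the conversion above: the formula and the factor of 6 come from the post, while the function names, the example GPF values, and the sign convention for the GPF subtraction (opponent minus team, so that a favorite gets a negative spread) are assumptions made for illustration.

```python
import math

SCALE = 6.0  # logistic factor quoted in the post


def predicted_spread(gpf_team, gpf_opponent):
    """Point spread from the team's side; negative means the team is favored.

    Assumed sign convention: opponent GPF minus team GPF.
    """
    return gpf_opponent - gpf_team


def win_probability(point_spread):
    """win probability = 1 / (1 + exp(point_spread / 6))"""
    return 1.0 / (1.0 + math.exp(point_spread / SCALE))


# Hypothetical GPF values: a team rated +10 against a team rated +3.
spread = predicted_spread(10.0, 3.0)       # -7.0, i.e. favored by 7
print(f"{win_probability(spread):.0%}")    # prints 76%
```

A spread of zero gives exactly 50%, and the two sides of any matchup always sum to 100%.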

The 2013 NCAA Tournament

Here are the detailed results for the 2013 tournament. I run my rankings with data as of March 18 (the eve of the tournament) and populate the bracket. Here are the odds of picking each regional bracket correctly, picking the final four games correctly, and the combined perfect bracket odds:

2013 Perfect Bracket Probability
Region Probability Winner
South 1 in 166 Florida
East 1 in 253 Indiana
West 1 in 515 Ohio State
Midwest 1 in 357 Louisville
Final Four 1 in 6 Indiana
Total 1 in 47 billion Indiana

A few more orders of magnitude bite the dust. We are now down to 1 in 47 billion, 4 to 6 orders of magnitude more likely than my estimates from part one of this post. By my estimate, a perfect bracket is now also more likely than Professor Bergen (1 in 128 billion) and Carl Bialik (1 in 728 billion) suggest. If you remember the 2013 tournament, you'll know that my predicted bracket was far from perfect, with Louisville being the only correct Final Four team (hands up if you had Wichita State, Michigan, or Syracuse in your office pool Final Four).
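The total in the 2013 table can be roughly reproduced by multiplying the regional and Final Four odds together. This is only a sketch: it assumes the five pieces combine as a simple product, and it uses the rounded figures from the table, so it lands a little under the published 1 in 47 billion.

```python
import math

# "1 in N" odds from the 2013 table (rounded as published).
odds_against = {
    "South": 166,
    "East": 253,
    "West": 515,
    "Midwest": 357,
    "Final Four": 6,
}

# Every piece has to come in, so the "1 in N" figures multiply.
combined = math.prod(odds_against.values())
print(f"1 in {combined / 1e9:.1f} billion")  # prints 1 in 46.3 billion
```

The small gap between 46.3 and 47 billion is presumably rounding in the per-region figures.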

Combined 2010-2013 Results

The table below summarizes the perfect bracket odds for the past four tournaments. In addition to calculating the optimal odds (optimal according to my rankings), I also have the odds if you took a simpler approach and just picked the best seed for every matchup.

Perfect Bracket Probabilities
Year Optimal By Seed
2010 1 in 72 billion 1 in 488 billion
2011 1 in 39 billion 1 in 190 billion
2012 1 in 69 billion 1 in 216 billion
2013 1 in 47 billion 1 in 649 billion

Based on the past four years, the perfect bracket odds seem to be in the neighborhood of 1 in 50 billion. If you just pick the highest seeds throughout, you cut your chances by a factor of roughly 3 to 14, depending on the year.
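The cost of picking by seed varies from year to year, and it can be read straight off the table. A short sketch, with the figures (in billions) copied from the table and the variable names chosen for illustration:

```python
# (optimal, by seed) odds against a perfect bracket, in billions.
odds = {
    2010: (72, 488),
    2011: (39, 190),
    2012: (69, 216),
    2013: (47, 649),
}

# How many times less likely the by-seed bracket is than the optimal one.
ratios = {year: by_seed / optimal for year, (optimal, by_seed) in odds.items()}

for year, ratio in ratios.items():
    print(f"{year}: {ratio:.1f}x less likely")
```

The penalty is smallest in 2012 (about 3x) and largest in 2013 (about 14x).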

A Fair Price

Because a billion dollar loss, no matter how improbable, would be ruinous to Quicken Loans, they have insured their contest with Warren Buffett (through Berkshire Hathaway), for whom a billion dollar loss would be less "ruinous", and more a source of "mild discomfort". The contest is limited to 10 million entries. If we assume a 1 in 50 billion chance of getting a perfect bracket, you get the following estimate of the expected value of the insurance contract:
  • (1 billion) x (10 million) / (50 billion) = $200,000 
But there's a slight flaw in the above calculation. This assumes 10 million people all enter the same optimal bracket, and that each will get a billion dollars if the bracket wins. The actual rules of the contest work like the lottery though. In the event of multiple winners, they will split the billion dollars evenly. That being said, there are probably a fair number of "quasi-optimal" brackets one could create that are in the ballpark of the 1 in 50 billion optimal odds.

Rather than take a stab at that calculation, I think I'll stop at my $200,000 figure above. It may overestimate the true expected value of the contract, but it makes for a reasonable floor on the fair price. I would be shocked if Buffett accepted anything less, and would put the actual price at seven figures at least.
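The $200,000 figure and the prize-splitting caveat can be put side by side in a short sketch. The inputs are the post's numbers. The two scenarios are simplifying assumptions rather than a model of how people actually fill out brackets: in one, every entry is a different bracket that is as good as the optimal one, and in the other, every entry is the same optimal bracket and the winners split a single prize.

```python
PRIZE = 1_000_000_000          # dollars
MAX_ENTRIES = 10_000_000       # contest entry cap
ODDS_AGAINST = 50_000_000_000  # 1 in 50 billion for an optimal bracket

# Scenario A (assumed): all entries are distinct brackets, each as likely
# as the optimal one. At most one bracket can be perfect, so the chances add.
payout_distinct = PRIZE * MAX_ENTRIES / ODDS_AGAINST

# Scenario B (assumed): everyone submits the same optimal bracket and
# splits one prize, so only a single 1-in-50-billion chance is in play.
payout_identical = PRIZE / ODDS_AGAINST

print(f"${payout_distinct:,.0f}")   # prints $200,000
print(f"${payout_identical:,.2f}")  # prints $0.02
```

Under these assumptions the expected payout sits somewhere between two cents and $200,000, which is why the $200,000 figure reads as an overestimate of the expected value.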

Post Script: Ranking Model Accuracy

To test the accuracy of my ranking methodology, I will take the past six seasons of NCAA basketball, calculate the rankings just prior to the start of the tournament, and then see how accurately those rankings predict the postseason games. To maximize sample size, I am looking at all games, including the other tournaments (e.g., the NIT).

One test of accuracy is how well my rankings duplicate the actual point spread for each game, as set by the betting market. Here are the results for the past six seasons:

Predicting the Point Spread
Season Mean Absolute Error
2007-2008 1.32
2008-2009 1.49
2009-2010 1.34
2010-2011 1.14
2011-2012 1.32
2012-2013 1.18
Average 1.29
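For reference, mean absolute error is just the average size of the miss between the ranking-implied spread and the actual Vegas spread. A minimal sketch, where the four pairs of spreads are made-up numbers for illustration and not data from my rankings:

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over paired games."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical spreads for four games (negative = first team favored).
model_spreads = [-7.0, -2.5, 4.0, -11.0]
vegas_spreads = [-6.0, -3.5, 5.5, -12.5]

print(mean_absolute_error(model_spreads, vegas_spreads))  # prints 1.25
```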

Another measure of accuracy is how well the model predicts outright winners. The table below compares the rankings' success to that of the Vegas point spread.

Predicting Winners
Season Me Vegas
2007-2008 77.7% 75.9%
2008-2009 69.3% 70.9%
2009-2010 69.8% 66.7%
2010-2011 73.0% 73.0%
2011-2012 71.2% 69.2%
2012-2013 71.2% 71.2%
Average 71.9% 71.0%

Over the past six seasons, the rankings have outperformed (slightly) the betting market consensus when predicting winners.

For a more specific test of accuracy, I will group the model's predictions into levels of certainty, and see if the predicted probabilities are in line with the actual results. For example, if I take all the games in which the predicted favorite has a win probability between 50% and 60%, the favorite should have won about 55% of the time. The table below compares predicted probabilities and actual win percentages for both my model and the Vegas point spread.

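Predicted vs. Actual Win Percentage
Predicted Range Games (Me) Predicted (Me) Actual (Me) Games (Vegas) Predicted (Vegas) Actual (Vegas)
50% to 59% 185 55.1% 58.9% 124 52.5% 49.2%
60% to 69% 202 65.4% 67.3% 247 62.9% 64.8%
70% to 79% 186 74.9% 70.4% 195 73.5% 71.8%
80% to 89% 137 85.0% 86.1% 151 84.4% 88.1%
90% to 99% 74 94.3% 94.6% 67 94.3% 94.0%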

In general, the actual probabilities line up fairly well with what the model expects, although the correlation seems a bit tighter for the Vegas point spread.
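This kind of check can be sketched in a few lines of Python: group games by the favorite's predicted win probability in 10-point bins, then compare the average prediction in each bin with how often the favorite actually won. The function name and the four sample games are hypothetical and are here only to show the mechanics.

```python
from collections import defaultdict


def calibration_table(favorite_probs, favorite_won):
    """Return rows of (bin lower bound %, games, mean predicted, actual win rate)."""
    bins = defaultdict(list)
    for prob, won in zip(favorite_probs, favorite_won):
        lower = min(int(prob * 10), 9) * 10  # 0.55 -> 50, 1.0 -> 90
        bins[lower].append((prob, won))

    rows = []
    for lower in sorted(bins):
        games = bins[lower]
        n = len(games)
        mean_predicted = sum(prob for prob, _ in games) / n
        actual_rate = sum(1 for _, won in games if won) / n
        rows.append((lower, n, mean_predicted, actual_rate))
    return rows


# Hypothetical games: predicted favorite probability, and whether the favorite won.
probs = [0.55, 0.58, 0.72, 0.95]
won = [True, False, True, True]

for lower, n, predicted, actual in calibration_table(probs, won):
    print(f"{lower}% to {lower + 9}%: {n} games, "
          f"predicted {predicted:.1%}, actual {actual:.1%}")
```

With only a handful of games per bin the actual percentages are noisy, which is why I pool several seasons of postseason games.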

Check back when the 2014 bracket is announced for an "optimal" bracket and its corresponding probability of success.


  1. Interesting posting! I find that a normal distribution provides a better translation from point spreads to winning percentage.


  2. I'll give that a shot. On a technical level, I think that's the difference between using a probit vs. logit model. A logit model is nice because you can calculate the probabilities directly without having to evaluate the cumulative normal distribution. But if probit has better accuracy I think I can live with the inconvenience.

  3. It's nice to dream, but as one who lives in Las Vegas and bets sports for a living, I know, as I'm sure you know, that logic would never produce the perfect bracket, unless one could "logically" pick the bracket like my grandmother. I expect there will be a lot of dead grandmothers before Quicken Loans ever has to pay off. In fact your own calculations imply that the Sun will be expanding into the Earth's atmosphere before it ever happens. Isn't the insurance business great?