Saturday, December 27, 2014

Transitive College Football Power Rankings - Bowl Games Edition

In a recent post (My Team's Proxy Can Beat Up Your Team's Proxy), I laid out an approach for comparing two college football teams (or two teams from any sport, really). The original version allowed the user to compare any two top 25 teams. For this post, I have created a Bowl Games version which automatically generates these comparisons for every Bowl Game matchup of the season. The basic idea is to generate comparisons by trying to connect teams via "paths". We find our paths by looking at prior game results.
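As a rough illustration of the path idea, here is a minimal Python sketch that treats each game as an undirected link between two teams and lists every path between two teams up to a maximum number of games. The team labels, the `build_graph` and `find_paths` helpers, and the game list are all hypothetical; this is not the code behind the actual tool.

```python
from collections import defaultdict

def build_graph(games):
    """Map each team to the set of opponents it has played."""
    graph = defaultdict(set)
    for team_a, team_b in games:
        graph[team_a].add(team_b)
        graph[team_b].add(team_a)
    return graph

def find_paths(graph, start, goal, max_length):
    """Return all simple paths from start to goal using at most max_length games."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        current = path[-1]
        if current == goal:
            paths.append(path)
            continue
        # A path with N teams uses N - 1 games; stop extending at the limit.
        if len(path) - 1 == max_length:
            continue
        for opponent in sorted(graph[current]):
            if opponent not in path:  # never revisit a team on the same path
                stack.append(path + [opponent])
    return paths

# Hypothetical results as (team, opponent) pairs, not real 2014 games.
games = [("A", "X"), ("X", "Y"), ("Y", "B"), ("A", "Z"), ("Z", "Y")]
graph = build_graph(games)
for path in find_paths(graph, "A", "B", max_length=3):
    print(" -> ".join(path))
```

In this toy schedule, A and B never meet and share no opponents, so the shortest connections run through two intermediate teams.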

For example, the Alabama Crimson Tide play the Ohio State Buckeyes on New Year's Day in the first round of the College Football Playoff. These two teams have not played each other this season, so there are no direct paths connecting them. In addition, they do not have any common opponents, so there are no paths of length two connecting them either. But some of their opponents have played each other, so there are paths of length three available:

Wednesday, December 17, 2014

Turnover Index - Week 16

Here are the Turnover Index picks for Week 16. The Turnover Index is a simple betting strategy based on the theory that the market overvalues defensive turnovers when judging team strength. See here and here for more background.

Week 14 results

Our week 14 bet was successful against the spread, with the Raiders covering against the Niners. Here are the season to date results:
  • Against the Spread: 8-2
  • Starting Bankroll: $1,000
  • Current Bankroll: $1,079 (8% ROI)
And here are the week by week results:

week  bets  won  starting bankroll  amount bet  profits  ending bankroll
7     2     2    $1,000             $51 (5.2%)  $47      $1,047
8     2     1    $1,047             $26 (2.6%)  $1       $1,048
10    2     2    $1,048             $32 (3.1%)  $29      $1,077
12    2     1    $1,077             $26 (2.5%)  ($11)    $1,066
13    1     1    $1,066             $7 (0.7%)   $6       $1,073
14    1     1    $1,073             $6 (0.6%)   $5       $1,079

Week 16 Playoff Implications

Week 16 playoff implications are now available at FiveThirtyEight. This week, the "best case" interactive has been upgraded with a "top pick" button, giving fans of the Buccaneers and the Titans something to do besides sob quietly. For determining draft order, the first tiebreaker is strength of schedule, which leads to some interesting implications for Tampa Bay and Tennessee. Their list of "rooting interest" games is extensive.

UPDATE 2014-12-21: As now called out in the article, there was an error in the tiebreaker logic that led to incorrect top seed and bye week probabilities in the NFC. This has now been corrected. The programming error was 100% my own.

Saturday, December 6, 2014

Turnover Index - Week 14

Here are the Turnover Index picks for Week 14. The Turnover Index is a simple betting strategy based on the theory that the market overvalues defensive turnovers when judging team strength. See here and here for more background.

Week 13 results

Our week 13 bet was successful against the spread (the Jets covered against Miami in a losing effort). Here are the season to date results:
  • Against the Spread: 7-2
  • Starting Bankroll: $1,000
  • Current Bankroll: $1,073 (7% ROI)
And here are the week by week results:

week  bets  won  starting bankroll  amount bet  profits  ending bankroll
7     2     2    $1,000             $51 (5.2%)  $47      $1,047
8     2     1    $1,047             $26 (2.6%)  $1       $1,048
10    2     2    $1,048             $32 (3.1%)  $29      $1,077
12    2     1    $1,077             $26 (2.5%)  ($11)    $1,066
13    1     1    $1,066             $7 (0.7%)   $6       $1,073

Thursday, December 4, 2014

My Team's Proxy Can Beat Up Your Team's Proxy

In a few days, the College Football Playoff Selection Committee will release their final team rankings. The committee ranks the top 25 teams in the nation, but it's the top four that everyone will pay attention to.

As a public service/argument starter, I have created a comparison tool for any two teams currently ranked in the top 25 by the Selection Committee. The purpose of this tool is to find "connections" between the two teams, with the goal of determining which of the two is superior.

The basic idea is to view the combined output of all college football games as a map. But the terrain we are mapping has some peculiar and inconsistent topology. Let's take the week 12 matchup between Alabama and Mississippi State. Alabama won that game by five points. One way to interpret that is to say that Alabama is five points "better" than Mississippi State. Or, to use our map analogy, "Mount Alabama" is 5 points higher in elevation than "Mount Mississippi State".

Week 14 Playoff Implications

Week 14 Playoff Implications are now up at FiveThirtyEight. Reuben Fischer-Baum has added yet another nifty interactive to our weekly feature. This one tells you which games are most important to any given team (note that the probabilities aren't independent so they only add approximately). The most important game this week is between the Ravens and the Dolphins. The winner emerges with a 70% playoff probability. The loser drops below 20%.

Wednesday, November 26, 2014

Turnover Index - Week 13

Here are the Turnover Index picks for Week 13. The Turnover Index is a simple betting strategy based on the theory that the market overvalues defensive turnovers when judging team strength. See here and here for more background.

Week 12 results

Our week 12 bets went 1-1 against the spread. Our bet on the Redskins was successful, but we only wagered 0.8% of our bankroll. Our bet on the Jets was unsuccessful (to say the least), and we had a higher stake on that game: 1.7% of our bankroll. Here are the season to date results:
  • Against the Spread: 6-2
  • Starting Bankroll: $1,000
  • Current Bankroll: $1,066 (7% ROI)
And here are the week by week results:

week  bets  won  starting bankroll  amount bet  profits  ending bankroll
7     2     2    $1,000             $51 (5.2%)  $47      $1,047
8     2     1    $1,047             $26 (2.6%)  $1       $1,048
10    2     2    $1,048             $32 (3.1%)  $29      $1,077
12    2     1    $1,077             $26 (2.5%)  ($11)    $1,066

Week 13 Playoff Implications

Your guide to meaningful football is now available at FiveThirtyEight. This week's column is fairly brief. We focus on the Thanksgiving games, all three of which have significant playoff implications. We also call out some of the more extreme playoff scenarios that emerge from our simulations. For example, there is a 0.02% probability of a five win division winner hosting a twelve win wildcard team in the first round.

Wednesday, November 19, 2014

Week 12 Playoff Implications Available at FiveThirtyEight

Our weekly NFL playoff implications cheat sheet is now available at FiveThirtyEight. This week, I took a look at the possibility of a five win NFC South champion, applied Gini coefficients to each conference's playoff race, and explored the disconnect between Arizona's record and what the market thinks of them.

Turnover Index - Week 12

Here are the Turnover Index picks for Week 12. The Turnover Index is a simple betting strategy based on the theory that the market overvalues defensive turnovers when judging team strength. See here and here for more background.

Week 11 results

There were no week 11 games that satisfied our betting criteria, but here is a summary of season to date performance. We are at a 7.7% ROI so far off of our hypothetical $1,000 bankroll (we have gone 5-1 against the spread).

week  bets  won  starting bankroll  amount bet  profits  ending bankroll
7     2     2    $1,000             $51 (5.2%)  $47      $1,047
8     2     1    $1,047             $26 (2.6%)  $1       $1,048
10    2     2    $1,048             $32 (3.1%)  $29      $1,077

Saturday, November 15, 2014

Turnover Index - Week 11

There are no picks that satisfy our betting criteria this week, but here is a check in on last week's performance. The Turnover Index is a simple betting strategy based on the theory that the market overvalues defensive turnovers when judging team strength. See here and here for more background.

Week 10 results

No games satisfy our betting criteria this week, but here are updated results from Week 10.

Our two Week 10 picks went 2-0 against the spread, with both the Chiefs and the Jets covering. So, we're off to a good start. Here are the season to date results:
  • Against the Spread: 5-1-0
  • Starting Bankroll: $1,000
  • Current Bankroll: $1,077 (8% ROI)
Here is the week by week performance:

week  bets  won  starting bankroll  amount bet  profits  ending bankroll
7     2     2    $1,000             $51 (5.2%)  $47      $1,047
8     2     1    $1,047             $26 (2.6%)  $1       $1,048
10    2     2    $1,048             $32 (3.1%)  $29      $1,077

The "Old" Index

Under the original, simpler betting approach (bet any game with at least a 10 defensive turnover differential between the teams), season to date results for 2014 are 4-0 against the spread.

Thursday, November 13, 2014

Week 11 Playoff Implications Now Available

Week Eleven playoff implications are now available at FiveThirtyEight. As an Indianapolis native, I don't need any help getting geared up for Colts-Pats, but this game has huge implications for securing a first round bye in the playoffs. It's important.

Saturday, November 8, 2014

NBA Win Probability Graphs for the 2014-15 Season

[Photo caption: Current WPA leader, Minnesota's Kevin Martin]

NBA Win Probability Graphs and Box Scores are now available for the 2014-15 season. All of last year's features are back: Win Probability Graphs and Box Scores, Player Win Probability Totals, and the Top Game Finder.

Last night's double-overtime game between the Atlanta Hawks and the Charlotte Hornets was the most exciting game (so far) of the season (according to the Excitement Index, which sums the win probability ups and downs of each game). The Hawks were able to come back from a 0.1% win probability in the first overtime (down 6 with 22 seconds to go) to force a second overtime. A third overtime seemed likely before Lance Stephenson's buzzer-beating off-the-backboard three pointer won it for Charlotte. Prior to last night's performance, Stephenson had the worst total Win Probability Added of any player. Even with that clutch shot, he is still in the bottom ten. In terms of expected Win Probability Added (eWPA), which is a context-independent version of WPA based just on box score stats, Lance is still at the bottom.
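To make the Excitement Index description concrete, here is a minimal Python sketch that sums the absolute change in win probability from one point in the game to the next. The sampling interval, any scaling applied in the real index, and the sample series are assumptions; only the "sum of ups and downs" idea comes from the description above.

```python
def excitement_index(win_probs):
    """Sum the absolute change in win probability between consecutive points."""
    return sum(abs(b - a) for a, b in zip(win_probs, win_probs[1:]))

# Hypothetical win probability series for one team (fractions, not real data).
series = [0.50, 0.60, 0.45, 0.20, 0.70, 1.00]
print(round(excitement_index(series), 2))
```

A game that swings back and forth accumulates a large total, while a wire-to-wire blowout adds up to little more than the distance from the opening probability to 100%.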

Timberwolf Kevin Martin has the early lead in the MVP race, with +131% in win probability added over four games (with most of that coming in a losing effort against the Bulls).

As I indicated previously, I have a bigger rollout planned with new data and new features, but here is the "old" version in the meantime while I continue to work out the kinks.

Turnover Index - Week 10

Here are the Turnover Index picks for Week 10. The Turnover Index is a simple betting strategy based on the theory that the market overvalues defensive turnovers when judging team strength. See here and here for more background.

Week 9 results

There were no week 9 games that satisfied our betting criteria, but here is a summary of season to date performance. We are at a 4.8% ROI so far off of our hypothetical $1,000 bankroll.

week  bets  won  starting bankroll  amount bet  profits  ending bankroll
7     2     2    $1,000             $51 (5.2%)  $47      $1,047
8     2     1    $1,047             $26 (2.6%)  $1       $1,048

Thursday, November 6, 2014

Week 10 Playoff Implications now available

Week 10 Playoff Implications are now available at FiveThirtyEight. We have some added interactivity for this week's feature. Games can now be assessed and ranked based on how they affect playoff seeding outcomes, in addition to overall playoff probability. Seeding is important, especially under the NFL's current playoff system. FiveThirtyEight's Reuben Fischer-Baum has created a very slick and intuitive tool for making sense of the Week 10 playoff picture.

Friday, October 31, 2014

Turnover Index - Week 9

Here are the Turnover Index picks (of which there are none) for Week 9. The Turnover Index is a simple betting strategy based on the theory that the market overvalues defensive turnovers when judging team strength. See here and here for more background.

Week 8 results

No games satisfy our betting criteria this week, but here are updated results from Week 8.

Our two Week 8 picks went 1-1 against the spread. The Saints covered against the Packers quite easily. The Jets, however, did not (despite cunning attempts at subterfuge). Although we went 1-1, we actually had a positive ROI this week (one dollar!) because our Kelly Criterion-based betting rule placed slightly more of our bankroll on the Saints than on the Jets. Here are the season to date results:
  • Against the Spread: 3-1-0
  • Starting Bankroll: $1,000
  • Current Bankroll: $1,048 (5% ROI)
Here is the week by week performance:

week  bets  won  starting bankroll  amount bet  profits  ending bankroll
7     2     2    $1,000             $51 (5.2%)  $47      $1,047
8     2     1    $1,047             $26 (2.6%)  $1       $1,048
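For readers unfamiliar with Kelly-style bet sizing, here is a minimal Python sketch of the textbook Kelly fraction. It shows why two bets at the same price can get different stakes: the bet with the larger estimated edge gets more of the bankroll. The -110 pricing and the win probabilities are assumptions for illustration; the post does not state the actual inputs, and the Turnover Index rule may differ from full Kelly.

```python
def kelly_fraction(win_prob, decimal_odds):
    """Kelly fraction of bankroll for a bet with the given win probability.

    decimal_odds is the total payout per unit staked (stake included).
    Returns 0.0 when the bet has no positive edge.
    """
    b = decimal_odds - 1.0   # net profit per unit staked on a win
    q = 1.0 - win_prob       # probability of losing
    fraction = (b * win_prob - q) / b
    return max(fraction, 0.0)

# Assumed inputs: standard -110 pricing (decimal odds of 21/11) and
# hypothetical win probabilities, not the values used for the actual picks.
odds = 21 / 11
for label, p in [("stronger edge", 0.54), ("weaker edge", 0.53)]:
    stake = 1047 * kelly_fraction(p, odds)
    print(f"{label}: bet ${stake:.2f} of a $1,047 bankroll")
```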

The "Old" Index

Under the original, simpler betting approach (bet any game with at least a 10 defensive turnover differential between the teams), season to date results for 2014 are 2-0 against the spread.

Wednesday, October 29, 2014

Week Nine Playoff Implications now available

Week nine playoff implications, complete with interactive charts, are now available at FiveThirtyEight. There are a lot of high leverage games this week, including Thursday Night's Panthers-Saints matchup and Sunday's Ravens-Steelers game. In this week's write-up, we delve into the intricacies of playoff seeding, most notably the race for the top AFC seed and how Sunday Night's Brady-Manning duel will shape that contest.

Tuesday, October 28, 2014

The NBA Returns

Note: It appears I have jumped the gun with my original post. My play by play data source drastically altered its formatting over the offseason, wrecking all of my code. So, NBA Win Probability graphs for the 2014-15 season will be on hold until I am ready to roll out my planned enhancements (which will be based on a different data source). The original, incorrect, version of this post is below.

The 2014-15 NBA season kicks off tonight, and with it, so will my win probability features:


I also have been working on several key enhancements to these features. I had hoped to have them ready by the time the season opened, but I ran out of time. However, I hope to roll them out soon. In the meantime, the existing tools will be available and updated daily(ish) until I am ready for the big rollout.

Wednesday, October 22, 2014

Week Eight Playoff Implications now up at FiveThirtyEight

Week eight playoff implications, complete with interactive charts, are now available at FiveThirtyEight. Cincinnati is on the hot seat this week, with 27% playoff probability at stake in their game against the Ravens. In the NFC, pretty much everybody will be hoping for the Panthers to upset the Seahawks (aside from the Saints, Falcons, and Buccaneers).

Turnover Index - Week 8

Here are the Turnover Index picks for Week 8. The Turnover Index is a simple betting strategy based on the theory that the market overvalues defensive turnovers when judging team strength. See here and here for more background.

Week 7 results

Our betting strategy got off to a good start, going 2-0 against the spread. The Patriots, despite forcing 14 turnovers in their first six games, were not able to manage a single one against the Jets. They won the game, but failed to cover the 9.5 point spread. On Monday Night, the Texans saw first hand how random the turnover battle can be, conceding two quick touchdowns to the Steelers off of turnovers deep in their territory (see how their win probability went into freefall).

Here are the results. We are up $47 off of our starting $1,000 bankroll.
  • Against the Spread: 2-0
  • Starting Bankroll: $1,000
  • Current Bankroll: $1,047 (4.7% ROI)

Thursday, October 16, 2014

Playoff Implications moving to FiveThirtyEight

I'm excited to announce that my Playoff Implications feature is moving to FiveThirtyEight. Here is a link to the week seven column. Data visualization genius Reuben Fischer-Baum has created a slick and intuitive interactive chart for displaying the various probabilities. Under the hood, I have increased the number of simulations from 10,000 to 50,000. As a result, some of the more minor implications are now passing statistical muster, leading to a fuller picture of playoff interdependencies.
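A quick way to see why more simulations help smaller effects pass statistical muster is the standard error of a simulated probability. The sketch below is a minimal Python illustration that assumes independent simulation runs and uses a hypothetical 50% playoff probability.

```python
import math

def standard_error(p, n_sims):
    """Standard error of a probability estimated from n_sims independent runs."""
    return math.sqrt(p * (1.0 - p) / n_sims)

# A hypothetical 50% playoff probability, estimated at both simulation counts.
for n in (10_000, 50_000):
    print(n, round(standard_error(0.5, n), 4))
```

Going from 10,000 to 50,000 runs shrinks the standard error by a factor of the square root of five, a bit more than half.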

The intention is to run this as a weekly column at FiveThirtyEight. Each week, I'll post a heads up here when the column is up.

Turnover Index - Week 7

Finally, a chance to lose some money. Week seven marks the first official appearance of the Turnover Index. The Turnover Index is a simple betting strategy based on the theory that the market overvalues defensive turnovers when judging team strength. See here and here for more background.

Driving the lack of betting opportunities is a lack of turnovers in general. Turnovers in the NFL have been declining for well over twenty years now. Through week six, teams have averaged just 1.5 turnovers per game, the lowest level going back to 1989. See chart below:

Wednesday, October 8, 2014

Playoff Implications - Week 6

Here is your week 6 guide to playoff implications. The purpose of this feature is to highlight games that have a significant impact on the playoff picture (see this post for background).

The playoff implications below are derived from the same simulation I run to calculate playoff seed probabilities for my daily NFL rankings. I can group the simulation runs by the outcome of each game and then see how a team's playoff chances vary between the two groups. The interactive table at the bottom of the post will allow you to see corresponding results for any game or team.

Ranking Week 6 Games by Leverage

The table below ranks the week 6 games by total leverage. Leverage in this context is a measure of both how uncertain a game's outcome is (games between evenly matched teams have higher leverage) and how much the playoff picture swings as a result of that outcome.

Saturday, October 4, 2014

Turnover Index - Week 5

Just a heads up that I plan on continuing the Turnover Index for the 2014 season. But there are no games that satisfy our betting criteria this week. Check back next week.

NFL Week 4 Power Ranking Roundup

Last year the Carolina Panthers, after starting the season 1-2, led the NFL in win percentage from weeks five through sixteen, winning 10 of their next 12. Who saw that coming? Prior to their 10-2 run, ESPN had the Panthers at #21 in their weekly power rankings. My own betting market rankings had them ranked 18th. Football Outsiders' DVOA was closer to the mark, ranking the Panthers 6th. What's known as the Simple Ranking System was even closer though, ranking the Panthers 3rd.

When it comes to prediction, we are quick to forget how wrong we often are. So, in a post from earlier this year, I compiled early season NFL power rankings from multiple sources over multiple years and attempted to objectively measure how good each ranking system is at predicting wins (i.e. a more comprehensive version of the Panthers example above).

I intend to do the same comparison for 2014, so I figured I would go ahead and archive the week 4 rankings in this post, and then check back on them when the season has finished. Note that when I say "week 4 rankings", they are rankings compiled after week 4 of the season has been completed (and prior to Thursday night's Packers-Vikings blowout). Here are the ranking systems I'm comparing:

Tuesday, September 30, 2014

Playoff Implications - Week 5

Here is your week 5 guide to playoff implications. The purpose of this feature is to highlight games that have a significant impact on the playoff picture (see this post for background).

The playoff implications below are derived from the same simulation I run to calculate playoff seed probabilities for my daily NFL rankings. I can group the simulation runs by the outcome of each game and then see how a team's playoff chances vary between the two groups. The interactive table at the bottom of the post will allow you to see corresponding results for any game or team.

Ranking Week 5 Games by Leverage

The table below ranks the week 5 games by total leverage. Leverage in this context is a measure of both how uncertain a game's outcome is (games between evenly matched teams have higher leverage) and how much the playoff picture swings as a result of that outcome.

Wednesday, September 24, 2014

Playoff Implications - Week 4

Here is your week 4 guide to playoff implications. The purpose of this feature is to highlight games that have a significant impact on the playoff picture (see this post for background).

The playoff implications below are derived from the same simulation I run to calculate playoff seed probabilities for my daily NFL rankings. I can group the simulation runs by the outcome of each game and then see how a team's playoff chances vary between the two groups. The interactive table at the bottom of the post will allow you to see corresponding results for any game or team.

Ranking Week 4 Games by Leverage

The table below ranks the week 4 games by total leverage. Leverage in this context is a measure of both how uncertain a game's outcome is (games between evenly matched teams have higher leverage) and how much the playoff picture swings as a result of that outcome.

Tuesday, September 23, 2014

College Football Rankings Now Available

Betting market rankings for college football are now available (and will update daily). As I did with my NFL rankings, I have moved these to a new home which will hopefully be more stable. Here is the link: College Football Betting Market Rankings.

All of the features outlined in this post are back: Generic Points Favored (total, offense, defense), projected wins, and strength of schedule rankings (both past and future). Some observations:

  • Oregon currently tops the week 5 rankings, but not by much. The top six teams are all within a couple points of each other (Oregon, Alabama, Baylor, Auburn, Florida State, and Oklahoma)
  • I have Marshall at a 50/50 shot of going undefeated in the regular season (not counting the Conference USA championship game).
  • SEC teams occupy ten of the top 24 spots in these rankings.

For those doing their own research/handicapping, there is also a link to a Google Docs version of the rankings (includes all teams, not just the top 50).

Thursday, September 18, 2014

Playoff Implications - Week 3

After a one week hiatus, playoff implications are back for week 3. The purpose of this feature is to highlight games that have a significant impact on the playoff picture (see this post for background).

The playoff implications below are derived from the same simulation I run to calculate playoff seed probabilities for my daily NFL rankings. I can group the simulation runs by the outcome of each game and then see how a team's playoff chances vary between the two groups. The interactive table at the bottom of the post will allow you to see corresponding results for any game or team.

Ranking Week 3 Games by Leverage

The table below ranks the week 3 games by total leverage. Leverage in this context is a measure of both how uncertain a game's outcome is (games between evenly matched teams have higher leverage) and how much the playoff picture swings as a result of that outcome.

Friday, September 5, 2014

NFL Rankings

Just a heads up that, due to circumstances beyond my control, my NFL rankings may only be updated intermittently over the next week or so. In addition, the rankings may look a bit odd from time to time until we get at least three weeks of "real" point spreads. Opening lines for week 3 are usually set sometime in between weeks 1 and 2.

Monday, September 1, 2014

Playoff Implications - Week 1

Too soon? Or not soon enough? The purpose of this feature is to highlight games that have a significant impact on the playoff picture (see this post for background). It may seem premature to start talking playoffs already, but with a sixteen game schedule, even week one results can create meaningful shifts in the postseason outlook.

The playoff implications below are derived from the same simulation I run to calculate playoff seed probabilities for my daily NFL rankings. I can group the simulation runs by the outcome of each game and then see how a team's playoff chances vary between the two groups.

For example, the Bengals' chances of making the playoffs can swing by 24%, depending on the outcome of their season opener against the Ravens. If the Bengals win, their expected playoff probability is 63%. They lose and it's significantly lower at 38%. The interactive table at the bottom of the post will allow you to see corresponding results for any game or team.
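The grouping step described above can be sketched in a few lines of Python. The data layout (one dict per simulation run), the `playoff_swing` helper, and the toy runs are all hypothetical; the real simulation presumably tracks far more detail.

```python
def playoff_swing(sim_runs, game, team):
    """Split simulation runs by a game's winner and compare a team's playoff rate.

    Each run is a dict with 'winners' (game id -> winning team) and
    'playoff_teams' (set of teams that made the playoffs in that run).
    Assumes the team both wins and loses the game in at least one run.
    """
    made = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for run in sim_runs:
        team_won = run["winners"][game] == team
        total[team_won] += 1
        made[team_won] += team in run["playoff_teams"]
    p_if_win = made[True] / total[True]
    p_if_loss = made[False] / total[False]
    return p_if_win, p_if_loss, p_if_win - p_if_loss

# Six made-up simulation runs, for illustration only.
runs = [
    {"winners": {"CIN-BAL": "CIN"}, "playoff_teams": {"CIN", "DEN"}},
    {"winners": {"CIN-BAL": "CIN"}, "playoff_teams": {"DEN"}},
    {"winners": {"CIN-BAL": "BAL"}, "playoff_teams": {"BAL", "DEN"}},
    {"winners": {"CIN-BAL": "BAL"}, "playoff_teams": {"CIN", "BAL"}},
    {"winners": {"CIN-BAL": "BAL"}, "playoff_teams": {"BAL"}},
    {"winners": {"CIN-BAL": "BAL"}, "playoff_teams": {"DEN"}},
]
print(playoff_swing(runs, "CIN-BAL", "CIN"))
```

The third value returned is the swing: the gap between the team's playoff rate in runs where it wins the game and in runs where it loses.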

Ranking Week 1 Games by Leverage

The table below ranks the week 1 games by total leverage. Leverage in this context is a measure of both how uncertain a game's outcome is (games between evenly matched teams have higher leverage) and how much the playoff picture swings as a result of that outcome.

Saturday, August 30, 2014

Daily NFL Rankings - With Playoff Seed Projections

My betting market rankings for the NFL are now up and running (see here for the first look). To improve access to the table, which had been spotty, I had to move the rankings to a new location. Here is the new URL: NFL Betting Market Rankings. The old URL should redirect automatically to the new one, but it's probably a good idea to update bookmarks.

All of the old features are back: sparklines, playoff seed projections, and strength of schedule.

Playoff Seeds

There is a series of odd-looking bar graphs under the "Projected Seed" column. Each bar in the graph represents the probability of a team achieving a particular playoff seed. The bars run left to right from seed 16 to seed 1. The top six seeds make the playoffs and are colored blue on the graph. The probabilities are based on a 5,000-round Monte Carlo simulation of the regular season. These will be updated daily with the latest game results and point spreads.

Here is a look at Seattle's seed probabilities:



The most likely outcome for Seattle is a #1 seed (26% probability). The second most likely outcome is a 5th seed (18% probability). This is due to Seattle being in the same division as the #3 ranked team, the 49ers (the top 4 seeds are reserved for the division winners). Here are San Francisco's corresponding seed probabilities:



To the left of each bar graph is a percentage, which represents a team's probability of making the playoffs. To open the season, the Broncos have the highest playoff probability at ~90%. They are ranked below the Seahawks, but have better playoff odds due to a weaker schedule.

Strength of Schedule

There are two columns to the far right of the table: pSOS and fSOS. These columns are the average GPF (Generic Points Favored) of a team's past and future opponents, respectively (home field advantage is factored into the averages).

The team with the toughest schedule this season is the Arizona Cardinals, who get to face the Seahawks and 49ers twice, as well as matchups against top tier teams like the Broncos and Eagles.

The team with the easiest schedule is the Houston Texans, due to the extremely soft AFC South and their prior season finish as the last place team.
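As a rough sketch of how a schedule strength average like pSOS or fSOS could be computed, the Python below averages opponent GPF and shifts each game by a home field term. The sign convention, the 2.5 point home field value, and the opponent ratings are all assumptions; the post only says that home field advantage is factored into the averages.

```python
def strength_of_schedule(opponents, gpf, home_field_advantage):
    """Average opponent GPF, shifted by home field advantage for each game.

    opponents is a list of (opponent, team_is_home) pairs.
    """
    adjusted = []
    for opponent, team_is_home in opponents:
        # Assumed adjustment: a road game makes the opponent effectively
        # stronger, a home game makes the opponent effectively weaker.
        shift = -home_field_advantage if team_is_home else home_field_advantage
        adjusted.append(gpf[opponent] + shift)
    return sum(adjusted) / len(adjusted)

# Hypothetical ratings and schedule, not taken from the actual rankings.
gpf = {"X": 3.0, "Y": -1.0}
schedule = [("X", True), ("Y", False)]
print(strength_of_schedule(schedule, gpf, home_field_advantage=2.5))
```

The same function would serve for both columns: pass the games already played for pSOS and the remaining games for fSOS.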

The Rankings

Eyeballing the latest version of the table, we can break down teams into a few broad categories:
  • The Elite: Seahawks, Broncos, 49ers, Packers, Patriots, and Saints
  • The Above Average: Panthers, Eagles, Bengals, Bears, Lions, Colts, Steelers, Chiefs, Falcons, Cardinals
  • The Mediocre: Chargers, Giants, Dolphins, Texans, Rams, Redskins, Buccaneers, Jets, Bills, Browns, Titans, Vikings
  • The Raiders and the Jaguars

Tuesday, August 26, 2014

NFL Home Underdogs - A Reminder

An update on last year's post on NFL home underdogs. From 1989 to 2003, NFL home underdogs went 53.5% against the spread. But for the next nine years (2004-2012), home dogs were just 47.7% against the spread.

2013 bucked this recent trend somewhat, with 88 bets averaging 52.3% against the spread (just a hair shy of break even against the standard vig). But as it stands, this still looks like a blip in what has been poor performance for some time. So be wary of anyone that claims that NFL home underdogs are a good bet. It was true when Steven Levitt published his seminal paper on betting markets in 2004, but it has been just the opposite in the ten years since. See below for year by year performance:


Saturday, August 23, 2014

How to improve your chances of scoring a goal in soccer? Concede one first.

"If you want to be a millionaire, start with a billion dollars and launch a new airline."

Apologies for the facetious (and somewhat clickbait-y) post title. In the same way that the opening quote from Richard Branson is not intended to be serious advice on how to become a millionaire, I am not suggesting that allowing your opponent to score is a viable strategy for winning soccer matches.

What I will suggest in this post is that it appears that teams play more optimally when trailing their opponent (or similarly, teams play less optimally when holding a lead). I found this result interesting for two reasons:
  1. Arriving at this conclusion provides a good example of the pitfalls of conflating correlation with causation.
  2. The scourge of modern sports strategy is loss aversion (and its cousin, risk aversion). This result appears to show that soccer is not immune.
I had already touched on this topic in a prior post (see On the Probability of Scoring a Goal). In this post, however, I have expanded my dataset, and in addition will do my best to illustrate my point with the raw data. The results from the previous post were the end result of a regression analysis, and somewhat of a black box from the point of view of the reader. I will try to be more transparent here.

Thursday, August 14, 2014

2014 NFL Rankings - First Look

As I did last year, and the year before, here is a pre-season NFL power ranking. My rankings are not based on stats, scouting, off-season moves, or draft grades. Well, they are, but not as explicit inputs. Instead, I use Vegas point spreads as a means to reverse engineer an implied power ranking. See my post at Advanced NFL Stats Community where I first laid out the basic concept (that post is also what ultimately led to the time-sink that is this blog). You can also refer to my methodology page for more details.

If the market is efficient, then the Vegas point spread is a distillation of any and all information relevant to the outcome of NFL games, whether it be touted draft picks, roster moves, or key players returning from injury. However, with three (interminable) weeks of pre-season to go, there is only an established market for the first two weeks of the regular season. Two weeks of games is not a large enough sample to derive a 32 team ranking. My model needs at least three weeks of games before it really gets going. So, while I can't use the market just yet, I can use what a significant portion of the market uses for its opening lines: Cantor Gaming.

Saturday, July 12, 2014

On the Probability of Scoring a Goal

In this post, I will describe my attempts to model the probability of a goal being scored in soccer. After correcting for team imbalances, I find that a trailing team has a higher probability of scoring in most situations. This result has potential implications for strategy and whether teams should be adopting a more aggressive style of play.

The Model

Using the same dataset I used for my win probability model (~3,000 matches from five of the top European Leagues), I employed LOESS smoothing to build a model that predicts the probability of a goal being scored within the next minute of game time. The model is a function of the following:
  • game time
  • goal difference
  • team strength
I derive the team strength from the pre-match betting odds, and convert it into an expected goals scored per game. Including team strength as a parameter is crucial for this type of analysis, because the model is also a function of goal differential. There is going to be heavy selection bias in the raw historical results. Favorites are going to be over-represented in game situations in which a team has a positive goal differential. As a result, the raw goal probability is higher for teams that have a lead (favorites tend to score more). But having a lead in and of itself does not lead to a higher probability of scoring more goals. This is correlation, not causation.

In fact, once we control for the bias in the results, the exact opposite conclusion emerges: For most of the game, a team trailing by one goal is more likely to score than it is when leading by a goal or tied. See below for the (smoothed) goal probabilities as a function of game time. The probabilities reflect a team that would be expected to score 1.4 goals per game, on average.

Friday, July 4, 2014

Tennis matches and luck

Tennis, like most sports, is largely a matter of scoring more points than your opponent. But the game-set-match scoring system used in tennis differentiates it from other contests. In basketball, scoring more points than your opponent defines victory. In tennis, scoring more points tends to lead to victory, but it's not a guarantee. It also matters when you score your points, and whether those points help you win sets.

In a recent post for FiveThirtyEight, Carl Bialik covered this topic, referring to a match in which a player wins despite winning fewer points as a "lottery match". Using data from Tennis Abstract, he found that 7.5 percent of men's matches ended in this way.

For this post, I will take a closer look at these lottery matches and use them to define a "luck" measure for tennis, which will be added to my tennis win probability graphs.

Sunday, June 29, 2014

Top Match Finder for Tennis

Win probability graphs are up and running for Wimbledon matches. With 127 matches played in each major tournament, it can be difficult to track down particularly noteworthy matches. So, I've added a top match finder, which returns the top matches according to either Excitement Index or Comeback Factor (with filters for date and tournament). This is similar to the Top Games Finder I added for the NBA earlier this year. Here is the link for Tennis: Top Match Finder.

Thursday, June 26, 2014

Live USA-Germany Win Probabilities

As always, I don't guarantee these live updates will work, but if all goes well, check the following link at noon ET for a live updating win probability graph: USA-Germany Live Win Probabilities.

These use my recently developed soccer win probability calculator. Note that my starting probabilities will differ somewhat from the betting odds, as the market has factored an anomalously high probability of a draw into its odds.

The graph currently shows yesterday's Honduras-Switzerland match, but should start updating once USA-Germany gets underway.

Tuesday, June 24, 2014

The Betting Odds of a US-Germany Tie

Collusion theories abound in advance of this Thursday's US-Germany World Cup match. A tie in that match would send both the US and Germany to the Round of Sixteen, and eliminate their other two Group G competitors, Ghana and Portugal. US coach Jurgen Klinsmann has already gone on record that he will not pre-arrange a tie with his former team. But it's not as if he'd admit to it if it were true.

And even if there were no formal collusion, both teams have an incentive to adopt a low-risk style of play that could increase the likelihood of a draw. Nobody can say for sure how each team will play this Thursday, and there are plenty of examples across all sports of teams "going for the win" even when the outcome is immaterial or even counter-productive to the team's long term objectives.

The pundits and fans can speculate, but the bookies have to pick a number and back it with money. Is there evidence from the sports books that the market expects an abnormally high draw probability?
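One common way to read a draw probability off a bookmaker's line is to invert the decimal odds and rescale so the three outcomes sum to one. The sketch below is a minimal illustration of that idea. The odds are made-up numbers, not the actual US-Germany line, and proportional rescaling is only one of several ways to strip out the bookmaker's margin.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for mutually exclusive outcomes to probabilities.

    The raw inverse odds sum to more than 1 because of the bookmaker's
    margin (the overround); dividing by that sum removes it proportionally.
    """
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    overround = sum(raw.values())
    return {outcome: p / overround for outcome, p in raw.items()}


# Hypothetical three-way line, for illustration only.
example_odds = {"win": 2.50, "draw": 3.20, "loss": 3.00}
probs = implied_probabilities(example_odds)
```

Comparing `probs["draw"]` for one match against the draw probability implied for similarly matched teams is one way to check whether the market is pricing in an unusual chance of a tie.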

Saturday, June 21, 2014

In-Match Soccer Probability

In what is becoming somewhat of an obsession, I've added a new sport to my suite of win probability tools: In-Match Soccer Probability.

The tool right now is fairly bare bones, but I hope to add some additional features (and de-uglify it) as the World Cup progresses. As it stands, the model provides win, loss, and draw probabilities as a function of the following: game time (in minutes), goal differential, and pre-match odds. I'm not the first to build an in-match model for soccer. The soccer analytics site Soccer Statistically has an interactive model which displays probabilities as a function of game time, goal differential, and home/away.

Home field advantage doesn't really apply to World Cup games (except for maybe the host country). So, the model I have is a bit more flexible, allowing you to input pre-match probabilities in a variety of formats. You can use the betting odds from your favorite bookie (Odds Portal is a handy reference), or you can input the probabilities directly from sites like FiveThirtyEight and numberFire.
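For readers who want to see what handling "a variety of formats" might involve, here is a small sketch that converts American and fractional odds to decimal odds. The function names are hypothetical and this is not the tool's actual code, just the standard conversions.

```python
from fractions import Fraction


def american_to_decimal(american):
    """Convert American (moneyline) odds such as +150 or -200 to decimal odds."""
    if american == 0:
        raise ValueError("American odds cannot be zero")
    if american > 0:
        return 1.0 + american / 100.0
    return 1.0 + 100.0 / abs(american)


def fractional_to_decimal(text):
    """Convert fractional odds written like '3/2' to decimal odds."""
    return 1.0 + float(Fraction(text))


def decimal_to_raw_probability(decimal_odds):
    """Raw implied probability; still includes the bookmaker's margin."""
    return 1.0 / decimal_odds


underdog = american_to_decimal(150)     # 2.5
favorite = american_to_decimal(-200)    # 1.5
same_dog = fractional_to_decimal("3/2") # 2.5
```

Once everything is in decimal form, the raw probabilities for win, draw, and loss still need to be rescaled to sum to one before they can serve as pre-match inputs.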

The Data

I built the model from play by play data from the past two seasons of five of the major European leagues (English Premier, Bundesliga, Serie A, Eredivisie, and La Liga). This worked out to about 3,000 matches. The model itself is a modified version of LOESS, where instead of building local linear regressions, I'm building local ordered logistic regressions (ordered logistic regression was necessary because soccer outcomes are trinary, not binary).


Sunday, June 15, 2014

Live Win Probabilities for Game 5

The win probability graph will be updating live for tonight's (potentially final) game between the Heat and the Spurs. Here is the link: Live NBA Win Probability Graph.

The graph may not start updating until midway through the first quarter. Until then, Game 4's graph will be displayed. Just click the refresh button to get the most up to date version of the graph.

Monday, June 9, 2014

Never Bet a Horse Named Joe

In this post, I will attempt to determine whether horses with popular first names (e.g. Michael, Mary, etc.) are overbet by the public.

With the Belmont Stakes over and another Triple Crown bid thwarted (by "cowards", no less), the public will go back to largely ignoring the sport of horse racing. So, this might not be the most ideally timed post, but here goes.

More so than in probably any other sport, gambling is a fundamental part of horse racing. In most cases, odds and payouts for horse racing bets are determined by a parimutuel system. Under parimutuel betting, the odds are set directly by the public, with no need for bookmakers or "sharps" to set payouts. As a result, horse racing odds are a pure reflection of the public's preferences (to the extent they're willing to vote with their wallets).
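To make the parimutuel mechanics concrete, here is a minimal sketch of how win-pool payouts fall out of the amounts wagered. The 15% takeout and the pool amounts are assumptions for illustration, and real tracks also round payouts down ("breakage"), which is ignored here.

```python
def parimutuel_payouts(win_pool, takeout=0.15):
    """Return the total payout per $1 bet for each horse in a win pool.

    win_pool maps horse name -> total dollars bet on that horse to win.
    takeout is the fraction of the pool kept by the track (assumed value).
    """
    total = sum(win_pool.values())
    net_pool = total * (1.0 - takeout)
    # Whichever horse wins, its backers split the entire net pool.
    return {horse: net_pool / amount
            for horse, amount in win_pool.items() if amount > 0}


# Hypothetical pool: the public has piled onto "Favorite".
pool = {"Joe": 5000.0, "Longshot": 1000.0, "Favorite": 14000.0}
payouts = parimutuel_payouts(pool)
```

Every extra dollar bet on a horse lowers its payout, so a horse that attracts money for reasons unrelated to its ability (a popular name, say) would pay less than its true chances warrant.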

Despite my statistical inclinations, when I bet the horse races, I tend not to put much thought into it. I'll often play number combinations that appeal to me (my birthday, my daughter's birthday, wedding anniversary, pi, etc.). I'll also play horses based solely on names. If there is a horse with "Michael" or "Mike" in its name, I'll almost always place a bet on it. I know it's not a "smart" bet, but it's a Pascal's wager of sorts for me. If I don't place the bet and the horse wins, I'll be kicking myself. The bet is insurance against this post hoc regret.

Friday, June 6, 2014

Guest Post at Deadspin: Why do NBA Playoff Games Take So Long?

I have a guest post up at Regressing (Deadspin's stat geek subsection): Why Do NBA Playoff Games Take So Long?

This builds upon my previous work on the length of NBA games, focusing now on what adds to game length in the NBA in general, as well as what drives the increased game time in the playoffs. I would love to take the credit for the amazing visuals in that post, but those are the work of Reuben Fischer-Baum.

Thursday, June 5, 2014

Game Six Revisited

With the Heat and the Spurs due to face off again in the NBA Finals, replays from last year's dramatic Game Six are sure to feature heavily in coverage of this rematch, particularly this shot:


In this post, I will use my win probability model to take a more detailed look at the ups and downs of one of the greatest NBA Finals games of all time.

A couple weeks ago, I added the 2012-13 season to my win probability graphs and box scores. Here is the game six graph (link):


The Excitement Index of 10.1 quantifies how much the win probability graph travelled over the course of the game (more movement = more excitement). The Comeback Factor of 199 means that, at their lowest point, the odds against the Heat winning the game were 199 to 1. MVP designates the player with the highest Win Probability Added (WPA) for the game. LVP (Least Valuable Player) designates the player with the lowest total WPA. Hero and Goat do the same for my clutch WPA stat. Note the symmetry between MVP Ray Allen's +41.3% WPA and LVP Manu Ginobili's -41.3%.
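As a rough sketch of how these two numbers can be computed from a win probability series: the excitement index below is the sum of absolute changes in win probability, and the comeback factor is the odds against the eventual winner at their lowest point. The exact scaling and sampling behind the graphs are not specified here, so treat both functions and the sample series as assumptions.

```python
def excitement_index(win_probs):
    """Total distance travelled by the win probability line."""
    return sum(abs(b - a) for a, b in zip(win_probs, win_probs[1:]))


def comeback_factor(winner_probs):
    """Odds against the eventual winner at their lowest point (X to 1)."""
    low = min(winner_probs)
    if low <= 0.0:
        return float("inf")
    return (1.0 - low) / low


# Made-up series for the eventual winner, bottoming out at 0.5%.
series = [0.5, 0.3, 0.005, 0.4, 1.0]
travel = excitement_index(series)   # 0.2 + 0.295 + 0.395 + 0.6
odds_against = comeback_factor(series)
```

A low point of 0.5% corresponds to odds against of 0.995 / 0.005, or 199 to 1.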

For now, let's start with the end of regulation and work backwards. Here is a breakdown of that game-tying three:

Sunday, June 1, 2014

French Open Graphs - Now with Market Odds

My win probability graphs for the French Open now feature a toggle which allows you to view probabilities that are calibrated to the betting market odds. For the original "50/50" versions, I assumed that each player has a 62.5% chance of winning a point when serving (and a corresponding 37.5% chance on return). The "Market" version adjusts the serve and return probabilities so that the starting win probability matches the win probability as implied by the pre-game betting line. The assumed serve and return probabilities are featured at the top of the graph, next to the score.
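The building block for this kind of model is the chance of winning a single service game given a fixed chance of winning each point. The sketch below uses the standard closed-form result for independent points. It is an illustration of that calculation only, not the code behind the graphs, and it stops at the game level rather than chaining up to sets and matches.

```python
def game_win_probability(p):
    """Probability the server wins a game, given point-win probability p.

    Assumes points are independent with a constant p (a simplification).
    """
    q = 1.0 - p
    # Win to love, to 15, or to 30: 1, 4, and 10 orderings respectively.
    before_deuce = p ** 4 * (1.0 + 4.0 * q + 10.0 * q ** 2)
    # Reach deuce (20 orderings of 3 points each), then win from deuce.
    reach_deuce = 20.0 * p ** 3 * q ** 3
    win_from_deuce = p ** 2 / (1.0 - 2.0 * p * q)
    return before_deuce + reach_deuce * win_from_deuce


hold = game_win_probability(0.625)   # server's chance of holding serve
break_ = game_win_probability(0.375) # returner's chance of breaking
```

With the generic 62.5% serve assumption, the server holds a little under 79% of the time. Because both players share the same serve and return numbers, the match as a whole starts at 50/50.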

What results is a more accurate picture of the real drama of each match. For example, here is the graph for the round 1 match between Serena Williams and Alizé Lim. Serena went into the match with a win probability of 98.4%, and the line barely budged from that number as Serena won in straight sets 6-2, 6-2.


And here is the graph from Ernests Gulbis' upset over Roger Federer. Gulbis began the match with a 36% win probability, and saw his chances fall to 7% midway through the second set, before battling back to win in five sets.



Monday, May 26, 2014

French Open Win Probability Graphs

As I did with the Australian Open (and intend to do with all grand slam events this year), I will be providing win probability graphs for all French Open matches. Here is the link: Tennis In-Game Win Probability.

My initial post on this topic, focusing on Victoria Duval's shocking upset of Samantha Stosur in the US Open, has some additional background. For these graphs, each match starts at 50/50 because I am using generic probabilities for winning a point on serve and return (62.5% and 37.5% respectively). It is possible to alter the probabilities to line up with the pre-match odds, but it's not something I can automate very easily.

The Excitement Index and Comeback Factor concepts I use for the NBA also carry over to tennis matches. The most exciting match of the first round of the French Open was a marathon between Facundo Bagnis and Julien Benneteau, featuring a fifth set that took 34 games to complete (Bagnis won 18-16). Here is the graph:



The biggest comeback comes from the women's first round, with Stephanie Voegele coming back from a 4.2% win probability to defeat Anna-Lena Friedsam.

Graphs will update daily(ish) throughout the tournament.

Sunday, May 25, 2014

Top Game Finder for the NBA

As I mentioned yesterday, win probability graphs are now available for the 2012-13 NBA season. For the lazy among you who don't want to cycle through all 2500+ games one by one, I have created a tool that helps you find the top games of the past two seasons: the Top Game Finder (as with far too many features on this site, I got the idea from Advanced Football Analytics).

Excitement, Comebacks, and MVPs

The top dropdown menu offers three options for sorting games:
  • Excitement: This option will sort games by the excitement index, highest first. The excitement index measures how far the win probability graph "travels" over the course of the game.
  • Comeback: This option will sort games by the comeback factor, which is the winning team's odds at their lowest point in the game. My win probability estimates only go down to the 4th decimal place, so any winning teams with a win probability of < 0.0001 show up with a comeback factor of "9999+".
  • MVP Performance: Win probability added (WPA) can be apportioned at the player level (see here for more information). I designate the player with the highest WPA in a game as the MVP. This option will sort games by the top MVP performance in terms of total WPA.
There are a variety of filtering options: by season, by team, by regular season/playoffs (or both), and by date.

A Sampling of Top Games