Monday, June 9, 2014

Never Bet a Horse Named Joe

In this post, I will attempt to determine whether horses with popular first names (e.g. Michael, Mary, etc.) are overbet by the public.

With the Belmont Stakes over and another Triple Crown bid thwarted (by "cowards", no less), the public will go back to largely ignoring the sport of horse racing. So this might not be the best-timed post, but here goes.

More so than in probably any other sport, gambling is a fundamental part of horse racing. In most cases, odds and payouts for horse racing bets are determined by a parimutuel system. Under parimutuel betting, all bets of a given type go into a common pool, and the odds are set directly by how the public's money is spread across the horses, with no need for bookmakers or "sharps" to set payouts. As a result, horse racing odds are a pure reflection of the public's preferences (to the extent they're willing to vote with their wallets).
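As a rough illustration of how parimutuel odds fall out of the betting pool, here is a minimal Python sketch for a single win pool. The function name, the pool amounts, and the illustrative 17% takeout (the share of the pool the track keeps before paying winners) are hypothetical, and real tracks also round payouts down (breakage), which this sketch ignores.

```python
def parimutuel_win_odds(pool_by_horse, takeout):
    """Return {horse: (odds_to_1, payout_on_2_dollar_ticket)} for one win pool.

    pool_by_horse maps each horse to the total dollars bet on it to win.
    takeout is the fraction the track removes before paying winners.
    Breakage (rounding payouts down) is ignored in this sketch.
    """
    total = sum(pool_by_horse.values())
    net_pool = total * (1.0 - takeout)  # the track's cut comes off the top
    result = {}
    for horse, amount in pool_by_horse.items():
        if amount <= 0:
            raise ValueError(f"no money bet on {horse}; odds are undefined")
        per_dollar = net_pool / amount  # returned per $1 bet, stake included
        result[horse] = (per_dollar - 1.0, 2.0 * per_dollar)
    return result


# Hypothetical pool: $6,000 on A, $3,000 on B, $1,000 on C, 17% takeout.
odds = parimutuel_win_odds({"A": 6000.0, "B": 3000.0, "C": 1000.0}, takeout=0.17)
for horse, (to_one, payout) in odds.items():
    print(f"{horse}: {to_one:.2f}-1, a $2 ticket pays ${payout:.2f}")
```

With these made-up numbers, the $8,300 left after takeout is split among the winning tickets, so horse C goes off at 7.30-1 and a $2 win ticket on it returns $16.60. The public's money alone sets the price.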

Despite my statistical inclinations, when I bet the horse races, I tend not to put much thought into it. I'll often play number combinations that appeal to me (my birthday, my daughter's birthday, wedding anniversary, pi, etc.). I'll also play horses based solely on names. If there is a horse with "Michael" or "Mike" in its name, I'll almost always place a bet on it. I know it's not a "smart" bet, but it's a Pascal's wager of sorts for me. If I don't place the bet and the horse wins, I'll be kicking myself. The bet is insurance against this post hoc regret.

I'm sure I'm not the only one who bets this way at the track. If this type of betting happens on a large scale, we should be able to measure it by looking at average payouts for horses with popular first names. If "named" horses get overbet by the public, then the average return on investment for those horses should be lower. In other words, are horses with popular first names getting bet above their fundamental value by the public?

The Data Source

My data source for this study is somewhat stale but robust. It is the result of a data gathering project I embarked upon several years ago. At the time, I was entertaining the notion of building a quantitative horse racing model, following in the footsteps of former actuaries like Alan Woods, who used a mathematical model to make millions betting on horses in Hong Kong. Woods passed away in 2008, having used his winnings to spend the last years of his life living the lifestyle of a benign Bond villain in the Philippines.

My own modeling efforts proved futile, though, and I abandoned the project. But the data remains, and I can dust it off for this post. It contains nearly all races run at the major California tracks from 2007 to 2010: Santa Anita, Del Mar, Golden Gate Fields, and the sadly defunct Hollywood Park. That's 11,513 races in total and 91,936 starters to bet on.

I then matched this data against the Social Security Administration's lists of the top 100 boys' and girls' names of the past 100 years, separating out every horse whose name contains one of the names on either list. For each group, I calculated the average return on investment from placing an equal bet on every horse. Here are the results:
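The grouping and ROI calculation described above can be sketched roughly as follows in Python. This is a minimal sketch under stated assumptions, not the code behind these results: it assumes whole-word matching (so "Johnny" and "John" count as different names, as in the tables below), the name sets and race records are tiny hypothetical stand-ins rather than the SSA lists or my data, and ROI is total returned divided by total wagered, minus one, for a $2 bet on every horse.

```python
import re

# Hypothetical stand-ins for the SSA top-100 lists (only a handful shown).
BOY_NAMES = {"mark", "joe", "jack", "johnny", "john", "michael"}
GIRL_NAMES = {"rose", "grace", "mary", "jean", "kelly"}


def name_category(horse_name):
    """Classify a horse as 'boy', 'girl', or 'none' by whole-word name matches.

    Whole-word matching is an assumption: it keeps 'Marked Man' out of the
    'Mark' group. Boy names are checked before girl names.
    """
    words = set(re.findall(r"[a-z]+", horse_name.lower()))
    if words & BOY_NAMES:
        return "boy"
    if words & GIRL_NAMES:
        return "girl"
    return "none"


def roi_by_category(starts, stake=2.0):
    """starts: iterable of (horse_name, won, payout), where payout is what a
    winning ticket of size `stake` returns (stake included).

    Returns {category: ROI} with ROI = total returned / total wagered - 1.
    """
    wagered, returned = {}, {}
    for horse_name, won, payout in starts:
        cat = name_category(horse_name)
        wagered[cat] = wagered.get(cat, 0.0) + stake
        returned[cat] = returned.get(cat, 0.0) + (payout if won else 0.0)
    return {cat: returned[cat] / wagered[cat] - 1.0 for cat in wagered}


# Hypothetical results: (horse name, won?, $2 win payout)
starts = [
    ("Average Joe", False, 0.0),
    ("Joe's Dream", True, 7.00),
    ("Mary's Lamb", False, 0.0),
    ("Marked Man", True, 4.20),
    ("Blue Moon", False, 0.0),
]
print(roi_by_category(starts))
```

Whether "contains" should mean a whole word or any substring is a real design choice: substring matching would fold "Johnny" into "John" and pull in names like "Marked Man", so it changes which horses land in each group.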

As you can see, the return on "named" horses has been noticeably worse than on horses without a popular first name. Also note that the average ROI is pretty bad for all three categories. This is because the tracks take a guaranteed rake in parimutuel betting: they take their cut from the betting pool first (around 17% for win/place/show pools) and then distribute the remainder to the winners. It's one of the reasons it is hard to make money in horse racing. The market would have to be pretty imbalanced to overcome what the tracks take off the top.
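To put a number on "pretty imbalanced", here is a back-of-the-envelope expected-value calculation for a single win bet, ignoring breakage. The notation is introduced here for illustration: P is the total win pool, B_i the amount bet on horse i, q_i = B_i / P the public's share on that horse, p_i its true probability of winning, and t the takeout.

```latex
\text{return per \$1 if horse } i \text{ wins} = \frac{(1-t)\,P}{B_i} = \frac{1-t}{q_i},
\qquad q_i = \frac{B_i}{P}

E[\mathrm{ROI}_i] = p_i \cdot \frac{1-t}{q_i} - 1

\text{if the public is right } (p_i = q_i): \quad E[\mathrm{ROI}_i] = -t

\text{break-even requires} \quad \frac{p_i}{q_i} \ge \frac{1}{1-t} \approx 1.20 \ \text{ for } t = 0.17
```

In words: a horse the public overbets (q_i > p_i) returns worse than -t on average, and to overcome a 17% takeout a horse has to win about 20% more often than its share of the pool implies.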

I also ran a more rigorous analysis of the data using logistic regression. Somewhat to my surprise, the coefficients from that analysis did not pass a standard 5% or 10% significance test (the coefficient for girls' names just missed, at 11%). So the gap may still be noise. The only way to know for sure is to look at more data, and horse racing is never lacking for that. If only data acquisition were easier, or at least not prohibitively expensive.
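The exact specification of that regression isn't spelled out above, so the following is only a sketch of one plausible setup, not necessarily the model behind the quoted significance levels: regress whether each starter won on the log of its share of the win pool (a proxy for the public's implied probability) plus boy-name and girl-name dummy variables. If named horses are overbet, their dummy coefficients should come out negative, meaning they win less often than the money on them implies. The fit uses Newton-Raphson in plain Python with Wald z-tests for the p-values; the helper names and the simulated data are hypothetical. Treating starters as independent also ignores that exactly one horse wins each race (a conditional logit would handle that; this sketch does not).

```python
import math
import random


def _sigmoid(eta):
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _score_and_info(xs, ys, beta):
    """Log-likelihood gradient X^T (y - p) and Fisher information X^T W X."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, y in zip(xs, ys):
        p = _sigmoid(sum(b * v for b, v in zip(beta, x)))
        w = p * (1.0 - p)
        for i in range(k):
            grad[i] += (y - p) * x[i]
            for j in range(k):
                info[i][j] += w * x[i] * x[j]
    return grad, info


def fit_logit(xs, ys, max_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson.

    xs: feature rows, each starting with 1.0 for the intercept; ys: 0/1 outcomes.
    Returns (coefficients, standard errors, two-sided Wald p-values).
    """
    beta = [0.0] * len(xs[0])
    for _ in range(max_iter):
        grad, info = _score_and_info(xs, ys, beta)
        step = _solve(info, grad)  # Newton step: info^-1 * gradient
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_info(xs, ys, beta)
    k = len(beta)
    # Standard errors: square roots of the diagonal of the inverse information.
    ses = [math.sqrt(_solve(info, [float(j == i) for j in range(k)])[i]) for i in range(k)]
    pvals = [math.erfc(abs(b / se) / math.sqrt(2.0)) for b, se in zip(beta, ses)]
    return beta, ses, pvals


# Simulated starters (hypothetical). The public's pool share q is used as the true
# win probability, so the name dummies have no real effect in this fake data.
random.seed(1)
xs, ys = [], []
for _ in range(2000):
    q = random.uniform(0.02, 0.5)
    u = random.random()
    boy, girl = float(u < 0.1), float(0.1 <= u < 0.2)
    xs.append([1.0, math.log(q), boy, girl])
    ys.append(1 if random.random() < q else 0)

coefs, ses, pvals = fit_logit(xs, ys)
for label, b, p in zip(["intercept", "log share", "boy name", "girl name"], coefs, pvals):
    print(f"{label:>10}: coef {b:+.3f}, p = {p:.3f}")
```

On real data, the p-values printed for the two name dummies are the numbers that would be compared against the 5% and 10% thresholds.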

Return on Investment by Name

If the results in aggregate aren't statistically significant, then the results for each individual name certainly shouldn't be treated with any kind of credibility. But I have the data, so why not share it, if only for entertainment value? Here is the average ROI for the top 10 boys' names (top 10 in terms of races run).

Name     Races  Win %     ROI
Mark       209  15.8%  -19.1%
Joe        193   8.3%  -57.6%
Jack       187  16.6%  -19.8%
Johnny     105  22.9%   11.6%
James       79  10.1%  -24.6%
John        64  14.1%  -62.3%
Bobby       50  18.0%  -30.8%
Michael     48   8.3%  -52.9%
Albert      46   4.3%  -58.5%
Brian       43  11.6%  -57.2%

Horses named Joe have fared poorly (hence the post title). Horses named John have done even worse. But horses named Johnny have done well, for whatever reason.
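To put a rough number on how noisy these per-name figures are, here is a minimal Python sketch of a 95% Wilson score interval for a win rate. The win counts are back-calculated from the races and win percentage columns above (for example, 8.3% of Joe's 193 races is about 16 wins), and the function name is hypothetical. The interval covers only the win rate; ROI is likely noisier still, since a single longshot winner can swing it.

```python
import math


def wilson_interval(wins, races, z=1.96):
    """Wilson score interval for a win rate; z=1.96 gives roughly 95% coverage."""
    if races <= 0:
        raise ValueError("need at least one race")
    p_hat = wins / races
    denom = 1.0 + z * z / races
    center = (p_hat + z * z / (2 * races)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / races + z * z / (4 * races * races)) / denom
    return center - half, center + half


# Win counts back-calculated from the table (races x win%, rounded).
for name, races, wins in [("Joe", 193, 16), ("Johnny", 105, 24), ("Michael", 48, 4)]:
    lo, hi = wilson_interval(wins, races)
    print(f"{name}: {wins}/{races} wins, 95% interval {lo:.1%} to {hi:.1%}")
```

Even Joe, with nearly 200 races, gets an interval running from roughly 5% to 13%, and the intervals shrink only slowly as races accumulate, which is why single-name results are mostly entertainment.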

Here is the same table for girls' names:

Name     Races  Win %     ROI
Rose       215  13.5%  -13.8%
Grace       98   5.1%  -64.6%
Mary        83   4.8%  -84.1%
Jane        60  13.3%  -37.7%
Jean        56  25.0%   -1.3%
Kelly       53  22.6%   17.0%
Anna        51   9.8%  -50.4%
Crystal     47   4.3%  -79.4%
Emily       46   2.2%  -90.2%
Marie       46   4.3%  -83.9%

It seems appropriate that horses with Kelly in their name generated the most profits.

1 comment:

  1. You could have cross-checked the names against the odds themselves, as the public tends to overbet longshots, leading to smaller takeout percentages on the lower-priced horses.