The always great Tom Scocca on the mental state of Oriole Nation as the 2008 campaign gets underway:
Beyond plain categories of optimism and pessimism live those of us who see a sparkling half-glass of water and know for sure that the Orioles are eventually going to take a crap in it.
More Orioles dyspepsia at Tom’s season preview at Deadspin.
My WNYC piece about sabermetrics and Alex Rodriguez (plus a little Orioles dyspepsia for my fellow orange-and-blackers) can now be heard online.
In today’s New York Times, Samuel Arbesman and Steven Strogatz argue that Joe DiMaggio’s streak wasn’t as miraculous as you think. They ran 10,000 Monte Carlo simulations of the history of major league baseball and found that, 42% of the time, someone had a hitting streak of 56 games or longer. In every case, there was some player in some season who put together a hitting streak of at least 39 games.
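For readers who haven’t seen a Monte Carlo experiment of this kind, here is a minimal sketch of the idea for a single player-season. It is not the authors’ code: it assumes every game is an independent coin flip with a fixed per-game hit probability, and the function names and parameters (p_hit_game, n_games, target, n_trials) are made up for illustration.

```python
import random


def longest_streak(games):
    """Return the length of the longest run of consecutive True values."""
    best = current = 0
    for got_hit in games:
        current = current + 1 if got_hit else 0
        best = max(best, current)
    return best


def simulate_season(p_hit_game, n_games, rng):
    """Simulate one season of independent games; return its longest hitting streak.

    Assumption: each game yields at least one hit with probability p_hit_game,
    independently of every other game.
    """
    return longest_streak(rng.random() < p_hit_game for _ in range(n_games))


def fraction_with_streak(p_hit_game, n_games, target, n_trials, seed=0):
    """Fraction of simulated seasons whose longest streak reaches `target` games."""
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    reached = sum(
        simulate_season(p_hit_game, n_games, rng) >= target
        for _ in range(n_trials)
    )
    return reached / n_trials
```

Calling fraction_with_streak(0.8, 154, 56, 10000) would estimate how often one hypothetical hitter reaches 56 games in a 154-game season. A simulation of all of baseball history would repeat this for every player in every year, each with his own hit probability, and keep track of the longest streak in each simulated history.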
That’s a nice experiment, but I don’t think it quite justifies the headline. The figure below shows that, in the simulation, long hitting streaks were strongly concentrated in the pre-1905 era, when higher batting averages were more common. In 1894 (the big spike in the chart below) the batting average for the entire National League was well over .300. The relevant question is not so much “is it surprising that someone had a 56-game hitting streak?” but “is it surprising that someone playing baseball under modern conditions had a 56-game hitting streak? And how likely is it ever to happen again?” The number I’d like to see is: of the 10,000 simulated histories, in how many did a player have a 56-game hitting streak after 1941?
Despite my criticism, I’m delighted the NYTimes published this. The main point — that unlikely-seeming events are actually quite likely, as long as you give them enough chances to happen — is a crucial and subtle one, which should be repeated in a loud voice at every possible opportunity.
Arbesman has a blog, which is mostly about computational biology and urban planning, not baseball. Strogatz has no blog, but his book Sync: The Emerging Science of Spontaneous Order is surely very good, based on the lectures I’ve seen him deliver.
I’m glad you enjoyed our article. We actually were able to tabulate the number of runs in which a streak of at least 56 games occurred after 1941. It’s 301. So, if past performance is indicative of future results (which need not be true), it seems that we would have to wait many, many more years for there to be a good chance of a long streak happening again.
In the NYT piece, the issue with the inclusion of the early years is slightly more complicated.
As I understand it, each iteration of the simulation was a 135-year run that identifies the longest streak over the 135 years. Any other information about other long streaks in that particular run is discarded. It is entirely possible that the facts that:
1. long streaks were more likely during the pre-1910 era and
2. a post-1910 long streak was excluded if there was a longer pre-1910 streak in the same run
make modern-era streaks look less likely than they really are.
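To make the masking effect concrete, here is a small sketch. It assumes the bookkeeping works the way I described above, which I have not confirmed, and the streaks in the example run are invented numbers, not simulation output.

```python
def longest_overall(streaks):
    """Longest streak in a run, ignoring when it happened."""
    return max(length for _, length in streaks)


def longest_since(streaks, first_year):
    """Longest streak in a run among seasons from first_year onward (0 if none)."""
    return max(
        (length for year, length in streaks if year >= first_year),
        default=0,
    )


# One hypothetical simulated history: (season, longest streak that season).
run = [(1894, 61), (1941, 57), (1978, 44)]

overall = longest_overall(run)      # the only number kept if just the maximum is recorded
modern = longest_since(run, 1910)   # what a per-era tally would also keep
```

In this made-up run the record belongs to 1894, so a tally that keeps only the single longest streak never registers that a modern-era player also passed 56 games. Tracking the maximum separately for each era would avoid that.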
Paulos addresses this briefly and incompletely in _Innumeracy_ (p. 47). He actually looks at Rose’s 44-game streak, claiming–but not providing the numbers–that a 44-game streak for a hitter with Rose’s average was actually less likely than a 56-game streak for a hitter with DiMaggio’s average. He works out a specific base probability of the streak (probability of a hit in a single game raised to the power of the length of the streak), discusses all the multipliers (number of seasons, games in a season, number of players in the league) that increase the overall historical probability, and concludes that such a streak is “actually not unlikely” in the history of baseball, but doesn’t provide an actual probability estimate.
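The base probability described above can be sketched as follows. This is my own reconstruction of the arithmetic, not Paulos’s figures: it assumes exactly four at-bats in every game, independent at-bats, and each player’s full-season batting average (.357 for DiMaggio in 1941, .302 for Rose in 1978) as the chance of a hit in any at-bat.

```python
def p_hit_in_game(batting_avg, at_bats=4):
    """Probability of at least one hit in a game.

    Assumption: a fixed number of independent at-bats, each a hit with
    probability batting_avg.
    """
    return 1 - (1 - batting_avg) ** at_bats


def p_streak(batting_avg, length, at_bats=4):
    """Probability of hitting safely in `length` specific consecutive games."""
    return p_hit_in_game(batting_avg, at_bats) ** length


# Season batting averages used as illustrative inputs.
dimaggio_56 = p_streak(0.357, 56)
rose_44 = p_streak(0.302, 44)
```

Under these assumptions rose_44 comes out smaller than dimaggio_56, which is in line with the claim that Rose’s streak was the less likely of the two. Both numbers are tiny on their own; the multipliers (seasons, games, players) are what lift the overall historical probability.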
Re: Sam’s comment–it’s a useful additional data point, but I think choosing “after 1941” as the target makes the 56-game streak seem less likely than it really was. Clearly there was a fundamental difference between the 1890s and the decades since, but the cut-off for the start of the appropriate “era” for comparison to DiMaggio is probably better set to the end of the dead-ball era than to the season of his streak.