The Danger of a Small Sample Size

Fair warning: little geographic content makes it onto the Twelve Mile Circle today. Mostly I’ll focus on statistics. No, no, don’t go running for the door quite yet. It will be fun, and the statistical slant will be really mild: grossly oversimplified, full of sweeping generalizations, and involving no actual mathematics.

I went to a Washington Nationals baseball game yesterday evening. If you’re a baseball fan I already know what you’re thinking, but actually this is a happy story.

Nationals Park
Nationals Park on September 30, 2009; a crappy image taken with my mobile phone. Check out the size of the crowd.

Imagine the Nats down 3-4 in the bottom of the ninth inning, bases loaded, two outs, a batter facing a full count, and BAM, a walk-off grand slam homer with the Nationals taking the game 7-4 over the Mets.(1) The Post declared "the last scene of the 81st and final home game of the season was unlike any moment that preceded it, perfect and delirious…"


This was the last home game of the year, a final chance to catch a ballgame in Washington until next April. I’m not a huge fan of any given team. I have no loyalties. However, I do enjoy attending live sporting events, being part of the crowd, soaking up the atmosphere, participating in the pageantry, and drinking a couple of beers with friends. Buy me some peanuts and Cracker Jack, and all that. I’d only been to one other game the entire season, so I figured I should fit this last opportunity into my schedule when it was offered to me.

Ryan Zimmerman Washington Nationals
Ryan Zimmerman on May 24, 2009; when I remembered to bring a much better camera

Oddly, the only other game that I’d happened to attend this year occurred on May 24, 2009 and also featured a grand slam home run. That time it came off the bat of Adam Dunn, putting the Nats out front in the seventh inning and resulting in a victory over the Orioles, 8-5.

Getting back to the title of this article, the Danger of a Small Sample Size: if one were to examine only these two instances, one might conclude that the Washington Nationals are a fantastic team, producing a grand slam home run each game as they cruise to another victory.

Let’s get real. Grand slams don’t happen very often. The chance of seeing one occur in two separate pseudorandom games — the only two games attended in an entire season — has to be considerably less than one percent. I’m sure the baseball statisticians out there could probably find the actual number of grand slams, the actual number of games, and the joint probability of two such occurrences. If there are any baseball math nerds out there who’d like to take this on as a challenge and post the calculation/results in the comments below, it would be a truly outstanding contribution. Until then I’m going to forgo precision and guess "it is fairly remote."

Let’s look at the other improbable thought. This one is easy to dispel because the entire population is well known, and in fact is published widely every day. As of October 1, 2009, the Nationals have won 55 games and lost 103, with four games remaining in the regular season. This isn’t good. In fact it’s the worst record in Major League Baseball. Any season with a hundred or more losses is considered pretty dismal, and the Nationals have accomplished this dubious achievement two years in a row.

Sample size of two: the Nationals are a great team.
Larger sample: well, not so much.

Either way, I’ll be out at the ballpark next spring. Maybe I’ll bring them some luck.

(1) My apologies to those of you who live in parts of the world not familiar with this sport. It’s probably best to simply note that this was "an exciting and somewhat rare event."

3 Replies to “The Danger of a Small Sample Size”

  1. Taking a crack at this….

    Each team plays 162 games in the regular season, and there are 30 teams in MLB. Since every game involves two teams, that makes 162 × 30 / 2 = 2430 games played per season.

    From 2001 to 2008, the average number of grand slams was 129.6 +/- 4.7 per season. (I note that 1999 and 2000 were each higher at 140 and 176, but I’m no baseball statistician, so I don’t know why.)

    I have read that having two grand slams in one game has only happened 13 times in the history of baseball, so I’m going to assume that most grand slams in a season are in distinct games.*

    That means that the probability of seeing one in a given game, lately, has been about 5.3+/-0.2% (129.6 out of 2430). And seeing one in each of two independent games should be about 0.28+/-0.02% (the single-game probability squared).

    That makes your chances of choosing two games and having a grand slam be hit in both equal to about one in a little over 350.

    That being said, obviously, chances vary depending on the teams, the hitters, the pitchers, the weather conditions, etc.

    (*Even though I know that the Nats’ Josh Willingham had two in one game this season in an away game against the Brewers.)

    1. Craig: I went through a similar mental exercise albeit less precise and came up with a figure within the same (ahem) ballpark. I got the 2430 pretty quickly but didn’t have access to the grand slam per season figures (except for a single season, 2008 I think, that was listed in Wikipedia) so I didn’t have great confidence in it. You’ve obviously added some much needed precision which is much appreciated. One in ~350 is definitely pretty unusual but hardly anything approaching lightning-striking territory. I think what you’re saying is that I’m not lucky enough to quit my day job and play the lotto full time? 😉

  2. This is remarkably close to a crossover with my second-favorite blog, Five Thirty Eight (a politics-meets-stats blog run by a guy who got his start as a professional baseball statistician).

    If I had to guess why the ’99 and ’00 grand slam numbers were so much higher (and I say this without looking anything up, so I’ll probably be wrong here), it would be that those were basically two big Steroid Era years. So maybe home runs in general happened much more often, and grand slams rose with them.
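For anyone who wants to check the arithmetic in the first reply, here is a minimal Python sketch of that back-of-the-envelope estimate. It reuses the figures quoted in the comment (162 games per team, 30 teams, 129.6 +/- 4.7 grand slams per season) and leans on the same simplifications: every grand slam lands in a different game, and the two games attended are independent. The variable names are made up for illustration, and the first-order error propagation is an assumption about how the +/- figures were derived, not something the commenter spelled out.

```python
# Figures quoted in the first reply; not independently verified here.
GAMES_PER_TEAM = 162
TEAMS = 30
SLAMS_MEAN = 129.6   # average grand slams per season, 2001 to 2008
SLAMS_SD = 4.7       # quoted +/- on that average

# Every game involves two teams, so divide the team-game total by two.
games_per_season = GAMES_PER_TEAM * TEAMS // 2

# Assumption: each grand slam happens in a distinct game.
p_one = SLAMS_MEAN / games_per_season
p_one_err = SLAMS_SD / games_per_season

# Assumption: the two games attended are independent draws.
p_both = p_one ** 2
# First-order error propagation for a squared quantity (assumed method).
p_both_err = 2 * p_one * p_one_err

one_in = 1 / p_both

print(f"Games per season: {games_per_season}")
print(f"Grand slam in one game: {p_one:.1%} +/- {p_one_err:.1%}")
print(f"Grand slam in both games: {p_both:.2%} +/- {p_both_err:.2%}")
print(f"Roughly one in {one_in:.0f}")
```

Under those assumptions the result lands a little over one in 350, matching the reply. Swapping in the higher 1999 or 2000 totals for `SLAMS_MEAN` shows how sensitive the odds are to the era.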
