Reproducible statistics study, part II

After 25 runs of the 1941 simulation, the trends that emerged in the smaller sample (ten seasons) continued to assert themselves in the larger one. While a wide variety of names appeared on the hitting streak leaderboards across the 25 season runs, the same group of 15-20 players made up the batting average leaders in every run.
So what does this statistical analysis mean? A hitting streak, while certainly involving a great deal of skill, also involves a great deal of luck and good timing. A batter who accumulates only 50 hits in a season can still find himself on a hot streak and collect 30 of those hits in consecutive games. That same hitter, however, would be hard-pressed to continue that run throughout an entire season.
Still, for all the study and practice devoted to it, hitting is largely an inexact science. Nowhere is that shown more clearly than in two cases from the 1941 season:

  • Ted Williams chose to bat in both games of a season-ending doubleheader for the Red Sox. By going 6-for-8 on the day, he raised his batting average from .3996 (which rounds to .400) to .406. Had he collected fewer than four hits that day, he would have finished under .400 for the season, and his .400 season would have been nothing more than a “what if?” moment for Williams, the Red Sox, and the American League.
  • Joe DiMaggio, on the other hand, was stymied by Cleveland third baseman Ken Keltner, who made two outstanding plays on sharp grounders from DiMaggio. Had even one of those gotten through, DiMaggio’s record could conceivably stand at 73 games (after the 56-game streak ended, DiMaggio followed it with a 16-game hitting streak). Incidentally, DiMaggio also holds the Pacific Coast League record for consecutive games with a hit, with 61.
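
The arithmetic behind both cases can be checked in a few lines. This is only a sketch: the 179-for-448 line Williams carried into the final day comes from the historical record rather than from this post, and the helper names are made up for illustration.

```python
# Assumption: Williams entered the final day 179-for-448 (historical record,
# not stated in the post) and had 8 at-bats in the doubleheader.
HITS_BEFORE, AT_BATS_BEFORE = 179, 448
DAY_AT_BATS = 8


def average(hits, at_bats):
    """Plain batting average: hits divided by at-bats."""
    return hits / at_bats


def min_hits_for(target, hits, at_bats, extra_at_bats):
    """Smallest number of hits in extra_at_bats that leaves the average >= target."""
    for day_hits in range(extra_at_bats + 1):
        if average(hits + day_hits, at_bats + extra_at_bats) >= target:
            return day_hits
    return None  # target is out of reach even with a perfect day


before = average(HITS_BEFORE, AT_BATS_BEFORE)
after = average(HITS_BEFORE + 6, AT_BATS_BEFORE + DAY_AT_BATS)
needed = min_hits_for(0.400, HITS_BEFORE, AT_BATS_BEFORE, DAY_AT_BATS)

# DiMaggio: 56 games, plus the game Keltner ended, plus the 16 that followed.
combined_streak = 56 + 1 + 16

print(f"before: {before:.4f}  after: {after:.4f}  hits needed: {needed}")
print(f"combined streak: {combined_streak}")
```

Under those figures, three hits in eight at-bats would have left Williams at 182-for-456, just short of .400.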

DiMaggio’s final 1941 batting average was .357, almost 50 points lower than Williams’s. This suggests that he was slightly more prone to cold streaks than Williams. Williams, for his part, had no significant hitting streak, so while he may have been consistent, his “streakiness” in comparison to DiMaggio’s was wholly unremarkable.
Essentially, the 56-game hitting streak seems to be held in greater reverence for one underlying reason: given the opportunity under equal conditions, Williams in 1941 could reproduce his .400+ batting average relatively consistently. DiMaggio, on the other hand, would be hard-pressed to duplicate a 56-game hitting streak under the same conditions. The rarity of such a streak makes it seem that much more impressive.
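For readers who want to poke at this idea themselves, here is a rough Monte Carlo sketch. It is not the simulation used for this study. It assumes a deliberately crude model in which every at-bat is an independent coin flip, each season is 140 games of exactly 4 at-bats, and the 1941 averages (.406 and .357) are treated as each hitter's true hit probability. All of those choices, and the function names, are assumptions made for illustration.

```python
import random


def simulate_season(hit_prob, games, at_bats_per_game, rng):
    """Return (batting_average, longest_hitting_streak) for one simulated season."""
    hits = 0
    streak = longest = 0
    for _ in range(games):
        # Each at-bat is an independent trial with the same hit probability.
        game_hits = sum(rng.random() < hit_prob for _ in range(at_bats_per_game))
        hits += game_hits
        streak = streak + 1 if game_hits > 0 else 0
        longest = max(longest, streak)
    return hits / (games * at_bats_per_game), longest


def share_of_seasons(hit_prob, predicate, seasons=1000, games=140,
                     at_bats_per_game=4, seed=1941):
    """Fraction of simulated seasons for which predicate(avg, longest) is true."""
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    count = 0
    for _ in range(seasons):
        avg, longest = simulate_season(hit_prob, games, at_bats_per_game, rng)
        if predicate(avg, longest):
            count += 1
    return count / seasons


# How often does a "true" .406 hitter finish at .400 or better?
williams_400 = share_of_seasons(0.406, lambda avg, longest: avg >= 0.400)
# How often does a "true" .357 hitter put together a 56-game streak?
dimaggio_56 = share_of_seasons(0.357, lambda avg, longest: longest >= 56)

print(f".400+ seasons at p=.406: {williams_400:.1%}")
print(f"56-game streaks at p=.357: {dimaggio_56:.1%}")
```

Because the model ignores walks, opposing pitchers, and varying at-bats per game, the exact percentages should not be taken seriously. The point is only the gap between the two numbers.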
Season leaderboards, runs 11 through 25:
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25


2 Responses to Reproducible statistics study, part II

  1. Sean Smith says:

    Ted Williams not having a Joe-type hitting streak has nothing to do with his “streakiness”; it’s because he walked so much, and was a lot more likely to break a hitting streak by going 0-for-1 and still reaching base 3 times. I believe Ted does hold the record for on-base streak with 84.
    Hard to believe Orlando Cabrera at 63 has the longest of the Retrosheet era.

  2. Ted does hold the OB streak, with 84 – he did that in 1949.
    I never would have guessed about Cabrera, though. He definitely fits the bill as streaky – his next highest OB streak is 10 (which he did 6 times) and 9 (3 times). Ridiculous.
