How do I become a Sabermetrician?

Occasionally, I get e-mail from someone who reads StatSpeak or some of the other writings that I sprinkle into the blogosphere, and my favorite always goes something like “I’ve read a bunch of stuff around and I’m interested in learning how to do my own Sabermetric research. Can you help me?” Yes, I can. I’m a therapist by training, and do you ever need help!
So you wanna be a Sabermetrician, eh? Well, first you should know that there’s no school for Sabermetrics (well, there is a class out there…). We’re all self-taught in one way or another, mostly in the form of guys using skills from their day jobs to study baseball. It’s part of the charm of the field. Most of us have respectable day jobs and we use this just to pass the time. Just about anyone can get themselves a free blog and start posting their work. That’s how I started out. So if you want to be a Sabermetrician, then by the power vested in me by no one in particular and the state of confusion, I now pronounce you an official Sabermetrician. The certificate’s in the mail.
Now of course, you don’t want to be just any Sabermetrician. You want to be one of those cool guys that actually gets hired by an MLB team someday. You want to publish a book. You want to be the next big thing. I suppose I’m not any of those things either, but I can give you a few tips on how to get started.

  1. I can’t stress this enough. There are far too many junk stats out there. A junk stat goes something like this: “I just came up with the formula HR x 15 + RBI x 7 + HBP x 4.5 + SLG x 90 based on how important I thought each one was.” I’ve heard that particular reasoning far too many times. There are formulae that look like that, but they are developed using a very specific process. I’ve seen several cases of someone posting one of those, being ignored, and then disappearing never to be heard from again. I’m guessing that they were frustrated that no one saw their brilliance. Don’t start with a junk stat and be frustrated. There is good work to be done and you might be the one who can do it. Read on.
  2. Spend a few months reading Sabermetric work. There are plenty of good sites out there. We all link to each other. Read their stuff. Read the comments. Read Baseball Between the Numbers. (When you get advanced enough, read The Book: Playing the Percentages in Baseball.) Go over to the Baseball Fever boards and read the discussions that go on over there. Participate.
  3. One of the things that can frustrate newcomers is the thought that their brilliant ideas that came to them in the middle of the night… have already been studied by someone else. We’ve all done studies on the illusion of clutch and why RBIs are a bad stat (and bad grammar). They’ve been studied to death… unless you can take a little more nuanced look at things. And to do that, you’ll need a good understanding of what research has come before you. Probably the biggest mistake that people make is to try to jump into Sabermetrics with both feet, not really knowing what they’re doing. Slowly, my friend. Slowly.
  4. You’ve probably already read Moneyball, which should give you a broader idea of what’s going on. We are not in the business of making baseball more “pure” or more enjoyable or more special or more cosmic or more whatever. (Do watch Field of Dreams, because it’s a good movie… but understand that’s not what we do here.) Sabermetrics is the scientific method applied to the goal of winning a baseball game/championship. I’ll type that again. Sabermetrics is the scientific method applied to the goal of winning a baseball game/championship. May I recommend that you have some background in the scientific method before you begin. I’m not saying that you need to be a Ph.D. level physicist, but simply that you need to understand how science works. Yes, we spend a lot of time debunking some sacred conventional wisdom. Be prepared to have some of your basic beliefs about baseball challenged.
  5. It’s good to be a fan. In fact, I recommend that you watch/listen to/go to as many baseball games as you can. It’s OK to have a favorite team and to occasionally be irrational in evaluating them, because you love them. Ask me about growing up with the Cleveland Indians some time. But, with that said, understand that science is a dispassionate process. We go into a situation not looking to confirm that so-and-so is the best player in baseball, but we come up with a reasonable definition of things and let the numbers fall where they may. Sometimes that means realizing that the numbers don’t bear out what you used to think as a kid (or as a fan now). That’s actually a lot harder to come to terms with than you might imagine. If you can get past that, you’ll make a fine Sabermetrician.
  6. Are you in college? (Surprise! A lot of the guys who travel in these circles are in/barely out of college themselves!) Sign up for a class in statistics. Trust me on this one. Even if you’re an English major, it’ll come in handy both in Sabermetrics and in the rest of life. Plus, it’ll teach you a little bit of how to use some of the computer programs that Sabermetricians like to use. And computers make life so much easier.
  7. Draw from your background. I’m a psychologist by training. Most of the questions that intrigue me center around “Why did he do that?” That’s what I’ve been trained to look for in life. You may think that your chosen field has nothing to do with baseball, but you’re wrong. Sure, there are a lot of guys who are physics/math majors who look at algorithms for figuring out what a player will do next year, and that’s fine. I’m personally waiting for a good Sabermetric sociologist to come along to figure out why it is that baseball teams and society in general are so poor in assigning value to baseball players.
  8. You do not need a doctorate in math. Sure, the more analytical techniques you know, the more complicated questions you can ask. And you do have to know some statistical/analytical techniques, but some of the biggest discoveries in Sabermetrics involve little more than knowing what a correlation is (e.g., DIPS) and are simple to the point of elegance. The math can be taught. The real work in Sabermetrics is perceptual and creative. It’s in seeing the game in a slightly new way and understanding how that insight can be measured and then tested. The rest is just an engineering problem.
  9. Keep a running idea list of things that you want to accomplish and ideas that you’ve had. Any time I have an idea pop into my head, I put it into my special file. When I need a project, I go back and pick one that sounds fun. Even if you don’t know exactly how you’d do it, if an interesting question or idea occurs to you, write it down.
  10. You’ll notice that I haven’t specifically pointed you to any how-to guides. The reason is that you’ll come across those in the process of reading through things. And you’ll also learn, by osmosis, what statistical tricks others use. Don’t focus so much on the actual technical details of how Pitch f/x works or what’s available from Retrosheet. If you really get restless, download some Retrosheet files and play around with them, but you’ll probably learn naturally just by doing some reading.
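
To make the point in item 8 concrete, a correlation really is a small calculation. The sketch below computes a Pearson correlation between two seasons of the same statistic for the same players. The numbers are invented purely for illustration, and `pearson_r` is just a helper name chosen here, not something from a particular package.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up numbers: the same five pitchers in two consecutive seasons.
babip_year1 = [0.285, 0.310, 0.295, 0.302, 0.291]
babip_year2 = [0.305, 0.288, 0.299, 0.290, 0.301]

r = pearson_r(babip_year1, babip_year2)
print(f"year-to-year correlation: {r:.2f}")
```

A value near zero would say that a player's number one year tells you little about his number the next year; a value near one would say the stat is very stable. Deciding which stats to run this on is the creative part.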

World Famous StatSpeak Roundtable: June 30

OK, so I lied. Last week, I said that there would be no roundtable this week. Through the magic of technology, we were able to gather together a roundtable, although don’t ask exactly how that was accomplished. It involves the fact that as this is being published, Pizza Cutter doesn’t have internet access.
Anyway, this week, in the ultimate act of nepotism, we welcome as our guest Corey Seidman from MVN’s Phillies blog Phanatic Phollow Up. Won’t you read on as we discuss setup guys, division leaders and Curt Schilling?
Question #1: Of the current division leaders, which ones don’t you expect to be there at the end of the season? Whom do you expect will overtake them?
Corey Seidman: We find ourselves at the halfway point with the Red Sox, White Sox, Angels, Phillies, Cubs, and Diamondbacks in first place. I see all six of these teams winning their respective divisions.
The Red Sox have been the best team in the American League to this point, with their only criticism being their sub-.500 road record. But they haven’t been as bad as they have been unlucky on the road. They were swept in Toronto following their season opening series against the Athletics … in Tokyo. It’s hard to hold a team accountable when they’re given a day to travel from Japan to Canada and start another series. Of their 19 other road losses, 10 were one-run games. This doesn’t show that they can’t win on the road, it merely shows they have been unlucky on the road through the first half of the season.
The White Sox pitching has been great, which is why they find themselves ahead of the surprising Twins and disappointing (yet surging) Tigers. The Sox rank second to only Oakland in ERA (3.43), opponent’s OBP (.307), and WHIP (1.24). They lead all of baseball with 49 quality starts. Their bullpen is second in reliever’s ERA and features two late-inning guys with 0.84 WHIPs in Scott Linebrink and Matt Thornton, as well as Bobby Jenks.
Unfortunately, their two most heralded run producers are having the worst seasons of their careers, in the same year. Paul Konerko has a .368 slugging percentage (career .490), and Jim Thome has driven in only 38 runs in 73 games. Thome is on pace for 81 RBI, his lowest total in a full season since 1995. Despite Konerko’s and Thome’s struggles, the White Sox are still the best team in the A.L. Central. Carlos Quentin, A.J. Pierzynski and Joe Crede have held them together offensively, and let’s face it, Konerko and Thome couldn’t be any worse in the second half than they were in the first.
The Angels are the best team in the A.L. West. Their 3.5 game lead and 4-3 record against the second place A’s don’t show their dominance, but don’t expect the A’s to continue their winning ways much longer. They’ve pitched out of their mind, and we’re one Rich Harden pulled muscle and one Justin Duchscherer look in the mirror away from seeing them fall fast. The Angels have the best starting staff in baseball from 1-5 with the emergence of Joe Saunders and the return of Ervin Santana. Francisco Rodriguez is on pace to set the single-season record in saves, Scot Shields continues to look like he could close for any other team in baseball, and the back end of their bullpen has only improved this year with the addition of Jose Arredondo (1.40 ERA, 0.72 WHIP, 19 K in 19.1 IP). Add in a collection of little speedy guys (Figgins, Izturis, Aybar, Kendrick), good defense (Hunter and Matthews Jr.) and a slugger returning to form (Vlad), and you’ve got a team that makes the playoffs every year.
For the wild card, I expect the Yankees to make a late push as they have done in recent years to overtake the Rays. Right now, the Rays look like an unstoppable team, but they just strike me as being a year or so away from seriously competing. I could see them winning it but could also see them having a bad September and letting the Yankees slip past, then have a disappointing season in 2009 that leads everyone to say this year was a fluke, before making the playoffs in 2010. Either could happen but neither would surprise me.
The Phillies are the best team in the N.L. East, and will win it, barring a catastrophic injury (Utley or Hamels). They are considerably younger and healthier than the Braves and Mets, and haven’t had nearly the number of different lineups the other two have had. The Marlins were a young team that overachieved for two months and are coming back to reality now. They don’t have the pitching to continue. Tell me all you want about Josh Johnson coming back, but I see a starting staff whose best piece is Scott Olsen and his 4.89 K/9. Andrew Miller is struggling, Mark Hendrickson looks like this year’s Adam Eaton, and only Ricky Nolasco is picking it up lately. The Phils have the 3rd most quality starts in the N.L., the best bullpen ERA in baseball, and a lineup that is finally breaking out of a 10 game slump. Ryan Howard has struggled all season, yet still leads the N.L. with 67 RBI. Imagine if he were hitting .250 instead of .215. He’d have closer to 80.
The Cubs had been awesome all season, but have struggled lately. Regardless, they are the Red Sox of the N.L. this year. They are the best team, have a ridiculous home record of 33-10, and are below .500 on the road. They lead baseball with 442 runs scored, are 4th in the N.L. in runs allowed, and their Pythagorean W/L is a game better than they are. Offensively, they have done it through periods without Alfonso Soriano. Probably because they have 7 regulars hitting above .280. They’ll have home field advantage.
The Diamondbacks will win it because they are in the worst division in baseball. The N.L. West was extremely tight last year, but the Padres and Rockies forgot how to win this season. The Dodgers aren’t good enough to overtake the D-Backs or they already would have. The Diamondbacks have been scuffling for a while and still haven’t lost much ground. The advantage in pitching goes to Arizona and their two aces, as does the division. They aren’t anything spectacular offensively, and Eric Byrnes might have only hustled and gritted his way to a big contract, but nobody else in the West is good enough.
The wildcard will go to the Cardinals here. The Brewers are making a push, but they have shown us over the last season and a half that they are a streaky team. The Cards had been getting it done without Albert Pujols, and despite the numbers suggesting Ryan Ludwick can’t keep this up, he likely won’t need to for the Cards to win the wildcard. (Check who leads the Cardinals in ERA. You won’t regret it.)
Eric Seidman: So, right now we’re looking at the Phillies, Cubs, and Diamondbacks in the National League. If forced to bet money it would be put on all three of these teams winning their division. As a Phillies fan I am still not sold on the division being as easy as it has been; easy as in, the Phillies lose 8 of 11 games and gain ground. I just have a funny, non-saber feeling, that if the Mets or Braves sweep them in an upcoming three-game series, it could rejuvenate their season and propel them toward some relative success.
I don’t see the Cubs dropping off though keep in mind the Cardinals have Wainwright, Carpenter, Mulder, and Clement on the DL. Who knows if any of them will come back and/or be successful, but it is a possibility. Ultimately, though, I really don’t see them posing a significant threat to the Cubs (in the regular season).
Out west, the DBacks should win the division fairly easily but we all saw last year how an insane winning streak at the end of a season can come out of nowhere and potentially skyrocket a team toward the top of the division. Without Rafael Furcal the Dodgers, essentially, have an ugly offense, even going hitless last night (yet still winning!). So, in the NL I will pick the three current winners though if I have to pick a team to potentially overtake the leaders I will go with Mets, Cards, Dodgers.
In the AL, I see the Red Sox, White Sox, and Angels winning their divisions. The Tigers have been on fire lately and the Athletics have performed well this year, too. Oh, and the Rays! And the Yankees! And the Orioles are 41-38! Okay, I’ll calm down a little. I’ll take Red Sox winning the division with the Rays winning the Wild Card and the Yankees finishing 1-2 games behind the Rays. I’m going to take the White Sox to win the Central, and the Angels to, very soon, separate themselves from the As.
Pizza Cutter: As I write this, the AL division leaders are Boston, the White Sox (by half a game over Minnesota), and the Angels. I think all three are vulnerable. I’ve sung the praises of Tampa Bay previously, although that one might just be hope on my part. Boston’s still the better team, but weird things happen in baseball. The White Sox will win the Central. If Minnesota is actually leading the division by Monday morning, put them in as my pick to be de-throned. The Angels are a few games up on the A’s, but the A’s have the far better run differential. And the A’s will probably make a few moves at the trading deadline. This could turn into a matter of who adds more at the trading deadline. In the NL, on the other hand, I don’t see anyone moving up over Philly or the Cubs. The NL West doesn’t matter because everyone in the division is slouching toward mediocrity. It’ll probably be Arizona… but that’s only because someone has to win it.

Does Movement Influence BABIP?

A couple weeks back, Pizza Cutter found an interesting oddity in that Troy Percival had consistently posted very, very low BABIPs. In response, Dave Studeman brought up Mariano Rivera–another pitcher with consistently low BABIPs–and how it has been somewhat proven that elite relievers can register atypical results with this statistic. Mentioned on a few other sites was the idea that movement may be a central cause for these lower batting averages on balls in play; due to said movement, the sweet part of the bat would fail to meet the ball as it normally would on more “standard” pitches.
Last week, we explored the relationship between fastballs 92+ mph and BABIP, examining how it differed at each mile per hour interval. Fastballs from 92 mph to 96 mph clocked in between .290 and .310, the established general range of BABIP for pitchers, before dipping to .273 at 97 mph and shooting back up to .293 for all thrown 98 mph or higher. The 97 and 98+ groups were too small to support a conclusion at the 5% significance level; we would need around 1,650 balls in play and, combined, had 1,032. Still, the combo of 97 and 98+ offered a .279 BABIP, perhaps suggesting that the .293 at 98+ was the anomaly, not the .273.
Today we will look at the movement within the same 92+ mph range in order to attempt to answer the question posed in the title. First, though, a pre-requisite of sorts with regards to movement: the relationship between horizontal and vertical components is not yet well understood, other than some telltale signs that aid in the classification of pitches. For instance, a two-seam fastball will have much more horizontal movement than vertical movement, whereas four-seam fastballs generally have less horizontal movement and more vertical movement.
I queried my database for all fastballs 92+ mph and separated the results into groups by movement rather than velocity intervals. The signs (+-) were reversed so that righties and lefties could be grouped together as well. First, here is a sample size grid of sorts, showing all balls in play for each horizontal group and vertical subgroup; note that the subgroups differ for each horizontal movement grouping so they will be called simply below average or above average as they were essentially determined by the average or a similar type of cutoff point. The reasoning for this is the aforementioned relationship between movement components; for fastballs, lower horizontal movement will usually correlate with higher vertical movement with the inverse also being true.

Horizontal    Below Vert BIP    Above Vert BIP
0-4 in        3,735             2,456
4-8 in        6,823             4,718
8-12 in       4,355             3,227
12+ in        408               335

BABIP takes a while to stabilize, more so than many other statistics, so I wanted to have at least 2,000 balls in play for each sub-grouping, preferably more. From 0-12 inches of horizontal movement we have large enough samples to notice discrepancies. Greater than 12 inches, however, offers just 743 balls in play. While I definitely plan to revisit this and the velocity articles later in the year when more data is available, for now, I am going to exclude the group with more than 12 horizontal inches.
Looking at the other three groups and their two subgroupings each, here are the Ball%, Strike%, HR%, and BABIP:

Horiz.    Vert.    B%      K%      HR%     BABIP
0-4       Below    35.9    45.6    0.53    .289
0-4       Above    34.9    49.8    0.48    .286
4-8       Below    35.8    43.7    0.64    .302
4-8       Above    35.8    48.2    0.58    .292
8-12      Below    35.6    41.4    0.54    .315
8-12      Above    36.5    45.6    0.58    .298

The percentage of balls essentially stays in the same general range while the strikes fluctuate. The subgroupings with above average vertical movement have much higher strike percentages than the others. So, before we even get to BABIP, it seems that higher vertical movement in these larger groups results in a higher percentage of strikes.

The BABIPs for horizontal movement groups with below average vertical movement register: .289, .302, and .315. The BABIPs for horizontal movement groups with above average vertical movement clock in at: .286, .292, and .298. Judging from these results it would appear that, yes, movement does have some type of effect on BABIP. Each horizontal group posted a higher BABIP when it had below average vertical movement: .289 to .286, .302 to .292, and .315 to .298. Additionally, all pitches 92+ mph with 0-4 inches of horizontal movement, regardless of whether they fell above or below the vertical cutoff point, produced a BABIP lower than .290, which is generally the lower edge of the .290-.310 range we expect it to fall into.
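
One way to gauge how much weight these gaps can bear is a two-proportion z-test on each pair. This is only a sketch under stated assumptions: the hit totals are not listed above, so they are back-calculated here from the rounded BABIPs and the balls-in-play counts in the first table, and the normal approximation and the function name `two_proportion_z` are choices made for this illustration.

```python
import math
from statistics import NormalDist

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z statistic and two-sided p-value for the difference of two proportions."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# (balls in play, BABIP) for below- and above-average vertical movement,
# copied from the two tables above.
groups = {
    "0-4 in": ((3735, 0.289), (2456, 0.286)),
    "4-8 in": ((6823, 0.302), (4718, 0.292)),
    "8-12 in": ((4355, 0.315), (3227, 0.298)),
}

results = {}
for label, ((n_below, babip_below), (n_above, babip_above)) in groups.items():
    # Assumption: hits are rebuilt from the rounded BABIP, so they may be
    # off by one or two from the real totals.
    hits_below = round(n_below * babip_below)
    hits_above = round(n_above * babip_above)
    results[label] = two_proportion_z(hits_below, n_below, hits_above, n_above)
    z, p = results[label]
    print(f"{label}: z = {z:.2f}, two-sided p = {p:.3f}")
```

A positive z means the below-average vertical group had the higher BABIP. The p-values indicate whether each gap is larger than what sampling noise alone would tend to produce at these sample sizes.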

Tomorrow I’ll come right back with the total number of unique pitchers and those comprising at least 1% and at least 5% of the sample, in order to see if the results are skewed in any way. For now, though, it appears that, regardless of your horizontal movement, having above average vertical movement will produce a lower BABIP at each horizontal interval.

Vindicating Derek Jeter's fielding at short (sorta)

Introducing OPA! OPA! is my new (still in the works) fielding system for use with Retrosheet, one that I’ve been meaning to create for a while now. Last week, I teased the beginnings of OPA!, at least the ground ball part. This week, a more full exploration of ways in which we can rate infield play without the benefit of knowing where the ball went.
First, the framework. You may be wondering what OPA! stands for. Other than my goal of making it the most festively-named fielding system out there (next time you go to a Greek wedding, they won’t be shouting UZR! or FRAA!), OPA! is short for OPAAA, or out probability added above average. Consider a ground ball. Any ground ball will do. The infielder’s job is to turn it into an out. He can either succeed or fail at this job, but several things must happen in order for him to succeed. He must have:

  • Good range: he has to get himself and his glove in the neighborhood of the ball
  • Good hands: he has to actually get the ball into his glove
  • Good arm: he has to then throw the ball to first (or second?) and put it somewhere in the neighborhood of the first baseman’s glove
  • The first baseman has to catch the ball

All of these things must happen in order for a ground ball to become a ground out. One of the major problems that I see with some of the major fielding systems is that they treat all of these as one giant package. Either the play was made or it was not. Sure, the point of the game is to make the play, but let’s think about the following situation. A ground ball to short where the SS gets to the ball, fields it cleanly, makes a throw right to the first baseman… who drops the ball. Sure, the 1B will pick up an error for his efforts, but the play not being completed, the SS gets no credit when he did everything right!
One of the things that spawned the new generation of fielding stats was an understanding that fielding percentage, indeed, the entire concept of an “error” was flawed. An error means that the fielder did something right, namely that he got to the ball. Yes, he booted it, but we don’t have a debit for those guys who are too slow to even get to the ball to begin with. So, an error actually penalizes one of the skills that you hope a player has. But, the type of error given (fielding, throwing) does tell us where things went wrong. It’s time to develop that line of logic more fully.
The average ground ball to somewhere on the third base side of the infield has an X% chance on average of being turned into an out. We can play with the parameters around pitcher handedness and batter handedness and if I had more detailed data, hit location, but there will be some number that emerges. The very act of the fielder ranging to the ball and at least stopping it from going to the outfield adds some additional percentage chance that the ball will become an out. Letting the ball through destroys what chance there was to make an out. (I’m sure most of you have figured out by this point, but if anyone’s still lagging, I’m basing this model on the idea of WPA.) If the third baseman makes the play, we ought to credit him with the out probability he adds based on his range. If the ball goes through to left field, we should assign the 3B some blame, along with the shortstop. How to chop up that blame was neatly explored last week.
But, now let’s take a look at what happens if the third baseman gets to the ball (range), but boots it (hands). He’ll be charged with a fielding error, and the out probability that he built up by getting to the ball is now gone. To more accurately reflect what happened though, we can put his range OPA in the “range” basket and debit his “hands” basket. (And if the first baseman drops the ball, we can debit his “hands” basket, while leaving the third baseman’s contributions alone.) Now, we have a much more fine-grained idea of where a player’s strengths and weaknesses are.
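
The basket bookkeeping can be sketched in a few lines. Everything numeric here is hypothetical: the out probabilities after each step are placeholders, not measured values, and the names `STEPS` and `score_play` exist only for this illustration. The sketch also ignores the splitting of blame between neighboring fielders when a ball gets through.

```python
from collections import defaultdict

# Hypothetical out probabilities at each stage of a ground ball
# (illustrative placeholders only, not real estimates).
P_START = 0.50    # ball is hit, nothing has happened yet
P_REACHED = 0.80  # fielder got to the ball (range)
P_FIELDED = 0.90  # ball is cleanly in the glove (hands)
P_THROWN = 0.98   # throw is on target (arm)
P_OUT = 1.00      # first baseman caught it

# (basket, player, probability before the step, probability after it succeeds)
STEPS = [
    ("range", "fielder", P_START, P_REACHED),
    ("hands", "fielder", P_REACHED, P_FIELDED),
    ("arm", "fielder", P_FIELDED, P_THROWN),
    ("hands", "first_baseman", P_THROWN, P_OUT),
]

def score_play(failed_step=None):
    """Return {(player, basket): out probability added} for one ground ball.

    failed_step is the index in STEPS where the play broke down,
    or None if the out was recorded.
    """
    ledger = defaultdict(float)
    for i, (basket, player, before, after) in enumerate(STEPS):
        if i == failed_step:
            # The out probability built up so far drops to zero,
            # and this basket takes the whole debit.
            ledger[(player, basket)] += 0.0 - before
            break
        ledger[(player, basket)] += after - before
    return dict(ledger)

booted = score_play(failed_step=1)   # got to it, then booted it
dropped = score_play(failed_step=3)  # first baseman dropped a good throw
print(booted)
print(dropped)
```

In the booted-ball case the fielder keeps his range credit and takes the debit in his hands basket; in the dropped-throw case the fielder's three baskets are all credited and only the first baseman's hands basket is debited.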
That’s the theory. For the numerical spaghetti and some 2007 results (including a few things about Jeter), keep reading.

World Famous StatSpeak Roundtable: June 23

This week’s roundtable will have to last you two weeks. Next week (6/30), we will sadly have to interrupt our usual Roundtable service due to the fact that the table and everything else that I own will be in a moving van working its way across a couple of states. But, this week, we do have the fun of welcoming David Appleman, proprietor of FanGraphs.com, where he serves up all sorts of baseball-related statistical gooeyness for us statistically inclined folks. Read on as David, Eric, and Pizza talk about whom they want on the mound in Game 7, lacking balance, and the Blue Jays’ rotation.
Question #1: If you had to win a single game of baseball, which active starting pitcher would you most want on the mound?
David Appleman: I think the obvious and almost unanimous choice a year ago would have been Johan Santana, but he’s not quite pitching at such a ridiculous level anymore and I’m hesitant to put him at the very top of any list I’d have to make today. Roy Halladay is one of the few other pitchers that comes to mind and he’s currently having the best season of his career. He leads the majors in K/BB, and is an extreme groundball pitcher to boot, meaning you’re going to keep your home runs to a minimum.
After sifting through the stats a bit more, Josh Beckett really stood out to me. While his ERA (3.84) doesn’t show it, he’s pitching arguably better than he ever has. He’s striking out over a batter an inning and his walks are at a career low. And while I typically don’t put a whole lot of weight on post-season statistics, he does seem to have a knack for the big game, which shows up in both his ERA and his peripherals.
Finally, C.C. Sabathia, who also doesn’t have a great ERA (4.06), has been as good as anyone after April. He’s striking out batters at a career pace and there’s not really anything bad you can say about the guy’s pitching. Since I brought up Beckett’s postseason, I guess it’s only fair I mention Sabathia’s. Yes, he was horrible last year and issued way too many walks, but I’d still give him the benefit of the doubt.
While it’s a tough decision, I think I’d have to go with Roy Halladay as my #1 guy right now, closely followed by C.C. Sabathia, and then I’d have to insert Santana as my #3 choice (because despite his slight decline, he’s still very good), with Beckett left as an alternate.
Eric Seidman: It’s tough because we have to set some parameters. If we’re talking about one game right now with only active pitchers (meaning nobody on the DL) I would probably pick CC Sabathia or Cole Hamels. Sabathia’s poor start was mainly attributed to two consecutive early starts in which he allowed 9 runs each. Since then he has been stellar. Hamels is Hamels, one of the best pitchers in baseball. If we’re talking about anyone, I’ve always been a member of the “give me John Smoltz in a must-win” bandwagon. According to FanGraphs’ clutch stats, Vicente Padilla has been the guy people should want on the mound the most; after watching him for years in Philadelphia I’ll have to disagree there, though. If I have to pick I’ll say Hamels simply because he has proven himself capable of “stopping” and has the mental makeup required to sustain the confidence level required in a must-win situation.
Pizza Cutter: Brandon Webb keeps the ball on the ground the best and has a quite-good K/BB ratio and has an FIP of 3.00. Roy Halladay has the best K/BB ratio and a quite-good keep it on the ground ratio, plus an FIP of 2.85. Halladay throws harder and is less reliant on his fastball, throwing it only 45% of the time (to Webb’s 70%). Halladay gets my vote, as of right this moment. Odd that Halladay probably wouldn’t be the first name off anyone’s lips in the general public. A guy that good toils in obscurity. Sad.

Heater Getting Hotter

Yesterday we looked at the averages of fastballs from different velocity groups as a means to compare certain pitchers to their like-throwing peers as opposed to an extremely broad group. This way, we can compare Matt Cain’s movement to the average movement for all 94 mph fastballs to determine how effective it has been.
In doing so an anomaly surfaced: all velocity groups had a BABIP between .290-.310 except those thrown 97 mph. Those heaters registered a .273 BABIP, nearly 20 points below the others. Sure enough, fastballs registering 98 mph or higher jumped back to .293, leading many of us to believe something screwy, flukey, or any other adjective ending with the suffix “-y” slapped on its end, was taking place. After exploring some logical possibilities, like a split-half reliability test, or a look at BABIP by count and location, the results either stuck or were inconclusive due to small sample sizes at work.
We had a really nice discussion in the comments section wherein more possibilities were tossed around. The first of these suggestions involved testing the sample size via a Bernoulli Trial. As was shown by commenter Adam Guetz, for an observed .273 when a .295 was expected, we would need approximately 1,650 balls in play. For 97 mph pitches there were 707 balls in play, fewer than half of what is required, and just 325 balls in play for 98+ mph. While the sample sizes of actual pitches thrown are large enough to conduct certain analyses, those of balls in play for anything 97 mph or higher were not. Here are the BIP sample sizes:

  • 92 mph, 18.85% BIP and 7,759 total
  • 93 mph, 18.05% BIP and 6,023 total
  • 94 mph, 18.05% BIP and 4,389 total
  • 95 mph, 17.04% BIP and 2,827 total
  • 96 mph, 17.26% BIP and 1,596 total
  • 97 mph, 16.69% BIP and 707 total
  • 98+ mph, 16.11% BIP and 325 total
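
A figure in the neighborhood of 1,650 can be reproduced with the usual normal approximation to a run of Bernoulli trials. The sketch below assumes a two-sided test at the 5% level and treats .295 as the true rate; those choices, and the function name `required_bip`, are assumptions of this illustration rather than a record of the commenter's exact calculation.

```python
import math
from statistics import NormalDist

def required_bip(expected, observed, alpha=0.05):
    """Balls in play needed before `observed` falls outside the two-sided
    (1 - alpha) interval around `expected`, using the normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    std_dev = math.sqrt(expected * (1 - expected))  # one Bernoulli trial
    gap = abs(observed - expected)
    return math.ceil((z * std_dev / gap) ** 2)

needed = required_bip(expected=0.295, observed=0.273)
bip_97, bip_98_plus = 707, 325  # from the list above

print(f"needed: {needed}, available at 97 and up: {bip_97 + bip_98_plus}")
```

Because the required sample grows with the inverse square of the gap, halving the difference you want to detect roughly quadruples the balls in play you need.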

The samples from 92-96 appear large enough, but the combination of 97 and 98+ still comes a good 500 balls in play below 96 mph on its own. Another suggestion called for the total number of different pitchers at each interval as well as the number of those comprising certain percentages of the samples. This way, we might be able to deduce that 97 mph pitches were skewed due to a small group representing the whole; for the lower velocities, which are more common, it is much more likely for the pitches to be more evenly divided amongst a larger group of pitchers. Here is the number of pitchers for each group, along with those comprising at least 1% of the sample and those comprising at least 5% of the sample:

  • 92 mph: 574 total pitchers, 8 at 1%, 0 at 5%
  • 93 mph: 485 total pitchers, 18 at 1%, 0 at 5%
  • 94 mph: 516 total pitchers, 21 at 1%, 0 at 5%
  • 95 mph: 337 total pitchers, 25 at 1%, 0 at 5%
  • 96 mph: 237 total pitchers, 28 at 1%, 1 at 5%
  • 97 mph: 160 total pitchers, 25 at 1%, 4 at 5%
  • 98+ mph: 102 total pitchers, 18 at 1%, 8 at 5%

In the 97 mph group, the four pitchers with at least 5% of the sample combine to represent 23% of the total.† For 98+ mph, the eight pitchers with at least 5% of the sample combine to represent 56% of the total.
From these results it seems that 92-96 mph are safe from a drastic case of small sample size syndrome. Anything at 97 mph or above, though, seems to be the opposite, suffering from a small sample of balls in play as well as skewed results due to a small group of pitchers representing most of the total pitches.
Another commenter, Dave Evans, pointed out that he received a significance of 0.55 when comparing 97 and 98+, meaning their BABIPs were not statistically significantly different; for significance, that value would need to be equal to or below 0.01. This led me to group 97 and 98+ together, to enlarge the sample. The result was 1,032 balls in play, 288 hits in play, and a .279 BABIP. This suggested the possibility that perhaps it was not 97 mph that deserved the adjective+suffix “-y” treatment but rather 98+ mph pitches. Granted, it is still a small sample, even more so for BABIP, but perhaps we will find out, as more data becomes available, that 97 mph is the threshold, as Pizza Cutter noted, for “blowing it by the hitter.”
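
As a rough check on how far the pooled group sits from expectation, a normal approximation can be turned into a one-sample test. The .295 baseline is the expected rate quoted in the Bernoulli discussion above; the two-sided test and the variable names are assumptions of this sketch.

```python
import math
from statistics import NormalDist

hits, balls_in_play = 288, 1032  # pooled 97 and 98+ mph fastballs
expected = 0.295                 # baseline rate from the sample-size discussion

babip = hits / balls_in_play
std_error = math.sqrt(expected * (1 - expected) / balls_in_play)
z = (babip - expected) / std_error
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"BABIP = {babip:.3f}, z = {z:.2f}, two-sided p = {p_value:.2f}")
```

With these inputs the p-value comes out well above 0.05, which fits the caution that the pooled sample is still too small to call.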
It will require several hundred more pitches in play to determine this with any certainty but I will be keeping very close tabs as the season progresses. For now, though, we can effectively compare individual pitchers to the average movement components, B%, K%, and BABIP for their specific velocity, not an entire group, at least for heaters 92 mph to 96 mph.
