Breaking Down the Heater
June 19, 2008 32 Comments
Back on December 20th, John Walsh wrote a very interesting article at The Hardball Times, taking everything recorded by the Pitch F/X system in 2007 and, among other things, calculating the average velocity, horizontal movement, and vertical movement for the four major pitches: fastball, curveball, slider, and changeup. The results showed that the average fastball clocked in at 91 mph with -6.2 inches of horizontal movement and 8.9 inches of vertical movement. The author acknowledged that he did not differentiate between four-seamers, two-seamers, and cutters, but rather lumped them all together in determining the averages, even though two-seamers and cutters differ from four-seamers in velocity and movement.
While I plan on calculating the averages for all different sub-groupings of pitches at some point, what recently piqued my interest was finding the averages for different velocity groupings. As in, what is the average horizontal movement for all 94 mph fastballs? Or, the BABIP for 98 mph fastballs?
With that knowledge we could effectively compare certain pitchers to the means of their velocity grouping rather than overall averages of every grouping. Instead of comparing, say, Edwin Jackson’s 94 mph fastball to a group including those who throw slower, we can compare him to his “peers.”
I started at 92 mph and queried my database for groupings (92-92.99, 93-93.99, etc.) all the way up to 98+ mph. I figured 92 mph would be a solid starting point since the sample size would be extraordinarily large–large enough for four-seamers to overcome the two-seamers and cutters that may inevitably sneak in. Anything 98 mph or higher was grouped together to ensure a large enough sample since, as you will see below, the higher the velocity, the smaller the sample:
Velocity | Sample | %
92 mph | 41,157 | 31.4
93 mph | 33,368 | 25.5
94 mph | 24,315 | 18.6
95 mph | 16,586 | 12.7
96 mph | 9,245 | 7.1
97 mph | 4,236 | 3.2
98+ mph | 2,018 | 1.5
All of the sample sizes here were large enough for analysis. Even though the 98+ group appears to be 1/20th the size of the 92 mph group, that speaks more for the latter than against the former.
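For anyone who wants to reproduce the grouping, here is a minimal Python sketch. It assumes each pitch is reduced to a single velocity reading in mph; the function names `velocity_group` and `group_shares` are hypothetical, and this is not the actual database query behind the table.

```python
import math
from collections import Counter


def velocity_group(mph, floor=92, cap=98):
    """Return the bucket label for one fastball, or None if it is below the floor.

    92.00-92.99 -> "92", 93.00-93.99 -> "93", ..., anything >= cap -> "98+".
    The floor and cap defaults mirror the groupings described in the text.
    """
    if mph < floor:
        return None
    if mph >= cap:
        return f"{cap}+"
    return str(math.floor(mph))


def group_shares(velocities):
    """Map each bucket label to (sample size, percentage of all bucketed pitches)."""
    labels = [g for g in map(velocity_group, velocities) if g is not None]
    counts = Counter(labels)
    total = sum(counts.values())
    return {g: (n, round(100 * n / total, 1)) for g, n in counts.items()}
```

Calling `group_shares` on a list of velocities returns the same two columns as the table: the count in each bucket and its share of all pitches at 92 mph or higher.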
Next, how do the movement components look for each group?
Velocity | Horiz. (in.) | Vert. (in.)
92 mph | -6.34 | 9.24
93 mph | -6.28 | 9.51
94 mph | -6.16 | 9.80
95 mph | -5.98 | 10.07
96 mph | -5.84 | 10.23
97 mph | -5.89 | 10.41
98+ mph | -6.03 | 10.38
It should be fairly apparent that the tendency is for horizontal movement to decrease in magnitude and vertical movement to increase as the velocity increases, at least through 96 mph. At 97 mph, both movement components increase. At 98+ mph, the vertical movement stagnates while the horizontal movement jumps quite a bit.
The next area to discuss includes B%, K%, HR%, and BABIP:
Velocity | B% | K% | HR% | BABIP
92 mph | 35.9 | 44.6 | 0.65 | .302
93 mph | 36.3 | 45.1 | 0.55 | .303
94 mph | 35.5 | 45.9 | 0.55 | .292
95 mph | 35.8 | 46.4 | 0.76 | .303
96 mph | 35.2 | 47.0 | 0.54 | .291
97 mph | 36.1 | 46.8 | 0.41 | .273
98+ mph | 33.9 | 49.3 | 0.69 | .293
The percentage of balls doesn’t move too much until its dip of over two percentage points at 98+ mph. The percentage of strikes, however, seems to increase with velocity. There is no real discernible pattern in the home run percentages; the highest rate came on 95 mph heaters while the lowest came on those registering 97 mph.
Speaking of the 97 mph group, notice anything odd? Perhaps that their BABIP is .273, a full eighteen points below any other group? Prior to getting the results I expected each group to fall somewhere in the .290-.310 range; that all of them did except the .273 struck me as very peculiar.
I spoke to several other analysts, all of whom initially mentioned small sample size syndrome, only to retract the assessment after learning the sample sizes in question. The dropoff in home run percentage was tossed around, as well, since fewer home runs means more balls in play to be counted in the BABIP formula. This is a “could be,” though, rather than a “definitely why.” As was mentioned in these discussions, too, it could be nothing; perhaps there were more warning track flyballs that just missed leaving the yard as opposed to weaker hit balls.
Now, while the 4,236 pitches at 97 mph constitute a large enough sample to analyze, the balls in play were not yet numerous enough to break into individual counts or locations. When they do get big enough this could serve as a means of explanation; perhaps something in either or both does not jibe with the other velocity groups. Of those with significance, however, there was a .263 BABIP on 0-0 counts, and a .286 BABIP on pitches in the middle of the strike zone.
Pizza Cutter, or “The Master of Statistical Reliability” as I like to call him (yeah, a nickname for a nickname), suggested that BABIP is one of those stats that is super-unreliable, even with my large sample of pitches. I did a split-half reliability test, randomly splitting the sample in half, and calculating the BABIP of each half. For those unfamiliar, this serves to test the reliability of the sample; if it truly is large enough then no matter how we cut the sample in half we will have fairly convergent results. If the results were wildly divergent then we are dealing with an unreliable sample. The BABIPs of the two groups were .271 and .275, which essentially threw that idea out of the window.
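For those who want to try a split-half check themselves, here is a rough Python sketch. It assumes each ball in play is coded as 1 for a hit and 0 for an out, which simplifies the full BABIP formula; `babip` and `split_half` are hypothetical helper names, and the 193-hits-in-707 example is only an estimate implied by a .273 BABIP, not my actual data.

```python
import random


def babip(outcomes):
    """Share of balls in play that went for hits (1 = hit, 0 = out)."""
    return sum(outcomes) / len(outcomes)


def split_half(outcomes, seed=None):
    """Randomly split the balls in play into two halves and return both BABIPs."""
    rng = random.Random(seed)   # a fixed seed makes the split repeatable
    shuffled = list(outcomes)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return babip(shuffled[:mid]), babip(shuffled[mid:])


# Illustrative data only: roughly what a .273 BABIP on 707 balls in play implies.
example = [1] * 193 + [0] * 514
first_half, second_half = split_half(example, seed=2008)
```

Running the split with several different seeds gives a feel for how far apart the two halves tend to land by chance alone.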
Something interesting to consider was how, in each of these tables, all patterns seemed to stop when they reached 97 mph or higher. The horizontal movement increased instead of its decreasing trend; vertical movement decreased after its increase at 97; the percentage of strikes ceased increasing; and home runs reached their low. Could be something, could be nothing, but interesting nonetheless.
For now I am going to chalk this BABIP drop up to extreme random statistical variation and hope that you loyal readers out there might chime in with some more ideas to investigate. Otherwise, though, when gauging the movement components, percentage of balls/strikes/home runs, or even BABIP, we can compare individual pitchers to their “like-minded” averages by velocity grouping. If I get enough feedback involving different aspects to measure regarding these fastballs we will look at that soon, in the next day or two. Otherwise, next week I have something similar to this, looking at BABIP by movement.
Mr. Guido, thanks for that, it should clear some stuff up. So we’re dealing with small samples for 97-98 as under 900-1000 have been put in play, and 8-10 pitchers in each range account for half of them.
Eric – Walsh’s article already shows what ‘good command’ is…the interesting part is that it changes depending on throwing velocity.
You are right – Walsh did find effectively no difference in results when comparing velocities of low and outside fastballs…but he also found that velocity does matter when thrown anywhere else but outside and low, and particularly on the inside of the plate.
If you go back to Walsh’s graphs [ http://www.hardballtimes.com/main/article/how-fast-should-a-fastball-be/ ] (I agree, this article was excellent… I’m still learning things from it weeks afterward) and compare the 94+ dots, hard throwers are more successful throwing high and tight than only throwing low and outside. The graphs show that the 94+ fastballs are most successful outside at the waist (where the larger rising movement is most effective against a flat-planed swing) or high and inside, whereas the slower fastballs are effective only on the outside of the plate.
So harder-throwing pitchers with command have an advantage against their less than 88 mph brethren: they can throw to both sides of the plate and be successful with their fastball. It’s a big advantage for those with the control to exploit it. To come back around and answer your question… Good fastball command for a 94+ pitcher is throwing high and inside AND outside and low or waist high; good fastball command for a slower pitcher is ONLY throwing outside and low.
It should be noted with all of this that we’re not comparing pitch counts (3-1 vs. 1-2, etc.) and patterns (thrown after breaking balls, offspeed pitches, consecutive fastballs, etc.), so really I think we are simply saying in this case that ‘fastballs thrown at random times X’ achieve these success rates. I’d love to see how slower fastball pitchers compare against faster pitchers under these more specific situations – then we can start talking about the importance (or maybe lack thereof) of pitch selection vs. velocity vs. location. Right now we only are comparing the latter two… which is good, but it isn’t the whole story.
Man; I love this stuff. This is the science of pitching we’re talking about here. This is amazing material for any pitcher at any level.
I now have to go back to work as well!
Thanks again.
Walt, good points and ideas. Definitely something I’ll look into sometime soon. This is the ultimate in analysis; the batter-pitcher matchup is not-so-arguably the most important aspect of the game and, in a few years’ time, we will have enough data to answer more age-old questions quantitatively.
I doubt if the small pool of pitchers for the 97 and up categories has too much of an effect here, since ‘true talent’ levels for BABIP have been shown to have a pretty low variance in general, even for ‘hard throwers’. Team/park effects add some variance, but even so I think the biggest constraint here is the total number of pitches.
Have you thought about using a stratified random sample for your data? Separate the fastballs into 9 or more sections by location, and then randomly sample from each of your mph groups, making sure that you take the same amount from each location. I would bet that this rise in BABIP (if not from random variation) is due to the fact that your last group is so much broader in terms of mph. It’s possible that really hard fastballs, over 98 mph, only get swung at if they are meat, and thus are easier to get base hits off of once hit.
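A rough Python sketch of that kind of stratified draw is below. It assumes each pitch is a dict carrying a `zone` label (1 through 9, say) and that it is run once per mph group; the field name and the `stratified_sample` helper are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(pitches, per_zone, seed=None):
    """Draw the same number of pitches, without replacement, from every location zone.

    pitches: iterable of dicts that each carry a 'zone' key (assumed field name).
    Raises ValueError if any zone holds fewer than per_zone pitches.
    """
    rng = random.Random(seed)
    by_zone = defaultdict(list)
    for pitch in pitches:
        by_zone[pitch["zone"]].append(pitch)

    sample = []
    for zone in sorted(by_zone):
        items = by_zone[zone]
        if len(items) < per_zone:
            raise ValueError(f"zone {zone} has only {len(items)} pitches")
        sample.extend(rng.sample(items, per_zone))
    return sample
```

Running it separately on each velocity group with the same `per_zone` gives every group an identical location mix, so any remaining BABIP gap cannot be blamed on where the pitches were thrown.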
Dave, good idea. I’m adding that to the queue of things to investigate for tomorrow.
That location is more important than velocity is conventional wisdom. Another conventional wisdom is that pitchers can trade control for increased velocity. Perhaps when pitchers are pitching at 98+ mph, they are just trying to blow heat past hitters without controlling where it’s going. If you break each velocity group into 9 subzones of the strike zone, perhaps more 98+ mph fastballs will end up middle-middle or middle-in than at other speeds.
How accurate are the velocity readings in the pitch/fx data? It seems to me that splitting fastball velocities into bins of just 1 mph is an overly fine-grained exercise. The inconsistencies at 97 mph could just be due to measurement errors.
Well, see, the 98+ isn’t what bothers me too much because I do believe the increase in strikes and decrease in balls can be attributed to what you said, in that most are being blown by the hitters instead of located with precision.
The 97 mph is what bothers me in the sense that the BABIPs for every other range are “normal” whereas that velocity is well below. As I mentioned, though, it is not large enough to break into 9 different zone samples; what I could do, however, is break it into 3 zones: inside, middle, away.
I think some of the more puzzling parts of the results may be explained by a bias you’re overlooking. Although the total number of pitches would make you think that the data is significant, keep in mind that at the higher velocities, those pitches are mostly being logged by relatively few pitchers. Each of those pitchers probably tends to have their pitches cluster around a narrow velocity range, and each of those pitchers has different influence on HR rate and BABIP. So just one or two pitchers who frequently throw at 97 mph may be causing the results you’re looking at.
Eric, I had suggested what Alex said. You should show the number of distinct pitchers in your sample, as well as the number of pitchers who make up at least 5% of the sample at each speed level. It’s very possible you’ve got Papelbon, or whoever may be leading the league in BABIP right now, disproportionately representing those pitches.
As well, you’ve got 2000 pitches, but how many of those pitches were actually contacted as in play? Maybe you’ve got 400 balls in play there? That’s ridiculously small, as far as BABIP is concerned.
what is the BIP% for each group? the standard deviation for a Bernoulli trial is sqrt(p(1-p)), so to fail the 5% level hypothesis test for p = .295 with an observed p = .273 you’d need 1650 balls in play.
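The arithmetic behind that 1650 figure can be sketched in a few lines of Python. This assumes a two-sided test at the 5% level using the normal approximation (z = 1.96), with the standard deviation taken at the hypothesized p; `required_bip` is a hypothetical helper name.

```python
import math


def required_bip(p_null, p_obs, z=1.96):
    """Balls in play needed before the gap between p_obs and p_null reaches z standard errors.

    Solves z * sqrt(p_null * (1 - p_null) / n) = |p_obs - p_null| for n.
    z = 1.96 assumes a two-sided test at the 5% level (normal approximation).
    """
    sd = math.sqrt(p_null * (1 - p_null))   # Bernoulli standard deviation
    return (z * sd / abs(p_obs - p_null)) ** 2


n_needed = required_bip(0.295, 0.273)   # comes out at roughly 1650 balls in play
```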
Good points. I’m at work right now but when I get home I’ll work something up to post later tonight looking at all of these new aspects. Tango, I actually just got your e-mail a couple hours ago, after this had been put up.
Actually, we can find the BIP numbers from everything else here. BIP% = 1-(K%+B%+HR%):
92: 18.85% or 7,759 pitches
93: 18.05% or 6,023 pitches
94: 18.05% or 4,389 pitches
95: 17.04% or 2,827 pitches
96: 17.26% or 1,596 pitches
97: 16.69% or 707 pitches
98+: 16.11% or 325 pitches
So, 98+ is ridiculously small as far as BABIP is concerned. 97 isn’t as big as I’d like either; ANG, it is less than 50% of the 1650 required.
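Here is the same calculation as a short Python sketch, using only the sample sizes and percentages from the tables in the post. The `bip_estimate` name is hypothetical, and the counts are estimates backed out of rounded percentages, so they can differ from the true totals by a pitch or so.

```python
# (pitches, B%, K%, HR%) for each velocity group, copied from the tables in the post.
GROUPS = {
    "92":  (41157, 35.9, 44.6, 0.65),
    "93":  (33368, 36.3, 45.1, 0.55),
    "94":  (24315, 35.5, 45.9, 0.55),
    "95":  (16586, 35.8, 46.4, 0.76),
    "96":  (9245,  35.2, 47.0, 0.54),
    "97":  (4236,  36.1, 46.8, 0.41),
    "98+": (2018,  33.9, 49.3, 0.69),
}


def bip_estimate(pitches, b_pct, k_pct, hr_pct):
    """Return (BIP%, estimated balls in play) using BIP% = 100 - (B% + K% + HR%)."""
    bip_pct = 100 - (b_pct + k_pct + hr_pct)
    return bip_pct, pitches * bip_pct / 100


for label, row in GROUPS.items():
    pct, count = bip_estimate(*row)
    print(f"{label}: {pct:.2f}% or about {count:,.0f} balls in play")
```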
I’ll check the pitchers and 5% population when I get home.
Fascinating analysis, Eric, but the key is interpreting and explaining what it means…
– What if those that are able to throw 98+ are primarily using their fastball (like short relievers?), and hitters are thus more able to sit dead red against them, leading to an artificial rise in BABIP and HR% in the data?
– if command is the key, not necessarily velocity, this spawns an interesting question: what is the fastest pitchers can throw with ‘good’ command? Could this be the subject of another analysis?
Thank you for sharing this, Eric.
Walt, thanks. I’ll be checking the number of pitchers and the “who’s who” of each group in a few hours to post later tonight to see if what you, Alex, or Tango suggest shines through.
With regards to the second point, John Walsh did a really great article about a month ago at The Hardball Times showing that there is virtually no difference between an 88 mph and a 96 mph fastball when both are located on the outside corner.
I guess my question to you would be, what is ‘good’ command? Pitchers exceeding or falling below a certain b%/k% plateau? Or, via location, a higher percentage on the corners than in the vicinity of down the middle or outside? I would go for the second one and will definitely look into it. Thanks.
I second (or third, or whatever it is at this point) the notion that the number of pitchers in the upper categories are skewing the results. My initial reaction is that you’re still dealing with SSS at the upper levels, due to the specific characteristics of a small number of pitchers.
Just ran a quick check of my own DB and in 2008, just 10 pitchers accounted for over 50% of the 97+ fastballs, and 8 pitchers for over 54% of the 98+ category.
And of course they all have their own characteristics, e.g. Morrow’s 98+ heaters have an avg break length of 2.2 vs Papelbon’s with a 4.2, so it’s going to be hard to draw generalized conclusions on speed out of that.
ANG, are you referring to the total number of balls put in play or the actual total number of pitches? Balls in play would definitely hinder us rendering anything concrete for 97 and higher since it’s currently 707 and 325 (though I wrote this before updating from last night) even though the actual sample of pitches is still somewhat large for each, or rather significant enough for other analyses.
I’m going to keep tabs on this and update it in a couple of months to see where we’re at.
I did a two proportions test on 97 and 98 mph and the p-value was fairly large (.55). This means that there is no evidence that the 97 mph and the 98 mph groups have differing BABIPs. Even if I had gotten a small p-value, we couldn’t draw any conclusions because an ANOVA is needed here. In other words, since we were looking for something in the .01 range, .55 is way too large to draw any conclusions other than the fact that there is no evidence of differing BABIPs.
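For reference, a pooled two-proportion z-test can be sketched in plain Python as below. The hit and ball-in-play counts in the example are estimates backed out of the numbers posted in this thread (193 of 707 at 97 mph, 95 of 325 at 98+), so the p-value it produces will be in the same neighborhood as, but not necessarily equal to, the one quoted above; `two_proportion_z_test` is a hypothetical helper name.

```python
import math


def two_proportion_z_test(hits1, n1, hits2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Uses the normal approximation, which is reasonable at these sample sizes.
    """
    p1, p2 = hits1 / n1, hits2 / n2
    pooled = (hits1 + hits2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Estimated counts, not actual data: 97 mph vs. 98+ mph balls in play.
z, p_value = two_proportion_z_test(193, 707, 95, 325)
```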
Dave, I literally just did the same thing. So, it seems that 92-96 are relatively safe but 97 and 98+ seem “odd” due to a small sample of balls in play, even though the sample size of pitches in those intervals is not too small for analyses in other regards.
I’m going to post something tonight discussing the “anomaly” and our findings here.
Granted, I don’t have my files on me here at work but using the calculations to find the BIP totals, if we combine 97 and 98+, we get roughly:
6,254 total pitches
1,032 total BIP
288 hits in play
.279 BABIP
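Those combined totals can be rebuilt with a few lines of Python. The sketch assumes hits are recovered by rounding BIP times BABIP for each group, which is an estimate rather than a count from the database, and `combine` is a hypothetical helper name.

```python
def combine(groups):
    """Pool velocity groups and return (pitches, balls in play, hits, BABIP).

    Hits per group are estimated as round(bip * babip), since only the
    rounded BABIP and the estimated BIP totals are available here.
    """
    pitches = sum(g["pitches"] for g in groups)
    bip = sum(g["bip"] for g in groups)
    hits = sum(round(g["bip"] * g["babip"]) for g in groups)
    return pitches, bip, hits, round(hits / bip, 3)


group_97 = {"pitches": 4236, "bip": 707, "babip": 0.273}
group_98 = {"pitches": 2018, "bip": 325, "babip": 0.293}
totals = combine([group_97, group_98])
```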
Don’t forget that we do know there is a closer-effect to BABIP, contrary to what someone else said a bit earlier. It could be that the effect is concentrated among the big heater closers.
While I dispute that there’s anything magical about 97, separate from 96 or 98, certainly you can (and likely will) have an effect on BABIP as the speed of the fastball goes up.
Because we’d need to do an ANOVA to properly test if all groups had the same mean, I lumped the groups into 97+ and 92-96 and did a 2 proportions test. The p-value was .149, which, while not really considered significant, suggests you may wanna take a closer look with a bigger sample size, as I’m sure you were going to do anyway.
Yeah, as more data is recorded I’m going to keep tabs on this.
Tango, one of the things I have written down to look at (as the sample gets bigger) is, if possible, what happens when we remove the big heater closers. That is, assuming we have enough non-bigheaterclosers with the certain velocity groupings, see the effects of, say, 97+ mph pitches thrown by the closers vs. 97+ mph pitches from others.
I made a plot of the (bernoulli) fastball outcomes with confidence intervals:
http://www.stanford.edu/~guetz/fastball/fastball.jpeg
The most notable trend seems to be in the K%. B% and BABIP are inconclusive. There’s an interesting spike in HR% in the 95 mph group; is that where 2-seam fastballs exit?
Tango, what is the magnitude of the closer effect you mention (and what’s the reference)? I’m not saying there isn’t a sample bias effect for the smaller pitcher pools, just that its magnitude is likely much less than the variation due to pitch sample size.
Eric, you might try a few different ways to split the sample in half and see what happens. Try evens/odds, first half/second half, or if you can take random samples do that and re-run the same type of split-half analysis. Or go so far as to break it into thirds or quarters or something.
Also, piggybacking off what others have said, you might restrict the pitchers you look at to only those who are present in the 97-98 bracket. So, you’d be looking at Papelbon, Zumaya, etc. only and what their BABIP was at 94, 95, 96, 97, 98… That could control for whether it’s something inherent in the pitcher himself. Maybe to be safe, only select as many balls in the 96 and 95 and 94 baskets for each pitcher as there are in the 97 basket.
The other thing that you (Eric) begin to touch on in post 22 is that you may be looking at the question the wrong way. Maybe it’s not 97 that’s the weird finding but 98. We know that you have a smaller sample size in the 98 bucket (325 balls in play)… maybe “true” BABIP is really lower on 98 mph pitches, but because of the low sample size and the inherent instability in BABIP, you got a fluky finding. Maybe 97 is the threshold for “blow it by you” speed.
Pizza, yeah that’s what I was referring to. Perhaps 98+ mph should have a “true” BABIP closer to the .273-.280 range and not vice-versa, which we’ve been discussing here.
From this dataset about all you can say about 97+ BABIP is that it’s probably between .25 and .31. You need way more samples to distinguish at the level you’re talking about. And splitting the dataset even further among various groups is going to give you even larger sample errors (for each group).
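That range is roughly what a 95% normal-approximation interval gives for the combined 97+ group. A small Python sketch, assuming about 288 hits on 1,032 balls in play (the estimated totals from earlier in the thread) and using a hypothetical `babip_interval` helper:

```python
import math


def babip_interval(hits, bip, z=1.96):
    """Normal-approximation confidence interval for BABIP (z = 1.96 for about 95%)."""
    p = hits / bip
    half_width = z * math.sqrt(p * (1 - p) / bip)
    return p - half_width, p + half_width


# Estimated totals for 97+ mph, not exact counts.
low, high = babip_interval(288, 1032)
```

The half-width shrinks only with the square root of the number of balls in play, which is why splitting the group further widens the range for every piece.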
As Tango said, these are very small samples for BABIP.
Eric, Interesting study.
One thing that I’m curious about is that you may be picking up some park to park biases in velocity that just happen to converge at 97 to give you weird results. I would actually suggest that you run the same study again using final velocity of the pitch rather than initial velocity, as across ballparks, final velocity seems to be much more stable than initial velocity.
Ike, interesting idea. I’ll definitely keep that in mind for when I re-explore this in a few months. For now, though, it seems that it’s a sample error at 97+. From 92-96 we seem to be safe but 97+ is small to the point that even when we combine 97 and 98+, the balls in play total is still less than 96 on its own.
I’m putting something up tonight recapping what we discussed here and to the effect of what I just wrote, discussing possibilities for exploring this in the future such as what you just mentioned, Ike.
ANG: I had looked at it myself quickly several months ago. I talked about it on my blog. Studes at THT also mentioned it on his site I believe. It wouldn’t take too much effort to look at career BABIP to confirm it.