Therefore, I thought it would be interesting to see the results of position players who take the mound. They fit the criteria that we want in a control group. For one, they are presumably worse than minor league pitchers, though it’s possible that some are better. Second, they avoid the “scout selection bias” that goes along with pitchers who make the major leagues: they were not selected by scouts or player analysis experts to pitch in the majors. It should be noted, though, that many of these players do have some sort of pitching experience and should have enough athletic ability to post good velocities. In addition, they are selected by their managers as competent pitchers. Either way, it is a reasonable assumption that these pitchers are far worse than their major league counterparts and that they do not fall under the “attrition” bias, in which poor performance shuts a pitcher out of the league, as happens with many players with poor debuts.

Now, let’s get on to the results. The sample was taken from all player seasons in the last 15 years in which a player threw fewer than 10 innings and played as a position player for more than 50 games. The table is compiled at the end of the page and was derived from statistics at the Baseball Databank. The total sample comprised 54 innings.

I’ll leave the results here then let you guys talk it over.

First, the BABIP. I still think that I may have totaled the number of balls in play wrong, so I’d love for someone else to check it for me. However, the total BABIP for the sample was .270. This was especially intriguing given that it was actually lower than the standard .300. I was hoping to see a number in the upper .300s, which would suggest a spectrum of BABIPs that includes the results of lesser pitchers. It’s still possible that there is such a spectrum. However, this study did not lend evidence to that effect.

Second, the relative skill of the pitchers. Don’t fear: just because the BABIP didn’t pan out as expected doesn’t mean that the rest of the numbers didn’t. The pitchers compiled a 7.33 ERA, with a 7.67 BB/9 rate and a 4.0 K/9 rate. These results were a little surprising, as I expected the ERA to be much higher than 7.33, somewhere in the teens. In addition, I thought that the K rate would be much lower, as I didn’t think that MLB hitters struck out against position players that often. Maybe it isn’t so embarrassing to be retired via the K by a non-pitcher, or maybe players should just be embarrassed every time they are K’d by Carlos Silva.

Without fly ball data, I was unable to assess the HR/FB rate directly. Home runs were not all that frequent, though: 9 were hit against 178 balls in play. If we guess that 37.07 percent of BIP were fly balls (a total of 66 fly balls), then 9 of 66 + 9 = 75 fly balls left the yard, or 12 percent, which is just 1-2 percent worse than the league average for MLB pitchers. Strange.
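The back-of-the-envelope estimate above can be sketched in a few lines of Python. The 37.07 percent fly ball share is the guess from the text, not a measured value, and the function name is made up for illustration.

```python
def estimated_hr_per_fb(home_runs, balls_in_play, fly_ball_share):
    """Estimate HR/FB when only balls in play and home runs are known.

    Home runs are not counted among balls in play, so they are added
    back to the estimated fly balls to form the denominator.
    """
    fly_balls_in_play = fly_ball_share * balls_in_play
    return home_runs / (fly_balls_in_play + home_runs)


# Totals from this sample; 0.3707 is the assumed fly ball share.
rate = estimated_hr_per_fb(home_runs=9, balls_in_play=178, fly_ball_share=0.3707)
print(f"Estimated HR/FB: {rate:.1%}")
```

Changing `fly_ball_share` shows how sensitive the estimate is to the guess, which is worth keeping in mind given that the real fly ball count is unknown.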

With such a small sample size, it is hard to pull any concrete results from the data. However, it does seem to lend evidence against the notion that there is a BABIP and HR/FB selection bias against major league pitchers.

Beyond that, I’ll let you readers discuss.

Here are the sums of the data:

BIP: 178 | H on BIP: 48 | BABIP: .2697 |

HBP: 6 | H: 57 | IPouts: 162 |

BFP: 263 | HR: 9 | BB: 46 |

SO: 24 | IBB: 0 | ER: 44 |

IP: 54 | K/9: 4.0 | ERA: 7.33 |

BB/9: 7.67 |
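For anyone who wants to check my totals, here is a minimal sketch that rebuilds the rate stats from the raw sums. The ball-in-play definition in the comment is an assumption that happens to reproduce the 178 figure; it ignores things like sacrifice bunts and catcher's interference.

```python
# Totals for the position-player sample, copied from the sums above.
totals = {
    "BFP": 263, "BB": 46, "SO": 24, "HBP": 6,
    "HR": 9, "H": 57, "IPouts": 162, "ER": 44,
}

# Assumption: a ball in play is any batter faced who did not walk,
# strike out, get hit by a pitch, or homer.
balls_in_play = (
    totals["BFP"] - totals["BB"] - totals["SO"] - totals["HBP"] - totals["HR"]
)
hits_on_bip = totals["H"] - totals["HR"]   # home runs are not in play
innings = totals["IPouts"] / 3             # three outs per inning

babip = hits_on_bip / balls_in_play
era = 9 * totals["ER"] / innings
k_per_9 = 9 * totals["SO"] / innings
bb_per_9 = 9 * totals["BB"] / innings

print(balls_in_play, hits_on_bip, round(babip, 4))
print(round(era, 2), round(k_per_9, 2), round(bb_per_9, 2))
```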

And, one last note, I removed Rick Ankiel from the results, as he was formerly an accomplished pitcher, but still crept into the query.

playerID | playerID | G | HBP | H | IPouts | BFP | HR | BB | SO | IBB | ER | G_batting | AB | yearID |

alexama02 | alexama02 | 1 | 0 | 1 | 2 | 7 | 1 | 4 | 0 | 0 | 5 | 54 | 149 | 1997 |

bellde01 | bellde01 | 1 | 0 | 3 | 3 | 10 | 0 | 3 | 0 | 0 | 4 | 158 | 627 | 1996 |

benjami01 | benjami01 | 1 | 0 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 35 | 103 | 1996 |

bogarti01 | bogarti01 | 2 | 0 | 2 | 6 | 9 | 1 | 1 | 1 | 0 | 1 | 97 | 241 | 1997 |

boggswa01 | boggswa01 | 1 | 0 | 0 | 3 | 4 | 0 | 1 | 1 | 0 | 0 | 132 | 501 | 1996 |

bonilbo01 | bonilbo01 | 1 | 0 | 3 | 3 | 6 | 1 | 1 | 0 | 0 | 2 | 159 | 595 | 1996 |

burkeja02 | burkeja02 | 1 | 0 | 1 | 3 | 4 | 0 | 0 | 0 | 0 | 1 | 57 | 120 | 2004 |

burrose01 | burrose01 | 1 | 0 | 4 | 3 | 7 | 1 | 0 | 0 | 0 | 3 | 63 | 192 | 2002 |

cangejo01 | cangejo01 | 1 | 0 | 1 | 6 | 7 | 0 | 0 | 0 | 0 | 0 | 108 | 262 | 1996 |

cansejo01 | cansejo01 | 1 | 0 | 2 | 3 | 8 | 0 | 3 | 0 | 0 | 3 | 96 | 360 | 1996 |

cirilje01 | cirilje01 | 1 | 0 | 0 | 3 | 5 | 0 | 2 | 1 | 0 | 0 | 158 | 566 | 1996 |

davisch01 | davisch01 | 1 | 1 | 0 | 6 | 7 | 0 | 0 | 0 | 0 | 0 | 145 | 530 | 1996 |

durritr01 | durritr01 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 43 | 122 | 1999 |

espinal01 | espinal01 | 1 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 59 | 112 | 1996 |

finlest01 | finlest01 | 1 | 1 | 0 | 3 | 4 | 0 | 1 | 0 | 0 | 0 | 161 | 655 | 1996 |

francma01 | francma01 | 2 | 0 | 3 | 4 | 10 | 1 | 3 | 2 | 0 | 2 | 112 | 163 | 1997 |

gaettga01 | gaettga01 | 1 | 1 | 1 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 141 | 522 | 1996 |

giovaed01 | giovaed01 | 1 | 0 | 1 | 4 | 7 | 0 | 2 | 0 | 0 | 0 | 92 | 139 | 1998 |

gonzawi01 | gonzawi01 | 1 | 0 | 0 | 3 | 4 | 0 | 1 | 0 | 0 | 0 | 95 | 284 | 2000 |

gracema01 | gracema01 | 1 | 0 | 1 | 3 | 4 | 1 | 0 | 0 | 0 | 1 | 142 | 547 | 1996 |

haltesh01 | haltesh01 | 1 | 0 | 1 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 74 | 123 | 1997 |

harrile01 | harrile01 | 1 | 0 | 0 | 3 | 3 | 0 | 0 | 1 | 0 | 0 | 125 | 302 | 1996 |

howarda02 | howarda02 | 1 | 0 | 2 | 6 | 12 | 0 | 5 | 0 | 0 | 1 | 143 | 420 | 1996 |

jacksda03 | jacksda03 | 1 | 0 | 3 | 6 | 10 | 0 | 2 | 0 | 0 | 2 | 49 | 130 | 1997 |

jimenda01 | jimenda01 | 1 | 0 | 0 | 4 | 4 | 0 | 0 | 0 | 0 | 0 | 86 | 308 | 2001 |

lakerti01 | lakerti01 | 1 | 0 | 1 | 3 | 5 | 0 | 1 | 1 | 0 | 0 | 52 | 162 | 2003 |

loretma01 | loretma01 | 1 | 0 | 1 | 3 | 5 | 0 | 1 | 2 | 0 | 0 | 73 | 154 | 1996 |

mabryjo01 | mabryjo01 | 1 | 0 | 3 | 2 | 6 | 0 | 1 | 0 | 0 | 2 | 151 | 543 | 1996 |

martida01 | martida01 | 1 | 0 | 2 | 1 | 5 | 0 | 2 | 0 | 0 | 2 | 146 | 440 | 1996 |

maynebr01 | maynebr01 | 1 | 0 | 1 | 3 | 5 | 0 | 1 | 0 | 0 | 0 | 85 | 256 | 1997 |

mccarda01 | mccarda01 | 3 | 0 | 2 | 11 | 14 | 0 | 1 | 4 | 0 | 1 | 91 | 175 | 1996 |

menecfr01 | menecfr01 | 1 | 0 | 6 | 3 | 8 | 1 | 0 | 0 | 0 | 4 | 66 | 145 | 2000 |

milesaa01 | milesaa01 | 2 | 1 | 3 | 6 | 9 | 1 | 0 | 0 | 0 | 2 | 134 | 522 | 2004 |

nunezab01 | nunezab01 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 90 | 259 | 1999 |

ojedaau01 | ojedaau01 | 1 | 0 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 78 | 144 | 2001 |

oneilpa01 | oneilpa01 | 1 | 0 | 2 | 6 | 11 | 1 | 4 | 2 | 0 | 3 | 150 | 546 | 1996 |

osikke01 | osikke01 | 1 | 1 | 2 | 3 | 8 | 0 | 2 | 1 | 0 | 4 | 48 | 140 | 1996 |

penato02 | penato02 | 1 | 0 | 0 | 3 | 3 | 0 | 0 | 1 | 0 | 0 | 152 | 509 | 2007 |

perezto03 | perezto03 | 1 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 91 | 295 | 1996 |

relafde01 | relafde01 | 1 | 0 | 0 | 3 | 3 | 0 | 0 | 1 | 0 | 0 | 142 | 494 | 1998 |

seitzke01 | seitzke01 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 132 | 490 | 1996 |

sheldsc01 | sheldsc01 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 58 | 124 | 2000 |

spiezsc01 | spiezsc01 | 1 | 0 | 0 | 3 | 4 | 0 | 1 | 0 | 0 | 0 | 147 | 538 | 1997 |

venturo01 | venturo01 | 1 | 0 | 1 | 3 | 4 | 0 | 0 | 0 | 0 | 0 | 158 | 586 | 1996 |

wallati01 | wallati01 | 1 | 0 | 1 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 57 | 190 | 1996 |

whitema01 | whitema01 | 1 | 1 | 1 | 3 | 7 | 0 | 2 | 3 | 0 | 1 | 40 | 140 | 1996 |

wilsojo03 | wilsojo03 | 1 | 0 | 1 | 3 | 5 | 0 | 1 | 0 | 0 | 0 | 90 | 263 | 2007 |

woodja02 | woodja02 | 1 | 0 | 0 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 98 | 117 | 2007 |

zeileto01 | zeileto01 | 1 | 0 | 1 | 3 | 3 | 0 | 0 | 1 | 0 | 0 | 29 | 117 | 1996 |

In order to get a good sense of what we are dealing with, we should see how well these batted ball descriptions correlate with BABIP. Therefore, I took a sample of all qualified 2008 starting pitchers and made a regression equation to compare batted balls to BABIP. The results were not particularly encouraging.

Here’s the equation:

Pitcher BABIP = 1.90 – 1.11 LD% – 1.67 FB% – 1.75 GB% – 0.144 IFFB%

The R-squared of this equation was 0.352, meaning the batted ball mix explains only about 35 percent of the variance in BABIP. Unfortunately, this is a moderate to weak relationship. In other correlations, such as trying to find the relationship between break and curve ball success or count versus BABIP, we may be happy with this result. However, given the importance placed on batted ball data, especially when analyzing pitchers, this suggests that the current classifications are inadequate.
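To get a feel for what the equation produces, here is a small sketch that plugs a made-up batted ball profile into it. It assumes the percentages are entered as decimals (0.20 rather than 20), which is what the size of the intercept and coefficients suggests; the pitcher and the function name are hypothetical.

```python
def predicted_babip(ld, fb, gb, iffb):
    """Apply the batted ball regression above; inputs are rates from 0 to 1."""
    return 1.90 - 1.11 * ld - 1.67 * fb - 1.75 * gb - 0.144 * iffb


# Hypothetical pitcher: LD% of 20, FB% of 35, GB% of 45, and IFFB% of 10
# (made-up numbers, not taken from the 2008 sample).
estimate = predicted_babip(ld=0.20, fb=0.35, gb=0.45, iffb=0.10)
print(round(estimate, 3))
```

Because LD%, FB%, and GB% add up to roughly 100 percent, the three coefficients mostly describe how each batted ball type differs from the others rather than three independent effects.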

Another important factor to remember is defense. Every defense influences the pitchers that throw in front of it. Therefore, we should test this equation while accounting for defense, to see if we can bring the correlation anywhere closer to a linear trend.

Here is the regression equation:

Pitcher BABIP = 1.06 + 0.616 Team BABIP – 0.51 LD% – 1.01 FB% – 1.09 GB% – 0.102 IFFB%

R-Squared: .418

Again, there is only a moderate correlation, as even factoring defense into the equation raised the R-squared only marginally, from .352 to .418.
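For readers who want to see what the R-squared figures measure, here is a minimal sketch of the calculation. The BABIP values are made up for illustration and are not from the 2008 sample.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` that `predicted` accounts for."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


# Made-up actual and predicted BABIP for five pitchers.
actual = [0.280, 0.295, 0.300, 0.310, 0.325]
predicted = [0.290, 0.292, 0.301, 0.305, 0.312]
fit = r_squared(actual, predicted)
print(round(fit, 3))
```

A value of 1 means the predictions match exactly, while 0 means they do no better than guessing the average for everyone. For predictions that come from a least-squares fit this is the same number the regression software reports.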

As we are on the eve of the availability of Hit F/X data, hopefully these points will become moot. Until then, be sure to take batted ball tendencies of pitchers with a grain of salt when making inferences on BABIP.

Anyway, here are the actual results for fastball velocity, particularly swing and miss percentage and foul balls.

The sample includes every fastball of 85 mph or faster that was swung at during the 2008 season, broken down by velocity. This yielded a sample of 38,108 events. There were a number of interesting trends, particularly in the correlation coefficients relating fastball velocity to swing and miss percentage and foul ball percentage.

To me, the foul ball percentage was the most interesting, but I’ll let you decide.

Below is a description of the data, with velocity in the first column, followed by the percentage of all swings at each velocity according to: Swing&Miss Percentage, the Foul Percentage, then Foul Tip Percentage, then In Play Percentage, followed lastly by the number of total events at each velocity.
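A breakdown like the one described can be sketched as follows. The outcome labels, the sample swings, and the choice to bucket by truncating to a whole mph are all assumptions for illustration; this is not the actual query behind the data.

```python
from collections import Counter, defaultdict


def swing_outcome_rates(swings):
    """Return {mph: (total swings, {outcome: share of swings})}."""
    counts = defaultdict(Counter)
    for mph, outcome in swings:
        if mph >= 85:                        # the study covers 85 mph and up
            counts[int(mph)][outcome] += 1   # bucket by whole mph (truncated)
    rates = {}
    for mph, tally in sorted(counts.items()):
        total = sum(tally.values())
        shares = {outcome: n / total for outcome, n in tally.items()}
        rates[mph] = (total, shares)
    return rates


# Made-up swings as (velocity, outcome) pairs.
sample = [
    (91.4, "foul"), (91.9, "in_play"), (91.2, "swing_miss"), (91.0, "foul"),
    (96.3, "swing_miss"), (96.8, "foul_tip"), (84.0, "in_play"),
]
rates = swing_outcome_rates(sample)
for mph, (total, shares) in rates.items():
    print(mph, total, shares)
```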

Part 1: Velocity versus Swing and Miss Percentage

This one was no big surprise. In essence, the higher your velocity is, the more swings and misses you get.

The correlation coefficient for this data set was 0.89. Therefore, there is a strong linear relationship between the velocity thrown and the percentage of swings and misses at the pitch. One thing to notice, however, is that this graph is not completely linear. As the velocity gets above 95, especially with the point at 98 (which, granted, has a small sample size), the graph becomes non-linear, with what looks like an exponential relationship. In other words, each additional mile per hour makes the pitch disproportionately harder to hit.

As a result, the correlation coefficient comes out lower, even though the graph has a clear upward trend. Remember, a correlation coefficient is a measure of *linear* relation. Therefore, when the graph is exponential, the measured linear relation will be weaker.
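The point about linearity can be illustrated with a quick sketch. The two series below are invented (one perfectly linear, one perfectly exponential) and only show how the coefficient behaves; they are not the swing and miss data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


velocities = list(range(85, 99))  # 85 through 98 mph

# Invented rates: one rises in a straight line, one rises exponentially.
linear = [0.10 + 0.01 * (v - 85) for v in velocities]
curved = [0.10 * math.exp(0.15 * (v - 85)) for v in velocities]

r_linear = pearson_r(velocities, linear)
r_curved = pearson_r(velocities, curved)
r_logged = pearson_r(velocities, [math.log(y) for y in curved])
print(round(r_linear, 3), round(r_curved, 3), round(r_logged, 3))
```

Both invented series rise with every tick of velocity, yet only the straight line gets a coefficient of 1. Taking the logarithm of the exponential series straightens it out and brings the coefficient back up, which is one way to test whether a curve really is exponential.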

Still, no surprises, as this was expected.

Part 2: Foul Ball Rates

This graph was particularly surprising. Maybe because I never have really given it much thought, but I didn’t think I would find such an interesting trend. Here’s the graph:

For the rate of foul balls per swing, there is a clear upward, linear trend until 95 mph, where the graph falls at a pretty steep rate.

Foul balls are one of the last unexplored realms of baseball statistical analysis. Hopefully Hit F/X will be able to give us some useful data, but until then, I’ll be waiting. Also, why do we only measure foul balls when they are caught by a fielder? Otherwise, they wouldn’t even be counted as a ball in play. There’s a lot we can learn about the batter-pitcher interaction by foul balls, but there is very little information out there. It would be a great leap forward if there were some good studies on foul ball data.

But, back to the graph. There isn’t a strong linear trend across the whole graph because of its parabolic shape. However, the correlation coefficient between 85 and 95 mph is 0.979, which is a very strong correlation.

This is a very important point when analyzing the success of soft-tossing pitchers. For pitchers who throw at low velocities, it is important to note that by getting fewer fouls, they are essentially giving away free strikes. These batted balls become balls in play, while for pitchers at higher velocities, the batter now has one additional strike on them, with a great chance for a strikeout. Besides the low swing and miss totals, these low-velocity pitchers have fewer strikes in their favor.

As to why there is a sharp downward trend in the data after 95 mph, I’m not totally sure, though I do have a hypothesis. Think of the graph not in terms of foul or non-foul, but in terms of being late on a pitch. While some of these fouls are going to be pulled, the fact that the foul rate tracks velocity suggests that the fouls affected by velocity are the ones the batter is late on. As the velocity goes up, the batter will be late on the pitch to a greater degree. As a result, once the pitch gets beyond 95 mph, batters are no longer late and fouling off the pitch; they are late and swinging through it. This probably has something to do with the exponential increase in swings and misses at high velocities.

This may not change the end result of the at-bat too much, as a strike is still a strike whether it’s a whiff or a foul. Still, higher velocities will produce more two-strike swings and misses (for a K), while lower velocities will produce longer two-strike at-bats, since a foul keeps the at-bat alive. The lower velocities will probably have more foul-outs as a result, however.

Part 3: Ball In Play Rate

This last graph shows the rate of balls in play per swing at each velocity. Again, the data is about where we’d expect it, as it’s harder to put a ball in play at a higher velocity. This speaks volumes as to why low-velocity pitchers struggle in the majors: if the batters can put your stuff in play more often, there are more chances for hits, and fewer for free outs (strikeouts). This one follows common logic: the faster the velocity, the fewer balls in play per swing.

The graph follows a very consistent linear trend from 85 to 97 mph, then drops suddenly at 98+ mph. It is difficult to say why there is a sudden drop, as it could be due to small sample size or to the fact that pitches are just so hard to make contact with at those speeds. It may very well be a mix of both, though the fact that even 97 mph is within the linear trend makes me believe there is a significant sample size component to this issue.

From 85 mph to 97 mph, the correlation coefficient is -0.977, which is another very, very strong correlation. The fact that there is a correlation is not surprising, though the strength of the correlation is quite shocking. I didn’t expect there to be such a substantial correlation.

This study brings out some very interesting trends, as these correlations are very strong. In particular, the relationship between velocity and foul balls (which is probably causal, with velocity driving foul ball percentage for the reason explained) is particularly interesting, especially because the issue is rarely discussed. I think this could give us some insight into the relationship between velocity and pop-up rate, as pop-ups are generally thought to be the result of being late on a pitch, particularly on inside pitches, where it’s hard to get the bat head to the ball on time.

In the end, the data seem to back up the reasons why it is so hard to succeed in MLB without fastball velocity: low velocity means fewer Ks and more balls in play. I’ll do more research on this, and I hope to post more next time.

*Thanks to TheHardballTimes.com for their contributions to this article.*

Mike Silver recently completed his requirements for the Sport Management Major at THE University of Massachusetts-Amherst, where he is a brother of Theta Chapter of Theta Chi Fraternity, the best house in the country. He is a huge Red Sox and Bruins fan, and longs for the days of the REAL Boston Garden, Cam Neely, and the ultimate Dirt Dog Trot Nixon. Aside from StatSpeak, you can find Mike at TheHardballTimes.com and FireBrandAL.com. If you have any questions, you can reach him at mjasilver@gmail.com. Have a good night readers, and know that Mike hopes to hear from you soon. If you quote Mike in an article, please let him know. He’d love to hear it.

Today we’ll look at some strong correlations and the regression equation I produced for projecting strikeouts.

First, here’s the graph for actual strikeouts versus the regression equation. Its R-squared is .84.

It was generated using Fangraphs.com’s plate discipline statistics. There are a few interesting points. The two I liked most were the highest expected strikeouts (C.C. Sabathia: .2659 exp. K percentage, versus .2455 actual K%) and the highest actual strikeouts (Tim Lincecum, .2399 expected K%, .2858 actual K%). Lincecum’s outlandish strikeout totals make him an easy pick for an outlier or a player with substantial error.
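The gap between expected and actual strikeout rate for the two pitchers mentioned above can be worked out in a short sketch. Only the four numbers quoted in the text are used; the variable names are made up.

```python
# (expected K%, actual K%) pairs quoted above.
pitchers = {
    "C.C. Sabathia": (0.2659, 0.2455),
    "Tim Lincecum": (0.2399, 0.2858),
}

# Residual = actual minus expected; a positive value means the pitcher
# struck out more batters than the regression projected.
residuals = {
    name: actual - expected for name, (expected, actual) in pitchers.items()
}

for name, resid in sorted(residuals.items(), key=lambda item: item[1]):
    print(f"{name}: {resid:+.4f}")
```

Lincecum's residual is more than twice the size of Sabathia's and points the other way, which is why he stands out on the graph.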

However, I have a suspicion that the regression line for this problem would fit better as a non-linear relation.

Here are some other correlations that were pulled out of the data. Each correlation is the R value between the given variable and actual strikeout percentage.

Contact Percentage: -0.869

This one took the cake… and not surprisingly, either. If you miss bats, you will get lots of strikeouts. No surprise here.

Swing Percentage: 0.177

This one was a little surprising. I expected the correlation to be much stronger. If you swing more, you will make contact in more at-bats. The correlation is still there, but it is very weak. I would like to investigate this one a little more.

Zone Percentage: 0.065

To me, this one was shocking. I expected there to be at least some meaningful correlation between zone percentage and strikeout percentage. However, it seems that, for the range of zone percentage thrown among MLB pitchers, there is no correlation. Of course, a pitcher who never throws strikes will never strike anyone out, but, for the range in which MLB pitchers throw strikes, it makes no difference. If you want to avoid BBs, pound the zone. If you want strikeouts, I guess it doesn’t matter much.

O-Swing: 0.323

This was another surprising development. I expected this to be much higher: if you can get a batter to chase pitches, he’ll miss more often. The logic is certainly true, but, again, for the range of values among MLB pitchers, there is a weak correlation. Don’t get me wrong, it does matter, just not as much as I had expected.

ZSwing %: -0.052

Another surprise. This may have further implications to the ability of pitchers to get called strikes. The amount that a hitter swings in the zone matters very little and is negligible. Wouldn’t it seem that a hitter who swung a lot in the zone would make contact more and strike out less? Guess not. Funny how things are sometimes.

That’s all for now. Next time, we’ll continue this study of plate discipline statistics.

*Thanks to Fangraphs.com for their contributions to this article.*


To do this, I compared Jeff Kent to all the second basemen currently in the Hall. Below are tables that feature WAR, WAR/150, OBP, SLG, wOBA, and EqA. The tables also feature the average of each stat for the selected pool of players **NOT INCLUDING** Kent.

Player | WAR |

B. Mazeroski | 27 |

R. Schoendienst | 40.3 |

N. Fox | 44.6 |

B. Doerr | 48 |

T. Lazzeri | 48.1 |

J. Evers | 48.3 |

J. Gordon | 54.9 |

B. Herman | 55.5 |

B. McPhee | 57.8 |

R. Sandberg | 61.8 |

J. Robinson | 63 |

F. Frisch | 74.7 |

R. Carew | 79.3 |

C. Gehringer | 80.9 |

J. Morgan | 103.5 |

N. Lajoie | 104.1 |

E. Collins | 125.7 |

R. Hornsby | 127.7 |

Average | 69.2 |

J. Kent | 59.4 |

Player | WAR/150 |

B. Mazeroski | 1.9 |

R. Schoendienst | 2.7 |

N. Fox | 2.8 |

B. Doerr | 3.9 |

T. Lazzeri | 4.1 |

J. Evers | 4.1 |

B. McPhee | 4.1 |

B. Herman | 4.3 |

R. Sandberg | 4.3 |

R. Carew | 4.8 |

F. Frisch | 4.9 |

C. Gehringer | 5.2 |

J. Gordon | 5.3 |

J. Morgan | 5.7 |

N. Lajoie | 6.3 |

E. Collins | 6.7 |

J. Robinson | 6.8 |

R. Hornsby | 8.5 |

Average | 4.8 |

J. Kent | 3.9 |

Player | OBP |

B. Mazeroski | 0.299 |

R. Schoendienst | 0.337 |

R. Sandberg | 0.344 |

N. Fox | 0.347 |

B. McPhee | 0.355 |

J. Evers | 0.356 |

J. Gordon | 0.357 |

B. Doerr | 0.362 |

B. Herman | 0.367 |

F. Frisch | 0.369 |

T. Lazzeri | 0.380 |

N. Lajoie | 0.380 |

J. Morgan | 0.392 |

R. Carew | 0.393 |

C. Gehringer | 0.404 |

J. Robinson | 0.409 |

E. Collins | 0.424 |

R. Hornsby | 0.434 |

Average | 0.373 |

J. Kent | 0.356 |

Player | SLG |

J. Evers | 0.334 |

N. Fox | 0.363 |

B. Mazeroski | 0.367 |

B. McPhee | 0.372 |

R. Schoendienst | 0.387 |

B. Herman | 0.407 |

J. Morgan | 0.427 |

R. Carew | 0.429 |

E. Collins | 0.429 |

F. Frisch | 0.432 |

R. Sandberg | 0.452 |

B. Doerr | 0.461 |

J. Gordon | 0.466 |

T. Lazzeri | 0.467 |

N. Lajoie | 0.467 |

J. Robinson | 0.474 |

C. Gehringer | 0.480 |

R. Hornsby | 0.577 |

Average | 0.433 |

J. Kent | 0.500 |

Player | wOBA |

B. Mazeroski | 0.293 |

N. Fox | 0.324 |

R. Schoendienst | 0.334 |

J. Evers | 0.344 |

B. McPhee | 0.354 |

R. Sandberg | 0.355 |

B. Herman | 0.362 |

R. Carew | 0.370 |

B. Doerr | 0.377 |

J. Gordon | 0.377 |

F. Frisch | 0.377 |

J. Morgan | 0.382 |

T. Lazzeri | 0.386 |

N. Lajoie | 0.399 |

C. Gehringer | 0.404 |

J. Robinson | 0.412 |

E. Collins | 0.414 |

R. Hornsby | 0.459 |

Average | 0.374 |

J. Kent | 0.366 |

Player | EqA |

B. Mazeroski | 0.244 |

N. Fox | 0.253 |

R. Schoendienst | 0.258 |

B. McPhee | 0.258 |

J. Evers | 0.269 |

F. Frisch | 0.274 |

B. Doerr | 0.278 |

T. Lazzeri | 0.284 |

B. Herman | 0.284 |

R. Sandberg | 0.284 |

J. Gordon | 0.287 |

C. Gehringer | 0.289 |

R. Carew | 0.301 |

J. Robinson | 0.308 |

N. Lajoie | 0.309 |

E. Collins | 0.311 |

J. Morgan | 0.314 |

R. Hornsby | 0.337 |

Average | 0.286 |

J. Kent | 0.292 |

According to Tom Tango, the odds of someone making the HOF when their WAR is in the 50s are 49%. Jeff Kent is sitting on a WAR of 59.4. If you vote based on precedent, Kent is obviously a HOFer, as Mazeroski and Schoendienst are in the HOF. If you vote on production, Kent should fall just short. In reality, he has a 50/50 chance, as Tango’s odds show. While he compares favorably to the pool with respect to SLG and EqA, he is below average when it comes to WAR, WAR/150, wOBA, and OBP.
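The comparison in that last sentence can be reproduced with a short sketch, using only the Average and J. Kent rows from the six tables. The variable names are made up for illustration.

```python
# "Average" and "J. Kent" rows copied from the six tables.
hof_average = {
    "WAR": 69.2, "WAR/150": 4.8, "OBP": 0.373,
    "SLG": 0.433, "wOBA": 0.374, "EqA": 0.286,
}
kent = {
    "WAR": 59.4, "WAR/150": 3.9, "OBP": 0.356,
    "SLG": 0.500, "wOBA": 0.366, "EqA": 0.292,
}

above = [stat for stat in hof_average if kent[stat] > hof_average[stat]]
below = [stat for stat in hof_average if kent[stat] < hof_average[stat]]

for stat in hof_average:
    ratio = kent[stat] / hof_average[stat]
    print(f"{stat}: Kent is at {ratio:.0%} of the Hall average")
print("Above average:", above)
print("Below average:", below)
```

Keep in mind that the average is pulled up by a handful of inner-circle players like Hornsby and Collins, so being below it is not the same as being below the typical Hall of Fame second baseman.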

The HOF is a museum where baseball greats are recognized (which is why I feel Bonds, Rose, etc. should be allowed in). Considering Kent retired as the all-time leader among second basemen in home runs, he should be voted in and recognized for that accomplishment, even if his production alone is not enough to get him in.

Same rules as last time. All 2008 qualified batters, linear regression.

The results were pretty good, though not as accurate as last time. The r-squared of the equation is .7349, which, while good, isn’t quite as accurate as estimating batter strikeouts. I’m sure part of this is intentional walks, which wouldn’t likely have anything to do with swing%, zone%, or contact rate, but that is a topic for another day. Let’s look at some of the results.

More than Expected

Player | Swing% | Zone% | Contact% | Actual BB% | Pred BB% | Difference |

Albert Pujols | 0.415 | 0.471 | 0.901 | 0.166 | 0.125 | 0.041 |

Pat Burrell | 0.420 | 0.497 | 0.813 | 0.160 | 0.121 | 0.039 |

BJ Upton | 0.404 | 0.511 | 0.805 | 0.154 | 0.118 | 0.036 |

I did not say “lucky” on this one for a couple reasons. First, since intentional walks are undoubtedly going to be a part of the error margin, I don’t feel that “luck” is the appropriate term, as we use it so much in the statistics community. Second, while an r-squared of over .7 is certainly a good one, there are a number of other factors to be analyzed as well.

Worse Than Expected

Player | Swing% | Zone% | Contact% | Actual BB% | Pred BB% | Difference |

Garret Anderson | 0.487 | 0.484 | 0.828 | 0.049 | 0.092 | -0.043 |

Aubrey Huff | 0.433 | 0.483 | 0.848 | 0.081 | 0.116 | -0.035 |

Jeremy Hermida | 0.435 | 0.492 | 0.778 | 0.087 | 0.121 | -0.034 |

Close to Expected

Player | Swing% | Zone% | Contact% | Actual BB% | Pred BB% | Difference |

Adam Jones | 0.535 | 0.527 | 0.769 | 0.046 | 0.0464 | -0.0004 |

Brian McCann | 0.464 | 0.481 | 0.855 | 0.101 | 0.10127 | -0.00027 |

Josh Hamilton | 0.555 | 0.453 | 0.741 | 0.093 | 0.093057 | -0.000057 |
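The Difference column in the three tables is simply actual BB% minus predicted BB%. Here is a minimal sketch that recomputes it and ranks the nine players; the regression coefficients themselves are not given in this post, so only the tabulated values are used.

```python
# (player, actual BB%, predicted BB%) copied from the three tables.
rows = [
    ("Albert Pujols", 0.166, 0.125),
    ("Pat Burrell", 0.160, 0.121),
    ("BJ Upton", 0.154, 0.118),
    ("Garret Anderson", 0.049, 0.092),
    ("Aubrey Huff", 0.081, 0.116),
    ("Jeremy Hermida", 0.087, 0.121),
    ("Adam Jones", 0.046, 0.0464),
    ("Brian McCann", 0.101, 0.10127),
    ("Josh Hamilton", 0.093, 0.093057),
]


def residual(row):
    """Actual minus predicted walk rate for one table row."""
    _, actual, predicted = row
    return actual - predicted


# Most walks above expectation first, most below expectation last.
ranked = sorted(rows, key=residual, reverse=True)
closest = min(rows, key=lambda row: abs(residual(row)))

for row in ranked:
    print(f"{row[0]:<16} {residual(row):+.4f}")
print("Closest to expected:", closest[0])
```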

While there is still some work to be done, most importantly with intentional walks, the model is fairly accurate for a basic linear model. Next time, we will see how accurate such a regression formula is with pitchers.

*Thanks to Fangraphs.com for their contributions to this article*.

Regression calculations performed by:

**Wessa, P. (2009), Free Statistics Software, Office for Research Development and Education, version 1.1.23-r4, URL http://www.wessa.net/**
