Breaking Down the Heater

Back on December 20th, John Walsh wrote a very interesting article at The Hardball Times, taking everything recorded by the Pitch F/X system in 2007 and, amongst others, calculating the average velocity, horizontal movement, and vertical movement for the four major pitches: fastball, curveball, slider, and changeup.  The results showed that the average fastball clocked in at 91 mph with -6.2 inches of horizontal movement and 8.9 inches of vertical movement.  The author acknowledged that he did not differentiate between four-seamers, two-seamers, and cutters, but rather lumped them all together in determining the averages; two-seamers and cutters differ in velocity and movement components from four-seamers.

While I plan on calculating the averages for all different sub-groupings of pitches at some point, what recently piqued my interest was finding the averages for different velocity groupings.  As in, what is the average horizontal movement for all 94 mph fastballs?  Or, the BABIP for 98 mph fastballs? 
With that knowledge we could effectively compare certain pitchers to the means of their velocity grouping rather than overall averages of every grouping.  Instead of comparing, say, Edwin Jackson’s 94 mph fastball to a group including those who throw slower, we can compare him to his “peers.” 
I started at 92 mph and queried my database for groupings (92-92.99, 93-93.99, etc) all the way up until 98+ mph.  I figured 92 mph would be a solid starting point since the sample size would be extraordinarily large–large enough for four-seamers to overcome the two-seamers and cutters that may inevitably sneak in.  Anything 98 mph or higher was grouped together to ensure a large enough sample since, as you will see below, the higher the velocity, the smaller the sample:

Velocity   Sample   % of total
92 mph     41,157   31.4
93 mph     33,368   25.5
94 mph     24,315   18.6
95 mph     16,586   12.7
96 mph     9,245    7.1
97 mph     4,236    3.2
98+ mph    2,018    1.5

All of the sample sizes here were large enough for analysis.  Even though the 98+ group appears to be 1/20th the size of the 92 mph group, that speaks more for the latter than against the former.
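For anyone who wants to reproduce the grouping, here is a minimal sketch in Python of how the velocity buckets could be built.  The function names and the input format (a plain list of fastball velocities in mph) are assumptions for illustration only; the actual database query is not shown in this post.

```python
import math
from collections import Counter


def velocity_bucket(mph):
    """Return the bucket label for a fastball velocity, or None below 92 mph."""
    if mph < 92:
        return None
    if mph >= 98:
        # Everything 98 mph or higher is pooled to keep the sample large enough.
        return "98+ mph"
    # 92-92.99 -> "92 mph", 93-93.99 -> "93 mph", and so on.
    return f"{math.floor(mph)} mph"


def bucket_shares(velocities):
    """Map each bucket label to (sample size, percent of all bucketed pitches)."""
    counts = Counter(
        label for label in map(velocity_bucket, velocities) if label is not None
    )
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}
```

Calling `bucket_shares` on the full list of tracked fastball velocities would give a sample and percentage column in the shape of the table above.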
Next, how do the movement components look for each group?

Velocity   Horiz. (in.)   Vert. (in.)
92 mph     -6.34          9.24
93 mph     -6.28          9.51
94 mph     -6.16          9.80
95 mph     -5.98          10.07
96 mph     -5.84          10.23
97 mph     -5.89          10.41
98+ mph    -6.03          10.38

It should be fairly apparent that the tendency is for horizontal movement to decrease and vertical movement to increase as the velocity increases, at least through 96 mph.  At 97 mph, both movement components increase.  At 98+ mph, the vertical movement stays stagnant while the horizontal movement jumps quite a bit.
The next area to discuss includes B%, K%, HR%, and BABIP:

Velocity   B%     K%     HR%    BABIP
92 mph     35.9   44.6   0.65   .302
93 mph     36.3   45.1   0.55   .303
94 mph     35.5   45.9   0.55   .292
95 mph     35.8   46.4   0.76   .303
96 mph     35.2   47.0   0.54   .291
97 mph     36.1   46.8   0.41   .273
98+ mph    33.9   49.3   0.69   .293

The percentage of balls doesn’t move too much until its dip of over two percentage points at 98+ mph.  The percentage of strikes, however, seems to increase.  There is no real discernible pattern in the home run percentages; the most came on 95 mph heaters while the fewest came on those registering 97 mph.

Speaking of the 97 mph group, notice anything odd?  Perhaps that their BABIP is .273, a full eighteen points below any other group?  Prior to getting the results I expected each group to fall somewhere in the .290-.310 range; that all of them did except the .273 struck me as very peculiar.

I spoke to several other analysts, all of whom initially mentioned small sample size syndrome, only to retract the assessment after learning the sample sizes in question.  The dropoff in home run percentage was tossed around, as well, since fewer home runs means more balls in play to be counted in the BABIP formula.  This is a “could be,” though, rather than a “definitely why.”  As was mentioned in these discussions, too, it could be nothing; perhaps there were more warning track flyballs that just missed leaving the yard as opposed to weaker hit balls.

Now, while the 4,236 pitches at 97 mph constitute a large enough sample to analyze, the balls in play were not yet numerous enough to break into individual counts or locations.  When they do get big enough this could serve as a means of explanation; perhaps something in either or both does not jibe with the other velocity groups.  Of those with significance, however, there was a .263 BABIP on 0-0 counts, and a .286 BABIP on pitches in the middle of the strike zone.

Pizza Cutter, or “The Master of Statistical Reliability” as I like to call him (yeah, a nickname for a nickname), suggested that BABIP is one of those stats that is super-unreliable, even with my large sample of pitches.  I did a split-half reliability test, randomly splitting the sample in half, and calculating the BABIP of each half.  For those unfamiliar, this serves to test the reliability of the sample; if it truly is large enough then no matter how we cut the sample in half we will have fairly convergent results.  If the results were wildly divergent then we are dealing with an unreliable sample.  The BABIPs of the two groups were .271 and .275, which essentially threw that idea out of the window.
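For the curious, a split-half check like the one described above can be sketched in a few lines of Python.  This is only an illustration under simplifying assumptions: each ball in play is coded as 1 (hit) or 0 (out), home runs are already excluded, and the function names and `seed` parameter are mine rather than anything from the original analysis.

```python
import random


def babip(balls_in_play):
    """Fraction of balls in play that fell for hits (1 = hit, 0 = out)."""
    return sum(balls_in_play) / len(balls_in_play)


def split_half_babip(balls_in_play, seed=None):
    """Randomly split the sample in half and return the BABIP of each half."""
    rng = random.Random(seed)
    shuffled = list(balls_in_play)  # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return babip(shuffled[:mid]), babip(shuffled[mid:])
```

Running `split_half_babip` many times with different seeds and seeing how far apart the two halves land gives a rough feel for how stable the number is at a given sample size.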

Something interesting to consider was how, in each of these tables, all patterns seemed to stop when they reached 97 mph or higher.  The horizontal movement increased instead of its decreasing trend; vertical movement decreased after its increase at 97; the percentage of strikes ceased increasing; and home runs reached their low.  Could be something, could be nothing, but interesting nonetheless.

For now I am going to chalk this BABIP drop as an extreme random statistical variation and hope that you loyal readers out there might chime in with some more ideas to investigate.  Otherwise, though, when gauging the movement components, percentage of balls/strikes/home runs, or even BABIP, we can compare individual pitchers to their “like-minded” averages by velocity grouping.  If I get enough feedback involving different aspects to measure regarding these fastballs we will look at that soon, in the next day or two.  Otherwise, next week I have something similar to this, looking at BABIP by movement.

Playing the blame game with ground ball singles

I’m building something.  It’s something that I’ve been meaning to do for a while, which is a defense rating system.  In fact, I once defined defense as “something that every Sabermetrician has a system for measuring that he is ‘working on.’ “  I guess now I’m a proper Sabermetrician.
I’m not exactly the first person to tackle this one.  There’s the Fielding Bible with its lovely data from Baseball Info Solutions, which I use as my gold standard.  The problem is that those data are proprietary (read: expensive), and I’m a graduate student.  There are a few other systems that have caught my eye.  Shane Jensen and friends developed the Spatial Aggregate Fielding Evaluation (SAFE) system and they got mentioned in a few newspapers, mostly dismissively, for being far too nerdy (because the worst thing you can be in baseball is a nerd) and for showing (again) that Derek Jeter isn’t a very good shortstop.  There are plenty of others, and listing them turns into a lovely alphabet soup (PMR, ZR, UZR, RZR, FRAA, DER, and of course the greatest fielding stat ever, fielding percentage).
But, the ancestry on my system traces back in part to my former colleague Sean Smith, who about a year ago here on StatSpeak introduced TotalZone (and here’s part 2 and his latest on the subject), which was a system based only on what was available from Retrosheet, where the data are the perfect price for a graduate student: free.  Dan Fox, formerly of Baseball Prospectus, now of the Pittsburgh Pirates, also went about the business of creating a Retrosheet-based system for fielding, which he called simple fielding runs.  But Sean’s gone from StatSpeak and Dan’s gone to that big front office in the sky… er, Pittsburgh.
So, here, I pick up the baton.  I need a Retrosheet-compatible system that isn’t just a poor man’s rip-off of the other systems.  (On the second, I fear that I shall fail miserably.)  And so I start with the ground ball.  It always ends up in someone’s glove.  Whether that glove is on the hand of an infielder, an outfielder, or the occasional fan is the question.  Usually, it’s a good thing if the man who fields a ground ball is an infielder rather than an outfielder, but who’s to blame if it gets through the infield?  Both Dan’s and Sean’s systems assume that if a ground ball goes through to the left fielder, we can split the blame half-and-half between the third baseman and the shortstop.  They do similar things for CF (50% the fault of the 2B, 50% the fault of the SS) and RF-bound ground balls.  But, does that stand up to the evidence?  I say no.
The problem, of course, with Retrosheet data is that it doesn’t have hit location data (or at least very much) for recent years, and so anyone wanting to know about fielding in the past few years is reduced to making assumptions like this (or buying the BIS data).  However… there is a little bit of data that can be exploited on Retrosheet.  Because RS bought their 93-98 data from somewhere else (Project Scoresheet?) the 93-98 data have hit locations!  They use the Project Scoresheet location system, which uses a series of vectors to code for where the ball was either fielded or where it went through the infield.  I tossed out all of the balls that didn’t make it to the infield skin.  The infielder will make it to the dribbler and the bunt, no doubt.  Whether or not that will be in time for them to be any use is another issue.  But, can the infielder get to the ball before it gets to the outfield is an important first question because it’s the first step in throwing the batter out.
The careful reader will have noted that I’m not talking about completing plays and making outs, only about getting to the ball.  First off, it plays into my system on a larger scale.  Secondly, I’m reminded of the old adage about why errors are a faulty stat in that an error means that the fielder did something good in at least getting to the ball.  An infield hit is better than an outfield hit, and in order to get an out on a ground ball, an infielder needs to get to the ball.  (Yeah, you see the occasional 9-3 putout… about as often as I see my cousins who live in Phoenix.  Hi, Mike and Steve!)  So, here I’m looking at the Retrosheet data which indicates by whom the ball was fielded.  Whether or not the play was completed is irrelevant… for now.
Here’s what I did.  I took the 1993-1998 data and built a huge database of ground balls.  I coded for pitcher and batter handedness (it makes a difference!  This had been noted by the ever-reliable John Walsh some time ago.), and, if the ball went to the outfield for a hit, whether the hit that resulted was a single or an extra base hit.  Then, I looked at the spread of balls hit to each zone and who was fielding the balls where.  I tossed out all bunts and anything that didn’t at least make it to the infield skin.  I had ten zones to work with, which can be seen here on this diagram.  It’s not quite what the Fielding Bible does (they have 17 zones), but the Retrosheet data are free.
Let’s look at a ground ball single that gets through to the left fielder in a righty-righty pitcher-batter matchup.  What zone was it usually hit to?  Most often, and fairly obviously, to the hole between short and third (84.1% of the time), a zone marked “56” by Retrosheet.  But, sometimes (7.0%), it went to the zone marked “5” (because that’s where the third baseman is usually standing), and sometimes (6.0%) to “6” and sometimes (2.2%) to “5L” (down the left field line) and sometimes (0.5%) to “6M” (up the middle, to the shortstop side of second base).  There are some weird entries in there that are probably data entry errors (a hit to left field that went through the hole between first and second?) that account for the rest of the numbers (if you add, that’s only 99.8%).  We can re-create the same database for all handedness-type of hit-fielded by combos.  In fact, I did.  Something to note is that right-handed hitters were more likely to pull the ball toward more third base-ward zones (and lefties to shortstop-ward zones).  The effects weren’t huge, but they’re far enough away from 50-50 to be notable.
Now, who’s in charge of each of those zones?  That’s easy enough to figure out.  When the ball is hit to each of the zones, and it doesn’t scoot through, which infielder usually is the one to field it?  Again, looking at our righty-righty matchup, we get the following splits.
Zone   SS got it   3B got it
5L       1.1%         98.6%
5         0.3%         98.8%
56       41.9%       57.4%
6         97.6%       1.2%
6M     88.1%       0.1%  (the second baseman and pitcher pick up the other 11.8%)
Again, note that a right-handed hitter pulled the ball closer to the third baseman (see zone 56).  The pattern was slightly reversed for lefties, although not as extreme.  Now, it’s a matter of simple multiplication to see what share of the blame each of the two fielders should get for a hit to left field.  Since 84.1% of GB singles to left from righty-righty matchups go to zone “56”, and they are 57.4% the responsibility of the third baseman, then he gets 48.2% of the blame for GB singles to left, plus whatever other responsibilities he gets from the other four zones we’re focusing on.  In fact, he ends up with 54.2% of the blame for a single to left, given a righty-righty matchup.  It’s not 50-50, although in fairness to the other systems, it’s close.  When looking at hits to center field, the pattern becomes a little more pronounced, with more of a 60-40 split to the shortstop for right-handed batters and to the second baseman for left-handed batters.  50-50 isn’t going to cut it.
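The multiplication can be sketched in Python as follows.  This is a rough illustration, not the code behind the numbers in this post: the dictionaries hold the rounded righty-righty percentages quoted above, the function names are made up for the example, and because the inputs are rounded the totals it produces need not reproduce the figures quoted in the text, which came from the full database.

```python
# Share of righty-righty GB singles to left field that went to each zone
# (rounded figures quoted in the post).
ZONE_SHARE = {"5L": 0.022, "5": 0.070, "56": 0.841, "6": 0.060, "6M": 0.005}

# When a ball in each zone was fielded, how often each infielder got it
# (rounded figures from the table above).
FIELDED_BY = {
    "5L": {"SS": 0.011, "3B": 0.986},
    "5": {"SS": 0.003, "3B": 0.988},
    "56": {"SS": 0.419, "3B": 0.574},
    "6": {"SS": 0.976, "3B": 0.012},
    "6M": {"SS": 0.881, "3B": 0.001},
}


def blame(zone_share, fielded_by):
    """Sum (zone share x fielder's share of that zone) for every fielder."""
    totals = {}
    for zone, share in zone_share.items():
        for fielder, rate in fielded_by[zone].items():
            totals[fielder] = totals.get(fielder, 0.0) + share * rate
    return totals


def normalize(totals):
    """Rescale the blame so that the listed fielders' shares sum to 1."""
    grand_total = sum(totals.values())
    return {fielder: value / grand_total for fielder, value in totals.items()}
```

`normalize(blame(ZONE_SHARE, FIELDED_BY))` then gives each fielder’s share of the blame for this one matchup; swapping in the tables for other handedness combinations works the same way.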
For a full breakdown of who’s to blame given some other combos, click here.
In the Retrosheet years where we don’t have hit location data, and all we know is that a GB single went through to the left fielder, we at least now have a better idea of where to place the blame among the infielders.
A few caveats.  One is the obvious fact that I’m going to be using data from 1993-1998 and assuming that it still holds up 10-15 years later.  Indeed, I’ve shown that baseball players are getting bigger and that they are probably getting slower.  This could certainly affect range and I suppose could in turn affect those numbers.  The other is in extending this system.  Dan Fox, when originally developing SFR had in mind a system that could be applied to minor league data.  This system assumes that minor leaguers hit like major leaguers and have similar spray charts.  It may very well be the case, but without hard data, we have no way to know.

Who the heck is Chris Antonetti?

For those of you who were paying attention to this week’s World Famous StatSpeak Roundtable, Eric asked the question of who would be the first GM to be fired.  Eric’s got an odd knack for these things.  A few weeks ago, he asked the question of who would throw the first no-hitter of the year.  The next day, Jon Lester went out and did that.  This time, on Monday afternoon, right after the Roundtable came out, Bill Bavasi of the Seattle Mariners was told that he should find an alternate line of work.
The next question is who will be the next General Manager of the Seattle Mariners, and the Mariner faithful over at U.S.S. Mariner seem to have chosen their champion in Chris Antonetti.  Now, lest we get ahead of ourselves, no one in the Mariners organization has said anything about him publicly nor has Antonetti said anything about the Seattle job, and this is just one blog’s speculation.  But, the guys over at USSM (including previous roundtable guest Dave Cameron) are usually pretty spot-on with these things… and it does kinda make sense.  Read on.
Who is Antonetti?  He’s an assistant GM in Cleveland, charged mostly with the quantitative analysis and contract negotiations.  He’s one of the reasons that the Indians have been so quick to embrace quantitative analysis (i.e. Sabermetrics) in their decision-making process.  The exact details of his biography aren’t all that important right now, but the ones of greatest relevance are these.  Most GMs are former players.  They may not have been major leaguers, but most of them logged some time in the minors.  Antonetti did not.  In fact, he’s only 32, which makes him younger than some of the players whom he would generally manage.  Antonetti, instead, has an academic background, with an advanced degree in sports management.  Cleveland GM Mark Shapiro leans on him to work the numbers.  Word on the street is that he’s very good at what he does, and would probably make someone an outstanding GM.
If the press reports are to be believed (and I believe everything that the media tells me), Antonetti was heavily considered for the St. Louis Cardinals GM vacancy last winter, as well as the Pittsburgh job (which went to fellow Indians’ assistant GM Neal Huntington, who is also Saber-sympathetic).  Reports were that the Indians lured Antonetti away from taking one of those jobs by making him a well-compensated man and promising that he would eventually succeed current Indians GM Mark Shapiro in a few years.  But, as Derek over at USS Mariner points out in his plea for Antonetti to come to the Great Northwest, there’s a lot to be said for the Seattle position being a good fit for someone of Antonetti’s ilk.  Derek points out that in addition to the lovely Seattle culture (I still have all my Nirvana CDs), he’d be in a relatively low-stress setting media-wise with a big budget and a surrounding community with a lot of high-powered technologically minded people (think: Microsoft lives in Seattle).  I don’t know Chris Antonetti personally, and I don’t know if he has any interest in taking the job (speaking as an Indians fan, I hope not…), but he would seem to be a really good candidate.  He’s been a big part of the Indians taking a mid-market payroll and turning it into a contending team.  Imagine what he could do with a license to rebuild the team from the ground up and ownership that would actually push the payroll into nine digits.
But Antonetti is something more than just a hot assistant GM being mentioned as a possible candidate for a job.  What happens with the Seattle situation and whether or not they approach Antonetti is a measuring stick in how far the Sabermetric movement has come in being accepted in the mainstream of baseball culture.  Would a team that has had Bavasi, considered to be a traditionalist in his methods, as their GM and has stuck by him for as long as they have turn about and pick a guy who’s much more from the Sabermetric school?  It’s not like there aren’t Saber-friendly GMs out there.  (I think I read somewhere that Billy Beane was rather amenable to the idea.)  But, an Antonetti hire would begin to represent a critical mass of acceptance.  Suddenly, there would be a few stat-head GMs around (Beane, Theo Epstein, Shapiro, Huntington) and the last few GM hires in the game would have at least had serious candidates who were statheads.
So, the Sabermetrician in me sees this as a possible defining moment.  Maybe it’s just the fact that I was a skinny, nerdy kid who couldn’t hit, run, throw, or field.  (I think my friends all just thought in unison, “was?”)  But does the fact that those of us who weren’t physically gifted persevered in the game we loved by seeing the game through the prism of reason and intellect mean that we can’t have a seat at the decision-making table?  More and more the answer is becoming that we can have that spot at the table, and I’m happy with how far the movement has come, but this feels like it would be a clincher.  The Mariners could actually send the message that baseball is ready for a statistical revolution, that no longer will they be afraid of guys with calculators who might challenge the accepted wisdom.  Baseball might actually move into the Enlightenment.  An amazing thought.
Whether or not Chris Antonetti gets the job, I hope that the Mariners make a lot of noise about wanting him.  It’s up to him whether he would even be receptive to such overtures, but if the Mariners make it a point to pursue him (and loudly), there’s a message in there.  The Indians fan in me hopes that Chris Antonetti is happy to stay in Cleveland and enjoy some of that lovely Cleveland culture (Rock and Roll Hall of Fame!  Drew Carey!  Midges!  Me!), so that he can use the oodles of talent that he has to keep the Indians contending.  But maybe, just maybe, for the good of dragging baseball, kicking and whining into something bigger, I can be convinced to let Chris Antonetti go.

World Famous StatSpeak Roundtable: June 16

The roundtable rolls on (thought of that one myself) this week to the doorstep of Dave Studeman, one of the fine folks over at The Hardball Times.  You’ve probably read Dave’s column, Ten Things I Didn’t Know Last Week, and if you haven’t, you are a bad human being.  Dave also runs the Baseball Graphs website.  Today, Dave joins Eric and Pizza to discuss the long-suffering Cubs, WPA, and which GM should be spending his Monday morning updating his resume.
Question #1: Is WPA a useful tool for ranking players?  In what instances?  If not, can it be improved?
Dave Studeman: People who read my work at the Hardball Times know that I’m a fan of Win Probability Added.  I think its inherent logic is deeply compelling.  It doesn’t measure “true talent” or even “production.”  But as a measure of value, it’s hard to beat.  As a quantification of the ups and downs of a game, it’s the best.  WPA is truly unique and endlessly fascinating.
However, there are a few things that could be done to improve WPA:

  • Measure WPA against a replacement level instead of average.  This is how you factor playing time into the measure, and I think it needs to be added to the discussion.  Do it right, too: make the replacement level differ between starting pitchers and relievers.
  • Find a way to add fielding to the mix.  Eric proposed “visual WPA” a month or two ago, and I’d love to see something like this occur.  In fact, there was a movement afoot to create such visual WPA a few years ago, but it never congealed into something useful.
  • For those who consider WPA a “junk stat,” pursue ways to combine WPA and other win-based systems, like Win Shares, to overcome their objections.  In particular, I hope to create a system in which the wins and losses of individual players add up to the teams’ wins and losses, but without the “late inning emphasis” that WPA has.

Eric Seidman: I am a huge proponent of win probability added and the like, including WPA/LI, LI, and Clutch score.  They each offer something different with regards to the win contribution of the player(s) in question and something I really enjoy doing is looking at the three side by side–a WPA slash line of sorts–so we can compare a) overall contribution, b) contribution in a context-neutral setting, and c) whether or not said player raises his game in crucial situations.  For instance, Pat Burrell (at the time this answer was given) has a WPA slash of 4.22/2.80/1.24; he leads all of baseball with four and a quarter wins contributed, ranks third in contribution based solely on his performance and not the context in which that performance took place, and comes in fourth in terms of overall clutch score. 
Put together, it seems that few, if any, have been more valuable to their team in terms of contributing wins than Burrell.  Despite this, poll every somewhat sane fan on the planet, Phillies supporters or not, and Chase Utley is very likely to garner 95%+ of votes as far as who the Phillies MVP has been.  Utley’s WPA slash is 2.06/2.53/-0.52; his performance has contributed less than half of Burrell’s but, while he has not been very clutch, his context-neutral win contribution is similar to Pat’s.
Utley has been a better player than Burrell this year, though, based on just about every other type of metric and, because of that, I would have to say that WPA is not a tremendous evaluative tool UNLESS we are strictly measuring win contribution on a constantly updating player projection, similar to how we should evaluate players on any statistic.  I love using the stat to measure team-based win contributions as well, to see who has accounted for the highest percentages of their team’s victories.  Someone with a 4.25 WPA on a team with 50 wins will have a lesser percentage than someone with a 4.25 WPA on a team with 35 wins.
Pizza Cutter: WPA has its uses as a rough measure of worth, but of course, WPA is affected by leverage, so WPA/LI makes more sense to me as a tool for ranking batters.  This makes sure that players aren’t rewarded or punished for circumstances beyond their control (what’s the leverage at the given point that their turn to bat just happens to come around).  Pitchers have more control over the leverage of a situation and are the only member of the team guaranteed to be involved in every at-bat in the inning, but even then, pitchers get everything that happens in an inning attributed to them, even the obvious fielding errors.  Teasing apart who’s to blame (or credit) for the team in the field, whether the credit belongs to the fielder, pitcher, or simply the batter, is one of the great Sabermetric frontiers that people have only begun to venture into.

Juuust A Bit Outside

On Thursday we took a look at the pitchers with the highest percentage of Pitch F/X-recorded pitches right down the middle of the plate.  I listed the top thirty out of the 165 pitchers with significant numbers and found that Ted Lilly of the Cubs has thrown the highest percentage; on top of that, the next pitcher on the list found himself relatively far off.  Today we are going to look at the opposite: The pitchers with the highest percentage of pitches outside the zone.
Now, outside the zone calls for four general parameters: very high, very low, outside to the left, and outside to the right.  I feel like I’m typing the Cha-Cha slide.
For now I am going to focus on the left/right parameters outside the strike zone, and we will explore high/low a bit later in the year as I have other ideas centering around those parameters.  As discussed previously, the strike zone on a general pitch location chart goes from -0.83 to 0.83 on the horizontal axis and 1.6 to 3.5 on the vertical axis.  To track pitches down the middle the axis numbers were set much smaller.  To track pitches outside the zone the horizontal axis numbers branch out in both directions.  For pitches outside to the left or right I set my database to give me all pitches with a PX (horizontal location in the data) less than -1.55 or greater than +1.55.
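In code, that horizontal filter might look something like the Python sketch below.  The function names and the input (a list of one pitcher’s PX values, in the same units as the strike-zone figures above) are assumptions made for the example; the actual database query is not reproduced here.

```python
OUTSIDE_PX = 1.55  # horizontal cutoff quoted above


def is_outside(px, cutoff=OUTSIDE_PX):
    """True when a pitch is beyond the cutoff on either side of the plate."""
    return px < -cutoff or px > cutoff


def outside_pct(px_values, cutoff=OUTSIDE_PX):
    """Percentage of a pitcher's tracked pitches that were outside left or right."""
    if not px_values:
        return 0.0
    outside = sum(1 for px in px_values if is_outside(px, cutoff))
    return 100.0 * outside / len(px_values)
```

Sorting pitchers by `outside_pct` rather than by raw count keeps pitchers with more tracked starts from dominating the list.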
This provided me plenty of pitches to analyze but keep in mind that the data was not insanely consistent last year with regards to who gets recorded and where the recording takes place.  This year it has become more consistent and uniform but there may be data discrepancies due to some players having insufficient data.  For instance, Player A might be known to throw a ton of pitches out of the zone but, because the Pitch F/X system did not track many of his starts, he might not qualify. 
To help ensure the pitchers in the below leaderboard did not fall into this statistical fallacy, a minimum of 240 raw pitches was set.  That certainly whittled the list down.  The total tracked pitches were then recorded for all remaining pitchers, and they were then sorted by % instead of raw total.  Here are the top ten:
1) Livan Hernandez, 15.83%
2) Derek Lowe, 12.43%
3) Jake Peavy, 12.39%
4) Chad Gaudin, 11.56%
5) Braden Looper, 11.56%
6) John Smoltz, 11.40%
7) Jamie Moyer, 11.20%
8) Justin Germano, 11.01%
9) Jeff Francis, 10.73%
10) A.J. Burnett, 10.71%
I did not necessarily predict that Livan would be atop this leaderboard but, at the same time, it was not very surprising to find his name there, with a significant lead over the next pitcher, no less.  Moyer didn’t surprise me either as he’s a notorious “junkballer.”  Here are 11-20:
11) Jarrod Washburn, 10.59%
12) Carlos Zambrano, 10.05%
13) Shaun Marcum, 10.03%
14) Tim Hudson, 9.78%
15) Javier Vazquez, 9.66%
16) Kevin Millwood, 9.63%
17) Jose Contreras, 9.54%
18) Miguel Batista, 9.33%
19) Roy Halladay, 9.29%
20) Vicente Padilla, 9.14%
Something really interesting here is the emergence of Burnett, Marcum, and Halladay.  I noted in the comments on Thursday that, of pitchers with significant data, Burnett, Marcum, and Halladay were in the bottom ten of percentage of pitches thrown right down the middle; here they are in the top twenty of pitches thrown outside the zone.  I noted at Fangraphs a week or two ago that the Blue Jays rotation, arguably the best in the bigs both last year and this year, consisted of three guys (McGowan, Marcum, Litsch) who threw four or five different pitches at least 10% of the time, somewhat of an extreme rarity.  Additionally, Halladay has a potent three-pitch combo, and Burnett has a plus-fastball and plus-curveball.
Put together it seems like the Blue Jays pitchers are spreading their pitch selections quite liberally, rarely making mistakes in throwing the ball right down the middle, and not worrying about being outside the strike zone.  Perhaps this means nothing with regards to their performance, but it is interesting nonetheless that a rotation like this appears in the leaderboards in three different areas of selection/location.
As we get deeper into the season enough data will be compiled to look at both down the middle and outside pitches solely for 2008, when the data is tracked in each park.  For now, though, we’ll have to settle for Ted Lilly and Livan Hernandez.  If only those two faced each other this year.

Right Down the Middle

Last week I took a look at the relationship between pitches and home runs, checking to see if there were any noticeable discrepancies between those that sail out of the stadiums and those that do not.  The results showed that fastballs turned into souvenirs when they came in with lesser velocities and movements as well as with poor location; breaking balls were hit out when they hung in the zone.

While conducting these analyses I became very interested in pursuing the idea of mistake pitches and balls thrown not just in the zone but right down the middle.  Of all the balls that were hit for home runs off the top home run surrendering pitchers this year, at least 80% were smack-dab in the middle of the plate.  Since this piqued my interest I decided to check out which pitchers threw down the middle most often.
The strike zone, in Pitch F/X terms, is generally -0.85 to 0.85 on the horizontal axis and 1.6 to 3.5 on the vertical axis.  I went smaller, looking at pitches in the middle of that zone, as evidenced by this picture:

[Image: strikezone.JPG]


Probing my database for pitches in the smaller box–what I would consider to be down the middle–I found a ton of pitches.  Keep in mind, though, that the results below are from pitches tracked by the Pitch F/X system; there are some pitchers that might have a higher total or percentage but did not have the luxury of having their relevant data recorded.
I found 165 pitchers with a significant number of pitches down the middle.  Luckily, in terms of using neat/even numbers in a list, the top 30 percentages happened to consist of everyone with at least 14% of their pitches thrown down the middle.  Here are the top ten:
1) Ted Lilly, 18.6%
2) Paul Byrd, 16.7%
3) Josh Beckett, 16.3%
4) Micah Owings, 16.1%
5) Tim Lincecum, 15.9%
6) John Danks, 15.8%
7) Felix Hernandez, 15.7%
8) Greg Maddux, 15.5%
9) Joe Blanton, 15.5%
10) Justin Verlander, 15.4%
Lilly threw nearly two percentage points more of his pitches down the middle than his closest competitor (18.6% to 16.7%), whereas #2 through #10 were separated by a total of just 1.3 percentage points.  Numbers 11-20:
11) Andy Sonnanstine, 15.3%
12) Kevin Millwood, 15.2%
13) Cole Hamels, 15.1%
14) Aaron Harang, 15.0%
15) Brian Bannister, 14.8%
16) Daisuke Matsuzaka, 14.7%
17) Vicente Padilla, 14.7%
18) Matt Cain, 14.7%
19) Javier Vazquez, 14.7%
20) Randy Wolf, 14.6%
And the last group with at least 14% of their pitches down the middle:
21) Brad Penny, 14.5%
22) Roy Oswalt, 14.5%
23) Johan Santana, 14.4%
24) Nate Robertson, 14.3%
25) Ervin Santana, 14.2%
26) Miguel Batista, 14.2%
27) Jon Garland, 14.1%
28) John Lackey, 14.1%
29) CC Sabathia, 14.1%
30) Jarrod Washburn, 14.0%
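The gaps within the top ten can be checked straight from the percentages listed above.  This is just a quick arithmetic sketch in Python; the variable names are my own, and the numbers are copied from the list.

```python
# Top ten down-the-middle percentages, copied from the list above.
top_ten = [
    ("Ted Lilly", 18.6),
    ("Paul Byrd", 16.7),
    ("Josh Beckett", 16.3),
    ("Micah Owings", 16.1),
    ("Tim Lincecum", 15.9),
    ("John Danks", 15.8),
    ("Felix Hernandez", 15.7),
    ("Greg Maddux", 15.5),
    ("Joe Blanton", 15.5),
    ("Justin Verlander", 15.4),
]

pcts = [pct for _, pct in top_ten]

# Gap between #1 and #2, in percentage points (rounded to hide float noise).
lead_gap = round(pcts[0] - pcts[1], 1)

# Total spread from #2 down to #10, in percentage points.
rest_spread = round(pcts[1] - pcts[-1], 1)

print(f"Lilly's lead over #2: {lead_gap} points")
print(f"Spread from #2 to #10: {rest_spread} points")
```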
Unfortunately, just as David Appelman found a couple of years ago, there is not much correlation between pitches thrown down the middle and, well, anything else at all.  I thought there might be something significant between down the middle pitches and line drives–it’s been theorized before that line drives might correlate quite well with mistake pitches–but, alas, there was not; at least not yet.
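For readers who want to run this kind of check themselves, a plain Pearson correlation is one reasonable way to do it.  The sketch below is an assumption about method, not a record of how the comparison above was actually run, and the function name is hypothetical.  The second list (line drive rate, or any other stat) is not given in this article, so it would have to be supplied separately.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists.

    xs might be each pitcher's down-the-middle percentage and ys the stat
    being compared against it (supplied by the reader).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pitchers")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)
```

A result near 0 would match the "not much correlation" finding, while values near 1 or -1 would indicate a strong relationship.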
Additionally, I would like to explore this at the end of this season, or perhaps further into the year, when all pitchers would have the same (or close to it) amount of data recorded.  For now, though, at the very least, it’s somewhat interesting to see which pitchers throw the most down the middle.
On Saturday we will look at the opposite, pitchers who throw the most OUT of the zone and then compare the results (Balls, Called K, Swing K, etc) between pitches down the middle and those out of the zone.

Your guide to the 2008 election

This year, Americans will undertake one of their most sacred duties as citizens.  Millions of ballots will be cast, and this time, it really counts.  This year’s election will shape the very fabric of America.  (When did I become that writer?)  On a Tuesday night later this year, we will all gather together to watch the results of this momentous tallying play out on live television, and those results will have effects that will be felt months, even years down the road.
I ask all America to pause and consider what a great responsibility is being given to you.  Should you vote for the veteran who’s been doing it for years or for the new guy who might be a flash in the pan or might be the next big thing?  It’s a tough decision.  Maybe you vote for one of those other guys who probably won’t get elected anyway, but who you feel still deserve some support.  Now, it’s a tradition for media members to endorse candidates, and here at StatSpeak, we’re part of the new wing of the new new media.  I respect that you may have differing opinions on the candidates than I do, but perhaps my opinion means something to you.  Therefore, I’d like to endorse the following candidates in this 2008 election.  To the All-Star team.
(As for that other election going on this year, I have no clue.  I personally intend to flip a coin.  Or vote for Tommy Hinzo again.  And yes, I really have voted for Tommy Hinzo for President in the past.)
National League:
Catcher – Brian McCann.  Oddly enough, McCann’s currently losing in the balloting to a rookie from Chicago who might end up being amazing or might just be another in a long line of disappointments for the Cubs.  Sport is indeed a microcosm of life.  McCann gets my nod based on being atop the NL VORP standings for catcher and for being from Atlanta.  I could see an honest vote going to Russell Martin, and a good third place vote going to Soto, but there aren’t any third place votes on the All-Star ballot.
First Base – Lance Berkman.  Every once in a while, I wonder if people actually pay attention when voting.  It would be so easy for people just to punch the little chad by Albert Pujols’s name out of habit, but I’m impressed that the fans actually have Berkman up by a good margin.  After all, Terry Steinbach (!) actually started a couple of All-Star Games because people weren’t paying attention.  Can we change America?  Yes we can!
Second Base – Chase Utley.  He’ll win the actual vote in a landslide.  The thing that gets me is that a guy like Dan Uggla who is running a close second (not worthy of the same sentence as Utley, but worthy of the same paragraph), is right now running behind Mark DeRosa and Kaz Matsui?  Maybe all of Uggla’s votes in Florida have been locked away in a… are we still allowed to make Bush v. Gore jokes or is that too Michael Moore?
Third Base – Chipper Jones.  I know that David Wright is the wave of the future.  I know that batting average is a lousy stat.  But hey, Chipper at .400 at the All-Star break?
Shortstop – Miguel Tejada.  He’s potentially been lying about his age and vitamin-taking habits, but thankfully, America has a history of voting for liars.  Hanley Ramirez is having a better year (although he plays horrid defense…), so if you vote on things like (hitting) ability, Ramirez is your pick.  But, Tejada is a better story.  Speaking of drugs, will someone tell me how Ryan Theriot is out-polling last year’s (ill-deserved and currently injured) MVP Jimmy Rollins?  You’d figure that Rollins would get some votes based on the fact that he won the MVP.  Oh right, this is the one election where they serve alcohol while you vote.  (Imagine whom America would elect if everyone went wasted to the voting booth.)  Now that I think of it, Theriot plays for the Cubs.  And this is the third Cub who is strangely higher up in the standings than he deserves.  Maybe some of those dead people in the cemeteries are practicing with All-Star ballots.
Outfield – Pat Burrell, Jason Bay, Matt Holliday.  I’m not voting for Nate McLouth on GP’s.  Let’s see, the top three VORPers among NL outfielders are Ryan Ludwick, Jason Bay, and Pat Burrell (Matt Holliday is fourth).  The top three in WPA are Burrell, Bay, and Holliday (Ludwick is fourth…)  Holliday has been hurt, but I just can’t find it in my heart to punch out the little thing by Ryan Ludwick’s name.  It just doesn’t seem right.  The Indians had him for a while and figured he’d turn into a power hitting corner outfielder.  They were unfortunately right.  Plus, Holliday still has a little bit of that “you haven’t heard of him but I have” snobbery chic factor left, although not much.  That said, the fans are currently voting for the sentimental favorite (Ken Griffey, Jr.), the endlessly-talked-about-guy-who-signed-the-big-contract (Alfonso Soriano), and the he’s-Japanese-so-he-must-be-good guy (Kosuke Fukudome).  Two Cubs… weird…
American League
Catcher – Joe Mauer.  I opened the current vote totals to see what was going on in the actual voting and found a disturbing pattern.  The leaders: Varitek, Youkilis, Pedroia, Jeter, A-Rod, and Manny (and Ichiro and Josh Hamilton.)  David Ortiz is winning at DH.  ESPN, what hath thou wrought?!  Since I’m the one writing this, I will no longer allow for the endorsing of any member of the Yankees or Red Sox.  In this case, it’s pretty easy as Mauer has been a better catcher (by VORP) than either Posada or ‘Tek.  Then again, so has Dioner Navarro.
First Baseman – Justin Morneau.  Oh great, the top two VORPers among AL first basemen are Jason Giambi and Kevin Youkilis.  Here’s why you should vote Morneau, other than to counteract the whole Yankees-Red Sox axis of evil.  Morneau is Canadian.  Now, what are the two most awkward minutes on television all year?  The playing of the Canadian National Anthem at the All-Star game.  Why?  Because the camera guys have to ask “What do we focus on for those two minutes?”  If we can get Jason Bay (also Canadian) and Morneau in there (plus whoever the token Blue Jay is), they can alternate between the two of them and Avril Lavigne, who will undoubtedly be called on to screech the anthem.  In other words, this election is all about foreign policy.
Second Baseman – Ian Kinsler and/or Brian Roberts.  The second base crop is pretty underwhelming this year.  Since we’re doing a bit of political allegory here, this particular choice is like voting for the County Court of Appeals Judge, Third Circuit, for the term beginning on January 2nd.  You have no idea who any of the candidates are.  They’re probably all somewhat sleazy, yet competent enough lawyers.  And so you vote for the guy whose last name you like best.  This is why when I actually vote for the All-Star teams, I vote for all the Cleveland Indians and all the guys in the NL who won’t get many votes anyway.
Third Baseman – Alex Rodriguez.  Yeah, yeah, I know, no Yankees… but the game is in Yankee Stadium and he’s going to win anyway.  This is just a shrewd political maneuver on my part.  Plus, he’s the best hitter in the game and half of America votes just to say that they voted for the winner.  Besides, it’s not like we’ve never seen a broken promise in politics.
Shortstop – Michael Young.  Young’s probably wondering what it takes to get noticed around baseball.  His stats are superior to Jeter’s (and everyone else’s at the position) and he’s been to the ASG a few times before.  Heck, he was even the game’s MVP in 2006.  But, the guy with the amazing resume doesn’t always get noticed when there’s a rock star around.  Just ask Bill Richardson.
Outfield – Josh Hamilton, Carlos Quentin, and B.J. Upton.  Hamilton’s story of putting his life back together is truly inspiring considering he was finally able to realize a lot of the potential he was supposed to have.  The guy he was traded for (Edinson Volquez) isn’t having a bad year either.  Carlos Quentin, where have you been hiding all this time?  You snuck up on everyone so much that I actually had to write your name in.  That means your own team didn’t think that you would be one of the best three outfielders on your team, much less the league!  B.J. Upton gets a nod based on the fact that he plays for the Rays, and the Rays are having the kind of year where they deserve to have a starter.  I should probably be suggesting that you vote for the ever-sublime Manny Ramirez (argh, there I am being that writer again) or the strangely-resurgent Johnny Damon, but it’s my ballot.
Designated Hitter – Frank Thomas.  This is a snide vote on my end.  When Thomas was released by the Blue Jays, Sabermetricians everywhere shook their heads at how the Blue Jays could lack such a basic understanding of how to evaluate a small sample size (correct answer: don’t).  The A’s (gratuitous Moneyball reference!) picked him up and look what he’s done since.  He’s not the best DH in the league this year and he won’t win, but consider this a protest vote.
Then again, maybe I’ll just write in Barry Bonds.  How cool would that be?  He hasn’t retired.  He gets voted to start the game and is wearing just a plain gray uniform.  Hmmm… 
Anyway, go vote.