A PITCHf/x primer

Many of you are hopefully familiar with the PITCHf/x system and at least some of the data and analysis that have been produced on the subject over the past year, but it may be completely new to some of you. In either case, I thought it would be helpful to provide an introduction and tutorial on the information that is available. I’ll point toward some existing resources and try to fill in some of the gaps. I’ve divided this primer into sections so you can easily skip to the parts that interest you.

  1. What is PITCHf/x?
  2. How do I get and use the data?
  3. Where can I find resources?
  4. How do I identify pitch types?
  5. How do I interpret graphs?
  6. Is the data reliable?
  7. Where can I go for further discussion and study?

1. What is PITCHf/x?
PITCHf/x is a system developed by Sportvision and introduced in Major League Baseball during the 2006 playoffs. It uses two cameras to record the position of the pitched baseball during its flight from the pitcher’s hand to home plate, and various parameters are measured and calculated to describe the trajectory and speed of each pitch. It was instituted in most ballparks throughout MLB as the 2007 season progressed, such that we have PITCHf/x data for a little over a third of the games from 2007. MLBAM used the PITCHf/x data in their Enhanced Gameday application and also made the data freely available for downloading and research.
In some ways, PITCHf/x is a bridge between scouting and analysis, giving us an objective window into the batter-pitcher matchup at a level we’ve never seen before. In 2008, the system should be installed in every major-league ballpark, and we will hopefully have complete detail for every pitch, although MLB has not committed to whether all the data will continue to be freely available in the future.
2. How do I get and use the data?
If you want to look at the XML data from a single game, you can go to the MLB website and browse through the files. Data is organized by year, month, day, and game. Within each game directory are a number of subdirectories containing the data in XML format. If you want to see the detailed pitch information within the game context, I suggest looking at the files in the inning subdirectory. If you want to see all the pitch information for a particular pitcher, you can go to the pbp/pitchers subdirectory, but you need to know the Elias player ID for your pitcher of interest. If you want to know what the various XML pitch data fields mean, read my glossary.
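If you would rather go straight to code than to a spreadsheet, here is a minimal Python sketch of pulling pitches out of an inning file. The XML snippet is made up, and the element and attribute names (atbat, pitch, pitcher, des, start_speed, pfx_x, pfx_z, px, pz) are assumptions about the Gameday layout; check them against the glossary and the real files before relying on this.

```python
import xml.etree.ElementTree as ET

# A tiny, invented excerpt in the shape assumed for an inning file.
# Attribute names are assumptions; verify them against the real XML.
SAMPLE_INNING_XML = """
<inning num="1">
  <top>
    <atbat num="1" batter="111111" pitcher="222222">
      <pitch des="Called Strike" type="S" start_speed="91.2" pfx_x="-6.1" pfx_z="9.8" px="0.12" pz="2.75"/>
      <pitch des="Ball" type="B" start_speed="78.4" pfx_x="4.3" pfx_z="-5.2" px="-1.40" pz="1.10"/>
    </atbat>
  </top>
</inning>
"""

NUMERIC_FIELDS = ("start_speed", "pfx_x", "pfx_z", "px", "pz")


def parse_pitches(xml_text):
    """Return one dict per <pitch> element, tagged with its at-bat's pitcher ID."""
    root = ET.fromstring(xml_text.strip())
    pitches = []
    for atbat in root.iter("atbat"):
        for pitch in atbat.iter("pitch"):
            row = {"pitcher": atbat.get("pitcher"), "des": pitch.get("des")}
            for field in NUMERIC_FIELDS:
                value = pitch.get(field)
                # Pitches without PITCHf/x data may lack these attributes entirely.
                row[field] = float(value) if value not in (None, "") else None
            pitches.append(row)
    return pitches


for p in parse_pitches(SAMPLE_INNING_XML):
    print(p["pitcher"], p["des"], p["start_speed"])
```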
If you want to manipulate and analyze a single game’s worth of data, you can download and import the XML files into a Microsoft Excel spreadsheet. Dr. Alan Nathan has laid out the steps for you at his Physics of Baseball site.
If you want to get a little more hardcore, you can download the XML data for every game in the 2007 season. Using Perl scripts adapted from Joseph Adler’s Baseball Hacks, I downloaded the data and parsed it into a MySQL database. I’ve outlined the steps needed for you to do this yourself and shared the Perl code to give you a head start. (I’m not aware of anyone who’s gotten the Perl-to-MySQL path working on a Mac, so if you have, please drop me a line.)
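The same database idea can be sketched in Python, with the built-in sqlite3 module standing in for MySQL. The table layout and column names below are hypothetical, not the schema produced by the Perl scripts; the point is only that once pitches are in a database, a question like "how hard does each pitcher throw on average?" becomes a short query. Note how a pitch with no recorded speed is simply left out of the average.

```python
import sqlite3

# Invented sample rows; the last one mimics a pitch with missing PITCHf/x data.
SAMPLE_PITCHES = [
    {"pitcher": "A", "des": "Ball", "start_speed": 90.0, "pfx_x": -6.0, "pfx_z": 10.0},
    {"pitcher": "A", "des": "Called Strike", "start_speed": 80.0, "pfx_x": 3.0, "pfx_z": -4.0},
    {"pitcher": "B", "des": "Foul", "start_speed": 95.0, "pfx_x": -8.0, "pfx_z": 11.0},
    {"pitcher": "B", "des": "Ball", "start_speed": None, "pfx_x": None, "pfx_z": None},
]


def build_db(pitches):
    """Load pitch dicts into an in-memory SQLite table (a stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE pitches (
               pitcher     TEXT,
               des         TEXT,
               start_speed REAL,
               pfx_x       REAL,
               pfx_z       REAL
           )"""
    )
    # Named placeholders are filled from each dict; None becomes SQL NULL.
    conn.executemany(
        "INSERT INTO pitches VALUES (:pitcher, :des, :start_speed, :pfx_x, :pfx_z)",
        pitches,
    )
    conn.commit()
    return conn


def average_speed_by_pitcher(conn):
    """Pitch count and mean speed per pitcher, skipping pitches with no speed."""
    return conn.execute(
        """SELECT pitcher, COUNT(*), AVG(start_speed)
           FROM pitches
           WHERE start_speed IS NOT NULL
           GROUP BY pitcher
           ORDER BY pitcher"""
    ).fetchall()


conn = build_db(SAMPLE_PITCHES)
print(average_speed_by_pitcher(conn))
```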
3. Where can I find resources?
Probably the most popular and valuable PITCHf/x resource on the web is Josh Kalk’s collection of player cards. Josh has classified every pitch as either a fastball, sinker, cutter, splitter, changeup, slider, curve, or knuckleball using a clustering algorithm and made graphs of pitch speed, movement, and release point for every pitcher with at least 100 pitches recorded by PITCHf/x. Strike zone charts are available for hitters. This is a great resource that reminds me in some ways of Wikipedia: the depth, breadth, and accuracy of the information are amazing, doubly so since it’s free, but the accuracy isn’t perfect, and it’s worth keeping that in mind. Stuff that looks quirky to you may in fact be quirky. (Felix Hernandez does not throw a 100-mph splitter.)
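Josh’s specific clustering algorithm isn’t described here, so the following is not his method. It is a plain k-means sketch in Python, included only to show what "clustering" means in this context: each pitch is a point in (speed, horizontal spin deflection, vertical spin deflection) space, and the algorithm groups nearby points. The sample pitches are invented. In practice you would want to rescale the features and choose the number of clusters with care.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means: returns (labels, centers) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct pitches
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each pitch joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Invented pitches: (speed mph, horizontal deflection in., vertical deflection in.)
pitches = [
    (92.0, -6.0, 10.0), (93.0, -7.0, 9.0), (91.0, -5.0, 11.0),  # fastball-like
    (76.0, 5.0, -6.0), (75.0, 6.0, -5.0), (77.0, 4.0, -7.0),    # curveball-like
]
labels, centers = kmeans(pitches, k=2)
print(labels)
```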
Josh Kalk has also developed a PITCHf/x tool that allows you to query his database for a specific subset of pitches and plot their strike zone location.
The Hardball Times published a pitch identification tutorial by John Walsh that is a good introduction to the general PITCHf/x topic as well as the specific topic of pitch identification.
Dr. Alan Nathan’s Physics of Baseball site has a lot of interesting resources, including some PITCHf/x-related material.
4. How do I identify pitch types?
Some people are good at identifying pitch types while at the ballpark or from the center field TV camera view. That was a splitter. That was a sinker. That was a slider. Etc. I am not one of those people. If you are not one of those people either, PITCHf/x was made for you. Even if you are one of those people, PITCHf/x can be a useful resource for learning about how different pitches move.
A pitcher’s fastest pitch is usually a four-seam fastball. A typical major-league fastball is around 90 mph, many a little faster, some a little slower. The fastball from a right-handed pitcher breaks in toward a right-handed hitter. Pitches from a lefty move the opposite way; a fastball from a lefty breaks away from a right-handed hitter. I’ll describe the movement for pitches from a righty and you can flip the orientation if you want to know how a similar pitch from a lefty would behave.
Pitchers throw variations of the fastball by changing the grip on the baseball or parts of their motion and delivery. The most popular variation is the two-seam fastball, which is often thrown a couple of mph slower than the four-seamer and, from a right-handed pitcher, breaks in more and drops more to a right-handed hitter. The cut fastball is also thrown a few mph slower than the four-seamer and breaks away a little from a right-handed hitter, if it breaks at all.
The most popular off-speed pitch is the changeup, which is typically thrown 7-10 mph slower than a pitcher’s fastball. It usually has a similar break to the fastball, in toward a right-handed hitter. Some pitchers employ a grip on their changeup to impart additional movement, usually causing the pitch to break in more and drop more to a right-handed hitter. The split-finger fastball acts much like a changeup except that its velocity and movement are usually somewhere between the fastball and changeup.
Breaking balls include the slider and curveball. The slider is usually thrown at the same speed as the changeup or sometimes a few mph faster. The movement on the slider can vary quite a bit from one pitcher to another. Some sliders move like a cutter, with hardly any left-right break. Other sliders move more like a curveball, which breaks away from a right-handed hitter and down. The curveball is the slowest pitch, thrown in the 65-80 mph range in major league baseball.
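The rules of thumb above can be turned into a crude labeler. This is a toy sketch, not a real classification method: the thresholds are a loose, hypothetical reading of the ranges described in this section, horizontal deflection is expressed as arm-side movement (positive means breaking in toward a hitter of the same hand as the pitcher) so the same rules work for righties and lefties, and knuckleballs, screwballs, and eephus pitches are ignored.

```python
def rough_label(speed, arm_side, rise, fastball_speed):
    """Toy pitch labeler built from rules of thumb (all thresholds hypothetical).

    speed          -- pitch speed, mph
    arm_side       -- horizontal spin deflection in inches; positive means it
                      breaks in toward a hitter of the same hand as the pitcher
    rise           -- vertical spin deflection in inches; positive means the spin
                      holds the pitch up against gravity
    fastball_speed -- this pitcher's typical four-seam fastball speed, mph
    """
    gap = fastball_speed - speed
    if gap <= 4:
        # Fastball family: cutters show little or no arm-side break.
        return "cutter" if arm_side <= 1 else "fastball"
    if gap <= 11 and arm_side > 0:
        # Off-speed pitches that still break in like a fastball.
        return "changeup/splitter"
    if rise < 0 and arm_side < 0:
        # Breaks away and down: slow ones are curveballs, harder ones sliders.
        return "curveball" if speed <= 80 else "slider"
    if arm_side <= 1:
        # Cutter-like movement at off-speed velocity.
        return "slider"
    return "unclassified"


print(rough_label(76.0, -5.0, -6.0, fastball_speed=92.0))
```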
The knuckleball is a special case in major league baseball these days. As far as I know, there were only two regular practitioners of the pitch in the majors last year: Tim Wakefield and Charlie Haeger. The pitch is thrown with very little spin such that the airstream interaction with the seam orientation causes the baseball to move unpredictably. Wakefield and Haeger throw the knuckleball about 65-70 mph.
Of course, there are a number of variations and combinations of the above pitches and specialty pitches like the screwball and gyroball and even the 50-mph Orlando Hernandez eephus pitch.
Typical RHP spin deflection
Here is a plot showing the typical vertical and horizontal spin deflection (a.k.a. “break”) of common pitches from a right-handed pitcher, as seen from the catcher’s point of view. A mirror image would give you the plot for a left-handed pitcher. You can use this as a key for interpreting some of the graphs on Josh Kalk’s player cards or for understanding the spin-induced movement on various types of pitches.
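Mirroring the chart for a lefty amounts to negating the horizontal component while leaving the vertical one alone. A minimal sketch, assuming horizontal and vertical spin deflection are stored as signed numbers in the chart’s catcher’s-view coordinates; the names pfx_x and pfx_z and the "L"/"R" hand codes are assumptions.

```python
def to_rhp_view(pfx_x, pfx_z, throws):
    """Mirror a pitch's spin deflection into the right-hander's orientation.

    Only the horizontal component changes sign for a lefty; vertical
    deflection (up vs. down) is the same for both hands.
    """
    if throws == "R":
        return pfx_x, pfx_z
    if throws == "L":
        return -pfx_x, pfx_z
    raise ValueError(f"unknown throwing hand: {throws!r}")


# A lefty's pitch with 5 in. of horizontal deflection one way shows up
# with 5 in. the other way once mirrored onto the right-hander's chart.
print(to_rhp_view(5.0, 8.0, "L"))  # (-5.0, 8.0)
```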
5. How do I interpret graphs?
PITCHf/x analysis and research is a promising field with wide application and broad interest, and there are a number of people who have made important contributions in the first year of analysis. As a result, there are many different formats for presenting the results. I’ll summarize and explain a few of them here and give a more detailed explanation of some of the graphs that I use most frequently.
The most common plots presented by other PITCHf/x researchers include information about the speed and spin-induced deflection of pitches. To the best of my knowledge, Joe Sheehan was the first to produce these plots, showing speed on the vertical axis and the two components of spin deflection as two sets of points on the horizontal axis. Joe hasn’t done much pitch classification work recently, but he deserves a nod as the groundbreaker in that field.
Something you’re more likely to encounter these days is a plot from John Walsh, such as those contained in his pitch identification tutorial. He plots vertical “movement” versus horizontal “movement”, where movement refers to the spin-induced deflection, and indicates speed by color-coding the points on the graph.
Most common of all are the plots from Josh Kalk’s pitcher cards, particularly the plots of vertical “break” versus horizontal “break”. These are similar to John Walsh’s plots except that instead of color-coding for speed, the points on the graph are color-coded by pitch type. Josh has separate graphs that plot speed versus horizontal break and speed versus vertical break, reminiscent of the original Sheehan plots. Josh’s player cards also contain information on release point: the height and left-right position of the pitch measured 50 feet from home plate, shortly after the pitcher actually releases the ball.
In the past I have presented graphs similar to those of Sheehan and Kalk, but more recently I’ve adopted a graph from Alan Nathan as my mainstay. It is a polar plot, with the speed of the pitch on the radial axis. The faster the pitch, the farther from the center. The slower the pitch, the closer to the center. The angle is the angle of the Magnus force, which is the force that causes the ball to break. Curveballs break down, so they’ll be in the bottom part of the graph. Sliders break away from a right-handed hitter, so they’ll be on the left side of the graph. The Magnus force of a fastball pushes the ball up, causing it to drop less than it normally would due to gravity alone, so the fastballs will be on the top part of the graph.
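Here is a rough sketch of how one pitch could be placed on a speed-versus-spin-angle polar plot. It assumes the direction of the spin deflection (pfx_x, pfx_z, in inches) points the same way as the Magnus force, which ignores any small sideways drag effects. The angle convention (0° along +pfx_x, increasing counterclockwise, so straight up is 90° and straight down is 270°) is also an assumption; whether sliders land on the left or the right depends on the sign convention and viewpoint of the horizontal axis, so flip x as needed to match the graphs described here.

```python
import math


def polar_point(speed_mph, pfx_x, pfx_z):
    """Return (angle_deg, radius, x, y) for one pitch on a speed/spin-angle polar plot.

    radius = pitch speed; angle = direction of the spin deflection, measured
    counterclockwise from +pfx_x (an assumed convention). x and y are the
    Cartesian coordinates you would hand to a plotting library.
    """
    angle = math.degrees(math.atan2(pfx_z, pfx_x)) % 360.0
    radius = speed_mph
    x = radius * math.cos(math.radians(angle))
    y = radius * math.sin(math.radians(angle))
    return angle, radius, x, y


# A backspin fastball (spin pushes it up) sits near the top of the plot;
# a pure topspin curveball sits near the bottom, closer to the center.
print(polar_point(92.0, 0.0, 10.0))
print(polar_point(75.0, 0.0, -8.0))
```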
I’ve also started showing a graph of what I call “late break”, which is a combination of the effects of spin deflection and gravity as well as the speed of the pitch. The goal is to show something close to what the hitter perceives as the break or movement of the pitch. I calculate the deflection of the pitch due to two forces, spin and gravity, in the last 0.25 seconds of its trajectory before it crosses the plate, an idea I got from Tom Tango. I chose a quarter second because that’s roughly the reaction time of a batter executing a swing. I chose to include the effect of gravity because I believe that more accurately reflects what hitters see. Hitters don’t attempt to hit a gravity-less pitch; they attempt to hit a pitch that’s being affected by gravity and being deflected by spin.
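A minimal sketch of a late-break calculation, assuming the spin-induced accelerations stay roughly constant over the last quarter second. With constant acceleration a, the deflection away from a straight line along the velocity at the start of a window of length τ is 0.5·a·τ². The inputs spin_ax and spin_az (ft/s²) are hypothetical names; if you start from fitted accelerations in the data, remember that the vertical one already includes gravity, which would need to be removed first. This sketch ignores drag and is not necessarily the exact calculation behind the graphs.

```python
G_FT_S2 = 32.174  # gravitational acceleration, ft/s^2


def late_break_inches(spin_ax, spin_az, window_s=0.25):
    """Deflection (inches) over the final window_s seconds from spin plus gravity.

    Measured relative to a straight line along the velocity at the start of
    the window, assuming constant accelerations over the window:
    displacement = 0.5 * a * t**2.
    spin_ax, spin_az -- spin-induced accelerations in ft/s^2 (z positive = up)
    """
    dx_ft = 0.5 * spin_ax * window_s ** 2
    dz_ft = 0.5 * (spin_az - G_FT_S2) * window_s ** 2  # gravity pulls down
    return dx_ft * 12.0, dz_ft * 12.0


# A pitch whose backspin supplies 15 ft/s^2 of lift still drops over the
# last quarter second, because gravity (about 32 ft/s^2) wins.
print(late_break_inches(-10.0, 15.0))
```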
6. Is the data reliable?
Whenever you are viewing or analyzing PITCHf/x data, it’s worth keeping in mind that 2007 was a work in progress for Sportvision and MLBAM. They instituted the system in only a handful of stadiums to begin the year and added more systems in other stadiums, particularly in the second half of the year, as they gained confidence in the performance and accuracy of PITCHf/x. They experimented with measuring the initial point of the pitch trajectory at various distances from home plate, finally settling on 50 feet. They worked to identify and remove spurious data that was collected by the system. They trained operators who did such things as identifying the beginning of play in each half inning and setting the top and bottom of each batter’s strike zone in the system. In addition, the camera systems were sometimes recalibrated, possibly at the beginning of each home stand.
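Because operators set the top and bottom of each batter’s zone, those values feed directly into any called-strike analysis. A minimal sketch, assuming location fields px and pz (feet, catcher’s view, px = 0 at the middle of the plate) and zone fields sz_top and sz_bot; the field names are assumptions, the 17-inch plate width is from the rulebook, and the ball radius is an approximation.

```python
PLATE_HALF_WIDTH_FT = 17.0 / 2.0 / 12.0  # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12.0             # approximate radius of a baseball


def in_zone(px, pz, sz_top, sz_bot):
    """True if any part of the ball touches the strike zone.

    px, pz         -- horizontal/vertical location (ft) as the pitch crosses the plate
    sz_top, sz_bot -- operator-set top and bottom of the zone (ft)
    """
    horizontal_ok = abs(px) <= PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
    vertical_ok = sz_bot - BALL_RADIUS_FT <= pz <= sz_top + BALL_RADIUS_FT
    return horizontal_ok and vertical_ok


# The same pitch is a ball or a strike depending on where the operator
# happened to set the top of the zone.
print(in_zone(0.0, 3.55, sz_top=3.4, sz_bot=1.6))
print(in_zone(0.0, 3.55, sz_top=3.5, sz_bot=1.6))
```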
So it’s a bit naive to assume the data we have is a perfectly objective, accurate, and precise measure of each pitch. In most cases, it’s pretty close (within an inch or two) and good enough–much better than anything we’ve ever had before! But what are some of the sources of error to watch out for?
The data for some pitches is missing. In some cases this is obvious, when a stadium doesn’t have a system for part of the year, for example. Other times, portions of games will be missing, or even just individual pitches. Perhaps the operator didn’t turn the system on for the first pitch of the inning, or MLB/Sportvision retroactively discovered an error in their data and removed it. We are also missing PITCHf/x data for all hit batsmen during the regular season.
There is erroneous data–spurious or mis-measured pitches. For example, the data may say that a pitch was released from ten feet off the ground, and unless Gumby has caught on with a major league team, I doubt any pitcher can reach that high. There are a number of 30-40 mph pitches that are recorded in the data that do not appear to be realistic. It’s been suggested that some of these may have been the system inadvertently recording other non-pitch throws of the baseball between the mound and the plate as a pitch.
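A simple first pass at catching pitches like these is to flag anything outside plausible bounds before analysis. This is a hypothetical sketch: the thresholds are illustrative, not established cutoffs, and the field names start_speed and z0 (release height at the initial measurement point) are assumptions. The speed cutoff is set low enough to keep slow but real pitches like the eephus.

```python
# Hypothetical sanity thresholds; tune them to your own data.
MIN_PLAUSIBLE_MPH = 45.0     # below this, possibly a non-pitch throw
MAX_RELEASE_HEIGHT_FT = 7.5  # above this, probably a mis-measured release point


def flag_suspect(pitch):
    """Return a list of reasons a pitch looks suspicious (empty list = looks OK)."""
    reasons = []
    speed = pitch.get("start_speed")
    z0 = pitch.get("z0")
    if speed is None:
        reasons.append("missing speed")
    elif speed < MIN_PLAUSIBLE_MPH:
        reasons.append("implausibly slow")
    if z0 is not None and z0 > MAX_RELEASE_HEIGHT_FT:
        reasons.append("implausible release height")
    return reasons


print(flag_suspect({"start_speed": 35.0, "z0": 6.0}))
print(flag_suspect({"start_speed": 91.0, "z0": 10.0}))
```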
There are indications of park and/or camera system bias. Data from Seattle and Toronto indicate pitch speeds that seem a few mph higher than they should be. Look how hard Dustin McGowan and Felix Hernandez are shown to have thrown on average. These guys are hard throwers, but not that hard. Similarly, the system at Fenway Park seems to have underestimated pitch speeds and otherwise collected strange data.
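One crude way to look for park bias is to compare each pitch’s speed with that pitcher’s overall average and then average those differences by park. This is a hypothetical sketch, not the correction method used by anyone mentioned here. It has obvious weaknesses: a pitcher’s overall average is itself skewed by his home park, and mixing pitch types adds noise, so a serious version would compare like pitches and iterate. The pitcher and park names in the example are made up.

```python
from collections import defaultdict
from statistics import mean


def park_speed_offsets(pitches):
    """Estimate each park's speed bias (mph).

    For every pitch, take its speed minus that pitcher's overall average
    speed, then average those differences over all pitches in each park.
    """
    by_pitcher = defaultdict(list)
    for p in pitches:
        by_pitcher[p["pitcher"]].append(p["start_speed"])
    pitcher_avg = {name: mean(speeds) for name, speeds in by_pitcher.items()}

    residuals = defaultdict(list)
    for p in pitches:
        residuals[p["park"]].append(p["start_speed"] - pitcher_avg[p["pitcher"]])
    return {park: mean(vals) for park, vals in residuals.items()}


# Invented data: both pitchers read about 4 mph faster in park_B than park_A.
sample = [
    {"pitcher": "P1", "park": "park_A", "start_speed": 90.0},
    {"pitcher": "P1", "park": "park_B", "start_speed": 94.0},
    {"pitcher": "P2", "park": "park_A", "start_speed": 88.0},
    {"pitcher": "P2", "park": "park_B", "start_speed": 92.0},
]
print(park_speed_offsets(sample))
```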
There are also altitude and temperature effects. In this case, the data collected by PITCHf/x may be completely correct, but our interpretation of the data has to take into account that air density affects how a pitched baseball moves. A curveball thrown in the thin air of Denver, Colorado won’t break as much as the same curveball thrown in the pea soup at sea level.
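To first order, the Magnus force is proportional to air density, so for the same speed and spin the spin deflection scales with the density ratio between two parks. A minimal sketch under that assumption, using the standard-atmosphere pressure formula and the ideal gas law for dry air; it ignores humidity and the knock-on effect of lower drag on speed and flight time, and the 21 °C sea-level reference is an arbitrary choice.

```python
def air_density(altitude_m, temp_c):
    """Approximate dry-air density (kg/m^3) from the standard-atmosphere
    pressure formula and the ideal gas law."""
    pressure_pa = 101325.0 * (1.0 - 2.25577e-5 * altitude_m) ** 5.25588
    temp_k = temp_c + 273.15
    return pressure_pa / (287.05 * temp_k)  # 287.05 J/(kg K): gas constant for dry air


def break_scale(altitude_m, temp_c, ref_altitude_m=0.0, ref_temp_c=21.0):
    """First-order ratio of spin deflection relative to a reference park,
    assuming Magnus force scales linearly with air density."""
    return air_density(altitude_m, temp_c) / air_density(ref_altitude_m, ref_temp_c)


# Denver sits at roughly 1,609 m (5,280 ft) above sea level.
print(round(break_scale(1609.0, 21.0), 2))  # about 0.82
```

At equal temperature, this rough model says a curveball in Denver gets only about 82% of the spin deflection it would get at sea level.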
7. Where can I go for further discussion and study?
If you want to learn more about the details of Sportvision’s PITCHf/x system and MLB’s implementation, read this article by Mark Newman of MLB.com.
If you want to learn more about the physics of pitched baseballs, Alan Nathan is your man, and his freshman physics lectures on the Physics of Baseball at the University of Illinois are an excellent place to begin. You might also find these articles by Dave Baldwin and Terry Bahill helpful.
If you want to learn more about pitch classification methods, as I mentioned earlier, John Walsh’s pitch identification tutorial is a good place to start. You may also want to consult my survey of the topic, which places particular emphasis on my own work on the subject.
If you want to discuss PITCHf/x with other sabermetricians, I recommend The BOOK Blog run by Tom Tango.
If you want to learn about systematic error correction for the PITCHf/x data set, read Josh Kalk’s posts at his blog, and this post by Ike Hall, including comments by Alan Nathan.
If you want to learn about pitch sequencing analysis, Joe P. Sheehan’s Command Post at Baseball Analysts is a good resource, including these posts on the topic. Joe Sheehan’s writing is an excellent resource on a number of diverse PITCHf/x topics. Although I only listed him here under pitch sequencing, it’s well worth going through his archives on many other topics if you are interested in learning about PITCHf/x.
Dan Fox’s work is another great PITCHf/x resource, although, like Joe, I couldn’t find a neat category to file him under. He’s covered everything from pitch classification to measures of strike zone judgment.
If you want to learn about pitching styles, strategies, and repertoires throughout baseball history, I highly recommend reading the Neyer/James Guide to Pitchers, published in 2003. Rob Neyer has updates to the book at his blog.

28 Responses to A PITCHf/x primer

  1. BobbyRoberto says:

    “The fastball from a right-handed pitcher breaks away from a right-handed hitter. Pitches from a lefty move the opposite way; a fastball from a lefty breaks in toward a right-handed hitter.”
    Is this correct? I thought it was just the opposite. A fastball from a right-handed pitcher will tail in to a right-handed hitter. A fastball from a lefty will tail away from a right-handed hitter.

  2. Mike Fast says:

    Bobby, you’re right, and I’ve corrected it. I tend to get left-right dyslexic when I think about this stuff too much.

  3. tangotiger says:

    Excellent Primer. I’d also add in the work from Dan Fox, for the “gang o’ 6” (Sheehan, Walsh, Kalk, Fast, Fox, Nathan).

  4. Pizza Cutter says:

    If only I had the programming ability to mine this data… *sigh*

  5. Mike Fast says:

    Thanks, Tango. I added a link to Dan’s work in the final section. He definitely is one of the pioneers worth mentioning. I thought of him when I was writing, but I didn’t know where to put him at first.

  6. dan says:

    I still think that the average person looking at this will not completely understand what the graphs mean without explanation. When we watch a baseball game, we see the curveball drop downwards, not break slightly up (in some cases). We don’t see a rising fastball and sinker, we see one that stays straight or drops a little bit. I understand what the graphs mean just from reading most of what’s been written on the subject, but it just doesn’t always sit well with me.

  7. Mike Fast says:

    Dan, do you find my graphs of “late break” helpful in that regard, then? The concern encapsulated in your comment is one of the two main reasons I’ve started using the “late break” graph (the other being that it’s helpful in pitch classification).
    Generally speaking, my “late break” graph is targeted toward the average baseball fan, and my “speed vs. spin deflection angle” graph is targeted toward the more serious PITCHf/x-aware fan, although both graphs hopefully have some appeal and usefulness to both groups.
    In addition, I try to focus on spin when I’m talking about spin rather than calling it “break” or “movement” which I think are confusingly vague terminology. To me, the “break” on the pitch is always going to include some effect of gravity, and if one is only talking about spin effects, that should be made clear.
    However, there are many authors with significant contributions in the field who do talk about “break” or “movement” to mean only the spin-induced deflection, so that’s something to be aware of.

  8. […] to point it out. That Outfield Arms article I linked to last week wasn’t bad, and this piece, A PITCHf/x Primer written by Mike Fast, is even […]

  9. dan says:

    I have no idea why, but I just looked at the late break graph again and it instantly made sense. The part that I feel still is difficult to understand is why a sinker breaks “up” like 10 inches when I see it move downward on the TV. I do know the answer to this question, it’s because of spin, friction, etc. I still think that seeing break in absolute terms would make more sense. A Barry Zito curveball clearly drops a few feet, and I think the numbers should reflect that. Please educate me as to why I’m wrong…. I have a hard time believing that I am right and you, Walsh, Fox, et al are wrong.

  10. Mike Fast says:

    Dan, one thing I like to do in my articles is experiment with different graphs to see what people find helpful. If I don’t get any comments on a particular graph after I use it a few times, I tend to drop it from the rotation. I still produce it and look at it for myself, but I don’t publish it.
    I’ve shown the graph that you’re talking about a few times–the effect of spin and gravity over the whole trajectory. For example, here: http://mvn.com/mlb-stats/files/2008/01/bedard_vertical_vs_horizontal_deflection.jpg
    You can see Bedard’s fastballs drop 6-12 inches and his curveball drops almost three feet. I haven’t looked at Zito’s curve, but I suspect it drops a bit more than that. Joakim Soria has a big slow curveball that drops between 3 and 4 feet; it’s probably the most similar one to Zito’s that I’ve looked at so far in terms of drop.
    http://mvn.com/mlb-stats/files/2007/12/soria_vertical_vs_horizontal_movement.jpg

  11. dan says:

    That is exactly what I was looking for. I didn’t realize what each graph was showing at the time, so I just moved on and kept reading. Thanks Mike.

  12. Mike Fast says:

    You’re welcome, Dan. Thanks for your feedback. It’s very helpful to get specific comments like yours.

  13. Ike says:

    Mike and Dan,
    With respect to vertical break, and making the plots more accessible to people that aren’t as familiar with the pitchF/x system and what it is describing, I have one solution that I’ve been bouncing about in my head for my next big dump of playing with the data. And that is to define a “league average straight fastball”. Say, a 90 mph (or 91 mph…whatever) fastball thrown at sea level on a dry day. Take the expected or average “break” numbers (or late break numbers that you plot), for such a fastball and then rescale all pitch breaks relative to that. I think then ‘break’ would have a more consistent definition to what we see on TV screen when we watch a ballgame, rather than the current definition, which is the break relative to a pitch that doesn’t exist (a pitch with no spin that doesn’t behave like a knuckleball, and for which the drag is only along one axis).
    At this point, it’s all hairsplitting, and completely tied to which version of break you seem to prefer, and that will be different for a lot of people. For physicists like me, separating out the break due entirely to the magnus force is a more intuitive thing, but for the rest of the world, that may not make as much intuitive sense. At the end of the day, they are all really describing the same thing, and the only reason for making all the different kinds of break plots is if you actually care that the numbers mean something to you. For me, it’s enough to say that Barry Zito has more downward break on his curveball than anyone else. I really don’t care what number we give to that.

  14. […] Mike Fast vi dice tutto quello che avreste mai voluto sapere sul Pitch F/X ma non avete mai osato chiedere. E […]

  15. […] Mike Fast of MVN takes a look at what exactly Pitchf/x is and why it’s valuable. – (Source) […]

  16. […] The system allows us to track the exact flight path of pitches. There’s a fantastic primer on PITCHf/x here that explains things better than I can if you’re new to the […]

  17. […] Pitch F/X (Read the primer first)Take a look (pitch by pitch) at any given hurler. See what pitches are thrown, where they’re […]

  18. Anonymous says:

    […] bad days. He is human. So what I have done is put on my spats and ventured into the brave world of PITCHf/x. I took Santana’s best and worst starts from 2007 (by Game Score) and let the tea leaves fall […]

  20. […] the way, I don’t mean this to sound like a complaint, but I really want this pitch f/x data to be tabulated and available quicker. I know it’s an incredibly tedious process and […]

  21. […] tracks all pitchers using something called PITCHf/x. If you need a primer on PITCHf/x, then you’ll probably want to read up on this, but in short it’s a system for tracking pitches much more comprehensively than they’ve […]

  22. […] What is PITCHf/x you may ask? Well, here is the PITCHf/x primer. […]

  23. […] performance by Halladay are a couple of interesting points. To do this quick analysis, I’ve used PITCHf/x data (a great data tool from […]

  24. […] For those new to PITCHf/x, it is a system developed by Sportsvision in use by Major League Baseball that uses two cameras to measure the position of the baseball between the pitcher’s hand and home plate, which can be used to determine various parameters about each pitch including velocity and break (for a more thorough introduction to PITCHf/x, refer here and here). […]

  26. […] with Pitch-f/x technology. The possibilities for statistical research with this data (which you can download yourself) are […]
