Aging Patterns: It’s All Downhill From Here
October 2, 2008
Last month I described my player projection system, which Tom Tango suggests I should name Oliver, after the famous “humanzee”. One of the concepts in my system is that each of the components (singles, doubles, triples, homeruns, hit by pitch, walks, strikeouts, etc.) should be processed separately, and the batting or pitching line then reassembled in a later step. I’d like to offer a few reasons why this approach is necessary.
On Monday Pizza Cutter wrote about how much skill and how much luck is involved in the different components of a batter’s statistics. His article illustrates why, to best estimate a player’s skill in each of these components, each should be regressed separately. In designing Oliver I tested various levels of regression for each component to find the ones that would minimize the total error, as measured by RMS error. Much of that work was discussed on Tango’s “The Book Blog”.
Another concept Pizza Cutter mentioned was flow charting of an at bat. This is a method of describing the relationships between the various components. You can’t get a triple until you are already at second base with a double, and you can’t get a double until you already have a single.
Here’s the way I’ve charted it:
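As a rough illustration of regressing each component separately, here is a minimal Python sketch. It is not Oliver's actual code: the "ballast" formulation (adding some number of league-average opportunities to the observed sample), the function names, and the candidate values are all assumptions made for the example. The idea is only that each component gets its own amount of regression, chosen to minimize RMS error against what the players did next.

```python
import math

def regress(successes, opportunities, league_rate, ballast):
    """Shrink an observed rate toward the league rate by adding `ballast`
    opportunities of league-average performance (an assumed formulation)."""
    return (successes + ballast * league_rate) / (opportunities + ballast)

def rms_error(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def best_ballast(samples, league_rate, candidates):
    """Pick the ballast that minimizes RMS error for ONE component.

    `samples` is a list of (successes, opportunities, next_year_rate)
    tuples; run this once per component (walks, strikeouts, homeruns, ...).
    """
    actual = [next_rate for _, _, next_rate in samples]

    def error(ballast):
        predicted = [regress(s, n, league_rate, ballast) for s, n, _ in samples]
        return rms_error(predicted, actual)

    return min(candidates, key=error)
```

A component that is mostly skill should end up with a small ballast, and one that is mostly luck with a large one, which is the reason for fitting them one at a time.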
Does the pitch hit the batter?
If not, does the batter walk or strikeout?
If not (makes contact), is it a bunt?
If not, is it over the fence for a homerun?
If not (in play), is it a base hit?
If yes, is it a double?
If yes, is it a triple?
If yes, is it an inside the park homerun?
This chart then tells you that strikeouts should be processed as a function of plate appearances minus hit by pitch, while triples are a function of extra base hits.
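The chart above can be sketched as a chain of conditional rates, each with its own denominator. This Python example is a simplification and not Oliver's implementation: it skips the bunt and inside-the-park homerun steps, treats walks and strikeouts as two rates over the same denominator, and uses dictionary keys (`pa`, `hbp`, `bb`, `so`, `hr`, `1b`, `2b`, `3b`) that are invented for the illustration.

```python
def conditional_rates(line):
    """Turn raw counts into the chain of conditional rates implied by the chart.

    `line` is a dict of counts with keys pa, hbp, bb, so, hr, 1b, 2b, 3b.
    Assumes every denominator is non-zero.
    """
    not_hbp = line["pa"] - line["hbp"]              # pitch did not hit the batter
    contact = not_hbp - line["bb"] - line["so"]     # no walk, no strikeout
    in_play = contact - line["hr"]                  # not over the fence
    hits = line["1b"] + line["2b"] + line["3b"]     # base hits on balls in play
    extra_base = line["2b"] + line["3b"]            # doubles or better
    return {
        "hbp": line["hbp"] / line["pa"],
        "bb": line["bb"] / not_hbp,
        "so": line["so"] / not_hbp,   # strikeouts per (PA minus HBP)
        "hr": line["hr"] / contact,
        "hit": hits / in_play,
        "xbh": extra_base / hits,
        "3b": line["3b"] / extra_base,  # triples per extra base hit
    }

def reassemble(rates, pa):
    """Rebuild expected counts for `pa` plate appearances from the rates."""
    hbp = pa * rates["hbp"]
    not_hbp = pa - hbp
    bb = not_hbp * rates["bb"]
    so = not_hbp * rates["so"]
    contact = not_hbp - bb - so
    hr = contact * rates["hr"]
    hits = (contact - hr) * rates["hit"]
    xbh = hits * rates["xbh"]
    triples = xbh * rates["3b"]
    return {"pa": pa, "hbp": hbp, "bb": bb, "so": so, "hr": hr,
            "1b": hits - xbh, "2b": xbh - triples, "3b": triples}
```

Each rate can be regressed or aged on its own, and `reassemble` then walks back down the chart to produce a batting line for any number of plate appearances.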
Lastly, the various components age differently. Since Bill James’s first attempt to study how players age, most studies have focused on measuring how the total production of the player rises and then declines as he ages. Tom Tango was one who looked to determine the aging patterns by component. A player in his twenties will have increasing walks and decreasing strikeouts, but these trends then reverse in his thirties.
Tango’s study includes 1979 to 1999 and also 1919 to 1999. I ran my own analysis from 1954 to 2007, and reran it while filtering for various player attributes, such as different rates of career homerun percentage. While my work substantially confirmed Tango’s, there were some interesting observations. At first I ran all batters, including pitchers, setting no minimum number of plate appearances, comparing year1 to year2 for each batter. Homeruns peaked at 25, but the decline from 26 to 28 was very slight (0.969 total for the period).
I then split batters into five groups based on their career HR%. Those in the highest group (greater than 0.080) peaked at 29, while those in the lowest (less than 0.015) peaked at 26, with the three middle groups giving peaks that form a smooth slope, suggesting a later peak for those with a higher HR%. Which was the cause, and which was the effect? Did they peak later because they had a higher HR%, or did they have a higher HR% because they peaked later?
Because I was looking at all players, I thought I might have an attrition bias. By setting a minimum number of career plate appearances, I could look primarily at players whose careers extended from before the normal peak to after. With a minimum of 4000 career plate appearances, players in all five ranges of career HR% now peaked at 28 or 29. Tango’s chart shows homeruns peaking at 27, slightly sooner, but the difference is possibly within the sampling error. If I were looking at total productivity, playing time would depend on that productivity. Here I had players with 4000 or more career plate appearances, and the players who had a career HR% of greater than 0.080 reached their peak homerun rate at the same age as the players who had a career rate of less than 0.015.
I feel safe saying that the long careers of these players were independent of their ability to hit homeruns, and it therefore appears that players reach their peak homerun rate at the same age, regardless of what that rate is.
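For readers who want to try the year1-to-year2 comparison themselves, here is a minimal Python sketch of one common way to do it. It is an assumption about the general technique, not a copy of my code or Tango's: the tuple layout, the function names, and the choice to weight each pair of seasons by the smaller of the two plate appearance totals are all made up for the illustration.

```python
from collections import defaultdict

def aging_ratios(seasons):
    """Year-to-year change in homerun rate by age, from matched season pairs.

    `seasons` is a list of (player_id, age, hr, pa) tuples, one per season.
    Returns {age: rate at age+1 divided by rate at age}, pooled over every
    batter who appeared at both ages.
    """
    by_key = {(pid, age): (hr, pa) for pid, age, hr, pa in seasons}
    sums = defaultdict(lambda: [0.0, 0.0])  # weighted rate in year1, year2
    for (pid, age), (hr1, pa1) in by_key.items():
        following = by_key.get((pid, age + 1))
        if following is None or pa1 == 0 or following[1] == 0:
            continue  # no usable season at the next age
        hr2, pa2 = following
        weight = min(pa1, pa2)  # assumed weighting, not from the article
        sums[age][0] += weight * hr1 / pa1
        sums[age][1] += weight * hr2 / pa2
    return {age: s[1] / s[0] for age, s in sorted(sums.items()) if s[0] > 0}

def chain(ratios, start_age):
    """Chain the ratios into a curve relative to the rate at `start_age`."""
    curve = {start_age: 1.0}
    age = start_age
    while age in ratios:
        curve[age + 1] = curve[age] * ratios[age]
        age += 1
    return curve

def qualified(seasons, min_career_pa):
    """Keep only seasons of batters with at least `min_career_pa` career PA."""
    career = defaultdict(int)
    for pid, _, _, pa in seasons:
        career[pid] += pa
    return [s for s in seasons if career[s[0]] >= min_career_pa]
```

Calling `chain(aging_ratios(qualified(seasons, 4000)), 21)` would give a curve for the long-career group only, which is the kind of filter used above to check for attrition bias.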
The results show us that power, as indicated by homeruns and partly by doubles, peaks at about 27 or 28, stays flat until 30, and then declines. Speed peaks much sooner, while many players are still in the minor leagues. Triples max out at 21 or 22, stolen bases at 24 or 25.
Strikeouts reach their minimum from 26 to 28, while walks increase until 33 to 37. I am interested in looking at ball and strike totals to see how plate discipline and contact change as a player ages, and so to find the causes of the walk and strikeout rates.
Remember that these are the aging patterns for the typical player. There are two variables – when does the peak occur, and how high is it? Some players increase more quickly, others not at all. Some players peak earlier and then regress, while others are late bloomers. Pizza Cutter warned about trying to fit players into a standard curve. There should be minimal error in using a standard aging curve to correct a one-year projection; what is more problematic is projecting multiple years into the future. One of the items on my to-do list is seeing if we can predict each player’s unique curve by looking at his career so far, fitting it to his existing data points in order to extrapolate more accurately into the future.