# The developmental curveball

August 18, 2008

I’ve had two types of jobs in my life. The first started when I was 10 years old. My father runs a bunch of parking garages for a living, and his assistant manager at the parking lot had a side business selling baseball cards (and old comics) at card shows. I was never one for comics, but I could talk about baseball all day, so I stayed on that end of the table and sold cards to people. It was a dream job for a 10-year-old. Come to think of it, that would be a really cool job even now. My other jobs have all had to do with kids. I’ve done day care work, and most of my clinical work in psychology, as well as my research, has been in studying kids. They’re fascinating little creatures in the way that they grow and develop. When I taught, my favorite class to teach was always Child Psych. It was even more fun than Stats.

In working with kids, you learn very quickly to see the signs of whether a child is developing properly. For example, in language development, I know that babies first start to babble in the second half of their first year, then use a few words around 12 months, and then their vocabulary explodes around 18 months. There are ways to measure vocabulary development that can predict how big a vocabulary the child will have at different ages and whether that falls into the normal range. The problem is that development is uneven. It goes in fits and spurts. (Ask any overly anxious parent of a 2-year-old and you will get a report on all the developmental milestones that Junior is coming up just short on. It usually ends with the parent diagnosing some major problem that they have no business diagnosing. And it’s usually not the case.) Some kids learn a lot in a short period of time, and kids learn different skills at different times. The good news is that most kids end up OK.

Now, while this is great for a child development blog (maybe MVN would let me start one of those!), what does it have to do with baseball? Baseball players have a life span. They are minor leaguers, then rookies, and if all goes well, regulars, then veterans, then aged veterans, then retirees. It’s, at best, a 20-year life span, but there are certainly skills to learn and developmental milestones to meet.

Sabermetrics has long sought to produce aging curves that will accurately model a player’s development over the course of his career, but those curves have generally suffered from being good in the aggregate and not much help in the specific. I can appreciate that the average 24-year-old will improve X number of runs next year, but what can you tell me about this particular 24-year-old? Replace 24-year-old with 6-year-old and runs with vocabulary words, and you have a glimpse into a day in my office. Maybe it’s time we started looking at aging curves in a slightly more sophisticated manner.

The trick is to look for the signs that the kids, or in the case of young baseball players, the rookies, are learning. Because development isn’t a smooth process in either population, it becomes important to find those who show the incipient signs of growth and development. For example, with child language development, receptive language (the ability to understand a word spoken to you) is an important first step in getting the child to speak the word. If he knows what the word means, there’s a good chance he’ll start using it soon. Do these types of signs exist in baseball players? Yes, if you know where to look.

Let’s take a look for a moment at the development of strikeouts. Young players often strike out a lot, but as they mature, they learn how to strike out less. The problem is that not all young players learn this. Even more maddening, some of them suddenly seem to mature out of nowhere and learn to make good contact and to lay off bad pitches. Or do they? A while ago, I developed a measure of strike zone judgment based on signal detection theory and called it “strike zone sensitivity”. It’s based on the thought that there’s more to being a disciplined hitter than walking. Indeed, it’s very disciplined to put the ball into play if it’s a good pitch. The mechanics of the measure aren’t important right now; just know that it’s a pretty good correlate of strikeout rate.

I took all players under the age of 26 (as of July 1, 2006) who got 100 PA or more in the 2006 season, a total of 99 players. I then calculated the strike zone sensitivity shown by the hitter over his first 50 plate appearances during the year. (I’ve previously shown that this particular stat stabilizes very quickly.) Then, I calculated the stat for plate appearances 2-51, then 3-52, and so on until I ran out of data. By doing this, I created a moving average for the stat over the course of the season. Now, some players have either fully developed what will be their “adult” level of strike zone sensitivity or they simply aren’t changing, neither getting better nor getting worse. In these cases, any deviation from that level will likely be simple random variation around their ability.
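The windowing step (PA 1-50, then 2-51, then 3-52, and so on) can be sketched in a few lines of Python. This is only an illustration: the function name `rolling_stat` and its parameters are my own, and because the strike zone sensitivity formula isn't spelled out in this post, the default statistic is a plain mean of per-PA values (for instance, 1 for a strikeout and 0 otherwise) standing in for it.

```python
from typing import Callable, List, Sequence


def _mean(window: Sequence[float]) -> float:
    """Stand-in statistic: the simple average of the values in one window."""
    return sum(window) / len(window)


def rolling_stat(
    pa_values: Sequence[float],
    window: int = 50,
    stat: Callable[[Sequence[float]], float] = _mean,
) -> List[float]:
    """Apply `stat` to PA 1..window, 2..window+1, ... until the data run out.

    `pa_values` holds one number per plate appearance, in chronological order.
    `stat` is a placeholder (assumption): swap in whatever per-window measure
    is actually being tracked.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    n_windows = len(pa_values) - window + 1
    if n_windows < 1:
        return []  # not enough plate appearances for even one window
    return [stat(pa_values[i:i + window]) for i in range(n_windows)]
```

A player with exactly 100 PA yields 51 overlapping windows, so even the smallest players in the sample contribute a few dozen points to the plot.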

But what about cases when there is clearly a pattern? Let’s say that a player is becoming more strike zone sensitive (a good thing). As the moving average moves along, a line connecting the moving averages would slope slowly upward. If it was just variation around the mean, it would dip up and down and up and down and wouldn’t show much of a linear pattern. How can we tell if the dots form a coherent line? We can put the dots into a linear regression formula and see what comes out. If the dots really are in a line (the player is either steadily getting better or getting worse), then the model fit statistics (mostly, the R-squared for the regression) will be high. If the R-squared is high, then we can look at the regression coefficient to see whether the strike zone sensitivity is going up or down, and how quickly or slowly.
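For anyone who wants to try this at home, here is a minimal sketch of that regression: an ordinary least-squares line through the moving-average points, with the window number (0, 1, 2, ...) as the predictor. The function name `trend_line` is hypothetical, and using the window number as the x-axis is my assumption about how the dots are laid out.

```python
from typing import Sequence, Tuple


def trend_line(values: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a + b*x by least squares, with x = 0, 1, 2, ... (window number).

    Returns (slope, r_squared). The slope says whether the stat is drifting
    up or down; R-squared says how closely the dots follow a straight line.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    syy = sum((y - mean_y) ** 2 for y in values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    # A perfectly flat series has no variance to explain; report 0 for it.
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, r_squared
```

One caution on reading the output: consecutive windows share 49 of their 50 plate appearances, so the dots are not independent of each other. The R-squared here is best treated as a description of how line-like the plot is, not as a formal significance test.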

I ran such a regression for all 99 players in my sample. Of them, 20 had R-squared values over .30, and 8 (not a really great sample size) had values over .50. After I finished, I calculated each player’s overall K rate for both 2006 and 2007 and the difference between the two. I figured that if a player had a high R-squared, it suggested that he was showing signs of his plate sensitivity actually changing. The trend line was an actual line and not a blob. If the line was pointing up (the regression coefficient was positive) and the player was becoming more sensitive, there should be some reduction in strikeouts the next year. If the opposite pattern were true (the player was becoming less sensitive), then strikeouts should increase in the next year. And that’s what happened.

I first ran a correlation between the size of the coefficient in the regression and the difference in strikeout rates the next year on the whole sample of 99 players. The correlation came back a less than inspiring .184. But when I restricted the sample to those players whose moving average trend lines had an R-squared of more than .30, a very different story emerged. The correlation coefficient was a nifty .525. When I raised the requirement for R-squared to .50, the coefficient went up to .667. The last number is just with 8 participants, so it should be treated with a bit of caution, but it seems that those who show specific signs of growth carry them into the next year by showing a rate of either growth or recession roughly proportionate to their growth trend line. I re-ran the numbers on the regressed rates using the custom PA-by-PA split-half coefficients that I had previously generated. Same basic results.
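The filter-then-correlate step looks roughly like this in Python. Everything named here (`PlayerTrend`, `pearson_r`, `slope_vs_k_change`) is made up for illustration. I am also assuming the K-rate difference is stored as 2006 minus 2007, so that a positive number means fewer strikeouts the next year; that sign convention is what makes a positive correlation the expected result.

```python
from math import sqrt
from typing import NamedTuple, Sequence, Tuple


class PlayerTrend(NamedTuple):
    slope: float      # regression coefficient of the moving-average trend line
    r_squared: float  # fit of that trend line
    k_drop: float     # K rate in 2006 minus K rate in 2007 (assumed sign)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


def slope_vs_k_change(
    players: Sequence[PlayerTrend], min_r_squared: float = 0.0
) -> Tuple[float, int]:
    """Correlate trend slope with next-year K-rate change.

    Only players whose trend line has R-squared above `min_r_squared` are
    kept. Returns (correlation, number of players kept).
    """
    kept = [p for p in players if p.r_squared > min_r_squared]
    r = pearson_r([p.slope for p in kept], [p.k_drop for p in kept])
    return r, len(kept)
```

Calling `slope_vs_k_change(players, 0.30)` and then `slope_vs_k_change(players, 0.50)` mirrors the two cutoffs above. Returning the number of players kept alongside the correlation is a reminder of how thin the sample gets as the threshold rises.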

It wouldn’t be a good children’s book without a few pretty pictures for illustration. The first is the 2006 moving average graph for currently wayward Indians third baseman Andy Marte, who that year struck out 23.2% of the time in 178 PA at the Major League level (and 22.7% of the time at AAA in about 380 PA). The plot has an overall R-squared of NUMBER and is clearly pointing in the direction of Marte becoming more selective at the plate. Sure enough, Marte’s strikeout rate dropped to 18.2% in AAA in 2007, and 15.8% at the big league level. What happened in 2008 (he’s back up to 24.4% and has otherwise been horrible at the plate), I don’t quite know. Maybe he got back into some bad habits. But clearly, the trend is going toward Marte learning a bit more sensitivity as 2006 progressed, and that correctly predicted a decrease in his overall K-rate in 2007, and a fairly significant one at that (5 percentage points).

The second picture, for comparison, belongs to the 2006 version of Jeff Francoeur, whose plot wildly oscillates around and has an R-squared of almost nil, because it doesn’t create a coherent trend line. We can’t really tell much of anything about where Francoeur is going to go given this information. But we do have an idea of where Marte is going. So, while we’ll have to rely on Francoeur’s previous stats to help project his stats for the next year, we know we have a little extra help with Marte, and it seems a shame not to use it.

I have to stop and say that this is certainly preliminary work. I’ve shown only one stat to have these properties, in one isolated case, with a relatively small sample. However, there is some evidence here that might be helpful in projecting next year’s stats. At the end of the 2008 season, I can calculate a player’s rate of whatever I want, but I can also look at his trend line. If it shows a specific trend toward more or less of that stat, maybe that could be part of the algorithm that predicts his 2009 stat line. I need to fool around with how many PA are included in the moving average and other things of that nature, but in theory (and seemingly in practice), this could very easily turn into something that can be incorporated in predicting breakout years in certain stats. It makes sense that we might quantify growth and development. This is simply the math that I propose that we use.

This is great work… I always hated the uniform aging curve. It doesn’t apply to anything (education, maturity, etc.), why should it apply to baseball players? Just look at Ben Grieve.

Is this what Peter Jensen was talking about with his projections a few weeks ago?

“If it was just variation around the mean, it would dip up and down and up and down and wouldn’t show much of a linear pattern.” Except that it could indeed do that. Random variation could mimic a pattern that is increasing or decreasing as easily as it could look like it’s varying wildly. It’s random, after all.

Dan, I was heavily influenced by what Peter said over at The Book when I was conceptualizing this. As I understood what he was saying (if he’s out there he can correct me), it’s not exactly what he was thinking about, but it’s not out of range. Peter was arguing for regressing players back to their original predictions of their true talent level. I’m not quite going that far. Here I’m suggesting another term in the equation to predict whatever stat we’re interested in at the moment.

Nathaniel, you’re right that a random pattern could look like a coherent one (Type I error), although it’s the magnitude of those lines that’s correlating with the rise/fall in K rates from year to year that I’m more concerned with. I do need to fool around with these data a little more.

I know my projection program never liked Marte very much …despite so% being stable at a relatively low number of PA, perhaps it was still too small of a sample.

Looking at the hr curve for all players,

from 20 to 21, HRs up 17%,

from 21 to 22 14%,

from 22 to 23 6%,

from 23 to 24 3%

…Knowing a player’s age and HR% in several seasons, can we build a model which will tell us if our player is following the curve, or is not developing like the population as a whole (Dan’s uniform curve)? Can we extrapolate a curve from what we know of the population and of the individual so far?