# Turning the Monkey Into a Gorilla

Recently I was over at Bucco Blog discussing the Pirates’ trading away of Jason Bay. An earlier rumor had Bay headed to Tampa Bay in exchange for Reid Brignac and Jeff Niemann, but instead the Red Sox obtained Bay, with the Pirates receiving Andy LaRoche and Brandon Moss, along with two minor league pitchers. I posted a spreadsheet of Minor League Equivalencies (MLEs) to justify my opinion that not only was either LaRoche or Moss far superior offensively to Brignac, but that both were also a little better than Xavier Nady, who had been traded to the Yankees the week before.
Ray offered this response:
“No offense Brian Cartwright but I don’t think you have the slightest idea of what you are talking about. If baseball was all about spreadsheets and plugging numbers into a database then every team should be competitive.
You have no idea what you are talking about so please don’t put up fake statistical analysis and try to act like you can justify Reid Brignac not being able to be an offensive threat in the major leagues”
Dude, don’t you know who I am? How can I not take offense? Seriously, instead of explaining my methods to an audience of one, I decided that this was the better venue. After all, Pizza, Eric and Colin have all written about projection systems within the past month, so why not one more? One well accepted system is Tom Tango’s Marcel, which for its simplicity is characterized by the monkey who can do better than the humans. To summarize how Marcel works:

· Take the last three years of data, weighted 5/4/3

· Regress the total to the league average

· Apply aging factors

I build upon that by

· Including all past seasons, continuing the diminished weighting

· Including minor leagues.

When looking at projections, I rely on babip, hr%, bb% and so%. I want to be able to look at a record which shows a 20 year old player in AA, who has an average babip (.304), a hr% just short of excellent (.087), a little above average bb% (.105) and an average so% (.168), which combine for a 297/381/571 line, and then ask: what percentage of the time does a player such as this make the major leagues, and once there, become a regular, a star, or a superstar? In this case, these are the numbers for Prince Fielder after the 2004 season. As of the 2008 All-Star break, his numbers are babip .292, hr% .082, bb% .098 and so% .187, for a 279/359/537 line. The projection is slightly high, but it is not necessary to precisely predict the numbers. If we understand the variance of performance around the projection, classifying the components as “Excellent”, “Very Good”, “Average”, “Poor”, etc. will clearly show the type of player that Fielder or anyone else has become.
Tango’s 5/4/3 weights, when divided by 5, make the current year weight 1.00, year-1 0.80 and year-2 0.60. This was a shortcut for 0.8^(now-year), in which year-2 is actually 0.64, year-3 would be 0.512, etc., multiplying each prior year by 0.8. I ran mean error tests of my yearly projections compared to the season immediately following the projection. I found that the values which minimized the mean error were a prior year weighting of 0.7 combined with regressing to 150 plate appearances of league average performance. In response, Tango suggested continuing to use the 0.8 weighting, as it would include more of the player’s historical data, which would in turn decrease the effect of regression. I have chosen to stay with 0.7 as it did minimize the error, even if increasing the effect of regression. Therefore, the weight for the current season is, as always, 1.00, year-1 is 0.7, year-2 is 0.49, year-3 is 0.34, etc. A smaller weight will be more sensitive to recent data, and create more variance in the projections for a player from one year to the next. A larger weight will include more data from years past, creating a longer term trend that will smooth (and possibly hide) abrupt actual changes in a player’s talent level, such as those due to injury.
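The geometric weighting above can be sketched in a few lines. This is only an illustration: the function names and the sample season totals are hypothetical, and it assumes the weights are applied to both the event counts and the plate appearances before dividing, which the article does not spell out.

```python
def decay_weights(n_seasons, decay=0.7):
    """Weight for the current season (index 0) and each prior season."""
    return [decay ** years_ago for years_ago in range(n_seasons)]


def weighted_rate(seasons, decay=0.7):
    """seasons: list of (events, plate_appearances), most recent first.

    Assumption: counts and PA are both weighted, then divided.
    """
    weights = decay_weights(len(seasons), decay)
    events = sum(w * e for w, (e, _) in zip(weights, seasons))
    pa = sum(w * p for w, (_, p) in zip(weights, seasons))
    return events / pa


print([round(w, 3) for w in decay_weights(4, 0.8)])  # [1.0, 0.8, 0.64, 0.512]
print([round(w, 3) for w in decay_weights(4, 0.7)])  # [1.0, 0.7, 0.49, 0.343]

# Hypothetical home run and PA totals, most recent season first.
history = [(30, 600), (20, 500), (10, 400)]
print(round(weighted_rate(history), 4))
```

Because the weights keep shrinking, every past season can be included without any one old season carrying much influence.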
In regressing to the mean, Tango’s Marcel weights the regression with the same number of plate appearances for each component. My mean error test showed the optimal level is 150 PA. I agree with what Pizza Cutter has recently written that it is best to calculate the number of PAs of league average performance to regress to separately for each component. Stats such as hr%, bb% and so% stabilize for individual players much more quickly than babip. However, my tests showed that the difference in the babip error from no regression to 150 PAs is much more than the difference from 150 to 600 PAs. Using 150 PA for babip is not as good as using 600 (or more), but the difference is slight.
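Regressing to a fixed number of league average plate appearances amounts to adding that many "ballast" PAs to the player's record. A minimal sketch, assuming the regression is applied to a single rate per plate appearance; the function name and the sample numbers are hypothetical:

```python
def regress_to_mean(events, pa, league_rate, ballast_pa=150):
    """Blend a player's record with ballast_pa PAs of league average performance."""
    return (events + league_rate * ballast_pa) / (pa + ballast_pa)


# Hypothetical player: 12 events in 150 weighted PA (a .080 rate), league rate .030.
# With 150 PA of ballast, the result lands halfway between player and league.
print(round(regress_to_mean(12, 150, 0.030), 3))  # 0.055

# A larger ballast (e.g. the 600 PA mentioned for babip) pulls harder to the league.
print(round(regress_to_mean(12, 150, 0.030, ballast_pa=600), 3))  # 0.04
```

The more weighted plate appearances a player has, the less the ballast matters, which is why a heavier yearly weight reduces the effect of regression.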
Should we include minor league data? It really bothers me when I see someone post a comment such as “a rookie who we know absolutely nothing about”. Since the time the player turned pro, statistics have been recorded, and play by play is available for the past four seasons of minor league games. History does not begin when a player makes his major league debut. Can we use a player’s complete professional record to determine his “true talent level”? Can we look at those minor league statistics and tell what the odds are that he will play in the major leagues, and what his statistics will look like if and when he gets to “The Show”? This all got started for me last summer when I asked myself “Just how good is Rajai Davis?” at the time he was called up by the Pirates. I pulled some old ideas out of the back of my brain and plugged the batting stats into Excel. The spreadsheet kept growing as I added more players, but it was very labor intensive. A couple of weeks ago I developed it into an Access database, and loaded it up with data.
I use a simple concept to calculate the Major League Equivalencies, creating matched pairs of major and minor league performance, grouped by team, league and level. In 2008 Reid Brignac is playing for Durham of the International League, in AAA. If we want to know how Brignac’s Durham stats translate into Major League Equivalents, let’s find all the other players who played in Durham, who played in the International League, and who played in AAA, and then also played in the majors, and compare their performances in each of the batting components (base hits, extra base hits, home runs, hit by pitch, walks, strikeouts, grounded into double plays) between the two. You need two buckets of data. In one, place the batter’s major league stats; in the other, his minor league stats, broken down by team, league and level. To control for the different numbers of plate appearances, always scale the larger down to the smaller. For example, if Player A had 500 PA and 120 SO in the majors, and 250 PA and 50 SO in the minors, keep the minor league totals unchanged, as they came from the smaller sample, but scale the major league strikeouts (the larger sample) by 250/500, so that 120 × (250/500) = 60 goes into the major league bucket. After this is done for each player, sum the categories in each bucket, and then compare the totals. My database currently has 60 players who played at Durham who later played in the majors, covering 13904 plate appearances. Those 60 players hit 325 major league homers, but 401 in an equivalent number of PAs in Durham, increasing by a factor of 1.23 (the 9th highest rate in AAA). Base hits (singles, doubles and triples) increased by 1.06, walks by 1.18, strikeouts by 0.94. To create an MLE for a player at Durham, the rates of his various components in the Durham stats need to be divided by the factors given, and then the batting line reassembled.
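The two-bucket procedure can be sketched as follows. This is an illustration under assumptions, not the Access database itself: the function names and the dict layout are hypothetical, and the final MLE step assumes "decreasing by the factor" means dividing the minor league rate by it.

```python
def scale_to_smaller(major, minor):
    """Scale whichever line has more PA down to the PA of the other.

    Each line is a dict of counting stats that includes a 'pa' key.
    """
    target_pa = min(major["pa"], minor["pa"])

    def scale(line):
        ratio = target_pa / line["pa"]
        return {stat: count * ratio for stat, count in line.items()}

    return scale(major), scale(minor)


def translation_factors(pairs):
    """Sum both buckets over all matched pairs, then compare the totals.

    Returns the minor league total divided by the major league total per stat.
    """
    major_bucket, minor_bucket = {}, {}
    for major, minor in pairs:
        scaled_major, scaled_minor = scale_to_smaller(major, minor)
        for stat in scaled_major:
            major_bucket[stat] = major_bucket.get(stat, 0.0) + scaled_major[stat]
            minor_bucket[stat] = minor_bucket.get(stat, 0.0) + scaled_minor[stat]
    return {s: minor_bucket[s] / major_bucket[s] for s in major_bucket if s != "pa"}


# Player A from the text: 500 PA / 120 SO in the majors, 250 PA / 50 SO in the minors.
player_a = ({"pa": 500, "so": 120}, {"pa": 250, "so": 50})
scaled_major, scaled_minor = scale_to_smaller(*player_a)
print(scaled_major["so"], scaled_minor["so"])  # 60.0 50.0

# Durham home runs from the text: 401 in the minors vs. 325 in the majors.
durham_hr_factor = 401 / 325
print(round(durham_hr_factor, 2))  # 1.23

# Assumed MLE step: divide a (hypothetical) Durham HR rate by the factor.
minor_hr_rate = 0.050
print(round(minor_hr_rate / durham_hr_factor, 4))
```

Scaling the larger sample down, rather than the smaller one up, keeps a player with a long major league career from dominating the totals.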
In the early days of this project I was running linear regressions to study the relationships in the minor and major league sets of the matched pair data. I noticed that for all components, players who went on to become regulars in the majors had better regression coefficients than players who played in the majors, but failed to establish themselves as regulars. For a long time I interpreted this to mean that those who succeeded at the next level were somehow able to retain a higher percentage of their performance level than those who failed, and that the task remained to try to find some leading indicators which might foretell, between those otherwise equal players, who would succeed and who would fail. Recently it dawned on me that it’s mostly just luck.
The projections that are generated are the single most likely values that form the mean of a bell shaped (nearly) normal distribution of possible future performances. Just by the laws of probability (luck), about 16% of players will underperform by at least one standard deviation from the mean, and about 16% will overperform by the same amount. Especially when a player is in the minor leagues, and especially if that player is considered to be below major league average, being unlucky enough to be at the bottom of the bell curve (a bad year) will be enough to get the player released, or at the least returned to the lower level, so that he never has a chance to even out his bad luck with a second season to expand the sample size. Many players promoted to the majors spend their time pinch hitting, where the average performance is much worse than when a player starts. When the player doesn’t perform as a pinch hitter, he is returned to playing every day in the minors. This creates a paradoxical type of selection bias, in which selecting all players creates the bias. This is similar to the study of aging curves, where underperforming causes a player to drop out of the surveyed group.
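The share of players landing at least one standard deviation from the mean can be checked directly, assuming outcomes are exactly normal (the text only says nearly so). The helper name is hypothetical; `math.erf` is from the Python standard library.

```python
import math


def normal_cdf(z):
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


underperform = normal_cdf(-1.0)        # at least one SD below the projection
overperform = 1.0 - normal_cdf(1.0)    # at least one SD above the projection
print(round(underperform, 4), round(overperform, 4))  # 0.1587 0.1587
```

Each tail holds a little under 16% of outcomes, so roughly one player in six will look this unlucky in any given season purely by chance.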
The selection of all players causes the projection to underestimate. Further, any system which calculates an MLE by chaining A to AA, AA to AAA, AAA to MLB will multiply this underestimate. To avoid this, I have chosen to directly compare each level to MLB. However, this increases the time gap between the statistics being compared, which increases the possibility of the player’s true talent being at a different level by the time he reaches the majors. But, if the question is “how will a player who had this performance at this level be expected to perform in the majors?” is it not best answered by querying all other players who played at that level and then went on to play in the majors? Thus, the direct comparison limits the players in the sample to those who went on to play in the majors instead of those who advanced to the next level. The sample size can be effectively increased by using sampling to determine each minor league’s mean talent level, and then using play by play data (available for minor leagues beginning in 2005) to determine each team’s park factors within those leagues.
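To see why chaining multiplies the underestimate mentioned at the start of this paragraph, here is a toy calculation. The 3% per-step bias is purely hypothetical, as is the assumption that a direct comparison carries the same bias as a single chained step.

```python
from math import prod

# Hypothetical share of the true value that each comparison retains after
# selection bias; 0.97 means a 3% underestimate per comparison.
step_bias = {"A->AA": 0.97, "AA->AAA": 0.97, "AAA->MLB": 0.97}

chained = prod(step_bias.values())  # bias compounded over three chained steps
direct = 0.97                       # one direct A->MLB comparison (assumed equal bias)

print(round(chained, 4), direct)  # 0.9127 0.97
```

Under these assumed numbers the chained estimate for a Class A player runs nearly 9% low, against 3% for the direct comparison.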
There are limits to the sampling method. In 2006 and 2007, Brignac played at Montgomery in the Class AA Southern League. In contrast to the 60 players at Durham, only eight players from Montgomery had gone on to play in the majors, and only five of those had been regular starters. In 2005, Brignac played for Southwest Michigan of the Class A Midwest League. Other than Brignac himself, no one else in the two year history of the team ever played in the majors. In these cases where team specific information is sparse, the ratings need to lean more on those of the league and the level, where larger and thus more reliable sample sizes are available. By measuring the variances of each component, a regression can be constructed that combines each team’s stats with a certain number of its league stats. The fewer plate appearances that are sampled for a given team, the higher the percentage of the rating that will depend on its league’s ratings. In the same manner, leagues that have a smaller sample will be regressed towards the ratings for their level.
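One way to picture the team-to-league regression is as a weighted blend in which the league supplies a fixed number of ballast plate appearances. This is a sketch only: the function name, the league factor of 1.10 and the 5000 PA ballast are hypothetical, whereas the article derives the amount of ballast from the measured variances of each component.

```python
def shrink_to_league(team_factor, team_pa, league_factor, ballast_pa):
    """Blend a team's factor with ballast_pa plate appearances of its league's factor."""
    return (team_factor * team_pa + league_factor * ballast_pa) / (team_pa + ballast_pa)


LEAGUE_HR_FACTOR = 1.10  # hypothetical league-wide home run factor
BALLAST_PA = 5000        # hypothetical; would come from the component variances

# Durham's 13904 PA and 1.23 home run factor come from the text.
well_sampled = shrink_to_league(1.23, 13904, LEAGUE_HR_FACTOR, BALLAST_PA)
# A thinly sampled team (hypothetical 800 PA) leans much harder on the league.
thinly_sampled = shrink_to_league(1.23, 800, LEAGUE_HR_FACTOR, BALLAST_PA)

print(round(well_sampled, 3), round(thinly_sampled, 3))
```

The same function could be applied one step up, shrinking a thinly sampled league toward the factor for its level.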
When comparing minor to major league performance, the two main factors to be considered are ballparks and level of competition. Sampling takes care of both of these at the same time, simplifying the calculations, but sampling doesn’t specifically account for changes in the home or road ballparks over the period of time observed, instead trying to average them out with large sample sizes. Beginning in 2005, the play by play for all minor league games has been stored in GameDay records at mlb.com. Using the method I described in a previous article, park factors, relative to the other parks in the same league, can be calculated using this play by play. Sampling is fine for calculating the level of competition in each league, relative to the majors. Sampling shows that the Pacific Coast League has higher offensive levels than the International League. Over an 11 year period, this is likely not due to unequal talent between leagues at the same level, but instead suggests that the PCL has more hitter friendly ballparks. Thus, sampling at the league level will help correct for unequal distributions of ballparks between leagues. In the years that these play by play park factors are available, they can be combined with the sampled league factors to create more accurate factors for every team, regardless of how many players from each particular team went on to play in the majors.
In the end, building on top of Tom Tango’s Marcel creates a system which is quite similar to Baseball Prospectus’ Pecota, in that it can

· Describe each component as a percentile of the league average;

· Create lists of comparable players

· Calculate “stars and scrubs” probabilities of future performance based on the track records of those comparables.

However, BP only updates Pecota at the beginning of each season. Fed with GameDay’s play by play data, this system can do daily updates of true talent level and end of current season projections.
Over the rest of the season, I hope to profile players who are in the news but at this point have a brief or non-existent major league record. In the off-season, after all minor leaguers have been entered into my database, I can do depth charts of each organization, and list which minor leaguers reliably project to be better than average future major leaguers, and which ones look to be over hyped.

### 14 Responses to Turning the Monkey Into a Gorilla

1. Brian Cartwright says:

Shortly after the end of the season.
Some of it is still a work in progress. I will be programming the age corrections next. At the end of the season I will be loading in all players, including pitchers. Using minor league GameDay play by play data will also allow the rating of the minor leaguers on defense, baserunning, etc., which will help fill out the profile with more than just a batting line.

2. WCB says:

You really wasted your time writing this? I can’t believe I just read this garbage.

3. dan says:

Brian, I’d be careful with anything below AA. It is very common for “toolsy” players to not hit well in the low minors, only to put it together and break out as they move up the ladder. While still in the minors, a player’s true talent level is changing much more rapidly than at the major league level. Miguel Cabrera was a scrub, performance-wise, until 2003. Hanley Ramirez’s minor league stats look nothing like his major league stats.
WCB– what exactly is garbage? You don’t think we know anything about a player until he makes the major leagues? If that were the case, how would teams know when to call guys up?

4. Brian Cartwright says:

Dan – very valid comments, but we can improve the process by understanding the context of the minor league stats, partly by seeing how those who have gone before have done.
I have yet to apply aging curves to help correct the projections (article forthcoming). I have not yet found a multi year algorithm I am comfortable with, but I think it will be fine for doing a one year correction.
At the end of 2002, Miguel Cabrera was a 19 year old in High A, projected at 270/320/429. That was already just a hair below an average major leaguer. Six years later, at age 25, he projects at 307/367/527. babip from .319 to .343, HR% from .033 to .060, BB% from .062 to .080, so% from .201 to .182. At first glance, walks and strikeouts are close, and look like they are following a normal progression. He definitely has grown in the power numbers.
In 2004 Hanley Ramirez was a 20 year old in AA, projected at 303/348/459, currently projects 306/365/509. babip has stayed at .339, hr% from .033 to .048, a normal age progression would put him at .044. bb% from .060 to .079, so% from .155 to .166. So Ramirez did already project well at age 20, .143 BaseRuns per PA (ML Avg .133) and has grown his power fairly typically.
When I fill out the database with all minor leaguers, I will be looking for other players of the same age, at the same level, and see how the others turned out. Both of your examples of Cabrera and Ramirez were very young when they got to the majors, but even at 19 or 20 projected to be at least league average. We got to see them grow into stars at the major league level.

5. Xeifrank says:

vr, Xei

6. Hylton says:

I attempted to read it, but the font is too small!
That’s my only complaint. 🙂

7. Brian Cartwright says:

Sorry Hylton, I learned the hard way that WordPress does not like copy and paste. This was my first time, and I learned my lesson the hard way. From now on I will be composing in the browser.

Brian, that is really great work. I have a feeling that anyone smart enough to follow along with your article should be smart enough to know how to resize fonts in their browser.

9. Brian, if you copy your stuff into the “html view” in the WordPress writing screen you should be ok. It will only copy the text itself, without the html markup from your word processor that goofs things up.

10. BobbyRoberto says:

Great article! I’d love to see the projections in the off-season. And a quick way for readers to enlarge the font is by holding down the Apple key and the plus sign (for Macs, might be Control+ for non-Macs).

11. Jake says:

“I posted a spreadsheet of Minor League Equivalencies (MLEs) to justify my opinion that not only were either LaRoche or Moss were far superior offensively to Brignac, but that both were also a little better than Xavier Nady…”
I’m sorry but suggesting Moss will be better than Nady is really reaching out for sugar plums. Andy LaRoche might end up a solid every day player if he takes his bro’s meds to slow the game up, otherwise he’s another Jose Bautista.
Now where will these men be in five years? Probably doing well.
With another team.
After being corrected of all the tweaks inexperienced field staff thrust on them in Pittsburgh.
Hey – maybe you should add a couple of columns to your spreadsheet for “field staff stupidity” and “inexperienced and poor” field staff?

12. Brian Cartwright says:

Am I correct, by your tone, that this is Jake from Bucco Blog?
I’m trying to take a longer term view in player analysis, more than what someone has hit in the last 30 days or so. Now the sad thing about the Pirates, after the trades, I have 45 players AA and above, and only 11 are MLB avg or above batters.
Yes, coaches can manage to screw players up, but looking at past performance, Nady is way overproducing this year, and can expect a crash back to near 286/337/476 any time soon. His offense is a hair above average for a corner OF. Moss walks more, gets more singles and doubles, but fewer HRs than Nady. Overall, his slash line is nearly the same: 279/338/463.
LaRoche has had a poor 2008, I suspect mostly due to the hand injury in spring training. Currently, his projection is 265/346/449. At the end of last year, it was 278/347/481. ML avg 3b has a BsR of .135 in 2007 – LaRoche is .140 now, .149 at start of year, Bautista is now .127, and has been between .124 and .137 since 2001.
Nady is 29, and cannot expect any help from aging curves. Moss and LaRoche are 24 and 25, and still have a chance, on average, to increase their power.
Moss and Nady are about equal, slightly above avg bat for corner OF. LaRoche is above avg for 3b, well above Bautista. Pearce is above avg for rf or 1b, but below Bay.
Having Pearce, Moss and LaRoche replace Nady, Bay and Bautista, as a group, is fairly even. But the two trades also netted three starting pitchers (Karstens, McCutchen & Ohlendorf), a reliever (Hansen) and two lower minor league players – all eight decent to good prospects.
What there would you disagree with, other than maybe Nady is better than a .286 hitter?

13. TheScout says:

The best year someone publishing stats will have is 70%. At best. I take about 7 projections, average them out and I go from there. I love when sites show you 25 guys they “nailed” actuals vs Projections. I laugh. I can give you 50 I missed, and 50 I “nailed” too. It’s silly.

14. Brian Cartwright says:

True.
I am working on an article for FanGraphs that shows the error rates of projections. If the formula says a guy is a .340 wOBA, what is the actual distribution of results?