Can We Project College Hitters?
November 13, 2008
When I started collecting batting stats to put in my database, I assumed the answer was no, and skipped over the college data. Later, I had a change of heart and began adding the college batting lines to the database, and was frankly amazed at the results.
When a player changes levels from one year to another, his true talent changes over that time much less than the talent level of the pitchers and defenses that he faces, and the ballparks he plays in. Sampling players who have played at each level, and comparing their results, allows us to apply those factors to the players who have not yet played at the higher level.
Let me quickly review the process:
1. Separate the data into two buckets, one for major league, and the other for the level to be projected. Weight each player by the lower of his two plate appearance totals.
2. Sum both buckets, group by league, and compare the totals for each component. This gives the MLE factors.
3. Apply park factors within a league to the MLE factors for that league.
4. Apply the combined factors to each batting line to normalize it.
5. Sum into a single batting line for each season.
6. Weight each season by 0.7 times the weight of the season following it, basically 10-7-5-3-2-1 from the most recent season back.
7. Sum the weighted seasons into a single batting line for each player.
8. Add regression to the league mean.
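Steps 1 and 2 can be sketched in a few lines of Python. This is only one reading of the process, not the actual code behind the database: the function name `mle_factors`, the record layout, and the choice to weight each player's rates by the lower of his two plate appearance totals are all assumptions made for illustration.

```python
from collections import defaultdict

def mle_factors(pairs, components=("HR", "BB", "SO")):
    """Estimate per-league translation factors from matched players.

    pairs: list of dicts (hypothetical layout), one per player who has
    played at both levels:
        {"league": str,
         "low": {"PA": int, "HR": int, ...},   # level to be projected
         "mlb": {"PA": int, "HR": int, ...}}   # major league bucket
    """
    low_sum = defaultdict(lambda: defaultdict(float))
    mlb_sum = defaultdict(lambda: defaultdict(float))
    for p in pairs:
        # Assumption: weight both buckets by the lower PA total, so a
        # player counts only as much as his smaller sample allows.
        w = min(p["low"]["PA"], p["mlb"]["PA"])
        if w <= 0:
            continue
        for c in components:
            low_sum[p["league"]][c] += w * p["low"][c] / p["low"]["PA"]
            mlb_sum[p["league"]][c] += w * p["mlb"][c] / p["mlb"]["PA"]
    factors = {}
    for league in low_sum:
        factors[league] = {
            c: mlb_sum[league][c] / low_sum[league][c]
            for c in components
            if low_sum[league][c] > 0  # skip components with no events
        }
    return factors
```

A factor below 1.0 for a component means that the sampled players produced that event less often in the majors than they did in that league.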
When analyzing the results, one must always remember sample size. I have found that below 600 to 800 weighted plate appearances there is a higher incidence of projections that don’t correlate well with later, larger-sample projections for the same player. College players who are three-year starters will normally have from 500 up to 650 weighted PAs, at the bottom edge of reliability. Major league starters, with many years of service, can have in excess of 2000 weighted PAs in their projections.
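To make the weighted PA figures concrete, here is a rough sketch of the season weighting and the regression step. The 0.7 decay comes from the process above; the 250 PA seasons, the function names, and the regression constant `k` are made-up values for illustration, since the actual amount of regression is not stated here.

```python
def weighted_line(seasons, decay=0.7):
    """Combine season lines into one weighted line.

    seasons: list of dicts of counting stats, most recent season first.
    Each season gets `decay` times the weight of the season after it:
    1.0, 0.7, 0.49, ... which is roughly the 10-7-5-3-2-1 pattern.
    """
    total = {}
    for years_back, line in enumerate(seasons):
        w = decay ** years_back
        for stat, value in line.items():
            total[stat] = total.get(stat, 0.0) + w * value
    return total

def regress_to_mean(rate, weighted_pa, league_rate, k):
    """Blend an observed rate with the league rate.

    k is a hypothetical constant: the number of league-average PAs
    added to the player's line. Smaller samples get pulled harder.
    """
    return (rate * weighted_pa + league_rate * k) / (weighted_pa + k)

# A three-year college starter with an assumed 250 PAs per season.
college = [{"PA": 250}, {"PA": 250}, {"PA": 250}]
career = weighted_line(college)
print(round(career["PA"]))  # about 548 weighted PAs
```

Under those assumptions a three-year starter lands inside the 500 to 650 range, while a single 226 PA season stays far below it no matter how good the rates look.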
The best example of a small sample size in college stats is Rickie Weeks. In 2003, Weeks batted .500 at Southern, with 16 HRs, 46 BB and only 17 SOs. That one season of college batting translated to 329/414/593, a .426 wOBA, .063 HR%, .115 BB% and .140 SO% – superstar levels, but in only 226 PAs. The rates seemed fantastic, and made Weeks the Milwaukee Brewers' first pick in the draft. By the end of 2004, with a year and a half of minor league stats added to his college stats, good for 792 weighted PAs, Weeks's projection stood at 270/361/445, .352 wOBA, .033 HR%, .092 BB%, .180 SO%. As soon as he had attained sufficient sample size, he was clearly in the ballpark of his current 2008 projection of 254/357/422, .346 wOBA, .038 HR%, .110 BB%, .208 SO%. Although he’s had some up and down years, Weeks's wOBA projections have been very consistent: .355, .352, .348, .351 and .346 over the last five seasons, which is still above average (.327) for a major league second baseman.
To offer some proof that this works, here’s a spreadsheet of some recent players for whom I have college, minor league and major league data, run separately. Where the players have limited MLB experience, I have included a “Pro” line which combines both minor and major league data. Almost all of the players are consistent over all three levels. HR% in college shows discrepancies most often, but I am not currently using any park factors for the college data.
Here’s a summary of the performance of the first-round draft picks from college from 2006 to 2008. A more detailed report, for the same players, can be found here.