Can We Project College Hitters?

When I started collecting batting stats to put in my database, I thought no, and skipped over the college data. Later, I had a change of heart and began adding the college batting lines to the database, and was frankly amazed at the results.

When a player changes levels from one year to the next, his true talent changes much less over that time than the quality of the pitchers and defenses he faces and the ballparks he plays in. Sampling players who have played at both levels and comparing their results gives us conversion factors that we can apply to players who have not yet played at the higher level.
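As a rough illustration of that comparison, here is a minimal Python sketch for a single component (home runs). The `Line` class, the `mle_factor` function, and the sample numbers are all hypothetical and are not taken from the database; weighting each player by the lower of his two plate appearance totals is my reading of the first step in the process list.

```python
from dataclasses import dataclass


@dataclass
class Line:
    pa: int  # plate appearances
    hr: int  # home runs; any counting component would work the same way


def mle_factor(pairs):
    """Ratio of MLB production to lower-level production for matched players.

    pairs: list of (lower_level_line, mlb_line) tuples for the same player.
    Each player is weighted by the lower of his two PA totals, so both
    buckets end up describing the same number of plate appearances.
    """
    low_total = 0.0
    mlb_total = 0.0
    for low, mlb in pairs:
        weight = min(low.pa, mlb.pa)
        low_total += weight * low.hr / low.pa
        mlb_total += weight * mlb.hr / mlb.pa
    if low_total == 0:
        raise ValueError("no lower-level events to compare against")
    return mlb_total / low_total


# Made-up numbers, for illustration only.
pairs = [
    (Line(pa=400, hr=20), Line(pa=200, hr=6)),
    (Line(pa=100, hr=2), Line(pa=500, hr=10)),
]
print(round(mle_factor(pairs), 3))  # 0.667
```

A factor below 1 means the component is harder to produce in the majors than at the lower level, so a lower-level batting line would be scaled down by that amount.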

Let me quickly review the process:

1. Separate the data into two buckets, one for the major leagues and the other for the level to be projected. Weight each player by the lower of his two plate appearance totals.

2. Sum both buckets, group by league, and compare the totals for each component. This gives the MLE factors.

3. Apply park factors within a league to the MLE factors for that league.

4. Apply the combined factors to each batting line to normalize it.

5. Sum into a single batting line for each season.

6. Weight each season at 0.7 times the weight of the season following it, roughly 10-7-5-3-2-1.

7. Sum the weighted seasons into a single batting line for each player.

8. Add regression to the league mean.
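The last three steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the 0.7 decay comes from the list above, but the function names, the sample seasons, and the way regression is applied (blending in a fixed number of league-average plate appearances, `regression_pa`) are hypothetical, since the list does not say how the regression is done.

```python
DECAY = 0.7  # from the list: each season counts 0.7 times the one after it


def season_weights(n_seasons):
    """Weights from the most recent season backwards: 1.0, 0.7, 0.49, ..."""
    return [DECAY ** age for age in range(n_seasons)]


def project_rate(seasons, league_rate, regression_pa):
    """Weighted, regressed rate for one component (e.g. HR per PA).

    seasons: list of (pa, events) tuples, most recent season first,
             already normalized to major league terms.
    league_rate: league-average rate for the component.
    regression_pa: ASSUMPTION, the number of league-average PAs blended in.
    Returns (projected_rate, weighted_pa).
    """
    weights = season_weights(len(seasons))
    weighted_pa = sum(w * pa for w, (pa, _) in zip(weights, seasons))
    weighted_events = sum(w * ev for w, (_, ev) in zip(weights, seasons))
    rate = (weighted_events + regression_pa * league_rate) / (
        weighted_pa + regression_pa
    )
    return rate, weighted_pa


# Made-up seasons: 20 HR in 600 PA last year, 15 HR in 500 PA the year before.
rate, weighted_pa = project_rate([(600, 20), (500, 15)], 0.03, 200)
print(round(weighted_pa), round(rate, 4))
```

With this kind of blend, a player with few weighted PAs is pulled strongly toward the league rate, while a veteran with thousands of weighted PAs barely moves.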

 

When analyzing the results, one must always remember sample size. I have found that below 600 to 800 weighted plate appearances, projections more often fail to match later, larger-sample projections for the same player. College players who are three-year starters will normally have 500 to 650 weighted PAs, at the bottom edge of reliability. Major league starters with many years of service can have more than 2,000 weighted PAs in their projections.
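To see where a figure like 500 to 650 comes from, here is a small arithmetic sketch of weighted PAs under the 0.7 season weighting. The per-season totals of 230 and 300 PAs are assumptions chosen for illustration, not figures from the database.

```python
DECAY = 0.7  # season weighting from the process list


def weighted_pa(pa_by_season):
    """Weighted PA total; pa_by_season is ordered most recent season first."""
    return sum(pa * DECAY ** age for age, pa in enumerate(pa_by_season))


# Three college seasons at an assumed 230 or 300 PAs each.
low = weighted_pa([230, 230, 230])   # about 504
high = weighted_pa([300, 300, 300])  # about 657
print(round(low), round(high))
```

Three seasons carry a combined weight of 1 + 0.7 + 0.49 = 2.19 seasons, so even a full college career stays near the low end of the reliable range.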

 

The best example of a small sample size in college stats is Rickie Weeks. In 2003, Weeks batted .500 at Southern, with 16 HRs, 46 BB and only 17 SOs. That one season of college batting translated to 329/414/593, a .426 wOBA, .063 HR%, .115 BB% and .140 SO%: superstar levels, but in only 226 PAs. The rates seemed fantastic, and made Weeks the Milwaukee Brewers' first pick in the draft. By the end of 2004, with a year and a half of minor league stats added to his college stats, good for 792 weighted PAs, Weeks's projection stood at 270/361/445, .352 wOBA, .033 HR%, .092 BB%, .180 SO%. As soon as he had attained sufficient sample size, he was clearly in the ballpark of his current 2008 projection of 254/357/422, .346 wOBA, .038 HR%, .110 BB%, .208 SO%. Although he has had some up and down years, Weeks's wOBA projections have been very consistent over the last five seasons (.355, .352, .348, .351 and .346), and are still above the average (.327) for a major league second baseman.

 

To offer some proof that this works, here’s a spreadsheet of some recent players for whom I have college, minor league and major league data, run separately. Where the players have limited MLB experience, I have included a “Pro” line which combines minor and major league data. Almost all of the players are consistent over all three levels. HR% in college appears to have the highest frequency of discrepancy, but I am currently not using any park factors for the college data.

 

Here’s a summary of the performance of the first-round draft picks from college from 2006 to 2008. A more detailed report, for the same players, can be found here.

 

Player             Level Org  Age  Pos  Bats  BA     OB     SA     wOBA     RAA
Wallace, Brett     AA    STL  21   3b   L     0.312  0.381  0.526  0.391   30.7
Wieters, Matt      AA    BAL  22   c    S     0.301  0.382  0.503  0.383   26.7
Kulbacki, Kellen   A+    SDN  22   of   L     0.291  0.364  0.532  0.383   26.4
Alonso, Yonder     A+    CIN  21   1b   L     0.267  0.367  0.491  0.371   20.3
LaPorta, Matt      AA    CLE  23   of   R     0.259  0.345  0.514  0.367   18.4
Smoak, Justin      A     TEX  21   1b   S     0.268  0.347  0.509  0.367   18.1
Brown, Corey       A+    OAK  22   of   L     0.259  0.340  0.515  0.364   16.8
Beckham, Gordon    A     CHA  21   ss   R     0.276  0.342  0.505  0.363   16.1
Alvarez, Pedro     NCAA  PIT  21   3b   L     0.276  0.346  0.496  0.362   15.9
Longoria, Evan     MLB   TBA  22   3b   R     0.275  0.343  0.490  0.358   13.6
Dykstra, Allan     A+    SDN  21   1b   L     0.241  0.355  0.456  0.355   12.0
Cooper, David      A+    TOR  21   1b   L     0.287  0.342  0.480  0.354   11.4
Mills, Beau        A+    CLE  21   2b   L     0.271  0.333  0.481  0.350    9.5
Posey, Buster      A-    SFN  21   c    R     0.294  0.354  0.434  0.346    7.5
Weeks, Jemile      A     OAK  21   2b   S     0.271  0.339  0.440  0.341    4.5
Flaherty, Ryan     A-    CHN  21   ss   L     0.278  0.336  0.443  0.339    3.8
Donaldson, Josh    A+    OAK  22   c    R     0.259  0.321  0.445  0.332    0.2
Arencibia, J.P.    AA    TOR  22   c    R     0.272  0.310  0.466  0.332    0.0
Davis, Ike         A-    NYN  21   1b   L     0.273  0.323  0.444  0.332   -0.1
Coghlan, Chris     AA    FLO  23   3b   L     0.272  0.341  0.406  0.331   -0.5
Payne, Danny       A     SDN  22   of   L     0.248  0.355  0.363  0.328   -2.3
Forsythe, Logan    A-    SDN  21   3b   R     0.259  0.337  0.398  0.327   -2.8
Stubbs, Drew       AAA   CIN  23   of   R     0.253  0.328  0.403  0.323   -4.9
Havens, Reese      A-    NYN  21   ss   L     0.249  0.319  0.413  0.322   -5.5
Antonelli, Matt    MLB   SDN  23   2b   R     0.242  0.332  0.372  0.317   -8.1
Borbon, Julio      AA    TEX  22   of   L     0.286  0.321  0.391  0.314   -9.4
Gillaspie, Conor   MLB   SFN  20   3b   L     0.268  0.315  0.400  0.314   -9.5
Doolittle, Sean    AA    OAK  21   1b   L     0.247  0.314  0.399  0.314   -9.6
Colvin, Tyler      AA    CHN  22   of   L     0.257  0.296  0.431  0.312  -10.3
Burriss, Emmanuel  MLB   SFN  23   ss   S     0.265  0.321  0.329  0.295  -19.2
Williams, Jackson  A     SFN  22   c    R     0.229  0.295  0.359  0.291  -21.4
Mangini, Matt      AA    SEA  22   3b   L     0.234  0.296  0.352  0.289  -22.4
Castro, Jason      A-    HOU  21   c    L     0.230  0.299  0.343  0.288  -23.0
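
The RAA column is not defined in the text. One reading that reproduces the listed values to within rounding is the usual conversion of wOBA to runs above average, (wOBA - league wOBA) / scale x PA. In the sketch below, the league wOBA of .332, the scale of 1.15, and the 600 PA are all assumptions inferred from the table itself (the row with .332 wOBA shows 0.0 RAA); none of them is stated in the article.

```python
LEAGUE_WOBA = 0.332  # ASSUMPTION: the wOBA at which the table shows 0.0 RAA
WOBA_SCALE = 1.15    # ASSUMPTION: inferred from the table, not stated
SEASON_PA = 600      # ASSUMPTION: a full season of plate appearances


def runs_above_average(woba, pa=SEASON_PA):
    """Runs above a league-average hitter over `pa` plate appearances."""
    return (woba - LEAGUE_WOBA) / WOBA_SCALE * pa


# Compare against a few listed rows; small gaps are expected because the
# table shows wOBA rounded to three decimals.
for name, woba, listed in [
    ("Wallace, Brett", 0.391, 30.7),
    ("Longoria, Evan", 0.358, 13.6),
    ("Castro, Jason", 0.288, -23.0),
]:
    print(name, round(runs_above_average(woba), 1), listed)
```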