What are the chances?

My my, baseball is quite the strange game, isn’t it?

  • As late as about a week and a half ago, Baseball Prospectus’s playoff odds report had the Mets’ chances of not making the playoffs well below 1%.  The Mets were as sure a bet as there was in the NL.  Apparently, they didn’t have the teamwork to make the dream work.
  • The Colorado Rockies won 12 of their last 14 games to force a playoff for the NL Wild Card spot.  Their Pythagenpat projection has them rated as a “true” .557 team.  The chances that such a team would win at least 12 of 14 games are 1.92%.
  • Speaking of Pythagorean results that make no sense, the Arizona Diamondbacks were outscored this year(!), 732-712.  They finished with the best record in the National League.  Last year, the Indians underperformed their Pythagorean projection by 11 wins.  This year, the D’Backs did the exact opposite.  Chances of that happening randomly?  34 in 10,000.
  • The Yankees made the second half a heck of a lot more interesting than I thought they would.
  • Pythagorean win percentage: A formula which tells you that despite the fact that another team’s players are piled on top of each other after winning the World Series, your team was actually better this year.  And we can prove it.  (Gee, that makes me feel better.)
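For the curious, two of the numbers in the list above can be roughly reproduced with a few lines of Python. This is only a sketch under stated assumptions: it treats every game as an independent coin flip with the same win probability, and it uses the classic Pythagorean exponent of 2 rather than the Pythagenpat variant mentioned above. The function names are made up for illustration.

```python
from math import comb


def prob_at_least(wins, games, win_pct):
    """Binomial tail: chance of at least `wins` wins in `games` games.

    Assumes every game is independent and won with probability `win_pct`.
    """
    return sum(
        comb(games, k) * win_pct ** k * (1 - win_pct) ** (games - k)
        for k in range(wins, games + 1)
    )


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Classic Pythagorean expectation (exponent of 2 is an assumption here)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# A "true" .557 team winning at least 12 of 14 games.
rockies_streak = prob_at_least(12, 14, 0.557)

# A team outscored 732-712 over the season.
dbacks_expected = pythagorean_win_pct(712, 732)

print(f"Rockies streak probability: {rockies_streak:.4f}")
print(f"D'Backs expected win pct:   {dbacks_expected:.3f}")
```

The streak probability comes out near 1.9%, in line with the figure quoted above, and the outscored team projects to a shade under .500.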

But then again:


Brains vs. Braun

Fellow MVNer Daniel Rathman, who writes the outstanding Baseballistic column, today asks the question of who will win the NL Rookie of the Year Award: Troy Tulowitzki of the Colorado Rockies or Ryan Braun of the Milwaukee Brewers.  Daniel picks Braun:

Who is more worthy of the award, a superior defensive shortstop with passable offensive numbers and plus power, or an absolute offensive juggernaut with a wooden glove?
Since defense tends to be overlooked in favor of offense, the choice will likely be Braun.
Though Tulowitzki’s season has been excellent in its own right, I think that Braun’s ridiculous offensive performance simply dwarfs it, in spite of his shoddy defense.

I beg to differ.  I agree that Braun will win, but I say that Tulowitzki should win. 
I can’t argue with two things.  Ryan Braun has looked like the second coming of Albert Pujols offensively and Troy Tulowitzki enjoys the cool mountain air of Coors Field 81 days per year.  The assumption is that Tulowitzki’s numbers (.292/.362/.475, 23 HR), which are nice for a shortstop, wouldn’t be so nice if it weren’t for that Colorado air.  Indeed, as Daniel points out, Tulowitzki’s home/road splits show him to be a mere mortal on the road.  Before blaming the Colorado air, let’s look at a couple things:
Baseball-Reference lists Coors Field’s park effect rating for 2007 as 107.  (Braun’s Miller Park rates at 100, which is perfectly neutral.)  That does mean that Coors tilts toward hitters, but this isn’t your father’s Coors Field.  Most folks still have dreams of the old (pre-humidifier) Coors Field, which had park effects from 1999-2002 of 129, 131, 122, and 121.  Tulowitzki got some bounce, but let’s not overstate it.  Even so, Braun is (clearly) the better hitter by a lot.  Baseball Prospectus’s VORP for Rookies (which really only looks at hitting, if I’m not mistaken) rates Braun in 1st place among rookies and Tulowitzki in third (Hunter Pence… remember him?… is in second place).  Voters (and fans) are impressed by offense because there are really good statistics to measure offensive contributions (not that those are the ones that are used… but those stats do exist).
But can the effect of fielding be measured in the same way as batting?  Can a team really be better off with an all-field, no-hit player rather than a monster hitting masher with a hole in his glove?  In a word: yes.  In two words: Adam Everett.  It’s hard to tell what Braun would have looked like playing short (OK, it’s obscenely easy to tell what would have happened in that case) or what Tulowitzki would have done playing third, but it’s pretty clear that Tulowitzki is the better fielder.  In fact, take a look at Tulowitzki’s Fielding Runs Above Replacement (FRAR) over at Baseball Prospectus.  The shortstop tends to be the best defender on a team and Tulowitzki is 44 runs(!) better than a scrapheap shortstop (and 22 more than an average one).  That’s amazing, outstanding, demanding, and commanding.  Tulowitzki also adds 9 runs above replacement with his hitting.  Braun is a better hitter, 48 runs better than a scrapheap third baseman, but 8 runs worse than that scrapheap third baseman when it comes to matters of the glove.
Overall net effect of playing Tulowitzki rather than a AAA callup, marginal utility infielder type, or waiver wire pickup shortstop: 72 runs.  Net effect of playing Ryan Braun over a replacement third baseman: 40 runs. 
Another way to look at it is this: had Tulowitzki stayed the same on defense, but had the offensive output of Stephen Drew of the Diamondbacks (.233/.310/.361), he still would have been a more valuable player than Braun.
Yeah, Ryan Braun will win the trophy and he really is an outstanding hitter.  But Tulowitzki will win you more ballgames and I nominate him as the thinking man’s pick for NL ROY.  Baseball is a game played in half-innings and a player can make a contribution in both halves.  It’s just that very few folks recognize one of those halves.

The "toughest out" study, redux

I never expected the “toughest out” study to be much of anything.  I was bored one night, fooling around a little bit with some Retrosheet data files, and thought I might test out the old “What’s the toughest out?” question.  Then, Rob Neyer from ESPN linked to me, and it became my all-time most read piece.  All this for a study that I did in about 15 minutes.  Of course, the stuff that takes me hours of detail-oriented work to do gets read by five people.
When writing the piece, I knew that I wasn’t really doing the study much justice.  I didn’t control for batter or pitcher quality and my sampling methods were based on how quick I could get the study done.  A few good commenters pointed out a few possible improvements, and then (insert spooky sound effects here), Bill James himself visited me in a dream last night and told me the spirits of Sabermetrics were angry at me for shirking my duty.  (OK, not really.)  So, here is the “toughest out” study, done properly.
First, the ever reliable Tango Tiger suggested that I look at the overall league OBP for the plate appearances when there had already been 1 out recorded, then 2, and so on.  Fair enough.  (A small confession: what I calculated wasn’t exactly OBP.  Because of my database setup, I had to make do with whether an out had been recorded in each plate appearance.  The great majority of those outs were made by the batter, but occasionally a batter singles, but his idiot teammate gets thrown out at third.  There are also times when a batter strikes out, but reaches first on a passed ball.)  My data set is almost everything that happened in 2006, throwing away my original stipulation that the only interesting things to look at were the games in which all 27 outs had been recorded.  (This threw away all ninth inning comebacks by the home team, as well as all home wins in which the bottom of the ninth was superfluous to requirements.)  I didn’t look at outs that were recorded on caught stealings or pickoffs.
The out with the highest OBP?  The 17th out (which came in 2nd place in the original study) with an OBP of .3614.  It was followed closely by the 9th out at .3608, then the 10th out, 2nd out, and 1st out.  The easiest out to get was the 25th out (1st out of the ninth inning) with an OBP of .3071.  So the difference between the highest and the lowest is .054, which is one non-out in twenty plate appearances.  Not a huge difference, but definitely a difference.  The 27th out was actually the 6th easiest to come by.
Still, in my original study, the 1st out was the most difficult to come by.  Several folks properly pointed out that this had something to do with the fact that the first person up in a game is the leadoff hitter and, Juan Pierre notwithstanding, the leadoff guy is usually a high OBP guy.  So, it’s important to control for the batter’s ability to avoid outs and the pitcher’s ability to induce them.
I calculated OBP for all pitchers and hitters over the course of the season and converted them into odds ratios.  For those who aren’t familiar, an odds ratio takes the probability (p) from a yes/no question (Did the batter make an out or not?) and turns it into something that is much easier to work with mathematically.  The formula is p / (1-p).
Now, suppose that Larry is pitching and he has an OBP against of .333.  He is facing Neifi (a name I just pulled out of nowhere) who has an OBP of .200.  Neifi’s odds ratio is .200 / (1 – .200), which is (.200 / .800), or 0.25.  Larry is at .333 / (1 – .333) or 0.5.  What is the expectation that this confrontation will end up without making an out?  We can find it with the following formula:
(batter OR / league OR) * (pitcher OR / league OR) = (expected OR / league OR)
I had to calculate the OBP for all at-bats in the 2006 season (.3409 for the curious, which may not match up to other sources, but remember I’m using a slightly different definition for the purposes of this study), but the rest is just plugging numbers into the formula and solving.  Once we’ve got the expected OR, it’s easy enough to convert it back into a probability.  p = OR / (OR + 1).
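Here is a minimal sketch of that calculation in Python, using the Larry vs. Neifi numbers and the .3409 league figure from above. The function names are my own invention for illustration, not anything from a particular library or from the original database.

```python
def to_odds(p):
    """Convert a probability into the p / (1 - p) form used above."""
    return p / (1 - p)


def to_prob(odds):
    """Convert back: p = OR / (OR + 1)."""
    return odds / (odds + 1)


def expected_obp(batter_obp, pitcher_obp, league_obp):
    """Solve (batter OR / league OR) * (pitcher OR / league OR)
    = (expected OR / league OR) for the expected OR, then convert
    it back into a probability."""
    league_or = to_odds(league_obp)
    expected_or = to_odds(batter_obp) * to_odds(pitcher_obp) / league_or
    return to_prob(expected_or)


LEAGUE_OBP_2006 = 0.3409  # the study's own (slightly non-standard) figure

# Neifi (.200 OBP) batting against Larry (.333 OBP against).
neifi_vs_larry = expected_obp(0.200, 0.333, LEAGUE_OBP_2006)
print(round(neifi_vs_larry, 3))
```

For the Larry vs. Neifi matchup this works out to roughly .194: a below-average hitter facing a slightly better-than-average pitcher is expected to do a little worse than his own season line.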
Given all that, we can figure out what the expected OBP would be for any given plate appearance and by summing a few things up, what the overall expected OBP would be for all PA’s at a specific level of outs.  Then, we can compare what the actual OBP was for that number of outs versus what could be expected given batter and pitcher quality.
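The summing-up step might look something like the sketch below. The record layout (out number, batter OBP, pitcher OBP, and whether the plate appearance ended without an out) is an assumption made for illustration, and the four sample rows are invented toy values, not Retrosheet data.

```python
from collections import defaultdict


def expected_obp(batter_obp, pitcher_obp, league_obp):
    """Expected chance of a non-out for one matchup, via p / (1 - p) odds."""
    odds = lambda p: p / (1 - p)
    exp_odds = odds(batter_obp) * odds(pitcher_obp) / odds(league_obp)
    return exp_odds / (exp_odds + 1)


def summarize_by_out(plate_appearances, league_obp):
    """Average expected and actual OBP for each out number.

    Each record is (out_number, batter_obp, pitcher_obp, no_out), where
    no_out is True when the plate appearance ended without an out.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # count, expected sum, non-outs
    for out_number, batter_obp, pitcher_obp, no_out in plate_appearances:
        row = totals[out_number]
        row[0] += 1
        row[1] += expected_obp(batter_obp, pitcher_obp, league_obp)
        row[2] += int(no_out)
    return {
        out: {
            "expected": exp_sum / n,
            "actual": non_outs / n,
            "diff": non_outs / n - exp_sum / n,
        }
        for out, (n, exp_sum, non_outs) in totals.items()
    }


# Invented toy rows, purely to show the shape of the input.
toy_rows = [
    (1, 0.360, 0.330, True),
    (1, 0.350, 0.340, False),
    (17, 0.340, 0.345, True),
    (17, 0.345, 0.350, True),
]
summary = summarize_by_out(toy_rows, 0.3409)

# The "toughest" out is the one where actual OBP most exceeds expected OBP.
toughest = max(summary, key=lambda out: summary[out]["diff"])
print(toughest, summary[toughest])
```

Ranking by the difference rather than by raw OBP is what controls for who happens to be batting and pitching at each point in the game.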
The toughest out to get using this formula?  Still the 17th out.  It had an expected OBP of .344 given who was batting and pitching at that time, but had an actual OBP of .361 for a difference of .017.  Following close behind it were the 9th, 12th, and 14th outs.  The easiest out to get was still the 25th out, followed by the third, and the fifth out.  I couldn’t discern any kind of pattern running through the numbers.  Maybe I’ll take a look at a few other years to see whether certain outs are tough to come by from year to year.
The first out of the game actually drops down into 16th place.  Interestingly enough, the expected OBP for the batter/pitcher matchups that tried to produce the first out was .3556, while the actual OBP was .3551.  The first out of the game is almost exactly as hard to come by as one would expect given the people who generally bat (and pitch) there.  The 27th out actually had a similar pattern, with an expected OBP of .3279 and an actual OBP of .3282.  The last out of the game isn’t any harder (or easier) to get than one might expect given the batter/pitcher matchups that happen there.

My 2007 Cy Young Ballot

Earlier this week, I took a look at the races for the Most Valuable (offensive performance by a) Player (who is not a pitcher) Awards for both leagues, and where my votes would go if BBWAA had remembered to send me a ballot.
Now, on to the Cy Young Awards for each league.  Only pitchers need apply.  And for crying out loud, don’t you dare look at the Cy Young Predictor at ESPN.  Yes, I know it was created by Bill James.  Bill James wanted something that predicted the way that voters actually voted: using wins, saves, and ERA (Like, LOL!).  Surely even Da Vinci drew a picture of a clown at some point.
By the way, Cy Young ballots actually only have three places on them, but I prefer the MVP-style 10 man ballot.  Anyone know why there’s a difference?
American League
How do we determine value in a pitcher?  Well, let’s start off with Baseball Prospectus’ VORP ratings for pitchers.  Then, let’s look at the three things that a pitcher can control: walks and strikeouts (K/BB ratio) and home runs (HR/9 innings).  Finally, even though it’s not a good idea to do this, let’s look at win probability added (WPA) by each pitcher.  The problem is that WPA for pitchers says that everything that happens on the field is the responsibility of the pitcher, leaving out the contributions of the defense, but it’s a decent rough marker.
Let’s look at who’s in the Top 10 on more than one of these lists.
C.C. Sabathia, Johan Santana, Fausto Carmona, Josh Beckett, Erik Bedard, Roy Halladay, Rafael Betancourt, J.J. Putz, Jonathan Papelbon, Joe Nathan, Joakim Soria (yeah, that’s right, Joakim Soria).  Others who will get consideration for all the wrong reasons include Daisuke Matsuzaka (because he’s Japanese, which writers often confuse with “good”), Chien-Ming Wang (plays in Yankee Stadium), and Kelvim Escobar/John Lackey (because apparently the Los Angeles California Angels of Anaheim, California near Los Angeles made the playoffs, and thus, need to have a representative… not that those two have been bad pitchers mind you…)
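The screening rule used to build that list (show up in the Top 10 on more than one leaderboard) is simple enough to sketch in code. Everything here is hypothetical: the function name, the placeholder pitchers, and the shortened three-name leaderboards are stand-ins, and each ranking is assumed to be sorted best-first already.

```python
from collections import Counter


def multi_list_candidates(leaderboards, top_n=10, min_lists=2):
    """Names that appear in the top `top_n` of at least `min_lists` rankings."""
    appearances = Counter()
    for ranking in leaderboards.values():
        # set() guards against counting a name twice within one list.
        appearances.update(set(ranking[:top_n]))
    return sorted(name for name, n in appearances.items() if n >= min_lists)


# Placeholder leaderboards, best-first; not real 2007 rankings.
toy_leaderboards = {
    "VORP": ["Pitcher A", "Pitcher B", "Pitcher C"],
    "K/BB": ["Pitcher B", "Pitcher D", "Pitcher A"],
    "HR/9": ["Pitcher E", "Pitcher B", "Pitcher F"],
    "WPA": ["Pitcher D", "Pitcher G", "Pitcher H"],
}

candidates = multi_list_candidates(toy_leaderboards, top_n=3)
print(candidates)
```

Requiring two or more appearances keeps a one-stat wonder off the ballot while still letting relievers, who rarely lead in VORP, qualify through the rate stats and WPA.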
Betancourt isn’t one of those magical “closers,” which apparently means that he’s not a good pitcher, so he won’t get more than a handful of votes.  Betancourt is second overall in WPA (behind Putz), 9th in home runs allowed, and has the best K/BB ratio in the AL among those with more than 60 IP.  What’s (tragically) funny is that Betancourt’s teammate Joe Borowski, who has been far inferior to Betancourt this year, will get consideration based on leading the AL in saves.  Pardon me while I bang my head on the desk.
Staying on the Indians kick, VORP has Carmona in 2nd, with C.C. in first.  (Side note as an Indians fan getting ready for the playoffs: Awwww yeah….)  In K/BB ratio, Sabathia trails only Betancourt, Mariano Rivera, and J.J. Putz.
Speaking of Putz, he’s 2nd in K/BB and 1st in WPA, but comes in 26th in VORP (which admittedly is unkind to relievers because they face fewer batters… Betancourt is 23rd).  Putz also comes in 34th in HR rate.  (In fairness, Sabathia comes in 17th and Carmona is in 28th.)
This is a tight one, and I can see an honest first-place vote going to Putz.  But, in a tight race (C.C. vs. J.J.?), my bias is toward a starter that breaks into the top ten of a category, K/BB, usually dominated by relievers.  The fact that the most logical choice plays on a team that I’ve spent 20 years of my life rooting for never hurts.  My non-existent first place vote goes to C. C. Sabathia.
Santana won it last year, and I must repeat, he is not having a bad year.  Yes, he’s gone 15-13, and that’s near .500.  If you were thinking of using that argument, please read this.  It’s the same reason that Josh Beckett shouldn’t be handed the award just because he has 20 wins.  Bedard and Halladay are both having fantastic years on irrelevant teams.  Putz, Papelbon, and Nathan are elite relievers.  Soria probably surprised some people by his inclusion.  He gives up very few HR (5th on that list), and is 8th on the WPA charts.  He’s another guy who is not the best pitcher in the American League, but maybe deserves a little love from the voters since they have to pick a top ten.  He won’t get any because he plays in Kansas City.
My ballot: Sabathia, Putz, Carmona, Beckett, Santana, Bedard, Betancourt, Papelbon, Nathan, Halladay, and an 11th place honorable mention to Soria.
National League
Let’s round up some candidates, shall we?  Same method as above.
Jake Peavy, Brandon Webb, Brad Penny, Tim Hudson, John Smoltz, Roy Oswalt, Aaron Harang, Chris Young, Brandon Lyon, Carlos Marmol, Heath Bell, Takashi Saito.  Others who will get consideration for all the wrong reasons include Carlos Zambrano (after all, he signed that big contract), Francisco Cordero (because he collected a lot of saves), and Billy Wagner (ditto).
Peavy is first in VORP, first in WPA (by a lot!), 13th in K/BB, and 10th in HR allowed.  Jake Peavy is running away with the NL Cy Young.  The trophy should be sent over to the engravers now.
Penny is 2nd in VORP, 6th in WPA, and 9th in HR allowed, but his K/BB is in the middle of the league.  Webb is just behind Penny in VORP, WPA, and HR, but carries a higher K/BB. Hudson is generally slightly behind Webb.  Smoltz and Oswalt are generally behind Hudson, with the notable exception of Smoltz still having an excellent K/BB ratio.
If you want to see something fun, go to Google and type in “Aaron Harang” “Cy Young”.  Harang is one of those guys who makes a fantastic Roto league pick.  Last year, he led the NL in strikeouts and wins and didn’t get a vote for the 2006 CYA.  This year, the residents of Cincinnati are campaigning to get Harang a little recognition.  I sympathize with a really good pitcher whom no one knows exists among general baseball fans, and Harang is top ten in VORP, WPA, and K/BB, but I’m not on the “Aaron Harang is the best pitcher in baseball” bandwagon.  He’s in the range of “really good”, and if the Cy Young had ten ballot places, he would be in one of mine.  But, he’s not quite that good.  Again, I know he’s 16-4, but won-loss record isn’t much of a barometer of pitcher quality.  It tells me much more about a team’s quality than anything.  Sorry Cincy.
Chris Young’s BABIP this year is .242.  He’s gotten lucky.
And now the relievers.  Lyon.  Bell.  Marmol.  The number of saves that those three have combined for this year?  Five.  Two for Lyon, two for Bell, one for Marmol.  Of the four relievers that my method has identified, only one of them (Saito) collects saves for a living.  Bell and Marmol rate ahead of Saito in VORP.  All three set-up guys are top 10 in HR allowed.  Saito isn’t even Top 30.  Saito does kick butt on K/BB (1st in the league) and is second to Peavy in WPA (Bell is 3rd, Lyon is 8th, Marmol is 9th).  It’s at least in the realm of reasonable statements to say that the NL’s top reliever is not a closer at all, and depending on where you want to put Saito, maybe the top three!  Saito’s BABIP is a ridiculously low .219, which means that his uppance shall come.  I’d put Bell in front of Saito, with Lyon and Marmol trailing closely behind (but feel free to re-arrange as you see fit).
My ballot: Peavy, Penny, Webb, Bell, Saito, Hudson, Smoltz, Oswalt, Harang, Marmol.
So StatSpeak endorses Jake Peavy and C.C. Sabathia.  Let’s see how many voters read StatSpeak.

On second thought, the universe is doomed

A few days ago, I was hopeful for the universe.  Then, I read this interview with Blue Jays manager John Gibbons at Baseball Prospectus.  This is one of those subscription-needed pieces.  Because of that, I’ll respect BP’s copyright and simply summarize a few observations that I had on the piece.

  • Gibbons identifies Alex Rios as a player who “could steal 30-40 bases.”  Yet, Rios has a speed score (the ones I created) of 0.28, which rates him at “a little above average” (zero is average).  Then, as an afterthought, he mentions Vernon Wells, who actually has a higher speed score (0.45).  Now that’s knowing your personnel!
  • Read his answer to the question on whether there is an organizational philosophy on running.  Gibbons’ answer is a thing of ugly.  If you can make heads or tails out of that paragraph, you win a cookie.
  • He says that he wouldn’t bring his closer into the seventh inning.  After all, who will pitch the ninth?  (Remember, a closer does the same thing as a middle reliever.  He just does it in the ninth inning!)

While I’m in the neighborhood, it looks like the ever-mysterious Player AR struck again last night in the Yankees-Red Sox game.  Bases loaded, two outs, bottom of the ninth, down one run.  Pop up to shallow center.

My 2007 MVP Ballot

For some reason, my 2007 Awards ballot has not yet arrived from the Baseball Writers’ Association of America.  I write about baseball associated things and I’m an American.  So, since the Chicago postal system seems to be running slow, I guess I’ll have to submit my ballot the old-fashioned way, by posting it on the Internet.  This year, I offer you a study in two contrasting races.  The AL trophy might as well be engraved now.  The NL trophy isn’t so clear.  In true StatSpeak style, I prefer to use some of the more advanced metrics available to us.
Most Valuable (Offensive) Player Who Isn’t a Pitcher
American League
Alex Rodriguez, 3B, New York Yankees. 
Well, that one was easy.  As much as it pains me to say anything positive about a Yankee, A-Rod deserves it.  Most Valuable Player awards often come down to the usual arguments.  Most is an adverb modifying the adjective and means “the highest level of.”  Player means someone who participates in the game of baseball (although we occasionally get the argument of whether a pitcher is eligible for this award, what with the Cy Young Award available… I’m a fan of keeping pitchers out of the MVP and it’s my ballot!).  Valuable… now that’s one that eludes definition.  Thankfully, A-Rod is the consensus choice on just about all of the usual criteria used for MVP voting, including “The guy who had the most home runs, or perhaps the most RBI,” “The guy on a playoff team who was the sine qua non (if not for him, the team would not have been in the playoffs),” “The best player on the East Coast,” and “The guy who’s having a good year and has won it in the past few years.”  A-Rod taking home the AL MVP in a few weeks is a better bet than some third world countries still being around by then.
But, for what it’s worth, A-Rod as an MVP makes sense from a statistical point of view too.  He leads the AL in VORP, win probability added, context neutral WPA, and Batting Runs Above Average.  He’s been that good this year, and for $30 million per this off-season, he could be yours.
But, what about the rest of the ballot?  They don’t give out “Second Most Valuable Player” (but they should…), although the ballot that the BBWAA sends out asks for a first through a tenth place vote.  Given the last four statistically based criteria I used above, the following players make the top ten list on those stats more than once: Magglio Ordonez, David Ortiz, Jorge Posada, Curtis Granderson, Vlad Guerrero, Carlos Pena, Victor Martinez, B.J. Upton, and Jim Thome.  Other players who will get a look because of one of the “other” criteria I listed include Justin Morneau (won it last year, having a decent although not earth-shattering year… plus he’s Canadian!), Derek Jeter (a true Yankee), and Placido Polanco (BBWAA members love opera).
See a surprising name or two in there?  Carlos Pena ranks 8th league-wide in VORP, and B.J. Upton is 11th.  Pena is fifth in context neutral WPA, Upton is 7th.  They both play for the Devil Rays.  Neither one will get many votes and will be left off some ballots altogether because ESPN is only aware of two of the five teams in the AL East, but both players deserve a second look.  In fact, both will finish behind David Ortiz, who will get votes based on the fact that he lives in Boston, the “intangibles” he brings to the club, and on his clutch reputation, even though this year he’s been the fourth most anti-clutch hitter in the AL!  (For the record: Ortiz has had a better year and belongs ahead of them… I’m just saying it’ll be for all the wrong reasons)  Ordonez and Granderson will take a hit due to the Tigers’ late season collapse, as will any ideas that Gary Sheffield had of winning the award.
Ordonez deserves second place.  He’s 2nd in VORP, 2nd in WPA, 3rd in context neutral WPA, and 2nd in BRAA.  Ortiz places third in VORP, 4th in WPA, 2nd in context neutral WPA, 3rd in BRAA and 3rd on my ballot.  The rest of the top ten, in order: Vlad, Pena, Posada, Martinez, Granderson, Upton, Thome.
National League
This one isn’t as clear cut.  VORP says it’s Hanley Ramirez, BRAA says Chase Utley, WPA and context neutral WPA say Prince Fielder.  Using my “appear more than once in the top ten” test, we get Ramirez, David Wright, Chipper Jones, Miguel Cabrera (all 16 tons of him), Albert Pujols, Fielder, Utley, Matt Holliday, Barry Bonds, and Adam Dunn.  Others who will get consideration for all the wrong reasons include Ken “nice comeback, we still like him better than Bonds” Griffey, Ryan “won it last year, good-but-not-great year this year” Howard, Ryan “I’m sooooo going to win Rookie of the Year” Braun, and Jose “wait a minute, how is he not in this discussion, especially given that he plays for the Mets” Reyes.  (Reyes is 11th in VORP, but in the 30s-40s on the other stats.  Yeah, I know he’ll steal 80+ before it’s all over.) 

  • Ramirez is 1st in VORP, but 20th in WPA, 7th in context neutral WPA, and 15th in BRAA.  Not bad, but not something that screams “I’m a walkaway winner!”
  • Fielder is 3rd in BRAA and 6th in VORP (quite nice)
  • Utley is 7th in VORP, 9th in WPA and 4th in context neutral WPA.
  • But wait, David Wright is 2nd in VORP, 2nd in context neutral, 5th in WPA, and 6th in BRAA…
  • …and Chipper Jones is 3rd in VORP, 6th in context neutral, 4th in WPA, and 2nd in BRAA.

VORP is generally the more complete stat, although Fielder suffers from the fact that replacement level for a first baseman is higher than for any of the other infield positions (which the other folks play).  Ramirez leads VORP, but the two guys behind him generally outdo him in the three other categories, despite not being the leaders in any of them.  Fielder will win, if only for the fact that he’s leading the league in the one stat that writers are guaranteed to look at: home runs.  But, when you look at a few more advanced stats, it becomes a little more cloudy.
Dunn is having another one of his 40 HR, 100 BB, 170 K’s, which means that, once again, about half of his PA this season will end in one of those three outcomes.  That’s consistency!  Bonds will get some votes because he’s Barry Bonds and will be left off some ballots because he’s Barry Bonds.  Until such time as there’s evidence that he… ummm… seeing that O.J.’s back on trial, using the word “Juiced” just doesn’t seem right… anyway, he’s staying on my ballot.  Pujols, Holliday, and Cabrera are doing what they normally do on teams that will not make the playoffs.  (I know, the Rockies are still kinda in it.)
My ballot reads: Fielder, Wright, Jones, Ramirez, Utley, Cabrera, Pujols, Holliday, Bonds, Dunn.  The thing is that I could make a case to re-arrange that ballot in any number of ways.  If I were a betting man (I’m not), I’d lay money on Fielder.  It’ll be close when the actual balloting is held, and several guys will get first-place votes, and it’s not out-of-bounds that they do.  Fielder gets the nod as “most” valuable player, slightly above the rest of the pack.  Remember, someone’s gotta win.
But feel free to argue that I’m wrong.  I’ve been wrong before; just ask my wife.  But do come back later in the week when we’ll discuss an award named after the all-time most losingest pitcher in all of baseball history.