2007 Sabermetric Year in Review: San Francisco Giants

Continuing our reverse alphabetical tour of MLB, StatSpeak heads west to the C-state for stop #7: San Francisco, which, I might add, is about to get the Lilo and Stitch treatment.
Record: 71-91, 5th in NL West.  For a team that had that much attention paid to them in the past year, they were… a last place team.
Pythagorean Projection (Patriot formula): 77.03 wins (683 runs scored, 720 runs allowed). 
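For anyone who wants to check the arithmetic, here is a minimal sketch of the projection.  It assumes that the "Patriot formula" is the Pythagenpat variant, in which the exponent is runs per game raised to the 0.287 power; that constant, the 162-game default, and the function name are assumptions made for this sketch, not something spelled out above.  With those assumptions, the 683/720 totals land within rounding of the figure quoted.

```python
def pythagenpat_wins(runs_scored, runs_allowed, games=162, z=0.287):
    """Expected wins from runs scored and allowed (Pythagenpat sketch).

    The exponent is (runs per game) ** z; z = 0.287 is an assumed constant.
    """
    runs_per_game = (runs_scored + runs_allowed) / games
    exponent = runs_per_game ** z
    scored_term = runs_scored ** exponent
    allowed_term = runs_allowed ** exponent
    return games * scored_term / (scored_term + allowed_term)


# 2007 Giants: 683 runs scored, 720 runs allowed
giants_expected = pythagenpat_wins(683, 720)
print(round(giants_expected, 2))
```

Letting the exponent float with the run environment is the whole point of this variant: a higher-scoring league gets a slightly larger exponent than a lower-scoring one.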
Team Statistical Pages:
Baseball Reference
Baseball Prospectus
MVN Blog:
Giants Cove 
Other Giants Resources:
Latest News
Contract Status
Trade Rumors
Overview: Let me see here.  Did anything happen in 2007 of any importance in San Francisco? I’m not coming up with anything, except that the American Psychological Association held its annual convention there.  (I went.)  Must have been that kind of year.  No huge storylines.  No controversy.  Just your basic baseball season.  They did have the All-Star Game, which must have been fun.
What went right: Cain, Lowry, Lincecum.  Has that Smoltz, Glavine, Avery feel to it, doesn’t it?  I suppose that they can argue amongst themselves which one gets to be Steve Avery, but things worked out pretty well for that threesome of pitchers, eh?
Don’t let the record fool you.  Cain lost 16 games, but posted an ERA of 3.65.  His weakness is that he walks too many batters (3.56 per 9 innings), but he was also one of the better strikeout starters in the league last year.  Take a look at his plot for the amount of break on his pitches.  You’ll see that his fastballs are all generally within one blob, suggesting that he has a good idea of where the fastball is going, which is probably why it makes up more than 60% of his pitches.  With his off-speed/breaking stuff, on the other hand, there are a few curves and sliders and changes that seem to be little islands unto themselves.  Cain is 22, and has time to learn to control those pitches.  He also gives up a lot of flyballs, but he’s right-armed and lives in a spacious park that is murderous on left-handed power hitters (or at least so the reputation goes).  Cain is able.
Lincecum struck out more than a batter an inning, induced ground balls on 47% of the balls hit off him, and had a line drive rate of 15.4%.  These are all good results.  He’s also got a 95 mph fastball, and a change and hook to go with it.  He’s another member of the “I walk a few too many hitters (4 per nine innings)” club, which seems to be a problem with the Giants.  Maybe after seeing Barry Bonds walked so often, they just figured that’s what you’re supposed to do when facing a hitter.  Hmmm… Fantasy players, watch Cain and Lincecum’s walk rates early in the year.  If they’re going down, then buy buy buy buy buy.
Noah Lowry is being bandied about as possible trade bait.  He’s not awful, but he did walk as many batters as he struck out (5 per 9 IP).  He’s also 26, which means the ceiling isn’t quite as high.  But, people who aren’t paying attention might get him confused with Lincecum and Cain (who are 3 and 4 years younger) and assume that Lowry is also 22 or 23.  Maybe that will increase his value.  He’s also left-armed, so he’s looking more like Steve Avery every moment.
What went wrong:  I suppose to continue the above analogy, Barry Zito was supposed to be Greg Maddux, the former Cy Young Award winner free agent signing who would put the team over the top.  In fairness to Zito, he didn’t have a terrible season.  He threw 196 innings, put up respectable numbers, and hey for a fourth starter, I think most teams would be happy to have him in that spot in their rotation.  But, 7/126 is a set of numbers that will haunt the Giants for a very long time.  Six more years to be exact. 
There was one other little problem with the Giants this past year.  The offense was… offensive.  The Giants had three position players with a VORP above 10.  Barry Bonds (55.2) was one of them and he isn’t coming back next year.  The other two were Randy Winn (26.4) and Bengie Molina (14.4).  Pedro Feliz, Omar Vizquel, and Ray Durham, representing 3/4 of the Giants’ infield, all functioned below replacement level.  Even allowing that Feliz is one of the best fielding third basemen in the league, and Vizquel, even at 40-something, is still a premier fielding shortstop, that can’t be healthy for a team.
Yeah, that about sums it up: And now a list of everyone under the age of 30 who logged more than 250 AB for the Giants this past year: Kevin Frandsen.
Oh yeah, him:  Congratulations to Barry Bonds.  We’re not entirely sure for what yet, but it’s clear that he did something this year.  I think more ink has been spilled on Bonds this year than perhaps the rest of the league combined.  Why waste more?
Brad Hennessey: Here’s another case of a hidden closer who deserves a second look.  The Giants installed Hennessey as their closer at the end of May after they recognized that Armando… Benitez… sorry, I’m doubled over laughing here that Armando Benitez was allowed near the ninth inning.  By the looks of it, Hennessey was replaced by Brian Wilson after Hennessey had a few bad outings at the beginning of September.  During his tenure in the bullpen, Hennessey had 19 saves, 13 holds, and 5 BS, for a close lead protection rate of 86% (32/37), which stacks up decently against the rest of the league.  No one will argue that he’s an outstanding reliever and I wouldn’t want him as my first choice to close, but one could do worse.  He seemed to lose the job based on the fact that he had a few bad outings.  (Why do managers insist on playing the “hot hand?”)  Hennessey was a starter who didn’t really make it as a starter, and so he became a bullpen specialist (politically correct term for “reliever”).  He doesn’t have electric stuff, but now he has “closing experience” (which he can parlay into at least a million more per year on his next contract).  I’m not sure what the Giants have in mind for their bullpen this year, but they would do well to consider why, if Hennessey was good enough to close for them in July, he wasn’t good enough in September.  To me, it sounds like a team that’s clutching at straws.
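The close lead protection rate is nothing fancier than saves plus holds, divided by saves plus holds plus blown saves.  A minimal sketch, with a function name invented for the illustration:

```python
def lead_protection_rate(saves, holds, blown_saves):
    """Share of close leads protected: (saves + holds) / (saves + holds + blown saves)."""
    protected = saves + holds
    chances = protected + blown_saves
    return protected / chances


# Hennessey's 2007 bullpen line: 19 saves, 13 holds, 5 blown saves (32 of 37)
rate = lead_protection_rate(saves=19, holds=13, blown_saves=5)
print(f"{rate:.1%}")
```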
Hooked on speed?:  Forget steroids.  It looks like the Giants are hooked on speed.  Take a quick look at the run-down of the Giants’ minor league system.  Focus your eyes on the columns marked SB and CS.  See some eye-popping numbers in there?  See them repeated?  The Giants had 15 players in their minor league system who stole more than 20 bases this past year and five who stole at least 40.  The Giants have apparently decided to turn their farm system into a rabbit breeding ground.  Parlez-vous organizational philosophy?  While that’s nice, speed is only helpful if one is on base to use it.  Only five of those 15 speed-demons had OBP’s above .350.
Outlook: Well, let’s see.  Your team loses its biggest offensive weapon from an offense that wasn’t very good to begin with.  They were a last place team last year, even if you believe their Pythagorean record.  You do the math.  This is an organization that’s apparently building around pitching, defense, and speed, instead of… offense, I guess.  Call it the other Barry Bonds backlash.  Now that Bonds has his magic home run, what of the Giants?  They’ve basically existed for the last few years as a vehicle to get Bonds to 756.  Looks like it’s time to rebuild.

Stats 204: The proximity matrix OR Re-visioning similarity scores

I suppose that when Bill James invented the similarity score, it was an attempt to say “Who exactly is this guy like?”  Is he the second coming of Joe DiMaggio (the power hitter who never strikes out), or is he the second coming of Dave Kingman (the power hitter who strikes out a little more often)?  Maybe he’s the second coming of Tommy Hinzo.  How can we tell?  Mr. James put together a formula that attempted to answer exactly that question.  The formula itself is based on a fairly simple system of “start with 1000” and subtract points for differences in various statistical categories.  It’s not an awful system and generally produces some decent comparisons, but mathematically, we can do better than that!
Let’s pretend that there are only two stats in baseball that matter: walks and strikeouts.  We might use raw numbers of BB and K, but it makes more sense to put them into rate form.  We might classify players, in a very rough way, as being players who neither walk nor strike out much, players who walk and strike out a lot, players who strike out a lot, but don’t walk much, etc.  If we want to get more fine-grained, we can start saying medium or medium-low, etc.  Or if we want to find the player whose BB and K rates match most closely, we can start digging through the data.  If Player A strikes out 15% of the time and walks 7%, then Player B who strikes out 14.8% of the time and walks 7.1% is a good match.  Player C who strikes out 23% of the time and walks 5% isn’t a good match.  But, how good a match… or a non-match is he?  And what do we do when we get beyond two stats of interest?  How do we account for walks, strikeouts, and home runs, singles, or anything else for that matter?
Enter the proximity matrix.  Let’s go back to our “walks and strikeouts only” example.  We could plot walk rate and strikeout rate on a standard two-dimensional axis (graph paper), and label all the players.  Then we could measure (with a ruler!) which player is the closest to any other player.  That works great when there are only two variables.  Three-dimensional graph paper (for three variables) is harder to come by, and by the time we get to four variables, well now we’re into hyperspace.  (Yes, I love Star Trek too.)  Fortunately, mathematics isn’t bound by such constraints, and it’s possible to calculate the distance between two points in four (or more, there’s no limit) dimensions.  The measure I’m using is called the squared Euclidean distance.  In fact, we can get a matrix of how far away every player in our sample is from every other player.  That’s the lovely thing about computers, they do all the heavy lifting, and do it in rather short order. 
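To make the ruler concrete, here is a small sketch of the squared Euclidean distance applied to the Player A, B, and C example.  The function name is made up for the illustration, and the rates are left in raw percentage points with no re-scaling, which is fine for two stats on the same scale but is a simplification.

```python
def squared_euclidean(a, b):
    """Sum of squared differences between two equal-length stat lines."""
    if len(a) != len(b):
        raise ValueError("stat lines must have the same number of dimensions")
    return sum((x - y) ** 2 for x, y in zip(a, b))


# (K%, BB%) in percentage points, from the Player A/B/C example
player_a = (15.0, 7.0)
player_b = (14.8, 7.1)
player_c = (23.0, 5.0)

print(squared_euclidean(player_a, player_b))  # small: a close match
print(squared_euclidean(player_a, player_c))  # large: a poor match
```

The same function works unchanged for three, four, or forty stats, since it only ever adds up one squared difference per dimension.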
And we can use whatever criteria or stats are of interest.  Want to look at player height and weight?  Want to look at career OBP and SLG and do it up to age 29?  Want to include every major leaguer ever?  Want to look at projected stats?  That’s fine.  Your CPU will groan a little more, but it can be done.  It’s just an engineering problem.
So, let’s run a little example.  Let me take the 2007 seasonal stats and calculate K rate, BB rate, and HR rate (all per PA), and BABIP.  I kept it to those hitters who had 200 PA or more (even though I spent way too much time arguing that more than 200 PA were needed for BABIP to be reliable enough to use… I’m just illustrating here), leaving me with 341 players.  I asked my computer to give me a proximity matrix.  (Technical note: I re-scaled everything to a range of -1 to +1, which mathematically makes things better.)
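For the curious, the procedure can be sketched in a few lines.  This is an illustration under assumptions rather than the exact run described above: it assumes the re-scaling is a simple min-max stretch of each stat onto -1 to +1, and the three hitters and their numbers are made up.

```python
def rescale(values):
    """Linearly map a list of numbers onto the range -1 to +1 (min-max, assumed)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread: put everyone at the midpoint
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]


def proximity_matrix(rows):
    """Squared Euclidean distance between every pair of rows, after re-scaling each column."""
    columns = [rescale(list(col)) for col in zip(*rows)]
    scaled = list(zip(*columns))
    return [[sum((x - y) ** 2 for x, y in zip(p, q)) for q in scaled] for p in scaled]


# Made-up stat lines (K%, BB%, HR%, BABIP) for three hypothetical hitters
hitters = {
    "Hitter X": (15.0, 7.0, 3.0, 0.300),
    "Hitter Y": (14.8, 7.1, 3.2, 0.310),
    "Hitter Z": (23.0, 5.0, 5.5, 0.280),
}
names = list(hitters)
matrix = proximity_matrix([hitters[n] for n in names])
for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```

Re-scaling matters because BABIP lives on a scale of thousandths while K rate moves by whole percentage points; without it, the stat with the biggest raw numbers would dominate the distance.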
Then I tried to post this matrix so that everyone could see it.  The problem is that only 256 variables can be put into an Excel file (there are 341 players here), and when I tried to post it as pure text, the file reached 578 KB in size.  Google docs has a limit of 500 KB for text files.  If anyone wants the document, just e-mail me.  I prefer to keep everything I do open-source.
To give you an idea though of how it might work, and again only using the four stats above (more on that in a minute), let’s look at recent free agent debate-starter, Torii Hunter.  Whom, in terms of 2007 performance, did Torii most resemble?  Hunter hit a HR 4.3% of the time, struck out 15.5% of the time, walked 6.2% of the time, and had a BABIP of .306.
Top 5 matches:

  1. Adrian Beltre (4.1%/16.3%/5.9%/.297)
  2. Brandon Phillips (4.3%/15.5%/4.7%/.307)
  3. Alex Gonzalez (3.7%/17.4%/5.6%/.301)
  4. Damion Easley (4.6%/16.1%/8.7%/.297)
  5. Ryan Garko (3.9%/17.4%/6.3%/.322)
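
Pulling a "top matches" list out of the distances is just a sort.  The sketch below uses only Hunter and the five players listed above, with each stat re-scaled (min-max, an assumption) across those six alone, so the ordering it prints need not agree with the ranking from the full 341-player sample.  The function names are invented for the illustration.

```python
def rescale(values):
    """Min-max stretch onto -1 to +1 (assumed form of the re-scaling)."""
    lo, hi = min(values), max(values)
    return [2 * (v - lo) / (hi - lo) - 1 if hi > lo else 0.0 for v in values]


def closest_matches(target, stat_lines, top_n=5):
    """Rank every other player by squared Euclidean distance to the target."""
    names = list(stat_lines)
    columns = [rescale(list(col)) for col in zip(*(stat_lines[n] for n in names))]
    scaled = dict(zip(names, zip(*columns)))
    distances = [
        (name, sum((x - y) ** 2 for x, y in zip(scaled[target], scaled[name])))
        for name in names
        if name != target
    ]
    return sorted(distances, key=lambda pair: pair[1])[:top_n]


# (HR%, K%, BB%, BABIP) as quoted above
stat_lines = {
    "Torii Hunter": (4.3, 15.5, 6.2, 0.306),
    "Adrian Beltre": (4.1, 16.3, 5.9, 0.297),
    "Brandon Phillips": (4.3, 15.5, 4.7, 0.307),
    "Alex Gonzalez": (3.7, 17.4, 5.6, 0.301),
    "Damion Easley": (4.6, 16.1, 8.7, 0.297),
    "Ryan Garko": (3.9, 17.4, 6.3, 0.322),
}

for name, distance in closest_matches("Torii Hunter", stat_lines):
    print(f"{name}: {distance:.3f}")
```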

You’ll notice that none of those gentlemen are center fielders by trade, which is something that James’s system does take into account, however imprecisely.   It’s my understanding that a categorical variable (primary position) can be entered into the matrix and that can be controlled for.  (I used hierarchical clustering… I believe that would be two-step clustering.)
Now, I picked these four stats because they were easy to calculate and they do a decent enough job of encapsulating a player’s performance over a year, and that was all I needed for a quick example.  I’m fully expecting that the careful reader out there is already thinking “But those aren’t the best 4 stats.  You need to include/take out/replace….”  And that’s fine.  In fact, I’m counting on it.  It’s an interesting question.  What suite of stats would work best in here?  What stats would fully encapsulate a player’s abilities?  In other words, when you compare a player to some other player, what type of criteria do you use to make the comparison?  Does it depend on the question you’re trying to answer?  Pitchers?  Defense?  Hmmm…

2007 Sabermetric Year in Review: Seattle Mariners

StatSpeak heads to the great Northwest to take a look at the Seattle Mariners, as our reverse alphabetical tour of MLB continues to stop #6.
Record: 88-74, 2nd in the AL West
Pythagorean Projection (Patriot formula): 79.15 wins (794 runs scored, 813 runs allowed).  The Mariners got a wee bit lucky, apparently.
Team Statistical Pages:
Baseball Reference
Baseball Prospectus
MVN Blog:
Caffeinated Confines
Other Mariners Resources:
Latest News
Contract Status
Trade Rumors
Overview: Remember when the Mariners were knocking on the door of the playoffs?  The 2007 season featured a manager leaving his job when the team was winning (and then Ichiro re-signed… total coincidence), a collapse near the end of the year that dropped them from playoff contention, and the team still won 88 games.  And everyone wondered when the Mariners would bring Sexson back.  (Somewhere out there, someone just made a resolution to hunt me down and smack me for saying that.) 
What went right: How’s this for a telling statistic?  Who on the Mariners received the most intentional walks in 2007?  Jose Guillen?  Adrian Beltre?  Nope, Ichiro Suzuki, singles hitter extraordinaire.  In fact, Ichiro has apparently been the most feared Mariner for the last six years (in one year tying with Raul Ibanez and John Olerud).  Ichiro-mania may have died down from sheer exhaustion (and Dice-K), but Ichiro himself is still going strong.  He doesn’t walk very much, but then again he doesn’t need to.  He didn’t lead the league in OBP, but the players above him, and many below were all power hitters who did it by crushing the ball deep.  How does he do it?  Simple.  He’s the fastest player in the league.
When I filled out my AL Cy Young ballot, I was sure to put J.J. Putz in second place on the ballot.  At the time, I even said that I understood a first place vote for Putz.  (I let my Cleveland-centric tendencies get the better of me.)  What I didn’t understand was how Putz was left out of the Cy Young voting completely.  Completely.  Justin Verlander got a vote, but Putz didn’t.  Since he pitches the ninth, and pitches 81 games in Seattle plus another 20 or so in LA and Oakland, and a smattering in San Diego (MLB has the Padres and Mariners as regional arch-rival teams… someone at the scheduling office needs to look at a map of America), most of his work took place at around midnight or 1:00 am on the East Coast.  Thankfully, there’s no East Coast bias in the media.
I don’t want to belabor Putz, because he’s been spoken of at length.  George Sherrill hasn’t. (Well, OK, a shout out to Mariner Morsels, a collective blog dedicated to “freeing” Sherrill.)  Sherrill has quietly developed into one of the better lefty specialists in the league.  In his career, left-handed batters have a combined .167/.227/.291 line against him.  For what it’s worth, he isn’t awful against righties either (.261/.384/.352), although he has his flaws against them.   That spike in OBP is from walking far too many RH batters.  In general, he’s a flyball pitcher (GB/FB rate of .045), but he also plays half of his games in Safeco.  And he strikes out more than a batter an inning.  In 2007, he was pretty much a LOOGY and rarely got to pitch a full inning, but like the Mariner Morsels folks, I wonder why the Mariners aren’t opening him up a little more.  They could certainly do much worse.
What went wrong: At the bottom of the Mariners’ audit page at Baseball Prospectus, two names pop out.  Richie Sexson and Jose Lopez are an odd couple.  A closer look will show a few important differences.
Sexson actually dropped his K rate from 2006 to 2007, increased his BB rate, and his batted ball profile was pretty much unchanged (he hit a few fewer line drives, and instead beat them into the ground).  His BABIP was the culprit.  A gentleman who has normally put up a .280-.320 BABIP over a number of years suddenly saw it drop to .217.  In statistics, that’s called an outlier.  Sexson gets paid to hit 35 HR.  He also usually checks in with an equal number of doubles.  This year, he not only dropped to 21 HR, but he also only hit 21 two-baggers.  The other thing that changed was that he saw about a quarter of a pitch less (3.97 to 3.74) per plate appearance from 2006 to 2007.  Sexson needs to relax.  Assuming that there wasn’t a major injury that wasn’t made public, Sexson should revert to form.
Lopez is another story.  After impressing (and making the All-Star team) in 2006, his production dropped off the face of the earth.  Or more to the point, it dropped back to a level consistent with what it was in his first two (partial) seasons in the majors.  Again, here’s another case where his batted ball profile didn’t change, nor were his walk and strikeout rates markedly different.  Lopez’s BABIPs from 2004-2007: .251, .276, .312, .269.  2006 looks like the outlier.  Lopez, deep down in his soul, is probably a .250-.260 hitter right now.  He’s also 23.  Lopez only saw 3.4 pitches per at-bat (ah youth, always so eager), and could stand to walk a bit more.  The good news is that those skills come with age, and thankfully for him he’s on the right side of 30.  The bad news is that Lopez fooled the fans of Seattle (and the front office, who figured it was safe to trade Asdrubal Cabrera) in 2006.
Yeah, that about sums it up: Who woulda thunk that, in retrospect, Gil Meche would seem like a pretty good deal at $11M per year?
Felix Hernandez… need more be said?: Dave Cameron, over at U.S.S. Mariner (one of the best baseball sites on the web, and not just for Mariners fans — they do general baseball talk as well) had a very well-publicized blog post in which he questioned out loud why Felix Hernandez was throwing so many fastballs in the first inning.  The night of his next start, Hernandez mixed his pitches more.  And not to mistake it for coincidence, Hernandez himself basically came out and said that he got the idea to do so from U.S.S. Mariner.  Mike Hargrove (too soon?) once said that there are two things that every man thinks that he can do better than anyone else: cook a steak and manage a baseball team.  (He said that after the 1999 ALDS, his last act as manager of the Cleveland Indians.)  Then again, where is Mike Hargrove at this moment?  Probably cooking some steak.  Let this be a lesson to all the stat-o-phobes out there.  We’re really a harmless bunch.  In fact, we occasionally have some ideas that just might work.
While we’re on the topic, Felix Hernandez is that good.  Take a look at his pitch breakdown.  96-97 mph splitter?  Then, a nice 83 mph deuce with some bite?   A 60% ground ball rate?  Want to hear something scarier?  His BABIP over the last year was above the league average.  That means he’s gotten unlucky.  Hernandez has been with the big club since he was 19.  I suppose any pitcher is always a blown out elbow away from his career ending, but this guy is amazing.
Will the real Adrian Beltre please stand up?: I know, I know, he’s not even 30 yet.  Me either.  This is for all the folks who are hoping against hope that Adrian Beltre will re-capture the magic from 2004.  The year where he hit 48 HR.  The year where all that promise seemed to be fulfilled.  The year before the Mariners gave him that now-ridiculous-seeming contract.  Pop open Beltre’s year-by-year stats.  Now, take a black magic marker and cross out his 2004 season on your monitor, so that you can’t see it.  We’re going to pretend that 2004 never happened.  Do you notice a pattern in the stats that I’ve allowed you to see?  Other than his 2004 season, Beltre has been obscenely consistent.  He’s not a bad player at all.  But, he had the good sense to have an outlier year in his free agent year.  But, what the Mariners have gotten from Beltre over the past three years is what they can expect for the next two.  I know, I know, but you desperately want him to go back to being the 2004 version of himself.
Outlook: The Mariners have quite a few rather interesting-looking position players coming up through the minors, although most of their young pitching is in the majors already (King Felix, Feierabend, Morrow, O’Flaherty).  They have the advantage of being in the smallest division in baseball (4 teams).  But they also play in a division with the Rangers who have a lot of young talent too, the A’s who have the smartest GM ever (EVER!!!), and the Angels, who are starting to have that “We’re bent on world domination, just like the Yankees” feel to them.  The Mariners have been mentioned as being in the mix for Dontrelle Willis, which would be an interesting addition, but that’s not going to solve all their problems.  What the team probably needs to do is get out from under some of the silly contracts that they’ve given out.

2007 Sabermetric Year in Review: St. Louis Cardinals

The fifth team in our reverse-alphabetical-order spin through MLB came into 2007 as the defending World Series champions, yet didn’t make the playoffs.  They have arguably the game’s best hitter (a very loud argument with supporters of A-Rod, but a pretty good argument nonetheless), but didn’t break .500 this year.  Then again, in 2006, they barely broke .500 (83-78), and gave a beautiful demonstration as to why a short series is not an adequate sample to determine the better team (also known as “anything can happen in a short series”).
Record: 78-84, 3rd in the NL Central.  In September, when everyone seemed to be avoiding the top of the NL Central, the Cardinals showed that no one wanted it less than they did.
Pythagorean Projection (Patriot formula): 70.67 wins (725 runs scored, 829 runs allowed).  Read that one closely.  The Cards were more akin to a 70/71 win team this year by run distribution.
Team Statistical Pages:
Baseball Reference
Baseball Prospectus
MVN Blog:
That’s A Winner  (Is that a St. Louis thing?)
Other Cardinals Resources:
Latest News
Contract Status
Trade Rumors
Overview: I suppose after completely ruining the fairy-tale storyline that was the 2006 Detroit Tigers, the Cardinals’ uppance was due to come.  Then again, I suppose that the Cards returned most of the same cast of characters from 2006 (a year older) – although Chris Carpenter went down for the count after Opening Day – and their 2007 record was pretty much like their 2006 record, a difference of 5.5 games.  I have a dear friend who was born and raised in STL, and I asked her to sum up the season from the perspective of a Cardinal fan.  She stuck out her tongue and gave me a thumbs down.  When I asked her to elaborate, she just repeated the same thing.  Looking at the numbers, she was right.
What went right: Chris Duncan, at least against right-handed pitching.  Duncan’s OPS against righties was .944.  Against lefties, .632.  Other than David Eckstein, who may or may not be back in red next year, Duncan was the most effective  offensive weapon that the Cardinals had behind Albert Pujols.  This, by the way, was the closest thing to a “what went right” I could find for the Cardinals offense.
Adam Wainwright did a serviceable job as the staff “ace.”  When Wainwright was relieving in 2006, he had a higher K rate and a lower walk rate, and I’m guessing that has something to do with the fact that he didn’t have to pace himself when he was relieving.  And really that’s the only difference between him in 2006 and 2007.  He’s a good-but-not-great starting pitcher, or at least he was in 2007.  On the bright side, he is 25, which suggests there’s some room for growth.  And if he’s good-but-not-great now, maybe there’s room for very-good-but-not-outstanding.  For what it’s worth, Wainwright did hit .290 in 74 AB.  Not bad, and gave him more batting runs above average than Yadier Molina (expected) and um, Jim Edmonds.
What went wrong:  Let’s see.  When the entire season for the undisputed ace of your staff can be summed up in one box score, things apparently took a turn somewhere.  This was a year in St. Louis where Aaron Miles (2) pitched in more games than Chris Carpenter (1), Rick Ankiel (11) hit more home runs than Scott Rolen (8), and Adam Wainwright (.290/.323/.387) outhit Adam Kennedy (.219/.282/.290).
I don’t know that there was a bigger disappointment in baseball in 2007 than Scott Rolen (read: he was on my fantasy team).  Rolen hurt his shoulder in 2005, but in 2006, he put up a sporty .887 OPS, and many in Cardinal-land were surely happy with what looked like a return to health for Rolen.  Then, he hurt the shoulder again.  The resulting .265/.331/.398 line with 8 HR speaks for itself.  How did that shoulder hurt Rolen?  Teams figured out that they could throw fastballs past him.  Take a look at Rolen’s batting stats broken down by pitch-type.  He hit off-speed pitches at a .300+ clip.  He hit .200 on fastballs.  Wouldn’t you know it, 52% of the pitches that Rolen saw were fastballs, 6% more than the league average.  It’s clear that his legs are still in good shape.  He’s always been a good fielder at third base, and this year, RZR had him 2nd in the majors behind Pedro Feliz.  His eyes seemed to be OK too.  I don’t have the ability to calculate his batting eye stats yet (2007 Retrosheet event file isn’t available yet), but his strike percentage was 62% vs. 61% last year.  He’s not striking out any more often.  In fact, for the past three years, his K rate has been in the 13-14% range, where it had been 17-18% previously, but his walk rate is down as well. 
The real story is in Rolen’s batted ball profile.  His line drive rate didn’t really change from 2006 to 2007, but he did have a shift of 5% from flyballs to groundballs.  The telling stat though was that of the flyballs that Rolen did hit, only half as many of them (percentage-wise) left the yard.  He also showed a small uptick in his infield popup rate.  Rolen’s also seeing fewer pitches per plate appearance.  In 2006, he saw 3.91 pitches per PA.  In 2007, 3.74.  Rolen’s getting a little anxious and putting the ball in play quicker.  Seems like Rolen’s shoulder is affecting his brain as well.  Will he recover in 2008?  Rolen is one of those players for whom statistical projections for his 2008 are pretty useless.  You’d be better off having a copy of his medical records than his previous stats to get a real idea of what he’ll do.
Then there’s free agent disappointment Adam Kennedy.  True, he signed for a mere $10M over three years, but Cardinals’ brass surely thought that they were getting a player who would put up his career .275/.329/.390 type of line.  That’s not much of a line, but it’s a fair sight prettier than the .219/.282/.290 (ewwww) that he actually put up.  Kennedy ended up as the second least useful 2B in baseball (by VORP), behind(?) Josh Barfield.  Odd, because he actually struck out less this year than in 2006, but he also saw his line drive percentage drop by 10 percentage points, and they mostly became fly balls.  As such, his BABIP dropped to .239… ouch.
Yeah, that about sums it up: On September 7th, the Cardinals were a game out of first place (69-69), tied in the loss column with both the Cubs and Brewers (71-69 each).
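The "game out" figure is the usual games-behind arithmetic: half of the win gap plus the loss gap.  A quick sketch, with a function name invented for the illustration:

```python
def games_behind(team_wins, team_losses, leader_wins, leader_losses):
    """Standard games-behind: half the sum of the win gap and the loss gap."""
    return ((leader_wins - team_wins) + (team_losses - leader_losses)) / 2


# Cardinals at 69-69 versus the Cubs and Brewers at 71-69
print(games_behind(69, 69, 71, 69))

# The same arithmetic gives the gap between the 2007 club (78-84) and the 2006 club (83-78)
print(games_behind(78, 84, 83, 78))
```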
Is Rick Ankiel for real?: This is a really strange question to ask.  Ankiel has already been a super-phenom pitcher at the Major League  level, until he caught a case of the yips in the 2000 playoffs.  Since he was only 20 at the time (so was I!), he still had time to re-invent himself as an outfielder and to come to the Majors at 27.  So, is he a real Major League outfielder?  (If I were Bill Simmons, I probably would have found a way to relate this to the career path of Raven-Simone.)
Let’s see, he strikes out almost a quarter of the time, which would put him in the upper (lower?) echelon of hitters in baseball.  His HR/FB ratio was an astounding 20.0%.  I don’t have anything to compare it to as to whether he did that in the minors as well, but that would put him in Lance Berkman/Matt Holliday country, which is in the top 10 in that stat as well.  Let’s for a moment assume that those numbers really reflect his true ability and weren’t the product of a little luck or the fact that many of the pitchers he faced hadn’t seen him before.
Power hitters tend to strike out a lot (and Ankiel is no exception), which is annoying, but comes with the territory.  But there are a couple of numbers which worry me.  Ankiel only hit line drives 14.9% of the time, which puts him in the lowest regions of the league, and he only saw 3.43 pitches per PA.  Ankiel, at least in what he did during his brief stay in MLB last year, seems like he has one skill: raw power.  Sure, there are plenty of guys in baseball who fit that mold, and Ankiel seems to be one of them. 
Whither Pujols?: Let’s, for a moment, leave aside that Albert Pujols had another brilliant defensive season and is the best fielding first baseman in the league by five lengths.  What the heck happened to him this year offensively?  After all, he didn’t put up his usual “Top 3 in OPS in the league.”  I mean, he was eighth!  And he only hit 32 HR!  Cardinals fans: repeat after me.  There is nothing wrong with Albert Pujols.  He experienced a small drop in the percentage of flyballs that left the yard and hit a few more ground balls, but those tended to go for singles and doubles anyway.  I live on the North Side of Chicago, and those of you in St. Louis know that means that I can’t legally say anything nice about a member of the Cardinals.  But, with that said, Albert Pujols is an amazing hitter, and I’d say the best player in the league.  The stat that I think is most revealing about Pujols is that the guy walked 99 times this past year and struck out 58.  That’s downright DiMaggioian.  So, you have in Pujols the league’s best defender at his position and one of the top 10 offensive talents in the league, and you’re only paying him $15M per.  The sky is not falling.
That gritty, plucky little player, David Eckstein: So it looks like David Eckstein wants a Julio Lugo contract.  Something like 4 years and $36 million… and hopefully not doing what Lugo did this year.  Eckstein was the fifth worst fielding shortstop in baseball, according to Dewan’s plus/minus and to RZR.  But… he does consistently post around his .286/.351/.362 career average, which is nice for a shortstop (or possibly a converted second baseman).  The Cardinals do have 25-year-old Brendan Ryan ready to go, who hit a combined .278/.355/.364 line (compare to Eckstein) between AAA and 200 PA with the Cardinals.  Ryan hasn’t done it at the Major League level yet, at least over a full season.  Right now, he seems like the better option in that he is cheaper and at least indicators point to him being a David Eckstein wannabe (with a little more speed).  I understand that Cardinals fans have an emotional attachment to Eckstein and that he’s a proven commodity.  How much for security and sentimentality?  $9 million a year seems a steep price.
Outlook: Ah, the most dreaded word in baseball.  “If.”  If Rolen is healthy.  If Carpenter makes a miracle comeback.  If…  The Cardinals had a good run in the first part of this decade, but the puzzle is starting to fall apart and it’s going to take a little time to put things back together.  The Cardinals missed out on hiring Chris Antonetti (assistant GM in Cleveland) as their GM, reportedly because he would have had to share too much power with Tony LaRussa.  So, it looks like it’s Tony LaRussa’s team, as it has been for the last decade or so.

So how many HR did Bonds* really hit?

Here we go again…
Thursday afternoon, Barry Bonds* was indicted in federal court for perjury, specifically that he perjured himself (legalese for “told a lie”) when he said that he had never taken steroids.  So, that means that someone in the federal government thinks that Barry Bonds* took steroids.  I suppose Barry* is entitled to his day in court, but I believe the old saying “Where there’s smoke, there’s fire” applies here.
Bonds* finished the 2007 season with 762 career HR, 7 more than Hank Aaron.  But even still, he’s a 40-something-year-old player with bad knees, and even without this particular nastiness hanging over his head, he was pretty much aiming for a DH role in the American League.  I’m assuming that Barry* will be tied up with this matter through the off-season and into the spring.  So, even if he were just another outfielder, he wouldn’t make sense as a signing in the off-season.  If he’s found guilty, it sounds like he’ll get a year or two in prison.  By that point, he might not even be in baseball shape, and even if he wanted to continue, I don’t know that MLB would want him back.  Barry Bonds* may have retired today.
Would you like me to tell you that Bonds* isn’t really the home run champion of all time?  Would you like me to do some clever math and de-throne him?  Would you like me to write 1000 words, reference statistical procedures with which you aren’t familiar, and somehow shave 8 HR off Bonds* career home run totals?  Want me to restore Hank Aaron to his rightful place in the sun?  I can’t do it.  Barry Bonds* hit 762 HR* in his career, which is more than any other Major League player has ever hit.  The numbers don’t lie.
And I have to say that I’m observing a delicious little irony in the reaction to the Bonds* indictment, as everyone watches in shock as a man who is greater than Babe Ruth is felled.  Sabermetricians are often derided as not “getting it” because our numbers can’t describe the full impact of a player.  After all, a home run isn’t just a home run.  It’s a momentum shifter.  It’s the mark of a team leader, a man of virtue, a “great” player.  There’s something special about a home run, and to reduce it to a cold, calculated four-base hit doesn’t do it justice, right?  There’s one little problem.  It turns out that the man who has performed this marvelous feat of virtue the most often may actually have cheated and lied to get there.  Suddenly, the home run doesn’t seem so virtuous.
If you have a little knot in your stomach trying to reconcile the fact that Barry Bonds* is both the home run king and a possible cheater (like I do), may I recommend looking at it like a Sabermetrician.  Maybe a home run isn’t a mark of virtue and home run hitters don’t belong in our cultural pantheon.  Maybe a home run is just a home run, something that surely helps teams win games and makes you the fan feel better, but not something that describes anything about the character of the man hitting it.  Bonds* contributed a lot to the teams on which he played, but baseball is just a game, not a magical fairy land, and Barry Bonds* is not a hero.
Don’t come crying to your friendly local Sabermetrician to make you feel better about how much value you’ve mistakenly placed in… well… a number, even if it’s the number 762.

525,600 minutes: How do you measure a player in a year?

What does a year really tell you about a player?  Seriously.  If I gave you the seasonal stats for any player last year (or the year before), how much could you really tell me about him?  If I told you he hit .300 last year, are you confident that deep down, he’s really a .300 hitter?  How do you measure a year in the life?
Like a lot of things that happen out here in the Sabersphere, I take my inspiration for this (series of?) article(s?) from a conversation that went on at the Inside the Book blog.  A few folks were discussing an article that I wrote here at StatSpeak on productive outs and as these things are wont to do, the conversation wandered.  Inside the Book co-author MGL asked me a fair question: when I talked about productive outs, what sample size I was dealing with.  Not so much how many player-years were in my data set, but for each of those player years, how many PA’s did each player have.  It’s a much more important question than you might think.
If you’ve been reading my work for a while, you know that I often say things like, “minimum of 100 PA.”  (I’m hardly the only one to do this, by the way.)  Why did I make sure that the batter had 100 PA?  Well, first off, let’s say that I’m interested in rating batters by how often they strike out.  And I happen to come across a player who got five at-bats in a season and never ever struck out.  I hereby crown him the king of all contact hitters!  He will never ever ever strike out ever.  Right?  Of course not.  5 PA isn’t a big enough sample size to measure anything.  But what is?  When I say minimum 100 PA, I must admit I’m usually using a very unscientific “yeah, that sounds about right” criterion for picking the number.  What if 100 PA isn’t a big enough sample for what I’m trying to measure either?  I’m a scientist by training (my cancer biologist wife laughs at me when I say that), and I should be a little more… scientific.
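To put a rough number on that intuition, here is a minimal Python sketch.  It assumes a hypothetical hitter whose true strikeout rate is 20% of PA (a figure picked purely for illustration, not taken from any data set) and treats every PA as an independent trial, which is a simplification.  The function name is made up for this example.

```python
def prob_no_strikeouts(true_k_rate: float, plate_appearances: int) -> float:
    """Chance that a hitter with the given true strikeout rate goes an
    entire sample without striking out, assuming independent PA."""
    return (1.0 - true_k_rate) ** plate_appearances


# Assumed true strikeout rate: 20% per PA (illustrative only).
ASSUMED_K_RATE = 0.20

for pa in (5, 100):
    chance = prob_no_strikeouts(ASSUMED_K_RATE, pa)
    print(f"{pa} PA: chance of zero strikeouts = {chance:.6f}")
```

Under those assumptions, a 5 PA sample comes up strikeout-free about one time in three, while a 100 PA sample essentially never does.  That only shows that 5 is far too few; it doesn’t settle whether 100 is enough.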
(Major and extensive numerical nerdiness alert.  As if the reference to Rent wasn’t nerdy enough.  This is a really long methodological article for the hardcore researchers out there.  If you’re here for witty banter about statistical matters in baseball, may I suggest you pick another article.)

Paradise by the Dashboard Light

I have to start off this post by saying that I can’t take any credit at all for what’s to follow.  But it wins the award for the most creative use of baseball research data for 2007.  On the Retrosheet distribution list, for those of us who spend way too much time on Retrosheet, an e-mail came through from Ted Turocy concerning the Meatloaf song “Paradise by the Dashboard Light.”  If you’ve been to a wedding reception, you’ve heard it.  Loud, off-key, and very very drunk, but you’ve heard it.  They played it at mine, anyway.  (Our first dance was actually another Meatloaf song from the same album.)
The song is about (how to keep this PG?) an amorous encounter that Meatloaf (himself a very good professional softball player!) and a female companion are having, apparently in a car.  Midway through the song, there’s a cut to a clip of the late Phil Rizzuto announcing a fictional baseball game in which a player (who is never named) hits a double to center (depending how you interpret the call, it could be a single and an error, as the ball is bobbled in the outfield).  He then steals third on the first pitch of the next at-bat (so he’s advanced from “first base” to “second base” to third… I think we have a metaphor for something…) and then the batter lays down a squeeze bunt.  In the song, we only hear that it will be a close play at the plate, and never find out what happens to the poor “runner.”  (This is the part where all the bridesmaids start yelling “Stop right there!”)
Ted wanted to find out if the sequence of events described in the song (double or single and an error, steal of third, squeeze bunt) had ever actually occurred in a real game.  In the beginning of the interlude, Rizzuto says that there’s no score, with two outs in the bottom of the ninth.  No manager, other than Ozzie Guillen, would attempt a squeeze with two outs, so we can’t get quite an exact match.  (When this was pointed out to Meatloaf, composer Jim Steinman, and producer Todd Rundgren, they didn’t care; Guillen hadn’t yet made his Major League debut when the album was recorded in 1977.)  Ted did what any baseball-obsessed researcher with an odd question and a little free time would do: he looked it up.  He sent the results out to those of us on the Retrosheet list.  Everything that follows represents his hard work, not mine.  I e-mailed him and specifically asked him if I might post this and he was kind enough to grant me permission.
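For the curious, a lookup like this could be sketched roughly as follows.  This is not Ted’s actual code or method, and it doesn’t read real Retrosheet event files.  It assumes each half-inning has already been boiled down to an ordered list of made-up event labels (DOUBLE, STEAL_THIRD, SQUEEZE_BUNT and so on are hypothetical names), and it ignores which runner is involved, which a real search would have to track.

```python
from typing import List, Sequence

# Hypothetical, simplified event labels; real play-by-play notation is richer.
SONG_PATTERN = ("DOUBLE", "STEAL_THIRD", "SQUEEZE_BUNT")


def find_sequences(events: Sequence[str],
                   pattern: Sequence[str] = SONG_PATTERN) -> List[int]:
    """Return every index at which `pattern` appears as a run of
    consecutive events within a single half-inning."""
    width = len(pattern)
    target = tuple(pattern)
    return [
        start
        for start in range(len(events) - width + 1)
        if tuple(events[start:start + width]) == target
    ]


# Made-up half-inning for illustration only.
half_inning = ["GROUNDOUT", "DOUBLE", "STEAL_THIRD", "SQUEEZE_BUNT", "FLYOUT"]
print(find_sequences(half_inning))
```

Matching on consecutive events keeps the sketch short, but it would miss a case where anything else (a pickoff throw scored as an event, say) lands between the steal and the bunt.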
Turns out that the double, then steal of third, then squeeze attempt with the throw coming home (which would be scored a fielder’s choice no matter the outcome) has happened thrice in the games which Retrosheet has available.

  • In a 1977 game between the Mets and Expos, with one out in the bottom of the fourth, Bud Harrelson doubled to left, stole third, and scored on a squeeze bunt laid down by pitcher Jerry Koosman.
  • In 1995, during a Twins-Mariners game, Rich Amaral actually won the game when he doubled to left, stole third, and scored on a squeeze bunt by Chad Kreuter.
  • Finally, in the fifth inning of a 2006 game between the Padres and Giants, the Giants’ Randy Winn hit an RBI double to right, stole third on the second pitch of the next at-bat, and scored on a squeeze bunt by Omar Vizquel.

None of the doubles were to center, and none of the steals of third happened on the next pitch after the double.  (Winn stole his base on the second pitch of the next at-bat.)
However, in 1988, with the Red Sox playing the Rangers, Oddibe “Young Again” McDowell came up in the seventh inning and hit an RBI single to center which was bobbled by Red Sox center fielder Ellis Burks, allowing Curt Wilkerson to score and McDowell to go to second.  McDowell stole third, and the next hitter (Scott Fletcher) dropped a squeeze bunt, although the throw didn’t come home.  It went to first.  Fletcher was out and McDowell scored.  So, that doesn’t match up.
Sadly, it looks like there’s no perfect match.  However, Ted did end his e-mail with a rather cryptic statement.  “Finally, there are no sequences fitting this where the runner is out trying to score on the bunt.”  Not exactly sure what he meant by that…