Taking a nap on the bases, the Scott Podsednik story

43, 70, 59, 40.  From 2003-2006, those were Scott Podsednik’s seasonal stolen base totals.  In 2004, he led the National League in SB, and he finished second twice and fifth once.  Podsednik must be a good Ukrainian and a good baserunner, you think to yourself.  You’re half right.
Last year, Scotty Pods led the league in another baserunning category that no one ever talks about.  Podsednik made the most baserunning blunders of 2006.  Podsednik managed to get picked off an astonishing 12 times, 7 times on a simple pickoff, and 5 times in which the pitcher moved to first while Podsednik was on his way to second.
A few definitions: There are four major baserunning blunders that I am counting here.  The first is being picked off a base by the pitcher (or in some cases, the catcher).  This one’s fairly easy to calculate.  Retrosheet’s event files contain fields for pickoffs indicating whether or not a pickoff was recorded on the play.  In general, the pickoff is a sign that the runner “fell asleep” on the basepaths, in this case not “reading” that the pitcher was throwing to first and/or leaning too far away from the bag.
There’s also another type of pickoff, usually scored as a “pickoff/caught stealing” (POCS).  Here the pitcher throws to first, and the runner is either already in flight to second or decides that he’s a dead duck anyway and might as well go out in a blaze of glory, and so heads to second.  I’ve counted the two types separately, since in this case the runner is more likely to have been running but guessed wrong on whether the pitcher was throwing home or to first.  The distinction is slight, but I choose to maintain it.  I believe that a straight pickoff represents a runner taking a nap, while a POCS represents a runner being too aggressive.
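For anyone who wants to count these at home, here is how the two kinds look in Retrosheet's raw play strings.  A straight pickoff is written like `PO1(13)` and a pickoff/caught stealing like `POCS2(136)`.  The sketch below is a hypothetical helper, not the code behind the counts in this post, and it sorts one play string into the right bin.  It assumes you already have the event text from the end of a `play` record.  It also treats an `E` in the fielding credits as an error that let the runner survive.

```python
import re
from typing import Optional

# Retrosheet pickoff notation inside the basic play (text before the first "."):
#   PO1(13)     runner picked off first, pitcher to first baseman
#   POCS2(136)  pickoff/caught stealing: picked off first, out at second
PICKOFF = re.compile(r"^(POCS|PO)([123H])\(([^)]*)\)")


def classify_pickoff(play: str) -> Optional[str]:
    """Return 'pickoff', 'pocs', or None for one Retrosheet play string."""
    basic = play.split(".", 1)[0]          # drop the runner-advance section
    basic = basic.split("/", 1)[0]         # drop modifiers such as "/DP"
    for part in basic.split("+"):          # e.g. "K+PO1(23)": strikeout plus pickoff
        m = PICKOFF.match(part)
        if m is None:
            continue
        kind, _base, fielders = m.groups()
        if "E" in fielders:                # an error let the runner survive: no out
            return None
        return "pocs" if kind == "POCS" else "pickoff"
    return None
```

The `POCS` alternative is listed first in the pattern so that a POCS is never mistaken for a plain `PO`.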
Another couple of baserunning blunders to consider: One is a runner who is doubled off a base on a fly ball.  In this case, I looked for runners who were doubled off only on fly balls to the outfield.  A runner might be doubled off on a screaming liner right at the second baseman and have no chance at all to get back, so out of fairness, I eliminated fly balls caught by infielders.  But there’s little excuse for being doubled off on an outfield fly.  The runner misjudged it, plain and simple.
One more blunder that’s a little more exotic: being thrown out over-running a base.  Consider this curious line from a Retrosheet event file: D9.BX2(94).  That translates into “double to right field, batter out at second base, with the out going from the right fielder to the second baseman.”  How could a batter be both safe and out at the same base on the same play?  Say he reached second safely (so the double goes in the books) but took too big a turn, and the right fielder threw in behind him and the runner became dead meat.  Then you have a baserunning blunder and an out.  I coded for a few different circumstances in which a runner could over-run a base and counted them up.
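The over-running check can be sketched the same way.  In raw Retrosheet form, the hit sits before the period and the runner advances come after it, separated by semicolons.  The rule below is only one plausible version of "over-running", and it is an assumption of this sketch: the batter is credited with a hit, then put out at the very base the hit gave him.  Being thrown out at the next base (stretching a single into a double) does not count.  An `E` inside the parentheses means an error negated the out.

```python
import re

# A single, double, or triple: "S7", "D9/L", "T8", or a ground-rule double "DGR".
# The lookahead keeps "SB2" (stolen base) and "DI" (defensive indifference) out.
HIT = re.compile(r"^([SDT])(?:\d+|GR)?(?=/|$)")
# Batter put out while advancing, e.g. "BX2(94)": out at second, 9 to 4.
BATTER_OUT = re.compile(r"^BX([123H])\(([^)]*)\)")

CREDITED = {"S": 1, "D": 2, "T": 3}        # base the hit gives the batter
BASE = {"1": 1, "2": 2, "3": 3, "H": 4}


def batter_overran(play: str) -> bool:
    """True if the batter got a hit, then was put out at the base the hit gave him."""
    basic, _, advances = play.partition(".")
    hit = HIT.match(basic)
    if hit is None:
        return False
    for adv in advances.split(";"):
        m = BATTER_OUT.match(adv)
        if m is None:
            continue
        if "E" in m.group(2):              # an error negated the out
            return False
        return BASE[m.group(1)] == CREDITED[hit.group(1)]
    return False
```

For example, `batter_overran("D9.BX2(94)")` is true.  `batter_overran("S9.BX2(94)")`, a single followed by an out trying for second, is not.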
Your top twelve baserunning blunder-ers of 2006? 
1) Pods 12 blunders (40 SB in 2006)
2T) Ryan Freel 10 (37)
2T) Willy Taveras 10 (29)
4T) Brian Roberts 8 (36)
4T) Jose Reyes 8 (64)
4T) Alfonso Soriano 8 (41)
7T) Jose Bautista 7 (erm… 5)
7T) Jamey Carroll 7 (10, with 12 CS!)
9T) Dave Roberts 6 (49)
9T) Ichiro Suzuki 6 (45)
9T) Ryan Zimmerman 6 (11)
9T) Juan Pierre 6 (58)
But wait, before we pile on Pods, notice something about that list.  Most of the guys on the list are the ones who are running a lot anyway.  They steal a lot of bases, and so they are likely to be the ones who draw a lot of pickoff throws, and most of those “blunders” are pickoffs.  Base-stealers might be considered high-rolling gamblers in a way.  Certainly, their stolen base totals are gaudy, but they also pay a price for their aggressiveness on the basepaths, both visible (caught stealing) and invisible (pickoffs).
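One quick way to see the point is to divide each player's blunders by his stolen bases, using only the numbers in the list above.  Steals are a crude stand-in for opportunities (times reaching first would be better), so treat this as a back-of-the-envelope sketch.

```python
# (player, blunders, stolen bases) from the 2006 top-twelve list
LEADERS = [
    ("Scott Podsednik", 12, 40),
    ("Ryan Freel", 10, 37),
    ("Willy Taveras", 10, 29),
    ("Brian Roberts", 8, 36),
    ("Jose Reyes", 8, 64),
    ("Alfonso Soriano", 8, 41),
    ("Jose Bautista", 7, 5),
    ("Jamey Carroll", 7, 10),
    ("Dave Roberts", 6, 49),
    ("Ichiro Suzuki", 6, 45),
    ("Ryan Zimmerman", 6, 11),
    ("Juan Pierre", 6, 58),
]


def blunders_per_steal(rows):
    """Players sorted by blunders per stolen base, worst first."""
    rates = [(name, blunders / steals) for name, blunders, steals in rows]
    return sorted(rates, key=lambda r: r[1], reverse=True)


for name, rate in blunders_per_steal(LEADERS):
    print(f"{name:16s} {rate:.2f}")
```

Sorted this way, the three guys who barely run (Bautista, Carroll, and Zimmerman) come out on top, and Podsednik drops to fifth.  The volume base-stealers (Ichiro, Reyes, Dave Roberts, and Pierre) fill out the bottom four.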
In other findings, there were only 14 instances of a runner over-running a base (by 14 different gentlemen) and 49 of runners being doubled off, with only Travis Hafner, Javy Lopez, and Jacque Jones being repeat offenders.  Milton Bradley managed to pull off something of a unique quadrifecta in 2006: picked off once, POCS once, over-running a base once, and doubled off once.
One name conspicuously missing from the list of those who made a baserunning blunder last year?  The supposed king of baserunning (and otherwise) cluelessness, Manny Ramirez.

A moment of silence please

It wasn’t an unusual game in any way that night in Oakland.  It was a Thursday night game on the West Coast and the Mariners were in town to play the A’s, with the A’s hoping to move into a tie for first place with the Texas Rangers.  The Mariners, however, were only 2.5 games out, and on this night, they jumped on A’s starter Ron Darling for six runs in the second inning to open a 6-0 lead.  With Randy Johnson on the mound, Seattle didn’t need much more.  Early in the morning hours of August 12, 1994 (at least on the East Coast), Randy Johnson finished off a trademark 1-run, 4-hit, 15-strikeout performance, and threw the last pitch of this night of baseball past Ernie Young.
It was a strike.
Thirteen years ago today, what should have been a bustling Friday of Major League Baseball was reduced to an eerie silence.  And every night for 200+ days, I played Willie, Mickey, and the Duke on my cassette player (it was 1994…).  It was all my fourteen-year-old mind could think of to fill the void left by baseball.  After all, for the first time in about 35 years, the Cleveland Indians were actually good, and at that point were leading the chase for the (newly created) Wild Card.  They say that you never realize what you’ve got until it’s gone, and that summer, I learned that there really is nothing like baseball on a summer night.
There was no World Series that year, and for a little while in 1995, it looked like baseball would be played as something like a real-life fantasy camp.  But what would have happened in 1994 had all that unpleasantness not transpired?  What if baseball had continued?  Allow me to perhaps waste inordinate amounts of your time answering that question.  In 1994, the Diamond Mind Baseball simulator was used to play out the rest of the 1994 season, and the results were reported to the world.  It was all that we had.
Back in 1994, there was something else that was fairly new to the American consciousness called “The Internet”.  This was a time when people asked whether or not you had an e-mail address rather than simply asking what it was.  Strangely enough, back in 1994, people were getting most of their information from places like newspapers and radio.  “The Internet” wasn’t really so much an information superhighway yet, but more of a jaunt through a suburban subdivision on a Thursday afternoon.  But, thanks to the magic of the Internet Archive, there is a (partial, I believe, as only some of the links are working for me) archive of those box scores from one site, the now-defunct Nando Times.  I won’t spoil the ending for you, but here’s the main page.  (Warning: This is an internet quicksand trap for baseball nerds.)
I write this as a lesson to all the stat geeks out there (like me!) and all baseball fans, for that matter.  Today, take a moment of silence to remember something.  Despite all the steroid allegations, the fact that the (string of expletives deleted) Yankees are somehow back in the playoff race (this seems to happen every year…), and people’s strange belief in the power of clutch hitting, life would be really boring if all that we had was cold computer processing instead of real live baseball.  In 1994, all that we had was a computer simulation and Terry Cashman.  If you’re old enough to remember that time, don’t ever forget it.
It’s why every year around August 12th, my heart sinks a little.  I work as a Sabermetrician because I love the game of baseball, and 13 years ago today, they took the game away from me.

Do you have any idea how fast you were going?

Seems that at StatSpeak, we’re all about speed.  I teased this in my last post on speed, but I recently wrote an article for SABR’s Statistical Analysis Committee newsletter By The Numbers, in which I develop a whole new system for speed scores based on some advanced statistical methodology.  But after all that work, it turns out that Bill James’s speed scores are pretty much all we need.
You can read the article here (under my ever-so-secret real name, starting on page 8).  Fair warning, that’s a PDF file.
For those interested in speed scores for the whole league last year under my model, I’ve posted that as a Google Doc here.  Players are listed by their Retrosheet ID and, at least for now, are sorted from fastest to slowest.  There are a lot of missing values on the chart, owing to the way in which I calculated things.  (I used a log-odds method for all the probabilities.  If someone was perfect (100%), the log odds were undefined, because the odds were 1/0.  If someone got caught every time, the odds were zero, and zero has no natural log.  I tried to correct for that with some other trickery, but some guys just don’t have speed scores.)  Also, only players with 100+ PA last year are included.
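Here is the arithmetic behind those missing values, as a sketch.  The odds of success are successes divided by failures, p / (1 - p), and the log-odds is the natural log of that.  The `empirical_log_odds` function shows one common workaround, which adds 0.5 to each count.  That is a standard textbook fix and an assumption here, not necessarily the trickery used for the posted speed scores.

```python
import math
from typing import Optional


def log_odds(successes: int, attempts: int) -> Optional[float]:
    """Natural log of the odds p / (1 - p); None when p is exactly 0 or 1."""
    failures = attempts - successes
    if successes == 0 or failures == 0:
        return None        # odds are 0 (no log) or x/0 (undefined)
    return math.log(successes / failures)


def empirical_log_odds(successes: int, attempts: int) -> float:
    """Common workaround (assumed, not the post's method): add 0.5 to both counts."""
    return math.log((successes + 0.5) / (attempts - successes + 0.5))
```

A runner who went 10-for-10 gets `None` from `log_odds`.  `empirical_log_odds(10, 10)` gives log(10.5 / 0.5) = log(21), which is large but finite.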

Thankfully, it's over: Barry Bonds hits #756

Barry Bonds just hit his 756th career home run.  After all of baseball agonized over the inevitability of this moment for the last 4 months, it’s finally happened.  As a baseball fan, I have an uneasy feeling about that fact, and it’s not the same uneasy feeling I got right before my wedding.  It’s more the uneasy feeling I get before cleaning out my office.  But, now it’s all over, and that particular storyline doesn’t have to hang over baseball for the rest of the season. 
I’m conflicted though.  In a strange way, as a Sabermetrician, I owe Barry Bonds a huge debt of gratitude.  Seriously.  I would vote Barry into the initial class of the Sabermetric Hall of Fame with Bill James, Michael Lewis, and the Mills Brothers.  Consider this: even before the inevitable happened, Sabermetrically inclined folks have taken it upon themselves to show in some way that Bonds is not really the greatest HR hitter of all time.  Putting aside the questions on Vitamin S, we know that Bonds has benefitted from rampant expansion, smaller stadia, a lower pitching mound, better nutrition and medical care, and depending on whom you ask, a juiced ball.  Sabermetricians are able to point out that in today’s game, HR are easier to come by.  (Consider, 50 HR used to be something that happened once in a decade or so.  Since 1997, someone’s hit at least 47 every year!)  We’re able to talk about why that is and why statistics need to be adjusted for the context in which they were produced.
And for once, people will be listening, although for all the wrong reasons.  Sabermetricians are often placed in the role of telling people what they don’t want to hear about baseball.  Using the closer exclusively in the ninth inning really is a waste of a good pitcher.  Clutch hitting ability doesn’t really exist.  The sac bunt really isn’t that great an idea.  Finally, after all that, we get to say something that people want to hear, that 756 home runs might not qualify Barry as the best of all time when you consider the context.  And for a little while, people will (hopefully) be interested in things like park and era effects and standard deviations and z-scores.  And maybe some of them will take the time to understand a few of the rest of the things that Sabermetricians have been trying to tell them about the grand ole game for the past few years.  But it all starts off because people don’t like Barry Bonds as a person and want to hear evidence that he’s really not that good.  As a scientist, I try to work the other way around (evidence, then conclusion).  It’s a rather backward entry into the public consciousness, but surely, we will be thrown into the public consciousness thanks to one Barry Lamar Bonds.
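For anyone meeting z-scores for the first time, the idea fits in a few lines.  You express a season as the number of standard deviations it sits above its league's average, which lets you compare seasons from different eras.  The league numbers below are made up purely for illustration.

```python
from statistics import mean, pstdev


def z_score(value: float, league: list) -> float:
    """How many (population) standard deviations `value` sits above the league mean."""
    return (value - mean(league)) / pstdev(league)


# Hypothetical HR totals for the regulars in two leagues (illustration only).
low_hr_league = [10, 15, 20, 25, 30]
high_hr_league = [20, 25, 30, 35, 40]

print(round(z_score(45, low_hr_league), 2))   # 3.54
print(round(z_score(45, high_hr_league), 2))  # 2.12
```

The same 45 homers sit about 3.5 standard deviations above the mean in the low-homer league, but only about 2.1 in the high-homer league.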
I hadn’t really thought of this until I was talking to Bob Ngo, a doctoral student in sociology who is conducting his dissertation research on Sabermetricians (we’re apparently a sub-culture now!).  He asked me what stake I believed Sabermetricians had in the steroid controversy.  Oddly enough, the steroid controversy may have a stake in Sabermetrics as a field.  Funny how these things work.

Another post on the impact of speed

The impact of speed, as measured by taking the extra base or stealing bases, is not huge in baseball, but in some cases speed matters quite a lot when you consider how it adds to a hitter’s basic stats. Great speed can add 30 hits per year in some cases. This isn’t additional value, though: a hitter who bats .300 by legging out singles isn’t any more valuable than a slow hitter who hits line drives and puts up the same stats. In fact, he’s slightly less valuable, as infield hits don’t have the same advancement value.
I thought it might be fun to try and measure how much speed can add, so I consulted an old friend, my Retrosheet database. I looked at all groundballs that never left the infield, that is, those where the “fielded by” field shows an infielder or the pitcher. Groundballs that make it to the outfield are not included, as they are hits for everybody, except for the occasional 9-3 groundout. Then, I looked at how often a groundball hit to any given infielder winds up as a hit, and compared each batter to the league average. For this exercise I counted reached on error the same as a hit, and used separate calculations for righty and lefty batters, which I consider a must.
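The comparison can be sketched in a few lines.  Each ground ball is bucketed by who fielded it and which side the batter hit from.  The league rate for that bucket is the expected chance of reaching, and a batter's hits added is his actual times reached minus the sum of those expectations.  The `Grounder` record and its fields are hypothetical.  The input is assumed to be already filtered to balls first fielded by the pitcher or an infielder, with reached-on-error counted as reaching.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Grounder:
    """One ground ball first fielded by the pitcher or an infielder (hypothetical record)."""
    batter: str
    bats: str        # "L" or "R"
    fielder: int     # position number: 1 = pitcher, 3 = first base, ..., 6 = shortstop
    reached: bool    # hit or reached on error


def hits_added(grounders):
    """Times reached minus league-expected times reached, per batter."""
    # League rate for each (fielder, batter hand) bucket.
    bucket = defaultdict(lambda: [0, 0])       # -> [times reached, ground balls]
    for g in grounders:
        b = bucket[(g.fielder, g.bats)]
        b[0] += int(g.reached)
        b[1] += 1
    added = defaultdict(float)
    for g in grounders:
        reached, balls = bucket[(g.fielder, g.bats)]
        added[g.batter] += int(g.reached) - reached / balls
    return dict(added)
```

Dividing each batter's total by his number of infield grounders gives a per-ground-ball rate, which is the kind of figure the rankings in this post are based on.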
Over the last 4 years (2003-2006), Ichiro has reached on groundballs 84 more times than an average batter. Without the speed, he would be a .280-.290 hitter in 3 of the 4 years, with a .325 average in his big 2004 year. Slow-chiro’s overall offense would be slightly below average, unless he could really change his game and become the power hitter Seattle batting practice watchers think he could be.
Bengie Molina is a pretty decent offensive player, as long as you add the qualifier “for a catcher”. He’s not a patient hitter, but he rarely strikes out and is pretty good at hitting the ball squarely. He’s also perhaps the slowest player in the game. His lack of speed costs Bengie about 7-9 hits per year. Give him average speed, and Speedy Molina hits .298, .300, .317, and .305 in the four seasons from 2003 to 2006. His OPS would be above average every year, including .824 and .826 in the two most recent seasons.
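Turning "hits added" back into a batting line is simple arithmetic.  The sketch below rests on two assumptions.  First, every hit added or removed is an infield single worth one total base.  Second, at-bats don't change, because a groundout simply becomes a hit or vice versa.  The stat line in the example is hypothetical.

```python
def adjusted_line(ab, h, bb, hbp, sf, tb, extra_hits):
    """AVG, OBP, SLG after `extra_hits` outs become infield singles.

    Assumes each added (or, if negative, removed) hit is a single worth one
    total base and that at-bats are unchanged.
    """
    hits = h + extra_hits
    total_bases = tb + extra_hits
    avg = hits / ab
    obp = (hits + bb + hbp) / (ab + bb + hbp + sf)
    slg = total_bases / ab
    return avg, obp, slg


# Hypothetical season: 500 AB, 140 H, 20 BB, 2 HBP, 5 SF, 200 TB (.280/.307/.400)
avg, obp, slg = adjusted_line(500, 140, 20, 2, 5, 200, extra_hits=8)
print(f"{avg:.3f}/{obp:.3f}/{slg:.3f}")   # 0.296/0.323/0.416
```

With this made-up line, eight extra infield singles move a .280 hitter to .296.  Passing a negative `extra_hits` runs the same calculation in reverse, which is the Slow-chiro case.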
I’m pleased with how this stat ranks the fastest and slowest players. Ranking by hits added per infield groundball, the top 10 are Willy Taveras, Alex Sanchez, Corey Patterson, Ichiro, Joey Gathright, Dave Roberts, Rocco Baldelli, Adam Everett, Kenny Lofton, and Carl Crawford. In the next 10 are Luis Castillo, Juan Pierre, Ryan Freel, Chone Figgins, and Jose Reyes.
At the bottom of the list are Jason Phillips (slower than Bengie!), Bengie, Toby Hall, Damian Miller, John Olerud, Carlos Delgado, Adam Dunn, Mike Piazza, Jeff Francoeur, and Andruw Jones.
In other words, several catchers, some other well known slow players, and two Braves outfielders who are not supposed to be this slow (and certainly don’t field like it). Maybe Braves fans can answer: Does Jones fail to run hard to first most of the time? Is Frenchy picking up his bad habits?
If I ever need to run speed scores with the old Bill James formula, I think this stat is worth working in. It can replace the range factor part, since hardly anybody these days thinks range factor does anything more than tell you how many balls are hit to you.

Is speed really that important?

90 feet.  Go.
Baseball is a game of hitting and running (and pitching, but that’s something else).  In order to score, a player must actually run the 360 feet around the basepaths (in 90-foot increments) to end up back at home plate, and the faster he is, the more easily he will accomplish this.  Right?  Speed is one of those things in baseball that everyone seems to think is important, but they’re not entirely sure why that is.  After all, it’s not entirely raw speed that influences whether or not someone is good on the bases.  For example, in stolen bases, some amount of ability in reading the pitcher must be involved, and for what it’s worth, the pitcher’s abilities in holding the runner and the catcher’s arm must be taken into account.  But then again, speed can influence categories commonly thought of as being hitting-related.  Beating out an infield hit still raises a player’s OBP and AVG.
But how much does speed really matter on the basepaths?  Runner on third, one out and the batter hits a high fly ball to left field.  You’re the third base coach.  Do you send the runner?  How will you make that decision?  Well, in previous work, I found that you’re not thinking about how fast the runner is.  You’re looking at how far away the ball is.  Even more, it doesn’t really seem to matter how fast the runner is as to whether or not he makes it.  Not only do 97% of all runners who attempt to score on a fly ball to the outfield make it home safely, but speed doesn’t seem to be much of a determining factor as to whether or not the runner will make it home safely.  In fact, most runners who attempt to take an extra base generally make it.
Time to break out one of my favorite techniques: binary logistic regression.  This is a type of analysis that answers the question “What are the odds?” and allows us to see how different factors influence those odds.  How do I know that speed isn’t much of a determining factor in telling us whether the runner will make it home safely on the potential sac fly?  Well, binary logit tells us whether or not something significantly influences the odds, and if so, by how much (for you stat geeks, I look at Nagelkerke R-squared as my measure of variance explained).
One quick question to answer: How to measure speed?  I’m using the good old Bill James speed scores method.  In an article to be published in the upcoming edition of By The Numbers, I actually developed my own speed measure from the ground up using some fairly high-level methodology (which took forever to calculate).  James’ method and mine correlated at .81.  Mine was a slightly stronger measure in terms of scale properties, but his is much easier to calculate (and the scale properties are still pretty good).  After going through all the analyses that I did, it turns out the James method works pretty well.  I stuck with it.
So, when would speed come in handy?  It sure would come in handy trying to go from first to home on a double.  I coded for all times from 2003-2006 in which a runner was standing on first, his teammate at bat doubled, and the runner did not stop at third.  Either the runner in question was slapping hands with the on-deck guy after scoring (success!) or he went back to the dugout having been thrown out (failure).  I only looked at those runners who made the attempt and ignored all the guys who just stopped at third and called it a day.  I entered the runner’s speed score as a predictor in the logistic regression.  Sure enough, speed significantly predicted the odds that he would be safe in the expected direction (faster runners were more likely to be safe), but the Nagelkerke R-squared was... 1%.  That’s it.  One percent of the “recipe” for the odds of whether or not a runner will make it is how fast he runs.  I tried restricting the sample to situations with fewer than two outs (still 1%) and situations with two outs (a little south of 1%).  I looked at fly balls with fewer than two outs, figuring that the runner might hesitate to see if the ball would be caught (all the way up to 1.5%).  I made sure that the ball went through the infield (1.1%).
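Coding the outcome can be sketched from raw Retrosheet play strings.  A `1-H` in the advance section means the runner from first scored, and `1XH(...)` means he was thrown out at home.  Anything short of home, such as `1-3`, is dropped, as in the study.  This is a hypothetical helper, not the code behind the numbers here.  The 0/1 it returns would be the outcome that the runner's speed score is regressed against.  It treats an `E` inside the parentheses of an out as an error that let the runner score.

```python
import re
from typing import Optional

# A double in the basic play: "D7", "D8/L", or a ground-rule double "DGR".
DOUBLE = re.compile(r"^D(?:\d+|GR)?(?=/|$)")
# The runner from first in the advance section: "1-H" (scored), "1XH(762)" (out at home).
FROM_FIRST = re.compile(r"^1([-X])([123H])")


def first_to_home_on_double(play: str) -> Optional[int]:
    """1 if the runner from first scored, 0 if thrown out at home, None otherwise."""
    basic, _, advances = play.partition(".")
    if DOUBLE.match(basic) is None:
        return None
    for adv in advances.split(";"):
        m = FROM_FIRST.match(adv)
        if m is None:
            continue
        moved, dest = m.groups()
        if dest != "H":
            return None                      # never tried for home: excluded
        if moved == "-":
            return 1                         # scored
        return 1 if "E" in adv else 0        # "X" is an out unless an error negated it
    return None
```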
Second to home on a single?  R-squared was 1.2%.
First to third on a single? R-squared was 0.2%.
What about the most obviously visible time when speed comes into play: the stolen base?  I isolated all attempted steals of second base.  Speed did make more of an impact here, with an R-squared of 4.2%.  Not anything to sneeze at, but probably less than you expected.  Previously, I’ve found that something as simple as whether the pitcher throws over is good for explaining even more of the variance (7.6%).
But there is one extra thing that speed helps with.  I looked at situations in which there was a runner on first and the batter hit a ground ball to one of the infielders (i.e. a double-play ball).  Did the batter’s speed help him to stay out of the double play?  Yes, and it explained 5.5% of the variance.
Why are these numbers so low?  Even on something like a stolen base, speed explains less than 5% of the variance in success rates?  Well, the other determining factors of whether the runner will make it are how big a lead he gets, how good the pitcher is at holding him on (those two will be correlated), how well the catcher throws, what sort of pitch the pitcher throws, and a few other factors.  With some of the extra base advances on hits, there’s the issue of where the runner is when the ball is picked up and how far away that is from the base he’s trying to reach.  Plus, you try throwing a ball 350 feet and hitting a target to within a foot or so.  The point is that while a speedy runner will have an easier time of things than a slower runner, the contribution of speed is not all that big.  I’d guess that a lot more has to do with how well the fielders react.
So what does this mean for teams?  Some teams keep a guy around whose only real purpose is to pinch run (and have a few at-bats in garbage time).  Sure, there are some situations where every little bit helps, but the contribution of speed in most situations is much less important than I believe is generally thought.  Even the strategy of pinch-running for a big-hitting (but slow running) player with a banjo-hitting but faster guy late in the game seems to have its drawbacks.  Why lift your best hitter for a pinch runner, especially in a tie game, when the hitter might need to come up in the extra innings?  All things being equal, a faster runner is a better player, but remember: you can walk home on a home run.