Attention baseball fans, please exhale

We’re on the edge of September, which can only mean one thing for baseball.  It’s time for bated breath, edge-of-your-seat pennant races.  We’re at the (roughly) thirty-games-to-go point in the season, and there are a few nervous baseball fans out there.
Devil Ray Kool-Aid drinkers, I have news for you: It ain’t gonna happen.  Hopefully, I didn’t crush anyone’s happy place there.  But for those teams who are still in the hunt, how worried should you be about your team making the playoffs?  Thanks to the magic of computers, I can tell you exactly how much you should worry.  I swear, I’m not casting my eyes sidelong at anyone in Red Sox Nation.  I swear.  Oh for crying out loud, Boston, calm down.
First off, click here and view a wonderful page created by the fine folks over at Baseball Prospectus.  Before I knew that this page existed, I was actually contacted by a gentleman who wanted to create something similar.  He asked me for my mathematical advice on how to create such a model.  With one or two minor differences, I suggested this model.  The BP method takes advantage of the fact that computers can run thousands of simulations in a short period of time.  The model starts from the current standings, then runs the rest of the schedule in a simulator and bases each game’s result on the two teams’ expected winning percentages (derived from their Pythagorean expected winning percentages).  For you stat nerds: they even sample a new winning percentage from a distribution around that expected winning percentage!
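For the curious, here is a minimal sketch of that kind of simulation.  It is not BP's actual code: the log5 matchup formula, the normal distribution with a 0.03 spread, the coin-flip tiebreak, and the two-team race are all assumptions made up for illustration.

```python
import random

def win_prob(p_a, p_b):
    """Log5 estimate of the chance team A beats team B (an assumed matchup formula)."""
    return p_a * (1 - p_b) / (p_a * (1 - p_b) + p_b * (1 - p_a))

def division_odds(wins, strength, schedule, n_sims=10000, seed=0):
    """Estimate each team's share of simulated seasons that end in first place.

    wins: current win totals; strength: expected winning percentages;
    schedule: remaining games as (team_a, team_b) pairs.
    """
    rng = random.Random(seed)
    titles = {team: 0 for team in wins}
    for _ in range(n_sims):
        # Sample this universe's strength around the expected value (assumed spread).
        sampled = {t: min(0.99, max(0.01, rng.gauss(p, 0.03)))
                   for t, p in strength.items()}
        final = dict(wins)
        for a, b in schedule:
            if rng.random() < win_prob(sampled[a], sampled[b]):
                final[a] += 1
            else:
                final[b] += 1
        best = max(final.values())
        leaders = [t for t, w in final.items() if w == best]
        titles[rng.choice(leaders)] += 1  # ties broken by coin flip
    return {t: n / n_sims for t, n in titles.items()}

# Hypothetical two-team race with 10 head-to-head games left.
odds = division_odds(
    wins={"A": 80, "B": 75},
    strength={"A": 0.560, "B": 0.540},
    schedule=[("A", "B")] * 10,
)
```

More simulated seasons mean steadier percentages, which is presumably why the real thing runs so many of them.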
Let’s take a look:
Boston fans, the computer (running 1,000,000 simulations), says that in 93.8% of those universes, the Red Sox win the AL East.  Yes, I read Summer of ’49.  Yes, I saw the ads for “The Bronx is Burning.”  Yes, I saw the piece in The Onion.  I’m just saying that you can safely print your playoff tickets (in another 5.7% of those universes, the Sox were the Wild Card) and your AL East Champion t-shirts.  Now, when you get into a short series, you’re on your own.
The computer likes the Indians (hooray!) in the AL Central, and the Angels in the AL West.  Both teams are over 80% to win their divisions.  In the NL, things are a little less sure.  The Mets are about a 3-to-1 shot to win the NL East, the Cubs are at 65% in the NL Central (which as a friend of mine who cheers for the Cardinals said, will be decided by who doesn’t want it the least).  The NL West, where the D-Backs currently lead by a game, is actually more likely to go to the Padres, according to the computer, with a 42-37 split between those two teams, but with the Dodgers having a 15% chance.  The NL Wild Card is almost a total toss up between 5 teams.
As a psychologist, I like to keep people from feeling unnecessarily nervous.  So, bookmark the page and check each morning how nervous you should be.  See you in October!

Homerun kills rally, film at 11

File under “Stupid things that sportscasters say.”  This one only happens once in a while, but it does happen.  A team is behind by 5 or 6 runs late in a game, but strings a couple of hits together and pulls to within 3 or 4 with a runner still on.  I’ve seen this happen a few times and the Joe Morgans of the world always seem to say the same thing.  “You don’t want to hit a home run here, because a homerun will kill the rally.” 
I suppose the rallying team would rather appreciate some sort of base hit or walk or hit batsman or catcher’s interference, but why not a home run?  After all, a home run is the single most advantageous hit there is.  It makes sure that everyone on base scores, as well as the batter.  A double generally scores the runners on second and third, but isn’t a guarantee for the runner on first, and the batter is only halfway to his goal.  Will someone tell me how a double would be better?
As best as I can tell, the idea is that because the home run is the very pinnacle of exhilaration in a baseball game, the letdown afterwards would sap a team of all its mojo to keep going with the rally.  Plus, it would take runners off the basepaths (by making sure that they scored!), I guess leading to despair in the fact that the next hit, unless it is also a home run, has no chance of bringing in another run.  Apparently, the only way to score several runs is to score them in a slow, steady stream rather than all at once.  They must count more if they’re gotten one at a time.
But then again, this is a testable question.  Dataset is my handy-dandy 2000-2006 PBP database from the greatest website in the world, Retrosheet.  The nice thing about a home run is that you can know for sure what the baserunner configuration will be afterwards: no one on.  The only mystery is how many outs there are.  If a homerun really does kill off a rally, then we should see that the run expectancy for these situations after a homerun will be less than if the situation hadn’t been preceded by a homerun.  I isolated all situations in which there were none on and none out and separated them into situations in which the event immediately beforehand had been a home run or whether it had been something else.  Was there a difference between the run expectancy when a HR had come right before?  Yes, there was.  The “home run” group actually had a higher run expectancy than the “other event” group (.567 vs. .534).  I ran an independent groups t-test to make sure those numbers were significantly different.  They were.
I repeated the process for situations with none on with one out and none on with two out.  Same basic results.  A homerun predicted a significantly greater run expectancy (1 out: .308 vs. .286; 2 out: .122 vs. .111).  After all, the batting team is facing a pitcher who is pitching so poorly as to give up a home run!
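A rough sketch of the comparison, for anyone who wants to try it at home.  The run totals below are made up rather than pulled from Retrosheet, and the Welch (unequal-variance) version of the independent groups t-test is an assumption; turning the statistic into a p-value takes a t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    se = sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se

# Made-up rest-of-inning run totals for none-on, none-out situations.
after_hr = [0, 0, 1, 0, 2, 0, 0, 1, 0, 3]
after_other = [0, 0, 0, 1, 0, 0, 2, 0, 0, 0]

run_exp_hr = mean(after_hr)        # run expectancy after a home run
run_exp_other = mean(after_other)  # run expectancy after anything else
t_stat = welch_t(after_hr, after_other)
```

Run expectancy here is just the average number of runs scored from that point to the end of the inning, so the whole test boils down to comparing two means.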
But perhaps I’m looking in the wrong place.  After all, I’m looking at all home runs, not just those which are hit during late rallies.  I refined my search to situations which took place in the seventh inning or later and which pulled the batting team to within 1 to 3 runs.  I looked at the run expectancy between these two groups of situations.  In all three cases (0 out, 1 out, 2 out), there was a small bump up in run expectancy after a home run, although this time it wasn’t significant.  So at the very least, a home run doesn’t take any steam out of a rally.  The difference is probably due to the fact that a very good relief pitcher is more likely to be pitching in a close/late situation.
The numbers, for the curious:
0 out: .470 for the HR group, .456 for the other group
1 out: .256 vs. .239
2 out: .094 vs. .091
Finally, perhaps there is something to the idea that an RBI double is better than a home run.  I found all situations in which there had been an RBI double that brought a team to within three runs late in the game and left only a runner at second and compared the run expectancy of that situation to the run expectancy of a runner at second in a close/late game that had happened by some other manner.
With no one out, run expectancy was 1.14 following an RBI double, while 1.07 following some other event.  The difference is not significant.  With one out, there is a significant difference, such that an RBI double makes run expectancy go down (.510 after a double, .646 after something else).  With two outs, the difference is not significant, but the trend is in the direction of an RBI double doing more harm than good (.283 vs. .297).  If something actually sucks the mojo out of a rallying team, it’s an RBI double!
Next time you hear an announcer say that a double would be a better outcome than a home run for a rallying team, please be so kind as to smack some sense into them.

Is Jonathan Papelbon even the best closer on the Red Sox?

A small tip of the cap across the way to the folks at Fire Brand of the American League, MVN’s Red Sox blog.  Blogger (and um, MVN president… so, my boss) Evan Brunell takes a look at the question of whether or not Jonathan Papelbon is the best closer in the bigs.  First off, regular StatSpeak readers will know that I’m no fan of the position of “closer”.  I personally think the save rule has done more to ruin the game of baseball than steroids.  (Yeah, I said it.)  Evan uses Blown Save Rate, WHIP, and opponent’s SLG, and a few other stats (including, *cringe*, ERA) as his criteria.  Like a lot of stats, these aren’t terrible horrible awful ways to evaluate a closer, but there are plenty of better ones.
First off, let’s strip away one lie about closers.  A pitcher is not automatically good for having put up 30 saves.  It just means he most often pitched in the ninth inning, when saves are handed out.  A scrub pitcher who “closes” for a decent team would probably pick up 30 saves in the process.  A closer is a reliever who is participating in one of the greatest con jobs in history.  He makes 5-8 million dollars to do the exact same thing that the 7th inning guy does, but does it 30 minutes later in the day.  And the seventh inning guy gets paid 500K.  (Before you say, “But the closer is under so much pressure,” please read this.)
So, let’s take a look at all the relievers in baseball and see who’s doing a dandy job.  First, let’s stop off at win probability added (WPA).  Your top seven relievers in MLB?  J.J. Putz, Takashi Saito, Rafael Betancourt (non-closer!), Tony Pena (after all those years of catching in the 80s, who knew he could pitch!), Papelbon, Joe Nathan, and Hideki Okajima.  Some of you can perhaps see where this is going.  Batting Runs Above Average is another fun stat to use.  Your top seven (as of this writing): Putz, Okajima, Betancourt, Matt Guerrier, Carlos Marmol, Kevin Cameron, and Papelbon.  Hmmm… J.J. Putz is looking pretty good this year.  But there’s Okajima hanging out there in the top seven of both categories along with Papelbon.
As far as WHIP goes, Okajima has a WHIP of 0.82, Papelbon has a 0.83.  Call it a tie.
Okajima’s VORP? 32.1.  Papelbon’s?  21.2.
Finally, let’s get down to the issue of saves.  If we’re going to keep this dreadful awful stat around, then let’s at least put some context around it.  MLB was kind enough to include “holds” as an official stat starting last year, and I commend them.  A hold is a save, except that the pitcher who did it didn’t have the good sense to tell his manager to have him do it in the ninth inning, rather than the 7th or 8th.  The rules for a hold are basically the same as for a save.  I consider the two to be equal (at least for the moment).
Okajima has 24 holds, 4 saves, and 2 blown saves/holds.  So, he’s protected 28/30 close leads handed to him, for a protection percentage of 93.3%.  Papelbon has 1 hold, 30 saves, and 2 blown saves/holds, so he’s protected 31/33, for a protection rate of 93.9%.
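The arithmetic, spelled out as a short sketch.  The function name is made up for illustration; the counts are the ones quoted above.

```python
def protection_rate(holds, saves, blown):
    """Share of close leads protected, treating holds and saves as equal."""
    protected = holds + saves
    return protected / (protected + blown)

okajima = protection_rate(holds=24, saves=4, blown=2)    # 28 of 30
papelbon = protection_rate(holds=1, saves=30, blown=2)   # 31 of 33
```

With only 30-odd chances apiece, a single extra blown lead moves either number by about three percentage points, so the gap between the two is noise.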
Moral of the story: don’t be fooled by gaudy save totals.  Papelbon is a very good relief pitcher, and is one of the better relievers in the game.  But, for some reason, no one wants to give Okajima the same love.  He does all the same things that Papelbon does, it’s just that he doesn’t get the saves.  I’d say it’s pretty fair to call it a jump ball as to whether Papelbon is even the best closer on the Red Sox, much less all of MLB.

27 run save?

For those of you who missed it, the Baltimore Orioles jumped out to a 3-0 lead on the Texas Rangers last night in the first game of a doubleheader, but couldn’t hold it and lost 30-3.  The Rangers had more runs in the game (and hits) than they did outs.  (Apparently not happy with that outburst, the Rangers put nine on the board against the Orioles in the second game of the doubleheader and won that game 9-7.)  Sounds like that Mark Teixeira trade worked out OK for them.
But, in recognition of one of the most curious technical applications of the save rule ever seen, we here at StatSpeak would like to congratulate Rangers pitcher Wes Littleton for notching his second major league save.  Littleton, who pitched the final three innings for the Rangers in what can only be described as the very essence of garbage time, was credited with the save in a game which his team won by 27 runs.  Apparently the Rangers wouldn’t have won without Littleton’s… heroics?  To his credit, he did enter the game in the seventh inning with the Rangers only up by 11 runs.  If you read the save rule though, it does say that if the last guy out there pitches 3+ effective innings, he gets a save.
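To see how the three-inning clause hands out a save in a blowout, here is a simplified sketch of the rule as I read it.  The function and parameter names are made up, and the official scorer's judgment about whether the innings were "effective" is ignored.

```python
def earns_save(finished_game, is_winning_pitcher, innings_pitched,
               lead_on_entry, tying_run_on_base_at_bat_or_on_deck=False):
    """Simplified save check; ignores official-scorer judgment calls."""
    if not finished_game or is_winning_pitcher or innings_pitched <= 0:
        return False
    # Path 1: entered with a lead of three runs or fewer and pitched an inning.
    short_lead = lead_on_entry <= 3 and innings_pitched >= 1
    # Path 3: pitched at least three innings, whatever the score.
    long_outing = innings_pitched >= 3
    # Path 2 is the tying-run flag passed in by the caller.
    return short_lead or tying_run_on_base_at_bat_or_on_deck or long_outing

# Littleton: final three innings, entered with an 11-run lead.
littleton = earns_save(finished_game=True, is_winning_pitcher=False,
                       innings_pitched=3, lead_on_entry=11)
```

Notice that the size of the lead never enters into the three-inning path, which is the whole joke.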
My father has a saying for times like these.  “I think that’s nice.”

Is the home plate umpire a racist?

For the past week or so in the Sabermetric blogosphere, there’s been a rather interesting discussion of a paper by a quadrumvirate of writers looking at an interesting question: Is the home plate umpire a racist?  For their data set, they looked at all “called” pitches in the years 2004-2006, that is, all pitches that were either a called ball or a called strike.  Then, they looked at the percentages by differing combos of racial groupings for umpires and pitchers.  For example, what percentage of called pitches were strikes when the umpire was White and the pitcher was Hispanic?  Their conclusion was that there was evidence of bias, although the bias seemed to disappear when people were watching closely (on a 3-2 count, when a large crowd was at the game, or when MLB was spying on them with the QuesTec system).
The story was picked up by MSNBC, United Press International, and Time Magazine.  The actual statistical minutiae have been hacked to death by countless Sabermetrically inclined bloggers (for a lively read not for the statistically faint of heart, go here or here or here or here), and I’ve been participating in the discussion here and there.  The general consensus has been that if there is any discrimination at play here, it’s probably on the order of one pitch out of a few hundred or so.  (I found that the most egregious combination might have resulted in a change of 1 pitch out of 170.  For a starter, you’re talking about one called pitch every 3 or 4 games.)  But, there are all sorts of methodological problems, mostly stemming from the fact that most umpires and pitchers in MLB are White.  It’s not a firm case that the effect seen is real, but let’s for a moment say that home plate umpires really do discriminate based on race.
I’m not here to re-hash the statistics.  Instead, I’m more interested in wearing my psychologist hat to explain a little bit of why it might be the case that there is evidence of racial bias.  After all, if I had each of the MLB umpires in a room with me, they would probably all swear up and down that they were not racists, that they did not call the game any differently based on the race of the pitcher, and that they were offended by the very thought of their discriminating.  After all, other than the morons who populate the KKK, no one ever proudly proclaims “I’m a racist.”
But, it’s not quite that easy.  Humans have been shown over and over again to display something called in-group bias and not even know it.  We like people who share some common feature with us, no matter how irrelevant that feature is.  Ever look more kindly on someone because he belonged to the same fraternity as you, despite the fact that you went to different schools at different times in different states?  Researchers, led by social psychologist Henri Tajfel, did a series of experiments a few years back that showed how minimal a connection was needed for a discernible effect.  In one experiment, they fooled people into believing that they had been classified into groups based on their mathematical ability when in reality they had been randomly assigned.  When asked to divide up resources among members of their group and members of an “other” group, they gave more to people in their own group than the other group.
A few years later, the researchers re-ran the same experiment, this time dropping the pretense of mathematical ability being the grouping factor and just outright telling people that the assignments were completely random.  (I think they actually pulled names from a hat in front of them to drive the point home.)  But, people still showed favoritism to people in their group, despite having nothing else in common.  Not that any of them were aware of it.
You can perhaps see how this would become racism very quickly.  Skin color (which isn’t the same thing as race, and some people would argue that there is no such thing as race) is easily discernible and easy to use as the basis of forming a group (or as Kurt Vonnegut said, a granfalloon).  I don’t want this to be seen as an excuse for racist behavior.  Racism has no place in fair competition or in any gentlemanly pursuit.  This in-group bias is something to be recognized and overcome.  That’s a lot easier said than done, but it’s not impossible.  Humans have violent urges too, but we control those (usually).  And to the umpires’ credit, if there is a bias, they certainly haven’t let it overrun their judgment.  Again, even the most generous estimates of the magnitude of the bias say that it’s 1 pitch in 100.  My point is that before you start blaming the umpires, consider a brief look in the mirror.
The umpire may be falling prey to a fairly widespread human trait.  But, not all the time.  Remember, the bias (again, making the assumption that it’s a real effect) disappears in situations where more people seem to be watching.  This too appears in the professional literature.  People, who all deny that they are racist, will still exhibit mildly racist behavior, but will reduce it or stop when they know that someone’s watching.  Technically, it’s called positive impression management.  Colloquially, it’s called putting your best foot forward or “watching what you do because your mother-in-law is in the room.” 
All this is to say that the umpire is human, and that this sort of finding, even if it is a little disconcerting, is something that all of us would run up against if we were placed behind the plate with one of those clicker thingies.

Fan fielding survey

Need to waste some major time at work?  Do I have a job for you!  You can even assuage your guilt by thinking “I’m helping a research project.”
Tango Tiger is running his fifth annual survey of the fans concerning fielding ability.  Basically, you go to this webpage, find a team (or teams) that you have observed  a lot and rate each player on seven defensive skills.  As Tom points out, “There is an enormous amount of untapped knowledge here. There are 70 million fans at MLB parks every year, and a whole lot more watching the games on television.”  He’ll publish the results on his webpage.  If you haven’t before, spend some time poking around his webpage.  It’s for the advanced Sabermetric scholar, but it’s informative nonetheless.
What Tango is going for on this one is the fairly well-established idea of the “wisdom of the crowds.”  If you get enough independent eyes on something, and average out all of the responses, it will produce a pretty good measurement of something.  There are times when the crowds fail.  Consider, millions of people are convinced that Paris Hilton has some sort of talent.

Is speed really that important, Part II

Two weeks ago, I said that speed wasn’t all that important.  Oh sure, it plays some part in a runner succeeding when he tries to take an extra base, but it’s a rather small part.  I looked at a few situations in which speed might come in handy (e.g., going from first to third on a single, stealing a base) and found that speed (as measured by Bill James’s speed score method) only predicted a small amount of variance in success rates.
Does speed predict whether or not the manager/third base coach will send the runner in the first place, though?  Yes, it does, and actually speed is a better predictor of whether or not a runner will be sent than whether or not he will make it.  I looked at four situations in which a manager/3B coach actually has to make the decision to send the runner (1st to 3rd on a single, 2nd to home on a single, 1st to home on a double, and stolen base attempt), and looked to see whether an attempt was made in each situation where the relevant circumstances were present.
First to third on a single: attempt R-squared, 1.4%; success R-squared, 0.2%
First to home on a double: attempt, 2.0%; success, 1.0%
Second to home on a single: attempt, 1.7%; success, 1.2%
Stolen base (of 2nd): attempt, 10.2%; success, 4.2%
Speed is a much better (although that’s a very relative term) predictor of whether a runner will be sent than whether he will make it.  Still, speed isn’t predicting a whole lot of the variance.  For those of you not familiar with this methodology, this doesn’t mean that MLB teams are sending everyone.  (Far from it… from 2000-2006, the “send” rates were 31.8%,  45.3%, 63.0%, and 7.1% respectively).  It just means that when deciding whether or not to send the runner, speed is something like 2% of the decision with the ball in play and 10% of the decision for a stolen base attempt.
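For those who want to see what an R-squared of this sort looks like in practice, here is a minimal sketch.  It uses the squared Pearson correlation between a speed score and a sent/held flag, which is an assumption about the method (a logistic model would give a somewhat different number), and the data are made up.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation: share of variance in ys linearly tied to xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)

# Made-up data: runner speed scores and whether he was sent (1) or held (0).
speed = [2.1, 3.4, 4.0, 5.2, 5.9, 6.3, 7.1, 7.8]
sent = [0, 1, 0, 0, 1, 0, 1, 1]
share_explained = r_squared(speed, sent)
```

A value near zero means knowing the runner's speed tells you almost nothing about whether he gets sent, no matter how often runners are sent overall.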
With a stolen base attempt, there are some situations in which a manager wouldn’t attempt a stolen base, even with Jose Reyes at first and Mike Piazza behind the plate.  If someone’s up 16-1, there’s an unwritten rule that you don’t try to steal there.  I restricted the sample to the first three innings when it’s generally more respectable to try to steal, although this only bumped up the R-squared to 11.4%.
So, if not speed, what are coaches looking at in making their decisions?  My guess is that when the ball is live on the field, they’re again looking at where the ball is.  When I did a study on sac flies, where the ball was in the field explained about half(!) of the variance in whether or not the runner was sent.  So, if you cheer for a team where they keep a guy around just because he’s fast, write to the general manager and question his sanity.  Not only is speed not a huge factor in whether or not the runner will make it, managers aren’t even really looking much at speed anyway.

Taking a nap on the bases, the Scott Podsednik story

43, 70, 59, 40.  From 2003-2006, those were Scott Podsednik’s seasonal stolen base totals.  In 2004, he led the National League in SB and he was 2nd twice and fifth in another year.  Podsednik must be a good Ukrainian and a good baserunner, you think to yourself.  You’re half right.
Last year, Scotty Pods led the league in another baserunning category that no one ever talks about.  Podsednik made the most baserunning blunders of 2006.  Podsednik managed to get picked off an astonishing 12 times, 7 times on a simple pickoff, and 5 times in which the pitcher moved to first while Podsednik was on his way to second.
A few definitions: There are four major baserunning blunders which I am counting here.  The first is being picked off a base by the pitcher (or in some cases, the catcher).  This one’s fairly easy to calculate.  Retrosheet’s event files contain fields for pickoffs indicating whether or not a pickoff was recorded on the play.  In any case, the pickoff is usually a sign that the runner “fell asleep” on the basepaths, in this case, not “reading” that the pitcher was throwing to first and/or leaning too far away from the bag.
There’s also another type of pickoff, which is generally given as the “pickoff/caught stealing”, in which the pitcher throws to first, and the runner is either already in flight to second or makes the decision that he’s a dead duck anyway and might as well go out in a blaze of glory, and so heads to second.  I’ve counted these two separately, as in this case, the runner is more likely to have been running, but guessed wrongly on whether the pitcher was throwing to home or to first.  The distinction is slight, but I choose to maintain it.  I believe that a straight pickoff represents a runner taking a nap while a POCS represents a runner being too aggressive.
Another couple of baserunning blunders to consider: One is a runner who is doubled off a base on a fly ball.  In this case, I looked for runners who were doubled off only on fly balls to the outfield.  A runner might be doubled off on a screaming liner right at the second baseman and have no chance at all to get back, so out of fairness, I eliminated fly balls caught by infielders.  But there’s little excuse for being doubled off on an outfield fly.  The runner misjudged it, plain and simple.
One more blunder that’s a little more exotic: being thrown out over-running a base.  Consider this curious line from a Retrosheet event file: D9, BX2(94).  That translates into “double to right field, batter out at second base, with the out going from the right fielder to the second baseman.”  How could a batter be both safe and out at the same base on the same play?  If he hit second safely (double is now recorded in the books), but took too big a turn, and the right fielder threw in behind him and the runner became dead meat, you have a baserunning blunder and an out.  I coded for a few different circumstances in which a runner could over-run a base and counted them up.
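As an illustration of that kind of coding, here is a small sketch that pulls apart a single runner-movement token like BX2(94).  The regular expression and field names are made up for this example and only cover the simple case; real event files carry modifiers and error notations that it ignores.

```python
import re

# Matches one runner movement such as "BX2(94)": start base, "-" for safe or
# "X" for out, destination base, then optional fielder codes in parentheses.
ADVANCE = re.compile(r"^([B123])([-X])([123H])(?:\((\d+)\))?$")

def parse_advance(token):
    match = ADVANCE.match(token)
    if match is None:
        raise ValueError(f"unrecognized advance: {token!r}")
    start, result, dest, fielders = match.groups()
    return {
        "runner": "batter" if start == "B" else f"runner on {start}",
        "out": result == "X",
        "base": "home" if dest == "H" else dest,
        "fielders": [int(d) for d in fielders] if fielders else [],
    }

play = parse_advance("BX2(94)")
```

An over-run shows up as a batter who is credited with a double and then retired at second on the same play.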
Your top twelve baserunning blunder-ers of 2006? 
1) Pods 12 blunders (40 SB in 2006)
2T) Ryan Freel 10 (37)
2T) Willy Taveras 10 (29)
4T) Brian Roberts 8 (36)
4T) Jose Reyes 8 (64)
4T) Alfonso Soriano 8 (41)
7T) Jose Bautista 7 (erm… 5)
7T) Jamey Carroll 7 (10, with 12 CS!)
9T) Dave Roberts 6 (49)
9T) Ichiro Suzuki 6 (45)
9T) Ryan Zimmerman 6 (11)
9T) Juan Pierre 6 (58)
But wait, before we pile on Pods, notice something about that list.  Most of the guys on the list are the ones who are running a lot anyway.  They steal a lot of bases, and so they are likely to be the ones who draw a lot of pickoff throws, and most of those “blunders” are pickoffs.  Base-stealers might be considered high-rolling gamblers in a way.  Certainly, their stolen base totals are gaudy, but they also pay a price for their aggressiveness on the basepaths, both visible (caught stealing) and invisible (pickoffs).
In other findings, there were only 14 instances of a runner over-running a base (by 14 different gentlemen) and 49 of runners being doubled off, with only Travis Hafner, Javy Lopez, and Jacque Jones being repeat offenders.  Milton Bradley managed to pull something of a unique quadrifecta in 2006, being picked off once, POCS once, over-running a base once, and being doubled off once.
One name conspicuously missing from the list of those who made a baserunning blunder last year?  The supposed king of baserunning (and otherwise) cluelessness, Manny Ramirez.

A moment of silence please

It wasn’t an unusual game in any way that night in Oakland.  It was a Friday night game on the West Coast and the Mariners were in town to play the A’s, with the A’s hoping to move into a tie for first place with the Texas Rangers.  The Mariners, however, were only 2.5 games out, and on this night, they jumped on A’s starter Ron Darling for six runs in the second inning to open a 6-0 lead.  With Randy Johnson on the mound, Seattle didn’t need much more.  Early in the morning hours of August 12, 1994 (at least on the East Coast), Randy Johnson finished off a trademark 1-run, 4-hit, 15-strikeout performance, and threw the last pitch of this night of baseball past Ernie Young.
It was a strike.
Thirteen years ago today, what should have been a bustling Saturday of Major League Baseball was reduced to an eerie silence.  And every night for 200+ days, I played Willie, Mickey, and the Duke on my cassette player (it was 1994…).  It was all my fourteen year old mind could think of to fill the void left by baseball.  After all, for the first time in about 35 years, the Cleveland Indians were actually good, and at that point were leading the chase for the (newly created) Wild Card.  They say that you never realize what you’ve got until it’s gone and that summer, I learned that there really is nothing like baseball on a summer night.
There was no World Series that year, and for a little while in 1995, it looked like baseball would be played as something of a real-life fantasy camp.  But what would have happened in 1994 had all that unpleasantness not transpired?  What if baseball had continued?  Allow me to perhaps waste inordinate amounts of your time answering that question.  In 1994, Diamond Mind baseball simulators took to simulating the rest of the 1994 season and reported the results to the world.  It was all that we had.
Back in 1994, there was something else that was fairly new to the American consciousness called “The Internet”.  This was a time when people asked whether or not you had an e-mail address rather than simply asking what it was.  Strangely enough, back in 1994, people were getting most of their information from places like newspapers and radio, as “The Internet” wasn’t really so much an information superhighway yet, but more of a jaunt through a suburban subdivision on a Thursday afternoon.  But, thanks to the magic of the internet archive, there was one site, the now-defunct Nando Times, which has a (partial, I believe, as only some of the links are working for me) archive of those box scores.  I won’t spoil the ending for you, but here’s the main page.  (Warning: This is an internet quicksand trap for baseball nerds.)
I write this as a lesson to all the stat geeks out there (like me!) and all baseball fans, for that matter.  Today, take a moment of silence to remember that despite all the steroid allegations, the fact that the (string of expletives deleted) Yankees are somehow back in the playoff race (this seems to happen every year…), and people’s strange belief in the power of clutch hitting, life would be really boring if all that we had was cold computer processing instead of real live baseball.  In 1994, all that we had was a computer simulation and Terry Cashman.  If you’re old enough to remember that time, don’t ever forget it.
It’s why every year around August 12th, my heart sinks a little.  I work as a Sabermetrician because I love the game of baseball, and 13 years ago today, they took the game away from me.

Do you have any idea how fast you were going?

Seems that at StatSpeak, we’re all about speed.  I teased this in my last post on speed, but I recently wrote an article for SABR’s Statistical Analysis Committee newsletter By The Numbers, in which I develop a whole new system for speed scores based on some advanced statistical methodology.  But, after all that, it turns out that Bill James’s speed scores are pretty much all we need after all.
 You can read the article here (under my ever-so-secret real name, starting on page 8).  Fair warning, that’s a PDF file.
For those interested in speed scores for the whole league last year under my model, I’ve posted that as a Google Doc here.  Players are listed by their Retrosheet ID and at least right now, are sorted from fastest to slowest.  There are a lot of missing values on the chart, owing to the way in which I calculated things.  (I used a log-odds ratio method for all the probabilities.  If someone was perfect (100%), then the log odds ratio was undefined (odds ratio is 1/0).  If someone got caught every time, their odds ratio was zero, for which there is no natural log.  I tried to correct for that with some other trickery, but some guys just don’t have speed scores.)  Also, only players with 100+ PA last year are included.
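A quick sketch of why those values go missing.  The function is made up for illustration, and the 0.5 continuity correction is a common textbook patch assumed here, not necessarily the trickery used for the posted chart.

```python
from math import log

def log_odds(successes, attempts, correction=0.0):
    """Natural log of the odds of success; None when the odds are 0 or infinite.

    A positive correction is added to both successes and failures (a common
    continuity adjustment, assumed here for illustration).
    """
    made = successes + correction
    missed = attempts - successes + correction
    if made <= 0 or missed <= 0:
        return None
    return log(made / missed)

perfect = log_odds(10, 10)                   # odds are 10/0: undefined
never = log_odds(0, 10)                      # odds are 0: no natural log
patched = log_odds(10, 10, correction=0.5)   # log(10.5 / 0.5)
```

The patch keeps a perfect runner on the chart, but his number then depends on the size of the correction, which is a good reason to treat small-sample scores with suspicion.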
Enjoy.
