Baserunning and Its Dependence on Environment

First of all, my apologies for the long hiatus since my last post. I was busy writing what turned out to be a 29-page senior thesis on the lack of skill of current US numerical forecast models in predicting the path and weather impacts of Nor’easters, and presenting my findings to a general scientific audience. Not an easy couple of weeks, but things get a little easier from here, so I should have some time to comment more regularly again.
On the matter proposed in the title of this post: I’ve been working for some time on refining a method for using play-by-play (PBP) event data to rate every aspect of baseball performance, and one of the most difficult areas to assess is baserunning. It’s difficult because there are frequently multiple baserunners, and because the result of a baserunning play depends heavily on the batted ball’s trajectory and direction, the skill of the other runners, and the skill of the fielders. In general, however, I plan to rate baserunning with the same method that Tom Ruane pioneered several years ago using a smaller data set (1973-1992 only), with a few important changes. The method goes something like this:

  1. Given a starting base/out state, a batted ball trajectory, and a basic event type, find the average resulting run expectancy after all similar plays conclude.
  2. Figure out the run expectancy after this particular play.
  3. Charge differences to the runners based on repeatable methods of distributing those differences.

For a single baserunner and a typical ball in play, this is fairly straightforward. If the guy on first gets to third 25% of the time on average on a single, and on this play he got to third, you would find the value of reaching third and the value of reaching only second, subtract the average final run expectancy from the run expectancy of runners at the corners, and there you go.
For multiple runners, it starts getting complex. If there are runners at first and second and a single is hit, the runner from first can only go to third if the runner from second tries to score. In short, among runners who can be forced, the lead runner sets the tone for the baserunners behind him, and among runners who cannot be forced, the lead runner sets the tone for the followers (with runners at second and third, for example, the runner at second can only tag and go to third on a fly ball if the runner on third tagged). More generally, the most advanced baserunner is more important than the one behind him, who is more important than the one behind him, who is more important than the batter.
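A minimal sketch of that arithmetic, under loud simplifying assumptions: only two outcomes exist (the runner stops at second or reaches third, and nobody is thrown out), and the no-out run expectancies are roughly 0.9 for first and second and 1.25 for the corners. Those two values vary by run-scoring environment, and the function and variable names are invented for illustration.

```python
# Rough no-out run expectancies; they shift with the run-scoring environment.
RE_FIRST_AND_SECOND = 0.90
RE_CORNERS = 1.25


def baserunning_credit(outcome_probs, outcome_values, actual):
    """Run value of the actual outcome relative to the average outcome.

    outcome_probs:  outcome name -> historical frequency (should sum to 1)
    outcome_values: outcome name -> run expectancy after that outcome
    actual:         the outcome that happened on this play
    """
    average = sum(outcome_probs[k] * outcome_values[k] for k in outcome_probs)
    return outcome_values[actual] - average


# Runner on first, single: assumed to reach third 25% of the time.
probs = {"to_third": 0.25, "to_second": 0.75}
values = {"to_third": RE_CORNERS, "to_second": RE_FIRST_AND_SECOND}

credit_third = baserunning_credit(probs, values, "to_third")    # about +0.26 runs
credit_second = baserunning_credit(probs, values, "to_second")  # about -0.09 runs
```

With these numbers the average final run expectancy is 0.25 × 1.25 + 0.75 × 0.9 = 0.9875, so taking third is worth about a quarter of a run over average. A real version would also need the outcomes where the runner is thrown out.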
As such, I believe the best way to rate baserunning depends on something called conditional probability. You would phrase a question like this: “Given that the runner on second scored on this single, what is the probability that the batter reached second on a fielder’s choice throw home?”
This approach comes with problems though. For rare events (for example, bases loaded, one out, a ground ball single is hit, the first two runners score, the runner at first is thrown out at third, the third baseman then tries to throw out the batter who is advancing to second on the throw to third and lobs the ball into right field, allowing the batter to score the third run of the play), conditional probabilities get all blowed up, as you can imagine. How many times does the runner at first get thrown out trying for third from a bases loaded/one out starting state on a ground ball single…let alone all of the other crazy stuff I mentioned happening after that? In all 49 years of PBP availability it’s happened 11 times…the exact play I just described.
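To make the conditional-probability question concrete, here is a small sketch that estimates P(event | condition) from a list of play records and reports the sample size alongside it, since the sample size is exactly what collapses for rare plays. The record layout (`runner2_scored`, `batter_dest`) and the four sample plays are hypothetical placeholders, not real PBP data or a real PBP schema.

```python
def conditional_probability(plays, given, event):
    """Estimate P(event | given) from play records.

    Returns (probability, sample_size). The probability is None when no
    play satisfies the condition, so the caller can see the sample ran out.
    """
    matching = [play for play in plays if given(play)]
    n = len(matching)
    if n == 0:
        return None, 0
    hits = sum(1 for play in matching if event(play))
    return hits / n, n


# Hypothetical records for "runner on second, single" plays.
plays = [
    {"runner2_scored": True, "batter_dest": 2},
    {"runner2_scored": True, "batter_dest": 1},
    {"runner2_scored": True, "batter_dest": 1},
    {"runner2_scored": False, "batter_dest": 1},
]

# "Given that the runner on second scored, how often did the batter reach second?"
p, n = conditional_probability(
    plays,
    given=lambda play: play["runner2_scored"],
    event=lambda play: play["batter_dest"] == 2,
)
```

Each extra condition stacked into `given` shrinks `n`, which is why a long chain of conditions ends up resting on a handful of plays.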
To combat this problem of small sample sizes without giving up on conditional probability, I thought I could make the assumption that while the rate at which batting events occurred changed in different leagues, thus affecting the run scoring environment, the state to state probabilities probably didn’t change much for any given event. You’re just as likely to go from first to third on a single now as you were in 1968.
I thought wrong.
I tested that assumption using a very simple condition…fewer than two outs, runner at first (no other runners) and the batter hits a ground ball single. That’s it. What I found was documented in this article over at detectovision.com: http://detectovision.com/?p=1027
Suffice it to say, I am now convinced that I need to account for the linear correlation between run scoring rate and baserunning probabilities if I am to keep using the entire PBP database as my sample, rather than individual leagues (to keep sample sizes fairly big), without losing accuracy. I’d be interested in some of your thoughts as to what the best approach to this problem might be.
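One possible reading of that adjustment, sketched under assumptions: fit an ordinary least-squares line of advancement rate against league runs per game, then use the slope to shift a rate observed in one environment to a common reference environment. The function names are invented, and the numbers are placeholders chosen to lie exactly on a line; they are not the results from the linked article.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def adjust_rate(observed_rate, league_rsg, reference_rsg, slope):
    """Shift an observed rate to what the fitted line implies at reference_rsg."""
    return observed_rate + slope * (reference_rsg - league_rsg)


# Placeholder league-seasons: runs per game vs. first-to-third rate on singles.
runs_per_game = [3.5, 4.0, 4.5, 5.0]
first_to_third = [0.30, 0.28, 0.26, 0.24]

slope, intercept = fit_line(runs_per_game, first_to_third)

# Express a rate observed in a 5.0 RS/G league in 4.0 RS/G terms.
adjusted = adjust_rate(0.24, league_rsg=5.0, reference_rsg=4.0, slope=slope)
```

Whether a straight line is the right shape, and whether parks or player mix should be modelled alongside run environment, is left open here.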


18 Responses to Baserunning and Its Dependence on Environment

  1. Pizza Cutter says:

    Matt, you’re also going to have some significant park effects here, I believe. Some suggested reading: http://baseballpsychologist.blogspot.com/2007/03/runner-tagging-from-third-heres-throw.html

  2. Edo River says:

    the value of reaching only second,
    So what is this value?

  3. Edo River says:

    One more question about this: subtract the average final run expectancy from the run expectancy of runners at the corners
    How do you get the “final” run expectancy?

  4. Matt Souders says:

    Any run expectancy can be found for any base/out state through the use of Markov chains…suggested reading:
    http://www.pankin.com/markov/theory.htm
    That’s how I’ve done run expectancies for my database.
    For the simple case where a runner is at first and a single is hit, the value of first and second and no one out is roughly 0.9 (it varies by run-scoring environment), whereas the value of first and third and no one out is roughly 1.25, for example.
    That’s the kind of value difference I’d like to be able to “see”…but there are a lot of complicating factors.

  5. Matt Souders says:

    Pizza Cutter: I wonder if it might be a good idea for you to hold “classes”…IOW, write articles explaining some of your preferred statistical methods. I consider myself a skilled statistician, but 99% of the tests you use I’ve never heard of.

  6. Pizza Cutter says:

    Not a bad idea. I teach college stats in one of my day jobs… it’s something of a natural outgrowth. I think I can handle that!

  7. John Beamer says:

    Matt — great idea …. I’d like that

  8. Matt Souders says:

    I’m not saying I know everything…far from it…but if your methods are unfamiliar to me and my 3 semesters of college statistics and six years of sabermetric experience, then chances are, you’re losing every single other person who might read this thread as well. Social scientists have invented a gazillion really nifty and useful statistical methods that aren’t covered in your typical college stats classes…which is a big reason why you were recruited for this blog…teach us a thing or two, would ya? 😀

  9. tangotiger says:

    There are three major reasons why the baserunning rates would differ:
    1 – Parks. The closer the OF is playing, the less chance you’ll have to take the extra base. As well, turf parks force the fielders to play a bit farther. As we know, parks changed considerably in the last 40 years.
    2 – Run Environment. The higher the run environment, the larger the cost of the out. However, the value of the base does not rise to the same proportion. So, with the breakeven point higher for SB and for baserunning/taking extra base, it will be less likely that a runner will try for the extra base.
    3 – Actual players. The Vince Colemans and Willie Wilsons do not fill our league like they used to. It could very well be that this era is filled with guys who simply don’t have the speed to take the extra base.
    If you were to create a chart like this:
    http://www.tangotiger.net/destmob.html
    for say a group of speedsters today (Crawford, Pierre, Ichiro, etc), and compare it to speedsters of yesteryear (Raines, Rickey, Coleman, Wilson), might they be the same? Maybe. But, you might have a disproportionate number of those speedsters back then.
    In any case, all three are plausible.
    Tom

  10. Matt Souders says:

    Good comments Tango…nice to see you dropping by here. 🙂 That’s a good sign for this blog.
    The parks are a good point…but I have no idea how I would account for that problem in the pre-PBP era. Perhaps a correlation between average wall distance and bulk baserunning?
    As for the idea that the speedsters of today are just…slower…than the speedsters of yesteryear…it’s possible, but I doubt it. Vince Coleman was fast…Ichiro is much much faster. He just doesn’t steal as often because he’s a perfectionist and only runs when he knows he’s got it stolen.

  11. Pizza Cutter says:

    Something I’ve thought about doing is checking to see whether ground balls actually go through the infield more often in some parks (turf effect). In the outfield, you may want to account for the fact that some players don’t have a lot of ground that they have to cover, while others do. Somewhere out there, someone calculated the square footage of each major league ballpark. Perhaps that could be a start?
    I wrote something on taking the extra base and park effects here: http://baseballpsychologist.blogspot.com/2007/03/runner-tagging-from-third-heres-throw.html

  12. DanAgonistes says:

    I haven’t tried to tackle multiple runners yet (or more specifically trailing runners) because of the problems you mentioned. More recent PBP codes include when the runner took the next base on the throw but you still have the problem where a runner was thrown out by the defense decoying the trailing runner and allowing the lead runner to score. Not sure that should really count against the lead runner (or at least not as much).
    As for what affects advancement percentage, parks are certainly a part of it and I’ve calculated hit advancement park factors by field. I’m not in a place to get at them at the moment but I wrote a little about it at http://www.baseballprospectus.com/article.php?articleid=5380. For example Fenway Park was .87, .99, .98 left to right and Yankee Stadium was 1.06, 1.02, and .98.
    My sense is that run environment is the smallest of the three factors.

  13. Matt Souders says:

    If that’s true…then why is there such a strong and obvious correlation between 1st to 3rd advancement percentage and league RS/G? If you go to the article I linked…you’ll see a couple of graphs that make it VERY clear that run scoring environment is a BIG factor.

  14. tangotiger says:

    “As for the idea that the speedsters of today are just…slower…than the speedsters of yesteryear…it’s possible, but I doubt it. Vince Coleman was fast…Ichiro is much much faster. He just doesn’t steal as often because he’s a perfectionist and only runs when he knows he’s got it stolen.”

    I didn’t say the speedsters of today are slower. I said that the average runner today is slower.
    Assume that the speedsters of today are just as fast as yesteryear. Assume that the fatsos of today are just as slow as those of yesteryear.
    If most of the league in yesteryear were gazelles, guess what, you have an average fast runner.
    If most of the league today are fatsos, guess what, you have an average slow runner.
    Even though, on a per-type group, they are equally fast.

  15. Pizza Cutter says:

    Has anyone looked at it this way? We’ve got Height/weights (roughly) for players (at least what they were listed at). It seems like most of the speedsters were skinny guys (relatively) built like Olympic sprinters. With the shift to the power-based offense, it seems like a more muscular (I hesitate to say fatter) build is in. Does body size (perhaps using body-mass index… which is weight/height-squared, a standard measurement used in medicine) correlate with speed/baserunning measures?

  16. Matt Souders says:

    Barry Bonds continued to be very fast even when he bulked up. I don’t think weight can really be used to discount the speed of modern players. I’m not terribly convinced that it’s really that much bulkier and slower today despite the increase in power. I think the increase in power is leading to a more conservative approach, and I’m not necessarily saying I think Tango is wrong, because he could easily be right…I’d just like to see some strong supporting evidence before I am totally convinced.

  17. tangotiger says:

    “I think the increase in power is leading to a more conservative approach”
    Right, that’s my point #2.
    “I’m not necessarily saying I think Tango is wrong, because he could easily be right…I’d just like to see some strong supporting evidence before I am totally convinced.”
    I myself am not convinced, but the idea makes sense.
    In a somewhat related post here:
    http://www.insidethebook.com/ee/index.php/site/comments/peak_offensive_age/#19
    I showed how the number of non-power hitters relative to power hitters has changed dramatically, and at the same time, the peak offensive age of the non-power hitters is 1.0 to 2.5 years earlier than power hitters. The makeup of the league has definitely changed over a very short 10 year period.
    It is a given that there would be *some* effect (i.e., fewer fast runners today, therefore, less chance of taking an extra base). The key question is, as always: “How much?”
    Maybe it’s insignificant, I don’t know.

  18. DanAgonistes says:

    Well, that’s why it was just my sense 🙂 I didn’t see the url embedded in the response but it certainly appears run environment is a factor here. It’ll be interesting to see if the rate continues to decline if run scoring continues to stay fairly constant.
