## Stats 201: Binary logistic regression (or why your team didn't make the playoffs)

April 29, 2007 10 Comments

An archive of StatSpeak from its days on MVN

I suppose that it’s the mark of a true nerd that I actually have a couple of “favorite” statistical techniques. (Nerd pride!) My training is as a psychologist and I spend a lot of my time at my “real” job studying human behavior, so a lot of the questions that I ask when I am playing around with my Retrosheet files go something like “Why did he do that?” Why did the third base coach send the runner there? Why did the voters go for Colon over Santana in 2005?

Baseball is full of a thousand little decisions, made on the field and off. Players don’t always make the same decision each time, but there do seem to be some pretty stable patterns that develop over time. The nature of baseball, though, is such that the decisions can usually be broken down into a simple form: yes or no? Should I or shouldn’t I? Either I swing or I don’t. (I may check my swing, but the umpire will determine yes or no whether it counts as a swing.) Now, many of these decisions are made thousands of times over the course of a season (e.g., swinging or not), so we usually have plenty of data points on which to base our research.

The problem is that when looking at questions like “What factors influence X?” we usually reach for a multiple regression technique. What factors influence HR rates? Throw in a bunch of predictors and see what shakes out! The problem is that an outcome that is a yes/no question doesn’t quite work like that. We need a statistical test for just this type of occasion. We thankfully have one: binary logit regression.
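To make the idea concrete, here is a minimal sketch of a binary logistic regression fit by plain gradient ascent. The data are made up for illustration (strikes in the count versus whether the batter swung), and the function names are my own, not from any particular stats package. In practice you would let a statistics program do the fitting.

```python
import math

def sigmoid(z):
    """Turn log odds (any real number) into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed outcome minus predicted probability
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: x = strikes in the count, y = 1 if the batter swung, 0 if not
strikes = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
swung = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]

b0, b1 = fit_logistic(strikes, swung)
p_swing = [sigmoid(b0 + b1 * s) for s in (0, 1, 2)]
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("predicted swing probability at 0, 1, 2 strikes:", [round(p, 3) for p in p_swing])
```

The outcome is a yes/no, so the model predicts the log odds of a "yes" rather than the outcome itself, which keeps every predicted probability between 0 and 1.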

Hardcore math alert ahead.

April 28, 2007 2 Comments

I have made some progress on Total Zone in the last week. Here’s part 1 in case you missed it. For outfielders, I already separated line drives from flyballs. To further improve the ratings for outfielders, I added batter hand, as line drives and flyballs to the opposite field are more likely to be turned into outs. I also added a road park factor, which is not a big deal in most cases, but makes a slight difference for someone like Manny Ramirez, who doesn’t have to play in front of a big wall on the road, while other AL leftfielders sometimes do.

For infielders, batter handedness also matters. I did an adjustment for when first base is occupied, which affects first basemen, second basemen, and shortstops but has a minimal effect on third basemen. I didn’t look at every combination of base or out situations. I considered doing an infield park factor, using one factor for all positions (each OF position has its own factor). The factors were not huge: all teams fall between .98 and 1.02. They did show as significant, though (that’s what happens when your sample is over 8000 groundballs per team), so I decided to leave them in.

And now, here are the ratings, for 2005 and 2006, players who met a minimum number of chances.

April 25, 2007 4 Comments

One of my “real jobs” is teaching college classes in statistics and research methods at a large university somewhere in the Midwest. If you’ve ever wondered why I use a pseudonym, there’s one reason. Like a lot of Sabermetricians, I got into “the business” because I love baseball and my day job required me to know a lot about statistics. Then again, I probably owe a great deal of my familiarity and comfort with numbers and probabilities (the same thing that allows me to function in this job) to reading box scores when I was six.

The problem that a lot of would-be Sabermetricians cite as the reason they feel that they can’t contribute their own work or that they have a hard time reading others’ work is a lack of comfort with the statistical techniques involved. Let’s be honest: You avoided taking stats in school, if that was possible. If you had to take it, you held your breath until it was over, and were thankful for the C+ that you got. You then promptly forgot everything. I know, I teach the class. But, we are baseball researchers and, as I often tell my students, statistics is the language of research.

Over the next few weeks, I’ll be posting a few pieces on a few basic and not-so-basic statistical techniques useful in Sabermetrics. I’ll especially be covering the ones for which I have a special affinity. This first post is something of a baseline post for beginners. I direct it at those who need a refresher on the very basics or who have no formal training in statistics. If that describes you, start here. If you’re already familiar with the difference between descriptive and inferential statistics, standard deviations, and correlations, you can get off the boat here and not miss anything.

April 23, 2007 14 Comments

This weekend, I did something I’ve wanted to do for a while now. I dusted off my copy of Baseball Hacks, learned some basic Perl programming, downloaded Retrosheet’s bevent program, emailed for a little help (thanks, Joe), and built myself a play by play database.

It’s a wonderful thing. I started playing around with it, and thought I’d see how far I can get constructing my own play by play defensive measure. For years 2003 through 2006, Retrosheet has, for virtually every batted ball, a code for type. It will tell you if a ball was a grounder, line drive, flyball, or popup. There are also codes for who fielded the ball, and a spot for hit location using Project Scoresheet codes, but there isn’t much data there, so I had to do without.

Here’s what I did: I charged all hits to a specific fielder, combined that with his plays made and errors, and came up with a zone rating. There are no areas outside a fielder’s zone. Infielders are charged when they make an error or field an infield hit. Outfielders are charged when they field any line drive or flyball hit or error. For infielders, I only look at ground balls. Ground ball singles to left are counted 1/2 towards third base, and 1/2 to short. CF singles are charged to 2b and short, and RF to 2B and 1B. In addition, groundball extrabase hits are charged 100% to 3rd base (if LF) or 1B (if RF).
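The infield charging rules above can be sketched roughly as follows. This is only an illustration: the `(kind, fielder)` event format is hypothetical (it is not Retrosheet's layout), the even 1/2 split on CF singles is an assumption by analogy with the LF split, extra-base grounders to CF are ignored because the text gives no rule for them, and the rating formula (plays made divided by plays made plus charges) is a guess at the general shape of a zone rating.

```python
from collections import defaultdict

# Share of a ground-ball single charged to each infielder, keyed by the
# outfielder who fielded the ball. The CF split is assumed to be 1/2 each.
SINGLE_CHARGES = {
    "LF": {"3B": 0.5, "SS": 0.5},
    "CF": {"2B": 0.5, "SS": 0.5},
    "RF": {"2B": 0.5, "1B": 0.5},
}
# Ground-ball extra-base hits are charged entirely to the corner infielder.
XBH_CHARGES = {"LF": {"3B": 1.0}, "RF": {"1B": 1.0}}

def infield_zone_ratings(plays):
    """plays: list of (kind, fielder) tuples in a hypothetical format.

    kind is "out", "error", "infield_hit", "single", or "xbh"; fielder is the
    position that handled the ball (an outfielder for "single" and "xbh").
    """
    made = defaultdict(float)
    charged = defaultdict(float)
    for kind, fielder in plays:
        if kind == "out":
            made[fielder] += 1
        elif kind in ("error", "infield_hit"):
            charged[fielder] += 1
        elif kind == "single":
            for pos, share in SINGLE_CHARGES[fielder].items():
                charged[pos] += share
        elif kind == "xbh":
            for pos, share in XBH_CHARGES.get(fielder, {}).items():
                charged[pos] += share
    positions = set(made) | set(charged)
    return {pos: made[pos] / (made[pos] + charged[pos]) for pos in positions}

# Made-up sample of ground balls
sample = [("out", "SS")] * 3 + [
    ("single", "LF"), ("single", "CF"),
    ("error", "3B"), ("out", "3B"), ("xbh", "LF"),
]
ratings = infield_zone_ratings(sample)
for pos in sorted(ratings):
    print(pos, round(ratings[pos], 3))
```

Because every hit is charged to somebody, there is no "out of zone" bucket, which matches the description above.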

I was surprised to find the results look very reasonable. In most cases they stack up well to more detailed play by play measures. It doesn’t capture as detailed data as zone rating, but is less subject to scorer differences and counts all balls in play, which is the type of measure that will reward fielders with great range.

All players are compared to league average, and for outfielders I was able to get park factors, since I can easily look at home and road numbers. Also, line drives and flyballs were looked at separately instead of being lumped together (something that BIS zone rating does). I aggregate the plus/minus rating for OF, but an OF is not penalized for having an unusual mixture of line drives or flyballs.

I’ll give a summary of some of the players who generate the most discussion. I probably have some revision work to do, such as regressing the park factors and who knows what else. Read more to see how individuals rate:

April 22, 2007 2 Comments

If baseball really were a religion, and some of my friends seem to think it is, then it would need a few sacred mysteries that we should all ponder. I nominate Denny Hocking. Baseball has had its share of below average players, but they never seem to stick around all that long. Hocking managed to play in more than 50 games in a season for eight consecutive years! Was it for his offensive prowess? Here’s a man with a career .251/.310/.344 line (wow!), who was a career 57% base stealer (not that he ever really went that often… 36 career steals), and didn’t hit for power either, (25 career HR in 2600+ PA), and had a career OPS+ of 68. He did lay down a few sac bunts here and there, but during his “peak” OPS+ year (2000, OPS+ was 94), he also set a personal best in sac bunts. What does it say when in the middle of your career year, you’re asked to bunt even more? It means that you need to find a new line of work.

The only thing that he really had going for him was that you could basically stick him anywhere on the field, defensively, and he would stand there. The man built an eight-year career out of simply being willing to play anywhere he was asked. Yes, Denny Hocking was the quintessential “utility guy.” In his career, he managed to appear in at least 45 games at all seven of the non-battery positions. He even DHed a few times! Here’s the thing: When I looked again at Hocking’s fielding statistics (at least the one posted on baseball-reference, range factor), it showed that most of the time he was actually below the league average for each of the positions he played. So, it wasn’t that Hocking was some sort of defensive specialist or wizard. He was simply gullible enough to do what Tom Kelly asked of him and to do it poorly.

Let’s leave aside the fact that most utility players aren’t great with the bat. If they were, they’d probably be starting somewhere. They’re on the bench for a reason, generally because they supposedly know how to play a couple of different positions and can give the regulars a day off here and there. It’s an odd thing though. Generally, players are developed as position specialists (that is, they come up as third basemen or right fielders or something like that). Then, when teams discover that the players are marginal/bench players offensively, they are asked to learn the skills of another position or two to stick around as utility guys. Seemingly, the most popular combinations are the middle infield guy (who might also play a little 3B), the fourth outfielder (who might play 2 or sometimes all 3 of the OF spots), the corner infield guy (usually a failed 3B who can also play 1B). It’s rare that more exotic combinations (the 2B who also plays RF, 1B who plays CF) occur. Catchers are usually a species unto themselves.

But are players who are good at one particular position better candidates to learn to play other positions? What can we learn from the data out there? I took the Lahman fielding database, and selected out for 2000-2006. I picked those years because the data are broken down by OF position (that is, individual numbers for performance in LF, CF, and RF), and because those years have zone rating numbers for each position. Despite the problems with zone rating, it was all that was available for free.

Over those seven years, I selected for players who played more than 90 innings (10 games) in two or more positions (excluding pitcher). Sampling here was difficult. If I dropped the innings requirement too low, I would run into the problem of a ZR based on too small a sample size. If I set the criterion too high, I would cut my overall sample sizes to entirely too small levels. I then ran correlations on the range factors posted at each position in each qualifying player-year. Before I get the comments, there are a few problems with this strategy. First, I’m going to be getting many of the same players from year to year, meaning that I’m violating the independence of observations assumption. Also, the folks I’m comparing are, by the fact that most are on the bench, below average for MLB (apologies go out to the Figgins family and Angels fans everywhere. Please don’t send the hate mail. I know he’s in there too. Right, Sean?). These guys are also the ones who the manager *thinks* can play more than one position. So, the representativeness of the sample is suspect. More so, correlation coefficients will have to be calculated relative to the respective means of range factor within this rather restricted sample. This study is far from ideal, and I would do things much differently if I had these guys’ data. But stay with me.

First off, you can learn a lot by counting things. Over the seven years of data, very few catchers came out from behind the plate for any extended period of time. The most common position for catchers to also play was first base, but there were only 20 player-seasons where a catcher caught 90 innings and played 90 innings at 1B. On the flip side, there were at least 200 cases in each combination of OF slots (LF-RF, LF-CF, CF-RF), and at least 130 in each of the non-first base infield combos (2B-SS, 2B-3B, SS-3B). First basemen were most likely to two-time as (or be the alter egos of) 3B, LF, and RF (in that order).

So, which positions saw the highest correlation of skills? Would you believe center field and third base? (Huh?) The two were correlated at -.521. That means that good center fielders made for generally poor third basemen, and vice versa. This is based on a sample size of 23, so read into it at your own risk (it is significant at the .05 level), but it would seem on an intuitive level that the skills needed in CF (good range, catching fly balls) would be a lot different than those needed on the hot corner (quick reflexes, fielding ground balls). Any other significant correlations? There’s a negative correlation between zone rating at first base and zone rating at second base (r = -.342) and a positive one between first and third (r = .225). As far as significant correlations go, that’s it.
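For anyone who wants to check the significance claim on the CF-3B correlation, here is a quick sketch using the standard t-test for a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The critical value of 2.080 is the usual two-tailed .05 cutoff for 21 degrees of freedom from a t table; the function name is my own.

```python
import math

def correlation_t(r, n):
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

r_cf_3b = -0.521
n_players = 23
t_stat = correlation_t(r_cf_3b, n_players)

# Two-tailed .05 critical value for n - 2 = 21 degrees of freedom (from a t table)
T_CRIT_DF21 = 2.080
significant = abs(t_stat) > T_CRIT_DF21

print("t =", round(t_stat, 3), "| significant at .05:", significant)
```

With only 23 player-seasons, the correlation has to be fairly large in absolute terms to clear that bar, which is why the smaller correlations only reach significance in the bigger samples.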

What jumped out at me in this particular analysis was how little intercorrelation there was among zone ratings for these utility guys. Even SS and 2B, two positions largely considered to be interchangeable, weren’t even correlated. This means that knowing whether a utility player is a good SS tells you nothing at all about whether he’d make a good 2B. Outfield positions showed almost no correlation with each other, with the greatest correlation being LF-RF at .106, which is just shy of significance (p = .056).

So, largely, skill at one position doesn’t necessarily translate to skill at another position. Now, this is using a flawed metric and my research methodology is lacking, but it says something about utility players. If skill at the different positions is uncorrelated, then it becomes a challenge to find a guy who’s good at two or three of them and can still carry enough of a bat to warrant a spot on a major league bench. It also says something about Denny Hocking, who accomplished the truly remarkable by having a major league career despite doing none of the above.

April 20, 2007 18 Comments

First of all, my apologies on the long hiatus since my last post. I was dealing with writing what turned out to be a 29 page senior thesis on the lack of skill of current US numerical forecast models in predicting the path and weather impacts from Nor’easters and presenting my findings to a general scientific audience. Not an easy couple of weeks, but things get a little easier from here, so I should have some time to comment more regularly again.

On the matter proposed in the title of this post, I’ve been working for some time on refining a method for using PBP event data to rate every aspect of baseball performance, and one of the most difficult areas to assess is baserunning. It’s difficult because there are frequently multiple baserunners, the result of a baserunning play is heavily dependent on the batted ball trajectory or direction on the field, the skill of other runners, and the skill of the fielders. In general, however, I plan to apply the same method to rate baserunning that Tom Ruane pioneered several years ago using a smaller data set (1973-1992 only) with a few important changes. The method goes something like this:

- Given a starting base/out state, a batted ball trajectory, and a basic event type, find the average resulting run expectancy after all similar plays conclude.
- Figure out the run expectancy after this particular play.
- Charge differences to the runners based on repeatable methods of distributing those differences.

For a single baserunner and a typical ball in play, this is fairly straightforward. If the guy on first gets to third 25% of the time on average on a single, and on this play, he got to third, you would find the value of reaching third and the value of reaching only second, subtract the average final run expectancy from the run expectancy of runners at the corners, and there you go.
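Here is a rough sketch of that single-runner arithmetic. The two run expectancy values are placeholders made up for illustration (not taken from any real table), and the sketch assumes the runner always ends up at either second or third, ignoring outs on the bases and runners who score.

```python
# Hypothetical run expectancies with nobody out (placeholder values only)
RE_FIRST_AND_THIRD = 1.80   # runner took third, batter on first
RE_FIRST_AND_SECOND = 1.50  # runner stopped at second, batter on first

# From the text: the runner on first reaches third 25% of the time on a single
P_REACH_THIRD = 0.25

def baserunning_credit(p_third, re_third, re_second):
    """Return (average RE, credit for taking third, credit for stopping at second)."""
    average_re = p_third * re_third + (1 - p_third) * re_second
    return average_re, re_third - average_re, re_second - average_re

avg_re, credit_third, credit_second = baserunning_credit(
    P_REACH_THIRD, RE_FIRST_AND_THIRD, RE_FIRST_AND_SECOND
)
print("average RE after the play:", round(avg_re, 3))
print("credit for taking third:", round(credit_third, 3))
print("credit for stopping at second:", round(credit_second, 3))
```

By construction the credits average out to zero across all similar plays, so a runner only earns a positive rating by taking the extra base more often than the typical runner does.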

For multiple runners, it starts getting complex. If there are runners at first and second and a single is hit, the runner from first can only go to third if the runner from second tries to score. In short, the lead runner who can be forced sets the tone for the rest of the baserunners behind him and of the runners who cannot be forced, the lead runner sets the tone for the followers (runners at second and third for example, the runner at second can only tag and go to third on a fly ball if the runner on third tagged). Or more generally, the most advanced baserunner is more important than the one before him, who is more important than the one before him who is more important than the batter.

As such, I believe the best way to rate baserunning depends on something called conditional probability. You would phrase a question like this: “Given that the runner on second scored on this single, what is the probability that the batter reached second on a fielder’s choice throw home?”
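In code, a conditional probability like that is just a count within a filtered set of plays. The sketch below uses a made-up list of plays with hypothetical field names (`runner2_scored`, `batter_to_2b`); a real PBP database would need its own parsing to produce flags like these.

```python
def conditional_probability(plays, event, given):
    """Estimate P(event | given) as a simple proportion among matching plays.

    Returns None when no play satisfies the condition, which is exactly the
    small-sample trap that rare base/out situations fall into.
    """
    matching = [p for p in plays if given(p)]
    if not matching:
        return None
    return sum(1 for p in matching if event(p)) / len(matching)

# Made-up singles with a runner on second (hypothetical field names)
plays = [
    {"runner2_scored": True, "batter_to_2b": True},
    {"runner2_scored": True, "batter_to_2b": False},
    {"runner2_scored": True, "batter_to_2b": False},
    {"runner2_scored": True, "batter_to_2b": True},
    {"runner2_scored": False, "batter_to_2b": False},
    {"runner2_scored": False, "batter_to_2b": False},
]

p_batter_advances = conditional_probability(
    plays,
    event=lambda p: p["batter_to_2b"],
    given=lambda p: p["runner2_scored"],
)
print("P(batter reaches second | runner from second scored) =", p_batter_advances)
```

The denominator is only the plays where the condition held, so every extra condition you stack on shrinks the sample you are dividing by.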

This approach comes with problems though. For rare events (for example, bases loaded, one out, a ground ball single is hit, the first two runners score, the runner at first is thrown out at third, the third baseman then tries to throw out the batter who is advancing to second on the throw to third and lobs the ball into right field allowing the batter to score the third run of the play), conditional probabilities get all blowed up as you can imagine. How many times does the runner at first get thrown out trying for third from a bases loaded/one out starting state on a groundball single…let alone all of the other crazy stuff I mentioned happening after that? In all 49 years of PBP availability it’s happened 11 times…the exact play I just described.

To combat this problem of small sample sizes without giving up on conditional probability, I thought I could make the assumption that while the rate at which batting events occurred changed in different leagues, thus affecting the run scoring environment, the state to state probabilities probably didn’t change much for any given event. You’re just as likely to go from first to third on a single now as you were in 1968.

I thought wrong.

I tested that assumption using a very simple condition…less than two outs, runner at first (no other runners) and the batter hits a ground ball single. That’s it. What I found was documented in this article over at detectovision.com: http://detectovision.com/?p=1027

Suffice it to say, I am now convinced that linear correlation between run scoring rate and baserunning probabilities is necessary in order to allow me to continue to use the entire PBP database as my sample rather than individual leagues (to keep sample sizes fairly big) without losing accuracy. I’d be interested in some of your thoughts as to what the best approach to this problem might be.

April 19, 2007 8 Comments

Two and a half weeks into the 2007 season, your second-leading home run hitter is… Ian Kinsler? Kinsler, who took over the Rangers starting second base job full time last year (after not even breaking camp with the club out of Spring Training), hit 14 HR in 120 games last year. Following this performance, the major projection systems had him pegged for hitting between 13 and 20 HR over a full season of starting. Right now, he’s on a pace to hit roughly 6 million homeruns this year. At least to hear some people talk.

On the surface, Kinsler smells like this year’s version of Chris Shelton. Shelton hit 10 HR in April 2006 and was hailed as the next big thing, but ended up in Toledo for part of the year and hit 6 more major league home runs during the rest of the season. I used Shelton as an exemplar in an earlier post on the dangers of reading too much into small sample sizes, and Kinsler’s 46 PA at the time of this writing is a mighty small sample size from which to draw any conclusions.

Can he keep it up? Well, the question begs a bit of analysis. What else do homerun hitters do? I ran a quick stepwise regression to see what was associated with HR rate (log of the odds ratio). I threw in a bunch of predictors to see what would shake out, including rates of different types of hits (fly ball, ground ball, etc.), swing contact rate, swing rate, rates of different outcomes (2B+3B, singles, K’s, BB’s). Five variables emerged as significant predictors before the contribution to the R-squared started getting trivial. All in all, I nailed down about 75% of the variance. My data set was all hitter-seasons from 1993-1998 with at least 100 PA.

First of all, home run hitters also hit a lot of doubles and triples. Kinsler has seven homeruns and only one double to his name this year (as compared to 27 doubles, a triple, and 14 HR last year). So far, it’s not looking so good for his keeping up his current insane pace. It looks more like he’s had a few doubles pushed over the fence by some fortuitous gusts of wind.

Second, hitters who make less contact with the ball (raw percentage of the time swinging that bat meets ball, no matter what happens from a foul tip to a moon shot) actually hit more homeruns (see this study for further investigation). That particular data wasn’t available on Kinsler for this year, although last year, he was in the top 10% in MLB in making contact with the ball (88%). It’s possible that he’s experimenting, successfully apparently, with a longer swing, but in general contact percentage is remarkably stable from year to year, with an intra-class correlation (AR1 rho) of .77. This means that about 60% of the variance in contact rate from year to year is consistent within a player. It’s hard for a leopard to change his spots. Kinsler was a contact hitter last year, and he did pretty well for himself. It’s hard to believe he would change much, but he might.

The final three predictors showed that HR hitters generally hit more fly balls and pop ups (that is, balls in the air), and that they walk more often. Kinsler’s stats do show that he’s hitting more balls into the air this year (48% vs. 44% last year), although those rates for this year are based on small sample sizes. Also, fewer of those fly balls are staying in the infield (13.8% vs. 6.3%). Here’s the big difference: Last year, 8.8% of Kinsler’s fly balls went for homeruns. This year, 43.8% of them have. There’s no earthly way that a jump like that can be sustained. He might have altered his swing a bit, but don’t expect that sort of rate to continue.
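To put a number on "no earthly way," here is a back-of-the-envelope sketch. It assumes Kinsler has hit 16 fly balls so far, which is my inference from seven homeruns at a 43.8% rate (7/16 = 43.75%) and not a figure reported above, and it treats each fly ball as an independent trial at last year's 8.8% rate.

```python
from math import comb

def binomial_tail(n, p, k):
    """Probability of k or more successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

fly_balls = 16          # assumption: inferred from 7 HR at a 43.8% HR-per-fly-ball rate
home_runs = 7           # from the text
last_year_rate = 0.088  # last year's HR-per-fly-ball rate, from the text

expected_hr = fly_balls * last_year_rate
p_hot_streak = binomial_tail(fly_balls, last_year_rate, home_runs)

print("expected HR at last year's rate:", round(expected_hr, 2))
print("chance of 7 or more HR by luck alone:", p_hot_streak)
```

Under those assumptions, last year's Kinsler would have been expected to hit between one and two homeruns on the same fly balls, and a run of seven or more comes out as a long shot. Long shots do happen over a couple of weeks, which is the whole point about small samples.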

Kinsler shows a lot of signs that over a small number of plate appearances, he got a little lucky. My guess is that deep down, he’s still really just the guy that people were pegging for 15-20 HR this year. He might finish with 25 (figure that for the rest of the season he hits 15-18, which was the rate he was more or less predicted at), and I don’t want to besmirch a second baseman who can do that, but please people (especially you fantasy owners), don’t think that you’re going to get a 60 HR season out of him.

But if you own him in your league and want to trade up, remember, all you need is an owner who doesn’t know the basic laws of statistics. The one who traded for Chris Shelton last year. Talk to him, especially if he owns Chase Utley.

April 18, 2007 15 Comments

In sabermetric analysis, there are several circumstances where we need to adjust a stat by some type of factor. Examples include minor league equivalencies (MLEs), park factors, and age adjustments. Say Conan hits 70 homeruns in Coors Field. How many would he have hit in a normal park? How many would he have hit in a tough homerun park, like Seattle? A top prospect hits .375 in the Pacific Coast League. What’s that worth if he’s called up to the Majors?