What can my glasses teach us about home runs?

Consider the act that you are doing now.  No, not wasting time at work.  You’re reading.  Ever stop for a moment and think about how complicated a process reading really is?  After all, you must be able to see the squiggly lines on the screen, recognize them as symbols of a language system (aka letters), be able to sound them out, put the combinations of letters together as words, and understand what they mean both as words and as part of a sentence.  That’s a lot of work, but of course, you’ve learned to do it almost automatically after years of practice.  Babies can’t read.  Why not?  They can see the squiggles, but they don’t (yet) recognize them as meaningful symbols.
One of the things that drives me nuts when people start diagnosing the problems (or successes!) of baseball teams or players is that most of the explanations focus on the need for the team/player to increase or decrease their performance in just one area.  (“He just needs to walk more.  Then he’ll be a Hall of Famer.”)  In addition to a general distaste for any theory which relies on “one magic bullet”, it shows a general misunderstanding of how people, including baseball players, develop.  A few weeks ago, I talked about a few insights that I have on baseball, specifically the development of baseball players, that come out of my work studying the development of kids.  Here’s another.
Now, this is one where I think the casual fan and the Sabermetrician (including myself) are both guilty of using “one magic bullet” thinking.  Sabermetricians just have prettier bullets.  But saying that “He just needs to walk more” is kinda like looking at a child who is having trouble reading and saying that “he just needs glasses.”  The logic is fairly sound.  If he can’t see the words, he won’t be able to read them, so certainly giving him a pair of glasses won’t hurt.  And, if he’s got all of the other skills necessary for reading (good phonemic awareness and processing, symbolic decoding skills, and a grasp of vocabulary, grammar, and usage), glasses will probably do him a lot of good.  Yes, the better your visual acuity (to a point), the better your reading ability will be, if you have all the other skills necessary for reading.  However, having perfect visual acuity is useless if you don’t have the ability to understand that those squiggly lines on the screen are letters!
An example: when I wake up in the morning in Cleveland, my glasses make the difference in my being able to read the road signs I pass on my way to work.  When I went to Moscow with my wife two years ago, glasses or no, I couldn’t read the signs on the subway.  In Cleveland, I know what the squiggly lines on the signs mean.  In Moscow, they’re just meaningless squiggles because they’re in Russian.
When you change the level of one skill (comprehension of the language), the relationship between two other skills changes.  In the Cleveland case, improving my visual acuity by giving me glasses helped because it happened in the presence of another skill (comprehension of the English language).  In the Moscow case, improving my visual acuity made no difference and was not correlated with my ability to read Russian because I have no clue when it comes to the Russian language.  In one case, visual acuity and reading are correlated.  In the other, the same two variables are uncorrelated.  That’s the essence of a moderator.
Sabermetrics (and a lot of other fields) needs a more nuanced approach.  We like bivariate correlations and regressions because they show one variable’s effects on another.  They’re easy to understand, and they may indeed produce some interesting (which is not necessarily the same thing as useful) conclusions.  But here, I am arguing for a proper understanding of something called moderator effects and their application to baseball.  For illustration, I specifically chose to look at the effects that moderators have on batters’ home run rates.  I could do this type of analysis for any stat, really, but home runs are fun and people obsess over them.
Reader, if you’re interested in the mathematical guts of the method, they are hiding behind the cut.  The short version: I calculated a bunch of stats, both rate stats (1B rate, K rate, BB rate, etc.) and some swing diagnostics (swing %, contact %, pitches per PA) and batted ball stats (LD rate, GB rate) for all player-seasons from 2003-2007 (min 250 PA) and looked for interactions between these stats in predicting HR rates.  (I also looked at HR/FB, but the results were pretty much the same.)  A bunch of interactions popped up (I ran everything as a moderator of everything else), which tells me that (publicly available) Sabermetrics has a lot of work to do on this one.  I picked the ones that were a) the strongest and of those b) the most interesting to look at to report on, but there’s plenty more where this came from.
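Since the actual 2003-2007 dataset isn’t reproduced here, a toy sketch may help show what a moderator looks like in practice.  The data below are synthetic (an assumption for illustration, not my real sample): XBH rate is built to relate positively to HR rate for high-contact hitters and negatively for low-contact hitters, so the subgroup correlations flip sign, which is exactly what a moderator does.

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation, no libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(42)
high_contact, low_contact = [], []
for _ in range(500):
    xbh = random.gauss(0.07, 0.02)  # extra-base-hit rate
    # Built-in moderator: the XBH-HR slope is positive for one group
    # and negative for the other.
    high_contact.append((xbh, 0.02 + 0.5 * xbh + random.gauss(0, 0.01)))
    low_contact.append((xbh, 0.08 - 0.5 * xbh + random.gauss(0, 0.01)))

r_high = pearson_r(*zip(*high_contact))
r_low = pearson_r(*zip(*low_contact))
print(f"XBH-HR correlation, high contact: {r_high:+.2f}")
print(f"XBH-HR correlation, low contact:  {r_low:+.2f}")
```

A single pooled correlation across both groups would wash these opposite effects out, which is why you have to test the interaction terms rather than just the bivariate relationship.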
Three variables came out most strongly as moderators and in some rather interesting ways: contact percentage, swing percentage, and pitches per PA.  Swing diagnostics make a difference.
For example, there was an interaction between extra base hit (XBH) rate (doubles and triples) and contact rate.   Generally, we figure more XBH means more HR, but not always.  It depends on what’s going on with another skill, contact rate.  Contact rate changes the relationship between XBH’s and HR’s.
Players who had low contact rates in general hit more homers than those with high contact rates.  Makes sense since a power swing is generally one which sacrifices the chances for making contact in exchange for a chance at hitting the ball farther if you do make contact.  But, let’s say that you see a hitter’s XBH rate creeping up.  Should you expect more or fewer HR from him?  If he’s a high contact hitter, you should expect more.  If he’s a low contact hitter, you should expect fewer. 
Players who are already aiming for the fences will sometimes succeed, but some of those balls are bound to go over the fence and others just to hit it.  Some guys get unlucky in the sense that they have to make do with more doubles and fewer HR.  Players who are contact hitters and who usually hit singles are probably changing their approach a little: instead of prioritizing contact, they are aiming a bit more for the fences.  Moderators make a difference.
There’s another interaction I found between swing percentage and contact percentage.  Again, low contact hitters hit more HR, but what happens when a hitter swings more?  If he’s a high contact hitter, swinging more won’t really do much to his HR rate.  But, if he doesn’t make contact a lot, swinging more will actually depress his HR rate.  I’m guessing that if he doesn’t make contact and he swings a lot, instead of hitting HR, he’s striking out.
One other interesting find.  Pitches per plate appearance is a rather interesting moderator of a well-known property of HR hitters: HR hitters strike out a lot.  But for the ones who see more pitches per PA, as their strikeout rates go up, their HR rates go up much more than for those who don’t see a lot of pitches per PA.  So, the ones who are better at extending the count are the ones who get more bang for the buck in terms of HR gained for each strikeout.  But there’s another effect of PPA that should prove rather interesting.  High PPA hitters, when they hit more flyballs, have a sharper increase in HR rate than those hitters with a low PPA number.  So a high PPA player, when he hits more flyballs, hits better quality flyballs.
A rather important note: everything here is cross-sectional.  The reader may be thinking that since everything here is measured within the same year, there’s no way to prove causation.  For example, is it that hitters who like to extend at-bats (and have high PPA numbers) then develop the patience and selectivity to pick out good HR pitches, or is it that players who hit a lot of HR choose instead to wait on more pitches?  If you’re thinking that, you’re ready for the next step: multi-latent developmental models.
For those who are hungry for more methodology, some major numerical nerdiness follows.  If you’re happy just knowing a few more things about home runs and perhaps a little bit more about a few new wrinkles that should be in the greater Sabermetric methodology, you are now excused.


Surprise! Kelly Johnson has gotten better this year

Recently, there was a note at the ever-excellent MLB Trade Rumors which said that the Atlanta Braves were likely looking to shop second baseman Kelly Johnson in the off-season.  The post noted that Johnson’s offensive production had declined this year, and the Braves do have fashion designer Martin Prado ready to play second next year.  I don’t mind the thought that the Braves might think Prado the better option.  He strikes out much less than does Johnson, although Prado seems to have a bit less power.  The part that I object to is the thought that Kelly Johnson is actually “losing it” this year.
Certainly, Kelly Johnson’s performance has suffered.  Last year, his slash line of .276/.375/.457 was rather nice for a second sacker.  This year, Johnson has slipped a little with a slash line around .260/.335/.400.  Not bad, but not what Braves fans were hoping for.  So Johnson must be losing his mojo, right?  Not necessarily.  In fact, I’d say that Johnson has actually gotten better this year.  How does a player drop 80-90 points worth of OPS and become better?  Read on.
First, let’s look at Johnson’s swing and plate discipline profile.  What’s important to know is that things involving plate discipline and swinging are the least given to variation over time.  It makes sense, because players are the ones who decide whether or not to swing the bat.  Hitting a home run requires cooperation of the pitcher, ball, and occasionally, wind.  This year, Johnson, a man with a strikeout problem and a rather pedestrian contact percentage (around 80-81%, about the league median), actually started swinging more.  And that’s a good thing.  In 2007, on my twin measures of plate discipline, Johnson had a response bias rating of 0.84.  Now, response bias is a measure of how likely a player is to swing.  The ideal number is 1.00, because it minimizes the number of strikes that a player piles up, given whatever abilities he has on the other measure, sensitivity.  A number over 1.00 means that a player is swinging too much.  Under 1.00 means the player is swinging too little.  In 2007, Johnson’s problem was that he was taking too many pitches.  Johnson took a step toward fixing that.
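The exact formulas behind those twin measures aren’t spelled out above, so what follows is only a generic signal-detection-theory sketch, under the assumption that pitches in the strike zone play the role of “signal” and a swing counts as a “yes” response.  The counts are invented, and the scaling of the real measure may well differ; in the standard convention below, a bias of 1.00 is a neutral criterion.

```python
from statistics import NormalDist

def sdt_measures(swings_at_strikes, strikes_seen, swings_at_balls, balls_seen):
    """Generic signal detection sketch: pitches in the zone are 'signal,'
    a swing is a 'yes.'  Returns (sensitivity d', bias beta)."""
    nd = NormalDist()
    z_hit = nd.inv_cdf(swings_at_strikes / strikes_seen)  # z of "hit" rate
    z_fa = nd.inv_cdf(swings_at_balls / balls_seen)       # z of false-alarm rate
    d_prime = z_hit - z_fa                 # ability to tell balls from strikes
    beta = nd.pdf(z_hit) / nd.pdf(z_fa)    # 1.0 = neutral criterion
    return d_prime, beta

# Invented example: a hitter who swings at 65% of strikes and 25% of balls.
d_prime, beta = sdt_measures(65, 100, 25, 100)
print(f"d' = {d_prime:.2f}, beta = {beta:.2f}")
```

The point of separating the two numbers is the same as in the text: a hitter can be good at recognizing pitches (high sensitivity) and still pile up strikes by swinging too rarely or too often (bias away from neutral).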
My measure suggested that Johnson would benefit from swinging more, and he has done so.  Last year, Johnson swung at 39.3% of pitches.  This year, he’s been up around 45%.  (Maybe he reads StatSpeak?)  His strikeout rate has dropped (although only about a percentage point) in response.  Swinging more also drove down his walk total, but it meant that he was putting more balls into play.  So, let’s look there.
In general, a batter has pretty good control over what type of batted ball he puts into play.  The rates at which a batter hits grounders, flyballs, line drives, or popups have pretty good reliability, so changes in them are generally not random in nature, but a change in either talent level or approach.  What happens to those batted balls is another matter.  More on that in a minute.  This year, Johnson’s LD/GB/FB profile went from 18.8%/42.7%/38.5% last year to something around 23%/38%/39%.  His flyballs are staying steady, but he’s turning some of his ground balls into line drives.  That’s good, because a line drive (which doesn’t leave the yard) has about a 73% chance of going for a base hit, while a grounder has a 24% chance.  Line drives are good.
The fine folks over at FanGraphs are fond of using xBABIP for hitters.  Given a batter’s batted ball profile, we can get some sort of idea of what we might expect his BABIP to be (hence xBABIP).  The formula that FanGraphs uses is .15 * FB% + .24 * GB% + .73 * LD%.  Last year, Kelly Johnson’s xBABIP was around .290.  His actual BABIP was .330.  Johnson did 40 points better than expected given his batted ball profile.
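The formula is simple enough to sketch directly.  Plugging in Johnson’s 2007 profile quoted above (18.8% LD / 42.7% GB / 38.5% FB) lands close to the .290 figure (the percentages here are rounded):

```python
def xbabip(fb_pct, gb_pct, ld_pct):
    """Expected BABIP from a batted-ball profile, using the FanGraphs
    weights quoted in the text: .15 per fly ball, .24 per grounder,
    .73 per line drive."""
    return 0.15 * fb_pct + 0.24 * gb_pct + 0.73 * ld_pct

# Johnson's 2007 profile: 18.8% LD, 42.7% GB, 38.5% FB
x2007 = xbabip(0.385, 0.427, 0.188)
print(f"2007 xBABIP: {x2007:.3f}")  # vs. an actual BABIP of .330
```

Note that the weights treat all fly balls as staying in the yard; home runs are excluded from BABIP by definition.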
The next question is whether that ability to “outhit” the expectation is something that is luck or skill.  As is my custom, I took four years worth of data (2004-2007) and calculated the xBABIP and the actual BABIP for all players, and found the difference between the two (whether they over- or under-performed).  It’s possible that some players just hit line drives or ground balls that are harder to catch than others.  If that’s the case, then we should see consistency over those four years in which players over-perform and which ones under-perform.  To test this, I used my favorite device, the intra-class correlation (shot!).  The result was an ICC of .27 or .28, depending on how much I restricted the sample by the minimum number of PA required.
That means that there is a little bit of skill involved in over- or under-performing one’s xBABIP, although there’s a good deal more luck in there than one might expect.  Looking at it from an R-squared perspective, it’s more than 90% luck (or more properly, unexplained).  It’s not quite the level of non-correlation found in BABIP for pitchers, but it’s closer to that area than to the “three true outcome” neighborhood.  Perhaps it’s time for DIBS.
Going back to Johnson, it means that it’s likely that most of Johnson’s over-performance in the BABIP area was due to chance.  I haven’t run the numbers, but I’m guessing that expected BABIP is going to be a better predictor of future results than is actual BABIP.  Now, in 2007, Johnson’s expected BABIP was .290.  This year, it’s around .315 (more line drives!), which is what his actual performance has been.  All performance is talent plus luck.  So in reality, Johnson’s numbers from last year, fueled mostly by that high BABIP, were largely a matter of luck.  This year, he hasn’t had good or bad luck, but the underlying talent seems to have improved.  Atlanta’s management might be confusing luck with skill.
The one concerning piece about Johnson’s statline is the drop in HR/FB.  HR/FB is a statistic that is mostly in the batter’s control, and his drop from 10.3% to around 7% is a little concerning.  His flyball percentage hasn’t changed much from last year… they’re just not leaving the park as much, so perhaps there’s a power outage in there somewhere. 
With that said, Johnson isn’t exactly a world-beater.  Now that his luck has stabilized, we’re getting a pretty good idea of what he’s really capable of.  According to VORP, he’s in the bottom half of “regular” second basemen in all of baseball, among such luminaries as Joe Inglett, Clint Barmes, and Mark Grooz, Grudsil, oh, you know who I’m talking about.  He strikes out way too much for a guy who doesn’t put up massive HR numbers.  My OPA! fielding system has him rated as a boring old average second baseman in the field.  So, while I can’t fault the Braves if they think they have a better option, I’d caution them to be a little more careful in how they make that decision.  Kelly Johnson is a symptom of a much bigger problem of the need to understand the separation between talent and performance.  He’s actually gotten better this year, despite what it looks like.

The foul ball, part one: What does it tell us about a batter?

No one likes foul balls.  They don’t accomplish anything, and the two strike variety in particular actually does nothing at all to move the game along.  In fact, it used to be that the foul ball was a non-pitch, no matter how many strikes were on the batter.   Really, the only good that a foul ball does is give some kid a souvenir that he’ll treasure forever.  (Admit it, if you’ve caught one or even gotten close to one, you can tell me the date, opponent, score, who hit it.  Even if you’re 40, it was a meaningless game, and Steve Lombardozzi hit it.)
But what of the foul ball?  Everyone hits them.  Some hit more than others.  But can they actually tell us anything about a batter?  Surprisingly, yes.  So, as we begin our look into the foul ball, let’s create a few metrics.  First off, Retrosheet has data on the fact that a foul ball was hit, although it doesn’t tell us exactly how foul the ball was.  For example, was it just poked to the first base coach, tipped at the plate, or a monster shot down the left field line that just… hooked… foul?  That limitation aside, we can still create some simple metrics.

  • Foul balls per plate appearance
  • Percentage of total pitches fouled off
  • Percentage of pitches with which the hitter made contact that went foul (foul contact)
  • Overall swing percentage and overall contact rate

Additionally, there are two “types” of foul balls.  There are the foul balls committed when there are 0 or 1 strikes (which count as a strike) and those that come with 2 strikes (which don’t).  We know that with two strikes, a batter will often go into “protect” mode and swing at borderline pitches, figuring that if he swings and fouls them off, it’s not the end of the world.  So, we will split these two types of foul balls apart, and create two metrics.  One is for 0-1 strike foul balls per plate appearance.  The other is for 2 strike foul balls per plate appearance in which the batter actually had two strikes on him.
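As a sketch of how these metrics fall out of pitch-level data, here’s a minimal example on an invented pitch log (Retrosheet’s real event format is considerably more involved):

```python
# Invented pitch log for one batter: (strikes before the pitch, outcome).
pitches = [
    (0, "called"), (1, "foul"), (2, "foul"), (2, "foul"), (2, "in_play"),
    (0, "ball"), (0, "foul"), (1, "swing_miss"), (2, "foul"), (2, "in_play"),
]

fouls = [p for p in pitches if p[1] == "foul"]
contact = [p for p in pitches if p[1] in ("foul", "in_play")]

foul_per_pitch = len(fouls) / len(pitches)             # % of total pitches fouled off
foul_contact = len(fouls) / len(contact)               # % of contacted pitches that went foul
early_fouls = sum(1 for s, _ in fouls if s < 2)        # these count as strikes
two_strike_fouls = sum(1 for s, _ in fouls if s == 2)  # these don't

print(foul_per_pitch, round(foul_contact, 3), early_fouls, two_strike_fouls)
```

The per-PA versions just swap in plate appearances as the denominator, with the two-strike rate divided by only those PA in which the batter actually reached two strikes.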
First off, let’s see if fouling pitches off is a repeatable skill.  For example, we know that some players are pretty consistent home run hitters, but are there foul ball hitters?  I subjected all of the above new metrics to an intra-class correlation (a measure of how consistent players are across years… think of it as a year-to-year correlation but with the ability to incorporate multiple years of data), using four years worth of Retrosheet data (2004-2007).  Results were pretty encouraging.  With a minimum of 250 total PA for the season in question, foul balls per PA checked in with the lowest intra-class correlation of .574.  All of the other stats reached into the mid- .60 range or better.
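For readers who want to see the machinery, the version below is a plain one-way ICC, which is a simplification of the mixed-model flavor used for the actual numbers, but it captures the idea: how much of the total variance lies between players versus within a player’s own year-to-year wobble.  The data are made up.

```python
def icc_oneway(table):
    """One-way intra-class correlation, ICC(1): the share of total
    variance that lies between players rather than within a player's
    own seasons.  `table` holds one list of k seasonal values per player."""
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    means = [sum(row) / k for row in table]
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(table, means) for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Made-up foul-rate-style stat for three players over four seasons.
# Each player holds his level, so the ICC comes out high.
data = [[0.30, 0.32, 0.31, 0.29],
        [0.20, 0.22, 0.21, 0.19],
        [0.40, 0.41, 0.39, 0.42]]
icc_val = icc_oneway(data)
print(f"ICC = {icc_val:.2f}")
```

If players shuffled their levels randomly from year to year, the between-player variance would shrink toward the within-player variance and the ICC would head toward zero.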
Now, while it’s nice to know that players are generally consistent in how often they generate foul balls, do those foul balls actually tell us anything useful?  I looked at a bunch of batting statistics for some answers.  I looked at usual “slash” stats (AVG/OBP/SLG), along with the batter’s batted ball profile, walk rate, strikeout rate, single rate, double-and-triple rate, and HR rate.  I ran a gigantic correlation matrix to see what turned up.  The first thing to note is that just about everything was statistically significantly correlated with everything else.  I took all players from 2000-2007 with a minimum of 250 PA and ended up with a sample of 2400+ player-seasons.  At that kind of sample size, it’s all significant, so our analysis will deal more in the strength of the correlation.
What’s interesting is that 0 and 1 strike foul balls per PA had a correlation with two strike foul balls in two strike PA’s of .106, which is rather low.  This says that they are two relatively independent “skills.”  Knowing about a player’s general foul ball count isn’t enough.  You have to differentiate between the two.  There’s other evidence that we are dealing with two different skills with two different types of etiology.  Hiding in the correlations between the swinging metrics that I created, there was an interesting pattern to be found.  Foul contact percentage  was correlated with 0 and 1 strike foul ball rate at .487.  The correlation with two strike fouls was a mere .150.  Looks like 0 and 1 strike foul balls are more the result of a player who can’t straighten out his swing.  Then, there’s the issue of overall contact percentage.  The correlation between that and two strike fouls is .524 while the correlation with 0 and 1 strike foul balls is -.366 (note that’s a negative).  So, a player who makes a lot of contact is likely to have a lot of two strike pitches that he spoils, but fewer foul balls for strike one and strike two.
Do foul balls correlate with any of the actual outcome stats?  Well, the usual slash stats didn’t correlate well with any of these new metrics.  But, some specific outcomes show some rather intriguing patterns.  A batter who hits a lot of two-strike foul balls is less likely to strike out (r = -.482) and less likely to walk (r = -.345).  Makes sense, since he is more likely to extend his at-bats until (assuming he actually doesn’t end up walking or striking out) he puts the ball in play.  And put the ball in play he usually does.  Two strike foul balls are moderately associated with an upswing in singles rate (r = .347), but a downturn in HR rate (r = -.215) and HR/FB (r = -.300).  This pattern becomes even more pronounced when one looks at overall contact percentage (which we’ve already seen is a pretty good correlate of two-strike foul ball hitting).  The correlation with strikeouts hits -.875, which makes sense because you can’t strike out if you hit the ball, a foul tip into the catcher’s glove notwithstanding.  Overall contact is correlated with more singles (r = .549) and fewer HR (r = -.521).
What about zero and one strike foul balls?  The correlations with the outcome measures aren’t very strong.  However, foul contact percentage predicts the opposite pattern of overall contact.  Strikeouts go up (r = .669), singles go down (r = -.454), and homeruns go up (r = .410). 
What’s funny is that if you just look at foul balls per PA, the correlations are not really that interesting.  Most of them are below .20, which isn’t much of anything.  A lot of the effects seem to wash out when you look at all foul balls together.  You really have to break them down into their component parts before you can fully understand what’s going on.  Foul balls early in the count speak of a player who doesn’t make a lot of contact, who isn’t likely to keep the ball fair when he does connect, who strikes out a lot, but whose balls in play are more likely to go out of the ballpark.  There was one other thing that jumped out.  Foul contact percentage was (moderately) correlated with a lower ground ball percentage (r = -.318) and a higher fly ball percentage (r = .297).  So, we have guys who appear to be trying for fly balls, and fly balls that will leave the park at that.  That’s a higher risk swing, and more likely to go awry, either by swinging and missing or swinging and having the ball go foul.  Two strike foul balls speak of a hitter who makes good contact, keeps at bats alive, but is generally just a singles hitter.  Low risk, low reward.
So if you want to know what’s going on with your favorite player, the one who seems to be acting a little weird lately, and all you have is a box score, take a look at his foul balls.  They might provide you with a useful little diagnostic of whether he’s feeling a little risky or if he’s playing it safe lately.  I suppose there could be the case where a hitter is high on both types of foul balls (or low on both), and the effects would seem to cancel each other out.  (Remember, total fouls per PA aren’t really correlated well with anything.)  But, if you see a lot of one type and not a lot of another, you can perhaps come to some conclusions about what’s going on in the batter’s head.

Who gets the credit/blame for that home run?

Do hitters hit home runs, or do pitchers give them up?  Of course, the answer to that question runs both ways, but who is more to blame/credit?  Pitchers occasionally throw such beautifully tantalizing hanging curveballs that even Rafael Belliard hit the occasional home run, and some hitters are so strong that they can punish even the best-placed pitch.  But it brings up an interesting question.  Who is more in control of how far the batter hits the ball?  After all, a home run is simply a fly ball that went a long way and crossed over a fence.
Here’s how I (sorta) answered the question:  From 1993-1998, Retrosheet’s data files contain pretty good data on hit locations, primarily because those years were compiled by Project Scoresheet and licensed to Retrosheet.  Recent Retrosheet files are much more sparse on this data.  The way that Project Scoresheet made notations on the data was through the use of a standardized rough map of zones on a ball diamond.  It’s rather rough-grained, but it takes us from being able to say that Jones flew out to center field to saying that Jones flew to shallow (or deep) center.  Once we know where a fly ball went (and I selected out all balls from 1993-1998 which Retrosheet said were either pop ups or fly balls), in terms of what zone, we can get a decent approximation of how far away that is from home plate.
I assumed that all balls attributed to a zone were hit to the exact center of that zone.  Of course, that’s not true, but it’s close enough for government work (some were hit a little beyond, some a little in front… it evens out).  Since the Project Scoresheet grid is meant to scale to the outfield dimensions of a park, we need to know the outfield dimensions of the park in use.  (The infield dimensions of all parks are set by the official rule book.)  If one knows a little bit about trigonometry, it’s easy enough to get a decent guess of where the ball was hit, if it was on the field of play.  For home runs, I gave the hitter 105% of the wall measurement over which it crossed.  (So, a HR hit to a 360 foot power alley was estimated at 378 feet.)  105% was nothing more than my guess.
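Here’s roughly what the arithmetic looks like.  The linear interpolation of wall distance by spray angle and the zone-depth fraction are stand-in assumptions for the actual park dimension data and the real Project Scoresheet grid, which are more detailed:

```python
def wall_distance(angle_deg, lf_line=330.0, cf=400.0, rf_line=330.0):
    """Outfield wall distance by spray angle (0 = LF line, 45 = dead
    center, 90 = RF line), linearly interpolated between the posted
    dimensions.  Real walls aren't linear; this is a rough stand-in."""
    if angle_deg <= 45:
        return lf_line + (cf - lf_line) * angle_deg / 45
    return cf + (rf_line - cf) * (angle_deg - 45) / 45

def estimated_distance(angle_deg, depth_frac, is_hr=False):
    """Fly ball distance estimate: the zone center's depth as a fraction
    of the wall distance along that angle, or 105% of the wall for a HR
    (the guess described above)."""
    wall = wall_distance(angle_deg)
    return 1.05 * wall if is_hr else depth_frac * wall

# Fly out to a zone centered in deep left-center (~22.5 degrees, 85% of
# the way out) in a 330/400/330 park, and a HR over the 400-ft wall:
print(estimated_distance(22.5, 0.85))
print(estimated_distance(45, 1.0, is_hr=True))
```

Averaging these per-ball estimates over a season gives the mean fly ball distance numbers used below.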
I totalled up the mean estimated distance for all fly balls and pop ups hit in a season by each batter, and then turned around and sorted it by pitcher.  I selected out only those with at least 25 fly balls in the season in question that they either hit or had hit off of them.  I subjected them to an AR(1) intra-class correlation to look at the year-to-year correlations over the six years in the data set to see if the mean distance was more consistent for pitchers or for hitters.
ICC for pitchers = .312
ICC for batters = .612
Batters are fairly consistent from year-to-year in how far their average fly ball travels.  Pitchers are less so, but still have some level of consistency from year to year.  It seems that both share some blame/credit for the distance on a flyball.  This might explain why batters’ seasonal rates of HR/FB were more stable than pitcher rates.  For those unfamiliar with this methodology, you can interpret those numbers in much the same way as year-to-year correlation coefficients (although this method is better, as it allows for multiple data points).  There are some batters who are powerful (i.e., they hit the ball a long way) and some who are not, and that power level is pretty consistent from year to year.  Pitchers who give up fly balls (and all of them, save Fausto Carmona, occasionally give up a fly ball) do have some (not a lot, but it’s there) repeatable skill in whether they tend to give up short fly balls or long fly balls.  For those GMs nervous about signing that fly ball pitcher because he might give up a bunch of home runs, you can check his average fly ball distance (and perhaps his standard deviation), perhaps look at it by field, and plug in a few numbers to at least give you a little better projection for how many HR he might give up next year, although the error of prediction is still likely to be rather high.
Let’s play around with this a bit more from the batter’s perspective.  I looked at the average distances for balls hit to the batter’s pull field, opposite field, and center field.  I upped the inclusion criteria to 50 FB in the season in question.  Again, I looked at ICC over the six seasons in the data set.  (Anything in the grid with an “8” in it was “center field”, so that includes the power alleys.) 
ICC for pull field = .239
ICC for center field = .591
ICC for opposite field = .359
Batters are much more consistent in how far they hit the ball to center field (and the power alleys), and are actually more consistent in how they hit the ball to the opposite field than to their pull field.  So if you want to get a good idea of how a player will hit for power, take a look at what he does gap to gap.  That’s going to be the most consistent measure.

The Name Game

Growing up in Philadelphia, and raised in an extreme sports environment, Jayson Stark has always been an idol of mine. In fact it was reading his Philadelphia Inquirer column every week that eventually propelled me into sabermetrics. His columns always combined humor and statistics in order to show all of the hilarious or newsworthy baseball happenings that could not be seen on an ESPN show. Not shocking in the least, ESPN eventually brought him onboard. That being said, I thought I would do my sports-writing idol proud by writing an article in a style similar to his.
The idea for this came to me when the Phillies signed Chad Durbin to be their: (circle the correct answer)

  • A) 5th Starter
  • B) 6th Starter
  • C) Mop-Up Reliever
  • D) Waste of Space
  • E) Who cares, we have Adam Eaton!?

Regardless of the answer you selected, this now gave the Phillies Chad Durbin and J.D. Durbin – two completely unrelated Durbins. Now, it isn’t as if we’re talking about two guys with the last name of Smith. I never knew “Durbin” was a last name until a couple of years ago and now there are not only two in major league baseball but two on the same team?
Interestingly enough, there have only been four Durbins in the history of major league baseball, and the other two ended their careers during, or before, 1909.  The only two Durbins in the last 98 seasons of major league baseball are now on the same team – and have no relation to one another.
The Phillies acquired J.D. Durbin after the Diamondbacks placed him on waivers in April. Durbin had appeared in one game for Arizona and surrendered 7 hits and 7 runs in 2/3 of an inning. For the Phillies, Durbin was somewhat serviceable, even throwing a complete game shutout against the Padres.
J.D. Durbin made his Phillies debut on June 29th during the first game of a double-header against the Mets.
At the time of acquiring J.D. Durbin, the Phillies had a minor league prospect with the name J.A. Happ. Due to rotation injuries, Happ made his first major league start on June 30th, against the Mets.
Now that would be odd enough, on its own, however the Phillies also acquired J.C. Romero from the Red Sox. Romero also made his Phillies debut on June 29th, during the second game of Durbin’s double-header.
So, to recap, not only did the Phillies have three pitchers with the first names of J.A., J.C., and J.D., but all three of them made their Phillies debuts within the span of 48 hours from June 29th-June 30th!
And, speaking of the Phillies, they acquired Tad Iguchi from the White Sox towards the end of the season. Since he would not have been able to play for the Phillies until May 15th, if he re-signed with them, he went elsewhere (Padres). The Phillies, in need of another bench player, decided to sign So Taguchi. I guess this way the transition will be easier for the players.
Or how about the Twins deciding to replace Luis Castillo with Alexi Casilla.

  • Believe it or not, the American League had an Ellis, an Ellison, and an Ellsbury.  And no, they were not Dale, Pervis, or Doughboy.
  • The Athletics had Dan Haren and Rich Harden.
  • The American League also had a Joakim, a Joaquin, and a Johan.  That’s never happened before with different players.
  • Lastly, there was the Rays’ Delmon Young and the Dodgers’ Delwyn Young, who sadly never got to face each other.

Speaking of “Youngs,” the NL West not only had two of them, but two Chris Youngs.  They could not be more different, either, as one is a 9-ft tall, white, former Ivy League pitcher and the other is a 6-ft, black, college-less outfielder.  Pitcher Chris Young (PCY for those keeping track) won the 2007 battle as his younger counterpart went 0-10, with a walk and 4 K’s against him.

  •  Orlando Hudson went 2-11, with an RBI and 4 BB, against his “River” counterpart Tim Hudson.
  • Unfortunately, Reggie Abercrombie never got to face Jesse Litsch.  I wonder what SportsCenter would call that matchup.  Reggie and Jesse?  Reggie and Litsch?  Abercrombie and Jesse?  Ugh, who knows…
  • Aaron Rowand and Robinson Cano didn’t face each other this past year either.
  • Somehow, the Blue Jays and Rockies have played nine times and we are still waiting on a Halladay/Holliday matchup.
  • Scott Baker didn’t pitch against, or to, Paul Bako in 2007, though my fingers are crossed for 2008.

Mike Lamb is 3-9 in his career against Adam Eaton (who isn’t?) as well as 1-7 off of Todd Coffey.
Coffey and Lamb usually don’t go well together, but Felix Pie is also 0-1 off of the caffeinated one.
Eaton has yet to face Pie.  I’d like to put a pie in Eaton’s face.  3 yrs and 24 mil worth of pies!
In what would probably cause the universe to crumble, I am patiently awaiting a Rick VandenHurk vs. John Van Benschoten matchup.  I’m feeling 2008 or 2009.
In the long-name department, Jarrod Saltalamacchia went 1-2 against Andy Sonnanstine.  Salty also went 0-2 against Mark Hendrickson.  He went 1-1 against Ryan Rowland-Smith, but Ryan had two last names to reach twelve letters and therefore had an unfair advantage.
Easily the most hypocritical name award goes to Angel Pagan.  You can figure that one out.  Did you know, though, that the National League had “Two Wise Men”?  That’s right – Matt and Dewayne.
Though Matt Wise surrendered a hit to Angel Pagan, he struck out Dewayne Wise, proving what we already knew – Matt Wise is the smartest pitcher ever.
On a sad note, 2007 proved to be a disappointment in the generic name field (not Nate Field or Josh Fields).  Combined, there were only four Smiths: Jason, Joe, Matt, and Seth.
Even sadder, we only had three Williamses: Dave, Jerome, and Woody.  Scott Williamson tried his hardest but that does not count.  Could be a cool sitcom title – Three Williams and a Williamson.
Major League Baseball spanned the endpoints of the life cycle this year.  On one side we had Alan Embree (embryo) and Omar Infante (infant) and on the other there were Jermaine Dye (die) and Manny Corpas (corpse).
Dye has never faced Corpas but is 2-7 in his career off of Embree.  Infante has also never faced Corpas but has doubled in 4 at-bats against Embree.
Jorge de la Rosa and Eulogio de la Cruz did not face each other this year despite being the only two “of-the” names.  And, just to clarify for the none of you who asked, Valerio de los Santos would not qualify for this category since de los would technically be “of-them” or “of-those.”
Miguel Cairo has long been the MVP of this group but he welcomed two additions this year in the forms of Ben Francisco and Frank Francisco.  I had always thought of Francisco as a Spanish first name but was very surprised to find it as an American last name.  In fact, if you say Ben Francisco really quickly and in front of a drunk, it could even sound like San Francisco.
I recently got an original NES and could not help but notice that two major leaguers sound like items from a Zelda game.  Don’t both of these sentences make sense?

  1. Link, to defeat Ganon, you must hit him in the lower Velandia.
  2. Use your Verlander to blow up the stones blocking the entrance.

One of my favorite movies is Sinbad’s Houseguest, and whenever I hear the name of Giants’ 2B Kevin Frandsen I am reminded of Sinbad’s character Kevin Franklin.  Something tells me Frandsen never impersonated a dentist.
In addition to everyone else, we had six players with job names.  Chris Carpenter and Lee Gardner maintained the stadiums and fields, Scott Proctor made sure they didn’t cheat, Skip Schumaker supplied them all with cleats, while Matt Treanor helped rehab Torii Hunter.
Schumaker did not face Carpenter, Gardner, or Proctor.  Treanor is 1-3 off of Carpenter in his career.  Hunter was 3-6 with a HR and 2 RBI off of Carpenter (career), as well as 2-6 with an RBI off of Proctor.
Clearly, a Hunter is more valuable than a Proctor and a Carpenter.
Point blank – the following names sound incredibly made up and fake:

  • Frank Francisco
  • Dave Davidson
  • Emilio Bonifacio
  • Rocky Cherry

When primitive men first began to speak it was easiest to combine two words together without any intermediates.  Thousands of years later we still have names like Grady Sizemore, Jarrod Washburn, Mark Bellhorn, and Chris Bootcheck.
Speaking of Chris Bootcheck, I wonder what he and Jon Knotts would talk about.
In the anatomy field, Rick Ankiel and Brandon Backe were in the same division, with Ankiel going 0-3 with an RBI off Backe.

  • DIRTY NAME AWARD – Rich (Dick) Harden
  • ACADEMY AWARD – Sean Henn
  • LED ZEPPELIN AWARD – Scott Kazmir
  • FUTURE PIZZA SHOP NAME AWARD – Doug Mirabelli (hon. mention – Mike Piazza)
  • FICTIONAL SERIAL KILLER AWARD – Mike Myers (as usual)
  • NAME TYPO AWARD – Jhonny Peralta
  • MOST FUN TO SAY AWARD – Jonathan Albaladejo
  • IMPERVIOUS AWARD – (tie) James Shields and Scot Shields

And there you have it.  We covered the life cycle, the entertainment (regular and adult) industry, jobs, cities, the Bible, and more.
We can only hope that 2008 will finally bring us a VandenHurk/Van Benschoten or a Holliday/Halladay.
Keep your fingers crossed.

A second look at Pythagorean win estimators

Over the past few days on the SABR statistical analysis list-serv, there’s been a bit of chatter about the Pythagorean win estimator.  My guess is that most of the folks reading this post are familiar with the formula, but for the benefit of those who may not be, the formula was created by Bill James in an attempt to model how many games a team “should” have won, based on how many runs they scored and how many they allowed.  The original formula read: Winning % = RS^2 / (RS^2 + RA^2).  Its eerie resemblance to the Pythagorean theorem in geometry (the one you hated in high school) gave it its name.  Several different modifications have been suggested in the intervening years, including changing the exponent to 1.82 (some say 1.81), and two “dynamic exponent” formulas (one by Clay Davenport, the other by David Smyth), each of which calculates the proper exponent to be substituted in on a case-by-case basis.
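For anyone who wants to play along at home, here is a minimal sketch of the four estimators. The dynamic exponents use the commonly cited forms of the Davenport (“Pythagenport”) and Smyth (“Pythagenpat”) formulas, and the team totals below are made up purely for illustration:

```python
import math

def win_pct(rs, ra, exponent):
    """Estimated winning %: RS^x / (RS^x + RA^x)."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)

def davenport_exponent(rs, ra, g):
    # Pythagenport: commonly cited as 1.50 * log10(total runs per game) + 0.45
    return 1.50 * math.log10((rs + ra) / g) + 0.45

def smyth_exponent(rs, ra, g):
    # Pythagenpat: commonly cited as (total runs per game) ** 0.287
    return ((rs + ra) / g) ** 0.287

# A hypothetical team: 800 runs scored, 700 allowed, over 162 games
rs, ra, g = 800, 700, 162
for label, x in [("Pythagorean (2)", 2.0),
                 ("Exp 1.82", 1.82),
                 ("Davenport", davenport_exponent(rs, ra, g)),
                 ("Smyth", smyth_exponent(rs, ra, g))]:
    print(f"{label}: {win_pct(rs, ra, x) * g:.1f} projected wins")
```

At a typical nine or ten total runs per game, both dynamic exponents land near 1.9, which is why all four projections usually sit within a win or two of each other for ordinary teams.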

Before coming on board here at MVN, I had meditated briefly on these formulae and their merits relative to each other, with the Smyth formula coming out the winner, if only by a tiny margin.  In evaluating any estimator, there are two important questions to answer: how closely does it predict the observed values (in this case, the teams’ actual winning percentages), and are the mistakes (in statistics-speak, residuals) in some way biased?  In my original post, I found that the residuals were essentially centered around zero (very good!) and the standard deviation of the residuals for all four of the formulae was somewhere in the neighborhood of 4.3 wins.  Additionally, the residuals all showed a minimal amount of skew.

There are a few more residual diagnostics to run to check for any additional biases in the estimators.  For example, if an estimator over-estimates the winning percentages of good teams but under-estimates the winning percentages of bad teams (or vice versa, for that matter), then it has a built-in bias.  Along with being accurate no matter the team quality, an estimator should work no matter how many games were played in the season and no matter how many runs the team scored and/or gave up.

I used the Lahman database for this one and selected all teams that played at least 100 games.  This gave me a database of 2,370 team-seasons to work with.  I calculated the projected winning percentages for the Pythagorean, Exp 1.82, Davenport, and Smyth formulae, and then subtracted each of them from the actual winning percentage to get the residual for each.

I calculated (well, OK, my computer calculated them) correlation coefficients between the residuals of each formula and the following variables: games played, runs scored per game, runs allowed per game, wins, and actual winning percentage.  None of the formulae’s residuals were correlated with games played.  There were small correlations between the original Pythagorean formula’s residuals and runs scored per game (.106) and runs allowed per game (-.071).  No other such correlations were observed.  Those correlation values were statistically significant, although rather small in magnitude.
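The diagnostic itself is nothing exotic: it is just a Pearson correlation between each formula’s residuals and each team-level variable. Here is a minimal sketch, using a handful of made-up team-seasons in place of the 2,370 from the Lahman database:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Made-up team-seasons: (wins, actual winning %, estimated winning %)
teams = [(95, 0.586, 0.570), (88, 0.543, 0.538), (81, 0.500, 0.502),
         (74, 0.457, 0.463), (67, 0.414, 0.428)]

wins = [w for w, _, _ in teams]
# Residual = actual minus estimated winning percentage
residuals = [actual - est for _, actual, est in teams]
print(round(pearson_r(wins, residuals), 3))
```

In these fabricated numbers the estimator shorts the good teams and flatters the bad ones, so the residuals track wins almost perfectly; the sign of the correlation depends on which direction the bias runs. An unbiased estimator should produce a correlation near zero.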

The biggest finding in my analyses was that the residuals from the Exp 1.82, Davenport, and Smyth formulae were all correlated with wins and winning percentage.  The Exp 1.82 formula, likely the most-used and most-reported formula, showed correlation coefficients of -.346 and -.380, respectively.  The Davenport (-.253 and -.269) and Smyth (-.256 and -.273) correlation coefficients were lower, although still notable.  The original Pythagorean formula’s residuals had much lower correlations of -.095 and -.101.  These findings suggest that Exp 1.82, Davenport, and Smyth all share a bias: better teams are more likely to have their estimated winning percentages come in below their actual winning percentages, while poor teams are more likely to have their estimates come in above them.

If the previous sentence made your head spin, here it is in English with numbers made up on the spot for pure illustrative purposes:  Let’s say that a team won 94 games in the year in question.  The Exp 1.82, Davenport, and Smyth formulas are more likely to be wrong in the direction of saying that the team should have won fewer games (91).  A poor team that won 61 games is more likely to have their projection be much higher (perhaps 65).

So what?  Since these formulas became popular, the differences between the projections and the actual results have been taken as indicators of such things as manager ability.  (A less-than-proper use of the formula, in my opinion, but it is the common application.)  If a team wins more than its projection, the manager must be doing a good job, because he’s maximizing runs at the proper time to win games.  If a team wins fewer than projected, the manager might be fired.  If the formulas are biased, though, some of the credit and blame being passed along due to them may be a statistical artifact.  The bias built into the formula would make the manager of a last-place team look like he is under-performing, even as he has to answer to the GM for having just lost 101 games.  On the other hand, the manager of a successful team is more likely to look like he is over-performing, and maybe he will get a nice contract extension and raise out of it.  Managers on bad teams look even worse, and managers on good teams look even better.

It looks like the Pythagorean estimators need a little bit of tinkering.  They don’t need to be thrown out.  In fact, to the contrary, they perform exceptionally well overall.  The bias I identified is going to be most noticeable at the extremes, which is a common problem in estimators of this type.  Analysts just need to be a little more careful in interpreting the results in those cases.

Remember: Even the Scarecrow didn’t get the Pythagorean theorem exactly right on the first try.