Double is the new walk

Is there a book that has been interpreted in more ways than Moneyball?  (Maybe this one.)  After its publication, people who didn’t really have a grasp of Sabermetrics or baseball analysis in general read the book (because everyone else was reading it) and didn’t get it.  I believe that everyone reading this article, if you’ve ever disclosed your Sabermetric leanings, has had that one friend who said “What’s so great about walks?  And why are you so obsessed with Kevin Youkilis?”  Philistines!

However, the book did speed up the OBP revolution that is slowly overtaking even mainstream baseball commentary.  (Personally, I look forward to the death of batting average.)  And therein was (part of) the point.  OBP is a better stat than batting average, but everyone paid attention to AVG.  What’s the difference between OBP and AVG?  Walks.  The point behind the book was that Billy Beane recognized this and found value where people weren’t looking.  But, something weird started happening.  Other teams picked up on the idea.  Sportscasters picked up on the idea.  Your friends got the idea that this was something worth considering.  Talking fluently about OBP showed that you were part of the avant garde crowd.  (Frankly, if you use words like avant garde to describe yourself, I reserve the right to smack you.)  Your friends didn’t understand why it was so important to know Smith’s OBP, but they knew that there was something cool about it and you knew all about it.  (Note: if you are a former college DJ, like me, just replace OBP with MCR.  Same basic idea.)

But like any avant garde thing, once too many people know about it, it’s not cool enough to talk about any more, and people move along looking for the next big thing that will be even edgier.  If you’re worried that you won’t look enlightened enough just from talking about the importance of walks and OBP (by now everyone’s read Moneyball, even Eric Seidman), then you need to know what the new walk is so that you can sound edgy and have people look up to you as someone whom they don’t fully understand, but must be cool.  And no, talking about OPS doesn’t make you cool any more either.

Yep, it’s the end of civilization as we know it.  I am giving instructions to people on how to be cool.

So let’s have some candidates for the “new walk.”  They should be:

  1. Easy to understand stats.  Nothing that requires too much calculating.
  2. Easily get-able online.
  3. Something that no one really talks much about, but is brilliant.  You know.  Edgy.

And the nominees are:

Doubles: Quick, who led the majors in doubles last year?  Don’t peek.  You have no idea, do you?  You know who led it in HR.  In the usual stats that we look at, doubles don’t really show up very well.  AVG and OBP treat a double like a single (SLG is at least a little nicer), plus a double is often a homerun that just missed clearing the wall, or missed going out by five feet.  In our culture, we don’t like “just missed” so the double gets de-valued.  Close only counts in horseshoes and atomic weapons.  True, a double is not a homerun, but it’s much better than a single.  The problem is that a guy who has a .280 average but specializes in doubles is a better hitter than the guy who hits .280 and is a singles hitter, and AVG can’t tell them apart.  Give a quick look to how many doubles (and triples) a player has.

Close lead protection rate: We know.  We know.  K-Rod saved five million, seven hundred and twenty-two thousand, eight hundred and ninety-three games last year.  And that’s nice.  He was not the best reliever in baseball last year.  I’d even say he wasn’t the best reliever in the American League.  K-Rod benefitted from playing on a team built for saves.  I’ll spare the whining about how the save rule has ruined baseball (so passé!).  What do we really want to know about a reliever?  When he was stuck into a tense situation, did he protect the lead?  But it really doesn’t matter if it was a save situation or not.  This is why the “hold” was created (and spent some time as the edgy way to talk about middle relievers).  Let’s look at the number of times that a pitcher saved a lead or held a lead, and then how many times he could have, but blew the lead: (SV + Hld) / (SV + Hld + BS).  Trust me, your friend has never thought of this and you will sound amazingly smart.
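If you want to see the arithmetic spelled out, here is a rough sketch of the rate as a small Python function.  The function name and the sample line of 30 saves, 10 holds, and 5 blown saves are made up for illustration; they are not any real pitcher's numbers.

```python
def close_lead_protection_rate(saves, holds, blown_saves):
    """(SV + Hld) / (SV + Hld + BS), or None if the pitcher had no chances."""
    chances = saves + holds + blown_saves
    if chances == 0:
        return None  # no save/hold opportunities, so the rate is undefined
    return (saves + holds) / chances


# Hypothetical reliever: 30 saves, 10 holds, 5 blown saves.
rate = close_lead_protection_rate(30, 10, 5)
print(f"{rate:.3f}")
```

A pitcher with no saves, holds, or blown saves gets no rate at all rather than a zero, which keeps mop-up guys from looking like lead-blowers.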

tRA: True run average.  Or what would happen if Earned Run Average got a clue.  Get to know this one.  It’s not as easy or intuitive to calculate as some of the other stats.  But it’s good… and you’ll be able to “call” guys who are pitching way above their heads (or beneath their true talent) before your friends.

Defense: There once was a time when people didn’t stop to think about defense.  Sure, there were guys who were fun to watch in the field, and there was the idea that they might be saving the team a few runs with their glove.  Now, there are several different defensive systems to choose from.  All the cool people have developed one.  Pick your favorite.  But understand that a player who is 60 runs above replacement/league average/some arbitrary line I drew in the statistical sand with his bat, but 30 below with his glove is really only a 30 run player.  Wonder why Adam Dunn took so long to sign?  He does have an outstanding bat.  But, he’s a butcher in the field and people now fully realize that.  And the market is starting to price those guys more accurately.  You don’t even have to know how the numbers are actually calculated.  Just know that they exist, and that a good offensive player may be giving back a lot of those runs in the field.

Learn a few of these, and you’re guaranteed to sound smart in front of your friends.  And maybe you’ll learn a little.

The measure of a man, Part II

In part one, I created a four-factor structure to describe a player’s offensive abilities (rather than his performance).  So… do these four factors tell us anything interesting about a player’s performance?  After all, it’s nice to have little mathematical abstractions, but what’s the practical value, you may ask.  The four factors were the Ichiro (grounders/speed) to Ryan Howard (flyballs/power) continuum, contact skills, risk-taking, and solid contact.  Do they predict anything useful?

To answer that question, I looked at it from a few different angles.  First, I calculated the factor scores for everyone who had more than 100 PA in 2008.  Then, I calculated some basic performance measures (1b%, xbh%, hr%, k%, bb%, hr/fb, obp, slg, ops).  Then, I started with some basic correlations to see what was related to what.

Ichiro-Howard (higher numbers mean more Ichiro than Howard) correlates rather well with single rate (r = .495) and hr rate (r = -.636).  Makes sense that slappy hitters hit singles and power hitters hit homeruns.  Power hitters were also more likely to hit their flyballs out of the park (r = -.485) and to have a higher SLG (r = -.507), although the effect for OBP was not as pronounced (r = -.203).

Contact skills are, no shock, correlated with not striking out (r = -.822!!!), although they didn’t correlate well with things like OBP (r = .208) or SLG (r = -.034).  It’s not that contact hitters are better or worse at getting on base, just at not striking out.  However, they are more likely to be singles hitters (r = .516). 

Risk taking wasn’t correlated with much of anything.  The biggest correlation I got was with walk rate (r = -.332, which was the highest correlation found for walk rate).  More on risk a little later.

Solid contact, however, was a pretty good measure of xbh% (r = .389, may not seem like much, but it was the best correlate by far!), but was an even better predictor of OBP (r = .514) and SLG (r = .477).

There was one other variable that I created that, unto itself, was not correlated with any of the outcome measures.  I squared the values for the Ichiro-Howard continuum.  It then becomes a measure of extremism in approach.  A player who is a strict slap and run guy (like Ichiro), who has a score of 2.07 (scale mean = 0, SD = 1), is an extreme case, and so when we square his number, it will be rather big.  The same goes for a guy like Joe Crede (-2.02), who is extreme in the other direction.  A-Rod, however, is actually a very well-balanced player between the two ends of the spectrum (.024), so his squared number will be rather small.

Now, the thing about baseball skills is that skills build off one another.  If you can hit the ball a mile, but can’t make contact, you might as well not have the power.  We need moderator analyses to see whether the interaction of two skills predicts anything interesting.  A quick math review: to test for a moderator, set up a linear regression that includes the two variables that you think moderate one another, plus the two variables multiplied by one another (the interaction term).  If the interaction term is significant, you have a moderator.  Then, it’s a matter of figuring out what moderates what and how.  There are different types of moderator effects.  What a moderator means, though, is that one variable changes (in some way) the effect that the other has.
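For anyone who wants to see the mechanics, here is a minimal sketch of a moderator test in Python with NumPy.  It is not the analysis behind the numbers in this post: the data are simulated, the variable names are placeholders, and an interaction effect of 0.5 is planted in the fake outcome so that there is something to find.

```python
import numpy as np


def fit_moderator(x1, x2, y):
    """OLS of y on x1, x2 and x1*x2; returns (coefficients, t statistics)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept, both main effects, and the interaction term.
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof              # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # coefficient covariance matrix
    t_stats = coef / np.sqrt(np.diag(cov))
    return coef, t_stats


# Simulated (not real) standardized skill scores and outcome.
rng = np.random.default_rng(0)
n = 500
contact = rng.standard_normal(n)
ichiro_howard = rng.standard_normal(n)
outcome = (0.2 * contact - 0.3 * ichiro_howard
           + 0.5 * contact * ichiro_howard + rng.standard_normal(n))

coef, t_stats = fit_moderator(contact, ichiro_howard, outcome)
print("interaction coefficient:", round(coef[3], 3))
print("interaction t statistic:", round(t_stats[3], 2))
```

The last coefficient is the interaction term.  A t statistic well beyond about 2 in absolute value is the usual sign that one skill is changing the effect of the other.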

Contact skills proved to be a very common moderator in these analyses, particularly moderating the Ichiro-Howard continuum, although the effects weren’t very neat.  For example, players who were more to the Ichiro end of the continuum, if you raise their contact skills, it doesn’t move their walk rate very much.  But for guys on the Howard end, a jump in contact skills means a lower walk rate.  The effect is more pronounced for the Ichiro-squared numbers.  The best walk numbers are for those who are balanced in their approach (not too close to either Ichiro or Howard), but don’t make a lot of contact.  The worst are the extreme guys who make a lot of contact.

Contact skills also moderate the effects of extra base hits and HR/FB, depending on what sort of hitter you are.  If you’re a Howard, an increase in contact percentage will drive down your HR/FB rate, but will drive up your extra base hit rate.  There are some guys who are built to hit doubles on their fly balls, not HR.  Because they have high levels of contact skills, they won’t strike out as much in general.  Consider this list: D. Wright, Morneau, Ibanez, McLouth, Pujols, Millar, Kinsler, McCann, Carlos Lee, and Lowell.  They are guys who are in the top 25% of the league on both the Howard end of the I-H continuum and in contact skills (top ten in plate appearances on that list of 20).  Outside of Pujols, who is just amazing, they all have reputations as guys who are good, but not great power hitters, but who are good for 25 HR over a season… and 40 doubles.  And they all have strikeout rates at 15% or below.  Not a bad profile to have.

One other interesting effect of contact skills.  If you’re someone who is in the middle of the Ichiro-Howard continuum, as contact skills rise, you see a slight bump up in OBP, although it’s pretty high to begin with (around .330).  But, if you’re at one of the extreme ends (either a major GB hitter or a major FB hitter), if you have limited contact skills, your OBP is likely to be south of .320.  If you have good contact skills, it jumps up past .340 on average.  So, you can be a groundball hitter, you just have to be able to make contact, and be happy with a lot of singles.  However, on the flip side, SLG has a nearly opposite pattern.  Guys with high contact skills are generally not going to be huge SLG guys.  However, guys who are in the middle of the Ichiro-Howard continuum plus low contact skills (apparently trading contact for power) see a huge jump in their SLG.  Guys who are extreme (either in the GB or FB direction) don’t get that bump from sacrificing contact for power.  So, if you want to be an OBP hitter, be someone who is extreme in his approach with good contact skills.  If you want SLG, be someone who is middle of the road in his approach with bad contact skills.  If you want both, be Albert Pujols.

Then, there are the other two variables, risk and solid contact, which seem to moderate one another on a couple of occasions as well.  Players who take fewer risks generally strike out less than those who take more.  But those who make solid contact when they swing strike out even less than that, by a couple of percentage points.  Guys who wait back and don’t swing a lot, but make good contact when they do, are less likely to strike out.  It works for extra base hits too.  If you don’t have much solid contact power, you won’t have many XBH’s.  If you do, you’ll have more XBH’s if you’re a low risk-taker rather than a high risk-taker.  There’s something to be said for waiting back for your pitch.  The guys who swing all over the place are probably more fun to watch.  But the guys who are patient are the ones who will get the benefit from their skills.

So, it looks like these skills do interact with one another to predict some useful player typologies.  Can they combine to actually predict outcomes?  If I know a player’s skill set, how well can that be used to predict his actual performance?  I took all four factors and all six interaction terms and threw them into a stepwise regression (just to keep things clean) to see what fell out, and regressed the following outcomes on those ten variables.  I’ve listed the dependent variable, the significant predictors in the final model (in order), and the final R-squared for the model.

  • K rate: contact, risk, Ichiro-Howard; .747
  • BB rate: risk, Ichiro-Howard, solid contact, contact, contact x risk; .209
  • HR/FB: Ichiro-Howard, contact, solid contact, risk, IH x risk, IH x contact; .497
  • HR rate: Ichiro-Howard, contact, risk, solid contact, IH x risk, IH x contact; .580
  • 1B rate: contact, Ichiro-Howard, solid contact; .565
  • XBH rate: solid contact, Ichiro-Howard, IH x contact, IH x risk; .248
  • OBP: solid contact, risk, contact skills, Ichiro-Howard, IH x contact, contact x risk; .413
  • SLG: Ichiro-Howard, solid contact, risk; .499

For a bunch of these outcomes, I’m picking up the majority of the variance with my four factors (and their interactions).  Remember, that’s R-squared, so even that lousy little .209 is really a correlation of .46.  Extra base hits and walks, I’m at a loss to really explain for now.  Seems that sometimes the ball just finds a hole in the outfield.  However, walk rate is a pretty consistent stat from year to year and in a split-half framework.  Perhaps the ability to draw walks is its own animal, not related to anything presented here?
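The R-squared-to-correlation conversion is just a square root.  Here is a quick sketch in Python that applies it to the eight R-squared values listed above; the dictionary layout and names are only for illustration.

```python
import math

# R-squared values from the list of stepwise regressions above.
r_squared = {
    "K rate": 0.747, "BB rate": 0.209, "HR/FB": 0.497, "HR rate": 0.580,
    "1B rate": 0.565, "XBH rate": 0.248, "OBP": 0.413, "SLG": 0.499,
}

# The multiple correlation between predicted and actual values is sqrt(R^2).
implied_r = {name: math.sqrt(r2) for name, r2 in r_squared.items()}

for name, r in implied_r.items():
    print(f"{name:9s} R^2 = {r_squared[name]:.3f}  ->  r = {r:.2f}")
```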

In Part III, we’ll look at how consistent these metrics are, and how players age with respect to each of them.

BABIP Projection, Batted Ball Types, and Interaction Terms


This is my first post at StatSpeak, and I am excited to join the excellent StatSpeak crew.  I am an Economics Ph.D. student and a Phillies fan, which affects my ability to analyze baseball objectively in a positive and negative way, respectively.  Most of my baseball research is empirical, despite the fact that my dissertation is actually theoretical, but I will occasionally post general economic analysis of baseball decision making.  While this post will be a continuation of some of my older research, I will summarize some of my previous results and hyperlink a few things as needed.

 

Voros McCracken introduced the concept of Batting Average on Balls in Play (BABIP) nearly a decade ago, when he suggested that pitchers may not vary in their ability to control it.  There are clear year-to-year correlations with respect to a pitcher’s ability to control homeruns, walks, and strikeouts, but the correlations were smaller or non-existent for BABIP.  A slew of research followed, in an attempt to determine exactly how much pitchers do control BABIP.  What was agreed upon within the sabermetric community was that you can learn the vast majority of what you need to know about a pitcher by studying his ability to affect the Three True Outcomes (HR, BB, K).  The first thing I do when I analyze a pitcher is to look at his Defense Independent Pitching Statistics (DIPS).  Hitters certainly do exhibit stronger correlations with respect to these outcomes than they do with respect to BABIP, but BABIP skill is clearly a real thing for hitters, and a large portion of a hitter’s value derives from his ability to control BABIP.  Last year, 70% of major league plate appearances resulted in a ball in play.  Trying to determine how valuable a hitter will be requires some model of predicting his BABIP, even if not explicit.

 

A few years ago, Dave Studeman introduced a couple of different ways to approximate a hitter’s BABIP.  First, he suggested simply taking line drive rate and adding .120.  Later, he suggested a regression using groundball rate, line drive rate, and strikeout rate.  This spawned a lot of research on hitters’ BABIP, and this baseball off-season has seen a flurry of excellent research on the topic.
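As a quick illustration of the first rule of thumb (line drive rate plus .120), here is a tiny Python sketch.  The function name and the 20% line drive rate are invented for the example, and the rate is assumed to be expressed as a fraction of balls in play.

```python
def quick_expected_babip(line_drive_rate):
    """Rule-of-thumb expected BABIP: line drive rate (as a fraction) + .120."""
    return line_drive_rate + 0.120


# Hypothetical hitter with a 20% line drive rate.
estimate = quick_expected_babip(0.20)
print(f"{estimate:.3f}")
```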

 

Perhaps the most widely read articles have been an article by Chris Dutton and Peter Bendix in which they introduced a regression formula illustrating a number of strong correlates with BABIP, and an article by Derek Carty which declared that an updated version that Dutton had done of that formula slightly beat Tom Tango’s Marcel projections’ BABIP estimates and outperformed the model Studeman introduced a few years ago.

 

I have written a couple of articles as well over at www.thegoodphight.com, where I have been posting all of my research until this article.  My first significant article on the topic, written in January, suggested that the way to analyze BABIP is to dissect BABIP by batted ball type, and I ran a few regressions and tested correlations to determine GBBABIP, FBBABIP, and LDBABIP.  The data set that I have been using is rather small (just the 224 hitters who managed 100 PA each year from 2005-2008), and I am working on acquiring a larger data set (if you have any way to get me this, please email swartzm@econ.upenn.edu and let me know), but I have been able to get a significant amount of information out of this small dataset.  In the first article, I developed regression models for each batted ball type, and found the following list of independent variables for each regression:

Groundballs’ BABIP (GBBABIP):

–GBBABIP (positive)

–Infield hit rate (a more repeatable skill within GBBABIP, positive)

–Contact rate (as defined on fangraphs.com, the percent of pitches that a hitter swings at which he makes contact with, positive)

Flyballs’ BABIP (FBBABIP):

–Infield fly rate (negative)

Line drives’ BABIP (LDBABIP):

–Ln(HR/AB) (positive)

I also noted that GB% itself is positively correlated with GBBABIP, and that FB% is negatively correlated with FBBABIP.  This will be related to the subject of my post today, as I will be introducing interaction terms into my regression.

 

In my second article a few weeks ago, I developed a larger regression formula for BABIP using this knowledge, and developed a prediction method for using one year of data and another method for using three years of data to improve the existing methods for predicting BABIP.  Using the 121 hitters who were able to get 300 PA in each year from 2005-2008, I developed a regression model that was able to achieve a .63 correlation with actual BABIP using only a few regressors:

–GB% (line drive rate was insignificant in this regression as the other statistics proved to be more reliable)

–Natural Log of HR/AB

–GBBABIP

–IFFB%

–Outfield flyball BABIP

–Natural Log of Contact rate (again, as defined by fangraphs.com)

 

 

I also developed a model for determining expected BABIP using one year of data.  Using the 148 hitters in my dataset who were able to get 300 PA in both 2007 and 2008, I was able to generate an expected BABIP for 2008 from 2007 data that had a .53 correlation with actual 2008 BABIP.  The regressors that I used included:

–LD%

–GB%

–Natural log of HR/AB

–IFFB%

–Outfield flyball BABIP

–Natural log of Contact% (as defined by fangraphs.com)

–Spray (as defined by Dutton and Bendix, the absolute value of LF%-RF% for hit location)

–Dummy variables for handedness

 

The same model applied to 2005/2006 data was able to yield a correlation of .54, but the model using 2006/2007 data was only able to yield a correlation of .38.  I found this surprising, and I am curious what may be causing this, whether it is noise or something else.  My personal belief is that some of this may be teams responding to the massive amount of new information that became available from 2005-2006 and positioning their defenses accordingly.  As league-wide BABIP was actually higher in 2007 (.303) than in 2005 (.295), 2006 (.301), or 2008 (.300), I am not sure if this theory tells the whole story.

The point that I am trying to make is that the very reason that BABIP was developed in the first place was that it was not defense independent.  It was intended to be segregated from the Three True Outcomes on the basis that defenses affected it.  It is true that some of this was a way of saying, “There is some luck involved in whether you hit the ball at people or between them, but not as much luck involved in whether you swing and miss,” but some of it is that hitters hit the ball in certain places, at certain trajectories, and baseball teams budget large sums of money to determine where those places are and put fielders there.  Then hitters train themselves to hit the ball where the fielders are not.  In fact, this is the reason why Dutton and Bendix’s spray variable comes up as significant in so many regressions: hitters who spray the ball across the field are able to avoid fielders all clustering on one side of the field to defend against them.  It also justifies the introduction of interaction terms in the regressions, and you will see that these come out as significant and improve BABIP prediction.

As I was thinking about introducing interaction terms, I realized how appropriate it was to include them.  If I am going to use batting average by batted ball type, I should acknowledge that each of those terms has varying levels of usefulness depending on how frequently those batted balls are hit.

Consider this BABIP equation (ignoring bunts):

BABIP = GB%*GBBABIP + FB%*FBBABIP + LD%*LDBABIP

One could also say:

BABIP = GB%*GBBABIP + FB%*(1-IFFB%)*OFFBBABIP + LD%*LDBABIP

(where OFFBBABIP is Outfield Flyball BABIP.)

And therefore:

BABIP = GB%*GBBABIP + (Outfield flyball hits)/(Total Balls in Play) + LD%*LDBABIP
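To check that the three versions of the equation really are the same thing, here is a small Python sketch with made-up batted ball counts for one hitter.  Every number below is hypothetical, and the second and third forms lean on an assumption that is only implicit in the equations: infield flies never fall for hits, so all flyball hits are outfield flyball hits.

```python
# Hypothetical batted ball counts (balls in play only, bunts ignored).
gb, gb_hits = 200, 48        # groundballs and groundball hits
fb, iffb = 150, 15           # all flyballs in play, and infield flies among them
of_fb_hits = 20              # hits on outfield flyballs (assumed: no infield fly hits)
ld, ld_hits = 80, 58         # line drives and line drive hits

bip = gb + fb + ld           # total balls in play
gb_pct, fb_pct, ld_pct = gb / bip, fb / bip, ld / bip
gbbabip, ldbabip = gb_hits / gb, ld_hits / ld
fbbabip = of_fb_hits / fb                  # all flyball hits come from the outfield
iffb_pct = iffb / fb                       # infield flies as a share of flyballs
offbbabip = of_fb_hits / (fb - iffb)       # outfield flyball BABIP

form1 = gb_pct * gbbabip + fb_pct * fbbabip + ld_pct * ldbabip
form2 = gb_pct * gbbabip + fb_pct * (1 - iffb_pct) * offbbabip + ld_pct * ldbabip
form3 = gb_pct * gbbabip + of_fb_hits / bip + ld_pct * ldbabip
direct = (gb_hits + of_fb_hits + ld_hits) / bip

print(round(form1, 4), round(form2, 4), round(form3, 4), round(direct, 4))
```

All four numbers agree, which is the point: the flyball term collapses into outfield flyball hits per ball in play, the quantity that shows up as a regressor later on.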

As line drive rate itself does not have that strong a year-to-year correlation, I did not even use it in the regression where I used multiple years of data.  So I developed a regression for hitters using 2005-2007 data to predict 2008 BABIP using the following regressors:

–GB%

–GBBABIP

–GB HITS/TOTAL BALLS IN PLAY (GBHITP)

–IFFB%

–OF HITS/TOTAL BALLS IN PLAY (OFHITP)

–NATURAL LOG OF HR/AB

–NATURAL LOG OF CONTACT RATE

This regression had an R-squared of .4324, meaning that the actual BABIP for 2008 and the expected BABIP for 2008 had a correlation of .66, beating my previous correlation of .63, and with a higher adjusted R-squared (to account for the additional variables) as well.
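For readers who have not seen the adjustment before, here is a short Python sketch of the standard adjusted R-squared formula applied to this regression, which has 121 hitters and 7 regressors.  The function name is just for illustration.  Because the inputs are rounded, the result can differ from the regression output in the last digit.

```python
import math


def adjusted_r_squared(r2, n_obs, n_regressors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)


r2 = 0.4324
correlation = math.sqrt(r2)                 # actual vs. expected BABIP
adj_r2 = adjusted_r_squared(r2, n_obs=121, n_regressors=7)

print(f"correlation = {correlation:.2f}, adjusted R^2 = {adj_r2:.4f}")
```

The penalty grows with each added regressor, so a higher adjusted R-squared is some evidence that the extra terms are earning their keep rather than just soaking up noise.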

Here is the output for that regression:

Source      SS          df     MS            #Obs        121
Model       0.049269    7      0.007038      F(7,113)    12.3
Residual    0.064663    113    0.000572      Prob>F      0
Total       0.113932    120    0.000949      R^2         0.4324
                                             Adj R^2     0.3973
                                             RMSE        0.02392

tbabip08        Coef.       Std.Err.    t        P>|t|    95% CI min    95% CI max
gbpavg          0.933799    0.332795    2.81     0.006    0.274473      1.593125
gbbabipavg      1.476632    0.576233    2.56     0.012    0.33501       2.618254
gbhitpavg       -2.88594    1.319034    -2.19    0.031    -5.49918      -0.27269
iffbpavg        -0.36193    0.072787    -4.97    0        -0.50613      -0.21772
ofhitpavg       0.717482    0.263895    2.72     0.008    0.194658      1.240306
loghraavg       0.013685    0.004989    2.74     0.007    0.003802      0.023569
logcontact~g    0.160608    0.044852    3.58     0.001    0.071748      0.249468
_cons           -0.08084    0.146061    -0.55    0.581    -0.37021      0.208531
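To show how these coefficients turn into an expected BABIP, here is a hedged sketch in Python.  The coefficients are copied from the regression output; everything about the sample hitter is hypothetical.  The sketch also assumes that the rate variables are fractions rather than percentages, that the logs are natural logs, that the inputs are 2005-2007 averages, and that the truncated variable name is spelled out as logcontactavg.

```python
import math

# Coefficients from the 2005-2007 regression output ("_cons" is the intercept).
COEFS = {
    "_cons": -0.08084,
    "gbpavg": 0.933799,         # GB%
    "gbbabipavg": 1.476632,     # GBBABIP
    "gbhitpavg": -2.88594,      # GB hits / total balls in play
    "iffbpavg": -0.36193,       # IFFB%
    "ofhitpavg": 0.717482,      # OF hits / total balls in play
    "loghraavg": 0.013685,      # ln(HR/AB)
    "logcontactavg": 0.160608,  # ln(contact rate); name expansion is assumed
}


def expected_babip(inputs):
    """Intercept plus the sum of coefficient * input for each regressor."""
    return COEFS["_cons"] + sum(COEFS[name] * value for name, value in inputs.items())


# Hypothetical hitter (all rates as fractions).
hitter = {
    "gbpavg": 0.45,
    "gbbabipavg": 0.24,
    "gbhitpavg": 0.45 * 0.24,
    "iffbpavg": 0.10,
    "ofhitpavg": 0.09,
    "loghraavg": math.log(0.03),
    "logcontactavg": math.log(0.80),
}

prediction = expected_babip(hitter)
print(f"{prediction:.3f}")
```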

 


Interestingly, GBHITPAVG, which is GB%*GBBABIP, has a negative coefficient, meaning that those hitters who historically had high groundball rates did not get as much of a positive effect from historically high GBBABIPs as those hitters with low groundball rates.  Perhaps groundball hitters give defenses an opportunity to see where to play, and historical success erodes.  Alternatively, there may be another variable here that is causing an effect, and I’m just not thinking of it or don’t have access to it.

What was also interesting was that OFFBBABIP (outfield flyball BABIP) was no longer significant, and I removed it from the regression, as outfield hits per total balls in play seemed more relevant.  I do not have a great hypothesis for why this is, but I did find it interesting and worth noting.  If nothing else, I guess it means that getting hits via outfield flyballs is a persistent skill, but having a higher share of the balls that reach the outfield land for hits is not.  That does make some intuitive sense to me.  The general philosophy that many have with respect to BABIP for pitchers (that sometimes the ball is hit at people and sometimes it is hit between them) applies to hitters in some sense as well.  It’s just a matter of getting the ball to the outfield, not necessarily about hitting the ball in the gaps or being able to dunk a flyball in front of an outfielder.

I also developed a regression for 2008 using 2007 data using the following regressors:

–LD%

–GB%

–OFFBBABIP

–OFHITS/TOTAL BALLS IN PLAY

–NATURAL LOG OF HOMERUN RATE

–NATURAL LOG OF CONTACT RATE

–SPRAY

–SWITCH HITTER DUMMY VARIABLE

 

This had an R-squared of .3090, meaning that actual and expected BABIP had a .56 correlation instead of a .53 correlation as in my previous model, and it had an improved adjusted R-squared as well.

Here is the output for that regression:

Source      SS          df     MS            #Obs        149
Model       0.041241    8      0.005155      F(8,140)    7.83
Residual    0.092228    140    0.000659      Prob>F      0
Total       0.133469    148    0.000902      R^2         0.309
                                             Adj R^2     0.2695
                                             RMSE        0.02567

tbabip08        Coef.       Std.Err.    t        P>|t|    95% CI min    95% CI max
ldp07           0.551839    0.12061     4.58     0        0.313386      0.790291
gbp07           0.436318    0.10191     4.28     0        0.234837      0.6378
loghra07        0.009712    0.004291    2.26     0.025    0.001229      0.018195
offbbabip07     -0.64351    0.263342    -2.44    0.016    -1.16415      -0.12287
ofhitp07        2.222809    0.716896    3.1      0.002    0.805467      3.64015
logcontact07    0.045724    0.038554    1.19     0.238    -0.0305       0.121947
spray07         -0.06485    0.031401    -2.07    0.041    -0.12693      -0.00277
shb             0.009259    0.005983    1.55     0.124    -0.00257      0.021087
_cons           0.041088    0.062641    0.66     0.513    -0.08276      0.164934

 


Here, outfield flyball BABIP came up as negative and outfield hit percentage came up positive.  This is surprising, given the insignificance of the term for the regression using more data, but perhaps it does indicate some of the same effect: being able to get the ball to the outfield in the air is a skill, but those hitters who got it to fall in more were just lucky.

So who are the hitters we would expect to have the best BABIPs in 2009?


1–Chipper Jones: .356

2–Joe Mauer: .356

3–Derek Jeter: .342

4–Magglio Ordonez: .341

5–Derrek Lee: .335

6–Dmitri Young: .335

7–Gary Matthews Jr.: .334

8–Orlando Hudson: .333

9–Jorge Posada: .333

10–Brian Roberts: .332

 

Also, the hitters who the model expects to have the biggest change in BABIP from 2008 to 2009:

Name                 BABIP’08    E(BABIP’09)
Corey Patterson      .215        .278
Gary Matthews        .292        .334
Gary Sheffield       .237        .282
Jason Michaels       .258        .300
Jose Vidro           .243        .312
Luis Castillo        .267        .326
Robinson Cano        .283        .329

 

And due for a drop?

Name                 BABIP’08    E(BABIP’09)
Dioner Navarro       .318        .264
Manny Ramirez        .370        .319
Miguel Olivo         .310        .257
Milton Bradley       .388        .316
Nick Punto           .335        .290
Reed Johnson         .360        .308
Ryan Doumit          .333        .282

 

It’s pretty clear that it helps to consider how often hitters hit each type of batted ball when considering BABIP by batted ball type.  I believe that this article is a movement in the right direction.  I am not done with hitters’ BABIP, and I am eager to hear criticisms and suggestions for future research.  I think that this research is important and can help us improve BABIP projection and projection for hitters in general.  Please feel free to leave questions or comments, or contact me at swartzm@econ.upenn.edu as well.

Bringing home the BACON

Let’s talk about contact.

I’m sure you’re familiar with the concept of BABIP. Now, instead of looking at balls in play, let’s look at contact. (The difference is that we include home runs – excluding HR may make sense for pitchers, but in some contexts, not for hitters.) To that end, let’s introduce two “new” stats, or at least two stats that have probably been invented before that I’m giving names to:

BACON – Batting Average on CONtact. Yes, it’s pronounced “bacon.” This was unintentional, but freaking awesome.

SLGCON – SLuGging on CONtact. Sadly not a breakfast food.

I’ve adopted the terms batting average and slugging, but am not actually using them the way they’re traditionally defined. In this case, I’ve defined a hit as “reaching base safely on anything other than a fielder’s choice,” thus including reaching on an error. For slugging percentage, I’m using the furthest base attained by the batter on a hit (as defined above). So if a batter hits a single and reaches second on an error, he gets credited two bases. (If he gets thrown out at second, he gets an out, not a hit.)
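Here is a rough Python sketch of how the two stats could be tallied from those definitions. The function name and the ten sample events are invented for illustration, and it assumes the denominator is every plate appearance that ends with the ball being hit fair (home runs included), which is my reading of “on contact” rather than something spelled out above.

```python
def bacon_and_slgcon(bases_per_contact):
    """Return (BACON, SLGCON) from a list of bases attained per contact event.

    Each entry is the furthest base the batter safely reached on that contact:
    0 for an out (including a fielder's choice or being thrown out stretching),
    1-4 otherwise, with errors counting toward the total.
    """
    n = len(bases_per_contact)
    if n == 0:
        return None, None  # no contact events, so both rates are undefined
    hits = sum(1 for bases in bases_per_contact if bases > 0)
    total_bases = sum(bases_per_contact)
    return hits / n, total_bases / n


# Hypothetical hitter: ten contact events, including a single stretched to
# second on an error (2) and a home run (4).
events = [0, 0, 1, 2, 0, 4, 0, 1, 0, 0]
bacon, slgcon = bacon_and_slgcon(events)
print(f"BACON = {bacon:.3f}, SLGCON = {slgcon:.3f}")
```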

So what are we going to use these things for? Are we going to try to create the latest, greatest DIPS? Are we trying to project future hitter performance?

Nah. Let’s talk ‘roids.

I’m sure you know the conventional narrative on steroids:

  1. Baseball was a wholesome, pure sport, handed down to us from Valhalla (or Cooperstown, I guess) – perfect in every way, shape, and form.
  2. OMGZ ROIDS ROIDS ROIDS.
  3. Big Damn Sluggers use the home run to treat your pure, untouched childhood memories like the Vikings treated seaside English villages.

Act IV of this little shadow box play is apparently “turn baseball into competitive cycling,” which isn’t my concept of a good idea. But I digress. Here’s an interesting question: is it true? Let’s take a look at the change in SLGCON from 1988 to 1999:

slgcon_by_year.png

SLGCON is very stable before and after 1993, but it undergoes a massive jump at that point in time. This is essentially the great story behind the boost in HR rates during the so-called “steroid era.” Let’s zoom the graph out a bit (instead using HR/PA as our unit of measure):

hr_pa_graph.png

Check out the two regression lines there. We have essentially very stable HR/PA numbers both before and after 1993, but a massive jump at 1993. This is not an original observation, and seems to hold true no matter what measure of power hitting you care to use.

And now we come round to Dan Rosenheck’s recent NYT article.

Where numbers can be somewhat more useful is in alerting us to the fact that something unusual — steroids or otherwise — was going on around the turn of the last decade.

Assuming that pitchers and hitters used performance-enhancing drugs equally, there is no reason to expect that overall run-scoring levels would change.

But what would go up because of steroid use is the degree to which players’ on-field results are separated from each other. The juicers’ production should, on the whole, exceed the league average, while the clean players should lag.

This is exactly what happened between 1993 and 2004. Using the standard deviation, a common measure of how tightly a set of numbers is bunched together, performances by both hitters and pitchers were more spread out during that time than in any 12-year period since World War II. Although some of the difference was caused by adding new teams, expansion was much more rapid in the 1960s than the 1990s, and standard deviations were still lower back then.

Dan is correct that something unusual was – and is – going on. I don’t think it’s steroids, though. Look at the changes in the graph – the change in run-scoring environments was an event, not a process. It strains credulity to try to explain the data with steroids – you’d have to have all of your juicers start getting help from their cousins at roughly the same time, sometime during the 1993 season, to explain the events on the ground.

What about the standard deviations? Can we explain that as something other than the growing gap between the clean and dirty players? I think so – and I think we have to. First, I don’t think that the evidence sustains the idea that (as a whole) clean players were significantly worse than dirty players. The players who have tested positive for steroids since 2003 have largely been scrubs, not stars.

So what could explain the increase in variance along with the shift in slugging levels? I suspect that the increase in slugging levels is to blame. This has the benefit of being the obvious answer, but I suppose I should try and explain.

For whatever reason, it became a lot easier to hit baseballs hard and far right around the 1993 season. (The most likely culprit is actually the baseballs themselves, so from here on out we’ll simply refer to the “livelier ball.”) This livelier baseball was of greatest benefit to people who could hit the ball in the air to begin with. Let’s look at our on-contact numbers now, along with breakdowns by ground balls, air balls (fly balls, line drives and popups) and fly balls:


The predominant benefit of the livelier ball was to fly balls – an increase in on-contact slugging of .180! Ground balls saw a much more modest boost in on-contact slugging, only .018. That’s, well, 10 times the difference.

And that’s what I suspect is driving the increasing standard deviations; there was a league-wide change in offense that seems to favor power hitters, who were already the top talent in the league.
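To see why a boost that mostly helps fly balls would widen the spread between hitters, here is a toy sketch. Only the two boosts (.018 for grounders, .180 for fly balls) come from the numbers above; the baseline slugging figures and the four hitters’ fly ball shares are made up for illustration, and the model ignores line drives and popups entirely.

```python
from statistics import pstdev

# Hypothetical baselines; only the two boosts come from the post.
GB_SLG_BEFORE, FB_SLG_BEFORE = 0.250, 0.600
GB_BOOST, FB_BOOST = 0.018, 0.180

# Made-up share of each hitter's contact that is a fly ball.
fly_ball_shares = [0.30, 0.40, 0.50, 0.60]


def slgcon(fb_share: float, gb_slg: float, fb_slg: float) -> float:
    """On-contact slugging as a weighted mix of grounders and fly balls."""
    return (1 - fb_share) * gb_slg + fb_share * fb_slg


before = [slgcon(f, GB_SLG_BEFORE, FB_SLG_BEFORE) for f in fly_ball_shares]
after = [
    slgcon(f, GB_SLG_BEFORE + GB_BOOST, FB_SLG_BEFORE + FB_BOOST)
    for f in fly_ball_shares
]

spread_before = pstdev(before)
spread_after = pstdev(after)
```

In this toy league nobody changed his approach, yet `spread_after` is larger than `spread_before`, simply because the gap between fly ball and ground ball value grew. That is the mechanism I have in mind, not a measurement of it.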

The other interesting thing to note is that the ground ball rate increased, and the fly ball rate decreased, when we moved into the modern era. Why is that? I suspect (but haven’t tested yet) that it’s because as the fly ball became a dramatically more dangerous thing, teams and pitchers came to place greater emphasis on getting ground balls.

But the evidence for a massive, steroid-fueled change in baseball offense simply isn’t there. Rosenheck notes:

None of this means that steroids are necessarily the cause of the separation. But the game’s fans are probably in no mood to write off the association as mere coincidence.

But that’s almost certainly what it is – a coincidence.

Different Factors For Different Folks Part II

A little while ago, in Part I of this series, I looked at how there appeared to be different homerun park factors at work for high and low percentage HR hitters, when comparing their records in Japan and in the States.

Since then, I’ve had a chance to update my park factors with the release of RetroSheet’s 2008 events files. As part of that, I rounded each park’s homerun factor to the nearest 0.05, or 0.1 if greater than 1.35 or less than 0.70. I also coded all the batters from 1953 to 2008 for their career percentage of homeruns per batted ball.

AA   .080+
A    .060 – .080
B    .045 – .060
C    .035 – .045
D    .020 – .035
E    .010 – .020
F    .000 – .010
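A minimal sketch of the rounding and the letter coding described above, for readers who want to reproduce the bins. The function names are mine, and two details are assumptions: a rate sitting exactly on a boundary goes to the higher grade, and the 0.1 step is chosen from the raw (unrounded) factor.

```python
# Lower bound of each grade, highest first (HR per batted ball).
GRADE_FLOORS = [
    ("AA", 0.080),
    ("A", 0.060),
    ("B", 0.045),
    ("C", 0.035),
    ("D", 0.020),
    ("E", 0.010),
    ("F", 0.000),
]


def hr_grade(hr_per_batted_ball: float) -> str:
    """Map a career HR-per-batted-ball rate to its letter grade."""
    if hr_per_batted_ball < 0:
        raise ValueError("rate cannot be negative")
    for grade, floor in GRADE_FLOORS:
        if hr_per_batted_ball >= floor:  # assumed: boundary goes to the higher grade
            return grade
    raise AssertionError("unreachable: F has a floor of zero")


def round_park_factor(raw_factor: float) -> float:
    """Round to the nearest 0.05, or 0.1 for extreme parks."""
    step = 0.10 if (raw_factor > 1.35 or raw_factor < 0.70) else 0.05
    return round(round(raw_factor / step) * step, 2)
```

For example, a batter at the league mean rate of .040 grades out as a C, and a raw park factor of 1.03 lands in the 1.05 row.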

The below chart cross references these ratings of batters and MLB ballparks, and shows the observed HR factor for each combination. The colors indicate the sample size, with dark green above 50,000 batted balls; light green 30,000-50,000, yellow 15,000-30,000 and orange below 15,000.

C is centered on the current mean rate of .040. As you can see in the chart, the C batters’ HR factor (road rate divided by home rate) was just about the same as the factor for all batters. D, E and F had ratios increasingly further from 1 (affected more by the park) while B, A and AA batters had ratios increasingly closer to 1 (affected less).

This larger scale study agrees with my earlier study of Japanese batting stats, where it is generally acknowledged that the JPB parks as a group are a much easier HR hitting environment than MLB. By how much? In Part I, I created five groups of homerun hitters. The highest group had a JPB/MLB factor of 1.18, while the lowest had a factor of 2.27. On the chart, this corresponds to the Japanese parks as a group having a HR factor of 1.40-1.50 compared to MLB parks.

The main thing I wanted to get out of this study was a more precise way of measuring how each ballpark changed the homerun rates. Unfortunately, I haven’t wrapped my head around those numbers yet, which is part of the reason this article has remained a draft for several weeks. What we know going in is how many homeruns were hit in each ballpark, and who hit those homeruns. Some players hit more homeruns than others, but how much of that is due to their own talent at hitting a baseball a long way, and how much of it was the dimensions of the ballpark they played in? I have verified that players who hit a lot of homeruns are much less affected by their ballparks than players who hit few, but I have to avoid the circular logic of having to know what a hitter’s HR% is in order to calculate his HR%. Perhaps the answer is something along the lines of calculating a player’s personal home/road factor, and then comparing that to the factors of the parks he played in.

While I’ve been pondering this, Greg Rybarczyk of Hit Tracker posted an article at Baseball Analysts offering a new approach using detailed batted ball data. Going forward, this is an approach I favor – look at the trajectory (distance, direction and speed off bat) and type (grounder, flyball) for each ball hit in each ballpark. Each classification of batted ball will have its own set of probable outcomes in each ballpark. Put a batter in a different set of home and road parks, and calculate how much the expected outcome changes based on those details of how each ball was actually hit. However, when looking back at past seasons, we still need to fine tune the normalization of batting stats with the data that’s available.

HR Factors by overall factor of ballpark vs career HR% of batter

Factor   AA     A      B      C      D      E      F
0.30     0.52   0.58   0.40   0.31   0.37   0.18   0.36
0.40     0.60   0.56   0.50   0.47   0.47   0.45   0.34
0.50     0.69   0.59   0.58   0.59   0.53   0.52   0.34
0.60     1.20   0.69   0.69   0.54   0.66   0.55   0.51
0.65     0.79   0.77   0.67   0.66   0.64   0.72   0.75
0.70     0.92   0.79   0.75   0.69   0.68   0.68   0.71
0.75     0.75   0.83   0.77   0.75   0.76   0.72   0.76
0.80     0.80   0.86   0.83   0.85   0.80   0.77   0.75
0.85     0.96   0.93   0.89   0.86   0.83   0.93   0.79
0.90     0.98   0.91   0.92   0.96   0.92   0.92   0.81
0.95     1.00   1.00   0.98   0.95   0.96   0.96   1.00
1.00     0.97   0.97   1.03   1.04   1.07   1.04   0.95
1.05     1.05   1.12   1.05   1.05   1.10   1.10   1.07
1.10     1.01   1.07   1.11   1.14   1.15   1.18   1.36
1.15     1.11   1.11   1.20   1.16   1.20   1.23   1.46
1.20     1.12   1.16   1.12   1.33   1.29   1.29   1.61
1.25     1.23   1.08   1.19   1.32   1.34   1.44   1.63
1.30     1.17   1.35   1.27   1.34   1.35   1.46   2.21
1.40     1.15   1.23   1.43   1.36   1.59   1.86   1.21
1.50     1.32   1.12   1.43   1.51   1.80   2.14   2.27
1.60     1.56   1.45   1.25   1.83   1.85   1.45   4.05
1.70     1.38   1.63   1.71   1.60   1.75   1.89   3.33
1.90     1.29   1.59   1.93   1.58   2.68   2.90   3.08

Did I scare you?

Traditionalist baseball fans like whining about three things: the DH, the lack of “hard-nosed” players nowadays, and the decline of the brushback pitch.  Come to think of it, they usually whine about all three together.  Players have become pansies wearing all that body armor, pitchers don’t have to bat any more (and face the same 95 mph fastball that they might aim at someone’s head), and the league has clamped down on the time-honored tradition of throwing the ball at someone’s head because he dared to hit a homerun off you.  How rude!  Why, if Nolan Ryan were here...

People are generally afraid of hard objects hurtling toward them at a high speed and the pitcher, who just happens to hold a hard object in his hand and has an arm that can accelerate it to a high speed, knows that.  In theory, the idea behind plunking a batter is that the fear induced in him, and perhaps in his teammates, is actually worth more than the disadvantage of giving up first base on a freebie.  Perhaps he and his eight friends will back off the plate a little bit, allowing the pitcher to work the outside corner a little bit better.  Maybe they’ll be a little more hesitant to swing.  It’s always been taken on faith that this one is true.  I’m not sure that anyone’s ever checked to see if the data actually support it.

Being hit by a pitch is a traumatic event, no doubt.  I remember getting plunked in 3rd grade rec league softball, and it still kinda gives me the willies.  It’s easy to picture how batters might be a little leery their next time up or how the guy on deck might get a sick feeling in his stomach at his next plate appearance.  However, not everything that makes sense is true, and not everything that is true makes sense.

Research on people who experience much greater traumatic events (think six months in a war zone or being witness to a murder) shows that about 33% of them develop a disorder called PTSD (post-traumatic stress disorder), which affects all areas of a person’s life for an extended period of time.  It’s mostly an anxiety reaction.  Another percentage has a smaller reaction that either doesn’t last as long or is less severe or both.  But then there are some people who don’t see any effect on their life or mental health at all.  Are these the types that play baseball?

Does knocking a guy over cause him to be nervous?  What about watching one of your teammates get knocked over?  Is it worth it for the pitcher?  In order for it to be “worth it”, there should be some evidence that hitting a batter produces some sort of decrease in batting skills, and this decrease should outweigh the fact that the price of hitting a batter is to put him on first for free.  If it doesn’t, then teams are putting themselves in a hole and the only benefit that they get is stupid male posturing.

I took all plate appearances from 2000-2008 (something like 1.6 million of them) and calculated the yearly on-base percentage for the pitcher and the hitter in the matchup.  I then calculated the expected on-base percentage for the at-bat, using the odds ratio method, and then took the log of the odds ratio (I’m putting this into a binary logit regression as a control.  If you don’t know what I’m talking about, don’t worry.  Just nod and smile.) 
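For the curious who would rather not just nod and smile, here is a minimal sketch of the odds ratio method in its usual textbook form. That form also needs a league-wide OBP as the baseline, which is an assumption on my part since the paragraph above doesn’t spell it out; the function names are mine too.

```python
import math


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def expected_obp(batter_obp: float, pitcher_obp: float, league_obp: float) -> float:
    """Odds ratio method: combine batter and pitcher odds relative to the league."""
    matchup_odds = odds(batter_obp) * odds(pitcher_obp) / odds(league_obp)
    return matchup_odds / (1.0 + matchup_odds)


def log_odds(p: float) -> float:
    """Natural log of the odds, the scale a logit regression works on."""
    return math.log(odds(p))


# A league-average batter against a league-average pitcher stays at league average.
baseline = expected_obp(0.333, 0.333, 0.333)
control = log_odds(baseline)
```

The `control` value is the kind of number that would go into the regression as the matchup control; a good hitter against the same average pitcher simply starts from a higher log-odds value.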

I then looked for all HBP’s.  Now, we don’t know whether those were “message pitches” or fastballs that just got away (*wink wink*).  We also don’t know if they were full on plunks in the head or if they just grazed the jersey or whether the batter took one high and tight and got out of the way (for a called ball) that wouldn’t have been coded as HBP.  But, we’ll make do with what we have.

First, I looked to see whether the subsequent batters in an inning after a HBP showed any sign of being scared.  I entered the expected OBP for each matchup in the equation and then a dummy (i.e., dichotomous, i.e., yes or no) variable coding whether any of the batter’s teammates had yet been hit in the inning (not whether any would eventually be hit… it was a smart dummy variable… kinda like a jumbo shrimp… and knew whether the HBP had happened yet).  The dummy variable predicted that there was an effect on batter performance.  Batters get better after one of their teammates has been plunked, at least for the rest of the inning.  Of course, it probably has something to do with the fact that the pitcher is probably on the shaky side of control in this inning.  And he’s easier to hit. Given my methodology, it’s hard to say that the batter is X points of OBP better without knowing what batter/pitcher we’re talking about, but assume a league average batter faces a league average pitcher (using 2008 numbers, an OBP of .333).  If a batter has been hit previously in the inning, the expected OBP for that plate appearance is .334.  Not a huge effect.  Apparently the effect of being fired up is an extra walk/base hit every thousand plate appearances.
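Here is a small sketch of why the effect has to be quoted for a specific matchup. A logit regression adds its dummy-variable coefficient on the log-odds scale, so the same coefficient moves a .333 matchup by a slightly different number of OBP points than it moves a .400 matchup. The coefficient below is backed out from the .333 to .334 example rather than taken from the actual regression output, which isn’t reported here.

```python
import math


def logit(p: float) -> float:
    """Probability to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Log-odds back to probability."""
    return 1.0 / (1.0 + math.exp(-x))


def shifted_obp(base_obp: float, coefficient: float) -> float:
    """Expected OBP after adding a dummy-variable coefficient on the logit scale."""
    return inv_logit(logit(base_obp) + coefficient)


def implied_coefficient(base_obp: float, new_obp: float) -> float:
    """The logit-scale shift that would move base_obp to new_obp."""
    return logit(new_obp) - logit(base_obp)


# Backed out from the league-average example (.333 -> .334), not from the regression itself.
teammate_hit_coef = implied_coefficient(0.333, 0.334)
average_matchup = shifted_obp(0.333, teammate_hit_coef)
strong_matchup = shifted_obp(0.400, teammate_hit_coef)
```

The same coefficient applied to a .400 matchup produces a slightly larger bump in raw OBP points, which is why the post sticks to the league-average case.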

What about if we kept it only to the batter who immediately followed a hit by pitch?  The effect is no longer significant, but what effect there was again favored the batter.

But what of the hit batsman himself?  After all, he’s the one who has to endure the trauma and the bruise of being hit.  What happens the next time he comes to the plate?  The answer is that he is a less effective hitter in his next plate appearance.  Again, assuming he’s league average and facing a league average pitcher, he falls to an expected OBP of .321 in that next plate appearance.  Here, I’m not controlling for whether he’s facing a new (fresher, better?) pitcher, which could be a bias.

So, I kept it to those situations in which the batter was facing the same pitcher within the same game.  It must be pointed out that the pitcher has now faced 8 more batters (and thrown 30ish more pitches) so it’s biased in the other direction.  Psychologically, it makes sense that the batter would be more likely to be scared if he were facing the same pitcher, rather than a new one, and the memory will be freshest on the same day.  The effect was not significant, but favored the pitcher, and dropped the batter to a .326 expected OBP.

Interestingly enough, the effect of 30 pitches on the pitcher’s arm is about 4 points (.004) worth of expected OBP in favor of the batter.  It looks like the overall psychological effect of getting plunked and then facing the same pitcher later is 10 points in favor of the pitcher.  The effect of facing a pitcher after he’s plunked you in the same game is about 7 points (it was actually 6 and some change).  Could just be a coincidence that those seem to match up rather well.

Perhaps there is even a carry-over effect, even if the batter doesn’t face the pitcher the same day.  I looked to see whether the batter showed some carry over effect whether his next meeting with the pitcher who hit him came 30 minutes or 3 years later.  There was.  Having previously been hit by a pitch by the same pitcher took an average batter vs. an average pitcher to an expected OBP of .323, no matter when it happened.  Some things you just never forget.

So, let’s call the effect of being hit by a pitch 10 points worth of OBP in the next at bat.  I checked to see whether there was an effect two plate appearances out, and this time there was an even stronger effect of 13 points.  The third appearance?  The effect is gone.  Being hit by a pitch seems to have an effect for two plate appearances, and then no more.  After that, players seem to have conquered their fear.

Now, is it worth it to intentionally hit a batter?  Not really.  You’re exchanging one plate appearance in which you make a player a 1.000 OBP hitter (raising the average hitter’s chances by 67 percentage points) for two plate appearances where you drop his chances by about a percentage point each.  Plus, you make his friends really angry.  But with that said, there’s a real effect, even if it is small, of being hit by a pitch.  Turns out that hitters are only human and are a little shy the next few times that they come up.
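The back-of-the-envelope version of that trade, as a sketch. It uses the rounded figures from this post (.333 for the average matchup, 10 points in the next plate appearance, 13 points in the one after) and ignores everything else, including base-out state and the angry friends.

```python
def net_cost_of_plunking(
    base_obp: float = 0.333,          # league-average matchup, 2008
    drop_next_pa: float = 0.010,      # effect in the next plate appearance
    drop_second_pa: float = 0.013,    # effect two plate appearances out
) -> float:
    """Expected times on base given away, minus those clawed back later."""
    given_away = 1.0 - base_obp       # the HBP turns a .333 chance into a sure thing
    clawed_back = drop_next_pa + drop_second_pa
    return given_away - clawed_back


net = net_cost_of_plunking()
```

With the defaults, the pitcher hands over about .667 of a baserunner and gets back about .023 over the next two plate appearances, so he is still roughly .644 in the hole.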

Put Your Head Back in that Spreadsheet

All along, your eyes have been deceiving you, and you probably didn’t even know it. This is not the kind of deception that makes some people believe Miguel Cairo is a useful player, and it’s not the kind that makes us look at a young Daniel Cabrera and see him as an all-star. What I’m talking about is baseball at its very core–well below the surface of the stats and figures you can find on Fangraphs.

While we can’t answer questions such as whether or not a hitter has a hitch in his swing, or whether a pitcher is tipping his pitches, we can get close to answering similar questions about fielding. The Fans Scouting Report (FSR), run by Tom Tango, is a good way to see if a player has a good first step, a quick release on his throws, or several other fielding-related attributes. At the very least, the FSR numbers provide a good sanity check for the more advanced fielding metrics out there, such as UZR and plus/minus, or even the non play-by-play systems such as TotalZone and OPA! (which, I have to say, is the best name I’ve heard for any stat…ever).

FSR uses a wisdom of crowds approach where fans rate each player in a variety of categories, and the scores are weighted based on the importance of each category to its position (so speed is weighted more heavily for a shortstop than it is for a catcher). Before filling out a ballot, each voter reads this message:

Try to judge “average” not as an average player at that position, but an average player at any position. If you think that Willie Bloomquist has an average arm, then mark him as average, regardless if you’ve seen him play 2B, SS, 3B, LF, or CF.

DO NOT CONSIDER THE POSITION THE PLAYER PLAYS!
DO NOT CONSIDER THE POSITION THE PLAYER PLAYS!
DO NOT CONSIDER THE POSITION THE PLAYER PLAYS!

Ok, ok, we get it, position doesn’t matter. But the problem is, some people don’t get it. In general, the numbers spit out by Tango’s system do seem to work well. What happens if we look at a player who spent all of his time at one position one year, and then all of his time at a different position the following season? A player who fits this description is Mark Teahen, with a tip of the cap to this Joe Posnanski article, which gave me this idea. In 2006, Teahen played 109 games at third base. With the arrival of Alex Gordon in 2007, he played all but 9 of his games in the outfield. So let’s take a look at what the fans thought of Teahen in those two seasons, when they were not supposed to consider the position he played:

FSR chart Teahen jpeg.jpg


Those are some pretty wild swings in “ability,” and I have some serious doubts that Teahen’s throwing accuracy suddenly went from Chuck Knoblauch one season to Greg Maddux the next. I’m no psychologist, but I think some people’s opinions of these very granular data are being swayed by the demands of the position being played. It’s the same person on that diamond, all he’s doing is standing in a different spot, but it’s as if the fans are seeing two radically different players from one year to the next.

I don’t mean to speak negatively of the FSR; it is a fabulous project. This is by no means an exhaustive study, but it does go to show that opinions are heavily swayed by context. Individuals, and groups of people in this case, cannot be depended on to provide an unbiased opinion on something as simple as judging the quickness of a first step. Whether you want to admit it or not, your eyes are deceiving you.

Uh oh…

So (insert appropriate journalistic disclaimer stating that it only might have happened and that this is just a report from some other news organization that is quoting anonymous sources and this hasn’t been confirmed by the player and in truth, the matter is an open investigation and all the facts aren’t out yet so we shouldn’t jump to conclusions) A-Rod* used steroids.

Let the great debate begin now: A-Roid or A-Fraud?  Looks like the 2009 season has its first subplot, and pitchers and catchers haven’t even reported yet.

With all the hand-wringing that went on in the wake of Barry Bonds* breaking the home run record, there was always the thought that it was going to be OK.  The great hope was that the re-incarnation of Joe DiMaggio himself, in the form of Alex Rodriguez*, would be by in a few years to snatch the record away from Bonds*.  At that point, as a culture, we wouldn’t have to worry about the fact that one of the highest places in the cultural pantheon was taken up by a cheater.  Oops… we did it again.

Here’s to hoping that Albert Pujols hits about 200 home runs this upcoming season… just so people don’t have to worry.

My guess is that there will be no shortage of analysis as to what this “means” for baseball, whether A-Rod* is still a first-ballot Hall of Famer, whether he really “deserves” those MVP trophies, and whether Congress, which apparently has nothing else to do, should investigate the role of steroids in baseball… again.  (Aren’t we at war?)  Maybe what we need to do is stop confusing athletic performance with manly virtue?  Baseball players are little more than men, and men are not angels. 

Here’s what gets me though with A-Rod*.  Consider for a moment that last year, there were questions as to whether A-Rod* was having an affair with Madonna.  Like the steroid allegations, it has yet to be proven.  At the time though, no one questioned whether having an affair would bring his HOF credentials into question.  But, the steroid issue apparently does?

The measure of a man, part 1.5

Let’s put some names to the factors, shall we?  In part I of this series, I introduced a profiling system that distills hitters’ abilities (not performance) down to four factors.  The four factors were the Ichiro-Howard continuum (slap and run guys vs. lead-foot big fly guys), contact skills, risk-taking, and solid contact.  So, in 2008, who were the guys who best exemplified each of these ideals?

First, the Ichiro-Howard continuum.  The guys most on the Ryan Howard side (slow guys who hit big fly balls… although Howard himself is actually in the middle of the list… he actually hits more grounders than foul balls)

  1. Kevin Millar
  2. Joe Crede
  3. Mike Napoli
  4. Aramis Ramirez
  5. Marcus Thames

And on the Ichiro side:

  1. Argenis Reyes
  2. Joey Gathright
  3. Luis Castillo
  4. Ivan Ochoa
  5. Emmanuel Burris

None of those five had more than 400 PA last season and Reyes, Ochoa, and Burris were all 100 PA guys.  The best of those who had at least 400 PA (i.e., regulars)… well, it was Ichiro.

But let’s take a look at the guys who are closest to the middle.  The three on either side of the midpoint make for a rather interesting list: A-Rod, Ryan Zimmerman, Nick Markakis, Brandon Boggs, Corey Hart, and Clete Thomas.  I guess being in the middle means that you are either equally good or equally awful at both hitting big flies and slapping and running.

On to contact skills, which includes… contact percentage (easy enough), fouling off two strike pitches and having a good eye at the plate in general.

Best contact skills:

  1. Yadier Molina
  2. Nom-ah Garciaparra
  3. Omar Vizquel
  4. American League MVP Dustin Pedroia
  5. Ramon Santiago

Worst contact skills:

  1. Mark Reynolds
  2. Justin Upton
  3. Wladimir Balentien
  4. David Murphy
  5. Ivan Ochoa (apparently a slap hitter who misses on a lot of his slaps)

Really?  Mark “204 strikeouts” Reynolds has trouble hitting the ball?  (For the record, Jack Cust was not in this particular data set, oddly enough due to a problem in calculating his speed score.  It’s a technical problem on my end.)

Now, moving on to the players who take the most risks at the plate.  These are guys who swing a lot, don’t make contact as much, and hit a lot of foul balls for strike one or two, something associated with swinging for the fences.

Most risk-positive hitters:

  1. Vlad (who else?)
  2. Alex Cintron
  3. Pudge Rodriguez
  4. Delmon Young
  5. Josh Hamilton

Most risk-averse:

  1. Luis Castillo
  2. Dave Roberts
  3. Joe Mauer
  4. Reggie Willits
  5. Bobby Abreu

Some All-Stars at both ends of the spectrum.  Not bad.  Delmon Young and Josh Hamilton have also engaged in some risky behavior off the field too.  I’d have to wonder how much of this factor carries over to the players’ off-the-field personalities.

Then, there’s the final factor of solid contact.  These guys hit line drives and they hit balls that tend to go a long way.

Best solid contact skills:

  1. Dan Murphy
  2. Ryan Ludwick
  3. Milton Bradley
  4. Chris Davis
  5. Cliff Floyd

Worst solid contact skills:

  1. Brian Bixler
  2. Jeff Mathis
  3. Joey Gathright (apparently hits a lot of weak grounders)
  4. Reggie Willits
  5. Sean Rodriguez

Some of the players on the “best of” list suffer from the problem that they don’t often make contact… but when they do… It’s pretty clear that one skill unto itself doesn’t guarantee success.  Plus, there are several ways to get a base hit.

My hope is that by giving a few examples at the extreme ends, it could be a little more clear what the four factors represent.

World Famous StatSpeak Roundtable: February 5

It’s back!  We’ve pulled the table out of storage and even got some new chairs.  Some of you have probably noticed that StatSpeak has been going through some changes over the past six weeks.  Brian, Colin, and Eric are still around and you’ll see them post here and there, although they’ve all been expanding out and doing some work for some other sites.  So, even though this is late, welcome Jon Walsh and Dan Novick to StatSpeak.  We’re hoping to keep the roundtable tradition going with the next generation (which would make me Chekov?), but stay tuned.

Question #1: The Yankees added Sabathia, Burnett, Teixeira, and Pettitte.  Does it help, or are they on the outside looking in?

Jon Walsh: Obviously as a Blue Jays fan my hopeful answer is no, but no time for optimism now.  I’ve always been of the opinion that teams can’t win through free agency and I still think the Red Sox are better on paper than the Yankees.  Also, having had the pleasure of watching Burnett’s antics for the last three years, I have my doubts about him handling the New York pressure.

Dan Novick: People seem to forget that this was an 89-win team last year. With these signings, the rotation became the best in the league (on paper), and the offense should be among the best, depending on the health of Matsui and Posada. So, yea, it certainly helped. With just an average bill of health, they should be the favorite to win the division, and having three or four stud pitchers in the playoffs certainly doesn’t hurt.

Pizza Cutter: Well, it sure didn’t hurt.  The thing is that it’s a Yankees-type signing.  It doesn’t take much Sabermetric wisdom (or any sort of wisdom really) to figure out that CC and Tex were the best pitcher and hitter on the market, and then sign them to really bloated contracts.  Burnett isn’t much of a mystery either, although 9+ K per nine innings plus high groundball rate is always a good, and I think underappreciated, combo.  However, while I can’t blame the Yankees for signing Tex, what becomes of Jorge Posada, who really shouldn’t be catching any more and Nick Swisher who really shouldn’t be allowed to roam in an outfield?  But still, the signings help them quite a bit this year, and because they’re the Yankees, they’ll be able to ride it out if those big contracts start looking more like fat than muscle.  On paper, they’re right back to being into the thick of things.

Colin Wyers: In almost any other division in baseball you’d have to consider calling them the preseason favorites. But the AL East is a tough mistress. They’re a good team, although there are some question marks (the outfield seems very unsettled, Posada’s ability to play catcher is unsettled, Cano and Jeter are both coming off of disappointing years) – you can say that about most of the top teams going into spring every year, though. They’re definitely contending.

Question #2: The White Sox this off-season have traded away Javier Vazquez and let Orlando Cabrera walk away. But they are now rumored to be interested in signing Bobby Abreu, despite an already crowded outfield, and will bat A.J. Pierzynski 2nd in the lineup despite an OBP no higher than .312 the last two years. What’s going on with the South Siders?

Jon Walsh: I’d answer that question if I could.  It’s an excellent time to pick up a talent like Abreu via free agency for cheap but who are they going to trade?  Dye?  What motivates the team that would be trading for Dye not to just go out and sign Abreu or Dunn for cheaper anyways?  Maybe Williams has a plan but I just can’t see it right now.

Dan Novick: If a normal team did this, I’d say they’re having an identity crisis. They traded one of their best pitchers, and allowed the starting shortstop to leave, but now they’re interested in another aging corner-OF/DH type with Abreu, which they already have in Jermaine Dye. And oh yea, they gave away Nick Swisher for a bag of balls. I’m not sure if they’re selling off parts or trying to compete. You just never know what’s going on in the mind of Kenny Williams.

Pizza Cutter: This is the same team that looked at the worst right fielder in baseball (Ken Griffey, Jr.) last year in the middle of a pennant race and said “well, he’s played center before, so let’s stick him out there again.”  Does Ozzie Guillen really look like the kind of guy with a plan?  There’s a test that a lot of people are familiar with called the Myers-Briggs Type Indicator (they’re the people who call you an ENFJ or somesuch nonsense… I actually took the M-B once and split three of the four scales completely down the middle.)  Ozzie is very clearly a perceiver, not a judger.  He’s the type that goes by how it feels, even if to those of us who are hardcore J’s (the only one of the four scales that I didn’t split…) it doesn’t make sense.  I guess he’s got a front office that either shares the same biases or is just too frightened of what he will do if they contradict him.  Maybe both.

Colin Wyers: My legal counsel has advised me to not answer Roundtable questions about the Southsiders anymore.

So I’ll take this opportunity to simply ask: If PC is Chekov, what the heck does that make me and Brian?

Question #3: Manny Ramirez is still available.  What gives?

Jon Walsh: Simultaneous realizations by teams that signing players in their late thirties to long-term deals isn’t a great idea?  The fact that his defence (yes, that is how you spell it) drags down his value?  I’m surprised that the two high bidders are both National League teams, as he would be better as a DH.  But then, maybe he’s insisting on playing in left and would mail it in if he wasn’t allowed.  Remember that he still wants a Gold Glove.

Dan Novick: The only man in America who seems to think Manny is getting another $100 million deal is Scott Boras. Maybe he knows something the rest of baseball doesn’t, but it seems unlikely. The only team besides the Dodgers linked to Manny these days is the Giants, and they don’t seem like the kind of team interested in paying that much for a headache so soon after getting rid of Barry Bonds. That’s if they are even willing to pay him that much in the first place. Their highest ever payroll was about $90 million, and as it stands now, they’d have to go over $100 million in order to sign Manny. Boras had better have some tricks up his sleeve if Manny is going to get the lucrative contract he’s looking for.

Pizza Cutter: It would be silly to give a 36-year-old man a contract that pays him $25 million four years from now, even if that man is Manny Ramirez.  It’s a bad risk.  But alas, the free agent market is still priced using the guidelines of “Well, Manny’s just about as good as A-Rod right now, so he should get just below A-Rod money.”  Maybe it’s just the bad economy or the fact that it’s a bad market for power-hitting corner outfielders or maybe that a lot of teams are actually wising up, but it seems that a lot of players have been slow to realize that this year’s free agent market isn’t being played under the old rules.

Colin Wyers: Ramirez seems to have an inflated sense of self-worth at this point that is colliding with two market realities: a dismal economy and a glut of big-bat-no-glove types on the free agent market.

It also doesn’t help that people have figured out that Boras’ “mystery team” is typically the Altoona Curve or the Gateway Grizzlies. The Dodgers aren’t going to bid against an organization that pays in donut burgers just because Boras says he has other offers.
