A couple of things

1. I was completely wrong about GPA2: Go here and scroll down to see why. Needless to say, 2*OPS-BA does not work well at all.
2. Thanks to the new version of WordPress being used by MVN, I can now add a stats page. I will do so shortly, and it will keep track of fun stuff like Pitching Runs Created, Defensive Ratings, and my Power Rankings, at the least. I’ll probably update it weekly. More updates coming soon.

Pitching Runs Created – August 26

I’m back, and will be posting regularly again. I haven’t calculated Pitching Runs Created in three weeks, so the standings have changed dramatically. Johan Santana has pulled ahead as the clear AL Cy Young favorite, while Chris Carpenter continues to lead the NL. In Carpenter’s case, I hope the Cy Young voters do look at wins, because he deserves it.
AL:

NAME              TEAM  PRC
Johan Santana     MIN   113
John Lackey       ANA   94
Mark Buehrle      CHA   91.9
Randy Johnson     NYA   83.6
Jeremy Bonderman  DET   78.6
Dan Haren         OAK   78.4
Bartolo Colon     ANA   77.9
Roy Halladay      TOR   77.8
Matt Clement      BOS   76.2
Mike Mussina      NYA   75.9

For all the talk about the Yankees’ crappy pitching staff, New York is one of only two teams with multiple players in the top ten. It’s the rest of the staff that is the problem. Nevertheless, if the Yankees get into the playoffs, they have a chance with that scary offense plus Johnson, Mussina, and Rivera.
NL:

NAME              TEAM  PRC2
Chris Carpenter   SLN   114.5
Roger Clemens     HOU   106.5
Jake Peavy        SDN   103.8
Pedro Martinez    NYN   103.8
A.J. Burnett      FLO   96.82
John Smoltz       ATL   96.22
Roy Oswalt        HOU   93.01
Dontrelle Willis  FLO   91.68
Andy Pettitte     HOU   88.25
John Patterson    WAS   87.48

Clemens was eight runs behind Carpenter three weeks ago; that hasn’t changed at all. But Martinez, who was pretty much tied with Carpenter the last time I put together these standings, has fallen to fourth after some recent struggles. Patterson, meanwhile, has rocketed up these standings, and is one of only five National League starters with over 5 PRC/G.

The 2005 Scouting Report: By the Fans for the Fans

Go to Tangotiger’s Fan Scouting Report and help him continue to add a new dimension (scouting!) to sabermetric fielding analysis. Do it.

Pitcher Replacement Level

Still on vacation, but since I have a moment: I’ve been thinking about replacement level for pitchers a lot recently. What is a replacement level pitcher? To me, a replacement level pitcher is the type that cannot keep a spot on a major league roster, but is good enough to pitch in the major leagues in place of an injured starter or for the Kansas City Royals.
I’ve heard a lot of very smart people say that to pitch in the major leagues, you have to have a BABIP of .330 or lower. So let’s say that a replacement level pitcher has league average fielding independent (HR, BB, SO) rates, but a .330 BABIP since he is a fringe player. How many extra hits will he allow over an average pitcher? About 2.3 per game. Given that the average ball in play is worth roughly .55 runs, he would allow roughly 1.3 runs more per game than a league average pitcher, meaning that a replacement level pitcher is roughly 78% of a league average pitcher. Baseball Prospectus says the number is 80%; that seems about right.
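For anyone who wants to check the arithmetic, here is a minimal Python sketch. The 2.3 extra hits and the .55 run value come from the paragraph above; the league-average figure of 4.6 runs per game is my own assumption (the post does not state one), as is reading "78% of average" as league-average runs divided by replacement-level runs.

```python
# Figures quoted in the post.
EXTRA_HITS_PER_GAME = 2.3   # extra hits allowed by a .330 BABIP pitcher
RUNS_PER_EXTRA_HIT = 0.55   # run value the post applies to each extra hit

# ASSUMPTION (not stated in the post): league-average runs allowed per game.
LEAGUE_RUNS_PER_GAME = 4.6


def replacement_ratio(league_rpg=LEAGUE_RUNS_PER_GAME,
                      extra_hits=EXTRA_HITS_PER_GAME,
                      runs_per_hit=RUNS_PER_EXTRA_HIT):
    """Return league-average runs as a share of replacement-level runs."""
    extra_runs = extra_hits * runs_per_hit
    return league_rpg / (league_rpg + extra_runs)


extra_runs = EXTRA_HITS_PER_GAME * RUNS_PER_EXTRA_HIT
print(f"extra runs per game: {extra_runs:.2f}")
print(f"replacement level: {replacement_ratio():.0%} of average")
```

Changing the assumed league run rate moves the ratio by a point or two in either direction, which is about the size of the gap to the Baseball Prospectus figure.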

A new statistic

Okay, it’s actually just an attempt to correct an old one. Last week, I wrote that GPA is a better statistic than EQA and easier to use for comparison purposes. But there is a problem with GPA, one that makes it less accurate than it could be.
***
Consider the weights that OBP and SLG give to each event. In OBP, 1B = 1; 2B = 1; 3B = 1; HR = 1; and BB = 1. They all have equal weights. In SLG, on the other hand, the weights are: 1B = 1; 2B = 2; 3B = 3; and HR = 4 (a walk counts for nothing). Thus, OPS gives the following weights:
BB = 1
1B = 2
2B = 3
3B = 4
HR = 5
GPA, which weights OBP as 1.8 times SLG, gives the following weights:
BB = 1.8
1B = 2.8
2B = 3.8
3B = 4.8
HR = 5.8
But what ratio would be correct? Using linear weights, this is an easy question to answer. The answer is roughly:
BB = 2
1B = 3
2B = 5
3B = 7
HR = 9
***
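The three weight lists above can be reproduced with a short Python sketch. It uses the same simplification as the text (every on-base event is worth 1 in OBP, a hit is worth its total bases in SLG), and the function name `implied_weights` is just something I made up for illustration.

```python
def implied_weights(obp_coef, slg_coef):
    """Per-event weights implied by obp_coef*OBP + slg_coef*SLG.

    Follows the post's simplification: every on-base event counts 1 in OBP,
    and a hit counts its total bases in SLG (a walk counts 0).
    """
    total_bases = {"BB": 0, "1B": 1, "2B": 2, "3B": 3, "HR": 4}
    return {event: obp_coef * 1 + slg_coef * bases
            for event, bases in total_bases.items()}


ops_weights = implied_weights(1.0, 1.0)   # OPS = OBP + SLG
gpa_weights = implied_weights(1.8, 1.0)   # GPA weights OBP 1.8 times SLG

# Approximate linear-weights ratios quoted in the post.
target = {"BB": 2, "1B": 3, "2B": 5, "3B": 7, "HR": 9}

for event in target:
    print(f"{event}: OPS={ops_weights[event]:.1f} "
          f"GPA={gpa_weights[event]:.1f} target={target[event]}")
```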
See how big the difference is between GPA and the true weights? GPA undervalues extra base hits while overvaluing walks and singles. So, you might say, why not simply decrease the value of OBP? Because then walks will be undervalued, as they are in OPS. It’s not a big problem, but we’re looking for a measurement that pretty much pinpoints the actual value of each component.
What screws the whole thing up is walks. So why not forget about walks for a moment, and simply find the right weight for OBP to make the other ratios work? That’s simple enough: .5*OBP + SLG gives us the following weights:
1B = 1.5
2B = 2.5
3B = 3.5
HR = 4.5
Does that look a little familiar? It should; multiply those values by two and you get the true weights which I quoted a little earlier. But here’s the problem: walks are given a weight of .5 in such an equation, half of what they should be. So here’s an idea: why not add some extra value for walks by adding .5*BB% to the equation? That would double the value of a walk and give us the perfect ratios we’re looking for. What is BB%, though? It’s simply OBP – BA (treating at-bats and plate appearances as the same denominator). So, we end up with the following formula:
SLG + .5*OBP + .5*(OBP – BA)
which is equal to
SLG + OBP – .5*BA
or, after doubling every term,
2*OPS – BA
Can it really be that simple? Yes! As you can see, both OPS and BA are very valid statistics, especially if combined the right way. And if you want to approximate the GPA scale, simply divide by 5. Thus, GPA2 = (2*OPS – BA)/5. It’s more accurate than the original GPA and just as simple.
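Here is a minimal Python sketch of both formulas, mostly to confirm that the long form and the simplified form of GPA2 agree. GPA is written in its usual form, (1.8*OBP + SLG)/4. The batting line at the bottom is hypothetical and is there only to show the two numbers land on a similar scale.

```python
def gpa(obp, slg):
    """Original GPA: (1.8*OBP + SLG) / 4."""
    return (1.8 * obp + slg) / 4


def gpa2(obp, slg, ba):
    """The post's GPA2: (2*OPS - BA) / 5."""
    return (2 * (obp + slg) - ba) / 5


def gpa2_long_form(obp, slg, ba):
    """Unsimplified version: SLG + .5*OBP + .5*(OBP - BA), doubled, over 5."""
    return 2 * (slg + 0.5 * obp + 0.5 * (obp - ba)) / 5


# HYPOTHETICAL batting line, for illustration only (not a real player).
ba, obp, slg = 0.300, 0.400, 0.550
print(f"GPA  = {gpa(obp, slg):.3f}")
print(f"GPA2 = {gpa2(obp, slg, ba):.3f}")
```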
So how does GPA2 compare to GPA? Let’s take a look at the top ten hitters this season based on GPA. GPA2 is in parentheses:
1. Derrek Lee – .367 (.374)
2. Jason Giambi – .358 (.350)
3. Albert Pujols – .356 (.358)
4. Miguel Cabrera – .352 (.356)
5. Alex Rodriguez – .347 (.340)
6. Travis Hafner – .343 (.336)
7. Nick Johnson – .340 (.315)
8. Adam Dunn – .331 (.344)
9. Carlos Delgado – .330 (.319)
10. Brian Giles – .325 (.303)
As you can see, GPA overrates high-OBP guys like Giambi, Johnson, Delgado, and Giles while underrating the high-SLG guys like Dunn. GPA2 corrects for that, while maintaining GPA’s simplicity.

Koufax and Marichal

Sorry guys, I’m away for the next three weeks, so updates will be somewhat sparse. But an interesting thread on Fanhome is the reason for this post.
There’s a lot of interesting stuff in that thread, but what sparked my interest was the Sandy Koufax/Juan Marichal comparison. Here’s the question: did Koufax really throw an incredibly large number of pitches, a number that was out of line with his number of starts? Let’s see: from age 25-30, Koufax threw an estimated 24,804 pitches. At the same age, Marichal threw an estimated 25,138 pitches, more than Koufax! But wait, there’s also the post-season. In three World Series between the ages of 25-30, Koufax threw an estimated 681 pitches. Marichal did not have any post-season appearances in the same age period. So overall, Koufax passes Marichal with 25,485 pitches. But a gap of roughly 350 pitches is nothing when distributed over six seasons (about three innings per year).
But maybe Koufax’s abnormally large pitch counts came in those last two seasons, when he threw a combined 658.7 innings? Well, in 1965-66, Koufax threw an estimated 10,280 pitches (playoffs included). In 1968-69, Marichal threw an estimated 9,048 pitches. So yes, there is a pretty large difference there. 1,232 pitches corresponds to about a 74-inning difference; 37 innings a year is a pretty large gap. But much of Koufax’s advantage comes from the fact that he made a few extra starts: he averaged 119.5 pitches per start, while Marichal averaged 122.3. Marichal was throwing more pitches per start; he just had fewer starts!
Nevertheless, Koufax did throw more pitches over that two-year period. But over the six seasons from age 25 to 30, the two were virtually equal. Koufax’s load per start was not unusual for an ace in that period. In both 1965 and 1966, there were pitchers with as many or more starts than Koufax, so his number of appearances was not unusual either. I see no reason to claim that Koufax had an unusually high load of pitches.
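The pitch-count arithmetic in this post is easy to re-run. The sketch below uses only the estimated totals quoted above; the pitches-per-inning rate is something I back out from the 74-inning figure, not a number from the original estimates.

```python
# Estimated pitch counts quoted in the post.
koufax_regular = 24_804      # ages 25-30, regular season
koufax_postseason = 681      # three World Series in that span
marichal_total = 25_138      # ages 25-30, no postseason appearances

koufax_total = koufax_regular + koufax_postseason
career_gap = koufax_total - marichal_total
print(f"Koufax total: {koufax_total}, gap: {career_gap} "
      f"({career_gap / 6:.0f} pitches per season)")

# Heaviest two-year stretches quoted in the post.
koufax_1965_66 = 10_280      # playoffs included
marichal_1968_69 = 9_048
two_year_gap = koufax_1965_66 - marichal_1968_69

# The post equates this gap with about 74 innings, which implies a
# pitches-per-inning rate; that rate is derived here, not quoted.
pitches_per_inning = two_year_gap / 74
print(f"two-year gap: {two_year_gap} pitches, "
      f"about {pitches_per_inning:.1f} pitches per inning")
```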

BaseRuns

In response to a comment about my EQA post, I posted a link to a Baseball Primer discussion thread on BaseRuns. Obviously, I’m a little late to the party, but let me try to explain once and for all why BaseRuns is a better run estimator than any other.
First, let us address accuracy. As shown by US Patriot, BaseRuns are as accurate as any other run estimator. In fact, they vastly surpass the accuracy of basic runs created and are slightly better than anything but Extrapolated Runs. So, in terms of accuracy, BaseRuns work as well as anything else.
But that’s not where their real advantage kicks in. What makes BaseRuns better than a linear formula or runs created or any other run estimator is its performance at extremes. See, BaseRuns models the actual scoring process in baseball: Runs = % of runners that score*base runners + home runs. This is an undeniable fact. So if you want to find the number of runs Pedro Martinez was expected to allow, the only system that will give you an accurate estimate is BaseRuns. In fact, in high-scoring environments, BaseRuns is the only system that will give an accurate estimate of runs because it is the only system that makes sure that a team can’t have more runs scored than base runners.
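To make the structure concrete, here is a minimal Python sketch of BaseRuns. The post gives only the general form (share of runners who score times base runners, plus home runs); the specific advancement coefficients below are one commonly cited simple version and should be treated as my assumption, not as the definitive formula. The team totals are hypothetical.

```python
def base_runs(ab, h, tb, hr, bb):
    """Estimate runs with one simple BaseRuns variant.

    ASSUMPTION: the advancement (b) coefficients are one commonly cited
    simple version; the post itself gives only the general structure.
    """
    a = h + bb - hr                                       # base runners
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02   # advancement
    c = ab - h                                            # batting outs
    d = hr                                                # homers always score
    denom = b + c
    score_rate = b / denom if denom else 0.0   # share of runners who score
    return a * score_rate + d


# HYPOTHETICAL team season totals, for illustration only.
team_runs = base_runs(ab=5500, h=1450, tb=2300, hr=170, bb=550)
print(f"estimated runs: {team_runs:.0f}")

# Extreme case: nobody ever makes an out, so every runner scores and the
# estimate equals base runners plus home runs, never more.
print(base_runs(ab=10, h=10, tb=10, hr=0, bb=0))
```

Because the score rate is b / (b + c), it can never exceed 1 when b and c are non-negative, which is the property the paragraph above is pointing at.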
In the thread I linked to, you will find two major quibbles with BaseRuns: one, that they are too complicated, and two, that they cannot be applied to hitters.
To address the first quibble: well, I disagree. BaseRuns may be complicated compared to linear weights or basic runs created, but they are no more complicated (and actually much simpler) than Extrapolated Runs, Equivalent Runs, the Tech versions for RC, and so on. BaseRuns require minimal information and just a few steps. And again, BaseRuns work for any run environment, any era. Every other run estimator is centered on a certain amount of data: that is, it works only because it has been fitted to the major league run environment. But once you start evaluating individual players or 19th century baseball, or whatever, you need to use BaseRuns. Every other run estimator will simply miss by a wide margin.
The second quibble is not exactly true. It is true that BaseRuns cannot be applied directly to a hitter because a hitter interacts with the other players in a lineup. A pitcher is a team in himself – imagine that a pitcher pitches a complete game; in that case, his numbers will be equivalent to his team’s defensive numbers – but a hitter cannot score himself (unless he hits a home run). But a hitter’s BaseRuns can be calculated by first calculating his team’s BaseRuns and then subtracting his numbers from his team’s numbers and re-calculating the team’s BaseRuns. The difference is the number of runs created by that individual hitter.
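The with-and-without method described above can be sketched in a few lines of Python. As before, the BaseRuns coefficients are one commonly cited simple version (my assumption), the `Line` class and function names are invented for illustration, and the team and player totals are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """A batting line: at-bats, hits, total bases, home runs, walks."""
    ab: int
    h: int
    tb: int
    hr: int
    bb: int

    def __sub__(self, other):
        return Line(self.ab - other.ab, self.h - other.h,
                    self.tb - other.tb, self.hr - other.hr,
                    self.bb - other.bb)


def base_runs(line):
    """One simple BaseRuns variant (coefficients are an ASSUMPTION)."""
    a = line.h + line.bb - line.hr
    b = (1.4 * line.tb - 0.6 * line.h - 3 * line.hr + 0.1 * line.bb) * 1.02
    c = line.ab - line.h
    denom = b + c
    score_rate = b / denom if denom else 0.0
    return a * score_rate + line.hr


def player_base_runs(team, player):
    """Team BaseRuns minus the team's BaseRuns without the player."""
    return base_runs(team) - base_runs(team - player)


# HYPOTHETICAL totals, for illustration only.
team = Line(ab=5500, h=1450, tb=2300, hr=170, bb=550)
hitter = Line(ab=550, h=170, tb=310, hr=35, bb=80)
print(f"runs created by hitter: {player_base_runs(team, hitter):.1f}")
```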
In short, BaseRuns are superior to any other run estimator, and are the only true run estimator out there. There is no reason to do any serious studies or comparisons without using BaseRuns.
Other good reading on BaseRuns can be found at:
US Patriots’ Website
Tangotiger’s Website: see his three part series on how runs are really created
David Smyth’s BaseRuns Primer and the thread following it