UZR data for the last 4 1/3 years

A few years ago on Baseball Think Factory, MGL provided complete UZR data. If you are reading this site you probably know that UZR is the most complete fielding system that has been made public. He had to stop publishing complete UZR data when he worked for the St. Louis Cardinals, but he still gave us selected players now and then, as well as the top and bottom players at each position, posting them in BTF threads.
Today, we once again have complete UZR data for 2003 to 2006, as well as for the first third of the 2007 season. MGL was generous enough to post them on The Book Blog, in an excel spreadsheet.
The Hardball Times publishes John Dewan’s plus/minus data at the team level (if you want it for individuals, you’ll have to pay). It is broken down by defense against ground balls and fly balls. I totaled the UZR data by teams, infielders, and outfielders in order to compare it to the THT/Dewan data. I put the results into a Google spreadsheet here. This is my first attempt at using Google Docs, so if it doesn’t work, let me know and I’ll try to fix it.
The correlation on teams was .69, with the biggest disagreements coming on Oakland (35) and Cleveland (30). UZR sees Oakland as poor in the outfield (as any team that plays Jack Cust deserves to rate) and average in the infield. THT has them average in the outfield and great in the infield. UZR has Cleveland great in the outfield and poor in the infield, while THT has them slightly below average in the OF and even worse in the infield.
Overall, the correlation is much better for the infield (.86) than for the outfield (.57). One discrepancy is that I think the THT figures are in hits saved while UZR is in runs. Perhaps we should multiply the THT figures by .75 or .80 to compare, but that won’t change the correlations. These are both good fielding systems, but we are nowhere near the agreement we would get with different offensive measures. For example, if we compared linear weights to runs created per game or equivalent average, our correlation would be greater than .95. The fielding systems have somewhat similar designs, though there are differences in what types of adjustments are made, and the data they are based on come from different sources (STATS vs. Baseball Info Solutions). I suspect that part of the difference is unavoidable data-classification disagreement. We get a much higher correlation for ground balls because everyone knows what a ground ball is. When classifying fly balls and line drives, there is a gray area in between, and two teams of scorers are not going to match all the time on these. BIS may be using another category, “fliners.” Though I’m sure the intent was to remove the gray area, all it really does is create two gray areas: is that ball a true liner or a fliner? Is the next one a fliner or a fly ball?
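As an aside on the scaling point above: converting the THT hits-saved totals to runs with a constant multiplier cannot affect the correlation, because Pearson’s r is invariant under positive linear rescaling. A minimal sketch (the team totals below are made-up illustrative numbers, not the actual data from either system):

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical team totals: UZR in runs, THT in hits saved (illustrative only).
uzr = [12.0, -5.0, 3.0, 20.0, -14.0, 7.0]
tht = [10.0, -2.0, 6.0, 15.0, -20.0, 4.0]

# Rescaling hits saved to runs (x 0.80) leaves the correlation unchanged.
r = pearson(uzr, tht)
r_scaled = pearson(uzr, [0.80 * t for t in tht])
assert abs(r - r_scaled) < 1e-12
```

So the hits-vs-runs units issue matters for comparing magnitudes, but not for the correlations quoted here.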


9 Responses to UZR data for the last 4 1/3 years

  1. MGL says:

Your analogy to agreement among offensive systems is not a good one, with all due respect. Offensive systems SHOULD agree because the outcome of an offensive event is pre-determined and fixed. That is the difference between offense and defense. Let’s say that the official scorer (or someone) was responsible for awarding points, negative or positive, to all fielders near where a ball was hit. Then all of the defensive metrics would be in near agreement. Just because different defensive metrics do not agree does NOT mean that they are poor metrics. Imagine if there were no means to “score” an offensive event. We would have the same trouble as with defensive metrics in terms of different systems (especially when they use different databases that are scored somewhat subjectively). And the scoring of offensive events is somewhat arbitrary and definitely not accurate in terms of using the data to forecast offensive performance. The other day when A-Rod tricked the TOR 3B into not catching that IF pop fly, it got scored as a hit. What good is that for forecasting? IOW, the fact that offensive metrics agree is because of the artificiality of the scoring of offensive events, and that is NOT a good thing.
Imagine that we decided to do a lot better with offensive events. Let’s say that we watched every play (like they do with defensive data) and, for example, we scored that pop fly an out, or we scored a ground ball through the IF as a half or a third of a hit. Etc. We would do a lot better than just using the actual result of the hit, wouldn’t we? (Of course we would.) Well, given that, the offensive metrics would NOT all agree, especially when using different databases, even though each database would be better than the actual results (s, d, t, etc.) of the offensive event. So, moral of the story: different metrics not agreeing does NOT equal bad metrics.

  2. David Gassko says:

    If you’re talking about the data on the THT site, that’s not Dewan’s plus/minus; it’s a bit simpler. That data simply looks at the average conversion rate of each BIP type and compares that to the actual conversion rates for the team (see the 2006 THT Annual for more details). Team-level results according to Dewan’s plus/minus system are available in the 2007 THT Annual; they have a correlation of .7 with the stats we report on the website.

  3. studes says:

    Thanks, David. I was going to mention the same thing.

  4. Sean Smith says:

    My bad, I thought that was the Dewan plus minus.
To be clear, I never said anything about these being bad metrics. But the disagreements mean fans have a lot more to argue about than with offensive statistics.

  5. Pizza Cutter says:

    Interestingly enough, in my business, a correlation of .70 is usually the gold standard for agreement between metrics (we call it construct validity). You got .69, which we would all fudge and say… yeah, close enough.

  6. Sean Smith says:

I looked at UZR vs. the plus/minus from John Dewan’s article in the 2007 THT Annual. What’s cool is it’s broken down even further, corner infield vs. middle infield. One problem in comparing these is players who played for multiple teams; for example, MGL lists Bobby Abreu’s full 2006 line with Philadelphia.
    Anyway, UZR vs PM:
    All: 0.65
    Middle IF: 0.8
    Corner: 0.52
    Outfield: 0.55
    So the best available methods are in strong agreement on how to evaluate middle infielders. All hail Adam Everett. But not so much outfielders, or the corners.
    Once again this doesn’t make them bad metrics. I’m comparing what are considered the best metrics. If zone rating doesn’t agree, then we can say that’s because zone rating doesn’t account for everything UZR does.
    When UZR and PM don’t agree, to me that means we aren’t really sure about the player. Like Andruw Jones. His UZR from 03 to 06 is -2,-4,-1,+7. Looks like an average CF to me. His PM the same years is +13,+6,+7, and +30. I’m using the “enhanced” column from The Fielding Bible and latest figure from the 2007 Bill James Handbook.
    Is UZR right on Jones? Is PM? Is it somewhere in the middle? If I’m a GM considering spending Soriano money on him this upcoming winter, I’m going to want to know more. Maybe hire a stat intern, buy the datasets, recreate the measures, and have him watch tapes of every play Jones makes or does not make, and come to some sort of reconciliation before I decide to spend or pass.

  7. MGL says:

“When UZR and PM don’t agree, to me that means we aren’t really sure about the player.”
    I am not sure that is true. In fact, I don’t think it is. It is entirely possible that the combination is an excellent estimate of the player’s true talent and that each one by itself is still very good with a higher uncertainty rate.
    So if a player is, let’s say, around +10 in both metrics that might be exactly the same as if he were +20 in one and 0 in the other. I think that is roughly the case.
    It is a little like the argument that we had in the SAFE discussion where Shane was suggesting that 3 consecutive (in years) values of +5 was more accurate than values of 0, -5, and +20 (not considering weighting each year differently). We pretty much debunked that idea. I think the same thing applies here.
    There is probably a mathematical solution (answer) though.
    Of course this assumes that both metrics are roughly the same in terms of their accuracy and reliability (of what we are trying to measure, which is the average true defensive talent over the course of the data, in runs saved/earned, given the actual distribution of balls hit to each player and a neutral context otherwise). If they are not, then a weighting should probably be applied.
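MGL’s point that +10/+10 and +20/0 can be equivalent estimates can be made concrete with an inverse-variance weighted average, the standard way to combine independent noisy estimates. This is a sketch under the assumption stated above, that both metrics have roughly equal (and here, known) error variances; the `combine` helper and the numbers are illustrative, not from either system:

```python
def combine(a, b, var_a=1.0, var_b=1.0):
    """Inverse-variance weighted average of two independent estimates.

    With equal variances this reduces to a plain average, so the combined
    estimate depends only on the sum of the two readings.
    """
    wa, wb = 1.0 / var_a, 1.0 / var_b
    return (wa * a + wb * b) / (wa + wb)

# Equal reliability assumed: +10/+10 and +20/0 yield the same combined +10.
assert combine(10, 10) == combine(20, 0) == 10.0

# If one metric were known to be noisier, the weighting shifts toward the
# more reliable one (here, three times the error variance on metric b).
assert combine(20, 0, var_a=1.0, var_b=3.0) == 15.0
```

The combined estimate from a disagreeing pair has the same expected value as from an agreeing pair; only its uncertainty differs, which is the mathematical form of the point being made.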

  8. robert linzmeier says:

Say someone is on first base and the pitcher throws a passed ball or wild pitch. When the runner moves to second, does that count as a stolen base, or what does it count as?

  9. Pizza Cutter says:

If the runner wasn’t initially going, then it goes down as an advancement on the passed ball/wild pitch, with no SB awarded. If he was going, he’d probably get the SB, but that’s a judgment call on the part of the official scorer.
