Scout’s honor

Tom Tango has started the 6th Annual Fans’ Scouting Report. (Read more and comment about it on his blog.) The FSR is unusual in sabermetrics: it is perhaps the only publicly available body of indexed and collected scouting reports on players. If you know of another, I’d love to hear from you.

[Some might object that PITCHf/x data is very close to scouting data. It is very close, but I’d argue that it is not precisely the same as having human scouts view a player. The FSR is, even if it does employ some rather amateurish scouts, i.e. you and me.]

I suspect it will be a little while before we get completed FSR results. In the interim, I’m going to do a little housekeeping. We’re going to chat a bit about what we can do with the FSR data, and when we do get this year’s results, hopefully I’ll do a follow-up.

What I do have right now is data from 2004-2007 (as far as I can tell, the 2003 data wasn’t collected in quite the same way, so I left it out of my dataset). Tom breaks fielding down into seven tools. In any analysis, however, you tend to notice strong correlations between several of those skills, first step and speed for instance. For the sake of simplicity I decided to combine several tools using a factor analysis. (For the more technically minded among the readers: no, I didn’t use a PCA; I used a simpler factor analysis with a promax rotation in GNU R.) I call my factors “Field,” “Range” and “Arm.”

Factor  Instincts  First Step   Speed   Hands  Release  Strength  Accuracy
Field       0.861       0.324  -0.138   0.849    0.493    -0.123     0.141
Range       0.115       0.794   1.009       0        0     0.128         0
Arm             0           0       0   0.158    0.553     0.684     0.912

Each factor is derived by multiplying each of a player’s observed tool ratings by the corresponding number above and summing the products. (I then divided each sum by a constant to put the factors on the same 0-99 scale as the original observations.) The factors are designed to correlate very poorly with one another; we’re trying to capture as much unique information about each player as possible.
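To make the arithmetic concrete, here is a minimal sketch of that weighted sum in Python. The loadings are the ones in the table above. The rescaling constant is not given in this post, so dividing by the sum of the loadings is purely an assumption for illustration; it maps a player rated the same on every tool to that same factor score, but it does not strictly guarantee a 0-99 range. The player in the example is hypothetical.

```python
# Column order of the loadings table above.
TOOLS = ["instincts", "first_step", "speed", "hands",
         "release", "strength", "accuracy"]

# Loadings copied from the table above.
LOADINGS = {
    "Field": [0.861, 0.324, -0.138, 0.849, 0.493, -0.123, 0.141],
    "Range": [0.115, 0.794, 1.009, 0.0, 0.0, 0.128, 0.0],
    "Arm":   [0.0, 0.0, 0.0, 0.158, 0.553, 0.684, 0.912],
}


def factor_scores(ratings):
    """Turn seven 0-99 tool ratings into Field, Range and Arm scores."""
    scores = {}
    for factor, weights in LOADINGS.items():
        # Weighted sum: each tool rating times its loading.
        raw = sum(w * ratings[tool] for w, tool in zip(weights, TOOLS))
        # ASSUMPTION: the post does not state the rescaling constant.
        # Dividing by the sum of the loadings maps a player rated r on
        # every tool to a factor score of r.
        scores[factor] = raw / sum(weights)
    return scores


# Hypothetical player, not taken from the FSR data.
speedster = {"instincts": 50, "first_step": 70, "speed": 85, "hands": 45,
             "release": 50, "strength": 40, "accuracy": 45}
print(factor_scores(speedster))
```

A fast player with an ordinary glove and arm comes out with a Range score well above his Field and Arm scores, which is the separation the three factors are meant to provide.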

So how are these tools represented by position?

Position     Field  Range  Arm
1B              43     38   43
2B              50     54   49
3B              48     49   49
CF              52     63   48
LF              42     50   43
RF              43     51   47
SS              54     57   54
Grand Total     47     51   47

Nothing you probably didn’t already know. Center fielders have a lot of range; first basemen have almost none. Players with some range but no real fielding instincts are relegated to the corner outfield spots, while players with the best fielding instincts tend to be shortstops. Throwing arm has the smallest spread among the positions; range has the largest.
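The claim about spread is easy to check against the position averages. The short sketch below measures spread as the gap between the highest and lowest position average for each factor; that definition is an assumption, since the post does not say how spread was judged.

```python
# Position averages copied from the table above: (Field, Range, Arm).
POSITION_AVERAGES = {
    "1B": (43, 38, 43),
    "2B": (50, 54, 49),
    "3B": (48, 49, 49),
    "CF": (52, 63, 48),
    "LF": (42, 50, 43),
    "RF": (43, 51, 47),
    "SS": (54, 57, 54),
}
FACTORS = ("Field", "Range", "Arm")


def spread_by_factor(averages):
    """Return max minus min of the position averages for each factor."""
    spreads = {}
    for i, factor in enumerate(FACTORS):
        values = [row[i] for row in averages.values()]
        spreads[factor] = max(values) - min(values)
    return spreads


print(spread_by_factor(POSITION_AVERAGES))
```

By this measure Range spans 25 points (CF at 63 down to 1B at 38), Field spans 12 and Arm spans 11.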

We didn’t need to use three tools, of course. We could have just as easily used one tool (that’s what those negative numbers are hinting at):

Tool       Overall
ACCURACY     0.874
FIRSTSTEP    0.729
HANDS        0.909
INSTINCTS    0.898
RELEASE      0.953
SPEED        0.463
STRENGTH     0.601

And so let’s add overall to our table, and sort by that:

Position     Field  Range  Arm  Overall
SS              54     57   54       55
CF              52     63   48       53
2B              50     54   49       50
3B              48     49   49       49
RF              43     51   47       46
LF              42     50   43       44
1B              43     38   43       42
Grand Total     47     51   47       48

That defensive spectrum is pretty recognizable to readers of Tango’s blog. I should say it’s vaguely troubling that the averages don’t seem to line up very well; I’d expect each of them to work out to 50.

And so now, here is the average STATS, Inc. zone rating by position, with players grouped by their overall Fans’ Scouting Report score. (PM is plays made, CH is chances, and ZR is PM divided by CH.)

Position  Score       PM     CH  ZR
1B        All      26750  31440  0.851
1B        0-10       115    143  0.804
1B        10-20     1798   2175  0.827
1B        20-30     3034   3618  0.839
1B        30-40     5925   7009  0.845
1B        40-50     5481   6434  0.852
1B        50-60     4217   4966  0.849
1B        60-70     3961   4535  0.873
1B        70-80     2204   2542  0.867
1B        80-90       15     18  0.833
2B        All      48786  59759  0.816
2B        0-10        30     37  0.811
2B        10-20      429    528  0.813
2B        20-30     1206   1507  0.800
2B        30-40     6618   8381  0.790
2B        40-50     8832  10987  0.804
2B        50-60    13111  15985  0.820
2B        60-70    10423  12649  0.824
2B        70-80     6528   7806  0.836
2B        80-90     1609   1879  0.856
3B        All      39074  50592  0.772
3B        0-10        81    119  0.681
3B        10-20      521    750  0.695
3B        20-30     1645   2217  0.742
3B        30-40     4486   5909  0.759
3B        40-50     7012   9209  0.761
3B        50-60     9324  12159  0.767
3B        60-70     8041  10255  0.784
3B        70-80     5124   6468  0.792
3B        80-90     2840   3506  0.810
CF        All      45764  52206  0.877
CF        0-10        83    102  0.814
CF        10-20      635    745  0.852
CF        20-30     1419   1650  0.860
CF        30-40     3814   4397  0.867
CF        40-50     9048  10451  0.866
CF        50-60     9135  10407  0.878
CF        60-70    10779  12133  0.888
CF        70-80     7086   8025  0.883
CF        80-90     3240   3697  0.876
CF        90-100     525    599  0.876
LF        All      32689  37895  0.863
LF        0-10        34     39  0.872
LF        10-20     1080   1299  0.831
LF        20-30     4945   5740  0.861
LF        30-40     7008   8334  0.841
LF        40-50     8157   9418  0.866
LF        50-60     6668   7682  0.868
LF        60-70     3648   4115  0.887
LF        70-80     1043   1155  0.903
LF        80-90      106    113  0.938
RF        All      36833  42204  0.873
RF        0-10        92    113  0.814
RF        10-20      833    968  0.861
RF        20-30     2410   2813  0.857
RF        30-40     5517   6363  0.867
RF        40-50     7742   8893  0.871
RF        50-60     8846  10173  0.870
RF        60-70     7198   8148  0.883
RF        70-80     2658   3007  0.884
RF        80-90      576    627  0.919
RF        90-100     961   1099  0.874
SS        All      51780  62186  0.833
SS        0-10        38     46  0.826
SS        10-20      369    449  0.822
SS        20-30     1377   1727  0.797
SS        30-40     3591   4390  0.818
SS        40-50     7141   8740  0.817
SS        50-60     9735  11816  0.824
SS        60-70    11362  13701  0.829
SS        70-80    11499  13564  0.848
SS        80-90     6668   7753  0.860

There seem to be some sampling issues at the top and bottom of the curves at some positions (the 80-90 bin at first base covers only 18 chances, for instance), and I’m not entirely sure how to address them. Still, this provides us with a springboard: a set of numbers to regress to when projecting fielding performance.
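As a rough illustration of what regressing to these numbers could look like, the sketch below computes a bin’s zone rating from the first-base rows of the table and then shrinks a fielder’s observed zone rating toward it. The shrinkage method (adding a fixed number of imaginary chances fielded at the bin average) is one common approach, not something this post specifies. The `ballast_chances` value and the sample fielder are hypothetical.

```python
# (plays made, chances) per FSR overall-score bin for first basemen,
# copied from the table above.
FIRST_BASE_BINS = {
    "0-10": (115, 143),
    "10-20": (1798, 2175),
    "20-30": (3034, 3618),
    "30-40": (5925, 7009),
    "40-50": (5481, 6434),
    "50-60": (4217, 4966),
    "60-70": (3961, 4535),
    "70-80": (2204, 2542),
    "80-90": (15, 18),
}


def zone_rating(plays_made, chances):
    """Zone rating is plays made divided by chances."""
    return plays_made / chances


def regressed_zone_rating(plays_made, chances, prior_zr, ballast_chances):
    """Shrink an observed zone rating toward prior_zr.

    ASSUMPTION: adds ballast_chances imaginary chances fielded at the
    prior rate. The right amount of ballast is not given in the post.
    """
    return (plays_made + prior_zr * ballast_chances) / (chances + ballast_chances)


# Hypothetical first baseman: FSR overall score in the 60-70 bin,
# 170 plays made on 200 chances (an observed ZR of 0.850).
bin_pm, bin_ch = FIRST_BASE_BINS["60-70"]
prior = zone_rating(bin_pm, bin_ch)
projection = regressed_zone_rating(170, 200, prior, ballast_chances=300)
print(round(prior, 3), round(projection, 3))
```

The projection always lands between the observed rate and the bin average, and the fewer chances a fielder has, the closer it sits to the bin average.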

UPDATE: For those of you who are into hardcore numbers stuff, here is the full output from the first factor analysis in the article.

Uniquenesses:
 ACCURACY_THROW FIRSTSTEP_FIELD     HANDS_FIELD INSTINCTS_FIELD   RELEASE_THROW
          0.034           0.005           0.126           0.081           0.080
    SPEED_FIELD  STRENGTH_THROW
          0.145           0.565

Loadings:
                Factor1 Factor2 Factor3
ACCURACY_THROW   0.141           0.912
FIRSTSTEP_FIELD  0.324   0.794
HANDS_FIELD      0.849           0.158
INSTINCTS_FIELD  0.861   0.115
RELEASE_THROW    0.493           0.553
SPEED_FIELD     -0.138   1.009
STRENGTH_THROW  -0.123   0.128   0.684

               Factor1 Factor2 Factor3
SS loadings      1.865   1.688   1.635
Proportion Var   0.266   0.241   0.234
Cumulative Var   0.266   0.508   0.741

Test of the hypothesis that 3 factors are sufficient.
The chi square statistic is 141.09 on 3 degrees of freedom.
The p-value is 2.2e-30

If it’s Greek to you, well, some of it is Greek to me.
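One piece of that output can be decoded by hand. The “SS loadings” row is the sum of squared loadings in each factor’s column, and “Proportion Var” is that sum divided by the number of tools (seven). The sketch below reproduces both from the printed loadings. Treating the blank cells as zero is an assumption: they appear to be small loadings that the printout hides, so the results only match the output approximately.

```python
# Loadings as printed in the output above, as (Factor1, Factor2, Factor3).
# ASSUMPTION: blank cells are treated as 0.0.
PRINTED_LOADINGS = {
    "ACCURACY_THROW":  (0.141, 0.0, 0.912),
    "FIRSTSTEP_FIELD": (0.324, 0.794, 0.0),
    "HANDS_FIELD":     (0.849, 0.0, 0.158),
    "INSTINCTS_FIELD": (0.861, 0.115, 0.0),
    "RELEASE_THROW":   (0.493, 0.0, 0.553),
    "SPEED_FIELD":     (-0.138, 1.009, 0.0),
    "STRENGTH_THROW":  (-0.123, 0.128, 0.684),
}


def ss_loadings(loadings, n_factors=3):
    """Sum of squared loadings for each factor column."""
    return [sum(row[k] ** 2 for row in loadings.values())
            for k in range(n_factors)]


ss = ss_loadings(PRINTED_LOADINGS)
# Proportion of variance: each sum divided by the number of tools.
proportion = [s / len(PRINTED_LOADINGS) for s in ss]

for k, (s, p) in enumerate(zip(ss, proportion), start=1):
    print(f"Factor{k}: SS loadings {s:.3f}, proportion {p:.3f}")
```

The small gaps between these figures and the printed 1.865, 1.688 and 1.635 are consistent with the hidden loadings being left out of the sums.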

4 Responses to Scout’s honor

  1. Pizza Cutter says:

    Hooray for Factor Analysis! Colin, what did your factor loading plots look like, for those of us who like this sort of stuff?

  2. dan says:

    What do you mean that it’s troubling that the averages don’t line up? That the averages aren’t 50 like they’re supposed to be?
    You need to teach Pizza and Eric how to make these fancy red shaded charts.

  3. Colin Wyers says:

    Yeah, I’d really prefer to see everything work out to 50 as the average. I don’t know why it doesn’t.

  4. Pizza Cutter says:

    The tables are pretty… but I’m not nearly that cool.
