# Scout’s honor

Tom Tango has started the 6th Annual Fans’ Scouting Report. (Read more and comment about it on his blog.) The FSR is a unique thing in sabermetrics: it is perhaps the only publicly available body of indexed and collected scouting reports on players. If you know of another, I’d love to hear from you.

[Some might object and say that PitchF/X data is very close to scouting data. It is close, but I’d argue that it is not precisely the same as having human scouts view a player. The FSR is exactly that, even if it does employ some rather amateurish scouts, i.e., you and me.]

I suspect it will be a little while before we get completed FSR results. In the interim, I’m going to do a little housekeeping. We’re going to chat a bit about what we can do with the FSR data, and when we do get this year’s results, hopefully I’ll do a follow-up.

What I do have right now is data from 2004-2007 (the 2003 data doesn’t seem to have been collected in quite the same way, as far as I can tell, so I left it out of my dataset). Tom breaks fielding down into seven tools; however, in any analysis you tend to notice a strong correlation between several skills, such as first step and speed. For the sake of simplicity, I decided to combine several tools using a factor analysis. (For the more technically minded among the readers: no, I didn’t use a PCA; I used a simpler factor analysis with a promax rotation in GNU R.) I call my factors “Field,” “Range” and “Arm.”

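| | Instincts | First Step | Speed | Hands | Release | Strength | Accuracy |
|---|---|---|---|---|---|---|---|
| Field | 0.861 | 0.324 | -0.138 | 0.849 | 0.493 | -0.123 | 0.141 |
| Range | 0.115 | 0.794 | 1.009 | 0 | 0 | 0.128 | 0 |
| Arm | 0 | 0 | 0 | 0.158 | 0.553 | 0.684 | 0.912 |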

Each factor is derived by multiplying each player’s observed rating in a skill by the corresponding number above and summing the products. (I then divided each sum by a constant to put the factors on the same 0-99 scale as the original observations.) The factors are designed to correlate very poorly with one another; we’re trying to capture as much unique information about each player as possible.
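To make the weighted sum concrete, here is a minimal Python sketch. The loadings are copied from the table above; everything else is an assumption for illustration. In particular, the rescaling constant is not necessarily the one used for the tables in this article: the sketch simply divides by the sum of each factor’s loadings, so that a player rated 50 in every skill scores 50 on every factor. The names `SKILLS`, `LOADINGS` and `factor_scores`, and the sample player, are made up.

```python
# Skill order matches the columns of the loadings table.
SKILLS = ("Instincts", "First Step", "Speed", "Hands",
          "Release", "Strength", "Accuracy")

# Loadings from the three-factor analysis (blank cells treated as 0).
LOADINGS = {
    "Field": (0.861, 0.324, -0.138, 0.849, 0.493, -0.123, 0.141),
    "Range": (0.115, 0.794, 1.009, 0.0, 0.0, 0.128, 0.0),
    "Arm":   (0.0, 0.0, 0.0, 0.158, 0.553, 0.684, 0.912),
}


def factor_scores(ratings):
    """Map a dict of seven 0-99 skill ratings to Field/Range/Arm scores."""
    scores = {}
    for factor, weights in LOADINGS.items():
        # Sum of (loading * observed rating) over the seven skills.
        weighted_sum = sum(w * ratings[skill]
                           for skill, w in zip(SKILLS, weights))
        # ASSUMPTION: rescale by the sum of the loadings so that a player
        # with the same rating in every skill keeps that rating.
        scores[factor] = weighted_sum / sum(weights)
    return scores


if __name__ == "__main__":
    # Hypothetical player: fast, otherwise average.
    player = dict.fromkeys(SKILLS, 50)
    player["Speed"] = 90
    print(factor_scores(player))
```

Because Speed carries a negative loading on Field, a fast but otherwise average player comes out slightly below 50 on Field under this scaling, well above 50 on Range, and unchanged on Arm.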

So how are these tools represented by position?

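| Position | Field | Range | Arm |
|---|---|---|---|
| 1B | 43 | 38 | 43 |
| 2B | 50 | 54 | 49 |
| 3B | 48 | 49 | 49 |
| CF | 52 | 63 | 48 |
| LF | 42 | 50 | 43 |
| RF | 43 | 51 | 47 |
| SS | 54 | 57 | 54 |
| Grand Total | 47 | 51 | 47 |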

Probably nothing you didn’t already know. Center fielders have a lot of range; first basemen have almost none. Players with some range but no real fielding instincts are relegated to the corner outfield spots, while players with the best fielding instincts tend to be shortstops. Throwing arm has the smallest spread among the positions; range has the largest.

We didn’t need to use three tools, of course. We could have just as easily used one tool (that’s what those negative numbers are hinting at):

| | Overall |
|---|---|
| ACCURACY | 0.874 |
| FIRSTSTEP | 0.729 |
| HANDS | 0.909 |
| INSTINCTS | 0.898 |
| RELEASE | 0.953 |
| SPEED | 0.463 |
| STRENGTH | 0.601 |

And so let’s add overall to our table, and sort by that:

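| Position | Field | Range | Arm | Overall |
|---|---|---|---|---|
| SS | 54 | 57 | 54 | 55 |
| CF | 52 | 63 | 48 | 53 |
| 2B | 50 | 54 | 49 | 50 |
| 3B | 48 | 49 | 49 | 49 |
| RF | 43 | 51 | 47 | 46 |
| LF | 42 | 50 | 43 | 44 |
| 1B | 43 | 38 | 43 | 42 |
| Grand Total | 47 | 51 | 47 | 48 |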

That defensive spectrum is pretty recognizable to readers of Tango’s blog. I should say that it’s vaguely troubling that the averages don’t seem to line up very well.

And so now – here is the average STATS, Inc. zone rating, by position, based upon their overall Fan Scouting Report score:

| | PM | CH | ZR |
|---|---|---|---|
| **1B** | 26750 | 31440 | 0.851 |
| 0-10 | 115 | 143 | 0.804 |
| 10-20 | 1798 | 2175 | 0.827 |
| 20-30 | 3034 | 3618 | 0.839 |
| 30-40 | 5925 | 7009 | 0.845 |
| 40-50 | 5481 | 6434 | 0.852 |
| 50-60 | 4217 | 4966 | 0.849 |
| 60-70 | 3961 | 4535 | 0.873 |
| 70-80 | 2204 | 2542 | 0.867 |
| 80-90 | 15 | 18 | 0.833 |
| **2B** | 48786 | 59759 | 0.816 |
| 0-10 | 30 | 37 | 0.811 |
| 10-20 | 429 | 528 | 0.813 |
| 20-30 | 1206 | 1507 | 0.800 |
| 30-40 | 6618 | 8381 | 0.790 |
| 40-50 | 8832 | 10987 | 0.804 |
| 50-60 | 13111 | 15985 | 0.820 |
| 60-70 | 10423 | 12649 | 0.824 |
| 70-80 | 6528 | 7806 | 0.836 |
| 80-90 | 1609 | 1879 | 0.856 |
| **3B** | 39074 | 50592 | 0.772 |
| 0-10 | 81 | 119 | 0.681 |
| 10-20 | 521 | 750 | 0.695 |
| 20-30 | 1645 | 2217 | 0.742 |
| 30-40 | 4486 | 5909 | 0.759 |
| 40-50 | 7012 | 9209 | 0.761 |
| 50-60 | 9324 | 12159 | 0.767 |
| 60-70 | 8041 | 10255 | 0.784 |
| 70-80 | 5124 | 6468 | 0.792 |
| 80-90 | 2840 | 3506 | 0.810 |
| **CF** | 45764 | 52206 | 0.877 |
| 0-10 | 83 | 102 | 0.814 |
| 10-20 | 635 | 745 | 0.852 |
| 20-30 | 1419 | 1650 | 0.860 |
| 30-40 | 3814 | 4397 | 0.867 |
| 40-50 | 9048 | 10451 | 0.866 |
| 50-60 | 9135 | 10407 | 0.878 |
| 60-70 | 10779 | 12133 | 0.888 |
| 70-80 | 7086 | 8025 | 0.883 |
| 80-90 | 3240 | 3697 | 0.876 |
| 90-100 | 525 | 599 | 0.876 |
| **LF** | 32689 | 37895 | 0.863 |
| 0-10 | 34 | 39 | 0.872 |
| 10-20 | 1080 | 1299 | 0.831 |
| 20-30 | 4945 | 5740 | 0.861 |
| 30-40 | 7008 | 8334 | 0.841 |
| 40-50 | 8157 | 9418 | 0.866 |
| 50-60 | 6668 | 7682 | 0.868 |
| 60-70 | 3648 | 4115 | 0.887 |
| 70-80 | 1043 | 1155 | 0.903 |
| 80-90 | 106 | 113 | 0.938 |
| **RF** | 36833 | 42204 | 0.873 |
| 0-10 | 92 | 113 | 0.814 |
| 10-20 | 833 | 968 | 0.861 |
| 20-30 | 2410 | 2813 | 0.857 |
| 30-40 | 5517 | 6363 | 0.867 |
| 40-50 | 7742 | 8893 | 0.871 |
| 50-60 | 8846 | 10173 | 0.870 |
| 60-70 | 7198 | 8148 | 0.883 |
| 70-80 | 2658 | 3007 | 0.884 |
| 80-90 | 576 | 627 | 0.919 |
| 90-100 | 961 | 1099 | 0.874 |
| **SS** | 51780 | 62186 | 0.833 |
| 0-10 | 38 | 46 | 0.826 |
| 10-20 | 369 | 449 | 0.822 |
| 20-30 | 1377 | 1727 | 0.797 |
| 30-40 | 3591 | 4390 | 0.818 |
| 40-50 | 7141 | 8740 | 0.817 |
| 50-60 | 9735 | 11816 | 0.824 |
| 60-70 | 11362 | 13701 | 0.829 |
| 70-80 | 11499 | 13564 | 0.848 |
| 80-90 | 6668 | 7753 | 0.860 |

There seem to be some sampling issues at the top and bottom of the curves at some positions that I’m not entirely sure how to address. But this provides us with a springboard of numbers to regress to when projecting fielding performance.
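One way to use these numbers as something “to regress to” is sketched below in Python. The first function just reproduces the ZR column as PM divided by CH. The second blends a fielder’s observed plays with a block of phantom chances fielded at the average ZR for his position and FSR bucket. The function names and the ballast of 200 chances are placeholders I’ve assumed for illustration, not estimates from this data.

```python
def zone_rating(plays_made, chances):
    """Zone rating as plays made per chance (PM / CH in the table)."""
    return plays_made / chances


def regressed_zone_rating(plays_made, chances, prior_zr, prior_chances=200.0):
    """Blend observed fielding with a prior taken from the FSR bucket.

    prior_zr      -- average ZR for the player's position and FSR bucket
    prior_chances -- ASSUMED ballast: how many phantom chances at prior_zr
                     to mix in (200 is a placeholder, not a fitted value)
    """
    return (plays_made + prior_zr * prior_chances) / (chances + prior_chances)


if __name__ == "__main__":
    # Hypothetical shortstop rated 70-80 by the fans (bucket ZR 0.848)
    # who has made 80 plays on 100 chances so far.
    print(round(zone_rating(80, 100), 3))
    print(round(regressed_zone_rating(80, 100, prior_zr=0.848), 3))
```

With few observed chances the estimate sits close to the bucket average; as chances pile up, the player’s own zone rating takes over.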

UPDATE: For those of you who are into hardcore numbers stuff, here is the full output from the first factor analysis in the article.

```
Uniquenesses:
 ACCURACY_THROW FIRSTSTEP_FIELD     HANDS_FIELD INSTINCTS_FIELD   RELEASE_THROW
          0.034           0.005           0.126           0.081           0.080
    SPEED_FIELD  STRENGTH_THROW
          0.145           0.565

               Factor1 Factor2 Factor3
ACCURACY_THROW   0.141           0.912
FIRSTSTEP_FIELD  0.324   0.794
HANDS_FIELD      0.849           0.158
INSTINCTS_FIELD  0.861   0.115
RELEASE_THROW    0.493           0.553
SPEED_FIELD     -0.138   1.009
STRENGTH_THROW  -0.123   0.128   0.684

               Factor1 Factor2 Factor3
Proportion Var   0.266   0.241   0.234
Cumulative Var   0.266   0.508   0.741

Test of the hypothesis that 3 factors are sufficient.
The chi square statistic is 141.09 on 3 degrees of freedom.
The p-value is 2.2e-30
```

If it’s Greek to you, well, some of it is Greek to me.

### 4 Responses to Scout’s honor

1. Pizza Cutter says:

hooray for Factor Analysis! Colin, what did your factor loading plots look like, for those of us who like this sort of stuff?

2. dan says:

What do you mean that it’s troubling that the averages don’t line up? That the averages aren’t 50 like they’re supposed to be?
You need to teach Pizza and Eric how to make these fancy red shaded charts.

3. Colin Wyers says:

Yeah, I’d really prefer to see everything work out to 50 as the average. I don’t know why it doesn’t.

4. Pizza Cutter says:

The tables are pretty… but I’m not nearly that cool.