From Z-Scores to T-Tests
July 26, 2008
Every now and again I’ll get an e-mail asking me to explain certain statistics terms or methods used in an article by myself, Pizza Cutter, or someone else. While I do not profess to be an all-out expert, I did pay attention in my classes through high school and college, and I am glad to help make a post an author slaved over more enjoyable through a better understanding. I pride myself on accessibility and so, from time to time, will offer some primers or “lessons” using baseball examples. One thing I still fail to understand is why so few collegiate-level classrooms incorporate sports heavily into their statistics curricula. Seriously, do you know how many people would jump at the opportunity to learn statistics if sports were involved as an instructive tool? But I digress…
Today the topics on the metaphorical table are standard deviations, z-scores, and t-tests. Getting right into it, what is a standard deviation?
The given definition of a standard deviation is a measure of the dispersion of a set of values, but what does that mean and how does it relate to baseball? Essentially, standard deviation is the square root of the mean of the squared deviations of each member of the dataset from the overall mean. I realize that can be very confusing. Say we have ten hitters who average out to a .345 OBP. To calculate the standard deviation of this group we must begin by finding how far from the mean, or average, each hitter falls. If Hitter A has a .329 OBP, his deviation would be -.016. The same is calculated for each member of the group. The results are then squared, so the -.016 becomes .000256.
Once we square all of the deviations, they are averaged together to form our mean of squared deviations. This is known as the variance and it goes hand in hand with standard deviation. In fact, to calculate the standard deviation, from here on out called SD, we simply take the square root of the variance. If the variance (the average of squared individual deviations from the mean) comes to, say, .000228, we take its square root: .0151. This tells us the standard deviation of OBPs amongst these ten players is .0151. What do we make of this number, though?
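The whole calculation translates into just a few lines of Python. This is a sketch, and the ten OBPs below are hypothetical stand-ins I chose to average exactly .345, not the article’s actual dataset:

```python
import math
import statistics

# Ten hypothetical OBPs (placeholder values chosen to average .345)
obps = [.329, .361, .345, .338, .352, .360, .330, .349, .341, .345]

mean = statistics.mean(obps)              # the group's average OBP
deviations = [x - mean for x in obps]     # how far each hitter falls from the mean
variance = sum(d ** 2 for d in deviations) / len(obps)  # mean of squared deviations
sd = math.sqrt(variance)                  # standard deviation = square root of variance

print(round(mean, 3), round(sd, 4))
```

Note that `statistics.pstdev(obps)` computes the same population standard deviation in one call; spelling out the deviations just makes each step of the definition visible.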
SDs are tremendous for exploring ranges and where numbers in a dataset are expected to fall. Since the mean is .345 and the SD is .0151, to find the range of 1 SD we add and subtract .0151 to/from .345. So, 1 SD of our mean would fall between an OBP of .330 and .360. If our data follows a bell-shaped curve, then the 68-95-99.7 rule comes into play. This rule says that 68% of the data in our sample is expected to fall within 1 SD; 95% is expected to fall within 2 SDs; and 99.7%, virtually everything we have, should fall within 3 SDs.
In terms of ranges, if 1 SD = .0151, then 2 SDs = .0302, and 3 SDs = .0453. 1 SD would fall between .330 and .360; 2 SDs would fall between .315 and .375; and 3 SDs would fall between .300 and .390. Of course this is just an example from this particular hypothetical dataset.
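Those bands are just repeated addition and subtraction, so a quick sketch using the example’s mean and SD covers all three at once:

```python
mean, sd = .345, .0151

# For k = 1, 2, 3 standard deviations, compute the expected range
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"{k} SD: {low:.3f} to {high:.3f}")
```

Running this reproduces the three ranges listed above.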
We can use the mean of a dataset as a jumping-off point, so to speak, with the use of the z-score. A z-score tells us how many standard deviations from the mean an individual piece of data fell. To calculate it, we subtract the mean from the individual data point and divide by the standard deviation. If the average number of home runs hit was 14, and our player of interest hit 34, we know he exceeded the mean, but by how much? Assuming our guy belonged to a set of data wherein 1 SD = 3.2 HR, the z-score would be: (34-14)/3.2 = 20/3.2 = 6.25.
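The formula is simple enough to express as a one-line helper; the function name here is mine, not a standard library call:

```python
def z_score(value, mean, sd):
    """How many standard deviations `value` sits from `mean`."""
    return (value - mean) / sd

# The home run example: 34 HR against a mean of 14 and an SD of 3.2
print(z_score(34, 14, 3.2))  # 6.25
```

A value equal to the mean yields a z-score of zero, and values below the mean come out negative, which is exactly what makes z-scores handy for comparisons.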
Recall that under the 68-95-99.7 rule, 99.7% of data can be expected to fall within 3 SDs of the mean; this player exceeded the mean by more than six SDs. Z-scores are great when comparing players from different eras. If you wanted to know whether Roger Maris’s 61 HR were more impressive than Mark McGwire’s 70 in 1998, find the mean HR in each of those years as well as the standard deviation and calculate the z-scores.
Finally, that brings us to t-tests, which I used as recently as this past week and will use as soon as this upcoming week. The t-test compares the means of two different groups to tell us whether or not they are significantly different. This is different from gauging a general difference between two groups. If we have two sets of data, one with a mean BA of .276 and the other with a mean BA of .264, it definitely appears Group A performed better with regards to batting average. That may be true, but not necessarily. Sure, the number itself is higher, but perhaps the sample sizes are too small and the difference is purely noise. This test accounts for that possibility and tells us when means are or are not significantly different from one another.
The calculation of the t-test can be found here, though it is much easier to automate the process via SPSS or another statistics program. Once the t-value is calculated we then have to match it up with its significance level to see if the difference between the means is real. SPSS goes right to the significance level to save some time. A p-value below .05 corresponds to the means being significantly different; any higher and the difference begins to lose significance. If Group A had a .276 and B had a .264, and the p-value of the t-test is .013, then yes, the means are different from each other and Group A really did perform better relative to that metric.
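For readers without SPSS, the t-statistic itself can be computed by hand. This sketch implements Welch’s version of the two-sample t-statistic (which does not assume the two groups have equal variances) in plain Python; the batting-average lists are hypothetical, chosen only so the group means come out to .276 and .264 like the example above. You would still need to look up the p-value for the resulting t in a table or with a package such as SciPy, whose `scipy.stats.ttest_ind` automates the whole test:

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t-statistic for the difference between two sample means."""
    m_a, m_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator)
    v_a, v_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Standard error of the difference between the two means
    se = math.sqrt(v_a / len(sample_a) + v_b / len(sample_b))
    return (m_a - m_b) / se

# Hypothetical batting averages: Group A averages .276, Group B averages .264
group_a = [.281, .276, .290, .268, .265]
group_b = [.259, .270, .262, .255, .274]
print(round(welch_t(group_a, group_b), 3))
```

With samples this tiny, even a 12-point gap in batting average can produce a modest t-value, which is precisely the point: the raw difference in means alone doesn’t establish significance.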
T-tests are great for comparing the means of two different datasets and, in baseball terms, can be used for things like comparing players in before-and-after splits, home/road splits, and anything along those lines. They help us understand that a higher number doesn’t always mean the group with the higher number is better, or that the lower number is worse. For more of a recap on statistics, I highly recommend Pizza Cutter’s primers, which can be found by clicking on some words in this sentence.