# From Z-Scores to T-Tests

July 26, 2008

Every now and again I’ll get an e-mail asking me to explain certain statistics terms or methods used in an article from myself, Pizza Cutter, or whomever. While I don’t profess to be an all-out expert, I did pay attention in my classes through high school and college, and I gladly try to make a post an author slaved over more enjoyable through a learned understanding. I pride myself on accessibility and so, from time to time, will offer some primers or “lessons” using baseball examples. One thing I still fail to understand is why so few collegiate-level classrooms incorporate sports into their statistics curriculum. Seriously, do you know how many people would jump at the opportunity to learn statistics if sports were involved as an instructive tool? But I digress…

Today the topics on the metaphorical table are standard deviations, z-scores, and t-tests. Getting right into it, what is a standard deviation?

The given definition of a standard deviation is a measure of the dispersion of a set of values, but what does that mean and how does it relate to baseball? Essentially, the standard deviation is the square root of the mean of the squared deviations of each member of the dataset from the overall mean. I realize that can be very confusing. Say we have ten hitters who average out to a .345 OBP. To calculate the standard deviation of this group we must begin by finding how far from the mean, or average, each hitter falls. If Hitter A has a .329 OBP, his deviation would be -.016. The same is calculated for each member of the group. The results are then squared, so the -.016 becomes .000256.

Once we square all of the deviations, they are averaged together to form our mean squared deviation. This is known as the variance, and it goes hand in hand with the standard deviation. In fact, to calculate the standard deviation (from here on out called SD) we simply take the square root of the variance. If the variance, the average of the squared individual deviations from the mean, comes to, say, .000228, we take its square root: .0151. This tells us the standard deviation of OBPs amongst these ten players is .0151. What do we make of this number, though?
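The two steps above can be sketched in a few lines of code. The ten OBPs below are invented for illustration (they are not real players, and their SD won’t match the hypothetical .0151 used later); the method is exactly what the text describes:

```python
# Ten hypothetical OBPs (invented for illustration).
obps = [.329, .340, .361, .345, .352, .338, .330, .358, .347, .350]

# Step 1: the mean of the group.
mean = sum(obps) / len(obps)

# Step 2: the variance, i.e. the average of each squared deviation from the mean.
variance = sum((x - mean) ** 2 for x in obps) / len(obps)

# Step 3: the standard deviation is the square root of the variance.
sd = variance ** 0.5

print(round(mean, 3), round(sd, 4))
```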

SDs are tremendous for exploring ranges and where numbers in a dataset are expected to fall. Since the mean is .345 and the SD is .0151, to find the range of 1 SD we add and subtract .0151 to/from .345. So, 1 SD of our mean would fall between an OBP of .329 and .360. If our data follows a bell-shaped curve, then the 68-95-99.7 rule comes into play. This rule says that 68% of the data in our sample is expected to fall within 1 SD; 95% is expected to fall within 2 SDs; and 99.7%, virtually everything we have, should fall within 3 SDs.

In terms of ranges, if 1 SD = .0151, then 2 SDs = .0302 and 3 SDs = .0453. 1 SD would fall between .329 and .360; 2 SDs would fall between .315 and .375; and 3 SDs would fall between .300 and .390. Of course, this is just an example for this particular hypothetical dataset.
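The ranges above are simple arithmetic on the hypothetical mean and SD, which a few lines make explicit (note that .329 comes from truncating .3299 rather than rounding it):

```python
# The article's hypothetical mean OBP and standard deviation.
mean, sd = .345, .0151

# The 1, 2, and 3 SD ranges of the 68-95-99.7 rule.
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"{k} SD: {low:.4f} to {high:.4f}")
```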

We can use the mean of a dataset as a jumping off point, so to speak, with the use of the z-score. A z-score tells us how many standard deviations from the mean an individual piece of data fell. To calculate it, we subtract the mean from the individual data point and divide by the standard deviation. If the average number of home runs hit was 14, and our player of interest hit 34, we know he exceeded the mean, but by how much? Assuming our guy belonged to a set of data wherein 1 SD = 3.2 HR, the z-score would be: (34 - 14)/3.2 = 20/3.2 = 6.25.
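The calculation is a one-liner; here it is as a small function, checked against the home-run example above:

```python
def z_score(value, mean, sd):
    """How many standard deviations `value` sits from `mean`."""
    return (value - mean) / sd

# The example from the text: 34 HR against a mean of 14 and an SD of 3.2.
print(z_score(34, 14, 3.2))  # 6.25
```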

Recall from the 68-95-99.7 rule that 99.7% of data can be expected to fall within 3 SDs of the mean; this player exceeded the mean by more than six SDs. Z-scores are great when comparing players from different eras. If you wanted to know whether Roger Maris’s 61 HR in 1961 were more impressive than Mark McGwire’s 70 in 1998, find the mean HR in each of those years as well as the standard deviation and calculate each z-score.

Finally, that brings us to t-tests, which I used as recently as this past week and will use as soon as this upcoming week. The t-test compares the means of two different groups to tell us whether or not they are significantly different. This is different from gauging a general difference between two groups. If we have two sets of data, one with a mean BA of .276 and the other with a mean BA of .264, it definitely appears Group A performed better with regard to batting average. This may be true, but not necessarily: sure, the number itself is higher, but perhaps the sample sizes are too small and the difference is purely noise. The test accounts for that possibility and tells us when means are or are not significantly different from one another.

The calculation of the t-test can be found here, though it is much easier to automate the process via SPSS or some other statistics program. Once the t-value is calculated we then have to match it up with its significance level to see if the difference between the means is real. SPSS goes right to the significance level to save some time. A p-value of .05 or below corresponds to the means being significantly different; any higher and the difference begins to lose significance. If Group A had a .276 and B had a .264, and the p-value of the t-test is .013, then yes, the means are different from each other and Group A really did perform better relative to that metric.
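As a sketch of what SPSS is doing under the hood, here is the t-statistic in the unequal-variance (Welch) form, computed on two invented sets of batting averages; converting the t-value to a p-value still requires a t-distribution table or a stats package, as the text notes:

```python
def welch_t(a, b):
    """Welch's two-sample t-statistic for lists of numbers a and b."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    # Sample variances (dividing by n - 1, as t-tests do).
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / (var_a / len(a) + var_b / len(b)) ** 0.5

# Invented batting averages for two hypothetical groups.
group_a = [.281, .270, .279, .268, .283, .275]
group_b = [.259, .266, .270, .261, .265, .263]
print(welch_t(group_a, group_b))
```

A large positive t-value here would suggest Group A’s higher mean is unlikely to be noise, but the p-value is what settles it.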

T-tests are great for comparing the means of two different datasets and, in baseball terms, can be used to do things like compare player splits: before and after an event, on the road versus at home, anything along those lines. They help us understand that a higher number doesn’t always mean the group with the higher number is better, or that the lower number is worse. For more recap on statistics, I highly recommend Pizza Cutter’s primers, which can be found by clicking on some words in this sentence.

Excellent article on a topic quite close to my heart. In the advanced stats courses I took during college (time series/forecasting and regression/multivariate data analysis) my professors let us pick our own data sets for assignments. When you have the opportunity to play with the data on Fangraphs/B-Ref/THT and get course credit for it, life is pretty good.

Haha, yeah if there was something like what you just mentioned, if not even an entire class devoted to it, people would be so much more interested. Just let students work with whatever datasets they want to. Statistics classes are usually mandatory in the gen-ed department so why not let students enjoy them as opposed to what happens now, where they barely show up or listen, and then get by with a C.

Whenever I taught stats, I would always seem to drift back to baseball for my examples. The problem was that I was often teaching to a bunch of non-baseball fans!

> In terms of ranges, if 1 SD = .0151, then 2 SDs = .0302, and 3 SDs = .0453. 1 SD would fall between .329 and .360; 2 SDs would fall between .043 and .647; and 3 SDs would fall between, well, someone with a negative OBP. . .

I think I see an arithmetic error here. The base OBP is 0.345. When we +- 0.0151 to it, you’re correct: we get 0.329 and 0.360. But I think you slipped a decimal place in your other two examples: 2 SDs being 0.0302, the range should be 0.315 to 0.375; and 3’s range should be 0.300 and 0.390.

Either that, or there’s something that I’m drastically misunderstanding here.

And, BTW, if I’m right, then I don’t really think 99.7% of all OBPs fall anything like between .300 and .390. I mean, there are fewer than 1,000 ML ballplayers, right? And 0.3% of 1,000 is 3, so 0.3% of ML ballplayers is less than 3. And I think there are WAY more than 3 players each year outside the range .300-.390 OBP. Again, please show me what I’m misunderstanding here. Thanx.

Yeah, I tried to explain this over at Seamheads a few weeks ago, but not nearly as eloquently as you did here. Well done, as anything that helps more people understand statistics is a good thing in my book.

Phil, good catches. The 99.7% of .300-.390 is literally only a hypothetical situation to explain the method. I think you’re looking too far into a hypothetical, which is why 0.3%, or 3 or fewer players, seems odd. Thanks for the heads-up. Damn decimals.

Matt, glad you enjoyed it. I just asked myself constantly, how would I want something I had trouble with explained to me. It’s hard to gauge that so I guess the true test here would be if people take something away from it.

I have yet to take a statistics course (Pizza’s Stats 201, 202, and 203 don’t count), so thanks for this.

Dan, if you’re really interested, I highly recommend the book “Teaching Statistics Using Baseball” by Jim Albert. It’s the equivalent of a stats course textbook but taught solely through baseball.

I’m sure Pizza and I will have more of these through the coming months, though (perhaps in the off-season when nothing’s happening).

It’ll definitely help you in the class to relate it to baseball. That’s what I did in my freshman year of college; whatever they taught I’d ask myself how it applied to baseball or sports.

Due to the fact that I’m just entering High School, and the highest math I’ve taken is algebra, I don’t really understand any of this.

I’m hoping we go over some of this stuff this school year, though. I’m guessing we’ll go over regression this year.

When did you start going over more advanced math like this? College?

Also, thanks for this. Love this blog. 🙂

Hylton, in my 11th grade math class we delved into statistics as one of the segments of the curriculum, but at most colleges you are required to take at least one such class. What parts didn’t you understand, though? I am trying to break this down so anyone can get it and knowing what’s getting lost would help me for future primers.

Actually, I understand everything in your post. Pretty simple stuff (though I’m sure it’s not as easy when you put it into practice); it’s the stuff that Pizza did, like binary logistic regression, that I didn’t understand.