Stats 203: Signal Detection Theory and the Strike Zone

After my previous post on plate discipline, I’ve gotten plenty of feedback on the article (for discussions, check here and here and in the comments of the original post), but mostly everyone wanted me to actually translate the math in the article into… well… English.  Signal detection theory isn’t something that’s commonly taught even in college stats classes (I teach the theory, but not the math, in my intro to psychology class) so I figured I’d give a quick whirl at explaining it here.
On a conceptual level, here’s the best explanation for signal detection theory I have.  It’s the one I use in my intro psych classes.  I assume you have a cell phone, because even 10-year-olds have cell phones now.  (If I could get away without one, I would throw mine into Lake Michigan… here’s to having a job where you’re on call 24 hours!)  Anyway, suppose you have it on vibrate and in your pocket, like I usually do.  Now, at any given moment, you are doing one of two things with the phone.  You are either in the process of answering the phone or you are not.  (In signal detection theory, or SDT from here on out, this is called your response.)
If the phone rings (or starts vibrating), the sensible and proper thing to do is to answer the phone.  If it’s not ringing, then the sensible thing to do is to leave it in your pocket… unless you’re in the mood for a good game of bowling on your phone.  The phone ringing is the signal in the signal detection theory.  In theory, the signal produces the response.  Now, we have all had times when the phone rings, but we don’t hear/feel it.  In that case, the signal was present, but we did not respond.  Other times, you’ve sworn that you feel the cell phone buzzing, but you pull it out, and find out that it was just your leg tingling.  In both cases, you’ve done the wrong thing, but those are two separate events.  In one case, you miss a phone call, in the other, you look like a moron.  In an ideal world, you would only reach for your phone when it was ringing, and ignore it when it was not ringing.  However, we all make mistakes.  If you make a lot of mistakes (you are constantly checking your phone despite the fact that no one is calling you, yet somehow you miss the people who actually do call…) you clearly have no idea whether the signal is present or not.  You are not sensitive to the signal being present.  If you get it right every time, you are very sensitive to whether or not the phone is really ringing.
New situation.  Suppose that you are expecting an important phone call.  It’s one of those phone calls where you basically don’t do anything important while you’re waiting for it, because you don’t want to be in the middle of something when it comes.  And every time your leg tingles a little tiny bit, you grab for your phone.  You may pick up your phone 100 times without it actually ringing, but you are determined not to miss this call.  Finally, the call actually comes and you go back about your business.  But when that same tingle happens in your leg, you don’t go grabbing for your phone.  Why not?  Because you’ve changed your response bias back to normal.
Transposing this into baseball, the batter faces a similar situation.  The potential signal is the pitch that is currently hurtling toward him at 95 mph.  He must decide whether this pitch is hittable (a strike) or not (a ball).  (For those just reading about this for the first time: I’m very aware that this is a flawed framework.)  The batter must then decide whether to produce the response, a swing, or keep the bat on his shoulder.  If he swings, he might have made the right decision, as evidenced by the fact that he hit the ball.  Or, he might miss.
Some interesting questions.  Are there some batters who are particularly good at telling whether a pitch will be a strike or not?  That is, do they take the hittable pitches and put them into play?  Are there some who seem to be up there guessing and swinging at everything?  Well, we can determine mathematically who is who.  SDT allows for the calculation of a pair of statistics, one measuring sensitivity to the signal and the other measuring one’s response bias.
If all you wanted was a slightly better understanding of the theory, you can get off the bus here.  If you want to know about the math involved, prepare for some major nerdiness.

Let’s take another example.  Every time I play a specific tone in your ear (signal), I want you to smack my friend Larry upside the head (response).  I also have another, different tone, for which you are not to take any action.  Well, how well you are able to hear the tone will depend on how sensitive your ears are in picking up the target tone.  How likely you are to smack my friend Larry will depend on whether you’re the sort of person who looks for any good excuse to smack someone or the sort of person who likes to be sure about what you heard before you act.
Pretend I ran this scenario on you 100 times: 50 times I played the target (smacking) tone and 50 times I played the other tone.  On the target tone, you produce the appropriate response 42 times and fail to respond 8 times.  On the alternate tone, you smack Larry 15 times by “mistake” and properly keep your hand at your side 35 times.  In general, you’re doing pretty well, but you’ve still made mistakes, and it looks like you make the “slap someone when I shouldn’t” mistake (called a false alarm) more than the “refrain from slapping when I should slap” mistake (called a missed signal).
In baseball, if the pitch is in the strike zone (signal is present), the batter should swing and hit it (yes, I know, I know, not all pitches in the strike zone are actually hittable and sometimes it’s a good idea to just take a strike).  If the batter doesn’t swing, the pitch will go into the catcher’s glove, and he is then the proud owner of a new strike.  If the ball is outside the zone, he shouldn’t swing, because he probably won’t make good contact with it (I know, not always the case) and if he lets it go by, it will be a ball.  So, a called strike represents a time when the signal was present (ball was in the strike zone) but the batter did nothing (a missed signal).  A ball in play (BIP) is the proper response.  If the ball is outside the strike zone, the best thing to do is nothing at all and take a ball.  If the batter swings and misses, he has made a response, but it was a bad idea.  He shouldn’t have done that (false alarm).
Going back to our tone/slapping example, when the signal to slap was there, you got it right 42/50 times, or 84% of the time.  However, you slapped 15/50 times when you shouldn’t have, or 30% of the time.  So, you have an 84% hit rate and a 30% false alarm rate.  This is the first step in SDT calculations.  At what rate (percentage) do you make the response in situations where you should, and how often do you make the response in situations when you shouldn’t?  In baseball, the hit rate is given by the formula BIP / (BIP + called strikes).  The false alarm rate is swinging strikes / (swinging strikes + balls).
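For anyone who would rather see the arithmetic than read it, here is a minimal sketch of the two rates in Python. The function and argument names are my own labels for illustration; the only numbers used are the ones from the tone/slapping example.

```python
def hit_rate(hits, misses):
    """Share of signal-present trials that drew a response.

    Baseball reading from the post: hits = balls in play (BIP),
    misses = called strikes.
    """
    return hits / (hits + misses)


def false_alarm_rate(false_alarms, correct_rejections):
    """Share of signal-absent trials that drew a response anyway.

    Baseball reading from the post: false_alarms = swinging strikes,
    correct_rejections = balls taken.
    """
    return false_alarms / (false_alarms + correct_rejections)


# Tone/slapping example: 42 slaps and 8 misses on the target tone,
# 15 mistaken slaps and 35 correct non-slaps on the other tone.
tone_hit_rate = hit_rate(42, 8)
tone_false_alarm_rate = false_alarm_rate(15, 35)

print(f"hit rate: {tone_hit_rate:.0%}")
print(f"false alarm rate: {tone_false_alarm_rate:.0%}")
```

For a batter, the same two functions would take BIP and called strikes for the first rate, and swinging strikes and balls for the second.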
With me so far?  OK, time to get lost.  If you remember your Z-distribution from stats class, you may recall that different scores within the Z-distribution cut off different parts of the area of the curve.  A z-score of +1.96 always cuts the graph into two parts: on the left will be 97.5% of the area of the graph and on the right will be 2.5%.  A z-score of zero will have 50% to the left and 50% to the right.  We need to find the z-score that cuts off an area equal to the hit rate (and eventually, the false alarm rate) to the left.  In other words, we want to find the z-score that has 84% of the area to the left and 16% to the right.  There are a few calculators out there on the web that will do this for you.  For our hit rate, the appropriate z-score is .994.  For the false alarm rate (30% to the left, 70% to the right) that number is -.524.  Sensitivity is calculated by the formula z(H) - z(FA).  In this case, it’s .994 - (-.524) = 1.518.
The higher the sensitivity number, the better: the more sensitive a person you are… at least in hearing the target sound.  The closer to zero, the more it appears that you are just guessing.  Think about it.  If you make the response 65% of the time when it’s called for and 65% when it’s not, those z-scores will be equal and the difference will be zero.  You weren’t really listening to distinguish between the target sound and the other sound.  65% of the time when you heard anything, you just smacked Larry.  Jerk.
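If you would rather not hunt down a web calculator, the same lookup can be sketched with Python's standard library (`statistics.NormalDist`, available in Python 3.8 and later). The helper names `z_score` and `sensitivity` are mine, chosen just for this illustration.

```python
from statistics import NormalDist


def z_score(rate):
    """z-score with `rate` of the standard normal curve's area to its left.

    The rate must be strictly between 0 and 1; a rate of exactly
    0% or 100% has no finite z-score.
    """
    return NormalDist().inv_cdf(rate)


def sensitivity(hit_rate, false_alarm_rate):
    """Sensitivity as described in the post: z(H) - z(FA)."""
    return z_score(hit_rate) - z_score(false_alarm_rate)


# Tone/slapping example: 84% hit rate, 30% false alarm rate.
z_hit = z_score(0.84)
z_fa = z_score(0.30)
d_prime = sensitivity(0.84, 0.30)

print(f"z(H)  = {z_hit:.3f}")
print(f"z(FA) = {z_fa:.3f}")
print(f"sensitivity = {d_prime:.3f}")
```

Passing the same rate twice, for example `sensitivity(0.65, 0.65)`, gives zero.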
Response bias is a little more complicated.  This tells you whether you are more likely to be a wait-and-see sort of person or a more impulsive sort of person.  In our phone example, if you are constantly grabbing for the phone at any little tingle in your leg, you are clearly using a different response bias than if you only grab for your phone when you feel your leg shaking all over the place from the force of the phone’s vibrations.  The person who grabs at the phone constantly will probably pick up a few extra phone calls from being so vigilant and not miss any, but he will probably spend a lot more time looking at a non-active phone.  The person who ignores his phone until he’s sure will probably not have to deal with the embarrassment of pulling out his phone when it’s not needed, but he’ll probably miss a lot of calls.  The best situation is to balance it out; don’t be too eager to answer the phone or too passive.  Response bias is a measure of how eager or passive you are, and mathematically, the best place to be, the place where you minimize the number of errors you make, is 1.00.
To calculate it, take the z-score that you got for the hit rate and plug it into the formula [e ^ (-hit rate z * hit rate z)] / (square root of 2 * pi).  More advanced statisticians will recognize this as a cousin to the equation for the normal curve.  You calculate the formula for false alarm z in much the same way.  This gives you something called the phi statistic for each z-score.  Response bias is false alarm phi / hit rate phi.
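Here is a rough Python sketch of that recipe, following the formula exactly as it is worded above. Two assumptions on my part: I read "square root of 2 * pi" as the square root of (2 * pi), which cancels in the ratio either way, and the function names are made up for this illustration. One caution: the usual normal curve equation divides z * z by 2 inside the exponent, so numbers from this sketch will not match a textbook calculation, although whether a result lands above or below 1.0 is the same either way.

```python
import math
from statistics import NormalDist


def phi_from_post(z):
    """The post's formula as written: e^(-z*z) / sqrt(2*pi).

    Note: the standard normal density uses e^(-z*z/2) instead.
    """
    return math.exp(-z * z) / math.sqrt(2 * math.pi)


def response_bias(hit_rate, false_alarm_rate):
    """Response bias as described in the post: FA phi / hit phi."""
    z_hit = NormalDist().inv_cdf(hit_rate)
    z_fa = NormalDist().inv_cdf(false_alarm_rate)
    return phi_from_post(z_fa) / phi_from_post(z_hit)


# Tone/slapping example: 84% hit rate, 30% false alarm rate.
bias = response_bias(0.84, 0.30)
print(f"response bias = {bias:.3f}")
```

For the tone example the result comes out above 1.0, which fits the earlier observation that the false alarm was the more common mistake. Equal rates, such as `response_bias(0.65, 0.65)`, give exactly 1.0.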
I honestly have no idea why that works mathematically.  However, I do know that it corrects for the fact that you may not have gotten equal numbers of times when it was appropriate to respond and not respond.  Response bias is not a direct ratio of how many called strikes to swinging strikes you take, but it is a function of what percentage of the time you do each.
If you are above 1.0 on response bias, you respond (in the case of baseball, you swing) too much.  By swinging more, you might put another ball in play.  By reaching for your phone more liberally, you might catch a call you otherwise would have missed.  But, you will make more mistakes in the direction of swinging and missing or staring at a non-ringing phone.  A common mistake that I hear is to say “Ah, so his response bias is 1.2, so for every extra ball in play, he’ll have 1.2 more swinging strikes.”  That’s not it.  However, the further away you get from 1.0, the more extra swinging strikes you will need to endure before you get one more ball in play.
On the flip side, if you are below 1.0, you take too many pitches.  You will take a few extra balls, but you will take them at the cost of many more added strikes.  At 1.0 exactly, it does not mean that you will not make any mistakes, only that you have mathematically minimized the number of errors.
I hope this made sense.  I fear it didn’t.  Signal detection theory is a hard one to fully explain in a straightforward way.

9 Responses to Stats 203: Signal Detection Theory and the Strike Zone

  1. Sean Smith says:

    Good piece. I understand it a lot better now. Plus I feel sorry for Larry, and wonder why he’s still your friend. Perhaps he has low self esteem?

  2. Guy says:

    If you haven’t seen it yet, Dan Fox has a piece on this today and references your work: http://www.baseballprospectus.com/article.php?articleid=6322. Using Gameday data, he’s able to create (for some players) two helpful metrics: 1) how often hitter swings at a ball (“Fish”) and 2) how often hitter lets a strike go by (“Eye”). I like this approach, as it focuses on the correctness of the swing/no-swing decision — which is what plate discipline means to me — and leaves aside both contact rate and result if hitter swings. He suggests he will try to create a single, combined metric (“fisheye”?) in the future.
    * *
    I agree that Ks and BBs are not the only consequence of discipline. They can only tell us whether hitters are taking pitches at the right time, not if they are swinging at the right time. In any case, as more work is done with the Gameday data, we’ll clearly have better metrics.

  3. Guy says:

    Good explanation, PC. A question and a comment:
    For sensitivity, why do you make separate calculations for success and failure, as opposed to just a single success rate of (Ball + BIP)/Pitches? Does that change the ratings?
    * *
    I think there are two separate issues combined in the sensitivity metric: 1) is the correct decision made when pitcher throws a strike, and 2) is correct decision made when pitcher throws a ball? The first is extremely hard to determine given the data you have. It’s not necessarily true that a BIP is success, since about 67% of the time that will result in an out. Nor is a strike (swinging or called) necessarily a failure. All we can say for sure is that strike 3 is failure and a hit is success. Everything in between is murky.
    Balls are much more clear: in general, the hitter should take a pitch out of the strikezone. Yes, there may be some out-of-zone pitches that can be hit successfully (esp. if Vlad is hitting), but it’s a reasonable generalization to say that hitters should lay off pitches outside the zone. So if the data could identify every pitch outside the zone (as it eventually will), the swing/take proportion on those pitches would be a very useful measure. In the meantime, the difference between a hitter’s BB rate and strikeout-called rate can give us an approximate measure of how well they’re reacting to balls. This rewards hitters for taking ball 4 successfully, but punishes them to the extent they do this simply by taking lots of pitches (rather than actually identifying an out-of-zone pitch).

  4. Pizza Cutter says:

    On the question, if you read the second half of my paper in BTN, I did essentially use that formula for the stepwise regression I ran (I called it “good decision rate”, the only amendment to your formula I made was that I removed intentional balls.) It appeared to be pretty well correlated with sensitivity (and is easier to calculate), although I prefer the SDT methodology because it gives us the response bias as well, and can tell us whether a player is mathematically swinging too much or too little.
    As to what happens when the ball is in play, Neifi is the perfect example. He ranks highly on the sensitivity scale, but he’s a terrible hitter. In my study, I specifically wanted to isolate plate discipline unto itself. The measure should come with a disclaimer (once the ball leaves the bat, I claim no responsibility for what happens). It would be interesting though to re-score the measure with BIP resulting in outs as bad swings though. Maybe I’ll do that while I’m in L.A.
    On BB vs. K as measures of plate discipline (whether a ratio or just subtracting), I’m not sold. I don’t buy into the idea that plate discipline only leads to walks and strikeouts. A player can be very disciplined and never walk. (If a pitcher gives me a nice hanging curve every time, the most disciplined thing I can do is launch it into the second deck.) My initial thought when creating the measure was to create one that didn’t penalize players for not walking, but being contact guys.

  5. Pizza Cutter says:

    Ooooooooooooooo…. not only is BP giving me some love(!) but someone else is picking up the torch. His article has some promise. Casting aside concerns about whether GameDay is reliable, his ideas are sound, although I’d like to see another metric of “Holes” in the swing. How often does a batter swing and miss at a ball inside the strike zone? It’s sorta picked up by the contact measure. I could be greedy and start asking questions like what about inside pitches, outside pitches, etc.

  6. Guy says:

    “I could be greedy and start asking questions like what about inside pitches, outside pitches, etc”
    Absolutely. Once we know BA/SLG for balls hit in different sectors of the strike zone, you could rate batters based on the “hitability” of the pitches they let go by. You could even customize it once there’s enough data, measuring how good a job each hitter does at swinging at strikes in his own hitting zone, and penalizing him little for taking pitches which the data show he really can’t hit.
    Fox also suggests the possibility of taking the count into consideration, so that a hitter gets penalized more for taking a strike on a 2-strike count than on a 0- or 1-strike count.

  7. Pizza Cutter says:

    I sent Dan an e-mail with some of my suggestions. I plan on writing some follow up stuff that I’ve learned since the article’s initial publication, and I’ll include some of that in there. This is getting to be rather exciting. We’ll probably run into some problems of not enough data being available (he points out that the pitch tracking software is only available in some parks), but everything is there at least for a theoretical framework. When the data actually become available, it would just be a matter of plugging in the numbers.

  8. John Beamer says:

    Nice explanation PC. I now understand more about SDT! I liked Dan’s article … shame there isn’t more data. I’m not convinced this data will be *that* useful. Will we really be able to isolate star players? Looking at Fox’s chart, obviously it is more ideal to be lower and to the right, but eyeballing the names, it isn’t clear-cut whether the gradation between names is especially significant.

  9. Pizza Cutter says:

    John, “useful” is in the eye of the beholder. Suppose that plate discipline isn’t really correlated with any of the usual suspects that we look to for outcomes. Knowing that still tells us something about the game. Maybe plate discipline doesn’t really matter all that much. My list included Neifi Perez… hardly the most useful player out there.
