On throwing to first, Part I

Who are the most dangerous baserunners in the league?  Why don’t we ask the people who see them up close: Major League pitchers.  I suppose that I could go and ask them myself, but that would take a lot of travel and stalking, neither of which I’m really up for right now.  Thankfully, though, we already have a little bit of data on the subject: the throw over to first.  It might be the most boring event in baseball, because it usually results in… well… nothing.  At least to the naked eye.  With a runner on first, the pitcher turns and throws a nice soft toss over to the first baseman, who occasionally feigns a tag on the runner, who is almost always safely back at first.  So, if the play is so useless, why does it still happen so often in a game?
In 2006, there were 16,654 throws, 16,349 of them by pitchers, to “check on” a runner at first who had an open second base in front of him.  The conventional wisdom is that pitchers do this in order to “control the running game” and although it rarely happens, to pick the runner off.  Sometimes, a runner draws a few throws before the first pitch is even made.  Sometimes, the first baseman doesn’t even bother to hold the runner.  The reason is fairly obvious: the pitcher wants to keep the runner close to the base so that he can’t get too long a lead and take off to second.
So, which runners drew the most throws last year?  Well, first let’s do some setup work.  I took the 2006 play-by-play database from Retrosheet and isolated all instances in which a runner was on first base but second base was unoccupied, meaning that he could, theoretically, steal second base (about 44,000 instances).  I counted how many throws to first were made by the pitcher (or catcher) in each case, whether the runner received any throws at all (coded as zero for no and one for yes), and how many times he made a break for second (whether he was caught stealing, stole the base, or the batter fouled the ball off).  I eliminated instances where the runner ran on a 3-2 count with 2 outs.  Next, I identified and collapsed “duplicates.”  This would be a case where a runner singles to start the inning, but the next three hitters strike out without him leaving first base.  This way, he gets credit for only one time on first, rather than three.
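The de-duplication step described above can be sketched in a few lines of Python.  This is a minimal illustration with made-up field names (game_id, inning, runner_id, throws, ran) — the real Retrosheet event files are coded quite differently, so treat the schema here as hypothetical:

```python
# Hypothetical rows: one per plate appearance with the runner on first.
events = [
    {"game_id": "G1", "inning": 3, "runner_id": "reyes01", "throws": 2, "ran": 0},
    {"game_id": "G1", "inning": 3, "runner_id": "reyes01", "throws": 1, "ran": 0},
    {"game_id": "G1", "inning": 5, "runner_id": "reyes01", "throws": 0, "ran": 1},
]

def collapse_opportunities(events):
    """Collapse consecutive plate appearances with the same runner stranded
    on first (same game and inning) into a single 'time on first',
    summing the throws over and keeping any attempt flag."""
    merged = {}
    for e in events:
        key = (e["game_id"], e["inning"], e["runner_id"])
        if key not in merged:
            merged[key] = {"throws": 0, "ran": 0}
        merged[key]["throws"] += e["throws"]
        merged[key]["ran"] = max(merged[key]["ran"], e["ran"])
    return merged

opps = collapse_opportunities(events)
# The runner gets credit for two times on first (innings 3 and 5),
# not three plate appearances.
```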
So, who gets the most throws his way to first base per time that he’s on (with a minimum of 20 appearances in this situation)?
Top 10:

  1. Ryan Freel – 1.54
  2. Dave Roberts – 1.52
  3. B. J. Upton – 1.41
  4. Nook Logan – 1.30
  5. Juan Pierre  – 1.29
  6. Chris Duffy – 1.29
  7. Curtis Granderson – 1.26
  8. Jose Reyes – 1.20
  9. Chone Figgins – 1.17
  10. Willy Taveras – 1.15

Reyes, Pierre, and Roberts were 1, 2, and 4 (respectively) in the NL in steals.  Figgins was 2nd in the AL.  The rest all have reputations as speedsters.  For those interested, the bottom 10 includes such fleet-footed folk as Frank Thomas, David Ortiz, Ryan Howard, and Mike Piazza.
Does throwing to first keep these guys, or anyone, from running?  I ran a chi-square test of the association between whether a throw was made and whether the runner ran.  There was a significant association such that a runner was more likely to run if there had been a throw to first.  (More likely: a pitcher threw to first when it was occupied by a runner who was more likely to run.)  Runners only tried in 8.5% of the instances in which no throw had been made to first, but 21.8% of the time when the pitcher had thrown over.  So throwing over certainly isn’t a deterrent from running.  Nor is making more than one throw.  Using a binary logit regression, I regressed whether or not the runner made an attempt on the number of throws made.  The number of throws explained about 1% of the variance.  In other words, throwing over more than once didn’t work to stop the runner from going.
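The chi-square test above can be reproduced with a small hand-rolled function.  The cell counts below are invented to roughly match the quoted attempt rates (8.5% with no throw, 21.8% with a throw); the real test used the ~44,000 Retrosheet opportunities described earlier:

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

#            [stayed, ran]   (made-up counts for illustration)
table = [[9150,  850],    # no throw to first:   850/10000 =  8.5% attempts
         [7820, 2180]]    # throw(s) to first:  2180/10000 = 21.8% attempts
stat = chi_square_2x2(table)
# stat lands far above the df=1 critical value of 3.84, i.e. significant
```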
But, does throwing over affect stolen base rates?  Yes.  When a runner makes an attempt for second (and the ball is not fouled off), he is successful 76.8% of the time if there has not been a throw and 65.4% of the time if there has been.  So, throwing to first shaves 11.4 percentage points off the stolen base success rate.  Looks like throwing to first is a defensive maneuver that actually works.
Or does it?  What about the throw that gets away?  Does throwing to first affect the pitcher’s ability to pay attention to the batter and thus pitch to him effectively?  But then again, what about the pickoff?  What about the possibility that the runner might not “steal” third on a single or home on a double?  All told, is throwing to first an effective strategy? 
You’ll just have to read Part II.


Game Averaged PythagenPat

Perhaps the biggest problem with seasonal pythagorean win estimators is that they are heavily influenced by blowout games.  When a team faces the Royals and beats the tar out of an injured Runelvys Hernandez to the tune of 20 runs, that makes far less of a statement (in reality) about their intrinsic ability to win games against average competition than it would if they faced, say, the Blue Jays and beat them 6-4.  It cannot truthfully be said that there AREN’T teams that carry the “bully” personality, beating up on bad teams and getting consistently outplayed by good ones.
I was thinking about this problem of blowout games vs. close games about a year ago and then I realized that a good way to neutralize runs scored in “Garbage Time” would be to use PythagenPat itself to enforce the assertion that all games have a maximum value of ONE win.  Whether you win a game 25-1 or 6-1, you can only get that one W, and chances are, that 25-1 game was largely against inferior competition.
So here’s what I did:

  1. I calculated PythagenPat winning percentages for each individual game in baseball history using the gamelog files from Retrosheet.org.  In order to do this right you have to find a Patriot Exponent for each game (X = (RS + RA) ^ 0.285 for a single game), and then calculate each team’s W% in that game (W% = RS^X / (RS^X + RA^X)).
  2. I grouped the data by winning and losing team and gathered statistics on run scoring and allowing, and winning and losing Game PythagenPat W%s.
  3. I merged all of that information into one nice Excel table which shows RS and RA for each team in just their wins, just their losses, and in all games combined, as well as showing total Single-Game PythagenPat wins garnered in their wins, in their losses, and in all games.
  4. I calculated seasonal PythagenPat W% for just the wins, just the losses, and for the entire season for each team.
  5. I calculated Game-Averaged PythagenPat W% (the Single-Game wins divided by the number of games in which those wins were obtained) for the wins, the losses, and the total season.
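The per-game calculation in step 1 can be sketched as a short function (a minimal version; tie scores are assumed away, since every decided game has RS ≠ RA and at least one run scored):

```python
def game_pythagenpat(rs, ra):
    """Single-game PythagenPat W%: dynamic Patriot exponent
    X = (RS + RA) ** 0.285, then W% = RS^X / (RS^X + RA^X)."""
    x = (rs + ra) ** 0.285
    return rs ** x / (rs ** x + ra ** x)

# A 25-1 blowout and a 6-1 win are both worth less than one full win:
w_blowout = game_pythagenpat(25, 1)   # ~0.9997
w_close = game_pythagenpat(6, 1)      # ~0.96
```

Summing these per-game figures over a season (step 5) and dividing by games played gives the Game-Averaged W%.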

Immediately, I observed that teams with a reputation for not being as good as their RS and RA suggested were showing significantly weaker Game-Averaged PythagenPat records (I’m going to dub this new statistic PythagenMatt so I don’t have to keep typing Game-Averaged all the time) than Seasonal PythagenPat records.  I also noticed, however, that in general, all teams tended to pull toward the center (a .500 W%) when doing it this way.  This makes a certain amount of sense, as a majority of run scoring happens on the winning side… the winning team tends to outscore the losing team by roughly double the rate if you look through history, and PythagenMatt will take a bite out of every one of those wins, while taking a bite out of the negative results of losses as well (thus moving teams toward the middle).
I considered abandoning the idea of PythagenMatt, but I decided first to see how PythagenPat and PythagenMatt related to actual W%.   I graphed PythagenPat (Seasonal) W% (Y axis) vs. Actual W% (X axis) first, and took a Linear Best Fit Line through the data.  Note here, I eliminated all teams that did not have at least 100 games played from the sample before doing this, because small samples of games cause wonky outliers that occasionally give an incorrect sense of the utility of both PythagenPat and PythagenMatt.  This is what the PythagenPat distribution looks like:
Seasonal PythagenPat W%
(click to view full image)
Note the R^2 value (0.9127).  This lines up well with most studies done on the reliability of PythagenPat and all other W% estimators.  It represents an R value of 0.9553.
This is what you get when you look at PythagenMatt:
Game-Averaged PythagenPat
(clickable once again)
The tilt is wrong, obviously, thanks to the center-pulling bias, but notice how much more compact the scatterplot looks along the line of best fit.  Also notice the dramatically improved R^2 of 0.9585 (an R value of .979).
We can solve the center-pulling bias with a simple linear translation using the line of best fit obtained above (y = 0.6938x + 0.1531).  Remember that y is PythagenMatt, so if we want to use PythagenMatt to project Actual W%, we need to invert the equation with some simple algebra to get W% = (PyM – 0.1531) / 0.6938.  Linear translations have no effect at all on correlation, so we’ve removed all sources of bias and retained our stronger correlation.
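The de-biasing step amounts to inverting the fitted line.  The slope and intercept below are the values quoted above; with your own sample you would refit them (e.g. with numpy.polyfit) before inverting:

```python
SLOPE, INTERCEPT = 0.6938, 0.1531   # fitted: PyM = SLOPE * W% + INTERCEPT

def projected_wpct(pythagenmatt):
    """Invert the regression to project actual W% from PythagenMatt."""
    return (pythagenmatt - INTERCEPT) / SLOPE
```

Because this is a linear translation, the correlation with actual W% is untouched; only the center-pulling tilt is removed.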
What results is a pythagorean W% estimator that (a) has no bias by definition, (b) removes the problem of blowout games from the equation, and (c) shows a much stronger correlation with reality.
I’ll leave it up to the commenters to decide whether I’m off my rocker or whether I stumbled into a useful improvement on Pythagoras, but I found it interesting.

National League Team Projections

Here they are:
NL East:
Mets 87-75, Phils 82-80, Braves 80-82, Marlins 78-84, Nationals 67-95
Kind of a boring projection, it looks a lot like last year.  The top 3 teams all have issues with their pitching.  The Mets will win because the front line talent in their lineup is just too good.
NL Central:
Cards 86-76, Cubs 84-78, Astros 83-79, Brewers 81-81, Pirates 78-84, Reds 73-89
I didn’t think the Astros would look this good; Pettitte is gone and I’m not factoring in a return for Roger just yet.  I’m still projecting a very strong defense to support the pitching staff.  Adam Everett is a great defender, but I may not have accounted for just how bad the outfield defense is going to be.  The Cardinals are good mainly because they have Pujols.  The Cubs spent a lot of money, and have greatly improved their team, but will come up a bit short.  They spent great-player money on good players (Soriano and Ramirez) and good-player money on average (Lilly) or not-so-good players (DeRosa, Marquis).
NL West:
Padres 83-79, Giants 82-80, Dodgers 80-82, Diamondbacks 80-82, Rockies 75-87
Padres have very strong pitching (both starting and relief) and a great pitcher’s park.  Their defense suffers a bit in left field with Sledge and Cruz replacing Dave Roberts, but the new platoon will certainly outhit Roberts.  They aren’t a great team, have a few top players getting long in the tooth, but are the best bet in a weak division.
Barry Bonds, whether you like it or not, is going to break the all-time home run record this year, and he’s going to do it with a team that has just enough to make one more playoff run.  The Dodgers are woefully short on power.  The Diamondbacks are potentially a very good team, but they really can’t count on Randy Johnson meeting his solid CHONE projection (198 innings, 3.75 ERA).
The projections don’t seem to have much variance.  I don’t really think we’re going to see 10 teams within 5 wins of each other.  What the final team records are depends on this:
True Talent + Luck + Injuries
We can’t predict luck, we can only crudely project injuries (like assuming Miguel Tejada will have more plate appearances than JD Drew).  For true talent, we may or may not project that correctly, though comparing the results of projection systems to a theoretical maximum, we seem to be doing fairly well.
For the NL East, my predictions have a standard deviation of 7.4 wins for those 5 teams.  I constructed a crude simulator that adds the element of luck and gives me a final record.  Here’s how it works:  Say a team is projected to have a .535 winning percentage.  So flip a coin (well, actually a random number generator) 162 times and see what record you come up with.  That team might play as expected, might get lucky and win 96 games, and might only win 74.  Repeating this exercise for the 5 teams in the NL East, in a typical sim I usually get an observed standard deviation of around 10 wins.
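The coin-flip simulator described above is only a few lines: one weighted flip per game, repeated for each projected record.  The true-talent winning percentages below are just the NL East projections divided by 162:

```python
import random

def simulate_record(true_wpct, games=162, rng=random):
    """Flip a weighted coin once per game and count the wins."""
    return sum(rng.random() < true_wpct for _ in range(games))

projected_wins = [87, 82, 80, 78, 67]   # Mets, Phils, Braves, Marlins, Nats
rng = random.Random(2007)               # seeded so one run is reproducible
sim_wins = [simulate_record(w / 162, rng=rng) for w in projected_wins]
# Each run spreads the records out; over many runs the observed standard
# deviation of the five records is typically wider than the projections'.
```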
In plain English, I think one or two teams are going to win 90 games in the NL this year.  I don’t know which, but the Mets and Cardinals are where the smart money will be.

General Theories on the Study of Baseball

Greetings from the last man to join the “party” here at Statistically Speaking.  I apologize for the delay in my arrival, but you all can be certain I will be active enough to make your head spin once I get fully settled into the MVN blog circuit.   I have a rare gift for somehow being loud even though sound is impossible in a communication medium that is limited to text and color, so that may or may not be a good thing for some of the readership (heh).  I was so loud that although I’d never written an article for the Hardball Times, my works in Sabermetrics were on their radar screen over there and they recommended me for this “job.”   I was as surprised as anyone.
I’m going to start with a small warning.  I (grudgingly) play a little roto, but my posts will not be particularly geared toward a specifically “Fantasy Baseball” mindset as it does not hold my interest to talk about projecting RBI totals, or E counts or Holds or any of the other woefully inadequate measures of performance used in even the most enlightened of leagues.  I am interested in learning about this object of our unique obsession purely from the perspective of a dispassionate observer (whenever possible) assimilating all of the available data and drawing what conclusions he can defend.
I’m going to try to focus my posts here on those elements of my research – and the research of other big names in the field who have caught my eye – that can be fully explained without the use of higher-level math (anything beyond the normal scope of a high school math program), but if some methodology is used that requires an advanced mathematical topic, I’ll say so explicitly in the title of that entry.
For those of you who aren’t familiar with me, my name (as implied in my user name) is Matthew Souders, although I generally go by SABR Matt online.  I’ve been following baseball since 1992 (I was 10 years old when I saw my first big league game and I was hooked immediately).  I lived in Seattle at that time and the Mariners just happened to be developing into an interesting team to root for at that moment in their history, packing exciting young players like Randy Johnson, Ken Griffey Jr., Jay Buhner, Edgar Martinez, Tino Martinez, Chris Bosio and Bret Boone and inheriting the mean streak of their new manager (Lou Piniella).  I’ve been a rabid Ms fan ever since, which makes for some interesting conversations with Sean, who is unfortunately an Angels fan (seek help Sean – seriously, you folks rooting for the Angels must have some kind of strange psychological problem).
I got interested in statistics thanks to looking at baseball cards – my first big league game was a give-away day, a pack of 25 Mariner baseball cards in a nice little collector’s book, which led me to a short-lived card collecting phase.  By the time the 1993 season was over, I was copying statistics from the internet (yes, I was online way back then!) and simulating seasons on a hilariously cheesy (by modern standards) baseball platform for my Super Nintendo (LOL).  By the time the Mariners were celebrating in a delirious mass in October of ’95, I had my first baseball encyclopedia and was reading voraciously about the history of baseball.  By the time I headed off to college (in 2000), I was building really simple player evaluation methods (I laugh at the first analysis I ran now when I look back on it).
This is now my seventh season as a sabermetrician; each season I learn a little more and each season I get a little further from accomplishing my goals (the more you learn about baseball, the more you realize you don’t know anything)!  After that long preamble (sorry, I tend to be a bit wordy when making introductions), I want to talk about my approach to problem solving as it applies to baseball and about what I view as some serious deficiencies with some other approaches.
The Scientific Method
The study of baseball is not unlike the study of meteorology (my second love – I am finally close to earning a degree in Meteorology and moving on to grad school) in that both require a scientific approach and deal with principles of chaos and random selection.  I’ve used Monte Carlo simulations, for example, in the analysis of wind speed fluctuations over time AND to generate Win Probability Added statistics for baseball games in the play by play era.
If someone describes himself as a Sabermetrician and that someone does not have a primary career path linked to the sciences, you should immediately have doubts about his or her ability to think in the manner of a scientist and therefore about his relevance.  There is exactly one exception to this point, and his name is Bill James.  He only stands out as an exception because he has an unusual gift for intuition regarding baseball that leads him to scientifically incorrect methods that produce bizarrely accurate results.
The scientific approach starts with an observation.  Which means you can’t be a sabermetrician if you don’t spend a LOT of time looking at the data.  Baseball’s history is really fascinating if you take the time to study it and look for patterns, but a lot of people are tragically short on patience and skip this step, choosing instead to focus only on the most recent seasons and the data that’s right in front of them.  Doing this can lead to erroneous conclusions (the most famous of which was the initially bold statement by Voros McCracken that pitchers had ZERO control over batted balls, which he has since realized was fatally inaccurate).
When I come up with an idea about how to answer a question that’s been nagging me, the first thing I always do is define specifically what my problem is.
Do baserunning habits change based on the run scoring environment?
Then I propose an answer.
While the frequency of positive and negative events changes, baserunning aggression does not change because an equilibrium has been found through years of professional play that balances risk and reward and the overall abilities of baseball players have not changed much over the years.
Then I define exactly what kind of data I need to examine to test my hypothesis.
In this case, a simple test was devised using play by play era information about the likelihood that a lone runner on first (with no one out) would advance to third on a single.  The simplicity of the starting base/out state eliminates the interactions between multiple runners who might interfere with each other, and removes the number of outs, which is another variable that can impact baserunning, allowing me to focus on aggression and only aggression.
Then I examine the data and make a decision about my first guess.
This small study revealed that I was incorrect about baserunning aggression being independent of run scoring environments.  In fact, there is an abnormally strong correlation (roughly 0.94!) between the ratio of RS/G to the all-time average RS/G and the ratio of the odds that a runner will take third base on a single to the all-time average odds of that result.  This suggests that baserunners take more risks when a single run might decide a game.  Interestingly, however, the odds of being thrown out attempting third base were only slightly higher in lower run scoring environments, which suggests that the pressure of a low run-scoring environment forces the development of players who are good baserunners and improves that aspect of the game.
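The ~0.94 figure above comes from an ordinary Pearson correlation between the two ratios.  A dependency-free sketch of the calculation (the paired values fed into it would be one pair per league-season; none are reproduced here):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Usage: pearson(rsg_ratios, advance_odds_ratios), where each list holds
# one season's ratio to the all-time average.
```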
Correlations, Confidence, and Lines of Best Fit
Since sabermetrics is inherently a statistical pursuit, almost everything done in the field tends to come down to statistical tests of significance and statistical methods for defining empirical relationships in the data.  I believe there is a severe over-reliance on correlation in baseball today.  A rather frustrating (to me) example of correlation run amok is this new concept that the rate at which pitchers allow HR is defined entirely by the number of outfield flies they surrender.  Did we learn nothing from McCracken’s past overconfidence?  Don’t get me wrong.  I’m a strong believer in the proper application of DIPS theory to the analysis of both pitchers and team defenses.  My own methods rely heavily on DIPS to separate pitching from fielding.  It is no more accurate, however, to claim that pitchers have no control over their HR/F rate than it is to say that they have no control over their BABIP.
The reason things like this frustrate me is simple: rather than actually using the data they have, many sabermetricians continue to fall into the trap of believing that baseball analysis can be simplified down to a series of correlations that will explain everything.  The problem gets compounded when they stick to their guns in the face of examples that disprove theories, or in the face of straight logic.  How can it be true that all outfield flies are essentially created equally?  Do we really expect that there aren’t some pitchers with the ability to induce poor contact (that results in a series of easy fly balls), and some pitchers who get hammered whenever they leave a ball up enough to be hit in the air?
I said above that a lot of time must be given to studying the data, but you’ve got to stop and think about what you’re saying when you make assumptions or strongly worded conclusions based on that data.  I’ve been guilty all too many times of defending bad ideas in my youth, a practice I’ve worked hard to curtail.  I only wish I’d started thinking like a scientist sooner – I might be further along now if I had.
Creating Metrics
Do you know how many times I find someone noodling with the data and trying to come up with a statistic without the foggiest idea what it is they’re trying to measure?  The easiest way to improve someone else’s metric is to ask them “What are you trying to show here?”  Most of the time, home cooked statistics have no direct connection with a real world event, and no statistic can or should be used unless (a) you are actually measuring something real and tangible and (b) you know exactly what that something tangible is!
OPS is evil.  It measures nothing at all.  It’s junk science that happens to kinda-sorta give the right general impression while being easy to calculate.  The mixed denominators and incorrect proportions in OPS gnaw at me like nothing else.  ERA is not a good measure of pitching skill (too many things embedded within it are not under the control of the pitcher) but it IS a good statistic as long as you know what it means.  The original DIPS ERA is a bad statistic.  It measures nothing.  It’s just the sum of three components of a correlation and bears no resemblance to real world data.
The biggest offender is the traditional park factor.  A little unit analysis reveals immediately one of the biggest problems with park factors.  They are unitless!  A park factor is the ratio between the runs a team and its opponents score at home and the runs they score on the road.  In other words, it’s (R / R).  That makes no sense whatsoever.  Ballparks do not have a proportional (scalar) impact on run scoring.  If a league scored runs at double the rate they are scored today, do we really think Coors Field would be twice as bad (relative to the league) for pitchers?  A park factor is attempting to measure the park’s real influence on scoring.  The more you play at any given park, the more it will influence you.  A proper analysis of parks should begin not with a ratio factor but an additive factor with units like runs per game.
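The unit-analysis argument can be made concrete with toy numbers (the runs-per-game figures below are invented for illustration):

```python
home_rpg = 11.5    # combined runs/game in a team's home games (toy number)
road_rpg = 9.0     # combined runs/game in its road games (toy number)

ratio_factor = home_rpg / road_rpg        # ~1.28 -- unitless (runs / runs)
additive_factor = home_rpg - road_rpg     # +2.5  -- units: runs per game

# If league-wide scoring doubled everywhere, a fixed ratio factor would
# silently claim the park's absolute impact doubled too (from +2.5 to
# +5.0 runs/game); an additive factor makes no such scalar claim.
assert (2 * home_rpg) / (2 * road_rpg) == ratio_factor
```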
Stop and think before you run around mixing numbers together and rating players.  What are you trying to measure and how does your metric accomplish said measurement?
I’ll save further comments for later articles, but hopefully this will give you a sense for what to expect from me in the future.

A second look at Pythagorean win estimators

Over the past few days on the SABR statistical analysis list-serv, there’s been a bit of chatter about the Pythagorean win estimator.  My guess is that most of the folks reading this post are familiar with the formula, but for the benefit of those who may not be, the formula was created by Bill James in an attempt to model how many games a team “should” have won, based on how many runs they scored and how many they allowed.  The original formula read: Winning % = RS^2 / (RS^2 + RA^2).  Its eerie resemblance to the Pythagorean theorem in geometry (the one you hated in high school) gave it a name.  Several different modifications have been suggested in the intervening years, including changing the exponent to 1.82 (some say 1.81), and two “dynamic exponent” formulas (one by Clay Davenport, the other by David Smyth) which have a formula to calculate the proper exponent, which is then substituted in on a case-by-case basis.
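For reference, here are the four estimators side by side.  The Davenport constants (1.50, 0.45) and the Smyth exponent 0.285 are the commonly published values rather than anything derived here, so treat them as assumptions:

```python
import math

def pyth(rs, ra, x):
    """Generic Pythagorean W% with exponent x."""
    return rs ** x / (rs ** x + ra ** x)

def original(rs, ra):                 # Bill James, fixed exponent 2
    return pyth(rs, ra, 2.0)

def exp_182(rs, ra):                  # fixed exponent 1.82
    return pyth(rs, ra, 1.82)

def davenport(rs, ra, games):         # dynamic exponent from runs/game
    rpg = (rs + ra) / games
    return pyth(rs, ra, 1.50 * math.log10(rpg) + 0.45)

def smyth(rs, ra, games):             # Smyth/PythagenPat: X = RPG ** 0.285
    rpg = (rs + ra) / games
    return pyth(rs, ra, rpg ** 0.285)
```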

Before coming on board here at MVN, I had meditated briefly on these formulae and their merits relative to each other, with the Smyth formula coming out the winner, if only by a tiny margin.  In evaluating any estimator, there are two important questions to answer: how closely does it predict the observed values (in this case, the team’s actual winning percentages) and are the mistakes (in statistics-speak, residuals) in some way biased.  In my original post, I found that the residuals were essentially centered around zero (very good!) and the standard deviation of the residuals for all four of the formulae was somewhere in the neighborhood of 4.3 wins.  Additionally, the residuals all showed a minimal amount of skew.

There are a few more residual diagnostics to run to check for any additional biases in the estimators.  For example, if the estimators over-estimate the winning percentages of good teams, but under-estimate the winning percentages of bad teams (or vice versa, for that matter), then there is a built-in bias to the estimator.  Along with being accurate, no matter the team quality, an estimator should work no matter how many games were played in the season, and how many runs the team scored and/or gave up.

I used the Lahman database for this one, and selected out all teams who played at least 100 games.  This gave me a database of 2370 team-seasons to work with.   I calculated the projected winning percentages for the Pythagorean, Exp 1.82, Davenport, and Smyth formulae, and then subtracted each of them from the actual winning percentage to get the residual for each. 

I calculated (well OK, my computer calculated them) correlation coefficients for the residuals of each formula and the following variables: games played, runs scored per game, runs allowed per game, wins, and actual winning percentage.  None of the formulae were correlated with games played.  There were small correlations observed between the original Pythagorean formula and runs scored per game (.106) and runs allowed (-.071).  No other such correlations were observed.  Those correlation values were statistically significant, although rather small in magnitude.
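The residual diagnostic itself is simple: subtract each formula’s estimate from the actual winning percentage, then correlate the residuals against team quality.  A self-contained sketch with toy (RS, RA, W, G) records standing in for the 2,370 Lahman team-seasons:

```python
teams = [(930, 700, 97, 162),   # toy records only; the real study
         (760, 750, 83, 162),   # used the Lahman database
         (640, 860, 64, 162)]

def exp_182_estimate(rs, ra):
    """The Exp 1.82 winning-percentage estimate."""
    return rs ** 1.82 / (rs ** 1.82 + ra ** 1.82)

residuals = [w / g - exp_182_estimate(rs, ra) for rs, ra, w, g in teams]
wins = [w for _, _, w, _ in teams]
# Correlating residuals with wins over the full sample is what produced
# the -.346 figure quoted below for the Exp 1.82 formula.
```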

The biggest finding in my analyses was the fact that the residuals from the Exp 1.82 formula, Davenport, and Smyth formulae were all correlated with wins and winning percentage.  The Exp 1.82 formula, likely the most-used and reported formula, showed correlation coefficients of -.346 and -.380, respectively.   The Davenport (-.253 and -.269) and Smyth (-.256 and -.273) correlation coefficients were lower, although still notable.  The original Pythagorean formula residuals had much lower correlations of -.095 and -.101.  These findings suggest that Exp 1.82, Davenport, and Smyth all have a bias such that better teams are more likely to have their estimates in the formulas be lower than their actual winning percentage.  Poor teams are more likely to have their estimates be higher than their actual winning percentage.

If the previous sentence made your head spin, here it is in English with numbers made up on the spot for pure illustrative purposes:  Let’s say that a team won 94 games in the year in question.  The Exp 1.82, Davenport, and Smyth formulas are more likely to be wrong in the direction of saying that the team should have won fewer games (91).  A poor team that won 61 games is more likely to have their projection be much higher (perhaps 65).

So what?  Since these formulas became popular, the differences between the projections and the actual results have been taken as indicators of such things as managerial ability.  (A less-than-proper use of the formula in my opinion, but it is the common application.)  If a team wins more than its projection, the manager must be doing a good job, because he’s maximizing runs at the proper time to win games.  If a team wins fewer than projected, the manager might be fired.  If the formulas are biased, though, some of the credit and blame being passed along due to them may be a statistical artifact.  The bias built into the formula would make a manager from a last-place team look like he is underperforming, even as he now has to answer to the GM on having just lost 101 games.  On the other hand, the manager on the successful team is more likely to look like he is over-performing and maybe will get a nice contract extension and raise out of it.  Managers on bad teams look even worse, and managers on good teams look even better.

It looks like the Pythagorean estimators need a little bit of tinkering.  They don’t need to be thrown out.  In fact, to the contrary, they perform exceptionally well overall.  The bias I identified is going to be most noticeable at the extremes, which is a common problem in estimators of this type.  Analysts just need to be a little more careful in interpreting the results in those cases.

Remember: Even the Scarecrow didn’t get the Pythagorean theorem exactly right on the first try.

American League Final Standings, a sneak preview

This is my second year of making predictions on the web. Last year I correctly called all 6 Division winners (No, I didn’t pick the Tigers for wild card), and also predicted the St. Louis Cardinals to be the best team in baseball. If I had added “in the postseason” instead of “regular season”, I would have nailed it. My predictions for last season can be found here.
The predictions are based on two things: projected hitting, pitching, and defensive stats from the CHONE system (posted on my blog 1-6-07), and my totally subjective decision about who is going to be allowed to play and in what role. I try to determine who the team’s management will give playing time to, so Minnesota has Ramon Ortiz and Sidney Ponson, the gopher brothers, in the rotation instead of a good crew of younger and better pitchers.
This same approach was also done recently on the Hardball Times by John Beamer, using the THT projected stats. My results are pretty similar, but there are a few things the two systems see differently. Baseball Prospectus has similar predictions for their PECOTA system, and if you are a subscriber I recommend checking that out as well.
AL East:
Yankees 95-67, Red Sox 93-69, Blue Jays 80-82, Orioles 75-87, Devil Rays 73-89
AL Central:
Indians 90-72, Twins 89-73, White Sox 83-79, Tigers 82-80, Royals 66-96
AL West:
Angels 88-74, A’s 84-78, Mariners 79-83, Rangers 74-88
The Yankees and Red Sox are favorites for another division and wild card finish, just like pretty much every year except last. I thought the Blue Jays would have a better chance, but after Halladay and Burnett the pitching just is not there.
The Indians have a powerful, young offense that should top 200 homers, their starting pitching is good enough to keep them in games, and it seems every other year they have an OK bullpen. The Indians outscored the opposition by almost 80 runs last year. It’s hard to do that and have a losing record, and I’m betting they don’t do it again. The Twins are a talented team; how quickly they pull together and challenge the Indians depends on how quickly they decide to put their best rotation out there.
I’m glad to see the Angels return to their rightful place. They have done everything in their power to keep the West close, like not getting a big bat in the offseason, multiple starting players breaking bones or needing surgery (Rivera, Figgins, McPherson), and starting pitchers not quite being ready for the start of the season (here’s hoping Weaver and Colon get healthy and stay that way). The Angels are still the favorites because they have tremendous depth and a farm system that is on the verge of producing some real major league position players, not just gathering awards from Baseball America. I am in no way objective when it comes to the Angels; that’s my team, and I want to see them run away with this division, but when the numbers say they don’t have it, I report that too. Last year’s prediction had the A’s winning the West, which they did, but this year, when the numbers talk, I like what they are saying.

Some Fantasy Tips

This year I’m in two fantasy leagues, one is set to draft tonight and the other is coming up this weekend.  I’ll share a few things that have worked over the years, and hope nobody from my leagues reads this until after draft day.
1.  Get a catcher who doesn’t catch.
It’s hard to find a good hitting catcher, and what makes things even worse is that most catchers play only 5 games a week or fewer, making it hard to pile up Runs and RBI. The viability of this strategy really depends on what eligibility rules your league allows.  In one league you are eligible for a position if you had played 5 games there the previous season, or one game this season.  People in my league have often waited until Phil Nevin made an emergency appearance behind the plate.  Last year Chone Figgins was supposed to be the emergency catcher if anything happened to the first 2.  Getting so many steals from my backstop would have been a nice bonus, but he was never needed back there.   In 2005 I was able to pick up Chris Shelton, and when he took over 1B midway through the year, he gave me great catcher production.  The best part was that he was playing first base and could stay in the lineup every day.  Last year Josh Willingham was my target, and he put up a great year, but my brother had the same idea and beat me to him.  For 2007, Mike Piazza looks like a good choice, though you won’t surprise anyone by bidding on him.  Willingham only caught 2 games last year, so he’s not eligible, but perhaps he or Craig Wilson could get in a game sometime this year?  If Willingham catches a game 3 weeks into the year, you’ll wish you had him in the OF to start the season.
2. Keep regression to the mean in mind for pickups as well.
Every year, some owner in your league is going to panic and release a good player going through a slump.  Be there with a waiver claim.  Pizza Cutter did a good job covering regression to the mean in his last post, but it’s important enough to mention again.  There will be players who get off to great starts despite a history of poor hitting.  Avoid them.  When will these players “regress”?  The correct answer is that you should expect them to play at their true talent level, which likely isn’t any different after a hot April than it was on draft day.  My smartass answer is that the player will regress as soon as you put in the waiver claim and activate him on your team.  In other words, Neifi Perez will not play for my fantasy team, even if he hits 20 home runs in his first 20 at-bats.  In the one-in-a-trillion chance that actually happens, expect Neifi to finish the year with 22 homers.
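The “expect true talent, not the hot streak” idea can be made concrete with a simple shrinkage estimate. This is an illustrative sketch only: the league average and the 200-PA weight below are assumptions chosen for the example, not published stabilization constants.

```python
# Assumed values for illustration, not published constants:
LEAGUE_AVG = 0.265   # assumed league-wide batting average
PRIOR_AB = 200       # assumed weight (in at-bats) given to the league average

def true_talent_estimate(hits: int, at_bats: int) -> float:
    """Blend a player's observed average with the league average,
    weighting the league average as PRIOR_AB at-bats of evidence."""
    return (hits + LEAGUE_AVG * PRIOR_AB) / (at_bats + PRIOR_AB)

# A .400 start over 50 at-bats (20-for-50) still projects
# much closer to league average:
est = true_talent_estimate(20, 50)   # 0.292
```

The point of the exercise: 50 hot at-bats barely move the estimate, which is why the slumping veteran on waivers is usually a better bet than the career .240 hitter batting .400 in April.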
3. Matchups are important.
Is your hitter scheduled for a road trip to Colorado?  You know what to do; that’s easy.  I’ve found you can leverage your pitching matchups more than your hitters, though.  If you have a decent number of reserve spots, try to get 3-4 more starters than you need.  If you have Johan Santana, obviously he plays every week, but for marginal pitchers, look at matchups as much as the pitcher’s actual ability.  Is Jeremy Bonderman a better pitcher than David Bush?  Yes he is, but if Bonderman is facing the Red Sox and Bush is taking on the Pirates, put David in.
4.  Try a gimmick if you’re desperate.
Let’s say you are in a keeper 5×5 league, you have no good starting pitchers, and almost all of the projected top 10 starters are already on somebody’s team.  Maybe you can try an all-reliever team.  You don’t need Mariano Rivera or B.J. Ryan; just wait around and get the cheaper closers, the Bob Wickmans and Todd Joneses.  If everyone else is chasing wins and saves, they’ll go with 5 or 6 starters out of 9 pitchers.  It doesn’t matter if they get the best closers; you’ll do fine in saves.  Some of your marginal closers are bound to lose their jobs, but that’s OK: you have room on your roster for their backups too!  Round out your staff with Scot Shields and Cla Meredith.  They won’t get many saves, but because relievers generally have better rate stats than starters, your ERA and Ratio should stay low while your opponents take their lumps with their back-of-the-rotation starters.  With this strategy (best used in an auction league) you should be able to finish first or near it in Saves, ERA, and Ratio while finishing last in Wins and Strikeouts.  You can also do this while spending almost nothing on pitching, giving you a big leg up in getting the best hitters.  This strategy may not be allowed in some leagues, depending on your custom rules (such as a minimum number of innings or starts), and it is most effective if you are the only one in your league trying it.
Good luck to all in Fantasy Baseball! (except those of you who play me).