Is WAR the new RBI?

Let’s face it, it’s the only reason we remember Tony Batista.

In 2004, in his final full season as a major-leaguer, T-Bats drove in 110 runs for the Montreal Expos, despite a putrid .272 OBP. Although he was, arguably, the worst everyday player in the majors in ’04, he was hardly the worst player to ever drive in 100 runs (see Ruben Sierra, 1993), nor was 110 the highest RBI total ever amassed by a replacement-level player (see Joe Carter, 1990). However, for some reason, Tony Batista became a sabermetric icon, our favorite cause célèbre when we rage, rage against the RBI.

You’ve heard it before. RBIs are just neat round numbers and context. Given the opportunity to hit behind a couple of on-base machines like Brad Wilkerson and Jose Vidro, anybody could drive in 100 runs. But just because a blind squirrel gets a nut every once in a while, that doesn’t mean he should bat cleanup.

In the wake of T-Bats’ glorious season, the sabermetric cause was moving from its grassroots mail-order infancy to full-blown mainstream phenomenon, buoyed by New York Times Bestsellers, championship GMs, and senior columnists. When a broadcaster spouted out a flurry of “traditionals” – batting average, homers, RBI, wins, saves – to make his point, basements full of fantasy addicts looked up from their digital almanacs and replied in unison: Bleh.

Give me OBP, give me OPS, give me IPO, give me WPA, give me K/BB; just don’t give me RBI! If you’re going to give me RBI, Mr. McCarver, I’d rather you gave me nothing.

And then came WAR.

The concept was ratified by the sabermetric Godfather, Bill James, who’d created Win Shares according to a similar ideology in 2002. It was a neoclassical economist’s wet dream, like baseball GDP: an elegant equation which accounted for all the sport’s diverse variables and yielded a single number roughly reducible to the oldest and most hallowed statistic of them all, the win. Hallelujah.

Wins Above Replacement is a beautiful idea. Euclidean grace in a quantum world. A simple answer, not only for age-old baseball conundrums like “Mantle or DiMaggio?”, but also a formula for unprecedented comparisons like “Rickey Henderson v. Johnny Bench” and “Roy Halladay v. Alex Rodriguez”.

There’s only one problem. It doesn’t work.

At least, not yet.  Not in the fantastically straight-forward way we try to use it.  The idea is so good, so clarifying – like democracy or the rational market – that we really, really want it to work, and we’re willing to suspend our disbelief just a little while longer in the hope that it might. Because it’d be so great to know with statistical certainty that Albert Pujols was worth $200 Million, that we really couldn’t win that pennant without Andy Pettitte, that Jacoby Ellsbury is definitely the AL MVP, and that Ben Zobrist is exactly 9.3% better than Adrian Gonzalez.  Darn that dream.

The cruel irony, the I-could’ve-had-Sean-Doolittle-and-all-I-got-was-stupid-Barry-Zito irony, is that the problem with WAR is the same as the problem with RBI.  It frequently measures context as much as performance.  Especially when used to evaluate single seasons, it doesn’t sufficiently account for the inevitable variations in opportunity and environment.

What if Granderson played behind Ian Kennedy and Daniel Hudson?: UZR & Flyball Rates

A few weeks back I critiqued Steve Berthiaume’s analysis of Curtis Granderson’s defense by looking at some inconsistencies in the way Ultimate Zone Rating (the defensive metric associated with FanGraphs’ WAR) assesses outfielders. Mark Simon of the ESPN Stats & Info Blog followed up with a very interesting review of specific plays which have adversely affected Granderson’s ratings in 2011.  While Simon isn’t looking at UZR specifically, he does point out that most defensive metrics do not account for positioning and that half a dozen plays can cause sizable shifts in the aggregate numbers when we’re dealing with less than a season’s worth of data.

I’m not the only one who’s noticed that UZR frequently yields suspicious results in small samples, at Fenway, and when several good outfielders are playing alongside one another.  I do, however, want to expand upon my claim that outfield UZR is substantively affected by flyball rates.

In the Granderson article I pointed out that the teams in each league which rank highest in outfield UZR for 2011 – Boston and Arizona – also ranked #1 in their league in FB%.  This remains true.  However, this is obviously not sufficient proof of correlation, for a couple of reasons.  Not only is there a high possibility of coincidence in any single example, but both the D-Backs and Red Sox feature several outfielders traditionally regarded highly by both sabermetricians and scouts.  For anybody who’s watched them consistently, it would be pretty hard to argue that the trio of Gerardo Parra, Chris Young, and Justin Upton isn’t among the best in the major leagues, no matter who’s on the mound.

So, I looked back at all teams that finished at the extremes of the flyball scale since 2003.  I do not claim that there is a perfect or, in the parlance of economics, a “strong” correlation.  That is, a team with a 35% flyball rate wouldn’t have a dramatic disadvantage in OF UZR compared to one at 38%.  There is, however, significant evidence that pitching staffs with extreme batted ball tendencies can dramatically affect their outfielders’ UZR numbers.  (These extremes I defined at upward of 40% at the high end and below 33% at the low end.)

Average OF UZR for FB% > 40.0: 10.1

Average OF UZR for FB% < 33.0: -10.6

Of the 16 teams at the high end of the range, five finished #1 in their league in OF UZR.  Of the 21 teams at the low end, only five finished with a UZR north of zero.

From these I would point to some interesting pieces of anecdotal evidence:

The 2010 Giants and their 40.7 FB% led the majors in outfield UZR by a substantial margin (40.7 to 31.6), despite the fact that they gave more than 1100 innings to Pat Burrell and Aubrey Huff, lead-footed former DHs who nonetheless somehow finished with positive UZRs for the season.

The 2007 Cubs had an exceptional 44.3 OF UZR in a season where they handed most of the innings to Alfonso Soriano, Jacque Jones, and Cliff Floyd, all of whom substantially outperformed their career numbers with some help from a Chicago staff that sent 40.6% of batted balls in their direction.

On the other side, the ’05 Cardinals, despite featuring some premier outfield talent in Jim Edmonds, Larry Walker, Reggie Sanders, and So Taguchi, finished with a -6.1 OF UZR, thanks to a pitching staff that put only 29.7% of batted balls in the air.
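As a rough illustration of the bucketing used here, the sketch below sorts the three teams named above into flyball groups using the 40% and 33% cutoffs. The function names and data layout are my own assumptions for illustration, and the three data points are only the anecdotes quoted in this section, not the full sample of 37 teams, so the output does not reproduce the 10.1 and -10.6 averages.

```python
# Illustrative sketch only: the cutoffs (40% and 33%) come from the text above;
# the function names and data layout are hypothetical.
HIGH_FB_CUTOFF = 40.0
LOW_FB_CUTOFF = 33.0

# (team, FB%, outfield UZR) for the three anecdotes cited above.
TEAMS = [
    ("2010 Giants", 40.7, 40.7),
    ("2007 Cubs", 40.6, 44.3),
    ("2005 Cardinals", 29.7, -6.1),
]


def flyball_bucket(fb_pct):
    """Label a staff as 'high', 'low', or 'middle' by flyball rate."""
    if fb_pct > HIGH_FB_CUTOFF:
        return "high"
    if fb_pct < LOW_FB_CUTOFF:
        return "low"
    return "middle"


def average_uzr_by_bucket(teams):
    """Return the mean outfield UZR for each flyball bucket that has teams."""
    grouped = {}
    for _name, fb_pct, of_uzr in teams:
        grouped.setdefault(flyball_bucket(fb_pct), []).append(of_uzr)
    return {bucket: sum(vals) / len(vals) for bucket, vals in grouped.items()}


if __name__ == "__main__":
    for bucket, avg in average_uzr_by_bucket(TEAMS).items():
        print(f"{bucket}: {avg:+.1f}")
```

With a full season-by-season table in the same shape, the same two functions would give the group averages for every team since 2003.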

The difference between 30% and 40% can easily be several hundred plays, so when you consider Simon’s point about the significance of even a handful of mistakes in a few months of play, you can see what kind of advantage those extra opportunities provide.
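To put a rough number on “several hundred plays,” here is a back-of-the-envelope sketch. The figure of 4,400 batted balls per team-season is my own round-number assumption, not something taken from the data above; swap in a real total for any specific staff.

```python
# Back-of-the-envelope sketch. ASSUMED_BATTED_BALLS is a hypothetical round
# number for one team-season, not a figure from the article's data.
ASSUMED_BATTED_BALLS = 4400


def extra_flyballs(low_fb_pct, high_fb_pct, batted_balls=ASSUMED_BATTED_BALLS):
    """Extra fly balls a high-FB% staff allows versus a low-FB% staff."""
    return round(batted_balls * (high_fb_pct - low_fb_pct) / 100)


if __name__ == "__main__":
    print(extra_flyballs(30.0, 40.0))
```

Under that assumption the gap between a 30% staff and a 40% staff comes to 440 fly balls. Not every one of those is a catchable outfield chance, so treat the figure as an upper-end estimate of the extra opportunities.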

This is not to say that UZR is useless, just that it is unreliable in single season increments and that unreliability is passed on to WAR, which we habitually use/misuse when discussing single seasons and partial seasons.

I can’t play several positions. (or “The Adam Dunn Effect”)

WAR’s move to the mainstream is deeply tied to the rising popularity of FanGraphs.  One of the first of its “unlikely results” to spark considerable conversation was Ben Zobrist leading AL batters (and finishing behind only Albert Pujols and Zack Greinke overall) in 2009.  Zobrist had a breakout season which was impressive by any measure, but his WAR was given a major boost by his defense (only Franklin Gutierrez and Nyjer Morgan got a greater advantage from fielding).

On one level, this seemed legit.  Zobrist appeared at every position on the diamond in ’09 and over the years has proven himself to be an above-average defender at second base and in right field.  Managers have long lauded the value of versatility and lavished praise on players like Zobrist, Mark DeRosa, and Placido Polanco, who play several key positions well and also swing decent sticks.  Zobrist’s season looked like evidence of their wisdom.

But while it isn’t much of a stretch to believe that Zobrist’s glove was worth a couple wins to the Rays in 2009, try selling this: According to WAR, in 2011, Carlos Lee has had as much defensive value as Troy Tulowitzki.

There are two types of utilitymen: those who are given the job because they play many positions well and those who are given it because they play no position well.  As yet, WAR struggles to distinguish between the two.  It reads Houston’s inability to decide where Lee hurts them least as evidence of Lee’s versatility.  It suggests that Howie Kendrick’s defense at second base has gone from average to exceptional since Mike Scioscia started giving him more starts in left field.

UZR results get weirder the smaller the sample gets.  The utility player may log a thousand innings in total, thus suggesting his UZR is somewhat more reliable, but what actually happens is that several hyper-unreliable samples of a few hundred innings or less are bundled together like toxic mortgages and rated AAA.

WAR Hates Sluggers

One of the things for which advanced stats should be applauded is the extent to which they’ve decreased the fetishizing of the home run and raised awareness of all-around contributions.  Jonah Keri and Dave Dameshek debated the relative merits of Willie Stargell and Tim Raines this week, largely based on the fact they had identical career WAR totals.  Dustin Pedroia has a real shot at his second MVP, despite the fact that his “traditionals” (.309 AVG, 85 R, 18 HR, 74 RBI, 25 SB) are basically the same as Melky Cabrera’s (.303, 83, 17, 79, 17).

However, one can’t help but notice that a cross-section of the most intimidating hitters in the game are treated with relative disdain by the metric.  It doesn’t like them because they play first base or left field (or DH), which aren’t scarcity positions.  It doesn’t like that they are fat and slow.

While I understand that everybody would love to have Chase Utley or Troy Tulowitzki, a middle-of-the-order hitter who makes big contributions in the field and on the basepaths, as well as at the plate, the fact remains, building a lineup without a slugger (or two) is like building a mall with seven Sunglass Huts and no department stores.  A few sluggers are swift, slender middle-infielders.  Most of them aren’t.  To paraphrase Reggie, there are lots of drinks and precious few straws.  If you get left without one, no amount of Range Factor, WHIP, or baserunning acumen can save your season.  Just ask the Padres, or the Mariners.

Yet, we misuse WAR to insist that it’s better to have Ian Kinsler than Miguel Cabrera or that Peter Bourjos is as valuable as Prince Fielder or Mark Teixeira.

We’ve struggled to understand and statistically represent the effect hitters have on one another.  Would Nyjer Morgan be hitting .306 if he weren’t batting directly in front of Ryan Braun and Prince Fielder?  (WAR suggests, by the way, that Morgan has been more valuable on a per game basis than Fielder.)  Morgan is taking free passes this season at only about half his career rate.  Has he become less patient?  (On the other side of things, Adrian Gonzalez’s career OPS is fifty points higher when the pitcher is throwing from the stretch.  He’s enjoyed that situation in 52% of his plate appearances in 2011.)

While I admit the difficulty of building a model that accounts for the effect a pairing like Braun/Fielder or Pujols/Holliday has on the rest of the lineup, this is one area in which I find the conventional wisdom to be irrefutable.  While I applaud WAR (and other metrics) for aiding in our appreciation of defense and baserunning, it’s beyond asinine to conclude that Ellsbury is twice as valuable as Fielder.  Too often WAR is used as a means of comparing oranges to apples.  One of the things that makes baseball great is the diversity of the fruit basket.  WAR gives incredible weight to the scarcity of shortstops, but no weight to the scarcity of pitcher-intimidating, strategy-altering cleanup hitters, which I see as a form of reverse discrimination.

These are not the last of the problems.  WAR evaluates catching using only the ability to control the running game.  There is abundant evidence that certain park factors have not been sufficiently accounted for.  I’m not arguing, however, that WAR should be completely discounted.  As yet, it is probably as good a singular statistic as is widely available.  But, WAR is not a debate-ending statistic, especially for single seasons.  Even WAR’s adherents, like Dave Cameron, generally admit the margin of error is at least 15%.  When we stubbornly suggest that 0.5 WAR means anything, we are grossly exaggerating the statistic’s accuracy, even according to its creators.  It remains true that any reasoned discussion of an individual’s contributions still requires analysis of the various components that go into WAR, as well as several that don’t, and, as such, subjectivity reigns.
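Here is a quick sketch of why a 0.5 WAR gap is hard to take literally. It reads the 15% margin of error as plus or minus 15% of each player’s WAR, which is my own simplifying interpretation, and the two WAR values in the usage line are hypothetical.

```python
# Sketch under a stated assumption: the margin of error is treated as
# +/- 15% of each WAR value. That reading is an interpretation, not a
# definition taken from any WAR implementation.
MARGIN = 0.15


def war_range(war, margin=MARGIN):
    """Return the (low, high) band implied by a proportional margin of error."""
    spread = abs(war) * margin
    return war - spread, war + spread


def distinguishable(war_a, war_b, margin=MARGIN):
    """True only if the two error bands do not overlap."""
    low_a, high_a = war_range(war_a, margin)
    low_b, high_b = war_range(war_b, margin)
    return high_a < low_b or high_b < low_a


if __name__ == "__main__":
    # Hypothetical players separated by 0.5 WAR.
    print(distinguishable(6.0, 6.5))
```

Under this reading the bands for 6.0 and 6.5 overlap, so the function reports that the two cannot be told apart. A stricter or looser definition of the margin would move the threshold, but the gap has to be well over a win before the bands separate for players at this level.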

Statistical elegance is elusive.  Variables get short shrift or go unaccounted for entirely.  Results yield unintended consequences.  Misunderstood data is misrepresented and polemicized.  In the words of Tolstoy: WAR makes fools of us all.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

EDITOR UPDATE (9/7/11): For Hippeaux’s reply, please click here. It’s certainly worth reading and addresses many of the thoughts, issues, concerns, debates mentioned in the comments below. Brien, too, has a follow-up that you might want to check out. Thanks.  -J

117 thoughts on “Is WAR the new RBI?”

  1. BrienJackson

    Excellent post Hippeaux, and I totally agree. WAR is a great concept and deserves continued work, but at this point there's too many contentious variables that go into it to make it anything close to reliable.

    • Thanks, Brien.

    • John

      Thank you for this article, Hippeaux. I am getting sick of all the articles utilizing WAR almost exclusively to convince readers of a player's worthiness regarding MVP, Cy Young, etc. It seems as though common sense is not allowed when considering a player's overall value. I am no statistical guru by any means, but I have wondered about the validity of the "replacement player" concept. Wouldn't a replacement player typically come up from Triple A? If such a player had limited Major League experience, couldn't this be considered an advantage (a pitcher who hasn't been seen by most big leaguers) or a disadvantage (a young position player who hasn't learned to hit a slider well)? I don't understand how anyone can standardize the value of a replacement player, so I cannot understand how the WAR concept has any real validity.

  2. ian

    I'm not sure Huff played a single inning in the OF in 2010.

    • He played 502 innings there in fact, as SF experimented with Travis Ishikawa at 1B and gave Buster Posey and Pablo Sandoval some time there as well.

      • ian

        right on, thanks for clearing that up.

        I'm no fan of UZR, but the Giants 2010 OF defense was pretty great. Rowand was above average, while Torres and Schierholtz were probably elite level. Even Burrell rated out as average in LF, and Cody Ross gave them plus defense once he was acquired. So while I don't put any stock in UZR, I don't believe it's indicating a false positive here either.

        • Dan

          That's insane. At his absolute peak, Rowand was an above average OF. His physical decline was apparent in 2010, and never having been much of an athlete to begin with, his defensive play greatly suffered. He was definitely NOT an "above average" OF in 2010.

          As for Burrell, Pat the Bat was a below average LF even in the friendly confines of Citizens Bank. On that point, both Fangraphs and the eye test agreed (his 2006-08 UZRs were -15.9, -29.6, and -12.3). Now you expect me to believe that Burrell improved to a 13.4 UZR in over 600 innings in 2010 in a much more expansive outfield, and that there is no extraneous factor causing that result? Sorry, that's pretty much the definition of a false positive.

          • Dan

            Whoops, those were Burrell's UZR/150 numbers. His UZR numbers for 2006, 2007, and 2008 were actually -12.3, -20.9, and -9.0, respectively.

          • Naveed

            As someone who watched the Giants play just about every day last year, Rowand's defense passed the eye test. He wasn't Andres Torres in center field, but he was certainly good. The trouble with Rowand was never his glove; it was his bat.

  3. Eric

    I like this article and I agree with most of it. I have one problem with it. When you say that UZR hates sluggers and you compare how Nyjer Morgan is more valuable than Prince Fielder you are forgetting one of the components of WAR, wins above Replacement. While Fielder is great, if you compare him to the average first baseman his statistics are not that far superior to many other first basemen in the league. However, Nyjer Morgan's performance is more superior relative to his positional replacements than Fielder's is.

    • JeffG

      I think you may have a typo – UZR is a defensive metric so it doesn't measure batting or "sluggers". Also, Hippeaux seems to be agreeing with you regarding the results of the methodology (i.e. sluggers are valued lower than fielders) but the article contends that the amount by which they are valued lower is not representative of actual contributions.

      I do, however, have an issue with the statement "it’s beyond asinine to conclude that Ellsbury is twice as valuable as Fielder". Since this is wins above replacement, even a WAR that is twice as large (say Fielder had a WAR of 3 compared to Ellsbury's 6.1) reflects a difference, not a total value. A replacement level player has his own value (depending on which position) which is greater than 0, so even a double WAR would only mean someone is a percentage better once you take into account the base/replacement level, not double.

      • This is a good point. I slightly misrepresent how the stat works here, in favor of making the statement more hyperbolic. I would say, however, that when we make arguments with WAR, we frequently fall victim to exactly this kind of error.

  4. LarryAtIIATMS

    Damn.

    I have to read this article more closely. But my first read is, damn. This is a helluva post.

    • Thanks, Larry.

      • Bill

        You use the wrong "effect." You mean "affect" most commonly used as a verb. "Effect" as a verb means to bring about, not to impact. That said, as a fan who has serious reservations about saber stats for many of the very reasons you highlight, I greatly appreciate your post. It is very nicely done.

      • Lee

        You guys are a joke.

  5. Eric

    THANK YOU for this post.

    UZR may be good for multiple years, but it is a flawed single season stat. Saying Carl Crawford one year in TB went from being an elite LF to being a terrible one the next year makes no sense.

    • Two of the run environments which seem to give UZR the most problems are Fenway and the Trop, so, although I certainly believe Crawford is a good outfielder, his seasonal UZRs have probably been unreliable in almost every season of his career.

      • Eric

        This was when he was in TB, not last year to this year.

      • BJs

        I see this argument and I just don't get it.

        Why is a player's defense always considered consistent when we know that their offense is all over the board. With UZR, it's entirely possible that a guy like Crawford mis-judges a handful of hits. That could take a solid UZR to a negative in no time at all.

        Players DO NOT always play consistent defense. This is true within a game, from game to game, and within a season. I don't find it surprising one bit that UZR fluctuates because of this.

        Of course, there are issues with UZR so I'm sure that's part of it. But if UZR, TZR, +/- all tell the same story then you can be pretty confident that playing level, not the metrics system, is what is driving the variance.

    • Is it any different from saying he went from being an elite hitter to a terrible one the next year? Because I'm guessing you'd agree with that.

  6. jay_robertson

    Good article – with one stat alone, you showed the fallibility of even advanced stats.

    "On the other side of things, Adrian Gonzalez‘s career OPS is fifty points higher when the pitcher is throwing from the stretch. He’s enjoyed that situation in 52% of his plate appearances in 2011."

    Wow – did not know that. It isn't that much of a stretch to believe there are other similar situations out there where numbers are skewed by the situations, rather than reflecting a player's actual skills and performance.

    • Pete

      Context, context, context.

      If you use bases empty vs. men on to get a rough estimate (it’s not perfect because most relievers always pitch from the stretch, but it’s the best I could find on Baseball Reference), the average AL hitter’s OPS is 30 points higher from the stretch and Adrian Gonzalez’s is actually 15 points lower from the stretch than from the windup in 2011. So while Gonzalez has had (by my count) 51% of PA with men on compared to the AL average of 44%, he hasn’t been as effective in those situations as he has in the past.

  7. Matt

    Great post. Thanks!

  8. Dave

    THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU.
    I have been raging a mini-war against WAR with my buddies for almost 2 years now. I've used the sluggers argument before, and Dustin Pedroia has been my poster child for the anti-WAR stance this season, but now I have some more ammo with the FB%.

  9. Mike

    WAR would be much better if it used more than one year samples of UZR. Every year there are dozens of extreme UZR fluctuations, so even if you believe it is a very accurate measure, it's impossible to argue that it is anywhere near accurate in one year samples. When comparing players that often amounts to multi-win differences.

    I also question how useful it is to compare players only to their own position. If the goal of the stat is to measure "value" to a team then that is the way to go. People use it now to say how good a player is, though, which seems unfair. A player's performance should not be downgraded because he is in a particularly strong class of first basemen (for example).

    Great article. It's strange that most of my baseball stat arguments are no longer trying to get people to forget about RBI and BA, but instead trying to get people to not take sabermetrics as gospel.

    • stratobill

      I agree that one year samples of UZR are inadequate to produce an accurate picture of a player's defense, but taking longer samples won't produce more useful results.

      That's because over the course of multiple seasons, players change. A young player tends to get better defensively because he's gaining experience, repetition, and learning from teammates and coaches. A veteran player declines defensively due to age and accumulated wear and tear on his body. And a player in his prime can have significant swings in his defensive ability due to injuries that limit his range or throwing.

      So computing a player's UZR over the course of 3 seasons and combining them into one rating is almost like taking the UZR of 3 different players and combining them. The Chase Utley of 2009 is not the same as the Chase Utley of 2011, so exactly what useful info do we gain from a rating that combines them?

      It's kind of like trying to determine the speed of a horse in a race by averaging his speed coming out of the starting gate with his speed down the homestretch. Such a stat tells you nothing about the horse's peak speed.

  10. mikeNicoletti

    Huge HUGE fan of this article. Great post. I'm an engineer, and in solving all of the hardest physics problems that we have overcome, statistics is a powerful tool. All the different ways to visualize and create metrics for the data should be viewed as a different lens and nothing more. The truth lies in the data, and sometimes the truth is fuzzy enough that it is immeasurable. This is why we play the games, this is the reason sports are fun for jocks and nerds alike, and this is why no matter how smart front offices are, sports will never be as boring and predictable as some of the old school writers would have you believe. Nice work.

  11. HankG

    You say, "It [WAR] doesn’t work." But you also say, "As yet, it [WAR] is probably as good a singular statistic as is widely available." Doesn't it follow logically then, that all other singular statistics don't work either?

    I think your actual argument is that WAR has limitations, that it is dependent on the accuracy of the statistics and valuations of its components. I doubt very many of the proponents of WAR would disagree with that. But saying that WAR doesn't work is hyperbolic and inaccurate.

    • I say "it doesn't work," but I clarify that by saying "not in the fantastically straight-forward way we try to use it." WAR being the best singular statistic we have available is also a function of there not being many which make the claims it implicitly makes (i.e. this is one-stop statistical shopping). That implicit claim is a big part of why we repeatedly hear arguments based upon WAR and WAR alone (i.e. in the MVP debate vaswani hilariously invokes below).

      I do find WAR useful as a starting place, especially for multi-season and historical analysis. My point is, many fall victim to the illusion of elegance and use it, as they did RBI and HR in the past, to say, "7.7 is bigger than 6.7, therefore Jacoby Ellsbury is clearly better than Curtis Granderson." You're right, the creators of WAR don't argue for that method, but in this case, the intention has clearly diverged from the practice.

      Moreover, any time you aspire to such elegance, you risk having that aspiration mistaken for its achievement. I would suggest that all attempts to build elegant statistical models, no matter what the caveats, will be treated by some proponents as scripture (see Black-Scholes, etc.). From this perspective, we must wonder, would we be better off without them? I'm not sure the unintended consequences should be used to discourage the urge to innovate, but it's a question that has to be asked.

      • Curious George

        I say "it doesn't work," but I clarify that by saying "not in the fantastically straight-forward way we try to use it."

        —————-

        Who is "we"? I think you may want to distinguish between those with a grasp of the metric's limitations and the great, unwashed, innumerate public for whom all metrics are weapons that are poised to be misused.

        Your use of "we" is awfully presumptuous.

  12. I'm confused: Is Jose Bautista the MVP or not?

  13. Mike

    20 comments in and no one has pointed out that Protection Theory has been proven time and time again to be a total myth? That pretty much kills the entire second half of your post here

    This entire thing is all over the place and full of non sequiturs like "It's asinine to suggest that Ellsbury is twice as valuable as Fielder?" Why? Because your gut says so? Pathetic.

    • Tim

      It might not appear in the thread because it's mentioned in the article.

    • I agree, Mike. It's kind of a stream of consciousness post that picks up where his last anti-UZR rant ended by continuing to say, "I hate this, but I can't point out what I don't like." This could have been about 1,500 words shorter and actually to the point and perhaps I would have found it less stultifying. To me, it just screams of anti-stats nerd has tantrum that his favorite player is not better represented by a number, because his eyes tell him all he needs to know.

  14. Eli

    Finally it is nice to know I'm not the only one who thinks WAR is not best way to measure players.

  15. Interesting article, and very well written. A few notes:

    About the flyball percentage and UZR point:

    How can we be sure that teams that give up flyballs don't optimize for outfield defense? We don't know which way the relationship goes in terms of causation.

    And about the sluggers thing:

    I'm pretty sure that the influence of the on-deck batter has been shown to be pretty minimal, so I'm not sure how strong of a point this is.

    • BrienJackson

      If I may, I think questions like this are mostly non-quantifiable, if only because there's so many variables to them, and also because they tend to focus on the wrong questions. The protection question, for example, seems mostly fixed on pairings of really good hitters. But that doesn't make a whole lot of sense. Would having Matt Holliday rather than Ryan Ludwick in the on deck circle make you want to groove a fastball to Albert Pujols? I should hope not, because he's still Albert Pujols! But Nyjer Morgan up with Braun and Fielder on deck? I know that if I'm a pitching coach, the last thing I want is my pitcher walking Morgan.

      Of course, just because he's getting challenged more often doesn't mean he'll get more hits or do noticeably better either. But I'm not sure that's really the question to ask, at least not so much as whether or not opposing teams change their approach based on a certain factor.

      • Nathan

        My counterpoint to that would be that, just because a team is changing its approach, if it doesn't affect the team's overall performance or only does so minimally (as has been shown in studies), what place does it have in a stat that is trying to express a player's overall value?

    • BrienJackson

      And I would analogize the slugger in Hippeaux's post to the oil in frying chicken. In the grand scheme of things it might not have as much value over similar products as a good spice mixture or high quality frying pan does, but it's awfully damn hard to end up with the end result you want without it.

  16. Patrick

    Excellent post. I'm a fan of WAR as a shorthand, but more as a jumping off point than the end of a discussion.

    If I can pick one minor thing to quibble with: my understanding is that WAR doesn't say that Troy Tulowitzki and Carlos Lee have made equal defensive contributions this season. Tulo and Lee have saved similar numbers of runs compared to an average fielder at their respective positions, but that's not the same as making equal contributions. The "positional adjustment" factor that goes into WAR is an adjustment based on fielding position, i.e., for the fact that catchers and shortstops do more on defense than left fielders and first basemen. Tulo has a positional adjustment of plus-six runs (for playing SS) and Lee a minus-eight run adjustment (for being in LF). So despite having the same UZR, WAR still says there's a fourteen run difference in defensive contribution between the two.

    • williamnyy23

      That's really not a minor quibble. The positional adjustment is a major component of WAR. Unfortunately, there are quite a few similar statements that perhaps imply an incomplete understanding of WAR. Although a provocative piece, I am not sure much was covered to expand the topic.

      • Positional adjustment accounts for about 1.5 WAR difference max for comparing non DHs. Now, that is–in theory–a huge difference. But 1.5 wins isn't that far off from what is considered to be margin of error (which I assume comes from not knowing what the exact value of a replacement at any one position would be). The Lee-Tulo argument is a knock on how UZR gets weird in samples (Lee rates good defensively for the first time ever in the outfield, and good at first base), and he's certainly right that the multiple positions just lead to lots of really really small samples.

        I think the best thing to get out of the article is that UZR isn't trustworthy in small samples and that WAR shouldn't be used as the end of the conversation. But we might have known that to some extent.

    • Thank you, Patrick. Of all the glaring distortions and misrepresentations in this piece, the blatant disregard of positional adjustments was the most troubling. I can't believe it took this long for someone to point it out, but I'm very thankful someone beat me to it.

      The irony of the argument that Carlos Lee is overvalued and Troy Tulowitzki is undervalued while simultaneously claiming that WAR is reverse discriminating against slow fat sluggers was similarly auce.

  17. Lee

    This is a poor attempt at disarming WAR. First, you make no legitimate points against the offensive contributions to WAR. As has been stated above, "Protection" doesn't exist. Your only legitimate critique of the stat is with respect to UZR. Yes, it comes with significant error bars – even after 162 games. I completely agree this component of WAR needs to be taken with a grain of salt. I generally perform a mini-regression in my head when I see a UZR that seems out of line with a player's career performance.

    It seems, as is the case with most, if not all, "I don't like X stat, it doesn't work" arguments, that you misunderstand the stat's scope and purpose. No stat is perfect. No stat comes without error bars. No stat is meant to be a utility knife to perform many roles. WAR makes the largest effort to incorporate a multitude of factors, yet it still tells us a very specific thing: over his innings played, how much value did a player provide over what our theoretical replacement player would have provided.

    It doesn't mean the BZA is a BETTER player than AGone, but during the innings that he played this year, according to wOBA, league and park factors, replacement levels, and UZR, he provided more value. It doesn't mean that given the same salary, you would take Zobrist's contract, it doesn't tell you who will be better next year, it doesn't tell you who is more skilled – it tells you nothing past what WAR intends to measure. Raw value provided over replacement. And when you account for the error bars in UZR, you can't even take these decimal places literally – as you claim is commonly done. Make your mental UZR adjustment if you don't agree. Make your additional park factor adjustment if you don't agree. This just gives us a great starting point to begin discussing value.

    Leave the extremism to politics. Claiming WAR "doesn't work" simply makes you look stupid.

    • tenaciousdeucer

      I've read Keith Law dismiss OPS as useless due to OBP and SLG having different denominators, then reference WAR as a fine metric. Yet lumping offense, defense, and baserunning into a single stat is not logical, there is not yet a defined single season defensive stat, park adjustments are inherently flawed, and the respected WAR compilers do not agree what "replacement" means. WAR may be a fine starting point for discussion, but as far as a true metric of value it is not an opinion but a fact that WAR "doesn't work."

      • Lee

        "it is not an opinion but a fact that WAR "doesn't work.""

        WAR is a framework for assessing value. The logical process is sound. WAR works. It uses linear weights for offensive contributions (wOBA), augmented by park and league factors (wRC+). It uses UZR for defensive contributions. Do you disagree with the methodology of UZR? Totally valid, assuming you've done enough personal research to back up your opinions. But to say that "WAR doesn't work" shows that you simply do not understand what it is we are even talking about here. Sorry to put it so bluntly. If you truly care about understanding and being an active participant in the discussion, you should spend some time to learn the pieces of the puzzle first, then begin to dig into understanding what it is WAR really tries to do, from the ground up. Only then will you actually be an active participant in this discussion, as opposed to a mindless troll spouting "facts" for which you have no logical basis.

        • tenaciousdeucer

          One only needs to scratch the surface of WAR to see it is too flawed to be taken as seriously as it is. If you think my post is trolling, then you should reassess your own logic and not take this kind of business so personally.

          • Lee

            You realize that you continue to talk rhetorically, without providing any actual substance. You aren't saying anything. At all.

      • MontgomeryBurns

        You should probably email/tweet Klaw about this instead of just throwing his name out like it makes your argument more sound. It's not the best way to sound credible….

        • tenaciousdeucer

          Uh, he's on record on this issue all over the place, bro.

          • MontgomeryBurns

            Hey, bro, I mean you should email Klaw about why he says OPS could use some change on how it's valued and why he values WAR. Your assertion is that because he feels OPS is flawed, but likes WAR, means that he is being illogical. My assumption is that you would probably make 0 salient points against him in such a debate. So, before you try to make that assertion, why don't you talk to him and ask him why he likes WAR. Bro.

          • tenaciousdeucer

            Well, I'm not assuming anything since he is on record on that matter. I also don't expect Law to give a dang what I say, bro, but feel free to go tattle on me via twitter or just give him a call for me and say "hi".

          • MontgomeryBurns

            Don't need to do that. The fact that you use his bias against OPS and his support of WAR as a representation that he (and WAR) is being (or is) illogical without bringing that to his attention yourself shows that you are, in fact, a deucer. Bro.

    • joe

      Can people stop using the word 'myth' and doesn't exist when talking about protection?

      Not measurable on aggregate samples means… not measurable. For some reason people keep bastardizing stats and take this to mean it is conclusively a myth.

      The only CONCLUSIVE thing that can be said about lineup protection is the impact is not measurable/statistically significant with today's measurements. When you hear current and former pitchers talking about how they change their approach based on who's on deck or what the score is or if there's a runner on 3rd, I'm of the opinion they are not simply lying and making it up for the heck of it.

      I understand the discussion of how much of an IMPACT things like lineup protection or contextual variables when pitching (score, runners, # of outs, etc.) have; but the "it's a myth" meme is generally given by folks who don't understand what statistical tests are designed to show or not show.

      So when you talk about extremism… taking a position that "protection doesn't exist" is rather ironic.

      • Lee

        The extent to which I used the word myth was hardly hyperbolic.

        "The only CONCLUSIVE thing that can be said about lineup protection is the impact is not measurable/statistically significant with today's measurements."

        Newsflash – today's measurements are pretty darn good. They aren't perfect, we don't have sophisticated analysis of batted ball profiles, but the fact there's basically zero effect that we can measure through the actual outcomes (the most important factor!) at this point means… there just isn't much of an effect.

        Then compare that to the widely held belief that "protection" is a considerable factor, then calling it a myth seems… eminently appropriate, if you ask me.

        • Hank

          What perception is there that protection is a "considerable" factor? Define "considerable" and define how "widely held" this belief is… It's ironic that SABR folks/fans often try to smash generalizations and at the same time often create such generalizations (with no data) to make their counterargument stronger. The other day Dave Cameron created the ridiculous strawman "Montero will not be the savior Yankees fans think he will be this Sept" so he could come to the incredibly astute observation that 1 month at DH is not going to have a huge impact on the league's highest scoring offense. He used an anonymous quote from a team official, reworked it for his own purpose and then assigned it to the fanbase in general…

          When ex major league pitchers ACKNOWLEDGE they pitch people differently depending on who's on deck (Hershiser, Smoltz, Glavine have all talked about this), you can't say protection is a myth. I can understand discussing how much of an impact it might have or how widespread it is, but myth implies it doesn't exist at all.

          Just because on aggregate the impact is minimal/near zero, that should not be confused with non-existent, non-important in every situation. The aggregate may be close to zero but you should not confuse that with it being near zero in every individual circumstance..

          I'm a big SABR fan but one of the biggest issues I see with the ever increasing popularity of these stats is taking a statistic that is based on large aggregates and applying it to a specific player or situation mindlessly and not understanding the limitations (the run expectancy… i.e. "how can manager X be so dumb?" is another good example of this). Another example is the meme "UZR is what happened on the field" – but built into 'what happened on the field' is a large aggregated model of out expectancies in individual zones over the last 10(?) years and a broad stroke that the distribution of balls in that zone is roughly what the actual player saw that year (not to mention the stringers that are plugging the data into the model aren't actually measuring anything…. they are simply judging zone, speed and runner speed in some coarse buckets with whatever subjectivity is associated with those judgements).

          • Lee

            Use your imagination, Hank. First of all, the author of this article cites protection. According to "joe" former MLB pitchers cite protection. Do you want a list of every time I've heard someone talk about protection during a game's telecast? But congratulations on pushing the discussion back towards petty semantics. You only spent about 1000 words doing so.

            Of course some pitchers will approach hitters differently based on who is on deck… but guess what, the hitters KNOW that they are being treated differently and adjust to it! There are a million factors at play, and the best we can do at this point is to analyze the results. This is what actually happened. In millions of at bats. And guess what, not really a factor.

            Please excuse any misuse of the words "factors", "analyze" or "the".

  18. If you don't trust the defensive metrics use Offensive WAR on baseball-reference.com or VORP on Baseball Prospectus. Those are both everything but the defense metrics.

    oWAR just assumes every player is an average fielder.

    • Excellent point. People criticize WAR for being a black box, but it really isn't. It is a framework from which you can pick and choose what you trust. If WAR was designed to mislead, the component values would be hidden in the sum.

    • tenaciousdeucer

      Right on. The defensive stats are not "there" yet and there is little need to combine hitting and defense into one number anyway.
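
The "pick and choose what you trust" idea in this sub-thread can be sketched as a sum of run components divided by a runs-per-win factor. This is a minimal sketch under stated assumptions, not the formula used by any particular site: the `Components` class, the `war` function, the player's numbers, and the 10-runs-per-win conversion are all hypothetical, chosen only to show how setting the fielding term to zero treats a player as an average fielder.

```python
from dataclasses import dataclass

# ASSUMPTION: roughly 10 runs per win is a common rule of thumb, used here
# only for illustration.
RUNS_PER_WIN = 10.0


@dataclass
class Components:
    """Run values above the relevant baselines (all numbers hypothetical)."""
    batting: float
    baserunning: float
    fielding: float
    positional: float
    replacement: float


def war(c: Components, trust_fielding: bool = True) -> float:
    """Sum the run components and convert to wins.

    With trust_fielding=False the fielding term is zeroed, i.e. the player
    is treated as an average fielder at his position.
    """
    fielding = c.fielding if trust_fielding else 0.0
    total_runs = c.batting + c.baserunning + fielding + c.positional + c.replacement
    return total_runs / RUNS_PER_WIN


# A made-up corner outfielder with a suspiciously high fielding number.
player = Components(batting=25.0, baserunning=2.0, fielding=12.0,
                    positional=-8.0, replacement=20.0)

full = war(player)                                # all components included
offense_only = war(player, trust_fielding=False)  # fielding assumed average
print(round(full, 1), round(offense_only, 1))     # 5.1 3.9
```

Because each component is visible, a reader who distrusts one input can swap it out or regress it without discarding the rest of the framework.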

  19. andregodin

    WAR is very much a work in progress and currently there are several working prototypes. Chances are whichever one is the best will still have serious limitations and probably can't be used as the argument ender that we would like it to be. That being said I feel like a more interesting avenue of inquiry would be determining which prototype is the least flawed rather than showing the particular flaws of just one prototype.

    • tenaciousdeucer

      "Wins above replacement" is a bad idea for a stat in the first place. Something else will make WAR the Edsel of stats.

  20. Sylvan

    This article makes a couple interesting points, but it's majorly flawed. Your gut may tell you that a big, fat slugger is automatically more valuable than a speedy line-drive-hitting defensive wiz, but that doesn't make it so. The burden is on you to prove it.

  21. rone

    So… is this a criticism of WAR, or a criticism of UZR? As near as i can tell, the problem you have is strictly with UZR. What about DRS or TZ (or some other advanced collective fielding metric)?

  22. Ian
  23. Hank

    I'll post this again….

    Carl Crawford career LF at the Trop 22.5 UZR/150
    Carl Crawford career LF everywhere else: 7.5 UZR/150

    This is over 8 years (so each sample size is the rough equivalent of 4 full years).

    1 year OF UZR samples are bad, but even the general "3 years is what you need" can also have issues as UZR can have systematic biases…. input bias, park effects and subjective components (armR, errR for outfielders) which don't even out over a 3 year period.

    I like the concept of WAR and the issue I have with it is bad input data (the defensive stats in both WAR models and the baserunning values now put into the fWAR). Until these variables can be measured better (FieldFX?), any difference in WAR between players based on these components should be taken with a huge boulder of salt.

    • What is the issue here exactly? Why would a difference in home and road UZR be evidence of a flaw in UZR? Some players hit a lot better at home than on the road. Why wouldn't fielders see the same effect?

      For some reason when defensive numbers are inconsistent across splits or years folks gnash their teeth and blame the faulty defensive metrics, but Carl Crawford can go from an OPS+ of 135 to 82 in a year and people won't question the offensive numbers.

      Why do we expect greater consistency on defense than on offense? That doesn't seem to me to be a valid expectation.

      • Hank

        The issue is you are drawing an analogy of a 1 year variation in OPS to a consistent EIGHT year variation in UZR…. do you really think your strawman is appropriate here? You did see the 8 year part?

        If I saw a consistent 8 year home/road OPS split…. no I wouldn't chalk it up to noise… I don't think anyone in their right mind would if it was 15 runs worth.

        So I understand you, Crawford's 8 years of splits are simply noise? And thanks for putting words in my mouth but I don't expect greater consistency on defense… which is why I looked at eight years of data and not one or two. If it was merely one or two I would chalk it up to variation.

        • Of course it likely isn't noise, though some of it could be.

          He is likely a better fielder at home than on the road. Better jumps, reads the ball better off the bat due to experience. Is able to go into the wall more aggressively due to familiarity. Do you really think it unlikely that Crawford makes a play or two a week more at home than on the road?

          • Hank

            He's 15 runs better at home than on the road…

            The difference between Crawford at the Trop vs Crawford anywhere else in the world is similar to the difference between Matt Holliday and Raul Ibanez (I just randomly took a 4 year sample)….

            You are chalking up that much of a difference to a combination of comfort level playing at home and noise? Perhaps…. but perhaps it's just a bit more than that… I guess when he gets comfortable in Boston and we have 3+ years of data there we will be able to assess how much of a rather massive 15 run difference can be attributed to comfort level.

            Is there any data/study which shows a home/road comfort level benefit. And is it anywhere near a UZR/150 of 15 runs?

        • Eugene

          By the way, the year to year fluctuation in OPS+ that Sean mentioned is about the same level as the 8 year fluctuation in UZR you point to.

          Since it takes three years for UZR to stabilize as much as the general offensive stats, then 6 years of even home road fielding splits would be about as stable as year to year offense numbers. There's only slightly less noise in the fielding you cited than in the offense Sean mentioned.

          Oh, and 135 to 82 in OPS+? That's about a 35 run difference.

          It was a real point.

          • Hank

            I'm talking about what seems to be systemic variation, not year to year variation

            Year: Home / Road (UZR/150)
            2010: 36.4 / 6.0
            2009: 27.9 / 8.8
            2008: 36.3 / 8.0
            2007: 0.9 / -6.7
            2006: 14.7 / 6.6
            2005: 22.2 / 2.6
            2004: 31.1 / 21.0
            2003: 20.7 / 10.6

            Look at the #'s for a second…. and think about equating that to year to year OPS (or wRC or whatever offensive metric) variation…. the only random component I see is impacting just how much of a gap there is between home and road…. but EVERY year has a minimum 7.6 run difference in the same direction…. every year. Now some years are as high as 30 runs – now the change in the size of the gap might be traditional UZR noise…. but I'm talking about the gap itself.

            Folks seriously think the above splits are no different than random hitting fluctuations from year to year? (especially when you consider home/road games are interspersed so it's not like you have some of the temporal variables when looking at year to year….. an injury, swing adjustment, etc.)

            If you can point me to a systematic hitting variation similar to the above, then I'd buy the analogy to hitting variation… but simply throwing out one year of hitting to another is looking at something completely different than what I'm talking about…. now if I was simply talking year to year UZR fluctuations then I could see the analogy.

            Hope this clears this up some.
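
The year-by-year gaps behind this argument can be checked directly from the splits listed in the comment. The sketch below uses only those numbers; the coin-flip calculation at the end is a deliberately crude assumption (each year independently equally likely to favor home or road) and is not the model behind any probability quoted elsewhere in the thread.

```python
# Crawford's home/road UZR/150 by year, copied from the comment above.
splits = {
    2010: (36.4, 6.0),
    2009: (27.9, 8.8),
    2008: (36.3, 8.0),
    2007: (0.9, -6.7),
    2006: (14.7, 6.6),
    2005: (22.2, 2.6),
    2004: (31.1, 21.0),
    2003: (20.7, 10.6),
}

# Home minus road for each year, rounded to one decimal to hide float noise.
gaps = {year: round(home - road, 1) for year, (home, road) in splits.items()}

smallest = min(gaps.values())
largest = max(gaps.values())
all_home_better = all(gap > 0 for gap in gaps.values())

# ASSUMPTION: a crude sign test. If each year were an independent coin flip,
# this is the chance that home beats road in every one of the listed years.
p_all_home = 0.5 ** len(gaps)

print(smallest, largest)  # 7.6 30.4
print(all_home_better)    # True
print(p_all_home)         # 0.00390625
```

The sign test ignores the size of each gap, so it says nothing about whether the cause is the player, the park, or the metric; it only shows that eight straight years in one direction is unlikely to be pure chance.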

          • Jonathan C. Mitchell

            One of the main reasons Crawford's home numbers are so much better is due to the amount of extra space there is to make a play out of the zone. Fenway and others don't have as much. Is that a bad thing? No, it just says, to me, that Crawford can cover a heckuva lot of ground and his value is best used in a spacious outfield.

          • Hank

            The larger zones cut both ways… in UZR the # of zones is constant so there will also be 'routine' high probability outs which may be a bit tougher in Tampa, or tough zone outs that will be even tougher to make given the extra area.

            There are also park factor adjustments to account for these (I have no idea how accurate or effective they are). Again UZR is a comparison to a league average fielder… so the park effect should be distilled out of the value equation, no?

            Crawford is also an above average arm at home (I think it's about a 5 run home road split on UZR/150)…. this might be a turf effect (speculation on my part) but if UZR is comparing to league average should he still be rated as an above average arm because he plays on turf?

          • Eugene

            Wow, I didn't realize that it was so consistent an effect. I wouldn't be surprised if those numbers weren't exaggerating a real difference in Crawford's home/road fielding production. Players frequently have large home/road splits in batting/pitching and outfields would probably show the same in the field. It also could be that the exaggerated effect is an outlier common to statistical systems like baseball.

            Assuming that fielders are expected to play exactly the same in every park, an effect like Crawford would show up about 1 in 2500. So there is likely some systematic bias. If fielders can actually be better in one stadium over another like batters can, then the chances of this happening are much greater. For example, for players that have an expected home UZR/150 distribution 5 or so runs better, about 1 in 30 should show an effect like Crawford's.

            One thing to keep in mind is that you are using UZR/150 for half year samples. That means that the actual run difference between home/road for the given year would only be half of what is reported in those numbers. So the difference spans between ~4 runs to 15 runs.

            Anyway, Here's a systematic hitting variation similar to the above.
            It would have taken too long to get month to month home/road splits, which would have been an exact analogy for stability in the sample sizes. As it was I had to look at someone I knew had home road splits. Matt Holliday.

            YearHomeRoad (wRAA – Offensive runs produced above average)
            200418.8-7.7
            200522.4-0.8
            200640.14.3
            200746.411.8
            200827.918.2
            200925.710.4
            201021.319.3

            This example is apt partly because one can and should easily point to the parks as a significant influence in his split. Crawford may just have been better at Home. According to Fangraphs, it looks like Crawford has averaged 7.5 runs better in LF at home than away for a full season of half home half away games.

            By the way, where did you get those UZR splits per year? I'd like to take a look at some other players.

          • Eugene

            Apparently, Tabs aren't formatted well… Holliday's splits again:

            2004: 18.8 / -7.7
            2005: 22.4 / -0.8
            2006: 40.1 / 4.3
            2007: 46.4 / 11.8
            2008: 27.9 / 18.2
            2009: 25.7 / 10.4
            2010: 21.3 / 19.3
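
The comparison Eugene draws can be sketched the same way, using only numbers that appear in this thread: Holliday's wRAA splits above, and Crawford's career UZR/150 figures (22.5 at home, 7.5 elsewhere) from Hank's earlier comment. One assumption is made explicit in the code: the wRAA splits are treated as totals for the home or road games actually played, while UZR/150 is a per-150-games rate that has to be halved for a half-season of home games.

```python
# Holliday's home/road wRAA by year, copied from the comment above.
holliday = {
    2004: (18.8, -7.7),
    2005: (22.4, -0.8),
    2006: (40.1, 4.3),
    2007: (46.4, 11.8),
    2008: (27.9, 18.2),
    2009: (25.7, 10.4),
    2010: (21.3, 19.3),
}

# Home minus road for each year, rounded to one decimal to hide float noise.
holliday_gaps = {y: round(h - r, 1) for y, (h, r) in holliday.items()}
holliday_min = min(holliday_gaps.values())
holliday_max = max(holliday_gaps.values())
holliday_all_home = all(g > 0 for g in holliday_gaps.values())

# Crawford's career UZR/150 in LF, from Hank's comment: home vs. elsewhere.
crawford_home_per_150 = 22.5
crawford_road_per_150 = 7.5

# ASSUMPTION: a season is split evenly between home and road, so a per-150
# rate gap translates to half as many actual runs over the home half.
crawford_runs_per_season = (crawford_home_per_150 - crawford_road_per_150) / 2

print(holliday_min, holliday_max)  # 2.0 35.8
print(holliday_all_home)           # True
print(crawford_runs_per_season)    # 7.5
```

Both players show a gap in the same direction in every listed season, which is the sense in which the hitting split is offered as an analogue to the fielding split.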

    • The Trop is a notoriously difficult place to play in the outfield with the white backdrop. If you're comparing Craw's performance to all others that play there (sparingly) then of course he's going to be much better as most players don't grasp the nuance of the field parameters on top of a white ceiling.

  24. Eli

    "huge boulder of salt" those are my exact thoughts.

  25. Michael

    A lot of things you said in here are wrong. Most of the complaints you have with WAR aren't with the stat itself, but with your own understanding of it. It's too long for me to comment on it all here, but you can read my response here: http://www.pinstripealley.com/2011/9/6/2408743/re

    • hugh

      Don't particularly want to put words into Hippeaux's mouth nor to get into a long one, but I think the point about having fewer opportunities in the OF on low-FB% teams is that it magnifies the effect of the mistakes that even the best outfielders make. It doesn't magically lower everyone's UZR, sure. But it makes the likelihood of it doing so to an unrepresentative degree rather greater.

      Forgive me if I'm barking up the wrong tree.

      • Michael

        It doesn't, he's wrong. Just to make sure, I just went to fangraphs, exported team FB rates for every year from 2002 to 2011 to excel, and graphed them next to team outfield UZR for the same years. I ran a linear regression, and the R^2 is .0234. So, fly-ball rates account for 2.34% of the variation in UZR, which is statistically insignificant.
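
The check described in this comment, regressing team outfield UZR on team fly-ball rate and reading off R^2, can be sketched without a spreadsheet. The commenter's exported data is not reproduced in the thread, so the `fb_rate` and `outfield_uzr` lists below are made-up placeholders and `r_squared` is a hypothetical helper; the output for this toy data says nothing about the real 2002 to 2011 result.

```python
def r_squared(xs: list[float], ys: list[float]) -> float:
    """R^2 of a simple least-squares line fitted to (xs, ys).

    For a one-variable regression with an intercept, R^2 equals the squared
    Pearson correlation between xs and ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# ASSUMPTION: placeholder team-seasons, invented for illustration only.
fb_rate = [0.33, 0.35, 0.36, 0.38, 0.39, 0.41]
outfield_uzr = [4.0, -12.5, 20.1, -3.2, 8.7, 1.5]

share_explained = r_squared(fb_rate, outfield_uzr)
print(f"{share_explained:.1%} of the variation in UZR is explained by FB rate")
```

R^2 measures the share of variance explained; whether a given value is statistically significant is a separate question that also depends on the number of team-seasons in the sample.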

  26. JR05s

    So if greater FB% means greater UZR, and WAR tries to show runs saved on the defensive side, then outfielders are saving more runs because they get more chances than others. How is this any different from teams that have their cleanup hitters' numbers inflated because they get more at-bats because their teams make fewer outs, etc.? The extra at-bats would increase their counting stats, which is why the averages (wOBA) are used. It appears to me that UZR should be adjusted in the same manner, so that it's not counted in a linear formula.

  27. Jim

    Please go to the link in the reply above me. It is a thorough evisceration of this half-assed, misguided critique. If Tango had been able to read it first I don't think he would have even bothered to respond and give it hits. Hippeaux, you're out of your league (not saying I'm in that league, but I'm smart enough to know when smart people know more than me).

    • bigandthick

      A worthwhile discussion and article. I think that a number of people have gotten to the point of blindly accepting the numbers. And, if anyone dares challenge them, there is this posse who reject any disagreement. Look, at one time Bill James was rejected and look how that turned out. If you are to grow you need to be challenged. And the defensive numbers in these systems don't pass the smell test!

  28. Mike

    "it" meaning "Is WAR the new RBI?" not the pinstripealley post.

  29. R. Mann

    "Dustin Pedroia has a real shot at his second MVP"

    Really? That's the first time I've even heard his name in relation to the 2011 MVP: Maybe just give it to the entire Red Sox lineup to keep the media happy?

  30. explodet

    I thought you were going to mention that UZR has a different baseline (league average) than wRAA does (replacement level).

    • Tony

      wRAA = weighted runs above average, not replacement.

  31. Half Brit like Nate

    Great post. And for all the haters who want to prove you wrong, the burden is on them.
    And I bet most of the haters have not developed their baseball eye, meaning they can't tell you much while watching a player, they need their stat book. These people are not only stubborn, they are boring to converse with.

  32. Tony

    Interesting post. WAR is far from perfect, but I disagree with some of your problems with it.

    1. UZR and Fly balls – I wonder how much of your findings were due to selection bias. A team with a lot of groundball pitchers would probably be more prone to sign the slugging, poor defensive left fielder than one with a flyball heavy staff. Conversely, teams like the Mariners, Padres, etc. will probably lean towards good defenders for their parks, and won't shy away from flyball heavy pitchers as, say, the Yankees or Rockies might. That said, I definitely do think there is some effect on UZR, I just question how much.

    2. Dunn Effect – Maybe I'm misunderstanding you here, but I disagree with this whole section (and I'm aware that UZR is far from perfect). UZR doesn't suggest that Carlos Lee has as much defensive value as Tulowitzki, it suggests that he had as much defensive value compared to other LF/1B as Tulo did to SS. Since SS are collectively among the best defenders on the diamond, Tulo has a higher bar, but that's the whole point of the positional adjustment.

    • Tony

      Two types of utility men – Lee's and Kendrick's defensive numbers have nothing to do with the fact that they play multiple positions, it's the fact that they play them well (according to UZR, which as I said before, is not perfect especially in small samples, but that's not what you were talking about in this paragraph). Pedroia hasn't played an inning outside 2B, but his boost in UZR this year is similar to Kendrick's. Michael Young has played all over the infield and his UZR is as bad as ever. You might have a problem with UZR, but no one would argue if you wanted to use other defensive stats and/or multiple year samples when evaluating players.

      3 Sluggers – (Cabrera vs Kinsler) Why is it crazy to suggest that the 6th best offensive (wRC+) 2B, who is regarded (by multiple stats over the last few years) as a good defender and baserunner, is better than the best offensive 1B, who is regarded by stats and scouts alike as a poor fielder and baserunner?

      • Tony

        (Morgan vs Fielder) Why is it crazy that the 8th best offensive CF (wRC+) (and that's counting Hamilton and CarGo as CF even though they've played most of their innings in a corner), if he qualified, who has been rated above average (and sometimes excellent) every year of his career by multiple defensive stats, could be better per game than the 4th best offensive 1B, who is regarded as a poor defender and baserunner? (That said you should be careful extrapolating a part time player's WAR to a full time role. One reason for this is platooning, and Morgan is a prime example of this. That doesn't mean there's a problem with WAR, just like there's not a problem with BA/OBP/SLG even though Morgan looks very good in those stats too, but there could be a problem if someone misuses it.)

        I would not argue that Ellsbury is specifically twice as good as Fielder. But Ellsbury has the 4th best CF wRC+, Fielder the 4th best 1B. However, when Ellsbury adds value with his glove and baserunning, and Fielder takes it away, I don't think it's a stretch to say that Ellsbury has been better by a pretty good margin.

  33. Yonkees

    Fantastic article. Those accusing the author of "not understanding what WAR is supposed to be used for" I think are missing his point that the proponents of WAR often regard it as an end all-be all stat defining how good a player is, not the subtler stat looking at his relative value that it truly is. I believe those people are misusing and misunderstanding WAR, just as you accuse this author of. I think he is trying to change the conversation so people start to recognize that just as a simple stat like RBIs has simple flaws, a complicated stat like WAR has more complicated flaws, ones that are easier to dismiss because of their intangibility.

  34. Jed

    I'm pretty sure you've been sitting on that ending quote for a couple weeks until you could write a pertinent WAR-related article. Props on the pun.

  35. Andrew

    IIATMS has some great articles but this one is rather poor. As Michael above illustrated excellently in his response on PA, your understanding of WAR is flawed in several places, particularly in your lack of understanding of positional adjustment and of UZR, which is a very complicated metric with failings that are well-known. No one credible has ever suggested that one player's season is definitively better than another's because they put up 6.5 WAR against 6.3 WAR. Also, you seem to be particularly targeting UZR and fWAR, which incorporates it. rWAR offers an entirely different perspective on WAR which does not utilize UZR and even provides oWAR and dWAR separately if you don't trust Total Zone either or are too lazy to subtract UZR values from fWAR. WAR is not a perfect tool, but it is one of the best when used properly, and numerous people have devoted their efforts to making sure that all factors are taken into consideration and the numbers are as accurate and objective as possible. The biggest failing of WAR is one which you did not mention – its failure to address the issue of luck (though there is an argument to be had over whether luck has any place in an evaluative statistic). In short, WAR is not as simple as it may seem, and it is your responsibility to have a complete understanding of it before offering your criticism.

  36. Brian Cartwright

    WAR is a framework, a method of combining all facets of a player's performance. It's not only published at FanGraphs, using UZR as its defensive component. You can find it at Baseball Reference or The Hardball Times, or called VORP at Baseball Prospectus. Sean "Rally" Smith had his version as well, before he was hired as a consultant and stopped publishing.

    I develop the numbers for The Hardball Times. I'm glad to say that many of the numbers you complain about in the article are not as extreme at THT, where Prince Fielder's WAR per PA is 28% higher than Nyjer Morgan's, and where Fielder has the 15th best overall WAR for a position player, ahead of Bourjos' 46th.

    Most of the article deals with UZR issues, not WAR. The two biggest differences between WAR implementations are defense and positional adjustments. WAR doesn't hate sluggers – it just refuses to rate highly those sluggers who are limited to defensive positions populated by other sluggers. THT has Mark Teixeira as having the 11th best 2011 season among first basemen. He may have 35 HRs and 100 RBI, but the production is not nearly as good as Prince Fielder, whom I rank 4th (behind Votto, AGon and Miggy). So the bar for first base is much higher than that for shortstop, where the 10th best THT WAR goes to Elvis Andrus.

    Your comments on correlating fly ball rate with defensive ratings did get me thinking, as that touches on work I'm doing now with estimating batting average on balls in play for batters and pitchers by their ground ball (or fly ball) rate. More flies indicate a higher mean vertical angle, which results in easier-to-catch balls. So you did find what I believe is a true effect. I will begin work ASAP to get this coded into THT's outfield defense as an adjustment to the expected catch rate.

  37. As a fervent supporter of all things WAR, I was taken aback by some valid and astute points here. On the whole, though, I think you’re too harsh on WAR and your analysis caters more to the flat-earth crowd than the “enlightened-but-questioning” contingent. My rebuttal:
    http://replacementlevel.wordpress.com/2011/09/07/in-defense-of-war/

  38. Tangotiger

    Hippeaux: I think you did a great job on the fielding part. You were too quick on the conclusions pretty much everywhere else. You seem to be somewhat in tune with the concept. If you email me, perhaps we can have a dialogue, and we can see where we can plug up some holes and straighten out some edges.

    tom~tangotiger~net

  39. Sabrina

    You got 80+ comments. Whether everyone likes your post or not, good job!

  40. Taylor

    In neoclassical economics as in real life, analysts often fallaciously assume that they can clearly delineate what a market is. Thus, Conventional Wisdom holds that there is one global market of baseball players, and that Carlos Lee must be a good one because he hit more HRs than any other Astro this year. This baseball market is global.

    WAR makes the same assumption about delineation, although it draws the line completely differently. For WAR, there is one market for shortstops and a separate market for 1B, and it operationalizes this assumption by comparing a SS only to a replacement-level SS when making its valuation. This baseball market is local.

    Hippeaux's point about sluggers highlights the limits of this reasoning. There are some commodities, skills, etc., that are abundant locally but scarce globally, despite the fact that the location of abundance is by definition part of the global market. So slugging 1B should not be compared only to 1B because slugging, while abundant at 1B, is scarce generally. This global scarcity gives slugging more value than WAR suggests, without even resorting to Jim Rice-y arguments about intimidation. However, the local abundance of slugging at 1B does depreciate the value of a slugger who is marooned at 1B quite a bit more than Conventional Wisdom suggests. Thus, the real money should go to a skill set – such as Tulowitzki's – that is scarce both locally and globally.

    • Thank you, Taylor. You made this point much more gracefully than I did.

  41. Jon

    For those "protection is a myth" people…

    You are looking at this backwards. The point about Adrian Gonzalez isn't that he is being "protected" by a player behind him, but that he is being elevated by being put in a spot to succeed.

    Players hit better with men on base. Pitchers have to throw more strikes, and OPS, overall, is raised when runners are on base. So, if I see 20% more men on base this year than last, shouldn't my OPS be increased by a percentage of that? Gonzalez isn't making them hit better; they are putting him in a position to hit better.

  42. KevinH73

    My main problem with WAR has always been the defensive side of it, so big thumbs up there. I prefer VORP myself.

  43. Gary

    Maybe you can’t compare outfielders to first basemen, but I assume you can compare shortstops to shortstops, as in Derek Jeter to Elliot Johnson. According to WAR, Jeter, who is playing every day for a championship team and over the past two months has been among the best offensive players in the American League, has a WAR of 0.9, while Johnson, a part-time player who has more strikeouts than total bases, has a WAR of 0.7. According to WAR, you could pretty much interchange Johnson for Jeter and have no effect on your lineup.

    My girlfriend’s niece is married to Johnson, so my girlfriend has a vested interest in how well he does. She wants to see him succeed and follows him more closely than she does Jeter. Yet when I mentioned that according to WAR, he was virtually as good as Jeter, my girlfriend burst out laughing. Her reaction was that WAR must be a faulty statistic because even with her personal bias in favor of Johnson, she knew there was no way he was as valuable a player as Jeter, and that it wasn’t even a close comparison.

    I understand that the biggest reason for this similarity in their WAR is because of fielding – Johnson has a 0.7 defensive WAR while Jeter is rated at -0.9. While I have no doubt that Johnson is a fine fielder I also don’t believe Jeter is more than 1.0 worse than him. I subscribed to the MLB TV package this year to watch Yankee games and have seen quite a few of them. One of the reasons was to see just how much Jeter has lost at shortstop because of all the negative reports I’ve read.

    Unfortunately, I have not seen this and in fact, have often been surprised at how many plays he makes on balls that other shortstops often don’t get. But a lot of it has to do with his positioning, which I’m sure is a result of years of experience. I’ve seen him field a ball hit right over second base by moving one step to his left. I’ve seen him field balls that were hit past the third baseman because he’d positioned himself so far to the right. My question about UZR is, if he has positioned himself near second base because of the way the pitcher is throwing or the way the batter has been tending, and instead the hitter smacks one through the shortstop hole, does he get penalized for not making a play an “average” replacement shortstop, positioned normally, would have made?

    • Michael

      Jeter's fWAR is 2.5, and Baseball Prospectus has his WARP at 2.0, so the .9 rWAR you are mentioning is kind of an outlier among the three WAR sources. His average across the three would be 1.8, while Johnson's average is .3. So no, WAR does not say they are the same; you need to look at the other sources too.

      • Gary

        Thanks for the reply. I looked on baseball-reference to find Jeter's WAR, which was 0.9 total when I looked (1.8 offensive and -0.9 defensive). So you're saying there are three different sources for WAR and that each one has him listed differently? How do we know which one is correct? What does it even tell us if there are three different (and apparently widely varying) sources for WAR? I'm beginning to see why this stat is having trouble catching on.
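The arithmetic in this exchange is easy to check. The short Python sketch below uses only the Jeter figures quoted by Michael and Gary above (they are taken from the comments, not independently verified); the variable names are made up for illustration.

```python
from statistics import mean

# Jeter's 2011 figures as quoted in the comments above (assumed, not verified).
jeter_war = {"fWAR": 2.5, "WARP": 2.0, "rWAR": 0.9}

# Baseball-Reference components quoted by Gary: offensive and defensive WAR.
o_war, d_war = 1.8, -0.9
r_war_from_parts = round(o_war + d_war, 1)

# Average across the three sources, and the gap between highest and lowest.
values = list(jeter_war.values())
average = round(mean(values), 1)
spread = round(max(values) - min(values), 1)

print(f"rWAR from components: {r_war_from_parts}")
print(f"average across sources: {average}")
print(f"spread between sources: {spread}")
```

The components sum to the 0.9 rWAR Gary found, and the three-source average comes out at 1.8, as Michael says. The 1.6-win spread between sources is what Gary's follow-up question is really about.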

    • To your latter question, no, UZR does not take play-to-play positioning into account, although it does shift its definition of normal positioning according to left and right-handed hitters (it also takes into account batter speed). While I have some concerns about this, especially with outfielders, I understand the rationale. Smart positioning is part of a player's defensive acumen. Where problems arise is usually in rare late-game scenarios, shifts, etc. Who gets the debit for a hit that goes through the normal SS or CF zone when the SS or CF isn't playing there? (The answer, as I understand it, is often "two players".) However, as the samples get larger, these rare situations become more evenly distributed and are largely inconsequential.

      • Gary

        Thanks for the reply. Another question I have about UZR is who figures it out? Who decides if a play should have been fielded by an "average" shortstop or outfielder? Does it take into account things like arm strength, relay positioning, turning double plays and holding runners on, or is it purely an assessment of fielding a batted ball?

  44. When you see that Tony Batista had an fWAR of -0.4 in that illustrious season you pointed towards, maybe you'll realize that WAR gets more right than it gets wrong and until you come up with a better solution, perhaps you should leave the kvetching to those more qualified. Do the same with UZR while you're at it.

  45. Chris Cody

    One problem I have with WAR is the use of the word "wins." These are hypothetical wins. The great Dodger infield of the '70s and early '80s didn't rack up much WAR, but they sure won a lot of baseball games. (Compare the WAR of Garvey, Lopes, Cey, and Russell to that of Rose, Bench, Morgan, and Perez: no comparison. But look at their postseason records from '74 to '81: pretty comparable.) On the flip side, Bobby Abreu, darling of the WAR stat, hasn't been much of a help getting teams to the postseason. I liken WAR to a player's full-scale baseball quotient – like an IQ score. Just because you have a high IQ doesn't necessarily mean you do smart things, and just because you garner the right stats doesn't mean you win baseball games.

  46. Chris Cody

    I disagree with the author's contention that WAR doesn't prop up sluggers. Look at the top WAR guys at both FanGraphs and B-R: they are pretty much all sluggers (Ty Cobb led the league in HRs). Look how inflated McGwire is on both lists. I'd take Keith Hernandez or Will Clark as my first baseman in a heartbeat over Big Mac and all his useless walks and meaningless solo home runs (he has the lowest RBI-to-HR ratio of any of the top HR guys). Yes, useless walks: what do you do now that you have, perhaps, the slowest professional athlete in the world standing on first base? He only scores if the guy behind him goes yard.

  47. Joe Ely

    Awesome, Matt, just awesome. Thanks.

  48. Joe

    Hippeaux:

    Aren't you being too harsh on the RBI? Isn't the problem with RBI exactly the same problem that you have identified with WAR: that people generally misuse it?

    RBI does not purport to tell us everything about a player's overall value. Indeed, it does not purport to tell us everything about the player's value as a hitter. Never did, never will. Since the dawn of time, there have always been other statistics — such as batting average, stolen bases, and (even before OBP!) walks — which were intended to tell us more about a player's value as a hitter. And everyone knew it: it was obvious (even to Tim McCarver, I think), that a RBI-machine in the batting order needs table setters who bat .300 and are willing to take a walk to get on base, steal a base, and score the run.

    RBI is a counting stat, and in counting, it does exactly its job — it is quite accurate. A player with more RBIs has batted in more runners. If you use the RBI as it was intended, it cannot really be misused.

    So we should stop picking on the RBI, ok?

    Indeed, I would argue that WAR is far more problematic than RBI, because while RBI was misused in a way that it was never intended to be used, WAR (or really, fWAR, which is what I think you are really focusing on) has in fact been misused in exactly the way that it was intended to be used.

    One of the points that you make (and others make, in comments to Tango and Neyer's responses to your post) is that WAR is a shiny new tool and so people tend to misuse it. I want to take that a step further: WAR is intended to be used as (it invites us to use it as) an all-encompassing stat. Unlike RBI, which nobody should ever have confused with an all-encompassing stat, WAR was created precisely for the purpose of giving stat-heads a way of encapsulating a player's full value into one number. As such, it should not be a surprise that WAR is being used to represent a player's full value.

    Since WAR is susceptible to misuse, blame for failing to avert its misuse should be placed on WAR (or the creators/proponents thereof). It's not something that can be shrugged off as "someone else's" mistake. Think of it as a labeling issue: we require manufacturers of products to warn users against expected or expectable misuses, but not against unexpectable or ridiculous misuses. If McDonald's makes coffee that is too hot to drink (or makes coffee sold at drive-thrus too hot to be spilled on a lap in the car), then McDonald's has to label against that because it invited the coffee to be used that way. But McDonald's does not have to label that its coffee is too hot to be poured in somebody's eyes, because nobody should be using coffee that way in a million years.

    Thus, criticizing WAR as you did (and, again, here I probably mean fWAR) for others' misuse is fair.

    But let's leave poor RBI out of it.

    Joe

    PS — Sorry I'm so late to the party

    PPS: Tango's reaction to your article is absurd. As others have said, if WAR is defined so narrowly as to be a framework, devoid completely of the flawed implementation of the framework, then yes, it's going to be immune to criticism. But then it's pretty much useless. In finance, as I am sure you know, we have a concept called the discounted cash flow analysis (DCF) for valuing a company that is, in theory, perfect. But the devil is in the details, and small differences in assumptions and predictions can lead to vast differences in valuation using this method. That's not a flaw in the implementation; that's a flaw in DCF itself. Great in theory, but problematic in fact. This is your criticism of WAR, and it's a good one.

    Thanks for the great article!!
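To make the DCF point in the PPS above concrete, here is a minimal Python sketch of a textbook discounted cash flow valuation with a Gordon-growth terminal value. Every input (starting cash flow, growth rates, discount rates, horizon) is a hypothetical number chosen for illustration, and `dcf_value` is a made-up helper, not anything from the article or from a real finance library.

```python
def dcf_value(cash_flow, growth, discount, years=5, terminal_growth=0.02):
    """Present value of growing cash flows plus a Gordon-growth terminal value.

    All parameters are illustrative assumptions, not real company data.
    """
    if discount <= terminal_growth:
        raise ValueError("discount rate must exceed terminal growth rate")
    value = 0.0
    cf = cash_flow
    for year in range(1, years + 1):
        cf *= 1 + growth                       # cash flow grows each year
        value += cf / (1 + discount) ** year   # discount it back to today
    # Terminal value at the end of the horizon, then discounted back.
    terminal = cf * (1 + terminal_growth) / (discount - terminal_growth)
    return value + terminal / (1 + discount) ** years


# Same hypothetical company, discount rate changed by one percentage point.
base = dcf_value(100.0, growth=0.05, discount=0.08)
tweaked = dcf_value(100.0, growth=0.05, discount=0.07)
change = tweaked / base - 1

print(f"valuation at 8% discount: {base:,.0f}")
print(f"valuation at 7% discount: {tweaked:,.0f}")
print(f"relative change: {change:.0%}")
```

Under these made-up inputs, moving the discount rate by a single point shifts the valuation by something on the order of 20%. That is the sensitivity the comment describes: the formula is sound, but the answer is only as good as the assumptions fed into it.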

  49. Joe

    Here's an example of exactly the type of labeling I'm talking about. Sweetspot can use WAR all it wants in my book, so long as it reminds us every once in a while that WAR should not be a conversation-ender.
    http://espn.go.com/blog/sweetspot/post/_/id/27050

  50. The idea is so good, so clarifying, like democracy or the rational market, that we really, really want it to work.
