Is WAR the new RBI?

Let’s face it, it’s the only reason we remember Tony Batista.

In 2004, in his final full season as a major-leaguer, T-Bats drove in 110 runs for the Montreal Expos, despite a putrid .272 OBP. Although he was, arguably, the worst everyday player in the majors in ’04, he was hardly the worst player to ever drive in 100 runs (see Ruben Sierra, 1993), nor was 110 the highest RBI total ever amassed by a replacement-level player (see Joe Carter, 1990). However, for some reason, Tony Batista became a sabermetric icon, our favorite cause célèbre when we rage, rage against the RBI.

You’ve heard it before. RBIs are just neat round numbers and context. Given the opportunity to hit behind a couple of on-base machines like Brad Wilkerson and Jose Vidro, anybody could drive in 100 runs. But just because a blind squirrel gets a nut every once in a while, that doesn’t mean he should bat cleanup.

In the wake of T-Bats’ glorious season, the sabermetric cause was moving from its grassroots mail-order infancy to full-blown mainstream phenomenon, buoyed by New York Times Bestsellers, championship GMs, and senior columnists. When a broadcaster spouted out a flurry of “traditionals” – batting average, homers, RBI, wins, saves – to make his point, basements full of fantasy addicts looked up from their digital almanacs and replied in unison: Bleh.

Give me OBP, give me OPS, give me IPO, give me WPA, give me K/BB; just don’t give me RBI! If you’re going to give me RBI, Mr. McCarver, I’d rather you gave me nothing.

And then came WAR.

The concept was ratified by the sabermetric Godfather, Bill James, who’d created Win Shares according to a similar ideology in 2002. It was a neoclassical economist’s wet dream, like baseball GDP: an elegant equation which accounted for all the sport’s diverse variables and yielded a single number roughly reducible to the oldest and most hallowed statistic of them all, the win. Hallelujah.

Wins Above Replacement is a beautiful idea. Euclidean grace in a quantum world. A simple answer, not only for age-old baseball conundrums like “Mantle or DiMaggio?”, but also a formula for unprecedented comparisons like “Rickey Henderson v. Johnny Bench” and “Roy Halladay v. Alex Rodriguez”.

There’s only one problem. It doesn’t work.

At least, not yet.  Not in the fantastically straightforward way we try to use it.  The idea is so good, so clarifying – like democracy or the rational market – that we really, really want it to work, and we’re willing to suspend our disbelief just a little while longer in the hope that it might. Because it’d be so great to know with statistical certainty that Albert Pujols was worth $200 million, that we really couldn’t win that pennant without Andy Pettitte, that Jacoby Ellsbury is definitely the AL MVP, and that Ben Zobrist is exactly 9.3% better than Adrian Gonzalez.  Darn that dream.

The cruel irony, the I-could’ve-had-Sean-Doolittle-and-all-I-got-was-stupid-Barry-Zito irony, is that the problem with WAR is the same as the problem with RBI.  It frequently measures context as much as performance.  Especially when used to evaluate single seasons, it doesn’t sufficiently account for the inevitable variations in opportunity and environment.

What if Granderson played behind Ian Kennedy and Daniel Hudson?: UZR & Flyball Rates

A few weeks back I critiqued Steve Berthiaume’s analysis of Curtis Granderson’s defense by looking at some inconsistencies in the way Ultimate Zone Rating (the defensive metric associated with FanGraphs’ WAR) assesses outfielders. Mark Simon of the ESPN Stats & Info Blog followed up with a very interesting review of specific plays which have adversely affected Granderson’s ratings in 2011.  While Simon isn’t looking at UZR specifically, he does point out that most defensive metrics do not account for positioning and that half a dozen plays can cause sizable shifts in the aggregate numbers when we’re dealing with less than a season’s worth of data.

I’m not the only one who’s noticed that UZR frequently yields suspicious results in small samples, at Fenway, and when several good outfielders are playing alongside one another.  I do, however, want to expand upon my claim that outfield UZR is substantively affected by flyball rates.

In the Granderson article I pointed out that the teams in each league which ranked highest in outfield UZR for 2011 – Boston and Arizona – also ranked #1 in their league in FB%.  This remains true.  However, this is obviously not sufficient proof of correlation, for a couple of reasons.  Not only is there a high possibility of coincidence in any single example, but both the D-Backs and Red Sox feature several outfielders traditionally regarded highly by both sabermetricians and scouts.  For anybody who’s watched them consistently, it would be pretty hard to argue that the trio of Gerardo Parra, Chris Young, and Justin Upton isn’t among the best in the major leagues, no matter who’s on the mound.

So, I looked back at all teams that finished at the extremes of the flyball scale since 2003.  I do not claim that there is a perfect or, in the parlance of economics, a “strong” correlation.  That is, a team with a 35% flyball rate wouldn’t have a dramatic disadvantage in OF UZR compared to one at 38%.  There is, however, significant evidence that pitching staffs with extreme batted-ball tendencies can dramatically affect their outfielders’ UZR numbers.  (I defined these extremes as upward of 40% at the high end and below 33% at the low end.)

Average OF UZR for FB% > 40.0: 10.1

Average OF UZR for FB% < 33.0: -10.6

Of the sixteen teams at the high end of the range, five finished #1 in their league in OF UZR.  Of the 21 teams at the low end, only five finished with a UZR north of zero.
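The bucketing behind those two averages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `average_uzr_by_bucket` and the data layout are hypothetical, and the three sample rows are the team seasons quoted in this article rather than the full set of seasons since 2003, so the output will not reproduce the 10.1 and -10.6 figures.

```python
from statistics import mean

# Cutoffs taken from the article: "upward of 40%" and "below 33%".
HIGH_FB = 40.0
LOW_FB = 33.0


def average_uzr_by_bucket(teams):
    """Average outfield UZR for high-, middle-, and low-flyball staffs.

    teams: iterable of (label, fb_pct, of_uzr) tuples (hypothetical layout).
    Returns a dict mapping bucket name to mean UZR, or None if the bucket is empty.
    """
    buckets = {"high": [], "middle": [], "low": []}
    for _label, fb_pct, of_uzr in teams:
        if fb_pct > HIGH_FB:
            buckets["high"].append(of_uzr)
        elif fb_pct < LOW_FB:
            buckets["low"].append(of_uzr)
        else:
            buckets["middle"].append(of_uzr)
    return {name: (mean(vals) if vals else None) for name, vals in buckets.items()}


# Only the three team seasons cited in the article; not the full sample.
sample = [
    ("2010 Giants", 40.7, 40.7),
    ("2007 Cubs", 40.6, 44.3),
    ("2005 Cardinals", 29.7, -6.1),
]

result = average_uzr_by_bucket(sample)
print(result)
```

Feeding in every team season would show whether the gap between the buckets holds up, and adding a middle bucket makes it easy to check that teams between 33% and 40% sit near zero.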

From these I would point to some interesting pieces of anecdotal evidence:

The 2010 Giants and their 40.7 FB% led the majors in outfield UZR by a substantial margin (40.7 to 31.6), despite the fact that they gave more than 1100 innings to Pat Burrell and Aubrey Huff, lead-footed former DHs who nonetheless somehow finished with positive UZRs for the season.

The 2007 Cubs had an exceptional 44.3 OF UZR in a season where they handed most of the innings to Alfonso Soriano, Jacque Jones, and Cliff Floyd, all of whom substantially outperformed their career numbers with some help from a Chicago staff that sent 40.6% of batted balls in their direction.

On the other side, the ’05 Cardinals, despite featuring some premier outfield talent in Jim Edmonds, Larry Walker, Reggie Sanders, and So Taguchi, finished with a -6.1 OF UZR, thanks to a pitching staff that put only 29.7% of batted balls in the air.

The difference between 30% and 40% can easily be several hundred plays, so when you consider Simon’s point about the significance of even a handful of mistakes in a few months of play, you can see what kind of advantage those extra opportunities provide.
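A back-of-the-envelope version of that claim, as a rough sketch: the season total of 4,400 batted balls is an assumed round number for illustration, not a figure from the data above, and not every flyball becomes an outfield chance.

```python
# Hypothetical season total of batted balls allowed by one pitching staff.
ASSUMED_BATTED_BALLS = 4400


def extra_fly_balls(low_fb_pct, high_fb_pct, batted_balls=ASSUMED_BATTED_BALLS):
    """Extra flyballs a high-FB% staff produces over a low-FB% staff in a season."""
    return round((high_fb_pct - low_fb_pct) / 100 * batted_balls)


extra = extra_fly_balls(30.0, 40.0)
print(extra)
```

Under that assumption the ten-point gap works out to roughly 440 additional flyballs, which is consistent with "several hundred plays."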

This is not to say that UZR is useless, just that it is unreliable in single-season increments and that this unreliability is passed on to WAR, which we habitually use/misuse when discussing single seasons and partial seasons.

I can’t play several positions. (or “The Adam Dunn Effect”)

WAR’s move to the mainstream is deeply tied to the rising popularity of FanGraphs.  One of the first of its “unlikely results” to spark considerable conversation was Ben Zobrist leading AL batters (and finishing behind only Albert Pujols and Zack Greinke overall) in 2009.  Zobrist had a breakout season which was impressive by any measure, but his WAR was given a major boost by his defense (only Franklin Gutierrez and Nyjer Morgan got a greater advantage from fielding).

On one level, this seemed legit.  Zobrist appeared at every position on the diamond in ’09 and over the years has proven himself to be an above-average defender at second base and in right field.  Managers have long lauded the value of versatility and lavished praise on players like Zobrist, Mark DeRosa, and Placido Polanco, who play several key positions well and also swing decent sticks.  Zobrist’s season looked like evidence of their wisdom.

But while it isn’t much of a stretch to believe that Zobrist’s glove was worth a couple wins to the Rays in 2009, try selling this: According to WAR, in 2011, Carlos Lee has had as much defensive value as Troy Tulowitzki.

There are two types of utilitymen: those who are given the job because they play many positions well and those who are given it because they play no position well.  As yet, WAR struggles to distinguish between the two.  It reads Houston’s inability to decide where Lee hurts them least as evidence of Lee’s versatility.  It suggests that Howie Kendrick’s defense at second base has gone from average to exceptional since Mike Scioscia started giving him more starts in left field.

UZR results get weirder the smaller the sample gets.  The utility player may log a thousand innings in total, thus suggesting his UZR is somewhat more reliable, but what actually happens is that several hyper-unreliable samples of a few hundred innings or less are bundled together like toxic mortgages and rated AAA.

WAR Hates Sluggers

One of the things which advanced stats should be applauded for is the extent to which they’ve decreased the fetishizing of the home run and raised awareness of all-around contributions.  Jonah Keri and Dave Dameshek debated the relative merits of Willie Stargell and Tim Raines this week, largely based on the fact they had identical career WAR totals.  Dustin Pedroia has a real shot at his second MVP, despite the fact that his “traditionals” (.309 AVG, 85 R, 18 HR, 74 RBI, 25 SB) are basically the same as Melky Cabrera’s (.303, 83, 17, 79, 17).

However, one can’t help but notice that a cross-section of the most intimidating hitters in the game are treated with relative disdain by the metric.  It doesn’t like them because they play first base or left field (or DH), which aren’t scarcity positions.  It doesn’t like that they are fat and slow.

While I understand that everybody would love to have Chase Utley or Troy Tulowitzki, a middle-of-the-order hitter who makes big contributions in the field and on the basepaths, as well as at the plate, the fact remains, building a lineup without a slugger (or two) is like building a mall with seven Sunglass Huts and no department stores.  A few sluggers are swift, slender middle-infielders.  Most of them aren’t.  To paraphrase Reggie, there are lots of drinks and precious few straws.  If you get left without one, no amount of Range Factor, WHIP, or baserunning acumen can save your season.  Just ask the Padres, or the Mariners.

Yet, we misuse WAR to insist that it’s better to have Ian Kinsler than Miguel Cabrera or that Peter Bourjos is as valuable as Prince Fielder or Mark Teixeira.

We’ve struggled to understand and statistically represent the effect hitters have on one another.  Would Nyjer Morgan be hitting .306 if he weren’t batting directly in front of Ryan Braun and Prince Fielder?  (WAR suggests, by the way, that Morgan has been more valuable on a per-game basis than Fielder.)  Morgan is taking free passes this season at only about half his career rate.  Has he become less patient?  (On the other side of things, Adrian Gonzalez’s career OPS is fifty points higher when the pitcher is throwing from the stretch.  He’s enjoyed that situation in 52% of his plate appearances in 2011.)

While I admit the difficulty of building a model that accounts for the effect a pairing like Braun/Fielder or Pujols/Holliday has on the rest of the lineup, this is one area in which I find the conventional wisdom to be irrefutable.  While I applaud WAR (and other metrics) for aiding in our appreciation of defense and baserunning, it’s beyond asinine to conclude that Ellsbury is twice as valuable as Fielder.  Too often WAR is used as a means of comparing oranges to apples.  One of the things that makes baseball great is the diversity of the fruit basket.  WAR gives incredible weight to the scarcity of shortstops, but no weight to the scarcity of pitcher-intimidating, strategy-altering cleanup hitters, which I see as a form of reverse discrimination.

These are not the last of the problems.  WAR evaluates catching using only the ability to control the running game.  There is abundant evidence that certain park factors have not been sufficiently accounted for.  I’m not arguing, however, that WAR should be completely discounted.  As yet, it is probably as good a singular statistic as is widely available.  But, WAR is not a debate-ending statistic, especially for single seasons.  Even WAR’s adherents, like Dave Cameron, generally admit the margin of error is at least 15%.  When we stubbornly suggest that 0.5 WAR means anything, we are grossly exaggerating the statistic’s accuracy, even according to its creators.  It remains true that any reasoned discussion of an individual’s contributions still requires analysis of the various components that go into WAR, as well as several that don’t, and, as such, subjectivity reigns.
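One way to make the margin-of-error point concrete is to ask whether two WAR figures can be told apart at all once each carries a 15% band. The sketch below treats the 15% as a simple symmetric band around each value, which is a simplifying assumption rather than how any WAR provider defines its error; the function names and the example WAR values are hypothetical.

```python
def war_band(war, margin=0.15):
    """Return the (low, high) range implied by a symmetric percentage margin."""
    half_width = abs(war) * margin
    return (war - half_width, war + half_width)


def distinguishable(war_a, war_b, margin=0.15):
    """True only if the two error bands do not overlap."""
    low_a, high_a = war_band(war_a, margin)
    low_b, high_b = war_band(war_b, margin)
    return high_a < low_b or high_b < low_a


# Hypothetical player seasons, chosen only to illustrate the comparison.
half_win_gap = distinguishable(6.0, 5.5)   # bands overlap
three_win_gap = distinguishable(6.0, 3.0)  # bands are clearly separate
print(half_win_gap, three_win_gap)
```

On this reading, a half-win gap between two strong seasons falls inside the noise, while a three-win gap does not.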

Statistical elegance is elusive.  Variables get short shrift or go unaccounted for entirely.  Results yield unintended consequences.  Misunderstood data is misrepresented and polemicized.  In the words of Tolstoy: WAR makes fools of us all.


EDITOR UPDATE (9/7/11): For Hippeaux’s reply, please click here. It’s certainly worth reading and addresses many of the thoughts, issues, concerns, debates mentioned in the comments below. Brien, too, has a follow-up that you might want to check out. Thanks.  -J

Matt teaches at The University of Alabama. Roll Tide. He specializes in American Literature and Rhetorical Economics. Fate chose for him the peculiar perdition of rooting for the Chicago Cubs and the Los Angeles Clippers.

  1. Excellent post, Hippeaux, and I totally agree. WAR is a great concept and deserves continued work, but at this point there are too many contentious variables that go into it to make it anything close to reliable.

  2. I like this article and I agree with most of it. I have one problem with it. When you say that UZR hates sluggers and you compare how Nyjer Morgan is more valuable than Prince Fielder, you are forgetting one of the components of WAR: wins above replacement. While Fielder is great, if you compare him to the average first baseman his statistics are not that far superior to those of many other first basemen in the league. However, Nyjer Morgan's performance is further above that of his positional replacements than Fielder's is.

  3. Damn.

    I have to read this article more closely. But my first read is, damn. This is a helluva post.

  4. THANK YOU for this post.

    UZR may be good for multiple years, but it is a flawed single season stat. Saying Carl Crawford one year in TB went from being an elite LF to being a terrible one the next year makes no sense.

  5. Good article – with one stat alone, you showed the fallibility of even advanced stats.

    "On the other side of things, Adrian Gonzalez’s career OPS is fifty points higher when the pitcher is throwing from the stretch. He’s enjoyed that situation in 52% of his plate appearances in 2011."

    Wow – did not know that. It isn't that much of a stretch to believe there are other similar situations out there where numbers are skewed by the situations, rather than reflecting a player's actual skills and performance.

    I have been waging a mini-war against WAR with my buddies for almost 2 years now. I've used the sluggers argument before, and Dustin Pedroia has been my poster child for the anti-WAR stance this season, but now I have some more ammo with the FB%.

  7. WAR would be much better if it used more than one-year samples of UZR. Every year there are dozens of extreme UZR fluctuations, so even if you believe it is a very accurate measure, it's impossible to argue that it is anywhere near accurate in one-year samples. When comparing players, that often amounts to multi-win differences.

    I also question how useful it is to compare players only to others at their own position. If the goal of the stat is to measure "value" to a team, then that is the way to go. People use it now to say how good a player is, though, which seems unfair. A player's performance should not be downgraded because he is in a particularly strong class of first basemen (for example).

    Great article. It's strange that most of my baseball stat arguments are no longer trying to get people to forget about RBI and BA, but instead trying to get people to not take sabermetrics as gospel.

  8. Huge HUGE fan of this article. Great post. I'm an engineer, and in solving all of the hardest physics problems that we have overcome, statistics has been a powerful tool. All the different ways to visualize and create metrics for the data should be viewed as a different lens and nothing more. The truth lies in the data, and sometimes the truth is fuzzy enough that it is immeasurable. This is why we play the games, this is the reason sports are fun for jocks and nerds alike, and this is why no matter how smart front offices are, sports will never be as boring and predictable as some of the old school writers would have you believe. Nice work.

  9. You say, "It [WAR] doesn’t work." But you also say, "As yet, it [WAR] is probably as good a singular statistic as is widely available." Doesn't it follow logically then, that all other singular statistics don't work either?

    I think your actual argument is that WAR has limitations, that it is dependent on the accuracy of the statistics and valuations of its components. I doubt very many of the proponents of WAR would disagree with that. But saying that WAR doesn't work is hyperbolic and inaccurate.

  10. 20 comments in and no one has pointed out that Protection Theory has been proven time and time again to be a total myth? That pretty much kills the entire second half of your post here

    This entire thing is all over the place and full of non sequiturs like "It's asinine to suggest that Ellsbury is twice as valuable as Fielder?" Why? Because your gut says so? Pathetic.

  11. Finally, it is nice to know I'm not the only one who thinks WAR is not the best way to measure players.

  12. Interesting article, and very well written. A few notes:

    About the flyball percentage and UZR point:

    How can we be sure that teams that give up flyballs don't optimize for outfield defense? We don't know which way the relationship goes in terms of causation.

    And about the sluggers thing:

    I'm pretty sure that the influence of the on-deck batter has been shown to be pretty minimal, so I'm not sure how strong of a point this is.

  13. Excellent post. I'm a fan of WAR as a shorthand, but more as a jumping off point than the end of a discussion.

    If I can pick one minor thing to quibble with: my understanding is that WAR doesn't say that Troy Tulowitzki and Carlos Lee have made equal defensive contributions this season. Tulo and Lee have saved similar numbers of runs compared to an average fielder at their respective positions, but that's not the same as making equal contributions. The "positional adjustment" factor that goes into WAR is an adjustment based on fielding position, i.e., for the fact that catchers and shortstops do more on defense than left fielders and first basemen. Tulo has a positional adjustment of plus-six runs (for playing SS) and Lee a minus-eight run adjustment (for being in LF). So despite having the same UZR, WAR still says there's a fourteen-run difference in defensive contribution between the two.
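The arithmetic in that comment can be sketched as follows. The plus-six and minus-eight run adjustments are the ones the commenter quotes; the shared UZR value and the names `POSITIONAL_ADJUSTMENT` and `defensive_runs` are hypothetical, chosen only to show that the gap does not depend on the UZR the two players share.

```python
# Positional adjustments in runs, as quoted in the comment above.
POSITIONAL_ADJUSTMENT = {"SS": 6.0, "LF": -8.0}


def defensive_runs(uzr, position):
    """UZR (runs vs. an average fielder at the position) plus the positional adjustment."""
    return uzr + POSITIONAL_ADJUSTMENT[position]


shared_uzr = 5.0  # hypothetical: any identical UZR gives the same gap
gap = defensive_runs(shared_uzr, "SS") - defensive_runs(shared_uzr, "LF")
print(gap)
```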

  14. This is a poor attempt at disarming WAR. First, you make no legitimate points against the offensive contributions to WAR. As has been stated above, "Protection" doesn't exist. Your only legitimate critique of the stat is with respect to UZR. Yes, it comes with significant error bars – even after 162 games. I completely agree this component of WAR needs to be taken with a grain of salt. I generally perform a mini-regression in my head when I see a UZR that seems out of line with a player's career performance.

    It seems, as is the case with most, if not all, "I don't like X stat, it doesn't work" arguments, that you misunderstand the stat's scope and purpose. No stat is perfect. No stat comes without error bars. No stat is meant to be a utility knife to perform many roles. WAR makes the largest effort to incorporate a multitude of factors, yet it still tells us a very specific thing: over his innings played, how much value did a player provide over what our theoretical replacement player would have provided.

    It doesn't mean the BZA is a BETTER player than AGone, but during the innings that he played this year, according to wOBA, league and park factors, replacement levels, and UZR, he provided more value. It doesn't mean that given the same salary, you would take Zobrist's contract, it doesn't tell you who will be better next year, it doesn't tell you who is more skilled – it tells you nothing past what WAR intends to measure. Raw value provided over replacement. And when you account for the error bars in UZR, you can't even take these decimal places literally – as you claim is commonly done. Make your mental UZR adjustment if you don't agree. Make your additional park factor adjustment if you don't agree. This just gives us a great starting point to begin discussing value.

    Leave the extremism to politics. Claiming WAR "doesn't work" simply makes you look stupid.

  15. If you don't trust the defensive metrics use Offensive WAR on or VORP on Baseball Prospectus. Those are both everything but the defense metrics.

    oWAR just assumes every player is an average fielder.

  16. WAR is very much a work in progress and currently there are several working prototypes. Chances are whichever one is the best will still have serious limitations and probably can't be used as the argument ender that we would like it to be. That being said I feel like a more interesting avenue of inquiry would be determining which prototype is the least flawed rather than showing the particular flaws of just one prototype.

  17. This article makes a couple interesting points, but it's majorly flawed. Your gut may tell you that a big, fat slugger is automatically more valuable than a speedy line-drive-hitting defensive wiz, but that doesn't make it so. The burden is on you to prove it.

  18. So… is this a criticism of WAR, or a criticism of UZR? As near as I can tell, the problem you have is strictly with UZR. What about DRS or TZ (or some other advanced collective fielding metric)?

  19. I'll post this again….

    Carl Crawford career LF at the Trop 22.5 UZR/150
    Carl Crawford career LF everywhere else: 7.5 UZR/150

    This is over 8 years (so each sample size is the rough equivalent of 4 full years).

    1-year OF UZR samples are bad, but even the general "3 years is what you need" rule can have issues, as UZR can have systematic biases…. input bias, park effects and subjective components (armR, errR for outfielders) which don't even out over a 3-year period.

    I like the concept of WAR and the issue I have with it is bad input data (the defensive stats in both WAR models and the baserunning values now put into the fWAR). Until these variables can be measured better (FieldFX?), any difference in WAR between players based on these components should be taken with a huge boulder of salt.

  20. So greater FB% means greater UZR. WAR tries to show runs saved on the defensive side. If the outfielders are saving more runs because they get more chances than others, how is this any different from teams that have their cleanup numbers inflated because they get more at bats because their teams make fewer outs, etc.? The extra at bats would increase their counting stats. Yet this is why the averages (wOBA) are used. It appears to me that UZR should be adjusted in the same manner, so that it's not counted in a linear formula.

  21. Please go to the link in the reply above me. It is a thorough evisceration of this half-assed, misguided critique. If Tango had been able to read it first I don't think he would have even bothered to respond and give it hits. Hippeaux, you're out of your league (not saying I'm in that league, but I'm smart enough to know when smart people know more than me).

  22. "Dustin Pedroia has a real shot at his second MVP"

    Really? That's the first time I've even heard his name in relation to the 2011 MVP: Maybe just give it to the entire Red Sox lineup to keep the media happy?

  23. I thought you were going to mention that UZR has a different baseline (league average) than wRAA does (replacement level).

  24. Great post. And for all the haters who want to prove you wrong, the burden is on them.
    And I bet most of the haters have not developed their baseball eye, meaning they can't tell you much while watching a player; they need their stat book. These people are not only stubborn, they are boring to converse with.

  25. Interesting post. WAR is far from perfect, but I disagree with some of your problems with it.

    1. UZR and Fly balls – I wonder how much of your findings were due to selection bias. A team with a lot of groundball pitchers would probably be more prone to sign the slugging, poor defensive left fielder than one with a flyball-heavy staff. Conversely, teams like the Mariners, Padres, etc. will probably lean towards good defenders for their parks, and won't shy away from flyball-heavy pitchers as, say, the Yankees or Rockies might. That said, I definitely do think there is some effect on UZR; I just question how much.

    2. Dunn Effect – Maybe I'm misunderstanding you here, but I disagree with this whole section (and I'm aware that UZR is far from perfect). UZR doesn't suggest that Carlos Lee has as much defensive value as Tulowitzki, it suggests that he had as much defensive value compared to other LF/1B as Tulo did to SS. Since SS are collectively among the best defenders on the diamond, Tulo has a higher bar, but that's the whole point of the positional adjustment.

  26. Fantastic article. Those accusing the author of "not understanding what WAR is supposed to be used for" I think are missing his point that the proponents of WAR often regard it as an end all-be all stat defining how good a player is, not the subtler stat looking at his relative value that it truly is. I believe those people are misusing and misunderstanding WAR, just as you accuse this author of. I think he is trying to change the conversation so people start to recognize that just as a simple stat like RBIs has simple flaws, a complicated stat like WAR has more complicated flaws, ones that are easier to dismiss because of their intangibility.

  27. I'm pretty sure you've been sitting on that ending quote for a couple weeks until you could write a pertinent WAR-related article. Props on the pun.

  28. IIATMS has some great articles but this one is rather poor. As Michael above illustrated excellently in his response on PA, your understanding of WAR is flawed in several places, particularly in your lack of understanding of positional adjustment and of UZR, which is a very complicated metric with failings that are well-known. No one credible has ever suggested that one player's season is definitively better than another's because they put up 6.5 WAR against 6.3 WAR. Also, you seem to be particularly targeting UZR and fWAR, which incorporates it. rWAR offers an entirely different perspective on WAR which does not utilize UZR and even provides oWAR and dWAR separately if you don't trust Total Zone either or are too lazy to subtract UZR values from fWAR. WAR is not a perfect tool, but it is one of the best when used properly, and numerous people have devoted their efforts to making sure that all factors are taken into consideration and the numbers are as accurate and objective as possible. The biggest failing of WAR is one which you did not mention – its failure to address the issue of luck (though there is an argument to be had over whether luck has any place in an evaluative statistic). In short, WAR is not as simple as it may seem, and it is your responsibility to have a complete understanding of it before offering your criticism.

  29. WAR is a framework, a method of combining all facets of a player's performance. It's not only published at FanGraphs, using UZR as its defensive component. You can find it at Baseball Reference or The Hardball Times, or called VORP at Baseball Prospectus. Sean "Rally" Smith had his version as well, before he was hired as a consultant and stopped publishing.

    I develop the numbers for The Hardball Times. I'm glad to say that many of the numbers you complain about in the article are not as extreme at THT, where Prince Fielder's WAR per PA is 28% higher than Nyjer Morgan's, and where Fielder has the 15th best overall WAR for a position player, ahead of Bourjos' 46th.

    Most of the article deals with UZR issues, not WAR. The two biggest differences between WAR implementations are defense and positional adjustments. WAR doesn't hate sluggers – it just refuses to rate highly those sluggers who are limited to defensive positions populated by other sluggers. THT has Mark Teixeira as having the 11th best 2011 season among first basemen. He may have 35 HRs and 100 RBI, but the production is not nearly as good as Prince Fielder's, whom I rank 4th (behind Votto, AGon and Miggy). So the bar for first base is much higher than that for shortstop, where the 10th best THT WAR goes to Elvis Andrus.

    Your comments on correlating fly ball rate with defensive ratings did get me thinking, as that touches on work I'm doing now with estimating batting average on balls in play for batters and pitchers by their ground ball (or fly ball) rate. More flies indicate a higher mean vertical angle, which results in easier to catch balls. So you did find what I believe is a true effect. I will begin work asap to get this coded into THT's outfield defense as an adjustment to the expected catch rate.

  30. Hippeaux: I think you did a great job on the fielding part. You were too quick on the conclusions pretty much everywhere else. You seem to be somewhat in tune with the concept. If you email me, perhaps we can have a dialogue, and we can see where we can plug up some holes and straighten out some edges.


  31. In neoclassical economics as in real life, analysts often fallaciously assume that they can clearly delineate what a market is. Thus, Conventional Wisdom holds that there is one global market of baseball players, and that Carlos Lee must be a good one because he hit more HRs than any other Astro this year. This baseball market is global.

    WAR makes the same assumption about delineation, although it draws the line completely differently. For WAR, there is one market for shortstops and a separate market for 1B, and it operationalizes this assumption by comparing a SS only to a replacement-level SS when making its valuation. This baseball market is local.

    Hippeaux's point about sluggers highlights the limits of this reasoning. There are some commodities, skills, etc., that are abundant locally but scarce globally, despite the fact that the location of abundance is by definition part of the global market. So slugging 1B should not be compared only to 1B because slugging, while abundant at 1B, is scarce generally. This global scarcity gives slugging more value than WAR suggests, without even resorting to Jim Rice-y arguments about intimidation. However, the local abundance of slugging at 1B does depreciate the value of a slugger who is marooned at 1B quite a bit more than Conventional Wisdom suggests. Thus, the real money should go to a skill set – such as Tulowitzki's – that is scarce both locally and globally.

  32. For those "protection is a myth" people…

    You are looking at this backwards. The point about Adrian Gonzalez isn't that he is being "protected" by a player behind him, but that he is being elevated by being put in a spot to succeed.

    Players hit better with men on base. Pitchers have to throw more strikes and OPS, overall, is raised when hitters are on base. So, If I see 20% more men onbase this year than last, shouldn't my OPS be increased by a percentage of that? Gonzales isn't making them hit better, they are putting him in a position to hit better.
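The arithmetic behind this comment can be sketched as a weighted average. This is only an illustration, not a measured result: the two split OPS values and the share of plate appearances with runners on are invented numbers, and `blended_ops` is a hypothetical helper. It assumes a hitter's overall OPS is roughly the plate-appearance-weighted blend of his bases-empty and runners-on OPS, which is an approximation.

```python
def blended_ops(ops_empty, ops_men_on, share_men_on):
    """Approximate overall OPS as a PA-weighted blend of two splits."""
    return (1 - share_men_on) * ops_empty + share_men_on * ops_men_on

# Hypothetical splits, chosen only for illustration (not real player data).
OPS_EMPTY = 0.850
OPS_MEN_ON = 0.900

# Last year: 40% of plate appearances came with men on base (assumed).
last_year = blended_ops(OPS_EMPTY, OPS_MEN_ON, share_men_on=0.40)

# This year: 20% more such plate appearances, i.e. 0.40 * 1.2 = 0.48.
this_year = blended_ops(OPS_EMPTY, OPS_MEN_ON, share_men_on=0.48)

print(f"last year: {last_year:.3f}")
print(f"this year: {this_year:.3f}")
print(f"gain:      {this_year - last_year:.3f}")
```

Under these made-up inputs the gain is just the change in share (0.08) times the gap between the two splits (0.050), so the effect the commenter describes is real in this model but only as large as the split itself.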

  33. My main problem with WAR has always been the defensive side of it, so big thumbs up there. I prefer VORP myself.

  34. Maybe you can’t compare outfielders to first basemen, but I assume you can compare shortstops to shortstops, as in Derek Jeter to Elliot Johnson. According to WAR, Jeter, who is playing every day for a championship team and over the past two months has been among the best offensive players in the American League, has a WAR of 0.9, while Johnson, a part-time player who has more strikeouts than total bases, has a WAR of 0.7. According to WAR, you could pretty much swap Johnson in for Jeter with no effect on your lineup.

    My girlfriend’s niece is married to Johnson, so my girlfriend has a vested interest in how well he does. She wants to see him succeed and follows him more closely than she does Jeter. Yet when I mentioned that according to WAR, he was virtually as good as Jeter, my girlfriend burst out laughing. Her reaction was that WAR must be a faulty statistic because even with her personal bias in favor of Johnson, she knew there was no way he was as valuable a player as Jeter, and that it wasn’t even a close comparison.

    I understand that the biggest reason for this similarity in their WAR is fielding – Johnson has a 0.7 defensive WAR while Jeter is rated at -0.9. While I have no doubt that Johnson is a fine fielder, I also don’t believe Jeter is more than 1.0 worse than him. I subscribed to the MLB TV package this year to watch Yankee games and have seen quite a few of them. One of the reasons was to see just how much Jeter has lost at shortstop because of all the negative reports I’ve read.

    Unfortunately, I have not seen this and in fact, have often been surprised at how many plays he makes on balls that other shortstops often don’t get. But a lot of it has to do with his positioning, which I’m sure is a result of years of experience. I’ve seen him field a ball hit right over second base by moving one step to his left. I’ve seen him field balls that were hit past the third baseman because he’d positioned himself so far to the right. My question about UZR is, if he has positioned himself near second base because of the way the pitcher is throwing or the batter’s tendencies, and instead the hitter smacks one through the shortstop hole, does he get penalized for not making a play an “average” replacement shortstop, positioned normally, would have made?
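The fielding gap in comment 34 can be made explicit with a little arithmetic. This sketch uses only the four figures quoted in the comment and assumes, as a simplification, that total WAR splits cleanly into a defensive part and everything else; published WAR implementations do not necessarily add up this way, and `non_defensive_war` is a hypothetical helper.

```python
# Figures as quoted in the comment (total WAR and defensive WAR).
players = {
    "Jeter": {"war": 0.9, "defensive_war": -0.9},
    "Johnson": {"war": 0.7, "defensive_war": 0.7},
}

def non_defensive_war(war, defensive_war):
    # Assumption: total WAR = defensive WAR + everything else.
    return round(war - defensive_war, 1)

# What each player's total implies once the fielding rating is removed.
implied = {name: non_defensive_war(**p) for name, p in players.items()}

# Size of the fielding difference between the two ratings.
fielding_gap = round(
    players["Johnson"]["defensive_war"] - players["Jeter"]["defensive_war"], 1
)

print(implied)
print(fielding_gap)
```

On that simplified reading the two totals sit close together only because the fielding ratings differ by 1.6 wins, which is the number the commenter's question about positioning and UZR is really challenging.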

  35. When you see that Tony Batista had a fWAR of -0.4 in that illustrious season you pointed towards, maybe you'll realize that WAR gets more right than it gets wrong and until you come up with a better solution, perhaps you should leave the kvetching to those more qualified. Do the same with UZR while you're at it.

  36. One problem I have with WAR is the use of the word "wins." These are hypothetical wins. The great Dodger infield of the '70s and early '80s didn't rack up much WAR, but they sure won a lot of baseball games. (Compare the WAR of Garvey, Lopes, Cey, and Russell to that of Rose, Bench, Morgan, and Perez: no comparison. But look at their postseason records from '74 to '81: pretty comparable.) On the flip side, Bobby Abreu, darling of the WAR stat, hasn't been much of a help getting teams to the postseason. I liken WAR to a player's full-scale baseball quotient, like an IQ score. Just because you have a high IQ doesn't necessarily mean you do smart things, and just because you garner the right stats doesn't mean you win baseball games.

  37. I disagree with the author's contention that WAR doesn't prop up sluggers. Look at the top WAR guys at both FanGraphs and B-R; they are pretty much all sluggers (Ty Cobb led the league in HRs). Look how inflated McGwire is on both lists. I'd take Keith Hernandez or Will Clark as my first baseman in a heartbeat over Big Mac and all his useless walks and meaningless solo home runs (he has the lowest RBI-to-HR ratio of any of the top HR guys). Yes, useless walks: what do you do now that you have, perhaps, the slowest professional athlete in the world standing on first base? He only scores if the guy behind him goes yard.

  38. Hippeaux:

    Aren't you being too harsh on the RBI? Isn't the problem with RBI exactly the same problem that you have identified with WAR: that people generally misuse it?

    RBI does not purport to tell us everything about a player's overall value. Indeed, it does not purport to tell us everything about the player's value as a hitter. Never did, never will. Since the dawn of time, there have always been other statistics, such as batting average, stolen bases, and (even before OBP!) walks, which were intended to tell us more about a player's value as a hitter. And everyone knew it: it was obvious (even to Tim McCarver, I think) that an RBI machine in the batting order needs table setters who bat .300 and are willing to take a walk to get on base, steal a base, and score the run.

    RBI is a counting stat, and in counting, it does exactly its job — it is quite accurate. A player with more RBIs has batted in more runners. If you use the RBI as it was intended, it cannot really be misused.

    So we should stop picking on the RBI, ok?

    Indeed, I would argue that WAR is far more problematic than RBI, because while RBI was misused in a way that it was never intended to be used, WAR (or really, fWAR, which is what I think you are really focusing on) has in fact been misused in exactly the way that it was intended to be used.

    One of the points that you make (and others make, in comments to Tango's and Neyer's responses to your post) is that WAR is a shiny new tool and so people tend to misuse it. I want to take that a step further: WAR is intended to be used as (indeed, it invites us to use it as) an all-encompassing stat. Unlike RBI, which nobody should ever have confused with an all-encompassing stat, WAR was created precisely for the purpose of giving stat-heads a way of encapsulating a player's full value in one number. As such, it should not be a surprise that WAR is being used to represent a player's full value.

    Since WAR is susceptible to misuse, blame for failing to avert its misuse should be placed on WAR (or the creators/proponents thereof). It's not something that can be shrugged off as "someone else's" mistake. Think of it as a labeling issue: we require manufacturers of products to warn users against expected or foreseeable misuses, but not against unforeseeable or ridiculous misuses. If McDonald's makes coffee that is too hot to drink (or makes coffee sold at drive-thrus too hot to be spilled on a lap in the car), then McDonald's has to label against that because it invited the coffee to be used that way. But McDonald's does not have to label that its coffee is too hot to be poured in somebody's eyes, because nobody should be using coffee that way in a million years.

    Thus, criticizing WAR as you did (and, again, here I probably mean fWAR) for others' misuse is fair.

    But let's leave poor RBI out of it.


    PS: Sorry I'm so late to the party.

    PPS: Tango's reaction to your article is absurd. As others have said, if WAR is defined so narrowly as to be a framework, devoid completely of the flawed implementation of that framework, then yes, it's going to be immune to criticism. But then it's pretty much useless. In finance, as I am sure you know, we have a concept called the discounted cash flow analysis (DCF) for valuing a company that is, in theory, perfect. But the devil is in the details, and small differences in assumptions and predictions can lead to vast differences in valuation using this method. That's not a flaw in the implementation; that's a flaw in DCF itself. Great in theory, but problematic in fact. This is your criticism of WAR, and it's a good one.

    Thanks for the great article!!
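
The DCF point in the PPS above can be illustrated with a minimal sketch. It assumes the textbook form of the method: a few years of growing cash flows discounted back to the present, plus a Gordon-growth terminal value. The function name `dcf_value`, the five-year horizon, and every input figure are hypothetical, chosen only to show how sensitive the result is to its assumptions.

```python
def dcf_value(cash_flow, growth, discount, years=5, terminal_growth=0.02):
    """Discounted cash flows for `years`, plus a Gordon-growth terminal value."""
    if discount <= terminal_growth:
        raise ValueError("discount rate must exceed terminal growth")
    value = 0.0
    cf = cash_flow
    for t in range(1, years + 1):
        cf *= 1 + growth                   # cash flow in year t
        value += cf / (1 + discount) ** t  # discounted back to today
    # Value of all cash flows after the final year, as of that final year.
    terminal = cf * (1 + terminal_growth) / (discount - terminal_growth)
    return value + terminal / (1 + discount) ** years

# Hypothetical company: 100 in cash flow today, 5% growth for five years.
base = dcf_value(100.0, growth=0.05, discount=0.08, terminal_growth=0.02)

# Same company; discount rate and terminal growth each moved by one point.
tweaked = dcf_value(100.0, growth=0.05, discount=0.07, terminal_growth=0.03)

print(f"base:    {base:.0f}")
print(f"tweaked: {tweaked:.0f}")
print(f"ratio:   {tweaked / base:.2f}")
```

With these invented inputs, moving two assumptions by a single percentage point each raises the valuation by something on the order of 40 to 50 percent, which is the kind of swing the comment has in mind.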