Thanks to everybody who read “Is WAR the new RBI?” yesterday. (Thanks especially to those who read all the way to the end.) I really appreciate all the feedback, retweets, and rebuttals. Although I won’t have the opportunity to respond to everybody in detail, I will offer some concessions and some clarifications. I’ve read the relevant posts by Rob Neyer, Tangotiger (direct and indirect), William of The Captain’s Blog, Bryan O’Connor of Replacement Level, and Michael at Pinstripe Alley, as well as the Twitter feeds of Mike Fast and Colin Wyers of Baseball Prospectus.
Probably the foremost criticism (although it’s tightly packed at the front) is that I don’t do justice to, and perhaps don’t even understand, positional adjustment. Of course, I make reference to the impact of positional adjustment at several points in the post, but many fixated upon this line: “According to WAR, in 2011, Carlos Lee has had as much defensive value as Troy Tulowitzki.”
I’ll admit, I took some shortcuts in the name of efficiency, thinking people would understand what I was referencing, especially if they went and looked at the data. What I should’ve said, perhaps, is, “According to WAR, Carlos Lee is to players at his position as Troy Tulowitzki is to players at his position.” I hate the way that reads, but it’s more accurate. It’s also, I believe, ludicrous. Is Carlos Lee as good at any position as Troy Tulowitzki is at shortstop?
If he were, don’t you think the Astros would play him there? (I take that back; I should never predicate anything on what the Astros might do.) Given sufficiently large data sets, there is no metric that supports Lee as a premium defender at 1B or LF. The problem here is the amalgamation of multiple unreliably small samples.
To this, Michael asserts, “If you were throwing Adam Dunn around through all the fielding positions, WAR would most definitely reflect that he sucks at all of them.” That has been true of Adam Dunn, which was part of why I used him as the example of the guy who “can’t play any positions,” but it hasn’t been true for Carlos Lee in 2011 or Aubrey Huff in 2010. Their “suckage” is not reflected. These may be obvious outliers to the well-informed, but they make me suspicious of UZR data for other players who get moved around the diamond frequently and see big spikes as a result, especially for single-season samples.
The part of my argument which was most egregiously misrepresented was the impression that I was trying to “take down” or “dethrone” WAR. That we think about statistical metrics in these terms is, I think, part of the problem. Vehemence is unbecoming. In the post, I freely admit that WAR is probably the best single statistic readily available and that I refer to it all the time. What I have been observing is the extent to which WAR is more and more frequently invoked in debates about postseason awards, contract extensions, All-Star selections, and potential trades, usually without any reference to its potential shortcomings.
Which brings us to the royal “we”…
Neyer calls me “intellectually bankrupt” on the basis of my use of the royal “we” (more accurately known as nosism), which reveals, apparently, that I’m arguing with a straw man – that is, a “dunder-headed fool” who doesn’t exist, or, at least, doesn’t exist in any public forum except a mainstream message board. He insists, “Single season UZR’s can be terribly misleading, which everyone’s known for a long time.” Really? Everyone? (I can use rhetorical questions too!) Enraged to the point of “wanting to poke someone in the eye with a sharp stick,” Neyer retorts, “Are there particular writers or broadcasters who have been abusing WAR this season?”
We could count how many times Carl Bialik uses WAR in this Wall Street Journal blog with only a passing reference to what WAR stands for – “total value to the team” – and no reference to its potential shortcomings, especially in July.
We could observe that Eric Karabell leans heavily on WAR in his prospective MVP ballots, making it his standard for inclusion in the conversation and going so far as to describe Adrian Gonzalez as “Boston’s third-best player” because of a fractional disadvantage in WAR.
And, of course, in recent MVP debates too numerous to mention, WAR has been used like a sledgehammer to prioritize Jacoby Ellsbury over Curtis Granderson, Jose Bautista over Jacoby Ellsbury, C.C. Sabathia over Justin Verlander, Roy Halladay over all NL hitters, etc., etc.
I’m not saying that everybody abuses WAR. I’m certainly not saying that Rob Neyer abuses WAR. But I think the minor groundswell of reaction to yesterday’s post, from both sides, is itself a testament to the extent to which fans and analysts alike are uncomfortable with the many scenarios in which WAR is cited as gospel. (And, yes, Rob, my friends do judge players using WAR discrepancies well within the margin of error. To my credit, I rarely call them on it.)
I am absolutely guilty of using WAR in overly simplistic ways, so, to no small extent, my use of the pronoun “we” was quite literal.
WAR is better than RBI. On that point, I happily concede. My argument does not mistake them for perfect analogues. My point is that many fall victim to the illusion of elegance and use WAR exactly as they did RBI and HR in the past, saying “7.7 is bigger than 6.7, therefore Jacoby Ellsbury is clearly better than Curtis Granderson.” The creators of WAR don’t argue for that method, and they are open to debating its flaws, but intention often diverges from practice.
One could argue that any time you aspire to such elegance, you risk having aspiration mistaken for achievement. I would suggest that all attempts to build elegant statistical models, no matter what the caveats, will be treated by some adherents as scripture (see Black-Scholes, etc.). But statistical models are metaphors. As such, they are frequently very persuasive (and useful!), but they are also, by definition, imperfect. Unintended consequences should not stifle the urge to innovate, but they do need to be acknowledged and debated, frequently and candidly.