In Defense of the Royal “We”

Thanks to everybody who read “Is WAR the new RBI?” yesterday.  (Thanks especially to those who read all the way to the end.)  I really appreciate all the feedback, retweets, and rebuttals.  Although I won’t have the opportunity to respond to everybody in detail, I will offer some concessions and some clarifications.  I’ve read the relevant posts by Rob Neyer, Tangotiger (direct and indirect), William of The Captain’s Blog, Bryan O’Connor of Replacement Level, and Michael at Pinstripe Alley, as well as the Twitter feeds of Mike Fast and Colin Wyers of Baseball Prospectus.

Probably the foremost criticism (although it’s tightly packed at the front) is that I don’t do justice to, and perhaps don’t even understand, positional adjustment.  Of course, I make reference to the impact of positional adjustment at several points in the post, but many fixated upon this line: “According to WAR, in 2011, Carlos Lee has had as much defensive value as Troy Tulowitzki.”

I’ll admit, I took some shortcuts in the name of efficiency, thinking people would understand what I was referencing, especially if they went and looked at the data.  What I should’ve said, perhaps, is, “According to WAR, Carlos Lee is to players at his position as Troy Tulowitzki is to players at his position.”  I hate the way that reads, but it’s more accurate.  It’s also, I believe, ludicrous.  Is Carlos Lee as good at any position as Troy Tulowitzki is at shortstop?  If he were, don’t you think the Astros would play him there?  (I take that back; I should never predicate anything on what the Astros might do.)  Given sufficiently large data sets, there is no metric which supports Lee as a premium defender at 1B or LF.  The problem here is the amalgamation of multiple unreliably small samples.

To this, Michael asserts, “If you were throwing Adam Dunn around through all the fielding positions, WAR would most definitely reflect that he sucks at all of them.”  That has been true of Adam Dunn, which was part of why I used him as the example of the guy who “can’t play any positions,” but it hasn’t been true for Carlos Lee in 2011 or Aubrey Huff in 2010.  Their “suckage” is not reflected.  These may be obvious outliers to the well-informed, but they make me suspicious of UZR data for other players who get moved around the diamond frequently and see big spikes as a result, especially for single-season samples.

The part of my argument which was most egregiously misrepresented was the impression that I was trying to “take down” or “dethrone” WAR.  That we think about statistical metrics in these terms is, I think, part of the problem.  Vehemence is unbecoming.  In the post, I readily admit that it is probably the best singular statistic which is readily available and that I refer to it all the time.  What I have been observing is the extent to which WAR is more and more frequently invoked in debates about postseason awards, contract extensions, All-Star selections, and potential trades, usually without any reference to its potential shortcomings.

Which brings us to the royal “we”…

Neyer calls me “intellectually bankrupt” on the basis of my use of the royal “we” (more accurately known as nosism), which reveals, apparently, that I’m arguing with a straw man – that is, a “dunder-headed fool” who doesn’t exist, or, at least, doesn’t exist in any public forum except a mainstream message board.  He insists, “Single season UZR’s can be terribly misleading, which everyone’s known for a long time.”  Really?  Everyone? (I can use rhetorical questions too!)  Enraged to the point of “wanting to poke someone in the eye with a sharp stick,” Neyer retorts, “Are there particular writers or broadcasters who have been abusing WAR this season?”


We could count how many times Carl Bialik uses WAR in this Wall Street Journal blog with only a passing reference to what WAR stands for – “total value to the team” – and no reference to its potential shortcomings, especially in July.

We could observe that Eric Karabell leans heavily on WAR in his prospective MVP ballots, making it his standard for inclusion in the conversation and going so far as to describe Adrian Gonzalez as “Boston’s third-best player” because of a fractional disadvantage in WAR.

We can find Ryan Korby of The Washington Post using WAR to make an argument for re-signing Livan Hernandez, without using any other metrics.

And, of course, in recent MVP debates too numerous to mention, WAR has been used like a sledgehammer to prioritize Jacoby Ellsbury over Curtis Granderson, Jose Bautista over Jacoby Ellsbury, C.C. Sabathia over Justin Verlander, Roy Halladay over all NL hitters, etc., etc.

I’m not saying that everybody abuses WAR.  I’m certainly not saying that Rob Neyer abuses WAR.  But, I think the minor groundswell of reaction to yesterday’s post, from both sides, is itself a testament to the extent to which fans and analysts alike are uncomfortable with the many scenarios in which WAR is cited as gospel.  (And, yes, Rob, my friends do judge players using WAR discrepancies well within the margin of error.  To my credit, I rarely call them on it.)

I am absolutely guilty of using WAR in overtly simplistic ways, so, to no small extent, my use of the pronoun “we” was quite literal.

WAR is better than RBI.  On that point, I happily concede.  My argument does not mistake them for perfect analogues.  My point is, many fall victim to the illusion of elegance and use WAR exactly as they did RBI and HR in the past, saying “7.7 is bigger than 6.7, therefore Jacoby Ellsbury is clearly better than Curtis Granderson.”  The creators of WAR don’t argue for that method and they are open to debating its flaws, but often intention diverges from practice.

One could argue that any time you aspire to such elegance, you risk having aspiration mistaken for achievement.  I would suggest that all attempts to build elegant statistical models, no matter what the caveats, will be treated by some adherents as scripture (see Black-Scholes, etc.).  But statistical models are metaphors.  As such, they are frequently very persuasive (and useful!), but they are also, by definition, imperfect.  Unintended consequences should not preclude the urge to innovate, but do need to be acknowledged and debated, frequently and candidly.

About Matt Seybold

Matt teaches at The University of Alabama. Roll Tide. He specializes in American Literature and Rhetorical Economics. Fate chose for him the peculiar perdition of rooting for the Chicago Cubs and the Los Angeles Clippers.

35 thoughts on “In Defense of the Royal ‘We’”

  1. Hippeaux speaks the truth, and I applaud Rob Neyer for recognizing it. It was clear to all of us who read his post with an open mind that he was including himself when he said "we".
    It seems Hippeaux is arguing for a blended approach, with some stats and some observation beyond what stats can achieve on their own. And if anyone doubts his method, ask him for his fantasy record over the last 5 years or so!

  2. wtf – your initial post was fine. Weren't THAT many posts when I left yesterday; was totally surprised by the crapstorm of 99 replies…

    OTOH, it was a response. ;)

    Wasn't your premise a simple one – that there really isn't ANY one number that can be taken as gospel – as the be all, end all, total summation of what a player is? Which IS how WAR is being used.

    Oh well. Good stuff. Fun to see that a lot of people are reading your stuff – even if their lips were moving the whole time.

  3. "I am absolutely guilty of using WAR in overtly simplistic ways, so, to no small extent, my use of the pronoun we was quite literal."

    I argued on Rob's post at MLB Nation yesterday that this was my reading of your article. Feels nice to be proven correct :).

  4. "My point is, many fall victim to the illusion of elegance and use WAR exactly as they did RBI and HR in the past, saying 7.7 is bigger than 6.7, therefore Jacoby Ellsbury is clearly better than Curtis Granderson."

    If that is your point, then your thesis continues to be misguided and ill-focused. You rail against the usage in one sentence, then the implementation in the next. Anyone uttering the phrase you quoted above clearly misunderstands what WAR is, and what it measures. So why use that to make such outrageous claims like "WAR doesn't work"? It's not WAR's fault people don't take time to understand basic concepts.

    "Unintended consequences should not preclude the urge to innovate"

    On the contrary – in this case it decidedly should. People misuse WAR every day. Does that mean we should change its structure? No. WAR is a logically sound framework. If you have issues with the implementation of WAR, then by all means, do some research and write a thorough analysis of your findings. I would personally love to see more of the flyball/UZR work you did yesterday – that was the most interesting part of your original article, despite the sloppy methodology. But leave the empty, anti-WAR, page view garnering rhetoric in the garbage, because that's what it is.

    If your point all along was "People misuse WAR" then you are absolutely right. It happens every day. But you insisted on carrying a belligerent tone throughout your piece that undermined any productive discussion on the most salient points you raised. Write a piece about how WAR is widely misused, great. It's worth writing. Then write an analysis of UZR and why it's biased in certain situations. Mixing them led you to write two pretty ugly and confusing pieces. (Evidenced by the fact that you have a slew of misunderstanders agreeing with you that WAR is broken. Which is false. UZR may be flawed, wOBA/wRC+ may not capture 100% of the true offensive contributions, but WAR as a framework is sound.)

  5. Whose "belligerent tone"?

    You need to re-read both posts and ask yourself, who's spending more time in the gutter?

  6. Great debate. I find the past two days of responses to yesterday's post a good (for the most part) waypoint in the attempt to figure out the exact value of stats like WAR.

  7. Posted this on the next one, but it seems more appropriate here:

    "Here's the problem with abstractly using "we" without specifying who "we" is. If you're trying to remind people about the dangers of over-valuing Player X's WAR because of certain caveats, then I can't see anything wrong with that. But Hippeaux didn't say "we" was himself, the writers on this site, e-mailers, or followers on Twitter; he just said "we", leaving everybody to speculate about who specifically is misusing it. So everybody pointed the fingers at themselves, and of course they're gonna respond with "I'm not misusing it. Who is?" Failing to specify left everybody thinking that Hippeaux was criticizing them personally, and if that criticism was unwarranted, then, well, of course they're going to respond like that."

    But I guess you knew that already.

  8. Very well said. Thanks, Hippeaux, both for starting this conversation and for realizing my dream of finding my name on the Interwebs within centimeters of Rob's and Tango's.

  9. Ok, I see your point. You are right, using single season UZR in the WAR calculation can be misleading. I have asked the guys at FanGraphs if they could use the aggregate defensive rating, which is the average of UZR, DRS, TZL, and the fan scouting reports, but doing so wouldn't allow them to have mid-season numbers since UZR and DRS are the only ones compiled throughout the year, and DRS isn't balanced out until the end of the year.

    Another idea I've discussed in the past was using a three year weighted average for WAR. I don't think there is absolutely nothing to single season UZR, as players can improve their true talent level or play better than their actual talent level.

    If I were an MVP voter, I wouldn't vote strictly by WAR leaderboards, but that's certainly where I would start. If a player is 2 wins above everyone else, I probably don't need to do more research. If there are like 5 players within a couple decimal points, I might need to look at things like WPA, adjust the WAR stats by using a 3 year average of UZR, and maybe take into account off-field leadership or other "intangibles." This, in my opinion, is far better than the traditional "RBI, HR, AVG, on a playoff team" formula that was used for so many years.

  10. "Their “suckage” is not reflected. These may be obvious outliers to the well-informed, but they make me suspicious of UZR data for other players who get moved around the diamond frequently and see big spikes as a result, especially for single-season samples."

    The big spikes aren't because they move around frequently, these things can just happen in a small sample. Michael Young has moved around the diamond and his UZR is bad at every position.

    My questions are:
    1. How do you propose this is fixed?
    2. Are you suspicious of avg/obp/slg because Nick Evans is hitting .293/.371/.522 this year? Because that's not any different than your example.

  11. The wonder of baseball is that it's an intellectual exercise disguised as a pastime. Hippeaux is carrying on that tradition, combining the best of the new statistics with a skeptic's intellect and a fan's flair for language and easy give and take. In short, he'd be a fun person to watch a ballgame with and drink a beer, commenting when prompted by onfield play, enjoying silently when not, which is the ultimate test of a true fan. To the righteous and raging, I just hope I never have to sit in front of you.

  12. This does sort of support the author's point that WAR must be understood in context…but I get tired of folks throwing out UZR for end of year awards. True, it is not a large enough sample size to decide if a player is a great/good/avg/bad defender. But in the context of an MVP discussion it is 100% relevant. UZR describes what actually happened. Again, its predictive value isn't worth a bag of baseballs…but it describes what actually happened. If Brett Gardner hits 4 HR in the World Series…will anyone say he doesn't deserve the WS MVP award because he's not really a power hitter, and that sample size helped him look Ruthian? Of course not, because he actually hit those home runs. Likewise, players have actually made the plays recorded for UZR. It may be ultimately unrepresentative of a player's ability (like a Matt Stairs Web Gem, or a Johnny Damon OF assist) but it actually happened, and players should get credit for having made the play.

  13. Like anyone will read this, but…

    Perhaps it's lack of anything better, but as a wanderer of the internet, I see people use UZR, and more so WAR, as gospel (or at least the complete foundation of an argument) all the time. Rob is absolutely, completely wrong to say they don't.

    People who are more into the stats of course realize the limitations. They realize that both are works in progress. Which I think was the point of Hippeaux' column in the first place?