In Defense of the Royal “We”

Thanks to everybody who read “Is WAR the new RBI?” yesterday.  (Thanks especially to those who read all the way to the end.)  I really appreciate all the feedback, retweets, and rebuttals.  Although I won’t have the opportunity to respond to everybody in detail, I will offer some concessions and some clarifications.  I’ve read the relevant posts by Rob Neyer, Tangotiger (direct and indirect), William of The Captain’s Blog, Bryan O’Connor of Replacement Level, and Michael at Pinstripe Alley, as well as the Twitter feeds of Mike Fast and Colin Wyers of Baseball Prospectus.

Probably the foremost criticism (although it’s tightly packed at the front) is that I don’t do justice to, and perhaps don’t even understand, positional adjustment.  Of course, I make reference to the impact of positional adjustment at several points in the post, but many fixated upon this line: “According to WAR, in 2011, Carlos Lee has had as much defensive value as Troy Tulowitzki.”

I’ll admit, I took some shortcuts in the name of efficiency, thinking people would understand what I was referencing, especially if they went and looked at the data.  What I should’ve said, perhaps, is, “According to WAR, Carlos Lee is to players at his position as Troy Tulowitzki is to players at his position.”  I hate the way that reads, but it’s more accurate.  It’s also, I believe, ludicrous.  Is Carlos Lee as good at any position as Troy Tulowitzki is at shortstop?  If he were, don’t you think the Astros would play him there?  (I take that back; I should never predicate anything on what the Astros might do.)  Given sufficiently large data sets, there is no metric which supports Lee as a premium defender at 1B or LF.  The problem here is the amalgamation of multiple unreliably small samples.

To this, Michael asserts, “If you were throwing Adam Dunn around through all the fielding positions, WAR would most definitely reflect that he sucks at all of them.”  That has been true of Adam Dunn, which was part of why I used him as the example of the guy who “can’t play any positions,” but it hasn’t been true for Carlos Lee in 2011 or Aubrey Huff in 2010.  Their “suckage” is not reflected.  These may be obvious outliers to the well-informed, but they make me suspicious of UZR data for other players who get moved around the diamond frequently and see big spikes as a result, especially for single-season samples.

The part of my argument which was most egregiously misrepresented was the impression that I was trying to “take down” or “dethrone” WAR.  That we think about statistical metrics in these terms is, I think, part of the problem.  Vehemence is unbecoming.  In the post, I readily admit that it is probably the best singular statistic which is readily available and that I refer to it all the time.  What I have been observing is the extent to which WAR is more and more frequently invoked in debates about postseason awards, contract extensions, All-Star selections, and potential trades, usually without any reference to its potential shortcomings.

Which brings us to the royal “we”…

Neyer calls me “intellectually bankrupt” on the basis of my use of the royal “we” (more accurately known as nosism), which reveals, apparently, that I’m arguing with a straw man – that is, a “dunder-headed fool” who doesn’t exist, or, at least, doesn’t exist in any public forum except a mainstream message board.  He insists, “Single season UZR’s can be terribly misleading, which everyone’s known for a long time.”  Really?  Everyone? (I can use rhetorical questions too!)  Enraged to the point of “wanting to poke someone in the eye with a sharp stick,” Neyer retorts, “Are there particular writers or broadcasters who have been abusing WAR this season?”

Well…

We could count how many times Carl Bialik uses WAR in this Wall Street Journal blog with only a passing reference to what WAR stands for – “total value to the team” – and no reference to its potential shortcomings, especially in July.

We could observe that Eric Karabell leans heavily on WAR in his prospective MVP ballots, making it his standard for inclusion in the conversation and going so far as to describe Adrian Gonzalez as “Boston’s third-best player” because of a fractional disadvantage in WAR.

We can find Ryan Korby of The Washington Post using WAR to make an argument for re-signing Livan Hernandez, without using any other metrics.

And, of course, in recent MVP debates too numerous to mention, WAR has been used like a sledgehammer to prioritize Jacoby Ellsbury over Curtis Granderson, Jose Bautista over Jacoby Ellsbury, C.C. Sabathia over Justin Verlander, Roy Halladay over all NL hitters, etc., etc.

I’m not saying that everybody abuses WAR.  I’m certainly not saying that Rob Neyer abuses WAR.  But I think the minor groundswell of reaction to yesterday’s post, from both sides, is itself a testament to the extent to which fans and analysts alike are uncomfortable with the many scenarios in which WAR is cited as gospel.  (And, yes, Rob, my friends do judge players using WAR discrepancies well within the margin of error.  To my credit, I rarely call them on it.)

I am absolutely guilty of using WAR in overtly simplistic ways, so, to no small extent, my use of the pronoun “we” was quite literal.

WAR is better than RBI.  On that point, I happily concede.  My argument does not mistake them for perfect analogues.  My point is, many fall victim to the illusion of elegance and use WAR exactly as they did RBI and HR in the past, saying “7.7 is bigger than 6.7, therefore Jacoby Ellsbury is clearly better than Curtis Granderson.”  The creators of WAR don’t argue for that method and they are open to debating its flaws, but often intention diverges from practice.
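As a rough sketch of that distinction, assuming a hypothetical margin of error (the 1.5 below is a placeholder for illustration, not a published error bar for any version of WAR), a comparison could decline to rank two players whose totals sit inside that margin:

```python
def compare_war(war_a: float, war_b: float, margin: float) -> str:
    """Compare two WAR totals, declining to pick a winner inside the margin.

    `margin` is a hypothetical margin of error supplied by the caller;
    it is an assumption for illustration, not a published figure.
    """
    diff = war_a - war_b
    if abs(diff) <= margin:
        return "too close to call"
    return "first" if diff > 0 else "second"


# The 7.7 vs. 6.7 comparison quoted above.
naive = compare_war(7.7, 6.7, margin=0.0)     # treats any gap as decisive
cautious = compare_war(7.7, 6.7, margin=1.5)  # placeholder margin (assumption)
print(naive, "|", cautious)
```

The same one-win gap reads as decisive or as a toss-up depending entirely on the margin one is willing to assume, which is the part that tends to get left out when the number is quoted.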

One could argue that any time you aspire to such elegance, you risk having aspiration mistaken as achievement. I would suggest that all attempts to build elegant statistical models, no matter what the caveats, will be treated by some adherents as scripture (see Black-Scholes, etc.).  But statistical models are metaphors.  As such, they are frequently very persuasive (and useful!), but they are also, by definition, imperfect.  Unintended consequences should not preclude the urge to innovate, but do need to be acknowledged and debated, frequently and candidly.

35 thoughts on “In Defense of the Royal ‘We’”

  1. Excellent work today.

    • 100% agree. Thanks Rob, for taking the time to come back and read. Thank you, period.

      Thanks to all who have engaged in the debate in ways that are neither nasty nor personal. For those who have chosen to bash the person, I hope you can find a better outlet for your anger. Try to find a way to disagree with someone without sinking to Congressional-like personal bashing, and make the debates more constructive and less destructive.

      Hippeaux, great work today, and yesterday. I thank you for taking the time to further the debates.

    • hugh

      I'm going to say "Excellent work yesterday, too".

    • noname

      rob, rob, rob….

      please be more specific. you wrote way too many words on this topic to be so vague.

      plus you had a number of people patting you on the back for your "rebuttal" the other day.

      don't you think you owe "us" (not "we") more?
      or would you be happier if i said "me"?

      p.s. definitions are important.
      if you can't DEFINE a stat, probably not a good idea to quote it. (that's my opinion)

    • August

      Rob's worst article in at least a decade, and I read almost everything he writes. Very disappointing.

  2. Bill

    http://dictionary.reference.com/browse/suckage

    I'd like to be the first to nominate "Suckage" as an addition to the dictionary.

  3. Croc

    Hippeaux speaks the truth, and I applaud Rob Neyer for recognizing it. It was clear to all of us who read his post with an open mind that he was including himself when he said "we".
    It seems Hippeaux is arguing for a blended approach, with some stars and some observation beyond which stats can achieve on their own. And if anyone doubts his method, ask him for his fantasy record over the last 5 years or so!

  4. Croc

    Sorry, should read 'stats' not stars..

  5. jay_robertson

    wtf – your initial post was fine. Weren't THAT many posts when I left yesterday; was totally surprised by the crapstorm of 99 replies…

    OTOH, it was a response. ;)

    Wasn't your premise a simple one – that there really isn't ANY one number that can be taken as gospel – as the be all, end all, total summation of what a player is? Which IS how WAR is being used.

    Oh well. Good stuff. Fun to see that a lot of people are reading your stuff – even if their lips were moving the whole time.

    • His initial premise wasn't completely wrong, but almost every example he gave to try to defend it was, lol. The bottom line is this: We all know that UZR and DRS aren't perfect, but until we get access to Field FX data, they are the best thing we have.

  6. hugh

    What a fine piece of writing. I envy the clarity of your thought and expression.

  7. "I am absolutely guilty of using WAR in overtly simplistic ways, so, to no small extent, my use of the pronoun we was quite literal."

    I argued on Rob's post at MLB Nation yesterday that this was my reading of your article. Feels nice to be proven correct :).

  8. Lee

    "My point is, many fall victim to the illusion of elegance and use WAR exactly as they did RBI and HR in the past, saying 7.7 is bigger than 6.7, therefore Jacoby Ellsbury is clearly better than Curtis Granderson."

    If that is your point, then your thesis continues to be misguided and ill-focused. You rail against the usage in one sentence, then the implementation in the next. Anyone uttering the phrase you quoted above clearly misunderstands what WAR is, and what it measures. So why use that to make such outrageous claims like "WAR doesn't work"? It's not WAR's fault people don't take time to understand basic concepts.

    "Unintended consequences should not preclude the urge to innovate"

    On the contrary – in this case it decidedly should. People misuse WAR every day. Does that mean we should change its structure? No. WAR is a logically sound framework. If you have issues with the implementation of WAR, then by all means, do some research and write a thorough analysis of your findings. I would personally love to see more of the flyball/UZR work you did yesterday – that was the most interesting part of your original article, despite the sloppy methodology. But leave the empty, anti-WAR, page-view-garnering rhetoric in the garbage, because that's what it is.

    If your point all along was "People misuse WAR" then you are absolutely right. It happens every day. But you insisted on carrying a belligerent tone throughout your piece that undermined any productive discussion on the most salient points you raised. Write a piece about how WAR is widely misused, great. It's worth writing. Then write an analysis of UZR and why it's biased in certain situations. Mixing them led you to write two pretty ugly and confusing pieces. (Evidenced by the fact that you have a slew of misunderstanders agreeing with you that WAR is broken. Which is false. UZR may be flawed, wOBA/wRC+ may not capture 100% of the true offensive contributions, but WAR as a framework is sound.)

    • "UZR may be flawed, wOBA/wRC+ may not capture 100% of the true offensive contributions, but WAR as a framework is sound."

      If the inputs are flawed, the output is flawed. The IDEA of WAR is fantastic, but it is most decidedly a work in progress. This should be patently obvious, since there are 3+ different "versions" of WAR.

      • BrienJackson

        Exactly.

  9. Tim

    Whose "belligerent tone"?

    /garbage
    /sloppy
    /ugly

    You need to re-read both posts and yourself and ask, who's spending more time in the gutter?

    • Lee

      I don't think I misused any of those words in passion, if that is your implication.

      His methodology was sloppy. If you think it wasn't, read it again.

      Needlessly claiming "WAR doesn't work" is textbook muckraking. Journalistic garbage, the definition of.

      His articles commingled two distinct ideas, and he used raw emotion (like a politician railing against the other side of the aisle) based on the mantra "WAR doesn't work" to drum up support of his haphazard analysis. An awful way to advance sabermetrics, if you ask me. I don't think ugly is a misrepresentation of that.

      Your contribution to the discussion, on the other hand, seems lacking in substance, quality, and… everything.

      • Tim

        Mantra means "statement or slogan stated repeatedly."

        Hippeaux says "WAR doesn't work" once and then clarifies that statement.
        You've now said it nearly a dozen times in two comment threads.

        Again, who's creating the mantra?

      • hugh

        Lee, if you really think Hippeaux's tone was belligerent then you should take the time to read some more of his posts, and the site generally, in order to get a feel for where he's coming from.

        You might also like to read your own last two comments and compare the tone of them to Hippeaux's piece before reaching judgement.

        And if you're still not there, then for what it's worth, I agree 100% with Tim and 0% with you on the question of the source of the belligerence. Whether you chose your words in passion or not is a matter of complete indifference to me.

  10. 27up-27down

    Great debate. I find the past two days of response to yesterday's post to be a good (for the most part) waypoint in the attempt to figure out the exact value of stats like WAR.

  11. Noot

    Posted this on the next one, but it seems more appropriate here:

    "Here's the problem with abstractly using "we" without specifying who "we" is. If you're trying to remind people about the dangers of over-valuing Player X's WAR because of certain caveats, then I can't see anything wrong with that. But Hippeaux didn't say "we" was himself, the writers on this site, e-mailers, or followers on Twitter; he just said "we", leaving everybody to speculate about who specifically is misusing it. So everybody pointed the fingers at themselves, and of course they're gonna respond with "I'm not misusing it. Who is?" Failing to specify left everybody thinking that Hippeaux was criticizing them personally, and if that criticism was unwarranted, then, well, of course they're going to respond like that."

    But I guess you knew that already.

  12. Bryan

    Very well said. Thanks, Hippeaux, both for starting this conversation and for realizing my dream of finding my name on the Interwebs within centimeters of Rob's and Tango's.

  13. Ok, I see your point. You are right, using single season UZR in the WAR calculation can be misleading. I have asked the guys at FanGraphs if they could use the aggregate defensive rating, which is the average of UZR, DRS, TZL, and the fan scouting reports, but doing so wouldn't allow them to have mid-season numbers since UZR and DRS are the only ones compiled throughout the year, and DRS isn't balanced out until the end of the year.

    Another idea I've discussed in the past was using a three year weighted average for WAR. I don't think there is absolutely nothing to single season UZR, as players can improve their true talent level or play better than their actual talent level.

    If I were an MVP voter, I wouldn't vote strictly by WAR leaderboards, but that's certainly where I would start. If a player is 2 wins above everyone else, I probably don't need to do more research. If there are like 5 players within a couple decimal points, I might need to look at things like WPA, adjust the WAR stats by using a 3 year average of UZR, and maybe take into account off-field leadership or other "intangibles." This, in my opinion, is far better than the traditional "RBI, HR, AVG, on a playoff team" formula that was used for so many years.

    • hugh

      Now here is an example of the kind of response that DOES move the conversation along. Or so it seems to me from my wholly ignorant perspective. Thanks.

  14. Tony

    "Their “suckage” is not reflected. These may be obvious outliers to the well-informed, but they make me suspicious of UZR data for other players who get moved around the diamond frequently and see big spikes as a result, especially for single-season samples."

    The big spikes aren't because they move around frequently, these things can just happen in a small sample. Michael Young has moved around the diamond and his UZR is bad at every position.

    My questions are:
    1. How do you propose this is fixed?
    2. Are you suspicious of avg/obp/slg because Nick Evans is hitting .293/.371/.522 this year? Because that's not any different than your example.

    • The difference is obvious. When you see Nick Evans splits, you immediately look at his 105 PA. On most leaderboards, he wouldn't even show up, because he does not qualify. Carlos Lee has played lots of games, innings, etc., so the sample size problem is better obscured.

      • The problem is that you can't automatically say that Carlos Lee's higher UZR this year is an issue with small sample sizes. It's possible that he's actually playing better this year. If you look at his traditional defensive stats, this seems to be the case. He's only made 2 errors in the outfield all year, and has 10 assists already.

        The fact that he has a positive range runs with a somewhat poor ranged zone rating means that he is catching all the balls he should be catching, and the balls he isn't getting to in his zones are ones on the fringes that are tough plays, so UZR isn't penalizing him that much for not getting to them. Sometimes you need to look beyond the numbers and find the meaning, instead of just saying, "That's crap."

    • "The big spikes aren't because they move around frequently, these things can just happen in a small sample."

      Let's re-write this and see if it still makes sense: The big spikes aren't because they have a small sample at each position, these things can just happen in a small sample.

      UZR only stabilizes with a HUGE sample size – 3 years (say, 450 games) of data at one position. If we know one of the inputs to WAR isn't accurate over 1 year, then we know WAR isn't accurate over 1 year. Don't ask me how to fix it – if I knew, I'd be lecturing at SABR every year. I just know that the sabermetric community needs more accurate defensive metrics before we can compare single-year WAR data. Doesn't mean I hate WAR, and want to do away with it – just think it needs some improvement.

      • This is why I prefer to average WAR from the 3 sources who calculate it differently. I figure averaging UZR, TZL, and whatever defensive stat BP uses is better than UZR by itself.

      • Tony

        "Let's re-write this and see if it still makes sense: The big spikes aren't because they have a small sample at each position, these things can just happen in a small sample. "

        My point was that the post (more the last one than this one) made it sound like, to me, that playing multiple positions was inflating UZR/WAR. It can, but it can just as easily deflate it. Just like a mediocre player could OPS 1.000 in 100 PAs, but he could also OPS .450. Plus, as Michael points out above, just because Lee has been bad defensively the last few years, it doesn't mean that he has to be bad defensively this year. Going back to the Evans example, he's been a much worse hitter, but I wouldn't deny that he has hit well this year.

        And although I don't deny that defense needs larger samples than offense, let's not act like offense doesn't have huge variances. For example, Aubrey Huff's OPSs the last 4 years have been .912, .694, .891, and .670. If he keeps up his pace this year, he will have had at least 597 PAs in each of those years.

        I don't disagree that WAR can be improved. I just think this criticism is really saying that you can't use one year of WAR/UZR to determine true talent, but the same is true for offense. One year of info will never be enough no matter how much perfecting is done. All one year can tell you is who was better in that year (and even then there's never going to be perfect accuracy).

        • michael

          Concerning the third paragraph:

          I believe you're mixing noise or variance that is incorporated in field metrics that do not have truly simple discrete measurements, compared to variance in performance (which is also part of UZR contributing to higher confidence intervals, error bars, what have you) at the plate, which does have a much smaller set of outcomes that are more thoroughly studied and have direct consequences that equate to run values based on a vast sample of history. Accuracy of offensive production is much more difficult to dispute than defensive value.

          You would be correct in suggesting that any finite sample of offensive statistics doesn't ascertain how 'good' a player is or their level of ability, or give you a true projection going forward. You didn't state this, but you imply that the offensive statistics may not adequately measure the player's production, or their value to the team in runs or wins. Going back to what I said about offensive metrics, there is large variance in player performance, but the individual measurements of production themselves are well understood.

          This article http://www.fangraphs.com/blogs/index.php/seasons-
          puts all this to rest about fluctuations in Huff's performance. It certainly doesn't support the argument that there is noise in the offensive statistics themselves, but just random walks in when Huff's performance changes.

          Pujols could go 0-for in the world series while Cody Ross homers off Halladay. The metrics just say how they performed over such sample and it's up to anyone to interpret the significance of that sample of events.

          Considering that all this discussion is in the context of the most valuable player (of the year) award, one year of offensive performance and who is better for that year seems adequate to measure that portion of value for the award.

  15. Half Brit like Nate

    The wonder of baseball is that it's an intellectual exercise disguised as a pastime. Hippeaux is carrying on that tradition, combining the best of the new statistics with a skeptic's intellect and a fan's flair for language and easy give and take. In short, he'd be a fun person to watch a ballgame with and drink a beer, commenting when prompted by onfield play, enjoying silently when not, which is the ultimate test of a true fan. To the righteous and raging, I just hope I never have to sit in front of you.

  16. KDL

    This does sort of support the author's point that WAR must be understood in context…but I get tired of folks throwing out UZR for end of year awards. True, it is not a large enough sample size to decide if a player is a great/good/avg/bad defender. But in the context of an MVP discussion it is 100% relevant. UZR describes what actually happened. Again, its predictive value isn't worth a bag of baseballs…but it describes what actually happened. If Brett Gardner hits 4 HR in the World Series…will anyone say he doesn't deserve the WS MVP award because he's not really a power hitter, and that sample size helped him look Ruthian? Of course not, because he actually hit those home runs. Likewise, players have actually made the plays recorded for UZR. It may be ultimately unrepresentative of a player's ability (like a Matt Stairs Web Gem, or a Johnny Damon OF assist) but it actually happened, and players should get credit for having made the play.

    • nhp

      exactly my thoughts. is anyone discounting gonzalez's contribution to the red sox this year because his babip is .382? why shouldn't we regress his offensive production based on his career babip?

  17. marc

    Like anyone will read this, but…

    Perhaps it's for lack of anything better, but as a wanderer of the internet, I see people use UZR and, more so, WAR as gospel (or at least the complete foundation of an argument) all the time. Rob is absolutely, completely wrong to say they don't.

    People who are more into the stats of course realize the limitations. They realize that both are works in progress. Which I think was the point of Hippeaux' column in the first place?
