WAR Is Not the New RBI (but It Has Its Own Flaws)

(The following is being syndicated from The Captain’s Blog).

Is WAR the new RBI? That was the question asked in a thought-provoking post at IIATMS, which is sure to draw a new battle line in the statistical debate over the value of composite metrics.

How much of Adrian Gonzalez' success is attributable to those who get on base before him?

At the heart of the author’s argument is the suggestion that WAR, like RBIs, is context-based because so many elements of performance are interconnected. To illustrate this point, Adrian Gonzalez’ higher career OPS with men on base is offered as one of the exhibits. In this case, the implication is that Gonzalez’ performance benefits from his teammates getting on base ahead of him (just like with RBIs), so it’s unfair to consider OBP and SLG as strictly individual stats. If we look more closely at Gonzalez’ splits, however, we see that a significant portion of his 50-point OPS increase with men on base stems from the 108 intentional walks he has been given (he has received only two with no men on). Although one could still argue that those intentional walks are as much attributable to the men on base as the pitcher’s fear of Gonzalez, it raises other questions as well. Specifically, one must then also consider to what degree the hitters batting ahead of Gonzalez benefit from his presence in the lineup.

It should be pretty obvious that everything that happens on a baseball field is interconnected. Not only do players interact with their teammates, but the opposition has a say as well. In particular, the pitcher influences a batter’s outcome as much as any variable present on his team (other than his individual batting skill). Using the same example, we could posit that Gonzalez has a higher OPS with men on base because stronger pitchers don’t allow baserunners as frequently as weaker ones (and especially not when they are on top of their game). If so, we might then expect to see that all hitters have a higher OPS with men on base, and in fact, this is the case. Since Gonzalez broke into the majors, the average OPS gap between these two splits has been 34 points.

The undeniable bottom line is all baseball statistics are context dependent. As much as sabermetrics tries to neutralize contingencies, I don’t think anyone really believes they can be eliminated altogether. Rather, statistics like wOBA work under the assumption that these more subtle contingencies will cancel out over a season, or, at the very least, not make a significant difference. That’s why the offensive component of WAR is not the “new RBI”.

How does UZR account for this play?

What about defense? Even the most ardent proponents of UZR admit that it is limited in measuring certain positions, such as first base, and usually requires about three years’ worth of data to be accurate. What’s more, because classification involves human intervention, inherent biases and user errors come into play. On that basis alone, the metric seems ill-suited to be combined with more refined statistics that measure offense.

Aside from these specific limitations, however, UZR has a more philosophical flaw: it treats defense as a zero-sum game. Unlike offense and pitching, which are measured as individual rates of success, defense is calculated in terms of a fielder’s contribution to the team. If a fly ball is caught in left center, for example, the team records an optimal outcome (an out), but one defender is credited at the expense of another. This divergence between team and individual performance creates an inherent flaw in UZR, and in any system that considers a teammate’s success to be another’s failure (or lack of success). That’s why, if anything, the defensive component of WAR makes it completely the opposite of the RBI, which credits a player for a teammate’s prior success.

One final component of WAR taken to task by the IIATMS post is the concept of replacement value. According to the author, “it’s beyond asinine to conclude that Ellsbury is twice as valuable as Fielder”. Unfortunately, despite making such a strong statement, no evidence is advanced to explain why. Based on wOBA alone, Ellsbury and Fielder have had nearly identical seasons, so it stands to reason that Ellsbury’s defense and base running elevate him above Fielder to some degree. Do they add up to make him twice as valuable? (According to FanGraphs’ and Baseball-Reference’s WAR, Ellsbury rates 83% and 49% better, respectively.) Perhaps not, but once you consider the relative scarcity of offense at each player’s respective position, the author’s unsubstantiated blanket statement seems more questionable than the conclusion he deems “asinine”.

Can speed and defense provide as much value as power?

So, is WAR the new RBI? Not if you use it properly. Although there are noteworthy flaws, the framework is sound. A player’s true contribution is not measured by a simple number like RBIs, but rather by his performance in every facet of the game. WAR is far from perfect, but it does a much better job of providing a launching point for comparison than singular statistics like RBIs. In that sense, the only manner in which the two metrics are even remotely related is with regard to the lazy way many try to use them.

Before concluding, it’s worth taking a moment to circle back to the defensive component of WAR (UZR for fWAR and Total Zone for bWAR). As long as defensive systems are designed as a zero-sum game, they will continue to be flawed. Although such a methodology might be well suited for defining the very best fielders, it loses track of those who fall away from the margins. The Yankees’ outfield, which ranks third overall in UZR, provides a perfect illustration of this dynamic. On an individual basis, Brett Gardner and Nick Swisher rate highly, but Curtis Granderson does not. Does that mean Swisher and Gardner are picking up the slack for Granderson? Or, could the combination of the Yankees’ outfield alignment and UZR’s zero-sum game be the reason for his low rating? As long as that doubt exists, UZR will be the subject of legitimate criticism.

Without the use of technological methods (such as Field/FX), improving the reliability of defensive metrics will remain a challenge. One possible solution would be to give a fielder credit for a play he could have made (for example, if the centerfielder, left fielder and shortstop all converge on a pop-up, then all three would be given credit for a putout). The Fielding Bible’s +/- goes halfway in this approach by not penalizing fielders for balls they could have caught, but it still does not give them credit. Another alternative would be to measure only the balls that an outfielder should have caught but did not, thereby avoiding the conflict that arises when two or more players could have made the same play. Although these adjustments would continue to require subjective inputs, they would at least remove the zero-sum problem from the equation.

7 thoughts on “WAR Is Not the New RBI (but It Has Its Own Flaws)”

  1. I wrote a big long response on this, only to have it erased. I’ll sum up:

    UZR is not “zero-sum” in the conventional sense. From the Primer on Fangraphs: “In UZR, when a ball is caught and turned into an out by one fielder, no other fielder gets docked any runs. This helps to minimize the effects of “ball-hogging”.” In short, Gardner making a play does not penalize Granderson’s UZR.

    Now, it is true that fielders can affect one another’s UZR by robbing them of the ability to add positive value on tough plays, but it also saves them from receiving negative values on balls they might not get to. The net effect of this phenomenon should be to regress the less rangy fielder’s UZR to 0; an epic fielder like Gardner would make Adam Dunn look better, but would hurt Nyjer Morgan.

    Whether you believe this is happening to Granderson depends on your evaluation of Granderson’s relative ability to make lateral versus depth plays. If he’s substantially better laterally than he is at going back on balls, he’s being hurt by his teammates. Or it is possible that the decision to play him shallow is exposing his inability to go back. Or it is possible that he’s just having a bad year; he’s lost several balls in the sun this year, and those plays have a big effect on his overall performance.

    If you really want to know what’s happening, I think you have to evaluate Swisher’s play. In 2010, the Yankee OF was +25, +6.5, +1. This year, it is +22.5, -6, +11.5. The Gardner-Granderson dynamic shouldn’t be any different this year than last, when he posted positive value. But are Swisher’s improvements the result of lateral plays that Granderson used to make, or has he gotten better at going shallow or deep? In only one of those worlds would we expect Granderson’s UZR to suffer.
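To make the year-over-year comparison above concrete, here is a minimal sketch that tabulates the change at each outfield spot using only the UZR figures quoted in the comment. The LF/CF/RF ordering (Gardner, Granderson, Swisher) is an assumption inferred from the discussion, and the variable and function names are illustrative only.

```python
# UZR figures quoted in the comment above.
# Assumption: the three numbers are listed in LF, CF, RF order
# (Gardner, Granderson, Swisher).
uzr_2010 = {"LF": 25.0, "CF": 6.5, "RF": 1.0}
uzr_2011 = {"LF": 22.5, "CF": -6.0, "RF": 11.5}


def yearly_change(before, after):
    """Return the per-position change in UZR between two seasons."""
    return {pos: after[pos] - before[pos] for pos in before}


change = yearly_change(uzr_2010, uzr_2011)
total_2010 = sum(uzr_2010.values())
total_2011 = sum(uzr_2011.values())

for pos, delta in change.items():
    print(f"{pos}: {delta:+.1f}")
print(f"Outfield total: {total_2010:+.1f} -> {total_2011:+.1f}")
```

Under that assumed ordering, the drop in center field (12.5 runs) is close in size to the gain in right field (10.5 runs), which is why the comment points to Swisher's play as the place to look. The totals alone cannot say which explanation is correct.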

    • Another possibility: That more of Gardner’s play profile comes from LC than the LF line this year, which is doing more to suck up Granderson’s opportunity to make plays. But I find this unlikely.

      • In interviews, it has been stated that the outfield has been realigned because of Gardner’s range. I think it is very possible that he is branching out more into LC. Also, it seems to me as if Granderson is very deferential when another OF converges on one of “his” balls.

    • Getting penalized and not getting credit are very similar in a relative system like UZR. That’s why it basically is a zero-sum game. The bottom line is UZR is a broken system that might be beyond repair.

      • No.

        First, it is not clear to me that you understand how UZR works. You said:

        “One possible solution would be to give a fielder credit for a play he could have made… The Fielding Bible’s +/- goes halfway in this approach by not penalizing fielders for balls they could have caught, but it still does not give them credit.”

        But this is exactly what UZR does! Go back to my above quote, from the Primer on UZR.

        Second, you are misusing the zero-sum concept. While I would agree that Gardner is obscuring Granderson’s true value, it is not clear that he is doing so in a negative direction. A zero-sum game implies that any win for Gardner is NECESSARILY a loss for Granderson.

        In the example I cited above, putting Gardner next to Adam Dunn would actually INCREASE Dunn’s perceived defensive value, by masking his inability to make those plays in left-center.

        What you have to show is that Granderson is actually losing something by not having access to more lateral plays, i.e., that his increased lateral range would offset any deficiency in his depth range, and therefore that he’s being penalized for not having access to those plays.

        You have not offered any data, or even observational analysis, to indicate this.

        At worst, UZR has a validity problem that penalizes fielders with good lateral movement (and credits fielders with bad lateral movement). One way to estimate that error bar is to regress single-year data halfway to the mean, which MGL recommends. Another way is to take the last three full seasons of data, which calls Granderson an average CF. A third way is to find Gardner’s OOZ plays and generate an estimate of how many you think Granderson could have gotten to…while at the same time penalizing Granderson and Swisher for the RF shift.
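The first suggestion above, regressing a single-year figure halfway to the mean, can be sketched in a few lines. This is only an illustration: it assumes the mean is 0 (UZR is expressed relative to an average fielder), and the function name and `weight` parameter are hypothetical rather than part of any published UZR method.

```python
def regress_to_mean(observed, mean=0.0, weight=0.5):
    """Shrink an observed value toward the mean.

    weight=0.5 keeps half of the observed deviation, i.e. it regresses
    the value halfway to the mean. mean=0.0 is an assumption based on
    UZR being measured relative to an average fielder.
    """
    return mean + weight * (observed - mean)


# Single-year UZR figure cited in this thread for Granderson.
granderson_single_year = -6.0
estimate = regress_to_mean(granderson_single_year)
print(f"Observed {granderson_single_year:+.1f}, regressed {estimate:+.1f}")
```

With these assumptions, a -6 single-season figure becomes an estimate of -3, which illustrates why one year of data supports only a cautious reading of a fielder's ability.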

        None of this implies that UZR (and the various other zone ratings) are a “broken system”. What it does mean is that UZR has error bars…but so does most every statistic. UZR is not RBI, Wins, or Saves, which are so context dependent that you can mistake good for bad players (and vice-versa). It is closer to OPS or WHIP, which do a good job of 1) describing what happened over a specified time period and 2) given enough data, give us a reliable estimate of a player’s ability.

        For example, Justin Verlander had a 3-year WHIP of 1.26 from 2007-9, but that ranged from 1.17 (#1 starter stuff) to 1.40 (#4-5 starter). Is WHIP therefore useless, because there was so much variation in the 1-year samples? Similarly, is WHIP useless because this year it tells us that Verlander has a .9 WHIP, one of the 30 best performances since 1900, when his career WHIP is 1.2?

        Every stat in baseball has a DESCRIPTIVE purpose and a PREDICTIVE purpose. UZR DESCRIBES what happened this year, and it does so about as well as WHIP, OPS, or HR. Both WHIP & UZR, for ex., can penalize players for strategic decisions (the ball hit to the right side with a shift on counts against a pitcher’s WHIP, just as positioning Granderson might hurt him). In the aggregate, I am skeptical that personnel or positioning affects UZR more than a stat like WHIP, since it does a better job of stripping out “luck” by integrating batted ball profiles; those balls that Grandy may or may not have got to won’t make him 2010 Brett Gardner, but they also won’t make him 2004 Bernie Williams.

        Now UZR also has a PREDICTIVE purpose (is Granderson a good or bad CF?), but like OPS, WHIP, or HR, UZR is an imperfect predictor of talent that is best used in conjunction with other measures and by incorporating more data. Just like one year of UZR can’t predict fielding, neither can one year of OPS (given that HR, for ex., are affected by random weather conditions & “lucky” fielding plays) predict a player’s “true” hitting level except as a range.

        Putting error bars on UZR is not sufficient cause for throwing the stat away, as we can place error bars around any attempt to describe or predict a player’s performance. Furthermore, given that we give awards and evaluate seasons based on descriptive, rather than predictive statistics, you should have no problem incorporating UZR into your evaluations of talent. His -6 UZR is no more or less meaningful than his 39 HR.

        • UZR is NOT doing what +/- does. The latter considers a different degree of difficulty for a ball that could be caught by two players, while the former uses the same.

          That’s also why I am calling it a zero-sum game (which you are correct to say is not the literal definition). For example, if UZR is going to award the same score to a CF or LF for a fly ball they both could catch, you do wind up with a zero sum if you consider opportunity cost, which I think IS relevant because this is a comparative system. If you don’t agree with that assumption, fine, but I think it is valid.

          As for my burden of proof regarding Granderson, I would encourage you to read that paragraph carefully. I didn’t conclude that Granderson is being robbed of UZR points. Instead, I stated that it was possible, leading to doubt about why his score has dropped, which in turn leaves UZR open to criticism.

          In summary, I have to disagree with you about the value of UZR in general. I do think it is a broken system and don’t regard it with anything more than a passing interest. You are free to regard it more highly, but I have moved on to better alternatives.