We Need to Dig a Deep Hole, Bury the Old Statistics, and Forget Them Forever

I wanted to wait a little bit before following up on my post about the AL MVP race. It looks increasingly likely that, a week from today, Miguel Cabrera will be named the AL MVP over Mike Trout, an objectively wrong decision by any definition of the award. If this is the case, the decision will have been made solely on the basis of three statistics (runs, RBIs, and home runs) that are obsolete, useless, and misleading.

Let’s get something out of the way. This is not a case of “quantitative vs. qualitative” or “scouts vs. nerds” or anything like that. Runs, RBIs, and home runs are statistics. They are aggregated records of events that happened in baseball games, just like on-base percentage, slugging percentage, wOBA, UZR, DRS, or whatever else you pick.

Every single person uses statistics to determine who the better MLB player is. No one watches all 162 games per year of every single baseball club, and even if they did, their brains would not be physically capable of remembering and aggregating the individual details of the games they watched in order to determine who is better. These are not prospects, for whom you are trying to look into the crystal ball and predict the future, and for whom the qualitative stuff that doesn’t show up in the statistics becomes relevant.

The only question is what statistics we use to evaluate players. Any person who is capable of using a human brain to form thoughts can figure out that RBIs, home runs, and batting average produce almost no useful information about a player. I could make those arguments right now, but I shouldn’t have to. Moneyball happened, and thousands of people on and off the internet had those arguments. Batting average tells you a little bit about how a player goes about providing a piece of his value, but requires tons and tons of context and understanding of luck to be useful. Home runs tell you about one way in which a player can add value, but talk to Curtis Granderson or Adam Dunn this season if you think it’s a real measure of value. In Moneyball terms, batting average and home runs are statistics that tell you something but lack the power of language.

RBIs and runs are useless statistics to determine how good a player is. This is a fact, not an opinion. And they are where the real problem is.

A long time ago, we used to use alcohol as anesthesia and primitive bone saws to amputate limbs. Then, we invented power tools, real anesthetics, and penicillin, and we started using those, and the old stuff became obsolete. That’s how things are supposed to work. You figure out objectively better ways to do things, and you stop doing the old stuff.

But that metaphor is actually wrong. Bone saws and booze didn’t do their job well, but they accomplished something before being replaced by better versions of themselves. Think of them as OBP and SLG, which were eventually replaced by OPS+, which was eventually replaced by wOBA, etc.

In this metaphor, RBIs and runs look like how doctors used to use leeches to bleed the crap out of patients for no good reason. At some point, doctors just woke up and realized that they had been colossally stupid for centuries, and just killed the whole bloodletting idea.

Giving Miguel Cabrera the 2012 AL MVP award because he won the Triple Crown is equivalent to handing out the 2012 Most Valuable Physician award to the guy who was best at bloodletting.

We need to dig a hole, throw the old statistics inside it, and bury them below 10 feet of dirt. We need to just stop using them, ever. They need to get the hell off stadium scoreboards, Fangraphs dashboards, the Yankee Analysts blog, all of it.

If you haven’t read Eder’s post about how PECOTA inventor Nate Silver predicted the 2012 election (and 2008-2010 elections, and 2008 primaries) dead-on, I highly recommend you take the time. The great lesson of the last few weeks, when Silver faced mountains of criticism from all sorts of people who didn’t want to believe or even make the effort to think about basic math, was that approaching things rationally works.

Math works. Logic works. If we make the choice to ignore math and logic, we’re doing something very stupid. If we use the RBI or runs statistic as anything other than fun, meaningless trivia, we are ignoring objectively true math and logic. The Triple Crown is a meaningless trivia fact, not something that people who aren’t being stupid put any value in.

E.J. Fagan has been blogging about Yankee baseball since 2006. He is a Ph.D. student at the University of Texas at Austin.

41 thoughts on “We Need to Dig a Deep Hole, Bury the Old Statistics, and Forget Them Forever”

  1. Love how you conveniently leave out Cabrera’s slashline.

    HRs, RBI, and runs are not useless and misleading.

    By your ridiculous logic, Robinson Cano wouldn’t get an 8-year, $200M deal from someone other than the Yankees if he posted 50 HR, 145 RBI, and 127 R in 2013, when in fact he would.

    Cabrera was the primary offensive force behind the Detroit Tigers becoming 2012 AL Champions. He had a tremendous year not only with BA/HR/RBI, but with OBP and SLG, too. He had it despite moving back to his old position when he didn’t have to (he moved from 1B to 3B so Prince Fielder could remain at 1B). He posted the numbers he posted as a veteran whom pitchers have faced literally thousands of times.

    “RBIs and runs are useless statistics to determine how good a player is. This is a fact, not an opinion. And they are where the real problem is.”

    Enlighten us as to why HR, RBI, and R are “useless”. I’ve yet to see your argument other than “Miguel Cabrera shouldn’t win the 2012 AL MVP because I say so.”

  2. I agree with you that Mike Trout should be MVP. I agree that Runs and RBIs are a function of opportunity and, even though highly correlated with great performances, shouldn’t be used to differentiate between great players. I disagree with you about Batting Average though. You didn’t really flesh out your argument there, but I am assuming that you are driving at the amount of luck involved (unusually high BABIP?). If I’m interpreting where you were headed correctly, then I would agree that it’s important in predicting repeatability/future performance and evaluating someone’s overall talent, but irrelevant in determining the value of their season in question and worthiness of a particular award for that season. What matters is their contribution on the field in that year and not whether we think luck played a role in it. That said, BA still isn’t the greatest statistic, as it tends to overrate players like Ichiro who never walk and mostly hit singles. I don’t think it’s next to useless though, as you stated. And clearly defense, base running, and positional importance also need to be factored in.

  3. I don’t think homeruns and average are useless at all. RBIs are overrated, but they still at least signify something. As the Yankees proved this season, you can lose a lot of games by not being able to drive in runners in scoring position. It’s obviously a product of opportunity. Let’s be honest though, even if you put Scott Podsednik in the 4th hole of Texas or NY’s lineup, he’s not going to get >100 RBI.

    As for homeruns, yes, alone the statistic is not a great evaluator of player talent. But that’s why Curtis Granderson wasn’t in the running for the MVP. The only thing he did well was hit homeruns. Miguel Cabrera did more than that. Secondly, homeruns are still an important statistic. Hitting homeruns can turn momentum, not to mention the fact that it’s the best hit you can get in the game. You get to touch all four bases when you hit a homerun, and you score at least one run for your team. If you can do that at a high rate year after year, you are going to be valuable to your team. Curtis Granderson is no MVP, but he has excellent value in CF because of his ability to hit for power.

    Finally, there’s average. You could make the argument that OBP is a more comprehensive statistic, but you could also make the argument that hitting for average is more important. When you face a tough pitcher, he’s going to challenge you. If you have a .250 average, then your on-base percentage in an at bat where you’re not going to be walked is .250. This is why the Yankees can’t beat good pitching. Yes, on base is important, but I think average is just as important.

    Miguel Cabrera earned the MVP fair and square in my opinion.

  4. Also, walks are great, and they extend rallies, but eventually someone has to hit the ball to drive in runs. Walks move players up one base at a time. Hits move them up 1, 2, 3, or 4 bases at a time. A walk can only drive in a run with the bases loaded, which is also ironically a scenario where a pitcher is less likely to walk a batter. Despite your assertion that RBI are not important, scoring runs is important if a team plans on winning.

  5. I think maybe EJ is a bit pissed that Trout will NOT be the AL MVP, and blames this on the overvaluing of ‘certain’ stats. Obviously BA, RBI and HR, as stats, have value. The problem, as EJ pointed out, is that they are not telling enough to use in determining the MVP.

    Obviously Miggy had a killer year. Everybody knows that.

    But Trout just had a better one. A lot better in ways, if you consider his position… and don’t forget that Defense and Baserunning are an important part of the game.

  6. What I find important to value, is the ability to create runs and the ability to take them away. HR’s, RBI’s, and BA do nothing to indicate this value. BA is dependent on fielding, luck, and doesn’t take into account your ability to draw walks. RBI’s are dependent on your teammates, as well as luck in those opportunities. HR’s are the least contingent on other factors, as it’s mostly just ballpark dimensions, but also it doesn’t tell you the full story.

    Curtis Granderson’s 2012 is the perfect example. Although he hit 43 homeruns, you’d much rather have a player with a .400 OBP. Nick Swisher was more productive simply because he was on base more.

    You might say homeruns matter because they drive in runs. What is more important than that is not making outs. If you don’t get on base, it means you’re ending the game quicker, you’re taking the bat out of other players’ hands, and you’re thus creating fewer runs. Homeruns are great, but it’s not a telling stat for a player’s value.

    I agree with you on Trout of course, and that MVP does not mean most valuable hitter. At the same time, I agree with EJ 100%. These are stats that can’t be in discussion if we’re talking about the best player, and it’s not even something I’d bring up when trying to pick the most valuable hitter.

    We’ve come much further in value than OPS, but it’s at least a stat that’s somewhat talked about generally. That’s a better way to pick the best hitter, but if we have free range to create our own sabermetric triple crown, I am going with wOBA/ISO/UZR.

    wOBA gives you a much better indication of your ability to get on base, and values hits much more fairly than BA. ISO is a slightly better stat than slugging; it simply takes singles out of the equation. UZR has its flaws, but it’s the best fielding stat we have, and it’s far more reliable than RBI’s or BA. I couldn’t imagine finding a player that ever led a baseball season in these three categories though.

  7. Conceivably, you can bat .000, go the whole season without a hit, without scoring a run, without an RBI….but you could walk 120 times and steal 2nd each time you walked…so 120 stolen bases. You can be the best defensive player in MLB, making no errors and catching every ball hit to your position. But, you’d get my vote as MVP because I’m an angry, bitter young man who obviously does not like his father.

  8. I’d just like to say that your commentary is spot on and honestly has me pumped up. Also, don’t let the illogical comments above get to you. Remember that people generally find change threatening and will lash out against anyone who challenges the status quo. I would just consider it a small taste of what Nate Silver and Bill James faced in the early days of sabermetrics. Keep up the good work.

  9. I find it amusing how you sabermetric guys think you are so relevant. You guys play your role in baseball management, but that is it. Outside of internet sites, no one cares; people just wanna enjoy the games.

  10. Got to take a mea culpa here. I just checked, and you’re right: Fangraphs lists Miguel Cabrera’s wOBA a little bit higher. It was actually the opposite when I wrote the MVP post a month ago. My guess is that some minor revision in ballpark adjustments happened. My bad.

  11. Absolutely fantastic piece of writing and logic. There is simply no logically valid reason to argue that RBI is a useful statistic when you look at player quality. As a scientist who uses statistics on a daily basis, you come to realize that quantitative analyses must be accurate and predictive, otherwise you just have a pile of useless numbers.

    You can see this logic in some medical research. There are variables that we know are simply not predictive (say of something like the incidence of heart disease), but we think that they should be, so we rely on them and ignore the fact that, what we really want to know is whether we can predict whether someone will have a heart attack. We can measure all sorts of variables and pretend that they predict the likelihood of heart disease, or we can find methods that are truly predictive and rely on those. This is where RBI fails (and I don’t mean in predicting heart disease).

  12. I just don’t get these comments. It’s like saying that economists or political pollsters are useless when we can have people simply speculate about what is likely to happen given some crude statistics. The usefulness of statistics is solely determined by their ability to predict some outcome.

    Baseball statistics are largely used because of historical precedent, not logic. Before computers we counted simple things and thought that these statistics might be useful. Then we invented the computer and realized that those things we were counting really were not useful pieces of information.

  13. Sorry to be unclear (or I should say, overly technical here). Prediction is not about the future when we use it in a statistical sense. It is a statement about model fitting – can we use some information to predict some outcome. Like can we use some data from batters to determine the likelihood that a team will score runs or win games. We fit the model using historic data and ask whether we can ‘predict’ from the model the historical outcomes.

    So a good model would be one that allows us to input pitching and hitting statistics and predict how many games a team will win, and then compare those numbers to how many games teams really won. We then say a model is ‘predictive’ if it can reconstruct what actually happened (and presumably, could be useful in assessing future value, but that is not the main goal necessarily).

  14. Mr. Fagan, your comment that “There’s no point in re-litigating math.” is patronizing. No one posting here cares to “re-litigate math,” but rather to validate the toolset we use to understand and predict the outcomes of a sport we love.

    I appreciate William J’s posts, and particularly his reminder that “context matters.”

    I also appreciate Paco Dooley’s comment that “There are variables that we know are simply not predictive… but we think that they should be, so we rely on them…. We can measure all sorts of variables and pretend that they predict… or we can find methods that are truly predictive and rely on those.”

    We need measurements that both explain and elucidate our observations, and that contribute to the creation of a predictive model of what we might expect going forward.

  15. I agree with you that the introduction and innovation in sabermetrics have introduced stats that can tell you more about a player’s production, and that Trout was the best all-around ballplayer in 2012. But damage numbers (RBI for instance) are still relevant because they are just that….damage numbers. 80 to 100+ RBI tells me this guy is doing damage and I’ll take that any day. And though I can argue that the rarity of Triple Crown winners explains the weight of such a season, Miguel Cabrera deserves the MVP because if Cabrera is not on the Tigers, the Tigers do not make it to the playoffs.

  16. I totally agree with what William J. is saying in this.
    EJ…you are going too far extreme into the saber world.
    Each number tells a story and has merit.
    Some numbers are better than others but they still tell part of the story.
    The other side is also…the history of baseball lives and breathes…the guys with the highest BA or the league leader in RBI, Runs scored.
    Who is the all time leader in BA? Cobb? Is what he did less meaningful now that we know this?
    Who is the all time leader in Runs scored?
    Who is the all time leader of RBI’s?
    These are some of the best players in the history of the game…and now because we have gotten better at math equations we are supposed to throw all of that away?
    I say nonsense to that.
    While I agree there are stats which are better than others, and I like looking at WAR and OPS+…it doesn’t mean we should bury the historical stats we grew up on…just use them…in context.

  17. The lack of humility from sabermetricians is off-putting. The new statistics are very valuable, especially for management but also for diehard fans capable of understanding them. But the bulk of them are not within the reach of the casual fan who isn’t going to expend the effort to understand them, and it is on this fan and his revenue that the MLB’s vitality is grounded. Fans over the age of forty aren’t sitting around a stove or even a water cooler extolling the virtues of the great 1.000 OPS seasons of the past few years. And while runs and RBIs are reliant upon context, they do communicate useful data while being accessible to the layperson. Are there better stats? Sure, and I’d wager a guess that the bulk of MLB front offices, at least the successful ones, put a greater weight on them than they do the stats that appear on the back of baseball cards.

  18. I think the main problem with sabermetrics is PR. Most of the really good models used to evaluate and predict value (e.g. wOBA) require a basic understanding of statistics/regressions to fully understand what these statistics are. RBIs may explain some amount of variance in a given model. What has been discovered by sabermetricians is that other statistics explain the same variance and more while simultaneously reducing error. If you’re looking at a multiple regression involving several offensive statistics, the model can actually become more accurate when you remove certain independent variables (multicollinearity and heteroskedastic error problems). I haven’t run the numbers myself, but given how the sabermetric community has almost unanimously come to the conclusion that RBIs are a nearly useless statistic, my guess is that this is the result of lots of hypothesis tests that almost all say that RBIs aren’t statistically significant. That’s not to say we should stop recording RBIs as a statistic. It just means that we shouldn’t take RBIs into serious consideration during, say, an MVP debate. If everyone in the baseball world had a degree in statistics, there would be relatively little disagreement over the value of RBIs.

  19. I hear what you’re saying about scientific stats, but I come away from these new measures simply caring less about baseball. I mean, how strenuously can you put down “subjective thinking” before acknowledging that baseball is pointless and not a worthy subject for a mature person’s attention? That said, I’m glad I found your blog. As a Sox fan I hear a lot of simplistic ridiculing of the Yankees, so even though I disagree about this MVP issue I will check out your blog again for your serious analysis.
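
A footnote to comment 13, which describes "prediction" in the statistical sense: fit a model on historical data, then ask how well it reconstructs what actually happened. Here is a minimal Python sketch of that check. Everything in it is an assumption made for illustration: the six team seasons are made-up numbers rather than real MLB records, run differential is just one possible input, and the function names fit_line and r_squared are hypothetical.

```python
# Illustrative only: these six team seasons are made-up numbers, not real MLB data.
from statistics import mean

# (run differential, wins) for six hypothetical 162-game seasons
seasons = [(-120, 68), (-60, 75), (-10, 80), (30, 84), (85, 90), (140, 95)]


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(points, intercept, slope):
    """Share of the variance in y that the fitted line reconstructs in-sample."""
    ys = [y for _, y in points]
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Fit on the "historical" seasons, then compare fitted wins with actual wins.
intercept, slope = fit_line(seasons)
fit_quality = r_squared(seasons, intercept, slope)

for run_diff, wins in seasons:
    fitted = intercept + slope * run_diff
    print(f"run diff {run_diff:+4d}: actual {wins} wins, fitted {fitted:.1f}")
print(f"in-sample R^2: {fit_quality:.3f}")
```

An R^2 near 1 means the input reconstructs the historical outcomes well; a value near 0 means it explains almost none of them. Running the same check with a different input, on real data, is the kind of comparison comments 13 and 18 have in mind when they call one statistic more "predictive" than another.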