Yankeeist regularly uses a variety of advanced baseball stats. While Larry most frequently uses wOBA to value hitters and FIP to value pitchers, I tend to focus more on the marginal value-based statistics VORP and WARP (called WAR if you prefer Fangraphs to Baseball Prospectus).
For those who don’t know, the names of these stats are acronyms that stand for Value Over Replacement Player and Wins Above Replacement Player, respectively. They attempt to measure the number of runs, or entire wins, any player in baseball can be expected to contribute to his team above the level of production that same team would have gotten from a minor league player called up to fill in for that starter if he were injured. Put another way, Albert Pujols has an enormous VORP because he gives the Cardinals a lot more offense than any player they would have to ask to step in for an injured Albert Pujols.
My fascination with these stats came from their supposed simplicity, their ability to provide a single number describing a player’s total value (at least in the case of WARP; VORP is offense-only). For example, in 2006 Derek Jeter put up slash stats of .343/.417/.483 with 14 home runs, 97 RBI and 34 steals. That same season Justin Morneau put up .321/.375/.559 with 34 home runs, 130 RBI and 3 steals. Both players had incredible years, but Morneau narrowly defeated Jeter for the MVP.
To choose between Jeter and Morneau that season, baseball writers needed to look at the strengths and weaknesses of their seasons and make a subjective judgment between, essentially, power and getting on base. In the end the writers felt that Morneau’s larger power numbers were more valuable than Jeter’s better on-base numbers.
If WARP or VORP were as widely accepted in the baseball media as, say, batting average, then the writers would only have to consider a single number, instead of applying arbitrary weights to a series of numbers. In the case of 2006, the writers got it wrong. According to BP, Jeter was worth 5.4 wins to the Yankees while Morneau was worth 3.9 wins to the Twins. It was information such as this that drew me to these kinds of stats: a single number, capturing an entire player.
That fascination is waning. I recently ran a post comparing the 2010 projected Red Sox to the 2010 projected Yankees (these are the things I do when no actual baseball is being played). I used the 2009 WARP and WAR statistics from Baseball Prospectus and Fangraphs to determine which team would be better in 2010. Something unexpected happened. The two sites didn’t agree. I won’t go back and rehash the numbers, but I’ll give the example of 2B. Baseball Prospectus felt that Robinson Cano was worth more wins than Dustin Pedroia, but Fangraphs felt that Pedroia was worth more wins than Cano. Say what now?
Anyone who read the article may recall that I was openly disappointed to learn that two well-respected baseball analysis sites would come to different conclusions about the number of wins the same player contributed to his team. It detracts greatly from the marketability of these win-based stats. Applying this outcome to a different statistic shows how ridiculous it is:
Larry: What a season. Mark Teixeira hit 39 home runs.
Me: Wait, what? My tally has him at 46 home runs. What games were you watching?
The method of calculating a given stat needs to be universally accepted if that stat is to be meaningful. Otherwise, we’re trapped in a semantic debate over the value of fielding, or how heavily to weight OBP. The point is not to argue the calculations, but to agree upon better ways of evaluating baseball players. With WARP (or WAR), clearly, we’re not there yet.
For a while I have wanted to examine the RP in these stats, the method of calculating Replacement Player. My goal was to do this for VORP since offensive statistics are more widely available than the defensive statistics that inform WARP. My criticism came from the adjustments these statistics make for each position. To calculate replacement level the stats are reported to take the average offensive production in each league at each position, and then multiply it by 80%, or 75% for a catcher, or 85% for a first baseman or a DH.
If you’re like me, your response to the above should be, “say what?” The idea of calculating league average production at a given position is fine. That’s what these stats are meant to do. But it is arbitrary, and frankly sloppy, to say that a replacement level shortstop is 80% of league average, while a replacement level catcher is 75%. These replacement level players are known quantities. In any given season we know who these guys are. There are excellent ways to separate the regulars from the replacements to determine if the players qualified to play short or third are in fact better hitters than those available to catch. Because I’m still single, I set out to recalculate VORP to determine how much variance would occur if we juggled the replacement level. Given my criticism of its sister stat, WARP, this seemed like a good time to hack away.
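For the curious, here is a rough Python sketch of the position adjustment as it has been reported: take the league-average production at a position and multiply it by 80%, or 75% for a catcher, or 85% for a first baseman or DH. This is only my reading of that description, not BP's actual formula. The function names are mine, and the run totals in the example are made-up placeholders rather than real league data.

```python
# Reported replacement-level multipliers by position.
# Any position not listed here falls back to the general 80% figure.
REPLACEMENT_FACTOR = {"C": 0.75, "1B": 0.85, "DH": 0.85}
DEFAULT_FACTOR = 0.80


def replacement_level(position, league_avg_runs):
    """League-average production at a position, scaled by its multiplier."""
    factor = REPLACEMENT_FACTOR.get(position, DEFAULT_FACTOR)
    return factor * league_avg_runs


def runs_over_replacement(position, player_runs, league_avg_runs):
    """Runs a player produced beyond replacement level at his position."""
    return player_runs - replacement_level(position, league_avg_runs)


# Hypothetical inputs for illustration only (assumption, not real data):
# a shortstop who created 95 runs in a league where the average
# shortstop created 70.
print(runs_over_replacement("SS", 95.0, 70.0))
```

Notice that the multipliers are plain constants in a lookup table. Swapping them for numbers measured from actual replacement players would be a one-line change, which is exactly the experiment I wanted to run.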
Herein lies the divorce from these statistics. I couldn’t calculate VORP. The fact that I couldn’t calculate VORP is irrelevant (except to me, since I spent hours trying to do this). The reason I couldn’t is much more telling. I never found the calculation. I was appalled to learn that Baseball Prospectus considers VORP to be a proprietary statistic. Its actual calculation is a well-guarded secret.
The launch codes for America’s nuclear arsenal should be a well-guarded secret. The Fed’s discussions on interest rate policy should be a well-guarded secret. The combination on the lock to Natalie Portman’s bedroom should be a well-guarded secret. The VORP calculation should be easily accessible to, at the very least, Baseball Prospectus subscribers.
The problem with guarding the calculation to VORP (aside from the fact that it’s categorically insane) is that it prevents the validation of the statistic. If VORP is as good as BP says it is, then what’s the secret? Release the code. Let mega-dorks like me mess around with it. If I draw the same conclusions BP did then I most certainly will keep my subscription.
Withholding the calculation, on the other hand, leaves the statistic open to criticism and speculation as to why respected sites can’t agree on how to calculate their advanced stats. If we don’t actually know what a win is, or how many additional runs a player is worth, then how can we trust these stats? In the case of Pedroia v. Cano the difference was material. If you were drafting a fantasy team and wanted one of these guys at 2B, one site says Dustin and the other says Robinson. Thanks, guys! Appreciate the input. Glad I spent that $34.
It’s possible that a big chunk of my criticism here is off base. It’s possible that Fangraphs and BP can’t agree on WAR versus WARP because they calculate the two stats differently and I’m getting pissed off about the different weights put on doubles versus home runs. If, however, that is the case, then it adds incredible weight to my second criticism. We need to agree on this stuff.
For my part I’m abandoning these stats. My new personal preference is to go back to the established statistics, and appreciate them correctly. The combination of the slash stats with homers, steals and caught stealing tells us an incredible amount about a ballplayer, so long as those stats are weighted correctly. OBP is much more important than slugging; getting caught stealing more than 25% of the time is bad. And so on.
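The caught-stealing rule of thumb is simple enough to check by hand, but here is a small Python sketch anyway. The 25% threshold comes from the paragraph above; the function names and the example totals are my own placeholders, not real player lines.

```python
def caught_stealing_rate(steals, caught):
    """Share of stolen-base attempts that ended with the runner thrown out."""
    attempts = steals + caught
    if attempts == 0:
        return 0.0  # no attempts, nothing to penalize
    return caught / attempts


def running_hurts(steals, caught, threshold=0.25):
    """True when the runner is caught more often than the threshold allows."""
    return caught_stealing_rate(steals, caught) > threshold


# Hypothetical season totals for illustration only (assumption, not real data).
print(running_hurts(30, 10))  # caught on exactly a quarter of attempts
print(running_hurts(20, 10))  # caught on a third of attempts
```

The threshold is a parameter rather than a hard-coded number, so anyone who thinks the break-even point sits somewhere else can plug in their own.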
This adds credence to statistics like wOBA, or EQA if you’re a BP fan. The above is precisely what those stats are trying to do. And the calculation isn’t some well-guarded secret.