Time for session three in our series of rudimentary courses on advanced baseball statistics for non-math people by a non-math person. See number one, on OPS+, ERA+ and wRC+, here, and number two, on FIP and other defense-independent pitching statistics, here.
If you've read this blog, or any vaguely stats-oriented baseball blog, or a baseball message board of some type, any time in the last year, odds are good that you've come across the acronym WAR, for Wins Above Replacement. If you've been at it a little longer than that, there's a good chance you saw WARP, or Wins Above Replacement Player. They've become incredibly popular over the last few years, for good reason: WAR[P] gives you a single number with which you can compare any two players, no matter what types of players they are and what positions they play -- even, if you want to get really crazy, a position player and a pitcher -- and get an idea of which one was more valuable. And unlike, say, win shares, which are crazily complicated and give you a number up to (or above) 40 representing the number of arbitrarily-decided-upon thirds-of-a-win the player achieved for his team, WAR and WARP, at the highest level, answer a very simple question: adding up the total value of everything the player does -- his hitting, fielding, pitching, sometimes baserunning -- how many wins did the player give his team over and above what it could have expected had it just plugged in any remotely serviceable player it could find in his place? So they're useful, and they're relatively easy to understand and come to terms with even if you don't get all the underlying math.
WARP was the only game in town for some time. Created, as so many great things are, by Baseball Prospectus (BP), WARP is really about as simple as what I just described above: add up a player's Batting Runs Above Replacement (BRAR), Pitching Runs Above Replacement (PRAR), and Fielding Runs Above Replacement (FRAR), and divide the result by the number of runs determined to be worth a "win" in that league and that season (typically around ten), and you end up with the number of wins a player earned above a replacement player in the same spot. The hard part, of course, is figuring out BRAR, PRAR and FRAR...which you can't do, unless you work for BP or know someone who does.
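For the code-inclined, the arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the add-then-divide step, not BP's actual method: the function name `warp` is made up, the default of ten runs per win is the rough figure mentioned above, and the example inputs are invented, since the real BRAR, PRAR and FRAR values are proprietary.

```python
def warp(brar: float, prar: float, frar: float, runs_per_win: float = 10.0) -> float:
    """Sum the three runs-above-replacement components and convert to wins.

    brar, prar, frar: batting, pitching and fielding runs above replacement.
    runs_per_win: runs considered worth one win in that league and season
                  (assumed here to be about ten, per the text).
    """
    if runs_per_win <= 0:
        raise ValueError("runs_per_win must be positive")
    return (brar + prar + frar) / runs_per_win


# Hypothetical position player: 60 batting runs and 15 fielding runs
# above replacement, no pitching.
example_warp = warp(brar=60.0, prar=0.0, frar=15.0)
print(example_warp)
```

With those made-up inputs, 75 runs above replacement at ten runs per win comes out to 7.5 wins.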
There are actually two versions of WAR, one designed by Sean Smith ("Rally") and available at BaseballProjection.com, and one available at FanGraphs. They can differ from each other, sometimes by a lot, though the major difference is that FanGraphs WAR, which uses UZR for its defensive component, is available only since 2002, while Sean Smith's defensive rating system changes according to the amount of data available for a given season, so his WAR goes all the way back to Al Spalding. But FanGraphs WAR and Rally WAR are a lot more similar to each other than either is to BP's WARP.
The components are slightly different among the three metrics. For one thing, BP calculates runs above replacement directly, while Rally and FanGraphs calculate runs above average, and then add in the replacement level -- the number of runs a replacement player is below average -- to arrive at runs above replacement. BP calculates different replacement levels for each position, while the other two use one replacement level per league, per year, and then add a position adjustment based on the difficulty of the positions the player played. Rally breaks out a whole bunch of smaller adjustments as separate components, while with the other two, those effects are measured (if at all) as part of the total batting, fielding or pitching runs. But while I'm sure some of these differences have real effects, they seem to me to amount to basically a distinction without a difference. They're all adding up the total runs the player creates and/or prevents above a replacement player, and dividing by approximately ten to come up with the wins he created.
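The "runs above average, then add the replacement level" route described above can be sketched like this. It is a rough illustration and not the actual Rally or FanGraphs formula: the function and parameter names are made up, the 20-run replacement level and ten runs per win are assumed round numbers, and the example player is invented.

```python
def war_from_average(
    batting_raa: float,
    fielding_raa: float,
    positional_adj: float,
    replacement_runs: float = 20.0,  # assumption: runs a replacement player is below average
    runs_per_win: float = 10.0,      # assumption: the rough "about ten" figure
) -> float:
    """Convert runs above average into wins above replacement.

    Runs above average are first shifted by the positional adjustment, then
    by the replacement level, which gives runs above replacement. Dividing
    by runs per win gives wins.
    """
    runs_above_replacement = batting_raa + fielding_raa + positional_adj + replacement_runs
    return runs_above_replacement / runs_per_win


# Hypothetical player: +30 batting, +5 fielding, and a -10 adjustment for
# playing an easy position.
example_war = war_from_average(batting_raa=30.0, fielding_raa=5.0, positional_adj=-10.0)
print(example_war)
```

With those invented inputs, the player is 25 runs above average after the position adjustment, 45 runs above replacement, and so worth about 4.5 wins.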
So what is the difference? Why did all these smart people set out to create a statistic that measured the exact same thing that BP's established stat did (and with an almost identical name and acronym)?
Well, at least a few reasons (I'm sure there were many, but here are the ones I know):
- Tom Tango, at least, was convinced that Baseball Prospectus' replacement level was far too low. BP doesn't divulge its replacement level (that, like most other things, is BP-proprietary), but its glossary does explain that "a team which is at replacement level in all three of batting, pitching, and fielding will be an extraordinarily bad team, on the order of 20-25 wins in a 162-game season," and BP has explained elsewhere that a "replacement player" is essentially an AAA player that might be called upon to fill in as an emergency injury replacement. Tango noted that there are freely available replacement players who are not (as BP was assuming all replacement players were) far, far below average on both offense and defense, and so he adopted a much higher standard for a replacement player, one with a total contribution of about 20 runs below average. BP eventually announced that it was making upward adjustments to its own definitions of replacement level, which lowered its WARP numbers across the board...but still not to the level of the two WARs. Accordingly, WARP is virtually always considerably higher than either WAR number. Pujols' 2009, for example, gets 9.2 by Rally WAR and 8.5 by FanGraphs WAR, but a whopping 11.8 by WARP.
- BP's fielding numbers are pretty opaque, and don't seem all that trustworthy. Nobody really knows how FRAR was calculated before. Now they use play-by-play data, but nobody knows how. UZR and Total Zone, on which the two WARs are based, are also based on play-by-play data (where available) and are much more transparent and verifiable, and just seem to make a bit more sense.
- BP's numbers in general, as you might have picked up already, are opaque. As a subscription-based site that closely protects its secrets, it's hard to tell what goes into BRAR, FRAR and PRAR, and, for that matter, the replacement levels. Meanwhile, if you're willing to do enough digging/Googling and have the math chops, you can pretty much figure out every single thing that goes into the FanGraphs or Rally WAR. The creators of these numbers (particularly Tango -- see e.g. here and here) are generally committed to being open about the numbers they use and where they're coming from. I'm not one to dig into the math, as you know, but I do like to have some idea of what it is I'm looking at.
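To put the Pujols example from the first bullet in terms of runs, here is a quick back-of-the-envelope sketch. The three win totals are the ones quoted above; the ten-runs-per-win conversion is the rough figure used throughout this post, and the helper name `gap_in_runs` is made up. It doesn't say how much of each gap comes from replacement level versus fielding or anything else, only how big the gap is.

```python
RUNS_PER_WIN = 10.0  # assumption: the rough "about ten runs per win" rule

# Pujols' 2009 season under each system, as quoted in the text.
pujols_2009 = {"WARP": 11.8, "Rally WAR": 9.2, "FanGraphs WAR": 8.5}


def gap_in_runs(high: float, low: float, runs_per_win: float = RUNS_PER_WIN) -> float:
    """Express the difference between two win totals as runs, to one decimal."""
    return round((high - low) * runs_per_win, 1)


warp_vs_rally = gap_in_runs(pujols_2009["WARP"], pujols_2009["Rally WAR"])
warp_vs_fangraphs = gap_in_runs(pujols_2009["WARP"], pujols_2009["FanGraphs WAR"])
rally_vs_fangraphs = gap_in_runs(pujols_2009["Rally WAR"], pujols_2009["FanGraphs WAR"])

print(warp_vs_rally, warp_vs_fangraphs, rally_vs_fangraphs)
```

So WARP credits the same season with roughly 26 to 33 more runs than the two WARs do, while the two WARs sit about 7 runs apart.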
So you can probably tell where I come out. I love BP for many reasons, and I'll refer to all three systems now and then to check consistency and such, but if I have to pick one, I'm sticking with WAR (FanGraphs', when available). Another great thing about WAR is that, because it's so freely and widely available and has become so widely used and discussed, there are very easy reference points. A player with a full-season WAR of 2.0 is roughly an average regular (see Kosuke Fukudome, A.J. Pierzynski). 4.0 or so is a star, if not quite elite, player (Brian Roberts and, though Phillies fans will kill me, Ryan Howard). 6.0 is a superstar and possible MVP candidate (Mark Teixeira, Dustin Pedroia). 8.0 or better is a guy who had a huge year and is almost certainly an MVP candidate (Pujols, Joe Mauer). I'm sure you can create similar benchmarks for WARP, and BP probably has, but I'll never remember what they are, so that kind of misses the point.
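If it helps to see those reference points as a lookup, here's a small sketch. The 2.0, 4.0, 6.0 and 8.0 marks come from the paragraph above; treating them as hard cutoffs is my own simplification (they're really fuzzy "or so" landmarks), and the function name and the label for anything under 2.0 are made up.

```python
def war_tier(war: float) -> str:
    """Map a full-season WAR to the rough reference points in the text.

    The cutoffs are treated as hard boundaries purely for illustration.
    """
    if war >= 8.0:
        return "huge year, almost certainly an MVP candidate"
    if war >= 6.0:
        return "superstar, possible MVP candidate"
    if war >= 4.0:
        return "star, if not quite elite"
    if war >= 2.0:
        return "roughly an average regular"
    return "below an average regular"  # assumption: the text gives no label here


# Pujols' 2009 by FanGraphs WAR, as quoted earlier in the post.
print(war_tier(8.5))
```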
So that's Wins Above Replacement [Player]. Questions will be welcomed, then furiously Googled...
As useful as these metrics are, I think that your analysis, intentionally or not, highlights flaws in the numbers, at least as used by many people. I think the people who developed these metrics are properly humble about their limitations, but many "users" of the numbers are not. Quite simply, the metrics give an illusion of objectivity that the underlying numbers don't fully support.
More specifically, the metrics are built on a series of assumptions regarding:
(1) The run value of offensive statistics,
(2) The run value of defensive metrics/statistics,
(3) The run value of pitching statistics,
(4) Positional adjustments,
(5) Replacement values,
(6) Runs per win conversion.
1 and 6, and to a certain extent 3, are pretty solid in the sense that the statistical underpinnings are precise, objective, and uncontroversial. 2, 4, and 5 much less so; they are more subjective and imprecise. Even offense measurement is more problematic than some people think, to the extent that baserunning isn't measured (or measured well) in many systems, and there are differences of opinion as to whether situational data should be considered.
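To make the point concrete, here is a rough sketch of how uncertainty in 2, 4, and 5 could spread into the final number. Everything in it is hypothetical: the function name, the example player, and especially the error bands (5 runs on fielding, 2.5 on the positional adjustment, 5 on replacement level), which are made-up illustrations and not estimates from any of the three systems.

```python
from itertools import product


def war_range(
    batting: float,
    fielding: float,
    positional: float,
    replacement: float,
    runs_per_win: float = 10.0,   # assumption: roughly ten runs per win
    fielding_err: float = 5.0,    # hypothetical +/- error band, in runs
    positional_err: float = 2.5,  # hypothetical +/- error band, in runs
    replacement_err: float = 5.0, # hypothetical +/- error band, in runs
) -> tuple[float, float]:
    """Return the lowest and highest WAR when the three shakier inputs
    are each pushed to the ends of their assumed error bands."""
    totals = []
    for f, p, r in product((-1, 1), repeat=3):
        runs = (
            batting
            + fielding + f * fielding_err
            + positional + p * positional_err
            + replacement + r * replacement_err
        )
        totals.append(runs / runs_per_win)
    return min(totals), max(totals)


# Hypothetical player whose point estimate is 4.5 wins.
low, high = war_range(batting=30.0, fielding=5.0, positional=-10.0, replacement=20.0)
print(low, high)
```

Under those invented error bands, a nominal 4.5-win player could land anywhere from 3.25 to 5.75 wins, which is the difference between "a bit above average" and "borderline superstar."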
Which isn't to say that this stuff isn't valuable, but IMO it should be deployed with quite a bit more humility than is often the case.
I do have a question: which of these systems (if any) consider (a) situational stats, and (b) baserunning, aside from steals and CS?