Ask the Experts: Lucas Apostoleris and Josh Weinstock answer our PITCHf/x questions

As regular readers know, I am fascinated by pitching analysis and, in turn, PITCHf/x. As a result, Brooks Baseball, TexasLeaguers.com and Joe Lefkowitz’s PITCHf/x site have become mandatory daily visits for me. While I feel as though I have a pretty good grasp on a lot of the concepts, I am continually learning new things from folks who have far greater expertise than I do, and I am constantly tweaking and refining the ways in which I attack pitching analyses. Two individuals who have more knowledge than I could ever hope to have of the way PITCHf/x works are Lucas Apostoleris (whose work you may be familiar with from his own site, Don’t Bring in the Lefty, and from his periodic contributions to the excellent Beyond the Boxscore, The Hardball Times and Fangraphs) and Josh Weinstock, who also contributes to BtB and THT, as well as It’s About the Money, Stupid.

Lucas and Josh have been extremely generous with their time whenever I have a PITCHf/x-related question, and by arranging what I hope will be a semi-regular Q-and-A, I want readers of TYA to benefit from their vast reservoir of knowledge as well.

In this first installment, we’ll clear up some confusion between two-seam fastballs (FT) and sinkers (SI); ask whether there’s a way to classify “good” and/or “bad” horizontal and vertical movement; talk about why plate discipline results are a better barometer of success than individual game pitch type linear weights; and find out whether CC Sabathia actually throws a curveball.

TYA: Following Sunday afternoon’s Yankee game, I noted in my recap that Brooks had Zach Britton throwing a ton of two-seamers (despite the fact that to the naked eye it appeared he was throwing sinkers), while Brooks also only captured 10 two-seamers for Colon, despite the fact that during and after the game Michael Kay, Ken Singleton (not pitch classification experts, but still) and Joe Girardi mentioned Bart going back to his two-seamer more as the key to his effectiveness. If Bart only threw 10 all game, that doesn’t really strike me as a “return” to the two-seamer. This resulted in a discussion in the comments section about whether the two-seamer and sinker are actually the same pitch, thrown with the same grip but somehow yielding different results, or whether they should be treated as two distinct pitches, which seems to be how PITCHf/x sees them. So what’s the deal with how PITCHf/x treats two-seamers and sinkers?

Josh: Two-seamers and sinkers are, for all intents and purposes, the same pitch. PITCHf/x does not really see them differently. They just label the pitch differently depending on the pitcher, manually changing the label from two-seam to sinker. Thus you will never see a pitcher who throws both a sinker and a two-seamer according to PITCHf/x.

MLBAM classification is usually pretty iffy when it comes to distinguishing between four-seamers and two-seamers, so Colon probably threw more than 10 two-seamers. Calibration error also plays a role here, unfortunately, and that’s hard to account for. How do we deal with this? There are two ways, one complicated and one also complicated. The first is to do data corrections and design a really advanced automated classification method. The second, which is usually the safer bet because it’s more practical and usually more accurate, is to classify pitches manually.

Lucas: Colon definitely threw more than 10 two-seamers in that game—I’ll have to look more carefully, but I think the number was really more like 40. Camden Yards has the pfx_x shifted horizontally by a bit, so the algorithm was reading the two-seamers as having less “tail” than they actually did, therefore considering them straighter four-seamers. So, it’s a really funky process once you consider the adjustments for ballparks.
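As a rough illustration of the ballpark adjustment described above, here is a minimal Python sketch. The park offset, the tail threshold and the function names are all assumptions made up for the example (they are not MLBAM's or any site's actual values), and the sign convention follows the right-hander convention used later in this piece, where more negative pfx_x means more tail.

```python
# Hypothetical per-park shift in reported pfx_x, in inches.
# Placeholder value for illustration only, not a measured calibration error.
PARK_PFX_X_OFFSET = {"BAL": 2.0}


def corrected_pfx_x(pfx_x, park, offsets=PARK_PFX_X_OFFSET):
    """Remove the park's assumed systematic shift from the reported pfx_x."""
    return pfx_x - offsets.get(park, 0.0)


def label_fastball(pfx_x, tail_threshold=-6.0):
    """Toy rule for a right-hander: enough tail (negative pfx_x) means two-seamer."""
    return "FT" if pfx_x <= tail_threshold else "FF"


reported = -5.0  # horizontal movement as reported at the park, in inches
print(label_fastball(reported))                          # label from the raw reading
print(label_fastball(corrected_pfx_x(reported, "BAL")))  # label after the park correction
```

With these made-up numbers the same pitch flips from a four-seamer to a two-seamer once the shift is removed, which is the kind of miscount described for Colon's start.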

TYA: Last winter I spent quite a bit of time on a research project trying to determine what constituted “good,” “bad” and “average” horizontal and vertical pitch break for each of the nine pitch types classified by PITCHf/x. My chosen method of attack was to take all 239 pitchers who threw at least 10 innings as a starter in 2010 (to ensure that the sample was large enough), bin them as righties or lefties, log their 2010 Pitch Values per Fangraphs to see who had the “best” and “worst” pitches for each pitch type so I could rank the associated pitch break accordingly, and then log the average seasonal H-break and V-break for every pitcher in the sample using Joe Lefkowitz’s site. What are your thoughts on “good” and “bad” pitch break?

Josh: With regard to “good” and “bad” movement, this stuff is actually pretty complicated. Basically, in my brief explorations, I have found that the relationship between the quality of a fastball (any type: FF, FT, whatever) and its movement is not linear. By this I mean that having a ton of tail (very negative pfx_x for a RH) is not necessarily better than having cut, or little pfx_x. So usually the best combinations of movement for fastballs are the most unusual, i.e., super sinkers (Derek Lowe) or rising, cuttery fastballs (Jered Weaver, David Robertson). Normal fastball movement is a bad thing. Also, pfx_z is generally more important than pfx_x. The lack of importance of movement has been confirmed by others (Chris Moore of Baseball Analysts, Josh Kalk).

To find how important movement is to pitch quality, you need to do a regression analysis, with your response variable being some proxy for the quality of a pitch (run value linear weights) and the explanatory variables being movement and whatever else you want to include. It gets kind of complicated, so there is really no simple “good” or “bad.” Relationships between breaking balls and movement are complicated too.
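A minimal sketch of that kind of regression, in plain Python, might look like the following. Everything here is an assumption for illustration: the five data points are made up to lie on a curve (they are not real pitchers), the response is taken to be runs saved per 100 pitches, and a squared term is included so the fit can pick up a non-linear relationship.

```python
def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Made-up toy data: deviation of pfx_x from average (inches) and
# runs saved per 100 pitches (higher is better for the pitcher).
deviation = [-4.0, -2.0, 0.0, 2.0, 4.0]
run_value = [0.3, -0.3, -0.5, -0.3, 0.3]

# Explanatory variables: intercept, deviation, deviation squared.
design = [[1.0, d, d * d] for d in deviation]
intercept, linear, quadratic = fit_least_squares(design, run_value)
print(intercept, linear, quadratic)
```

In this toy data the linear coefficient comes out at essentially zero while the squared term is positive, so a straight-line fit would report "no effect" even though unusual movement in either direction helps. A real analysis would add pfx_z, velocity and other controls, and would use far more data.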

Lucas: Pretty much what Josh said. It’s particularly important to reiterate that “more” movement on a pitch (in terms of more pfx_x or pfx_z) isn’t NECESSARILY better; it’s more about being further away from average. Jeremy Greenhouse did a piece on that in last year’s THT Annual (Carl Pavano had the most unspectacular movement; I believe David Robertson, as Josh said, ranked well within that system).
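One simple way to express "further away from average" is the straight-line distance between a pitch's movement and the average movement for that pitch type. The sketch below assumes that definition; it is not the method from the THT Annual piece, and the average values and the three pitchers are invented placeholders.

```python
import math


def movement_distance(pfx_x, pfx_z, avg_x, avg_z):
    """Euclidean distance (inches) between a pitch's movement and the average."""
    return math.hypot(pfx_x - avg_x, pfx_z - avg_z)


# Placeholder "average" fastball movement, not a real league figure.
AVG_X, AVG_Z = -5.0, 9.0

# Invented pitchers: (pfx_x, pfx_z) in inches.
pitchers = {"A": (-5.5, 9.5), "B": (-1.0, 12.0), "C": (-10.0, 3.0)}

ranked = sorted(
    pitchers,
    key=lambda name: movement_distance(*pitchers[name], AVG_X, AVG_Z),
    reverse=True,
)
print(ranked)  # most unusual movement first
```

Here pitcher B (rising and cutting) and pitcher C (heavy sink and tail) both rank ahead of pitcher A, whose movement is close to average.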

TYA: After almost a year of judging the relative effectiveness of a given pitch based on individual game pitch type linear weights and never being told otherwise, Josh pointed out the flaws in doing so in a post I wrote about Phil Hughes’ curveball back in July, which caused me to reevaluate the way I looked at pitching analysis. While we don’t need to rehash that discussion here, what are your general feelings about grading a pitch’s effectiveness by individual game pitch type linear weights compared with plate discipline results (i.e., strike%, swing%, whiff%, foul%, in-play%), which, based on your work, seem to be your preferred method of evaluation? As a follow-up, how do you feel about grading a pitch based on seasonal pitch type linear weights (i.e., Fangraphs’ wFB, wCB, wCH, wFB/100, etc.)? Is this data not as relevant to you as pitching analysts since Fangraphs uses Baseball Information Solutions’ (BIS) pitch classification data instead of MLBAM’s PITCHf/x?

Lucas: I think that all pitch-type linear weights are questionable, considering the fact that they are based on actual outcomes. For example, let’s say that pitch X is hit into play on the first pitch of ten plate appearances, and every single time it results in a line out (simply saying this for the sake of argument; ten straight line drive outs is very unlikely). An average out reduces the run expectancy of an inning by roughly 0.28 runs, so in our hypothetical situation, that pitch is worth almost 3 runs better than average even though it’s been hit hard every single time. It’s true that things will even out over the course of a season, but since we already don’t like to use BABIP over one season’s worth of data, it wouldn’t be reasonable to use the same inputs for individual pitch types, especially considering that an individual pitch has a sample size of less than a full season (since most pitchers throw between three and five different pitch types).
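The arithmetic behind the "almost 3 runs" figure can be written out in a few lines. The only input is the approximate run value of an out quoted above; the variable names are just for illustration.

```python
# Approximate change in run expectancy for an average out (figure from the text).
RUN_VALUE_OF_OUT = -0.28

line_outs = 10
change_in_run_expectancy = line_outs * RUN_VALUE_OF_OUT  # from the offense's side
runs_saved = -change_in_run_expectancy                   # credited to the pitch

print(round(runs_saved, 2))  # 2.8
```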

One way to help strip away the luck is by using expected run values, which substitute batted ball types (ground ball, fly ball, line drive, pop up) for actual outcomes. Pitchers have good correlations year-to-year for groundball and flyball rates, so this is a somewhat helpful technique. When Fangraphs introduced their pitch type linear weights in 2009, they said that a defense-independent version was being looked at, but nothing has come of it yet.
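To make the contrast concrete, here is a small sketch that scores the same ten line outs both ways. The per-batted-ball run values in the table are illustrative placeholders chosen for the example, not published figures; a real version would estimate them from league-wide data.

```python
# Placeholder average run values by batted-ball type, from the offense's side.
# Illustrative numbers only.
EXPECTED_RUNS = {"GB": 0.05, "FB": 0.10, "LD": 0.35, "PU": -0.25}


def actual_run_value(outcome_values):
    """Sum the run values of what actually happened on each ball in play."""
    return sum(outcome_values)


def expected_run_value(batted_ball_types, table=EXPECTED_RUNS):
    """Sum the average run value of each batted-ball type, ignoring the result."""
    return sum(table[kind] for kind in batted_ball_types)


# Ten line drives that all happened to be caught.
actual = actual_run_value([-0.28] * 10)
expected = expected_run_value(["LD"] * 10)

print(round(actual, 2), round(expected, 2))
```

Under these assumptions the actual-outcome total makes the pitch look excellent (negative runs for the offense), while the expected total flags it as a pitch that is getting hit hard.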

And to address your question about specific metrics other than linear weights, I like to look at whiffs per swing as a measure of sheer bat-missing ability, balls per pitch to measure control, and location within divided regions of the strike zone to home in a bit more on command (I’ve been using all of these stats recently for the pitching profiles I do at my blog). I think separating into components as opposed to using an all-in-one (linear weights) is usually the better way to go because it can give you more insight on WHY the pitcher is succeeding.
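These component rates are simple to compute from a pitch-by-pitch log. The sketch below assumes each pitch has already been reduced to one of five result labels; the label names and the sample counts are made up for the example.

```python
from collections import Counter

# Results that count as a swing (hypothetical label names).
SWING_RESULTS = {"whiff", "foul", "in_play"}


def plate_discipline(results):
    """Return whiffs per swing, whiffs per pitch and balls per pitch."""
    counts = Counter(results)
    pitches = len(results)
    swings = sum(counts[r] for r in SWING_RESULTS)
    return {
        "whiffs_per_swing": counts["whiff"] / swings if swings else 0.0,
        "whiffs_per_pitch": counts["whiff"] / pitches if pitches else 0.0,
        "balls_per_pitch": counts["ball"] / pitches if pitches else 0.0,
    }


# Invented 25-pitch sample for one pitch type.
sample = (
    ["ball"] * 8 + ["called_strike"] * 5 + ["whiff"] * 4
    + ["foul"] * 3 + ["in_play"] * 5
)
rates = plate_discipline(sample)
print(rates)
```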

Josh: For individual games, pitch type linear weights do not give good information if you are trying to tell how good the pitch was. Why? Say you want to judge CC’s slider, based only on what you have seen in one game. If he throws 25 sliders, then that’s a really small sample and BABIP can have a huge effect. We apply caution to things like 25 at-bat samples, so we should do the same with pitches.
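A quick way to see how noisy such a sample is: treat each ball in play as an independent trial and compute the standard error of the observed BABIP. That independence is a simplifying assumption, and the figure of 8 balls in play out of 25 sliders is invented for the example.

```python
import math


def babip_standard_error(true_babip, balls_in_play):
    """Standard error of an observed proportion under a binomial model."""
    return math.sqrt(true_babip * (1 - true_babip) / balls_in_play)


one_game = babip_standard_error(0.300, 8)    # e.g. 8 of 25 sliders put in play
big_sample = babip_standard_error(0.300, 500)

print(round(one_game, 3), round(big_sample, 3))
```

With only 8 balls in play, one standard error is about 160 points of BABIP, so almost any single-game result is consistent with an average pitch.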

Instead I prefer whiffs/pitch on a game-by-game basis. That or whiffs/swing is going to be the best kind of information that you can get, in terms of stability on a game-to-game basis. Another stat that is useful is GB%, because that’s also fairly stable and helps to describe the effectiveness of a pitch designed to get grounders. I don’t really look at any other result-based statistic, but I may look at heatmaps to get a feel for command. Occasionally I’ll look at swing rate for breaking balls, because a high rate may indicate that the pitch is fooling the batters, who generally do not intend to swing at breaking balls.
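Swing rate and GB% can be broken out by pitch type for a single game along the same lines. The record layout here (pitch type, whether the batter swung, and the batted-ball type if the ball was put in play) is an assumption for the sketch, and the six pitches are invented.

```python
from collections import defaultdict


def swing_and_gb_rates(pitches):
    """pitches: iterable of (pitch_type, swung, batted_ball), where
    batted_ball is None unless the pitch was put in play."""
    totals = defaultdict(lambda: {"n": 0, "swings": 0, "bip": 0, "gb": 0})
    for pitch_type, swung, batted_ball in pitches:
        t = totals[pitch_type]
        t["n"] += 1
        t["swings"] += int(swung)
        if batted_ball is not None:
            t["bip"] += 1
            t["gb"] += int(batted_ball == "GB")
    return {
        p: {
            "swing_rate": t["swings"] / t["n"],
            # GB% is undefined when nothing was put in play.
            "gb_rate": t["gb"] / t["bip"] if t["bip"] else None,
        }
        for p, t in totals.items()
    }


game = [
    ("SL", True, None), ("SL", False, None), ("SL", True, "GB"),
    ("SL", True, "FB"), ("FT", True, "GB"), ("FT", False, None),
]
game_rates = swing_and_gb_rates(game)
print(game_rates)
```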

On a seasonal level, linear weight pitch values do start to have more merit, but they still are not that reliable, especially if they are from Fangraphs, which uses the previously discussed BIS classifications in its run values. A better method is an expected run value metric, which is usually done tERA style, and with your own classifications. This unfortunately still has to deal with batted ball classifications, which have their own issues, but expected run value is going to be a better measure than actual run value in most cases.

TYA: Last week I took a look at whether CC Sabathia’s gotten a bit too slider-happy of late, and in compiling my research I noticed conflicting data between Fangraphs and TexasLeaguers, with Fangraphs claiming CC to have thrown a curve 5.7% of the time this season and T-Leaguers at 0.8% of the time. I asked Josh about this on Twitter, and he mentioned to me that it’s possible CC doesn’t actually throw a curveball at all, and that the pitch he throws that appears to be a curveball may actually be a slower-than-average slider. As someone who’s watched Sabathia pitch many times over the last few seasons and seen him throw what I thought appeared to be a curveball—and what I’m pretty certain the announcers as well as CC himself have frequently referred to as a curve—I found this surprising. Given the inconsistencies between the two systems, how do we know which one to “trust” more? How often do you see a pitcher’s pitches being misclassified? It seems like for as long as Fangraphs uses a different system than the PITCHf/x sites we’ll have numerous problems with pitch classification issues, but, as we saw with the Bartolo question earlier, it’s not as if PITCHf/x is always right either. Do you think we’ll have one “go-to” system with consistently reliable information some time soon, or is something like that a ways off?

Lucas: Before diving into this question, it’s good to outline the differences between the two: the data from Baseball Info Solutions comes from John Dewan’s video scouts, who watch the games on television and then chart pitches from there. PITCHf/x data comes from ballpark cameras that pick up release speed, end speed, horizontal/vertical movement, horizontal/vertical plate location, and horizontal/vertical release points (defined as the point at which the ball is first detected by the PITCHf/x cameras). The PITCHf/x data are more scientific, but that doesn’t necessarily mean that you should use the PITCHf/x pitch types that you see on Fangraphs, Texas Leaguers, Brooks, etc.

The issue is that these sites take whatever the Gameday algorithm gives them and they don’t re-ID in order to fix the algorithm’s issues. Concerning the Yankees, the one pitch that the system probably mixes up the most is Ivan Nova’s slider, which he’s used as his primary offspeed pitch since his return from the minor leagues. It gets less horizontal cut than the typical slider does, and Gameday often thinks it’s a changeup or a cutter. The BIS guys recognize it as a slider, so it’s labeled that way on Fangraphs. Also of note is that there is some crossover between BIS and PITCHf/x these days, as the video scouts consult PITCHf/x pitch movement/velocity in order to make more accurate IDs.

So, I’d say that BIS is pretty accurate and can be trusted for the most part. The one unfortunate thing is that they don’t differentiate between four-seam and two-seam fastballs, a distinction that is often necessary because the two pitches tend to garner different results. For example, A.J. Burnett has generated a 32% groundball rate with his four-seamer since 2008, but a 60% rate with his two-seamer. All in all, though, I’d lean toward BIS right now, since it’s hard to find re-ID’d Gameday PITCHf/x data from the big companies.
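To see what lumping the two fastballs together does to a number like that, consider a weighted average of the two groundball rates. The 32% and 60% rates are the ones quoted above; the balls-in-play counts are invented, since the actual mix isn't given here.

```python
def blended_rate(groups):
    """groups: list of (balls_in_play, groundball_rate) pairs."""
    total = sum(n for n, _ in groups)
    return sum(n * rate for n, rate in groups) / total


# Hypothetical equal mix of four-seamers and two-seamers put in play.
combined = blended_rate([(100, 0.32), (100, 0.60)])
print(round(combined, 2))  # 0.46
```

A single "fastball" groundball rate of 46% describes neither pitch, and it would drift with the pitcher's mix even if both underlying pitches stayed exactly the same.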

Josh: I’m not sure how comparable BIS and MLBAM classifications are because BIS does not differentiate between fastball types. FT/FF/FA are all grouped together. And to clarify, Fangraphs does use both BIS and PITCHf/x data. On the player pages in the season stats section under “pitch types” the data is from BIS, and then they also have the PITCHf/x tab.

Will there ever be a perfect classification system? No, unless we see what grip the pitcher is using every time. Perhaps eventually a free automated classification system will match the accuracy of manual classification, but we’re not that close at this point with MLBAM classification. I think right now MLBAM uses what’s called an artificial neural network for classification, and considering that they do it without the benefit of fixing the miscalibrated data, the system does a pretty great job for what it is.

TYA: Thanks for your time, guys. For more on PITCHf/x, Josh recommends this PITCHf/x primer written by Josh Smolow; Josh Kalk’s “Anatomy of a Pitch: The Curveball”; and Josh’s own “Anatomy of a Pitch: The Four-Seam Fastball.” If anyone has anything they’d like to ask Lucas or Josh, please feel free to do so in the comments.

4 thoughts on “Ask the Experts: Lucas Apostoleris and Josh Weinstock answer our PITCHf/x questions”

  1. Mind blown. Huge help for me going forward. Thanks guys!

  2. smurfy

    First off, let me say what a complicated world you live in! Whew, I am very interested in pitching, but you’d have to pay me to get into all of that, and to face measurement weaknesses.

    It amazes me, the ability to measure, with cameras, meters, and the facilities or algorithms created to define the features. Not surprising that there are flaws, and I wish you good luck in healing them. What is especially surprising is the resources devoted to this cause: video scouts, equipment and analysts. MLB must be taking from the development kitties to pay for this, considering the economic value of good pitching.

    I will benefit from the primers you link, Larry, and will follow other leads, casually, looking for trustable measures for answers to vexing quandaries, like why are they hitting AJ’s fb so much. I found your pitch type linear scores surprising and useful, but now I hear that 4-seam and 2-seam may be averaged together, producing poor averages for movement and speed, and confusion as to the value of either. At least, I am warned to be careful.

    Re sliders: Al Leiter reminded me again last night that CC throws his slow or hard, 82 or 88, probably sweeping or fast breaking, more like a cutter. Another two pitch types that will not benefit from averaging across. Ivan’s is much different, relying on deception ala cutter or CC fast slider, with that late sudden drop. (Although I think I’ve also seen him experiment with a backdoor one to lefties, which is flatter.) Plenty of complexity within pitch type, eh?

    Thanks, Larry, and to Josh and Lucas for their work.

  3. [...] given what we know about the limitations of PITCHf/x, there are likely some classification issues as it is, but I can only go on the data we have [...]

  4. [...] All of the data in the tables you’ll see below is from the 2011 season, and should be mostly self-explanatory. I’ll be the first to admit that a one-year sample is less-than-ideal, but I tried to run a three-year search and TexasLeaguers.com didn’t take to that request too kindly. The columns headed by “w” and “w/100” are the pitch type’s linear weights (representing the total runs that a pitcher has saved using that pitch) and linear weights per 100 pitches (the amount of runs that pitcher saved with their fastball over the course of 100 fastballs thrown), which provide some level of insight into a pitch’s relative level of effectiveness but should not be analyzed in isolation, as they are subject to the whims of both sequencing and BABIP. [...]
