What Makes a Groundball?

Ivan Nova has a secret. His fastball acts like a sinker – lots of groundballs, big platoon split – but the pitch does not look like a sinker. By that I mean his fastball is rather bizarre in terms of pitch f/x data; it doesn’t really move like any kind of traditional fastball at all. I began by sifting through his data to try to find out what makes his fastball disguise itself as a sinker, and I came out with an exploration of what makes any fastball result in a groundball.

I began my expedition by creating a sample from pitch f/x data. If you are not familiar, pitch f/x data comes from a system that is owned by Sportvision and is operational in all major league stadiums and some minor league ones. It works by using three cameras that take images of the pitched ball to determine its position in space, and then uses physics equations to extrapolate movement, break, velocity, and final location. It is the same system used in ESPN’s “K-zone” and Gameday.

I randomly selected 5000 fastballs of any type (FF, FA, FT) from right-handers, excluding side-armers. I could have focused on each type of fastball individually, but that can result in bias due to pitch f/x classification issues. I also selected only pitches that were put into play, including home runs, so my sample is limited to pitches that were both swung at and put into play. This is necessary because we are interested in what quantifiable pitch attributes induce groundballs, not what pitches batters like to swing at. Below I have broken down the data into the most important variables.
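The selection rules above can be sketched in a few lines of Python. This is not the code used for this post (the analysis was done in R), and the field names `pitch_type`, `p_throws`, `in_play`, and `sidearm` are hypothetical stand-ins for whatever the database actually calls them.

```python
import random

FASTBALL_TYPES = {"FF", "FA", "FT"}  # fastball codes named in the text


def select_sample(pitches, n=5000, seed=0):
    """Keep in-play fastballs from right-handed, non-sidearm pitchers, then sample n.

    `pitches` is a list of dicts; the keys used here are assumed names.
    """
    eligible = [
        p for p in pitches
        if p["pitch_type"] in FASTBALL_TYPES
        and p["p_throws"] == "R"
        and p["in_play"]          # balls put into play, home runs included
        and not p["sidearm"]
    ]
    rng = random.Random(seed)     # fixed seed so the sample is reproducible
    # Sample without replacement; if fewer than n pitches qualify, keep them all.
    return rng.sample(eligible, min(n, len(eligible)))
```

Filtering before sampling matters here: drawing 5000 pitches first and filtering afterwards would leave a much smaller sample of unknown size.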

Does Pitch Location Matter?

For those of you who like heat mappery:

This graph shows groundball rate by pitch location, where red represents a low groundball rate and blue a high groundball rate. The plot is from the catcher’s perspective, so the right side of the graph is near left-handed batters and the left side is near right-handed batters. The dotted box indicates the strikezone. Predicted groundball rates come from a generalized additive model (with only location as the variable) with cross-validated smoothing parameters and a specification for binomial data. Both smoothed terms (horizontal and vertical location) were significant at the 99.9% level, with vertical location having much higher significance.

*I used Millsy’s awesome website as a reference for the code. I also got the idea to use the mgcv package in R from him. Check out his site if you use R or want to learn.
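For readers who want to see the idea behind a location heat map without fitting a GAM, here is a much cruder stand-in in Python: it just averages groundball outcomes inside square location bins, with no smoothing at all. It is not the mgcv model described above. The tuple layout and the half-foot bin size are assumptions made for illustration; `px` and `pz` are taken to be horizontal and vertical location in feet.

```python
from collections import defaultdict


def binned_gb_rate(pitches, bin_size=0.5, min_count=1):
    """Groundball rate per (horizontal, vertical) location bin.

    Each pitch is a (px, pz, is_groundball) tuple, with px and pz in feet.
    Bins with fewer than min_count balls in play are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [groundballs, balls in play]
    for px, pz, is_gb in pitches:
        key = (int(px // bin_size), int(pz // bin_size))
        counts[key][0] += 1 if is_gb else 0
        counts[key][1] += 1
    return {
        key: gb / total
        for key, (gb, total) in counts.items()
        if total >= min_count
    }
```

Raw bins get noisy wherever few pitches land, which is exactly the problem the smoothed model is meant to handle; raising `min_count` hides the emptiest bins but does not fix it.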

Turns out that pitches low in the zone become grounders more often than other pitches. Duh. The graph also appears almost symmetric (horizontally), suggesting that horizontal location plays a role here too. The problem with this graph is that it’s really the product of two separate distributions – left-handed batters and right-handed batters. The following graphs demonstrate this point:

 

The graph shows groundball rate by horizontal location for both right-handed batters (blue) and left-handed batters (red). Gray bands indicate confidence. The dotted lines represent the horizontal borders of the strikezone.

This graph confirms my earlier suspicion that the symmetric appearance of the heat-map was due to it being the product of two different distributions. As you can see, righties hit the most grounders on pitches middle to outside and on pitches way inside. On pitches middle-in, they loft the ball in the air. Lefties hit lots of groundballs on pitches away from them, but not on pitches in. One might expect the groundball rates of lefties and righties to be mirror images (as in flipped across the y-axis), but that’s not the case here. This can be explained by the fact that I only used right-handed pitchers in my sample.

Graph shows groundball rate by vertical location for both right-handed batters (blue) and left-handed batters (red). Gray bands indicate confidence. The dotted lines represent the vertical borders of the strikezone. Axes are flipped for aesthetic purposes. This graph and the above graph use loess to create the trend lines, mainly for ease of plotting. n=2386 for left-handed batters and n=2614 for right-handed batters.

Unsurprisingly, pitches that were down were pounded into the ground. Again of interest is the different behavior of righties and lefties. Because there are only right-handed pitchers in the sample, it’s to be expected that righties hit more grounders throughout this data. Strangely, there is a large difference between the two types of batters on pitches in the bottom half of the zone. This can probably be explained by a difference in pitch selection to the two types of batters in this area of the strikezone.

So yeah, location matters. Vertical location turned out to be more important than horizontal location, which is not surprising. However, this does not account for a bias in pitch selection. The distribution of pitch types is not constant throughout the strikezone. This is important because a higher proportion of sinking fastballs are located in the bottom of the zone, while four-seam fastballs are located more up in the zone. I have not accounted for this bias in pitch distribution, so we need to be cautious about concluding how important vertical location is.

Does Velocity Matter?

Graph shows groundball rate by velocity for both right-handed batters (blue) and left-handed batters (red). Gray bands indicate confidence.

Most major-league fastballs fall within the 88-95 mph range, which is confirmed by the size of the error bands (smaller when the sample is larger). In this range groundball rate seems pretty constant, perhaps with a slight positive trend. The effect of velocity is also likely underestimated, because a higher proportion of slow fastballs (< 90 mph) are going to be of the sinking variety than their faster counterparts. But that’s boring. Of interest here are the extremes. For lefties, the groundball rate on slow pitches is very high, probably because all of these pitches are misclassified cutters or changeups. When we look at the other end of the range, we see some odd behavior; elite velocity results in worm-burners for righties, but flyballs for lefties. A possible explanation is that both types of batters are late, just that being late manifests itself in different results. This is part of what makes truly elite velocity (high-nineties) coveted. It is also possible that there is a difference in pitch distribution on high-nineties fastballs to righties and lefties, but that does not seem like much of an issue; once you throw that hard, you probably don’t need to have a different approach against lefties and righties (with the fastball). We should also not be too hasty with these conclusions about pitches with extreme velocity, because the sample gets kind of small at these points.
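To see why the bands fan out at the extremes, here is a small Python sketch of the standard error of a proportion. It uses a plain normal approximation, which is an assumption of mine and not how the loess bands in the graph were computed, and the counts in the example are made up.

```python
import math


def gb_rate_with_band(groundballs, balls_in_play, z=1.96):
    """Groundball rate with an approximate 95% normal-approximation interval."""
    p = groundballs / balls_in_play
    se = math.sqrt(p * (1 - p) / balls_in_play)  # standard error of a proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


# Same 45% rate, very different certainty (illustrative counts only):
print(gb_rate_with_band(450, 1000))  # a common velocity: narrow band
print(gb_rate_with_band(9, 20))      # an extreme velocity: wide band
```

With 1000 balls in play the band is about 3 points either side of the rate; with 20 it is more than 20 points either side, which is why the tails of the velocity curve deserve skepticism.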

Does Movement Matter?

Graph shows groundball rate by horizontal movement (pfx_x) for both right-handed batters (blue) and left-handed batters (red). Gray bands indicate confidence. Movement is defined relative to a ball thrown without spin. This is similar to what scouts refer to as “tail.” Since all these pitches are thrown by right-handers, nearly all of these fastballs have negative horizontal displacement. Most of the pitches with positive horizontal displacement are probably misclassified cutters. Thankfully there aren’t too many of those.

For righties, groundball rate spikes at about -7.5 inches. This is because this is where the line between four-seam and two-seam is blurred; naturally, two-seamers have more negative horizontal tail than four-seamers. So what we are seeing here is likely an example of “correlation, not causation.” Somewhat of a strange phenomenon is that pitches that are [likely] misclassified cutters (positive on x-axis) cause many more groundballs for lefties than righties.

Graph shows groundball rate by vertical movement (pfx_z) for both right-handed batters (blue) and left-handed batters (red). Gray bands indicate confidence. Movement is defined relative to a ball thrown without spin. This is similar to what scouts refer to as “sink.” You may be surprised that most of these pitches have positive vertical movement. This is because even sinkers are still thrown with backspin. The average sinker has about 5 inches of vertical “rise” and the average four-seam has about 10 inches of “rise”. Again, rise here is a relative term; it doesn’t actually mean the force generated by the backspin of the pitch is stronger than gravity.
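A quick back-of-the-envelope calculation shows why “rise” never beats gravity. The Python sketch below assumes a flight time of 0.4 seconds, which is my own round number and not something taken from the data, and it ignores everything except gravity and the spin-induced movement.

```python
G_IN_PER_S2 = 386.09  # gravitational acceleration in inches per second squared


def net_drop_inches(rise_inches, flight_time_s=0.4):
    """Drop below the initial line of flight after subtracting spin-induced 'rise'.

    flight_time_s is an assumed round figure, not a measured value.
    """
    gravity_drop = 0.5 * G_IN_PER_S2 * flight_time_s ** 2
    return gravity_drop - rise_inches


for label, rise in [("sinker", 5.0), ("four-seam", 10.0)]:
    print(f"{label}: {net_drop_inches(rise):.1f} in of net drop")
```

Under that assumption gravity alone pulls the ball down about 31 inches, so a four-seamer with 10 inches of “rise” still ends up roughly 21 inches below its initial line of flight, and a sinker about 5 inches lower than that.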

Unsurprisingly, pitches with more sink get more groundballs. Of interest is that the behavior is very similar for both righties and lefties, which we haven’t seen so far. Again, we have to be cautious with how much importance we place on vertical movement, due to the same problem we had with vertical location; the pitches with more sink are thrown lower in the zone. The reason the pitches with negative vertical movement get few groundballs is that they are probably misclassifications.

Bringing it all together:

Using the same method described in the production of the heatmap (a GAM with smoothing and a logit link), I created a model with all of these variables put together: horizontal movement and location, vertical movement and location, and velocity. I did not include anything about batter handedness. Horizontal movement had the highest p-value, being significant at the 95% level. All other smoothed terms were significant at the 99.9% level. Velocity and horizontal location had about the same significance. Far more significant than anything else were vertical location (pz) and vertical movement (pfx_z), which is unsurprising. Because of the nature of the model, there is no concrete mathematical function to share like what we would get with linear regression. Additionally, a model with only pitch location as the variable(s) did a little better than a model with only movement as the variable(s). It seems that we can conclude vertical movement and vertical location are the most important factors.
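Even without a closed-form function, the shape of the model is simple: each smoothed term contributes some amount on the log-odds scale, the contributions are added to an intercept, and the logit link turns the total into a probability. The Python sketch below shows only that last step. The intercept and the per-term contributions are invented numbers for illustration; they are not output from the fitted model.

```python
import math


def gb_probability(intercept, term_contributions):
    """Turn an additive log-odds predictor into a groundball probability."""
    log_odds = intercept + sum(term_contributions)
    return 1.0 / (1.0 + math.exp(-log_odds))  # inverse of the logit link


# Invented contributions on the log-odds scale, one per smoothed term, in the
# order: horizontal location, vertical location, horizontal movement,
# vertical movement, velocity.
low_sinker = gb_probability(-0.2, [0.0, 0.6, 0.05, 0.4, 0.0])
high_four_seam = gb_probability(-0.2, [0.0, -0.5, 0.0, -0.3, 0.05])
print(round(low_sinker, 3), round(high_four_seam, 3))
```

Because the terms simply add, a pitch can make up for a poor value on one variable with a good value on another, which is part of why vertical location and vertical movement are hard to separate.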

Feel free to skip this part and go to the “finishing thoughts” section:

Now it is time to test out the model with some specific pitchers. I will exclude cutters. My database is updated through May 27th and I will use only 2011 data (except for one case):

Known sinker-ballers:

Derek Lowe:

Predicted GB% of 64% for only his sinkers (pretty much the only fastball he throws)

Actual of 61.9% on only sinkers

Trevor Cahill:

Predicted GB% of 55.5% for all fastballs (sinkers and four-seam)

Actual of 53.1% for all fastballs

Fausto Carmona:

Predicted GB% of 57.7% for all fastballs (sinkers and four-seam)

Actual of 63.8% for all fastballs

I don’t have the computing power to run this model on a million pitchers, but it did a good job with these extreme sinker-baller guys: within about 2.5 points for Lowe and Cahill, and about 6 points low for Carmona. I’m super pleased with these results so far, though obviously I chose the examples.

Known flyball pitchers:

Jered Weaver:

Predicted GB% of 32.1% for all fastballs

Actual of 27.5% for all fastballs

Colby Lewis:

Predicted GB% of 31.4% for all fastballs

Actual of 27.9% for all fastballs

Phil Hughes (2010):

Predicted GB% of 32.9% for all fastballs

Actual of 30.0% for all fastballs

Ok, the model also handles extreme flyball pitchers well, though it predicted 3 to 5 points too many groundballs for all three.

Non-extreme pitchers:

A.J. Burnett:

Predicted GB% of 40.9%

Actual of 29.1%

James Shields:

Predicted GB% of 38.9% for all fastballs

Actual of 32.9% for all fastballs

Dan Haren:

Predicted GB% of 34.8% for all fastballs

Actual of 33.7% for all fastballs

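The model did not do as well for these three pitchers that I selected: it missed Burnett by almost 12 points and Shields by 6, though it came within about a point on Haren.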

Others:

Felix:

Predicted GB% of 50.8% for all fastballs

Actual of 38.8%

Ivan Nova:

Predicted GB% of 43.3% for all fastballs

Actual of 54.1%
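To put a rough number on how the model did across all eleven pitchers, the predicted and actual figures listed above can be tallied in a few lines of Python. The numbers are copied from the lists above; the summary statistics (mean absolute error and a count of overpredictions) are my own choice of yardstick and say nothing about pitchers outside this hand-picked group.

```python
# (pitcher, predicted GB%, actual GB%) as listed above
results = [
    ("Derek Lowe", 64.0, 61.9),
    ("Trevor Cahill", 55.5, 53.1),
    ("Fausto Carmona", 57.7, 63.8),
    ("Jered Weaver", 32.1, 27.5),
    ("Colby Lewis", 31.4, 27.9),
    ("Phil Hughes (2010)", 32.9, 30.0),
    ("A.J. Burnett", 40.9, 29.1),
    ("James Shields", 38.9, 32.9),
    ("Dan Haren", 34.8, 33.7),
    ("Felix", 50.8, 38.8),
    ("Ivan Nova", 43.3, 54.1),
]

# Positive error means the model predicted more groundballs than actually occurred.
errors = {name: predicted - actual for name, predicted, actual in results}
mean_abs_error = sum(abs(e) for e in errors.values()) / len(errors)
overpredicted = [name for name, e in errors.items() if e > 0]

print(f"Mean absolute error: {mean_abs_error:.1f} points")
print(f"Overpredicted for {len(overpredicted)} of {len(errors)} pitchers")
```

On this group the average miss is a little under 6 points of groundball rate, and the model predicts too many groundballs for nine of the eleven. The two it underestimates are Carmona and Nova.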

Finishing Thoughts:

Judged by admittedly unscientific methods, the model seems to perform pretty well overall. Again, these predictions were made using only velocity, movement, location, and the knowledge that the batter had put the ball into play. This suggests that a great deal of batted ball results can be explained by the actual pitch. What we don’t know is the size of the effect of sequencing (previous pitches) and deception. I initially attempted to include data about the previous pitch thrown (as suggested by our own Will Moller) but that quickly became more complicated than originally anticipated. We have looked into what makes a groundball pretty thoroughly, but it seems that this exploration also carries with it a message about both the power and limitations of pitch f/x data.

The implications of information like this are also relevant. Theoretically, we can use models like this to get around small sample sizes. Assuming pitch f/x data normalizes faster than traditional statistics, we can use models based on pitch f/x to get faster reads on prospects and other notable pitchers. We can also use this information to confirm common sense, which is what mainly happened here. Perhaps one day we will be able to construct a pitch f/x ERA…

-

*Pitch f/x data from MLBAM through Darrel Zimmerman’s pbp2 database
*http://princeofslides.blogspot.com/ – used as reference for R code/logistic regression

16 thoughts on “What Makes a Groundball?”

  1. I'm totally speechless Josh. This is flat out awesome!

  2. jerkblog

    This kicks ass and articles like this are why this blog is one of my first checks every day

  3. Very well done. I'm incredibly impressed. I feel like I'm going to be coming back to this post a lot as a reference.

  4. BrienJackson

    On the one hand Josh, I totally love you and never want you to leave me. On the other hand; good gravy do you make me feel like a failure at life.

  5. THE ENTIRE INTERNET MUST KNOW ABOUT THIS. PLEASE tell me you posted it to FanGraphs Community Research.

  6. Josh, this is outstanding! Excellent job.

  7. I am so, so obsolete. Great job, Josh!

    "Somewhat of a strange phenomenon is that pitches that are [likely] misclassified cutters (positive on x-axis) cause many more groundballs for lefties than righties."

    Actually, this makes a lot of sense to me. It's clear from the plot that a pitch that moves towards the handle of the bat causes grounders. For a right-handed pitcher to induce this effect he'd need to throw a tailing four-seamer or two-seamer to right-handed hitters, or a cutter or slider to left-handed hitters.

    What really could be construed as weird is how a zero-movement fastball induces more grounders from lefty hitters than a negative-movement fastball, but I believe there's a reason for that, too. Righty pitchers throw at an angle moving towards a lefty hitter, which can create a jamming situation. A negative movement effect would "straighten" the pitch, keep it over the plate, and move it away from the handle. At the same time, zero-movement fastballs move away from the handle of a righty's bat, which could account for the dip in ground ball rate seen there.

    • Many of the zero-movement fastballs are misclassified cutters. That may not make sense, but if you think about it, no pitchers throw 100% over the top (except maybe Josh Collmenter), so there's pretty much always going to be some tail on fastballs, so very few fastballs have 0 tail. The 0 movement on cutters is due to the wrist pronation and arm-angle canceling each other out, or something like that. The pitch does break into lefties though.

      • Right, I know; I'm familiar with the biomechanics and even more so with the fluid dynamics, so we're on the same page with that. The point I was trying to get at was that the rise in ground-ball rate for left-handed hitters as a pitch's lateral movement increases from negative to zero makes sense to me, and that I wouldn't consider it a strange phenomenon. I bet that if you ran your analysis with all pitch types you'd see similar trends and a direct correlation between ground ball rate and how inside the pitch is.

  9. danmerqury

    This is unbelievable work. I'm floored.

  10. Jeff

    It would be interesting to see if you could use this to predict how minor league pitches will hold up in the majors

  11. Anna McDonald

    I'm closing up shop — grabbing the vodka — and leaving all the baseball writing to you.

  12. what the eff Josh? I officially do not believe you are only the age you are. Totally freaking brilliant! Ladies and gentlemen, I think we should all start referring to Josh as Dave Cameron, Jr. Like Brien, I feel completely inadequate.

  13. Amy

    Would you be willing to do the same study for LHP against the same batter data? I would love to know if the data is consistent with the inverse/symmetric dependent on the pitcher.

    • Amy

      (symmetric dependent on the throwing hand of the pitcher**)
