A Groundball Problem

Have pitchers changed their approach to Jeter?

It would be hasty to attribute all of the variation in Jeter’s performance to him alone, so it’s necessary to determine whether pitchers are simply throwing to him differently. A large part of pitching is pitch selection. Thankfully, there are a few great resources available that measure pitch selection. One such resource is the Baseball Info Solutions (BIS) data available at Fangraphs. An advantage of BIS data is that it dates back to 2002. Unfortunately, the pitch classifications are performed by (get this) humans. Gross! In all seriousness, this is not a good classification system, because the people who manually classify the pitches do so by watching the same video feeds that we as fans use to watch the games on tv. The problem here is that most stadiums use an offset camera for tv broadcasts. This distorts the appearance of pitches and also compromises the ability to judge depth. For example, when you watch a C.C. Sabathia start, his slider looks like it has this ridiculous sweeping motion. It doesn’t. You may also notice that basically all left-handed breaking balls are distorted in this manner. This makes the pitch classifications inaccurate to the point where they are not very useful in situations like this.

Now that we have shunned BIS data, we turn to the almighty pitch f/x data. Pitch f/x data comes from a system that is owned by Sportvision and is operational in all major league stadiums and some minor league ones. It works by using three cameras that take images of the pitched ball to determine its position in space, and then uses physics equations to extrapolate movement, break, velocity, and final location. This pitch f/x data is also used to generate the fairly accurate pitch classifications shown in Gameday. Unfortunately, these Gameday (MLBAM) classifications are not the same every year, because the classification algorithm is sometimes tweaked to improve the system. While the improvements are welcome, they compromise our ability to compare pitch selection across years. For example, according to MLBAM classifications, the percentage of two-seamers jumped over 10% from 2009 to 2010. This is obviously because the algorithm was tweaked, not because pitchers actually threw a drastically higher number of two-seamers. Because of the inconsistency of these classifications, I have attempted to create my own classification system. Using k-means clustering, I have been able to cluster Jeter’s data into 5 pitches: Four-seam (FF), Two-seam (FT), Curveball (CU), Slider (SL), and Changeup (CH). I originally wanted to include cutters, but that messed everything up, so I had to leave the number of clusters at 5. Cutters were absorbed primarily into the FF cluster. I originally used velocity, movement, and pitch location to create the clusters. It turns out that pitch classification is difficult, so I had to include the original MLBAM classifications to help separate out the clusters. I also only used pitches from right handers for this data (and the entire post). I am actually quite pleased with the results; the “centroids” of these clusters very closely approximate their MLBAM counterparts. In other words, the characteristics of my classifications closely match those of the default pitch f/x classifications. I can provide more information about the clusters in the comments if necessary.

Here is the change in pitch distribution from 2009 to 2010 using my classifications:

FF = four-seam. CU = curveball. CH = changeup. SL = slider. FT = two-seam. Shown here is the difference in the proportions of these pitches from 2009 to 2010. Above zero means that the pitch was thrown more to Derek in 2010 than 2009.

As you can see in the graph, pitchers did throw Jeter more two-seamers than in 2009. However, they also threw him fewer sliders than before, which mitigates the effect. It’s important to note that this is not a large difference, merely about 8%. Default classifications found an increase of 15%. It’s also important to keep in mind that I had to use the original classifications in my clustering analysis, so it’s likely that the actual difference is less than the 8% that I found. However, this analysis does suggest that he saw more grounder-inducing pitches in 2010 than before.
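For readers curious about the mechanics of the clustering step, here is a minimal sketch of plain k-means on pitch features. It is not the code behind this post: the pitches are synthetic, the three features (velocity, horizontal movement, vertical movement), the three pitch types, and the trick of starting each centroid at one known example of each type are all assumptions made for illustration.

```python
import random

def standardize(rows):
    """Scale each feature to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]
    return scaled, means, sds

def kmeans(points, centroids, iterations=50):
    """Lloyd's algorithm: assign each point to its nearest centroid, then recompute means."""
    labels = []
    for _ in range(iterations):
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

# Synthetic pitches: (velocity in mph, horizontal movement in inches, vertical movement in inches).
# These centers are invented for the example, not measured values.
rng = random.Random(0)
TRUE_CENTERS = {"FF": (92.0, -4.0, 9.0), "SL": (84.0, 3.0, 1.0), "CU": (77.0, 5.0, -6.0)}
pitches = []
for name, center in TRUE_CENTERS.items():
    for _ in range(100):
        pitches.append([rng.gauss(c, 1.0) for c in center])

scaled, means, sds = standardize(pitches)
# Assumption of this sketch: seed each centroid with one example of each pitch type.
start = [list(scaled[0]), list(scaled[100]), list(scaled[200])]
labels, centroids = kmeans(scaled, start)

# Report centroids back in the original units.
for k, c in enumerate(centroids):
    print(k, [round(v * s + m, 1) for v, s, m in zip(c, sds, means)])
```

Standardizing first keeps velocity (tens of mph) from swamping movement (a few inches) in the distance calculation. Real pitch data is far messier than this toy set, which is part of why the original classifications were needed to help separate the clusters.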

Also important to consider is where Derek was being pitched:

This graph is from the catcher’s perspective, so the right side is near left-handed batters and the left side is near right-handed batters (like Jeter). Blue represents locations where Jeter was thrown more pitches in 2010 than 2009, and red represents the locations where Jeter was thrown fewer pitches in 2010 than 2009. The box is the strikezone. Again, this is only for pitches thrown by right handers because I had to cut out left handers to perform the k-means analysis.

As you can see in the graph, Jeter was pitched down and in more often than before. This finding was unexpected, but some caution needs to be used here. This graph may indicate more about Jeter than about the pitchers. Consider the fact that Jeter saw significantly fewer pitches per plate appearance in 2010 than in 2009. This can be partly explained by the observation that he was swinging earlier in the count than in 2009. It’s possible that pitchers generally threw to the outside part of the plate when the count became deep, but in 2010 Jeter was in fewer deep counts than in 2009, so pitchers threw a lower proportion of pitches to the outside part of the plate. Additionally, these differences in pitch densities are not really that big. What we need to consider is whether this new set of locations is more conducive to groundballs than the 2009 set. Common sense would tell us that more pitches down and in (and fewer up and away) means more groundballs. However, the data tells us that Jeter actually hits a lot of groundballs on pitches that are thrown to either side of the plate. To provide a more conclusive answer, I created a model (loess) that predicts groundball rate by pitch location using all of the data from 2009-2011. I then used the model to predict a groundball rate for the 2009 and 2010 seasons. My model predicted a groundball rate of 67% for 2009 pitch locations and a rate of 68% for 2010 pitch locations. This suggests that yes, these pitch locations were more likely to induce groundballs than before, but not tremendously so.
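The idea of fitting one location model on pooled data and then scoring each season's pitch locations can be sketched roughly as follows. To stay dependency-free, this uses a Gaussian-kernel local average as a simple stand-in for loess, so it is not the model described above. The data is synthetic, and the function names, the bandwidth, and the made-up rule that low pitches produce more grounders are all assumptions.

```python
import math
import random

def predict_gb(x, z, pooled, bandwidth=0.5):
    """Kernel-weighted ground ball rate near location (x, z).

    pooled is a list of (x, z, is_groundball) tuples; nearby batted balls get more weight.
    """
    num = den = 0.0
    for bx, bz, gb in pooled:
        w = math.exp(-((x - bx) ** 2 + (z - bz) ** 2) / (2 * bandwidth ** 2))
        num += w * gb
        den += w
    if den == 0.0:  # no batted balls anywhere nearby: fall back to the overall rate
        return sum(gb for _, _, gb in pooled) / len(pooled)
    return num / den

def expected_gb_rate(locations, pooled):
    """Average predicted ground ball rate over one season's pitch locations."""
    return sum(predict_gb(x, z, pooled) for x, z in locations) / len(locations)

# Synthetic pooled batted balls: low pitches are grounders more often (invented rule).
rng = random.Random(1)
pooled = []
for _ in range(600):
    x, z = rng.uniform(-1.5, 1.5), rng.uniform(1.0, 4.0)
    p_gb = 0.8 if z < 2.5 else 0.4
    pooled.append((x, z, 1 if rng.random() < p_gb else 0))

# Two hypothetical seasons that differ only in where pitches were thrown.
season_high = [(rng.gauss(0, 0.5), rng.gauss(3.0, 0.4)) for _ in range(200)]
season_low = [(rng.gauss(0, 0.5), rng.gauss(2.0, 0.4)) for _ in range(200)]
rate_high = expected_gb_rate(season_high, pooled)
rate_low = expected_gb_rate(season_low, pooled)
print(round(rate_high, 3), round(rate_low, 3))
```

Because the same pooled model scores both seasons, any gap between the two rates comes only from the change in pitch locations, not from a change in how the batter handled those locations.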

Based on these analyses, we can see that pitchers did alter their approach to Jeter in 2010 in a manner that was likely to produce more ground balls. However, this change in approach was not so large that we can immediately attribute the ground ball % increase to only pitchers.

Jeter’s Changes

This graph shows the ground ball percentage by horizontal location. The smoothed lines are loess regression lines. Gray bands indicate the confidence bands around each line, and the dotted lines at -1 and 1 represent the horizontal borders of the plate.

As you can see, Derek hit a much higher proportion of ground balls on pitches that were middle to middle-away in 2010 than in 2009. On pitches down the middle in 2009, he hit grounders 50% of the time, but in 2010 this number jumped to 60%. I am not qualified to judge batting mechanics, but this data does seem to suggest that Jeter is having trouble covering the outside part of the plate. Is this a plate coverage issue? If so, is this due to old age? Perhaps the flexibility in his swing has decreased. It’s possible that muscle memory also plays a role here, as Jack Clark and Anna discuss here. We also need to consider where he is swinging. If Jeter is swinging more frequently at locations that produce ground balls (i.e., down and away), then obviously that’s going to lead to more ground balls. When looking at his swing contours, the locations of his swings seem similar. However, if we calculate Jeter’s O-swing (the percentage of pitches outside the strike zone that he chases), we find that he had a value of 26% for 2010 and 21% for 2009. Fangraphs’ BIS values corroborate these figures obtained from pitch f/x. If Jeter is making more contact on pitches outside the zone, then it’s likely that he’s going to hit more grounders, given that those are usually not very good pitches to hit. We also need to determine what types of pitches Jeter is swinging at. If he started swinging at 30% more two-seamers in 2010 than 2009, then that would explain almost everything.
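The O-swing figure mentioned above can be computed along these lines. This is a minimal sketch, not the exact calculation used here: the half-width of 1 matches the plate borders on the graph, but the fixed top and bottom of the zone (3.5 and 1.5 feet) and the sample pitches are assumptions.

```python
def in_zone(px, pz, half_width=1.0, bottom=1.5, top=3.5):
    """True if the pitch crosses inside an assumed rectangular strike zone (feet)."""
    return abs(px) <= half_width and bottom <= pz <= top

def o_swing(pitches):
    """Share of out-of-zone pitches that the batter swung at.

    pitches is a list of (px, pz, swung) tuples. Returns None if no pitch was outside.
    """
    outside = [swung for px, pz, swung in pitches if not in_zone(px, pz)]
    if not outside:
        return None
    return sum(outside) / len(outside)

# Made-up pitches: (horizontal location, height, swung?)
sample = [
    (0.0, 2.5, True),    # in the zone, ignored by O-swing
    (1.4, 2.0, True),    # outside, chased
    (-1.3, 3.0, False),  # outside, taken
    (0.2, 0.9, False),   # below the zone, taken
    (0.5, 4.0, True),    # above the zone, chased
    (0.0, 3.0, False),   # in the zone, ignored by O-swing
]
sample_rate = o_swing(sample)
print(sample_rate)  # 2 chases out of 4 outside pitches -> 0.5
```

A real calculation would likely use a zone height that varies by batter, which is one reason two sources can report slightly different O-swing values for the same player.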

Here is a graph of the difference of his swing percentages by pitch type:

FF = four-seam. CU = curveball. CH = changeup. SL = slider. FT = two-seam, again using my classifications. Shown here is the difference in Jeter’s swing percentages by pitch type from 2009 to 2010 against righties. Above zero means that Jeter swung more at the pitch in 2010 than 2009. The sum does not equal zero because Jeter swung a little more overall in 2010 than 2009.

As you can see, Jeter swung more at two-seamers, sliders, and changeups. As these are the three pitches on which he has the highest ground ball percentage, it’s clear that this contributed to the ground ball spike. However, these differences of around 5% are really quite small. It’s also possible that pitchers changed when they threw certain pitches. This becomes an issue because Jeter does not swing an equal amount in all situations, so it’s difficult to determine how much of this change we can attribute to Jeter.
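For anyone who wants to reproduce this kind of comparison, here is a small sketch of swing rate by pitch type and its year-to-year change. The counts are invented for illustration and are not Jeter's actual numbers; the function names are hypothetical.

```python
from collections import defaultdict

def swing_rate_by_type(pitches):
    """Fraction of pitches swung at, per pitch type. pitches: (pitch_type, swung) pairs."""
    seen = defaultdict(int)
    swings = defaultdict(int)
    for pitch_type, swung in pitches:
        seen[pitch_type] += 1
        swings[pitch_type] += int(swung)
    return {t: swings[t] / seen[t] for t in seen}

def swing_rate_change(before, after):
    """Percentage-point change in swing rate for pitch types present in both years."""
    a, b = swing_rate_by_type(before), swing_rate_by_type(after)
    return {t: 100 * (b[t] - a[t]) for t in sorted(set(a) & set(b))}

# Invented counts: 40 two-seamers and 50 sliders seen in each year.
year_one = ([("FT", True)] * 16 + [("FT", False)] * 24
            + [("SL", True)] * 20 + [("SL", False)] * 30)
year_two = ([("FT", True)] * 18 + [("FT", False)] * 22
            + [("SL", True)] * 22 + [("SL", False)] * 28)
change = swing_rate_change(year_one, year_two)
print({t: round(v, 1) for t, v in change.items()})
```

Unlike shares of a pitch mix, these per-type swing rates are independent of each other, so their changes have no reason to sum to zero.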

Random Variation

In individual seasons, crazy things can happen. In a single season, Brady Anderson can hit 50 home runs. Nick Swisher can “hit” .219. There are nearly endless examples of seasons that just shouldn’t happen. Perhaps Jeter’s ground ball increase is just a fluke? As Brien said, baseball is a cruel mistress. Yet ground balls are not as consumed by fate as batting average and home runs; ground ball rate actually stabilizes very quickly. It stabilizes so quickly that you don’t even need a tenth of a full season to judge it. The fact that this grounder trend has not only continued, but progressed, in 2011 seems to further preclude the possibility of luck playing a big role. It’s also possible to perform what’s called a hypothesis test. If we perform a two-sample, one-sided z-test for proportions, we find that the probability of Jeter having a grounder percentage as high as he did in 2010, assuming that there is no difference in the true proportions for these years, is .0036 (significant at the 99.5% level). This is very strong evidence that his grounder increase in 2010 was not due to chance. However, this assumes that all observations are independent, which in reality they are not. This also assumes that the pitchers used the same approach to Jeter in both years, but as I have shown, that does not exactly seem to be the case. For these reasons you should take this test with a grain of salt.
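For reference, a two-sample, one-sided z-test for proportions can be written in a few lines. This is a generic sketch using the pooled-proportion standard error; the ground ball counts below are placeholders, not the counts behind the .0036 figure, so the printed p-value will not match it.

```python
import math

def one_sided_two_prop_z(x1, n1, x2, n2):
    """z statistic and upper-tail p-value for the alternative p2 > p1.

    x1 of n1 successes in the first sample, x2 of n2 in the second.
    Uses the pooled proportion, which is the usual choice under the null of no difference.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
    return z, p_value

# Placeholder counts: grounders out of balls in play for two seasons.
z, p_value = one_sided_two_prop_z(x1=285, n1=500, x2=330, n2=500)
print(round(z, 2), round(p_value, 4))
```

The test treats every ball in play as an independent draw, which is the same independence assumption flagged above.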

Finishing Thoughts

It appears that the biggest finding here is Jeter’s ground ball troubles on pitches that are middle to middle-away. This suggests a mechanics issue. Is this due to old age? I can’t say. Pitchers slightly altering their approach and Jeter’s small change in swing tendencies also seem to contribute here. Although determining the cause of his ground ball woes is complicated, the consequences are simple. He currently has a wOBA of .279. Should this performance continue through 700 plate appearances, Jeter will be 28 runs below average on offense. If we put his fielding “contribution” at -10 runs, Jeter is now at -38 runs. Throw in 24 runs for replacement level, 6 runs for positional adjustment, and 5 runs for baserunning, and Jeter’s total becomes -3 runs. In other words, if Jeter continues to play the way he is now, he will be a replacement level player. I’m not sure even the Yankees can afford to hand a replacement level player a full season’s worth of playing time.

That sounds…terrible. Thankfully, that situation is not likely to occur. Even last year, Jeter was an above average major leaguer, as measured by fWAR. We have simply been spoiled by an all-star hitting shortstop. The Yankees also happen to have a very good hitting coach in Kevin Long. Hopefully, Mr. Long and Jeter can figure out what is causing this ridiculous amount of ground balls. This issue plagued Jeter last year and has continued early this season. Without improvement, the next few years are going to be very rough for the Yankee shortstop.
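The runs tally from the start of this section is easy to check. The sketch below just adds up the components quoted there; the conversion of roughly 10 runs per win is a common rule of thumb and is an assumption here, not a number from this post.

```python
def runs_above_replacement(batting, fielding, replacement, positional, baserunning):
    """Sum the run components into a single runs-above-replacement total."""
    return batting + fielding + replacement + positional + baserunning

total_runs = runs_above_replacement(
    batting=-28,     # projected offense over 700 plate appearances
    fielding=-10,    # assumed fielding "contribution"
    replacement=24,  # replacement level credit
    positional=6,    # shortstop positional adjustment
    baserunning=5,
)
wins = total_runs / 10  # assumed rule of thumb: about 10 runs per win
print(total_runs, wins)  # -3 -0.3
```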


*Pitch f/x data from MLBAM through Joe Lefkowitz’s pitch f/x tool.
*Jeter statistics from Fangraphs

29 thoughts on “A Groundball Problem”

  1. "The problem here is that most stadiums use an offset camera for tv broadcasts."

    I am amazed that this remains the case after all these years.

  2. "However, ground balls aren’t all bad news for offense; they actually have a much higher batting average on balls in play than flyballs. "

    It's worth pointing out that there's probably some classification noise in that fact, as most balls in the air that end up as hits get classified as line drives.

  3. "grounders actually have a much higher batting average on balls in play than flyballs."

    There is no correlation between GB% and BABIP whatsoever.

  4. The data supports the obvious and, if anything, understates the problem. There is a big difference between hard-hit grounders and weak grounders. Derek has not shown any ability this year to square up a ball, and more of his hits this year have come from poor contact that results in a slow grounder that a charging infielder can't field in time (plus a lucky chopper), versus the hard grounder through the hole that his career has been based on. Good luck trying to make that work for you over 162 games. For Yankee/Jeter fans, the question is what can/should be done to correct this. The recently reported Cashman/Jeter fielding range conversation is illuminating as to how Jeter thinks/works. The trend and data support the conclusion that Derek just has not yet accepted the fact that his hitting approach is not working at this stage of his career. As they say, the definition of insanity is doing the same thing over and over again and expecting different results….

  5. Josh, you wrote that "ground ball rate actually stabilizes very quickly. It stabilizes so quickly that you don’t even need a tenth of a full season to judge it." That's correct if you're looking at ground ball rates for a group of players, but not if you're looking at an individual's ground ball rate. The article you cited so states, and it's also evident from the numbers. Look at line drive percentage, as it well illustrates the point. For a group of players, line drive percentage stabilizes at the same place as ground ball percentage, at fewer than 40 plate appearances. But today, we currently have 47 baseball players with a minimum of 72 plate appearances and a FanGraphs line drive percentage below 15%. For the full season last year (minimum 500 plate appearances), there were 8 such players. The year before: 2 such players. The year before that: 1 such player.

  6. This is an incredible article. Thank you for doing such an in-depth job. The only thing I would change would be for the horizontal axis on the two pitch type charts to be the same.

  7. Great article! I'd like to thank you for the research and for writing something about Jeter and you know…what happens during games.

  8. Larry, about the second part of my statement, I was simply trying to say that, perhaps due to a mechanical issue or something, a batter might not be able to drive pitches for line drives the way they once did (and instead may hit ground balls). I did not mean that batters can choose the outcome of their BIP.

    And about BIP classification issues: Correct me if I'm wrong, but most of the ambiguity is with determining what constitutes a flyball vs. a linedrive. One would think that it's pretty easy to classify a ground ball for the most part, or at least much easier than classifying linedrives or flyballs. And yea, pure batted ball info does not tell us a huge amount about BABIP. However, it's certainly not worthless info, and the examples that you point out here are players that are unique.

    And as you and others have pointed out, I am making the assumption that all of Jeter's ground balls are the same from year to year. It's hard to say how reliable this assumption is. A lot of people seem to think that he's hitting way more weak grounders this year. This may be true. And this may also just be confirmation bias. I really don't like having to go off my memory or personal observations to make judgments like that. That led me to make the assumption that I did.