Tuesday, May 21, 2013

Further Quantifying Pitching Excellence

This article was written at the beginning of November 2012, so the Cy Young Awards had not been voted on yet. 
 
As a baseball enthusiast and wildly unsuccessful former high school pitcher, I have always been fascinated by the greatness of a dominant pitcher. As a child, I was lucky enough to watch the mastery of Greg Maddux and the dominance of Pedro Martinez. At that time, I wasn’t sure how to calculate ERA, but I knew that Maddux’s seasons in the 90s under 2.00 were special. Later, as I matured and developed a strong liking for numbers and all things mathematical, I found myself poring over tables and tables of statistics, believing that the numbers could reveal true greatness. Every statistic has inherent weaknesses, none of which need to be discussed in this forum. Gone are the days when ERA and Wins dominated the statistical landscape. They’ve been replaced with FIP and SIERA, both highly useful and well-thought-out statistics. In the end, though, I found myself wanting more. To satisfy that want, I did what every stat geek and math nerd would have done. I opened up an Excel spreadsheet and went to work.
 
The goal of DIPS theory and FIP was to quantify a pitcher’s effectiveness by measuring only the things that he could control. Voros McCracken’s research from the early 2000s told us that pitchers have little to no control over balls put in play. FIP essentially tries to measure the exact opposite of BABIP. There’s a lot of merit to this idea. Pitchers who do not walk hitters and who avoid giving up home runs are generally more successful than those who fail in these areas, something Greg Maddux taught me all those years ago.

There is still something to be said, though, for a pitcher who just avoids solid contact, whether the ball leaves the yard or not. Naturally, I’m not the first person to have this theory. Balls in play are included in the calculations for both tERA and SIERA. The problem with these statistics is that they are very complicated to understand. I set out to find a much simpler method of determining a pitcher’s value. This brings us to the basis of my study, the average hit given up by a pitcher. After suffering through a 3-0 high school playoff loss some years ago in which the pitchers threw dueling three-hitters with very different outcomes, it is safe to say that simply limiting hits does not guarantee success as a pitcher. Using very simple statistics, it is easy to figure out which pitcher “gets hit the hardest.” The formula is Average Hit (AH) = SLG/BAA = TB/H, that is, total bases allowed per hit allowed. If we take all qualified pitchers from the 2012 season, here are the pitchers who induced the weakest contact (left) and those who got hit the hardest (right).

Pitcher             AH      Pitcher             AH
Felix Hernandez     1.38    Ervin Santana       1.95
Jake Westbrook      1.39    Derek Holland       1.84
David Price         1.41    Phil Hughes         1.78
Lucas Harrell       1.43    Ivan Nova           1.77
Josh Johnson        1.44    Mike Minor          1.75
Justin Masterson    1.44    James McDonald      1.73
Jarrod Parker       1.44    Edwin Jackson       1.73
Gio Gonzalez        1.45    Bruce Chen          1.73
Johnny Cueto        1.45    Jason Vargas        1.72
Tim Hudson          1.46    Tommy Hanson        1.71
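
For anyone who would rather script this than use a spreadsheet, here is a minimal Python sketch of the Average Hit formula. The function names and the sample stat line are illustrative assumptions, not data from the study.

```python
def average_hit(total_bases: int, hits: int) -> float:
    """Average Hit (AH): total bases allowed per hit allowed (TB/H)."""
    if hits <= 0:
        raise ValueError("hits must be positive")
    return total_bases / hits


def average_hit_from_rates(slg_against: float, avg_against: float) -> float:
    """Same quantity from rate stats: SLG/BAA. At-bats cancel, leaving TB/H."""
    if avg_against <= 0:
        raise ValueError("avg_against must be positive")
    return slg_against / avg_against


# Hypothetical stat line (not from the article): 200 hits, 290 total bases, 800 at-bats.
hits, total_bases, at_bats = 200, 290, 800
print(round(average_hit(total_bases, hits), 2))
print(round(average_hit_from_rates(total_bases / at_bats, hits / at_bats), 2))
```

Both calls give the same number, which is the point of writing the formula as SLG/BAA = TB/H.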
 
As you might expect, the pitchers who excel in this category are generally either “dominant” pitchers, such as Felix Hernandez and David Price, or sinkerball pitchers, such as Jake Westbrook and Justin Masterson. Flyball pitchers tend to find themselves in the right column. Many factors that affect the average hit are not yet accounted for, though, namely park and defense. Not everyone gets to throw 125 innings in Safeco Field or AT&T Park. Others benefit from pitching in front of strong defensive clubs such as the Braves and Angels. The first adjustment to make is for the parks. Now, it would be foolhardy and shortsighted to simply adjust based on a pitcher’s home park. For example, Matt Cain throws the majority of his innings in AT&T Park, but he also has to throw a handful of innings at Coors Field. Based on innings pitched in each park, I calculated a weighted park factor for each pitcher, signified by PPF. I’ll leave the nitty-gritty details of this calculation out of this explanation. The following shows which pitchers pitched in the most hitter-friendly (left) and most pitcher-friendly (right) environments this season.
 
Pitcher             PPF      Pitcher              PPF
Clay Buchholz       1.109    Felix Hernandez      0.851
Jon Lester          1.107    Madison Bumgarner    0.913
Jeremy Guthrie      1.097    Jason Vargas         0.914
Josh Beckett        1.088    Ryan Vogelsong       0.922
Gavin Floyd         1.066    Tim Lincecum         0.923
Jake Peavy          1.058    Matt Cain            0.924
Trevor Cahill       1.057    Dan Haren            0.926
Wade Miley          1.054    Barry Zito           0.933
Chris Sale          1.052    A.J. Burnett         0.941
Derek Holland       1.051    R.A. Dickey          0.942
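
The exact PPF calculation is not spelled out here, so the following is only one plausible reading: an innings-weighted average of the park factors for every park a pitcher worked in. The park names, factors, and innings below are made up for illustration.

```python
def weighted_park_factor(innings_by_park: dict[str, float],
                         park_factors: dict[str, float]) -> float:
    """Innings-weighted average park factor (an assumed form of PPF).

    innings_by_park: innings the pitcher threw in each park (decimal innings).
    park_factors:    park factor for each park, where 1.00 is neutral.
    """
    total_innings = sum(innings_by_park.values())
    if total_innings <= 0:
        raise ValueError("pitcher must have thrown at least some innings")
    weighted_sum = sum(ip * park_factors[park]
                       for park, ip in innings_by_park.items())
    return weighted_sum / total_innings


# Hypothetical pitcher: most innings at a pitcher-friendly home park,
# plus road innings in a neutral park and a hitter-friendly park.
innings = {"Home": 120.0, "Road A": 60.0, "Road B": 20.0}
factors = {"Home": 0.90, "Road A": 1.00, "Road B": 1.20}
print(round(weighted_park_factor(innings, factors), 3))
```

Weighting by innings is what keeps a handful of starts in an extreme road park from being ignored, which is the concern raised in the Matt Cain example.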

The adjustment for park is applied directly to the average hit allowed as calculated above. To adjust, I simply divided the average hit by each pitcher’s park factor. For example, the average hit allowed by both Jake Peavy and Madison Bumgarner was 1.65 total bases. After adjustment, Jake Peavy would have theoretically allowed 1.56 total bases per hit on a neutral field, and Madison Bumgarner would have allowed 1.81. The top ten and bottom ten in adjusted average hit (adjAH) are listed below.
 
Pitcher             adjAH    Pitcher              adjAH
Jake Westbrook      1.35     Ervin Santana        2.04
Gio Gonzalez        1.42     Jason Vargas         1.88
Johnny Cueto        1.42     James McDonald       1.83
Rick Porcello       1.42     Dan Haren            1.82
David Price         1.43     Ivan Nova            1.81
Trevor Cahill       1.44     Madison Bumgarner    1.81
Tim Hudson          1.44     Phil Hughes          1.81
Lucas Harrell       1.44     Tim Lincecum         1.80
Justin Masterson    1.45     Matt Cain            1.76
Luis Mendoza        1.46     Derek Holland        1.75
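
The park adjustment is a single division, and the Peavy and Bumgarner example can be reproduced with the AH and PPF values given in this post. This is a small sketch; the function name is my own.

```python
def adjusted_average_hit(average_hit: float, park_factor: float) -> float:
    """adjAH = AH / PPF: a hitter-friendly PPF (> 1) lowers AH, a pitcher-friendly PPF (< 1) raises it."""
    if park_factor <= 0:
        raise ValueError("park_factor must be positive")
    return average_hit / park_factor


# Both pitchers allowed an average hit of 1.65 total bases before adjustment.
peavy = adjusted_average_hit(1.65, 1.058)      # PPF from the park factor table
bumgarner = adjusted_average_hit(1.65, 0.913)  # PPF from the park factor table
print(round(peavy, 2), round(bumgarner, 2))
```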
 
To keep the calculations simple, I assume that baserunners do not take any extra bases on a ball in play. I can then calculate how many hits it takes to score a theoretical run simply by dividing four total bases by the adjAH (e.g., Jake Westbrook gives up a run every 4/1.35=2.97 hits). With this information and knowing how many hits a pitcher has allowed throughout a season, I can calculate how many runs a pitcher should have given up this year. Continuing with the Jake Westbrook example, 191 hits allowed/2.97 hits per run gives us 64.29 runs allowed. Using this run total and the basic ERA formula, I can figure an ERA component based solely on hits allowed. I call this HERA. The top and bottom ten pitchers for the 2012 season are:

Pitcher             HERA    Pitcher              HERA
Gio Gonzalez        2.38    Ivan Nova            4.64
David Price         2.64    Dan Haren            4.40
Clayton Kershaw     2.69    Ervin Santana        4.26
Justin Verlander    2.74    Bruce Chen           4.25
Yu Darvish          2.77    Mike Leake           4.21
Chris Sale          2.93    Phil Hughes          4.16
Jered Weaver        2.93    Joe Blanton          4.11
Trevor Cahill       2.98    Rick Porcello        4.10
Johnny Cueto        3.01    Henderson Alvarez    4.09
Tim Hudson          3.04    Ubaldo Jimenez       4.04
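
Here is a rough sketch of the HERA steps in Python, assuming the standard ERA formula of 9 × runs / innings. The hit total and adjAH for Jake Westbrook come from this post; the innings figure is a placeholder I picked for illustration, not a number from the study. Using the rounded adjAH of 1.35 gives about 64.46 runs rather than 64.29, a gap small enough to come from rounding adjAH to two decimals.

```python
def hits_per_run(adj_ah: float) -> float:
    """Hits needed to accumulate four total bases, i.e. one theoretical run."""
    if adj_ah <= 0:
        raise ValueError("adj_ah must be positive")
    return 4.0 / adj_ah


def runs_from_hits(hits: int, adj_ah: float) -> float:
    """Theoretical runs allowed on hits alone."""
    return hits / hits_per_run(adj_ah)


def hera(hits: int, adj_ah: float, innings: float) -> float:
    """Hit-based ERA component: theoretical runs on hits, scaled to nine innings."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9.0 * runs_from_hits(hits, adj_ah) / innings


# Westbrook: 191 hits and adjAH of 1.35 are from the post.
# 175.0 innings is a hypothetical placeholder, so the HERA printed is illustrative only.
print(round(hits_per_run(1.35), 2))
print(round(runs_from_hits(191, 1.35), 2))
print(round(hera(191, 1.35, 175.0), 2))
```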
 
While this is a nice start, it does not tell the whole story. As we all know, pitchers also give up earned runs by walking batters. Let’s call this component WERA. Once again using the theory of four total bases per earned run, with each walk counting as one base, I can calculate the runs given up by walks. As before, these runs are then entered into the standard ERA formula to produce another component ERA. The best and worst ten pitchers of 2012 at limiting runs via the walk are:

Pitcher            WERA    Pitcher             WERA
Cliff Lee          0.30    Ricky Romero        1.31
Bronson Arroyo     0.39    Edinson Volquez     1.29
Joe Blanton        0.40    Ubaldo Jimenez      1.21
Scott Diamond      0.40    Tim Lincecum        1.09
Kyle Lohse         0.41    Aaron Harang        1.06
Tommy Milone       0.43    Yu Darvish          1.05
Wade Miley         0.43    Matt Moore          1.03
Clayton Richard    0.43    C.J. Wilson         1.01
Mark Buehrle       0.44    Justin Masterson    0.96
Dan Haren          0.48    Tommy Hanson        0.91
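
The walk component follows the same pattern. This sketch assumes each walk is worth exactly one base, so four walks make one theoretical run; that is my reading of the four-total-bases rule rather than a formula stated in the post. The walk and innings totals below are illustrative inputs, not figures taken from the study.

```python
def runs_from_walks(walks: int) -> float:
    """Theoretical runs from walks, counting one base per walk and four bases per run."""
    return walks / 4.0


def wera(walks: int, innings: float) -> float:
    """Walk-based ERA component: theoretical walk runs scaled to nine innings."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9.0 * runs_from_walks(walks) / innings


# Illustrative control pitcher: 28 walks in 211 innings.
print(round(runs_from_walks(28), 2))
print(round(wera(28, 211.0), 2))
```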
 
If I sum these two components, I get an initial estimate of how dominant a pitcher was this season. I have yet to adjust for defense, though. Since I was interested in runs in this study, I used Defensive Runs Saved (DRS) as the metric for adjustment. Taking a team’s total Defensive Runs Saved for the season and dividing by the team’s total innings pitched gives a theoretical Defensive Runs Saved per inning. Multiplying this by a pitcher’s innings pitched gives the theoretical runs saved while that pitcher was on the mound. Once again, I entered the runs saved into the standard ERA formula to produce a third component. It is worth noting that some of these values are negative, which indicates poor defensive performance. The sum of the three components gives a subtotal for estimated ERA, or eERA. As is done with the FIP calculation, a constant is added to make the average eERA equal to the average ERA. Using this metric, the best and worst pitchers from 2012 are:

 
Pitcher              eERA    Pitcher              eERA
Justin Verlander     3.10    Ricky Romero         5.51
Gio Gonzalez         3.16    Tommy Hanson         5.40
Clayton Kershaw      3.35    Ervin Santana        5.39
R.A. Dickey          3.39    Dan Haren            5.25
David Price          3.40    Ivan Nova            5.24
Lucas Harrell        3.56    Henderson Alvarez    5.11
Kyle Lohse           3.57    Tim Lincecum         5.02
Chris Sale           3.58    Ubaldo Jimenez       4.93
Josh Johnson         3.60    Mike Leake           4.93
Jordan Zimmermann    3.60    Bruce Chen           4.88
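
Putting the pieces together might look something like the sketch below. Two things are assumptions on my part: the defensive component is simply added to the other two (the post describes a summation but does not spell out the sign convention), and the constant is the gap between the average ERA and the average subtotal of the pitcher pool. All numbers are invented for illustration.

```python
def defense_component(team_drs: float, team_innings: float,
                      pitcher_innings: float) -> float:
    """ERA-scaled share of team Defensive Runs Saved while this pitcher was on the mound."""
    if team_innings <= 0 or pitcher_innings <= 0:
        raise ValueError("innings must be positive")
    drs_per_inning = team_drs / team_innings
    runs_saved = drs_per_inning * pitcher_innings
    return 9.0 * runs_saved / pitcher_innings


def eera_constant(subtotals: list[float], average_era: float) -> float:
    """Constant that makes the mean of (subtotal + constant) equal the average ERA."""
    return average_era - sum(subtotals) / len(subtotals)


def eera(hera: float, wera: float, defense: float, constant: float) -> float:
    """Estimated ERA: hit, walk, and defense components plus the constant (assumed additive)."""
    return hera + wera + defense + constant


# Hypothetical team: +29 DRS over 1450 innings; the pitcher threw 200 of them.
defense = defense_component(29.0, 1450.0, 200.0)
# Hypothetical pool of three subtotals and an average ERA of 4.00.
constant = eera_constant([3.2, 3.8, 4.4], 4.00)
print(round(defense, 2), round(constant, 2))
print(round(eera(2.90, 0.60, defense, constant), 2))
```

One thing the code makes visible: because the runs saved are prorated by innings and then divided by the same innings, the defensive component comes out identical for every pitcher on a given team.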
 
The natural question to ask at this point is how well does eERA estimate ERA, and how does it compare to other ERA estimators?   


A strong correlation seems to exist between eERA and ERA, but how does this compare to other more widely accepted ERA estimators?  First, let’s look at how well FIP estimates ERA.  It is worth noting that all the following statistics were adjusted so that the average ERA, eERA, FIP, tERA, and SIERA of the 88 pitchers used in this study were equal.
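
For readers who want to run this kind of comparison themselves, here is a small sketch of the two steps described: shifting each estimator so its average matches the average ERA, then measuring the linear relationship. I use the Pearson correlation coefficient here as an assumption, since a spreadsheet trendline could just as easily be reporting R-squared. The sample numbers are made up.

```python
from math import sqrt


def shift_to_mean(values: list[float], target_mean: float) -> list[float]:
    """Add one constant to every value so the list's mean equals target_mean."""
    offset = target_mean - sum(values) / len(values)
    return [v + offset for v in values]


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson linear correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up ERA and estimator values for five pitchers.
era = [2.80, 3.40, 3.90, 4.30, 5.10]
estimator = [3.00, 3.10, 3.80, 4.60, 4.50]

aligned = shift_to_mean(estimator, sum(era) / len(era))
print(round(pearson_r(aligned, era), 4))
```

Shifting every value by the same constant does not change the correlation, so the mean adjustment matters for comparing the estimators on the same scale, not for the correlation figures themselves.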

 
 
 

As you can see, a strong relationship exists when using any of eERA, FIP, or tERA. The linear correlation goes down considerably when we use SIERA, which is surprising, as it is widely considered to be a better estimator than tERA. Of all the data presented, though, eERA shows the strongest correlation, although the gap between eERA and tERA is not large. If you remove the high tERA outlier near 6.00 (Jeremy Guthrie), the tERA correlation increases to 0.6329, which is still weaker than eERA. Admittedly, this metric is not perfect, but what metric truly is? I welcome feedback on the information I have presented here. With the Cy Young winners yet to be announced, it will be interesting to see if Justin Verlander and Gio Gonzalez actually take home the prizes after leading their respective leagues in eERA. Bill James and Rob Neyer’s Cy Young Predictor currently lists Verlander as the fourth best candidate in the American League and Gio Gonzalez as second in the National League. The favorites by that metric are David Price and R.A. Dickey, who would be second and third in their leagues, respectively, by eERA.

 
--Stats All Folks

