Friday, May 24, 2013

Luck and the Game of Baseball

Luck is one of the hardest things to quantify in any field, and baseball is no different.  As a case study on the effects of luck in baseball, I'd like to take a few minutes to consider the 2012 MVP campaign of Detroit Tigers 3B Miguel Cabrera.  Let me preface this discussion by saying I'm not trying to reenter the Trout/Cabrera debate over who should have won the MVP.  I just found the case of Cabrera quite interesting.

As for a little background on Cabrera, he burst on the scene in 2003 as a midseason callup for the then Florida Marlins.  That Marlins squad, led by a strong pitching staff, went on to defeat the New York Yankees in the World Series.  By far and away, the two highest risers off of that Marlins squad were Miguel Cabrera and Josh Beckett.  As is typical for strong performers in Miami, neither remained on the team as long as fans would have liked.  During the 2007 offseason, Cabrera was traded along with pitcher Dontrelle Willis to the Detroit Tigers in exchange for mostly unremarkable prospects.  Cabrera has always been a strong hitter, having not posted a batting average less than .290 since his rookie season.  One of the major aspects of his hitting prowess is his remarkable consistency.  His career batting average and slugging percentage have increased each year, except for his first year in Detroit (2008).  We can most likely chalk that up to a combination of playing in a tougher league, adjustment issues playing in a new city, and pressing to live up to the expectations of the trade.


Noting Cabrera's remarkable consistency, what changed Miguel from just a really great hitter into an MVP and Triple Crown winner?  One of the first things that came to mind was the fact that in 2012 Cabrera got to hit each day in front of Prince Fielder.  While that might have had some effect, lineup protection is notoriously difficult to quantify.  Did the rest of the league collectively become worse hitters while Cabrera stayed the same?  Regardless of cause though, what was the main difference between 2003-2011 Cabrera and 2012 Cabrera?  I would like to submit the idea that there wasn't a major change in Miguel Cabrera or the league for that matter.  The only difference for Cabrera was luck.

The following table shows Cabrera's 2012 season against his averages from 2003-2011 over an equivalent number of plate appearances.  We can see two major changes.  First, Cabrera walked less and struck out less than in previous years.  In other words, he was putting more balls in play.  Second, he saw quite the increase in power, hitting a home run every 16 plate appearances, as compared to every 21 over his career.

Years      PA   HR  H    AVG   SLG   OBP   OPS   BB%    K%     PA/HR
2003-2011  697  33  193  .317  .555  .395  .950  11.1%  17.5%  20.9
2012       697  44  205  .330  .606  .393  .999  9.5%   14.1%  15.8
 
Well you might argue that an increase in power is not luck, so this post is pointless.  On the surface, you might be right.  However, according to ESPN Home Run Tracker, Cabrera led all of MLB last year with six so-called lucky home runs.  A lucky home run is one that would not have cleared the fence without the weather conditions of that given day.  If we "remove luck from the equation" though, how would Cabrera's 2012 season have looked?  Considering that the majority of flyballs are converted into outs, if we simply remove those six lucky home runs and convert them into outs, this is how his 2012 line changes.
 
Years         PA   HR  H    AVG   SLG   OBP   OPS   BB%   K%     PA/HR
With Luck     697  44  205  .330  .606  .393  .999  9.5%  14.1%  15.8
Without Luck  697  38  199  .320  .568  .385  .952  9.5%  14.1%  18.3
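For anyone who wants to check the "without luck" line, here is a minimal Python sketch of the adjustment.  The post only lists rate stats, so the counting stats below (at-bats, total bases, walks, hit by pitch, sacrifice flies) are my assumption of Cabrera's 2012 totals rather than figures from the tables, and the `BattingLine` class and `remove_lucky_homers` function are names made up for illustration.  Each lucky home run is treated as one fewer hit, one fewer homer, and four fewer total bases.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BattingLine:
    pa: int   # plate appearances
    ab: int   # at-bats
    h: int    # hits
    hr: int   # home runs
    tb: int   # total bases
    bb: int   # walks
    hbp: int  # hit by pitch
    sf: int   # sacrifice flies

    @property
    def avg(self) -> float:
        return self.h / self.ab

    @property
    def slg(self) -> float:
        return self.tb / self.ab

    @property
    def obp(self) -> float:
        times_on_base = self.h + self.bb + self.hbp
        return times_on_base / (self.ab + self.bb + self.hbp + self.sf)

    @property
    def ops(self) -> float:
        return self.obp + self.slg

    @property
    def pa_per_hr(self) -> float:
        return self.pa / self.hr


def remove_lucky_homers(line: BattingLine, lucky: int) -> BattingLine:
    """Convert `lucky` home runs into outs (PA and AB are unchanged)."""
    return replace(
        line,
        h=line.h - lucky,
        hr=line.hr - lucky,
        tb=line.tb - 4 * lucky,
    )


# Assumed 2012 counting stats; they are not listed in the post itself.
cabrera_2012 = BattingLine(pa=697, ab=622, h=205, hr=44, tb=377, bb=66, hbp=3, sf=6)
adjusted = remove_lucky_homers(cabrera_2012, lucky=6)

print(
    f"AVG {adjusted.avg:.3f}  SLG {adjusted.slg:.3f}  "
    f"OBP {adjusted.obp:.3f}  OPS {adjusted.ops:.3f}  "
    f"PA/HR {adjusted.pa_per_hr:.1f}"
)
```

Under those assumed inputs the sketch lands on the same adjusted line as the table, which is a decent sanity check that nothing more exotic than "six homers become six outs" is going on.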

As we should expect, we see a rather significant drop in SLG and OPS.  Now, let's compare this adjusted season line to his career line.
 
Years          PA   HR  H    AVG   SLG   OBP   OPS   BB%    K%     PA/HR
Career         697  33  193  .317  .555  .395  .950  11.1%  17.5%  20.9
Adjusted 2012  697  38  199  .320  .568  .385  .952  9.5%   14.1%  18.3

It's quite remarkable to me how similar these numbers are.  At the end of the day, Miguel Cabrera didn't make some astronomical leap to become the first Triple Crown winner since Carl Yastrzemski.  He just played the same way he had his entire career and got "lucky" with six home runs.  Without these six home runs, is Miguel Cabrera the 2012 AL MVP?  No one knows that answer, and like I said before, I'm not trying to restart that debate.  I'm not trying to belittle Cabrera's accomplishments either.  The only word to describe his 2012 season is magical.  Lucky home runs happen to everyone.  If he had had six home runs robbed last year on top of these six not leaving the park, would people be asking what was wrong with Miguel Cabrera?  Most likely.  That's the point of this post.  Most of the time, there's nothing really wrong with a slumping player, and there are generally no major changes behind a player's historic season.  When you really sit down and look at things, luck plays a much bigger role than most people would like to admit.

--Stats All Folks

Wednesday, May 22, 2013

Wednesday's Worst

Here at the Twin States Twins Sports Blog we appreciate and admire greatness.  Mike Trout's cycle or Miguel Cabrera going yard yet again can be covered by just about anyone.  What people really like hearing about is just how badly some professional athletes are doing.  Granted, these "bad" players are better than just about everyone else, but people always like to know that at least they aren't doing as badly at their jobs as someone making 20 times the money.  Every Wednesday, we will look at the five players doing the worst in one statistical category.

This week will focus on wins above replacement (WAR).  For those of you that are unaware, WAR is an all-encompassing stat that tries to quantify how many wins a player provides for your team over what would be considered the baseline major league baseball player.  This takes into account offense, defense, and baserunning.  According to a variety of sites, a team full of replacement level players would win about 47 ball games (which the Astros and Marlins could seriously challenge) in a standard 162 game season.  Considering that 90 wins is the gold standard for being a legitimate playoff contender, a team needs roughly 43 wins above replacement combined for all players.  A starter in the show should be able to accrue 2.0 WAR without too much difficulty. 
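The win arithmetic above is simple enough to sketch in a few lines of Python.  The 47-win baseline comes straight from the paragraph; the function names and the nine-starter roster are hypothetical and only there for illustration.

```python
REPLACEMENT_TEAM_WINS = 47  # wins for an all-replacement roster, per the paragraph above


def war_needed(target_wins: int, baseline: int = REPLACEMENT_TEAM_WINS) -> int:
    """Combined WAR a roster needs to reach `target_wins`."""
    return target_wins - baseline


def projected_wins(player_wars: list[float], baseline: int = REPLACEMENT_TEAM_WINS) -> float:
    """Baseline wins plus the sum of every player's WAR."""
    return baseline + sum(player_wars)


print(war_needed(90))               # combined WAR for a 90-win team
print(projected_wins([2.0] * 9))    # hypothetical: nine starters at 2.0 WAR each
```

Nine starters at 2.0 WAR only get a team into the mid-60s in wins, which is why a contender also needs pitching, a bench, and a few stars well above that mark.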

The current five worst players with respect to WAR (according to Fangraphs.com) are:

Player           WAR
Jeff Keppinger   -1.3
Victor Martinez  -1.1
Paul Konerko     -1.0
Rickie Weeks     -0.9
Ike Davis        -0.9
Matt Kemp        -0.9

I realize that is actually six players, but there is a three way tie for fourth worst.   Let's break each of these players down individually. 

Jeff Keppinger signed a 3 year/$12M deal with the White Sox this offseason.  He was also being courted by the Yankees to fill in for the injured Alex Rodriguez.  As a Yankees fan, I'm quite glad that he chose to play for the White Sox instead.  His offensive numbers are being dragged down by a minuscule 0.214 BABIP (batting average on balls in play) and an utterly ridiculous 1.3% walk rate.  He has only walked twice in 158 plate appearances.  Vlad Guerrero is somewhere thinking he was a patient hitter.  Keppinger has never been a very patient hitter, but if you aren't hitting the ball solidly, you might as well try to work a walk every now and then.  He did manage to hit his first home run of the season last night, so things are starting to look up for him.  His BABIP is 75 points lower than his career number, so you have to believe that he is due for a hot streak in the near future.  Up until now though, Jeff Keppinger is officially the worst player in Major League Baseball.

Victor Martinez is coming back after sitting out all last season with an ACL tear.  His WAR total is highly affected by the negative positional adjustment associated with being a DH, so he has a bit more of an excuse than Keppinger.  However, there's really no excuse for an offensive player this talented to be hitting this poorly.  His BABIP is a low 0.241, which is about 60 points lower than his career average, but the main issue here is the lack of power.  His ISO (slugging percentage minus batting average) is lower than that of such great sluggers as Ichiro, Jose Altuve, and Melky Cabrera Not on Steroids.  Granted, Ichiro and Altuve are decent players, but Victor Martinez should be hitting with more authority than those notorious singles hitters.  He hits in the middle of the order with Prince Fielder and Miguel Cabrera.  I have a hard time believing he is not getting solid pitches to hit.  Martinez is a talented player, but given the DH positional adjustment, he has a realistic chance to finish the season with a negative WAR.

Paul Konerko is one of my favorite players to watch hit.  He's universally loved by White Sox fans and should be.  However, he is a terrible defensive player and possibly the worst baserunner in professional baseball.  He's clearly on his last legs as a ball player, and it is a shame that he is going out this way.  He's in the last year of his contract, and you have to assume he's retiring at the end of the season.  His walk rate is below career norms and his strikeout rate is above them.  When those two stats are both going in the wrong direction, it's most likely just not going to be a good season.  He's having a bit of bad luck to be sure (.242 BABIP), but this appears to be more of a situation where a player is at the end of his rope.  I hate to see this happen to Paulie, but he's had an outstanding career and will no doubt turn it around some as the season progresses.

Rickie Weeks is an interesting case.  He looked very good a few years ago hitting in front of Ryan Braun, Prince Fielder, and a healthy Corey Hart.  Of course, that protection will make a lot of hitters look good.  Weeks is simply not making contact so far this season.  He's striking out in 29.3% of his plate appearances.  That's going to be tough for most any hitter to overcome.  Couple that with a vastly increased groundball rate, and you're in for a rough patch.  His HR/FB is within reason for his career averages, but he's only getting 25.3% of balls in the air.  This is leading to less power, more groundballs, and a depressed BABIP (.229).  Until he starts putting the ball in play more often and getting it elevated a bit more when he does, he's just not going to be a useful player. 

Ike Davis was touted last season as a guy that just got unlucky and would be a much better player this season just because of regression.  Clearly that hasn't worked out for him.  His BABIP is a nightmarishly bad .189, but Davis had a low BABIP last season as well (.246).  He is also striking out more than Rickie Weeks (30.6%).  Players are just going to be hard pressed to have success with that much difficulty making contact.  He has hit an inordinate number of infield popups this season, so there's really no way to expect that to continue.  He is doing slightly better than last season about not swinging outside the zone, but he's making significantly less contact in the zone.  When you're seeing more pitches in the strike zone, but making less contact on those pitches, that's a lot of swings and misses.  Davis has a lot of raw power, but it's hard to tap into the one true skill you have if you can't put the ball in play.

Finally, we get to Matt Kemp.  Raise your hand if you expected Matt Kemp to be this bad so far this season.  For the first couple of months of last season, he was the best player in the NL.  The previous year, he was considered a favorite for the NL MVP.  His BABIP is right at his career average, so he can't even say he's being that unlucky.  What is happening though is he is not hitting for any power.  He has a total of 11 XBH all season.  He currently has a HR/FB of 4.8% (compared to over 21% the last two seasons).  That number has to increase soon unless he has had his power sapped.  Perhaps he just isn't all the way back from shoulder surgery yet.  Perhaps it's going to be a lost year for him like last season was for Justin Upton.  For the struggling Dodgers, that can't be the case.  The Dodgers spent a ton of money this offseason, but Kemp was still expected to be the superstar.  He hasn't been remotely close to that so far.  Maybe he should worry more about hitting the baseball than dating Rihanna.

In conclusion, we will update this column every Wednesday.  Let's hope that these names aren't being repeated every week.  Here at the Twin States Twins we hope these guys turn it around and turn into the ballplayers they are capable of being.

--Pinstripe Wizard

Tuesday, May 21, 2013

Further Quantifying Pitching Excellence

This article was written at the beginning of November 2012, so the Cy Young Awards had not been voted on yet. 
 
As a baseball enthusiast and wildly unsuccessful former high school pitcher, I have always been fascinated by the greatness of a dominant pitcher.  As a child, I was lucky enough to watch the mastery of Greg Maddux and the dominance of Pedro Martinez.  At that time, I wasn’t sure how to calculate ERA, but I knew that Maddux’s seasons in the 90s under 2.00 were special.  Later, as I matured and developed a strong liking for numbers and all things mathematical, I found myself poring over tables and tables of statistics, believing that the numbers could reveal true greatness.  In every statistic, there are inherent weaknesses, none of which need to be discussed in this forum.  Gone are the days when ERA and Wins dominated the statistical landscape.  They’ve been replaced with FIP and SIERA, both highly useful and well thought out statistics.  In the end though, I found myself wanting more.  To satisfy that want, I found myself doing what every stat geek and math nerd would have done.  I opened up an Excel spreadsheet and went to work.
 
The goal of DIPS theory and FIP was to quantify a pitcher’s effectiveness by only measuring things that he could control.  Voros McCracken’s research from the early 2000s told us that pitchers have little to no control over balls put in play.  FIP essentially tries to measure the exact opposite of BABIP.  There’s a lot of merit to this idea.  Pitchers that do not walk hitters and avoid giving up home runs are generally more successful than those that fail in these areas, something Greg Maddux taught me all those years ago.

There is still something to be said though for a pitcher that just avoids solid contact, whether the ball leaves the yard or not.  Naturally, I’m not the first person to have this theory.  Balls in play are included in the calculations for both tERA and SIERA.  The problem with these statistics is that they are very complicated to understand.  I set out to find a much simpler method of determining a pitcher’s value.  This brings us to the basis of my study, the average hit given up by a pitcher.  After suffering through a 3-0 high school playoff loss some years ago in which the pitchers threw dueling three-hitters with very different outcomes, it is safe to say that simply eliminating hits does not necessarily guarantee success as a pitcher.  Using very simple statistics, it is easy to figure out which pitcher “gets hit the hardest.”  The formula is Average Hit (AH) = SLG/BAA = TB/H.  If we take all qualified pitchers from the 2012 season, here are the pitchers that induced the weakest contact and those that got hit the hardest.

Pitcher (weakest contact)  AH    Pitcher (hardest hit)  AH
Felix Hernandez            1.38  Ervin Santana          1.95
Jake Westbrook             1.39  Derek Holland          1.84
David Price                1.41  Phil Hughes            1.78
Lucas Harrell              1.43  Ivan Nova              1.77
Josh Johnson               1.44  Mike Minor             1.75
Justin Masterson           1.44  James McDonald         1.73
Jarrod Parker              1.44  Edwin Jackson          1.73
Gio Gonzalez               1.45  Bruce Chen             1.73
Johnny Cueto               1.45  Jason Vargas           1.72
Tim Hudson                 1.46  Tommy Hanson           1.71
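To make the Average Hit formula concrete, here is a small Python sketch showing that SLG/BAA and TB/H are the same number, since the at-bats cancel.  The pitcher line in it is hypothetical and the function names are mine.

```python
def average_hit_from_counts(total_bases: int, hits: int) -> float:
    """AH = TB / H: total bases surrendered per hit allowed."""
    return total_bases / hits


def average_hit_from_rates(slg_against: float, avg_against: float) -> float:
    """AH = SLG / BAA: the same quantity, since at-bats cancel out."""
    return slg_against / avg_against


# Hypothetical pitcher line, made up for illustration.
at_bats, hits, total_bases = 800, 200, 300

ah_counts = average_hit_from_counts(total_bases, hits)
ah_rates = average_hit_from_rates(total_bases / at_bats, hits / at_bats)

print(ah_counts, ah_rates)
```

An AH of 1.00 would mean nothing but singles, and 4.00 would mean nothing but home runs, so every pitcher lives somewhere in between.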
 
As you might expect, the pitchers that excel in this category are generally either “dominant” pitchers, such as Felix Hernandez and David Price, or sinkerball pitchers, such as Jake Westbrook and Justin Masterson.  Flyball pitchers tend to find themselves in the right column.  There are many factors that affect the average hit though that are not accounted for, namely park and defense.  Not everyone gets to throw 125 innings in Safeco Field or AT&T Park.  Others benefit from pitching in front of strong defensive clubs such as the Braves and Angels.  The first adjustment to make is for the parks.  Now, it would be foolhardy and shortsighted to simply adjust based on a pitcher’s home park.  For example, Matt Cain throws the majority of his innings in AT&T Park, but he also has to throw a handful of innings at Coors Field.  Based on innings pitched in each park, I calculated a weighted park factor for each pitcher, signified by PPF.  I’ll leave the nitty gritty details of this calculation out of this explanation.  The following shows which pitchers pitched in the most hitter-friendly and most pitcher-friendly environments this season.
 
Pitcher (hitter-friendly)  PPF    Pitcher (pitcher-friendly)  PPF
Clay Buchholz              1.109  Felix Hernandez             0.851
Jon Lester                 1.107  Madison Bumgarner           0.913
Jeremy Guthrie             1.097  Jason Vargas                0.914
Josh Beckett               1.088  Ryan Vogelsong              0.922
Gavin Floyd                1.066  Tim Lincecum                0.923
Jake Peavy                 1.058  Matt Cain                   0.924
Trevor Cahill              1.057  Dan Haren                   0.926
Wade Miley                 1.054  Barry Zito                  0.933
Chris Sale                 1.052  A.J. Burnett                0.941
Derek Holland              1.051  R.A. Dickey                 0.942
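The details of the PPF calculation are left out of the post, so the following Python sketch is only one plausible reading of "weighted by innings pitched in each park": an innings-weighted average of per-park factors.  The park names, innings, and factors are all invented for illustration, and `weighted_park_factor` is a made-up name.

```python
def weighted_park_factor(
    innings_by_park: dict[str, float],
    park_factors: dict[str, float],
) -> float:
    """Innings-weighted average of the park factors a pitcher worked in.

    Innings must be true decimals (174 2/3 innings is 174.667, not 174.2).
    """
    total_innings = sum(innings_by_park.values())
    weighted_sum = sum(
        innings * park_factors[park] for park, innings in innings_by_park.items()
    )
    return weighted_sum / total_innings


# Hypothetical season split; none of these numbers come from the post.
innings = {"Home Park": 120.0, "Road Park A": 60.0, "Road Park B": 20.0}
factors = {"Home Park": 0.90, "Road Park A": 1.00, "Road Park B": 1.20}

ppf = weighted_park_factor(innings, factors)
print(round(ppf, 3))
```

The point of weighting this way is that a pitcher with a pitcher-friendly home park still gets some credit for the handful of starts he makes in a launching pad.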

The adjustment for park is applied directly to the average hit allowed as calculated above.  To adjust, I simply divided the average hit by each pitcher’s park factor.  For example, the average hit allowed by both Jake Peavy and Madison Bumgarner was 1.65 total bases.  After adjustment, Jake Peavy would have theoretically allowed 1.56 total bases on a neutral field, and Madison Bumgarner would have allowed 1.81.  The top ten and bottom ten in adjusted average hit (adjAH) are listed below.
 
Pitcher (best)     adjAH  Pitcher (worst)    adjAH
Jake Westbrook     1.35   Ervin Santana      2.04
Gio Gonzalez       1.42   Jason Vargas       1.88
Johnny Cueto       1.42   James McDonald     1.83
Rick Porcello      1.42   Dan Haren          1.82
David Price        1.43   Ivan Nova          1.81
Trevor Cahill      1.44   Madison Bumgarner  1.81
Tim Hudson         1.44   Phil Hughes        1.81
Lucas Harrell      1.44   Tim Lincecum       1.80
Justin Masterson   1.45   Matt Cain          1.76
Luis Mendoza       1.46   Derek Holland      1.75
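The park adjustment is a single division, and the Peavy and Bumgarner example can be reproduced in Python from the AH and PPF values quoted in this post.  The function name is mine.

```python
def adjusted_average_hit(average_hit: float, park_factor: float) -> float:
    """adjAH = AH / PPF: average hit allowed on a neutral field."""
    return average_hit / park_factor


# AH of 1.65 for both pitchers and the PPF values are taken from the post.
peavy = adjusted_average_hit(1.65, 1.058)
bumgarner = adjusted_average_hit(1.65, 0.913)

print(f"Peavy {peavy:.2f}, Bumgarner {bumgarner:.2f}")
```

Two pitchers with identical raw numbers end up a quarter of a base apart per hit once their parks are taken into account.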
 
Assuming that baserunners do not take any extra bases with a ball in play in order to keep the calculations simple, I can now calculate how many hits it takes to score a theoretical run simply by dividing four total bases by the adjAH (i.e. Jake Westbrook gives up a run every 4/1.35=2.97 hits).  With this information and knowing how many hits a pitcher has allowed throughout a season, I can calculate how many runs a pitcher should have given up this year.  Continuing with the Jake Westbrook example, 191 hits allowed/2.97 hits per run gives us 64.29 runs allowed.  Using this run total and the basic ERA formula, I can figure an ERA component based solely on hits allowed.  I call this HERA.  The top and bottom ten pitchers for the 2012 season are:

Pitcher (best)     HERA  Pitcher (worst)    HERA
Gio Gonzalez       2.38  Ivan Nova          4.64
David Price        2.64  Dan Haren          4.40
Clayton Kershaw    2.69  Ervin Santana      4.26
Justin Verlander   2.74  Bruce Chen         4.25
Yu Darvish         2.77  Mike Leake         4.21
Chris Sale         2.93  Phil Hughes        4.16
Jered Weaver       2.93  Joe Blanton        4.11
Trevor Cahill      2.98  Rick Porcello      4.10
Johnny Cueto       3.01  Henderson Alvarez  4.09
Tim Hudson         3.04  Ubaldo Jimenez     4.04
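Here is a rough Python sketch of the HERA steps, using the Jake Westbrook figures quoted in the text (191 hits, adjAH of 1.35).  The innings total is a placeholder I made up, since the post does not list innings, and the function names are hypothetical.  Working from the rounded adjAH gives about 64.46 runs rather than the 64.29 quoted, presumably because the spreadsheet carried more decimal places.

```python
BASES_PER_RUN = 4  # simplifying assumption from the post: four total bases make one run


def hits_per_run(adj_ah: float) -> float:
    """Number of hits needed to accumulate four total bases."""
    return BASES_PER_RUN / adj_ah


def runs_from_hits(hits: int, adj_ah: float) -> float:
    """Theoretical runs allowed on hits alone."""
    return hits / hits_per_run(adj_ah)


def era(runs: float, innings: float) -> float:
    """Standard ERA formula: runs per nine innings."""
    return 9 * runs / innings


westbrook_runs = runs_from_hits(hits=191, adj_ah=1.35)

# 175.0 innings is an illustrative placeholder, not a figure from the post.
westbrook_hera = era(westbrook_runs, innings=175.0)

print(f"{hits_per_run(1.35):.2f} hits per run, {westbrook_runs:.2f} runs, HERA {westbrook_hera:.2f}")
```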
 
While this is a nice start, it does not tell the whole story.  As we all know, pitchers also give up earned runs by walking batters.  Let’s call this component WERA.  Once again using the theory of four total bases per earned run, I can calculate the runs given up by walks.  Like before, these runs are then inputted into the standard ERA formula to output another component ERA.  The best and worst ten pitchers of 2012 at eliminating runs via the walk are:

Pitcher (best)     WERA  Pitcher (worst)    WERA
Cliff Lee          0.30  Ricky Romero       1.31
Bronson Arroyo     0.39  Edinson Volquez    1.29
Joe Blanton        0.40  Ubaldo Jimenez     1.21
Scott Diamond      0.40  Tim Lincecum       1.09
Kyle Lohse         0.41  Aaron Harang       1.06
Tommy Milone       0.43  Yu Darvish         1.05
Wade Miley         0.43  Matt Moore         1.03
Clayton Richard    0.43  C.J. Wilson        1.01
Mark Buehrle       0.44  Justin Masterson   0.96
Dan Haren          0.48  Tommy Hanson       0.91
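The walk component can be sketched the same way in Python: each walk is worth one base, four bases make a run, and the runs go into the ERA formula.  The walk and innings totals for Cliff Lee below are my assumption of his 2012 line, not numbers from the post, and whether the original spreadsheet also counted hit batters is unknown to me.

```python
def wera(walks: int, innings: float, bases_per_run: int = 4) -> float:
    """ERA component from walks alone, treating each walk as one total base."""
    runs = walks / bases_per_run
    return 9 * runs / innings


# Assumed 2012 totals for Cliff Lee (28 walks in 211 innings); not listed in the post.
cliff_lee = wera(walks=28, innings=211.0)

print(f"{cliff_lee:.2f}")
```

With those assumed inputs the result rounds to the same figure shown for Lee in the table, which suggests the walk component really is this simple.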
 
If I sum these two components, I get an initial estimate of how dominant a pitcher was this season.  I have yet to adjust for defense though.  Since I was interested in runs in this study, I used Defensive Runs Saved (DRS) as the metric for adjustment.  Taking a team’s total Defensive Runs Saved for the season and dividing by the total innings pitched by the team gives me, theoretically, the Defensive Runs Saved per inning.  Multiplying this by the innings pitched by a pitcher gives the theoretical runs saved while that pitcher was on the mound.  Once again, I took the runs saved and plugged them into the standard ERA formula to give a third component.  It is worth noting that some of these values are negative and indicate poor defensive performance.  The summation of the three components outputs a subtotal for estimated ERA, or eERA.  As is done with the FIP calculations, a constant is added to make the average eERA equal to the average ERA.  Using this metric, the best and worst pitchers from 2012 are:

 
Pitcher (best)     eERA  Pitcher (worst)    eERA
Justin Verlander   3.10  Ricky Romero       5.51
Gio Gonzalez       3.16  Tommy Hanson       5.40
Clayton Kershaw    3.35  Ervin Santana      5.39
R.A. Dickey        3.39  Dan Haren          5.25
David Price        3.40  Ivan Nova          5.24
Lucas Harrell      3.56  Henderson Alvarez  5.11
Kyle Lohse         3.57  Tim Lincecum       5.02
Chris Sale         3.58  Ubaldo Jimenez     4.93
Josh Johnson       3.60  Mike Leake         4.93
Jordan Zimmermann  3.60  Bruce Chen         4.88
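For completeness, here is a minimal Python sketch of how the defensive component and the final constant might fit together, following the description above literally (the three components are summed, then shifted so the average matches the average ERA).  Every number in it is hypothetical, and the function names are mine rather than anything from the original spreadsheet.

```python
def defense_component(team_drs: float, team_innings: float, pitcher_innings: float) -> float:
    """ERA-scale value of the runs the defense saved behind one pitcher."""
    runs_saved = team_drs / team_innings * pitcher_innings
    return 9 * runs_saved / pitcher_innings


def scaling_constant(subtotals: list[float], average_era: float) -> float:
    """Constant that makes the mean estimate equal the mean ERA."""
    return average_era - sum(subtotals) / len(subtotals)


# Hypothetical team defense: +30 DRS over 1450 innings, pitcher threw 200 of them.
dera = defense_component(team_drs=30, team_innings=1450.0, pitcher_innings=200.0)

# Hypothetical HERA + WERA + defense subtotals for three pitchers.
subtotals = [3.0, 3.5, 4.0]
constant = scaling_constant(subtotals, average_era=4.0)
eeras = [subtotal + constant for subtotal in subtotals]

print(round(dera, 3), constant, eeras)
```

One thing the sketch makes obvious is that the pitcher's own innings cancel out of the defensive component, so every pitcher on a staff gets the same defensive adjustment.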
 
The natural question to ask at this point is how well does eERA estimate ERA, and how does it compare to other ERA estimators?   


A strong correlation seems to exist between eERA and ERA, but how does this compare to other more widely accepted ERA estimators?  First, let’s look at how well FIP estimates ERA.  It is worth noting that all the following statistics were adjusted so that the average ERA, eERA, FIP, tERA, and SIERA of the 88 pitchers used in this study were equal.

 
 
 

As you can see, a strong relationship exists when using either eERA, FIP, or tERA.  The linear correlation goes down considerably when we use SIERA, which is surprising as it is widely considered to be a better estimator than tERA.  Of all the data presented though, eERA shows the strongest correlation.  There is not a large difference between eERA and tERA.  If you remove the high outlier on the tERA near 6.00 (Jeremy Guthrie), the correlation increases to 0.6329, which is still weaker than eERA.  Admittedly, this metric is not perfect, but what metric truly is?  I welcome feedback on the information I have presented here.  With the Cy Young winners yet to be announced, it will be interesting to see if Justin Verlander and Gio Gonzalez actually take home the prizes after leading their respective leagues in eERA.  Bill James and Rob Neyer’s Cy Young Predictor currently lists Verlander as the fourth best candidate in the American League and Gio Gonzalez as second in the National League.  The favorites by that metric are David Price and R.A. Dickey, who would be second and third in their leagues respectively by eERA.
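If you want to run the same comparison on your own numbers, the check boils down to a linear correlation between each estimator and ERA.  Here is a minimal Python sketch with made-up values; the `align_mean` step mirrors the adjustment described earlier that puts every statistic on the same average, and both function names are mine.  I am assuming a plain Pearson correlation here, which may or may not match the exact figure reported by the charts.

```python
import math


def align_mean(values: list[float], target_mean: float) -> list[float]:
    """Shift every value by a constant so the list averages `target_mean`."""
    shift = target_mean - sum(values) / len(values)
    return [value + shift for value in values]


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical ERA and estimator values for five pitchers.
actual_era = [2.5, 3.0, 3.5, 4.0, 4.5]
estimate = [2.9, 3.1, 3.6, 3.8, 4.6]

aligned = align_mean(estimate, sum(actual_era) / len(actual_era))
r_raw = pearson_r(actual_era, estimate)
r_aligned = pearson_r(actual_era, aligned)

print(round(r_raw, 4), round(r_aligned, 4))
```

Shifting an estimator by a constant does not change its correlation with ERA, so the mean alignment only matters for comparing the values themselves, not for ranking the estimators.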

 
--Stats All Folks