Uncategorized

Proven Closers: Further Debunking of a Worn-out Narrative

 

Throughout MLB’s long and storied history, narratives have shaped fans’ enjoyment of the game as well as many front offices’ strategies and team building. From the fan perspective, these stories build the mythology and lore of the sport. But from the team perspective, unbacked assumptions can often be harmful and limiting. One of the best examples is the long-held belief that only a special, “mentally tough” reliever is capable of handling the stress of finishing a game. These “proven closers” supposedly have the guts and experience to hold up under the immense weight of securing a save. This assumption has finally started to be questioned in recent years as analytically minded front offices have rigorously tested many of these ancient baseball narratives. The shift is easy to see with a simple glance at whom successful teams trust with the role now compared to ten years ago.

For this study, I defined a proven closer as a pitcher with at least 75 career saves entering the season in question. These pitchers would have accumulated enough experience to allegedly calm their nerves and earn the title of proven closer in the public baseball lexicon. During the 2010 season, 60 percent of the league’s top 15 teams had these guys finishing games. That’s lower than in previous decades, but it is huge compared to this season: in 2019 only 40 percent of the top 15 teams fielded these wily, validated veterans. While some of them are among the top pitchers in the league, including the likes of Aroldis Chapman, Craig Kimbrel, and Sean Doolittle, the other group is just as effective. Inexperienced guys like Luke Jackson, Josh Hader, and Taylor Rogers have taken the league by storm, even if they are often not as publicly recognized.

This, however, is all just anecdotal evidence. For the rest of this paper, I will evaluate whether this recent strategic change is valid and whether being a proven closer really does make you more qualified to man the most continuously stressful position in the game.

The first test I will be running is a general comparison between the “proven closer” and the not-proven (which I’ll call the “young’uns”) for 2010 and 2019. One interesting part of this comparison is that these relievers over their careers have each played both roles at certain points. Due to this, you obviously can’t just take career stats for guys with 75 saves and those without. This can be easily solved by taking each season as its own individual data point. This will cause players like Mariano Rivera to have around 15 data points in the proven closer bucket, meaning he will have a larger impact on the end results than, for example, Luke Jackson. But this is fair, as his sample is much larger and deserves to have higher weight than a guy who has pitched only 30 innings in save situations.
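The season-as-data-point idea can be sketched in a few lines. This is only an illustration under stated assumptions: the `SeasonLine` layout, the field names, and the sample numbers are all made up, and weighting by innings is one reasonable reading of "weighted average", not necessarily the exact weighting used in the study.

```python
from dataclasses import dataclass

PROVEN_CUTOFF = 75  # career saves entering the season, per the cutoff described above


@dataclass
class SeasonLine:
    """One pitcher-season of save-situation stats (hypothetical layout)."""
    pitcher: str
    year: int
    prior_saves: int   # career saves before this season
    innings: float     # save-situation innings as a true decimal (not .1/.2 notation)
    earned_runs: int   # earned runs allowed in save situations


def bucket(line):
    """Label a pitcher-season by experience entering that season."""
    return "proven" if line.prior_saves >= PROVEN_CUTOFF else "young'un"


def weighted_era(lines):
    """Innings-weighted ERA for a group: 9 * total earned runs / total innings."""
    innings = sum(l.innings for l in lines)
    runs = sum(l.earned_runs for l in lines)
    return 9.0 * runs / innings


# Made-up numbers, for illustration only.
seasons = [
    SeasonLine("Veteran A", 2008, 400, 60.0, 14),
    SeasonLine("Veteran A", 2009, 440, 55.0, 12),
    SeasonLine("Rookie B", 2019, 3, 30.0, 10),
]

# The same pitcher can land in either bucket in different years,
# and a long career contributes many rows to its bucket.
by_bucket = {}
for s in seasons:
    by_bucket.setdefault(bucket(s), []).append(s)

era_by_bucket = {name: weighted_era(lines) for name, lines in by_bucket.items()}
print(era_by_bucket)
```

Summing runs and innings before dividing is what gives a pitcher with more save-situation innings more pull on his bucket's number.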

After I built the aforementioned datasets, I took weighted averages of the two groups’ performance in save situations. The statistics taken into account were ERA, WHIP, K/9, and OPS allowed. I would have preferred to use some different metrics, but the ease of Baseball Reference’s save situation splits led me to use their numbers, which should be more than fine for this exercise. I then compared the means of each statistic for the two buckets using Welch’s t-tests, which test the difference between two group means without assuming the groups have equal variances or equal sample sizes.

The results were consistent across every stat and gave some good insights. I expected the proven closers’ numbers to look relatively similar to the young’uns’ numbers, but maybe slightly better due to the higher weight on a few very good players, like Rivera and Hoffman. What I found instead was that the young’un group outperformed their proven counterparts on every stat. ERA, which is probably the most important stat I tested, had the young’un group coming in at 2.95 in save situations compared to 3.09 for the vets. This may not seem like a lot, and the test agrees: the result came with a p value of .21 when the null hypothesis was that the proven group has the lower ERA. In layman’s terms, the data lean against the idea that having 75 career saves leads to a lower ERA in save situations, but a p value of .21 falls short of the usual significance thresholds, so random variation can’t be ruled out. This pattern was consistent among the other statistics.

While this provides some evidence for the irrelevance of the proven closer motif, these results could have come from other biases in the dataset, including that the young’uns are, well, younger, and that most guys who recently entered the closer spot are pitching at the top of their game. This makes it necessary to conduct further testing if we want to say more confidently that the “proven closer” is a myth.
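For readers unfamiliar with Welch’s t-test, here is a minimal sketch of the statistic and its degrees of freedom using only the Python standard library. The sample ERAs are invented, and the function name `welch_t` is just an illustrative choice; the post does not say which software was used for the actual tests.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test of mean(a) - mean(b)."""
    # Each group's sample variance divided by its own size: no pooling.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented save-situation ERAs, one value per pitcher-season.
proven = [3.4, 2.8, 3.1, 3.6, 2.5]
young = [2.9, 3.2, 2.4, 3.0]

t_stat, dof = welch_t(proven, young)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Turning `t` and `df` into a p value requires the t distribution’s CDF, which the standard library does not provide. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and reports the p value directly.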

One of the main issues with doing overarching quantitative analysis on this subject is the bias created by the uneven opportunities teams hand out. In other words, playing time is not randomly assigned to the players in the league. Teams are trying to win, and it obviously hurts this goal to have subpar pitchers on the hill during the highest-leverage part of the game. To account for this, we need to create a baseline talent level for each player. That way we can isolate pitching in the 9th inning as the only variable. To do this, I compiled the statistics for each player in the previous datasets for both save and non-save situations. If having 9th inning experience makes an actual difference, the proven closers bucket would show a much larger negative delta than the young’uns.

 

The results were as follows:

                ERA Save Sit   ERA Non-Save Sit   OPS Save Sit   OPS Non-Save Sit
Proven Closer   3.11           3.39               .636           .663
Young’uns       3.65           3.75               .683           .705

 

As you can see, the gaps, while pretty similar, are larger for the proven closer group: their ERA is 0.28 lower in save situations, compared to 0.10 lower for the young’uns. This points slightly toward proven closers having an edge. But the difference is not as significant as in the previous study, especially given the smaller sample size that comes from not counting players more than once. This makes an interesting counterargument to the previous point and definitely requires further testing.
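The deltas behind that comparison are simple subtraction on the table above. A short sketch, with the dictionary layout being nothing more than an illustrative choice:

```python
# (save situation, non-save situation) values copied from the table above.
splits = {
    "Proven Closer": {"ERA": (3.11, 3.39), "OPS": (0.636, 0.663)},
    "Young'uns": {"ERA": (3.65, 3.75), "OPS": (0.683, 0.705)},
}

# Negative delta = better in save situations than in non-save situations.
deltas = {
    group: {stat: round(save - non_save, 3) for stat, (save, non_save) in stats.items()}
    for group, stats in splits.items()
}

for group, d in deltas.items():
    print(group, d)
```

Both groups pitch better in save situations; the question in the text is only whether the proven group’s improvement is meaningfully bigger.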

The last statistical method I used incorporated predictive modeling. This could potentially add an extra layer of noise to the study, but I believe it’s worth it if you take that into account. My idea was to create a usable model that predicts a closer’s save situation ERA from a variety of inputs while excluding everything that has to do with experience in the role, such as innings pitched, saves, and any other counting stat. From this starting point, I was able to build an OK but admittedly limited multiple linear regression model with various rate-based inputs, ranging from FIP to K% to HR/9. My final model ended with a mean absolute error of .4 on the validation datasets, which means that on average it missed its target by .4 runs of ERA.

After this, I input my two proven and unproven data buckets and evaluated the error metrics for each. What I was looking for was the model to predict the proven players bucket significantly worse than they actually were, and the unproven players significantly better. This would show that there was some hidden skill not accounted for in the model on top of just random noise. While we can’t say for sure what that skill would be, I attempted to design a model that would make that the most likely missing piece.
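To make the two error measures concrete, here is a small sketch that separates mean absolute error (how far off the model is) from mean signed error (in which direction it is off). The predictions and actual ERAs are invented, and the sign convention (actual minus predicted) is an assumption chosen for this example.

```python
from statistics import mean


def signed_errors(actual, predicted):
    """Actual minus predicted ERA; negative means the pitcher beat the model."""
    return [a - p for a, p in zip(actual, predicted)]


def mean_absolute_error(actual, predicted):
    """Average size of the miss, ignoring direction."""
    return mean(abs(e) for e in signed_errors(actual, predicted))


# Invented numbers for one bucket of pitcher-seasons.
proven_actual = [2.5, 3.0, 3.5]
proven_predicted = [2.9, 3.2, 3.3]

proven_signed = signed_errors(proven_actual, proven_predicted)
print("mean signed error:", round(mean(proven_signed), 3))
print("mean absolute error:", round(mean_absolute_error(proven_actual, proven_predicted), 3))
```

Under this convention, a hidden "closer experience" skill would show up as a clearly negative mean signed error for the proven bucket and a clearly positive one for the unproven bucket. The two lists of signed errors are also the natural inputs for a two-sample test between the buckets.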

The results were the exact opposite. The model predicted the unproven guys to be worse than they actually were and the proven guys to be better. This easily could have been just variation, so I ran another Welch’s t-test to evaluate how significant the difference in error was. The test found essentially no support for the idea that the model sold the proven guys short and flattered the unproven guys: it returned a p value of .9997, and on a one-sided test like this a value that close to 1 means the errors lean firmly in the opposite direction. Of course, this has to be taken with a grain of salt due to the previously mentioned noise added by my imperfect predictive model. Nevertheless, it is good evidence and backs the idea that experience in the 9th isn’t an important factor.

In conclusion, my quantitative studies, for the most part, back the industrywide trend of no longer relying on the archetypal “proven closer”. An abundance of 9th inning experience does not seem to make any significant, quantifiable difference. While the second test didn’t back this claim, the other two studies did. This makes sense. A good pitcher is a good pitcher. And while it is difficult to deal with stress when you haven’t before, these are professional athletes who play on the biggest stage in the world. They deal with immense stress every day. In order to get to this level of excellence in this failure-based game, they have to be mentally tough. No matter how many career saves you have, pitching the 9th inning of a close game with playoff implications is just another day at the office.


Old vs New and the Pitcher Development Battle

In baseball today, the importance of player development is no longer debatable. Everyone has rallied around the necessity of impactful training. Instead, battles are now erupting over which skills are crucial to develop in young players.

Pitchers are the focal point of these debates, as new-school biometric training facilities have revolutionized the process by concentrating on improving spin, mechanics, and velocity. This strategy has been met with harsh counterclaims from older-school coaches who emphasize repeatable mechanics, control, and pitch sequencing over the “glamour” skills that the newer biometric companies covet.

This battle has been waged in MLB front offices for years, and at this point, the new training seems to be winning. This past off-season, biometric data-driven coaches were brought in by numerous franchises. These guys are touted as the future of player development. But are the new skills that they focus on any more strongly correlated with MLB success than the older-school skills? In this analysis, we will dive into the data and find out which underlying skills are the most crucial to big league success and whether these new-school guys are in fact correct.

As you can probably guess, each individual skill will have a very low correlation to overall success. That’s because pitching is a complex conglomerate of skills that can be combined in myriad ways and still be very successful. There is no perfect mix. With that being said, when you compare each skill, you can still see which ones as a whole contribute most to the overall package and are, in general, the most crucial. The skills I chose to test are as follows, generically grouped into old school and new school:

Old School

Ability to locate pitches: Measured by BP’s CMD Metric

Ability to change speeds: Measured by velocity drop off between primary fastball and off-speed

A mixed arsenal: Measured by breaking ball, off-speed, and fastball percentages of pitches thrown.

New School

Fastball Velocity: Measured by highest average fastball velocity

Fastball Spin Rate: Measured by average RPM of the primary (four-seam) fastball

Slider Spin Rate: Measured by average RPM for all pitchers who threw the offering at least 5% of the time

Curveball Spin Rate: Measured by average RPM for all pitchers who threw the offering at least 5% of the time

The first step was isolating each variable’s effect on success. I quantified success as MLB ERA instead of more pitcher-isolating metrics like FIP. I did this for two reasons. The first is that FIP and its follower stats tend to be biased toward pitchers who lean towards the more new-school approach. This is due to the old-school approach’s emphasis on generating weak contact, which FIP factors out completely. Using ERA puts both skillsets on relatively the same playing field, even though it may in turn give pitchers too much credit for weak contact, slightly helping out the old school. Second, I wanted to measure the most basic definition of pitcher success: limiting runs. This keeps it simple and is easily understood by the general public.

After running simple linear regressions for each of my 9 variables (I split arsenal into 3 parts for the regressions), it became clear that some of the variables had zero or almost zero impact on ERA when isolated. These included some that I had assumed beforehand, like breaking ball percentage and changeup percentage, but also more interesting discoveries, including command and ability to change speeds. Here are the plots:

 

Both of these highly touted old-school skills failed the correlation test, with correlation coefficients of basically zero and flat slopes, meaning that as the skills get better, ERA doesn’t follow suit. This might partially have to do with sampling bias, as only the last two seasons of Statcast data are publicly available. This limits my sample to the current game, which has trended away from maximizing these skills. Even so, this level of separation from ERA is very noteworthy and should be taken into account. The data is not saying these skills are entirely unimportant, as they are good auxiliary skills, but just that they alone aren’t enough to drive success.

Next, I will delve into the skills my analysis found most predictive of MLB success. These were, as new-school advocates are already aware, fastball velocity and fastball spin rate, followed by slider spin.

Each of these skills still may seem to have a small R^2 at .08, .078, and .023 respectively but when isolating a single trait, the first two are about as good as you can ask for. Just like you wouldn’t have a shot at telling me a player’s ERA if you just knew he threw 93 MPH, the computer can’t really tell, either. But the computer does have a better shot at it with those two skills than any other I measured, by a wide margin. Another important finding about these three skills is that they each have relatively steep negative slopes meaning as they increase, ERA will fall with them. Velocity and spin rate have been buzz words in baseball for years now and this is just more backing for them to gain further influence in the future.
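For anyone who wants to see where a slope and an R^2 come from, here is a minimal single-variable least squares sketch in plain Python. The velocity and ERA values are invented (and far too tidy to resemble real data), and `simple_regression` is a hypothetical helper name, not a function from the original analysis.

```python
from statistics import mean


def simple_regression(x, y):
    """Ordinary least squares of y on x. Returns (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # With one predictor, R^2 is the squared correlation coefficient.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Invented sample: average fastball velocity (mph) and ERA.
velocity = [90, 92, 94, 96]
era = [4.6, 4.1, 4.0, 3.5]

slope, intercept, r2 = simple_regression(velocity, era)
print(f"slope = {slope:.3f} ERA per mph, R^2 = {r2:.3f}")
```

A negative slope means ERA falls as the skill increases, and R^2 is the share of the variation in ERA that the single skill accounts for, which is why values like .08 are small in absolute terms but can still lead the pack.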

Now that we’ve gotten through the breakdown of methodology and explanation of my backing, here’s the ranking of each skill, from most important to least important based on their correlation and slope:

  1. Fastball Spin Rate (pushed ahead by a slightly steeper slope)
  2. Fastball Velocity
  3. Slider Spin Rate
  4. Fastball Percentage Thrown
  5. Curveball Spin Rate
  6. Command
  7. Fastball-Changeup Velocity Delta
  8. Breaking Ball Percentage Thrown
  9. Changeup Percentage Thrown

It’s worth noting that after Slider Spin Rate all other variables have basically a zero effect by themselves.

As you can probably tell from the individual skills analysis, old school seems to be at a clear disadvantage. But as they always like to preach, it’s the total package that makes a player. To account for this, I used the new/old skill groupings listed above and ran multiple linear regressions for each. This basically means the computer took into account each group’s variables together and measured the relationship between them and ERA. The results were, well, not that surprising: old school got crushed again. The correlation coefficient for the old-school group was tiny for a model with multiple variables, .024, about the same as slider spin rate alone. The predicted value scatter plot also shows this, as the computer had no idea how to place anything and just threw everything around the mean to hedge its losses.

The new school group fared much better, having a pretty strong correlation, all things considered, at .115. The plot also showed this with a more accurate, spread out distribution and less severe errors.
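The grouped comparison can be sketched with a small hand-rolled multiple regression. Everything here is an assumption made for illustration: the data are invented, the two features stand in for a whole skill group, and the fit statistic computed is R^2, which may or may not be the exact "correlation" figure quoted above.

```python
from statistics import mean


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_multiple(rows, y):
    """Least squares fit of y on several features plus an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    my = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot


# Invented "new school" features: mph above 90, fastball spin in thousands of RPM.
new_school = [(3.0, 2.3), (1.5, 2.1), (5.0, 2.5), (2.0, 2.4), (4.0, 2.2), (0.5, 2.0)]
era = [3.9, 4.4, 3.2, 4.0, 3.8, 4.7]

coef, r2 = fit_multiple(new_school, era)
print("coefficients:", [round(c, 3) for c in coef], "R^2:", round(r2, 3))
```

Running the same function once per skill group and comparing the two fit statistics mirrors the old-school versus new-school comparison. Solving the normal equations directly is fine for a handful of well-scaled features, though a library routine would be the safer choice on real data.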

The combination of the individual skill evaluations and the group results clearly shows that the new-school training regimens are focusing on the more data-backed skills. This finding is no surprise, as one of their main selling points is embracing data and implementing it in a useful way. While this work does show the success of these traits in MLB, what it doesn’t take into account is whether these skills can be taught and whether they contribute to increased injury risk, two big complaints from skeptics. These might be covered in later pieces, but I thought them important to mention here as well.

This analysis may seem to completely write off the skills of command, changing speeds, and mixing your pitches that old-school baseball loves to glorify. But these skills absolutely have their place. As secondary skills, they are needed along with the other abilities, but in general, they can’t hold up on their own. Maybe some guys can get by with just location and changing speeds, but if you are forced to choose one of the two sets of skills, the data shows you should pick new school.

Fastball velocity and pitch spin have been the main drivers of success. If you can’t hit the broad side of a barn with your pitches, they’re obviously a moot point. But studies like mine have repeatedly shown they are crucial to pitching in today’s game, so they should be a focus of player development.

 

 


Preview my sabermetric literature book-in-progress

Hi everyone – for the past fifteen or so years I’ve been working on a book-length manuscript with the goal of summarizing as much of the sabermetric literature as I have been able to find. I have finished versions of the first three chapters. The first is an introduction directed mostly to those with little background in sabermetrics. The second reviews work on the progress of the inning, plus a bit on the progress of the game. The third covers material on situational factors. All are available at the following website:

https://charliepavitt.home.blog/

I have also included a table of contents including all of the projected content.

I hope many of you find this helpful in your work. If you want to look at any of the sources I have referenced, I have copies of most of them that I can send to you. Please contact me at chazzq@udel.edu if you have any comments or suggestions for corrections that I can make in updated versions of the chapters. I am particularly interested in any relevant material I am not aware of that you can send me.

I hope to have the fourth chapter, on strategy, completed in the next month or two.

Charlie Pavitt


What a Drag: A Follow-up

Last week I wrote a post showing that there has been a sharp reduction in star players who have passed their 35th birthday. There was a lot of discussion about this on Twitter and elsewhere, mainly focusing on the likely explanations for my data. Most people seemed to believe the largest cause for this trend is “PED testing.” This might be correct, but I was trying to leave the speculation out of it and focus on what the data says.

A few people suggested that I should present the data for WAR/PA, rather than just total WAR for each age. I use WAR in studies like this (I have done many such studies showing contributions broken down by race) because I don’t want the replacement-level players to swamp the data. Which they would.

In the 1988-2017 period (30 years), there were 35,913 player-seasons. Here is a plot showing the annual average age, giving all players equal weight.

[Figure: average age of all players by season, 1988-2017, each player weighted equally]

The same rise and fall shows up here as I showed last week. With over 1200 players every season, a drop in average age of 0.8 years in the past 12 seasons is fairly dramatic.

Of this huge pool of seasons, 70% are worth less than 1.0 WAR, which is (roughly speaking) replacement level. In fact, if you combine these 25,012 seasons, they sum to less than 0.0 WAR (there is more negative value in this cohort than there is positive value).

To this end, I will “bin” the rest of the data.

Replacement (less than 1.0 WAR): 25,012 seasons, 69.6%

Useful (1.0 <= WAR < 3.0): 7,210 seasons, 20.1%

Good (3.0 <= WAR < 7.0): 3,401 seasons, 9.5%

Great (7.0 <= WAR): 290 seasons, 0.8%
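The binning rule above is easy to express in code. This is a minimal sketch using the cutoffs listed in the post; the function names and the tiny sample list are made up for illustration.

```python
from collections import Counter

# (label, inclusive lower bound, exclusive upper bound), matching the bins above.
BINS = [
    ("Replacement", float("-inf"), 1.0),
    ("Useful", 1.0, 3.0),
    ("Good", 3.0, 7.0),
    ("Great", 7.0, float("inf")),
]


def war_bin(war):
    """Return the bin label for a single player-season WAR value."""
    for label, low, high in BINS:
        if low <= war < high:
            return label
    raise ValueError(f"cannot bin WAR value {war!r}")


def bin_shares(wars):
    """Fraction of player-seasons that fall in each bin."""
    counts = Counter(war_bin(w) for w in wars)
    return {label: counts[label] / len(wars) for label, _, _ in BINS}


# Made-up player-season WAR values.
sample = [-0.4, 0.2, 0.9, 1.0, 2.7, 3.0, 7.3, 0.0]
print(bin_shares(sample))
```

Half-open intervals keep every season in exactly one bin, so a 1.0 WAR season counts as "Useful" and a 3.0 WAR season as "Good", as the inequalities above specify.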

These bin choices are mostly arbitrary. Tom Tango specifically asked on Twitter whether there are fewer “old” players between 1 and 3 WAR, so I thought I might as well create a few other bins.

Now I will just show the average age of the players in each of these bins.

[Figure: average age of the players in each WAR bin by season, 1988-2017]

 

For the best cohort (shown in blue) I combined the “good” and “great” seasons, meaning that the line shows all seasons of at least 3.0 WAR. I do this because there are relatively few great seasons, and the “great” line becomes somewhat meaningless.

Although all four cohorts show the same rough trend, the replacement players tend to be younger (at least until recently), and the average age of the good and great cohorts both drop fairly dramatically between 2005 and 2009.

When added to the post from last week, it seems clear that the contributions of older players have shrunk dramatically in the past decade, and this is true across all levels of quality.

Finally, there was some speculation that the data I showed was partly due to teams deliberately playing younger players (to save money). It strikes me that the players most likely to be affected by salary-based attrition would be the replacement-level players, but this is the part of the roster that has aged the least. With the important caveat that teams do not know in advance how well their players are going to perform, it does not seem as if they are deliberately employing young players any more than they should.

 

 


Free Psychometric Scouting Webinar

SABR Friends,

This next Thursday, Sept. 20th, I will be co-hosting a webinar entitled The Mindset Science of Playing at the Next Level. It is sponsored by my new venture, Diamond Scouting, Psychometrics. Todd Thomas, Diamond’s Director of Scouting, will be hosting the event.

The many benefits of psychometric scouting and player evaluation will be covered. We will explain how quantifying player makeup, mindset, and instincts can help identify future All-Stars who could otherwise be overlooked or missed altogether.

The Webinar is free and I would be honored to have the SABR Statistical Analysis Committee join in and share in the discussion. Please pass the word. I am looking forward to having you and the members join us. Here is the link to register, https://t.co/tj4EOuVT5K.

Hope to see you there,

Bill Bagley
SABR Member
Psychometrician


Tom Ruane: Fun with Retrosheet Data

Yesterday Tom Ruane posted a note to the Retrosheet mailing list about his latest research: teams who score the most (or the fewest) runs with a specific number of hits.

You can read Tom’s most recent articles here.

Tom has been a Retrosheet volunteer and board member for many years, and over the past decade has written dozens of articles on things he has gleaned from Retrosheet data.

You can see Tom’s archive here.

WARNING: his articles are very addictive.

 

 

 

 


What A Drag It Is Getting Old

Sometime earlier this summer I got to thinking about Miguel Cabrera, and how sad it was that he, like Albert Pujols, had fallen from his rightful and longtime place as one of baseball’s best hitters. Pujols signed a 10-year contract with the Angels after the 2011 season and had his last 4-win season (using bWAR, Baseball-Reference.com’s WAR) in 2012 at age 32. Cabrera put up a great season at age 33 in 2016 followed by two seasons of mediocrity or injury. The Tigers still owe him $154 million for the next five years.

Although their declines seemed inevitable, I got to wondering if players weren’t aging as well as they had 20 years ago. There didn’t seem to be as many good old players as there used to be. I decided to try to figure it out.

Dan Levitt helped me gather the data I needed, namely all the bWAR in major league history broken down by year and by the age of the player who accumulated it. This was enough to answer my questions.

The following chart shows the percentage of WAR contributed by players of every age between 19 and 43. (Younger and older ages, with comparably minimal values, have been removed for simplification.) As usual, a player’s “age” is his age on June 30 of the year in question.

[Figure: percentage of all-time WAR contributed by players of each age, 19 to 43]

From 1876 through 2017, 27-year-old players accumulated 8785 WAR, which is 10.19% of the all-time total. This has been the “most valuable” age, but the surrounding years have been comparable – ages 25 to 29 make up nearly half — 48% — of the all-time value. The basic shape of this chart is likely no surprise.

The sum of these age bars will necessarily total 1.0. What I am mainly interested in at the moment is the right part of this graph: the older players. Have there been fewer good old players in recent years? For the rest of this paper, I will use the term “old players” to mean “players age 35 or older”. Historically, these “old players” have produced 7.5% of the value in the major leagues.

To simplify things I am going to look at the data since 1968 – 50 years. The next chart shows, for each of these seasons, the percentage of WAR that were accumulated by old players.

[Figure: percentage of WAR accumulated by players age 35 or older, by season, 1968-2017]

You can see that old player value has been in free fall in recent years. In 2017, age 35+ players accumulated 15.9 WAR in total, just 1.6% of the value in all of baseball. By percentage, this was the smallest total since 1877 when the major leagues were just getting started.
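The percentage plotted for each season is just the old players’ WAR divided by everyone’s WAR. A minimal sketch, with invented (age, WAR) pairs standing in for a full season of data:

```python
OLD_AGE = 35  # "old players" are age 35 or older, as defined above


def old_player_share(player_seasons):
    """Share of a season's total WAR produced by players age 35+.

    player_seasons is a list of (age, war) pairs for a single season.
    """
    total = sum(war for _, war in player_seasons)
    old = sum(war for age, war in player_seasons if age >= OLD_AGE)
    return old / total


# Invented mini-season, for illustration only.
season = [(24, 5.5), (27, 6.0), (30, 3.0), (36, 1.0), (38, -0.5)]
print(f"{old_player_share(season):.1%}")
```

Because negative WAR seasons are included in both sums, a handful of poor seasons by old players pulls their share down rather than being ignored.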

Although most of the annual percentages of WAR by old players fall between roughly 4% and 8%, there are a few exceptions.

There was a brief upsurge of old value in the early 1980s, peaking at 12% of the majors in 1982. Who were these old players? The following table lists all of the age 35+ players who attained 4.0 WAR in 1982:

Player (1982)    Age   WAR
Steve Carlton    37    6.1
Joe Niekro       37    6.1
Al Oliver        35    5.3
Joe Morgan       38    5.1
Jim Palmer       36    4.8
Rod Carew        36    4.7
Tommy John       39    4.2
Hal McRae        36    4.1

Although there are a few position players here, this period was notable for its old pitchers; besides those in this table, players like Don Sutton, Phil Niekro, Tom Seaver, and others had fine “old” seasons in surrounding years.

A much larger and more sustained period of old age success came in the 1998-2007 period, with both its arrival and disappearance happening fairly suddenly.

In the 2002 season old players accounted for 14.2% of all big league value, the highest total since World War 2 upended major league rosters. The list of old players who had 4.0 WAR that season:

Player (2002)     Age   WAR
Barry Bonds       37    11.8
Randy Johnson     38    10.5
Curt Schilling    35    8.5
Larry Walker      35    6.1
Jamie Moyer       39    5.6
Kenny Rogers      37    5.0
Greg Maddux       36    4.6
Rafael Palmeiro   37    4.5

The decade beginning in 1998 not only had an impressive collection of good old players, it also had a lot of GREAT old players. Here is a count of 7+ win seasons for the last three 10-year periods:

Period       7+ WAR seasons by old players
1988-1997    1
1998-2007    16
2008-2017    1

The best season (by this measure) put up by an old player in the 1988-1997 period was by … Ed Whitson in 1990.  I am not making this up.

As of today, the last old player to accumulate 7 WAR was Chipper Jones in 2008. It will not happen in 2018.

Here is a list of the 4.0 WAR players in 2017:

Player (2017)   Age   WAR
Nelson Cruz     36    4.1

Cruz just squeaked over the line, and the only other old player over 3.0 was Adrian Beltre at 3.6. It is getting tough out there.

The season is not yet over, but I can say with confidence that the only 2018 qualifier will be Justin Verlander, who sits at 5.0 as of September 11.

Another way to show this obvious trend is to look at the average annual age, weighted by WAR. Every player’s WAR value is multiplied by his age, and then divided by the total WAR in the big leagues that season. Again using the past 50 seasons:

[Figure: average age weighted by WAR, by season, 1968-2017]

The average age, weighted by WAR, has recently been at its lowest level in 40 years.
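The WAR-weighted average age described above is a one-line calculation. A short sketch with invented (age, WAR) pairs, shown next to the plain average for contrast:

```python
def war_weighted_age(player_seasons):
    """Average age weighted by WAR: sum(age * WAR) / sum(WAR) for one season."""
    total_war = sum(war for _, war in player_seasons)
    return sum(age * war for age, war in player_seasons) / total_war


def plain_average_age(player_seasons):
    """Average age with every player counted equally."""
    return sum(age for age, _ in player_seasons) / len(player_seasons)


# Invented mini-season, for illustration only.
season = [(23, 6.0), (27, 8.0), (31, 4.0), (37, 2.0)]
print("weighted by WAR:", war_weighted_age(season))
print("equal weight:   ", plain_average_age(season))
```

When the most valuable players skew young, the weighted figure drops below the equal-weight figure, which is the pattern the chart is meant to expose.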

 

Setting aside the reasons for the exceptional aging that went on between 1998 and 2007, which has been debated to death, it is interesting to wonder why players are not aging as well today. The most common answer will be “steroids testing,” but could there be other causes?

Could the current game, which values velocity in pitchers and the ability to hit velocity in batters, be more of a “young man’s game”? Could pitching velocity be causing more injuries and therefore shorter careers? Could a historic crop of young stars (Mike Trout, Mookie Betts, Francisco Lindor, Carlos Correa and many more) be temporarily skewing the data?

All of this and more could be true.

How should this affect how the game is managed? In the past 40 years, many or most of the “bad free agent contracts” have come about because teams have signed 30-year-old players and expected them to keep playing the way they played over the previous five seasons. The historically great aging that took place in the 1998-2007 period might have convinced teams that things had changed, that they could finally sign 30-year-olds with confidence. Oops.

The baseball salary system favors older (read: declining) players. Generally speaking, if a player reaches free agency after his Age 29 season, around 70% of his career is likely to be behind him. His first team likely got his entire prime (ages 25-29) at relatively low cost, and the team that signs him will get his expensive, declining seasons.

In the recent off-season there was some controversy because several free agents did not get offers well into the winter. The players claimed collusion, which is certainly possible. But it also could be a (belated) collective understanding of how players are aging. The “solution” to this problem, for the players, is a salary system that rewards players in their 20s instead of in their 30s.

 

A Follow-Up post.