The Lines Behind Baseball

It's late October, which means we're in the midst of another exciting World Series. This year features the (local) Boston Red Sox against the St. Louis Cardinals, who last faced off in 2004, when the Sox swept the Cards in four games.

Photo credit: Jeff Curry, USA Today Sports

Baseball has been changing in recent years. It used to be that when you went to a ballpark, the large display boards would show the usual player statistics: home runs, runs batted in, and batting average. These days, you see stats floating around like "OPS" and "WAR." What's going on?

If you've read Moneyball (or if you've seen the movie), then you probably have a good idea. Baseball is a game of skill, but also of chance. And just like with the weather or the stock market, by collecting lots of information (or "data") about baseball players and teams, you can use statistical methods to see patterns.

Here's the key question from Moneyball: What's the most important statistic for evaluating baseball players? Is it how many home runs they hit, how often they get on base, or something else entirely? The idea is that some stats, like how often a batter gets hit by a pitch, don't really matter too much, and won't change a team's chances of winning. Other stats, like home runs, probably increase a team's chances of winning.

So let's look at these two stats more closely with graphs. On the x-axis, let's plot the total number of HBPs (the number of times batters were Hit By a Pitch) for every baseball team over the last three seasons. And on the y-axis, we'll plot those teams' win percentages for each season. So 30 teams over 3 seasons gives us 90 total points in our graph.


What do you notice about this graph? There doesn't seem to be any strong trend. Next, let's look at how many more home runs each team hit than its opponents. For example, this past season the Red Sox hit 178 home runs, but their pitchers gave up only 156 home runs. So the Sox hit 22 more home runs than their opponents in 2013. The San Francisco Giants, on the other hand, hit 107 home runs, but gave up 145 home runs. So the Giants hit 38 fewer (or -38 more) home runs than their opponents. Let's see how a team's "home run differential" compares to its win percentage:
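Home run differential is just one subtraction per team. Here is a minimal Python sketch (the function name `home_run_differential` is our own, purely for illustration), checked against the 2013 Red Sox and Giants totals quoted in the paragraph above:

```python
def home_run_differential(hr_hit: int, hr_allowed: int) -> int:
    """Home runs a team hit minus home runs its pitchers gave up."""
    return hr_hit - hr_allowed


# 2013 totals from the text: (home runs hit, home runs allowed)
teams_2013 = {
    "Boston Red Sox": (178, 156),
    "San Francisco Giants": (107, 145),
}

for team, (hit, allowed) in teams_2013.items():
    print(team, home_run_differential(hit, allowed))
# Boston Red Sox 22
# San Francisco Giants -38
```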


This graph looks a bit different from the one that used HBPs. Now there's a clearer trend: this graph looks more like a line! Teams that hit more home runs than their opponents tend to win more games. (Now this graph doesn't tell you if it's the home runs that make a team win games, or if winning games is making the team hit more home runs. We'll let you think about which of these is more likely.)

Using a statistical technique called regression analysis, we can find the line that best fits our data points:


According to the slope of this red best-fit line, for every additional home run a team hits (or that its opponents do not hit), you would expect that team to win about 0.27 additional games. And so for every additional 10 home runs a team hits, you'd expect it to win an additional 2.7 games. In reality, teams can only win a whole number of games, but the best-fit line gives you a sense of how important each home run is.
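The most common form of regression analysis for a single best-fit line is ordinary least squares: choose the slope and intercept that minimize the sum of squared vertical distances from the points to the line. Below is a minimal, standard-library-only sketch, assuming you already have one x-value (home run differential) and one y-value per team-season; `best_fit_line` and `extra_wins` are hypothetical helper names, not functions from any particular stats package.

```python
from typing import Sequence, Tuple


def best_fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations (x vs. y) and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extra_wins(slope: float, extra_home_runs: float) -> float:
    """Change in predicted wins when moving along the best-fit line."""
    return slope * extra_home_runs


# Using the slope quoted in the text (about 0.27 wins per home run):
print(round(extra_wins(0.27, 10), 1))  # 2.7
```

The slope comes out in whatever units the y-values use, so reading it as "wins per home run" assumes the y-values are counted in wins rather than as a win fraction. On Python 3.10 or later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept.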

These graphs show that the number of home runs players hit is more important than how many times they got hit by pitches. That result may not be too surprising. But what about other stats? Let's also look at four more advanced stats (if these don't make a lot of sense, then don't worry):
  • AVG (batting average): The fraction of the time a player gets a hit. Walks and HBPs don't count.
  • OBP (on-base percentage): The fraction of the time a player gets on base. It's a lot like batting average, but includes walks and HBPs.
  • SLG (slugging): The average number of bases a player gets per at-bat. Singles count as 1 base, doubles as 2, triples as 3, and home runs as 4. As with AVG, walks and HBPs don't count.
  • OPS (on-base plus slugging): Take a player's OBP and SLG, add them together, and that's OPS.
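These four definitions translate directly into arithmetic on a player's counting stats. The sketch below is a minimal illustration, using a hypothetical `BattingLine` class and made-up numbers for one imaginary season. One detail the plain-English OBP definition glosses over: the official on-base percentage formula also adds sacrifice flies to the denominator, so that field is included here (defaulting to zero).

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one player (or team) over some span of games."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int = 0

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    @property
    def total_bases(self) -> int:
        return self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.home_runs

    def avg(self) -> float:
        # Hits per at-bat; walks and HBPs are not at-bats, so they are ignored
        return self.hits / self.at_bats

    def obp(self) -> float:
        # Times on base via hit, walk, or HBP, over those chances plus sac flies
        times_on_base = self.hits + self.walks + self.hit_by_pitch
        chances = self.at_bats + self.walks + self.hit_by_pitch + self.sacrifice_flies
        return times_on_base / chances

    def slg(self) -> float:
        # Total bases per at-bat
        return self.total_bases / self.at_bats

    def ops(self) -> float:
        # OPS is simply the sum of OBP and SLG
        return self.obp() + self.slg()


# Hypothetical season, invented for illustration only
line = BattingLine(at_bats=500, singles=100, doubles=30, triples=2,
                   home_runs=25, walks=60, hit_by_pitch=5, sacrifice_flies=5)
for name, value in [("AVG", line.avg()), ("OBP", line.obp()),
                    ("SLG", line.slg()), ("OPS", line.ops())]:
    print(f"{name}: {value:.3f}")
# AVG: 0.314
# OBP: 0.389
# SLG: 0.532
# OPS: 0.921
```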
Each of these advanced stats seems more complicated than the last. But here's why they're important: regression analysis can show you not only the best-fit line, but also how close your data is to that line. That closeness is measured by the correlation coefficient, usually written as r, which always falls between −1 and +1. For data that isn't linearly related, r is very close to zero. For data with a strong positive correlation, meaning the points lie very close to a best-fit line with a positive slope, r is very close to +1. And for data with a strong negative correlation, r is very close to −1.
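The r described here is Pearson's correlation coefficient: the covariance of x and y divided by the product of their standard deviations, which is why it can never leave the range from −1 to +1. A minimal standard-library sketch follows; `pearson_r` is our own illustrative name (Python 3.10 and later also ship `statistics.correlation`, which computes the same quantity), and the three toy inputs are chosen only to hit the three cases above.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    # The 1/n factors in covariance and variance cancel, so raw sums suffice
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (points on a rising line)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (points on a falling line)
print(pearson_r([1, 2, 3], [1, 0, 1]))        # 0.0  (no linear trend)
```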

Here are the correlations for the different stats:
  • HBP vs. win percentage: r = 0.215
  • Home runs vs. win percentage: r = 0.746
  • AVG vs. win percentage: r = 0.779
  • OBP vs. win percentage: r = 0.876
  • SLG vs. win percentage: r = 0.892
  • OPS vs. win percentage: r = 0.914
HBPs have by far the weakest correlation with winning among this group, while OPS has the strongest correlation. There's also a sizable jump in correlation between AVG and OBP (there are whole scenes with Brad Pitt and Jonah Hill in the Moneyball movie debating AVG and OBP). Here's the graph of OPS vs. win percentage:


As you can see, the data points are all pretty close to their best-fit line. Of all the stats we've looked at here, OPS is the most strongly correlated with winning. And that's why a player's home run total and batting average just aren't as important these days. The players with the highest OPS are the ones who are winning awards and getting the biggest contracts.

Baseball statistics, also known as sabermetrics, is an ongoing field of study. Over the last two years, WAR, which stands for Wins Above Replacement player, has become the hot new stat, and it has an even higher correlation with winning.
