The Lines Behind Baseball

It's late October, which means we're in the midst of another exciting World Series. This year's Series features the (local) Boston Red Sox against the St. Louis Cardinals. The two teams last faced off in 2004, when the Sox swept the Cards in four games.

Photo credit: Jeff Curry, USA Today Sports

Baseball has been changing in recent years. It used to be that when you went to a ballpark, the large display boards would show the usual player statistics: home runs, runs batted in, and batting average. These days, you see stats floating around like "OPS" and "WAR." What's going on?

If you've read Moneyball (or if you've seen the movie), then you probably have a good idea. Baseball is a game of skill, but also of chance. And just like with the weather or the stock market, by collecting lots of information (or "data") about baseball players and teams, you can use statistical methods to see patterns.

Here's the key question from Moneyball: What's the most important statistic for evaluating baseball players? Is it how many home runs they hit, how often they get on base, or something else entirely? The idea is that some stats, like how often a batter gets hit by a pitch, don't really matter too much, and won't change a team's chances of winning. Other stats, like home runs, probably increase a team's chances of winning.

So let's look at these two stats more closely with graphs. On the x-axis, let's plot the total number of HBPs (the number of times batters were Hit By a Pitch) for every baseball team over the last three seasons. And on the y-axis, we'll plot those teams' win percentages for each season. So 30 teams over 3 seasons gives us 90 total points in our graph.

What do you notice about this graph? There doesn't seem to be any strong trend. Next, let's look at how many more home runs each team hit than its opponents. For example, this past season the Red Sox hit 178 home runs, but their pitchers gave up only 156 home runs. So the Sox hit 22 more home runs than their opponents in 2013. The San Francisco Giants, on the other hand, hit 107 home runs, but gave up 145 home runs. So the Giants hit 38 fewer (or -38 more) home runs than their opponents. Let's see how a team's "home run differential" compares to its win percentage:
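If you'd like to check that arithmetic yourself, here's a tiny Python sketch. The function name is just something we made up for illustration; the home run totals are the ones quoted above.

```python
def home_run_differential(hit, allowed):
    """Home runs a team hit minus the home runs its pitchers gave up."""
    return hit - allowed

# 2013 totals quoted in the text above
red_sox_2013 = home_run_differential(178, 156)
giants_2013 = home_run_differential(107, 145)

print(red_sox_2013)  # 22
print(giants_2013)   # -38
```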

This graph looks a bit different from the one that used HBPs. Now there's a clearer trend: this graph looks more like a line! Teams that hit more home runs than their opponents are more likely to win a greater number of games. (Now this graph doesn't tell you if it's the home runs that make a team win games, or if winning games is making the team hit more home runs. We'll let you think about which of these is more likely.)

Using a statistical technique called regression analysis, we can find the line that best fits our data points:

According to the slope of this red best-fit line, for every additional home run a team hits (or that its opponents do not hit), you would expect that team to win about 0.27 additional games. And so for every additional 10 home runs a team hits, you'd expect it to win an additional 2.7 games. In reality, teams can only win a whole number of games, but the best-fit line gives you a sense of how important each home run is.
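Here's a rough sketch of how a best-fit line can be computed, using the standard least-squares formulas. The data below is made up (it is not the real team data from the graph): we simply assume five teams that sit exactly on a line with slope 0.27 and 81 wins at a differential of zero, so you can see the code recover that slope.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical teams lying exactly on wins = 81 + 0.27 * differential
differentials = [-40, -20, 0, 20, 40]
wins = [81 + 0.27 * d for d in differentials]

slope, intercept = best_fit_line(differentials, wins)
extra_wins_per_10_hr = 10 * slope  # about 2.7, matching the text
```

Real data never sits exactly on a line, of course, so with actual team stats the slope would be the best compromise among all the points rather than a perfect match.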

These graphs show that the number of home runs players hit is more important than how many times they got hit by pitches. That result may not be too surprising. But what about other stats? Let's also look at four more advanced stats (if these don't make a lot of sense, then don't worry):
  • AVG (batting average): The fraction of the time a player gets a hit. Walks and HBPs don't count.
  • OBP (on-base percentage): The fraction of the time a player gets on base. It's a lot like batting average, but includes walks and HBPs.
  • SLG (slugging): The average number of bases a player reaches when they come to the plate. Singles count as 1 base, doubles as 2, triples as 3, and home runs as 4. As with AVG, walks and HBPs don't count.
  • OPS (on-base plus slugging): Take a player's OBP and SLG, add them together, and that's OPS.
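To make those definitions concrete, here's a short Python sketch using the standard formulas. The player and all of the numbers are hypothetical. One detail goes beyond the simplified descriptions above: the official OBP formula also counts sacrifice flies in its denominator, so we include them as an optional input.

```python
def avg(hits, at_bats):
    """Batting average: hits per at-bat (walks and HBPs are not at-bats)."""
    return hits / at_bats

def obp(hits, walks, hbp, at_bats, sac_flies=0):
    """On-base percentage: times on base per (roughly) plate appearance."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)

def slg(singles, doubles, triples, home_runs, at_bats):
    """Slugging: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

def ops(on_base, slugging):
    """On-base plus slugging."""
    return on_base + slugging

# A made-up season: 500 at-bats, 150 hits (90 singles, 30 doubles,
# 5 triples, 25 home runs), 60 walks, 5 HBPs, 5 sacrifice flies
player_avg = avg(150, 500)
player_obp = obp(150, 60, 5, 500, sac_flies=5)
player_slg = slg(90, 30, 5, 25, 500)
player_ops = ops(player_obp, player_slg)
```

For this imaginary player, AVG comes out to .300 and SLG to .530, while OBP is 215/570, or about .377.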
Each one of these advanced stats seems more complicated than the last. But here's why they're important: not only can regression analysis show you what the best-fit line is, it can also tell you how close your data is to the line. "Correlation" (often represented by the letter r) is very close to zero for data that's not linearly related. For data with a strong positive correlation, meaning the data is very close to a best-fit line with a positive slope, r is very close to +1. And for data with a strong negative correlation, r is very close to −1.
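Here's a minimal sketch of how r can be computed, using the usual Pearson correlation formula. The three small data sets are made up purely to show the three cases described above; they aren't baseball data.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Points exactly on a line with positive slope: r is +1
r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])

# Points exactly on a line with negative slope: r is -1
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])

# A "V" shape with no linear trend: r is 0
r_none = correlation([-1, 0, 1], [1, 0, 1])
```

Notice that the "V" data clearly has a pattern, just not a linear one. That's why r only tells you how close the data is to a line.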

Here are the correlations for the different stats:
  • HBP vs. win percentage: r = 0.215
  • Home runs vs. win percentage: r = 0.746
  • AVG vs. win percentage: r = 0.779
  • OBP vs. win percentage: r = 0.876
  • SLG vs. win percentage: r = 0.892
  • OPS vs. win percentage: r = 0.914
HBPs have by far the weakest correlation with winning among this group, while OPS has the strongest correlation. There's also a sizable jump in correlation between AVG and OBP (there are whole scenes with Brad Pitt and Jonah Hill in the Moneyball movie debating AVG and OBP). Here's the graph of OPS vs. win percentage:

As you can see, the data points are all pretty close to their best-fit line. Of all the stats we've looked at here, OPS is the most strongly correlated with winning. And that's why a player's home run total and batting average just aren't as important these days. The players with the highest OPS are the ones who are winning awards and getting the biggest contracts.

Baseball statistics, also known as sabermetrics, is an ongoing field of study. Over the last two years, WAR, which stands for Wins Above Replacement player, has become the hot new stat, and it has an even higher correlation with winning.

How to Sail Upwind (with Trigonometry)

Up here in Boston, you'll see a lot of sailboats out on the Charles River in the fall. (We also just hosted the Head of the Charles, a major annual rowing event.) Sailing comes with all sorts of terminology and rules, with words like tacking, jibing, and beating. Sailboats can travel upwind, which is pretty amazing when you think about it. But they can't travel completely against the wind -- they "beat" the wind by traveling at a slight angle to the wind. What's going on here?

Let's start off with what a sailboat looks like:

The boat is headed in one direction, its sails are facing a different direction, and there's wind blowing in some third direction (although you can't actually see the wind in the picture). Using the angles between these three directions, and some trigonometry, we'll discover how boats can actually sail upwind.

To help us out with the math, let's draw a simplified version of a sailboat, from a top-down perspective (see the picture below). Suppose that the wind (with strength W) is blowing in a particular direction, that the sails are set at an angle θ from the wind, and that the boat is facing a direction that's an additional angle φ from the direction of the sails.

Because the sails are set at an angle from the wind, they won't feel the full strength of the wind. Think of it this way: take a piece of paper, hold it so it faces you, and blow on it -- it will, of course, move. But if you blow on the paper's edge instead, you'll have a much harder time moving it. The same thing happens with sails, and only the perpendicular component of the wind will actually push the boat. Let's break the wind down into components that are parallel and perpendicular to the sails:

In the above picture, the red component of the wind is parallel to the sails, and won't push them at all. But the blue component is perpendicular, and will push the sails down and to the right. As you might know from our lesson on trig functions, if the wind is blowing with strength W, then that perpendicular component has strength Wsin(θ). So if the sails are parallel to the wind, they won't get any push because sin(0°) = 0, and if the sails are perpendicular to the wind, they'll get the full force because sin(90°) = 1.

Now sailboats can only travel in the direction they're pointing (that's what rudders and keels are for). So if a sailboat is getting a push, only the component of the push in the direction of the boat will actually move it. A strong push in the perpendicular direction, on the other hand, wouldn't move the boat, but could topple or capsize the boat. We said the push on the sails was Wsin(θ), but now we again break down this force into components to find the component that pushes the boat.

Because of the keel and rudder, the boat can only move forward (or backward), but not sideways. So the red component of the push from the sails won't move the boat. The blue component, however, will move the boat. And again, using trig functions, the blue component has a length of Wsin(θ)sin(φ).
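The two steps above can be sketched in a few lines of Python. The function names and the wind strength of 10 are our own choices for illustration, and this is only the simplified model from this post, not a full account of how real sails work.

```python
import math

def push_on_sails(wind, theta_deg):
    """Component of the wind perpendicular to the sails: W*sin(theta)."""
    return wind * math.sin(math.radians(theta_deg))

def forward_push(wind, theta_deg, phi_deg):
    """Component of that push along the boat: W*sin(theta)*sin(phi)."""
    return push_on_sails(wind, theta_deg) * math.sin(math.radians(phi_deg))

no_push = push_on_sails(10, 0)      # sails parallel to the wind
full_push = push_on_sails(10, 90)   # sails perpendicular to the wind
forward = forward_push(10, 30, 30)  # 10 * 0.5 * 0.5
```

With the sails 30° from the wind and the boat another 30° from the sails, only a quarter of the wind's strength ends up pushing the boat forward.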

So if the sails are an angle θ from the wind's direction, and the boat is an angle φ from the sails, then the boat can actually travel upwind, with a force that's proportional to sin(θ)sin(φ). The angle between "upwind" and the boat is θ+φ, so if this sum is less than 90°, then the boat is "beating" the wind. But as these angles get smaller, sin(θ)sin(φ) also gets smaller. That means the more you try to sail directly against the wind, the slower you'll go. Typically, the closest a sailboat can point to the wind is about 35° to 45° away from it.
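To see how quickly the push drops off, here's a rough sketch. It makes one assumption that isn't in the discussion above: for each heading θ+φ, we set the sails to split the angle evenly (θ = φ), which happens to give the largest value of sin(θ)sin(φ) for a fixed sum. The function name and the wind strength of 1 are just for illustration.

```python
import math

def best_forward_push(wind, heading_deg):
    """Forward push W*sin(theta)*sin(phi) with theta = phi = heading / 2."""
    half = math.radians(heading_deg / 2)
    return wind * math.sin(half) ** 2

# Forward push for a few headings (theta + phi), with wind strength 1
pushes = {heading: best_forward_push(1.0, heading) for heading in (90, 60, 45, 35)}

for heading, push in pushes.items():
    print(heading, round(push, 3))
```

At a heading of 90° the boat gets half the wind's strength, and at 60° it's already down to a quarter. It keeps shrinking as the boat points closer to the wind.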

And one other thing -- we assumed here that the direction of the sails was between the direction of the wind and the direction the boat was facing. Compare these two pictures below:

On the left is our sailboat with the sails between the wind and the boat's direction. As we just discovered, this boat can "beat" the wind. But for the boat on the right, the sails are a greater angle from the wind than the boat is. We could carefully work through the trigonometry again to see what happens, and we'd find that having the sails on the other side of the boat is equivalent to replacing φ with −φ in our previous work. That means the wind is pushing the boat forward with a force that's proportional to sin(θ)sin(−φ), which, by the trig identities for negative angles, is equivalent to −sin(θ)sin(φ). But for typical angles of θ and φ, that's a negative number -- so the boat on the right is in fact being pushed backward by the wind! So if you intend to sail upwind, make sure the sails are always pointing between the direction the wind is coming from and the direction your boat is facing.
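Here's a quick numerical check of that sign flip, as a sketch of the same simplified model. The angles (θ = 40°, φ = 25°) and the wind strength are made-up values.

```python
import math

def forward_push(wind, theta_deg, phi_deg):
    """Push along the boat's direction: W*sin(theta)*sin(phi)."""
    return wind * math.sin(math.radians(theta_deg)) * math.sin(math.radians(phi_deg))

# Left boat: sails between the wind and the boat's direction
left_boat = forward_push(10, 40, 25)

# Right boat: sails on the other side, so phi becomes -phi
right_boat = forward_push(10, 40, -25)
```

The two results have the same size but opposite signs: the left boat is pushed forward, and the right boat is pushed backward.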

Another Monday, Another Release

We've just released an update to School Yourself Beta, with new lessons, updates to older lessons, and a host of additional features. Here's a sampling of what's new:

New lessons on trigonometry

We've added two new lessons on angles (both in degrees and radians) and trigonometric functions (what sine, cosine, and tangent mean, and what their graphs look like). These lessons include six new interactives, and here's a screenshot of one of our favorites, which comes about halfway through one of the lessons:

In this interactive, you can sweep through different angles in the unit circle. For each angle, the height (indicated by the red line) of the yellow point on the unit circle's circumference is the value of the sine function for that angle.

Updates to existing lessons

A major part of the Beta is improving the School Yourself experience with every release. So not only are we adding new content, but we're studying the user experience for every lesson, and listening to what our students are saying.

One requested update was that mixed fractions should be acceptable answers. So we fixed that. We've also updated the lessons on slope and linear functions (some users found the tick marks confusing when the questions were asking for algebraic expressions).

As we continue adding content to the Beta, we'll keep improving the content that's already there. So keep the feedback coming!

New features

Aside from accepting answers that are mixed fractions, we've also upgraded the "laser pointer" that appears in the lessons. The red dot moving around the screen was analogous to a cursor you might see in other online video content. Well, we replaced it with a new highlighter that fades, and early user testing suggests that this highlighter flows more seamlessly with the content.

What's next?

Right now we're hard at work on delivering the most expansive lesson we've ever built, covering a wide range of trigonometric identities. You'll get to work through the various proofs, and seamlessly jump between them when you need one to prove another. We'll let you know as soon as this lesson becomes available.

So that's a summary of what's new with the Beta. We'll let you know as soon as the next update is out. Until then, keep getting schooled!

School Yourself Beta has launched!

If you visited our site recently, you might have noticed a few changes. Our home page now links directly to our brand new learning platform (currently known as "School Yourself Beta"), which was recently featured in the Boston Herald.

The beta is different from all the other platforms out there (edX, Coursera, Khan Academy, Udacity, etc.). We've done away with the lecture and made the learning more interactive. You can "choose your own adventure," and decide how to proceed through each lesson. You can go straight to the interactives, or jump back to earlier lessons needed to get through the next challenge. And as you learn, the platform will make recommendations and adapt to your unique style.

Here's a screenshot of the beta:

This is what you would see on your first visit, so it's recommending the "Introduction" lesson right now, where you can get a feel for how the platform works. Here's something you might see if you try out the lesson on graphing lines:

If you figure out the answer, then you would go ahead and type it in. And if you weren't sure, then clicking on the "I'm not sure..." option seamlessly breaks the question up into smaller parts, guiding you toward the answer.

Our goal is to make math (starting with calculus) a seamless, engaging experience. We're building out the platform more and more every day, and we'll be regularly adding content. Coming soon are some lessons on trigonometry, one of which will guide you through the proofs of over a dozen trig identities, in a way that's far more interactive than reading Wikipedia entries or watching Khan Academy videos.

And because this is a beta, we'd love as much feedback as possible. Let us know what you think!