Pythagorean Regression

In baseball, everything that goes up must eventually come down. That holds true over careers, over seasons, and often over games. That’s why only the best players hit .400, why only the truly great teams win in October, and why teams that strand 15 runners in a game usually win. Everything that goes up comes down. Sabermetricians express this idea with the term regression to the mean. The concept comes from statistics, and it was popularized in baseball by Bill James, one of the founding fathers of sabermetrics. Regression to the mean isn’t an earth-shattering idea; it’s simply the observation that inflated numbers, in this case stats, tend to drift back toward their normal level.

For example, Charlie Blackmon was hitting .400 through the month of April, a decent sample size, but not nearly enough to put him in the same category as Ted Williams. There’s no doubt that Blackmon is a solid hitter, but the rules still apply to him; he is now hitting .307 with a much heftier sample size of 277 at bats. Of course, regression to the mean is relative; the mean is different for every hitter, depending on his skill level. Blackmon is a decent hitter, so it’s reasonable to expect he’ll sit around .300 for the rest of the year. Troy Tulowitzki, on the other hand, is a far better hitter, so it’s also reasonable to expect that he won’t regress as much as Blackmon. Regression to the mean works in the other direction too; if a player of high skill level is hitting .200 in April, he can reasonably be expected to have a good May or June. Now that we are in June, regression has already taken effect on most players to a certain extent, so the numbers they sit at now can be expected to be near their end-of-season numbers.

Of course, regression to the mean applies to team performance too. One of the ways we can measure team performance is with the Pythagorean expectation stat, developed by (guess who!) Bill James, who named it for its likeness to Pythagoras’ geometric theorem. The equation is this: win% = (runs scored)^2 / [(runs scored)^2 + (runs allowed)^2]. This stat basically measures what a team’s winning percentage should be based on its runs scored and runs allowed. As we said above, team records are subject to regression, and the mean can be seen as the Pythagorean record. Let’s look at how we can reasonably expect teams to perform in the second half using Pythagorean expectation.
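For anyone who wants to check the math at home, here is a minimal Python sketch of the formula. The function names and the run totals are made up for illustration (they are not any real team’s numbers), and the exponent of 2 is the classic version of the stat described above.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_record(runs_scored, runs_allowed, games):
    """Expected (wins, losses) over `games`, rounded to whole games."""
    wins = round(pythagorean_win_pct(runs_scored, runs_allowed) * games)
    return wins, games - wins


# Hypothetical run totals, invented for this example only.
print(pythagorean_record(runs_scored=300, runs_allowed=250, games=70))  # (41, 29)
```

Run the same calculation on each club’s run totals and games played and you get a Pythagorean W-L like the ones listed below.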

Team           Pythag. W-L   Actual W-L

Toronto        40-31         41-30
Yankees        31-37         35-33
Baltimore      33-35         35-33
Boston         33-36         31-38
Tampa Bay      29-41         27-43
Detroit        33-32         36-29
Kansas City    35-33         36-32
Cleveland      34-36         35-35
Minnesota      32-35         32-35
White Sox      32-38         33-37
Oakland        49-20         42-27
Angels         37-31         37-31
Seattle        38-31         35-34
Texas          30-39         34-35
Houston        31-40         32-39
Atlanta        34-34         36-32
Washington     38-30         35-33
Miami          36-32         35-33
NY Mets        34-35         31-38
Philadelphia   29-38         29-38
Milwaukee      37-33         41-29
St. Louis      37-32         37-32
Pittsburgh     33-36         34-35
Cincinnati     33-35         33-35
Cubs           32-35         28-39
SF Giants      41-29         43-27
Dodgers        38-33         37-34
Colorado       36-33         34-35
San Diego      27-42         29-40
Arizona        31-41         30-42

Several things stand out in the table above. Here are some simple deductions we can make:

1) The A’s are the best team in baseball, and it’s not even close. When you see a team’s Pythagorean W-L record better than its actual record by seven games, you know that team is clobbering its opponents. If they weren’t already, make the A’s your World Series favorites. They can do everything and do everything well.

2) The Yankees are in trouble. Their pitching is running on fumes, and their bats (Beltran, Jeter, Roberts, and Teixeira) are looking their age. McCann hasn’t hit at all, and the only thing keeping them above .500 is a guy named Yangervis Solarte. Their Pythagorean W-L reflects their problems.

3) Pay attention to Seattle and Miami. Seattle has pitched well and hit just enough. The Mariners appear to be just floating around .500, but you should expect better things to come. For a young team, the Marlins do a lot of things well. They hit .260 as a team, and they have a 3.86 team ERA. Stay tuned.

4) The Cubs are better than you think. According to their Pythagorean W-L record, they should be about 32-35, which is about as good as Cincinnati and Pittsburgh. Maybe all they need is a visit from Kris Bryant.

5) The Cardinals, Braves, and Tigers have underperformed. All three of these teams were expected to be at the top of their respective divisions (two of them are), but so far the Braves and Tigers have played worse than their records indicate, while the Cardinals’ record matches their Pythagorean mark exactly. Not a major cause for concern here, but the Royals, Nationals, and Brewers are all playing well next to them and are hungry for the division title.
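The deductions above all come down to the gap between actual wins and Pythagorean wins. As a rough sketch, here is how that gap could be computed in Python for a handful of rows copied from the table; the `records` layout and the `win_gap` helper are just illustrative choices, not part of any official stat.

```python
# (Pythagorean W-L, actual W-L) for a few teams, copied from the table above.
records = {
    "Oakland": ((49, 20), (42, 27)),
    "Yankees": ((31, 37), (35, 33)),
    "Seattle": ((38, 31), (35, 34)),
    "Cubs": ((32, 35), (28, 39)),
    "Milwaukee": ((37, 33), (41, 29)),
    "St. Louis": ((37, 32), (37, 32)),
}


def win_gap(pythag, actual):
    """Actual wins minus Pythagorean wins.

    Negative: the team has fewer wins than its runs suggest.
    Positive: the team has more wins than its runs suggest.
    """
    return actual[0] - pythag[0]


gaps = {team: win_gap(p, a) for team, (p, a) in records.items()}

# List teams from most "unlucky" to most "lucky".
for team, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{team:<10} {gap:+d}")
```

Teams at the negative end (Oakland at -7) are the candidates to win more often going forward, and teams at the positive end (the Yankees and Brewers at +4) are the candidates to come back to earth.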

Don’t forget, this method of evaluation hardly tells the whole story: some teams win a lot of close games and then get blown out in their losses, which drags down their Pythagorean record. The Pythagorean record is only concerned with how many runs teams score and allow, so it is important to keep looking for other perspectives. Still, terms like regression to the mean and Pythagorean expectation are worth keeping in mind.


