A Primer on Sabermetrics

Sabermetric Terms

Sabermetrics? What the heck is that?

Sabermetrics is the mathematical and statistical analysis of baseball records. To understand the field of sabermetrics, you should first be familiar with the game of baseball. The sport is one of the most popular in the United States and is often called the "national pastime." Baseball began in the eastern United States in the mid-1800s. Professional baseball started in the latter part of the 19th century; the National League was founded in 1876 and the American League in 1900. Currently in the United States, there are 28 professional teams in the American and National Leagues, with millions of people watching games in ballparks and on television.

 


Baseball, Hot Dogs, and Apple Pie

The game of baseball is played between two teams, each consisting of nine players. The nine players are a pitcher, a catcher, first baseman, second baseman, shortstop, third baseman, left fielder, center fielder and right fielder. A game of baseball consists of nine innings. One inning is divided into two halves; in the top half of the inning, one team plays in the field and the second team comes to bat, and in the bottom half, the teams reverse roles. The team that is batting during a particular half-inning is trying to score runs. The team with the higher number of runs at the end of the nine innings is the winner of the game.

During an inning, a player on the team in the field, called the pitcher, throws a baseball toward a player on the team at bat, called the batter. The batter tries to hit the ball with a wooden stick (called a bat) to a location out of the reach of the players in the field. By hitting the ball, the batter has the opportunity to run around the four bases that lie in the field. If a player advances around all of the bases, he has scored a run. If a batter hits a ball that is caught, or that is thrown to first base before he reaches that base, then he is said to be out and cannot score a run. A batter is also out on three strikes; a strike is charged when he swings at a pitch and misses, or when he lets a good pitch go by without swinging. The objective for the batting team during an inning is to score as many runs as possible before making three outs.

 


The Basic Batting Statistics

One notable aspect of the game of baseball is the wealth of numerical information that is recorded about the game. The effectiveness of batters and pitchers is typically assessed by particular numerical measures. The usual measure of hitting effectiveness for a player is the batting average, which is computed by dividing the number of hits by the number of at-bats. This statistic gives the proportion of opportunities (at-bats) in which the batter succeeds (gets a hit). The batter with the highest batting average during a baseball season is commonly regarded as the best hitter that year. Batters are also evaluated on their ability to reach one, two, three, or four bases on a single hit; these hits are called, respectively, singles, doubles, triples, and home runs. The slugging average is computed by dividing the total number of bases reached on hits (in short, total bases) by the number of at-bats. Since it weights hits by the number of bases reached, this measure reflects the ability of a batter to hit the ball for distance. The most valued hit in baseball is the home run, where a player advances four bases on one hit. The number of home runs is recorded for all players, and the batter with the largest number of home runs at the end of the season is given special recognition.
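The two averages described above can be sketched in a few lines of Python. This is only an illustration: the function names and the season line (100 singles, 30 doubles, 5 triples, and 20 home runs in 500 at-bats) are hypothetical and do not belong to any real player.

```python
def batting_average(hits, at_bats):
    """Proportion of at-bats in which the batter gets a hit."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return hits / at_bats


def total_bases(singles, doubles, triples, home_runs):
    """Each hit is weighted by the number of bases it is worth."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def slugging_average(singles, doubles, triples, home_runs, at_bats):
    """Total bases divided by at-bats."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return total_bases(singles, doubles, triples, home_runs) / at_bats


# Hypothetical season line, invented for illustration only.
singles, doubles, triples, home_runs, at_bats = 100, 30, 5, 20, 500
hits = singles + doubles + triples + home_runs  # 155 hits

print(f"Batting average:  {batting_average(hits, at_bats):.3f}")  # 0.310
print(f"Slugging average: "
      f"{slugging_average(singles, doubles, triples, home_runs, at_bats):.3f}")  # 0.510
```

Two batters with the same batting average can have very different slugging averages, because the second measure rewards doubles, triples, and home runs more heavily than singles.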

The Basic Pitching Statistics

A number of statistics are also used in the evaluation of pitchers. For a particular pitcher, one counts the number of games in which he was declared the winner or loser and the number of runs allowed. Pitchers are usually rated by the earned run average (ERA), the average number of "earned" runs allowed per nine-inning game. Other statistics are useful in understanding pitching ability. A pitcher records a strikeout when the batter is out on three strikes, and records a walk when he throws four inaccurate pitches (balls) to the batter. A pitcher who can throw the ball very fast can record a high number of strikeouts. A pitcher who is "wild," or relatively inaccurate, will record a large number of walks.
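As a rough sketch, the earned-run rating described above scales a pitcher's earned runs to a nine-inning game. The function name and the sample numbers (80 earned runs in 240 innings) are hypothetical, and the sketch assumes innings are passed as a true decimal number rather than in box-score notation.

```python
def earned_run_average(earned_runs, innings_pitched):
    """Average number of earned runs allowed per nine innings."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


# Hypothetical pitcher, invented for illustration only.
print(f"ERA: {earned_run_average(80, 240):.2f}")  # 3.00
```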

 


A Better Measure of Hitting Ability -- Runs Created

One goal of sabermetrics is to find good measures of hitting and pitching performance. In 1982, Bill James compared the batting records of two players: Johnny Pesky, who played in the 1940s and 1950s, and Dick Stuart, who played mainly in the 1960s. Pesky was a batter who hit for a high batting average but hit few home runs. Stuart, in contrast, had a modest batting average but hit a high number of home runs. Who was the more valuable hitter? James argued that a hitter should be evaluated by his ability to create runs for his team. From an empirical study of a large collection of team hitting data, he established the following formula for predicting the number of runs scored in a season from the number of hits, walks, at-bats, and total bases recorded in that season.

 

Runs = (Hits + Walks) × (Total Bases) ÷ (At Bats + Walks)

 

This formula reflects two important aspects of scoring runs in baseball. The number of hits and walks of a team reflects the team's ability to get runners on base. The number of total bases of a team reflects the team's ability to move runners that are already on base. This runs created formula can also be used at an individual level to compute the number of runs that a player creates for his team. In 1942, Johnny Pesky had 620 at-bats, 205 hits, 42 walks, and 258 total bases; using the formula, he created 96 runs for his team. Dick Stuart in 1960 had 532 at-bats with 160 hits, 34 walks, and 309 total bases, for 106 runs created. The conclusion is that Stuart in 1960 was a slightly better hitter than Pesky in 1942, since he created a few more runs for his team.
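The comparison can be checked with a short Python sketch of the runs created formula. The function name is hypothetical; the inputs are the season lines quoted in the text.

```python
def runs_created(hits, walks, at_bats, total_bases):
    """Runs created: (hits + walks) * total bases / (at-bats + walks)."""
    opportunities = at_bats + walks
    if opportunities <= 0:
        raise ValueError("at_bats + walks must be positive")
    return (hits + walks) * total_bases / opportunities


# Season lines as quoted in the text.
pesky_1942 = runs_created(hits=205, walks=42, at_bats=620, total_bases=258)
stuart_1960 = runs_created(hits=160, walks=34, at_bats=532, total_bases=309)

print(f"Pesky:  {pesky_1942:.1f}")   # 96.3
print(f"Stuart: {stuart_1960:.1f}")  # 105.9
```

Rounded to whole runs, these values give the 96 and 106 runs created reported in the text.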

 


A Better Measure of Pitching Ability

Sabermetrics has also developed better ways of evaluating pitching ability. The standard pitching statistics, the number of wins and the earned run average (ERA), are flawed. A pitcher's win total may simply reflect the fact that he pitches for a good offensive (run-scoring) team. The ERA does measure the rate at which a pitcher allows runs, but it does not tell you about the actual benefit of this pitcher over an entire season. Thorn and Palmer (1993) developed the pitching runs formula:

 

Pitching Runs = Innings Pitched × (League ERA ÷ 9) - Earned Runs

 

The factor (League ERA ÷ 9) measures the average number of earned runs allowed per inning across the league. This value is multiplied by the number of innings pitched by the pitcher; the product represents the number of runs he would have allowed over the season if he were an average pitcher. Last, one subtracts the actual earned runs (ER) the pitcher allowed that season. If the pitching runs value is greater than 0, the pitcher is better than average. This measure appears to be useful because it reflects both the efficiency and the durability of a pitcher.
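A minimal Python sketch of the pitching runs calculation follows. The function name and the sample numbers (200 innings, 70 earned runs, a league ERA of 4.05) are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def pitching_runs(innings_pitched, earned_runs, league_era):
    """Earned runs an average pitcher would allow in the same innings,
    minus the earned runs this pitcher actually allowed."""
    if innings_pitched < 0:
        raise ValueError("innings_pitched must not be negative")
    expected_runs = innings_pitched * (league_era / 9)
    return expected_runs - earned_runs


# Hypothetical pitcher and league, invented for illustration only.
value = pitching_runs(innings_pitched=200, earned_runs=70, league_era=4.05)
print(f"Pitching runs: {value:.1f}")  # 20.0
```

Here an average pitcher would have allowed about 90 earned runs in 200 innings, so this pitcher saved his team roughly 20 runs. A pitcher with the same ERA who worked only 100 innings would be credited with about half as many, which is how the measure rewards durability as well as efficiency.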

 


Yet, It's Only Still a Game...

Currently, major league baseball games are recorded in very fine detail. Information about every single ball pitched, fielded, and hit during a game is noted, creating a large database of baseball statistics. This database is used in a number of ways. Public relations departments of teams use the data to publish special statistics about their players. The statistics are also used to help determine the salaries of major league ballplayers; specifically, statistical information is used as evidence in salary arbitration, a legal proceeding that sets salaries. A number of teams have employed full-time professional statistical analysts, and some managers use statistical information in deciding on strategy during a game. Bill James and other baseball statisticians have shown that it is possible to answer a variety of questions about the game of baseball by means of statistical analyses.


References

  1. Albert, J. (1994), "Exploring Baseball Hitting Data: What About Those Breakdown Statistics?," Journal of the American Statistical Association, 89, 1066-1074.
  2. Albright, S. C. (1993), "A Statistical Analysis of Hitting Streaks in Baseball," Journal of the American Statistical Association, 88, 1175-1183.
  3. Barry, D., and Hartigan, J. A. (1993), "Choice Models for Predicting Divisional Winners in Major League Baseball," Journal of the American Statistical Association, 88, 766-774.
  4. Bennett, J. M. (1993), "Did Shoeless Joe Jackson Throw the 1919 World Series?," The American Statistician, 47, 241-250.
  5. Bennett, J. M., and Flueck, J. A. (1984), "Player Game Percentage," in Proceedings of the Social Statistics Section, American Statistical Association, 378-380.
  6. Casella, G., and Berger, R. (1994), "Estimation With Selected Binomial Information or Do You Really Believe That Dave Winfield Is Batting .471?," Journal of the American Statistical Association, 89, 1080-1090.
  7. James, B. (1982), The Bill James Baseball Abstract, New York: Ballantine Books.
  8. Lindsey, G. (1963), "An Investigation of Strategies in Baseball," Operations Research, 11, 447-501.
  9. Thorn, J., and Palmer, P. (1993), Total Baseball, New York: Harper Collins.
  10. Albert, J. H. (1999), An Introduction to Sabermetrics, http://www.bgsu.edu/departments/math/

©1999-2000 Innovative Teaching Concepts. All rights reserved.