This article was originally published in the Winter 2000 issue of the *Expert Witness.*

Commonly, economic experts will testify that a particular characteristic of the plaintiff, such as his years of education or his marital status, is “correlated” with one of the factors that are of interest to the court, such as future income or retirement age. The branch of economics that seeks to determine whether such correlations exist is called econometrics. In this article, we explain briefly how econometric techniques work.

Assume that we are interested in determining whether the annual incomes that individuals earn are correlated with, or determined by, years of education. Assume also that 70 individuals have been observed and that for each individual, we know their number of years of education and annual income.

We have plotted the observations for these individuals in Figure 1. For example, individual A has 15 years of education and an annual income of $45,000.

When income levels are plotted against years of education, one would expect that the observations would be scattered, as seen in Figure 1. What the econometrician wishes to do is determine whether these scattered points form a “pattern.” One simple pattern that is often tested is that of a straight line. In this case, the formula for a straight line is:

I = a + b_{1}(E)

where I is income; a is a constant; b_{1} measures the amount that education influences income; and E is years of education.

What the econometrician tries to do is to find the line that lies closest to the observations. Under the most common approach, known as least squares, this is the line that minimises the sum of the squared vertical distances between the observations and the points on that line. The straight line which appears to meet this criterion with respect to the observations in Figure 1 has been drawn there. The formula for this line is:

I = 6,850 + 2,000(E) (1)

This formula says that if the individual has 12 years of education, his income is predicted to be $30,850.

I = 6,850 + 2,000(12) = 30,850

It can be seen from Figure 1 that, in general, the observations lie fairly close to the line. For this reason, we would conclude that the hypothesis that education affects income is supported. Furthermore, because the “sign” on the 2,000 component of the equation is positive, we would also conclude that education has a positive effect on income. (In this case, each extra year of education appears to lead to 2,000 extra dollars of annual income.)
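The line-fitting step described above can be sketched in a few lines of Python. This is only an illustration, not the calculation behind Figure 1: the function name `fit_line` and the six observations are invented for the example, and the observations were deliberately chosen to scatter evenly around the line in equation (1) so that the fitted constant and slope come out at 6,850 and 2,000.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x for one explanatory variable.

    Returns (a, b), the constant and slope that minimise the sum of
    squared vertical distances between the observations and the line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The fitted line always passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data (NOT the 70 individuals in Figure 1): at each level
# of education, one person earns $3,000 above the line in equation (1)
# and one earns $3,000 below it.
education = [10, 10, 14, 14, 18, 18]
income = [23_850, 29_850, 31_850, 37_850, 39_850, 45_850]

a, b = fit_line(education, income)
predicted = a + b * 12  # predicted income with 12 years of education

print(f"I = {a:,.0f} + {b:,.0f}(E)")
print(f"Predicted income at E = 12: ${predicted:,.0f}")
```

With real data the points would not be arranged so neatly, and the estimated constant and slope would be correspondingly less tidy.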

Equation (1), which investigates the effect which only one variable has on another, is not typical of the equations that are normally of interest to economists. Typically, for example, we would assume that there is a large number of factors, in addition to education, that will affect income. In that case, econometricians extend their equations to include numerous variables.

For example, suppose the economist has additional information about the age of each individual in the data set. This variable can also be added to the equation to help “explain” income. The equation would become:

I = a + b_{1}(E) + b_{2}(A),

where A is “age.” The resulting estimated equation might be something like:

I = 5,000 + 1,900(E) + 200(A) (2)

This model now indicates that, holding age constant, each extra year of education is associated with an extra $1,900 of annual income, on average, and that, holding education constant, each additional year of age is associated with an extra $200. In other words, if an individual has a high school diploma (12 years of education) and is 34 years old, then the equation indicates that, on average, they will earn $34,600 (= 5,000 + [1,900 × 12] + [200 × 34]). Similarly, if an individual holds a bachelor’s degree (16 years of education) and is 34 years old, then the equation indicates that, on average, they will earn $42,200 (= 5,000 + [1,900 × 16] + [200 × 34]).
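The two worked predictions from equation (2) can be checked with a short Python sketch. The function name `predict_income_eq2` is hypothetical. The coefficients are the illustrative ones given in equation (2), not estimates from real data.

```python
def predict_income_eq2(education_years, age):
    """Predicted annual income from equation (2): I = 5,000 + 1,900(E) + 200(A)."""
    return 5_000 + 1_900 * education_years + 200 * age


# A 34-year-old with a high school diploma (12 years of education).
high_school = predict_income_eq2(12, 34)

# A 34-year-old with a bachelor's degree (16 years of education).
bachelors = predict_income_eq2(16, 34)

print(f"High school, age 34: ${high_school:,}")
print(f"Bachelor's, age 34:  ${bachelors:,}")
print(f"Difference:          ${bachelors - high_school:,}")
```

The difference between the two predictions is four extra years of education multiplied by $1,900, because age is the same in both cases.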

The variables used as examples to this point – income, education, and age – all share the characteristic that they can easily be measured numerically. Other variables which might influence income are less easily converted to numerical equivalents, however. Assume, for example, that our hypothesis was that incomes were higher in rural areas than in cities, or that men were paid higher incomes than women, all else being equal.

As econometric analysis is a statistical technique, it requires that the economist enter all of his or her information as numbers. The way that econometricians deal with this problem is to construct what are called “dummy variables.”

In this procedure, one of the categories is arbitrarily chosen to be the “reference category” and the dummy variable is given the value of 0 whenever that category is observed. The other category is then given the value of 1. For example, if “female” was the reference category, then the dummy variable would be given the value 0 whenever the observed individual was female and would be given the value 1 whenever the individual was male.
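As a rough sketch, the coding step might look like the following in Python. The function name `male_dummy`, the record layout, and the three records are all hypothetical, invented only to show how a non-numerical characteristic becomes a column of 0s and 1s.

```python
def male_dummy(sex):
    """Return 0 for the reference category ("female") and 1 for "male"."""
    if sex not in ("female", "male"):
        raise ValueError(f"unexpected category: {sex!r}")
    return 1 if sex == "male" else 0


# Hypothetical records; a real data set would have one per individual.
records = [
    {"education": 12, "age": 34, "sex": "female"},
    {"education": 16, "age": 41, "sex": "male"},
    {"education": 10, "age": 29, "sex": "female"},
]

# Every column is now numerical and can be entered into the regression.
rows = [(r["education"], r["age"], male_dummy(r["sex"])) for r in records]

for row in rows:
    print(row)
```

Choosing “male” as the reference category instead would simply reverse the 0s and 1s; which category serves as the reference is arbitrary, as noted above.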

Assume that this has been done and equation (2) has been re-estimated with a male/female dummy variable included. The new equation might look like:

I = 3,000 + 1,900(E) + 200(A) + 4,000(M) (3)

where M is 1 if the individual is male and 0 if the individual is female. The interpretation that is given to the value that appears in front of M in this equation is that, for workers of the same age and education, income is $4,000 higher when the worker is male than when the worker is female.

Alternatively, because the dummy variable takes on the value 0 when the worker is female, the relevant regression equation for females is simply equation (3) *excluding* the dummy variable:

I(*female*) = 3,000 + 1,900(E) + 200(A)

And because the dummy variable takes on the value 1 when the worker is male, the relevant equation for males becomes:

I(*male*) = 3,000 + 1,900(E) + 200(A) + 4,000(1)

= 7,000 + 1,900(E) + 200(A)
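The way the dummy variable shifts the constant can be checked with a small Python sketch of equation (3). The function name `predict_income_eq3` is hypothetical, and the education and age values (12 and 34) are simply carried over from the earlier example.

```python
def predict_income_eq3(education_years, age, male):
    """Predicted income from equation (3); `male` is the dummy M (1 or 0)."""
    return 3_000 + 1_900 * education_years + 200 * age + 4_000 * male


# Same education and age; only the dummy variable differs.
female_income = predict_income_eq3(12, 34, male=0)
male_income = predict_income_eq3(12, 34, male=1)

print(f"Female: ${female_income:,}")
print(f"Male:   ${male_income:,}")
print(f"Gap:    ${male_income - female_income:,}")
```

Whatever values are chosen for education and age, the gap between the two predictions equals the coefficient on M, because the male and female equations differ only in their constants (7,000 versus 3,000).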

The income model is one example of how econometrics is used to identify trends and relationships between variables. Other uses include forecasting prices, inflation rates, or interest rates. Econometrics gives economists a methodology for making quantitative predictions from statistical data.