Are Data from the 2011 Census Reliable?

by Christopher J. Bruce

When estimating future earnings in personal injury and fatal accident cases, financial experts often rely on information provided by the Canadian Census. Of particular importance are data concerning incomes by age, sex, occupation, and education. For example, if a 24-year-old male plaintiff would have been a journeyman carpenter, his potential earnings might be based on the average incomes of Canadians with that certification in the age groups 25-29, 30-34, 35-44, and so on.

In the past, these data were drawn from a section of the Census known as the “long form.” This portion of the Census survey, which collected far more detailed information than the rest of the Census, was given to only one household in five. (The remainder of the Census asks only basic questions about such demographic factors as age, sex, language, and area of residence.)

For the 2011 Census, however, the government decided to replace the long-form questions with a “National Household Survey” (NHS). Although the 2011 NHS asked the same questions as the 2006 Census long form, it differed in one crucial respect: the long form had been mandatory, whereas the NHS was voluntary. The result, as had been expected, was that the percentage of households answering this portion of the survey fell significantly, from 93.8% in 2006 to 77.2% in 2011. This created three statistical problems concerning the reliability of the data: variability in small-community data, sample error, and non-response bias. Because Statistics Canada had anticipated these problems, however, it took steps to mitigate them, and those steps have maintained the reliability of the data that are of value to the courts. Wayne R. Smith, Chief Statistician of Canada, recently wrote an article in which he discussed these steps. [“The 2011 National Household Survey – the complete statistical story,” http://www.statcan.gc.ca/eng/blog-blogue/cs-sc/2011NHSstory. June 4, 2015.] In this article, I summarise Dr. Smith’s discussion.

Variability in small community data

As the sample size of any survey becomes smaller, the data become less and less reliable because their variance increases. To guard against this, Statistics Canada routinely withholds data concerning the smallest communities. In 2011, they withheld the results for 1,100 such communities, up from 160 in the 2006 Census. As a result, all of the data reported in 2011 meet the normal statistical requirements for reliability.
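The link between sample size and reliability can be illustrated with the textbook formula for the standard error of a mean, σ/√n. The sketch that follows is a hypothetical illustration, not Statistics Canada's actual suppression rule: it assumes simple random sampling, ignores the finite-population correction, and withholds any estimate whose coefficient of variation (standard error divided by the mean) exceeds an assumed cut-off of 25%. The community names and income figures are invented.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of a sample mean under simple random sampling."""
    return std_dev / math.sqrt(n)


def publishable(mean: float, std_dev: float, n: int, max_cv: float = 0.25) -> bool:
    """Hypothetical release rule: publish only if the coefficient of
    variation (standard error / mean) is at or below max_cv."""
    cv = standard_error(std_dev, n) / mean
    return cv <= max_cv


# Hypothetical communities: (name, mean income, income std. dev., respondents)
communities = [
    ("Town A", 50_000, 30_000, 400),
    ("Village B", 50_000, 30_000, 12),
    ("Hamlet C", 50_000, 30_000, 3),
]

for name, mean, sd, n in communities:
    cv = standard_error(sd, n) / mean
    status = "release" if publishable(mean, sd, n) else "withhold"
    print(f"{name}: n={n}, CV={cv:.3f} -> {status}")
```

With identical incomes and dispersion, only the number of respondents changes, yet the smallest community's estimate becomes too imprecise to release.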

Sample error

As the overall size of a sample decreases, there is an increase in what is known as the “sampling error”: the extent to which the average characteristics of the sample may differ, by chance, from those of the total population. Because Statistics Canada expected a smaller percentage of households to answer the voluntary NHS than had answered the mandatory long form, they anticipated that the total size of the “sample” (the households answering the survey) would be lower in 2011 than in 2006.

To deal with this problem, Statistics Canada increased the number of households that were asked to answer the long portion of the 2011 Census. Whereas one household in five was asked to answer the 2006 long form, one household in three was asked to answer the NHS. The result was that, even though a smaller percentage of households responded to the NHS than had responded to the 2006 long form, the number of households answering the NHS was higher than in 2006: 2,657,461 versus 2,443,507, representing 6,719,688 individuals versus 6,136,517 in 2006.
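The trade-off between a larger sampling fraction and a lower response rate can be checked with simple arithmetic. The sketch that follows multiplies the sampling fraction by the response rate to get the share of all households expected to return a questionnaire. This is a simplification that assumes the quoted response rates apply directly to the sampled households; it is not how Statistics Canada computed its published counts, and it is not expected to reproduce them. The final loop uses only the counts quoted in the article.

```python
def expected_yield(sampling_fraction: float, response_rate: float) -> float:
    """Share of all households expected to return a completed questionnaire
    (simplifying assumption: response rate applies to sampled households)."""
    return sampling_fraction * response_rate


yield_2006 = expected_yield(1 / 5, 0.938)  # mandatory long form
yield_2011 = expected_yield(1 / 3, 0.772)  # voluntary NHS

print(f"2006: {yield_2006:.4f} of all households respond")
print(f"2011: {yield_2011:.4f} of all households respond")

# Published counts quoted in the article
households = {2006: 2_443_507, 2011: 2_657_461}
individuals = {2006: 6_136_517, 2011: 6_719_688}

for label, counts in (("households", households), ("individuals", individuals)):
    change = counts[2011] / counts[2006] - 1
    print(f"Change in responding {label}: {change:+.1%}")
```

The point of the comparison is qualitative: sending the survey to one household in three rather than one in five more than offsets the drop in the response rate.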

Although this approach does not eliminate all sampling error, such errors become less and less important as the data are aggregated, because each larger group contains more respondents. Thus, for example, the data for the average income of all carpenters in Alberta are more reliable than those for the average income of carpenters in Calgary.
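The Alberta and Calgary comparison can be made concrete with the same standard-error formula: because the error falls with the square root of the number of respondents, a group six times larger has an error about 2.45 times smaller. The sketch that follows uses hypothetical respondent counts and a hypothetical income standard deviation, and assumes simple random sampling with the same income dispersion in both groups.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of a sample mean under simple random sampling."""
    return std_dev / math.sqrt(n)


# Hypothetical inputs: same income dispersion, but the province-wide group
# has six times as many respondents as the city group.
sd_income = 25_000
n_calgary = 150
n_alberta = 900

se_calgary = standard_error(sd_income, n_calgary)
se_alberta = standard_error(sd_income, n_alberta)

# Approximate 95% margins of error (1.96 standard errors) for each average.
print(f"Calgary: +/- {1.96 * se_calgary:,.0f}")
print(f"Alberta: +/- {1.96 * se_alberta:,.0f}")
print(f"Calgary's standard error is {se_calgary / se_alberta:.2f} times Alberta's")
```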

Non-response bias

The most worrisome problem that arises when a survey is made voluntary is that the households that choose to respond may differ significantly from those that refuse. For example, if carpenters with relatively high incomes are more likely to respond to the NHS than are those with low incomes, the average incomes reported by the NHS will be biased upwards.
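The carpenter example can be illustrated with a small simulation. The sketch that follows builds an invented population of carpenters' incomes (drawn from a log-normal distribution purely for illustration), then compares two hypothetical response patterns: one where everyone responds with the same probability, and one where above-median earners respond more often. No real NHS data or response rates are used.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population of 100,000 carpenters' incomes.
population = [random.lognormvariate(10.8, 0.4) for _ in range(100_000)]
median_income = statistics.median(population)


def respond_probability(income: float, differential: bool) -> float:
    """Hypothetical response behaviour. With differential=True, carpenters
    above the median income respond more often than those below it."""
    if not differential:
        return 0.75
    return 0.85 if income > median_income else 0.65


def survey_mean(differential: bool) -> float:
    """Average income among those who choose to respond."""
    respondents = [y for y in population
                   if random.random() < respond_probability(y, differential)]
    return statistics.fmean(respondents)


true_mean = statistics.fmean(population)
unbiased_mean = survey_mean(differential=False)
biased_mean = survey_mean(differential=True)

print(f"True population mean:                     {true_mean:,.0f}")
print(f"Survey mean, equal response:              {unbiased_mean:,.0f}")
print(f"Survey mean, higher earners respond more: {biased_mean:,.0f}")
```

When response is unrelated to income, the respondents' average stays close to the true average; when higher earners are over-represented, it is pushed upwards, and collecting more responses does not remove that bias.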

Statistics Canada could not control, ex ante, for the possibility that this would happen. However, they were able, ex post, to investigate whether the respondents to the NHS were representative of the overall groups from which they were drawn; that is, they were able to determine whether the respondents “looked” different from the average.

To make this determination, Statistics Canada was assisted by the fact that they already had a considerable amount of information about the NHS respondents before those individuals answered the survey. Most importantly, they had those individuals’ answers to the short Census questions, which are mandatory for all Canadians. In addition, they were able to link the NHS respondents to their tax files, immigrant landing data, and the Indian Register.

Using sophisticated statistical techniques, they were able to determine that the average respondent to the NHS had characteristics very similar to those of the average Canadian with respect to age, sex, language, area of residence, income tax, immigration status, and Aboriginal status. This finding led Statistics Canada to conclude that the NHS respondents were, in most cases, representative of the larger population from which they were drawn. Where Statistics Canada was unable to conclude that the individuals who replied to a specific sub-class of questions were representative of the population, the resulting data were either not released or released with an accompanying cautionary note.
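The basic idea of checking whether respondents “look” like the population can be sketched by comparing the share of each group among respondents with its share in the full population, which is known from the mandatory questions. The article does not describe Statistics Canada's actual techniques, so this is only a simple illustration of the principle: the age-group counts are invented, and the 2-percentage-point tolerance is a hypothetical cut-off.

```python
from typing import Dict

# Hypothetical counts by age group. Population counts stand in for the
# mandatory short-form data; respondent counts for the voluntary survey.
population_counts = {"15-24": 4_400, "25-44": 9_100, "45-64": 9_800, "65+": 4_700}
respondent_counts = {"15-24": 2_900, "25-44": 6_800, "45-64": 7_900, "65+": 3_900}


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert counts to proportions of the total."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def flag_unrepresentative(population: Dict[str, int],
                          respondents: Dict[str, int],
                          tolerance: float = 0.02) -> Dict[str, float]:
    """Return each group whose share among respondents differs from its
    population share by more than `tolerance`, with the signed gap."""
    pop_share = shares(population)
    resp_share = shares(respondents)
    gaps = {g: resp_share[g] - pop_share[g] for g in population}
    return {g: gap for g, gap in gaps.items() if abs(gap) > tolerance}


flagged = flag_unrepresentative(population_counts, respondent_counts)
for group, gap in flagged.items():
    print(f"{group}: respondents' share differs from population share by {gap:+.1%}")
```

In this invented example only the youngest group is under-represented beyond the tolerance; in practice, such a finding might lead to adjusting the estimates, adding a cautionary note, or withholding the affected data.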

Summary

To summarise: Although the long-form portion of the 2011 Census was made voluntary, there is sound reason to believe that the data that are of greatest relevance to the calculation of lost earnings can be relied upon.

  1. The information in this article is drawn from a blog written by Wayne R. Smith, Chief Statistician of Canada, entitled “The 2011 National Household Survey – the complete statistical story,” June 4, 2015. This blog can be found at: http://www.statcan.gc.ca/eng/blog-blogue/cs-sc/2011NHSstory.

 

Christopher Bruce is the President of Economica and a Professor of Economics at the University of Calgary. He is also the author of Assessment of Personal Injury Damages (Butterworths, 2004).