Description of the Data Set

The Dependent Variable

The dependent variable in our study is the concentration of sulphur dioxide (SO2) at observation sites in major cities around the world, as recorded in the GEMS/AIR data set supplied by the World Health Organization (WHO). Measurements are carried out using comparable methods. Each observation station reports annual summary statistics of SO2 concentrations: the median, the arithmetic and geometric means, and the 90th and 95th percentiles. The raw data supplied by the WHO were processed by the United States Environmental Protection Agency (EPA) and are disseminated to the public through the EPA’s web site. We obtained, directly from the EPA, a more comprehensive version of the data set than the one that is publicly released.

We have chosen to use a logarithmic transformation of the median SO2 concentration as our dependent variable. Figure A.1 shows that the distribution of concentrations is highly skewed towards zero when viewed on a linear scale. In this diagram, the horizontal axis shows ranges of median SO2 concentrations in parts per million per cubic metre [ppm/m3]. As was pointed out in the WHO (1984) report about the GEMS/AIR project, concentrations are more suitably described by a log-normal distribution.

This is apparent in figure A.2, where the horizontal axis is logarithmic. The large number of observations in the bin at the very left of the diagram can be explained by the detection threshold of the measuring devices: they cannot measure arbitrarily low concentrations. There is also an ambient level of SO2 in the air that has natural causes.
The composition of the data set by contributor countries is shown in the pie diagram of figure A.3.
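
The argument for the logarithmic transformation can be illustrated with a small simulation. The sketch below does not use the GEMS/AIR data. It draws synthetic log-normal values, censors them at a hypothetical detection floor, and compares the skewness on the linear and logarithmic scales. The parameters mu, sigma, and floor, and the function names, are assumptions chosen only for illustration.

```python
import math
import random
import statistics


def skewness(values):
    """Third standardized moment (population form) of a sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate_medians(n, mu=-5.0, sigma=1.0, floor=0.001, seed=0):
    """Draw n synthetic log-normal 'median concentrations'.

    Values below `floor` are set to `floor`, mimicking devices that
    cannot measure arbitrarily low concentrations. All parameter
    values are hypothetical, not estimates from the actual data.
    """
    rng = random.Random(seed)
    return [max(rng.lognormvariate(mu, sigma), floor) for _ in range(n)]


concentrations = simulate_medians(5000)
log_concentrations = [math.log(c) for c in concentrations]

raw_skew = skewness(concentrations)
log_skew = skewness(log_concentrations)

print(f"skewness, linear scale: {raw_skew:.2f}")
print(f"skewness, log scale:    {log_skew:.2f}")
```

Under these assumptions the linear-scale sample is strongly right-skewed, with most values close to zero, while the log-scale sample is close to symmetric. The censored values pile up at the floor, which corresponds to the heavily populated leftmost bin described above.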