13.2 The data documentation initiative and the aggregate data extension
The Data Documentation Initiative (DDI; www.ddialliance.org) is very much a collective
enterprise of data archivists, not data analysts: the initiative has 25 members, all of them
institutions, including the Inter-university Consortium for Political and Social Research (ICPSR) at
the University of Michigan, the UK Data Archive at the University of Essex and the World
Bank's Development Data Group. The DDI Alliance was formally established only in 2003,
but work on the standard began in 1995, the first beta version appeared in 1999 and the
DDI DTD Version 1.0 was published in March 2000 (Blank and Rasmussen, 2004). The
work described here was based broadly on DDI Version 2.0, and especially on the aggregate
data extension it introduced, but at the time of writing Version 3.0 is being finalized for
release.
Version 1 of the DDI focused purely on survey data, meaning the results of directly
computerizing the replies to questionnaires, ignoring time series and aggregate data. The
highest level of documentation would be for a collection, meaning a complete data library,
but this would be made up of studies, such as a particular questionnaire survey. The DDI
document describing the data created by a study was to be divided into five parts: the first
described the structure of the document itself; the second was a description of the study
that created the data; the third was a description of the physical format of the data files;
the fourth contained the variables, which for a questionnaire survey meant the questions
asked; and the fifth held any other materials. All this information was designed to be not just
machine-readable but machine-processable, and was therefore represented using XML.
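The five-part layout described above can be sketched as an XML skeleton. The short Python example below is illustrative only: the element names (codeBook, docDscr, stdyDscr, fileDscr, dataDscr, otherMat) are assumed to match the DDI codebook DTD and should be checked against the published specification, and the helper build_skeleton is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed element names for the five parts of a DDI codebook document.
# Verify against the published DTD before relying on them.
SECTIONS = [
    ("docDscr", "structure of the DDI document itself"),
    ("stdyDscr", "description of the study that created the data"),
    ("fileDscr", "physical format of the data files"),
    ("dataDscr", "variables, e.g. the questions asked in a survey"),
    ("otherMat", "any other materials"),
]


def build_skeleton():
    """Return an empty five-part codebook element (hypothetical helper)."""
    root = ET.Element("codeBook")
    for tag, purpose in SECTIONS:
        section = ET.SubElement(root, tag)
        # Record the purpose of each part as an XML comment.
        section.append(ET.Comment(purpose))
    return root


skeleton = build_skeleton()
section_tags = [child.tag for child in skeleton]
xml_text = ET.tostring(skeleton, encoding="unicode")
print(xml_text)
```

Because every part is a named element, a program can locate, say, the variable descriptions without parsing free text, which is the sense in which the documentation is machine-processable rather than merely machine-readable.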
Version 2.0 of the DDI specification was released in 2003 and included an extension
covering aggregate and tabular data, such as appears in census reports. This extension has
provided a foundation for much of the work of the Minnesota Population Center on the
US National Historical GIS project (www.nhgis.org), whose aims included converting all
existing transcriptions of data from US census reports to a standard format. Any given
census is, of course, a questionnaire survey which could be described using DDI 1.0. However,
that description would cover only the individual-level microdata, which are usually
confidential for a long period. Those censuses which are old enough to be no longer confidential
were carried out long before the computer, and would be enormously expensive to
computerize. Research is therefore largely limited to the tabulations created by the census
offices from the confidential individual-level data, and published either in printed volumes
or, more recently, machine-readable small area statistics. DDI 1.0 was used experimentally
by the Minnesota project to describe aggregate data, but coverage of cross-tabulations was
obviously inadequate.
The key innovation in the DDI aggregate data extension is the nCube, which is essentially
a matrix. The dimensions of an nCube are defined by its component variables. The simplest
possible nCube is a population count: just one variable and that with only one category.
Two separate counts of the numbers of men and women in each area are only slightly more
complex: a one-dimensional nCube, based on a single variable containing two categories.
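The idea that an nCube's dimensions follow from its component variables can be sketched in a few lines of Python. This is a minimal illustration, not the DDI representation itself: the class names Variable and NCube, the category labels and all the counts are invented for the example.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Variable:
    name: str
    categories: list


@dataclass
class NCube:
    variables: list
    cells: dict = field(default_factory=dict)

    @property
    def shape(self):
        # One dimension per variable; its length is the number of categories.
        return tuple(len(v.categories) for v in self.variables)

    def cell_keys(self):
        # Every combination of one category from each variable.
        return list(product(*(v.categories for v in self.variables)))

    def put(self, key, count):
        if key not in self.cell_keys():
            raise KeyError(f"{key!r} is not a valid cell of this nCube")
        self.cells[key] = count

    def total(self):
        return sum(self.cells.values())


# Simplest case: one variable with a single category (a population count).
population = NCube([Variable("total", ["all persons"])])
population.put(("all persons",), 1200)  # invented figure

# Counts of men and women: one variable with two categories.
sex = NCube([Variable("sex", ["male", "female"])])
sex.put(("male",), 580)    # invented figure
sex.put(("female",), 620)  # invented figure

# Adding a second (hypothetical) variable gives a two-dimensional nCube.
age_by_sex = NCube([
    Variable("age", ["0-14", "15 and over"]),
    Variable("sex", ["male", "female"]),
])

print(population.shape, sex.shape, age_by_sex.shape)
```

Storing cells by category combination rather than by position keeps the sketch readable; a real implementation would more likely use a dense array whose shape is the tuple computed here.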
However, more complex structures are common. For example, the six Decennial Supplements
published by the Registrar General for England and Wales between 1851 and 1910
consist mainly of sets of tables, one for each of around 600 Registration Districts, giving
numbers of deaths in each combination of an age group and a cause of death, sometimes