Box 4.1
Storing the census
The small area statistics of the 1971 whole population census consisted of
480 counts for each of 125 476 enumeration districts, some sixty million
numbers. Originally each count was stored in an eight character wide slot
and the file was half a gigabyte in size, far too large to be easily stored and
repeatedly accessed. A simple form of run length encoding was customised
to compress the file and still allow the records of individual enumeration
districts to be read instantly. The counts for each cell were stored sequentially
as either a run of zeros, half-bytes (for 0 to 15), bytes (for 0 to 255) or half-
words (for 0 to 65 535). The sophistication of the algorithm was in deciding
when it was profitable to drop down an order of magnitude in the form of
storage used and when it was not. This was achieved by looking through
the list both forwards and backwards. The following simplified heuristic was
employed:
define: yesterday, today and tomorrow as the magnitude of the previous,
present and future cell to be encoded. Then, if the opportunity
to lower the magnitude of storage arises (today < yesterday) continue
at the present order while tomorrow ≥ yesterday.
With a few other caveats, this rule compresses the file to just 5 % of its
former volume: under 30 megabytes. The more sparse first section, the 10 %
population census file containing 368 cells by 125 462 enumeration districts
(14 missing) is compressed to a file of just 11 megabytes in size. These
figures are better than those achieved by the standard Lempel-Ziv compress
algorithm, but, more importantly, the file could be read and decoded faster
than any other configuration (including the original flat form) given disk
speed restrictions in 1990.
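The heuristic in Box 4.1 can be illustrated with a minimal Python sketch. It is an interpretation, not the original 1990 implementation: the function names (magnitude, choose_orders, runs) are hypothetical, "yesterday" is assumed to mean the order at which the previous cell was actually stored, the comparison is assumed to be "tomorrow ≥ yesterday", and the "few other caveats" mentioned in the box are not modelled. The sketch only decides which storage order each cell would use; it does not write the packed bytes or the index that allowed individual districts to be read directly.

```python
from itertools import groupby

# The four storage orders named in Box 4.1, from smallest to largest.
ORDERS = ["zero", "half-byte", "byte", "half-word"]


def magnitude(count):
    """Return the smallest storage order able to hold a single count."""
    if count < 0:
        raise ValueError("counts must be non-negative")
    if count == 0:
        return 0          # part of a run of zeros
    if count <= 15:
        return 1          # half-byte
    if count <= 255:
        return 2          # byte
    if count <= 65535:
        return 3          # half-word
    raise ValueError("count exceeds the half-word range")


def choose_orders(counts):
    """Choose a storage order per cell, looking one cell back and one ahead."""
    mags = [magnitude(c) for c in counts]
    chosen = []
    for i, today in enumerate(mags):
        # Assumption: 'yesterday' is the order actually used for the previous cell.
        yesterday = chosen[-1] if chosen else today
        # At the end of the list there is no future cell, so reuse today.
        tomorrow = mags[i + 1] if i + 1 < len(mags) else today
        if today < yesterday and tomorrow >= yesterday:
            # Dropping now would force an immediate climb back, so stay put.
            chosen.append(yesterday)
        else:
            # Climb when the value demands it, or drop when it is profitable.
            chosen.append(today)
    return chosen


def runs(orders):
    """Collapse per-cell orders into (order name, run length) pairs."""
    return [(ORDERS[o], len(list(group))) for o, group in groupby(orders)]
```

For the ten counts 0, 0, 3, 200, 7, 180, 2, 1, 0, 0, choose_orders keeps the isolated 7 at byte width because a byte-sized value follows it, giving the runs zero ×2, half-byte ×1, byte ×3, half-byte ×2, zero ×2. Without the look-ahead the same list would break into seven runs instead of five.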
the last twenty years than over the previous twenty thousand. Is it not surprising
that radically new techniques are required to view the social landscape? 3 Conventional
choropleth maps at the level of ten thousand wards are occasionally
included here to show how they contrast with the message of the cartograms.
Gender is the least ambiguous attribute we give people. One of the very first
cartograms made for this topic (Figure 4.2) of enumeration districts was a picture
where each street block is coloured either black, for over-average proportions of
females, or white, for under-average. The picture not only showed the random
variation in this statistic, indicated by the speckled nature of the image, but also
3 'Quite simply there is far too much information to allow policy-makers, planners, geographers,
politicians, schoolchildren, and others interested in census data for a particular area to be able to
identify easily patterns of characteristics or features of interest from SAS [Small Area Statistics]
data without processing and condensing it in some way' (Openshaw, 1983, p. 243).