Structures for Storing Geographic Data - Introducing Geographic Information Systems with ArcGIS

corner of the upper-left cell, (b) the cell size, and (c) the orientation of the raster, you can then easily

calculate the location of any cell, given its row number and column number. To find the value of

the cell at column 14 and row 3, you need only multiply the row number minus one by 29 (the number

of cells in each row) and then add the column number. In the example, this gives 2 × 29 + 14, or 72.

If you start counting in the upper-left corner of the sequence of raster values, wrapping around the end

of a row to the beginning of the next row, you will find a “7” when you get to the 72nd value. Though this example is highly simplified, it gives

you an idea of an addressing scheme that may be used to store raster bands.
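
The addressing arithmetic above can be sketched in a few lines of Python. This is only an illustration, not how ArcGIS stores rasters internally: the function names cell_index and cell_center are hypothetical, rows and columns are counted from 1 as in the text, and cell_center assumes a north-up raster with no rotation and square cells.

```python
def cell_index(row, col, n_cols):
    """Position of a cell in the row-by-row sequence of values.

    Rows, columns, and the returned position are all counted from 1.
    """
    return (row - 1) * n_cols + col


def cell_center(row, col, x_min, y_max, cell_size):
    """Center of a cell, given the upper-left corner of the raster.

    Assumes an unrotated, north-up raster: x grows to the right,
    y shrinks as the row number increases.
    """
    x = x_min + (col - 0.5) * cell_size
    y = y_max - (row - 0.5) * cell_size
    return x, y


# The example from the text: row 3, column 14, 29 cells per row.
print(cell_index(3, 14, 29))  # 72
```

A rotated raster would need the orientation angle as well, which this sketch leaves out.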

Rasters may be very large. A raster composed of hundreds of millions of cells is not unusual. (The

Kentucky-wide land use raster that you looked at has about 300 million cells.) The number of cells is the

product of the number of rows and the number of columns. If there are 10,000 rows and 10,000 columns,

then there are 100 million cells. If you make each cell half the size, then the number of rows and columns is

doubled (to 20,000 each), so the number of cells goes up to 400 million.
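
To see why halving the cell size quadruples the cell count, here is a small sketch. The function name cell_count, the 300,000-unit extent, and the cell sizes of 30 and 15 are assumptions chosen only so that the numbers match the 10,000 by 10,000 example.

```python
def cell_count(extent_width, extent_height, cell_size):
    """Number of cells needed to cover a rectangular extent."""
    n_cols = round(extent_width / cell_size)
    n_rows = round(extent_height / cell_size)
    return n_rows * n_cols


# Hypothetical square extent, 300,000 units on a side.
print(f"{cell_count(300_000, 300_000, 30):,}")  # 100,000,000
print(f"{cell_count(300_000, 300_000, 15):,}")  # 400,000,000
```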

The sheer size of rasters produces a problem, which in years past limited either the area covered

or the level of detail (that is, the cell size). Great advances in computer storage, both

electronic (RAM) and mechanical (hard drives, DVDs), have solved part of the problem, but huge rasters

are also made possible by advances in data compression techniques. One such approach, which has been

around for a long time and is simple to implement, is called run-length encoding (RLE). The idea is based

on the fact that if you pick any cell in an integer raster, the cell to its right is likely to have the same value.

So, we might take advantage of this fact and encode the preceding data as follows:

Row 1: {8:6}, {7:7}, {1:5}, {6:4}, {7:9}

Row 2: {6:6}, {14:7} and so on.

This says that in row 1 there are eight sequential values of 6, seven values of 7, one value of 5, and so on.

There are several variations of RLE and, of course, coding economies are used—the braces, colons, and

commas don't explicitly appear in the string.

Whether this approach uses less storage depends on the data. For example, if every cell were different

from its neighbor, this scheme would be much more costly, but usually the savings in memory are

enormous.
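
A minimal sketch of run-length encoding, written in Python, may make the idea concrete. It is one simple variation, not the particular scheme any GIS package uses. The function names rle_encode and rle_decode are hypothetical, and each run is kept as a (count, value) pair to match the {count:value} notation above.

```python
from itertools import groupby


def rle_encode(row):
    """Collapse a row of cell values into (count, value) runs."""
    return [(len(list(group)), value) for value, group in groupby(row)]


def rle_decode(runs):
    """Expand (count, value) runs back into the original row."""
    row = []
    for count, value in runs:
        row.extend([value] * count)
    return row


# Row 1 from the text: 29 cells in five runs.
row1 = [6] * 8 + [7] * 7 + [5] + [4] * 6 + [9] * 7
runs = rle_encode(row1)
print(runs)  # [(8, 6), (7, 7), (1, 5), (6, 4), (7, 9)]

# Decoding gives back exactly the row you started with.
assert rle_decode(runs) == row1

# Worst case: every cell differs from its neighbor, so 29 values
# turn into 29 runs, or 58 stored numbers instead of 29.
worst = rle_encode(list(range(29)))
print(len(worst))  # 29
```

Here row 1 shrinks from 29 stored numbers to 10, while the worst-case row doubles in size, which is the trade-off described above.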

As I have said several times, a computer file is composed of 1s and 0s. The idea of a compression scheme

is to store the same information in a new file in many fewer 1s and 0s, and then be able to reconstitute

the original file exactly, so that no information is lost. This is vital if the file is, say, a computer program

where one wrong bit can sink the whole enterprise. Zipping programs do this sort of compression, called

lossless. However, if one is not picky about being able to exactly reproduce the original file (say, it is a

photo in which you will accept a near replication in exchange for a great reduction in file size), then you

could use a lossy compression method. To know more, type “lossless” and “lossy” into a Web search

engine.
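
As a quick illustration of lossless compression, the sketch below uses zlib from Python's standard library, which implements the same family of compression used by many zipping programs. The sample data is an assumption: row 1 of the example raster repeated 1,000 times, stored one byte per cell.

```python
import zlib

# Hypothetical raster band: the 29-cell row from the text, repeated.
row1 = [6] * 8 + [7] * 7 + [5] + [4] * 6 + [9] * 7
raster_bytes = bytes(row1) * 1000

compressed = zlib.compress(raster_bytes)
restored = zlib.decompress(compressed)

# Lossless: the restored bytes match the original exactly.
assert restored == raster_bytes

# The compressed copy is far smaller for repetitive data like this.
print(len(raster_bytes), len(compressed))
```

A lossy method would fail the equality check above by design, trading exactness for a smaller file.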

Formats accepted by ArcGIS 10 for raster data sets are as follows:

Esri GRID

ERDAS IMAGINE

TIFF (TIF)
