Image statistics can be calculated using the option -stats . If approximate
statistics, based on a subset of the dataset, are sufficient, you can use the option
-approx_stats . The histogram is calculated using the option -hist . It is based
on a fixed number of 256 buckets (bins) that cover the minimum and maximum values.
Currently, no-data values are considered valid data and are thus taken into account
when calculating the histogram. For a more flexible calculation of image statistics, you
can also create your own tool based on the GDAL API (see Chap. 14).
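The mapping from pixel value to bucket can be sketched with plain awk on synthetic values (the minimum, maximum, and pixel values below are invented for illustration, and GDAL's exact bucket boundaries may differ slightly):

```shell
# Map values in [min,max] onto 256 histogram buckets (indices 0-255).
# The values 0, 255, 510 and 1020 are synthetic stand-ins for pixels.
printf '%s\n' 0 255 510 1020 | awk 'BEGIN { min = 0; max = 1020 }
  { printf "value %4d -> bucket %d\n", $1, int(($1 - min) / (max - min + 1) * 256) }'
```

With this mapping the minimum falls into bucket 0 and the maximum into bucket 255, so the fixed 256 buckets always span the full data range.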
Another useful option of gdalinfo is -checksum . Calculating the checksum
can be very useful to check image integrity or to verify that two images are
identical without visualizing them. The 16-bit (0-65535) result for two identical images
should be the same. However, some care must be taken across different computing
platforms and for compressed image data due to rounding errors.
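The comparison idea can be sketched with the standard Unix cksum command as a stand-in for gdalinfo -checksum (the file names below are made up; note that gdalinfo computes its checksum per band from the pixel values, not from the raw file bytes):

```shell
# Two byte-identical files must yield the same checksum, so comparing
# checksums avoids opening the images in a viewer.
printf 'pixel data' > /tmp/img_a.dat
cp /tmp/img_a.dat /tmp/img_b.dat
SUM_A=$(cksum < /tmp/img_a.dat | awk '{print $1}')
SUM_B=$(cksum < /tmp/img_b.dat | awk '{print $1}')
[ "$SUM_A" = "$SUM_B" ] && echo "identical"
```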
Suppose you want to check for duplicates in your image archive, which is organized
in several directories. Checking filenames is not sufficient, as identical images could
have been copied to a different name. If the list of images becomes large, this can
be a cumbersome task. Manually opening each image with an image viewer is not
only time consuming, but also error prone. Images that are visually similar could
still be different. The following code snippet automatically creates a text file with the
list of duplicate images. It is relatively fast as it loops only once through the image
list. There is no need to go through all file combinations. It uses gdalmanage to
recursively search for images in a directory. That list is then used in a for loop in
which the checksum is calculated for each image. With the Bash command sort
the result is sorted by increasing checksums and redirected to a temporary text file
( > /tmp/list.txt ). This file is then filtered for unique checksums only (using the
option -u ). The Bash command diff is used to compare the complete sorted list
with the unique sorted list. As a result, we obtain the duplicate files only. It is not our
intention to fully explain the details of this code snippet, but to show how a relatively
complex and time consuming task can be solved in just a few lines of code on the
command line. The current code snippet assumes all images have one band only, but
it can easily be adapted to handle multi-band images.
for IMAGE in $(gdalmanage identify -r output | awk -F: '{print $1}'); do
  gdalinfo -checksum $IMAGE | grep Checksum | awk -v IM=$IMAGE -F= '{print IM,$2}'
done | sort -nk2 > /tmp/list.txt
sort -unk2 /tmp/list.txt > /tmp/list_unique.txt
diff /tmp/list.txt /tmp/list_unique.txt > /tmp/duplicates.txt
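The sort and diff logic itself can be tried without GDAL on a synthetic file list (the names and checksum values below are invented): only lines whose checksum repeats an earlier one end up in the diff output.

```shell
# Each line mimics an "image checksum" pair produced by the loop above;
# c.tif shares its checksum with a.tif, so it is the duplicate.
printf '%s\n' 'a.tif 1000' 'b.tif 2000' 'c.tif 1000' | sort -nk2 > /tmp/demo_list.txt
sort -unk2 /tmp/demo_list.txt > /tmp/demo_unique.txt
# diff exits with status 1 when the files differ, hence the || true
diff /tmp/demo_list.txt /tmp/demo_unique.txt > /tmp/demo_duplicates.txt || true
cat /tmp/demo_duplicates.txt
```

Only c.tif appears in /tmp/demo_duplicates.txt: a.tif survives as the first holder of checksum 1000 and b.tif is unique, which is exactly the behavior the image snippet relies on.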
In some cases, the amount of information listed by gdalinfo can be
overwhelming. To suppress some of the information you can use the options -nogcp ,
-nomd , -norat , and -noct . Suppose you only want to list one particular attribute
such as the number of lines in an image. In particular for automatic processing of
images, where an attribute of one image is used to process another, it can be useful
to store the result of gdalinfo in an environment variable. Suppose we want to
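As a hedged sketch of that idea, a canned line stands in here for real gdalinfo output (gdalinfo reports the raster dimensions on a "Size is <columns>, <lines>" line), so the parsing runs even without GDAL installed:

```shell
# Parse the number of columns and lines from a gdalinfo-style
# "Size is" line into shell variables for later processing.
INFO='Size is 512, 256'
NCOLS=$(echo "$INFO" | awk -F '[ ,]+' '/Size is/ {print $3}')
NROWS=$(echo "$INFO" | awk -F '[ ,]+' '/Size is/ {print $4}')
echo "columns=$NCOLS lines=$NROWS"
```

In practice the canned INFO string would be replaced by a real invocation such as gdalinfo on the image of interest.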