example) increases the likelihood that an implementation will exist in the future or,
if one does not exist, increases the likelihood and motivation for someone to produce
one. This is largely due to the value and amount of data that has already been
described in that language (consider the vast number of XML schemas that exist for XML data).
Currently, though, binary data is not usually accompanied by FSRI; instead, its
structure is usually described in a human-readable document. The relatively recent
development of formal languages for describing binary data structures may change this
if they are adopted more widely, and such adoption would be highly beneficial for
data preservation.
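To illustrate what a machine-readable description of a binary structure offers over a prose document, here is a minimal sketch in Python. The header layout, the field names and the `read_with_description` helper are all hypothetical, and Python `struct` format codes merely stand in for a real binary description language; this is not EAST, DRB, or any standard FSRI notation. The point is that a generic reader can decode the data from the description alone, with no format-specific code.

```python
import struct

# Hypothetical machine-readable description of a binary header.
# Each entry is (field name, struct type code); "<" below means
# little-endian with no padding. Not a real FSRI language.
HEADER_DESCRIPTION = [
    ("magic", "4s"),      # 4-byte identifier
    ("version", "H"),     # unsigned 16-bit integer
    ("n_values", "I"),    # unsigned 32-bit count of data values
]


def read_with_description(description, data):
    """Decode `data` using only the description, with no format-specific code."""
    fmt = "<" + "".join(code for _, code in description)
    values = struct.unpack_from(fmt, data, 0)
    return dict(zip((name for name, _ in description), values))


sample = struct.pack("<4sHI", b"DEMO", 1, 3)
print(read_with_description(HEADER_DESCRIPTION, sample))
# {'magic': b'DEMO', 'version': 1, 'n_values': 3}
```

If the layout changes, only the description has to change, which is the property that makes formally described data easier to preserve than data documented only in prose.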
The current FSRIs are themselves formally described. For example, EAST and DRB
are both described with a form of BNF, since they are structured, text-based
formats. This allows an instance of an FSRI to be validated to ensure that its structure
and content follow the formal grammar. Having FSRI for data also allows one to
check automatically that the data is written exactly in accordance with the FSRI, i.e.
that each instance of the data has the correct structure. This ability is important for data
preservation for the following reasons:
It can be used to check that a data structure has been created validly.
It can be used to check the data structure periodically for errors or corruption
(also useful for authenticity, to check for deliberate tampering with the structure).
It can be used to identify a data file accurately. It is accurate because knowledge
about the whole data structure is used, as opposed to simple file format signatures.
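The three uses above can be sketched for a hypothetical binary format: a 4-byte magic value, a version number, a count of 64-bit floats, the floats themselves, and a CRC-32 trailer. Everything here (the format, `build`, `check_structure`) is an illustrative assumption rather than a real FSRI; the point is only that a complete structure description lets a program check the whole instance, not just its first few bytes. Note that a CRC-32 catches accidental corruption, but on its own it cannot prove authenticity, since anyone who alters the data can also recompute it.

```python
import struct
import zlib

# Hypothetical format, for illustration only:
#   magic (4 bytes) | version (uint16) | n_values (uint32)
#   | n_values little-endian float64 values | CRC-32 of all preceding bytes (uint32)
MAGIC = b"DEMO"
HEADER = struct.Struct("<4sHI")
TRAILER = struct.Struct("<I")


def build(values, version=1):
    """Write an instance of the hypothetical format."""
    body = HEADER.pack(MAGIC, version, len(values))
    body += struct.pack(f"<{len(values)}d", *values)
    return body + TRAILER.pack(zlib.crc32(body))


def check_structure(data):
    """Check a whole instance against the description; return a list of problems."""
    if len(data) < HEADER.size + TRAILER.size:
        return ["file too short for header and trailer"]
    problems = []
    magic, version, n_values = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        problems.append(f"bad magic {magic!r}")
    if version != 1:
        problems.append(f"unsupported version {version}")
    expected_len = HEADER.size + 8 * n_values + TRAILER.size
    if len(data) != expected_len:
        problems.append(f"length {len(data)} != expected {expected_len}")
        return problems  # offsets past the header cannot be trusted
    (stored_crc,) = TRAILER.unpack_from(data, len(data) - TRAILER.size)
    if zlib.crc32(data[:-TRAILER.size]) != stored_crc:
        problems.append("checksum mismatch: data corrupted or altered")
    return problems


good = build([1.0, 2.5, -3.0])
print(check_structure(good))            # []

damaged = bytearray(good)
damaged[12] ^= 0xFF                     # flip bits inside the first value
print(check_structure(bytes(damaged)))  # ['checksum mismatch: data corrupted or altered']

impostor = b"DEMO" + bytes(20)          # right signature, wrong structure
print(check_structure(impostor))        # ['unsupported version 0', 'length 24 != expected 14']
```

Running the check right after `build` covers valid creation, re-running it later covers corruption, and the impostor case shows why a file with the right signature but an inconsistent structure is rejected when the whole structure is checked.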
The properties that an FSRI highlights guide a person in capturing the structure
information required to read the DVs. A well-thought-out FSRI, one that
ensures all of the relevant structure information is captured, is possibly the most
important thing for the preservation of data. The current FSRIs are good but
still incomplete. They either restrict the types of logical data structure that can be
described or fail to provide sufficient generality to describe the physical data
structure (or both). EAST, for example, has most of the properties needed to provide an
adequate description of the physical structure, but is quite restrictive in the logical
structures it can describe. Still, if a data file format can be described with EAST, then
that description provides a good basis for a complete FSRI for that data, in that it supplies
all the information required for the long-term preservation of the structure.
7.4 Format Identification
Even if one cannot create a formal description, there are a number of tools that can at least
identify the structure (format). Some of these are described below.
The simplest method is to look at the file name extension and make an
educated guess. For example, “file.txt” is probably a text file, possibly ASCII encoded.
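Python's standard `mimetypes` module performs exactly this kind of extension-based guess, which makes its limits easy to see. The `guess_from_extension` wrapper and the file names below are only for illustration.

```python
import mimetypes


def guess_from_extension(filename):
    """Guess a MIME type from the file name alone; the contents are never read."""
    mime_type, _content_encoding = mimetypes.guess_type(filename)
    return mime_type or "unknown"


for name in ["file.txt", "report.pdf", "data.xyz123"]:
    print(name, "->", guess_from_extension(name))
```

Because the file is never opened, a PNG image renamed to “photo.txt” would still be reported as text/plain. The second value returned by `guess_type` is a content encoding such as gzip, not a character encoding, so whether a text file is ASCII remains a guess.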