in importance as the tools for creating content, especially those that will
store the content in XML format, become more sophisticated.
The third category of functionality is publishing. This could cover
publishing to any medium, including paper, personal devices, or the Web. Simple
content is rarely published by itself. Usually, a document is published.
Therefore, one of the first steps in publishing involves assembling the
constituent components of the document. Assembly is different from the
management of the aggregated components because management involves
managing a set of pointers, whereas publishing involves assembling copies
of the content into a unified whole. The next step after assembly is applying
presentation formatting, which is done through the use of a template. To
apply the formatting and to reproduce the formatted content in a
meaningful way, the publishing system must understand, at least to some
degree, the unstructured data within each type of content component. As
discussed previously, understanding the unstructured data can be complicated.
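The distinction between managing pointers and assembling copies can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the component store, the component identifiers, and the two templates are hypothetical and do not come from any particular publishing product. The management side holds only pointers, the assembly step copies the referenced content into a unified whole, and a template then applies the presentation formatting for each medium.

```python
from string import Template

# Hypothetical component store; in a real system this would be a repository.
COMPONENT_STORE = {
    "title-001": "Quarterly Report",
    "body-017": "Revenue grew in every region.",
    "footer-003": "Internal use only.",
}

# Management view: the document is only a set of pointers to components.
managed_document = {
    "title": "title-001",
    "body": "body-017",
    "footer": "footer-003",
}


def assemble(pointers, store):
    """Resolve each pointer and copy the content into one unified structure."""
    return {slot: str(store[key]) for slot, key in pointers.items()}


def publish(pointers, store, template):
    """Assemble copies of the components, then apply the template."""
    return template.substitute(assemble(pointers, store))


# Hypothetical templates, one per target medium.
WEB_TEMPLATE = Template("<h1>$title</h1>\n<p>$body</p>\n<small>$footer</small>")
PRINT_TEMPLATE = Template("$title\n\n$body\n\n-- $footer")

html_page = publish(managed_document, COMPONENT_STORE, WEB_TEMPLATE)
print_page = publish(managed_document, COMPONENT_STORE, PRINT_TEMPLATE)

# The published page holds copies, so a later change to the store
# does not alter what was already published.
COMPONENT_STORE["body-017"] = "Revised text."
```

The same set of pointers yields two different published forms, which is why the formatting step is kept separate from assembly in this sketch.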
Publishing to the Web could almost be a category unto itself. In addition
to the functionality for general publishing, publishing to the Web can
require special hardware and software for Internet scalability. If more
than one Web server is involved, special software may be required to
synchronize them. Depending on the objective of providing the content
over the Web, there may also be various application servers involved. For
example, there may be commerce servers for buying and selling, or
personalization servers to enable individuals to have a unique personal
experience on a particular Web site. These application servers will
require some integration with their indexing schemas.
As demonstrated, both content and documents are composed of
unstructured data. Unstructured data usually requires special tools,
algorithms, and methodologies to manage it effectively. Standard IT tools
such as relational databases are not effective by themselves.
All database management systems need to understand the content of
the data in order to generate the indices that are used to store and retrieve
the data. Because the computer cannot understand the content of
unstructured data at the operating system level, it cannot generate the indices.
Other tools, in addition to those provided in the standard IT toolkit, are
required. One of the most common is metadata, that is, data
describing the data (content). Metadata, however, needs to be generated
by some intelligence that has at least a partial understanding of the
meaning of the content. Metadata is stored externally as structured data in a
relational database management system. The computer then uses this
structured data, together with pointers to the content in the form of access
paths provided by the operating system file access method, to manage the
information stored in the unstructured portion.
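The arrangement described above can be sketched in Python with the standard sqlite3 module standing in for the relational database management system. The table layout, column names, and sample file are assumptions chosen for illustration only. The metadata is structured and indexed, the path column acts as the pointer to the content, and the file system supplies the access path to the unstructured data.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_catalog(conn):
    """Create the metadata table; 'path' is the pointer to the content."""
    conn.execute(
        "CREATE TABLE metadata ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT, author TEXT, keywords TEXT,"
        " path TEXT NOT NULL)"
    )
    # The database can index the metadata even though it cannot
    # interpret the unstructured content itself.
    conn.execute("CREATE INDEX idx_metadata_author ON metadata(author)")


def register(conn, title, author, keywords, path):
    """Record metadata supplied by a person or tool that understands the content."""
    conn.execute(
        "INSERT INTO metadata (title, author, keywords, path)"
        " VALUES (?, ?, ?, ?)",
        (title, author, keywords, str(path)),
    )


def fetch_by_author(conn, author):
    """Search the structured metadata, then follow each pointer to the file."""
    rows = conn.execute(
        "SELECT title, path FROM metadata WHERE author = ? ORDER BY title",
        (author,),
    ).fetchall()
    return [(title, Path(path).read_text(encoding="utf-8")) for title, path in rows]


with tempfile.TemporaryDirectory() as tmp:
    memo = Path(tmp) / "memo.txt"  # hypothetical unstructured content
    memo.write_text("Budget approved for next year.", encoding="utf-8")

    conn = sqlite3.connect(":memory:")
    build_catalog(conn)
    register(conn, "Budget memo", "Lee", "budget,finance", memo)
    results = fetch_by_author(conn, "Lee")
    conn.close()
```

The query never inspects the file contents. Retrieval succeeds only because someone supplied the author value when the file was registered, which is the dependence on external intelligence that the text describes.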