Like the JSON (JavaScript Object Notation) data format (a text-based standard for human-readable data interchange), XML stores its information in a hierarchical tree structure. Unlike JSON, XML can store documents with mixed content and namespaces. Mixed-content systems allow you to mix sequences of text and data in any order. Elements in XML can contain text that has other trees of data interspersed throughout the text. For example, you can place HTML elements such as bold, italic, links, or images anywhere inside a paragraph of text. HTML files are a perfect example of the type of mixed content that can't be naturally stored in or queried by a document database that only supports the JSON format.
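To make mixed content concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample paragraph and the variable names are illustrative assumptions, not taken from any particular system. It shows how text and child elements alternate inside a single element.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph with markup interspersed in running text.
doc = (
    '<p>Read the <b>bold</b> text and follow '
    '<a href="http://example.com">this link</a> now.</p>'
)
p = ET.fromstring(doc)

# Walk the mixed content in document order: the element's leading text,
# then each child element followed by the text that trails it.
parts = [p.text]
for child in p:
    parts.append((child.tag, child.text))
    parts.append(child.tail)

# itertext() flattens the same content back into plain text.
plain_text = "".join(p.itertext())

print(parts)
print(plain_text)
```

A JSON object has no built-in notion of text that trails a child element, so an application would have to invent its own convention to represent the same paragraph.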
Figure 5.1 shows the relationship between the features of comma-separated value (CSV) flat files used in many SQL systems, JSON, and XML documents.
If you use spreadsheets or load data into RDBMS tables, you know that CSV files are an ideal mechanism for storing data that's loaded into a single table. The CSV structure of commas separating fields and newline characters separating rows is frequently used to transfer data between spreadsheets and RDBMS tables.
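As a rough illustration of the flat structure described above, the following sketch reads a small CSV document with Python's standard csv module. The column names and values are made up for the example.

```python
import csv
import io

# Hypothetical CSV data: commas separate fields, newlines separate rows.
csv_text = "id,name,city\n1,Ann,Oslo\n2,Bob,Lima\n"

# Each row becomes a flat mapping from column name to string value;
# there is no way to nest one record inside another.
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    print(row["id"], row["name"], row["city"])
```

Every value arrives as a string and every row has the same columns, which is why the format maps so directly onto a single table.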
JSON files are ideal for sending serialized objects to and from a web browser. JSON allows objects to contain other objects, and works well in hierarchical structures that don't need to store mixed content or use multiple namespaces. Due to its familiarity in the JavaScript world, JSON is the de facto standard for storing hierarchical documents within a document store. But it was never designed as a general-purpose container for mixed-content markup languages such as HTML.
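The nesting of objects within objects can be sketched with Python's standard json module. The order structure below is a hypothetical example, not a schema from any particular document store.

```python
import json

# Hypothetical hierarchical document: objects nested inside objects and lists.
order = {
    "id": 1,
    "customer": {"name": "Ann", "address": {"city": "Oslo"}},
    "items": [{"sku": "A1", "qty": 2}],
}

# Serialize to text (as it would travel to a browser) and parse it back.
serialized = json.dumps(order)
restored = json.loads(serialized)

# Nested values are reached by walking down the hierarchy.
city = restored["customer"]["address"]["city"]
print(city)
```

The round trip preserves the hierarchy, but each value is either a scalar, an object, or a list. There is no slot for free text interleaved with child structures.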
As mentioned, XML files are the best choice when your document includes mixed content. XML also supports an often-controversial feature: namespaces. Namespaces allow you to mix data elements from different domains within the same document, yet retain the source meaning of each element. Documents that support multiple namespaces allow applications to add new elements in new namespaces without disrupting existing data queries.
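The following sketch, again using xml.etree.ElementTree, shows two elements with the same local name living in different namespaces. The namespace URIs, prefixes, and element names are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing a sales domain with a shipping domain.
doc = """<order xmlns="http://example.com/sales"
                xmlns:ship="http://example.com/shipping">
  <id>42</id>
  <ship:id>TRACK-7</ship:id>
</order>"""
root = ET.fromstring(doc)

# Prefix-to-URI map used only by our queries; the prefixes here need not
# match the ones written in the document.
ns = {"s": "http://example.com/sales", "ship": "http://example.com/shipping"}

# The existing sales query keeps working even though a second <id>
# element from another namespace was added to the document.
sales_id = root.find("s:id", ns).text
ship_id = root.find("ship:id", ns).text

print(sales_id, ship_id)
```

Because each query names the namespace it cares about, the shipping element does not collide with the sales element of the same local name.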
[Figure: nested diagram in which CSV (flat files) sits inside JSON (hierarchical documents), which sits inside XML (mixed content and namespaces).]
Figure 5.1 The expressiveness of three document formats. Comma-separated value (CSV) files are designed to store only flat files that don't contain hierarchy. JavaScript Object Notation (JSON) files can store flat files as well as hierarchical documents. Extensible Markup Language (XML) files can store flat files, hierarchical documents, and documents that contain mixed content and namespaces.