may contain structured content, wholly unstructured content, or a combination of both. In addition, XML was originally designed, and is still predominantly used, as a document structure and communication medium rather than as a storage model. As such, schemas are not nearly as intrinsic to XML as they are to the relational model. Nevertheless, a notion of schema for XML is important for application interoperability, because it establishes common communication protocols.
Given that XML schemas themselves are relatively new, schema evolution in XML is equally new. While many schema languages have been proposed for XML, two have emerged as dominant: Document Type Definitions (DTDs) and XML Schema, with XML Schema now being the W3C recommendation. Each schema language has different capabilities and expressive power, and therefore different ramifications for schema evolution strategies. None of the proposed XML schema languages, DTDs and XML Schema included, has an analogue of SQL's “ALTER” statement for incremental evolution. Also unlike the relational model, XML does have a candidate language for referring to schema elements, called component designators (W3C 2010). However, although this language has been used in research for other purposes, it has not yet been used in the context of schema evolution. Currently, XML schema evolution frameworks either use a proprietary textual or graphical language to express incremental schema changes, or require the developer to provide the entire new schema.
The W3C, the official owner of the XML and XML Schema recommendations, has published a document describing a base set of use cases for the evolution of XML Schemas (W3C 2006). The document does not provide any language or framework for managing such evolutions. Instead, it prescribes the intended semantics and behavior for certain kinds of incremental schema evolution, and how applications should behave when they may receive data conforming to multiple schema versions. For instance, Sect. 2.3 lists use cases where the same element contains different child elements in different versions of a schema. Applications are instructed to “ignore what they don't expect” and to be able to “add extra elements without breaking the application.”
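The “ignore what they don't expect” guideline can be illustrated with a minimal Python sketch using the standard-library `xml.etree.ElementTree` module. The `<customer>` vocabulary, the `read_customer` function, and the two schema versions are hypothetical, chosen only for illustration. The W3C document prescribes behavior, not any particular implementation. Here a consumer built against a "version 1" view of the schema reads a document from a "version 2" producer that added an element. It extracts what it understands and skips the rest instead of rejecting the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical "version 1" local schema: the only child elements of
# <customer> that this application understands.
EXPECTED_CHILDREN = {"name", "email"}


def read_customer(xml_text: str) -> dict:
    """Read a <customer> document through the application's local (v1) schema.

    Expected children are extracted; any other child element (for example,
    one introduced by a later schema version) is recorded as ignored
    instead of raising an error.
    """
    root = ET.fromstring(xml_text.strip())
    record = {}
    ignored = []
    for child in root:
        if child.tag in EXPECTED_CHILDREN:
            record[child.tag] = (child.text or "").strip()
        else:
            ignored.append(child.tag)  # unknown to v1: skip, don't fail
    return {"record": record, "ignored": ignored}


# A document from a hypothetical "version 2" producer that added <phone>.
v2_doc = """
<customer>
  <name>Ada</name>
  <phone>555-0100</phone>
  <email>ada@example.org</email>
</customer>
"""

result = read_customer(v2_doc)
print(result["record"])   # {'name': 'Ada', 'email': 'ada@example.org'}
print(result["ignored"])  # ['phone']
```

Because the consumer tolerates unknown children, the v2 producer could add `<phone>` without coordinating with every consumer. This is the local-schema, interoperability-first stance the use cases describe.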
All of the use cases emphasize application interoperability above all other concerns, and they additionally require that each application be allowed its own local understanding of the schema. Each application should be able to both produce and consume data according to its local schema. This perspective places the onus on the database or middle tier to handle inconsistencies. That is in sharp contrast to the static, structured nature of the relational model, which generally assumes a single working database schema with homogeneous instances that must be translated with every schema change. Accordingly, commercial and research systems have taken both approaches from the outset. Some systems (e.g., Oracle) assume uniform instances, as a relational system does, while others (e.g., DB2) allow flexibility and versioning within a single collection of documents.
A key characteristic of schema languages such as DTDs and XML Schema is that they determine which elements may be present in instance documents, and in what order and multiplicity. Proprietary schema alteration languages thus tend to