Biomedical Engineering Reference
With all of these advantages, the tradeoff is a modified, but surprisingly
simple, application development architecture. Use-cases become more
modular, and web interfaces become mash-ups of many different services
hitting many different database clusters. Fat use-cases begin to disappear
because the complicated joins that produce them cannot be performed
inside the database. (Note: this is also partially true for the 'CDA Data
Items' table in the RDBMS example.) The methodology also has the major
drawback of not allowing traditional, widespread access to the data by
non-programmers. One way to give non-programmers familiar SQL-style
access is to run Hive on top of the HBase system. It does not fully replace
the capabilities of an RDBMS, but it at least gives a familiar entry point.
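Because the store cannot join across services, the combining work moves into application code. The following is a minimal Python sketch of such a "mash-up" view, assuming two hypothetical key/value services modeled here as plain dictionaries; the service names, key formats such as `patient:1001`, and fields are illustrative only and not taken from any real system.

```python
# Hypothetical stand-ins for two independent key/value services, each of which
# could live on a different database cluster. Keys and fields are illustrative.
patient_service = {
    "patient:1001": {"first_name": "Jane", "last_name": "Doe"},
    "patient:1002": {"first_name": "John", "last_name": "Roe"},
}
document_service = {
    "docs:1001": ["discharge-summary-17", "progress-note-42"],
    "docs:1002": [],
}


def patient_view(patient_id: str) -> dict:
    """Build one view by issuing two key lookups and merging the results in
    application code, instead of relying on a server-side SQL JOIN."""
    patient = patient_service.get(f"patient:{patient_id}")
    if patient is None:
        raise KeyError(f"unknown patient {patient_id}")
    # A missing document entry simply means "no documents yet".
    documents = document_service.get(f"docs:{patient_id}", [])
    return {"id": patient_id, **patient, "documents": documents}


print(patient_view("1001"))
```

In a real deployment each lookup would be a call to a separate service or cluster, so it is the application, not the database, that decides how to batch, cache, or parallelize them.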
To further understand the value of these approaches in this domain, we
will explore two systems that provide good insight into the power of
NoSQL, namely Cassandra [20] and Riak [21]. Cassandra was developed at
Facebook and contributed to the Apache Software Foundation, and it is an
example of a NoSQL column store database. Like the others, it is still
driven primarily by key/value access, but the value stored under each key
is built from a structured but extremely 'de-normalized' schema, called a
ColumnFamily. The most powerful aspect of this model is that the number
of practically usable columns is not fixed; in fact, the maximum number
of columns supported is over two billion! Each of these columns can be
created ad hoc, on the fly, per transaction. This far exceeds the
flexibility in new data items that might be expected in clinical
documentation. Cassandra takes this one step further and allows for
SuperColumns. A SuperColumn is essentially a column that contains
additional columns (subcolumns) within it. For example, the patient can
be specified as a SuperColumn, with the first name and the last name as
subcolumns. The key/value, ColumnFamily, SuperColumn model provides a
nice mix of highly scalable, highly flexible storage and indexable,
multidimensional, organized data.
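To make the ColumnFamily/SuperColumn layout concrete, here is a toy in-memory Python model. It is a sketch of the data shape only, under the assumption that a row can be treated as a dictionary of columns; it is not Cassandra's client API, and the class, method, and column names (`ColumnFamily.insert_sub`, `ClinicalDocuments`, `smoking_status`, and so on) are hypothetical.

```python
from typing import Any, Dict, List, Optional


class ColumnFamily:
    """Toy model: row key -> {column name: value}. A value that is itself a
    dict stands in for a SuperColumn holding named subcolumns."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, Dict[str, Any]] = {}

    def insert(self, row_key: str, column: str, value: Any) -> None:
        # Columns are created on write; no schema change is needed first.
        self.rows.setdefault(row_key, {})[column] = value

    def insert_sub(self, row_key: str, super_column: str, sub_column: str, value: Any) -> None:
        # A SuperColumn groups several subcolumns under one column name.
        row = self.rows.setdefault(row_key, {})
        row.setdefault(super_column, {})[sub_column] = value

    def get(self, row_key: str, column: str, sub_column: Optional[str] = None) -> Any:
        value = self.rows[row_key][column]
        return value if sub_column is None else value[sub_column]

    def columns(self, row_key: str) -> List[str]:
        # Column names present in one row; rows need not share columns.
        return sorted(self.rows.get(row_key, {}))


docs = ColumnFamily("ClinicalDocuments")  # hypothetical column family name
docs.insert_sub("doc-17", "patient", "first_name", "Jane")
docs.insert_sub("doc-17", "patient", "last_name", "Doe")
docs.insert("doc-17", "doc_type", "discharge summary")
# A new data item added on the fly for just this one document:
docs.insert("doc-17", "smoking_status", "never smoker")

print(docs.get("doc-17", "patient", "last_name"))  # Doe
print(docs.columns("doc-17"))  # ['doc_type', 'patient', 'smoking_status']
```

The point of the sketch is that adding `smoking_status` to one document required no change to any other row, which is the per-transaction flexibility described above.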
Like most if not all NoSQL databases, Cassandra scales very easily.
What sets Cassandra apart is its ability to give the developer control over
the tradeoffs among consistency, availability, and partition tolerance
(CAP). The CAP theorem [22], first proposed by Eric Brewer [23] at
Inktomi, holds that a large-scale distributed data store can guarantee at
most two of these three properties at the same time. The designers of
Cassandra prioritized partition tolerance and availability, and allowed
the level of consistency to be selected by the application developer, with
stronger consistency costing latency. The design decision to allow these
tradeoffs to be tuned by the developer was ingenious, and more and more
architectures are moving this way. Medical use-cases typically experience