Additionally, these databases are ready for the massive scalability and
redundancy required to handle an entire region's clinical documentation
en masse. Certainly relational databases can scale, but the total cost of ownership with these next-generation databases is demonstrably lower. In fact, we would argue that NoSQL databases are the best fit both for simple, agile projects and applications and for extreme-scale or highly distributed data stores. Medium-sized platforms with plenty of rich transactional use-cases and rich reporting will probably remain best served by a relational database.
The first major success of the NoSQL storage paradigm came from Google. Google built a scalable distributed storage and processing framework comprising the Google File System, BigTable, and MapReduce [13-16] for storing and indexing the web pages accessible through its search interface. These designs were subsequently published and re-implemented by the open source community as a set of Apache Hadoop [17] frameworks. BigTable is simply an extremely de-normalized, flat, and wide table that allows any type of data to be stored in any column. This schema provides the ability to retrieve all columns by a single key, and each row retrieved may take a different columnar form. This is similar to pivoting the relational model discussed earlier.
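To make that shape concrete, here is a minimal sketch of such a wide, de-normalized row store in Python. The row keys and column names (e.g., patient:1001, demo:name) are hypothetical illustrations, and a plain dictionary stands in for an actual BigTable-style engine:

```python
# Sketch of a BigTable-style wide, de-normalized table: every row is
# reachable by a single key, and each row may carry a different set of
# columns. All keys and column names below are hypothetical.

table = {
    "patient:1001": {                    # row key
        "demo:name": "A. Example",
        "demo:dob": "1970-01-01",
        "obs:bp_systolic": 128,
    },
    "patient:1002": {                    # same table, different columns
        "demo:name": "B. Example",
        "rx:statin": "atorvastatin 20mg",
    },
}

# A single key retrieves all columns for a row, whatever shape it has.
row = table["patient:1002"]
print(row)  # {'demo:name': 'B. Example', 'rx:statin': 'atorvastatin 20mg'}
```

Note how the two rows share no fixed schema; the application, not the database, decides what each row means.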
MapReduce is a powerful framework that combines a master process, a map function, and a reduce function to process huge amounts of data in parallel across distributed server nodes. Because the map functions can contain arbitrary code, they can be used to perform extremely expensive and complicated computations. The framework requires, however, that they return results via an emit function to a data (or value) reduction phase. The reduction phase can likewise process the intermediary data with any logic it wishes, so long as it produces a single (but possibly large) answer. Figure 20.8 shows a simple data flow in which a map function reviews CDA documents for large segments of a population. Using UMLS, one could look for all medical codes that imply the patient has acute heart disease. Each map function would then write its resulting data set to an intermediary location. The master process would then coordinate the hand-off to reduce functions, which combine the intermediary data sets into one final data set.
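As a rough illustration of that flow, the following Python sketch mimics the Figure 20.8 pipeline in a single process. The document structure, the hard-coded code set standing in for a UMLS lookup, and names such as map_cda are all hypothetical assumptions for the example, not the actual implementation behind the figure:

```python
# Single-process sketch of the Figure 20.8 MapReduce flow. In a real
# deployment the map calls run on distributed nodes and a master process
# coordinates the hand-off to the reduce phase.

from collections import defaultdict

# Hypothetical codes implying acute heart disease; a real system would
# resolve these through UMLS rather than a hard-coded set.
ACUTE_HEART_DISEASE_CODES = {"I21.9", "I24.9", "I50.9"}

def map_cda(document):
    """Map: scan one CDA-like document and emit (key, value) pairs for
    patients whose codes imply acute heart disease."""
    for code in document["codes"]:
        if code in ACUTE_HEART_DISEASE_CODES:
            # A real framework would call emit(); here we yield instead.
            yield ("acute_heart_disease", document["patient_id"])

def reduce_patients(key, values):
    """Reduce: combine the intermediary patient IDs into one final set."""
    return key, set(values)

def run(documents):
    """Stand-in for the master process: collect map output into an
    intermediary location, then hand it off to the reduce phase."""
    intermediate = defaultdict(list)
    for doc in documents:
        for key, value in map_cda(doc):
            intermediate[key].append(value)
    return dict(reduce_patients(k, vs) for k, vs in intermediate.items())

if __name__ == "__main__":
    docs = [
        {"patient_id": "p1", "codes": ["I21.9", "E11.9"]},
        {"patient_id": "p2", "codes": ["J45.909"]},
        {"patient_id": "p3", "codes": ["I50.9"]},
    ]
    print(run(docs))  # {'acute_heart_disease': {'p1', 'p3'}}
```

The point of the pattern is that map_cda can be arbitrarily expensive per document, because each invocation is independent and can run on a different node.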
The success of this architectural breakthrough led to many other uses within Google's product suite. Seemingly in parallel, the largest internet sites handling 'Big Data' approached the problem by building or adopting (and subsequently making famous) various products in this realm. Public recognition accelerated when Google published its designs and Facebook [18] donated its inventions to the open source community via Apache. Finally, Amazon [19] paid the methodology a final dose of