Extreme scale clinical analytics with open source software - Open Source Software in Life Science Research


The tendency to utilize expensive closed source databases is waning,

and more and more people are learning that robust, highly scalable,

distributed data storage solutions are available in the open source realm.

There has been an even more dramatic shift, though. The ecosystem of

application development patterns, using SQL and relying on overly

complicated joins of normalized tables, is giving way to the convention

and ease of NoSQL databases. The industry has learnt that if the

application development paradigms of the early 2000s are re-designed,

without relying on SQL, then linearly scalable data storage will be within

our reach. In our interoperable analytics architecture we are going to

study implementation tradeoffs between four distinct flavors of schema

design.

The traditional manner of approaching application development and

data storage is through Relational Database Management Systems

(RDBMS). These technologies offer widely understood Structured Query

Language (SQL) interfaces through application level protocols such as

Open DataBase Connectivity (ODBC) and Java DataBase Connectivity

(JDBC). We will briefly discuss applying the most popular open source

RDBMS, MySQL, to our acute heart disease analytics scenario. MySQL

has a long history. Michael Widenius and David Axmark originally

developed it in the mid-1990s [12], and it is now owned by Oracle

Corporation. MySQL claims many millions of installations,

supports over 20 platforms, and is used by some of the largest internet

sites in the world. MySQL is a safe choice for open source database

storage. PostgreSQL is another mature, stable, and safe choice. Whereas

MySQL supports a number of storage back-ends, PostgreSQL is known

for its pluggable procedural languages, which offer a different set of

advantages that we will discuss.

Using an RDBMS to handle our analytical requirements would require

traditional data schema design and data transformation. A relational

schema would be created that supports storing each and every data

field from the clinical document. CDA is quite extensive and flexible,

so to cover longitudinal scenarios, across a spectrum of providers and

EMR systems, a generalized approach is needed; indeed, a generic table

structure that allows new data items to be stored as rows could be

created. A meta-data model would then need to be designed and

administered to define what each row meant, such as that in Figure 20.7.
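
The generic row-per-item structure and its meta-data model can be sketched roughly as follows. This is only an illustration, not the schema of Figure 20.7: the table names (item_type, document_item), the column names, and the sample values are all hypothetical, and SQLite from the Python standard library stands in for MySQL so that the example is self-contained.

```python
import sqlite3


def build_generic_store():
    """Create an in-memory database with a generic row-per-item layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Meta-data: defines what each kind of row means (hypothetical columns).
        CREATE TABLE item_type (
            type_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE,
            unit    TEXT
        );
        -- Data: one row per data item extracted from a clinical document.
        CREATE TABLE document_item (
            document_id INTEGER NOT NULL,
            patient_id  TEXT    NOT NULL,
            type_id     INTEGER NOT NULL REFERENCES item_type(type_id),
            value       TEXT    NOT NULL
        );
    """)
    return conn


def add_item_type(conn, name, unit=None):
    """Register a new kind of data item and return its identifier."""
    cur = conn.execute(
        "INSERT INTO item_type (name, unit) VALUES (?, ?)", (name, unit)
    )
    return cur.lastrowid


def add_item(conn, document_id, patient_id, type_id, value):
    """Store one data item as a row; every value is kept as text."""
    conn.execute(
        "INSERT INTO document_item VALUES (?, ?, ?, ?)",
        (document_id, patient_id, type_id, str(value)),
    )


def items_for_document(conn, document_id):
    """Resolve each stored row against the meta-data to recover its meaning."""
    rows = conn.execute(
        """
        SELECT t.name, d.value, t.unit
        FROM document_item AS d
        JOIN item_type AS t ON t.type_id = d.type_id
        WHERE d.document_id = ?
        ORDER BY t.name
        """,
        (document_id,),
    )
    return rows.fetchall()


conn = build_generic_store()
heart_rate = add_item_type(conn, "heart_rate", "beats/min")
systolic = add_item_type(conn, "systolic_bp", "mmHg")
add_item(conn, 1, "patient-A", heart_rate, 72)   # made-up sample values
add_item(conn, 1, "patient-A", systolic, 118)

# A new kind of data item needs only a new meta-data row, not a schema change.
troponin = add_item_type(conn, "troponin", "ng/mL")
add_item(conn, 1, "patient-A", troponin, 0.02)

result = items_for_document(conn, 1)
```

Note that every read has to join back to the meta-data table to interpret a row, and every value is stored as untyped text; both choices buy flexibility at the cost of query complexity.
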

Such a data model is extremely flexible in the face of many requirements

and an ever-changing landscape of healthcare data. The problem with it

is that at large scales it will not perform well, and it will be cumbersome

to work with. The variation of the data that would be held in the 'CDA
