Biomedical Engineering Reference
In-Depth Information
1. UMLS selection of all code sets indicative of acute heart disease;
2. map selection of raw data set from NoSQL;
3. MPI normalization (or other if not focused on patient centricity);
4. reduce selections to a single data set;
5. import into Data Mart.
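The five steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the code set, the master patient index (MPI) table, the record layout, and the table name acute_heart_events are all hypothetical, and an in-memory SQLite database stands in for the Data Mart.

```python
import sqlite3
from collections import defaultdict

# Step 1: hypothetical code set standing in for a UMLS-derived selection.
# The values are illustrative placeholders, not a real UMLS export.
ACUTE_HEART_CODES = {"410.0", "410.1", "410.9"}

# Hypothetical master patient index: (source system, local id) -> master id.
MPI = {
    ("clinic_a", "17"): "P001",
    ("hospital_b", "A-17"): "P001",
    ("clinic_a", "42"): "P002",
}

# Hypothetical raw records, as they might come out of a NoSQL store.
RAW_RECORDS = [
    {"source": "clinic_a", "local_id": "17", "code": "410.1", "date": "2011-03-02"},
    {"source": "hospital_b", "local_id": "A-17", "code": "410.9", "date": "2011-01-15"},
    {"source": "clinic_a", "local_id": "42", "code": "410.0", "date": "2011-02-20"},
    {"source": "clinic_a", "local_id": "42", "code": "250.0", "date": "2011-02-21"},
    {"source": "clinic_c", "local_id": "9", "code": "410.0", "date": "2011-04-01"},
]


def map_record(record):
    """Steps 2-3: keep records in the code set and key them by master id."""
    if record["code"] not in ACUTE_HEART_CODES:
        return
    master_id = MPI.get((record["source"], record["local_id"]))
    if master_id is None:
        return  # patients without an MPI match are skipped in this sketch
    yield master_id, {"code": record["code"], "date": record["date"]}


def reduce_records(mapped_pairs):
    """Step 4: collapse all mapped events into one row per patient."""
    grouped = defaultdict(list)
    for master_id, event in mapped_pairs:
        grouped[master_id].append(event)
    return [
        {
            "patient_id": pid,
            "event_count": len(events),
            "first_event": min(e["date"] for e in events),  # ISO dates sort as text
        }
        for pid, events in sorted(grouped.items())
    ]


def run_pipeline(raw_records):
    """Run the map phase over every record, then the reduce phase."""
    mapped = (pair for rec in raw_records for pair in map_record(rec))
    return reduce_records(mapped)


def load_into_mart(rows, conn):
    """Step 5: load the reduced rows into a (stand-in) Data Mart table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS acute_heart_events ("
        "patient_id TEXT PRIMARY KEY, event_count INTEGER, first_event TEXT)"
    )
    conn.executemany(
        "INSERT INTO acute_heart_events "
        "VALUES (:patient_id, :event_count, :first_event)",
        rows,
    )
    conn.commit()
```

Calling run_pipeline(RAW_RECORDS) and passing the result to load_into_mart with a connection from sqlite3.connect(":memory:") exercises all five steps. In a real deployment the map and reduce functions would run inside the NoSQL store's own MapReduce framework rather than in a single process.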
Talend would be another good choice for this, although its native support
for NoSQL databases and MapReduce processing is not very strong.
JasperSoft integrates Talend into their suite for ETL, Master Data
Management, and Data Quality. This partnership forms a strong data
management solution. JasperSoft has recently released a set of NoSQL
connectors for native reporting directly from NoSQL databases; however,
it is unclear how well this works or scales.
Once the data are in the required form, they can be analyzed. There are
a few really good open source options for this. The R Programming
Language is an open source statistics, mathematics, and visualization
toolset, on par with SAS and Stata capabilities. Its for-profit sponsorship
comes from a company called Revolution Analytics [28]. The basics of R
support data file manipulation, text manipulation, probability, math,
statistics, set manipulation, indexing, and plotting functions. The R
community have created numerous frameworks that form a large suite of
capabilities. Two important frameworks used by Open BI to produce a
Pentaho plug-in are Rserve, a TCP/IP server to the R environment, and
JRI, a Java to R language interface. Some plug-ins and utilities for
MapReduce and Hadoop have also been created; however, many are not
very active. With the Java R Interface, it is easy enough to incorporate these
into NoSQL MapReduce programs. Both Pentaho and Jasper provide R
plug-ins for advanced statistical analysis within their portals. Although R
is arguably the most mature technology in this category, Pentaho sponsors
Weka [29]. Weka is a set of Java libraries that perform various statistics,
machine-learning, and predictive algorithms. Weka is a strong, well
documented, but separately built utility. We expect to see powerful
integrations with the Pentaho suite in future.
With the capabilities of the Hadoop suite under its wing, one would
fully expect Apache to embrace the statistics and machine-learning
algorithms, which are indeed provided by Mahout [30]. Written to
integrate with MapReduce algorithms, Mahout is a very promising
statistics and machine-learning library. However, it does not yet
have broad coverage of standard algorithms. The trend is clear: as we get
further away from the standard stack, available solutions begin to thin