Databases Reference
In-Depth Information
Databases can be abstracted from a physical layer for tuning the architecture.
Databases cannot handle processing of document or semi-structured types of data.
Procedural-language and other programming-language interfaces to the database add processing
overhead and often end up processing data outside the database. This requires repeated cycles of moving
vast amounts of data, a problem that will grow with unstructured and other new data types.
To provide a robust processing approach for the additional data, the IT team made the
following infrastructure and processing recommendations.
Infrastructure
To process data other than structured data, and volumes beyond those of the current data, a combination of
heterogeneous technologies is recommended. The solution architecture will include the following types of
technologies:
Hadoop, NoSQL, or similar data processing platforms, built on nonrelational, file system-based
architectures.
The MapReduce programming model will be used for managing data processing and
transformation.
Data discovery and analysis will be implemented using Tableau or Datameer software that
abstracts the complexities of MapReduce and works directly on Hadoop for data integration and
management.
Analytics on Hadoop will be implemented using R, Predixion, and other competing technologies
capable of MapReduce integration and management.
In-memory data processing solutions like QlikView need to be tested further for advanced
reporting requirements, depending on the success and adoption of the new stack of technologies.
Hardware infrastructure will be running on a commodity platform based on multicore processors
and up to 96 GB RAM.
Disk architecture for the new infrastructure will not be based on a storage area network (SAN) but
on direct attached storage (DAS).
A redundant configuration will be set up for failover.
A landing zone will be available on the existing server with unlimited storage. The storage will be
designed for high capacity and not for high performance.
Security for the raw data will be implemented on current disk storage access policies.
Security rules for nonrelational data postprocessing will follow the existing rules in the LDAP
repository (integrated single sign-on security process) for EDW data.
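The MapReduce programming model named in the list above can be illustrated with a minimal, single-process sketch. This is not Hadoop code and not the team's implementation; it only shows the map, shuffle, and reduce steps on a word-count job. The function names (map_record, shuffle, reduce_group, run_job) are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_record(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    mapped = (pair for line in lines for pair in map_record(line))
    grouped = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in grouped.items())
```

On a real cluster the map and reduce steps run in parallel across nodes and the framework performs the shuffle; here everything runs in one process so the data flow is easy to follow.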
Data processing
Processing of different types of data will be assigned to different clusters of systems.
Documents and text data will be processed using discovery rules. The result set will be a
structured output of tags and keywords, occurrences, counts, and processing dates.
Audit, balance, and control will be implemented for tracing data processing across layers.
Business rules will be programmatically implemented with MapReduce and other programming
languages that can scale and perform well, such as Java or Ruby.
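The document-processing and audit items above can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's design: the rule table DISCOVERY_RULES, the TagResult record, and the functions apply_rules and process_batch are all hypothetical. Each document is matched against keyword rules to produce structured rows (tag, keyword, occurrence count, processing date), and a simple control record checks that documents in equal documents processed plus documents rejected.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical discovery rules: each tag maps to the keywords that signal it.
DISCOVERY_RULES: Dict[str, List[str]] = {
    "billing": ["invoice", "refund"],
    "support": ["outage", "ticket"],
}


@dataclass(frozen=True)
class TagResult:
    """One structured output row for a keyword found in a document."""
    doc_id: str
    tag: str
    keyword: str
    occurrences: int
    processed_on: date


def apply_rules(doc_id: str, text: str, processed_on: date) -> List[TagResult]:
    """Count keyword occurrences in one document and emit a row per hit."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    rows: List[TagResult] = []
    for tag, keywords in DISCOVERY_RULES.items():
        for keyword in keywords:
            count = words.count(keyword)
            if count:
                rows.append(TagResult(doc_id, tag, keyword, count, processed_on))
    return rows


def process_batch(docs: Dict[str, str], processed_on: date) -> Tuple[List[TagResult], Dict[str, object]]:
    """Process a batch and return the rows plus an audit/balance record."""
    rows: List[TagResult] = []
    processed = rejected = 0
    for doc_id, text in docs.items():
        if not text.strip():  # empty documents are rejected, not silently dropped
            rejected += 1
            continue
        rows.extend(apply_rules(doc_id, text, processed_on))
        processed += 1
    audit = {
        "docs_in": len(docs),
        "docs_processed": processed,
        "docs_rejected": rejected,
        "rows_out": len(rows),
        "balanced": len(docs) == processed + rejected,
    }
    return rows, audit
```

Keeping the control counts next to the output makes it possible to trace a batch across layers: a downstream step can compare its own input count with rows_out before it proceeds.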