can leverage it to meet compliance requirements such as persisting, processing,
alerting, and reporting on audit logs. Future areas of interest could
include capturing audit logs for HBase and the BigInsights Web Console, among
others.
Database activity monitoring is a key capability that's needed for an effectively
governed Big Data environment. In fact, we'd argue that such capabilities
are needed even more than in a traditional database environment, because
Hadoop governance controls are currently weaker in most deployments.
Big Data, as is the case with traditional data, might require data masking
for both test and production environments. Masking data is one of the biggest
concerns with new Big Data technology, as many customers realize they
might accidentally expose very sensitive information in test and production
Big Data environments. You need to create realistic versions of real data, but
at the same time protect sensitive data values from being compromised.
Masking sensitive data that's delivered to HDFS or to a data warehouse will
become (and should already be) a pressing concern for Big Data environ-
ments. IBM InfoSphere Optim Masking Solution addresses this concern by
masking data in a Hadoop system. In fact, Optim's API-based approach to
masking means that any system can take advantage of its advanced masking
capabilities, and incorporate masking within its processing. The benefit is
clear—the ability to define masking rules centrally and apply them in multi-
ple Big Data systems.
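
To make the idea of centrally defined masking rules concrete, here is a minimal Python sketch. It is not the InfoSphere Optim API; every name in it (MASKING_RULES, mask_digits, mask_email, mask_record, SECRET_KEY) is hypothetical and chosen only for illustration. It assumes deterministic, format-preserving masking built on a keyed hash, so the masked values stay realistic (same length, same punctuation) and the same input always produces the same masked output.

```python
import hashlib
import hmac

# Assumption: a demo key. A real deployment would load this from a secure store.
SECRET_KEY = b"demo-key-not-for-production"


def _digest(value: str) -> bytes:
    """Keyed hash of the input, used as a repeatable source of pseudorandom bytes."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()


def mask_digits(value: str) -> str:
    """Replace every digit with a pseudorandom digit; keep length and punctuation."""
    stream = _digest(value)
    out = []
    used = 0
    for ch in value:
        if ch.isdigit():
            out.append(str(stream[used % len(stream)] % 10))
            used += 1
        else:
            out.append(ch)
    return "".join(out)


def mask_email(value: str) -> str:
    """Replace the local part with a hashed token; keep the domain."""
    local, _, domain = value.partition("@")
    return f"user_{_digest(local).hex()[:8]}@{domain}"


# Hypothetical central rule set: one definition that any system can import and apply.
MASKING_RULES = {
    "ssn": mask_digits,
    "phone": mask_digits,
    "email": mask_email,
}


def mask_record(record: dict) -> dict:
    """Apply the central rules to one record; fields without a rule pass through."""
    return {
        field: MASKING_RULES[field](value) if field in MASKING_RULES else value
        for field, value in record.items()
    }
```

Because the rules live in one place, a job that loads data into HDFS and a job that loads a warehouse could both call mask_record and produce consistent masked values, which is the benefit the text describes.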
The obfuscation and blacking out of specific sensitive content within docu-
ments will also be key; after all, if you are storing email with sensitive data in
HDFS, that email could be subject to redaction requirements. It's worth noting
here that the Text Analytics Toolkit that's part of BigInsights and Streams is
used in InfoSphere Guardium Data Redaction.
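
As a rough illustration of blacking out sensitive content inside a document, the following Python sketch redacts two kinds of values from free text. It is a deliberately simple stand-in that uses regular expressions, not the text analytics approach mentioned above, and the names PATTERNS and redact, as well as the patterns themselves, are assumptions made for this example.

```python
import re

# Assumed patterns for illustration only; real redaction needs far broader coverage.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text: str) -> str:
    """Replace each match with a labeled placeholder so the reader sees what was removed."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text
```

For example, redact("SSN 123-45-6789, reply to jane.doe@example.com today") returns "SSN [REDACTED SSN], reply to [REDACTED EMAIL] today".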
Wrapping It Up: Trust Is About Turning
Big Data into Trusted Information
Information integration and governance is a critical component that should
be considered during the design phase of any Big Data platform because of
the eventual mass injection of unwieldy data from a volume, velocity, vari-
ety, and veracity perspective. It's fair to note that IBM is a leader in informa-
tion integration and governance; and as you've likely noticed, a number of
 