Database Reference
In-Depth Information
Some data-retention laws require that you keep data for so long once you
start collecting it. In the United States, laws such as Health Insurance
Portability and Accountability Act (HIPAA) affect the private health
information that healthcare providers and insurance companies compile.
HIPAA requires that the holder of this health information ensure the
confidentiality and availability of the information. Additional regulations
such as the Gramm-Leach-Bliley Act are designed such that financial
services companies notify their customers, through a privacy notice, how
they collect and share their data. The European Union has its own set
of much more strict and comprehensive regulations designed to protect
consumer information from unauthorized disclosure or collection of
personal data.
As a collector of data, it is your responsibility to identify the privacy laws
that apply to that data set and the consumers involved. Spend the time to
identify the laws that affect the countries of your consumers, your specific
industry, and where your data will be located. Understand what data should
be de-identified through anonymization, pseudonymization, encryption, or
key-coding. Taking the time to understand the laws and comply with them
may be the difference between a successful big data project and one that
not only gets your organization in hot water but could also be publicly
damaging.
Challenges in Skillset
Data analysts usually have a much easier time adapting to the big data
environment than database administrators (DBAs). DBAs have a long
history of confining data to a particular environment, including managing
table space, foreign key relationships, applying indexes, and relying on lots
of SQL tuning to solve performance issues. All mature relational database
systems have extensive graphical user interfaces (GUIs) that make this job
easier. Generally, this has made DBAs relatively lazy over the years. Why
spend the time to script out an add column statement when you can do it in
30 seconds in the GUI?
The framework that the Hadoop environment provides is very distributed,
with many components that need to be learned so that a team can get
them working together for the final solution. A single solution may leverage
HDInsight, Hive, Sqoop, Pig, and Oozie. The solution will likely leverage
several of these technologies that use a variety of different programming
Search WWH ::




Custom Search