Several studies [26-28] discuss obstacles to be overcome in the development of
big data applications. Some key challenges are listed as follows:
• Data Representation: many datasets have certain levels of heterogeneity in type, structure, semantics, organization, granularity, and accessibility. Data representation aims to make data more meaningful for computer analysis and user interpretation. Nevertheless, an improper data representation reduces the value of the original data and may even obstruct effective data analysis. Efficient data representation should reflect data structure, class, and type, as well as integrated technologies, so as to enable efficient operations on different datasets (a normalization sketch follows this list).
• Redundancy Reduction and Data Compression: there is generally a high level of redundancy in datasets. Redundancy reduction and data compression are effective ways to reduce the indirect cost of the entire system, provided that the potential value of the data is not affected. For example, most data generated by sensor networks is highly redundant and may be filtered and compressed by orders of magnitude (see the filtering sketch after this list).
• Data Life Cycle Management: compared with the relatively slow advances of storage systems, pervasive sensing and computing are generating data at unprecedented rates and scales. We are confronted with many pressing challenges, one of which is that current storage systems cannot support such massive data. Generally speaking, the value hidden in big data depends on data freshness. Therefore, an importance principle related to analytical value should be developed to decide which data shall be stored and which shall be discarded (a retention-policy sketch follows this list).
• Analytical Mechanism: the analytical system for big data must process masses of heterogeneous data within a limited time. However, traditional RDBMSs are strictly designed and lack scalability and expandability, so they cannot meet the performance requirements. Non-relational databases have shown unique advantages in processing unstructured data and have started to become mainstream in big data analysis. Even so, non-relational databases still have problems with performance and with particular applications, so a compromise between RDBMSs and non-relational databases must be found. For example, some enterprises (e.g., Facebook and Taobao) have adopted mixed database architectures that integrate the advantages of both types of database. More research is needed on in-memory databases and on approximate analysis based on sampled data (see the sampling sketch after this list).
• Data Confidentiality: at present, most big data service providers or owners cannot effectively maintain and analyze such huge datasets because of their limited capacity. They must rely on professionals or tools to analyze the data, which increases the potential security risks. For example, a transactional dataset generally includes a set of complete operating data that drives key business processes. Such data contains details at the lowest level of granularity and some sensitive information such as credit card numbers. Therefore, the analysis of big data may be delivered to a third party for processing only when proper preventive measures are taken to protect such sensitive data and ensure its security (a masking sketch follows this list).
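To make the data representation challenge concrete, the sketch below maps records with heterogeneous field names and types onto a single schema before analysis. The field names, the target schema, and the sample records are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

def first_present(record, keys):
    """Return the value of the first alternative field name present in the record."""
    for k in keys:
        if k in record:
            return record[k]
    raise KeyError(f"none of {keys} present")

def normalize(record):
    """Map a heterogeneously named/typed source record onto one hypothetical schema:
    {"sensor_id": str, "timestamp": datetime (UTC), "value": float}."""
    raw_ts = first_present(record, ("timestamp", "ts"))
    if isinstance(raw_ts, (int, float)):      # epoch seconds
        ts = datetime.fromtimestamp(raw_ts, tz=timezone.utc)
    else:                                     # ISO-8601 string
        ts = datetime.fromisoformat(raw_ts)
    return {
        "sensor_id": str(first_present(record, ("sensor_id", "id", "node"))),
        "timestamp": ts,
        "value": float(first_present(record, ("value", "reading"))),
    }

raw = [
    {"id": 7, "ts": 1700000000, "reading": "21.5"},
    {"sensor_id": "A-3", "timestamp": "2023-11-14T22:13:20+00:00", "value": 21.7},
]
print([normalize(r) for r in raw])
```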
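For the redundancy reduction and compression challenge, one lightweight technique often applied to highly redundant sensor streams is dead-band filtering: a reading is kept only if it differs from the last kept reading by more than a threshold. The stream values and threshold below are made up for illustration.

```python
def deadband_filter(readings, threshold=0.5):
    """Drop readings that differ from the last kept reading by no more than the threshold."""
    kept = []
    last = None
    for r in readings:
        if last is None or abs(r - last) > threshold:
            kept.append(r)
            last = r
    return kept

stream = [20.0, 20.1, 20.2, 20.1, 22.0, 22.1, 22.0, 25.0]
print(deadband_filter(stream))   # -> [20.0, 22.0, 25.0]
```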
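The data life cycle item calls for an importance principle that decides which data to store and which to discard. A minimal sketch of such a policy, assuming importance is scored from data freshness and access frequency (the decay curve, weights, and threshold are arbitrary), might look like this:

```python
import math
from datetime import datetime, timezone

def importance(created_at, access_count, half_life_days=30.0):
    """Score a record by exponential freshness decay, weighted by how often it is read."""
    age_days = (datetime.now(timezone.utc) - created_at).total_seconds() / 86400.0
    freshness = math.exp(-age_days / half_life_days)   # 1.0 when new, decays with age
    return freshness * (1.0 + math.log1p(access_count))

def retention_decision(record, threshold=0.2):
    """Keep a record only while its importance stays above an (arbitrary) threshold."""
    score = importance(record["created_at"], record["access_count"])
    return "keep" if score >= threshold else "discard"

record = {"created_at": datetime(2024, 1, 1, tzinfo=timezone.utc), "access_count": 3}
print(retention_decision(record))
```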
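The analytical mechanism item mentions approximate analysis based on sampled data. The sketch below estimates an aggregate (the mean) from a uniform random sample instead of scanning the full dataset; the dataset and sample fraction are illustrative, not prescriptive.

```python
import random

def approximate_mean(dataset, sample_fraction=0.01, seed=42):
    """Estimate the mean from a small uniform random sample instead of a full scan."""
    rng = random.Random(seed)
    k = max(1, int(len(dataset) * sample_fraction))
    sample = rng.sample(dataset, k)
    return sum(sample) / len(sample)

data = [random.gauss(100, 15) for _ in range(1_000_000)]
print(approximate_mean(data))   # close to the true mean of ~100, at ~1% of the scan cost
```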
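Finally, for the data confidentiality item, one basic preventive measure before handing records to a third party is masking sensitive fields such as credit card numbers. The record text and the fixed 16-digit card format below are hypothetical simplifications.

```python
import re

# Match 16-digit card numbers written in groups of four, keeping first and last groups.
CARD_RE = re.compile(r"\b(\d{4})[ -]?\d{4}[ -]?\d{4}[ -]?(\d{4})\b")

def mask_card_numbers(text):
    """Replace the middle digits of 16-digit card numbers, keeping the first and last four."""
    return CARD_RE.sub(r"\1-****-****-\2", text)

record = "payment by card 4111 1111 1111 1234, amount 59.90 EUR"
print(mask_card_numbers(record))
# -> payment by card 4111-****-****-1234, amount 59.90 EUR
```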