Several studies [26-28] discuss obstacles to be overcome in the development of
big data applications. Some key challenges are listed as follows:
• Data Representation: many datasets have certain levels of heterogeneity in type, structure, semantics, organization, granularity, and accessibility. Data representation aims to make data more meaningful for computer analysis and user interpretation. Nevertheless, an improper data representation reduces the value of the original data and may even obstruct effective data analysis. Efficient data representation should reflect data structure, class, and type, as well as integrated technologies, so as to enable efficient operations on different datasets (a normalization sketch follows this list).
• Redundancy Reduction and Data Compression: there is generally a high level of redundancy in datasets. Redundancy reduction and data compression are effective ways to reduce the indirect cost of the entire system, provided that the potential value of the data is not affected. For example, most data generated by sensor networks is highly redundant and may be filtered and compressed by orders of magnitude (see the filtering sketch after this list).
• Data Life Cycle Management: compared with the relatively slow advances of storage systems, pervasive sensing and computing are generating data at unprecedented rates and scales. We are confronted with many pressing challenges, one of which is that current storage systems cannot support such massive data. Generally speaking, the value hidden in big data depends on data freshness. Therefore, an importance principle related to analytical value should be developed to decide which data shall be stored and which shall be discarded (a retention-policy sketch follows this list).
• Analytical Mechanism: the analytical system for big data must process masses of heterogeneous data within a limited time. However, traditional RDBMSs are strictly designed and lack scalability and expandability, so they cannot meet the performance requirements. Non-relational databases have shown unique advantages in processing unstructured data and have started to become mainstream in big data analysis. Even so, non-relational databases still have problems with performance and with particular applications, so a compromise between RDBMSs and non-relational databases must be found. For example, some enterprises (e.g., Facebook and Taobao) have adopted mixed database architectures that integrate the advantages of both types of database. More research is needed on in-memory databases and on approximate analysis based on sampled data (see the sampling sketch after this list).
• Data Confidentiality: at present, most big data service providers or owners cannot effectively maintain and analyze such huge datasets because of their limited capacity. They must rely on professionals or tools to analyze the data, which increases the potential security risks. For example, a transactional dataset generally includes a set of complete operating data that drives key business processes. Such data contains details at the lowest level of granularity and some sensitive information such as credit card numbers. Therefore, the analysis of big data may be delivered to a third party for processing only when proper preventive measures are taken to protect such sensitive data and ensure its security (a masking sketch follows this list).
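To make the data representation challenge concrete, the sketch below maps records with heterogeneous field names and types onto a single schema before analysis. The field names, the target schema, and the sample records are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

def first_present(record, keys):
    """Return the value of the first alternative field name present in the record."""
    for k in keys:
        if k in record:
            return record[k]
    raise KeyError(f"none of {keys} present")

def normalize(record):
    """Map a heterogeneously named/typed source record onto one hypothetical schema:
    {"sensor_id": str, "timestamp": datetime (UTC), "value": float}."""
    raw_ts = first_present(record, ("timestamp", "ts"))
    if isinstance(raw_ts, (int, float)):      # epoch seconds
        ts = datetime.fromtimestamp(raw_ts, tz=timezone.utc)
    else:                                     # ISO-8601 string
        ts = datetime.fromisoformat(raw_ts)
    return {
        "sensor_id": str(first_present(record, ("sensor_id", "id", "node"))),
        "timestamp": ts,
        "value": float(first_present(record, ("value", "reading"))),
    }

raw = [
    {"id": 7, "ts": 1700000000, "reading": "21.5"},
    {"sensor_id": "A-3", "timestamp": "2023-11-14T22:13:20+00:00", "value": 21.7},
]
print([normalize(r) for r in raw])
```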
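For the redundancy reduction and compression challenge, one lightweight technique often applied to highly redundant sensor streams is dead-band filtering: a reading is kept only if it differs from the last kept reading by more than a threshold. The stream values and threshold below are made up for illustration.

```python
def deadband_filter(readings, threshold=0.5):
    """Drop readings that differ from the last kept reading by no more than the threshold."""
    kept = []
    last = None
    for r in readings:
        if last is None or abs(r - last) > threshold:
            kept.append(r)
            last = r
    return kept

stream = [20.0, 20.1, 20.2, 20.1, 22.0, 22.1, 22.0, 25.0]
print(deadband_filter(stream))   # -> [20.0, 22.0, 25.0]
```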
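The data life cycle item calls for an importance principle that decides which data to store and which to discard. A minimal sketch of such a policy, assuming importance is scored from data freshness and access frequency (the decay curve, weights, and threshold are arbitrary), might look like this:

```python
import math
from datetime import datetime, timezone

def importance(created_at, access_count, half_life_days=30.0):
    """Score a record by exponential freshness decay, weighted by how often it is read."""
    age_days = (datetime.now(timezone.utc) - created_at).total_seconds() / 86400.0
    freshness = math.exp(-age_days / half_life_days)   # 1.0 when new, decays with age
    return freshness * (1.0 + math.log1p(access_count))

def retention_decision(record, threshold=0.2):
    """Keep a record only while its importance stays above an (arbitrary) threshold."""
    score = importance(record["created_at"], record["access_count"])
    return "keep" if score >= threshold else "discard"

record = {"created_at": datetime(2024, 1, 1, tzinfo=timezone.utc), "access_count": 3}
print(retention_decision(record))
```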
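The analytical mechanism item mentions approximate analysis based on sampled data. The sketch below estimates an aggregate (the mean) from a uniform random sample instead of scanning the full dataset; the dataset and sample fraction are illustrative, not prescriptive.

```python
import random

def approximate_mean(dataset, sample_fraction=0.01, seed=42):
    """Estimate the mean from a small uniform random sample instead of a full scan."""
    rng = random.Random(seed)
    k = max(1, int(len(dataset) * sample_fraction))
    sample = rng.sample(dataset, k)
    return sum(sample) / len(sample)

data = [random.gauss(100, 15) for _ in range(1_000_000)]
print(approximate_mean(data))   # close to the true mean of ~100, at ~1% of the scan cost
```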
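Finally, for the data confidentiality item, one basic preventive measure before handing records to a third party is masking sensitive fields such as credit card numbers. The record text and the fixed 16-digit card format below are hypothetical simplifications.

```python
import re

# Match 16-digit card numbers written in groups of four, keeping first and last groups.
CARD_RE = re.compile(r"\b(\d{4})[ -]?\d{4}[ -]?\d{4}[ -]?(\d{4})\b")

def mask_card_numbers(text):
    """Replace the middle digits of 16-digit card numbers, keeping the first and last four."""
    return CARD_RE.sub(r"\1-****-****-\2", text)

record = "payment by card 4111 1111 1111 1234, amount 59.90 EUR"
print(mask_card_numbers(record))
# -> payment by card 4111-****-****-1234, amount 59.90 EUR
```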