a Hadoop Distributed File System (HDFS), which is similar to Google's file system. HDFS distributes data across multiple machines, with some replication, in order to provide resilience to disk failures. The Hadoop framework handles the process of task sub-division, and of mapping the Map and Reduce sub-tasks to the different machines. This process is completely transparent to the programmer, who can focus their attention on building the Map and Reduce functions. There are two other related big-data technologies which are very useful for data management in the semantic web.
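The division of labor described above can be illustrated with a small, self-contained sketch: the programmer writes only the map and reduce functions, while a (here simulated) framework performs the input splitting and the shuffle. The word-count task and the helper names below are illustrative assumptions, not part of the Hadoop API.

```python
from collections import defaultdict

def map_fn(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Reduce: combine all partial counts for one key.
    return (word, sum(counts))

def run_mapreduce(splits, map_fn, reduce_fn):
    # Simulated framework: apply the map function to each split, then
    # group intermediate values by key (the "shuffle"), then reduce.
    grouped = defaultdict(list)
    for split in splits:
        for key, value in map_fn(split):
            grouped[key].append(value)
    return dict(reduce_fn(k, v) for k, v in grouped.items())

counts = run_mapreduce(["the quick fox", "the lazy dog"], map_fn, reduce_fn)
```

In a real Hadoop deployment the splits would come from HDFS blocks and the shuffle would move data between machines; only `map_fn` and `reduce_fn` are the programmer's concern.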
HBase
HBase is a database abstraction within the Hadoop framework, which is similar to the original BigTable system [27, 126]. Each HBase table has a column that serves as the key, and this is the only index that may be used to retrieve rows. The data in HBase is also stored as (key, value) pairs, where the content of the non-key columns may be considered the values.
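A minimal in-memory model makes this access pattern concrete: rows are addressed only through the row key, and the non-key columns form the value side of the pair. This toy class is an assumption for illustration, not the real HBase API.

```python
class ToyHBaseTable:
    """Toy model of HBase's (key, value) row abstraction."""

    def __init__(self):
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        # Writes are addressed by row key and column name.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        # Retrieval is only possible via the row key, the sole index;
        # there is no secondary index over the value columns.
        return self._rows.get(row_key, {})

table = ToyHBaseTable()
table.put("row1", "cf:name", "alice")
table.put("row1", "cf:city", "berlin")
row = table.get("row1")
```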
Pig
The Pig implementation builds upon the Hadoop framework in order to provide further database-like functionality. A table in Pig is a set of tuples, and each field is either a value or a set of tuples. Thus, the framework allows for nested tables, which is a rather powerful abstraction. Pig also provides a scripting language [83] called PigLatin, which provides all the familiar constructs of SQL, such as projections, joins, sorting, and grouping. Unlike SQL, PigLatin scripts are procedural, and are rather easy for programmers to pick up. The PigLatin language provides a higher abstraction level over the MapReduce framework, because a query in PigLatin can be transformed into a sequence of MapReduce jobs.
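The nested data model can be sketched as follows: after a grouping step, each output tuple holds a grouping value plus a bag (a set of tuples) of the grouped rows, so a field really can be a set of tuples. The PigLatin shown in the comments is conceptual, and the field names are assumptions for illustration, not Pig's actual internals.

```python
from collections import defaultdict

# A = LOAD 'triples' AS (subject, predicate, object);
A = [
    ("s1", "type", "Sensor"),
    ("s2", "type", "Sensor"),
    ("s1", "reads", "22"),
]

# B = GROUP A BY subject;
# Each tuple in B is (group, bag-of-tuples): the second field is itself
# a set of tuples, i.e. a nested table.
grouped = defaultdict(list)
for t in A:
    grouped[t[0]].append(t)
B = [(subject, tuples) for subject, tuples in grouped.items()]
```

On a cluster, this single GROUP step would compile to one MapReduce job: the map phase emits each tuple keyed by `subject`, and the shuffle assembles the nested bags.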
One interesting aspect of Pig is that its data model and transformation language are similar to RDF and the SPARQL query language, respectively. Therefore, Pig was recently extended [77] to perform RDF querying and transformations. Specifically, Load and Save functions were defined to convert RDF into Pig's data model, and a complete mapping was created between SPARQL and PigLatin.
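The idea behind this extension can be sketched briefly: a Load step turns RDF triples into Pig-style (subject, predicate, object) tuples, and a single SPARQL triple pattern then becomes a filter over those tuples. The function names and the way variables are represented here (as `None`) are assumptions for illustration, not the interface defined in [77].

```python
triples = [
    ("ex:s1", "rdf:type", "ex:Sensor"),
    ("ex:s1", "ex:reading", "22"),
    ("ex:s2", "rdf:type", "ex:Actuator"),
]

def load_rdf(triples):
    # "Load": RDF triples map naturally onto Pig tuples, since a triple
    # is just a three-field tuple.
    return [tuple(t) for t in triples]

def match_pattern(tuples, s=None, p=None, o=None):
    # One SPARQL triple pattern; None plays the role of a variable.
    return [
        t for t in tuples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# ?x rdf:type ex:Sensor
sensors = match_pattern(load_rdf(triples), p="rdf:type", o="ex:Sensor")
```

A full SPARQL query with several patterns would translate into a sequence of such filters and joins, which is precisely the kind of pipeline PigLatin compiles into MapReduce jobs.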
All of these technologies play a very useful role in crawling, storing, and analyzing the massive RDF data sets that are likely to arise at the scale of the internet of things. In the next subsection, we will discuss some of the ways in which these technologies can be used for search and indexing.