a Hadoop Distributed File System (HDFS), which is similar to Google's file system. HDFS distributes data across multiple machines, with some replication, in order to provide resilience to disk failures. The Hadoop framework handles the process of task sub-division, and of mapping the Map and Reduce sub-tasks to the different machines. This process is completely transparent to the programmer, who can focus their attention on building the Map and Reduce functions. There are two other related big-data technologies which are very useful for data management in the semantic web.
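The division of labor described above can be illustrated with a small, self-contained sketch: the programmer writes only the map and reduce functions, while a (here simulated) framework performs the input splitting and the shuffle. The word-count task and the helper names below are illustrative assumptions, not part of the Hadoop API.

```python
from collections import defaultdict

def map_fn(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Reduce: combine all partial counts for one key.
    return (word, sum(counts))

def run_mapreduce(splits, map_fn, reduce_fn):
    # Simulated framework: apply the map function to each split, then
    # group intermediate values by key (the "shuffle"), then reduce.
    grouped = defaultdict(list)
    for split in splits:
        for key, value in map_fn(split):
            grouped[key].append(value)
    return dict(reduce_fn(k, v) for k, v in grouped.items())

counts = run_mapreduce(["the quick fox", "the lazy dog"], map_fn, reduce_fn)
```

In a real Hadoop deployment the splits would come from HDFS blocks and the shuffle would move data between machines; only `map_fn` and `reduce_fn` are the programmer's concern.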
HBase
HBase is a database abstraction within the Hadoop framework, which is similar to the original BigTable system [27, 126]. Each HBase table has a column that serves as the key, and this is the only index that may be used to retrieve rows. The data in HBase is also stored as (key, value) pairs, where the content of the non-key columns may be considered the values.
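A minimal in-memory model makes this access pattern concrete: rows are addressed only through the row key, and the non-key columns form the value side of the pair. This toy class is an assumption for illustration, not the real HBase API.

```python
class ToyHBaseTable:
    """Toy model of HBase's (key, value) row abstraction."""

    def __init__(self):
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        # Writes are addressed by row key and column name.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        # Retrieval is only possible via the row key, the sole index;
        # there is no secondary index over the value columns.
        return self._rows.get(row_key, {})

table = ToyHBaseTable()
table.put("row1", "cf:name", "alice")
table.put("row1", "cf:city", "berlin")
row = table.get("row1")
```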
Pig
The Pig implementation builds upon the Hadoop framework in order to provide further database-like functionality. A table in Pig is a set of tuples, and each field is either a value or a set of tuples. Thus, the framework allows for nested tables, which is a rather powerful abstraction. Pig also provides a scripting language [83] called PigLatin, which provides all the familiar constructs of SQL, such as projections, joins, sorting, and grouping. Unlike SQL, PigLatin scripts are procedural, and are rather easy for programmers to pick up. The PigLatin language provides a higher abstraction level over the MapReduce framework, because a query in PigLatin can be transformed into a sequence of MapReduce jobs.
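The nested data model can be sketched as follows: after a grouping step, each output tuple holds a grouping value plus a bag (a set of tuples) of the grouped rows, so a field really can be a set of tuples. The PigLatin shown in the comments is conceptual, and the field names are assumptions for illustration, not Pig's actual internals.

```python
from collections import defaultdict

# A = LOAD 'triples' AS (subject, predicate, object);
A = [
    ("s1", "type", "Sensor"),
    ("s2", "type", "Sensor"),
    ("s1", "reads", "22"),
]

# B = GROUP A BY subject;
# Each tuple in B is (group, bag-of-tuples): the second field is itself
# a set of tuples, i.e. a nested table.
grouped = defaultdict(list)
for t in A:
    grouped[t[0]].append(t)
B = [(subject, tuples) for subject, tuples in grouped.items()]
```

On a cluster, this single GROUP step would compile to one MapReduce job: the map phase emits each tuple keyed by `subject`, and the shuffle assembles the nested bags.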
One interesting aspect of Pig is that its data model and transformation language are similar to RDF and the SPARQL query language, respectively. Therefore, Pig was recently extended [77] to perform RDF querying and transformations. Specifically, Load and Save functions were defined to convert RDF into Pig's data model, and a complete mapping was created between SPARQL and PigLatin.
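The idea behind this extension can be sketched briefly: a Load step turns RDF triples into Pig-style (subject, predicate, object) tuples, and a single SPARQL triple pattern then becomes a filter over those tuples. The function names and the way variables are represented here (as `None`) are assumptions for illustration, not the interface defined in [77].

```python
triples = [
    ("ex:s1", "rdf:type", "ex:Sensor"),
    ("ex:s1", "ex:reading", "22"),
    ("ex:s2", "rdf:type", "ex:Actuator"),
]

def load_rdf(triples):
    # "Load": RDF triples map naturally onto Pig tuples, since a triple
    # is just a three-field tuple.
    return [tuple(t) for t in triples]

def match_pattern(tuples, s=None, p=None, o=None):
    # One SPARQL triple pattern; None plays the role of a variable.
    return [
        t for t in tuples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# ?x rdf:type ex:Sensor
sensors = match_pattern(load_rdf(triples), p="rdf:type", o="ex:Sensor")
```

A full SPARQL query with several patterns would translate into a sequence of such filters and joins, which is precisely the kind of pipeline PigLatin compiles into MapReduce jobs.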
All of these technologies play a very useful role in crawling, storing, and analyzing the massive RDF data sets that are likely to arise at the scale of the internet of things. In the next subsection, we will discuss some of the ways in which these technologies can be used for search and indexing.