a Hadoop Distributed File System (HDFS), which is similar to the Google
File System. HDFS distributes data across multiple machines, with some
replication, in order to provide resilience to disk failures. The Hadoop
framework handles
the process of task sub-division, and mapping the Map and Reduce sub-
tasks to the different machines. This process is completely transparent
to the programmer, who can focus their attention on building the Map
and Reduce functions. There are two other related big-data technologies
which are very useful for data management in the semantic web.
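The division of labor described above can be illustrated with a minimal single-process sketch; it does not use Hadoop's actual APIs, and the names `map_fn`, `reduce_fn`, and `run_mapreduce` are illustrative. The point is that the programmer supplies only the two functions, while the framework handles grouping the intermediate keys and dispatching the sub-tasks.

```python
from collections import defaultdict

def map_fn(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    for word in line.split():
        yield word, 1

def reduce_fn(key, values):
    # Reduce: combine all counts collected for one key.
    return key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    # A stand-in for the framework: run the maps, shuffle the
    # intermediate pairs by key, then run the reduces.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

counts = run_mapreduce(["big data", "big graphs"], map_fn, reduce_fn)
# counts == {"big": 2, "data": 1, "graphs": 1}
```

In the real framework the shuffle step and the assignment of sub-tasks to machines happen across the cluster, but the programmer's contract is the same pair of functions.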
HBase HBase is a database abstraction within the Hadoop framework,
which is similar to the original BigTable system [27, 126]. An HBase
table has a column which serves as the row key, and this key is the only
index which may be used to retrieve rows. The data in HBase is also
stored as ( key,value ) pairs, where the content in the non-key columns
may be considered the values.
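The data model above can be sketched as a tiny in-memory row store; this is a hypothetical illustration, not the HBase API, and the class and row-key names are invented for the example.

```python
class TinyRowStore:
    """Hypothetical sketch of the HBase/BigTable data model."""

    def __init__(self):
        self.rows = {}  # row key -> {column name: value}

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        # The row key is the only index: looking up by any non-key
        # column would require a full scan of the table.
        return self.rows.get(row_key, {})

store = TinyRowStore()
store.put("user:42", "name", "Ada")
store.put("user:42", "city", "London")
row = store.get("user:42")
# row == {"name": "Ada", "city": "London"}
```

The non-key columns together form the "value" side of the (key, value) pairs described in the text.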
Pig The Pig implementation builds upon the Hadoop framework in or-
der to provide further database-like functionality. A table in Pig is a set
of tuples, and each field is either a value or a set of tuples. Thus, this
framework allows for nested tables, which is a rather powerful abstrac-
tion. Pig also provides a scripting language [83] called PigLatin ,which
provides all the familiar constructs of SQL such as projections, joins,
sorting, grouping etc. Different from SQL, PigLatin scripts are proce-
dural , and are rather easy for programmers to pick up. The PigLatin
language provides a higher abstraction level to the MapReduce frame-
work, because a query in PigLatin can be transformed into a sequence
of MapReduce jobs.
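The procedural style described above can be mimicked in plain Python; each step below materializes a named intermediate relation, the way a PigLatin script assigns the result of each FILTER, GROUP, or FOREACH statement to a new relation (the relation names, field names, and data here are all illustrative, and the comments show only rough PigLatin analogues).

```python
from collections import defaultdict

records = [("alice", 30), ("bob", 17), ("carol", 30), ("dave", 17)]

# Roughly: filtered = FILTER records BY age >= 18;
filtered = [(name, age) for name, age in records if age >= 18]

# Roughly: grouped = GROUP filtered BY age;
grouped = defaultdict(list)
for name, age in filtered:
    grouped[age].append(name)

# Roughly: counted = FOREACH grouped GENERATE group, COUNT(filtered);
counted = {age: len(names) for age, names in grouped.items()}
# counted == {30: 2}
```

Each step depends only on the relation produced by the previous one, which is what makes such a script straightforward to translate into a sequence of MapReduce jobs.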
One interesting aspect of Pig is that its data model and transfor-
mation language are similar to RDF and the SPARQL query language
respectively. Therefore, Pig was recently extended [77] to perform RDF
querying and transformations. Specifically, Load and Save functions were
defined to convert RDF into Pig's data model, and a complete mapping
was created between SPARQL and PigLatin .
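The analogy between RDF and Pig's tuple model can be made concrete with a small sketch: triples held as (subject, predicate, object) tuples, queried with a single SPARQL-like triple pattern. This is a hypothetical illustration, not the extension of [77]; the `match` function and the `ex:` identifiers are invented for the example.

```python
# RDF triples as plain (subject, predicate, object) tuples.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:age", "30"),
    ("ex:bob", "ex:knows", "ex:carol"),
]

def match(pattern, triples):
    # A None field acts as a variable; the other fields must match
    # exactly, as in a single SPARQL triple pattern.
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Roughly: SELECT ?o WHERE { ex:alice ex:knows ?o }
results = match(("ex:alice", "ex:knows", None), triples)
# results == [("ex:alice", "ex:knows", "ex:bob")]
```

A full SPARQL-to-PigLatin mapping must also join the bindings of several such patterns, but each individual pattern reduces to a filter over the tuple set, as shown here.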
All of these technologies play a very useful role in crawling, storing,
and analyzing the massive RDF data sets that are likely to arise at the
scale of the Internet of Things. In the next subsection, we will discuss
some of the ways in which these technologies can be used for search and
indexing.