no changes to the stored tables. Overlog extends Datalog with the following three
main features [38]:
1. It adds notation to specify the location of data.
2. It provides some SQL-style extensions such as primary keys and aggregation.
3. It defines a model for processing and generating changes to tables.
When Overlog tuples arrive at a node, either through rule evaluation or external
events, they are handled in an atomic local Datalog timestep. Within a timestep, each
node sees only locally stored tuples. Communication between Datalog and the rest of
the system (Java code, networks, and clocks) is modeled using events corresponding
to insertions or deletions of tuples in Datalog tables. BOOM Analytics uses a
Java-based Overlog runtime called JOL, which compiles Overlog programs into pipelined
dataflow graphs of operators. JOL also supports metaprogramming: each Overlog
program is itself compiled into a representation that is stored as rows of tables.
In BOOM Analytics, everything is data. This includes traditional persistent
information like file system metadata, runtime state like TaskTracker status,
summary statistics like those used by the JobTracker's scheduling policy,
communication messages, system events, and the execution state of the system.
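The timestep model described above can be illustrated with a minimal Python sketch. It is not JOL's API: the Node class, the rule signature, the node addresses n1 and n2, and the link/reach tables are all hypothetical names chosen for illustration. Events are queued as tuple insertions or deletions, applied together at the start of a timestep, and rules are then run to a fixpoint over locally stored tuples only. Tuples addressed to another node are returned as outgoing messages instead of being stored.

```python
from collections import defaultdict

LOCAL = "n1"   # hypothetical address of this node
REMOTE = "n2"  # hypothetical address of a peer node


class Node:
    """Illustrative single node; an assumption, not the JOL runtime."""

    def __init__(self, address, rules):
        self.address = address
        self.rules = rules              # each rule: tables -> (location, table, row)
        self.tables = defaultdict(set)  # table name -> set of tuples
        self.inbox = []                 # queued events: (op, table, row)

    def deliver(self, op, table, row):
        """Queue an insertion or deletion; it takes effect at the next timestep."""
        self.inbox.append((op, table, row))

    def timestep(self):
        """Apply queued events, run rules to a local fixpoint, return remote sends."""
        events, self.inbox = self.inbox, []
        for op, table, row in events:
            if op == "insert":
                self.tables[table].add(row)
            else:  # any other op is treated as a deletion
                self.tables[table].discard(row)
        outbox = set()
        changed = True
        while changed:
            changed = False
            for rule in self.rules:
                # Materialize the rule output before mutating any table.
                for location, table, row in list(rule(self.tables)):
                    if location != self.address:
                        outbox.add((location, table, row))
                    elif row not in self.tables[table]:
                        self.tables[table].add(row)
                        changed = True
        return outbox


def reach_rule(tables):
    # reach(X, Y) :- link(X, Y).
    for x, y in tables["link"]:
        yield (LOCAL, "reach", (x, y))
    # reach(X, Z) :- link(X, Y), reach(Y, Z).
    for x, y in tables["link"]:
        for y2, z in tables["reach"]:
            if y == y2:
                yield (LOCAL, "reach", (x, z))


def notify_rule(tables):
    # Ship every locally derived reach tuple to the peer node.
    for row in tables["reach"]:
        yield (REMOTE, "reach_copy", row)
```

A node built as Node(LOCAL, [reach_rule, notify_rule]) sees nothing until timestep() is called, which mirrors the atomicity of the local timestep. The sketch only derives new tuples; it does not retract derived tuples when a base tuple is deleted, which a real runtime would handle through view maintenance.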
The BOOM-FS component represents the file system metadata as a collection of
relations (file, fqpath, fchunk, datanode, hbchunk) where file system operations are
implemented by writing queries over these tables. The file relation contains a row
for each file or directory stored in BOOM-FS. The set of chunks in a file is identi-
fied by the corresponding rows in the fchunk relation. The datanode and hbchunk
relations contain the set of live DataNodes and the chunks stored by each DataNode,
respectively. The NameNode updates these relations as new heartbeats arrive. If the
NameNode does not receive a heartbeat from a DataNode within a configurable
amount of time, it assumes that the DataNode has failed and removes the corre-
sponding rows from these tables. Since a file system is naturally hierarchical, the
file system queries needed to traverse it are recursive. BOOM-FS therefore computes
the transitive closure of the parent-child relationship between files and stores
each file's fully qualified path in the fqpath relation. Because path information is
accessed frequently, the fqpath relation is configured to be cached after it is
computed. Overlog automatically updates fqpath when the file relation changes, using
standard relational
view maintenance logic [126]. BOOM-FS also defines several views to compute
derived file system metadata such as the total size of each file and the contents of
each directory. The materialization of each view can be changed via simple Overlog
table definition statements without altering the semantics of the program. In
general, HDFS uses three different communication protocols: the metadata protocol,
which is used by clients and NameNodes to exchange file metadata; the heartbeat
protocol, which is used by the DataNodes to notify the NameNode about chunk
locations and DataNode liveness; and the data protocol, which is used by clients
and DataNodes to exchange chunks. BOOM-FS reimplements these three protocols as a
set of Overlog rules. BOOM-FS also provides high-availability failover: hot standby
NameNodes are implemented in Overlog using Lamport's Paxos algorithm [85].
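The recursive derivation of fqpath and the heartbeat-based removal of failed DataNodes described above can be sketched in Python over plain sets of tuples. The column layouts used here are assumptions made for illustration and are not given in the text: file as (file_id, parent_id, name, is_dir), datanode as (address, last_heartbeat), and hbchunk as (address, chunk_id, length). The function names are likewise hypothetical. BOOM-FS expresses this logic as Overlog rules, not as imperative code.

```python
def compute_fqpath(file_rows):
    """Derive {file_id: fully qualified path} from parent-child rows.

    Assumed row layout: (file_id, parent_id, name, is_dir); the root
    directory is the row whose parent_id is None.
    """
    fqpath = {}
    # Base case: the root directory has path "/".
    for file_id, parent_id, _name, _is_dir in file_rows:
        if parent_id is None:
            fqpath[file_id] = "/"
    # Recursive case: extend a known parent path by one name, repeating
    # until no new path can be derived (a fixpoint).
    changed = True
    while changed:
        changed = False
        for file_id, parent_id, name, _is_dir in file_rows:
            if file_id not in fqpath and parent_id in fqpath:
                fqpath[file_id] = fqpath[parent_id].rstrip("/") + "/" + name
                changed = True
    return fqpath


def expire_datanodes(datanode, hbchunk, now, timeout):
    """Drop DataNodes whose last heartbeat is older than `timeout`.

    Assumed layouts: datanode rows are (address, last_heartbeat) and
    hbchunk rows are (address, chunk_id, length). Returns the surviving
    rows of both relations.
    """
    live = {(addr, last) for addr, last in datanode if now - last <= timeout}
    live_addrs = {addr for addr, _last in live}
    live_chunks = {row for row in hbchunk if row[0] in live_addrs}
    return live, live_chunks
```

This version recomputes fqpath from scratch on every call. The view maintenance mentioned in the text would instead update only the paths affected by a change to the file relation.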