network, in-memory, and file channels. The network and in-memory channels allow
the PACT compiler to construct low-latency execution pipelines in which one task
can immediately consume the output of another. The file channels collect the entire
output of a task in a temporary file before passing its content on to the next task.
Therefore, file channels can be considered checkpoints, which help to recover from
execution failures.
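The difference between a pipelined channel and a file channel can be illustrated with a minimal Python sketch. It is not Nephele code: the function names (produce, consume, in_memory_channel, file_channel_write, file_channel_read) and the JSON-lines temporary file are assumptions made only for this illustration.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, Iterator


def produce(n: int) -> Iterator[Dict[str, int]]:
    """Hypothetical upstream task: emits records one at a time."""
    for i in range(n):
        yield {"key": i % 3, "value": i}


def consume(records: Iterable[Dict[str, int]]) -> int:
    """Hypothetical downstream task: sums the 'value' field of its input."""
    return sum(r["value"] for r in records)


def in_memory_channel(records: Iterable[Dict[str, int]]) -> Iterator[Dict[str, int]]:
    """Pipelined hand-over: each record is passed on as soon as it is produced."""
    yield from records


def file_channel_write(records: Iterable[Dict[str, int]]) -> str:
    """Collect the entire upstream output in a temporary file; return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
        for r in records:
            f.write(json.dumps(r) + "\n")
        return f.name


def file_channel_read(path: str) -> Iterator[Dict[str, int]]:
    """Stream the materialized output to the downstream task."""
    with open(path) as f:
        for line in f:
            yield json.loads(line)


# Pipelined: the consumer pulls records while the producer is still running.
pipelined_total = consume(in_memory_channel(produce(5)))

# Materialized: the file acts as a checkpoint, so the downstream task can be
# re-run from the file without re-executing the producer.
path = file_channel_write(produce(5))
first_try = consume(file_channel_read(path))
retry = consume(file_channel_read(path))  # replay after a simulated failure
os.remove(path)
```

In this sketch the pipelined variant never holds the full output, whereas the file variant pays for one complete write before the consumer starts and gains a replayable checkpoint in return.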
Due to the declarative character of the PACT programming model, the PACT
compiler can apply different optimization mechanisms and select from several exe-
cution plans with varying costs for a single PACT program. For example, the Match
contract can be satisfied using either a repartition strategy that partitions all inputs
by keys or a broadcast strategy that fully replicates one input to every partition of
the other input. Choosing the right strategy can dramatically reduce network traffic
and execution time. Therefore, the PACT compiler applies standard SQL optimization
techniques [119], exploiting the information provided by the Output Contracts, and
applies different cost-based optimization techniques. In particular, the optimizer
generates a set of candidate execution plans in a bottom-up fashion (starting from
the data sources), pruning the more expensive plans on the basis of a set of
interesting properties for the operators. The same properties also spare a plan
from pruning when it carries an additional property that may amortize its cost
overhead later.
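The choice between the two Match strategies, and the role of interesting properties during pruning, can be sketched as follows. This is an illustrative model and not the PACT compiler's actual cost function: the byte-count cost formulas, the Plan class, the property name "partitioned_by_key", and the rule "keep the cheapest plan per property set" are all simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Plan:
    strategy: str
    shipped_bytes: int                          # assumed cost: network traffic only
    properties: FrozenSet[str] = frozenset()    # hypothetical interesting properties


def candidate_plans(left_bytes: int, right_bytes: int, partitions: int) -> List[Plan]:
    """Enumerate shipping strategies for a two-input Match (simplified)."""
    return [
        # Repartitioning ships both inputs once and leaves the result
        # partitioned by key, which a later operator may be able to reuse.
        Plan("repartition", left_bytes + right_bytes,
             frozenset({"partitioned_by_key"})),
        # Broadcasting replicates one input to every partition of the other.
        Plan("broadcast-left", left_bytes * partitions),
        Plan("broadcast-right", right_bytes * partitions),
    ]


def choose_plan(left_bytes: int, right_bytes: int, partitions: int) -> Plan:
    """Pick the plan with the lowest assumed network cost."""
    return min(candidate_plans(left_bytes, right_bytes, partitions),
               key=lambda p: p.shipped_bytes)


def prune(plans: List[Plan]) -> List[Plan]:
    """Keep the cheapest plan for each distinct set of interesting properties.

    A more expensive plan survives if no cheaper plan offers the same
    properties, because those properties may pay off further downstream.
    """
    best: Dict[FrozenSet[str], Plan] = {}
    for plan in plans:
        current = best.get(plan.properties)
        if current is None or plan.shipped_bytes < current.shipped_bytes:
            best[plan.properties] = plan
    return list(best.values())


# A 10 GB input matched with a 1 MB input on 16 partitions.
skewed = choose_plan(10_000_000_000, 1_000_000, 16)
# Two inputs of 1 GB each on 16 partitions.
balanced = choose_plan(1_000_000_000, 1_000_000_000, 16)
survivors = prune(candidate_plans(10_000_000_000, 1_000_000, 16))
```

Under these assumptions the skewed case selects the broadcast of the small input, the balanced case selects repartitioning, and pruning keeps the repartition plan alongside the cheaper broadcast plan because only the former yields key-partitioned output.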
Heise et al. [64] have presented Sopremo, a semantically rich operator model,
and Meteor, an extensible query language that is grounded in Sopremo. Sopremo
provides a programming framework that allows users to define custom packages, the
respective operators and their instantiations. Meteor's syntax is operator-oriented
and uses a JSON-like data model to support applications that analyze semistructured
and unstructured data. Meteor queries are then translated into data flow programs
of operator instantiations that represent concrete implementations of the involved
Sopremo operators. A main advantage of this approach is that the operator's seman-
tics can be accessed at compile time and potentially be used for data flow optimi-
zation, or for detecting syntactically correct, but semantically erroneous queries.
Meteor and Sopremo have been implemented within Stratosphere,* a system for parallel
data analysis that also comprises the PACT programming model and the Nephele
execution engine.
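To make the idea of compile-time access to operator semantics concrete, the following Python sketch checks a data flow for queries that are well formed but refer to fields no upstream operator provides. It does not use the Sopremo or Meteor APIs: the Operator class, its reads and adds attributes, and the check_flow function are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator description: the fields it reads and the fields it adds."""
    name: str
    reads: FrozenSet[str]
    adds: FrozenSet[str] = frozenset()


def check_flow(source_fields: Iterable[str], operators: List[Operator]) -> List[str]:
    """Walk a linear data flow before execution and report semantic errors."""
    available = set(source_fields)
    errors = []
    for op in operators:
        missing = op.reads - available
        if missing:
            errors.append(f"{op.name}: unknown field(s) {sorted(missing)}")
        # Fields produced by this operator become visible to later operators.
        available |= op.adds
    return errors


flow = [
    Operator("filter_adults", reads=frozenset({"age"})),
    Operator("group_by_city", reads=frozenset({"city"})),  # 'city' is never provided
]
errors = check_flow({"name", "age"}, flow)
```

Because each operator declares what it consumes and produces, the error is found without running any task, which is the kind of check the text describes.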
2.6.5 BOOM Analytics
BOOM Analytics (Berkeley Orders of Magnitude) [9] is an API-compliant
reimplementation of the HDFS distributed file system (BOOM-FS) and the Hadoop
MapReduce engine (BOOM-MR). The implementation of BOOM Analytics uses the
Overlog logic language [95], which was originally presented as an event-driven
language and has since evolved a semantics more carefully grounded in Datalog, the
standard deductive query language from database theory [126]. In general, the Datalog
language is defined over relational tables as a purely logical query language that makes
* https://stratosphere.eu/.