The Mappers are used to evaluate a user-defined function on each item in
the input.
The Sorters are used to sort input records using user-provided comparator
functions.
The Joiners are binary-input operators that perform equi-joins.
The Aggregators are used to perform aggregation using a user-defined
aggregation function.
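The four operator kinds above can be sketched as plain Python functions. This is only an illustration of the roles described here, not the Hyracks API: the function names, their signatures, and the choice of a hash join for the equi-join are all assumptions made for this example.

```python
from collections import defaultdict
from functools import cmp_to_key


def mapper(records, fn):
    """Evaluate a user-defined function on each item in the input."""
    return [fn(r) for r in records]


def sorter(records, comparator):
    """Sort records with a user-provided comparator returning <0, 0, or >0."""
    return sorted(records, key=cmp_to_key(comparator))


def joiner(left, right, left_key, right_key):
    """Binary-input equi-join, implemented here as a simple in-memory hash join."""
    table = defaultdict(list)
    for l in left:
        table[left_key(l)].append(l)
    # Probe with the right input; emit one pair per matching left record.
    return [(l, r) for r in right for l in table.get(right_key(r), [])]


def aggregator(records, key, init, step):
    """Group records by key and fold each group with a user-defined function."""
    groups = {}
    for r in records:
        k = key(r)
        groups[k] = step(groups.get(k, init), r)
    return groups


# Hypothetical sample data: (user_id, name) and (user_id, item) tuples.
users = [(1, "ann"), (2, "bob")]
orders = [(1, "pen"), (2, "ink"), (1, "pad")]

joined = joiner(users, orders, lambda u: u[0], lambda o: o[0])
order_counts = aggregator(orders, key=lambda o: o[0], init=0,
                          step=lambda acc, _: acc + 1)
```

In this sketch every operator materializes its full output as a list. A real dataflow engine would stream the data instead.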
The Hyracks connectors are used to distribute data produced by a set of sender
operators to a set of receiver operators. The basic set of Hyracks connectors includes:
The M:N Hash-Partitioner hashes every tuple produced by senders to generate
the receiver number to which the tuple is sent. Tuples produced by the
same sender keep their initial order on the receiver side.
The M:N Hash-Partitioning Merger takes as input sorted streams of data
and hashes each tuple to find the receiver. On the receiver side, it merges
the streams coming from different senders based on a given comparator,
thus producing ordered partitions.
The M:N Range-Partitioner partitions data using a specified field in the
input and a range-vector.
The M:N Replicator copies the data produced by every sender to every
receiver operator.
The 1:1 Connector connects exactly one sender to one receiver operator.
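The routing behavior of the M:N connectors can be illustrated with a small in-memory sketch. It is not the Hyracks connector API: the function names are hypothetical, senders are modeled as lists of tuples, and the range vector is assumed to hold N-1 ascending split points, with a value equal to a split point going to the higher partition.

```python
import heapq
from bisect import bisect_right


def hash_partition(sender_outputs, num_receivers, key):
    """M:N hash partitioner: send each tuple to receiver hash(key) % N.

    Returns receivers[r][s], the stream sent from sender s to receiver r,
    so tuples from the same sender keep their initial order.
    """
    receivers = [[[] for _ in sender_outputs] for _ in range(num_receivers)]
    for s, tuples in enumerate(sender_outputs):
        for t in tuples:
            receivers[hash(key(t)) % num_receivers][s].append(t)
    return receivers


def hash_partition_merge(sender_outputs, num_receivers, key, sort_key):
    """M:N hash-partitioning merger: inputs must already be sorted by sort_key.

    Each receiver merges its per-sender streams into one ordered partition.
    """
    routed = hash_partition(sender_outputs, num_receivers, key)
    return [list(heapq.merge(*streams, key=sort_key)) for streams in routed]


def range_partition(sender_outputs, range_vector, field):
    """M:N range partitioner: pick the receiver by comparing a field
    against the split points in range_vector (assumed ascending)."""
    receivers = [[] for _ in range(len(range_vector) + 1)]
    for tuples in sender_outputs:
        for t in tuples:
            receivers[bisect_right(range_vector, field(t))].append(t)
    return receivers


def replicate(sender_outputs, num_receivers):
    """M:N replicator: every receiver gets a copy of every sender's data."""
    everything = [t for tuples in sender_outputs for t in tuples]
    return [list(everything) for _ in range(num_receivers)]


# Two senders, each producing a sorted stream of integers.
senders = [[1, 4, 6], [2, 3, 8]]
identity = lambda t: t

merged = hash_partition_merge(senders, 2, key=identity, sort_key=identity)
ranged = range_partition(senders, [3, 6], field=identity)
```

The sample data uses integers because Python's `hash` is stable for small integers. For strings it varies between interpreter runs, so a real partitioner would need a deterministic hash function.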
Hyracks has been designed to be both a runtime platform on which users can
create their own jobs and an efficient target for the compilers of higher-level
programming languages such as Pig, Hive, or Jaql. The ASTERIX project [16,22]
uses this feature to build a scalable information management system that
supports the storage, querying, and analysis of large collections of
semistructured nested data objects. The ASTERIX data storage and query
processing are based on its own semistructured model called the ASTERIX Data
Model (ADM). Each individual ADM data instance is typed and self-describing.
All data instances live in data sets (the ASTERIX analog of tables), and data
sets can be indexed, partitioned, and possibly replicated to achieve scalability
and availability goals. External data sets that reside in files not under
ASTERIX control are also supported. An instance of the ASTERIX data model
can either be a primitive type (e.g., integer, string, time) or a derived type, which
may include:
Enum : An enumeration type, whose domain is defined by listing the
sequence of possible values.
Record : A set of fields where each field is described by its name and type. A
record can be either an open record, which may contain fields that are not part
of the type definition, or a closed record, which may not.
Ordered list : A sequence of values for which the order is determined by the
creation or insertion time.
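The difference between open and closed records can be made concrete with a small type-checking sketch. This is not ADM syntax or any ASTERIX API: the class names are hypothetical, Python types stand in for ADM primitive types, and every declared field is assumed to be required.

```python
from dataclasses import dataclass


def _check(ftype, value):
    """Check a value against a Python type or one of the sketch types below."""
    if isinstance(ftype, type):
        return isinstance(value, ftype)
    return ftype.accepts(value)


@dataclass
class EnumType:
    values: tuple  # the listed sequence of possible values

    def accepts(self, v):
        return v in self.values


@dataclass
class OrderedListType:
    item: object  # type of each element

    def accepts(self, v):
        return isinstance(v, list) and all(_check(self.item, x) for x in v)


@dataclass
class RecordType:
    fields: dict          # field name -> type
    is_open: bool = True  # open records may carry undeclared fields

    def accepts(self, rec):
        if not isinstance(rec, dict):
            return False
        # Assumption of this sketch: every declared field must be present.
        for name, ftype in self.fields.items():
            if name not in rec or not _check(ftype, rec[name]):
                return False
        extra = set(rec) - set(self.fields)
        return self.is_open or not extra


# Hypothetical user type, once open and once closed.
status = EnumType(("active", "suspended"))
user_fields = {"id": int, "name": str, "status": status,
               "tags": OrderedListType(str)}
open_user = RecordType(user_fields, is_open=True)
closed_user = RecordType(user_fields, is_open=False)

rec = {"id": 1, "name": "ann", "status": "active",
       "tags": ["a", "b"], "nickname": "an"}  # "nickname" is undeclared
```

Here `open_user.accepts(rec)` holds while `closed_user.accepts(rec)` does not, because only the open type tolerates the undeclared `nickname` field.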