Database Reference
In-Depth Information
Fig. 9.14 Basic syntax of
SQL/MR query function
SELECT ...
FROM functionname(
ON table-or-query
[PARTITION BY expr, ...]
[ORDER BY expr, ...]
[clausename(arg, ...) ...]
)
together as a single relational database. The framework is implemented as part
of the Aster Data Systems [ 13 ] nCluster shared-nothing relational database. The
framework leverages ideas from the MapReduce programming paradigm to provide
users with a straightforward API through which they can implement a UDF in the
language of their choice. Moreover, it allows maximum flexibility as the output
schema of the UDF is specified by the function itself at query plan-time. This means
that a SQL/MR function is polymorphic as it can process arbitrary input because
its behavior as well as output schema are dynamically determined by information
available at query plan-time. This also increases reusability as the same SQL/MR
function can be used on inputs with many different schemas or with different user-
specified parameters. In particular, SQL/MR allows the user to write custom-defined
functions in any programming language and insert them into queries that leverage
traditional SQL functionality. A SQL/MR function is defined in a manner that is
similar to MapReduce's map and reduce functions.
The syntax for using a SQL/MR function is depicted in Fig. 9.14 where the
SQL/MR function invocation appears in the SQL FROM clause and consists of the
function name followed by a set of clauses that are enclosed in parentheses. The ON
clause specifies the input to the invocation of the SQL/MR function. It is important
to note that the input schema to the SQL/MR function is specified implicitly at query
plan-time in the form of the output schema for the query used in the ON clause.
In practice, a SQL/MR function can be either a mapper ( Row function) or a
reducer ( Partition function). The definitions of row and partition functions ensure
that they can be executed in parallel in a scalable manner. In the Row Function , each
row from the input table or query will be operated on by exactly one instance of
the SQL/MR function. Semantically, each row is processed independently, allowing
the execution engine to control parallelism. For each input row, the row function
may emit zero or more rows. In the Partition Function , each group of rows as
defined by the PARTITION BY clause will be operated on by exactly one instance
of the SQL/MR function. If the ORDER BY clause is provided, the rows within
each partition are provided to the function instance in the specified sort order.
Semantically, each partition is processed independently, allowing parallelization by
the execution engine at the level of a partition. For each input partition, the SQL/MR
partition function may output zero or more rows.
Search WWH ::




Custom Search