Fig. 9.16 Sample Jaql script

    import myrecord;
    countFields = fn(records) (
      records
      -> transform myrecord::names($)
      -> expand
      -> group by fName = $ as occurrences
         into { name: fName, num: count(occurrences) }
    );
    read(hdfs("docs.dat"))
    -> countFields()
    -> write(hdfs("fields.dat"));

sequence of operators. The read operator loads raw data, in this case from Hadoop's
Distributed File System (HDFS), and converts it into Jaql values. These values are
processed by the countFields subflow, which extracts field names and computes
their frequencies. Finally, the write operator stores the result back into HDFS. In
general, the core expressions of the Jaql scripting language include:
1. Transform: The transform expression applies a function (or projection) to every
element of an array to produce a new array. It has the form e1 -> transform e2,
where e1 is an expression that describes the input array and e2 is applied to
each element of e1.
2. Expand: The expand expression is most often used to unnest its input array. It
differs from transform in two primary ways: (1) e2 must produce a value v that
is an array type, and (2) each of the elements of v is returned to the output array,
thereby removing one level of nesting.
3. Group by: Similar to SQL's GROUP BY, Jaql's group by expression partitions
its input on a grouping expression and applies an aggregation expression to each
group.
4. Filter: The filter expression, e -> filter p, retains the input values from e for
which the predicate p evaluates to true.
5. Join: The join expression supports equijoins of two or more inputs. All of the
options for inner and outer joins are also supported.
6. Union: The union expression is a Jaql function that merges multiple input arrays
into a single output array. It has the form union(e1, ...), where each ei is an
array.
7. Control-flow: The two most commonly used control-flow expressions in Jaql are
if-then-else and block expressions. The if-then-else expression
is similar to the conditional expressions found in most scripting and programming
languages. A block establishes a local scope in which zero or more local variables
can be declared; the last statement provides the return value of the block.
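
To make the semantics of these expressions concrete, the following is a minimal sketch that emulates them over plain Python lists. It is not Jaql and does not use any Jaql API: the helper names (transform, expand, filter_, group_by, union, count_fields) and the sample data are hypothetical, and the sketch assumes that myrecord::names in Fig. 9.16 returns the list of field names of a record, as the surrounding text suggests.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical Python stand-ins for Jaql's core array expressions.
# They emulate the semantics described in the list above; they are not Jaql.

def transform(items, fn):
    """e1 -> transform e2: apply fn to every element, one output per input."""
    return [fn(x) for x in items]

def expand(items, fn=lambda x: x):
    """e1 -> expand e2: fn must return an array; its elements are
    spliced into the output, removing one level of nesting."""
    return list(chain.from_iterable(fn(x) for x in items))

def filter_(items, pred):
    """e -> filter p: keep only the elements for which pred is true."""
    return [x for x in items if pred(x)]

def group_by(items, key, aggregate):
    """Partition items on key(x), then apply aggregate(key, group) per group."""
    groups = defaultdict(list)
    for x in items:
        groups[key(x)].append(x)
    return [aggregate(k, members) for k, members in groups.items()]

def union(*arrays):
    """union(e1, ...): merge several input arrays into one output array."""
    return list(chain.from_iterable(arrays))

def count_fields(records):
    """Emulates the countFields subflow of Fig. 9.16 on in-memory dicts."""
    # Assumption: the names() step yields each record's field names.
    names = transform(records, lambda r: list(r.keys()))
    flat = expand(names)  # unnest the per-record name arrays
    return group_by(
        flat,
        key=lambda name: name,
        aggregate=lambda name, occurrences: {"name": name, "num": len(occurrences)},
    )

# Hypothetical sample input standing in for the contents of docs.dat.
docs = [{"a": 1, "b": 2}, {"a": 3}, {"b": 4, "c": 5, "a": 6}]
field_counts = count_fields(docs)
```

Here count_fields chains transform, expand, and group by in the same order as the script in Fig. 9.16, so field_counts holds one record per distinct field name together with the number of input records that contain it. The read and write steps against HDFS are deliberately left out of the sketch.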
At a high level, the Jaql architecture depicted in Fig. 9.17 is similar to that of most
database systems. Scripts are passed into the system from the interpreter or an
application, compiled by the parser and rewrite engine, and either explained
or evaluated over data from the I/O layer. The storage layer is similar to a
federated database. It provides an API to access data of different systems including