2.6 RELATED LARGE-SCALE DATA-PROCESSING SYSTEMS
In this section, we give an overview of several large-scale data-processing systems
that share some of the ideas of the MapReduce framework but target different purposes
and application scenarios. It must be noted, however, that the design architectures
and implementations of these systems do not follow the architecture of the MapReduce
framework; thus, they neither use nor are related to the infrastructure of the
framework's open-source implementations such as Hadoop.
2.6.1 SCOPE
SCOPE (Structured Computations Optimized for Parallel Execution) is a scripting
language targeted at large-scale data analysis; it is used daily for a variety of
data analysis and data mining applications inside Microsoft [29]. SCOPE is a
declarative language: it allows users to focus on the data transformations required
to solve the problem at hand and hides the complexity of the underlying platform and
implementation details. The SCOPE compiler and optimizer are responsible for
generating an efficient execution plan, and the runtime is responsible for executing
the plan with minimal overhead.
As in SQL, data is modeled as sets of rows composed of typed columns. SCOPE is
highly extensible: users can easily define their own functions and implement their
own versions of operators, namely extractors (parsing and constructing rows from a
file), processors (row-wise processing), reducers (group-wise processing), and
combiners (combining rows from two inputs). This flexibility greatly extends the
scope of the language and allows users to solve problems that cannot be easily
expressed in traditional SQL. SCOPE also provides functionality similar to SQL
views, which enhances modularity and code reusability and is also used to restrict
access to sensitive data. A SCOPE program can be written using traditional SQL
expressions or as a series of simple data transformations. Figure 2.20 illustrates
two equivalent scripts in the two styles (SQL-like and MapReduce-like) that find,
from the search log, the popular queries that have been requested at least 1000 times.
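The popular-queries task can be sketched outside SCOPE to make the two styles
concrete. The following is a Python analogy only, not SCOPE syntax and not the
scripts of Figure 2.20. The tab-separated log layout, the function names, and the
final sort (added only to make the output deterministic) are assumptions made for
illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def extract_queries(log_lines: Iterable[str]) -> Iterator[str]:
    """Play the role of an extractor: turn raw log lines into query strings.

    Assumes a hypothetical layout in which fields are tab-separated and the
    query string is the first field. Blank lines are skipped.
    """
    for line in log_lines:
        line = line.rstrip("\n")
        if line:
            yield line.split("\t")[0]


def popular_queries_sql_like(
    log_lines: Iterable[str], min_count: int = 1000
) -> List[Tuple[str, int]]:
    """Single grouped aggregate with a filter on the group count."""
    counts = Counter(extract_queries(log_lines))
    return sorted(
        ((q, c) for q, c in counts.items() if c >= min_count),
        key=lambda qc: (-qc[1], qc[0]),
    )


def popular_queries_stepwise(
    log_lines: Iterable[str], min_count: int = 1000
) -> List[Tuple[str, int]]:
    """The same computation written as a chain of simple transformations."""
    queries = extract_queries(log_lines)       # step 1: extract query strings
    counted = Counter(queries)                 # step 2: count occurrences per query
    kept = [(q, c) for q, c in counted.items() if c >= min_count]  # step 3: filter
    return sorted(kept, key=lambda qc: (-qc[1], qc[0]))            # step 4: order


if __name__ == "__main__":
    log = ["cats\tu1", "dogs\tu2", "cats\tu3"]
    print(popular_queries_stepwise(log, min_count=2))  # [('cats', 2)]
```

Both functions return the same rows; the difference is only in whether the
computation is stated as one grouped query or as named intermediate steps.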
In the MapReduce-like style, the EXTRACT command extracts all query strings
from the log file. The first SELECT command counts the number of occurrences of
each query string. The second SELECT command retains only rows with a count
FIGURE 2.20 Two equivalent SCOPE scripts in SQL-like style and MapReduce-like style.
(From R. Chaiken et al., PVLDB , 1(2), 1265-1276, 2008.)