greater than 1000. The third SELECT command sorts the rows on count. Finally, the OUTPUT command writes the result to the file "qcount.result".
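The beginning of this script example falls outside this excerpt, so the following Python sketch assumes the earlier commands count how often each query string occurs. It mirrors the visible steps: keep counts greater than 1000, sort the rows on count, and write the result to "qcount.result". The function names, the descending sort order, the tab-separated output format, and the sample input are all illustrative assumptions, not part of the original script.

```python
from collections import Counter
import os
import tempfile


def query_counts(queries, threshold=1000):
    """Return (query, count) pairs for queries seen more than `threshold` times,
    sorted by count from highest to lowest."""
    counts = Counter(queries)  # assumed earlier step: count occurrences of each query
    frequent = [(q, c) for q, c in counts.items() if c > threshold]  # keep counts > threshold
    return sorted(frequent, key=lambda qc: qc[1], reverse=True)  # sort on count (descending assumed)


def write_result(rows, path):
    """Write one 'query<TAB>count' line per row; this file format is hypothetical."""
    with open(path, "w", encoding="utf-8") as f:
        for query, count in rows:
            f.write(f"{query}\t{count}\n")


if __name__ == "__main__":
    # Hypothetical input: 1500 searches for "weather", 1200 for "news", 10 for "maps".
    log = ["weather"] * 1500 + ["news"] * 1200 + ["maps"] * 10
    rows = query_counts(log)
    print(rows)  # [('weather', 1500), ('news', 1200)]
    out = os.path.join(tempfile.mkdtemp(), "qcount.result")
    write_result(rows, out)
```

Each function corresponds to one step of the script, which is roughly how a declarative query breaks down into a chain of operators.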
Microsoft has developed a distributed computing platform, called Cosmos, for storing and analyzing massive data sets. Cosmos is designed to run on large clusters consisting of thousands of commodity servers. Figure 2.21 shows the main components of the Cosmos platform, which are described as follows:
Cosmos storage: a distributed storage subsystem designed to reliably and efficiently store extremely large sequential files.
Cosmos execution environment: an environment for deploying, executing, and debugging distributed applications.
SCOPE: a high-level scripting language for writing data analysis jobs. The SCOPE compiler and optimizer translate scripts into efficient parallel execution plans.
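The list states that the SCOPE compiler and optimizer turn a script into a parallel execution plan. One widely used rewrite for an aggregation such as a per-key COUNT is two-stage aggregation: count within each chunk of data in parallel, then merge the partial counts. The Python sketch below illustrates only that general idea; the function names, the round-robin chunking, and the thread pool standing in for cluster machines are assumptions, not SCOPE's actual plan format or optimizer rules.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_extents(rows, n):
    """Round-robin the input into n chunks, standing in for data stored on n machines."""
    return [rows[i::n] for i in range(n)]


def local_count(extent):
    """Stage 1: partial aggregation, counting each key within one chunk."""
    return Counter(extent)


def merge_counts(partials):
    """Stage 2: add the partial counts together into the global result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


def parallel_group_count(rows, n_workers=4):
    """Two-stage GROUP BY/COUNT: count each chunk in parallel, then merge."""
    extents = split_into_extents(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_count, extents))
    return merge_counts(partials)


if __name__ == "__main__":
    data = ["a", "b", "a", "c", "a", "b", "d"]
    print(parallel_group_count(data, n_workers=3))
```

The payoff of the partial stage is that each chunk sends only one count per distinct key to the merge step instead of every raw row.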
The Cosmos storage system is an append-only file system that reliably stores petabytes of data. The system is optimized for large sequential I/O. All writes are append-only, and concurrent writers are serialized by the system. Data is distributed and replicated for fault tolerance, and compressed to save storage and increase I/O throughput. In Cosmos, an application is modeled as a dataflow graph: a directed acyclic graph (DAG) with vertices representing processes and edges representing data flows. The runtime component of the execution engine is called the Job Manager; it is the central process that coordinates all processing vertices within an application.
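The dataflow-graph model can be illustrated with a small scheduler in which a vertex runs only after every vertex feeding it has finished, which is what a topological ordering of the DAG guarantees. The Python sketch below is a single-process toy that assumes each vertex is a plain function receiving its upstream outputs; `run_job` and its arguments are hypothetical names. A real Job Manager coordinating a cluster would also have to handle concerns such as placing vertices on machines and recovering from failures, which this sketch omits.

```python
from collections import deque


def run_job(vertices, edges):
    """Toy job manager: run each vertex once all of its upstream vertices have finished.

    vertices: dict mapping vertex name -> function taking a list of upstream outputs
    edges: list of (src, dst) pairs, meaning data flows from src to dst
    Returns a dict mapping vertex name -> output.
    """
    upstream = {v: [] for v in vertices}
    downstream = {v: [] for v in vertices}
    for src, dst in edges:
        upstream[dst].append(src)
        downstream[src].append(dst)

    pending = {v: len(upstream[v]) for v in vertices}  # unfinished inputs per vertex
    ready = deque(v for v, n in pending.items() if n == 0)  # source vertices start first
    outputs = {}
    while ready:
        v = ready.popleft()
        outputs[v] = vertices[v]([outputs[u] for u in upstream[v]])
        for w in downstream[v]:
            pending[w] -= 1
            if pending[w] == 0:  # all inputs of w are now available
                ready.append(w)
    if len(outputs) != len(vertices):
        raise ValueError("dataflow graph contains a cycle")
    return outputs


if __name__ == "__main__":
    job = {
        "left": lambda ins: [1, 2, 3],
        "right": lambda ins: [10, 20],
        "combine": lambda ins: sum(ins[0]) + sum(ins[1]),
    }
    result = run_job(job, [("left", "combine"), ("right", "combine")])
    print(result["combine"])  # 36
```

The cycle check reflects why the graph must be acyclic: if some vertex waits on its own output, it never becomes ready.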
The SCOPE scripting language resembles SQL but uses C# expressions. Thus, it reduces the learning curve for users and eases the porting of existing SQL scripts
[Figure 2.21 components: SCOPE script, SCOPE compiler, SCOPE optimizer, SCOPE runtime, Cosmos execution environment, Cosmos storage system, Cosmos files]
FIGURE 2.21 SCOPE/Cosmos execution platform. (From R. Chaiken et al., PVLDB, 1(2), 1265-1276, 2008.)
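The storage layer at the bottom of Figure 2.21 is described above as append-only, with concurrent writers serialized by the system and data replicated and compressed. The toy, single-process Python sketch below shows those three properties together. The class name `AppendOnlyStream`, the lock used for serialization, the in-memory lists standing in for replicas on separate machines, and the choice of zlib for compression are all illustrative assumptions, not details of Cosmos.

```python
import threading
import zlib


class AppendOnlyStream:
    """Toy append-only stream: writers can only append, never overwrite.

    A lock serializes concurrent appenders, each block is compressed with zlib,
    and every compressed block is copied to `replicas` in-memory lists that
    stand in for copies on different machines.
    """

    def __init__(self, replicas=3):
        self._lock = threading.Lock()
        self._replicas = [[] for _ in range(replicas)]

    def append(self, data: bytes) -> int:
        """Append one block and return its index; appends are applied one at a time."""
        block = zlib.compress(data)
        with self._lock:  # serialize writers so every replica sees the same order
            for copy in self._replicas:
                copy.append(block)
            return len(self._replicas[0]) - 1

    def read_all(self, replica=0) -> bytes:
        """Read the whole stream sequentially from one replica, decompressing each block."""
        with self._lock:
            blocks = list(self._replicas[replica])
        return b"".join(zlib.decompress(b) for b in blocks)


if __name__ == "__main__":
    stream = AppendOnlyStream(replicas=3)

    def writer(tag: bytes):
        for _ in range(100):
            stream.append(tag * 10)

    threads = [threading.Thread(target=writer, args=(bytes([65 + i]),)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(stream.read_all()))  # 4000
```

Because appends are serialized, each block lands whole and in the same position on every replica, so any replica can serve a sequential read.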