Chapter 1. Getting Up and Running with Spark
Apache Spark is a framework for distributed computing that aims to make it
simpler to write programs that run in parallel across many nodes in a cluster of computers.
It abstracts the tasks of resource scheduling, job submission, execution, tracking, and
communication between nodes, as well as the low-level operations inherent in parallel
data processing. It also provides a higher-level API for working with distributed data. In
this way, it is similar to other distributed processing frameworks such as Apache Hadoop;
however, the underlying architecture is somewhat different.
Spark began as a research project at the University of California, Berkeley. The project
was focused on the use case of distributed machine learning algorithms. Hence, Spark is
designed from the ground up for high performance in applications of an iterative nature,
where the same data is accessed multiple times. This performance is achieved primarily
by caching datasets in memory, combined with low latency and low overhead when launching
parallel computation tasks. Together with other features such as fault tolerance, flexible
distributed-memory data structures, and a powerful functional API, Spark has proved to be
broadly useful for a wide range of large-scale data processing tasks, over and above
machine learning and iterative analytics.
Note
For more background on Spark, including the research papers underlying Spark's
development, see the project's history page at http://spark.apache.org/community.html#history.
Spark runs in four modes:
• The standalone local mode, where all Spark processes are run within the same
Java Virtual Machine (JVM) process
• The standalone cluster mode, using Spark's own built-in job-scheduling framework
• On Mesos, a popular open source cluster-computing framework
• On YARN (commonly referred to as NextGen MapReduce), a Hadoop-related
cluster-computing and resource-scheduling framework
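In practice, the run mode is usually selected through the master URL given to Spark. The following is a minimal sketch in plain Python (it does not require Spark) of how the four modes might map to master URL strings. The helper name master_url and its parameters are hypothetical and exist only for illustration; the default ports (7077 for a standalone master, 5050 for a Mesos master) are the customary defaults, and the exact YARN string can vary between Spark versions.

```python
# Hypothetical helper (not part of Spark): builds a master URL string
# for each of the four run modes described above.

def master_url(mode, host=None, port=None, threads=None):
    """Return a Spark master URL string for the given run mode."""
    if mode == "local":
        # local[*] asks for one worker thread per available core;
        # local[N] asks for exactly N worker threads in a single JVM.
        return "local[*]" if threads is None else f"local[{threads}]"

    if mode in ("standalone", "mesos"):
        if host is None:
            raise ValueError(f"mode {mode!r} requires a master host")
        if mode == "standalone":
            # Spark's own cluster manager; 7077 is its usual default port.
            return f"spark://{host}:{port or 7077}"
        # Mesos master; 5050 is its usual default port.
        return f"mesos://{host}:{port or 5050}"

    if mode == "yarn":
        # Assumption: recent Spark releases accept plain "yarn"; the cluster
        # location is read from the Hadoop configuration, not from the URL.
        # Older releases used variants such as "yarn-client".
        return "yarn"

    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # "master-node" and "mesos-master" are placeholder host names.
    print(master_url("local", threads=4))
    print(master_url("standalone", host="master-node"))
    print(master_url("mesos", host="mesos-master"))
    print(master_url("yarn"))
```

A string of this form would typically be passed to the --master option of spark-submit, or to SparkConf.setMaster when the configuration is built in code.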
In this chapter, we will: