Database Reference
In-Depth Information
NOTE
Rather than using Hadoop RPC for remote calls, Spark uses Akka , an actor-based platform for building
highly scalable, event-driven distributed applications.
Executors also send status update messages to the driver when a task has finished or if a
task fails. In the latter case, the task scheduler will resubmit the task on another executor.
It will also launch speculative tasks for tasks that are running slowly, if this is enabled (it
is not by default).
Task Execution
An executor runs a task as follows (step 7). First, it makes sure that the JAR and file de-
pendencies for the task are up to date. The executor keeps a local cache of all the depend-
encies that previous tasks have used, so that it only downloads them when they have
changed. Second, it deserializes the task code (which includes the user's functions) from
the serialized bytes that were sent as a part of the launch task message. Third, the task
code is executed. Note that tasks are run in the same JVM as the executor, so there is no
process overhead for task launch. [ 134 ]
Tasks can return a result to the driver. The result is serialized and sent to the executor
backend, and then back to the driver as a status update message. A shuffle map task re-
turns information that allows the next stage to retrieve the output partitions, while a result
task returns the value of the result for the partition it ran on, which the driver assembles
into a final result to return to the user's program.
Search WWH ::




Custom Search