Spark Executor
In Apache Spark, a distributed agent is responsible for executing tasks; this agent is what we call the Spark Executor.
What is a Spark Executor?
- Basically, Executors in Spark are worker processes that run on the cluster's worker nodes.
- They are in charge of running the individual tasks of a given Spark job. Moreover, we launch them at the start of a Spark application.
- An executor then typically runs for the entire lifetime of the application.
- As soon as an executor has run a task, it sends the results to the driver.
- Executors also provide in-memory storage for Spark RDDs that are cached by user programs, through the Block Manager.
- Running for the complete lifespan of a Spark application implies static allocation of executors. However, we can also opt for dynamic allocation.
- Moreover, with the help of the Heartbeat Sender Thread, an executor sends metrics and heartbeats to the driver.
- One advantage is that we can have as many executors in Spark as there are data nodes (see the configuration sketch after this list).
- It is also possible to use as many cores as the cluster provides.
- Another way to describe an Apache Spark executor is by its id, hostname, environment (as SparkEnv), and classpath.
- The most important point to note is that executor backends exclusively manage executors in Spark.
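To make the points above concrete, here is a minimal sketch of how an application requests its executors. The property keys are standard Spark configuration; the app name and values are illustrative only:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: requesting executors at application start.
val conf = new SparkConf()
  .setAppName("executor-demo")            // hypothetical app name
  .set("spark.executor.instances", "4")   // e.g. one executor per data node
  .set("spark.executor.cores", "2")       // cores per executor
  .set("spark.executor.memory", "2g")     // heap per executor
  // .set("spark.dynamicAllocation.enabled", "true")  // opt in to dynamic allocation

val sc = new SparkContext(conf)

// Cached RDDs are held in executor memory via the Block Manager.
val data = sc.parallelize(1 to 1000000).cache()
println(data.count())   // tasks run on the executors; results return to the driver

sc.stop()
```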
Conditions to Create a Spark Executor
An executor in Spark is created under the following conditions:
- When CoarseGrainedExecutorBackend receives a RegisteredExecutor message (Spark Standalone and YARN only).
- When Mesos's MesosExecutorBackend is registered with Spark.
- When LocalEndpoint is created for local mode.
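For the local-mode case, the executor is created implicitly when the application starts with a local master. A minimal sketch (the app name is illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// With a local master, a LocalEndpoint is created behind the scenes,
// which in turn creates the single local executor.
val conf = new SparkConf()
  .setMaster("local[2]")          // local mode: driver and executor share one JVM
  .setAppName("local-mode-demo")  // hypothetical app name

val sc = new SparkContext(conf)
// Work submitted here runs on the local executor created by LocalEndpoint.
sc.stop()
```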
Creating a Spark Executor Instance
We can create a Spark executor from the following:
- An executor ID.
- A SparkEnv, through which we can access the local MetricsSystem and BlockManager, as well as the local serializer.
- The executor's hostname.
- A collection of user-defined JARs to add to the tasks' classpath (empty by default).
- A flag indicating whether it runs in local or cluster mode (disabled by default, i.e. cluster mode is preferred).
These inputs map onto the constructor sketched below.
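For orientation, here is the shape of the corresponding constructor, paraphrased from the Spark 2.x sources (org.apache.spark.executor.Executor); treat it as a sketch, since parameter names and order may differ between versions:

```scala
import java.net.URL
import org.apache.spark.SparkEnv

// Paraphrased constructor shape; a sketch, not the exact source.
class Executor(
    val executorId: String,             // the executor ID
    val executorHostname: String,       // the executor's hostname
    val env: SparkEnv,                  // access to MetricsSystem, BlockManager, serializer
    val userClassPath: Seq[URL] = Nil,  // user-defined JARs for the tasks' classpath
    val isLocal: Boolean = false)       // local vs. cluster mode (cluster by default)
```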
Moreover, when creation is successful, the following INFO message pops up in the logs:
INFO Executor: Starting executor ID [executorId] on host [executorHostname]
Heartbeater — Heartbeat Sender Thread
Basically, heartbeater is a daemon ScheduledThreadPoolExecutor with a single thread.
We call this thread pool driver-heartbeater.
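An illustrative equivalent in plain Java concurrency (Spark builds its pool through an internal helper; the interval below is made up, the real one comes from spark.executor.heartbeatInterval):

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}

// Sketch of a single-thread daemon scheduler like the driver-heartbeater.
val heartbeaterFactory = new ThreadFactory {
  override def newThread(r: Runnable): Thread = {
    val t = new Thread(r, "driver-heartbeater")
    t.setDaemon(true)   // daemon thread: does not keep the JVM alive
    t
  }
}
val heartbeater: ScheduledExecutorService =
  Executors.newSingleThreadScheduledExecutor(heartbeaterFactory)

// Periodically send metrics and heartbeats to the driver (interval illustrative).
heartbeater.scheduleAtFixedRate(
  new Runnable { def run(): Unit = println("heartbeat: report metrics to driver") },
  0, 10, TimeUnit.SECONDS)
```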
Launching a Task — launchTask Method
By using this method, we execute the input serializedTask concurrently:
```scala
launchTask(
  context: ExecutorBackend,
  taskId: Long,
  attemptNumber: Int,
  taskName: String,
  serializedTask: ByteBuffer): Unit
```
- Internally, launchTask creates a TaskRunner. Then, with the help of the taskId, it registers the runner in the runningTasks internal registry.
- Afterwards, it executes the runner on the "Executor task launch worker" thread pool (see the sketch below).
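A minimal sketch of that pattern; TaskRunner and ExecutorSketch here are illustrative stand-ins, not Spark's internal classes:

```scala
import java.nio.ByteBuffer
import java.util.concurrent.{ConcurrentHashMap, ExecutorService}

// Stand-in for Spark's internal TaskRunner: a Runnable wrapping one task.
class TaskRunner(val taskId: Long, val taskName: String,
                 val serializedTask: ByteBuffer) extends Runnable {
  override def run(): Unit = {
    // Deserialize and run the task, then report the result to the driver.
  }
}

class ExecutorSketch(threadPool: ExecutorService) {
  // Internal registry of tasks currently running on this executor.
  private val runningTasks = new ConcurrentHashMap[Long, TaskRunner]

  def launchTask(taskId: Long, taskName: String,
                 serializedTask: ByteBuffer): Unit = {
    val runner = new TaskRunner(taskId, taskName, serializedTask)
    runningTasks.put(taskId, runner)  // register under the taskId
    threadPool.execute(runner)        // run on the "Executor task launch worker" pool
  }
}
```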
“Executor Task Launch Worker” Thread Pool — threadPool Property
- Basically, to launch tasks, the executor uses threadPool, a daemon cached thread pool whose threads are named for the task launch worker id.
- Moreover, threadPool is created at the same time as the Spark Executor, and it is shut down when the executor stops (sketched below).
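An illustrative equivalent of such a pool (Spark builds its own via an internal helper; the thread-naming scheme is the point here):

```scala
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.{Executors, ExecutorService, ThreadFactory}

// Sketch of a daemon cached thread pool with named worker threads.
val workerId = new AtomicInteger(0)
val factory = new ThreadFactory {
  override def newThread(r: Runnable): Thread = {
    val t = new Thread(r, s"Executor task launch worker-${workerId.getAndIncrement()}")
    t.setDaemon(true)
    t
  }
}
val threadPool: ExecutorService = Executors.newCachedThreadPool(factory)

// Created together with the executor; shut down when the executor stops:
// threadPool.shutdown()
```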