What is the role of the driver program in a Spark application?
- The driver program is responsible for launching the various parallel operations on the cluster.
- The driver program contains the application's main() function.
- It is the process that runs the user code, which in turn:
  - creates the SparkContext object,
  - creates RDDs,
  - performs transformation and action operations on those RDDs.
- The driver program accesses Apache Spark through a SparkContext object, which represents a connection to the computing cluster (from Spark 2.0 onwards, the SparkContext can be obtained through SparkSession).
- The driver program is responsible for converting the user program into units of physical execution called tasks.
- It also defines distributed datasets on the cluster, to which different operations (transformations and actions) can be applied.
- When the driver program runs, Spark builds a logical plan called a Directed Acyclic Graph (DAG), which the driver converts into a physical execution plan.
- The Spark driver is the program that runs on the master node and declares transformations and actions on RDDs.
- In simple terms, the driver in Spark creates the SparkContext, which connects to a given Spark master.
- The driver also delivers the RDD graphs to the master, where the standalone cluster manager runs.
Where does the Spark driver run on YARN?
- If you submit a job with `--master yarn --deploy-mode client` (or the older `--master yarn-client`), the Spark driver runs on the client machine.
- If you submit a job with `--master yarn --deploy-mode cluster` (or the older `--master yarn-cluster`), the Spark driver runs inside a YARN container on the cluster.
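The two YARN deployment modes above can be shown as `spark-submit` invocations. These are command sketches, not runnable as-is: `com.example.MyApp` and `app.jar` are placeholder names for your application class and JAR.

```shell
# Client mode: the driver runs on the machine where spark-submit is invoked,
# so you see driver logs locally, but the client must stay alive for the job.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  app.jar

# Cluster mode: the driver runs inside a YARN container on the cluster,
# so the client can disconnect after submission.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  app.jar
```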