What is the role of the driver program in a Spark application?
- The driver program is responsible for launching the various parallel operations on the cluster.
- The driver program contains the application's main() function.
- It is the process that runs the user code, which in turn:
  - creates the SparkContext object,
  - creates RDDs,
  - performs transformation and action operations on those RDDs.
- The driver program accesses Apache Spark through a SparkContext object, which represents a connection to the computing cluster (from Spark 2.0 onwards, the SparkContext can be obtained through SparkSession).
- The driver program is responsible for converting the user program into units of physical execution called tasks.
- It also defines distributed datasets on the cluster, to which different operations (transformations and actions) can be applied.
- When the driver program runs, Spark builds a logical plan called a Directed Acyclic Graph (DAG), which the driver converts into a physical execution plan.
- The Spark driver is the program that runs on the master node and declares transformations and actions on RDDs.
- In simple terms, the driver in Spark creates the SparkContext, which connects to a given Spark master.
- The driver also delivers the RDD graphs to the master, where the standalone cluster manager runs.
Where does the Spark driver run on YARN?
- If you submit a job with `--master yarn --deploy-mode client` (or the older `--master yarn-client`), the Spark driver runs on the client machine.
- If you submit a job with `--master yarn --deploy-mode cluster` (or the older `--master yarn-cluster`), the Spark driver runs inside a YARN container on the cluster.
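The two YARN deployment modes above can be shown as `spark-submit` invocations. These are command sketches, not runnable as-is: `com.example.MyApp` and `app.jar` are placeholder names for your application class and JAR.

```shell
# Client mode: the driver runs on the machine where spark-submit is invoked,
# so you see driver logs locally, but the client must stay alive for the job.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  app.jar

# Cluster mode: the driver runs inside a YARN container on the cluster,
# so the client can disconnect after submission.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  app.jar
```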