Tuesday, April 28, 2020

Apache Spark Architecture and Processing in Brief

As we know, Spark runs on a master-slave architecture.
Let's walk through the process step by step:
1. The moment we submit a Spark job in cluster mode, the spark-submit utility interacts with the Resource Manager to start the Application Master (a sample spark-submit command is shown after this list).
2. The Spark driver program runs inside the Application Master container and has no dependency on the client machine; even if we turn off the client machine, the Spark job stays up and running.
3. The Spark driver program then interacts with the Resource Manager to request containers to process the data.
4. The Resource Manager allocates the containers, and the Spark driver program starts executors on all the allocated containers and assigns tasks for them to run (see the driver sketch after this list).
5. Executors interact directly with the Spark driver program; once the tasks finish on each executor, the containers are released along with the tasks, and the output is collected by the Spark driver program.
6. Here the container where the Application Master runs acts as the master node, and the containers where the executor processes run the tasks are called the slave nodes.
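To make step 1 concrete, here is roughly what a cluster-mode submission on YARN looks like. The class name, jar path, and resource numbers below are placeholders for illustration; the flags themselves are standard spark-submit options:

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --class com.example.WordCount \
      --num-executors 4 \
      --executor-cores 2 \
      --executor-memory 4g \
      /path/to/wordcount.jar /input/path /output/path

Because --deploy-mode is cluster, the driver runs inside the Application Master container on the cluster (step 2), so the client machine can disconnect once the job has been accepted.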
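To see steps 3 to 5 in code, here is a minimal word-count driver sketch in Scala. It is an illustrative example, not from any particular application; the comments map each part onto the steps above:

    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        // Runs in the driver, i.e. inside the Application Master container (step 2)
        val spark = SparkSession.builder.appName("WordCount").getOrCreate()
        val sc = spark.sparkContext

        // Transformations are lazy: the driver only builds the execution plan here
        val counts = sc.textFile(args(0))
          .flatMap(line => line.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        // The action triggers execution: the driver requests containers from the
        // Resource Manager, launches executors, and assigns them tasks (steps 3-5)
        counts.saveAsTextFile(args(1))

        // Executors finish and the containers are released (step 5)
        spark.stop()
      }
    }

Note that the heavy lifting (reading splits, mapping, reducing) happens on the executors in the slave containers, while the driver only coordinates the tasks and collects the final result.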
