---
title: "Three biggest traps to avoid while setting Spark executor memory"
description: "Apache Spark is wasting a lot of RAM!"
author: "Bartosz Mikulski"
author_bio: "Principal AI Engineer & MLOps Architect. I bridge the gap between \"it works in a notebook\" and \"it works for 200 million users.\""
author_url: https://mikulskibartosz.name
author_linkedin: https://www.linkedin.com/in/mikulskibartosz/
author_github: https://github.com/mikulskibartosz
canonical_url: https://mikulskibartosz.name/three-biggest-traps-when-setting-spark-executor-memory
---

What happens when you set the executor memory of a Spark application running on a cluster that uses YARN as the resource manager?
Does the executor get exactly the amount of memory you requested? Unfortunately, no.

To make things easy to comprehend, imagine that I have a Spark cluster with a small amount of memory. Let's say I have 16 GB of RAM.

I want to run a Spark job that requires 4 GB of memory. What is going to happen? Will YARN allocate 4 GB for that Spark job?
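
Before answering, here is a minimal sketch of how such a job might request its memory in PySpark (the application name is a placeholder; spark.executor.memory is the only setting that matters here):

```python
from pyspark.sql import SparkSession

# Request 4 GB of memory per executor.
# "memory-trap-demo" is a made-up application name.
spark = (
    SparkSession.builder
    .appName("memory-trap-demo")
    .config("spark.executor.memory", "4g")
    .getOrCreate()
)
```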

## spark.executor.memoryOverhead

First, Spark reads the spark.executor.memoryOverhead parameter and **computes the overhead as a fraction of the requested memory** (by default, 10%, with a minimum of 384 MB).

Spark adds the overhead to the executor memory and, as a consequence, requests **4506 MB** of memory from YARN: 10% of a 4 GB (4096 MB) executor is about 410 MB, which is above the 384 MB minimum, and 4096 MB + 410 MB = 4506 MB.
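
Here is the same calculation as a minimal sketch (the 10% factor and the 384 MB floor are the defaults mentioned above; I round the overhead up, which reproduces the 4506 MB figure):

```python
import math

MEMORY_OVERHEAD_FACTOR = 0.10  # default spark.executor.memoryOverhead: 10%...
MEMORY_OVERHEAD_MIN_MB = 384   # ...but never less than 384 MB

def container_request_mb(executor_memory_mb: int) -> int:
    """Executor memory plus the overhead Spark adds before asking YARN."""
    overhead_mb = max(
        math.ceil(executor_memory_mb * MEMORY_OVERHEAD_FACTOR),
        MEMORY_OVERHEAD_MIN_MB,
    )
    return executor_memory_mb + overhead_mb

print(container_request_mb(4096))  # 4096 + 410 = 4506
```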

## yarn.scheduler.minimum-allocation-mb

The second trap is the yarn.scheduler.minimum-allocation-mb parameter.

I set the minimum allocation to 4 GB, so YARN will always allocate at least 4 GB for every container it creates. That is fine on its own, but **YARN can assign memory only in multiples of the minimum-allocation-mb value**.

If yarn.scheduler.minimum-allocation-mb is set to 4 GB and I have 16 GB of available memory, YARN can allocate 4 GB, 8 GB, 12 GB, or 16 GB.

So what happens when Spark requests 4506 MB of memory? YARN rounds the request up to the next multiple of 4 GB and allocates 8 GB! That is over 3.5 GB of additional memory!
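
The rounding is easy to model. A minimal sketch, assuming YARN grants memory only in multiples of yarn.scheduler.minimum-allocation-mb, as described above:

```python
import math

MIN_ALLOCATION_MB = 4096  # yarn.scheduler.minimum-allocation-mb = 4 GB

def yarn_allocation_mb(requested_mb: int) -> int:
    """Round a container request up to the next multiple of the minimum allocation."""
    units = math.ceil(requested_mb / MIN_ALLOCATION_MB)
    return units * MIN_ALLOCATION_MB

print(yarn_allocation_mb(4506))  # 8192 MB: the 4506 MB request becomes 8 GB
```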

## spark.executor.memory

Why is this a problem? Because **Spark will never use that additional memory!**

The spark.executor.memory parameter becomes the heap size of the Java process running the executor (the JVM's -Xmx setting) and limits its memory usage. **I have allocated 8 GB of memory, but the executor can use only about half of it!** I am wasting a lot of RAM!
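
Putting the two calculations together quantifies the waste for a single executor (a self-contained sketch using the same numbers as above; the 410 MB overhead is 10% of 4096 MB):

```python
import math

MIN_ALLOCATION_MB = 4096   # yarn.scheduler.minimum-allocation-mb
EXECUTOR_MEMORY_MB = 4096  # spark.executor.memory = 4 GB
OVERHEAD_MB = 410          # 10% of 4096 MB, above the 384 MB floor

requested_mb = EXECUTOR_MEMORY_MB + OVERHEAD_MB  # 4506 MB
allocated_mb = math.ceil(requested_mb / MIN_ALLOCATION_MB) * MIN_ALLOCATION_MB

# The executor JVM heap is capped at EXECUTOR_MEMORY_MB, so everything
# above the 4506 MB request sits in the container unused.
print(f"allocated: {allocated_mb} MB, never used: {allocated_mb - requested_mb} MB")
# allocated: 8192 MB, never used: 3686 MB
```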