---
title: "How to add dependencies as jar files or Python scripts to PySpark"
description: "How to add a jar file or a Python file as a Pyspark dependency"
author: "Bartosz Mikulski"
author_bio: "Principal AI Engineer & MLOps Architect. I bridge the gap between \"it works in a notebook\" and \"it works for 200 million users.\""
author_url: https://mikulskibartosz.name
author_linkedin: https://www.linkedin.com/in/mikulskibartosz/
author_github: https://github.com/mikulskibartosz
canonical_url: https://mikulskibartosz.name/jar-python-dependencies-in-pyspark
---

When we want to use external dependencies in PySpark code, we have two options: we can pass them either as jar files or as Python scripts.

In this article, I will show how to do that when running a PySpark job using AWS EMR. The jar and Python files will be stored on S3 in a location accessible from the EMR cluster (remember to grant the cluster's IAM role read access to that bucket).

First, we have to add the `--jars` and `--py-files` parameters to the `spark-submit` command while starting a new PySpark job:

```bash
spark-submit --deploy-mode cluster \
    --jars s3://some_bucket/java_code.jar \
    --py-files s3://some_bucket/python_code.py \
    s3://some_bucket/pyspark_job.py
```
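Besides single `.py` files, `--py-files` also accepts `.zip` and `.egg` archives, which is handy when the Python dependency is a whole package rather than one script. Below is a minimal sketch of building such an archive with the standard library; the helper name and the `python_code` package layout are just an illustration, not part of the original setup:

```python
import zipfile
from pathlib import Path


def build_py_files_archive(package_dir: str, archive_path: str) -> str:
    """Bundle every .py file under package_dir into a zip usable with --py-files."""
    root = Path(package_dir)
    with zipfile.ZipFile(archive_path, "w") as zf:
        for py_file in root.rglob("*.py"):
            # Store paths relative to the package's parent directory so the
            # package stays importable after Spark adds the zip to sys.path.
            zf.write(py_file, py_file.relative_to(root.parent))
    return archive_path
```

After uploading the resulting archive to S3, we can pass it the same way as a single file, e.g. `--py-files s3://some_bucket/python_code.zip`.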

In the `pyspark_job.py` file, I can import the code from the file passed via `--py-files` just like any other dependency. Since `python_code.py` is a single module, I import names from it directly:

```python
from python_code import something
```
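The jar file, in turn, ends up on the JVM classpath rather than in the Python path, so it is not imported; it is used by Spark's JVM side (for example, a data source or UDF library). If we need to call a class from the jar directly, we can reach it through the Py4J gateway. A sketch under the assumption that the jar exposes a class named `com.example.SomeClass` (a made-up name for illustration; `_jvm` is an internal attribute, so use it with care):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Reach into the JVM via the Py4J gateway to instantiate a class
# shipped in the jar. The class name here is hypothetical.
instance = spark.sparkContext._jvm.com.example.SomeClass()
```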

