---
title: "Copy directories in S3 using s3-dist-cp"
description: "How to copy files in S3 and preserve the directory structure"
author: "Bartosz Mikulski"
author_bio: "Principal AI Engineer & MLOps Architect. I bridge the gap between \"it works in a notebook\" and \"it works for 200 million users.\""
author_url: https://mikulskibartosz.name
author_linkedin: https://www.linkedin.com/in/mikulskibartosz/
author_github: https://github.com/mikulskibartosz
canonical_url: https://mikulskibartosz.name/copy-directories-in-s3-using-s3-dist-cp
---

S3 has no concept of directories, but that does not stop us from using `/` as a delimiter in object keys and thinking of objects that share a key prefix as files in the same directory.

This causes a problem when we want to copy the contents of one "directory" into another. We cannot simply put the objects in a different location; we have to preserve the part of each object key that follows the source prefix.

On a regular file system, when we have these files:

```
/home/user/the_directory/file_A
/home/user/the_directory/file_B
/home/user/the_directory/file_C
/home/user/the_directory/file_D
```

and we want to copy them to the home directory of user `another_user`, the expected result looks like this:

```
/home/another_user/the_directory/file_A
/home/another_user/the_directory/file_B
/home/another_user/the_directory/file_C
/home/another_user/the_directory/file_D
```
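For comparison, on a local file system this is a single recursive `cp`. The sketch below recreates the example layout under a temporary directory (the temporary root is an assumption made so the script is safe to run and does not touch the real `/home`) and copies `the_directory` into the other user's home directory, which keeps the relative path `the_directory/file_X` intact:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a throwaway copy of the example layout under a temporary root
# instead of the real /home (an assumption so the sketch is safe to run).
root="$(mktemp -d)"
trap 'rm -rf "$root"' EXIT

mkdir -p "$root/home/user/the_directory" "$root/home/another_user"
for f in file_A file_B file_C file_D; do
  touch "$root/home/user/the_directory/$f"
done

# Copying the directory recursively keeps its name and contents,
# so every file ends up at home/another_user/the_directory/<name>.
cp -r "$root/home/user/the_directory" "$root/home/another_user/"

# List the copied files relative to the temporary root.
(cd "$root" && find home/another_user -type f | sort)
# prints:
# home/another_user/the_directory/file_A
# home/another_user/the_directory/file_B
# home/another_user/the_directory/file_C
# home/another_user/the_directory/file_D
```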

How do we achieve the same outcome in S3?

We need two things:

* a running EMR cluster
* the `s3-dist-cp` script, which is available on all EMR clusters

Let's assume that our S3 keys follow the same structure as the directories above. For example, one of the objects is stored at `s3://home/user/the_directory/file_A`.
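To make the "no directories" point concrete: an S3 URI consists only of a bucket name and a flat object key, and the `/` characters after the bucket are just part of the key. The sketch below splits the example URI with Bash parameter expansion; the helper name `split_s3_uri` is hypothetical and not part of any AWS tool:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: split an s3:// URI into its bucket and object key.
# S3 stores only the flat key; "user/the_directory/" is a key prefix,
# not a real directory.
split_s3_uri() {
  local uri="${1#s3://}"      # drop the scheme
  local bucket="${uri%%/*}"   # everything before the first "/"
  local key="${uri#*/}"       # everything after the first "/"
  printf 'bucket=%s key=%s\n' "$bucket" "$key"
}

split_s3_uri "s3://home/user/the_directory/file_A"
# prints: bucket=home key=user/the_directory/file_A
```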

First, we have to SSH into the master node of the cluster.

After that, we run the `s3-dist-cp` command, passing the source prefix as `--src` and the target prefix as `--dest`. The script automatically preserves the rest of each object key:

```bash
s3-dist-cp --src=s3://home/user --dest=s3://home/another_user
```
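The key mapping that the command produces can be illustrated with plain string manipulation: strip the source prefix from each key and append the remainder to the destination prefix. This is only a sketch of the expected result under that assumption, not how `s3-dist-cp` is implemented, and the helper `map_key` is hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustration only: swap the source prefix for the destination prefix.
src="s3://home/user"
dest="s3://home/another_user"

map_key() {
  local object="$1"
  # Remove the literal source prefix, keep the remainder of the key,
  # and append it to the destination prefix.
  printf '%s%s\n' "$dest" "${object#"$src"}"
}

for f in file_A file_B file_C file_D; do
  map_key "$src/the_directory/$f"
done
# prints:
# s3://home/another_user/the_directory/file_A
# s3://home/another_user/the_directory/file_B
# s3://home/another_user/the_directory/file_C
# s3://home/another_user/the_directory/file_D
```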

