---
title: "How to populate a PostgreSQL (RDS) database with data from CSV files stored in AWS S3"
description: "How to upload S3 data into RDS tables"
author: "Bartosz Mikulski"
author_bio: "Principal AI Engineer & MLOps Architect. I bridge the gap between \"it works in a notebook\" and \"it works for 200 million users.\""
author_url: https://mikulskibartosz.name
author_linkedin: https://www.linkedin.com/in/mikulskibartosz/
author_github: https://github.com/mikulskibartosz
canonical_url: https://mikulskibartosz.name/populate-postgresql-rds-with-data-from-s3
---

Today, I am going to show you how to import data from Amazon S3 into a PostgreSQL database running as an Amazon RDS service.

The first thing we have to do is install the `aws_s3` extension in PostgreSQL. To do this, we have to log in as an administrator and run the following statement:

```sql
CREATE EXTENSION aws_s3 CASCADE;
```
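The `CASCADE` option also installs the `aws_commons` extension, which `aws_s3` depends on. If you want to confirm that both extensions are available, you can query the `pg_extension` catalog:

```sql
-- List the installed extensions; both aws_s3 and aws_commons should appear
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('aws_s3', 'aws_commons');
```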

The RDS service needs read-only access to the S3 bucket we want to import from, as well as the ability to use the `s3Import` feature. Let's use Terraform to define the policies and roles:

```hcl
resource "aws_iam_policy" "s3-access" {
    name        = "s3-access"
    description = "A policy for S3 access."
    policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "s3import",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name",
                "arn:aws:s3:::bucket-name/*"
            ]
        }
    ]
}
EOF
}
```

Now we have to allow the RDS service to assume the role associated with the S3 access policy:

```hcl
resource "aws_iam_role" "rds-assume-role" {
    name = "rds"
    assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "rds.amazonaws.com"
      },
      "Effect": "Allow"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy_attachment" "rds-to-s3" {
    role       = "${aws_iam_role.rds-assume-role.name}"
    policy_arn = "${aws_iam_policy.s3-access.arn}"
}
```

There is one more thing we need. We have to associate the role with our RDS instance and enable the `s3Import` feature, so the instance can import data from S3:

```hcl
resource "aws_db_instance_role_association" "rds_s3import_role" {
  db_instance_identifier = "${aws_db_instance.rds-database.id}"
  feature_name = "s3Import"
  role_arn = "${aws_iam_role.rds-assume-role.arn}"
}
```

Lastly, we have to add two egress rules to the security group used by the RDS database. The database needs to be able to make HTTP and HTTPS calls to communicate with S3:

```hcl
# Add those to the security group used by the RDS database
  egress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    cidr_blocks     = ["0.0.0.0/0"]
  }

  egress {
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    cidr_blocks     = ["0.0.0.0/0"]
  }
```

## Importing the data

Finally, we can use the `aws_s3` extension to import the CSV files into the database tables:

```sql
SELECT aws_s3.table_import_from_s3(
    'schema.table_name',
    '',
    '(FORMAT CSV)',
    aws_commons.create_s3_uri('bucket-name', 'file_key', 'aws-region-id')
);
```

Obviously, before running that command, we have to create the table with a schema that matches the CSV file.
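For instance, if the CSV file contained user records with three columns, the matching table could look like this (the table name, column names, and types below are hypothetical; adjust them to your file):

```sql
-- A table whose columns match the CSV file, in the same order
CREATE TABLE schema.table_name (
    id integer,
    name text,
    created_at timestamp
);
```

The second argument of `table_import_from_s3` is an optional column list; an empty string means the CSV columns map to all table columns in order.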