Today, I am going to show you how to import data from Amazon S3 into a PostgreSQL database running as an Amazon RDS service.
The first thing we have to do is install the aws_s3 extension in PostgreSQL. To do this, log in as an administrator and run the following statement:
CREATE EXTENSION aws_s3 CASCADE;
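To confirm the installation succeeded, we can query the pg_extension catalog. The CASCADE option should have installed the aws_commons dependency as well, so both names should appear:

```sql
-- List the installed extensions; both aws_s3 and aws_commons should show up
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('aws_s3', 'aws_commons');
```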
The RDS instance needs read-only access to the S3 bucket we want to import from, plus permission to use the s3Import feature. Let's use Terraform to define the policies and roles:
resource "aws_iam_policy" "s3-access" {
  name        = "s3-access"
  description = "A policy for S3 access."

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "s3import",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
      ]
    }
  ]
}
EOF
}
Now we have to allow the RDS service to assume the role associated with the S3 access policy:
resource "aws_iam_role" "rds-assume-role" {
  name = "rds"

  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "rds.amazonaws.com"
      },
      "Effect": "Allow"
    }
  ]
}
EOF
}
resource "aws_iam_role_policy_attachment" "rds-to-s3" {
  role       = aws_iam_role.rds-assume-role.name
  policy_arn = aws_iam_policy.s3-access.arn
}
There is one more thing we need. We have to allow our RDS instance to import data from S3:
resource "aws_db_instance_role_association" "rds_s3import_role" {
  db_instance_identifier = aws_db_instance.rds-database.id
  feature_name           = "s3Import"
  role_arn               = aws_iam_role.rds-assume-role.arn
}
Finally, we have to add two rules to the security group used by the RDS database. The database needs to be able to make outbound HTTP and HTTPS calls to communicate with S3:
# Add those to the security group used by the RDS database
egress {
  from_port   = 80
  to_port     = 80
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"]
}

egress {
  from_port   = 443
  to_port     = 443
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"]
}
Importing the data
Finally, we can use the aws_s3 extension to import a CSV file into a database table. The function takes the target table, an optional column list, COPY options, and then the bucket name, object key, and AWS region as separate arguments:

SELECT aws_s3.table_import_from_s3('schema.table_name', '', '(FORMAT CSV)', 'bucket-name', 'file_key', 'aws-region-id');
Obviously, before running that command, we have to create the table with a schema that matches the CSV file.
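As a sketch, assuming a CSV file with three columns (the table, column, and bucket names below are all hypothetical), the setup and import might look like this:

```sql
-- Hypothetical table whose columns match the CSV file, in order
CREATE TABLE my_schema.users (
    id         integer,
    name       text,
    created_at timestamp
);

-- Import users.csv from the bucket into the table
SELECT aws_s3.table_import_from_s3(
    'my_schema.users',   -- target table
    '',                  -- empty column list = all columns, in file order
    '(FORMAT CSV)',      -- options passed through to PostgreSQL COPY
    'bucket-name',       -- S3 bucket
    'users.csv',         -- object key within the bucket
    'eu-west-1'          -- region of the bucket (an assumption)
);
```

On success, the function returns a summary line reporting how many rows were imported.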