In the Terraform configuration, we can use the `core_instance_group` to define either On-Demand or Spot core instances. When we set the `bid_price`, we get Spot instances. When there is no `bid_price`, we get regular On-Demand core instances.
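For illustration, a Spot-backed `core_instance_group` block inside `aws_emr_cluster` could look roughly like the sketch below. The `"0.10"` price is a hypothetical placeholder (USD per instance-hour), not a recommendation.

```hcl
core_instance_group {
  instance_type  = "m5.xlarge"
  instance_count = 2
  # Hypothetical maximum price in USD per instance-hour.
  # Setting bid_price turns these core nodes into Spot instances;
  # omitting it gives On-Demand instances.
  bid_price      = "0.10"
}
```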
What do we do when we want both On-Demand core instances and Spot instances in the same cluster? In Terraform, we cannot have two `core_instance_group` blocks in the same `aws_emr_cluster` resource (maybe that will change in a future version of the provider).
We can solve that problem by defining the core instances in the `core_instance_group` block:
```hcl
resource "aws_emr_cluster" "emr_name" {
  name                              = "emr_name"
  release_label                     = "emr-5.29.0"
  applications                      = ["Spark"]
  service_role                      = "EMR_ROLE"
  termination_protection            = false
  keep_job_flow_alive_when_no_steps = true
  log_uri                           = "s3n://logs_bucket/"

  master_instance_group {
    instance_type = "m5.xlarge"
  }

  core_instance_group {
    instance_type  = "m5.xlarge"
    instance_count = 2

    ebs_config {
      size                 = 64
      type                 = "gp2"
      volumes_per_instance = 1
    }
  }

  ec2_attributes {
    instance_profile = "EC2_ROLE"
    key_name         = "ssh_key_name"
    subnet_id        = aws_subnet.subnet.id
  }

  configurations_json = <<EOF
[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
EOF
}
```
This configuration gives us an EMR cluster with two core instances.
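The cluster definition references `aws_subnet.subnet`, which is not shown. A minimal, hypothetical network for it could look like this; the resource names and CIDR ranges are placeholders, and a real cluster also needs a route to the internet (or VPC endpoints) so the nodes can reach S3 and the EMR service.

```hcl
# Hypothetical VPC for the EMR cluster
resource "aws_vpc" "vpc" {
  cidr_block = "10.0.0.0/16"

  # EMR expects DNS resolution and DNS hostnames to be enabled in the VPC
  enable_dns_support   = true
  enable_dns_hostnames = true
}

# Hypothetical subnet referenced by ec2_attributes.subnet_id
resource "aws_subnet" "subnet" {
  vpc_id     = aws_vpc.vpc.id
  cidr_block = "10.0.1.0/24"
}
```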
Now, we can add Spot instances using a separate `aws_emr_instance_group` resource, which attaches an additional (task) instance group to the existing cluster:
```hcl
resource "aws_emr_instance_group" "emr_name_spot" {
  cluster_id     = aws_emr_cluster.emr_name.id
  instance_type  = "m5.2xlarge"
  instance_count = 3
  bid_price      = ""

  ebs_config {
    size                 = 128
    type                 = "gp2"
    volumes_per_instance = 1
  }
}
```
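If we put a value in `bid_price`, it is used as the maximum price (in USD per instance-hour) we are willing to pay for the Spot instances. Be careful with an empty `bid_price`: the AWS provider documentation describes a blank `bid_price` as a request for On-Demand instances, so an empty string does not by itself give us Spot capacity. To get Spot instances, set `bid_price` to a price.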
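A variant of the instance group above that explicitly requests Spot capacity might look like this, used in place of the `emr_name_spot` resource shown earlier. The `"0.25"` maximum price and the `name` value are hypothetical placeholders; pick a price based on the current Spot and On-Demand prices for `m5.2xlarge` in your region.

```hcl
resource "aws_emr_instance_group" "emr_name_spot" {
  # Hypothetical name for the Spot instance group
  name           = "spot-group"
  cluster_id     = aws_emr_cluster.emr_name.id
  instance_type  = "m5.2xlarge"
  instance_count = 3
  # Hypothetical maximum price in USD per instance-hour;
  # a non-empty value declares the group as Spot instances
  bid_price      = "0.25"

  ebs_config {
    size                 = 128
    type                 = "gp2"
    volumes_per_instance = 1
  }
}
```

With an explicit price, the core nodes stay On-Demand while the additional group runs on Spot instances, which is the mix the section is after.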