In the Terraform configuration, we can use the `core_instance_group` block to define core nodes that are either On-Demand or spot instances. When we set `bid_price`, we get spot instances. When there is no `bid_price`, we get On-Demand instances.
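As a sketch, a `core_instance_group` block like the one below requests the core nodes as spot instances. It is a fragment that goes inside an `aws_emr_cluster` resource, and the `"0.10"` price is an assumed example value. Removing the `bid_price` line turns the core nodes back into On-Demand instances.

```hcl
# Fragment: belongs inside an aws_emr_cluster resource.
core_instance_group {
  instance_type  = "m5.xlarge"
  instance_count = 2

  # Setting bid_price makes the core nodes spot instances.
  # "0.10" (USD per instance-hour) is an assumed example value.
  # Remove this line to get On-Demand core nodes instead.
  bid_price = "0.10"
}
```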
What do we do when we want both On-Demand and spot instances in the same cluster? In Terraform, we cannot have two `core_instance_group` blocks in the same `aws_emr_cluster` resource (maybe this will change in a future update).
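For illustration, this is a sketch of the configuration we would like to write but cannot. The provider's schema allows at most one `core_instance_group` block per cluster, so Terraform should reject it during validation. The `bid_price` value is an assumed example.

```hcl
# Does NOT work: only one core_instance_group block is allowed per cluster.
resource "aws_emr_cluster" "emr_name" {
  # ... other cluster arguments ...

  core_instance_group { # On-Demand core nodes
    instance_type  = "m5.xlarge"
    instance_count = 2
  }

  core_instance_group { # spot core nodes (rejected)
    instance_type  = "m5.2xlarge"
    instance_count = 3
    bid_price      = "0.20"
  }
}
```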
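We can solve that problem by defining the On-Demand instances in the `core_instance_group` block and adding the spot instances as a separate instance group. First, the cluster with its On-Demand core nodes: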
```hcl
resource "aws_emr_cluster" "emr_name" {
  name                              = "emr_name"
  release_label                     = "emr-5.29.0"
  applications                      = ["Spark"]
  service_role                      = "EMR_ROLE"
  termination_protection            = false
  keep_job_flow_alive_when_no_steps = true
  log_uri                           = "s3n://logs_bucket/"

  master_instance_group {
    instance_type = "m5.xlarge"
  }

  # No bid_price, so these two core nodes are On-Demand instances.
  core_instance_group {
    instance_type  = "m5.xlarge"
    instance_count = 2

    ebs_config {
      size                 = 64
      type                 = "gp2"
      volumes_per_instance = 1
    }
  }

  ec2_attributes {
    instance_profile = "EC2_ROLE"
    key_name         = "ssh_key_name"
    subnet_id        = aws_subnet.subnet.id
  }

  # Use the AWS Glue Data Catalog as the Hive metastore for Spark.
  configurations_json = <<EOF
[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
EOF
}
```
This configuration gives us an EMR cluster with one master node and two On-Demand core instances.
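Now, we can add spot instances using a separate `aws_emr_instance_group` resource. Note that this resource creates a task instance group, so the spot instances join the cluster as task nodes rather than core nodes: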
```hcl
resource "aws_emr_instance_group" "emr_name_spot" {
  cluster_id     = aws_emr_cluster.emr_name.id
  instance_type  = "m5.2xlarge"
  instance_count = 3
  bid_price      = ""

  ebs_config {
    size                 = 128
    type                 = "gp2"
    volumes_per_instance = 1
  }
}
```
If we put a value in `bid_price`, it is used as the maximum price we are willing to pay per instance-hour for the spot instances. When `bid_price` is empty, the spot instances use the On-Demand price as their maximum price.
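If you prefer not to rely on how an empty string is interpreted, here is a sketch of the same resource with an explicit maximum price. It is meant to replace the `emr_name_spot` definition above, not to sit next to it. The group `name` and the `"0.20"` price are assumed example values.

```hcl
resource "aws_emr_instance_group" "emr_name_spot" {
  name           = "spot-task-group" # hypothetical, human-friendly group name
  cluster_id     = aws_emr_cluster.emr_name.id
  instance_type  = "m5.2xlarge"
  instance_count = 3

  # Explicit maximum price in USD per instance-hour (assumed value).
  bid_price = "0.20"

  ebs_config {
    size                 = 128
    type                 = "gp2"
    volumes_per_instance = 1
  }
}
```

The trade-off: a lower cap saves money, but it also makes it more likely that the instances are not provisioned, or are reclaimed, when the spot price rises above it.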