In a Terraform configuration, the core_instance_group block of an aws_emr_cluster can define either regular (On-Demand) core instances or spot instances, but not both. When we set the bid_price, we get spot instances. When there is no bid_price, we get regular core instances.
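
For illustration, here is a minimal sketch of a cluster whose core group runs on spot instances. The resource name, the role names, and the price are placeholders (the "0.10" figure in particular is a hypothetical value, not a recommendation), and the subnet reference assumes an aws_subnet.subnet resource defined elsewhere:

```hcl
# Sketch only: names, roles, and the price below are placeholders.
resource "aws_emr_cluster" "spot_core_example" {
  name          = "spot_core_example"
  release_label = "emr-5.29.0"
  applications  = ["Spark"]
  service_role  = "EMR_ROLE"

  master_instance_group {
    instance_type = "m5.xlarge"
  }

  core_instance_group {
    instance_type  = "m5.xlarge"
    instance_count = 2

    # Setting bid_price declares the group as spot instances.
    # The value is the maximum price per instance, in USD (hypothetical figure).
    bid_price = "0.10"
  }

  ec2_attributes {
    instance_profile = "EC2_ROLE"
    subnet_id        = aws_subnet.subnet.id # assumed to be defined elsewhere
  }
}
```

Removing the bid_price line from this sketch turns the same group back into regular core instances.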

What do we do when we want both core and spot instances in the same cluster? In Terraform, we cannot have two core_instance_group blocks in the same aws_emr_cluster resource (maybe it will be changed in a future update).

We can solve that problem by defining the core instances in the core_instance_group:

resource "aws_emr_cluster" "emr_name" {
  name = "emr_name"
  release_label = "emr-5.29.0"
  applications = ["Spark"]
  service_role = "EMR_ROLE"
  termination_protection = false
  keep_job_flow_alive_when_no_steps = true

  log_uri = "s3n://logs_bucket/"

  master_instance_group {
    instance_type = "m5.xlarge"
  }

  core_instance_group {
    instance_type = "m5.xlarge"
    instance_count = 2

    ebs_config {
      size = 64
      type = "gp2"
      volumes_per_instance = 1
    }
  }

  ec2_attributes {
    instance_profile = "EC2_ROLE"
    key_name = "ssh_key_name"
    subnet_id = aws_subnet.subnet.id
  }

  configurations_json = <<EOF
[
    {
        "Classification": "spark-hive-site",
        "Properties": {
            "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
        }
    }
]
EOF
}

This configuration gives us an EMR cluster with two core instances.

Now, we can add spot instances using a separate aws_emr_instance_group resource, which attaches an additional task instance group to the cluster:

resource "aws_emr_instance_group" "emr_name_spot" {
  cluster_id = aws_emr_cluster.emr_name.id
  instance_type = "m5.2xlarge"
  instance_count = 3

  bid_price = ""

  ebs_config {
    size = 128
    type = "gp2"
    volumes_per_instance = 1
  }
}

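The bid_price is the maximum price, in USD, that we want to pay for each spot instance. Note that this example leaves it empty. The AWS provider documentation describes a blank bid_price as a request for On-Demand instances, so put an explicit value there to make sure the group runs on spot capacity.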
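
As a sketch of the explicit-price variant, the instance group below sets a maximum spot price. The resource name, the group name, and the "0.20" price are hypothetical placeholders; the cluster reference assumes the aws_emr_cluster.emr_name resource defined earlier:

```hcl
# Sketch only: the names and the price are placeholders.
resource "aws_emr_instance_group" "emr_name_spot_capped" {
  name           = "spot-task-group"
  cluster_id     = aws_emr_cluster.emr_name.id
  instance_type  = "m5.2xlarge"
  instance_count = 3

  # Maximum price per instance, in USD, passed as a string (hypothetical figure).
  bid_price = "0.20"

  ebs_config {
    size                 = 128
    type                 = "gp2"
    volumes_per_instance = 1
  }
}
```

Because this group is separate from the core_instance_group, losing spot capacity removes only these extra instances while the two core instances keep running.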
