How to deploy a Tensorflow model using Sagemaker Endpoints and AWS Code Pipeline

In this article, I’ll show you how to build a Docker image that serves a Tensorflow model using Tensorflow Serving and how to deploy that image as a Sagemaker Endpoint. I’ll run all of the steps as an AWS Code Pipeline.

To simplify the example, I will include only the relevant part of the pipeline configuration code. If you need an example of the entire pipeline configuration file, I suggest looking at the AWS MLOps Workshop files. I created all of the code in this article using the AWS MLOps Workshop and the “Bring your own Tensorflow model to Sagemaker” tutorial as an example.

Assumptions

In this example, I assume that a data scientist has trained an ML model using Tensorflow and stored the files on S3. I will also not include any code to expose the Sagemaker Endpoint as a REST API. You can copy such code from the workshop mentioned above. I’ll also assume that you have already defined an ECR repository (we will need it to store the Docker image).

Building a Docker image

First, we have to define an AWS Code Pipeline step to download the model from S3, build a Docker image, and push it to a Docker repository.

ModelBuildProject:
    Type: AWS::CodeBuild::Project
    Properties:
      Name: !Sub ${AWS::StackName}-pipeline-modelbuild
      Description: Builds a Docker container with the model
      ServiceRole: !GetAtt CodeDeploymentRole.Arn
      Artifacts:
        Type: CODEPIPELINE
      Environment:
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_SMALL
        Image: aws/codebuild/python:3.6.5
        PrivilegedMode: true
      Source:
        Type: CODEPIPELINE
        BuildSpec: !Sub |
          version: 0.2
          phases:
            pre_build:
              commands:
                - nohup /usr/local/bin/dockerd --host=unix:///var/run/docker.sock --host=tcp://127.0.0.1:2375 --storage-driver=overlay2 &
                - timeout 15 sh -c "until docker info; do echo .; sleep 1; done"
            build:
              commands:
                - bash download_model.sh
                - bash build_image.sh
            post_build:
              commands:
                - echo "Deployed"
          artifacts:
            files:
              - '**/*'
      TimeoutInMinutes: 30


Pipeline:
    Type: AWS::CodePipeline::Pipeline
    Properties:
      ArtifactStore:
        Location: !Ref "ArtifactStoreBucket"
        Type: S3
      DisableInboundStageTransitions: []
      Name: !Ref "AWS::StackName"
      RoleArn: !GetAtt [PipelineRole, Arn]
      Stages:
        - Name: HERE YOU SHOULD PUT THE STEP THAT DOWNLOADS CODE FROM THE REPOSITORY
        ...
        - Name: Build_Model_Image
          Actions:
            - Name: BuildModelImage
              ActionTypeId:
                Category: Build
                Owner: AWS
                Provider: CodeBuild
                Version: "1"
              Configuration:
                ProjectName: !Ref "ModelBuildProject"
              InputArtifacts:
                - Name: src
              OutputArtifacts:
                - Name: bld
              RunOrder: "2"

The code above starts the Docker process on the machine provided by AWS and runs two scripts: download_model.sh and build_image.sh.

To build and upload the image, you will need a lot of permissions:

CodeDeploymentRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub ${AWS::StackName}-codedeploy-role
      AssumeRolePolicyDocument:
        Statement:
          - Action: ["sts:AssumeRole"]
            Effect: Allow
            Principal:
              Service: [codebuild.amazonaws.com]
        Version: "2012-10-17"
      Path: /
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
      Policies:
        - PolicyName: UploadAccess
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Action:
                  - codepipeline:*
                  - sagemaker:*
                  - s3:*
                  - logs:*
                Effect: Allow
                Resource: "*"
              - Action:
                  - ecr:InitiateLayerUpload
                  - ecr:UploadLayerPart
                  - ecr:CompleteLayerUpload
                  - ecr:PutImage
                Effect: Allow
                Resource: "PUT THE REPOSITORY ARN HERE!!!"
              - Action:
                  - iam:PassRole
                Effect: Allow
                Resource: !Sub arn:aws:iam::${AWS::AccountId}:role/${AWS::StackName}-sagemaker-role

Downloading the model from S3

Now, we can take a look at the download_model.sh script. All we do here is use aws s3 cp to copy the model files from S3:

#!/bin/bash

aws s3 cp s3://bucket/model_location/ code/model/1 --recursive
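
The script copies the files into code/model/1 because Tensorflow Serving looks for numbered version directories under the model base path, each containing a saved_model.pb file. As a minimal sketch, a check like the one below could run before the image build to catch an empty or misplaced download. The function name find_model_versions is hypothetical and not part of the original pipeline:

```python
from pathlib import Path


def find_model_versions(model_base_path):
    """Return the numeric version directories that contain a SavedModel file."""
    base = Path(model_base_path)
    if not base.is_dir():
        return []
    versions = []
    for child in base.iterdir():
        # Tensorflow Serving only considers subdirectories named with an integer version.
        if not (child.is_dir() and child.name.isdecimal()):
            continue
        # A servable version must contain the serialized SavedModel itself.
        if (child / "saved_model.pb").is_file():
            versions.append(int(child.name))
    return sorted(versions)
```

Calling find_model_versions("code/model") after download_model.sh should return a non-empty list; an empty list suggests that the S3 path or the target directory is wrong.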

Building a Docker image

Building a Docker image requires four files. Let’s take a look at the Dockerfile first. In this file, we install Tensorflow Serving and nginx. We use nginx to expose the REST API because every Docker image used as a Sagemaker Endpoint must support two HTTP endpoints: /ping and /invocations.

FROM tensorflow/tensorflow:1.8.0-py3
# If you hit Docker rate limit, push the base image to your private ECR registry and use it here instead of public Docker image

RUN apt-get update && apt-get install -y --no-install-recommends nginx curl
RUN echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list
RUN curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
RUN apt-get update && apt-get install -y tensorflow-model-server

ENV PATH="/opt/ml/code:${PATH}"

COPY /code /opt/ml/code
WORKDIR /opt/ml/code

In the nginx configuration, we define the endpoints mentioned above and configure the server to listen on TCP port 8080. The /ping endpoint must return HTTP 200, and the /invocations endpoint forwards the calls to Tensorflow Serving:

# put this in code/nginx.conf file
events {
    worker_connections 2048;
}

http {
  server {
    listen 8080 deferred;

    location /invocations {
      proxy_pass http://localhost:8501/v1/models/saved_model:predict;
    }

    location /ping {
      return 200 "OK";
    }
  }
}
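
Before pushing the image, it can help to exercise both endpoints against a container running locally. The sketch below is not part of the original pipeline. It assumes the container was started with port 8080 published (for example, docker run -p 8080:8080 mlops-deployments serve) and that the model accepts the standard Tensorflow Serving "instances" payload. The helper names are hypothetical:

```python
import json
import urllib.request


def build_invocation_request(base_url, instances):
    """Build the POST request that nginx forwards to Tensorflow Serving."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ping(base_url, timeout=5):
    """Return True when GET /ping answers with HTTP 200."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    # so an unhealthy container surfaces as an exception.
    with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as response:
        return response.status == 200


def invoke(base_url, instances, timeout=30):
    """Send instances to /invocations and return the decoded JSON response."""
    request = build_invocation_request(base_url, instances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, ping("http://localhost:8080") checks the health endpoint, and invoke("http://localhost:8080", instances) sends a prediction request, where instances has the shape your model expects.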

A Sagemaker Endpoint starts the server using the docker run [image name] serve command, so we must implement the serve script. In the serve script, we start the nginx server and Tensorflow Serving:

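#!/usr/bin/env python
# put this code in the code/serve file (the shebang must stay on the first line)
import subprocess

def start_server():
    # Send the nginx logs to stdout/stderr so they show up in the container logs
    subprocess.check_call(['ln', '-sf', '/dev/stdout', '/var/log/nginx/access.log'])
    subprocess.check_call(['ln', '-sf', '/dev/stderr', '/var/log/nginx/error.log'])

    # Start nginx without waiting for it to finish
    nginx = subprocess.Popen(['nginx', '-c', '/opt/ml/code/nginx.conf'])

    # subprocess.call blocks until Tensorflow Serving exits, which keeps the container running
    tf_model_server = subprocess.call(['tensorflow_model_server',
                                       '--rest_api_port=8501',
                                       '--model_name=saved_model',
                                       '--model_base_path=/opt/ml/code/model'])

if __name__ == '__main__':
    start_server()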

In the build_image.sh file, we build the Docker image, log in to ECR, and push the image to the registry:

#!/bin/bash

algorithm_name=mlops-deployments

chmod +x code/serve

account=$(aws sts get-caller-identity --query Account --output text)

region=$(aws configure get region)
region=${region:-eu-central-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

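# aws ecr get-login exists only in AWS CLI v1. With AWS CLI v2, use instead:
# aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${account}.dkr.ecr.${region}.amazonaws.com
$(aws ecr get-login --region ${region} --no-include-email)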

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Deploying a Sagemaker endpoint

The deployment step in AWS Code Pipeline requires the following configuration:

ModelDeploymentProject:
    Type: AWS::CodeBuild::Project
    Properties:
      Name: !Sub ${AWS::StackName}-pipeline-modeldeployment
      Description: Deploys a model as a Sagemaker Endpoint
      ServiceRole: !GetAtt CodeDeploymentRole.Arn
      Artifacts:
        Type: CODEPIPELINE
      Environment:
        Type: LINUX_CONTAINER
        ComputeType: BUILD_GENERAL1_SMALL
        Image: aws/codebuild/python:3.6.5
      Source:
        Type: CODEPIPELINE
        BuildSpec: !Sub |
          version: 0.2
          phases:
            pre_build:
              commands:
                - echo "Installing requirements"
                - pip install --upgrade pip
                - pip install -r deploy/requirements.txt
            build:
              commands:
                - echo "Running deployment.py"
                - cd deploy
                - python deployment.py
            post_build:
              commands:
                - echo "Deployed!"
          artifacts:
            files:
              - '**/*'
      TimeoutInMinutes: 30

# In the Stages part:
        - Name: Deploy_Model
          Actions:
            - Name: ModelDeployment
              ActionTypeId:
                Category: Build
                Owner: AWS
                Provider: CodeBuild
                Version: "1"
              Configuration:
                ProjectName: !Ref "ModelDeploymentProject"
              InputArtifacts:
                - Name: bld
              OutputArtifacts:
                - Name: dpl
              RunOrder: "3"

To have a complete pipeline, we must create a requirements.txt file in the deploy directory:

sagemaker==2.5.3

and the deployment Python script (deployment.py):

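import sagemaker

from sagemaker.model import Model

sagemaker_session = sagemaker.Session()

# ARN of the IAM role that Sagemaker assumes to run the endpoint
# (the iam:PassRole permission above expects a role named <stack name>-sagemaker-role)
role = 'PUT THE SAGEMAKER ROLE ARN HERE'

model = Model(
    image_uri='PUT THE ECR REGISTRY HERE:latest',
    role=role,
    sagemaker_session=sagemaker_session
)

model.deploy(initial_instance_count=1, instance_type='ml.t2.medium')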
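
The image_uri placeholder must match the name that the build script pushed. As a small sketch, the helper below (hypothetical, not part of the original code) composes the URI the same way the build script builds its fullname variable. It assumes a standard commercial AWS region and uses the repository name mlops-deployments from the build script as the default:

```python
def ecr_image_uri(account_id, region, repository="mlops-deployments", tag="latest"):
    """Compose the ECR image URI in the format used by the build script."""
    # Mirrors: ${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"
```

The account ID and region can be hardcoded or looked up at runtime, for example with the STS get_caller_identity call that the build script also uses through the AWS CLI.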

Using the Sagemaker Endpoint

Sagemaker does not create a publicly accessible API, so we need boto3 to access it. Optionally, we can deploy a Lambda function as a proxy between the public API gateway and the Sagemaker Endpoint. In this example, however, we’ll use the endpoint directly in Python code.

import json
import boto3

payload = json.dumps({"instances": [[]]})

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName='endpoint name', ContentType="application/json", Body=payload
)

response = response["Body"].read()
result = json.loads(response.decode("utf-8"))
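
If you call the endpoint from several places, the steps above can be wrapped in a small function. The sketch below is one possible shape, not the only way to do it. The function name predict is hypothetical, and it assumes the model returns the standard Tensorflow Serving response with a "predictions" key:

```python
import json


def predict(runtime_client, endpoint_name, instances):
    """Send instances to the endpoint and return the list stored under "predictions"."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"instances": instances}),
    )
    # The response body is a stream, so it has to be read and decoded before parsing.
    result = json.loads(response["Body"].read().decode("utf-8"))
    return result["predictions"]
```

The function takes the boto3 client as a parameter, so you can pass the runtime client created above in production code and a stub object in unit tests.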