In this article, I’ll show you how to build a Docker image that serves a Tensorflow model using Tensorflow Serving, and how to deploy that Docker image as a Sagemaker Endpoint. I’ll run all of the steps in AWS Code Pipeline.
To simplify the example, I will include only the relevant part of the pipeline configuration code. If you need an example of the entire pipeline configuration file, I suggest looking at the AWS MLOps Workshop files. I created all of the code in this article using the AWS MLOps Workshop and the “Bring your own Tensorflow model to Sagemaker” tutorial as an example.
Assumptions
In this example, I assume that a data scientist has trained an ML model using Tensorflow and stored the files on S3. I will also not include any code to expose the Sagemaker Endpoint as a REST API. You can copy such code from the workshop mentioned above. I’ll also assume that you have already defined an ECR repository (we will need it to store the Docker image).
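If you don’t have the ECR repository yet, it can live in the same CloudFormation template. The snippet below is only a minimal sketch: the logical name ModelImageRepository is arbitrary, and the repository name mlops-deployments matches the name used in the build script later in this article.

ModelImageRepository:
  Type: AWS::ECR::Repository
  Properties:
    RepositoryName: mlops-deployments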
Building a Docker image
First, we have to define an AWS Code Pipeline step to download the model from S3, build a Docker image, and push it to a Docker repository.
ModelBuildProject:
  Type: AWS::CodeBuild::Project
  Properties:
    Name: !Sub ${AWS::StackName}-pipeline-modelbuild
    Description: Builds a Docker container with the model
    ServiceRole: !GetAtt CodeDeploymentRole.Arn
    Artifacts:
      Type: CODEPIPELINE
    Environment:
      Type: LINUX_CONTAINER
      ComputeType: BUILD_GENERAL1_SMALL
      Image: aws/codebuild/python:3.6.5
      PrivilegedMode: true
    Source:
      Type: CODEPIPELINE
      BuildSpec: !Sub |
        version: 0.2
        phases:
          pre_build:
            commands:
              - nohup /usr/local/bin/dockerd --host=unix:///var/run/docker.sock --host=tcp://127.0.0.1:2375 --storage-driver=overlay2 &
              - timeout 15 sh -c "until docker info; do echo .; sleep 1; done"
          build:
            commands:
              - bash download_model.sh
              - bash build_image.sh
          post_build:
            commands:
              - echo "Deployed"
        artifacts:
          files:
            - '**/*'
    TimeoutInMinutes: 30
Pipeline:
  Type: AWS::CodePipeline::Pipeline
  Properties:
    ArtifactStore:
      Location: !Ref "ArtifactStoreBucket"
      Type: S3
    DisableInboundStageTransitions: []
    Name: !Ref "AWS::StackName"
    RoleArn: !GetAtt [PipelineRole, Arn]
    Stages:
      - Name: HERE YOU SHOULD PUT THE STEP THAT DOWNLOADS CODE FROM THE REPOSITORY
        ...
      - Name: Build_Model_Image
        Actions:
          - Name: BuildModelImage
            ActionTypeId:
              Category: Build
              Owner: AWS
              Provider: CodeBuild
              Version: "1"
            Configuration:
              ProjectName: !Ref "ModelBuildProject"
            InputArtifacts:
              - Name: src
            OutputArtifacts:
              - Name: bld
            RunOrder: "2"
The code above starts the Docker process on the machine provided by AWS and runs two scripts: download_model.sh and build_image.sh.
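The pipeline also needs a source stage that produces the src input artifact used above. I left it out here, but as a rough sketch it could look like this, assuming the code lives in a CodeCommit repository (the repository and branch names are placeholders):

- Name: Source
  Actions:
    - Name: GetSource
      ActionTypeId:
        Category: Source
        Owner: AWS
        Provider: CodeCommit
        Version: "1"
      Configuration:
        RepositoryName: your-repository-name
        BranchName: master
      OutputArtifacts:
        - Name: src
      RunOrder: "1"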
To build and upload the image, the CodeBuild service role needs quite a few permissions:
CodeDeploymentRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: !Sub ${AWS::StackName}-codedeploy-role
    AssumeRolePolicyDocument:
      Statement:
        - Action: ["sts:AssumeRole"]
          Effect: Allow
          Principal:
            Service: [codebuild.amazonaws.com]
      Version: "2012-10-17"
    Path: /
    ManagedPolicyArns:
      - arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
    Policies:
      - PolicyName: UploadAccess
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Action:
                - codepipeline:*
                - sagemaker:*
                - s3:*
                - logs:*
              Effect: Allow
              Resource: "*"
            - Action:
                - ecr:InitiateLayerUpload
                - ecr:UploadLayerPart
                - ecr:CompleteLayerUpload
                - ecr:PutImage
              Effect: Allow
              Resource: "PUT THE REPOSITORY ARN HERE!!!"
            - Action:
                - iam:PassRole
              Effect: Allow
              Resource: !Sub arn:aws:iam::${AWS::AccountId}:role/${AWS::StackName}-sagemaker-role
Downloading the model from S3
Now, we can take a look at the download_model.sh script. All we do here is use aws s3 cp to copy the model files from S3. Note that Tensorflow Serving expects every model version in a numbered subdirectory, which is why we copy the files into code/model/1:
#!/bin/bash
aws s3 cp s3://bucket/model_location/ code/model/1 --recursive
Building a Docker image
Building a Docker image requires four configuration files. Let’s take a look at the Dockerfile first. In this file, we install Tensorflow Serving and nginx. We will use nginx to define the REST API because every Docker image used as a Sagemaker Endpoint must support two HTTP endpoints: /ping and /invocations.
FROM tensorflow/tensorflow:1.8.0-py3
# If you hit Docker rate limit, push the base image to your private ECR registry and use it here instead of public Docker image
RUN apt-get update && apt-get install -y --no-install-recommends nginx curl
RUN echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list
RUN curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
RUN apt-get update && apt-get install -y tensorflow-model-server
ENV PATH="/opt/ml/code:${PATH}"
COPY /code /opt/ml/code
WORKDIR /opt/ml/code
In the nginx configuration, we define the endpoints mentioned above and configure the server to listen on TCP port 8080. The /ping endpoint must return HTTP 200, and the /invocations endpoint forwards the calls to Tensorflow Serving:
# put this in code/nginx.conf file
events {
    worker_connections 2048;
}

http {
    server {
        listen 8080 deferred;

        location /invocations {
            proxy_pass http://localhost:8501/v1/models/saved_model:predict;
        }

        location /ping {
            return 200 "OK";
        }
    }
}
Sagemaker starts the container using the docker run [image name] serve command, so we must implement the serve script. In the serve script, we start the nginx server and Tensorflow Serving. Note that the --model_name passed to Tensorflow Serving must match the model name in the nginx proxy_pass URL (saved_model in this example):
#!/usr/bin/env python
# put this code in the code/serve file (the shebang above must stay on the first line)
import subprocess


def start_server():
    # send the nginx logs to the container's stdout/stderr so they show up in CloudWatch
    subprocess.check_call(['ln', '-sf', '/dev/stdout', '/var/log/nginx/access.log'])
    subprocess.check_call(['ln', '-sf', '/dev/stderr', '/var/log/nginx/error.log'])

    # start nginx in the background and Tensorflow Serving in the foreground
    # (the call blocks, which keeps the container running)
    nginx = subprocess.Popen(['nginx', '-c', '/opt/ml/code/nginx.conf'])
    tf_model_server = subprocess.call(['tensorflow_model_server',
                                       '--rest_api_port=8501',
                                       '--model_name=saved_model',
                                       '--model_base_path=/opt/ml/code/model'])


if __name__ == '__main__':
    start_server()
In the build_image.sh file (the second script called from the buildspec above), we build the Docker image, log in to ECR, and push the image to the registry:
#!/bin/bash
algorithm_name=mlops-deployments

# make sure the serve script is executable inside the image
chmod +x code/serve

account=$(aws sts get-caller-identity --query Account --output text)
region=$(aws configure get region)
region=${region:-eu-central-1}
fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# log in to ECR (AWS CLI v1 syntax; with CLI v2 use: aws ecr get-login-password | docker login ...)
$(aws ecr get-login --region ${region} --no-include-email)

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}
docker push ${fullname}
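Before running the pipeline, it may be worth smoke-testing the image locally. The commands below are only a rough sketch: they assume the model files have already been copied into code/model/1 and that you replace the empty instances list with your model’s real input.

# build and start the container locally
chmod +x code/serve
docker build -t mlops-deployments .
docker run --rm -p 8080:8080 mlops-deployments serve

# in another terminal: the health check should return "OK"
curl http://localhost:8080/ping

# and predictions go through /invocations
curl -X POST -H "Content-Type: application/json" \
  -d '{"instances": [[]]}' \
  http://localhost:8080/invocations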
Deploying a Sagemaker Endpoint
The deployment step in AWS Code Pipeline requires the following configuration:
ModelDeploymentProject:
  Type: AWS::CodeBuild::Project
  Properties:
    Name: !Sub ${AWS::StackName}-pipeline-modeldeployment
    Description: Deploys a model as a Sagemaker Endpoint
    ServiceRole: !GetAtt CodeDeploymentRole.Arn
    Artifacts:
      Type: CODEPIPELINE
    Environment:
      Type: LINUX_CONTAINER
      ComputeType: BUILD_GENERAL1_SMALL
      Image: aws/codebuild/python:3.6.5
    Source:
      Type: CODEPIPELINE
      BuildSpec: !Sub |
        version: 0.2
        phases:
          pre_build:
            commands:
              - echo "Installing requirements"
              - pip install --upgrade pip
              - pip install -r deploy/requirements.txt
          build:
            commands:
              - echo "Running deployment.py"
              - cd deploy
              - python deployment.py
          post_build:
            commands:
              - echo "Deployed!"
        artifacts:
          files:
            - '**/*'
    TimeoutInMinutes: 30
# In the Stages part:
- Name: Deploy_Model
  Actions:
    - Name: ModelDeployment
      ActionTypeId:
        Category: Build
        Owner: AWS
        Provider: CodeBuild
        Version: "1"
      Configuration:
        ProjectName: !Ref "ModelDeploymentProject"
      InputArtifacts:
        - Name: bld
      OutputArtifacts:
        - Name: dpl
      RunOrder: "3"
To have a complete pipeline, we must create a requirements.txt file in the deploy directory:
sagemaker==2.5.3
and the deployment Python script (deployment.py):
import sagemaker
from sagemaker.model import Model

sagemaker_session = sagemaker.Session()

# the Sagemaker execution role created in the pipeline stack
role = 'PUT THE SAGEMAKER ROLE ARN HERE'

model = Model(
    role=role,
    image_uri='PUT THE ECR REGISTRY HERE:latest',
    sagemaker_session=sagemaker_session
)

# you can also pass endpoint_name=... to get a predictable endpoint name
model.deploy(1, 'ml.t2.medium')
Using the Sagemaker Endpoint
Sagemaker does not create a publicly accessible API, so we need boto3 to access the endpoint. Optionally, we can deploy a Lambda function as a proxy between a public API Gateway and the Sagemaker Endpoint. In this example, however, we’ll use the endpoint directly in Python code.
import json
import boto3

# the request body uses the Tensorflow Serving REST API format;
# put your model's input features inside the "instances" list
payload = json.dumps({"instances": [[]]})

runtime = boto3.client("runtime.sagemaker")
response = runtime.invoke_endpoint(
    EndpointName='endpoint name', ContentType="application/json", Body=payload
)
response = response["Body"].read()
result = json.loads(response.decode("utf-8"))
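If you later decide to put the endpoint behind API Gateway, the Lambda proxy mentioned above can stay very small. The workshop linked at the beginning contains a complete setup; a minimal sketch of the handler, assuming an API Gateway proxy integration that passes the JSON request body through, could look like this:

import boto3

runtime = boto3.client("runtime.sagemaker")

def lambda_handler(event, context):
    # forward the request body to the Sagemaker Endpoint as-is
    response = runtime.invoke_endpoint(
        EndpointName="endpoint name",
        ContentType="application/json",
        Body=event["body"],
    )
    result = response["Body"].read().decode("utf-8")
    return {"statusCode": 200, "body": result}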