A common problem I have noticed in various applications is forgetting that DynamoDB paginates its results too. Somehow, when developers see more than ten results, they assume that they have received everything ;)

How do we retrieve all values from DynamoDB when performing a query or a scan?

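A single call returns only one page of results (at most 1 MB of data). To get the rest, we have to extract the LastEvaluatedKey from the response and use it as the ExclusiveStartKey in the subsequent request, repeating until the response no longer contains a LastEvaluatedKey. In this article, I show how to do it with a scan when we use the AwsDynamoDBHook in Airflow: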

from boto3.dynamodb.conditions import Attr
from airflow.contrib.hooks.aws_dynamodb_hook import AwsDynamoDBHook

query_params = {
    'FilterExpression': Attr('some_field').eq('value'),
    'ConsistentRead': True,
}

hook = AwsDynamoDBHook('primary_key_name', 'table_name', 'aws_region')
connection = hook.get_conn()
table = connection.Table('table_name')

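# The first call returns only the first page of results.
response = table.scan(**query_params)
entries = list(response['Items'])

# DynamoDB includes LastEvaluatedKey only when there are more pages to read.
# A page may be empty after filtering and still have more pages behind it.
while 'LastEvaluatedKey' in response:
    response = table.scan(**query_params, ExclusiveStartKey=response['LastEvaluatedKey'])
    entries.extend(response['Items'])

output = iter(entries)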
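
The same loop can be wrapped in a small generator so that it works for both scan and query. This is only a sketch: `paginate` and `FakeTable` are hypothetical names I made up for illustration, and `FakeTable` is an in-memory stand-in that mimics the 'Items' / 'LastEvaluatedKey' shape of a DynamoDB response so the loop can be tried without an AWS connection.

```python
from typing import Any, Callable, Dict, Iterator


def paginate(operation: Callable[..., Dict[str, Any]], **params: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item from a paginated DynamoDB operation.

    `operation` is assumed to be a bound method such as `table.scan` or
    `table.query` that returns a dict with 'Items' and, when more pages
    exist, 'LastEvaluatedKey'.
    """
    kwargs = dict(params)
    while True:
        response = operation(**kwargs)
        yield from response.get('Items', [])
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            # No LastEvaluatedKey means this was the last page.
            break
        # Continue reading right after the last key of the previous page.
        kwargs['ExclusiveStartKey'] = last_key


class FakeTable:
    """Hypothetical in-memory stand-in for a boto3 Table (dry run only)."""

    def __init__(self, items, page_size):
        self.items = items
        self.page_size = page_size

    def scan(self, ExclusiveStartKey=None, **_ignored):
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey['position']
        end = start + self.page_size
        response = {'Items': self.items[start:end]}
        if end < len(self.items):
            response['LastEvaluatedKey'] = {'position': end}
        return response


fake_table = FakeTable([{'id': i} for i in range(5)], page_size=2)
all_items = list(paginate(fake_table.scan))
```

With a real table, the call would presumably look like `paginate(table.scan, **query_params)`. Because it is a generator, the items are fetched page by page as we iterate, so we don't have to keep the whole result in memory unless we wrap it in `list()`.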