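A common mistake I have noticed in various applications is forgetting that DynamoDB paginates its results too. Somehow, when developers see more than ten results, they assume that they have received everything ;) In fact, a single query or scan call returns at most 1 MB of data, so a large table needs several requests.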

How do we retrieve all values from DynamoDB when performing a query or a scan?

We have to extract the LastEvaluatedKey from the response and pass it as the ExclusiveStartKey in the subsequent request. We repeat this until the response no longer contains a LastEvaluatedKey. In this article, I show how to do it when we use the AwsDynamoDBHook in Airflow:

from boto3.dynamodb.conditions import Attr
from airflow.contrib.hooks.aws_dynamodb_hook import AwsDynamoDBHook

# Parameters shared by every scan request
query_params = {
    'FilterExpression': Attr('some_field').eq('value'),
    'ConsistentRead': True,
}

hook = AwsDynamoDBHook('primary_key_name', 'table_name', 'aws_region')
connection = hook.get_conn()
table = connection.Table('table_name')

# The first request is sent without an ExclusiveStartKey
response = table.scan(**query_params)
entries = list(response['Items'])

# Keep requesting pages until DynamoDB stops returning a LastEvaluatedKey
while 'LastEvaluatedKey' in response:
    response = table.scan(**query_params, ExclusiveStartKey=response['LastEvaluatedKey'])
    entries.extend(response['Items'])

output = iter(entries)
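
If you need the same loop in more than one place, it can be wrapped in a generator. The snippet below is only a minimal sketch: `scan_all` is a helper name I made up, and `FakeTable` is a hypothetical in-memory stand-in for the boto3 table, so the example runs without AWS access. The second fake page is deliberately empty, because a scan with a FilterExpression can return a page without items while more pages are still waiting.

```python
from typing import Any, Dict, Iterator


def scan_all(table: Any, **scan_params: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item from a scan, following LastEvaluatedKey across pages.

    `table` is assumed to expose a boto3-style scan(**kwargs) method.
    """
    params = dict(scan_params)
    while True:
        response = table.scan(**params)
        yield from response.get('Items', [])
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            # No LastEvaluatedKey means this was the final page
            return
        params['ExclusiveStartKey'] = last_key


class FakeTable:
    """Hypothetical stand-in for a boto3 Table, used only for this demo."""

    def __init__(self, pages):
        self.pages = pages

    def scan(self, **kwargs):
        # The fake key simply stores the index of the next page
        index = kwargs.get('ExclusiveStartKey', {'page': 0})['page']
        response = {'Items': self.pages[index]}
        if index + 1 < len(self.pages):
            response['LastEvaluatedKey'] = {'page': index + 1}
        return response


fake_table = FakeTable([[{'id': 1}, {'id': 2}], [], [{'id': 3}]])
items = list(scan_all(fake_table, ConsistentRead=True))
print(items)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

With the real table from the example above, the call would presumably look like `scan_all(table, **query_params)`. Because it is a generator, items are produced page by page instead of being collected in one list first.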