A common problem I have noticed in various applications is forgetting that DynamoDB paginates its results too. Somehow, when developers get more than ten results back, they assume that they have received everything ;)
How do we retrieve all values from DynamoDB when performing a query?
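We have to extract the LastEvaluatedKey from the response and use it as the ExclusiveStartKey in the subsequent request. When the response no longer contains a LastEvaluatedKey, we have read the last page. The same mechanism applies to both query and scan operations; the example uses scan. In this article, I show how to do it when we use the AwsDynamoDBHook in Airflow: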
from boto3.dynamodb.conditions import Attr
from airflow.contrib.hooks.aws_dynamodb_hook import AwsDynamoDBHook

# Parameters shared by every scan request
query_params = {
    'FilterExpression': Attr('some_field').eq('value'),
    'ConsistentRead': True,
}

hook = AwsDynamoDBHook(table_keys=['primary_key_name'], table_name='table_name', region_name='aws_region')
connection = hook.get_conn()
table = connection.Table('table_name')

# The first request has no ExclusiveStartKey
response = table.scan(**query_params)
entries = list(response['Items'])

# LastEvaluatedKey is present only when more pages remain
while 'LastEvaluatedKey' in response:
    response = table.scan(**query_params, ExclusiveStartKey=response['LastEvaluatedKey'])
    entries.extend(response['Items'])

output = iter(entries)
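If the same loop is needed in more than one place, it can be wrapped in a small generator. The sketch below is only one way to do it: `paginate` is a hypothetical helper (it is not part of boto3 or Airflow), and `FakeTable` is an in-memory stand-in I made up so the loop can be tried without an AWS connection. It assumes the operation passed in behaves like `table.scan` or `table.query`, that is, it accepts `ExclusiveStartKey` and returns `Items` plus an optional `LastEvaluatedKey`.

```python
from typing import Any, Callable, Dict, Iterator


def paginate(operation: Callable[..., Dict[str, Any]], **params: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item from a paginated DynamoDB-style operation.

    Hypothetical helper: `operation` is assumed to be a bound method
    such as `table.scan` or `table.query`.
    """
    request = dict(params)  # copy, so the caller's parameters are not modified
    while True:
        response = operation(**request)
        yield from response.get('Items', [])
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            return  # no LastEvaluatedKey means this was the last page
        request['ExclusiveStartKey'] = last_key


class FakeTable:
    """In-memory stand-in for a table, used only to exercise the loop."""

    def __init__(self, items, page_size):
        self.items = items
        self.page_size = page_size

    def scan(self, ExclusiveStartKey=None, **_ignored):
        # The fake "key" is just a list index; real keys are attribute maps.
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey['index']
        end = start + self.page_size
        response = {'Items': self.items[start:end]}
        if end < len(self.items):
            response['LastEvaluatedKey'] = {'index': end}
        return response


fake_table = FakeTable([{'id': i} for i in range(7)], page_size=3)
all_items = list(paginate(fake_table.scan))
```

With a real table, the call would look like `paginate(table.scan, **query_params)`. Because it is a generator, items are produced page by page instead of being collected into a list first, which may matter for large tables.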