Sometimes it makes no sense to backfill old DAG runs: for example, when we retrieve data from a REST API that always returns the current state, when we use Airflow to send a newsletter, or when the DAG recomputes the entire data history on every run, so the execution date does not matter.

In Airflow, there are two ways to prevent the DAG from backfilling old runs.

The first is to set the catchup parameter of the DAG to False. In this case, the scheduler does not create DAG runs for the past intervals it missed; it schedules only the most recent one.

dag = DAG('example_dag',
        catchup=False,  # do not create runs for missed past intervals
        # ... other parameters
        )
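
To make the effect of catchup concrete, here is a minimal sketch in plain Python. It is a simplified model of the scheduling rule, not Airflow's scheduler code: the function scheduled_execution_dates and the example dates are hypothetical, and it assumes a fixed interval and that a run is created only after its interval has ended.

```python
from datetime import datetime, timedelta


def scheduled_execution_dates(start_date, interval, now, catchup):
    """Simplified model: list the execution dates a scheduler would create."""
    dates = []
    current = start_date
    # A run for execution date D is created only after its
    # interval [D, D + interval) has ended.
    while current + interval <= now:
        dates.append(current)
        current += interval
    if catchup:
        # Backfill every completed interval since start_date.
        return dates
    # Without catchup, keep only the most recent completed interval.
    return dates[-1:]


start = datetime(2024, 1, 1)
now = datetime(2024, 1, 5, 12, 0)
daily = timedelta(days=1)

with_catchup = scheduled_execution_dates(start, daily, now, catchup=True)
without_catchup = scheduled_execution_dates(start, daily, now, catchup=False)

print([d.date().isoformat() for d in with_catchup])
print([d.date().isoformat() for d in without_catchup])
```

In this model, the DAG with catchup enabled gets four runs (January 1 to January 4), while the DAG with catchup disabled gets a single run for January 4.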

The second method is to include the LatestOnlyOperator inside the DAG. This operator skips the tasks downstream of it when the current run is not the latest one, while the tasks that do not depend on it still run. This approach is useful when we want to backfill only some of the tasks and skip others. To understand how to use the LatestOnlyOperator, take a look at this blog post.
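
As a rough illustration of the second method, the sketch below places a LatestOnlyOperator between two tasks. It assumes Airflow 2.3 or newer (the import paths and the schedule_interval argument differ in other versions), and the DAG id and task names are made up for this example.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.latest_only import LatestOnlyOperator

with DAG(
    'latest_only_example',  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval='@daily',
    catchup=True,  # old runs are still created and backfilled
) as dag:
    # Runs in every DAG run, including the backfilled ones.
    load_data = EmptyOperator(task_id='load_data')

    # Skips everything downstream unless this is the latest run.
    latest_only = LatestOnlyOperator(task_id='latest_only')

    # Runs only in the latest DAG run.
    send_newsletter = EmptyOperator(task_id='send_newsletter')

    load_data >> latest_only >> send_newsletter
```

With this layout, a backfill still executes load_data for every past interval, but send_newsletter is skipped in all runs except the latest one.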
