
Displaying progress bar and ETA for Airflow backfills


At work I often have to run long Airflow backfills. They take a very long time, and my boss often asks me, "How much longer do you need?"
This is not easy to answer: although Airflow prints INFO messages such as

[2021-07-06 14:24:11,117] {backfill_job.py:364} INFO - [backfill progress] | finished run 65 of 67 | tasks waiting: 0 | succeeded: 266 | running: 2 | failed: 0 | skipped: 67 | deadlocked: 0 | not ready: 0

it does not show the estimated remaining time (ETA).
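
For intuition, the simplest way to get such an estimate by hand is to extrapolate from the average time per finished run. The sketch below is only an illustration: the function name naive_eta_seconds and the elapsed time of 11700 seconds are made up for the example, and progress-bar libraries typically smooth the rate rather than use a plain average.

```python
def naive_eta_seconds(finished, total, elapsed_seconds):
    """Estimate remaining seconds, assuming each run takes the average time so far."""
    if finished <= 0:
        # No completed run yet, so there is no rate to extrapolate from.
        return None
    seconds_per_run = elapsed_seconds / finished
    return seconds_per_run * (total - finished)


# Hypothetical numbers: 65 of 67 runs finished after 11700 s (3 h 15 min),
# i.e. 180 s per run with 2 runs left.
print(naive_eta_seconds(65, 67, 11700))  # 360.0
```

Doing this by hand for every progress line quickly gets tedious, which is why automating it is attractive.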

To solve this problem, I use the simple script below (see https://github.com/nailbiter/for/blob/master/forpython/fordatawise/non-reusable/backfill-progress.py for the latest version):

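import re

import click
from tqdm import tqdm


@click.command()
def backfill_progress():
    """Forward stdin to stdout and show a progress bar for backfill runs."""
    idx = 0  # number of finished runs already shown on the bar
    max_cnt, tqdm_object = None, None
    pat = re.compile(r".*finished run (\d+) of (\d+).*")
    while True:
        try:
            line = input()
        except EOFError:
            # The backfill command closed its output: nothing more to read.
            break
        m = pat.match(line)
        if m is not None:
            i, cnt = int(m.group(1)), int(m.group(2))
            if max_cnt is None:
                # First progress line: the total number of runs is now known.
                max_cnt = cnt
                tqdm_object = tqdm(total=max_cnt)
            if i > idx:
                # Advance the bar by the number of newly finished runs.
                tqdm_object.update(i - idx)
                idx = i

        # Forward every line unchanged so the backfill log stays visible.
        click.echo(line)

    if tqdm_object is not None:
        tqdm_object.close()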


if __name__ == "__main__":
    backfill_progress()

The usage is as follows (if you have the script above saved under the name backfill-progress.py in your current folder):

airflow backfill DAG_ID -s YYYY-MM-DD -e YYYY-MM-DD | python3 backfill-progress.py

The output looks similar to the following:

[Screenshot: Screen Shot 2021-07-07 at 15.48.54 2.png]

As the source code shows, the idea is simple: each time the script receives a line of the backfill command's output (here we make use of Unix's mighty piping mechanism), it checks the line for text matching the regex finished run (\d+) of (\d+). If the regex matches, the script updates the progress bar and the ETA estimate. The line is also unconditionally forwarded to stdout, so the backfill's own output stays visible.
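
The matching step can be tried in isolation. The following is a minimal sketch, assuming a helper named parse_progress (a name introduced here for illustration, not part of the script) and using re.search so that the pattern needs no leading and trailing .*; the sample line is a shortened copy of the log message quoted earlier.

```python
import re

# Same two capture groups as in the script: finished runs and total runs.
PROGRESS = re.compile(r"finished run (\d+) of (\d+)")


def parse_progress(line):
    """Return (finished, total) for a backfill progress line, else None."""
    m = PROGRESS.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


sample = (
    "[2021-07-06 14:24:11,117] {backfill_job.py:364} INFO - "
    "[backfill progress] | finished run 65 of 67 | tasks waiting: 0"
)
print(parse_progress(sample))            # (65, 67)
print(parse_progress("some other log"))  # None
```

Lines that do not match simply yield None, which corresponds to the script forwarding them without touching the progress bar.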

Before wrapping up, I should mention that the progress bar and ETA estimate are rendered by the great tqdm package. The click library is used only for convenience, and this dependency can easily be removed.
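
As a rough illustration of dropping click, here is one possible variant that reads from any iterable of lines and writes with plain file methods. The function name forward_with_progress and its out and make_bar parameters are assumptions made for this sketch, not part of the original script; make_bar exists only so that the logic can be exercised without a terminal, and tqdm is imported lazily when no replacement is supplied.

```python
import re
import sys

PATTERN = re.compile(r"finished run (\d+) of (\d+)")


def forward_with_progress(lines, out=sys.stdout, make_bar=None):
    """Echo every line to `out`; advance a progress bar on backfill progress lines.

    Returns the number of finished runs seen by the time the input ends.
    """
    bar, done = None, 0
    for line in lines:
        m = PATTERN.search(line)
        if m is not None:
            finished, total = int(m.group(1)), int(m.group(2))
            if bar is None:
                if make_bar is None:
                    # Lazy import: tqdm is only needed once a bar is created.
                    from tqdm import tqdm
                    make_bar = tqdm
                bar = make_bar(total=total)
            if finished > done:
                bar.update(finished - done)
                done = finished
        # Lines read from a file object keep their trailing newline.
        out.write(line)
        out.flush()
    if bar is not None:
        bar.close()
    return done
```

To use it as a script, call forward_with_progress(sys.stdin) under the usual if __name__ == "__main__": guard and pipe the backfill output into it exactly as before.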
