The Issue
For security purposes, sensitive connection and administrative information is encrypted with a Fernet key before being stored in Airflow's backend database. This includes any passwords for your connection objects, as well as service account keys for providers such as Google Cloud.
However, if you have built the Airflow webserver as a containerized service, then every time you modify and rebuild your container you run the risk of invalidating your Fernet key and losing access to your connections.
Airflow finds the Fernet key you would like to use in the config file, which by default gets generated and added to `airflow/airflow.cfg` when you first run the `airflow initdb` command. There is some insecurity built into this approach, since the key gets hard-coded into the file.
If you're using the puckel/docker-airflow repository's Dockerfile or docker-compose.yaml as a base for building your Airflow service, then the point at which your Fernet key gets generated is here, in the `scripts/entrypoint.sh` file:
```bash
: "${AIRFLOW__CORE__FERNET_KEY:=${FERNET_KEY:=$(python -c "from cryptography.fernet \
import Fernet; FERNET_KEY = Fernet.generate_key().decode(); print(FERNET_KEY)")}}"
```
That's a pretty sexy, maybe brilliant, one-liner as far as bash-scripted one-liners go. If you're not familiar with bash scripting, the breakdown is as follows:
- The `:` at the start of the line allows you to define a variable in a script with a default value. Here, `entrypoint.sh` creates the `AIRFLOW__CORE__FERNET_KEY` variable for the script, if the variable does not already exist in the environment. So, you could override it by specifying this variable in your Dockerfile or docker-compose.yaml file with something like `ENV AIRFLOW__CORE__FERNET_KEY='some string you generated or made up'`.
- The `${FERNET_KEY:=...}` portion assigns the value of `FERNET_KEY` to `AIRFLOW__CORE__FERNET_KEY` if it already exists in the environment (maybe you decided to pass it in from somewhere else); if it does not exist, then the `:=` part tells bash to make a default value with the one-line call to Python's `cryptography` library.
- The `python -c ...` part you can probably understand if you know at least some Python: the string gets passed as a command (`-c`) to the Python interpreter, and the `print(FERNET_KEY)` call prints the random Fernet key to STDOUT.
- Placing the whole Python section inside `$( )` tells bash to evaluate the entire expression in a sub-process and to return the STDOUT output. So, the output of this portion is the Fernet key itself.
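If the nesting is hard to follow, here is a small stand-alone sketch of the same `${VAR:=default}` pattern, using throwaway variable names of my own rather than the real Airflow ones:

```shell
# Case 1: the inner variable is already set, so nothing is generated --
# the outer variable simply inherits the existing value.
unset OUTER INNER
INNER="preset-key"
: "${OUTER:=${INNER:=$(echo generated-key)}}"
echo "$OUTER"    # preset-key

# Case 2: neither variable is set, so the $( ) sub-shell runs and its
# STDOUT becomes the default value for both.
unset OUTER2 INNER2
: "${OUTER2:=${INNER2:=$(echo generated-key)}}"
echo "$OUTER2"   # generated-key
```

This is exactly why an already-exported key survives: a fresh one is only generated when neither variable is present in the environment.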
At first I wondered why Puckel defined two different variables for the key in `entrypoint.sh`, but I realized that it is necessary to have two places where the user can manually define it, depending on their use case:
- `AIRFLOW__CORE__FERNET_KEY` is the environment variable the `airflow initdb` command will look for when creating the back-end database, so if the user wants to change it and uses docker-compose, she should set it in the docker-compose file.
- If she just wants to build the webserver by itself, she can set `FERNET_KEY` in the Dockerfile, because that is accessible to `entrypoint.sh`, which gets executed at the end of the Dockerfile.
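In docker-compose terms, the first option looks something like the fragment below (the service name and key placeholder are mine, not taken verbatim from puckel/docker-airflow):

```yaml
services:
  webserver:
    environment:
      # Pin the key so container rebuilds don't invalidate stored secrets
      - AIRFLOW__CORE__FERNET_KEY=<paste your generated key here>
```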
In either case, the final value of `$FERNET_KEY` gets assigned in the `airflow.cfg` file at line 122:

```
# Secret key to save connection passwords in the db
fernet_key = $FERNET_KEY
```
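If you want to generate a key you can pin yourself (and sidestep the rebuild problem entirely), the same `cryptography` call from the one-liner works from any Python shell:

```python
from cryptography.fernet import Fernet

# Generate a key once, save it somewhere safe, and reuse it across rebuilds
key = Fernet.generate_key().decode()
print(key)  # a 44-character URL-safe base64 string

# Sanity check: the key round-trips an encrypted secret
f = Fernet(key.encode())
token = f.encrypt(b"my-connection-password")
assert f.decrypt(token) == b"my-connection-password"
```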
The Solution
So, now that we understand exactly what is going on, we can troubleshoot the `cryptography.fernet` error that might appear in your DAG task execution logs when a task fails while trying to access the back-end database for connection and other runtime data. This will likely happen every time you rebuild your airflow_webserver container, unless you also rebuild the database and its data dump each time (if you're like me, you prefer to keep the database as a persistent volume so you have some permanence to your Airflow execution and scheduling data).
The easiest thing to do is just re-enter your connections and other Fernet-encrypted entries in the Airflow UI, though if you have many connections, that quickly becomes tedious.
The second easiest thing is to create a task that recreates the connections and other database entries you need, scheduled to run `@once`, so that you can just trigger it after rebuilding your webserver container.
The Python task would look something like this:
```python
from airflow import DAG, settings
from airflow.models import Connection
from airflow.operators.python_operator import PythonOperator

dag = DAG( .... )

def set_connection(**config):
    # Build a single Connection object from the passed-in fields
    conn = Connection()
    for k, v in config.items():
        setattr(conn, k, v)
    # Persist it to the metadata database
    session = settings.Session()
    session.add(conn)
    session.commit()
    session.close()

task = PythonOperator(
    dag=dag,
    task_id='set-connections',
    python_callable=set_connection,
    ...........
)
```
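The `setattr` loop is plain Python, so you can sanity-check the pattern without an Airflow install; this sketch uses a stand-in class and placeholder connection values of my own:

```python
# Stand-in for airflow.models.Connection, just to demonstrate the pattern
class FakeConnection:
    pass

# Hypothetical connection fields; in practice, pull these from a secure store
config = {
    "conn_id": "my_postgres",
    "conn_type": "postgres",
    "host": "db.example.com",
    "port": 5432,
}

conn = FakeConnection()
for k, v in config.items():
    setattr(conn, k, v)

print(conn.conn_id, conn.host)  # my_postgres db.example.com
```

In the real task, a dict like this would be handed to the operator via its `op_kwargs` argument so it arrives in `set_connection` as `**config`.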
Of course, try to avoid hard-coding the config for your connections directly into your file. You can store it in a more secure place, such as a dedicated database with encryption or a gcloud bucket, and pull that connection configuration data into your script. Also, for added security, Airflow `Connection` objects have a `rotate_fernet_key` method you can explore to change the encryption in the backend database regularly!
Sources