Purpose
- Build a minimal Airflow setup in order to better understand how airflow.cfg and related settings work
- Use puckel/docker-airflow as a reference
Conclusion
- Apply the following patch to puckel/docker-airflow
diff --git a/script/entrypoint.sh b/script/entrypoint.sh
index fb3f9ad..62d4198 100755
--- a/script/entrypoint.sh
+++ b/script/entrypoint.sh
@@ -70,8 +70,8 @@ fi
case "$1" in
webserver)
airflow initdb
- if [ "$AIRFLOW__CORE__EXECUTOR" = "LocalExecutor" ]; then
- # With the "Local" executor it should all run in one container.
+ if [ "$AIRFLOW__CORE__EXECUTOR" != "CeleryExecutor" ]; then
+ # With the "Sequential" or "Local" executor it should all run in one container.
airflow scheduler &
fi
exec airflow webserver
Note: I submitted a PR, but the author no longer seems to be maintaining the repo, so it is unlikely to be merged.
Also, someone else had opened a similar PR a few hours earlier, which I had failed to check beforehand.
https://github.com/puckel/docker-airflow/pull/316
Build and start it with the following commands:
docker build --rm -t puckel/docker-airflow .
# -e LOAD_EX=y enables the example DAGs
docker run -d -p 8080:8080 -e LOAD_EX=y puckel/docker-airflow webserver
About airflow.cfg
Refer to default_airflow.cfg and the other templates under airflow/config_templates.
Comparing them with the settings used by puckel/docker-airflow and by Google Cloud Composer should deepen the understanding further.
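One point worth knowing when comparing these configs: any airflow.cfg entry can also be supplied as an environment variable named AIRFLOW__{SECTION}__{KEY}, which is the mechanism the entrypoint above relies on when it checks AIRFLOW__CORE__EXECUTOR. A minimal sketch of the mapping (illustrative only, not a recommended setup):
# [core]
# load_examples = True
# in airflow.cfg is equivalent to passing the environment variable below
docker run -d -p 8080:8080 -e AIRFLOW__CORE__LOAD_EXAMPLES=True puckel/docker-airflow webserver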
SequentialExecutor and LocalExecutor
In puckel/docker-airflow, too, SequentialExecutor is the Executor chosen when running the Docker container on its own
By default, docker-airflow runs Airflow with SequentialExecutor :
docker run -d -p 8080:8080 puckel/docker-airflow webserver
However, as noted above, this does not work without applying the patch
Until now I had simply used docker-compose-LocalExecutor.yml without ever questioning this
It appears that airflow scheduler is required even for the SequentialExecutor
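A rough way to confirm this with the unpatched image (a sketch; <container_id> is a placeholder for whatever docker ps reports): the original entrypoint starts only the webserver under SequentialExecutor, so task instances make no progress until a scheduler is started by hand.
# Unpatched image: only "airflow webserver" is running, so start a scheduler manually
docker exec -d <container_id> airflow scheduler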
airflow/executors/local_executor.py has a comment along these lines:
LocalExecutor runs tasks by spawning processes in a controlled fashion in different
modes. Given that BaseExecutor has the option to receive a `parallelism` parameter to
limit the number of process spawned, when this parameter is `0` the number of processes
that LocalExecutor can spawn is unlimited.
The following strategies are implemented:
- Unlimited Parallelism (self.parallelism == 0): In this strategy, LocalExecutor will
spawn a process every time `execute_async` is called, that is, every task submitted to the
LocalExecutor will be executed in its own process. Once the task is executed and the
result stored in the `result_queue`, the process terminates. There is no need for a
`task_queue` in this approach, since as soon as a task is received a new process will be
allocated to the task. Processes used in this strategy are of class LocalWorker.
- Limited Parallelism (self.parallelism > 0): In this strategy, the LocalExecutor spawns
the number of processes equal to the value of `self.parallelism` at `start` time,
using a `task_queue` to coordinate the ingestion of tasks and the work distribution among
the workers, which will take a task as soon as they are ready. During the lifecycle of
the LocalExecutor, the worker processes are running waiting for tasks, once the
LocalExecutor receives the call to shutdown the executor a poison token is sent to the
workers to terminate them. Processes used in this strategy are of class QueuedLocalWorker.
Arguably, `SequentialExecutor` could be thought as a LocalExecutor with limited
parallelism of just 1 worker, i.e. `self.parallelism = 1`.
This option could lead to the unification of the executor implementations, running
locally, into just one `LocalExecutor` with multiple modes.
In other words, SequentialExecutor can be thought of as a LocalExecutor with parallelism limited to 1
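The parallelism in this docstring corresponds to the [core] parallelism value in airflow.cfg, which BaseExecutor receives on construction. One quick way to look at the executor-related defaults baked into the image (a sketch; it assumes the image keeps airflow.cfg under AIRFLOW_HOME=/usr/local/airflow and reuses the placeholder <container_id>):
# Print executor-related defaults from the container's airflow.cfg
docker exec <container_id> grep -E '^(executor|parallelism|sql_alchemy_conn) ' /usr/local/airflow/airflow.cfg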
docker-compose-LocalExecutor.yml likewise defines only two services, postgres and webserver
Incidentally, PostgreSQL is used there because SQLite cannot be used with any executor other than the SequentialExecutor
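A sketch of how the two single-container variants differ at the configuration level (the connection string below is a typical placeholder, not copied from docker-compose-LocalExecutor.yml):
# SequentialExecutor: the default SQLite metadata DB is sufficient
# LocalExecutor: needs a DB that supports concurrent connections, e.g. PostgreSQL
docker run -d -p 8080:8080 \
    -e AIRFLOW__CORE__EXECUTOR=LocalExecutor \
    -e AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow \
    puckel/docker-airflow webserver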
The SequentialExecutor has a comment of its own as well:
This executor will only run one task instance at a time, can be used
for debugging. It is also the only executor that can be used with sqlite
since sqlite doesn't support multiple connections.
Since we want airflow to work out of the box, it defaults to this
SequentialExecutor alongside sqlite as you first install it.
In short: meant for debugging, and the default right after the initial install