Notes from testing Job/CronJob behavior by hand.
Environment
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-28T20:03:09Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.5-eks-6bad6d", GitCommit:"6bad6d9c768dc0864dab48a11653aa53b5a47043", GitTreeState:"clean", BuildDate:"2018-12-06T23:13:14Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
Job
Creating a Job
First, just create one to see it in action. I'll use the example manifest from the official Jobs documentation.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
# Create the job
$ kubectl create -f https://k8s.io/examples/controllers/job.yaml
job.batch "pi" created
# One completion is desired, and one has succeeded
$ kubectl get jobs
NAME      DESIRED   SUCCESSFUL   AGE
pi        1         1            5m
# Check the Pod: its status is Completed, not Running
$ kubectl get pods --selector=job-name=pi
NAME       READY     STATUS      RESTARTS   AGE
pi-zhv2f   0/1       Completed   0          3m
# Check the job's details
$ kubectl describe jobs/pi
Name:           pi
Namespace:      default
Selector:       controller-uid=2252cb06-096d-11e9-a297-068bea14a4ce
Labels:         controller-uid=2252cb06-096d-11e9-a297-068bea14a4ce
                job-name=pi
Annotations:    <none>
Parallelism:    1
Completions:    1
Start Time:     Thu, 27 Dec 2018 09:19:54 +0900
Pods Statuses:  1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=2252cb06-096d-11e9-a297-068bea14a4ce
           job-name=pi
  Containers:
   pi:
    Image:      perl
    Port:       <none>
    Host Port:  <none>
    Command:
      perl
      -Mbignum=bpi
      -wle
      print bpi(2000)
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  21s   job-controller  Created pod: pi-zhv2f
# In this case the Pod's result can be read from its logs
$ kubectl logs pi-zhv2f
3.14159265358979323846264338327950288...
Differences by restartPolicy
From the Kubernetes docs, "Handling Pod and Container Failures":
If a Container in a Pod fails and .spec.template.spec.restartPolicy = "OnFailure", then the Pod stays on the node, but the Container is re-run. Therefore, your program needs to handle the case when it is restarted locally, or else specify .spec.template.spec.restartPolicy = "Never". See the pod lifecycle documentation for more information on restartPolicy.
An entire Pod can also fail, for a number of reasons, such as when the pod is kicked off the node (node is upgraded, rebooted, deleted, etc.), or if a container of the Pod fails and the .spec.template.spec.restartPolicy = "Never". When a Pod fails, then the Job controller starts a new Pod. Therefore, your program needs to handle the case when it is restarted in a new pod. In particular, it needs to handle temporary files, locks, incomplete output and the like caused by previous runs.
With OnFailure, the Pod stays put and the container is re-run in place.
With Never, a new Pod is created.
Let's verify this.
I'll borrow the manifests from the GitHub repository accompanying the book Kubernetes完全ガイド (impress top gear series):
MasayaAoyama/kubernetes-perfect-guide
The Never case
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job-never-restart
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 10
  template:
    spec:
      containers:
      - name: sleep-container
        image: centos:6
        command: ["sh", "-c"]
        args: ["$(sleep 3600)"]
      restartPolicy: Never
# Create the job
$ kubectl apply -f sample-job-never-restart.yaml
job.batch "sample-job-never-restart" created
# Confirm the pod is running
$ kubectl get pods --selector=job-name=sample-job-never-restart
NAME                             READY     STATUS    RESTARTS   AGE
sample-job-never-restart-plj2g   1/1       Running   0          46s
# Kill the sleep process via exec
$ kubectl exec -it sample-job-never-restart-plj2g -- sh -c 'kill -9 `pgrep sleep`'
# The previous Pod went to Error, and a new Pod was created
$ kubectl get pods --selector=job-name=sample-job-never-restart
NAME                             READY     STATUS    RESTARTS   AGE
sample-job-never-restart-d4rxk   1/1       Running   0          38s
sample-job-never-restart-plj2g   0/1       Error     0          6m
The OnFailure case
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job-onfailer-restart
spec:
  completions: 1
  parallelism: 1
  backoffLimit: 10
  template:
    spec:
      containers:
      - name: sleep-container
        image: centos:6
        command: ["sh", "-c"]
        args: ["$(sleep 3600)"]
      restartPolicy: OnFailure
# Create the job
$ kubectl apply -f sample-job-onfailer-restart.yaml
job.batch "sample-job-onfailer-restart" created
# Check the pod
$ kubectl get pod --selector=job-name=sample-job-onfailer-restart
NAME                                READY     STATUS    RESTARTS   AGE
sample-job-onfailer-restart-qsvgr   1/1       Running   0          36s
# Confirm pull/create/start appear in the Events
$ kubectl describe pod sample-job-onfailer-restart-qsvgr
・・・
Events:
  Type    Reason     Age   From                                                     Message
  ----    ------     ----  ----                                                     -------
  Normal  Scheduled  52s   default-scheduler                                        Successfully assigned default/sample-job-onfailer-restart-qsvgr to ip-172-31-0-56.ap-northeast-1.compute.internal
  Normal  Pulled     51s   kubelet, ip-172-31-0-56.ap-northeast-1.compute.internal  Container image "centos:6" already present on machine
  Normal  Created    51s   kubelet, ip-172-31-0-56.ap-northeast-1.compute.internal  Created container
  Normal  Started    51s   kubelet, ip-172-31-0-56.ap-northeast-1.compute.internal  Started container
# Kill the process the same way
$ kubectl exec -it sample-job-onfailer-restart-qsvgr -- sh -c 'kill -9 `pgrep sleep`'
# No new pod was created; it went straight back to Running
$ kubectl get pod --selector=job-name=sample-job-onfailer-restart
NAME                                READY     STATUS    RESTARTS   AGE
sample-job-onfailer-restart-qsvgr   1/1       Running   1          2m
# The Pod's Events show pull/create/start at x2, so we can tell the container was run again
$ kubectl describe pod sample-job-onfailer-restart-qsvgr
・・・
Events:
  Type    Reason     Age              From                                                     Message
  ----    ------     ----             ----                                                     -------
  Normal  Scheduled  3m               default-scheduler                                        Successfully assigned default/sample-job-onfailer-restart-qsvgr to ip-172-31-0-56.ap-northeast-1.compute.internal
  Normal  Pulled     1m (x2 over 3m)  kubelet, ip-172-31-0-56.ap-northeast-1.compute.internal  Container image "centos:6" already present on machine
  Normal  Created    1m (x2 over 3m)  kubelet, ip-172-31-0-56.ap-northeast-1.compute.internal  Created container
  Normal  Started    1m (x2 over 3m)  kubelet, ip-172-31-0-56.ap-northeast-1.compute.internal  Started container
# The Job's Events alone don't reveal the restart
$ kubectl describe job sample-job-onfailer-restart
・・・
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  6m    job-controller  Created pod: sample-job-onfailer-restart-qsvgr
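The restart does show up on the Pod side: the RESTARTS column above went from 0 to 1, and the same counter is stored in the Pod's .status.containerStatuses. As a minimal sketch for reading it directly (the jsonpath expression is mine, not from the book):
# Read the container restart count straight from the Pod status; should print 1 here, matching RESTARTS above
$ kubectl get pod sample-job-onfailer-restart-qsvgr -o jsonpath='{.status.containerStatuses[0].restartCount}'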
Job patterns
Several Job patterns can be defined by combining completions, parallelism, and backoffLimit.
completions is the required number of successful completions, parallelism is the number of Pods run concurrently, and backoffLimit is the number of failures to tolerate.
- Run exactly once, regardless of success or failure: completions=1, parallelism=1, backoffLimit=0 (no retries)
- Run N in parallel until M succeed: completions=M, parallelism=N (see the sketch after this list)
- Keep running one at a time (work queue): parallelism=1, completions unspecified. Note that backoffLimit cannot be made unlimited, so the Job ends once it has failed backoffLimit times.
- Keep running N in parallel (work queue): parallelism=N, completions unspecified. Again, backoffLimit cannot be made unlimited, so the Job ends once it has failed backoffLimit times.
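As a concrete sketch of the "N in parallel until M succeed" pattern, here is a manifest in the style of the samples above; the name sample-parallel-job and the sleep workload are placeholders of mine:
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-parallel-job
spec:
  completions: 5    # M: the Job is done once 5 Pods have succeeded
  parallelism: 3    # N: run up to 3 Pods at a time
  backoffLimit: 10  # tolerate up to 10 failed Pods
  template:
    spec:
      containers:
      - name: sleep-container
        image: centos:6
        command: ["sh", "-c", "sleep 30"]
      restartPolicy: Never
Dropping completions while keeping parallelism turns this into the work-queue variants in the last two bullets.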
CronJob
A mechanism for launching Jobs at scheduled times.
Let's try the tutorial.
Running Automated Tasks with a CronJob
Creating a CronJob
A CronJob that runs every minute.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
# Create the cronjob
$ kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/application/job/cronjob.yaml
cronjob.batch "hello" created
# The cronjob has been created
$ kubectl get cronjob hello
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
hello     */1 * * * *   False     0         <none>          32s
# Jobs are being created periodically
$ kubectl get jobs
NAME               DESIRED   SUCCESSFUL   AGE
hello-1545887340   1         1            2m
hello-1545887400   1         1            1m
hello-1545887460   1         1            50s
# LAST SCHEDULE is the time elapsed since the last scheduling, so it counts up second by second and resets when the next run is scheduled
$ kubectl get cronjob hello
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
hello     */1 * * * *   False     0         17s             7m
$ kubectl get cronjob hello
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
hello     */1 * * * *   False     0         50s             7m
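The LAST SCHEDULE column is just the elapsed time since .status.lastScheduleTime, which explains the count-up-and-reset behavior. A sketch for reading the raw timestamp directly (the jsonpath expression is mine, not from the tutorial):
# The timestamp behind the LAST SCHEDULE column
$ kubectl get cronjob hello -o jsonpath='{.status.lastScheduleTime}'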
Suspending a CronJob
A CronJob can be suspended temporarily.
Let's try this too.
# Suspend it
$ kubectl patch cronjob hello -p '{"spec":{"suspend":true}}'
cronjob.batch "hello" patched
# SUSPEND is now True
$ kubectl get cronjob
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
hello     */1 * * * *   True      0         28s             19m
# LAST SCHEDULE has grown to 2m; nothing new is scheduled while suspended
$ kubectl get cronjob
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
hello     */1 * * * *   True      0         2m              21m
# Lift the suspension
$ kubectl patch cronjob hello -p '{"spec":{"suspend":false}}'
cronjob.batch "hello" patched
# LAST SCHEDULE is updated and scheduling resumes
$ kubectl get cronjob
NAME      SCHEDULE      SUSPEND   ACTIVE    LAST SCHEDULE   AGE
hello     */1 * * * *   False     1         11s             22m
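To double-check the flag without reading the table, the spec can also be queried directly; a sketch using jsonpath (expression mine), which should print false after the patch above:
# Read the suspend flag back from the spec
$ kubectl get cronjob hello -o jsonpath='{.spec.suspend}'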