slurm24.11: Suspending Jobs and Letting Urgent Jobs Cut In with Preempt (Part 1)

SLURM

Posted at 2025-03-09

Use case

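When you submit large numbers of jobs to Slurm, quite a few of them run not just for a week but for months. If an urgent job comes in later, the usual approach is to raise its Priority so it runs first. Priority only decides which pending job starts next, though. When the nodes are occupied by long-running jobs like these, raising it often has no effect at all, and you are left with no choice but to reluctantly cancel jobs. To prevent this kind of absurd situation, Slurm provides Preempt: when a higher-priority job is submitted, the currently running job is suspended (SUSPEND) and the higher-priority job runs instead.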

Problems you run into with Preempt

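At first glance, suspending a job and running a priority job in its place does not seem that hard. After all, the OS already multitasks: interrupts occur constantly and many processes run concurrently.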

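However, Slurm manages resources by reserving them in advance. Resources allocated to a job are therefore treated as unavailable even when the job is not actually using them. In other words, the priority job cannot run until the resources reserved for the suspended job are released.
Strict resource management is not covered here (frankly, I am still testing it).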
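To make the reservation argument concrete, here is a toy model in Python of a single 8-CPU node, matching slurm2411c1 in the slurm.conf below. The `Node` class and its methods are hypothetical and do not reflect Slurm internals. The only point is that a suspended job still counts against the node's reserved CPUs, so an urgent job does not fit until that reservation is released.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of one node with per-job CPU reservations (hypothetical, not Slurm code)."""

    cpus: int
    # job_id -> (reserved CPUs, state), where state is "R" (running) or "S" (suspended)
    jobs: dict[int, tuple[int, str]] = field(default_factory=dict)

    def reserved(self) -> int:
        # Suspended jobs are counted too: suspending does not release the reservation.
        return sum(n for n, _state in self.jobs.values())

    def can_start(self, cpus: int) -> bool:
        return self.reserved() + cpus <= self.cpus


node = Node(cpus=8)
node.jobs[386] = (8, "R")
node.jobs[386] = (8, "S")  # suspend job 386: its 8 CPUs stay reserved
print(node.can_start(8))   # False: the urgent job still does not fit
del node.jobs[386]         # only releasing the reservation frees the CPUs
print(node.can_start(8))   # True
```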

Preempting jobs with very lax resource management

First, following the approach in the manual, we use PriorityTier to let a job cut in.
https://slurm.schedmd.com/preempt.html

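As in the manual, prepare a normal partition for regular jobs and a high partition for urgent jobs. There are two key points. First, assign the same nodes to both partitions. Second, set OverSubscribe=FORCE:1 so that each partition runs at most one job per resource. With this setup, a job in high can cut in ahead of the jobs in normal.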
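The scheduling behavior described above can be approximated with a deliberately simplified Python model. It makes three assumptions:
- one node is shared by both partitions;
- the PriorityTier values are taken from the slurm.conf below (normal=1, high=5);
- OverSubscribe=FORCE:1 is read as "at most one allocated job per partition".

`SharedNode`, `Job` and `TIERS` are hypothetical names. Slurm's real scheduler (time slicing, multiple nodes, job priorities) is far more involved.

```python
from dataclasses import dataclass

# PriorityTier per partition, mirroring this article's slurm.conf (hypothetical model)
TIERS = {"normal": 1, "high": 5}


@dataclass
class Job:
    job_id: int
    partition: str
    state: str = "PD"  # PD = pending, R = running, S = suspended


class SharedNode:
    """One node listed in both partitions (toy model, not Slurm's implementation)."""

    def __init__(self) -> None:
        self.jobs: list[Job] = []

    def submit(self, job: Job) -> None:
        self.jobs.append(job)
        self._schedule()

    def complete(self, job_id: int) -> None:
        self.jobs = [j for j in self.jobs if j.job_id != job_id]
        self._schedule()

    def _schedule(self) -> None:
        # FORCE:1 read as: each partition holds at most one allocation (R or S) on the node.
        for part in TIERS:
            if any(j.partition == part and j.state in ("R", "S") for j in self.jobs):
                continue
            pending = [j for j in self.jobs if j.partition == part and j.state == "PD"]
            if pending:
                min(pending, key=lambda j: j.job_id).state = "R"
        allocated = [j for j in self.jobs if j.state in ("R", "S")]
        if not allocated:
            return
        top = max(TIERS[j.partition] for j in allocated)
        for j in allocated:
            # The highest tier runs; lower tiers keep their allocation but are suspended.
            j.state = "R" if TIERS[j.partition] == top else "S"


node = SharedNode()
for job_id in range(386, 392):
    node.submit(Job(job_id, "normal"))
node.submit(Job(392, "high"))
print({j.job_id: j.state for j in node.jobs})
# {386: 'S', 387: 'PD', 388: 'PD', 389: 'PD', 390: 'PD', 391: 'PD', 392: 'R'}
node.complete(392)
print({j.job_id: j.state for j in node.jobs})
# {386: 'R', 387: 'PD', 388: 'PD', 389: 'PD', 390: 'PD', 391: 'PD'}
```

The model reproduces the squeue transitions shown later in this article. Job 386 is suspended while 392 runs and resumes once 392 finishes.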

slurm.conf

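# select/linear allocates whole nodes to jobs
SelectType=select/linear
# JOB PRIORITY
PriorityType=priority/multifactor
PriorityWeightPartition=10000

# COMPUTE NODES
NodeName=slurm2411c1 CPUs=8 Boards=1 SocketsPerBoard=8 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=15992

# Both partitions contain the same node(s); jobs in normal are suspended when preempted
PartitionName=normal PriorityTier=1 Nodes=ALL OverSubscribe=FORCE:1 PreemptMode=SUSPEND Default=YES MaxTime=INFINITE State=UP

# The higher PriorityTier lets high preempt normal; jobs in high are never preempted
PartitionName=high   PriorityTier=5 Nodes=ALL OverSubscribe=FORCE:1 PreemptMode=NONE MaxTime=INFINITE State=UP

# Preempt jobs in partitions with a lower PriorityTier
PreemptType=preempt/partition_prio
# SUSPEND needs GANG at the cluster level (the gang scheduler resumes suspended jobs)
PreemptMode=SUSPEND,GANG
# Gang-scheduling time slice in seconds
SchedulerTimeSlice=30
# No minimum run time before a job may be preempted
PreemptExemptTime=0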
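The configuration above relies on three conditions:
- both partitions contain the same nodes;
- high has a higher PriorityTier than normal;
- normal is set to PreemptMode=SUSPEND.

These can be checked mechanically. The following is a hypothetical Python helper, not a Slurm tool, and it is simplified in several ways:
- it only handles one-line `PartitionName=` entries;
- it compares `Nodes=` as plain strings and does not expand hostlists;
- it assumes keys are capitalized as in this article;
- it treats a missing PriorityTier as 1, the Slurm default.

```python
def parse_partitions(conf_text: str) -> dict[str, dict[str, str]]:
    """Collect key=value pairs from each PartitionName= line (hypothetical helper)."""
    parts: dict[str, dict[str, str]] = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("PartitionName="):
            continue
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        parts[fields["PartitionName"]] = fields
    return parts


def check_preempt_setup(conf_text: str, low: str, high: str) -> list[str]:
    """Return a list of problems; an empty list means the basic setup looks right."""
    parts = parse_partitions(conf_text)
    lo, hi = parts[low], parts[high]
    problems = []
    if lo.get("Nodes") != hi.get("Nodes"):
        problems.append("partitions do not list the same Nodes")
    if int(hi.get("PriorityTier", "1")) <= int(lo.get("PriorityTier", "1")):
        problems.append(f"{high} needs a higher PriorityTier than {low}")
    if lo.get("PreemptMode", "").upper() != "SUSPEND":
        problems.append(f"jobs in {low} are not set to be suspended")
    return problems


CONF = """\
PartitionName=normal PriorityTier=1 Nodes=ALL OverSubscribe=FORCE:1 PreemptMode=SUSPEND Default=YES MaxTime=INFINITE State=UP
PartitionName=high   PriorityTier=5 Nodes=ALL OverSubscribe=FORCE:1 PreemptMode=NONE MaxTime=INFINITE State=UP
"""
print(check_preempt_setup(CONF, "normal", "high"))  # []
```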

normal.sh

#!/bin/bash
#SBATCH --job-name=test.sh
#SBATCH -o job_%J_%j.out
#SBATCH -e job_%J_%j.err
#SBATCH --partition=normal

# Just allocates 1 GB of memory and waits 80 seconds
python mem.py

high.sh

#!/bin/bash
#SBATCH --job-name=high.sh
#SBATCH -o job_%J_%j.out
#SBATCH -e job_%J_%j.err
#SBATCH -p high

python mem.py

bash

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up   infinite      1   idle slurm2411c1
high         up   infinite      1   idle slurm2411c1

(base) testuser1@slurm2411c1:/work/temp$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

First, submit jobs to the normal partition.

bash

(base) testuser1@slurm2411c1:/work/temp$ sbatch normal.sh 
Submitted batch job 386
(base) testuser1@slurm2411c1:/work/temp$ sbatch normal.sh 
Submitted batch job 387
(base) testuser1@slurm2411c1:/work/temp$ sbatch normal.sh 
Submitted batch job 388
(base) testuser1@slurm2411c1:/work/temp$ sbatch normal.sh 
Submitted batch job 389
(base) testuser1@slurm2411c1:/work/temp$ sbatch normal.sh 
Submitted batch job 390
(base) testuser1@slurm2411c1:/work/temp$ sbatch normal.sh 
Submitted batch job 391
(base) testuser1@slurm2411c1:/work/temp$ squeue

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
391    normal  test.sh  testuser1 PD       0:00      1 (Priority)
390    normal  test.sh  testuser1 PD       0:00      1 (Priority)
389    normal  test.sh  testuser1 PD       0:00      1 (Priority)
388    normal  test.sh  testuser1 PD       0:00      1 (Priority)
387    normal  test.sh  testuser1 PD       0:00      1 (Resources)
386    normal  test.sh  testuser1  R       0:05      1 slurm2411c1

In the output above, JOBID 386 is running and all other jobs are pending.
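The REASON column explains why each job is waiting. According to the squeue documentation, `Resources` means the job is waiting for resources to become available; here job 387 is next in line for the node. `Priority` means higher-priority jobs exist for the same partition.

A small hypothetical Python helper can parse the default squeue output pasted above and group job IDs by state and reason. It assumes that no column value contains spaces.

```python
from collections import defaultdict

# squeue output copied from above
SQUEUE = """\
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
391    normal  test.sh  testuser1 PD       0:00      1 (Priority)
390    normal  test.sh  testuser1 PD       0:00      1 (Priority)
389    normal  test.sh  testuser1 PD       0:00      1 (Priority)
388    normal  test.sh  testuser1 PD       0:00      1 (Priority)
387    normal  test.sh  testuser1 PD       0:00      1 (Resources)
386    normal  test.sh  testuser1  R       0:05      1 slurm2411c1
"""


def parse_squeue(text: str) -> list[dict[str, str]]:
    """Turn default squeue output into one dict per job, keyed by the header names."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def summarize(jobs: list[dict[str, str]]) -> dict[str, list[int]]:
    """Group job IDs by state; pending jobs are further split by their reason."""
    groups: dict[str, list[int]] = defaultdict(list)
    for job in jobs:
        key = job["ST"] if job["ST"] != "PD" else f"PD {job['NODELIST(REASON)']}"
        groups[key].append(int(job["JOBID"]))
    return dict(groups)


print(summarize(parse_squeue(SQUEUE)))
# {'PD (Priority)': [391, 390, 389, 388], 'PD (Resources)': [387], 'R': [386]}
```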

Next, submit a job to the high partition.

bash

(base) testuser1@slurm2411c1:/work/temp$ sbatch high.sh 
Submitted batch job 392

(base) testuser1@slurm2411c1:/work/temp$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
392      high  high.sh  testuser1  R       0:01      1 slurm2411c1
391    normal  test.sh  testuser1 PD       0:00      1 (Priority)
390    normal  test.sh  testuser1 PD       0:00      1 (Priority)
389    normal  test.sh  testuser1 PD       0:00      1 (Priority)
388    normal  test.sh  testuser1 PD       0:00      1 (Priority)
387    normal  test.sh  testuser1 PD       0:00      1 (Resources)
386    normal  test.sh  testuser1  S       0:11      1 slurm2411c1

Submitting a job to the high partition suspended JOBID 386 (ST is S). JOBID 392, submitted to high, started running.

bash

(base) testuser1@slurm2411c1:/work/temp$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
391    normal  test.sh  testuser1 PD       0:00      1 (Priority)
390    normal  test.sh  testuser1 PD       0:00      1 (Priority)
389    normal  test.sh  testuser1 PD       0:00      1 (Priority)
388    normal  test.sh  testuser1 PD       0:00      1 (Priority)
387    normal  test.sh  testuser1 PD       0:00      1 (Resources)
386    normal  test.sh  testuser1  R       0:03      1 slurm2411c1

When JOBID 392 from the high partition completes, the suspended JOBID 386 resumes.
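Comparing consecutive squeue snapshots makes it easy to see job 386 go from R to S and back to R. The helper below is a hypothetical sketch that reads only the JOBID (column 0) and ST (column 4) fields of default squeue output. A "-" marks a job that does not appear in a snapshot. The three snapshots are shortened copies of the outputs shown above.

```python
def states(text: str) -> dict[int, str]:
    """Map JOBID -> ST from default squeue output (header on the first line)."""
    rows = [line.split() for line in text.splitlines()[1:] if line.strip()]
    return {int(r[0]): r[4] for r in rows}  # column 0 = JOBID, column 4 = ST


def transitions(before: str, after: str) -> list[str]:
    """List jobs whose state changed; '-' means the job is not in that snapshot."""
    b, a = states(before), states(after)
    changes = []
    for job_id in sorted(b.keys() | a.keys()):
        old, new = b.get(job_id, "-"), a.get(job_id, "-")
        if old != new:
            changes.append(f"{job_id}: {old} -> {new}")
    return changes


# Shortened snapshots taken from the squeue output in this article
BEFORE = """\
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
387    normal  test.sh  testuser1 PD       0:00      1 (Resources)
386    normal  test.sh  testuser1  R       0:05      1 slurm2411c1
"""
PREEMPTED = """\
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
392      high  high.sh  testuser1  R       0:01      1 slurm2411c1
387    normal  test.sh  testuser1 PD       0:00      1 (Resources)
386    normal  test.sh  testuser1  S       0:11      1 slurm2411c1
"""
RESUMED = """\
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
387    normal  test.sh  testuser1 PD       0:00      1 (Resources)
386    normal  test.sh  testuser1  R       0:03      1 slurm2411c1
"""
print(transitions(BEFORE, PREEMPTED))   # ['386: R -> S', '392: - -> R']
print(transitions(PREEMPTED, RESUMED))  # ['386: S -> R', '392: R -> -']
```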

Conclusion

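Because select/linear only manages resources at the moment a job is allocated (it hands out whole nodes), preemption was easy to set up. In real operation, however, there are two problems. Splitting users across multiple partitions causes trouble (everyone ends up using the high partition). And without resource management, a runaway program or unexpectedly heavy resource usage can harm other jobs. Resource management is essentially mandatory in production, so I will write up those test results in a later post.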
