
Real-Time group scheduling

Posted at 2020-05-11

This text was originally part of the Linux kernel source tree, so it should fall under GPLv2 (as far as I understand).

https://www.kernel.org/doc/html/latest/index.html

Licensing documentation

The following describes the license of the Linux kernel source code (GPLv2), how to properly mark the license of individual files in the source tree, as well as links to the full license text.

https://www.kernel.org/doc/html/latest/process/license-rules.html#kernel-licensing


0. WARNING

Fiddling with these settings can result in an unstable system. The knobs are root-only and assume that root knows what they are doing.

Most notable:

Very small values in sched_rt_period_us can result in an unstable system when the period is smaller than either the available hrtimer resolution or the time it takes to handle the budget refresh itself.

Very small values in sched_rt_runtime_us can result in an unstable system when the runtime is so small that the system has difficulty making forward progress (NOTE: the migration thread and kstopmachine are both real-time processes).

1. Overview

1.1 The problem

Realtime scheduling is all about determinism: a group has to be able to rely on the amount of bandwidth (e.g. CPU time) being constant.

In order to schedule multiple groups of realtime tasks, each group must be assigned a fixed portion of the available CPU time.

Without a minimum guarantee, a realtime group can obviously fall short.

A fuzzy upper limit is of no use since it cannot be relied upon, which leaves us with just the single fixed portion.

1.2 The solution

CPU time is divided by specifying how much time can be spent running in a given period.

We allocate this "run time" to each realtime group, and the other realtime groups are not permitted to use it.

Any time not allocated to a realtime group will be used to run normal priority tasks (SCHED_OTHER).

Any allocated run time that is not used will also be picked up by SCHED_OTHER.

Let's consider an example:

A fixed-frame realtime renderer must deliver 25 frames a second, which yields a period of 0.04s per frame.

Now say it will also have to play some music and respond to input, leaving it with around 80% CPU time dedicated to the graphics.

We can then give this group a run time of 0.8 * 0.04s = 0.032s.

This way the graphics group will have a 0.04s period with a 0.032s run time limit.

Now if the audio thread needs to refill the DMA buffer every 0.005s, but needs only about 3% CPU time to do so, it can make do with a run time of 0.03 * 0.005s = 0.00015s.

So this group can be scheduled with a period of 0.005s and a run time of 0.00015s.

The remaining CPU time will be used for user input and other tasks.

Because the realtime tasks have been explicitly allocated the CPU time they need to perform their work, buffer underruns in the graphics or audio can be eliminated.
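The arithmetic in the renderer example can be sketched in Python as follows. The helper name `run_time_us`, the use of integer microseconds, and the whole-number percentages are assumptions made for illustration; none of them is part of the kernel interface.

```python
def run_time_us(period_us: int, cpu_percent: int) -> int:
    """Return the run time (in us) that gives cpu_percent of each period."""
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    if not 0 <= cpu_percent <= 100:
        raise ValueError("cpu_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding noise.
    return period_us * cpu_percent // 100


# Graphics group: 25 frames per second -> 40000us period, 80% of the CPU.
graphics_period_us = 1_000_000 // 25
graphics_runtime_us = run_time_us(graphics_period_us, 80)

# Audio group: DMA buffer refill every 5000us, about 3% of the CPU.
audio_period_us = 5_000
audio_runtime_us = run_time_us(audio_period_us, 3)

print(graphics_period_us, graphics_runtime_us)  # 40000 32000
print(audio_period_us, audio_runtime_us)        # 5000 150
```

The results match the figures in the text: 32000us is 0.032s and 150us is 0.00015s.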

NOTE: the above example is not fully implemented yet. We still lack an EDF scheduler to make non-uniform periods usable.

2. The Interface

2.1 System wide settings

The system-wide settings are configured under the /proc virtual file system:

/proc/sys/kernel/sched_rt_period_us:

The scheduling period that is equivalent to 100% CPU bandwidth.

/proc/sys/kernel/sched_rt_runtime_us:

A global limit on how much time realtime scheduling may use.

Even without CONFIG_RT_GROUP_SCHED enabled, this will limit the time reserved for realtime processes.

With CONFIG_RT_GROUP_SCHED enabled, it signifies the total bandwidth available to all realtime groups.

Time is specified in us because the interface is s32. This gives an operating range from 1us to about 35 minutes.

sched_rt_period_us takes values from 1 to INT_MAX.
sched_rt_runtime_us takes values from -1 to (INT_MAX - 1).
A run time of -1 specifies runtime == period, i.e. no limit.
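A minimal sketch of the value ranges quoted above, in Python. The function names are hypothetical, and the checks only mirror the ranges stated in this text; the kernel may apply further validation that is not modelled here.

```python
INT_MAX = 2**31 - 1  # largest value of a signed 32-bit integer (s32)


def in_documented_range(period_us: int, runtime_us: int) -> bool:
    """Return True if both values fall within the ranges quoted in the text."""
    period_ok = 1 <= period_us <= INT_MAX
    runtime_ok = -1 <= runtime_us <= INT_MAX - 1
    return period_ok and runtime_ok


def effective_runtime_us(period_us: int, runtime_us: int) -> int:
    """A run time of -1 means runtime == period, i.e. no limit."""
    return period_us if runtime_us == -1 else runtime_us


# Upper end of the operating range, converted from microseconds to minutes.
max_minutes = INT_MAX / 1_000_000 / 60
print(round(max_minutes, 1))  # 35.8
```

This is where the "about 35 minutes" figure comes from: 2^31 - 1 microseconds is roughly 2147 seconds.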

2.2 Default behaviour

The default values are sched_rt_period_us = 1000000 (1s) and sched_rt_runtime_us = 950000 (0.95s).

This leaves 0.05s of every period to be used by SCHED_OTHER (non-RT tasks).
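The share left over for non-realtime tasks can be worked out from the two settings. The sketch below is an illustration only: `read_rt_settings` and `non_rt_share` are hypothetical helper names, and the directory argument exists so that the reader function can be pointed at any directory containing the two files.

```python
from pathlib import Path


def read_rt_settings(sysctl_dir="/proc/sys/kernel"):
    """Read (period_us, runtime_us) from the two sysctl files."""
    base = Path(sysctl_dir)
    period_us = int((base / "sched_rt_period_us").read_text())
    runtime_us = int((base / "sched_rt_runtime_us").read_text())
    return period_us, runtime_us


def non_rt_share(period_us: int, runtime_us: int) -> float:
    """Fraction of each period that realtime tasks cannot consume."""
    if runtime_us == -1:
        # runtime == period: nothing is held back for non-RT tasks.
        return 0.0
    return (period_us - runtime_us) / period_us


# With the default values quoted in the text:
print(non_rt_share(1_000_000, 950_000))  # 0.05
```

On a Linux machine, `non_rt_share(*read_rt_settings())` would report the share for the currently configured values.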

These defaults were chosen so that a run-away realtime task will not lock up the machine but will leave a little time to recover it.

By setting runtime to -1 you'd get the old behaviour back.

By default all bandwidth is assigned to the root group, and new groups get the period from /proc/sys/kernel/sched_rt_period_us and a run time of 0.

If you want to assign bandwidth to another group, reduce the root group's bandwidth and assign some or all of the difference to another group.

Realtime group scheduling means you have to assign a portion of the total CPU bandwidth to a group before it will accept realtime tasks.

Therefore you will not be able to run realtime tasks as any user other than root until you have done that, even if the user has the rights to run processes with realtime priority!

2.3 Basis for grouping tasks

Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real CPU bandwidth to task groups.

This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us" to control the CPU time reserved for each control group.

For more information on working with control groups, you should read Documentation/admin-guide/cgroup-v1/cgroups.rst as well.
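As a rough sketch, setting a group's run time amounts to writing a number into the "cpu.rt_runtime_us" file inside that group's cgroup directory. The helper names below are hypothetical, the location of the cgroup directory depends on how the cgroup file system is mounted on a given machine (an assumption left to the caller), and this sketch only accepts non-negative values.

```python
from pathlib import Path


def set_group_rt_runtime(cgroup_dir, runtime_us: int) -> None:
    """Write runtime_us into <cgroup>/cpu.rt_runtime_us."""
    if runtime_us < 0:
        raise ValueError("this sketch only accepts non-negative run times")
    (Path(cgroup_dir) / "cpu.rt_runtime_us").write_text(f"{runtime_us}\n")


def get_group_rt_runtime(cgroup_dir) -> int:
    """Read the current value of <cgroup>/cpu.rt_runtime_us."""
    return int((Path(cgroup_dir) / "cpu.rt_runtime_us").read_text())
```

On a real system the write requires root, and the kernel can reject a value that would break the limits on group settings.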

Group settings are checked against the following limits in order to keep the configuration schedulable:

Sum_{i} runtime_{i} / global_period <= global_runtime / global_period

For now, this can be simplified to just the following (but see Future plans):

Sum_{i} runtime_{i} <= global_runtime
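The simplified limit can be written as a one-line check. This is a sketch of the inequality only, not of how the kernel walks the group hierarchy; the function name and the sample run times are made up for illustration.

```python
def is_schedulable(group_runtimes_us, global_runtime_us: int) -> bool:
    """Check Sum_{i} runtime_{i} <= global_runtime."""
    return sum(group_runtimes_us) <= global_runtime_us


GLOBAL_RUNTIME_US = 950_000  # default sched_rt_runtime_us from section 2.2

print(is_schedulable([500_000, 300_000], GLOBAL_RUNTIME_US))  # True
print(is_schedulable([500_000, 500_000], GLOBAL_RUNTIME_US))  # False
```

Because every group currently shares the global period, dividing both sides by that period (the full form of the limit) does not change the outcome.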

3. Future plans

There is work in progress to make the scheduling period for each group ("<cgroup>/cpu.rt_period_us") configurable as well.

The constraint on the period is that a subgroup must have a smaller or equal period to its parent.

But realistically it's not very useful yet, as it's prone to starvation without deadline scheduling.

Consider two sibling groups A and B; both have 50% bandwidth, but A's period is twice the length of B's.

group A: period=100000us, runtime=50000us
this runs for 0.05s once every 0.1s

group B: period=50000us, runtime=25000us
this runs for 0.025s twice every 0.1s (or once every 0.05s)

This means that currently a while (1) loop in A will run for the full period of B and can starve B's tasks (assuming they are of lower priority) for a whole period.
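The starvation can be reproduced with a toy model in Python. This is not the kernel's algorithm: it assumes a single CPU, a 1000us tick, both groups always runnable, and A's tasks always preferred over B's as long as A has budget left. All names are invented for the sketch.

```python
def simulate_fixed_priority(duration_us=100_000, tick_us=1_000):
    """Return the time B ran in each of its periods, with A always preferred."""
    groups = {
        "A": {"period": 100_000, "runtime": 50_000, "budget": 0},
        "B": {"period": 50_000, "runtime": 25_000, "budget": 0},
    }
    b_ran_per_period = []
    for now in range(0, duration_us, tick_us):
        # Refill each group's budget at the start of its period.
        for group in groups.values():
            if now % group["period"] == 0:
                group["budget"] = group["runtime"]
        if now % groups["B"]["period"] == 0:
            b_ran_per_period.append(0)
        # A spins in while (1), so it runs whenever it has budget left.
        if groups["A"]["budget"] >= tick_us:
            groups["A"]["budget"] -= tick_us
        elif groups["B"]["budget"] >= tick_us:
            groups["B"]["budget"] -= tick_us
            b_ran_per_period[-1] += tick_us
    return b_ran_per_period


print(simulate_fixed_priority())  # [0, 25000]
```

In this model B gets no CPU time at all during its first 0.05s period, because A spends its whole 0.05s budget there; B only runs in its second period.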

The next project will be SCHED_EDF (Earliest Deadline First scheduling) to bring full deadline scheduling to the Linux kernel.

Deadline scheduling the above groups, treating the end of each period as a deadline, will ensure that they both get their allocated time.
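The effect of treating the end of each period as a deadline can be illustrated with a small earliest-deadline-first model of groups A and B. It is a toy under stated assumptions (single CPU, 1000us tick, both groups always runnable, ties broken by group name) and does not represent any kernel implementation; the function and variable names are hypothetical.

```python
def simulate_edf(duration_us=100_000, tick_us=1_000):
    """Return the time each group ran in each of its own periods under EDF."""
    groups = {
        "A": {"period": 100_000, "runtime": 50_000, "budget": 0, "deadline": 0},
        "B": {"period": 50_000, "runtime": 25_000, "budget": 0, "deadline": 0},
    }
    ran = {"A": [], "B": []}
    for now in range(0, duration_us, tick_us):
        for name, group in groups.items():
            if now % group["period"] == 0:
                # New period: refill the budget; the period end is the deadline.
                group["budget"] = group["runtime"]
                group["deadline"] = now + group["period"]
                ran[name].append(0)
        # Among groups with budget left, run the one with the earliest deadline.
        ready = [(group["deadline"], name)
                 for name, group in groups.items()
                 if group["budget"] >= tick_us]
        if ready:
            _, name = min(ready)
            groups[name]["budget"] -= tick_us
            ran[name][-1] += tick_us
    return ran


print(simulate_edf())  # {'A': [50000], 'B': [25000, 25000]}
```

Here B runs first in its opening period because its deadline (0.05s) is earlier than A's (0.1s), and both groups receive their full allocation.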

Implementing SCHED_EDF might take a while to complete.

Priority Inheritance is the biggest challenge, as the current Linux PI infrastructure is geared towards the limited static priority levels 0-99.

With deadline scheduling you need to do deadline inheritance (since priority is inversely proportional to the deadline delta (deadline - now)).

This means the whole PI machinery will have to be reworked - and that is one of the most complex pieces of code we have.
