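1. Introduction
Inspired by the article below, I tried a few things on the devices I had lying around.
This is a light, in-between article based on some spur-of-the-moment digging.
Please read it in a "huh, is that so" frame of mind.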
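2. About latency_measure
latency_measure is one of the test tools that ships with Zephyr. It benchmarks the real-time performance of Zephyr running on a given microcontroller.
It is probably aimed mainly at Zephyr developers who are looking for things to improve or measuring the impact of a change. It looks useful from a user's standpoint too: for gauging how capable a microcontroller is and how mature its port is, or as one input when deciding whether to upgrade Zephyr on a system that is already deployed.
The environment was as follows:
Zephyr : v4.3.0-3523-g588d22464d13
Zephyr-SDK : zephyr 0.17.0 (a little old)
The build command is:
west build -p -b ${BOARD_NAME} zephyr/tests/benchmarks/latency_measure/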
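When the same benchmark has to be built for several boards, the command above can be generated in a loop. The following is only a minimal sketch: the board identifiers in the example call and the `build/<board>` directory layout are assumptions made for illustration, and `-d` is west's build-directory option. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import Iterable, List, Optional

# Test path taken from the build command above.
TEST_DIR = "zephyr/tests/benchmarks/latency_measure/"


def west_build_cmd(board: str, build_dir: Optional[str] = None) -> List[str]:
    """Return the argv for a pristine latency_measure build for one board."""
    cmd = ["west", "build", "-p", "-b", board]
    if build_dir is not None:
        # Separate build directory per board (an assumption, not from the article).
        cmd += ["-d", build_dir]
    cmd.append(TEST_DIR)
    return cmd


def build_all(boards: Iterable[str], dry_run: bool = True) -> None:
    """Print (and optionally run) one build command per board."""
    for board in boards:
        cmd = west_build_cmd(board, build_dir=f"build/{board}")
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Board identifiers here are illustrative; use the ones your Zephyr tree defines.
    build_all(["rpi_pico", "nucleo_f401re"])
```

Passing `dry_run=False` runs each build in turn and stops at the first failure because of `check=True`.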
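3. Benchmark results
3.1. Devices measured
The headliner is the Teensy 4.1. For now, the devices are split into a first tier and a second tier by clock speed.
(The Teensy 4.1 build had CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC=600000000, so I take it to be running at 600 MHz.)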
First tier:
| Product | Core | CPU Clock | RAM | ROM | FPU |
|---|---|---|---|---|---|
| Teensy 4.1 (iMXRT1062) | Cortex-M7 | 600 MHz | 1 MB | 8 MB | yes |
| STM32H747i disco | Cortex-M7 | 480 MHz | 1 MB | 2 MB | yes |
| Nucleo G431KB | Cortex-M4F | 170 MHz | 32 KB | 128 KB | yes |
| RPi pico2 (RP2350) | Cortex-M33 | 150 MHz | 520 KB | 4 MB | yes |
Second tier:
| Product | Core | CPU Clock | RAM | ROM | FPU |
|---|---|---|---|---|---|
| RPi pico (RP2040) | Cortex-M0+ | 133 MHz | 264 KB | 2 MB | no |
| Nucleo F401RE | Cortex-M4F | 84 MHz | 96 KB | 512 KB | yes |
| microbit v2.2 (nRF52833) | Cortex-M4F | 64 MHz | 128 KB | 512 KB | yes |
| TY51822r3 (nRF51822) | Cortex-M0 | 16 MHz | 16 KB | 256 KB | no |
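The CPU Clock column can be cross-checked against the raw logs in section 5. Each log line reports the same interval in both cycles and ns, so cycles / ns gives the frequency of the counter the benchmark used. The sketch below does this for the preemptive "Context switch via k_yield" line of each board. It assumes that the counter runs at the CPU clock, which the logs alone do not confirm, and the ns values are rounded, so the results are approximate.

```python
# (cycles, ns) copied from the thread.yield.preemptive.ctx.k_to_k lines in section 5.
# The TY51822r3 is omitted because it reported 0 cycles / 0 ns for this item.
SAMPLES = {
    "Teensy 4.1": (195, 325),
    "STM32H747i": (193, 484),
    "Nucleo G431KB": (201, 1183),
    "rpi_pico2": (171, 1146),
    "rpi_pico": (270, 2163),
    "Nucleo F401RE": (205, 2445),
    "microbit v2.2": (229, 3578),
}


def implied_mhz(cycles: int, ns: int) -> float:
    """Counter frequency in MHz implied by one (cycles, ns) measurement."""
    return cycles / ns * 1000.0


if __name__ == "__main__":
    for name, (cycles, ns) in SAMPLES.items():
        print(f"{name:14s} {implied_mhz(cycles, ns):6.1f} MHz")
```

Most boards land on the figure in the table. The STM32H747i and rpi_pico come out near 400 MHz and 125 MHz rather than 480 MHz and 133 MHz, which suggests that the builds measured here ran those two boards below the listed clock. That reading is an inference from the logs and was not checked against the board configuration.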
3.2. Results for key items
First-tier results (all values in ns). A few of these numbers look odd…
| Item | Teensy4.1 | STM32H747i | Nucleo G431KB | rpi_pico2 |
|---|---|---|---|---|
| Context Switch | 325 | 484 | 1183 | 1146 |
| ISR Resume | 2322 | 554 | 1547 | 2942 |
| Suspend thread | 8507 | 536 | 1446 | 1080 |
| Mutex Lock | 166 | 185 | 482 | 472 |
| Semaphore Take | 528 | 846 | 2888 | 1808 |
| Heap Malloc | 508 | 1242 | 2458 | 4205 |
| Thread Create | 352 | 542 | 2358 | 1511 |
Second-tier results (all values in ns). On the TY51822r3 (nRF51822), Context Switch was reported as 0 cycles / 0 ns, so no valid value was obtained.
| Item | rpi_pico | Nucleo F401RE | microbit v2.2 | TY51822r3 |
|---|---|---|---|---|
| Context Switch | 2163 | 2445 | 3578 | --- |
| ISR Resume | 5101 | 2451 | 3631 | 23187 |
| Suspend thread | 2819 | 2397 | 3390 | 25125 |
| Mutex Lock | 737 | 994 | 1547 | 2348 |
| Semaphore Take | 3206 | 4139 | 5860 | 33751 |
| Heap Malloc | 8823 | 4052 | 5712 | 37625 |
| Thread Create | 2871 | 3659 | 5014 | 30500 |
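Each row in the tables above corresponds to one line of the raw output in section 5, so the summary can be pulled out of a captured log mechanically. The following is a minimal sketch. The mapping from table rows to test names in `KEY_ITEMS` is inferred by matching the table values against the logs, and the regular expression assumes the `name - description : N cycles , M ns :` layout seen in those logs.

```python
import re

# Table row label -> latency_measure test name (inferred from the values, see above).
KEY_ITEMS = {
    "Context Switch": "thread.yield.preemptive.ctx.k_to_k",
    "ISR Resume": "isr.resume.interrupted.thread.kernel",
    "Suspend thread": "thread.suspend.kernel.from.kernel",
    "Mutex Lock": "mutex.lock.immediate.recursive.kernel",
    "Semaphore Take": "semaphore.take.blocking.k_to_k",
    "Heap Malloc": "heap.malloc.immediate",
    "Thread Create": "thread.create.kernel.from.kernel",
}

LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+-\s+(?P<desc>.+?)\s*:\s*"
    r"(?P<cycles>\d+)\s+cycles\s*,\s*(?P<ns>\d+)\s+ns\s*:?\s*$"
)


def parse_log(text: str) -> dict:
    """Map each test name to (cycles, ns); lines that are not results are skipped."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results[match["name"]] = (int(match["cycles"]), int(match["ns"]))
    return results


def summarize(text: str) -> dict:
    """Return the ns value for every key item present in the log."""
    parsed = parse_log(text)
    return {label: parsed[name][1] for label, name in KEY_ITEMS.items() if name in parsed}


SAMPLE = """\
*** Booting Zephyr OS build v4.3.0-3523-g588d22464d13 ***
thread.yield.preemptive.ctx.k_to_k - Context switch via k_yield : 195 cycles , 325 ns :
thread.suspend.kernel.from.kernel - Suspend thread : 5104 cycles , 8507 ns :
mutex.lock.immediate.recursive.kernel - Lock a mutex : 99 cycles , 166 ns :
"""

if __name__ == "__main__":
    for label, ns in summarize(SAMPLE).items():
        print(f"{label}: {ns} ns")
```

Generating the table from the logs this way avoids copying a value from a neighbouring line or column by hand.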
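4. Impressions
The oddly high ISR Resume and Suspend thread values on the Teensy 4.1 are what catch my eye.
Could the following be related?
This looks like a can of worms, and since this is meant to be a light article, I'm not going to dig any deeper 🙏
(JTAG / SWD is effectively unusable on this board in the first place, which is rough for those of us spoiled by modern debuggers.)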
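One way to see how far the Teensy 4.1 numbers deviate is to compare cycle counts instead of ns, which removes the clock difference, and to use the STM32H747i as a reference since it is also a Cortex-M7. The sketch below uses a handful of cycle counts copied from the logs in section 5. The 2.0 threshold is arbitrary and chosen only for illustration.

```python
# test name -> (Teensy 4.1 cycles, STM32H747i cycles), copied from the section 5 logs.
CYCLES = {
    "thread.yield.preemptive.ctx.k_to_k": (195, 193),
    "isr.resume.interrupted.thread.kernel": (1393, 221),
    "isr.resume.different.thread.kernel": (291, 301),
    "thread.suspend.kernel.from.kernel": (5104, 214),
    "thread.resume.kernel.from.kernel": (241, 249),
    "condvar.signal.wake+ctx.k_to_k": (1151, 535),
}


def outliers(pairs: dict, threshold: float = 2.0) -> dict:
    """Return tests where the first board needs >= threshold times the cycles of the second."""
    flagged = {}
    for name, (first, second) in pairs.items():
        ratio = first / second
        if ratio >= threshold:
            flagged[name] = ratio
    return flagged


if __name__ == "__main__":
    for name, ratio in outliers(CYCLES).items():
        print(f"{name}: {ratio:.1f}x")
```

With these inputs the context switch and resume items are within a few percent of each other, while `isr.resume.interrupted.thread.kernel`, `thread.suspend.kernel.from.kernel`, and `condvar.signal.wake+ctx.k_to_k` are flagged.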
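Setting the Teensy 4.1 aside, a rough comparison as things stand today gives something like the following ranking.
RPi4B >= STM32H747i >> pico2 >= Nucleo G431KB > pico >= Nucleo F401RE
Apart from the ISR Resume / Suspend thread results mentioned above, the Teensy 4.1 (single-core though it is) performs at a level that could fairly be called RPi4B-class, so I'm hoping for improvements down the line.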
5. Detailed results
The full output for every device is pasted below.
Teensy4.1 results
*** Booting Zephyr OS build v4.3.0-3523-g588d22464d13 ***
thread.yield.preemptive.ctx.k_to_k - Context switch via k_yield : 195 cycles , 325 ns :
thread.yield.cooperative.ctx.k_to_k - Context switch via k_yield : 191 cycles , 319 ns :
isr.resume.interrupted.thread.kernel - Return from ISR to interrupted thread : 1393 cycles , 2322 ns :
isr.resume.different.thread.kernel - Return from ISR to another thread : 291 cycles , 485 ns :
thread.create.kernel.from.kernel - Create thread : 211 cycles , 352 ns :
thread.start.kernel.from.kernel - Start thread : 311 cycles , 519 ns :
thread.suspend.kernel.from.kernel - Suspend thread : 5104 cycles , 8507 ns :
thread.resume.kernel.from.kernel - Resume thread : 241 cycles , 401 ns :
thread.abort.kernel.from.kernel - Abort thread : 264 cycles , 441 ns :
fifo.put.immediate.kernel - Add data to FIFO (no ctx switch) : 106 cycles , 178 ns :
fifo.get.immediate.kernel - Get data from FIFO (no ctx switch) : 108 cycles , 181 ns :
fifo.put.alloc.immediate.kernel - Allocate to add data to FIFO (no ctx switch) : 496 cycles , 826 ns :
fifo.get.free.immediate.kernel - Free when getting data from FIFO (no ctx switch) : 425 cycles , 709 ns :
fifo.get.blocking.k_to_k - Get data from FIFO (w/ ctx switch) : 356 cycles , 594 ns :
fifo.put.wake+ctx.k_to_k - Add data to FIFO (w/ ctx switch) : 439 cycles , 733 ns :
fifo.get.free.blocking.k_to_k - Free when getting data from FIFO (w/ ctx switch) : 344 cycles , 573 ns :
fifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to FIFO (w/ ctx switch) : 436 cycles , 727 ns :
lifo.put.immediate.kernel - Add data to LIFO (no ctx switch) : 108 cycles , 181 ns :
lifo.get.immediate.kernel - Get data from LIFO (no ctx switch) : 106 cycles , 177 ns :
lifo.put.alloc.immediate.kernel - Allocate to add data to LIFO (no ctx switch) : 490 cycles , 818 ns :
lifo.get.free.immediate.kernel - Free when getting data from LIFO (no ctx switch) : 415 cycles , 691 ns :
lifo.get.blocking.k_to_k - Get data from LIFO (w/ ctx switch) : 345 cycles , 576 ns :
lifo.put.wake+ctx.k_to_k - Add data to LIFO (w/ ctx switch) : 441 cycles , 736 ns :
lifo.get.free.blocking.k_to_k - Free when getting data from LIFO (w/ ctx switch) : 343 cycles , 572 ns :
lifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to LIFO (w/ ctx switch) : 431 cycles , 719 ns :
events.post.immediate.kernel - Post events (nothing wakes) : 159 cycles , 265 ns :
events.set.immediate.kernel - Set events (nothing wakes) : 155 cycles , 258 ns :
events.wait.immediate.kernel - Wait for any events (no ctx switch) : 107 cycles , 178 ns :
events.wait_all.immediate.kernel - Wait for all events (no ctx switch) : 108 cycles , 181 ns :
events.wait.blocking.k_to_k - Wait for any events (w/ ctx switch) : 387 cycles , 646 ns :
events.set.wake+ctx.k_to_k - Set events (w/ ctx switch) : 618 cycles , 1031 ns :
events.wait_all.blocking.k_to_k - Wait for all events (w/ ctx switch) : 379 cycles , 631 ns :
events.post.wake+ctx.k_to_k - Post events (w/ ctx switch) : 604 cycles , 1007 ns :
semaphore.give.immediate.kernel - Give a semaphore (no waiters) : 61 cycles , 102 ns :
semaphore.take.immediate.kernel - Take a semaphore (no blocking) : 66 cycles , 110 ns :
semaphore.take.blocking.k_to_k - Take a semaphore (context switch) : 317 cycles , 528 ns :
semaphore.give.wake+ctx.k_to_k - Give a semaphore (context switch) : 380 cycles , 633 ns :
condvar.wait.blocking.k_to_k - Wait for a condvar (context switch) : 428 cycles , 713 ns :
condvar.signal.wake+ctx.k_to_k - Signal a condvar (context switch) : 1151 cycles , 1919 ns :
stack.push.immediate.kernel - Add data to k_stack (no ctx switch) : 83 cycles , 139 ns :
stack.pop.immediate.kernel - Get data from k_stack (no ctx switch) : 86 cycles , 143 ns :
stack.pop.blocking.k_to_k - Get data from k_stack (w/ ctx switch) : 335 cycles , 558 ns :
stack.push.wake+ctx.k_to_k - Add data to k_stack (w/ ctx switch) : 399 cycles , 665 ns :
mutex.lock.immediate.recursive.kernel - Lock a mutex : 99 cycles , 166 ns :
mutex.unlock.immediate.recursive.kernel - Unlock a mutex : 29 cycles , 48 ns :
heap.malloc.immediate - Average time for heap malloc : 304 cycles , 508 ns :
heap.free.immediate - Average time for heap free : 290 cycles , 484 ns :
===================================================================
PROJECT EXECUTION SUCCESSFUL
STM32H747i results
*** Booting Zephyr OS build v4.3.0-3523-g588d22464d13 ***
thread.yield.preemptive.ctx.k_to_k - Context switch via k_yield : 193 cycles , 484 ns :
thread.yield.cooperative.ctx.k_to_k - Context switch via k_yield : 193 cycles , 484 ns :
isr.resume.interrupted.thread.kernel - Return from ISR to interrupted thread : 221 cycles , 554 ns :
isr.resume.different.thread.kernel - Return from ISR to another thread : 301 cycles , 753 ns :
thread.create.kernel.from.kernel - Create thread : 217 cycles , 542 ns :
thread.start.kernel.from.kernel - Start thread : 314 cycles , 786 ns :
thread.suspend.kernel.from.kernel - Suspend thread : 214 cycles , 536 ns :
thread.resume.kernel.from.kernel - Resume thread : 249 cycles , 622 ns :
thread.abort.kernel.from.kernel - Abort thread : 278 cycles , 696 ns :
fifo.put.immediate.kernel - Add data to FIFO (no ctx switch) : 119 cycles , 297 ns :
fifo.get.immediate.kernel - Get data from FIFO (no ctx switch) : 127 cycles , 319 ns :
fifo.put.alloc.immediate.kernel - Allocate to add data to FIFO (no ctx switch) : 679 cycles , 1698 ns :
fifo.get.free.immediate.kernel - Free when getting data from FIFO (no ctx switch) : 644 cycles , 1611 ns :
fifo.get.blocking.k_to_k - Get data from FIFO (w/ ctx switch) : 346 cycles , 866 ns :
fifo.put.wake+ctx.k_to_k - Add data to FIFO (w/ ctx switch) : 469 cycles , 1174 ns :
fifo.get.free.blocking.k_to_k - Free when getting data from FIFO (w/ ctx switch) : 352 cycles , 882 ns :
fifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to FIFO (w/ ctx switch) : 473 cycles , 1184 ns :
lifo.put.immediate.kernel - Add data to LIFO (no ctx switch) : 110 cycles , 275 ns :
lifo.get.immediate.kernel - Get data from LIFO (no ctx switch) : 117 cycles , 294 ns :
lifo.put.alloc.immediate.kernel - Allocate to add data to LIFO (no ctx switch) : 669 cycles , 1673 ns :
lifo.get.free.immediate.kernel - Free when getting data from LIFO (no ctx switch) : 644 cycles , 1610 ns :
lifo.get.blocking.k_to_k - Get data from LIFO (w/ ctx switch) : 358 cycles , 896 ns :
lifo.put.wake+ctx.k_to_k - Add data to LIFO (w/ ctx switch) : 494 cycles , 1237 ns :
lifo.get.free.blocking.k_to_k - Free when getting data from LIFO (w/ ctx switch) : 362 cycles , 905 ns :
lifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to LIFO (w/ ctx switch) : 477 cycles , 1192 ns :
events.post.immediate.kernel - Post events (nothing wakes) : 160 cycles , 400 ns :
events.set.immediate.kernel - Set events (nothing wakes) : 158 cycles , 395 ns :
events.wait.immediate.kernel - Wait for any events (no ctx switch) : 105 cycles , 263 ns :
events.wait_all.immediate.kernel - Wait for all events (no ctx switch) : 105 cycles , 264 ns :
events.wait.blocking.k_to_k - Wait for any events (w/ ctx switch) : 409 cycles , 1022 ns :
events.set.wake+ctx.k_to_k - Set events (w/ ctx switch) : 651 cycles , 1627 ns :
events.wait_all.blocking.k_to_k - Wait for all events (w/ ctx switch) : 432 cycles , 1081 ns :
events.post.wake+ctx.k_to_k - Post events (w/ ctx switch) : 646 cycles , 1615 ns :
semaphore.give.immediate.kernel - Give a semaphore (no waiters) : 61 cycles , 152 ns :
semaphore.take.immediate.kernel - Take a semaphore (no blocking) : 68 cycles , 170 ns :
semaphore.take.blocking.k_to_k - Take a semaphore (context switch) : 338 cycles , 846 ns :
semaphore.give.wake+ctx.k_to_k - Give a semaphore (context switch) : 418 cycles , 1046 ns :
condvar.wait.blocking.k_to_k - Wait for a condvar (context switch) : 449 cycles , 1124 ns :
condvar.signal.wake+ctx.k_to_k - Signal a condvar (context switch) : 535 cycles , 1339 ns :
stack.push.immediate.kernel - Add data to k_stack (no ctx switch) : 98 cycles , 246 ns :
stack.pop.immediate.kernel - Get data from k_stack (no ctx switch) : 99 cycles , 249 ns :
stack.pop.blocking.k_to_k - Get data from k_stack (w/ ctx switch) : 363 cycles , 907 ns :
stack.push.wake+ctx.k_to_k - Add data to k_stack (w/ ctx switch) : 443 cycles , 1108 ns :
mutex.lock.immediate.recursive.kernel - Lock a mutex : 74 cycles , 185 ns :
mutex.unlock.immediate.recursive.kernel - Unlock a mutex : 30 cycles , 75 ns :
heap.malloc.immediate - Average time for heap malloc : 497 cycles , 1242 ns :
heap.free.immediate - Average time for heap free : 532 cycles , 1330 ns :
===================================================================
PROJECT EXECUTION SUCCESSFUL
Nucleo G431KB results
*** Booting Zephyr OS build v4.3.0-3523-g588d22464d13 ***
thread.yield.preemptive.ctx.k_to_k - Context switch via k_yield : 201 cycles , 1183 ns :
thread.yield.cooperative.ctx.k_to_k - Context switch via k_yield : 201 cycles , 1183 ns :
isr.resume.interrupted.thread.kernel - Return from ISR to interrupted thread : 263 cycles , 1547 ns :
isr.resume.different.thread.kernel - Return from ISR to another thread : 375 cycles , 2211 ns :
thread.create.kernel.from.kernel - Create thread : 400 cycles , 2358 ns :
thread.start.kernel.from.kernel - Start thread : 414 cycles , 2435 ns :
thread.suspend.kernel.from.kernel - Suspend thread : 245 cycles , 1446 ns :
thread.resume.kernel.from.kernel - Resume thread : 296 cycles , 1746 ns :
thread.abort.kernel.from.kernel - Abort thread : 369 cycles , 2174 ns :
fifo.put.immediate.kernel - Add data to FIFO (no ctx switch) : 111 cycles , 653 ns :
fifo.get.immediate.kernel - Get data from FIFO (no ctx switch) : 94 cycles , 553 ns :
fifo.put.alloc.immediate.kernel - Allocate to add data to FIFO (no ctx switch) : 722 cycles , 4251 ns :
fifo.get.free.immediate.kernel - Free when getting data from FIFO (no ctx switch) : 582 cycles , 3423 ns :
fifo.get.blocking.k_to_k - Get data from FIFO (w/ ctx switch) : 509 cycles , 2994 ns :
fifo.put.wake+ctx.k_to_k - Add data to FIFO (w/ ctx switch) : 613 cycles , 3605 ns :
fifo.get.free.blocking.k_to_k - Free when getting data from FIFO (w/ ctx switch) : 512 cycles , 3012 ns :
fifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to FIFO (w/ ctx switch) : 610 cycles , 3592 ns :
lifo.put.immediate.kernel - Add data to LIFO (no ctx switch) : 110 cycles , 647 ns :
lifo.get.immediate.kernel - Get data from LIFO (no ctx switch) : 94 cycles , 553 ns :
lifo.put.alloc.immediate.kernel - Allocate to add data to LIFO (no ctx switch) : 709 cycles , 4174 ns :
lifo.get.free.immediate.kernel - Free when getting data from LIFO (no ctx switch) : 573 cycles , 3370 ns :
lifo.get.blocking.k_to_k - Get data from LIFO (w/ ctx switch) : 504 cycles , 2968 ns :
lifo.put.wake+ctx.k_to_k - Add data to LIFO (w/ ctx switch) : 605 cycles , 3559 ns :
lifo.get.free.blocking.k_to_k - Free when getting data from LIFO (w/ ctx switch) : 518 cycles , 3047 ns :
lifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to LIFO (w/ ctx switch) : 588 cycles , 3462 ns :
events.post.immediate.kernel - Post events (nothing wakes) : 165 cycles , 970 ns :
events.set.immediate.kernel - Set events (nothing wakes) : 164 cycles , 965 ns :
events.wait.immediate.kernel - Wait for any events (no ctx switch) : 104 cycles , 612 ns :
events.wait_all.immediate.kernel - Wait for all events (no ctx switch) : 104 cycles , 612 ns :
events.wait.blocking.k_to_k - Wait for any events (w/ ctx switch) : 583 cycles , 3435 ns :
events.set.wake+ctx.k_to_k - Set events (w/ ctx switch) : 824 cycles , 4852 ns :
events.wait_all.blocking.k_to_k - Wait for all events (w/ ctx switch) : 613 cycles , 3605 ns :
events.post.wake+ctx.k_to_k - Post events (w/ ctx switch) : 838 cycles , 4933 ns :
semaphore.give.immediate.kernel - Give a semaphore (no waiters) : 59 cycles , 347 ns :
semaphore.take.immediate.kernel - Take a semaphore (no blocking) : 69 cycles , 406 ns :
semaphore.take.blocking.k_to_k - Take a semaphore (context switch) : 491 cycles , 2888 ns :
semaphore.give.wake+ctx.k_to_k - Give a semaphore (context switch) : 501 cycles , 2947 ns :
condvar.wait.blocking.k_to_k - Wait for a condvar (context switch) : 645 cycles , 3794 ns :
condvar.signal.wake+ctx.k_to_k - Signal a condvar (context switch) : 645 cycles , 3794 ns :
stack.push.immediate.kernel - Add data to k_stack (no ctx switch) : 68 cycles , 400 ns :
stack.pop.immediate.kernel - Get data from k_stack (no ctx switch) : 71 cycles , 417 ns :
stack.pop.blocking.k_to_k - Get data from k_stack (w/ ctx switch) : 497 cycles , 2923 ns :
stack.push.wake+ctx.k_to_k - Add data to k_stack (w/ ctx switch) : 535 cycles , 3147 ns :
mutex.lock.immediate.recursive.kernel - Lock a mutex : 82 cycles , 482 ns :
mutex.unlock.immediate.recursive.kernel - Unlock a mutex : 44 cycles , 259 ns :
heap.malloc.immediate - Average time for heap malloc : 417 cycles , 2458 ns :
heap.free.immediate - Average time for heap free : 394 cycles , 2318 ns :
===================================================================
PROJECT EXECUTION SUCCESSFUL
Nucleo F401RE results
*** Booting Zephyr OS build v4.3.0-3523-g588d22464d13 ***
thread.yield.preemptive.ctx.k_to_k - Context switch via k_yield : 205 cycles , 2445 ns :
thread.yield.cooperative.ctx.k_to_k - Context switch via k_yield : 204 cycles , 2439 ns :
isr.resume.interrupted.thread.kernel - Return from ISR to interrupted thread : 205 cycles , 2451 ns :
isr.resume.different.thread.kernel - Return from ISR to another thread : 278 cycles , 3319 ns :
thread.create.kernel.from.kernel - Create thread : 307 cycles , 3659 ns :
thread.start.kernel.from.kernel - Start thread : 356 cycles , 4242 ns :
thread.suspend.kernel.from.kernel - Suspend thread : 201 cycles , 2397 ns :
thread.resume.kernel.from.kernel - Resume thread : 259 cycles , 3088 ns :
thread.abort.kernel.from.kernel - Abort thread : 238 cycles , 2843 ns :
fifo.put.immediate.kernel - Add data to FIFO (no ctx switch) : 112 cycles , 1338 ns :
fifo.get.immediate.kernel - Get data from FIFO (no ctx switch) : 94 cycles , 1124 ns :
fifo.put.alloc.immediate.kernel - Allocate to add data to FIFO (no ctx switch) : 524 cycles , 6239 ns :
fifo.get.free.immediate.kernel - Free when getting data from FIFO (no ctx switch) : 444 cycles , 5296 ns :
fifo.get.blocking.k_to_k - Get data from FIFO (w/ ctx switch) : 393 cycles , 4682 ns :
fifo.put.wake+ctx.k_to_k - Add data to FIFO (w/ ctx switch) : 458 cycles , 5457 ns :
fifo.get.free.blocking.k_to_k - Free when getting data from FIFO (w/ ctx switch) : 400 cycles , 4771 ns :
fifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to FIFO (w/ ctx switch) : 459 cycles , 5474 ns :
lifo.put.immediate.kernel - Add data to LIFO (no ctx switch) : 111 cycles , 1332 ns :
lifo.get.immediate.kernel - Get data from LIFO (no ctx switch) : 94 cycles , 1124 ns :
lifo.put.alloc.immediate.kernel - Allocate to add data to LIFO (no ctx switch) : 524 cycles , 6240 ns :
lifo.get.free.immediate.kernel - Free when getting data from LIFO (no ctx switch) : 447 cycles , 5329 ns :
lifo.get.blocking.k_to_k - Get data from LIFO (w/ ctx switch) : 392 cycles , 4667 ns :
lifo.put.wake+ctx.k_to_k - Add data to LIFO (w/ ctx switch) : 457 cycles , 5448 ns :
lifo.get.free.blocking.k_to_k - Free when getting data from LIFO (w/ ctx switch) : 395 cycles , 4707 ns :
lifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to LIFO (w/ ctx switch) : 460 cycles , 5481 ns :
events.post.immediate.kernel - Post events (nothing wakes) : 165 cycles , 1975 ns :
events.set.immediate.kernel - Set events (nothing wakes) : 164 cycles , 1958 ns :
events.wait.immediate.kernel - Wait for any events (no ctx switch) : 104 cycles , 1238 ns :
events.wait_all.immediate.kernel - Wait for all events (no ctx switch) : 104 cycles , 1244 ns :
events.wait.blocking.k_to_k - Wait for any events (w/ ctx switch) : 412 cycles , 4915 ns :
events.set.wake+ctx.k_to_k - Set events (w/ ctx switch) : 599 cycles , 7140 ns :
events.wait_all.blocking.k_to_k - Wait for all events (w/ ctx switch) : 434 cycles , 5173 ns :
events.post.wake+ctx.k_to_k - Post events (w/ ctx switch) : 624 cycles , 7430 ns :
semaphore.give.immediate.kernel - Give a semaphore (no waiters) : 59 cycles , 702 ns :
semaphore.take.immediate.kernel - Take a semaphore (no blocking) : 69 cycles , 827 ns :
semaphore.take.blocking.k_to_k - Take a semaphore (context switch) : 347 cycles , 4139 ns :
semaphore.give.wake+ctx.k_to_k - Give a semaphore (context switch) : 395 cycles , 4713 ns :
condvar.wait.blocking.k_to_k - Wait for a condvar (context switch) : 467 cycles , 5560 ns :
condvar.signal.wake+ctx.k_to_k - Signal a condvar (context switch) : 509 cycles , 6066 ns :
stack.push.immediate.kernel - Add data to k_stack (no ctx switch) : 69 cycles , 826 ns :
stack.pop.immediate.kernel - Get data from k_stack (no ctx switch) : 72 cycles , 862 ns :
stack.pop.blocking.k_to_k - Get data from k_stack (w/ ctx switch) : 385 cycles , 4593 ns :
stack.push.wake+ctx.k_to_k - Add data to k_stack (w/ ctx switch) : 432 cycles , 5154 ns :
mutex.lock.immediate.recursive.kernel - Lock a mutex : 83 cycles , 994 ns :
mutex.unlock.immediate.recursive.kernel - Unlock a mutex : 44 cycles , 524 ns :
heap.malloc.immediate - Average time for heap malloc : 340 cycles , 4052 ns :
heap.free.immediate - Average time for heap free : 325 cycles , 3873 ns :
===================================================================
PROJECT EXECUTION SUCCESSFUL
microbit v2.2 results
*** Booting Zephyr OS build v4.3.0-3523-g588d22464d13 ***
thread.yield.preemptive.ctx.k_to_k - Context switch via k_yield : 229 cycles , 3578 ns :
thread.yield.cooperative.ctx.k_to_k - Context switch via k_yield : 229 cycles , 3578 ns :
isr.resume.interrupted.thread.kernel - Return from ISR to interrupted thread : 232 cycles , 3631 ns :
isr.resume.different.thread.kernel - Return from ISR to another thread : 291 cycles , 4562 ns :
thread.create.kernel.from.kernel - Create thread : 320 cycles , 5014 ns :
thread.start.kernel.from.kernel - Start thread : 384 cycles , 6015 ns :
thread.suspend.kernel.from.kernel - Suspend thread : 216 cycles , 3390 ns :
thread.resume.kernel.from.kernel - Resume thread : 279 cycles , 4374 ns :
thread.abort.kernel.from.kernel - Abort thread : 296 cycles , 4640 ns :
fifo.put.immediate.kernel - Add data to FIFO (no ctx switch) : 129 cycles , 2015 ns :
fifo.get.immediate.kernel - Get data from FIFO (no ctx switch) : 114 cycles , 1781 ns :
fifo.put.alloc.immediate.kernel - Allocate to add data to FIFO (no ctx switch) : 562 cycles , 8782 ns :
fifo.get.free.immediate.kernel - Free when getting data from FIFO (no ctx switch) : 446 cycles , 6969 ns :
fifo.get.blocking.k_to_k - Get data from FIFO (w/ ctx switch) : 391 cycles , 6109 ns :
fifo.put.wake+ctx.k_to_k - Add data to FIFO (w/ ctx switch) : 471 cycles , 7359 ns :
fifo.get.free.blocking.k_to_k - Free when getting data from FIFO (w/ ctx switch) : 400 cycles , 6250 ns :
fifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to FIFO (w/ ctx switch) : 466 cycles , 7281 ns :
lifo.put.immediate.kernel - Add data to LIFO (no ctx switch) : 122 cycles , 1906 ns :
lifo.get.immediate.kernel - Get data from LIFO (no ctx switch) : 108 cycles , 1687 ns :
lifo.put.alloc.immediate.kernel - Allocate to add data to LIFO (no ctx switch) : 552 cycles , 8626 ns :
lifo.get.free.immediate.kernel - Free when getting data from LIFO (no ctx switch) : 435 cycles , 6797 ns :
lifo.get.blocking.k_to_k - Get data from LIFO (w/ ctx switch) : 373 cycles , 5829 ns :
lifo.put.wake+ctx.k_to_k - Add data to LIFO (w/ ctx switch) : 455 cycles , 7110 ns :
lifo.get.free.blocking.k_to_k - Free when getting data from LIFO (w/ ctx switch) : 381 cycles , 5953 ns :
lifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to LIFO (w/ ctx switch) : 459 cycles , 7171 ns :
events.post.immediate.kernel - Post events (nothing wakes) : 188 cycles , 2938 ns :
events.set.immediate.kernel - Set events (nothing wakes) : 186 cycles , 2907 ns :
events.wait.immediate.kernel - Wait for any events (no ctx switch) : 110 cycles , 1719 ns :
events.wait_all.immediate.kernel - Wait for all events (no ctx switch) : 110 cycles , 1719 ns :
events.wait.blocking.k_to_k - Wait for any events (w/ ctx switch) : 422 cycles , 6593 ns :
events.set.wake+ctx.k_to_k - Set events (w/ ctx switch) : 606 cycles , 9469 ns :
events.wait_all.blocking.k_to_k - Wait for all events (w/ ctx switch) : 430 cycles , 6718 ns :
events.post.wake+ctx.k_to_k - Post events (w/ ctx switch) : 636 cycles , 9953 ns :
semaphore.give.immediate.kernel - Give a semaphore (no waiters) : 83 cycles , 1297 ns :
semaphore.take.immediate.kernel - Take a semaphore (no blocking) : 78 cycles , 1219 ns :
semaphore.take.blocking.k_to_k - Take a semaphore (context switch) : 375 cycles , 5860 ns :
semaphore.give.wake+ctx.k_to_k - Give a semaphore (context switch) : 403 cycles , 6297 ns :
condvar.wait.blocking.k_to_k - Wait for a condvar (context switch) : 489 cycles , 7642 ns :
condvar.signal.wake+ctx.k_to_k - Signal a condvar (context switch) : 521 cycles , 8141 ns :
stack.push.immediate.kernel - Add data to k_stack (no ctx switch) : 79 cycles , 1249 ns :
stack.pop.immediate.kernel - Get data from k_stack (no ctx switch) : 80 cycles , 1250 ns :
stack.pop.blocking.k_to_k - Get data from k_stack (w/ ctx switch) : 391 cycles , 6109 ns :
stack.push.wake+ctx.k_to_k - Add data to k_stack (w/ ctx switch) : 445 cycles , 6953 ns :
mutex.lock.immediate.recursive.kernel - Lock a mutex : 99 cycles , 1547 ns :
mutex.unlock.immediate.recursive.kernel - Unlock a mutex : 46 cycles , 720 ns :
heap.malloc.immediate - Average time for heap malloc : 365 cycles , 5712 ns :
heap.free.immediate - Average time for heap free : 338 cycles , 5287 ns :
===================================================================
PROJECT EXECUTION SUCCESSFUL
TY51822r3 results
*** Booting Zephyr OS build v4.3.0-3523-g588d22464d13 ***
thread.yield.preemptive.ctx.k_to_k - Context switch via k_yield : 0 cycles , 0 ns :
thread.yield.cooperative.ctx.k_to_k - Context switch via k_yield : 0 cycles , 0 ns :
isr.resume.interrupted.thread.kernel - Return from ISR to interrupted thread : 371 cycles , 23187 ns :
isr.resume.different.thread.kernel - Return from ISR to another thread : 432 cycles , 27000 ns :
thread.create.kernel.from.kernel - Create thread : 488 cycles , 30500 ns :
thread.start.kernel.from.kernel - Start thread : 540 cycles , 33750 ns :
thread.suspend.kernel.from.kernel - Suspend thread : 402 cycles , 25125 ns :
thread.resume.kernel.from.kernel - Resume thread : 448 cycles , 28000 ns :
thread.abort.kernel.from.kernel - Abort thread : 375 cycles , 23437 ns :
fifo.put.immediate.kernel - Add data to FIFO (no ctx switch) : 273 cycles , 17062 ns :
fifo.get.immediate.kernel - Get data from FIFO (no ctx switch) : 229 cycles , 14312 ns :
fifo.put.alloc.immediate.kernel - Allocate to add data to FIFO (no ctx switch) : 817 cycles , 51062 ns :
fifo.get.free.immediate.kernel - Free when getting data from FIFO (no ctx switch) : 697 cycles , 43562 ns :
fifo.get.blocking.k_to_k - Get data from FIFO (w/ ctx switch) : 549 cycles , 34374 ns :
fifo.put.wake+ctx.k_to_k - Add data to FIFO (w/ ctx switch) : 644 cycles , 40250 ns :
fifo.get.free.blocking.k_to_k - Free when getting data from FIFO (w/ ctx switch) : 554 cycles , 34625 ns :
fifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to FIFO (w/ ctx switch) : 642 cycles , 40125 ns :
lifo.put.immediate.kernel - Add data to LIFO (no ctx switch) : 272 cycles , 17000 ns :
lifo.get.immediate.kernel - Get data from LIFO (no ctx switch) : 229 cycles , 14312 ns :
lifo.put.alloc.immediate.kernel - Allocate to add data to LIFO (no ctx switch) : 817 cycles , 51062 ns :
lifo.get.free.immediate.kernel - Free when getting data from LIFO (no ctx switch) : 697 cycles , 43562 ns :
lifo.get.blocking.k_to_k - Get data from LIFO (w/ ctx switch) : 549 cycles , 34374 ns :
lifo.put.wake+ctx.k_to_k - Add data to LIFO (w/ ctx switch) : 643 cycles , 40187 ns :
lifo.get.free.blocking.k_to_k - Free when getting data from LIFO (w/ ctx switch) : 554 cycles , 34625 ns :
lifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to LIFO (w/ ctx switch) : 642 cycles , 40125 ns :
events.post.immediate.kernel - Post events (nothing wakes) : 20 cycles , 1282 ns :
events.set.immediate.kernel - Set events (nothing wakes) : 21 cycles , 1345 ns :
events.wait.immediate.kernel - Wait for any events (no ctx switch) : 24 cycles , 1503 ns :
events.wait_all.immediate.kernel - Wait for all events (no ctx switch) : 29 cycles , 1816 ns :
events.wait.blocking.k_to_k - Wait for any events (w/ ctx switch) : 622 cycles , 38936 ns :
events.set.wake+ctx.k_to_k - Set events (w/ ctx switch) : 828 cycles , 51750 ns :
events.wait_all.blocking.k_to_k - Wait for all events (w/ ctx switch) : 632 cycles , 39500 ns :
events.post.wake+ctx.k_to_k - Post events (w/ ctx switch) : 829 cycles , 51812 ns :
semaphore.give.immediate.kernel - Give a semaphore (no waiters) : 16 cycles , 1036 ns :
semaphore.take.immediate.kernel - Take a semaphore (no blocking) : 27 cycles , 1723 ns :
semaphore.take.blocking.k_to_k - Take a semaphore (context switch) : 540 cycles , 33751 ns :
semaphore.give.wake+ctx.k_to_k - Give a semaphore (context switch) : 576 cycles , 36000 ns :
condvar.wait.blocking.k_to_k - Wait for a condvar (context switch) : 632 cycles , 39501 ns :
condvar.signal.wake+ctx.k_to_k - Signal a condvar (context switch) : 670 cycles , 41875 ns :
stack.push.immediate.kernel - Add data to k_stack (no ctx switch) : 202 cycles , 12625 ns :
stack.pop.immediate.kernel - Get data from k_stack (no ctx switch) : 194 cycles , 12125 ns :
stack.pop.blocking.k_to_k - Get data from k_stack (w/ ctx switch) : 539 cycles , 33749 ns :
stack.push.wake+ctx.k_to_k - Add data to k_stack (w/ ctx switch) : 604 cycles , 37750 ns :
mutex.lock.immediate.recursive.kernel - Lock a mutex : 37 cycles , 2348 ns :
mutex.unlock.immediate.recursive.kernel - Unlock a mutex : 54 cycles , 3386 ns :
heap.malloc.immediate - Average time for heap malloc : 602 cycles , 37625 ns :
heap.free.immediate - Average time for heap free : 572 cycles , 35750 ns :
===================================================================
PROJECT EXECUTION SUCCESSFUL
For completeness, here are the rpi_pico2 and rpi_pico results captured in the same environment.
RPi Pico2 results
*** Booting Zephyr OS build v4.3.0-3523-g588d22464d13 ***
thread.yield.preemptive.ctx.k_to_k - Context switch via k_yield : 171 cycles , 1146 ns :
thread.yield.cooperative.ctx.k_to_k - Context switch via k_yield : 170 cycles , 1137 ns :
isr.resume.interrupted.thread.kernel - Return from ISR to interrupted thread : 441 cycles , 2942 ns :
isr.resume.different.thread.kernel - Return from ISR to another thread : 478 cycles , 3190 ns :
thread.create.kernel.from.kernel - Create thread : 226 cycles , 1511 ns :
thread.start.kernel.from.kernel - Start thread : 263 cycles , 1754 ns :
thread.suspend.kernel.from.kernel - Suspend thread : 162 cycles , 1080 ns :
thread.resume.kernel.from.kernel - Resume thread : 196 cycles , 1307 ns :
thread.abort.kernel.from.kernel - Abort thread : 179 cycles , 1195 ns :
fifo.put.immediate.kernel - Add data to FIFO (no ctx switch) : 99 cycles , 665 ns :
fifo.get.immediate.kernel - Get data from FIFO (no ctx switch) : 84 cycles , 560 ns :
fifo.put.alloc.immediate.kernel - Allocate to add data to FIFO (no ctx switch) : 711 cycles , 4740 ns :
fifo.get.free.immediate.kernel - Free when getting data from FIFO (no ctx switch) : 726 cycles , 4843 ns :
fifo.get.blocking.k_to_k - Get data from FIFO (w/ ctx switch) : 365 cycles , 2435 ns :
fifo.put.wake+ctx.k_to_k - Add data to FIFO (w/ ctx switch) : 408 cycles , 2726 ns :
fifo.get.free.blocking.k_to_k - Free when getting data from FIFO (w/ ctx switch) : 365 cycles , 2434 ns :
fifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to FIFO (w/ ctx switch) : 406 cycles , 2711 ns :
lifo.put.immediate.kernel - Add data to LIFO (no ctx switch) : 98 cycles , 655 ns :
lifo.get.immediate.kernel - Get data from LIFO (no ctx switch) : 83 cycles , 556 ns :
lifo.put.alloc.immediate.kernel - Allocate to add data to LIFO (no ctx switch) : 706 cycles , 4712 ns :
lifo.get.free.immediate.kernel - Free when getting data from LIFO (no ctx switch) : 722 cycles , 4818 ns :
lifo.get.blocking.k_to_k - Get data from LIFO (w/ ctx switch) : 364 cycles , 2431 ns :
lifo.put.wake+ctx.k_to_k - Add data to LIFO (w/ ctx switch) : 406 cycles , 2709 ns :
lifo.get.free.blocking.k_to_k - Free when getting data from LIFO (w/ ctx switch) : 364 cycles , 2428 ns :
lifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to LIFO (w/ ctx switch) : 406 cycles , 2708 ns :
events.post.immediate.kernel - Post events (nothing wakes) : 146 cycles , 978 ns :
events.set.immediate.kernel - Set events (nothing wakes) : 147 cycles , 983 ns :
events.wait.immediate.kernel - Wait for any events (no ctx switch) : 92 cycles , 617 ns :
events.wait_all.immediate.kernel - Wait for all events (no ctx switch) : 89 cycles , 597 ns :
events.wait.blocking.k_to_k - Wait for any events (w/ ctx switch) : 310 cycles , 2067 ns :
events.set.wake+ctx.k_to_k - Set events (w/ ctx switch) : 458 cycles , 3058 ns :
events.wait_all.blocking.k_to_k - Wait for all events (w/ ctx switch) : 392 cycles , 2615 ns :
events.post.wake+ctx.k_to_k - Post events (w/ ctx switch) : 532 cycles , 3547 ns :
semaphore.give.immediate.kernel - Give a semaphore (no waiters) : 53 cycles , 358 ns :
semaphore.take.immediate.kernel - Take a semaphore (no blocking) : 62 cycles , 416 ns :
semaphore.take.blocking.k_to_k - Take a semaphore (context switch) : 271 cycles , 1808 ns :
semaphore.give.wake+ctx.k_to_k - Give a semaphore (context switch) : 293 cycles , 1954 ns :
condvar.wait.blocking.k_to_k - Wait for a condvar (context switch) : 425 cycles , 2839 ns :
condvar.signal.wake+ctx.k_to_k - Signal a condvar (context switch) : 438 cycles , 2921 ns :
stack.push.immediate.kernel - Add data to k_stack (no ctx switch) : 62 cycles , 413 ns :
stack.pop.immediate.kernel - Get data from k_stack (no ctx switch) : 64 cycles , 432 ns :
stack.pop.blocking.k_to_k - Get data from k_stack (w/ ctx switch) : 268797 cycles , 1791986 ns :
stack.push.wake+ctx.k_to_k - Add data to k_stack (w/ ctx switch) : 388 cycles , 2590 ns :
mutex.lock.immediate.recursive.kernel - Lock a mutex : 70 cycles , 472 ns :
mutex.unlock.immediate.recursive.kernel - Unlock a mutex : 37 cycles , 248 ns :
heap.malloc.immediate - Average time for heap malloc : 630 cycles , 4205 ns :
heap.free.immediate - Average time for heap free : 665 cycles , 4434 ns :
===================================================================
PROJECT EXECUTION SUCCESSFUL
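A few rows in these logs are orders of magnitude larger than their neighbours, for example `stack.pop.blocking.k_to_k` above at 268797 cycles. The cause is not investigated in this article. As a minimal sketch for spotting such rows before copying numbers into a table, the following Python parses result lines and flags any row whose cycle count exceeds a multiple of the median. The regular expression, the helper names `parse` and `flag_outliers`, and the threshold of 50x are assumptions made for illustration; they are not part of Zephyr.

```python
import re
from statistics import median

# One result line looks like:
#   <name> - <description> : <N> cycles , <M> ns :
LINE_RE = re.compile(r"^(\S+)\s+-\s+(.*?)\s*:\s*(\d+) cycles\s*,\s*(\d+) ns\s*:\s*$")

def parse(log):
    """Return a list of (name, cycles, ns) for every result line in the log."""
    rows = []
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append((m.group(1), int(m.group(3)), int(m.group(4))))
    return rows

def flag_outliers(rows, factor=50):
    """Names whose cycle count exceeds `factor` times the median (arbitrary threshold)."""
    mid = median(cycles for _, cycles, _ in rows)
    return [name for name, cycles, _ in rows if cycles > factor * mid]

# Four lines copied verbatim from the log above.
SAMPLE = """\
stack.push.immediate.kernel - Add data to k_stack (no ctx switch) : 62 cycles , 413 ns :
stack.pop.immediate.kernel - Get data from k_stack (no ctx switch) : 64 cycles , 432 ns :
stack.pop.blocking.k_to_k - Get data from k_stack (w/ ctx switch) : 268797 cycles , 1791986 ns :
stack.push.wake+ctx.k_to_k - Add data to k_stack (w/ ctx switch) : 388 cycles , 2590 ns :
"""

rows = parse(SAMPLE)
print(flag_outliers(rows))
```

On this sample only `stack.pop.blocking.k_to_k` is flagged. A flagged row is a candidate for a re-run, not proof of a measurement error.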
RPi Pico results
*** Booting Zephyr OS build v4.3.0-3523-g588d22464d13 ***
thread.yield.preemptive.ctx.k_to_k - Context switch via k_yield : 270 cycles , 2163 ns :
thread.yield.cooperative.ctx.k_to_k - Context switch via k_yield : 267 cycles , 2137 ns :
isr.resume.interrupted.thread.kernel - Return from ISR to interrupted thread : 637 cycles , 5101 ns :
isr.resume.different.thread.kernel - Return from ISR to another thread : 1714 cycles , 13714 ns :
thread.create.kernel.from.kernel - Create thread : 358 cycles , 2871 ns :
thread.start.kernel.from.kernel - Start thread : 403 cycles , 3224 ns :
thread.suspend.kernel.from.kernel - Suspend thread : 352 cycles , 2819 ns :
thread.resume.kernel.from.kernel - Resume thread : 344 cycles , 2752 ns :
thread.abort.kernel.from.kernel - Abort thread : 237 cycles , 1898 ns :
fifo.put.immediate.kernel - Add data to FIFO (no ctx switch) : 156 cycles , 1249 ns :
fifo.get.immediate.kernel - Get data from FIFO (no ctx switch) : 114 cycles , 915 ns :
fifo.put.alloc.immediate.kernel - Allocate to add data to FIFO (no ctx switch) : 1162 cycles , 9302 ns :
fifo.get.free.immediate.kernel - Free when getting data from FIFO (no ctx switch) : 1130 cycles , 9041 ns :
fifo.get.blocking.k_to_k - Get data from FIFO (w/ ctx switch) : 522 cycles , 4176 ns :
fifo.put.wake+ctx.k_to_k - Add data to FIFO (w/ ctx switch) : 611 cycles , 4890 ns :
fifo.get.free.blocking.k_to_k - Free when getting data from FIFO (w/ ctx switch) : 523 cycles , 4191 ns :
fifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to FIFO (w/ ctx switch) : 606 cycles , 4853 ns :
lifo.put.immediate.kernel - Add data to LIFO (no ctx switch) : 117592 cycles , 940739 ns :
lifo.get.immediate.kernel - Get data from LIFO (no ctx switch) : 117554 cycles , 940436 ns :
lifo.put.alloc.immediate.kernel - Allocate to add data to LIFO (no ctx switch) : 1149 cycles , 9193 ns :
lifo.get.free.immediate.kernel - Free when getting data from LIFO (no ctx switch) : 1122 cycles , 8978 ns :
lifo.get.blocking.k_to_k - Get data from LIFO (w/ ctx switch) : 521 cycles , 4170 ns :
lifo.put.wake+ctx.k_to_k - Add data to LIFO (w/ ctx switch) : 235490 cycles , 1883924 ns :
lifo.get.free.blocking.k_to_k - Free when getting data from LIFO (w/ ctx switch) : 523 cycles , 4189 ns :
lifo.put.alloc.wake+ctx.k_to_k - Allocate to add data to LIFO (w/ ctx switch) : 605 cycles , 4842 ns :
events.post.immediate.kernel - Post events (nothing wakes) : 196 cycles , 1571 ns :
events.set.immediate.kernel - Set events (nothing wakes) : 196 cycles , 1575 ns :
events.wait.immediate.kernel - Wait for any events (no ctx switch) : 142 cycles , 1143 ns :
events.wait_all.immediate.kernel - Wait for all events (no ctx switch) : 146 cycles , 1174 ns :
events.wait.blocking.k_to_k - Wait for any events (w/ ctx switch) : 479 cycles , 3833 ns :
events.set.wake+ctx.k_to_k - Set events (w/ ctx switch) : 668 cycles , 5346 ns :
events.wait_all.blocking.k_to_k - Wait for all events (w/ ctx switch) : 598 cycles , 4786 ns :
events.post.wake+ctx.k_to_k - Post events (w/ ctx switch) : 235660 cycles , 1885287 ns :
semaphore.give.immediate.kernel - Give a semaphore (no waiters) : 73 cycles , 585 ns :
semaphore.take.immediate.kernel - Take a semaphore (no blocking) : 84 cycles , 675 ns :
semaphore.take.blocking.k_to_k - Take a semaphore (context switch) : 400 cycles , 3206 ns :
semaphore.give.wake+ctx.k_to_k - Give a semaphore (context switch) : 433 cycles , 3466 ns :
condvar.wait.blocking.k_to_k - Wait for a condvar (context switch) : 600 cycles , 4804 ns :
condvar.signal.wake+ctx.k_to_k - Signal a condvar (context switch) : 633 cycles , 5065 ns :
stack.push.immediate.kernel - Add data to k_stack (no ctx switch) : 90 cycles , 723 ns :
stack.pop.immediate.kernel - Get data from k_stack (no ctx switch) : 82 cycles , 662 ns :
stack.pop.blocking.k_to_k - Get data from k_stack (w/ ctx switch) : 512 cycles , 4099 ns :
stack.push.wake+ctx.k_to_k - Add data to k_stack (w/ ctx switch) : 576 cycles , 4611 ns :
mutex.lock.immediate.recursive.kernel - Lock a mutex : 92 cycles , 737 ns :
mutex.unlock.immediate.recursive.kernel - Unlock a mutex : 51 cycles , 408 ns :
heap.malloc.immediate - Average time for heap malloc : 1102 cycles , 8823 ns :
heap.free.immediate - Average time for heap free : 1067 cycles , 8540 ns :
===================================================================
PROJECT EXECUTION SUCCESSFUL
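Each row reports both a cycle count and a time in nanoseconds, so the counter frequency behind the conversion can be recovered by dividing one by the other. The sketch below does this for three rows of the RPi Pico log above. The helper name `implied_mhz` is hypothetical, and treating this counter frequency as the CPU clock is an assumption that may not hold on every board.

```python
def implied_mhz(cycles, ns):
    """Counter frequency in MHz implied by one (cycles, ns) pair."""
    # cycles per nanosecond is GHz, so multiply by 1000 to get MHz.
    return cycles / ns * 1000.0

# (cycles, ns) pairs copied from the RPi Pico log above.
PICO_SAMPLES = {
    "thread.yield.preemptive.ctx.k_to_k": (270, 2163),
    "isr.resume.interrupted.thread.kernel": (637, 5101),
    "heap.malloc.immediate": (1102, 8823),
}

for name, (cycles, ns) in PICO_SAMPLES.items():
    print(f"{name}: {implied_mhz(cycles, ns):.1f} MHz")
```

All three rows come out close to 125 MHz. Running the same check on any board's log is a quick way to confirm which clock the benchmark actually assumed before comparing nanosecond figures.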
5. Summary
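The latency_measure benchmark used here evaluates how well Zephyr RTOS has been tuned for a given microcontroller rather than the microcontroller's raw performance,
so comparing microcontrollers against each other, as this article does, is not really its intended use.
That said, microcontrollers with different architectures or core implementations cannot be compared simply by clock frequency,
so a tool that compares them with those differences taken into account is quite interesting, and I feel it gave me a useful axis for evaluation.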
(While I'm at it, I'd like to compare RISC-V and x86 as well.)
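To make the point about clock frequency concrete, the cycle ratio and the time ratio between two boards can be computed separately. The cycle ratio roughly reflects how many more cycles one core and its Zephyr port need for the same operation, while the time ratio also folds in the clock difference. The following is a minimal sketch using three rows copied from the rpi_pico2 and RPi Pico logs above; the helper name `compare` is hypothetical.

```python
# (cycles, ns) pairs copied from the two logs above.
PICO2 = {
    "semaphore.take.blocking.k_to_k": (271, 1808),
    "mutex.lock.immediate.recursive.kernel": (70, 472),
    "heap.malloc.immediate": (630, 4205),
}
PICO = {
    "semaphore.take.blocking.k_to_k": (400, 3206),
    "mutex.lock.immediate.recursive.kernel": (92, 737),
    "heap.malloc.immediate": (1102, 8823),
}

def compare(slow, fast):
    """Return {name: (cycle_ratio, time_ratio)} for rows present in both boards."""
    result = {}
    for name in sorted(slow.keys() & fast.keys()):
        s_cycles, s_ns = slow[name]
        f_cycles, f_ns = fast[name]
        result[name] = (s_cycles / f_cycles, s_ns / f_ns)
    return result

for name, (cyc, t) in compare(PICO, PICO2).items():
    print(f"{name:40s} cycles x{cyc:.2f}  time x{t:.2f}")
```

For these three rows the time ratio is larger than the cycle ratio, and the cycle ratio itself differs from row to row, which is the kind of difference a clock-only comparison would hide.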
I hope this, together with the article below, is useful as one input when selecting a microcontroller.