MMC Asynchronous Request

Posted at 2021-07-23

Rationale

How significant is the cache maintenance overhead?

It depends. Fast eMMC and multiple cache levels with speculative cache pre-fetch make the cache overhead relatively significant.

If the DMA preparations for the next request are done in parallel with the current transfer, the DMA preparation overhead does not affect MMC performance.

The intention of non-blocking (asynchronous) MMC requests is to minimize the time between when one MMC request ends and the next MMC request begins.

With mmc_wait_for_req(), the MMC controller is idle while dma_map_sg() and dma_unmap_sg() are processing. Non-blocking MMC requests make it possible to prepare the caches for the next job in parallel with an active MMC request.
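
To make the difference concrete, here is a minimal timing model, not kernel code. It assumes each request has a fixed preparation cost (standing in for dma_map_sg()), a transfer time, and a cleanup cost (standing in for dma_unmap_sg()). The struct and function names are hypothetical, and the pipelined case assumes that while request i is on the bus the CPU cleans up request i-1 and prepares request i+1.

```c
#include <stddef.h>

/* Hypothetical per-request costs, in microseconds. */
struct req_cost {
    unsigned prep_us;      /* pre-transfer work, e.g. dma_map_sg() */
    unsigned transfer_us;  /* time the MMC controller is busy */
    unsigned post_us;      /* post-transfer work, e.g. dma_unmap_sg() */
};

/* Blocking model: every step of every request runs back to back. */
unsigned long blocking_total_us(const struct req_cost *r, size_t n)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++)
        total += r[i].prep_us + r[i].transfer_us + r[i].post_us;
    return total;
}

/*
 * Pipelined model: during transfer i the CPU runs post(i-1) and prep(i+1).
 * The next transfer starts once both the current transfer and that CPU
 * work have finished, so each stage costs the larger of the two.
 */
unsigned long pipelined_total_us(const struct req_cost *r, size_t n)
{
    if (n == 0)
        return 0;

    /* The first preparation has no transfer to hide behind. */
    unsigned long total = r[0].prep_us;

    for (size_t i = 0; i < n; i++) {
        unsigned long cpu = 0;

        if (i > 0)
            cpu += r[i - 1].post_us;
        if (i + 1 < n)
            cpu += r[i + 1].prep_us;
        total += (cpu > r[i].transfer_us) ? cpu : r[i].transfer_us;
    }
    /* The last cleanup cannot overlap with anything either. */
    return total + r[n - 1].post_us;
}
```

In this model only the first preparation and the last cleanup stay on the critical path, as long as the CPU work per stage is shorter than the transfer it overlaps with.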

MMC block driver

mmc_blk_issue_rw_rq() in the MMC block driver is made non-blocking.

The increase in throughput is proportional to the time it takes to prepare a request (the major part of the preparation is dma_map_sg() and dma_unmap_sg()) and to how fast the memory is.

The faster the MMC/SD device is, the more significant the request preparation time becomes.

Roughly, the expected performance gain is 5% for large writes and 10% for large reads on an L2 cache platform.

In power save mode, when clocks run at a lower frequency, the DMA preparation may cost even more.

As long as these slower preparations run in parallel with the transfer, performance won't be affected.

Details on measurements from IOZone and mmc_test

MMC core API extension

There is one new public function, mmc_start_req().

It starts a new MMC command request for a host. The function isn't truly non-blocking: if there is an ongoing async request, it waits for that request to complete, then starts the new one and returns.

It doesn't wait for the new request to complete. If there is no ongoing request, it starts the new request and returns immediately.
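
The behavior described above can be sketched with a small userspace model. This is not the kernel implementation: the toy_* types and functions are hypothetical stand-ins, the "wait" is simulated by marking the ongoing request as completed, and returning the previously ongoing request to the caller is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct toy_req {
    int id;
    bool started;
    bool completed;
};

struct toy_host {
    struct toy_req *ongoing;  /* request currently on the bus, or NULL */
    int waits;                /* how many times the caller had to block */
};

/* Stand-in for blocking until the controller signals completion. */
static void toy_wait_for_done(struct toy_host *host)
{
    host->ongoing->completed = true;
    host->waits++;
}

/*
 * Modeled on the described behavior of mmc_start_req(): wait for the
 * ongoing request (if any), start the new one, and return without
 * waiting for it. Returns the request that just completed, or NULL.
 * Passing next == NULL only waits for the ongoing request (an
 * assumption of this sketch).
 */
struct toy_req *toy_start_req(struct toy_host *host, struct toy_req *next)
{
    struct toy_req *done = host->ongoing;

    if (done)
        toy_wait_for_done(host);

    if (next)
        next->started = true;
    host->ongoing = next;

    return done;
}
```

Calling toy_start_req() twice in a row shows the point: the first call returns immediately, and the second blocks only for the first request, never for the one it has just started.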

MMC host extensions

There are two optional members in mmc_host_ops, pre_req() and post_req(), that the host driver may implement in order to move work to before and after the actual mmc_host_ops.request() function is called.

In the DMA case, pre_req() may do dma_map_sg() and prepare the DMA descriptor, and post_req() runs dma_unmap_sg().
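
The division of labor between the three callbacks can be illustrated with a simplified, hypothetical ops table. The toy_* names and argument lists below are assumptions made for this sketch and do not match the real mmc_host_ops. Boolean flags stand in for dma_map_sg() and dma_unmap_sg(), and the calls run sequentially, so the sketch shows only the call order, not the overlap with a transfer.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_request {
    bool dma_mapped;   /* true while the buffer is mapped for DMA */
    int late_maps;     /* times request() had to map the buffer itself */
    bool transferred;
};

/* Hypothetical, simplified ops table; pre_req and post_req are optional. */
struct toy_host_ops {
    void (*pre_req)(struct toy_request *req, bool is_first_req);
    void (*request)(struct toy_request *req);
    void (*post_req)(struct toy_request *req, int err);
};

static void toy_pre_req(struct toy_request *req, bool is_first_req)
{
    (void)is_first_req;
    req->dma_mapped = true;    /* stands in for dma_map_sg() + descriptor setup */
}

static void toy_post_req(struct toy_request *req, int err)
{
    (void)err;
    req->dma_mapped = false;   /* stands in for dma_unmap_sg() */
}

static void toy_request_fn(struct toy_request *req)
{
    bool mapped_here = !req->dma_mapped;

    if (mapped_here) {         /* pre_req() did not run: map on the hot path */
        req->dma_mapped = true;
        req->late_maps++;
    }
    req->transferred = true;
    if (mapped_here)
        req->dma_mapped = false;
}

/* Driver that implements the optional hooks, and one that does not. */
const struct toy_host_ops toy_ops_async = {
    .pre_req = toy_pre_req, .request = toy_request_fn, .post_req = toy_post_req,
};
const struct toy_host_ops toy_ops_plain = {
    .pre_req = NULL, .request = toy_request_fn, .post_req = NULL,
};

/* Core-side helper: call the optional hooks only when they are set. */
void toy_run(const struct toy_host_ops *ops, struct toy_request *req,
             bool is_first_req)
{
    if (ops->pre_req)
        ops->pre_req(req, is_first_req);
    ops->request(req);
    if (ops->post_req)
        ops->post_req(req, 0);
}
```

With toy_ops_async the mapping work happens outside request(), which is the part the core can overlap with a transfer that is already running. With toy_ops_plain all of it stays inside request().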

Optimize for the first request

The first request in a series of requests can't be prepared in parallel with the previous transfer, since there is no previous request.

The is_first_req argument of pre_req() indicates that there is no previous request. The host driver may optimize for this scenario to minimize the performance loss.

One way to optimize for this is to split the current request into two chunks: prepare the first chunk and start the request, then prepare the second chunk and start the transfer.

Pseudocode to handle the is_first_req scenario with minimal prepare overhead::

  if (is_first_req && req->size > threshold) {
     /* start MMC transfer for the complete transfer size */
     mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);

     /*
      * Begin to prepare DMA while cmd is being processed by MMC.
      * The first chunk of the request should take the same time
      * to prepare as the "MMC process command time".
      * If prepare time exceeds MMC cmd time
      * the transfer is delayed, guesstimate max 4k as first chunk size.
      */
     prepare_1st_chunk_for_dma(req);
     /* flush pending desc to the DMAC (dmaengine.h) */
     dma_issue_pending(req->dma_desc);

     prepare_2nd_chunk_for_dma(req);
     /*
      * The second issue_pending should be called before MMC runs out
      * of the first chunk. If the MMC runs out of the first data chunk
      * before this call, the transfer is delayed.
      */
     dma_issue_pending(req->dma_desc);
  }
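
The split decision in the pseudocode can be written as a small helper. This is a minimal sketch under stated assumptions: struct chunk_plan, plan_chunks(), and FIRST_CHUNK_MAX are hypothetical names, the 4096-byte cap is the "max 4k" guess from the comment above, and the threshold is whatever value the host driver chooses.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIRST_CHUNK_MAX 4096u  /* the "max 4k" guess from the pseudocode */

struct chunk_plan {
    size_t first;   /* bytes prepared before the first dma_issue_pending() */
    size_t second;  /* bytes prepared while the first chunk is in flight */
};

/*
 * Split only the first request of a series, and only if it is larger
 * than both the driver's threshold and the first-chunk cap. Otherwise
 * the whole request is prepared in one go (second == 0).
 */
struct chunk_plan plan_chunks(size_t req_size, size_t threshold,
                              bool is_first_req)
{
    struct chunk_plan plan = { req_size, 0 };

    if (is_first_req && req_size > threshold && req_size > FIRST_CHUNK_MAX) {
        plan.first = FIRST_CHUNK_MAX;
        plan.second = req_size - FIRST_CHUNK_MAX;
    }
    return plan;
}
```

For a 64 KiB first request and an 8 KiB threshold, this prepares 4 KiB up front and the remaining 60 KiB while the first chunk is being transferred.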

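(Translator's note) So, what does this all mean?

The idea is to pipeline the time-consuming DMA preparation with the data transfer to the MMC, so that requests are processed efficiently.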
