Unaligned Memory Accesses(2/2)

Posted at 2020-06-08

Unaligned Memory Accesses

Code that causes unaligned access

With the above in mind, let's move on to a real-life example of a function that can cause an unaligned memory access. The following function, taken from include/linux/etherdevice.h, is an optimized routine that compares two ethernet MAC addresses for equality::

  bool ether_addr_equal(const u8 *addr1, const u8 *addr2)
  {
  #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) |
		   ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4)));

	return fold == 0;
  #else
	const u16 *a = (const u16 *)addr1;
	const u16 *b = (const u16 *)addr2;
	return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0;
  #endif
  }

In the above function, when the hardware has efficient unaligned access capability, there is no issue with this code. But when the hardware cannot access memory on arbitrary boundaries, the reference to a[0] causes 2 bytes (16 bits) to be read from memory starting at address addr1.

Think about what would happen if addr1 were an odd address such as 0x10003. (Hint: it would be an unaligned access.)
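The hint can be checked with a little arithmetic. The following is a minimal userspace sketch, not kernel code; is_aligned() is a hypothetical helper introduced only for illustration, and it assumes the access size is a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): returns true when addr
 * is a multiple of size. size must be a power of two.
 */
static bool is_aligned(uintptr_t addr, size_t size)
{
	return (addr & (size - 1)) == 0;
}
```

With this helper, is_aligned(0x10003, 2) is false, so a 16-bit read of a[0] at that address would be unaligned, while the same read at 0x10002 would be aligned.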

Despite the potential unaligned access problems with the above function, it is included in the kernel anyway, on the understanding that it only works correctly on 16-bit-aligned addresses. It is up to the caller to ensure this alignment or to not use this function at all. This alignment-unsafe function is still useful because it is a decent optimization for the cases where you can ensure alignment, which is true almost all of the time in an ethernet networking context.
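When a caller cannot guarantee 16-bit alignment, a byte-wise comparison avoids the problem at some cost in speed. The following is a minimal userspace sketch under that assumption; mac_equal_any_alignment() and MAC_ADDR_LEN are hypothetical names chosen for illustration. The kernel itself offers ether_addr_equal_unaligned() in the same header for this situation.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDR_LEN 6	/* length of an ethernet MAC address in bytes */

/*
 * Hypothetical helper: compares two MAC addresses byte by byte, so
 * neither pointer needs any particular alignment.
 */
static bool mac_equal_any_alignment(const uint8_t *addr1, const uint8_t *addr2)
{
	return memcmp(addr1, addr2, MAC_ADDR_LEN) == 0;
}
```

The trade-off is the one the text describes: the word-wise version is faster when alignment is guaranteed, and the byte-wise version is safe everywhere.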

Here is another example of some code that could cause unaligned accesses::

	void myfunc(u8 *data, u32 value)
	{
		[...]
		*((u32 *) data) = cpu_to_le32(value);
		[...]
	}

This code will cause an unaligned access every time the data parameter points to an address that is not evenly divisible by 4.

In summary, the 2 main scenarios where you may run into unaligned access problems involve:

 1. Casting variables to types of different lengths
 2. Pointer arithmetic followed by access to at least 2 bytes of data

Avoiding unaligned accesses

The easiest way to avoid unaligned access is to use the get_unaligned() and put_unaligned() macros provided by the <asm/unaligned.h> header file.

Going back to an earlier example of code that potentially causes unaligned access::

	void myfunc(u8 *data, u32 value)
	{
		[...]
		*((u32 *) data) = cpu_to_le32(value);
		[...]
	}

To avoid the unaligned memory access, you would rewrite it as follows::

	void myfunc(u8 *data, u32 value)
	{
		[...]
		value = cpu_to_le32(value);
		put_unaligned(value, (u32 *) data);
		[...]
	}

The get_unaligned() macro works similarly. Assuming 'data' is a pointer to memory and you wish to avoid unaligned access, its usage is as follows::

 	u32 value = get_unaligned((u32 *) data);
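To make the idea behind these macros more concrete, here is a minimal userspace sketch of one common technique: wrapping the value in a packed structure so the compiler emits an access that is safe at any address. This is not the kernel's implementation. load_u32() and store_u32() are hypothetical names, and the packed and may_alias attributes are GCC/Clang extensions, so the sketch assumes one of those compilers.

```c
#include <stdint.h>

/*
 * Packed wrapper: tells the compiler that x may live at any address.
 * may_alias lets a pointer to this type refer to arbitrary bytes.
 * Both attributes are GCC/Clang extensions.
 */
struct unaligned_u32 {
	uint32_t x;
} __attribute__((packed, may_alias));

/* Hypothetical stand-in for get_unaligned() on a 32-bit value. */
static uint32_t load_u32(const void *p)
{
	const struct unaligned_u32 *up = p;

	return up->x;
}

/* Hypothetical stand-in for put_unaligned() on a 32-bit value. */
static void store_u32(uint32_t val, void *p)
{
	struct unaligned_u32 *up = p;

	up->x = val;
}
```

On hardware that handles unaligned access efficiently the compiler can still emit a single load or store, while on stricter hardware it falls back to smaller accesses.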

These macros work for memory accesses of any length (not just 32 bits as in the examples above). Be aware that, compared to standard access of aligned memory, using these macros to access unaligned memory can be costly in terms of performance.

If use of such macros is not convenient, another option is to use memcpy(), where the source or destination (or both) are of type u8* or unsigned char*. Due to the byte-wise nature of this operation, unaligned accesses are avoided.
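The memcpy() approach can be sketched as follows in plain userspace C. read_u32() and write_u32() are hypothetical helper names, and the value is handled in host byte order, so any endianness conversion such as cpu_to_le32() would still have to be applied separately.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit value from a possibly unaligned address. */
static uint32_t read_u32(const uint8_t *src)
{
	uint32_t value;

	memcpy(&value, src, sizeof(value));	/* byte-wise copy, no alignment needed */
	return value;
}

/* Hypothetical helper: writes a 32-bit value to a possibly unaligned address. */
static void write_u32(uint8_t *dst, uint32_t value)
{
	memcpy(dst, &value, sizeof(value));
}
```

For example, write_u32(buf + 1, v) followed by read_u32(buf + 1) returns v even though buf + 1 is not 4-byte aligned.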

Alignment vs. Networking

On architectures that require aligned loads, networking requires that the IP header is aligned on a four-byte boundary to optimise the IP stack. For regular ethernet hardware, the constant NET_IP_ALIGN is used. On most architectures this constant has the value 2 because the normal ethernet header is 14 bytes long, so in order to get proper alignment one needs to DMA to an address which can be expressed as 4*n + 2. One notable exception here is powerpc, which defines NET_IP_ALIGN as 0 because DMA to unaligned addresses can be very expensive and dwarf the cost of unaligned loads.
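The arithmetic behind the 4*n + 2 rule can be checked with a small userspace sketch. The two constants are defined locally here with the values given in the text, and ip_header_offset() is a hypothetical helper, not a kernel function.

```c
#define ETH_HLEN	14	/* length of a normal ethernet header in bytes */
#define NET_IP_ALIGN	2	/* value used on most architectures, per the text */

/*
 * Hypothetical helper: offset of the IP header from a 4-byte-aligned
 * buffer start when the frame is placed pad bytes into the buffer.
 */
static unsigned int ip_header_offset(unsigned int pad)
{
	return pad + ETH_HLEN;
}
```

With a pad of NET_IP_ALIGN the IP header lands at offset 16, a multiple of 4. With a pad of 0 it lands at offset 14, which leaves it misaligned by 2 bytes.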

For some ethernet hardware that cannot DMA to unaligned addresses like 4*n+2, or for non-ethernet hardware, this can be a problem, and it is then required to copy the incoming frame into an aligned buffer. Because this is unnecessary on architectures that can do unaligned accesses, the code can be made dependent on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS like so::

	#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
		skb = original skb
	#else
		skb = copy skb
	#endif
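The "copy" branch above can be illustrated with a rough userspace sketch that works on plain byte buffers rather than a real skb. copy_frame_aligned() and FRAME_PAD are hypothetical names, and the sketch assumes the destination buffer is 4-byte aligned and has room for len + FRAME_PAD bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN	14	/* length of a normal ethernet header in bytes */
#define FRAME_PAD	2	/* hypothetical pad, same role as NET_IP_ALIGN */

/*
 * Hypothetical helper: copies a received frame into dst so that the
 * data following the ethernet header starts on a 4-byte boundary.
 * dst must be 4-byte aligned and hold at least len + FRAME_PAD bytes.
 * Returns a pointer to the start of the copied frame.
 */
static uint8_t *copy_frame_aligned(uint8_t *dst, const uint8_t *frame, size_t len)
{
	memcpy(dst + FRAME_PAD, frame, len);
	return dst + FRAME_PAD;
}
```

The extra copy is exactly the cost that architectures with efficient unaligned access can skip.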

Since this text was originally part of the Linux kernel source code, it should be covered by GPLv2 (as far as I understand).

https://www.kernel.org/doc/html/latest/index.html

Licensing documentation

The following describes the license of the Linux kernel source code (GPLv2), how to properly mark the license of individual files in the source tree, as well as links to the full license text.

https://www.kernel.org/doc/html/latest/process/license-rules.html#kernel-licensing
