More than 5 years have passed since last update.

Transposing a 4x4 matrix of double-precision floats with AVX

Searching the web only turned up things like 8x8 float transposes with AVX or 4x4 float transposes with SSE, so, as the title says, here is how to transpose a 4x4 matrix of double with AVX.

Transposing a 4x4 double matrix with AVX

_mm256_transpose4x4.cpp
#include <immintrin.h>  // AVX intrinsics (__m256d, _mm256_*)

// Transposes a 4x4 matrix of double given as four rows:
// src1 = a0..a3, src2 = b0..b3, src3 = c0..c3, src4 = d0..d3.
inline void _mm256_transpose4x4_pd(
    __m256d src1, __m256d src2, __m256d src3, __m256d src4,
    __m256d &dst1, __m256d &dst2, __m256d &dst3, __m256d &dst4)
{
    // Interleave pairs of rows; unpack works within each 128-bit lane.
    __m256d src5 = _mm256_unpacklo_pd(src1, src2);  // a0 b0 a2 b2
    __m256d src6 = _mm256_unpackhi_pd(src1, src2);  // a1 b1 a3 b3
    __m256d src7 = _mm256_unpacklo_pd(src3, src4);  // c0 d0 c2 d2
    __m256d src8 = _mm256_unpackhi_pd(src3, src4);  // c1 d1 c3 d3
    // Recombine 128-bit halves: 0|(2<<4) takes both low halves, 1|(3<<4) both high halves.
    dst1 = _mm256_permute2f128_pd(src5, src7, 0|(2<<4));   // a0 b0 c0 d0
    dst2 = _mm256_permute2f128_pd(src6, src8, 0|(2<<4));   // a1 b1 c1 d1
    dst3 = _mm256_permute2f128_pd(src5, src7, 1|(3<<4));   // a2 b2 c2 d2
    dst4 = _mm256_permute2f128_pd(src6, src8, 1|(3<<4));   // a3 b3 c3 d3
}
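
As a usage sketch, the same shuffle sequence can be wrapped so that it transposes a row-major 4x4 array of double in memory. The names `transpose4x4_pd`, `transpose4x4_rowmajor`, and the `AVX_TARGET` macro are assumptions made for this example and are not part of the original code. The macro only lets GCC/Clang compile the AVX functions without passing `-mavx` for the whole file; with other compilers, AVX would have to be enabled through the compiler options. Unaligned loads and stores are used so the arrays need no special alignment.

```cpp
#include <immintrin.h>

// Hypothetical helper macro: enable AVX per function on GCC and Clang.
#if defined(__GNUC__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

// Copy of the article's shuffle sequence under an illustrative name.
AVX_TARGET inline void transpose4x4_pd(
    __m256d r0, __m256d r1, __m256d r2, __m256d r3,
    __m256d &c0, __m256d &c1, __m256d &c2, __m256d &c3)
{
    __m256d t0 = _mm256_unpacklo_pd(r0, r1);  // a0 b0 a2 b2
    __m256d t1 = _mm256_unpackhi_pd(r0, r1);  // a1 b1 a3 b3
    __m256d t2 = _mm256_unpacklo_pd(r2, r3);  // c0 d0 c2 d2
    __m256d t3 = _mm256_unpackhi_pd(r2, r3);  // c1 d1 c3 d3
    c0 = _mm256_permute2f128_pd(t0, t2, 0x20);  // a0 b0 c0 d0
    c1 = _mm256_permute2f128_pd(t1, t3, 0x20);  // a1 b1 c1 d1
    c2 = _mm256_permute2f128_pd(t0, t2, 0x31);  // a2 b2 c2 d2
    c3 = _mm256_permute2f128_pd(t1, t3, 0x31);  // a3 b3 c3 d3
}

// Transposes a row-major 4x4 matrix: out[r*4 + c] = in[c*4 + r].
// `in` and `out` must each point to 16 doubles and must not overlap.
AVX_TARGET inline void transpose4x4_rowmajor(const double *in, double *out)
{
    __m256d r0 = _mm256_loadu_pd(in + 0);
    __m256d r1 = _mm256_loadu_pd(in + 4);
    __m256d r2 = _mm256_loadu_pd(in + 8);
    __m256d r3 = _mm256_loadu_pd(in + 12);
    __m256d c0, c1, c2, c3;
    transpose4x4_pd(r0, r1, r2, r3, c0, c1, c2, c3);
    _mm256_storeu_pd(out + 0, c0);
    _mm256_storeu_pd(out + 4, c1);
    _mm256_storeu_pd(out + 8, c2);
    _mm256_storeu_pd(out + 12, c3);
}
```

Calling `transpose4x4_rowmajor(in, out)` on a 16-element array fills `out` with the transposed matrix. If the data is known to be 32-byte aligned, `_mm256_load_pd` and `_mm256_store_pd` could be used instead.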

References

I relied heavily on this page: http://www.officedaytime.com/tips/simd.html

Asides

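  • For a 4x4 float transpose with SSE, a ready-made macro, _MM_TRANSPOSE4_PS(), is provided.
  • A 2x2 double transpose is easy to implement with _mm_unpacklo_pd and _mm_unpackhi_pd.
  • _mm_unpacklo_ps and _mm256_unpacklo_pd did not behave the way I had expected, which threw me for a moment.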
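
The last two points can be sketched as follows. The first pair of functions shows a 2x2 double transpose with SSE2 unpack instructions. The last function shows one behavior of `_mm256_unpacklo_pd` that is easy to misjudge: it interleaves within each 128-bit lane, so the result is a0 b0 a2 b2 rather than a0 b0 a1 b1. Whether this is the exact behavior that caused the surprise mentioned above is an assumption. All function names and the `AVX_TARGET` macro are hypothetical, introduced only for this example.

```cpp
#include <emmintrin.h>  // SSE2: __m128d, _mm_unpacklo_pd, _mm_unpackhi_pd
#include <immintrin.h>  // AVX: __m256d, _mm256_unpacklo_pd

// Hypothetical helper macro: enable AVX per function on GCC and Clang.
#if defined(__GNUC__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

// 2x2 transpose: rows (a0 a1), (b0 b1) become (a0 b0), (a1 b1).
inline void transpose2x2_pd(__m128d row0, __m128d row1,
                            __m128d &col0, __m128d &col1)
{
    col0 = _mm_unpacklo_pd(row0, row1);  // a0 b0
    col1 = _mm_unpackhi_pd(row0, row1);  // a1 b1
}

// Memory wrapper for a row-major 2x2 matrix (4 doubles in, 4 doubles out).
inline void transpose2x2_rowmajor(const double *in, double *out)
{
    __m128d c0, c1;
    transpose2x2_pd(_mm_loadu_pd(in), _mm_loadu_pd(in + 2), c0, c1);
    _mm_storeu_pd(out, c0);
    _mm_storeu_pd(out + 2, c1);
}

// Stores _mm256_unpacklo_pd(a, b) to `out` (4 doubles each).
AVX_TARGET inline void unpacklo256_demo(const double *a, const double *b,
                                        double *out)
{
    __m256d r = _mm256_unpacklo_pd(_mm256_loadu_pd(a), _mm256_loadu_pd(b));
    _mm256_storeu_pd(out, r);  // a0 b0 a2 b2 (per 128-bit lane)
}
```

Because the 256-bit unpack never moves data across the 128-bit boundary, the 4x4 transpose needs a cross-lane step such as `_mm256_permute2f128_pd` after the unpacks.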