
More than 3 years have passed since last update.

Notes on sqrt with x86 SSE and AARCH64 NEON


Assumptions

Single precision (float) is assumed throughout.

C math

sqrt(0.0) = 0.0, sqrt(-0.0) = -0.0, sqrt(-1.0) = -nan
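
The C math behaviour above can be checked with a small sketch like the one below. The helper names (`runtime_sqrt`, `is_negative_zero`, `print_cmath_cases`) are illustrative and not from the original notes. Whether a NaN prints as `nan` or `-nan` depends on the sign bit the platform produces and on how the C library formats it, so the sketch only relies on `std::isnan` and `std::signbit`.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical helper: true when x is a zero whose sign bit is set.
bool is_negative_zero(float x)
{
  return x == 0.0f && std::signbit(x);
}

// Hypothetical helper: single-precision square root via C math.
// The volatile copy discourages the compiler from folding the call
// at compile time, so the result comes from the runtime computation.
float runtime_sqrt(float x)
{
  volatile float v = x;
  return std::sqrt(v);
}

// Prints the three cases listed above.
void print_cmath_cases()
{
  const float inputs[] = {0.0f, -0.0f, -1.0f};
  for (float x : inputs) {
    std::printf("sqrt(%g) = %g\n", x, runtime_sqrt(x));
  }
}
```

Calling `print_cmath_cases()` from `main` prints one line per input.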

_mm_sqrt_ps

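The SSE sqrt (_mm_sqrt_ps, i.e. the sqrtps instruction, which is part of SSE rather than SSE2) behaves the same as C math: an input of +/-0.0 returns +/-0.0, and a negative input returns -nan.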
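
A minimal sketch for inspecting the raw bits returned by _mm_sqrt_ps follows. It assumes an x86 target where `<xmmintrin.h>` is available; the helper names `sse_sqrt_bits` and `print_sse_cases` are made up for illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

#include <xmmintrin.h>  // SSE intrinsics: _mm_set1_ps, _mm_sqrt_ps, _mm_storeu_ps

// Hypothetical helper: computes sqrt(x) in all four lanes with _mm_sqrt_ps
// and returns the raw bit pattern of lane 0.
uint32_t sse_sqrt_bits(float x)
{
  volatile float v = x;        // discourage compile-time evaluation
  __m128 a = _mm_set1_ps(v);   // broadcast the input to all four lanes
  __m128 c = _mm_sqrt_ps(a);   // packed single-precision square root

  float buf[4];
  _mm_storeu_ps(buf, c);       // unaligned store, so buf needs no alignment

  uint32_t bits;
  std::memcpy(&bits, &buf[0], sizeof(bits));  // copy bits without type punning
  return bits;
}

// Prints the bit pattern for the inputs discussed above.
void print_sse_cases()
{
  const float inputs[] = {0.0f, -0.0f, -1.0f};
  for (float x : inputs) {
    std::printf("sqrt(%g) -> 0x%08x\n", x,
                static_cast<unsigned>(sse_sqrt_bits(x)));
  }
}
```

Looking at the top bit of the value returned for -1.0f shows whether the NaN carries a sign.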

AARCH64 NEON

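AARCH64 NEON (possibly AARCH32 as well?) has a sqrt instruction, exposed as the vsqrtq_f32 intrinsic. There are also estimate instructions for approximate results, such as frsqrte (reciprocal square root estimate).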

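#include <cstdint>
#include <cstdio>
#include <cstring>

#include <arm_neon.h>

int main()
{
  // Fill all four lanes with the input value (change it to test other inputs).
  float32x4_t a = vdupq_n_f32(-1.0f);

  // Vector square root of each lane.
  float32x4_t c = vsqrtq_f32(a);

  // Store the four results to memory.
  alignas(16) float buf[4];
  vst1q_f32(buf, c);

  // Copy the bits of lane 0 into an integer to inspect the NaN encoding.
  // memcpy avoids reading an inactive union member.
  uint32_t bits;
  std::memcpy(&bits, &buf[0], sizeof(bits));

  printf("ret = %g(0x%08x)\n", buf[0], bits);

  return 0;
}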

Results for several inputs, obtained by changing the value passed to vdupq_n_f32:

sqrt(+0.0) = +0.0
sqrt(-0.0) = -0.0
sqrt(-1.0) = nan(0x7fc00000)
sqrt(+inf) = +inf
sqrt(-inf) = nan(0x7fc00000)

Unlike SSE, the NaN result does not have the sign bit set (0x7fc00000), so it prints as nan rather than -nan.
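
Since the sign of the NaN differs between the two, code meant to behave the same on both should probably test for NaN itself rather than compare against one particular bit pattern or printed string. The following is a sketch under that assumption; the helper names are illustrative, and 0xffc00000 is used here simply as the sign-set counterpart of the 0x7fc00000 pattern shown above.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as a float.
float float_from_bits(uint32_t bits)
{
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Hypothetical helper: true if the pattern encodes any NaN, regardless of sign.
// A NaN has all exponent bits set and a non-zero fraction.
bool is_nan_bits(uint32_t bits)
{
  return (bits & 0x7f800000u) == 0x7f800000u && (bits & 0x007fffffu) != 0;
}

// Hypothetical helper: true if the sign bit of the pattern is set.
bool has_sign_bit(uint32_t bits)
{
  return (bits >> 31) != 0;
}
```

Both `is_nan_bits(0x7fc00000u)` and `is_nan_bits(0xffc00000u)` are true, and `std::isnan` agrees for the corresponding floats; only `has_sign_bit` tells the two patterns apart.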
