Assumptions

Everything below is for single precision (float).

C math

sqrt(0.0) = 0.0, sqrt(-0.0) = -0.0, sqrt(-1.0) = -nan

`_mm_sqrt_ps`

The SSE sqrt instruction (`_mm_sqrt_ps` is an SSE intrinsic, not SSE2) behaves the same as C math: an input of +/-0.0 gives +/-0.0, and a negative input gives -nan.

AARCH64 NEON

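AARCH64 NEON has a vector sqrt instruction (FSQRT, exposed as `vsqrtq_f32`). AARCH32 NEON does not seem to have one: `vsqrtq_f32` is listed as an A64-only intrinsic. There is also an estimate instruction, but it approximates the reciprocal square root (FRSQRTE, `vrsqrteq_f32`), not sqrt itself.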

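#include <cstdint>
#include <cstdio>
#include <cstring>

#include <arm_neon.h>

// Return the raw IEEE 754 bit pattern of a float.
// memcpy is used instead of a union, since reading the inactive member
// of a union is undefined behavior in C++.
static uint32_t float_bits(float f)
{
  uint32_t ui;
  std::memcpy(&ui, &f, sizeof(ui));
  return ui;
}

int main()
{
  // Fill all four lanes with the input under test.
  float32x4_t a = vdupq_n_f32(-1.0f);

  // Vector square root (AARCH64 only).
  float32x4_t c = vsqrtq_f32(a);

  alignas(16) float buf[4];

  vst1q_f32(buf, c);

  // All lanes hold the same value, so lane 0 is enough.
  printf("ret = %g(0x%08x)\n", buf[0], static_cast<unsigned>(float_bits(buf[0])));

  return 0;
}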

sqrt(+0.0) = +0.0
sqrt(-0.0) = -0.0
sqrt(-1.0) = nan(0x7fc00000)
sqrt(+inf) = +inf
sqrt(-inf) = nan(0x7fc00000)

Unlike SSE, the NaN result carries no sign: the sign bit is clear (0x7fc00000), so it prints as nan rather than -nan.

Notes on sqrt with x86 SSE and AARCH64 NEON
