
# Background

There is demand for speeding up image filters that include division. Such filters do not need the full accuracy of single-precision floats. The ARM NEON instruction set has no division instruction; when division is needed, the usual approach appears to be to estimate the reciprocal with the VRECPE instruction, refine it with the Newton-Raphson method to the required precision, and then multiply by it. My goal is to speed up division by reducing its accuracy, and to that end I investigate the following:
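The estimate-then-refine scheme can be sketched in a few lines. This is a minimal illustration, not NEON code: `coarse_reciprocal` is a hypothetical stand-in that truncates the true reciprocal to 8 fraction bits, and `to_f32`, `refine` and `divide` are names made up for this sketch. On NEON, the factor `2 - d * x` is what the VRECPS instruction computes.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def coarse_reciprocal(d: float, frac_bits: int = 8) -> float:
    """Hypothetical stand-in for a hardware estimate (assumes d > 0):
    1/d with its significand truncated to `frac_bits` fraction bits."""
    m, e = math.frexp(1.0 / d)                  # 1/d == m * 2**e, 0.5 <= m < 1
    scale = 1 << frac_bits
    sig = math.floor(2.0 * m * scale) / scale   # significand in [1, 2)
    return math.ldexp(sig, e - 1)


def refine(d: float, x: float) -> float:
    """One Newton-Raphson step for f(x) = 1/x - d: x * (2 - d * x)."""
    step = to_f32(2.0 - to_f32(d * x))
    return to_f32(x * step)


def divide(a: float, d: float, steps: int) -> float:
    """Approximate a / d as a * (refined reciprocal of d)."""
    x = coarse_reciprocal(d)
    for _ in range(steps):
        x = refine(d, x)
    return to_f32(a * x)


# Relative error of 1/137 after 0, 1 and 2 refinement steps.
for steps in range(3):
    approx = divide(1.0, 137.0, steps)
    print(steps, abs(approx * 137.0 - 1.0))
```

Each step roughly squares the relative error, so the number of correct bits about doubles until single-precision rounding takes over. That is why the accuracy of the initial estimate decides how many steps a given filter needs.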

• Accuracy of VRECPE

• Methods for correcting the initial reciprocal estimate, and their accuracy

• Speed of each correction method

This article focuses on the first item, the accuracy of VRECPE.

# Investigation

I ran the tests on a Raspberry Pi 3 Model B+.

For every denominator n from 2 to 1048576, I compared the VRECPE estimate of 1/n with the true value and recorded the maximum relative error. The code is at

https://github.com/sanmanyannyan/neon_div_accuracy_analysis

The worst case was:

```
max relative difference:0.002853
true value:0.007299 = 1 / 137.000000
est value:0.007278
true value:0 01110111 1101'1110'0101'1101'0110'111
est value:0 01110111 1101'1101'0000'0000'0000'000
```
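The search itself is a simple loop over denominators. The sketch below is not the code in the repository: `model_estimate` is a hypothetical software model of an 8-bit reciprocal estimate, used here only so the loop can run without NEON hardware, and the range is cut down to 2..4096 to keep it quick. For 137 the model happens to return the same estimate as the output above, but that is no guarantee it matches VRECPE elsewhere.

```python
import math


def model_estimate(d: float) -> float:
    """Hypothetical software model of an 8-bit reciprocal estimate.
    This is an assumption for illustration, not VRECPE itself."""
    m, e = math.frexp(d)          # d == m * 2**e with 0.5 <= m < 1
    q = int(m * 512)              # 9-bit bucket index, 256..511
    r = 512.0 / (q + 0.5)         # reciprocal of the bucket midpoint, in (1, 2)
    s = int(256.0 * r + 0.5)      # round to 8 fraction bits
    return math.ldexp(s / 256.0, -e)


def relative_error(d: float) -> float:
    """Relative error of the modelled estimate against the true 1/d."""
    true_value = 1.0 / d
    return abs(true_value - model_estimate(d)) / true_value


def max_relative_error(lo: int, hi: int):
    """Return (largest relative error, denominator that produced it) for lo..hi."""
    worst_err, worst_d = 0.0, lo
    for d in range(lo, hi + 1):
        err = relative_error(float(d))
        if err > worst_err:
            worst_err, worst_d = err, d
    return worst_err, worst_d


err, d = max_relative_error(2, 4096)
print(f"max relative difference:{err:.6f}")
print(f"true value:{1.0 / d:.6f} = 1 / {d:.6f}")
print(f"est value:{model_estimate(d):.6f}")
```

On real hardware, `model_estimate` would be replaced by a call to the instruction, for example through the `vrecpeq_f32` intrinsic in C.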

# Result

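Counting the implicit leading bit of the float format, the estimate agrees with the true value in the leading 7 bits of the significand in this worst case. The maximum relative error of 0.002853 is about 2^-8.45.

I could not find any statement about the accuracy of VRECPE in the ARM manual, so the accuracy may depend on the implementation, and this result may differ on other CPUs.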
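The 7-bit figure can be checked directly from the two bit patterns in the output above. The helper names below are made up for this sketch, and `matching_significand_bits` assumes both values are normal floats with the same sign and exponent.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def format_bits(x: float) -> str:
    """Format x as 'sign exponent mantissa', with the mantissa grouped in fours."""
    b = f32_bits(x)
    sign, exp, mant = b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF
    m = f"{mant:023b}"
    groups = "'".join(m[i:i + 4] for i in range(0, 23, 4))
    return f"{sign} {exp:08b} {groups}"


def matching_significand_bits(a: float, b: float) -> int:
    """Count the leading significand bits shared by a and b,
    including the implicit leading 1."""
    xa, xb = f32_bits(a), f32_bits(b)
    if (xa >> 23) != (xb >> 23):
        raise ValueError("sign or exponent differs")
    diff = (xa ^ xb) & 0x7FFFFF          # mantissa bits that differ
    return 1 + 23 - diff.bit_length()    # implicit bit + matching mantissa bits


true_value = 1.0 / 137.0
est_value = 477.0 / 65536.0   # the estimate from the measured output, 1.86328125 * 2**-8
print("true value:" + format_bits(true_value))
print("est value:" + format_bits(est_value))
print(matching_significand_bits(true_value, est_value))
```

The first six stored mantissa bits match and the seventh differs, which gives 7 bits once the implicit leading bit is counted.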