
Correcting the initial reciprocal estimate with ARM NEON


Background

This is a continuation of the previous article.

When refining an estimate of $1/a$ with the Newton-Raphson method, the estimate is updated with the following formula.

$$x_{n+1} = x_{n}(2 - ax_{n})$$

ARM NEON provides the VRECPS instruction, which computes $2 - ax_{n}$. (I do not know why it does not compute the whole formula.)
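As a rough illustration of how VRECPS fits into one update, here is a minimal C sketch using the NEON intrinsics `vrecpeq_f32`, `vrecpsq_f32`, and `vmulq_f32`. The function name `recip4_one_step` is hypothetical, and the non-NEON branch is only a stand-in (it truncates `1/a` to 8 mantissa bits) so that the snippet also builds on other targets; it is not how VRECPE actually works.

```c
#include <stdint.h>
#include <string.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Compute 1/a for four floats with one Newton-Raphson step (hypothetical helper). */
void recip4_one_step(const float a[4], float out[4]) {
#if HAVE_NEON
    float32x4_t va = vld1q_f32(a);
    float32x4_t x  = vrecpeq_f32(va);     /* VRECPE: initial estimate x0 */
    float32x4_t r  = vrecpsq_f32(va, x);  /* VRECPS: 2 - a*x0 */
    vst1q_f32(out, vmulq_f32(x, r));      /* x1 = x0 * (2 - a*x0) */
#else
    /* Assumption: coarse 8-bit estimate standing in for VRECPE on non-NEON builds. */
    for (int i = 0; i < 4; ++i) {
        float x = 1.0f / a[i];
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);
        bits &= 0xFFFF8000u;              /* keep sign, exponent, top 8 mantissa bits */
        memcpy(&x, &bits, sizeof x);
        out[i] = x * (2.0f - a[i] * x);   /* same update in scalar form */
    }
#endif
}
```

The multiply by the previous estimate has to be issued separately, which is why each Newton-Raphson step costs one VRECPS plus one VMUL.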

I evaluated the accuracy of the 2nd-, 4th-, and 8th-order corrections built on this instruction (one, two, and three Newton-Raphson steps, respectively).
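The reason each step doubles the order can be seen from a short derivation. Writing the relative error of the current estimate as $e_n = 1 - ax_n$ (a symbol introduced here for illustration), one Newton-Raphson update gives:

$$
1 - ax_{n+1} = 1 - ax_n(2 - ax_n) = (1 - ax_n)^2
\quad\Longrightarrow\quad
e_{n+1} = e_n^2
$$

So after $k$ steps the error is $e_0^{2^k}$ in exact arithmetic: one, two, and three steps give 2nd-, 4th-, and 8th-order corrections of the initial estimate.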

To balance speed against accuracy, I also evaluated a 3rd-order correction, following @kaity0256's article "High order reciprocal approximation correction".
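A 3rd-order correction can be sketched as follows, assuming the form $x = x_0(1 + e + e^2)$ with $e = 1 - ax_0$, which leaves a residual error of $e^3$. This may not match the referenced article exactly. The function names are hypothetical, and `coarse_recip` is only a scalar stand-in for VRECPE.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for VRECPE, 1/a truncated to 8 mantissa bits. */
static float coarse_recip(float a) {
    float x = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFF8000u;          /* keep sign, exponent, top 8 mantissa bits */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* 3rd-order correction: x0 * (1 + e + e^2), where e = 1 - a*x0. */
static float recip_order3(float a) {
    float x0 = coarse_recip(a);
    float r  = 2.0f - a * x0;     /* the value VRECPS returns, equal to 1 + e */
    float e  = r - 1.0f;          /* e = 1 - a*x0 */
    return x0 * (1.0f + e * r);   /* 1 + e*(1 + e) = 1 + e + e^2 */
}
```

Compared with two full Newton-Raphson steps, this form reuses the single VRECPS result, trading a little accuracy for a shorter sequence of operations.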


Evaluation

I evaluated the relative error of the corrected values using the same method as in the previous article.
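The previous article's exact procedure is not reproduced here, but a measurement of this kind can be sketched as a scan over integer divisors that records the worst relative error. Everything below is an assumption for illustration: the names `max_relative_error`, `coarse_recip`, and `recip_one_step`, the scan range, and the scalar stand-in for VRECPE.

```c
#include <stdint.h>
#include <string.h>

typedef float (*recip_fn)(float);

/* Assumption: stand-in for VRECPE, 1/a truncated to 8 mantissa bits. */
static float coarse_recip(float a) {
    float x = 1.0f / a;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFF8000u;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* One Newton-Raphson step on top of the coarse estimate. */
static float recip_one_step(float a) {
    float x = coarse_recip(a);
    return x * (2.0f - a * x);
}

/* Scan a = lo..hi (hi must be below UINT32_MAX) and return the largest
   |est - 1/a| / (1/a); the divisor that produced it goes to *worst_a if non-NULL. */
static double max_relative_error(recip_fn f, uint32_t lo, uint32_t hi, uint32_t *worst_a) {
    double worst = 0.0;
    if (worst_a) *worst_a = lo;
    for (uint32_t a = lo; a <= hi; ++a) {
        double truth = 1.0 / (double)a;           /* double-precision reference */
        double err = ((double)f((float)a) - truth) / truth;
        if (err < 0.0) err = -err;
        if (err > worst) {
            worst = err;
            if (worst_a) *worst_a = a;
        }
    }
    return worst;
}
```

Calling `max_relative_error(recip_one_step, 1, 100000, &a)` returns the worst relative error in that range and the divisor that produced it, which is the same kind of information as the "max relative difference" and "1 / a" lines in the log.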

codes

div0(VRECPE itself)

max relative difference:2.853386E-03
true value:7.299270E-03 = 1 / 137.000000
est value:7.278442E-03
true value:0 01110111 1101'1110'0101'1101'0110'111
est value:0 01110111 1101'1101'0000'0000'0000'000
div2(Newton-Raphson 1step(2nd order))
max relative difference:8.218507E-06
true value:2.047288E-06 = 1 / 488451.000000
est value:2.047271E-06
true value:0 01101100 0001'0010'1100'1000'0100'101
est value:0 01101100 0001'0010'1100'0111'1011'011
div3(3rd order correction)
max relative difference:2.960487E-07
true value:9.216335E-06 = 1 / 108503.000000
est value:9.216333E-06
true value:0 01101110 0011'0101'0011'1111'1011'100
est value:0 01101110 0011'0101'0011'1111'1011'001
div4(Newton-Raphson 2step(4th order))
max relative difference:1.751382E-07
true value:2.596506E-06 = 1 / 385133.000000
est value:2.596506E-06
true value:0 01101100 0101'1100'0111'1111'0100'000
est value:0 01101100 0101'1100'0111'1111'0100'010
div8(Newton-Raphson 3step(8th order))
max relative difference:1.191802E-07
true value:2.442003E-04 = 1 / 4095.000000
est value:2.442002E-04
true value:0 01110011 0000'0000'0001'0000'0000'001
est value:0 01110011 0000'0000'0001'0000'0000'000


Result

| Correction method | Accuracy (bits) |
|---|---|
| No correction | 6 |
| 2nd-order correction | 12 |
| 3rd-order correction | 20 |
| 4th-order correction | 21 |
| 8th-order correction | 22 |
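The bit counts in the result are consistent with the number of leading mantissa bits on which the true and estimated values agree in the worst case from the log. A minimal C sketch of that count, assuming IEEE 754 single precision; `float_from_bits` and `matching_mantissa_bits` are hypothetical names, not taken from the original code.

```c
#include <stdint.h>
#include <string.h>

/* Build a float from its raw IEEE 754 bit pattern. */
static float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Number of leading mantissa bits (0..23) shared by truth and est.
   Returns 0 if the sign or exponent fields already differ. */
static int matching_mantissa_bits(float truth, float est) {
    uint32_t t, e;
    memcpy(&t, &truth, sizeof t);
    memcpy(&e, &est, sizeof e);
    uint32_t diff = t ^ e;
    if (diff >> 23) return 0;     /* sign or exponent mismatch */
    int n = 0;
    for (uint32_t mask = 1u << 22; mask != 0 && !(diff & mask); mask >>= 1) {
        ++n;                      /* this mantissa bit agrees */
    }
    return n;
}
```

For the div0 bit patterns in the log (`0x3BEF2EB7` for the true value and `0x3BEE8000` for the estimate), `matching_mantissa_bits` returns 6. For the div8 patterns (`0x39800801` and `0x39800800`) it returns 22.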

As expected, three Newton-Raphson steps (the 8th-order correction) look like overkill.