#Background
This is a continuation of the previous article.
When refining an estimate of $1/a$ with the Newton-Raphson method, the estimate is updated with the following formula.
$$x_{n+1} = x_{n}(2 - ax_{n})$$
ARM NEON provides the VRECPS instruction, which computes $2 - ax_{n}$. (I don't know why it doesn't compute the whole formula.)
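As a rough illustration of the update formula, here is a scalar C sketch. The helper `recps` is a hypothetical stand-in that only mimics what VRECPS computes, and `recip_newton` is a name I made up for this example. On real hardware one would presumably use the NEON intrinsics `vrecpeq_f32` (initial estimate) and `vrecpsq_f32` (the $2 - ax_{n}$ part) instead of passing the initial estimate in by hand.

```c
#include <assert.h>

/* Hypothetical scalar stand-in for VRECPS: returns 2 - a*x. */
static float recps(float a, float x)
{
    return 2.0f - a * x;
}

/* Refine an initial estimate x0 of 1/a with `steps` Newton-Raphson
 * iterations: x_{n+1} = x_n * (2 - a * x_n).
 * Each step roughly squares the relative error, so 1, 2 and 3 steps
 * correspond to the 2nd-, 4th- and 8th-order corrections. */
static float recip_newton(float a, float x0, int steps)
{
    assert(steps >= 0);
    float x = x0;
    for (int i = 0; i < steps; ++i) {
        x = x * recps(a, x);
    }
    return x;
}
```

For example, `recip_newton(137.0f, x0, 1)` with a coarse `x0` should give the 2nd-order result, and passing `2` or `3` gives the 4th- and 8th-order ones, up to float rounding.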
Using VRECPS, I evaluated the accuracy of the 2nd-, 4th-, and 8th-order corrections.
Considering the trade-off between speed and accuracy, I also evaluated a 3rd-order correction, following @kaity0256's article "High order reciprocal approximation correction".
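For reference, a 3rd-order correction can be sketched as follows. I am assuming the usual form obtained by truncating the geometric series: with $e = 1 - ax_{n}$, the update is $x_{n+1} = x_{n}(1 + e + e^2)$, so that $ax_{n+1} = (1 - e)(1 + e + e^2) = 1 - e^3$. The referenced article may arrange the arithmetic differently, and the function name `recip_third_order` is only for illustration.

```c
#include <assert.h>

/* One 3rd-order correction step (assumed form, not copied from the
 * referenced article): with e = 1 - a*x, return x * (1 + e + e*e).
 * The residual 1 - a*x_new is then e^3, ignoring float rounding. */
static float recip_third_order(float a, float x)
{
    assert(a != 0.0f);
    float e = 1.0f - a * x;      /* residual of the current estimate */
    return x + x * (e + e * e);  /* same as x * (1 + e + e^2) */
}
```

Since VRECPS returns $2 - ax_{n} = 1 + e$, the residual $e$ could presumably be derived from its result, but this scalar version computes it directly to keep the sketch simple.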
#Evaluation
I evaluated the relative error of the corrected values in the same way as in the previous article.
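The two quantities reported in this section can be sketched in C like this. This is my assumption of how they are defined, not the actual evaluation code: the relative difference is taken as $|true - est| / |true|$, and the accuracy in bits is the number of leading mantissa bits on which the true value and the estimate agree. All function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Relative difference between the reference value and an estimate. */
static double relative_difference(float truth, float est)
{
    assert(truth != 0.0f);
    return fabs(((double)truth - (double)est) / (double)truth);
}

/* Reinterpret a float as its IEEE 754 binary32 bit pattern. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Number of leading mantissa bits (0..23) on which both values agree.
 * Returns 0 if the sign or exponent bits already differ. */
static int matching_mantissa_bits(float truth, float est)
{
    uint32_t diff = float_bits(truth) ^ float_bits(est);
    if (diff >> 23) {
        return 0;
    }
    int n = 0;
    for (int bit = 22; bit >= 0 && !((diff >> bit) & 1u); --bit) {
        ++n;
    }
    return n;
}
```

Applied to the VRECPE-only case for $1/137$, where the mantissas first differ at the seventh bit, `matching_mantissa_bits` gives 6.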
codes
div0(VRECPE itself)
max relative difference:2.853386E-03
true value:7.299270E-03 = 1 / 137.000000
est value:7.278442E-03
true value:0 01110111 1101'1110'0101'1101'0110'111
est value:0 01110111 1101'1101'0000'0000'0000'000
div2(Newton-Raphson 1step(2nd order))
max relative difference:8.218507E-06
true value:2.047288E-06 = 1 / 488451.000000
est value:2.047271E-06
true value:0 01101100 0001'0010'1100'1000'0100'101
est value:0 01101100 0001'0010'1100'0111'1011'011
div3(3rd order correction)
max relative difference:2.960487E-07
true value:9.216335E-06 = 1 / 108503.000000
est value:9.216333E-06
true value:0 01101110 0011'0101'0011'1111'1011'100
est value:0 01101110 0011'0101'0011'1111'1011'001
div4(Newton-Raphson 2step(4th order))
max relative difference:1.751382E-07
true value:2.596506E-06 = 1 / 385133.000000
est value:2.596506E-06
true value:0 01101100 0101'1100'0111'1111'0100'000
est value:0 01101100 0101'1100'0111'1111'0100'010
div8(Newton-Raphson 3step(8th order))
max relative difference:1.191802E-07
true value:2.442003E-04 = 1 / 4095.000000
est value:2.442002E-04
true value:0 01110011 0000'0000'0001'0000'0000'001
est value:0 01110011 0000'0000'0001'0000'0000'000
#Result
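Correction method | Accuracy (matching mantissa bits) |
---|---|
No correction (VRECPE only) | 6 |
2nd-order correction (Newton-Raphson, 1 step) | 12 |
3rd-order correction | 20 |
4th-order correction (Newton-Raphson, 2 steps) | 21 |
8th-order correction (Newton-Raphson, 3 steps) | 22 |

As expected, three Newton-Raphson steps (the 8th-order correction) seem to be overkill: they gain only one more bit over the 4th-order correction.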