Correcting the initial reciprocal estimate with ARM NEON

Posted at 2018-11-10

Background

This is a continuation of the previous article.
When an estimate of $1/a$ is refined with the Newton-Raphson method, the estimate is updated with the following formula.

$x_{n+1} = x_{n}(2 - ax_{n})$
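
The order names used for the corrections in this article follow from how this update acts on the residual. As a short derivation (the symbol $\epsilon_n$ is introduced here for illustration and does not appear in the original), define $\epsilon_n = 1 - a x_n$:

```latex
\begin{aligned}
1 - a x_{n+1} &= 1 - a x_n (2 - a x_n) \\
              &= 1 - 2 a x_n + (a x_n)^2 \\
              &= (1 - a x_n)^2 = \epsilon_n^2
\end{aligned}
```

Each step therefore squares the residual. Ignoring rounding, one, two, and three steps leave $\epsilon_0^2$, $\epsilon_0^4$, and $\epsilon_0^8$, which is why they are called 2nd-, 4th-, and 8th-order corrections.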

ARM NEON provides the VRECPS instruction, which computes $2 - ax_{n}$ (I do not know why it does not compute the whole formula).
I evaluated the accuracy of the 2nd-, 4th-, and 8th-order corrections built on it.
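
The split between VRECPS and the remaining multiply can be sketched in scalar Python. This is only an illustration, not the article's NEON code: `f32`, `recps`, `newton_step`, and `refine` are hypothetical helper names, single precision is emulated with `struct`, and the exact rounding behavior of the real instruction is not modeled. The starting value is the div0 estimate for 1/137 that appears in the evaluation log.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def recps(a: float, x: float) -> float:
    """Scalar stand-in for VRECPS: 2 - a*x, rounded to single precision."""
    return f32(2.0 - a * x)


def newton_step(a: float, x: float) -> float:
    """One Newton-Raphson update x * (2 - a*x); the multiply is a separate operation."""
    return f32(x * recps(a, x))


def refine(a: float, x0: float, steps: int) -> float:
    """Apply `steps` Newton-Raphson updates to the initial estimate x0."""
    x = f32(x0)
    for _ in range(steps):
        x = newton_step(a, x)
    return x


A = 137.0
X0 = 477 / 65536  # 7.278442E-03, the div0 estimate for 1/137 in the log

for steps in (0, 1, 2, 3):
    x = refine(A, X0, steps)
    # Residual |1 - a*x|, evaluated in double precision
    print(steps, abs(1.0 - A * x))
```

With 0, 1, 2, and 3 steps this corresponds to the uncorrected, 2nd-, 4th-, and 8th-order cases. In this emulation the residual stops shrinking once it reaches the single-precision rounding level.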
Considering the trade-off between speed and accuracy, I also evaluated a 3rd-order correction, following @kaity0256's article "High order reciprocal approximation correction".
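
For reference, one common form of a 3rd-order correction multiplies the estimate by $1 + e + e^2$ with $e = 1 - ax$, which leaves a residual of $e^3$ in exact arithmetic. Whether this is exactly the form used in the referenced article or in the measured div3 code is an assumption. The sketch below emulates single precision in scalar Python, and `f32` and `third_order` are hypothetical names.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def third_order(a: float, x: float) -> float:
    """Assumed 3rd-order correction: x * (1 + e + e^2) with e = 1 - a*x."""
    e = f32(1.0 - a * x)                     # residual of the current estimate
    poly = f32(1.0 + f32(e * f32(1.0 + e)))  # 1 + e + e^2 in Horner form
    return f32(x * poly)


A = 137.0
X0 = 477 / 65536  # 7.278442E-03, the div0 estimate for 1/137 in the log

x3 = third_order(A, X0)
# Residual |1 - a*x| after the correction, evaluated in double precision
print(abs(1.0 - A * x3))
```

Since VRECPS returns $2 - ax = 1 + e$, the factor $1 + e$ in this form would be available directly from that instruction.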

Evaluation

I evaluated the relative error of the corrected values in the same way as in the previous article.
codes
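
The measurement itself is not reproduced in this article, so the following is only a sketch of how such an evaluation could look. It assumes that the error is $|est - true| / true$ with the true value taken as the single-precision $1/a$, and that the worst case is searched over a list of inputs. `f32`, `bits`, and `max_relative_error` are hypothetical names, and `bits` mimics the bit-dump layout of the log below.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def bits(x: float) -> str:
    """Format x as single precision: sign, exponent, mantissa in groups of 4 bits."""
    (u,) = struct.unpack("<I", struct.pack("<f", x))
    s = f"{u:032b}"
    sign, exponent, mantissa = s[0], s[1:9], s[9:]
    groups = [mantissa[i:i + 4] for i in range(0, 23, 4)]
    return f"{sign} {exponent} " + "'".join(groups)


def max_relative_error(approx, values):
    """Return (largest relative error, input that produced it)."""
    worst_err, worst_a = 0.0, None
    for a in values:
        true = f32(1.0 / a)
        err = abs(approx(a) - true) / true
        if err > worst_err:
            worst_err, worst_a = err, a
    return worst_err, worst_a


# Stand-in approximation: always return the div0 estimate for 1/137 from the log
err, a = max_relative_error(lambda a: 477 / 65536, [137.0])
print(f"max relative difference:{err:E}")
print("true value:" + bits(1.0 / a))
print(" est value:" + bits(477 / 65536))
```

In a real run, `approx` would be the routine under test and `values` the full range of inputs being scanned.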

div0(VRECPE itself)
    max relative difference:2.853386E-03
    true value:7.299270E-03 = 1 / 137.000000
     est value:7.278442E-03
    true value:0 01110111 1101'1110'0101'1101'0110'111
     est value:0 01110111 1101'1101'0000'0000'0000'000
div2(Newton-Raphson 1step(2nd order))
    max relative difference:8.218507E-06
    true value:2.047288E-06 = 1 / 488451.000000
     est value:2.047271E-06
    true value:0 01101100 0001'0010'1100'1000'0100'101
     est value:0 01101100 0001'0010'1100'0111'1011'011
div3(3rd order correction)
    max relative difference:2.960487E-07
    true value:9.216335E-06 = 1 / 108503.000000
     est value:9.216333E-06
    true value:0 01101110 0011'0101'0011'1111'1011'100
     est value:0 01101110 0011'0101'0011'1111'1011'001
div4(Newton-Raphson 2step(4th order))
    max relative difference:1.751382E-07
    true value:2.596506E-06 = 1 / 385133.000000
     est value:2.596506E-06
    true value:0 01101100 0101'1100'0111'1111'0100'000
     est value:0 01101100 0101'1100'0111'1111'0100'010
div8(Newton-Raphson 3step(8th order))
    max relative difference:1.191802E-07
    true value:2.442003E-04 = 1 / 4095.000000
     est value:2.442002E-04
    true value:0 01110011 0000'0000'0001'0000'0000'001
     est value:0 01110011 0000'0000'0001'0000'0000'000

Result

| Correction method | Accuracy (bits) |
|---|---|
| No correction | 6 |
| 2nd-order correction | 12 |
| 3rd-order correction | 20 |
| 4th-order correction | 21 |
| 8th-order correction | 22 |
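
The bit counts above appear to equal the number of leading mantissa bits on which the worst-case true and estimated values in the log agree. This reading is an inference from the log and is not stated in the article. A minimal sketch that counts those bits from the dumped bit strings, with `matching_mantissa_bits` as a hypothetical helper name:

```python
def matching_mantissa_bits(true_bits: str, est_bits: str) -> int:
    """Count leading mantissa bits shared by two 'sign exponent mantissa' dumps."""
    t_sign, t_exp, t_man = true_bits.split()
    e_sign, e_exp, e_man = est_bits.split()
    if (t_sign, t_exp) != (e_sign, e_exp):
        raise ValueError("sign or exponent differ; mantissas are not comparable")
    count = 0
    # Strip the ' group separators and compare bit by bit from the top
    for tb, eb in zip(t_man.replace("'", ""), e_man.replace("'", "")):
        if tb != eb:
            break
        count += 1
    return count


# Bit dumps copied from the div0 and div8 entries of the evaluation log
div0 = matching_mantissa_bits(
    "0 01110111 1101'1110'0101'1101'0110'111",
    "0 01110111 1101'1101'0000'0000'0000'000",
)
div8 = matching_mantissa_bits(
    "0 01110011 0000'0000'0001'0000'0000'001",
    "0 01110011 0000'0000'0001'0000'0000'000",
)
print(div0, div8)
```

Applied to the five log entries, this count gives the same numbers as the table.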

As expected, three Newton-Raphson steps look like overkill: the 8th-order correction gains only one bit over the 4th-order correction.
