FortranAdvent Calendar 2024

FMM(Fast Multipole Method)ライブラリの紹介パート2

Last updated at 2024-12-24Posted at 2024-12-23

本記事は 12/21の記事の続編です．

FMM (Laplace Double Layer Potential の計算の高速化)

前回の記事で紹介した内容の中で，本記事で利用する最も重要な事実は以下の通りです．
2次元ベクトルからなる2つの列$(x_{i})$ $(1 \le i \le N_{x})$, $(y_{j})$ $(1 \le j \le N_{y})$と，$(y_{j})$に関連付けられたスカラー値の列$(c_j)$ $(1 \le j \le N_{y})$が与えられたとき，

\begin{equation}
  \sum_{j=1}^{N_{y}} c_{j} \frac{\partial{G(x_{i},y_{j})}}{\partial{n_{y_{j}}}}
\end{equation}

を全ての$i$ $(1 \le i \le N_{x})$について愚直に計算すると時間計算量が$O(N_{x}N_{y})$かかりますが，FMM (Fast Multipole Method)によって，時間計算量$O(N_{x} + N_{y})$で計算することが出来ます．ここで$\frac{\partial{G(x,y)}}{\partial{n_{y}}} = \frac{1}{2\pi r^2} (x-y)\cdot\hat{n}_{y}$です．

BIE から得られる線形方程式の解の反復解法による計算

同一の2次元空間上の点列を$(x_{i})$, $(y_{i})$ $(1 \le i, j \le N_b)$,と2つ与え(全ての$i$で$x_{i} = y_{i}$),

\begin{equation}
    \rho^{(n)}(x_{i})
  -2 \sum_{j=1}^{N_{y}} \frac{\partial{G(x_{i},y_{j})}}{\partial{n_{y_{j}}}}
    w_{j}\rho^{(n)}(y_{j})
  = b(x_{i})
\end{equation}

を各$i \in {1, 2, \dots, N_{b}}$について求める必要があります．
ここで$\rho_{i}^{(n)} := \rho_{x_{i}}^{(n)} = \rho_{y_{i}}^{(n)}$は$\rho$を数値的に求めるために適用される反復法(ここではGMRESが典型的な手法)によって得られる$n$ステップ目の$x_i = y_i$における$\rho$の推定値($n$回目の反復計算目終了時点で既知の値となる)です．
この離散化した BIE を愚直に計算すると，左辺第二項の計算で $O(N_{b})$ の時間計算量がかかり，全ての$i$に対してこれらを計算する反復法の$1$ステップで$O(N_{b}^2)$のコストかかかります．
ここで$N_{x} = N_{y} = N_{b}$, $c_{j} = w_{j}\rho_{j}^{(n)}$ として，全ての左辺第二項の計算をまとめて FMM で行うことが出来，その時間計算量は $O(N_{b})$となります．左辺の第二項達の計算結果に第一項を足す処理も全体として$O(N_{b})$です．
fmm2d では，lfmm2d_s_d_p というサブルーチンを呼び出すことでこの FMM による処理を実行出来ます．
以下に lfmm2d_s_d_p のインターフェイスを抜粋しました．

fmm2d/src/laplace/lfmm2dwrap.f

      subroutine lfmm2d_s_d_p(eps,ns,sources,
     1            dipstr,dipvec,pot,ier)
cf2py  intent(in) eps
cf2py  intent(in) ns,sources,dipstr,dipvec
cf2py  intent(out) pot,ier
c----------------------------------------------
c   INPUT PARAMETERS:
c   eps           : FMM precision requested
c   ns            : number of sources
c   sources(2,ns) : source locations
c   dipstr(ns)    : dipole strengths
c   dipvec(2,ns)  : real dipole orientations
c
c   OUTPUT PARAMETERS
c   pot(ns)       : potential at the source locations
c

入出力変数と数値実験用のサンプル問題への対応は以下の通りです

Input:
- eps: FMM で近似計算のために設定する許容誤差 (例えばdouble precision で 1e-14)
- ns: $(x_i) = (y_i)$の要素数($=N_b$)
- sources(2,ns): sources$(j, i) = x_{ij} = y_{ij}$ $(1 \le i \le N_{b}, 1 \le j \le 2)$
- dpstr(ns): dipstr$(i)$ = $w_{i}\rho^{(n)}(y_{i})$ $(1 \le i \le N_{b}$
- dipvec(2,ns): dipvec$(j,i)$ = $\hat{n_{y_{j}}} $ $(1 \le i \le N_{b}, 1 \le j \le 2)$
Output:
- pot(ns): BIE の左辺第二項$\sum_{j=1}^{N_{y}} \frac{\partial{G(x_{i},y_{j})}}{\partial{n_{y_{j}}}}w_{j}\rho^{(n)}(y_{j})$から成るサイズ$N_{b}$のベクトル

このサブルーチンを Fortran や C のコードで呼び出すために，以下のラッパー関数を用意しました．ラッパー関数作成の動機は，$(x_{i})$や$(y_{i})$等の配列を，fmm2d の外では一次元で持たせたかったからです．
行っていることは単純な配列の次元やサイズの調整と lfmm2d_s_d_p の呼び出しです．例えば点 $x_{i} = y_{i} = (y_{i1}, y_{i2})$ の$2$要素は $y_{i1} = $sources$(1, i)$，$y_{i2} = $sources$(2, i)$という風にそれぞれ格納されます．

fmm2d_laplace_wrap.f90

module lfmm2d_wrap
  use iso_c_binding
  use, intrinsic :: iso_fortran_env
  implicit none
  contains

  subroutine lfmm2d_s_d_p_wrap(eps_fmm, n_src, src_in, dip_str_in, dip_vec_in, &
     pot_out, flg_err) bind(c)
    implicit none
    real(c_double), value :: eps_fmm
    integer(c_int), value :: n_src
    integer(c_int), value :: flg_err
    ! sizes of *_in/out below: n_src*2 all
    real(c_double), intent(in) :: src_in(*), dip_str_in(*), dip_vec_in(*)
    real(c_double), intent(inout) :: pot_out(*)
    ! ((i-1)*2 + j) -> (j, i) (size (n*2,1) -> (2, n))
    double precision, allocatable :: src(:,:)
    double complex, allocatable :: dip_str(:)
    double precision, allocatable :: dip_vec(:,:)
    double complex, allocatable :: pot_src(:)
    integer :: i, j
    double complex :: iu

    iu = (0.0d0, 1.0d0)

    allocate(src(2, n_src), dip_str(n_src), dip_vec(2, n_src), pot_src(n_src))

    do i = 1, n_src
      do j = 1, 2
        src(j, i) = src_in((i-1)*2 + j)
        dip_vec(j, i) = dip_vec_in((i-1)*2 + j)
      end do
      dip_str(i) = dip_str_in((i-1)*2 + 1) + dip_str_in((i-1)*2 + 2)*iu
    end do

    flg_err = 0
    call lfmm2d_s_d_p(eps_fmm, n_src, src, dip_str, dip_vec, pot_src, flg_err)

    do i = 1, n_src
      pot_out((i-1)*2 + 1) = real(pot_src(i))
      pot_out((i-1)*2 + 2) = imag(pot_src(i))
    end do

    deallocate(src, dip_str, dip_vec, pot_src)
  end subroutine lfmm2d_s_d_p_wrap

end module lfmm2d_wrap

数値解の計算

BIE の線型方程式を解いた後，目的の数値解$u(x)$を$\Omega$内の$x$で計算します．
$(x_{i})$ ($1 \le i \le N_{e}$)を$u$の値を計算したい$\Omega$内の点から成る列,
$(y_{j})$ ($1 \le j \le N_{b}$)を線型方程式の構成で用いた$\Gamma$上の点から成る列とします．
各$i$ ($1 \le i \le N_{e}$) に対し，$u(x_{i})$の数値解は以下の計算で求められます．

\begin{equation}
    u(x_{i})
  = \sum_{j=1}^{N_{y}} \frac{\partial{G(x_{i},y_{j})}}{\partial{n_{y_{j}}}}
    w(y_{j})\rho_(y_{j})
\end{equation}

$N_{x} = N_{e}$, $N_{y} = N_{b}$として，FMM に$(x_{i})$, $(y_{j})$,　$c_{j} = w(y_{j})\rho(y_{j})$ ($1 \le i \le N_{e}$, $1 \le j \le N_{b}$) を入力データとして与えることで，時間計算量$O(N_{e} + N_{b})$で$(u(x_{i}))$ ($1 \le i \le N_{e}$) を得られます．各$i$に対して上の式を愚直に計算すると，$O(N_{e}N_{b})$かかります．

fmm2d では，lfmm2d_t_d_p というサブルーチンを呼び出すことでこの FMM による処理を実行出来ます．
以下に lfmm2d_t_d_p のインターフェイスを抜粋しました．

fmm2d/src/laplace/lfmm2dwrap.f

      subroutine lfmm2d_t_d_p(eps,ns,sources,
     1            dipstr,dipvec,nt,targ,pottarg,ier)
cf2py  intent(in) eps
cf2py  intent(in) ns,sources,dipstr,dipvec,nt,targ
cf2py  intent(out) pottarg,ier
c----------------------------------------------
c   INPUT PARAMETERS:
c   eps           : FMM precision requested
c   ns            : number of sources
c   sources(2,ns) : source locations
c   dipstr(ns)    : dipole strengths
c   dipvec(2,ns)  : real dipole orientations
c   nt            : number of targets
c   targ(2,nt)    : target locations
c
c   OUTPUT PARAMETERS
c   pottarg(nt)   : potential at the target locations
c

先述した lfmm2d_s_d_p とかなり似ていますが，入出力変数と数値実験用のサンプル問題への対応は以下の通りです．

Input:
- eps: FMM で近似計算のために設定する許容誤差 (例えばdouble precision で 1e-14)
- ns: $((y_i)$の要素数($=N_b$)
- sources(2,ns): sources$(j, i) = y_{ij}$ $(1 \le i \le N_{b}, 1 \le j \le 2)$
- dpstr(ns): dipstr$(i)$ = $w_{i}\rho^{(n)}(y_{i})$ $(1 \le i \le N_{b}$
- dipvec(2,ns): dipvec$(j,i)$ = $\hat{n_{y_{j}}} $ $(1 \le i \le N_{b}, 1 \le j \le 2)$
- nt: 数値解を求めたい点$((x_i)$の要素数($=N_e$)
- targ(2,ns): targ$(j, i) = x_{ij}$ $(1 \le i \le N_{e}, 1 \le j \le 2)$
Output:
- pottarg(nt): 数値解$(u(x_{i}))$ $(1 \le i \le N_{e})$から成るサイズ$N_{e}$のベクトル

同様に以下のラッパー関数を用意しました．

fmm2d_laplace_wrap.f90

module lfmm2d_wrap
  use iso_c_binding
  use, intrinsic :: iso_fortran_env
  implicit none
  contains

  subroutine lfmm2d_t_d_p_wrap(eps_fmm, n_src, src_in, dip_str_in, dip_vec_in, &
    n_trg, trg_in, pot_trg_out, flg_err) bind(c)
    implicit none
    real(c_double), value :: eps_fmm
    integer(c_int), value :: n_src, n_trg
    integer(c_int), value :: flg_err
    ! sizes of *_in below: n_src*2 all
    real(c_double), intent(in) :: src_in(*), dip_str_in(*), dip_vec_in(*)
    ! ((i-1)*2 + j) -> (j, i) (size (n*2,1) -> (2, n))
    double precision, allocatable :: src(:,:)
    double complex, allocatable :: dip_str(:)
    double precision, allocatable :: dip_vec(:,:)
    ! sizes of *_in below: n_trg*2 all
    real(c_double), intent(in) :: trg_in(*)
    real(c_double), intent(inout) :: pot_trg_out(*)
    ! (i-1)*2 + j) -> (j, i) (size (n*2,1) -> (2, n))
    double precision, allocatable :: trg(:,:)
    double complex, allocatable :: pot_trg(:)
    integer :: i, j
    double complex :: iu

    iu = (0.0d0, 1.0d0)

    allocate(src(2, n_src), dip_str(n_src), dip_vec(2, n_src))
    allocate(trg(2, n_trg), pot_trg(n_trg))

    do i = 1, n_src
      do j = 1, 2
        src(j, i) = src_in((i-1)*2 + j)
        dip_vec(j, i) = dip_vec_in((i-1)*2 + j)
      end do
      dip_str(i) = dip_str_in((i-1)*2 + 1) + dip_str_in((i-1)*2 + 2)*iu
    end do
    do i = 1, n_trg
      do j = 1, 2
        trg(j, i) = trg_in((i-1)*2 + j)
      end do
    end do

    flg_err = 0
    call lfmm2d_t_d_p(eps_fmm, n_src, src, &
      dip_str, dip_vec, &
      n_trg, trg, pot_trg, flg_err)

    do i = 1, n_trg
      pot_trg_out((i-1)*2 + 1) = real(pot_trg(i))
      pot_trg_out((i-1)*2 + 2) = imag(pot_trg(i))
    end do

    deallocate(src, dip_str, dip_vec)
    deallocate(trg, pot_trg)
  end subroutine lfmm2d_t_d_p_wrap
  
end module lfmm2d_wrap

また，C や C++ から lfmm2d_t_d_p_wrap と lfmm2d_w_d_p_wrap を呼び出せるようにするために，以下の関数も用意しました．

fmm2d_laplace_wrap.h

#ifdef __cplusplus
extern "C" {

void lfmm2d_t_d_p_wrap(double eps_fmm, int n_src, double *src_in,
  double *dip_str_in, double *dip_vec_in,
  int n_trg, double *trg_in, double *pot_trg_out,
  int flg_err);

void lfmm2d_s_d_p_wrap(double eps_fmm, int n_src, double *src_in,
  double *dip_str_in, double *dip_vec_in,
  double *pot_out,
  int flg_err);

}
#endif

C や C++ と Fortran の相互運用のためのコードの書き方については，以下の資料や記事が参考になります(させて頂きました)．

nAGによる資料:

sakamoti(Y Saka) さんによる当アドベントカレンダー初日の記事

osada-yum(yum osada)さんによる12/5の記事

数値計算結果

線型方程式の反復法パート

次のグラフは，サンプル問題から導かれる線型方程式$(I + K)\rho = b$を GMRES で解くために要した時間(単位：秒)を，$N_{b}$のサイズを$300$から$4 \times 10^4$程度までの異なる値で計算した結果を$\log$-$\log$スケールで描画したものです．時間計算量については，愚直な実装では$O(N_{b}^2)$，FMM では$O(N_{b})$です．実線のうち，黒のグラフが愚直な実装，青のグラフが FMM を適用した実装による結果です．$N_{b}$を増加させると，実線のグラフの傾きが同じ色の点線の傾きに近づくことが期待されますが，概ね期待される結果が得られました．

補足ですが，FMM を使わずに愚直に線型方程式の行列をデータとして陽に持たせる場合，$N_{b}$が$10^5$を超えるくらい大きくなってくると，$N_{b}^2$のサイズの配列を確保するための RAM の容量が通常のラップトップでは足りなくなるという空間計算量の問題も実はあります．この点で，FMM は陽に行列を構築せずに処理出来るため，大規模な問題にも対応可能という利点もあります．

数値解の計算パート

もう一つのグラフは，数値解$u(x)$ を計算するために要した時間(単位：秒)を，$N_{e}$と$N_{b}$のサイズを揃えて，それぞれの値を共に$3 \times 10^3$から$2 \times 10^4$程度までの値で計算した結果を$\log$-$\log$スケールで描画したものです．実線のうち，黒のグラフが愚直な実装，青のグラフが FMM を適用した実装による結果です．時間計算量については，愚直な実装では$O(N_{e}N_{b})$，FMM では$O(N_{e} + N_{b})$です．$N_{e}$と$N_{b}$を揃えて増加させると，実線のグラフの傾きが同じ色の点線の傾きに近づくことが期待されますが，この実験でも概ね期待される結果が得られました．

補足

計算誤差
この FMM ライブラリ紹介記事では，計算結果の精度には触れていません．
BIE method や FMM によってどの程度の誤差が数値計算で生じるか，そして誤差を小さくするための取り組みは，高速なアルゴリズムを構築することと共にとても重要な問題です．
実は工夫なしの実装では，境界$\Gamma$(以下の図では赤線)付近の点(青)でかなり精度が落ちてしまうことが知られています(仮に境界$\Gamma$が十分滑らかであっても)．
Green's function 自身が，$x_{i}$と$y_{j}$が近くなるにつれ発散することが主な原因です．
数学的にはその畳み込み積分が極限として有限の値をとるとしても，数値計算において精度を悪化させる要因となってしまいます．

計算対象領域$\Omega$が2次元で，その境界$\Gamma$が区分的にある程度滑らかであれば，精度の悪化を克服出来る手法が様々な問題で提案されています(例えば参考文献5, 6, 7)．
2次元の場合ほど強力な手法が整備されている訳ではないのですが，3次元の問題も活発に研究されています(例えば参考文献8, 9, 10)．

まとめ

2次元 Laplace 方程式の境界値問題に対する数値解法として BIE method とその計算アルゴリズムを高速化するためのライブラリである fmm2d，またその利用のための単純なラッパー関数を紹介しました．
個人的に興味を持っている様々な数値計算アルゴリズム(例えば3次元 FMM や，non-uniform FFT等)を将来的に紹介出来ればと考えています．

参考文献

R. Kress, Linear Integral Equations, Third Edition, Springer, 2014.
Y. Saad and M.H. Schultz, GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems, J. Sci. Statist. Comput., 7 (1986), pp. 856-869.
L. Greengard and V. Rokhlin, A fast algorithm for particle simulations, J. Comput. Phys., 73(2), 1987
L. Chen, Introduction To Fast Multipole Methods, https://www.math.uci.edu/~chenlong/MathPKU/FMMsimple.pdf
J. Helsing, Integral equation methods for elliptic problems with boundary conditions of mixed type, J. Comput. Phys., 228(23), pp.8892-8907, 2009.
J. Helsing, Solving integral equations on piecewise smooth boundaries using the RCIP method: a tutorial, personal communication, 2022.
B. Wu, H. Zhu, A. Barnett, and S. Veerapaneni, Solution of Stokes flow in complex nonsmooth 2D geometries via a linear-scaling high-order adaptive integral equation scheme, J. Comput. Phys., 410, 2020.
H. Zhu and S. Veerapaneni, High-order close evaluation of Laplace layer potentials: A differential geometric approach, SIAM J. Sci. Comput, 44(3), pp.A1381-A1404, 2022.
S. Jiang and H. Zhu, Recursive reduction quadrature for the evaluation of Laplace layer potentials in three dimensions, arXiv, 2024.
L. Greengard, M. O'Neil, M. Rachh, and F. Vico, Fast multipole methods for the evaluation of layer potentials with locally-corrected quadratures, J. Comput. Phys., 10, 2021.

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

FMM(Fast Multipole Method)ライブラリの紹介 パート2