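Last time I tried to use gprof and it didn't work out, so this time I'll use perf.
perf is tied directly to the kernel, which makes it easy to use, but it may not work on unusual kernels (on a supercomputer, for example). I'll live with that.
As for how to use perf itself, others have already written good summaries, so please refer to those.
For now, as last time, I build with -g to include debug symbols and then run the solver [1].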
$ source ~/OpenFOAM/OpenFOAM-dev/etc/bashrc
$ cd ~/OpenFOAM-BenchmarkTest/channelReTau110/NoBatch-mesh_3M/cases/mpi_0001-method_scotch
$ perf record pimpleFoam
$ perf report
The result was as follows.
24.15% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam11DICSmoother6smoothERNS_5FieldIdEERKS2_hi
14.79% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam9lduMatrix8residualERNS_5FieldIdEERKS2_S5_RKNS_10FieldFieldIS1_dEERKN
11.36% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam9lduMatrix4AmulERNS_5FieldIdEERKNS_3tmpIS2_EERKNS_10FieldFieldIS1_dEE
3.10% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam10GAMGSolver5scaleERNS_5FieldIdEES3_RKNS_9lduMatrixERKNS_10FieldField
2.89% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam18DILUPreconditioner13preconditionTERNS_5FieldIdEERKS2_h
2.87% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam18DILUPreconditioner12preconditionERNS_5FieldIdEERKS2_h
2.18% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam9lduMatrix4TmulERNS_5FieldIdEERKNS_3tmpIS2_EERKNS_10FieldFieldIS1_dEE
1.58% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam5PBiCG5solveERNS_5FieldIdEERKS2_h
1.51% pimpleFoam [kernel] [k] 0xffffffffb6a3b457
1.26% pimpleFoam libfiniteVolume.so [.] _ZN4Foam26surfaceInterpolationSchemeINS_6VectorIdEEE14dotInterpolateINS_14Geom
1.24% pimpleFoam libfiniteVolume.so [.] _ZN4Foam2fv9gaussGradIdE5gradfERKNS_14GeometricFieldIdNS_13fvsPatchFieldENS_11
1.10% pimpleFoam libfiniteVolume.so [.] _ZN4Foam2fv9gaussGradINS_6VectorIdEEE5gradfERKNS_14GeometricFieldIS3_NS_13fvsP
1.04% pimpleFoam libOpenFOAM.so [.] _ZN4Foam8multiplyERNS_5FieldIdEERKNS_5UListIdEES6_
1.01% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam17GAMGAgglomeration12prolongFieldIdEEvRNS_5FieldIT_EERKS4_ib.constpro
0.97% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam10GAMGSolver5solveERNS_5FieldIdEERKS2_h
0.96% pimpleFoam pimpleFoam [.] _ZN4Foam5FieldIdEaSERKS1_
0.96% pimpleFoam libOpenFOAM.so [.] _ZN4Foam17DICPreconditioner15calcReciprocalDERNS_5FieldIdEERKNS_9lduMatrixE
0.94% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam17GAMGAgglomeration13restrictFieldIdEEvRNS_5FieldIT_EERKS4_ib.constpr
0.78% pimpleFoam pimpleFoam [.] _ZNK4Foam8fvMatrixINS_6VectorIdEEE1HEv
0.72% pimpleFoam libfiniteVolume.so [.] _ZN4Foam26surfaceInterpolationSchemeIdE14dotInterpolateINS_17geometricOneField
0.71% pimpleFoam libOpenFOAM.so [.] _ZN4Foam18DILUPreconditioner15calcReciprocalDERNS_5FieldIdEERKNS_9lduMatrixE
0.58% pimpleFoam [kernel] [.] 0xffffffffb6e9b1d7
0.57% pimpleFoam libOpenFOAM.so [.] _ZN4Foam6divideERNS_5FieldIdEERKNS_5UListIdEES6_
0.53% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam9lduMatrix4sumAERNS_5FieldIdEERKNS_10FieldFieldIS1_dEERKNS_8UPtrListI
0.52% pimpleFoam pimpleFoam [.] _ZN4Foam8subtractIddNS_13fvsPatchFieldENS_11surfaceMeshEEEvRNS_14GeometricFiel
0.52% pimpleFoam libOpenFOAM.so [.] _ZNK4Foam10GAMGSolver6VcycleERKNS_7PtrListINS_9lduMatrix8smootherEEERNS_5Field
0.50% pimpleFoam pimpleFoam [.] _ZNK4Foam9lduMatrix1HINS_6VectorIdEEEENS_3tmpINS_5FieldIT_EEEERKS7_
0.47% pimpleFoam pimpleFoam [.] _ZNK4Foam5FieldINS_6VectorIdEEE9componentEh
0.46% pimpleFoam pimpleFoam [.] _ZN4Foam3fvc16surfaceIntegrateIdEEvRNS_5FieldIT_EERKNS_14GeometricFieldIS3_NS_
0.44% pimpleFoam libfiniteVolume.so [.] _ZN4Foam26surfaceInterpolationSchemeINS_6TensorIdEEE14dotInterpolateINS_14Geom
0.44% pimpleFoam libOpenFOAM.so [.] _ZN4Foam7sumProdIdEEdRKNS_5UListIT_EES5_
0.41% pimpleFoam libfiniteVolume.so [.] _ZN4Foam8multiplyINS_6VectorIdEENS_12fvPatchFieldENS_7volMeshEEEvRNS_14Geometr
0.40% pimpleFoam libturbulenceModels.so [.] _ZN4Foam3fvc16surfaceIntegrateINS_6VectorIdEEEEvRNS_5FieldIT_EERKNS_14Geometri
0.37% pimpleFoam pimpleFoam [.] _ZN4Foam5FieldIdEaSERKNS_3tmpIS1_EE
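For some reason the names come out mangled. These are ordinary C++ symbols (extern "C" would suppress mangling, not cause it), so presumably this perf build simply does not demangle them itself.
Still, the names are readable enough to tell what is what, so I'll move on.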
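If the mangled names get in the way, one option is to pipe the text report through c++filt. The following is only a sketch: `top_symbols` is a name I made up, and it assumes the report is produced as plain text (for example with `perf report --stdio`) with comment lines starting with `#`. If c++filt is not installed, it passes the names through unchanged.

```shell
# Print the first N data rows of `perf report --stdio`-style text from stdin,
# with C++ symbol names demangled when c++filt (binutils) is available.
# top_symbols is a hypothetical helper, not part of perf.
top_symbols() {
    local n="${1:-10}"
    # Drop comment (#) and blank lines, keep the first n rows, then demangle.
    grep -v -e '^#' -e '^[[:space:]]*$' \
        | head -n "$n" \
        | { if command -v c++filt >/dev/null 2>&1; then c++filt; else cat; fi; }
}
```

Usage would look like `perf report --stdio | top_symbols 5`.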
Next, to look at the call graph, I run:
$ perf record -g pimpleFoam
$ perf report -g -G
Running these gives the following.
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 47277
#
# Samples: 3M of event 'cycles:ppp'
# Event count (approx.): 3373946376318
#
# Children Self Command Shared Object Symbol
# ........ ........ .......... .................................... ............................................................................................................................................................................................................................................................................................
#
25.64% 0.00% pimpleFoam libfiniteVolume.so [.] _ZN4Foam8fvMatrixIdE15solveSegregatedERKNS_10dictionaryE
|
---_ZN4Foam8fvMatrixIdE15solveSegregatedERKNS_10dictionaryE
|
--25.56%--_ZNK4Foam10GAMGSolver5solveERNS_5FieldIdEERKS2_h
|
--24.60%--_ZNK4Foam10GAMGSolver6VcycleERKNS_7PtrListINS_9lduMatrix8smootherEEERNS_5FieldIdEERKS8_S9_S9_S9_S9_S9_RNS1_IS8_EESD_h
|
--24.07%--_ZNK4Foam11DICSmoother6smoothERNS_5FieldIdEERKS2_hi
25.64% 0.00% pimpleFoam pimpleFoam [.] _ZN4Foam8fvMatrixIdE5solveERKNS_10dictionaryE
|
---_ZN4Foam8fvMatrixIdE5solveERKNS_10dictionaryE
|
--25.64%--_ZN4Foam8fvMatrixIdE15solveSegregatedERKNS_10dictionaryE
|
--25.56%--_ZNK4Foam10GAMGSolver5solveERNS_5FieldIdEERKS2_h
|
--24.60%--_ZNK4Foam10GAMGSolver6VcycleERKNS_7PtrListINS_9lduMatrix8smootherEEERNS_5FieldIdEERKS8_S9_S9_S9_S9_S9_RNS1_IS8_EESD_h
|
--24.07%--_ZNK4Foam11DICSmoother6smoothERNS_5FieldIdEERKS2_hi
25.64% 0.00% pimpleFoam [unknown] [.] 0x00007ffd70006465
|
---0x7ffd70006465
_ZN4Foam8fvMatrixIdE5solveERKNS_10dictionaryE
|
--25.64%--_ZN4Foam8fvMatrixIdE15solveSegregatedERKNS_10dictionaryE
|
--25.56%--_ZNK4Foam10GAMGSolver5solveERNS_5FieldIdEERKS2_h
|
--24.60%--_ZNK4Foam10GAMGSolver6VcycleERKNS_7PtrListINS_9lduMatrix8smootherEEERNS_5FieldIdEERKS8_S9_S9_S9_S9_S9_RNS1_IS8_EESD_h
|
--24.07%--_ZNK4Foam11DICSmoother6smoothERNS_5FieldIdEERKS2_hi
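One way to skim a listing like this is to keep only the top-level rows and drop the call chains under them. The sketch below assumes the column layout shown above (Children, Self, Command, Shared Object, type marker, Symbol) and mangled symbol names without spaces. `hot_entries` is a hypothetical helper name.

```shell
# List top-level rows of call-graph report text (stdin) whose Children
# percentage is at least the given threshold (default 10).
# Prints: Children, Self, Shared Object, Symbol.
# hot_entries is a hypothetical helper, not part of perf.
hot_entries() {
    awk -v min="${1:-10}" '
        # Top-level rows start with two percentage columns; call-chain rows do not.
        $1 ~ /^[0-9.]+%$/ && $2 ~ /^[0-9.]+%$/ {
            children = $1
            sub(/%/, "", children)
            if (children + 0 >= min + 0)
                print $1, $2, $4, $NF
        }'
}
```

For example, `perf report -g -G --stdio | hot_entries 20` would keep only entries with at least 20% in the Children column.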
This is hard to follow as text, so let's visualize it.
$ sudo apt install graphviz
$ git clone https://github.com/jrfonseca/gprof2dot.git
$ perf script | c++filt > perf.script
$ gprof2dot/gprof2dot.py perf.script -f perf -n1 -e1 -w | dot -Tsvg -o output.svg
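As a cross-check on the graph, the demangled perf.script file can also be queried directly. The sketch below counts how many samples have a given function anywhere in their call chain. It assumes the data was recorded with -g, so that each sample in the `perf script` output is a block of lines separated from the next by an empty line. It counts samples rather than event periods, so the figure only approximates what perf report shows. `frame_share` is a hypothetical helper name.

```shell
# Print the share of samples whose call chain contains the given text.
# Reads `perf script` output on stdin (one blank-line-separated block per sample).
# frame_share is a hypothetical helper, not part of perf.
frame_share() {
    awk -v needle="$1" '
        BEGIN { RS = "" }                  # paragraph mode: one record per sample
        { total++ }
        index($0, needle) > 0 { hits++ }   # fixed-string match, no regex escaping needed
        END {
            if (total > 0)
                printf "%.1f%% (%d of %d samples)\n", 100 * hits / total, hits, total
        }'
}
```

Usage would look like `frame_share 'Foam::lduMatrix::residual' < perf.script`.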
A lot of nodes come out, but the big ones appear to be:
- 26%: Foam::fvMatrix::solve
- 15%: Foam::lduMatrix::residual()
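So it looks like Foam::fvMatrix::solve is the place to start.
OpenFOAM is a large application, so getting cleaner, finer-grained information out of a profiler looks difficult.
Now that the target is known, from here on I'll read the source code and insert timers to investigate in more detail.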
[1] When I tried it later, it turned out to make no difference whether debug symbols were present or not.