https://rocm-documentation.readthedocs.io/en/latest/ROCm_Tools/ROCm-Tools.html#download
https://github.com/ROCm-Developer-Tools/rocprofiler
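The path setup is not spelled out anywhere and it briefly confused me, so here are my notes.
Environment: Ubuntu 16.04 + ROCm 2.1
(3/27) Added the section on running it.
(3/28) Removed some of the text.
・Addendum
As of November 2019, rocprofiler is included in the rocm-dev package, so you no longer need to build and install it yourself; it works as soon as the path is set.
Unless you have a special reason, there seems to be no need to go to the trouble of building it.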
Environment variables
export CMAKE_PREFIX_PATH=/opt/rocm/hsa/include/hsa:/opt/rocm/hsa/lib
export CMAKE_BUILD_TYPE=release
export CMAKE_DEBUG_TRACE=1 # to enable debug tracing
Put these in .bashrc (or similar) and run source .bashrc.
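Re-running the setup can leave duplicate export lines in .bashrc. The following is a minimal sketch of one way to avoid that, assuming a POSIX shell. `append_once` and `RC_FILE` are hypothetical names introduced here for illustration; they are not part of ROCm.

```shell
# append_once FILE LINE: append LINE to FILE only if that exact line is not already there.
# (hypothetical helper for illustration, not part of ROCm)
append_once() {
    if ! grep -qxF -- "$2" "$1" 2>/dev/null; then
        printf '%s\n' "$2" >> "$1"
    fi
}

# RC_FILE is an assumed variable name; it defaults to ~/.bashrc.
RC_FILE="${RC_FILE:-$HOME/.bashrc}"

append_once "$RC_FILE" 'export CMAKE_PREFIX_PATH=/opt/rocm/hsa/include/hsa:/opt/rocm/hsa/lib'
append_once "$RC_FILE" 'export CMAKE_BUILD_TYPE=release'
append_once "$RC_FILE" 'export CMAKE_DEBUG_TRACE=1 # to enable debug tracing'
```

Running the snippet a second time leaves the file unchanged, because each line is compared as a whole fixed string before it is appended.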
Download
$ git clone https://github.com/ROCm-Developer-Tools/rocprofiler
Build & install
$ cd ./rocprofiler/
$ mkdir build
$ cd ./build
$ cmake -DCMAKE_INSTALL_PREFIX=/opt/rocm ..
$ make
$ sudo make install
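The installation should probably be complete at this point.
If the build went properly, there is a run.sh in the build directory that you can use as a test run.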
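To make "probably complete" a bit more concrete, here is a rough sketch that checks for the two files the rpl_run.sh help output in this post reports (the script itself and metrics.xml). `check_rocprofiler_install` is a hypothetical helper written for this memo, and the check assumes the layout under /opt/rocm matches that output.

```shell
# check_rocprofiler_install [PREFIX]: report whether the expected files exist.
# (hypothetical helper; the relative paths come from the rpl_run.sh help output)
check_rocprofiler_install() {
    prefix="${1:-/opt/rocm}"
    status=0
    for f in rocprofiler/bin/rpl_run.sh rocprofiler/lib/metrics.xml; do
        if [ -e "$prefix/$f" ]; then
            echo "found:   $prefix/$f"
        else
            echo "missing: $prefix/$f"
            status=1
        fi
    done
    return "$status"
}

check_rocprofiler_install /opt/rocm || echo "rocprofiler does not look installed under /opt/rocm"
```

The function returns non-zero when either file is missing, so it can also be used as a guard in a larger setup script.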
$ ./run.sh
(output omitted)
Time taken for Setup by SimpleConvolution : 0.00185181
Time taken for Dispatch by SimpleConvolution : 0.00445312
Time taken in Total by SimpleConvolution : 0.0182852
ROCPRofiler: 4 contexts collected, output directory ./RESULT
On gfx906 (Vega20, Radeon VII) this test core-dumped for some reason and never ran to completion, so I ran it on gfx900 (Vega FE) instead.
I added the following line to .bashrc:
export PATH=/opt/rocm/rocprofiler/bin:${PATH}
Then, after running
source ~/.bashrc
the script can be called directly:
$ rpl_run.sh
RPL: on '190327_180120' from '/opt/rocm/rocprofiler' in '/home/misaka/rocprofiler/build'
ROCm Profiling Library (RPL) run script, a part of ROCprofiler library package.
Full path: /opt/rocm/rocprofiler/bin/rpl_run.sh
Metrics definition: /opt/rocm/rocprofiler/lib/metrics.xml
Usage:
rpl_run.sh [-h] [--list-basic] [--list-derived] [-i <input .txt/.xml file>] [-o <output CSV file>] <app command line>
Options:
-h - this help
--verbose - verbose mode, dumping all base counters used in the input metrics
--list-basic - to print the list of basic HW counters
--list-derived - to print the list of derived metrics with formulas
-i <.txt|.xml file> - input file
Input file .txt format, automatically rerun application for every pmc line:
# Perf counters group 1
pmc : Wavefronts VALUInsts SALUInsts SFetchInsts FlatVMemInsts LDSInsts FlatLDSInsts GDSInsts VALUUtilization FetchSize
# Perf counters group 2
pmc : WriteSize L2CacheHit
# Filter by dispatches range, GPU index and kernel names
# supported range formats: "3:9", "3:", "3"
range: 1 : 4
gpu: 0 1 2 3
kernel: simple Pass1 simpleConvolutionPass2
Input file .xml format, for single profiling run:
# Metrics list definition, also the form "<block-name>:<event-id>" can be used
# All defined metrics can be found in the 'metrics.xml'
# There are basic metrics for raw HW counters and high-level metrics for derived counters
<metric name=SQ:4,SQ_WAVES,VFetchInsts
></metric>
# Filter by dispatches range, GPU index and kernel names
<metric
# range formats: "3:9", "3:", "3"
range=""
# list of gpu indexes "0,1,2,3"
gpu_index=""
# list of matched sub-strings "Simple1,Conv1,SimpleConvolution"
kernel=""
></metric>
-o <output file> - output CSV file [<input file base>.csv]
-d <data directory> - directory where profiler store profiling data including thread treaces [/tmp]
The data directory is renoving autonatically if the directory is matching the temporary one, which is the default.
-t <temporary directory> - to change the temporary directory [/tmp]
By changing the temporary directory you can prevent removing the profiling data from /tmp or enable removing from not '/tmp' directory.
--basenames <on|off> - to turn on/off truncating of the kernel full function names till the base ones [off]
--timestamp <on|off> - to turn on/off the kernel disoatches timestamps, dispatch/begin/end/complete [off]
--ctx-wait <on|off> - to wait for outstanding contexts on profiler exit [on]
--ctx-limit <max number> - maximum number of outstanding contexts [0 - unlimited]
--heartbeat <rate sec> - to print progress heartbeats [0 - disabled]
--stats - generating kernel execution stats, file <output name>.stats.csv
--hsa-trace - to trace HSA, generates API execution stats and JSON file chrome-tracing compatible
Generated files: <output name>.hsa_stats.txt <output name>.json
Traced API list can be set by input .txt or .xml files.
Input .txt:
hsa: hsa_queue_create hsa_amd_memory_pool_allocate
Input .xml:
<trace name="HSA">
<parameters list="hsa_queue_create, hsa_amd_memory_pool_allocate">
</parameters>
</trace>
Configuration file:
You can set your parameters defaults preferences in the configuration file 'rpl_rc.xml'. The search path sequence: .:/home/misaka:<package path>
First the configuration file is looking in the current directory, then in your home, and then in the package directory.
Configurable options: 'basenames', 'timestamp', 'ctx-limit', 'heartbeat'.
An example of 'rpl_rc.xml':
<defaults
basenames=off
timestamp=off
ctx-limit=0
heartbeat=0
></defaults>
So rpl_run.sh can now be run, at least.
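Going by the usage text above, a profiling run takes an input file of counters (-i) and an output CSV name (-o), followed by the application command line. The following is an untested sketch based only on that help text. `./my_app` and the `APP` variable are hypothetical placeholders, and the counter names are copied from the help example.

```shell
# Write a small counter list in the .txt input format shown in the help text.
# According to the help, the application is rerun once per pmc line.
cat > input.txt <<'EOF'
# Perf counters group 1
pmc : Wavefronts VALUInsts SALUInsts
# Perf counters group 2
pmc : WriteSize L2CacheHit
EOF

# ./my_app is a hypothetical placeholder for the application to profile.
APP="${APP:-./my_app}"

# Only attempt the run when both the profiler script and the application exist.
if command -v rpl_run.sh >/dev/null 2>&1 && [ -x "$APP" ]; then
    rpl_run.sh -i input.txt -o output.csv "$APP"
else
    echo "rpl_run.sh or $APP not found; skipping the profiling run" >&2
fi
```

Which counters are actually available depends on the GPU; per the help text, --list-basic and --list-derived print them.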
There may well be a smarter way to put ROC-profiler on the path, but this is what I went with for now.
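One slightly more defensive variant, sketched here as a suggestion only: add the directory to PATH only when it exists and is not already listed, so that sourcing .bashrc repeatedly does not keep growing PATH. `path_prepend` is a hypothetical helper name, not something ROCm provides.

```shell
# path_prepend DIR: put DIR at the front of PATH if it exists and is not already listed.
# (hypothetical helper for illustration)
path_prepend() {
    dir="$1"
    # Skip silently when the directory is absent (e.g. ROCm not installed on this machine).
    [ -d "$dir" ] || return 0
    case ":$PATH:" in
        *":$dir:"*) ;;               # already on PATH, nothing to do
        *) PATH="$dir:$PATH" ;;
    esac
    export PATH
}

path_prepend /opt/rocm/rocprofiler/bin
```

Placed in .bashrc, this behaves like the plain export when the directory exists, and does nothing on machines without it.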