ROC Profiler: installation steps through first run

Posted at 2019-03-23

https://rocm-documentation.readthedocs.io/en/latest/ROCm_Tools/ROCm-Tools.html#download
https://github.com/ROCm-Developer-Tools/rocprofiler
The docs do not spell out the PATH setup, which threw me off for a moment, so here is a memo.

Environment: Ubuntu 16.04 + ROCm 2.1
(3/27) Added the section on running the profiler
(3/28) Removed part of the text

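・Addendum
As of November 2019, rocprofiler is included in the rocm-dev package, so there is no need to build and install it yourself; once the path is set, it can be used right away.
Unless you have special circumstances, building it yourself is probably unnecessary.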
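
A rough way to decide whether the build steps are needed at all is to check whether rocm-dev is already installed. This is only a sketch: it assumes a Debian/Ubuntu system where `dpkg` is available, and it does not confirm that the installed rocm-dev version actually ships the profiler.

```shell
# Rough check: is the rocm-dev package installed?
# Assumption: a dpkg-based system such as Ubuntu. On other systems
# the check simply reports "not installed".
if command -v dpkg >/dev/null 2>&1 && dpkg -s rocm-dev >/dev/null 2>&1; then
    rocm_dev="installed"
else
    rocm_dev="not installed"
fi
echo "rocm-dev: ${rocm_dev}"
```

If it reports "installed", setting PATH as shown later in this memo may be all that is required.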

Environment variables

export CMAKE_PREFIX_PATH=/opt/rocm/hsa/include/hsa:/opt/rocm/hsa/lib
export CMAKE_BUILD_TYPE=release
export CMAKE_DEBUG_TRACE=1 # to enable debug tracing

Put these in .bashrc or a similar file, then run source .bashrc.
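
If the lines are appended by a script rather than by hand, rerunning it can pile up duplicates in .bashrc. A minimal sketch of one way to avoid that; `append_once` is a hypothetical helper name, not part of ROCm:

```shell
# Hypothetical helper: append a line to a file only if that exact
# line is not already present.
append_once() {
    file="$1"   # file to edit
    line="$2"   # exact line to add
    touch "$file"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example usage (uncomment to modify your own ~/.bashrc):
# append_once "$HOME/.bashrc" 'export CMAKE_PREFIX_PATH=/opt/rocm/hsa/include/hsa:/opt/rocm/hsa/lib'
# append_once "$HOME/.bashrc" 'export CMAKE_BUILD_TYPE=release'
```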

Download

$ git clone https://github.com/ROCm-Developer-Tools/rocprofiler

Build & install

$ cd ./rocprofiler/
$ mkdir build
$ cd ./build
$ cmake -DCMAKE_INSTALL_PREFIX=/opt/rocm ..
$ make
$ sudo make install

With this, the installation should probably be complete.
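
To be a little more certain than "probably", one option is to check that the files the profiler reports in its own help output exist. The two paths below are taken from the rpl_run.sh output shown later in this memo; `ROCPROF_ROOT` is a variable name introduced here only for illustration.

```shell
# Check for the files that rpl_run.sh reports in its help text.
# Assumption: the install prefix was /opt/rocm, as in the cmake line above.
ROCPROF_ROOT="${ROCPROF_ROOT:-/opt/rocm/rocprofiler}"
missing=0
for f in bin/rpl_run.sh lib/metrics.xml; do
    if [ -e "${ROCPROF_ROOT}/${f}" ]; then
        echo "ok:      ${ROCPROF_ROOT}/${f}"
    else
        echo "missing: ${ROCPROF_ROOT}/${f}"
        missing=$((missing + 1))
    fi
done
echo "${missing} file(s) missing"
```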

If the build went through properly, there will be a run.sh that you can use to run a test.

$ ./run.sh
(output omitted)
Time taken for Setup by SimpleConvolution : 0.00185181
Time taken for Dispatch by SimpleConvolution : 0.00445312
Time taken in Total by SimpleConvolution : 0.0182852

ROCPRofiler: 4 contexts collected, output directory ./RESULT

On gfx906 (Vega20, Radeon VII) this test crashed with a core dump for some reason and could not finish, so I ran it on gfx900 (Vega FE) instead.
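
When a machine has more than one GPU, it helps to confirm which gfx targets are visible before running the test. A small sketch, assuming `rocminfo` (shipped with ROCm) is on PATH; on a default install it may live under /opt/rocm/bin instead:

```shell
# List the distinct gfx targets reported by rocminfo, if it is available.
# Assumption: rocminfo is reachable via PATH.
if command -v rocminfo >/dev/null 2>&1; then
    gfx_targets="$(rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u)"
else
    gfx_targets=""
    echo "rocminfo not found on PATH"
fi
echo "Detected targets: ${gfx_targets:-none}"
```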
I added the following line to .bashrc:

export PATH=/opt/rocm/rocprofiler/bin:${PATH}

After running source ~/.bashrc, rpl_run.sh can be invoked from the shell:

$ rpl_run.sh 
RPL: on '190327_180120' from '/opt/rocm/rocprofiler' in '/home/misaka/rocprofiler/build'
ROCm Profiling Library (RPL) run script, a part of ROCprofiler library package.
Full path: /opt/rocm/rocprofiler/bin/rpl_run.sh
Metrics definition: /opt/rocm/rocprofiler/lib/metrics.xml

Usage:
  rpl_run.sh [-h] [--list-basic] [--list-derived] [-i <input .txt/.xml file>] [-o <output CSV file>] <app command line>

Options:
  -h - this help
  --verbose - verbose mode, dumping all base counters used in the input metrics
  --list-basic - to print the list of basic HW counters
  --list-derived - to print the list of derived metrics with formulas

  -i <.txt|.xml file> - input file
      Input file .txt format, automatically rerun application for every pmc line:

        # Perf counters group 1
        pmc : Wavefronts VALUInsts SALUInsts SFetchInsts FlatVMemInsts LDSInsts FlatLDSInsts GDSInsts VALUUtilization FetchSize
        # Perf counters group 2
        pmc : WriteSize L2CacheHit
        # Filter by dispatches range, GPU index and kernel names
        # supported range formats: "3:9", "3:", "3"
        range: 1 : 4
        gpu: 0 1 2 3
        kernel: simple Pass1 simpleConvolutionPass2

      Input file .xml format, for single profiling run:

        # Metrics list definition, also the form "<block-name>:<event-id>" can be used
        # All defined metrics can be found in the 'metrics.xml'
        # There are basic metrics for raw HW counters and high-level metrics for derived counters
        <metric name=SQ:4,SQ_WAVES,VFetchInsts
        ></metric>

        # Filter by dispatches range, GPU index and kernel names
        <metric
          # range formats: "3:9", "3:", "3"
          range=""
          # list of gpu indexes "0,1,2,3"
          gpu_index=""
          # list of matched sub-strings "Simple1,Conv1,SimpleConvolution"
          kernel=""
        ></metric>

  -o <output file> - output CSV file [<input file base>.csv]
  -d <data directory> - directory where profiler store profiling data including thread treaces [/tmp]
      The data directory is renoving autonatically if the directory is matching the temporary one, which is the default.
  -t <temporary directory> - to change the temporary directory [/tmp]
      By changing the temporary directory you can prevent removing the profiling data from /tmp or enable removing from not '/tmp' directory.

  --basenames <on|off> - to turn on/off truncating of the kernel full function names till the base ones [off]
  --timestamp <on|off> - to turn on/off the kernel disoatches timestamps, dispatch/begin/end/complete [off]
  --ctx-wait <on|off> - to wait for outstanding contexts on profiler exit [on]
  --ctx-limit <max number> - maximum number of outstanding contexts [0 - unlimited]
  --heartbeat <rate sec> - to print progress heartbeats [0 - disabled]

  --stats - generating kernel execution stats, file <output name>.stats.csv
  --hsa-trace - to trace HSA, generates API execution stats and JSON file chrome-tracing compatible
    Generated files: <output name>.hsa_stats.txt <output name>.json
    Traced API list can be set by input .txt or .xml files.
    Input .txt:
      hsa: hsa_queue_create hsa_amd_memory_pool_allocate
    Input .xml:
      <trace name="HSA">
        <parameters list="hsa_queue_create, hsa_amd_memory_pool_allocate">
        </parameters>
      </trace>

Configuration file:
  You can set your parameters defaults preferences in the configuration file 'rpl_rc.xml'. The search path sequence: .:/home/misaka:<package path>
  First the configuration file is looking in the current directory, then in your home, and then in the package directory.
  Configurable options: 'basenames', 'timestamp', 'ctx-limit', 'heartbeat'.
  An example of 'rpl_rc.xml':
    <defaults
      basenames=off
      timestamp=off
      ctx-limit=0
      heartbeat=0
    ></defaults>

With that, the profiler can be run, at least for now.
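
As a concrete follow-up to the help text above, here is a sketch of a first profiling run using the .txt input format it describes. The counter names are copied from the help output; `./my_app` is a hypothetical application path, and the run is skipped when either the profiler or the application is not available.

```shell
# Write an input file in the .txt format described by the help text.
# Each "pmc" line causes the application to be rerun once.
cat > input.txt <<'EOF'
# Perf counters group 1
pmc : Wavefronts VALUInsts SALUInsts
# Perf counters group 2
pmc : WriteSize L2CacheHit
# Profile on GPU 0 only
gpu: 0
EOF

# Hypothetical application to profile; replace with your own binary.
APP="${APP:-./my_app}"

if command -v rpl_run.sh >/dev/null 2>&1 && [ -x "$APP" ]; then
    # -i: input file, -o: output CSV file, then the app command line
    rpl_run.sh -i input.txt -o output.csv "$APP"
else
    echo "rpl_run.sh or ${APP} not available; skipping the profiling run"
fi
```

Running `rpl_run.sh --list-basic` and `rpl_run.sh --list-derived` first shows which counter and metric names can go on the `pmc` lines.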

There may be a smarter way to put ROC-profiler on the PATH, but this is what I went with for now.
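
One possible refinement, offered only as a sketch: guard the PATH change so that sourcing .bashrc repeatedly does not keep prepending the same directory. `path_prepend` is a hypothetical helper name. On Ubuntu, placing such a snippet in a file under /etc/profile.d is a common way to apply it to login shells for all users, though I have not tried that with ROCm.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is
# not already listed there.
path_prepend() {
    case ":${PATH}:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:${PATH}" ;;
    esac
}

# Only touch PATH when the profiler directory actually exists.
if [ -d /opt/rocm/rocprofiler/bin ]; then
    path_prepend /opt/rocm/rocprofiler/bin
fi
export PATH
```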
