Installing Gromacs and Benchmarking Its Performance on Docker and CentOS 7

Gromacs build log

Official Gromacs website
http://www.gromacs.org/

Created

December 9, 2016

Last updated

December 18, 2016

Environment

$ uname -a
Linux HOSTNAME 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/*release|uniq
CentOS Linux release 7.2.1511 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.2.1511 (Core)

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    1
Core(s) per socket:    6
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 45
Model name:            Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Stepping:              7
CPU MHz:               1199.953
BogoMIPS:              4605.16
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-5
NUMA node1 CPU(s):     6-11

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          31983        1295       17945          29       12742       30133
Swap:         16127           0       16127

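1. Purpose

1.1 Goals

  1. Install Gromacs 2016.1 on CentOS 7.2.
  2. Measure how performance varies with the compiler and the optimization options.
  3. Measure how performance changes when running under Docker.

1.2 Procedure

  1. Install the packages required for the build
  2. Build Gromacs
  3. Verify that Gromacs works
  4. Test run
  5. Compare performance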

2. Work log

Official Gromacs 2016.1 installation guide

2.1 Installing the packages required for the build

  • Update the system packages with yum
$ su
% yum update -y
...

2.1.1 cmake

GROMACS builds with the CMake build system, requiring at least version 2.8.8.

  • Check which cmake version is available through yum
% yum list |grep cmake
cmake.x86_64                               2.8.11-5.el7                base
cmake-gui.x86_64                           2.8.11-5.el7                base
  • A version of at least 2.8.8 is available, so install it
% yum install cmake -y
...
% cmake --version
cmake version 2.8.11
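
The version check above was done by eye. As a minimal sketch, it could also be scripted. Here version_ge is a hypothetical helper name, and the comparison relies on GNU sort -V, which I assume is available (it is part of coreutils on CentOS 7). A plain string comparison would get 2.8.11 versus 2.8.8 wrong, which is why a version-aware sort is used.

```shell
# version_ge HAVE NEED: succeed if version HAVE is at least NEED.
# Hypothetical helper; relies on GNU sort -V for version-aware ordering.
version_ge() {
  # Sort the two versions in ascending order; if NEED comes first
  # (or both are equal), then HAVE >= NEED.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Versions taken from the log above: cmake 2.8.11 against the 2.8.8 minimum.
if version_ge 2.8.11 2.8.8; then
  echo "cmake is new enough"
fi
```

The same helper would work for the gcc check in the next section, for example version_ge 4.8.5 4.6.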

2.1.2 gcc

We recommend gcc, because it is free, widely available and frequently provides the best performance.

Minimum supported compiler versions are

  • GNU (gcc) 4.6
  • Intel (icc) 14
  • LLVM (clang) 3.4
  • Microsoft (MSVC) 2015

Other compilers may work (Cray, Pathscale, older clang) but do not offer competitive performance.

We recommend against PGI because the performance with C++ is very bad.

The Intel and GNU compilers produce much faster GROMACS executables than the PGI and Cray compilers.

  • gcc was already installed, so confirm that its version is at least 4.6
% gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

2.2 Building Gromacs 2016.1

2.2.1 Build

% exit
$ mkdir ~/src
$ cd ~/src
$ wget http://ftp.gromacs.org/pub/gromacs/gromacs-2016.1.tar.gz
$ tar -zxvf gromacs-2016.1.tar.gz
$ cd gromacs-2016.1
$ mkdir build_serial
$ cd build_serial
$ cmake .. \
 -DCMAKE_INSTALL_PREFIX=/opt/sw/gromacs-2016.1-serial \
 -DGMX_SIMD=None  \
 -DGMX_BUILD_OWN_FFTW=ON
$ make
...
$ su
% make install
% exit

Generally, there is no advantage in using MKL with GROMACS, and FFTW is often faster.
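- By default, cmake appears to configure the gcc optimization options for AVX512. The workstation's CPU (Sandy Bridge) does not support AVX_512, so I disabled SIMD by passing -DGMX_SIMD=None to cmake. (The CPU does appear to support AVX_256, but with -DGMX_SIMD=AVX_256 the build fails during make. Is gcc too old? Section 2.3.4 traces the failure to the bundled FFTW build.)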

2.2.2 Tests

2.2.2.1 Sanity check
 $ /opt/sw/gromacs-2016.1-serial/bin/gmx --version
                         :-) GROMACS - gmx, 2016.1 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov  Herman J.C. Berendsen    Par Bjelkmar
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra    Gerrit Groenhof
 Christoph Junghans   Anca Hamuraru    Vincent Hindriksen Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson
  Justin A. Lemkul   Magnus Lundborg   Pieter Meulenhoff    Erik Marklund
   Teemu Murtola       Szilard Pall       Sander Pronk      Roland Schulz
  Alexey Shvetsov     Michael Shirts     Alfons Sijbers     Peter Tieleman
  Teemu Virolainen  Christian Wennberg    Maarten Wolf
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx, version 2016.1
Executable:   /opt/sw/gromacs-2016.1-serial/bin/gmx
Data prefix:  /opt/sw/gromacs-2016.1-serial
Working dir:  /home/hnishi/test_gromacs/serial
Command line:
  gmx --version

GROMACS version:    2016.1
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:        disabled
SIMD instructions:  NONE
FFT library:        fftw-3.3.5
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
Built on:           Fri Dec  9 18:35:06 JST 2016
Built by:           hnishi@HOSTNAME [CMAKE]
Build OS/arch:      Linux 3.10.0-327.el7.x86_64 x86_64
Build CPU vendor:   Intel
Build CPU brand:    Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Build CPU family:   6   Model: 45   Stepping: 7
Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/cc GNU 4.8.5
C compiler flags:        -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler:       /usr/bin/c++ GNU 4.8.5
C++ compiler flags:     -std=c++0x   -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast

* To put gmx on your PATH under bash, run $ source /opt/sw/gromacs-2016.1-serial/bin/GMXRC.bash. GMXRC.csh and GMXRC.zsh are provided as well.

2.2.2.2 Test run
$ cd your/work/directory
$ git clone https://github.com/hnishi/testdata_gromacs.git
$ cd testdata_gromacs
$ /opt/sw/gromacs-2016.1-serial/bin/gmx grompp -f MD1.mdp -c ./replace_po_mg.gro -p ./topol.top -o md1.tpr > grompp.log 2>&1
$ /opt/sw/gromacs-2016.1-serial/bin/gmx mdrun -deffnm md1 -ntomp 8 > md1.log2 2>&1
$ tail -n 6 md1.log
               Core t (s)   Wall t (s)        (%)
       Time:    65247.241     8155.905      800.0
                         2h15:55
                 (ns/day)    (hour/ns)
Performance:        0.212      113.265
Finished mdrun on rank 0 Sat Dec 10 13:44:37 2016
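  • grompp prepares the tpr file (a binary file that holds the run parameters)
  • mdrun runs the molecular dynamics simulation (the -ntomp option sets the number of OpenMP threads)
  • When the run finishes, the timing summary is written at the end of md1.log.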

* Test data: 384886 atoms, 10000 steps, 20 ps
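
As a rough cross-check of the figure mdrun reports, ns/day can be recomputed from the simulated time and the wall time. This is only a sketch: ns_per_day is a hypothetical helper name, and the inputs are the 20 ps of test data and the 8155.905 s wall time from the run above.

```shell
# ns_per_day SIM_PS WALL_S
# Hypothetical helper: converts simulated picoseconds and wall-clock seconds
# into ns/day (1 ns = 1000 ps, 1 day = 86400 s).
ns_per_day() {
  awk -v ps="$1" -v wall="$2" \
    'BEGIN { printf "%.3f\n", (ps / 1000) * 86400 / wall }'
}

# 20 ps simulated in 8155.905 s of wall time (the 8-thread OpenMP run above)
ns_per_day 20 8155.905
```

This prints 0.212, which agrees with the Performance line in md1.log.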

2.3 Performance comparison

2.3.1 one core

  • Specify the number of OpenMP threads (-ntomp) and thread-MPI ranks (-ntmpi)
$ /opt/sw/gromacs-2016.1-serial/bin/gmx mdrun -deffnm md1 -ntomp 1 -ntmpi 1 > md1.log2 2>&1

2.3.2 thread-mpi

  • 1 OpenMP thread, 8 thread-MPI ranks
$ /opt/sw/gromacs-2016.1-serial/bin/gmx mdrun -deffnm md1 -ntomp 1 -ntmpi 8 > md1.log2 2>&1
$ tail -n 6 md1.log
               Core t (s)   Wall t (s)        (%)
       Time:    65889.685     8236.211      800.0
                         2h17:16
                 (ns/day)    (hour/ns)
Performance:        0.210      114.380
Finished mdrun on rank 0 Sat Dec 10 21:59:08 2016

Performance was almost the same as in the OpenMP case.
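
To compare several runs side by side, the ns/day value can be pulled out of each log. This is a minimal sketch: perf_of is a hypothetical helper, and the log names in the loop are assumed per-run copies of md1.log, not files produced by the commands above.

```shell
# perf_of LOGFILE: print the ns/day value from the "Performance:" line
# that mdrun writes at the end of its log (hypothetical helper).
perf_of() {
  awk '$1 == "Performance:" { print $2 }' "$1"
}

# Assumed file names: one saved copy of md1.log per run.
for log in openmp8.log tmpi8.log; do
  if [ -f "$log" ]; then
    printf '%s\t%s ns/day\n' "$log" "$(perf_of "$log")"
  fi
done
```

Keeping one log per configuration avoids overwriting md1.log when the same -deffnm is reused in one directory.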

2.3.3 SSE4.1 optimization

http://www.gromacs.org/Documentation/Installation_Instructions_5.0#simd-support

SSE4.1 Present in all Intel core processors since 2007, but notably not in AMD magny-cours. Still, almost all recent processors support this, so this can also be considered a good baseline if you are content with portability between reasonably modern processors.

  • Build with the cmake flags changed as follows
$ cd /home/hnishi/src/gromacs-2016.1/
$ mkdir build_serial_sse4.1
$ cd build_serial_sse4.1
$ cmake .. \
 -DCMAKE_INSTALL_PREFIX=/opt/sw/gromacs-2016.1-serial-SSE4.1 \
 -DGMX_SIMD=SSE4.1  \
 -DGMX_BUILD_OWN_FFTW=ON
$ make
$ su
% make install
% exit
  • Run the test in the same way
 $ cd your/work/directory2
 $ cp your/work/directory/md1.tpr .
 $ /opt/sw/gromacs-2016.1-serial-SSE4.1/bin/gmx mdrun -deffnm md1 -ntomp 8 > md1.log2 2>&1
 $ tail -n 6 md1.log

               Core t (s)   Wall t (s)        (%)
       Time:    12646.491     1580.811      800.0
                 (ns/day)    (hour/ns)
Performance:        1.093       21.954
Finished mdrun on rank 0 Sat Dec 10 16:27:15 2016

The SSE4.1 build turned out to be about 5 times faster than the build without SIMD.
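
The factor comes from the ratio of the two wall times. A small sketch of the arithmetic, where speedup is a hypothetical helper name and the inputs are the 8-thread wall times from the None and SSE4.1 runs above:

```shell
# speedup BASE_WALL NEW_WALL: how many times faster the new run is.
# Hypothetical helper; both arguments are wall times in seconds.
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

# GMX_SIMD=None (8155.905 s) versus GMX_SIMD=SSE4.1 (1580.811 s), 8 OpenMP threads
speedup 8155.905 1580.811
```

This prints 5.16.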

2.3.4 AVX_256 optimization

http://www.gromacs.org/Documentation/Installation_Instructions_5.0#simd-support

  1. None For use only on an architecture either lacking SIMD, or to which GROMACS has not yet been ported and none of the options below are applicable.
  2. SSE2 This SIMD instruction set was introduced in Intel processors in 2001, and AMD in 2003. Essentially all x86 machines in existence have this, so it might be a good choice if you need to support dinosaur x86 computers too.
  3. SSE4.1 Present in all Intel core processors since 2007, but notably not in AMD magny-cours. Still, almost all recent processors support this, so this can also be considered a good baseline if you are content with portability between reasonably modern processors.
  4. AVX_128_FMA AMD bulldozer processors (2011) have this. Unfortunately Intel and AMD have diverged the last few years; If you want good performance on modern AMD processors you have to use this since it also allows the rest of the code to use AMD 4-way fused multiply-add instructions. The drawback is that your code will not run on Intel processors at all.
  5. AVX_256 This instruction set is present on Intel processors since Sandy Bridge (2011), where it is the best choice unless you have an even more recent CPU that supports AVX2. While this code will work on recent AMD processors, it is significantly less efficient than the AVX_128_FMA choice above - do not be fooled to assume that 256 is better than 128 in this case.
  6. AVX2_256 Present on Intel Haswell processors released in 2013, and it will also enable Intel 3-way fused multiply-add instructions. This code will not work on AMD CPUs.
  7. IBM_QPX BlueGene/Q A2 cores have this.
  8. Sparc64_HPC_ACE Fujitsu machines like the K computer have this.

For the Sandy Bridge-generation Intel(R) Xeon(R) CPU E5-2630, AVX_256 appears to be the most recent option that applies.
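
One way to sanity-check that choice is to look at the CPU flags the kernel reports. The sketch below is an assumption-laden illustration, not how GROMACS itself detects SIMD: pick_simd is a hypothetical helper, it only covers the Intel-oriented options from the list above (so it would not suggest AVX_128_FMA on AMD), and it assumes the Linux /proc/cpuinfo flag names sse2, sse4_1, avx and avx2.

```shell
# pick_simd "FLAGS": map a space-separated CPU flag list (as found in
# /proc/cpuinfo) to one of the GMX_SIMD values quoted above.
# Hypothetical helper; Intel-oriented options only.
pick_simd() {
  case " $1 " in
    *" avx2 "*)   echo AVX2_256 ;;
    *" avx "*)    echo AVX_256 ;;
    *" sse4_1 "*) echo SSE4.1 ;;
    *" sse2 "*)   echo SSE2 ;;
    *)            echo None ;;
  esac
}

# On Linux, read the flags of the first CPU listed in /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
  flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
  pick_simd "$flags"
fi
```

On a CPU that reports avx but not avx2, as the E5-2630 does in the Build CPU features line earlier, this would suggest AVX_256.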

  • Building after running cmake with -DGMX_SIMD=AVX_256 fails during the FFTW build
checking whether C compiler accepts -mavx512f... no
configure: error: Need a version of gcc with -mavx512f
make[2]: *** [src/contrib/fftw/fftwBuild-prefix/src/fftwBuild-stamp/fftwBuild-configure] Error 1
make[1]: *** [src/contrib/fftw/CMakeFiles/fftwBuild.dir/all] Error 2
make: *** [all] Error 2
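  • FFTW is downloaded and built automatically, and AVX_512 appears to be enabled in the options for that build (Sandy Bridge does not support AVX_512, and the configure error shows that this gcc does not accept -mavx512f)
  • Delete the --enable-avx512 entry from line 109 of the build.make file below, which cmake generates, and then run make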
$ sed -n -e 109p /home/hnishi/src/gromacs-2016.1/build_avx256/src/contrib/fftw/CMakeFiles/fftwBuild.dir/build.make
    cd /home/hnishi/src/gromacs-2016.1-naoi/zzz/src/contrib/fftw/fftwBuild-prefix/src/fftwBuild-build && /home/hnishi/src/gromacs-2016.1-naoi/zzz/src/contrib/fftw/fftwBuild-prefix/src/fftwBuild/configure --prefix=/home/hnishi/src/gromacs-2016.1-naoi/zzz/src/contrib/fftw/fftwBuild-prefix --libdir=/home/hnishi/src/gromacs-2016.1-naoi/zzz/src/contrib/fftw/fftwBuild-prefix/lib --disable-fortran --disable-shared --enable-static --with-pic --enable-sse2 --enable-avx --enable-avx2 --enable-avx512 --enable-float

Alternatively, before running cmake, delete ;--enable-avx512 from line 76 of CMakeLists.txt as shown below, then run cmake and build

$ diff /home/hnishi/src/gromacs-2016.1/src/contrib/fftw/CMakeLists.txt_org /home/hnishi/src/gromacs-2016.1/src/contrib/fftw/CMakeLists.txt
76c76,77
<     set(_fftw_simd_support_level --enable-sse2;--enable-avx;--enable-avx2;--enable-avx512)
---
>     #set(_fftw_simd_support_level --enable-sse2;--enable-avx;--enable-avx2;--enable-avx512)
>     set(_fftw_simd_support_level --enable-sse2;--enable-avx;--enable-avx2)
$ cd /home/hnishi/src/gromacs-2016.1/build_avx256
$ cmake .. \
 -DCMAKE_INSTALL_PREFIX=/opt/sw/gromacs-2016.1-AVX256 \
 -DGMX_SIMD=AVX_256  \
 -DGMX_BUILD_OWN_FFTW=ON
$ make
$ su
% make install
% exit
$ /opt/sw/gromacs-2016.1-AVX256/bin/gmx -version |grep SIMD
SIMD instructions:  AVX_256
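
The CMakeLists.txt change shown in the diff above was made by hand. A minimal sketch of doing the same with sed follows. strip_avx512 is a hypothetical helper, and unlike the diff it edits the line in place instead of commenting out the old one. The -i_org suffix keeps a backup named CMakeLists.txt_org, matching the backup name used above; note that rerunning it would overwrite that backup.

```shell
# strip_avx512 FILE: remove the ";--enable-avx512" entry in place,
# keeping the original as FILE_org (hypothetical helper).
strip_avx512() {
  sed -i_org 's/;--enable-avx512//' "$1"
}

# Path taken from the diff above; skipped if the file is not there.
f=/home/hnishi/src/gromacs-2016.1/src/contrib/fftw/CMakeLists.txt
if [ -f "$f" ]; then
  strip_avx512 "$f"
fi
```

After the edit, cmake has to be rerun so that the generated build.make picks up the new FFTW configure options.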

FFTW-3.3.4

  • To use FFTW-3.3.4, rewrite CMakeLists.txt as follows and then run cmake
$ diff /home/hnishi/src/gromacs-2016.1/src/contrib/fftw/CMakeLists.txt_org /home/hnishi/src/gromacs-2016.1/src/contrib/fftw/CMakeLists.txt_fftw-3.3.4
88c88,89
< set(EXTERNAL_FFTW_VERSION 3.3.5)
---
> #set(EXTERNAL_FFTW_VERSION 3.3.5)
> set(EXTERNAL_FFTW_VERSION 3.3.4)
93c94,95
< set(GMX_BUILD_OWN_FFTW_MD5 6cc08a3b9c7ee06fdd5b9eb02e06f569 CACHE STRING
---
> #set(GMX_BUILD_OWN_FFTW_MD5 6cc08a3b9c7ee06fdd5b9eb02e06f569 CACHE STRING
> set(GMX_BUILD_OWN_FFTW_MD5 2edab8c06b24feeb3b82bbb3ebf3e7b3 CACHE STRING

2.3.5 Open MPI

$ su
% yum -y install openmpi openmpi-devel
% exit
$ cd /home/hnishi/src/gromacs-2016.1/build_openmpi
$ cmake .. \
 -DCMAKE_INSTALL_PREFIX=/opt/sw/gromacs-2016.1-openmpi \
 -DGMX_SIMD=SSE4.1  \
 -DGMX_OPENMP=ON  \
 -DGMX_MPI=ON  \
 -DGMX_GPU=OFF \
 -DCMAKE_C_COMPILER=/usr/lib64/openmpi/bin/mpicc \
 -DCMAKE_CXX_COMPILER=/usr/lib64/openmpi/bin/mpicxx \
 -DGMX_BUILD_OWN_FFTW=ON
$ make
$ su
% make install
% exit
$ /opt/sw/gromacs-2016.1-openmpi/bin/gmx_mpi --version |grep MPI
MPI library:        MPI
$ cd /your/work/directory3
$ cp /your/work/directory2/md1.tpr .
$ /usr/lib64/openmpi/bin/mpirun --version
mpirun (Open MPI) 1.10.0

Report bugs to http://www.open-mpi.org/community/help/
$ /usr/lib64/openmpi/bin/mpirun -np 8 /opt/sw/gromacs-2016.1-openmpi/bin/gmx_mpi mdrun -deffnm md1  > md1.log2 2>&1
$ tail -n 6 md1.log

               Core t (s)   Wall t (s)        (%)
       Time:    12121.662     1515.208      800.0
                 (ns/day)    (hour/ns)
Performance:        1.141       21.042
Finished mdrun on rank 0 Sun Dec 11 20:16:50 2016

2.3.6 Docker

2.3.6.1 Installing Docker
References

Official documentation

Prerequisites
  • a 64-bit OS and version 3.10 or higher of the Linux kernel
$ uname -r
3.10.0-327.el7.x86_64
  • Add the Docker Engine repository to yum
$ su
% tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
  • Install Docker Engine, enable docker.service, start the Docker daemon, and test it (hello world)
% yum install docker-engine -y
% systemctl enable docker.service
% systemctl start docker
% docker run --rm hello-world
% docker version
Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:
 OS/Arch:      linux/amd64

2.3.6.2 Pulling and running the Gromacs 4.6.6 MPI image
  • Get the Gromacs image from Docker Hub (docker run pulls it automatically if it is not available locally)
% docker run -it hnishi/ubuntu14.04_gromacs-4.6.6:mpi_v1.0

root% cd /usr/local/src/gromacs-4.6.6/build_mpi
root% ./gromacs-4.6.6-mpi/bin/mdrun_mpi -version
                         :-)  G  R  O  M  A  C  S  (-:

                  Gromacs Runs On Most of All Computer Systems

                            :-)  VERSION 4.6.6  (-:

        Contributions from Mark Abraham, Emile Apol, Rossen Apostolov,
           Herman J.C. Berendsen, Aldert van Buuren, Pär Bjelkmar,
     Rudi van Drunen, Anton Feenstra, Gerrit Groenhof, Christoph Junghans,
        Peter Kasson, Carsten Kutzner, Per Larsson, Pieter Meulenhoff,
           Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz,
                Michael Shirts, Alfons Sijbers, Peter Tieleman,

               Berk Hess, David van der Spoel, and Erik Lindahl.

       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
         Copyright (c) 2001-2012,2013, The GROMACS development team at
        Uppsala University & The Royal Institute of Technology, Sweden.
            check out http://www.gromacs.org for more information.

         This program is free software; you can redistribute it and/or
       modify it under the terms of the GNU Lesser General Public License
        as published by the Free Software Foundation; either version 2.1
             of the License, or (at your option) any later version.

                  :-)  ../gromacs-4.6.6-mpi/bin/mdrun_mpi  (-:

Program: ../gromacs-4.6.6-mpi/bin/mdrun_mpi
Gromacs version:    VERSION 4.6.6
Precision:          single
Memory model:       64 bit
MPI library:        MPI
OpenMP support:     enabled
GPU support:        disabled
invsqrt routine:    gmx_software_invsqrt(x)
CPU acceleration:   AVX_256
FFT library:        fftw-3.3.2-sse2
Large file support: enabled
RDTSCP usage:       enabled
Built on:           Fri Dec  2 08:16:34 UTC 2016
Built by:           root@eed842e89771 [CMAKE]
Build OS/arch:      Linux 4.4.27-boot2docker x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
Build CPU family:   6   Model: 69   Stepping: 1
Build CPU features: aes apic avx clfsh cmov cx8 cx16 lahf_lm mmx msr nonstop_tsc pclmuldq popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
C compiler:         /usr/bin/cc GNU cc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
C compiler flags:   -mavx   -D_FORTIFY_SOURCE=2  -fstack-protector  -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wall -Wno-unused -Wunused-value -Wno-unused-parameter -Wno-array-bounds -Wno-maybe-uninitialized -Wno-strict-overflow   -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast  -O3 -DNDEBUG
  • Test run
root% apt-get install git
root% git clone https://github.com/hnishi/testdata_gromacs.git
root% cd testdata_gromacs
root% ../../build_serial/gromacs-4.6.6-serial/bin/grompp  -f MD1.mdp -c ./replace_po_mg.gro -p ./topol.top -o md1.tpr > grompp.log 2>&1
root% mpirun -np 8 ../gromacs-4.6.6-mpi/bin/mdrun_mpi -deffnm md1 > md1.log2 2>&1 &
root% tail -n 6 md1.log

               Core t (s)   Wall t (s)        (%)
       Time:     9873.043     1234.789      799.6
                 (ns/day)    (hour/ns)
Performance:        1.400       17.148
Finished mdrun on node 0 Mon Dec 12 08:12:41 2016

Creating the docker group

If you want to use docker without becoming root

Warning: The docker group is equivalent to the root user; For details on how this impacts security in your system, see Docker Daemon Attack Surface for details.

If you want the Docker daemon to start automatically when the host boots

2.3.7 PGI compiler

https://www.softek.co.jp/SPG/Pgi/pgi_community.html

PGI Community Edition can be used free of charge for one year from its release date.

2.3.7.1 Build
References

Japanese

English documents

English PDF

  • The GNU GCC development packages and their libraries (gcc, gcc-c++, and gcc-gfortran or gcc-g77) must be installed on the system
  • Check
$ rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE} (%{ARCH})\n' gcc gcc-c++ gcc-gfortran glibc
gcc-4.8.5-4.el7 (x86_64)
gcc-c++-4.8.5-4.el7 (x86_64)
gcc-gfortran-4.8.5-4.el7 (x86_64)
glibc-2.17-106.el7_2.8 (x86_64)
  • If they are not installed, install them as follows
% yum groupinstall 'Development tools'
  • An implementation of the Linux Standard Base (LSB) package is required
  • Check
$ rpm -qa|grep -i lsb
  • It was not installed, so install it as follows
% yum install redhat-lsb.x86_64
$ lsb_release
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch

The PGI software can be installed on a single node, and the node can be treated as if it is a cluster.

$ mkdir pgilinux-2016-1610-x86_64
$ cd pgilinux-2016-1610-x86_64
$ tar -zxpf ../pgilinux-2016-1610-x86_64.tar.gz
$ su
% ./install
  • Running ./install starts an interactive setup
  • I answered as follows

    1. Consent to the PGI End-User License Agreement (EULA): accept
    2. A network installation: 1
    3. Installation directory: /opt/sw/pgi
    4. CUDA Toolkit License Agreement: accept
    5. AMD OpenCL License Agreement: decline
    6. JAVA JRE License Agreement: accept
    7. PGI OpenACC Unified Memory License Agreement: accept
    8. Do you wish to update/create links in the 2016 directory: y
    9. Open MPI library installation: y
    10. NVIDIA GPU support in Open MPI: y
    11. to generate license keys or configure license service: n
    12. the files in the install directory to be read-only: n
  • The compiler was installed under /opt/sw/pgi, so set the environment variables

$ export PGI=/opt/sw/pgi # (defines the compiler installation directory)
$ export PATH=$PGI/linux86-64/16.10/bin:$PGI/linux86-64/2016/mpi/openmpi-1.10.2/bin:$PATH
$ export MANPATH=$MANPATH:$PGI/linux86-64/16.10/man
$ export LM_LICENSE_FILE=$PGI/license.dat
$ pgcc -V

pgcc 16.10-0 64-bit target on x86-64 Linux -tp sandybridge
The Portland Group - PGI Compilers and Tools
Copyright (c) 2016, NVIDIA CORPORATION.  All rights reserved.
  • Check that the PGI C compiler (pgcc) works
$ tee hello.c << _EOF
#include <stdio.h>

int main(void) {
  puts("hello");
  return 0;
}
_EOF
$ pgcc hello.c
$ ./a.out
hello

2.3.7.2 Compiling Gromacs with the PGI compiler
$ cd /home/hnishi/src/gromacs-2016.1/build_pgi
$ export CPP=cpp
$ export CXX=pgc++
$ export CC=pgcc
$ export CCDIR=/opt/sw/pgi/linux86-64/16.10/bin
$ cmake .. \
 -DCMAKE_INSTALL_PREFIX=/opt/sw/gromacs-2016.1-pgi \
 -DCMAKE_C_COMPILER=/opt/sw/pgi/linux86-64/16.10/bin/pgcc \
 -DCMAKE_CXX_COMPILER=/opt/sw/pgi/linux86-64/16.10/bin/pgc++ \
 -DGMX_SIMD=None \
 -DGMX_MPI=OFF  \
 -DGMX_BUILD_OWN_FFTW=ON
$ make
...
[  1%] Building CXX object src/gromacs/CMakeFiles/libgromacs.dir/fileio/xtcio.cpp.o
"/usr/include/c++/4.8.5/cstdint", line 51: error: expected an identifier
    using ::int64_t;
            ^

"/home/hnishi/src/gromacs-2016.1/src/gromacs/math/functions.h", line 152: error:
          expected a ";"
  std::int64_t
       ^

9 errors detected in the compilation of "/home/hnishi/src/gromacs-2016.1/src/gromacs/fileio/xtcio.cpp".
make[2]: *** [src/gromacs/CMakeFiles/libgromacs.dir/fileio/xtcio.cpp.o] Error 2
make[1]: *** [src/gromacs/CMakeFiles/libgromacs.dir/all] Error 2
make: *** [all] Error 2

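The build failed, and the cause is unknown.

One thing I noticed: pgc in the bin directory of the installed PGI does not seem to work. Is it related to the compilation failure?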

$ pgc -V
PGC-S-0011-Unrecognized command line switch: -V
PGC/x86-64 Linux 16.10-0: compilation completed with severe errors

Summary

Gromacs2016.1 on CentOS 7.2

compiler   compile optimization  FFTW version  core  openMP  thread-MPI  openMPI  wall t (s)  simulation (ns/day)  Performance
gcc 4.8.5  None                  3.3.5         1     1       1           -        61893.469   0.028                100 %
gcc 4.8.5  None                  3.3.5         8     8       1           -        8155.905    0.212                758.9 %
gcc 4.8.5  None                  3.3.5         8     1       8           -        8236.211    0.210                751.5 %
gcc 4.8.5  None                  3.3.5         8     1       -           8        8098.182    0.213                764.3 %
gcc 4.8.5  SSE4.1                3.3.5         1     1       1           -        11663.013   0.148                530.7 %
gcc 4.8.5  SSE4.1                3.3.5         8     8       1           -        1580.811    1.093                3915.3 %
gcc 4.8.5  SSE4.1                3.3.5         8     1       8           -        1571.665    1.100                3938.1 %
gcc 4.8.5  SSE4.1                3.3.5         8     1       -           8        1515.208    1.141                4084.8 %
gcc 4.8.5  AVX_256               3.3.4         1     1       -           1        9333.249    0.185                663.2 %
gcc 4.8.5  AVX_256               3.3.4         8     1       -           8        1223.706    1.412                5057.9 %
gcc 4.8.5  AVX_256               3.3.5         1     1       -           1        9220.935    0.187                671.2 %
gcc 4.8.5  AVX_256               3.3.5         8     8       -           1        1243.272    1.390                4978.3 %
gcc 4.8.5  AVX_256               3.3.5         8     1       -           8        1216.902    1.420                5086.2 %

Docker on CentOS 7.2

Gromacs-4.6.6 on Ubuntu14.04 on Docker

compiler   compile optimization  FFTW version  core  openMP  thread-MPI  openMPI  wall t (s)  simulation (ns/day)  Performance
gcc 4.8.4  AVX_256               3.3.2         8     1       -           8        1234.789    1.400                5012.5 %

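Conclusion 1: building as follows gives the best performance.

  • Optimization -> AVX_256 (6.712x)
  • FFTW version -> fftw-3.3.5 (1.012x compared with 3.3.4)
  • Parallelization -> openMPI (with 8 processes, 7.577x compared with serial (94.7%), and 1.007x compared with OpenMP)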
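
The 7.577x and 94.7% figures can be reproduced from the wall times in the table. A small sketch, where efficiency is a hypothetical helper name and parallel efficiency is taken as the speedup divided by the number of cores:

```shell
# efficiency SERIAL_WALL PARALLEL_WALL NCORES
# Hypothetical helper: prints the speedup and the parallel efficiency
# (speedup / cores, in percent) from two wall times in seconds.
efficiency() {
  awk -v s="$1" -v p="$2" -v n="$3" \
    'BEGIN { sp = s / p; printf "%.3fx %.1f%%\n", sp, 100 * sp / n }'
}

# AVX_256 + FFTW 3.3.5: 1 core (9220.935 s) versus 8 Open MPI processes (1216.902 s)
efficiency 9220.935 1216.902 8
```

This prints 7.577x 94.7%.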

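Conclusion 2: using Docker makes almost no difference in performance (1234.789 s in the container versus 1216.902 s on the host for the 8-process AVX_256 runs, although the container ran Gromacs 4.6.6 rather than 2016.1)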

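Further comparisons worth trying

Using the Intel compiler and the MKL library

(However, as quoted below, its performance is said to differ little from that of the GNU compiler)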

Generally, there is no advantage in using MKL with GROMACS, and FFTW is often faster.
The Intel compiler has historically been better at instruction scheduling, but recent gcc versions have proved to be as fast or sometimes faster than Intel.

http://www.gromacs.org/Documentation/Installation_Instructions_5.0