GROMACS build log
Official GROMACS website
http://www.gromacs.org/
Created
December 9, 2016
Last updated
December 18, 2016
Environment
$ uname -a
Linux HOSTNAME 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/*release|uniq
CentOS Linux release 7.2.1511 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CentOS Linux release 7.2.1511 (Core)
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Model name: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Stepping: 7
CPU MHz: 1199.953
BogoMIPS: 4605.16
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0-5
NUMA node1 CPU(s): 6-11
$ free -m
total used free shared buff/cache available
Mem: 31983 1295 17945 29 12742 30133
Swap: 16127 0 16127
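1. Purpose
1.1 Goals
- Install GROMACS 2016.1 on CentOS 7.2.
- Measure how performance varies with the compiler and the optimization options.
- Measure how performance changes when running under Docker.
1.2 Procedure
- Install the packages required for the build
- Build GROMACS
- Verify that GROMACS works
- Test run
- Compare performance
2. Work log
2.1 Installing the packages required for the build
- Update the installed packages with yum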
$ su
% yum update -y
...
2.1.1 cmake
GROMACS builds with the CMake build system, requiring at least version 2.8.8.
- Check which cmake version is available through yum
% yum list |grep cmake
cmake.x86_64 2.8.11-5.el7 base
cmake-gui.x86_64 2.8.11-5.el7 base
- A version of 2.8.8 or later is available, so install it
% yum install cmake -y
...
% cmake --version
cmake version 2.8.11
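The version check above can also be scripted. The following is a minimal sketch, assuming GNU sort (for its -V version-sort option); version_ge is a hypothetical helper name, and the found version is hard-coded here so the snippet runs without cmake installed.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=2.8.8
found=2.8.11   # in practice: found=$(cmake --version | awk 'NR == 1 { print $3 }')

if version_ge "$found" "$required"; then
    echo "cmake $found is new enough (>= $required)"
else
    echo "cmake $found is too old (need >= $required)"
fi
```

A plain string comparison would rank 2.8.11 below 2.8.8, which is why a version-aware sort is used.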
2.1.2 gcc
We recommend gcc, because it is free, widely available and frequently provides the best performance.
Minimum supported compiler versions are * GNU (gcc) 4.6 * Intel (icc) 14 * LLVM (clang) 3.4 * Microsoft (MSVC) 2015 Other compilers may work (Cray, Pathscale, older clang) but do not offer competitive performance.
We recommend against PGI because the performance with C++ is very bad.
The Intel and GNU compilers produce much faster GROMACS executables than the PGI and Cray compilers.
- gcc was already installed, so confirm that its version is 4.6 or later
% gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
2.2 Building GROMACS 2016.1
2.2.1 Build
% exit
$ mkdir ~/src
$ cd ~/src
$ wget http://ftp.gromacs.org/pub/gromacs/gromacs-2016.1.tar.gz
$ tar -zxvf gromacs-2016.1.tar.gz
$ cd gromacs-2016.1
$ mkdir build_serial
$ cd build_serial
$ cmake .. \
-DCMAKE_INSTALL_PREFIX=/opt/sw/gromacs-2016.1-serial \
-DGMX_SIMD=None \
-DGMX_BUILD_OWN_FFTW=ON
$ make
...
$ su
% make install
% exit
- make can run in parallel with the -j option.
Example: $ make -j 4
- FFTW was used as the fast Fourier transform library.
http://www.gromacs.org/Documentation/Installation_Instructions_5.0#fast-fourier-transform-library
Many simulations in GROMACS make extensive use of fast Fourier transforms, and a software library to perform these is always required. We recommend FFTW (version 3 or higher only) or Intel MKL.
Generally, there is no advantage in using MKL with GROMACS, and FFTW is often faster.
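- By default, cmake appears to configure the gcc optimization options for AVX_512. The workstation CPU (Sandy Bridge) does not support AVX_512, so SIMD was turned off with the -DGMX_SIMD=None flag at cmake time. (The CPU does seem to support AVX_256, but a build configured with -DGMX_SIMD=AVX_256 still fails at make. Perhaps the gcc version is too old? Section 2.3.4 traces this failure to the bundled FFTW build.)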
2.2.2 Tests
2.2.2.1 Verifying the installation
$ /opt/sw/gromacs-2016.1-serial/bin/gmx --version
:-) GROMACS - gmx, 2016.1 (-:
GROMACS is written by:
Emile Apol Rossen Apostolov Herman J.C. Berendsen Par Bjelkmar
Aldert van Buuren Rudi van Drunen Anton Feenstra Gerrit Groenhof
Christoph Junghans Anca Hamuraru Vincent Hindriksen Dimitrios Karkoulis
Peter Kasson Jiri Kraus Carsten Kutzner Per Larsson
Justin A. Lemkul Magnus Lundborg Pieter Meulenhoff Erik Marklund
Teemu Murtola Szilard Pall Sander Pronk Roland Schulz
Alexey Shvetsov Michael Shirts Alfons Sijbers Peter Tieleman
Teemu Virolainen Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2015, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
GROMACS: gmx, version 2016.1
Executable: /opt/sw/gromacs-2016.1-serial/bin/gmx
Data prefix: /opt/sw/gromacs-2016.1-serial
Working dir: /home/hnishi/test_gromacs/serial
Command line:
gmx --version
GROMACS version: 2016.1
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support: disabled
SIMD instructions: NONE
FFT library: fftw-3.3.5
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: disabled
Tracing support: disabled
Built on: Fri Dec 9 18:35:06 JST 2016
Built by: hnishi@HOSTNAME [CMAKE]
Build OS/arch: Linux 3.10.0-327.el7.x86_64 x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Build CPU family: 6 Model: 45 Stepping: 7
Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /usr/bin/cc GNU 4.8.5
C compiler flags: -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler: /usr/bin/c++ GNU 4.8.5
C++ compiler flags: -std=c++0x -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
* To put gmx on your PATH when using bash, run $ source /opt/sw/gromacs-2016.1-serial/bin/GMXRC.bash
GMXRC.csh and GMXRC.zsh are also provided.
2.2.2.2 Test run
$ cd your/work/directory
$ git clone https://github.com/hnishi/testdata_gromacs.git
$ cd testdata_gromacs
$ /opt/sw/gromacs-2016.1-serial/bin/gmx grompp -f MD1.mdp -c ./replace_po_mg.gro -p ./topol.top -o md1.tpr > grompp.log 2>&1
$ /opt/sw/gromacs-2016.1-serial/bin/gmx mdrun -deffnm md1 -ntomp 8 > md1.log2 2>&1
$ tail -n 6 md1.log
Core t (s) Wall t (s) (%)
Time: 65247.241 8155.905 800.0
2h15:55
(ns/day) (hour/ns)
Performance: 0.212 113.265
Finished mdrun on rank 0 Sat Dec 10 13:44:37 2016
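- grompp prepares the tpr file (a binary file that holds the parameters for the run).
- mdrun runs the molecular dynamics simulation (the -ntomp option sets the number of OpenMP threads).
- When the run finishes, the timing summary is written at the end of md1.log.
* Test data: 384886 atoms, 10000 steps, 20 ps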
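Since every benchmark below is read off the same "Performance:" line, it can help to pull that number out automatically. A minimal sketch, assuming the log tail has the layout shown above; extract_ns_per_day is a hypothetical helper name, and the sample text is copied from the log tail in this section.

```shell
# Print the ns/day figure from the "Performance:" line of an mdrun log.
# Reads the log text on stdin.
extract_ns_per_day() {
    awk '$1 == "Performance:" { print $2 }'
}

# Sample copied from the tail of md1.log shown above.
sample_log='                 (ns/day)    (hour/ns)
Performance:        0.212      113.265'

printf '%s\n' "$sample_log" | extract_ns_per_day

# With a real log file:
# extract_ns_per_day < md1.log
```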
2.3 Performance comparison
2.3.1 One core
- Set the number of OpenMP threads (-ntomp) and thread-MPI ranks (-ntmpi) explicitly
$ /opt/sw/gromacs-2016.1-serial/bin/gmx mdrun -deffnm md1 -ntomp 1 -ntmpi 1 > md1.log2 2>&1
2.3.2 thread-MPI
- 1 OpenMP thread per rank, 8 thread-MPI ranks
$ /opt/sw/gromacs-2016.1-serial/bin/gmx mdrun -deffnm md1 -ntomp 1 -ntmpi 8 > md1.log2 2>&1
$ tail -n 6 md1.log
Core t (s) Wall t (s) (%)
Time: 65889.685 8236.211 800.0
2h17:16
(ns/day) (hour/ns)
Performance: 0.210 114.380
Finished mdrun on rank 0 Sat Dec 10 21:59:08 2016
Performance was almost the same as with OpenMP.
2.3.3 SSE4.1 optimization
http://www.gromacs.org/Documentation/Installation_Instructions_5.0#simd-support
SSE4.1 Present in all Intel core processors since 2007, but notably not in AMD magny-cours. Still, almost all recent processors support this, so this can also be considered a good baseline if you are content with portability between reasonably modern processors.
- Change the cmake flags as follows and rebuild
$ cd /home/hnishi/src/gromacs-2016.1/
$ mkdir build_serial_sse4.1
$ cd build_serial_sse4.1
$ cmake .. \
-DCMAKE_INSTALL_PREFIX=/opt/sw/gromacs-2016.1-serial-SSE4.1 \
-DGMX_SIMD=SSE4.1 \
-DGMX_BUILD_OWN_FFTW=ON
$ make
$ su
% make install
- Run the same test as before
$ cd your/work/directory2
$ cp your/work/directory/md1.tpr .
$ /opt/sw/gromacs-2016.1-serial-SSE4.1/bin/gmx mdrun -deffnm md1 -ntomp 8 > md1.log2 2>&1
$ tail -n 6 md1.log
Core t (s) Wall t (s) (%)
Time: 12646.491 1580.811 800.0
(ns/day) (hour/ns)
Performance: 1.093 21.954
Finished mdrun on rank 0 Sat Dec 10 16:27:15 2016
This is roughly a 5x speedup over the build without SIMD (1.093 vs. 0.212 ns/day).
2.3.4 AVX_256 optimization
http://www.gromacs.org/Documentation/Installation_Instructions_5.0#simd-support
- None For use only on an architecture either lacking SIMD, or to which GROMACS has not yet been ported and none of the options below are applicable.
- SSE2 This SIMD instruction set was introduced in Intel processors in 2001, and AMD in 2003. Essentially all x86 machines in existence have this, so it might be a good choice if you need to support dinosaur x86 computers too.
- SSE4.1 Present in all Intel core processors since 2007, but notably not in AMD magny-cours. Still, almost all recent processors support this, so this can also be considered a good baseline if you are content with portability between reasonably modern processors.
- AVX_128_FMA AMD bulldozer processors (2011) have this. Unfortunately Intel and AMD have diverged the last few years; If you want good performance on modern AMD processors you have to use this since it also allows the reset of the code to use AMD 4-way fused multiply-add instructions. The drawback is that your code will not run on Intel processors at all.
- AVX_256 This instruction set is present on Intel processors since Sandy Bridge (2011), where it is the best choice unless you have an even more recent CPU that supports AVX2. While this code will work on recent AMD processors, it is significantly less efficient than the AVX_128_FMA choice above - do not be fooled to assume that 256 is better than 128 in this case.
- AVX2_256 Present on Intel Haswell processors released in 2013, and it will also enable Intel 3-way fused multiply-add instructions. This code will not work on AMD CPUs.
- IBM_QPX BlueGene/Q A2 cores have this.
- Sparc64_HPC_ACE Fujitsu machines like the K computer have this.
The newest option applicable to the Intel(R) Xeon(R) CPU E5-2630, a Sandy Bridge part, appears to be AVX_256.
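The choice described in the list above can be sketched as a lookup on the CPU feature flags. This is a deliberately simplified illustration for Intel x86 only: pick_simd is a hypothetical helper, the flag names are those used on the "flags" line of /proc/cpuinfo on Linux, and the AMD and non-x86 entries are ignored. GROMACS performs its own detection at cmake time, which this does not reproduce.

```shell
# Map a space-separated CPU flag list to a GMX_SIMD value.
# Simplified sketch: Intel x86 only, newest instruction set wins.
pick_simd() {
    case " $1 " in
        *" avx2 "*)   echo AVX2_256 ;;
        *" avx "*)    echo AVX_256 ;;
        *" sse4_1 "*) echo SSE4.1 ;;
        *" sse2 "*)   echo SSE2 ;;
        *)            echo None ;;
    esac
}

# Abbreviated sample of the flags a Sandy Bridge CPU reports.
pick_simd "fpu sse sse2 ssse3 sse4_1 sse4_2 avx"

# On a live Linux system (assumes /proc/cpuinfo is available):
# pick_simd "$(grep -m 1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```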
- Configuring with -DGMX_SIMD=AVX_256 and building fails during the FFTW build
checking whether C compiler accepts -mavx512f... no
configure: error: Need a version of gcc with -mavx512f
make[2]: *** [src/contrib/fftw/fftwBuild-prefix/src/fftwBuild-stamp/fftwBuild-configure] Error 1
make[1]: *** [src/contrib/fftw/CMakeFiles/fftwBuild.dir/all] Error 2
make: *** [all] Error 2
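- FFTW is downloaded and built automatically, and that build appears to be configured with AVX_512 enabled (the configure check above shows that this gcc does not accept -mavx512f, and Sandy Bridge does not support AVX_512 in any case)
- Delete the --enable-avx512 entry from line 109 of the build.make file below, which cmake generates, and then run make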
$ sed -n -e 109p /home/hnishi/src/gromacs-2016.1/build_avx256/src/contrib/fftw/CMakeFiles/fftwBuild.dir/build.make
cd /home/hnishi/src/gromacs-2016.1-naoi/zzz/src/contrib/fftw/fftwBuild-prefix/src/fftwBuild-build && /home/hnishi/src/gromacs-2016.1-naoi/zzz/src/contrib/fftw/fftwBuild-prefix/src/fftwBuild/configure --prefix=/home/hnishi/src/gromacs-2016.1-naoi/zzz/src/contrib/fftw/fftwBuild-prefix --libdir=/home/hnishi/src/gromacs-2016.1-naoi/zzz/src/contrib/fftw/fftwBuild-prefix/lib --disable-fortran --disable-shared --enable-static --with-pic --enable-sse2 --enable-avx --enable-avx2 --enable-avx512 --enable-float
Alternatively, before running cmake, delete ;--enable-avx512 from line 76 of CMakeLists.txt as shown below, then run cmake and build
$ diff /home/hnishi/src/gromacs-2016.1/src/contrib/fftw/CMakeLists.txt_org /home/hnishi/src/gromacs-2016.1/src/contrib/fftw/CMakeLists.txt
76c76,77
< set(_fftw_simd_support_level --enable-sse2;--enable-avx;--enable-avx2;--enable-avx512)
---
> #set(_fftw_simd_support_level --enable-sse2;--enable-avx;--enable-avx2;--enable-avx512)
> set(_fftw_simd_support_level --enable-sse2;--enable-avx;--enable-avx2)
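The manual edit in the diff above could also be done with sed. A minimal sketch: strip_avx512 is a hypothetical helper, and the sample line is the original line 76 from the diff. Unlike the diff, it removes the flag directly rather than keeping the old line as a comment.

```shell
# Remove the --enable-avx512 entry from a CMake list of FFTW configure flags.
# Reads text on stdin and writes the edited text to stdout.
strip_avx512() {
    sed 's/;--enable-avx512//'
}

line='set(_fftw_simd_support_level --enable-sse2;--enable-avx;--enable-avx2;--enable-avx512)'
printf '%s\n' "$line" | strip_avx512

# To edit the real file in place while keeping a backup copy (GNU sed):
# sed -i.orig 's/;--enable-avx512//' src/contrib/fftw/CMakeLists.txt
```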
$ cd /home/hnishi/src/gromacs-2016.1/build_avx256
$ cmake .. \
-DCMAKE_INSTALL_PREFIX=/opt/sw/gromacs-2016.1-AVX256 \
-DGMX_SIMD=AVX_256 \
-DGMX_BUILD_OWN_FFTW=ON
$ make
$ su
% make install
$ /opt/sw/gromacs-2016.1-AVX256/bin/gmx -version |grep SIMD
SIMD instructions: AVX_256
FFTW-3.3.4
- To use FFTW-3.3.4 instead, rewrite CMakeLists.txt as follows before running cmake
$ diff /home/hnishi/src/gromacs-2016.1/src/contrib/fftw/CMakeLists.txt_org /home/hnishi/src/gromacs-2016.1/src/contrib/fftw/CMakeLists.txt_fftw-3.3.4
88c88,89
< set(EXTERNAL_FFTW_VERSION 3.3.5)
---
> #set(EXTERNAL_FFTW_VERSION 3.3.5)
> set(EXTERNAL_FFTW_VERSION 3.3.4)
93c94,95
< set(GMX_BUILD_OWN_FFTW_MD5 6cc08a3b9c7ee06fdd5b9eb02e06f569 CACHE STRING
---
> #set(GMX_BUILD_OWN_FFTW_MD5 6cc08a3b9c7ee06fdd5b9eb02e06f569 CACHE STRING
> set(GMX_BUILD_OWN_FFTW_MD5 2edab8c06b24feeb3b82bbb3ebf3e7b3 CACHE STRING
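Changing the FFTW version means the expected MD5 digest has to change with it, as in the diff above. A digest can be checked by hand as sketched below, assuming md5sum from GNU coreutils is available; verify_md5 is a hypothetical helper, and the demo uses a small temporary file rather than the real tarball.

```shell
# Return success if the MD5 digest of file $1 equals the expected digest $2.
verify_md5() {
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}

# Self-contained demo: a file holding the three bytes "abc".
tmp=$(mktemp)
printf 'abc' > "$tmp"
if verify_md5 "$tmp" 900150983cd24fb0d6963f7d28e17f72; then
    echo "checksum OK"
else
    echo "checksum mismatch"
fi
rm -f "$tmp"

# For the FFTW tarball (the file name here is an assumption):
# verify_md5 fftw-3.3.4.tar.gz 2edab8c06b24feeb3b82bbb3ebf3e7b3
```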
2.3.5 openMPI
$ su
% yum -y install openmpi openmpi-devel
% exit
$ cd /home/hnishi/src/gromacs-2016.1/build_openmpi
$ cmake .. \
-DCMAKE_INSTALL_PREFIX=/opt/sw/gromacs-2016.1-openmpi \
-DGMX_SIMD=SSE4.1 \
-DGMX_OPENMP=ON \
-DGMX_MPI=ON \
-DGMX_GPU=OFF \
-DCMAKE_C_COMPILER=/usr/lib64/openmpi/bin/mpicc \
-DCMAKE_CXX_COMPILER=/usr/lib64/openmpi/bin/mpicxx \
-DGMX_BUILD_OWN_FFTW=ON
$ make
$ su
% make install
% exit
$ /opt/sw/gromacs-2016.1-openmpi/bin/gmx_mpi --version |grep MPI
MPI library: MPI
$ cd /your/work/directory3
$ cp /your/work/directory2/md1.tpr .
$ /usr/lib64/openmpi/bin/mpirun --version
mpirun (Open MPI) 1.10.0
Report bugs to http://www.open-mpi.org/community/help/
$ /usr/lib64/openmpi/bin/mpirun -np 8 /opt/sw/gromacs-2016.1-openmpi/bin/gmx_mpi mdrun -deffnm md1 > md1.log2 2>&1
$ tail -n 6 md1.log
Core t (s) Wall t (s) (%)
Time: 12121.662 1515.208 800.0
(ns/day) (hour/ns)
Performance: 1.141 21.042
Finished mdrun on rank 0 Sun Dec 11 20:16:50 2016
2.3.6 Docker
2.3.6.1 Installing Docker
Reference
Prerequisites
- a 64-bit OS and version 3.10 or higher of the Linux kernel
$ uname -r
3.10.0-327.el7.x86_64
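The kernel requirement can be checked in a script as well. A minimal sketch in POSIX shell: kernel_at_least is a hypothetical helper, it looks only at the major and minor numbers, and it does not check the 64-bit requirement. The release string is hard-coded to the value shown above so the snippet gives the same result on any machine.

```shell
# Return success if a kernel release string such as "3.10.0-327.el7.x86_64"
# is at least version $2.$3 (major.minor).
kernel_at_least() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

release=3.10.0-327.el7.x86_64   # in practice: release=$(uname -r)
if kernel_at_least "$release" 3 10; then
    echo "kernel $release meets the 3.10 requirement"
else
    echo "kernel $release is too old for Docker"
fi
```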
- Add the Docker Engine repository to yum
$ su
% tee /etc/yum.repos.d/docker.repo <<-'EOF'
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
- Install Docker Engine, enable docker.service, start the Docker daemon, and test it (hello world)
% yum install docker-engine -y
% systemctl enable docker.service
% systemctl start docker
% docker run --rm hello-world
% docker version
Client:
Version: 1.12.3
API version: 1.24
Go version: go1.6.3
Git commit: 6b644ec
Built:
OS/Arch: linux/amd64
Server:
Version: 1.12.3
API version: 1.24
Go version: go1.6.3
Git commit: 6b644ec
Built:
OS/Arch: linux/amd64
2.3.6.2 Pulling and running the GROMACS 4.6.6 MPI image
- Get the GROMACS image from Docker Hub
% docker run -it hnishi/ubuntu14.04_gromacs-4.6.6:mpi_v1.0
root% cd /usr/local/src/gromacs-4.6.6/build_mpi
root% ./gromacs-4.6.6-mpi/bin/mdrun_mpi -version
:-) G R O M A C S (-:
Gromacs Runs On Most of All Computer Systems
:-) VERSION 4.6.6 (-:
Contributions from Mark Abraham, Emile Apol, Rossen Apostolov,
Herman J.C. Berendsen, Aldert van Buuren, Pär Bjelkmar,
Rudi van Drunen, Anton Feenstra, Gerrit Groenhof, Christoph Junghans,
Peter Kasson, Carsten Kutzner, Per Larsson, Pieter Meulenhoff,
Teemu Murtola, Szilard Pall, Sander Pronk, Roland Schulz,
Michael Shirts, Alfons Sijbers, Peter Tieleman,
Berk Hess, David van der Spoel, and Erik Lindahl.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2012,2013, The GROMACS development team at
Uppsala University & The Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.
:-) ../gromacs-4.6.6-mpi/bin/mdrun_mpi (-:
Program: ../gromacs-4.6.6-mpi/bin/mdrun_mpi
Gromacs version: VERSION 4.6.6
Precision: single
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled
GPU support: disabled
invsqrt routine: gmx_software_invsqrt(x)
CPU acceleration: AVX_256
FFT library: fftw-3.3.2-sse2
Large file support: enabled
RDTSCP usage: enabled
Built on: Fri Dec 2 08:16:34 UTC 2016
Built by: root@eed842e89771 [CMAKE]
Build OS/arch: Linux 4.4.27-boot2docker x86_64
Build CPU vendor: GenuineIntel
Build CPU brand: Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
Build CPU family: 6 Model: 69 Stepping: 1
Build CPU features: aes apic avx clfsh cmov cx8 cx16 lahf_lm mmx msr nonstop_tsc pclmuldq popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
C compiler: /usr/bin/cc GNU cc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
C compiler flags: -mavx -D_FORTIFY_SOURCE=2 -fstack-protector -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wall -Wno-unused -Wunused-value -Wno-unused-parameter -Wno-array-bounds -Wno-maybe-uninitialized -Wno-strict-overflow -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -O3 -DNDEBUG
- Test run
root% apt-get install git
root% git clone https://github.com/hnishi/testdata_gromacs.git
root% cd testdata_gromacs
root% ../../build_serial/gromacs-4.6.6-serial/bin/grompp -f MD1.mdp -c ./replace_po_mg.gro -p ./topol.top -o md1.tpr > grompp.log 2>&1
root% mpirun -np 8 ../gromacs-4.6.6-mpi/bin/mdrun_mpi -deffnm md1 > md1.log2 2>&1 &
root% tail -n 6 md1.log
Core t (s) Wall t (s) (%)
Time: 9873.043 1234.789 799.6
(ns/day) (hour/ns)
Performance: 1.400 17.148
Finished mdrun on node 0 Mon Dec 12 08:12:41 2016
Creating the docker group
Warning: The docker group is equivalent to the root user. For details on how this impacts security in your system, see Docker Daemon Attack Surface.
To start the Docker daemon automatically when the host boots
2.3.7 PGI compiler
https://www.softek.co.jp/SPG/Pgi/pgi_community.html
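PGI Community Edition can be used free of charge for one year from its release date.
2.3.7.1 Build
Reference
- The system must have the GNU GCC development packages gcc, gcc-c++, and gcc-fortran (or gcc-g77; the package is named gcc-gfortran on CentOS 7) installed, together with their libraries
- Check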
$ rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE} (%{ARCH})\n' gcc gcc-c++ gcc-gfortran glibc
gcc-4.8.5-4.el7 (x86_64)
gcc-c++-4.8.5-4.el7 (x86_64)
gcc-gfortran-4.8.5-4.el7 (x86_64)
glibc-2.17-106.el7_2.8 (x86_64)
- If they are missing, install them as follows
% yum groupinstall 'Development tools'
- An implementation of the Linux Standard Base (LSB) package is required
- Check
$ rpm -qa|grep -i lsb
- It was not installed, so install it as follows
% yum install redhat-lsb.x86_64
$ lsb_release
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
- Download the PGI compiler archive from the following URL
http://www.pgroup.com/products/community.htm
The PGI software can be installed on a single node, and the node can be treated as if it is a cluster.
$ mkdir pgilinux-2016-1610-x86_64
$ cd pgilinux-2016-1610-x86_64
$ tar -zxpf ../pgilinux-2016-1610-x86_64.tar.gz
$ su
% ./install
- Running ./install starts an interactive setup
- The following answers were given
- Consent to the PGI End-User License Agreement (EULA): accept
- A network installation: 1
- Installation directory: /opt/sw/pgi
- CUDA Toolkit License Agreement: accept
- AMD OpenCL License Agreement: decline
- JAVA JRE License Agreement: accept
- PGI OpenACC Unified Memory License Agreement: accept
- Do you wish to update/create links in the 2016 directory: y
- Open MPI library installation: y
- NVIDIA GPU support in Open MPI: y
- to generate license keys or configure license service: n
- the files in the install directory to be read-only: n
- The compiler was installed under /opt/sw/pgi, so set the environment variables
$ export PGI=/opt/sw/pgi # (defines the compiler installation directory)
$ export PATH=$PGI/linux86-64/16.10/bin:$PGI/linux86-64/2016/mpi/openmpi-1.10.2/bin:$PATH
$ export MANPATH=$MANPATH:$PGI/linux86-64/16.10/man
$ export LM_LICENSE_FILE=$PGI/license.dat
$ pgcc -V
pgcc 16.10-0 64-bit target on x86-64 Linux -tp sandybridge
The Portland Group - PGI Compilers and Tools
Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved.
- Check that the PGI C compiler (pgcc) works
$ tee hello.c << _EOF
#include <stdio.h>
int main(void){
puts("hello");
return 0;
}
_EOF
$ pgcc hello.c
$ ./a.out
hello
2.3.7.2 Compiling GROMACS with PGI
$ cd /home/hnishi/src/gromacs-2016.1/build_pgi
$ export CPP=cpp
$ export CXX=pgc++
$ export CC=pgcc
$ export CCDIR=/opt/sw/pgi/linux86-64/16.10/bin
$ cmake .. \
-DCMAKE_INSTALL_PREFIX=/opt/sw/gromacs-2016.1-pgi \
-DCMAKE_C_COMPILER=/opt/sw/pgi/linux86-64/16.10/bin/pgcc \
-DCMAKE_CXX_COMPILER=/opt/sw/pgi/linux86-64/16.10/bin/pgc++ \
-DGMX_SIMD=None \
-DGMX_MPI=OFF \
-DGMX_BUILD_OWN_FFTW=ON
$ make
...
[ 1%] Building CXX object src/gromacs/CMakeFiles/libgromacs.dir/fileio/xtcio.cpp.o
"/usr/include/c++/4.8.5/cstdint", line 51: error: expected an identifier
using ::int64_t;
^
"/home/hnishi/src/gromacs-2016.1/src/gromacs/math/functions.h", line 152: error:
expected a ";"
std::int64_t
^
9 errors detected in the compilation of "/home/hnishi/src/gromacs-2016.1/src/gromacs/fileio/xtcio.cpp".
make[2]: *** [src/gromacs/CMakeFiles/libgromacs.dir/fileio/xtcio.cpp.o] Error 2
make[1]: *** [src/gromacs/CMakeFiles/libgromacs.dir/all] Error 2
make: *** [all] Error 2
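The build failed; the cause is unknown.
One point of concern: the pgc binary in the bin directory of the installed PGI does not seem to work. Could this be related to the compilation failure?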
$ pgc -V
PGC-S-0011-Unrecognized command line switch: -V
PGC/x86-64 Linux 16.10-0: compilation completed with severe errors
Summary
GROMACS 2016.1 on CentOS 7.2
compiler | SIMD optimization | FFTW version | cores | OpenMP | thread-MPI | openMPI | wall t (s) | simulation (ns/day) | Performance (vs. first row, by wall time) |
---|---|---|---|---|---|---|---|---|---|
gcc 4.8.5 | None | 3.3.5 | 1 | 1 | 1 | - | 61893.469 | 0.028 | 100 % |
gcc 4.8.5 | None | 3.3.5 | 8 | 8 | 1 | - | 8155.905 | 0.212 | 758.9 % |
gcc 4.8.5 | None | 3.3.5 | 8 | 1 | 8 | - | 8236.211 | 0.210 | 751.5 % |
gcc 4.8.5 | None | 3.3.5 | 8 | 1 | - | 8 | 8098.182 | 0.213 | 764.3 % |
gcc 4.8.5 | SSE4.1 | 3.3.5 | 1 | 1 | 1 | - | 11663.013 | 0.148 | 530.7 % |
gcc 4.8.5 | SSE4.1 | 3.3.5 | 8 | 8 | 1 | - | 1580.811 | 1.093 | 3915.3 % |
gcc 4.8.5 | SSE4.1 | 3.3.5 | 8 | 1 | 8 | - | 1571.665 | 1.100 | 3938.1 % |
gcc 4.8.5 | SSE4.1 | 3.3.5 | 8 | 1 | - | 8 | 1515.208 | 1.141 | 4084.8 % |
gcc 4.8.5 | AVX_256 | 3.3.4 | 1 | 1 | - | 1 | 9333.249 | 0.185 | 663.2 % |
gcc 4.8.5 | AVX_256 | 3.3.4 | 8 | 1 | - | 8 | 1223.706 | 1.412 | 5057.9 % |
gcc 4.8.5 | AVX_256 | 3.3.5 | 1 | 1 | - | 1 | 9220.935 | 0.187 | 671.2 % |
gcc 4.8.5 | AVX_256 | 3.3.5 | 8 | 8 | - | 1 | 1243.272 | 1.390 | 4978.3 % |
gcc 4.8.5 | AVX_256 | 3.3.5 | 8 | 1 | - | 8 | 1216.902 | 1.420 | 5086.2 % |
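The Performance column in the table above matches the ratio of the first row's wall time to each row's wall time, expressed in percent. A minimal sketch of that calculation with awk; rel_perf is a hypothetical helper, and the wall times are taken from the table.

```shell
# Relative performance in percent: baseline wall time divided by wall time.
rel_perf() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * base / t }'
}

baseline=61893.469                 # gcc 4.8.5, SIMD None, 1 core
rel_perf "$baseline" 8155.905      # SIMD None, 8 OpenMP threads
rel_perf "$baseline" 1580.811      # SSE4.1, 8 OpenMP threads
```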
Docker on CentOS 7.2
Gromacs-4.6.6 on Ubuntu14.04 on Docker
compiler | SIMD optimization | FFTW version | cores | OpenMP | thread-MPI | openMPI | wall t (s) | simulation (ns/day) | Performance (vs. 1-core gcc build without SIMD, by wall time) |
---|---|---|---|---|---|---|---|---|---|
gcc 4.8.4 | AVX_256 | 3.3.2 | 8 | 1 | - | 8 | 1234.789 | 1.400 | 5012.5 % |
Conclusion 1: Building as follows gives the best performance.
- Optimization -> AVX_256 (6.712x)
- FFTW version -> fftw-3.3.5 (1.012x relative to 3.3.4)
- Parallelization -> openMPI (with 8 processes, 7.577x over serial (94.7% efficiency) and 1.007x over OpenMP)
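The speedup and efficiency quoted for the parallel run follow from the wall times in the summary table. A minimal sketch of the arithmetic; scaling is a hypothetical helper, and the inputs are the 1-core and 8-process AVX_256 + FFTW 3.3.5 rows.

```shell
# Speedup and parallel efficiency from serial and parallel wall times.
# Arguments: serial wall time (s), parallel wall time (s), number of cores.
scaling() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
        s = t1 / tn
        printf "speedup %.3f, efficiency %.1f%%\n", s, 100 * s / n
    }'
}

# AVX_256 + FFTW 3.3.5: 1 core vs. 8 openMPI processes.
scaling 9220.935 1216.902 8
```

Efficiency here is the speedup divided by the number of cores, so 100% would mean perfect linear scaling.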
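Conclusion 2: Running under Docker makes almost no difference in performance.
Further comparisons worth trying
Use of the Intel compiler and the MKL library
(although, as quoted below, its performance is reportedly not much different from that of the GNU compiler)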
Generally, there is no advantage in using MKL with GROMACS, and FFTW is often faster.
The Intel compiler has historically been better at instruction scheduling, but recent gcc versions have proved to be as fast or sometimes faster than Intel.