
Installing LAMMPS (with GPU support) on Ubuntu 20.04

Last updated at 2023-12-26, posted at 2023-04-30

I wanted to set up LAMMPS for molecular dynamics simulations, but enabling GPU support turned out to be fairly tedious, so I am recording the steps here as a memo.

Environment

OS: Ubuntu 20.04
GPU: GTX 1660 Ti

Preparation

1. Update the system

$sudo apt update
$sudo apt upgrade

Install the required packages.
The following article was very helpful for this.
https://qiita.com/hirta/items/efadb69c6166dbcf662f

2. Check that the GPU is recognized
Reference: https://qiita.com/sabaku20XX/items/97db2c0bf7298e3a645c

$lspci | grep -i nvidia
$cat /proc/driver/nvidia/version
$nvidia-smi

Confirm that the GPU model and the driver version are reported correctly.
Note the driver version, since it is needed later when installing CUDA.

Note: If the driver is not installed, install it from the NVIDIA website.
https://www.nvidia.co.jp/Download/index.aspx?lang=jp
In my case, the driver version was 525.116.03.
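The model name and driver version can also be pulled out in one line instead of reading them off the full nvidia-smi table. The following is a minimal sketch: gpu_summary and driver_major are hypothetical helper names, and the version string is the one from this article.

```shell
# Print "name, driver_version" for each GPU, or a hint when the driver is missing.
gpu_summary() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
    else
        echo "nvidia-smi not found: the NVIDIA driver is probably not installed"
    fi
}

# Extract the major number from a driver version string such as 525.116.03.
driver_major() {
    printf '%s\n' "${1%%.*}"
}

gpu_summary
driver_major 525.116.03   # version string taken from this article
```

The major number is usually enough to find the right row in NVIDIA's driver/CUDA compatibility table.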

3. Install CUDA

Look up which CUDA version matches the driver version. In this environment the driver is 525.116.03, so CUDA 12.1 appears to be usable.
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions__table-cuda-toolkit-driver-versions
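Comparing dotted version numbers by eye is error-prone (525.116 is newer than 525.60), so a small check can help. This is only a sketch: version_ge is a hypothetical helper, it relies on GNU sort -V, and the minimum version below is an assumed value for illustration that should be replaced with the one read from the NVIDIA table.

```shell
# Return success when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed="525.116.03"   # driver version from this article
required="525.60.13"     # assumed minimum; read the real value from the NVIDIA table

if version_ge "$installed" "$required"; then
    echo "driver $installed satisfies the minimum $required"
else
    echo "driver $installed is older than $required; update the driver first"
fi
```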

Install the desired CUDA version from the NVIDIA site.
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=20.04&target_type=deb_local

In my case, I entered the following commands.

$wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
$sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
$wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda-repo-ubuntu2004-12-1-local_12.1.1-530.30.02-1_amd64.deb
$sudo dpkg -i cuda-repo-ubuntu2004-12-1-local_12.1.1-530.30.02-1_amd64.deb
$sudo cp /var/cuda-repo-ubuntu2004-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
$sudo apt-get update
$sudo apt-get -y install cuda

Note: The page below lists the installation requirements, so check it as needed (for example, an outdated gcc version may cause errors).
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
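To compare the local toolchain against those requirements, the versions of the relevant tools can be printed in one go. A rough sketch follows; tool_version is a hypothetical helper, and /usr/local/cuda/bin is assumed to be where the CUDA packages place nvcc.

```shell
# Print the first "--version" line of a tool that matches an optional pattern,
# or note that the tool is missing.
tool_version() {
    tool=$1
    pattern=${2:-.}
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" --version 2>&1 | grep -m 1 -- "$pattern"
    else
        echo "$tool: not found"
    fi
}

tool_version gcc
tool_version cmake
# nvcc may not be on PATH yet, so look in the assumed CUDA location as well.
( PATH="/usr/local/cuda/bin:$PATH"; tool_version nvcc release )
```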

Installing LAMMPS

1. Download the stable version from the LAMMPS website
https://www.lammps.org/download.html
(In my case it was the 23 Jun 2022 version.)

2. Extract the archive in the directory where it was downloaded

$tar -xvf lammps-stable.tar.gz

3. Build with CMake
The LAMMPS manual describes the procedure in detail, so follow it.
https://docs.lammps.org/Build_cmake.html
When configuring, specify which packages to build, using presets where convenient, as options placed before ../cmake.
https://docs.lammps.org/Build_package.html#cmake-presets

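$cd lammps-23Jun2022
$mkdir build; cd build
$cmake -C ../cmake/presets/basic.cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_ARCH=sm_75 -DBIN2C=/usr/local/cuda-12.1/bin/bin2c ../cmake
$cmake --build .
$make install

Note: GPU_ARCH should match the compute capability of the GPU (sm_75 for the Turing-generation GTX 1660 Ti used here), and the values that nvcc accepts depend on the installed CUDA version, so change it as needed.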
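GPU_ARCH takes an sm_XX value that corresponds to the GPU's compute capability, so one way to pick it is to ask the driver. The sketch below assumes a driver recent enough for nvidia-smi to support the compute_cap query field; cc_to_arch is a hypothetical helper, and the fallback value 7.5 is the compute capability of the GTX 1660 Ti.

```shell
# Convert a compute capability such as "7.5" into the "sm_75" form used by GPU_ARCH.
cc_to_arch() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

cc=""
if command -v nvidia-smi >/dev/null 2>&1; then
    # compute_cap is assumed to be supported by the installed driver.
    cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
fi
# Fall back to the GTX 1660 Ti value when the query is unavailable.
[ -n "$cc" ] || cc="7.5"

echo "GPU_ARCH=$(cc_to_arch "$cc")"
```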

Note: To add another package after make install, reset the build configuration first and then run cmake ../cmake again.
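The simplest reset is to delete the build directory and configure from scratch. The sketch below only prints the commands so they can be reviewed before running them from the LAMMPS source directory. pkg_flags and rebuild_commands are hypothetical helpers, REAXFF is just an illustrative extra package, and base_flags should be replaced with the options from your own configure step.

```shell
# Build the "-D PKG_<NAME>=on" arguments for a list of LAMMPS package names.
pkg_flags() {
    flags=""
    for pkg in "$@"; do
        flags="$flags -D PKG_${pkg}=on"
    done
    printf '%s\n' "${flags# }"
}

# Options carried over from the first configure step; adjust GPU_ARCH to your GPU.
base_flags="-C ../cmake/presets/basic.cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_ARCH=sm_75 -DBIN2C=/usr/local/cuda-12.1/bin/bin2c"

# Print (rather than run) the commands for a clean reconfigure and rebuild.
rebuild_commands() {
    echo "rm -rf build && mkdir build && cd build"
    echo "cmake $base_flags $(pkg_flags "$@") ../cmake"
    echo "cmake --build . && make install"
}

rebuild_commands REAXFF   # REAXFF is only an example of an additional package
```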

Setting PATH

To put both CUDA and the compiled LAMMPS on PATH, add the following lines to ~/.bashrc.

export PATH=/home/user/.local/bin:$PATH
export PATH=/usr/local/cuda/bin:$PATH
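If ~/.bashrc is sourced repeatedly, plain export lines keep adding the same directories. A small guard avoids that. This is a sketch only: path_prepend is a hypothetical helper, and $HOME/.local/bin is assumed to be where make install placed lmp.

```shell
# Prepend a directory to a PATH-style string unless it is already listed.
path_prepend() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;
        *)          printf '%s\n' "$dir${list:+:$list}" ;;
    esac
}

PATH=$(path_prepend "$HOME/.local/bin" "$PATH")
PATH=$(path_prepend /usr/local/cuda/bin "$PATH")
export PATH

# Both commands should resolve once the installs above have succeeded.
command -v lmp  || echo "lmp not found on PATH"
command -v nvcc || echo "nvcc not found on PATH"
```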

Running a GPU calculation

Run in.melt, located in lammps-23Jun2022/examples/melt/, on the GPU.

$mpirun -np 2 lmp -sf gpu -pk gpu 1 -in in.melt

If everything works, output like the following is shown and then the calculation runs.

LAMMPS (23 Jun 2022 - Update 3)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
package gpu 0
package gpu 1
# 3d Lennard-Jones melt

units		lj
atom_style	atomic

lattice		fcc 0.8442
Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
region		box block 0 30 0 30 0 30
create_box	1 box
Created orthogonal box = (0 0 0) to (50.387886 50.387886 50.387886)
  1 by 1 by 2 MPI processor grid
create_atoms	1 box
Created 108000 atoms
  using lattice units in orthogonal box = (0 0 0) to (50.387886 50.387886 50.387886)
  create_atoms CPU = 0.006 seconds
mass		1 1.0

velocity	all create 3.0 87287 loop geom

pair_style	lj/cut 2.5
pair_coeff	1 1 1.0 1.0 2.5

neighbor	0.3 bin
neigh_modify	every 20 delay 0 check no

fix		1 all nve

dump		id all atom 50 dump.melt

#dump		2 all image 25 image.*.jpg type type #		axes yes 0.8 0.02 view 60 -30
#dump_modify	2 pad 3

#dump		3 all movie 25 movie.mpg type type #		axes yes 0.8 0.02 view 60 -30
#dump_modify	3 pad 3

thermo		50
run		100000

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:

- GPU package (short-range, long-range and three-body potentials):

@Article{Brown11,
 author = {W. M. Brown, P. Wang, S. J. Plimpton, A. N. Tharrington},
 title = {Implementing Molecular Dynamics on Hybrid High Performance Computers - Short Range Forces},
 journal = {Comp.~Phys.~Comm.},
 year =    2011,
 volume =  182,
 pages =   {898--911}
}

@Article{Brown12,
 author = {W. M. Brown, A. Kohlmeyer, S. J. Plimpton, A. N. Tharrington},
 title = {Implementing Molecular Dynamics on Hybrid High Performance Computers - Particle-Particle Particle-Mesh},
 journal = {Comp.~Phys.~Comm.},
 year =    2012,
 volume =  183,
 pages =   {449--459}
}

@Article{Brown13,
 author = {W. M. Brown, Y. Masako},
 title = {Implementing Molecular Dynamics on Hybrid High Performance Computers – Three-Body Potentials},
 journal = {Comp.~Phys.~Comm.},
 year =    2013,
 volume =  184,
 pages =   {2785--2793}
}

@Article{Trung15,
 author = {T. D. Nguyen, S. J. Plimpton},
 title = {Accelerating dissipative particle dynamics simulations for soft matter systems},
 journal = {Comput.~Mater.~Sci.},
 year =    2015,
 volume =  100,
 pages =   {173--180}
}

@Article{Trung17,
 author = {T. D. Nguyen},
 title = {GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations},
 journal = {Comp.~Phys.~Comm.},
 year =    2017,
 volume =  212,
 pages =   {113--122}
}

@Article{Nikolskiy19,
 author = {V. Nikolskiy, V. Stegailov},
 title = {GPU acceleration of four-site water models in LAMMPS},
 journal = {Proceeding of the International Conference on Parallel Computing (ParCo 2019), Prague, Czech Republic},
 year =    2019
}

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Generated 0 of 0 mixed pair_coeff terms from geometric mixing rule
Per MPI rank memory allocation (min/avg/max) = 19.08 | 19.08 | 19.08 Mbytes
   Step          Temp          E_pair         E_mol          TotEng         Press     
         0   3             -6.7733675      0             -2.2734092     -3.7027414    
        50   1.6702985     -4.7866625      0             -2.2812379      5.6590718    
       100   1.6543097     -4.7617353      0      
~~~
(The calculation output continues from here.)
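If the run stops with an error instead of producing output like the above, one thing worth checking is whether the lmp binary on PATH was actually built with the GPU package. The following is a rough sketch that assumes lmp -h lists installed packages in uppercase; lists_gpu_package is a hypothetical helper.

```shell
# Succeed when the text on stdin contains GPU as a whole uppercase word.
lists_gpu_package() {
    grep -qw 'GPU'
}

if command -v lmp >/dev/null 2>&1; then
    if lmp -h | lists_gpu_package; then
        echo "this lmp binary mentions the GPU package"
    else
        echo "GPU package not listed; reconfigure with -D PKG_GPU=on"
    fi
else
    echo "lmp not found on PATH"
fi
```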

Comparing CPU and GPU calculations

The in.melt system was enlarged by a factor of 3 in each of x, y, and z, and the number of steps was set to 100,000.
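The enlarged input can be produced from the stock file without editing it by hand. This is a sketch under assumptions: scale_melt_input is a hypothetical helper, the edge length 30 and step count 100000 are the values visible in the log above, and the output name in.melt.large is arbitrary.

```shell
# Rewrite the "region box block" and "run" lines of a melt-style input read from stdin.
# $1 = box edge length in lattice units, $2 = number of steps.
scale_melt_input() {
    sed -E \
        -e "s/^(region[[:space:]]+box[[:space:]]+block).*/\1 0 $1 0 $1 0 $1/" \
        -e "s/^(run[[:space:]]+)[0-9]+/\1$2/"
}

# Run inside examples/melt: keep the original and write a scaled copy.
if [ -f in.melt ]; then
    scale_melt_input 30 100000 < in.melt > in.melt.large
fi
```

With this approach the benchmark commands would take -in in.melt.large instead of -in in.melt.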

CPU only

$mpirun -np 8 lmp -in in.melt

Elapsed time: 17 minutes

With the GPU

$mpirun -np 2 lmp -sf gpu -pk gpu 1 -in in.melt

Elapsed time: 5 minutes

Even with a GPU costing around 50,000 yen, the calculation was substantially faster.

Animation of the simulation results

Addendum
With an RTX A4000, the same calculation finished in 30 seconds. GPUs appear to be effective for calculations with many particles.
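To put the three timings on a common scale, the speedup relative to the CPU-only run can be computed directly. The timings in the article are rounded, so the ratios are rough. speedup is a hypothetical helper.

```shell
# Ratio of a reference time to a new time (both in seconds), to one decimal place.
speedup() {
    awk -v t_ref="$1" -v t_new="$2" 'BEGIN { printf "%.1f\n", t_ref / t_new }'
}

cpu=$((17 * 60))   # CPU only, 8 MPI ranks
gtx=$((5 * 60))    # GTX 1660 Ti
a4000=30           # RTX A4000

echo "GTX 1660 Ti vs CPU: $(speedup "$cpu" "$gtx")x"
echo "RTX A4000 vs CPU:   $(speedup "$cpu" "$a4000")x"
```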
