
FluidX3D Tutorial

Last updated at 2023-08-06, posted at 2023-08-06

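About this article

I tried FluidX3D, an LBM simulation package that has been getting some attention, on my home PC.
These are my notes from running the benchmark and one of the tutorials.

FluidX3D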

The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs via OpenCL. Free for non-commercial use.

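FluidX3D is CFD software based on the LBM (lattice Boltzmann method) that runs fast on GPUs.
The source code is published on GitHub (https://github.com/ProjectPhysX/FluidX3D) and is free to use for non-commercial, non-military purposes.

Environment

I used the following machine: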


CPU	Intel Core i5-13400F
GPU	NVIDIA GeForce RTX 3060 Ti 8GB
RAM	32 GB
OS	Windows 11 Home

Installation

Following the official documentation worked without problems.
You clone the source code from the repository, then compile and run it in Visual Studio.

Download
Download and unzip the source code, or clone with git clone https://github.com/ProjectPhysX/FluidX3D.git.

Compiling the Source Code
There is no "installation" of the FluidX3D software. Instead, you have to compile the source code yourself.
I have made this as easy as possible and this documentation will guide you through it. Nonetheless, some basic programming experience with C++ would be good, as all the setup scripts are written in C++.
First, compile the code as-is; this is the standard FP32 benchmark test case. By default, the fastest installed GPU will be selected automatically. Compile time is about 10 seconds.

Windows
Download and install Visual Studio Community. In Visual Studio Installer, add:

Desktop development with C++

MSVC v142

Windows 10 SDK

Open FluidX3D.sln in Visual Studio Community.

Compile and run by clicking the ► Local Windows Debugger button.

To select a specific GPU, open Windows CMD in the FluidX3D folder (type cmd in File Explorer in the directory field and press Enter), then run bin\FluidX3D.exe 0 to select device 0. You can also select multiple GPUs with bin\FluidX3D.exe 0 1 3 6 if the setup is configured as multi-GPU.

In my case the Windows 11 SDK worked fine in place of the Windows 10 SDK.

Benchmark

Compiling and running with the steps above starts the benchmark.
MLUPs (Million Lattice Updates per Second) appears to be the usual benchmark metric for LBM codes.

The performance of Lattice Boltzmann simulations is typically measured in terms of MLUPS, or "Million Lattice Updates per Second." This means if your lattice has 1 million lattice sites and your simulation runs at a speed of 1 MLUPS, your lattice is updated once per second.

When the run finishes, the peak MLUPs value is printed to the console.

🟢 GeForce RTX 3060 Ti 16.49 8 448 2644 (90%) 5129 (88%) 4718 (81%)

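Compared with the published benchmark result for this GPU (2644 MLUPs, as in the row above), my result looks good.
A CPU manages a few hundred MLUPs at best, so even a mid-range GPU gives a large speedup.
Being able to simulate tens of millions of lattice points in no time on a home PC is exciting.
Also, true to the project's claim to be

The fastest and most memory efficient lattice Boltzmann CFD software

GPU memory usage is very low, which is nice.

During the run, GPU utilization was 99%. The CPU was not used much.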

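Tutorial (radial fan)

Next I worked through the radial fan tutorial, which is explained in a video.
The code needed for the tutorial is already provided; you only have to uncomment it and comment out the parts you do not need.

Preparation

Editing setup.cpp

The settings for a simulation are written in setup.cpp.
First, comment out the benchmark setup.

You should find a section like the one below. Put /* in front of void to comment it out.
Also delete the #ifdef BENCHMARK and #endif // BENCHMARK lines.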

#ifdef BENCHMARK
void main_setup() { // benchmark; required extensions in defines.hpp: BENCHMARK, optionally FP16S or FP16C
	// ################################################################## define simulation box size, viscosity and volume force ###################################################################
	uint mlups = 0u; {

		//LBM lbm( 32u,  32u,  32u, 1.0f);
		//LBM lbm( 64u,  64u,  64u, 1.0f);
		//LBM lbm(128u, 128u, 128u, 1.0f);
		LBM lbm(256u, 256u, 256u, 1.0f); // default
		//LBM lbm(384u, 384u, 384u, 1.0f);
		//LBM lbm(512u, 512u, 512u, 1.0f);

		//const uint memory = 1488u; // memory occupation in MB (for multi-GPU benchmarks: make this close to as large as the GPU's VRAM capacity)
		//const uint3 lbm_N = resolution(float3(1.0f, 1.0f, 1.0f), memory); // input: simulation box aspect ratio and VRAM occupation in MB, output: grid resolution
		//LBM lbm(1u*lbm_N.x, 1u*lbm_N.y, 1u*lbm_N.z, 1u, 1u, 1u, 1.0f); // 1 GPU
		//LBM lbm(2u*lbm_N.x, 1u*lbm_N.y, 1u*lbm_N.z, 2u, 1u, 1u, 1.0f); // 2 GPUs
		//LBM lbm(2u*lbm_N.x, 2u*lbm_N.y, 1u*lbm_N.z, 2u, 2u, 1u, 1.0f); // 4 GPUs
		//LBM lbm(2u*lbm_N.x, 2u*lbm_N.y, 2u*lbm_N.z, 2u, 2u, 2u, 1.0f); // 8 GPUs

		// #########################################################################################################################################################################################
		for(uint i=0u; i<1000u; i++) {
			lbm.run(10u);
			mlups = max(mlups, to_uint((double)lbm.get_N()*1E-6/info.runtime_lbm_timestep_smooth));
		}
	} // make lbm object go out of scope to free its memory
	print_info("Peak MLUPs/s = "+to_string(mlups));
#if defined(_WIN32)
	wait();
#endif // Windows
} /**/
#endif // BENCHMARK

Next, search for the main_setup() of the radial fan tutorial.
Once you find it, uncomment it to enable it.

void main_setup() { // radial fan; required extensions in defines.hpp: FP16S, MOVING_BOUNDARIES, SUBGRID, INTERACTIVE_GRAPHICS or GRAPHICS
	// ################################################################## define simulation box size, viscosity and volume force ###################################################################
	const uint3 lbm_N = resolution(float3(3.0f, 3.0f, 1.0f), 2000u); // input: simulation box aspect ratio and VRAM occupation in MB, output: grid resolution
	const float lbm_Re = 100000.0f;
	const float lbm_u = 0.12f;
	const uint lbm_T = 5000u;
	const uint lbm_dt = 10u;
	LBM lbm(lbm_N, units.nu_from_Re(lbm_Re, (float)lbm_N.x, lbm_u));
	// ###################################################################################### define geometry ######################################################################################
	// const float radius = 0.25f*(float)lbm_N.x;
	const float radius = 0.05f * (float)lbm_N.x;
	const float3 center = float3(lbm.center().x, lbm.center().y, 0.36f*radius);
	const float lbm_omega=lbm_u/radius, lbm_domega=lbm_omega*lbm_dt;
	Mesh* mesh = read_stl(get_exe_path()+"../stl/FAN_Solid_Bottom.stl", lbm.size(), center, 2.0f*radius); // https://www.thingiverse.com/thing:6113/files
	const uint Nx=lbm.get_Nx(), Ny=lbm.get_Ny(), Nz=lbm.get_Nz(); parallel_for(lbm.get_N(), [&](ulong n) { uint x=0u, y=0u, z=0u; lbm.coordinates(n, x, y, z);
		if(x==0u||x==Nx-1u||y==0u||y==Ny-1u||z==0u) lbm.flags[n] = TYPE_S; // all non periodic
	}); // ####################################################################### run simulation, export images and data ##########################################################################
	lbm.graphics.visualization_modes = VIS_FLAG_LATTICE|VIS_FLAG_SURFACE|VIS_Q_CRITERION;
	lbm.run(0u); // initialize simulation
	while(lbm.get_t()<lbm_T) { // main simulation loop
		lbm.voxelize_mesh_on_device(mesh, TYPE_S, center, float3(0.0f), float3(0.0f, 0.0f, lbm_omega));
		lbm.run(lbm_dt);
		mesh->rotate(float3x3(float3(0.0f, 0.0f, 1.0f), lbm_domega)); // rotate mesh
#if defined(GRAPHICS) && !defined(INTERACTIVE_GRAPHICS)
		if(lbm.graphics.next_frame(lbm_T, 30.0f)) {
			lbm.graphics.set_camera_free(float3(0.353512f*(float)Nx, -0.150326f*(float)Ny, 1.643939f*(float)Nz), -25.0f, 61.0f, 100.0f);
			lbm.graphics.write_frame();
		}
#endif // GRAPHICS && !INTERACTIVE_GRAPHICS
	}
	lbm.u.write_device_to_vtk();
} /**/

Editing defines.hpp

Next, edit defines.hpp. This file defines which extensions are compiled in.
The required extensions are listed in the comment in setup.cpp.

required extensions in defines.hpp: FP16S, MOVING_BOUNDARIES, SUBGRID, INTERACTIVE_GRAPHICS or GRAPHICS

FP16S
MOVING_BOUNDARIES
SUBGRID
INTERACTIVE_GRAPHICS or GRAPHICS

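Geometry

Download the required geometry (the STL file of the radial fan) from the site given in the code comment: https://www.thingiverse.com/thing:6113/files

Create an stl folder and copy the downloaded, unzipped file into it. The setup reads the file from ../stl/FAN_Solid_Bottom.stl relative to the executable.

Running

Once everything is prepared, all that is left is to run it.
As with the benchmark, clicking the Local Windows Debugger button in Visual Studio compiles the code and starts the simulation.

With INTERACTIVE_GRAPHICS enabled, the application opens in full screen.
Press P on the keyboard to start the simulation; the results are displayed in real time.
You can change the camera position and so on with the mouse.
By default the Q-criterion should be displayed.
Pressing H shows the help, where you can change which physical quantity is displayed.

With GRAPHICS enabled, images from the running simulation are written to the bin/export folder.
My impression is that GRAPHICS mode puts less load on the GPU.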

Extras

Changing the grid size

In this tutorial the grid size appears to be determined by the amount of memory to use.

const uint3 lbm_N = resolution(float3(3.0f, 3.0f, 1.0f), 1000u); // input: simulation box aspect ratio and VRAM occupation in MB, output: grid resolution

In this line, increasing the memory size (the 1000u part) made the grid correspondingly finer.

Changing the object size

The size of the object changed when I edited the 0.25f part of the following line.

const float radius = 0.25f * (float)lbm_N.x;

Writing a VTK file

Adding the following line after the while loop, that is, after the simulation iterations have finished, wrote the result out as a VTK file.

lbm.u.write_device_to_vtk();

Closing

I plan to experiment a bit more and add to this article later.
