Feeling the difference in GPU "combat power" with CUDA...

Last updated at 2024-05-16, posted at 2016-07-07

Overview

In "Using GeForce cards such as the GTX-1080 on Ubuntu 16.04 LTS with CUDA-8.0RC", I set up an environment for running CUDA-8.0RC on Ubuntu 16.04 LTS.

So now let's check the difference in performance ("combat power").

Back to basics: differences in hardware specs (Turing generation added)

GeForce model	Core clock (MHz)	CUDA cores	Max power (W)	Theoretical performance (clock × cores)	Performance vs 750 Ti
RTX4060	1830	3072	115	5621760	8.6
RTX3070	1500	5888	220	8832000	13.5
TITAN RTX	1350	4608	280	6220800	9.5
RTX2080 Ti	1350	4352	250	5875200	9.0
RTX2080 Super	1605	3072	250	4930560	7.6
RTX2080	1515	2944	215	4460160	6.8
RTX2070 Super	1605	2560	215	4108800	6.3
RTX2070	1410	2304	175	3248640	5.0
RTX2060 Super	1470	2176	175	3198720	4.9
RTX2060	1365	1920	160	2620800	4.0
GTX1660Ti	1500	1536	120	2304000	3.5
GTX1660	1530	1408	120	2154240	3.3
GTX1650Super	1530	1280	100	1958400	3.0
GTX1650	1485	896	75	1330560	2.0
TITAN X	1417	3584	250	5078528	7.8
GTX1080 Ti	1480	3584	250	5304320	8.1
GTX 1080	1607	2560	180	4113920	6.3
GTX 1070	1506	1920	150	2891520	4.4
GTX 1060※	1506	1280	120	1927680	3.0
GTX1050	1000	1024		1024000	1.6
GTX 750 Ti	1020	640	60	652800	1.0
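
The last two columns of the table can be reproduced with simple arithmetic: the theoretical figure is the core clock multiplied by the CUDA core count, and the ratio divides that by the GTX 750 Ti value. A minimal Python sketch using a few rows copied from the table (the dictionary and function names are illustrative, not taken from any library):

```python
# Core clock (MHz) and CUDA core count, copied from the spec table.
SPECS = {
    "RTX4060": (1830, 3072),
    "RTX3070": (1500, 5888),
    "GTX 1080": (1607, 2560),
    "GTX 750 Ti": (1020, 640),
}


def theoretical_performance(clock_mhz, cuda_cores):
    """Rough throughput proxy used in the table: clock x cores."""
    return clock_mhz * cuda_cores


# The GTX 750 Ti is the reference card for the ratio column.
BASELINE = theoretical_performance(*SPECS["GTX 750 Ti"])


def ratio_vs_750ti(name):
    """Theoretical performance of a card relative to the GTX 750 Ti."""
    return theoretical_performance(*SPECS[name]) / BASELINE


for name in SPECS:
    perf = theoretical_performance(*SPECS[name])
    print(f"{name}: {perf} ({ratio_vs_750ti(name):.1f}x)")
```

This proxy ignores per-core architectural differences and memory bandwidth, so it is only a first-order comparison.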

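In short, on paper the GTX 1080 has roughly 6 times the raw compute performance of the GTX 750 Ti.
And since 180 W / 6 = 30 W, compared with the 750 Ti's 60 W, its performance per watt is about twice that of the 750 Ti.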
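
The performance-per-watt estimate can be written out explicitly. The sketch below takes the maximum power and the 750 Ti ratio from the table; the function name is hypothetical. With the table's 6.3 ratio instead of a rounded 6, the result comes out slightly above 2x.

```python
# Max power (W) and theoretical performance ratio vs the GTX 750 Ti, from the table.
GTX_1080 = {"power_w": 180, "ratio": 6.3}
GTX_750TI = {"power_w": 60, "ratio": 1.0}


def watts_per_750ti_unit(card):
    """Power needed per one 'GTX 750 Ti worth' of theoretical performance."""
    return card["power_w"] / card["ratio"]


# How many times more efficient the GTX 1080 is than the GTX 750 Ti.
efficiency_gain = watts_per_750ti_unit(GTX_750TI) / watts_per_750ti_unit(GTX_1080)

print(f"GTX 1080: {watts_per_750ti_unit(GTX_1080):.1f} W per 750 Ti unit")
print(f"Efficiency vs 750 Ti: {efficiency_gain:.1f}x")
```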

There is also the difference in memory size, 8GB (GTX-1080) vs 2GB (GTX-750Ti), so the question is how much that matters.

benchmark

Now let's actually run it.

GTX-750Ti

hidenorly@ubuntu-gtx:~/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody$ ./nbody --benchmark --numbodies=256000 --device=0
..snip..

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "GeForce GTX 750 Ti
> Compute 5.0 CUDA device: [GeForce GTX 750 Ti]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 14209.303 ms
= 46.122 billion interactions per second
= 922.438 single-precision GFLOP/s at 20 flops per interaction
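
The throughput figures in this output appear to follow directly from the body count, the iteration count, and the elapsed time, assuming each iteration evaluates all N × N pairwise interactions and counting 20 flops per interaction as the output states. A small Python check under that assumption (the function name is illustrative):

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction=20):
    """Return (billion interactions/s, GFLOP/s) for an all-pairs N-body run."""
    # All-pairs: every body interacts with every body, once per iteration.
    interactions = num_bodies ** 2 * iterations
    seconds = total_ms / 1000.0
    giga_interactions_per_s = interactions / seconds / 1e9
    return giga_interactions_per_s, giga_interactions_per_s * flops_per_interaction


# Values from the GTX 750 Ti run: 256000 bodies, 10 iterations, 14209.303 ms.
gips, gflops = nbody_throughput(256000, 10, 14209.303)
print(f"{gips:.3f} billion interactions per second")
print(f"{gflops:.3f} single-precision GFLOP/s")
```

The same function can be applied to the other runs by substituting their total times.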

GTX-1080

~/work/cuda/NVIDIA_CUDA-8.0_Samples/5_Simulations/nbody$ ./nbody -benchmark -numbodies=256000 -device=0

..snip..

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "GeForce GTX 1080
> Compute 6.1 CUDA device: [GeForce GTX 1080]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 2401.327 ms
= 272.916 billion interactions per second
= 5458.315 single-precision GFLOP/s at 20 flops per interaction

RTX4060 (native Ubuntu 22.04 LTS, 550 open driver)

$ ./nbody --benchmark --numbodies=256000 --device=0
..snip..
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Ada
> Compute 8.9 CUDA device: [NVIDIA GeForce RTX 4060]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 1566.519 ms
= 418.354 billion interactions per second
= 8367.086 single-precision GFLOP/s at 20 flops per interaction

RTX4060 (WSL2 Ubuntu 22.04) (31.0.15.5186)

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Ada
> Compute 8.9 CUDA device: [NVIDIA GeForce RTX 4060]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 1565.827 ms
= 418.539 billion interactions per second
= 8370.784 single-precision GFLOP/s at 20 flops per interaction

Hmm, the difference is within the margin of error...
In that case, adding more memory and just using WSL2 might be fine...
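
To put a number on how close the two RTX4060 runs are, the total times can be compared directly. A minimal sketch (the helper name is hypothetical, and a single run on each side says nothing about run-to-run variance):

```python
native_ms = 1566.519  # RTX4060, native Ubuntu 22.04 run
wsl2_ms = 1565.827    # RTX4060, WSL2 Ubuntu 22.04 run


def relative_difference_percent(a, b):
    """Absolute difference between a and b as a percentage of their mean."""
    return abs(a - b) / ((a + b) / 2) * 100


diff = relative_difference_percent(native_ms, wsl2_ms)
print(f"native vs WSL2: {diff:.3f}% difference in total time")
```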

RTX3070 (cuda-12.4.1/550 driver on ubuntu 22.04)

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
gpuDeviceInit() CUDA Device [0]: "Ampere
> Compute 8.6 CUDA device: [NVIDIA GeForce RTX 3070]
number of bodies = 256000
256000 bodies, total time for 10 iterations: 1182.832 ms
= 554.060 billion interactions per second
= 11081.199 single-precision GFLOP/s at 20 flops per interaction

Results

The ratio comes out cleanly: 5458.315 / 922.438 ≈ 5.917, so the difference in raw compute capability shows up almost exactly in the benchmark as well.
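
The same comparison can be made for every card that was benchmarked. The sketch below divides each measured GFLOP/s figure by the GTX 750 Ti result and prints it next to the theoretical ratio from the spec table; the names are illustrative, and the RTX4060 entry uses the native Ubuntu run.

```python
BASELINE_GFLOPS = 922.438  # GTX 750 Ti nbody result

# name: (measured GFLOP/s from the nbody runs, theoretical ratio from the spec table)
RESULTS = {
    "GTX 1080": (5458.315, 6.3),
    "RTX4060": (8367.086, 8.6),
    "RTX3070": (11081.199, 13.5),
}


def measured_ratio(gflops):
    """Measured nbody throughput relative to the GTX 750 Ti."""
    return gflops / BASELINE_GFLOPS


for name, (gflops, theoretical) in RESULTS.items():
    print(f"{name}: measured {measured_ratio(gflops):.2f}x, theoretical {theoretical}x")
```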

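That is roughly 6 times the performance, but at the moment the GTX-1080 costs about 10 times as much as the GTX-750Ti, so it comes down to weighing the two.
(There is also a result showing that even the 750Ti runs word2vec on Chainer at least 10 times faster than an i7-870 CPU alone. For an investment of only about 12,000 yen...)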
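
The trade-off can be expressed as performance per unit of price. This is a rough sketch using only the approximate figures quoted in the text (a 12,000 yen GTX 750 Ti and a 10x price ratio); the function name is hypothetical and actual prices will differ.

```python
PRICE_750TI_YEN = 12000             # approximate price quoted in the text
PRICE_RATIO_1080 = 10               # GTX 1080 costs roughly 10x (from the text)
PERF_RATIO_1080 = 5458.315 / 922.438  # measured nbody ratio over the 750 Ti


def perf_per_price(perf_ratio, price_ratio):
    """Performance per unit of money, relative to the GTX 750 Ti (= 1.0)."""
    return perf_ratio / price_ratio


value = perf_per_price(PERF_RATIO_1080, PRICE_RATIO_1080)
print(f"Implied GTX 1080 price: about {PRICE_750TI_YEN * PRICE_RATIO_1080} yen")
print(f"GTX 1080 performance per yen vs 750 Ti: {value:.2f}x")
```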

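Note that the 750Ti only reaches about 20 fps in 3DMark2016, so ideally I would want about three times that performance. It may therefore make sense to wait until cards like the GTX-1060 are released and prices have settled before buying.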

I will also run word2vec and similar workloads later and compare the performance separately.
