Measured with mixbench-opencl-alt.
The driver is amdgpu-pro 21.10.
The card is power-limited to 110 W.
------------------------ Device specifications ------------------------
Platform: AMD Accelerated Parallel Processing
Device: gfx1030/Advanced Micro Devices, Inc.
Driver version: 3246.0 (HSA1.1,LC)
Address bits: 64
GPU clock rate: 2475 MHz
Total global mem: 16368 MB
Max allowed buffer: 13912 MB
OpenCL version: OpenCL 2.0
Total CUs: 30
-----------------------------------------------------------------------
Buffer size: 64MB
Workgroup size: 256
Workitem stride: NDRange
Buffer allocation: Device allocated
Timer: CL event based
Loading kernel source file...
Precompilation of kernels... [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]
----------------------------------------------------------------------------- CSV data -----------------------------------------------------------------------------
Experiment ID, Single Precision ops,,,, Double precision ops,,,, Half precision ops,,,, Integer operations,,,
Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
0, 0.000, 13.96, 0.00, 615.13, 0.000, 11.43, 0.00,1502.86, 0.000, 5.44, 0.00,1579.14, 0.000, 5.47, 0.00,1571.27
1, 0.129, 5.33, 201.29,1560.02, 0.065, 10.94, 98.18,1521.78, 0.258, 5.31, 404.30,1566.68, 0.129, 5.40, 198.98,1542.09
2, 0.267, 5.16, 416.37,1561.39, 0.133, 10.81, 198.73,1490.49, 0.533, 5.19, 827.63,1551.81, 0.267, 5.18, 414.68,1555.06
3, 0.414, 5.01, 642.34,1552.32, 0.207, 11.05, 291.44,1408.64, 0.828, 5.76, 1119.03,1352.16, 0.414, 6.47, 498.01,1203.52
4, 0.571, 6.70, 640.77,1121.34, 0.286, 15.73, 273.05, 955.69, 1.143, 7.75, 1108.18, 969.66, 0.571, 8.28, 519.02, 908.29
5, 0.741, 8.35, 642.99, 868.04, 0.370, 17.91, 299.70, 809.20, 1.481, 7.77, 1382.25, 933.02, 0.741, 7.93, 676.93, 913.86
6, 0.923, 7.75, 831.37, 900.66, 0.462, 17.86, 360.63, 781.36, 1.846, 6.52, 1976.21,1070.45, 0.923, 7.13, 903.81, 979.13
7, 1.120, 6.43, 1168.23,1043.06, 0.560, 17.57, 427.73, 763.81, 2.240, 5.55, 2710.57,1210.08, 1.120, 6.88, 1091.77, 974.80
8, 1.333, 5.60, 1533.72,1150.29, 0.667, 17.81, 482.33, 723.50, 2.667, 4.81, 3568.66,1338.25, 1.333, 6.83, 1257.79, 943.34
9, 1.565, 4.89, 1974.78,1261.67, 0.783, 18.48, 522.81, 668.04, 3.130, 4.34, 4454.28,1422.90, 1.565, 6.90, 1400.32, 894.65
10, 1.818, 4.50, 2386.68,1312.67, 0.909, 18.67, 574.99, 632.49, 3.636, 3.78, 5680.22,1562.06, 1.818, 6.68, 1606.32, 883.47
11, 2.095, 3.87, 3048.26,1454.85, 1.048, 18.79, 628.54, 599.97, 4.190, 3.49, 6766.03,1614.62, 2.095, 6.73, 1753.91, 837.09
12, 2.400, 3.44, 3745.10,1560.46, 1.200, 19.11, 674.14, 561.78, 4.800, 3.07, 8394.99,1748.96, 2.400, 6.66, 1934.51, 806.05
13, 2.737, 3.08, 4528.11,1654.50, 1.368, 19.57, 713.24, 521.21, 5.474, 2.90, 9630.17,1759.36, 2.737, 6.98, 2000.44, 730.93
14, 3.111, 2.87, 5235.32,1682.78, 1.556, 20.63, 728.70, 468.45, 6.222, 2.66,11301.12,1816.25, 3.111, 7.13, 2107.39, 677.37
15, 3.529, 2.65, 6071.15,1720.16, 1.765, 20.66, 779.73, 441.85, 7.059, 2.37,13616.44,1929.00, 3.529, 7.09, 2272.49, 643.87
16, 4.000, 2.90, 5920.46,1480.11, 2.000, 21.14, 812.73, 406.36, 8.000, 2.88,11922.27,1490.28, 4.000, 7.31, 2350.36, 587.59
17, 4.533, 2.62, 6975.72,1538.76, 2.267, 22.83, 799.57, 352.75, 9.067, 2.60,14052.84,1549.95, 4.533, 7.94, 2299.32, 507.20
18, 5.143, 2.64, 7327.81,1424.85, 2.571, 24.21, 798.26, 310.43, 10.286, 2.46,15709.39,1527.30, 5.143, 8.08, 2393.24, 465.35
19, 5.846, 2.51, 8114.83,1388.06, 2.923, 24.91, 818.89, 280.15, 11.692, 2.29,17850.86,1526.72, 5.846, 8.41, 2426.88, 415.12
20, 6.667, 2.53, 8489.78,1273.47, 3.333, 25.71, 835.37, 250.61, 13.333, 2.24,19138.71,1435.40, 6.667, 8.55, 2510.64, 376.60
21, 7.636, 2.49, 9065.66,1187.17, 3.818, 26.47, 851.83, 223.10, 15.273, 2.19,20588.88,1348.08, 7.636, 8.89, 2536.79, 332.20
22, 8.800, 2.52, 9367.35,1064.47, 4.400, 27.58, 856.36, 194.63, 17.600, 2.18,21673.79,1231.47, 8.800, 9.12, 2589.57, 294.27
23, 10.222, 2.50, 9869.85, 965.53, 5.111, 28.02, 881.31, 172.43, 20.444, 2.12,23246.81,1137.07, 10.222, 9.27, 2663.90, 260.60
24, 12.000, 2.47,10445.93, 870.49, 6.000, 28.80, 894.67, 149.11, 24.000, 2.11,24469.46,1019.56, 12.000, 9.54, 2701.34, 225.11
25, 14.286, 2.49,10795.75, 755.70, 7.143, 29.87, 898.83, 125.84, 28.571, 2.10,25515.66, 893.05, 14.286, 9.74, 2755.61, 192.89
26, 17.333, 2.47,11284.22, 651.01, 8.667, 30.23, 923.43, 106.55, 34.667, 2.10,26534.79, 765.43, 17.333, 9.94, 2809.87, 162.11
27, 21.600, 2.46,11761.40, 544.51, 10.800, 31.10, 932.17, 86.31, 43.200, 2.11,27544.35, 637.60, 21.600, 10.16, 2854.21, 132.14
28, 28.000, 2.44,12325.21, 440.19, 14.000, 31.58, 952.09, 68.01, 56.000, 2.08,28951.26, 516.99, 28.000, 10.25, 2933.12, 104.75
29, 38.667, 2.41,12929.93, 334.39, 19.333, 32.16, 968.14, 50.08, 77.333, 2.10,29604.93, 382.82, 38.667, 10.56, 2949.59, 76.28
30, 60.000, 2.42,13288.83, 221.48, 30.000, 33.06, 974.46, 32.48, 120.000, 2.09,30880.64, 257.34, 60.000, 10.77, 2991.72, 49.86
31, 124.000, 2.39,13922.48, 112.28, 62.000, 33.13, 1004.82, 16.21, 248.000, 2.02,33004.11, 133.08, 124.000, 10.41, 3197.87, 25.79
32, inf, 2.19,15667.59, 0.00, inf, 32.10, 1070.35, 0.00, inf, 2.01,34248.07, 0.00, inf, 10.24, 3355.28, 0.00
-----------------------------------
Voila! fp64 came out at about 1 TFlops, roughly matching the spec.
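As a small aid for reading the CSV block, here is a minimal sketch that pulls the peak throughput per precision out of mixbench output. The function names `parse_mixbench_rows` and `peak` are hypothetical helpers, not part of mixbench, and the sketch assumes the 17-column row layout printed above (iteration count, then four groups of ops/byte, ex.time, Gops, GB/sec). The sample rows are copied from the RX6800 run.

```python
def parse_mixbench_rows(text):
    """Parse mixbench CSV data rows into a list of dicts.

    Assumes each data row has 17 comma-separated fields: the iteration
    count, then four groups (SP, DP, HP, INT) of
    (ops/byte, ex.time, Gops, GB/sec).
    """
    groups = ("sp", "dp", "hp", "int")
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        # Skip headers, separators and device info lines.
        if len(fields) != 17 or not fields[0].isdigit():
            continue
        values = [float(f) for f in fields[1:]]  # float() accepts "inf"
        row = {"iters": int(fields[0])}
        for i, name in enumerate(groups):
            intensity, ex_time, gops, gbps = values[4 * i:4 * i + 4]
            row[name] = {
                "intensity": intensity,
                "ex_time": ex_time,
                "gops": gops,
                "gbps": gbps,
            }
        rows.append(row)
    return rows


def peak(rows, name, key):
    """Return the maximum of one metric (e.g. 'gops') for one precision."""
    return max(row[name][key] for row in rows)


# Three rows copied verbatim from the RX6800 table (iterations 0, 31, 32).
SAMPLE = """\
Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
0, 0.000, 13.96, 0.00, 615.13, 0.000, 11.43, 0.00,1502.86, 0.000, 5.44, 0.00,1579.14, 0.000, 5.47, 0.00,1571.27
31, 124.000, 2.39,13922.48, 112.28, 62.000, 33.13, 1004.82, 16.21, 248.000, 2.02,33004.11, 133.08, 124.000, 10.41, 3197.87, 25.79
32, inf, 2.19,15667.59, 0.00, inf, 32.10, 1070.35, 0.00, inf, 2.01,34248.07, 0.00, inf, 10.24, 3355.28, 0.00
"""

rows = parse_mixbench_rows(SAMPLE)
for name in ("sp", "dp", "hp", "int"):
    print(name, "peak Gops:", peak(rows, name, "gops"))
```

Feeding the whole table instead of `SAMPLE` works the same way, since non-data lines are skipped.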
For reference, here is the Radeon VII (power-limited to 145 W).
------------------------ Device specifications ------------------------
Platform: AMD Accelerated Parallel Processing
Device: gfx906:sramecc-:xnack-/Advanced Micro Devices, Inc.
Driver version: 3246.0 (HSA1.1,LC)
Address bits: 64
GPU clock rate: 1801 MHz
Total global mem: 16368 MB
Max allowed buffer: 13912 MB
OpenCL version: OpenCL 2.0
Total CUs: 60
-----------------------------------------------------------------------
Buffer size: 64MB
Workgroup size: 256
Workitem stride: NDRange
Buffer allocation: Device allocated
Timer: CL event based
Loading kernel source file...
Precompilation of kernels... [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]
----------------------------------------------------------------------------- CSV data -----------------------------------------------------------------------------
Experiment ID, Single Precision ops,,,, Double precision ops,,,, Half precision ops,,,, Integer operations,,,
Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
0, 0.000, 8.83, 0.00, 973.01, 0.000, 18.44, 0.00, 931.88, 0.000, 8.61, 0.00, 997.36, 0.000, 8.56, 0.00,1003.78
1, 0.129, 10.73, 100.07, 775.51, 0.065, 20.06, 53.53, 829.79, 0.258, 10.58, 202.90, 786.25, 0.129, 10.31, 104.17, 807.31
2, 0.267, 10.33, 207.98, 779.92, 0.133, 19.69, 109.07, 818.02, 0.533, 10.19, 421.36, 790.04, 0.267, 10.30, 208.59, 782.21
3, 0.414, 9.67, 333.28, 805.43, 0.207, 19.11, 168.61, 814.93, 0.828, 9.25, 696.16, 841.19, 0.414, 9.56, 337.09, 814.63
4, 0.571, 9.12, 470.81, 823.92, 0.286, 18.47, 232.58, 814.04, 1.143, 9.18, 935.43, 818.50, 0.571, 9.35, 459.40, 803.94
5, 0.741, 9.07, 592.11, 799.35, 0.370, 17.77, 302.16, 815.82, 1.481, 8.97, 1196.79, 807.83, 0.741, 9.18, 584.81, 789.50
6, 0.923, 8.75, 736.02, 797.35, 0.462, 17.10, 376.81, 816.42, 1.846, 8.74, 1474.19, 798.52, 0.923, 8.67, 742.65, 804.54
7, 1.120, 8.16, 920.79, 822.14, 0.560, 16.42, 457.75, 817.41, 2.240, 8.01, 1877.36, 838.11, 1.120, 8.33, 902.81, 806.08
8, 1.333, 7.83, 1096.50, 822.38, 0.667, 16.00, 536.98, 805.47, 2.667, 7.79, 2204.76, 826.79, 1.333, 7.90, 1087.95, 815.96
9, 1.565, 7.39, 1306.89, 834.96, 0.783, 15.16, 637.27, 814.29, 3.130, 7.45, 2595.75, 829.20, 1.565, 7.63, 1266.68, 809.27
10, 1.818, 6.96, 1542.67, 848.47, 0.909, 14.91, 720.13, 792.14, 3.636, 6.89, 3115.92, 856.88, 1.818, 7.21, 1488.90, 818.89
11, 2.095, 6.64, 1778.07, 848.62, 1.048, 13.74, 859.89, 820.81, 4.190, 6.55, 3604.49, 860.16, 2.095, 7.04, 1678.91, 801.30
12, 2.400, 6.16, 2091.44, 871.44, 1.200, 13.64, 944.55, 787.13, 4.800, 6.16, 4182.78, 871.41, 2.400, 6.66, 1935.79, 806.58
13, 2.737, 5.68, 2455.80, 897.31, 1.368, 13.19, 1058.53, 773.54, 5.474, 5.60, 4987.12, 911.11, 2.737, 6.32, 2208.66, 807.01
14, 3.111, 5.41, 2779.68, 893.47, 1.556, 12.64, 1189.55, 764.71, 6.222, 5.53, 5432.70, 873.11, 3.111, 5.81, 2586.10, 831.25
15, 3.529, 5.04, 3192.76, 904.61, 1.765, 11.92, 1351.14, 765.65, 7.059, 4.94, 6520.14, 923.69, 3.529, 5.61, 2868.90, 812.86
16, 4.000, 5.70, 3014.71, 753.68, 2.000, 12.97, 1324.26, 662.13, 8.000, 5.75, 5973.24, 746.65, 4.000, 6.24, 2752.28, 688.07
17, 4.533, 5.36, 3404.23, 750.93, 2.267, 12.20, 1496.38, 660.17, 9.067, 5.40, 6766.67, 746.32, 4.533, 6.20, 2945.29, 649.70
18, 5.143, 5.21, 3709.08, 721.21, 2.571, 10.64, 1816.39, 706.37, 10.286, 5.05, 7651.69, 743.91, 5.143, 6.30, 3067.77, 596.51
19, 5.846, 4.60, 4432.46, 758.18, 2.923, 10.35, 1971.64, 674.51, 11.692, 4.63, 8804.32, 753.00, 5.846, 6.61, 3085.03, 527.70
20, 6.667, 4.39, 4890.68, 733.60, 3.333, 9.63, 2229.93, 668.98, 13.333, 4.35, 9862.94, 739.72, 6.667, 6.82, 3148.89, 472.33
21, 7.636, 3.98, 5672.42, 742.82, 3.818, 9.39, 2401.76, 629.03, 15.273, 3.95,11417.00, 747.54, 7.636, 7.22, 3123.56, 409.04
22, 8.800, 3.85, 6141.54, 697.90, 4.400, 9.69, 2437.57, 553.99, 17.600, 3.64,12985.88, 737.83, 8.800, 7.51, 3145.15, 357.40
23, 10.222, 3.27, 7542.01, 737.81, 5.111, 9.56, 2582.83, 505.34, 20.444, 3.45,14305.98, 699.75, 10.222, 7.77, 3178.35, 310.93
24, 12.000, 3.26, 7897.71, 658.14, 6.000, 9.85, 2616.23, 436.04, 24.000, 3.29,15662.54, 652.61, 12.000, 8.09, 3184.35, 265.36
25, 14.286, 3.03, 8855.10, 619.86, 7.143, 9.90, 2711.28, 379.58, 28.571, 2.98,18038.67, 631.35, 14.286, 8.02, 3345.00, 234.15
26, 17.333, 3.01, 9262.10, 534.35, 8.667, 9.74, 2865.25, 330.61, 34.667, 2.97,18803.69, 542.41, 17.333, 8.21, 3400.69, 196.19
27, 21.600, 2.97, 9771.35, 452.38, 10.800, 9.87, 2938.25, 272.06, 43.200, 2.81,20668.48, 478.44, 21.600, 8.37, 3461.92, 160.27
28, 28.000, 2.87,10457.53, 373.48, 14.000, 10.11, 2972.97, 212.36, 56.000, 2.94,20478.55, 365.69, 28.000, 8.54, 3518.81, 125.67
29, 38.667, 2.82,11030.53, 285.27, 19.333, 10.10, 3083.54, 159.49, 77.333, 2.88,21637.94, 279.80, 38.667, 8.75, 3559.17, 92.05
30, 60.000, 2.83,11379.93, 189.67, 30.000, 10.30, 3128.70, 104.29, 120.000, 2.84,22648.49, 188.74, 60.000, 8.94, 3603.47, 60.06
31, 124.000, 2.76,12069.69, 97.34, 62.000, 10.35, 3216.99, 51.89, 248.000, 2.82,23644.16, 95.34, 124.000, 9.09, 3660.30, 29.52
32, inf, 2.76,12441.71, 0.00, inf, 10.54, 3260.61, 0.00, inf, 2.70,25497.97, 0.00, inf, 9.09, 3780.10, 0.00
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
fp64 reaches 3.2 TF, impressive as expected.
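Since the two cards ran under different power limits, a rough way to compare them is peak fp64 GFLOPS divided by the power cap. This is only a sketch: the cap (110 W and 145 W, from this post) is an upper bound, not measured power draw, so the real efficiency may be somewhat higher. The helper name `gflops_per_watt` is hypothetical.

```python
def gflops_per_watt(gflops, power_cap_w):
    """Peak throughput divided by the configured power cap (not measured draw)."""
    return gflops / power_cap_w


# Peak fp64 GFLOPS taken from the last row (iteration 32) of each table.
RX6800_FP64, RX6800_CAP_W = 1070.35, 110
RADEON_VII_FP64, RADEON_VII_CAP_W = 3260.61, 145

rx6800_eff = gflops_per_watt(RX6800_FP64, RX6800_CAP_W)
radeon_vii_eff = gflops_per_watt(RADEON_VII_FP64, RADEON_VII_CAP_W)
speedup = RADEON_VII_FP64 / RX6800_FP64

print(f"RX6800     : {rx6800_eff:.2f} GFLOPS/W (fp64, vs power cap)")
print(f"Radeon VII : {radeon_vii_eff:.2f} GFLOPS/W (fp64, vs power cap)")
print(f"Radeon VII / RX6800 fp64 peak: {speedup:.2f}x")
```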
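On its own the RX6800 falls short of cards like the Radeon VII and Titan V, but it has 16 GB of memory, so if you want Vulkan ray tracing and double precision at the same time, lining up several cards (as needed) looks like a good option (at ETH 60 MH/s @ 120 W, mining can also pay back the cost).
That said, it no longer seems to be obtainable, so the choice would probably be the RX6900XT (1.5 TF).
With amdgpu-pro 21.10 it finally feels like Navi2 can be run stably on Linux (plus beta support for Vulkan ray tracing), so double-precision computing on Navi2 may well become practical.
(Unlike with the GeForce EULA, it is nice that you can run it freely in a data center.)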
Other notes
- RX6700XT: the spec is 0.8 TF (800 GFlops)
- Notes on the double-precision performance of the A6000 and RTX 3090: https://qiita.com/syoyo/items/46b9c7a890153d64acf6
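The spec figures quoted in this post (about 1 TF for the RX6800, 1.5 TF for the RX6900XT, 0.8 TF for the RX6700XT) are consistent with the usual peak formula: shaders x 2 flops per FMA x clock x fp64 rate. The sketch below is illustrative only. The shader counts, boost clocks, and fp64 rates (1:16 for Navi2, 1:4 for Radeon VII) are assumptions taken from public spec sheets, not values reported by the runs above, and the function name `peak_fp64_gflops` is hypothetical.

```python
def peak_fp64_gflops(shaders, boost_mhz, fp64_ratio):
    """Theoretical peak: each shader retires one FMA (2 flops) per clock
    at fp32, scaled by the fp64:fp32 rate."""
    fp32_gflops = shaders * 2 * boost_mhz / 1000.0
    return fp32_gflops * fp64_ratio


# ASSUMED spec-sheet values (shaders, boost clock in MHz, fp64:fp32 rate).
CARDS = {
    "RX6800":     (3840, 2105, 1 / 16),
    "RX6900XT":   (5120, 2250, 1 / 16),
    "RX6700XT":   (2560, 2581, 1 / 16),
    "Radeon VII": (3840, 1750, 1 / 4),
}

theoretical = {name: peak_fp64_gflops(*spec) for name, spec in CARDS.items()}
for name, gflops in theoretical.items():
    print(f"{name:10s}: {gflops:7.1f} GFLOPS fp64 (theoretical)")
```

Measured peaks can land slightly above or below these numbers because actual clocks under a power limit differ from the nominal boost clock.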