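I use an M1 Max MacBook for my day job, which has nothing to do with IT, and I wanted to know how it performs on machine learning compared with NVIDIA's latest consumer GPU, so I ran a test.
The code is the script given in step 4 of this page: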
https://developer.apple.com/metal/tensorflow-plugin/
*M1 Max 10-core CPU / 32-core GPU
-Without TensorFlow plugin
Epoch 1/5
2023-08-12 14:07:20.945626: W tensorflow/tsl/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
782/782 [==============================] - 348s 443ms/step - loss: 4.7625 - accuracy: 0.0782
Epoch 2/5
782/782 [==============================] - 345s 441ms/step - loss: 4.2499 - accuracy: 0.1255
Epoch 3/5
782/782 [==============================] - 348s 445ms/step - loss: 3.9645 - accuracy: 0.1518
Epoch 4/5
782/782 [==============================] - 359s 459ms/step - loss: 3.5721 - accuracy: 0.1895
Epoch 5/5
782/782 [==============================] - 367s 469ms/step - loss: 3.3477 - accuracy: 0.2222
CPU times: user 1h 23min 53s, sys: 15min 9s, total: 1h 39min 3s
Wall time: 29min 41s
-With TensorFlow plugin
Epoch 1/5
2023-08-12 14:40:22.793065: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:114] Plugin optimizer for device_type GPU is enabled.
782/782 [==============================] - 61s 65ms/step - loss: 4.8102 - accuracy: 0.0699
Epoch 2/5
782/782 [==============================] - 49s 63ms/step - loss: 4.4811 - accuracy: 0.0876
Epoch 3/5
782/782 [==============================] - 49s 63ms/step - loss: 4.2097 - accuracy: 0.1066
Epoch 4/5
782/782 [==============================] - 49s 63ms/step - loss: 4.2189 - accuracy: 0.0958
Epoch 5/5
782/782 [==============================] - 49s 62ms/step - loss: 3.7773 - accuracy: 0.1462
CPU times: user 4min 10s, sys: 54.9 s, total: 5min 5s
Wall time: 4min 25s
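
To put a number on the two M1 Max runs above, here is a minimal sketch that converts the two `Wall time` lines into seconds and divides them. The `parse_duration` helper is a name I made up for illustration; it only handles the `h` / `min` / `s` pieces that appear in these logs.

```python
import re

def parse_duration(text: str) -> int:
    """Convert a duration such as '1h 23min 53s' or '4min 25s' to whole seconds.

    Hypothetical helper: only whole-number h/min/s fields are supported.
    """
    units = {"h": 3600, "min": 60, "s": 1}
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(h|min|s)\b", text):
        total += int(value) * units[unit]
    return total

# Wall times copied from the two M1 Max logs above.
cpu_wall = parse_duration("29min 41s")  # without the plugin (CPU only)
gpu_wall = parse_duration("4min 25s")   # with the plugin (GPU)

speedup = cpu_wall / gpu_wall
print(f"CPU only: {cpu_wall}s, with plugin: {gpu_wall}s, speedup: {speedup:.1f}x")
```

On these logs the plugin run comes out a little under 7 times faster in wall-clock time, which matches the per-step figures (roughly 440 to 470 ms/step versus 62 to 65 ms/step).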
*Kaggle T4x2 (for reference)
Epoch 1/5
782/782 [==============================] - 84s 43ms/step - loss: 5.0559 - accuracy: 0.0460
Epoch 2/5
782/782 [==============================] - 33s 42ms/step - loss: 4.3158 - accuracy: 0.0787
Epoch 3/5
782/782 [==============================] - 33s 42ms/step - loss: 4.1034 - accuracy: 0.1102
Epoch 4/5
782/782 [==============================] - 34s 43ms/step - loss: 4.0032 - accuracy: 0.1279
Epoch 5/5
782/782 [==============================] - 33s 42ms/step - loss: 3.7757 - accuracy: 0.1461
*Kaggle TPU (for reference)
Epoch 1/5
2023-08-12 07:10:31.223489: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node AssignAddVariableOp.
2023-08-12 07:10:31.999218: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:954] model_pruner failed: INVALID_ARGUMENT: Graph does not contain terminal node AssignAddVariableOp.
782/782 [==============================] - 56s 36ms/step - loss: 4.9752 - accuracy: 0.0451
Epoch 2/5
782/782 [==============================] - 28s 36ms/step - loss: 4.6078 - accuracy: 0.0649
Epoch 3/5
782/782 [==============================] - 28s 36ms/step - loss: 4.8404 - accuracy: 0.0354
Epoch 4/5
782/782 [==============================] - 28s 36ms/step - loss: nan - accuracy: 0.0224
Epoch 5/5
782/782 [==============================] - 28s 36ms/step - loss: nan - accuracy: 0.0100
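
For a rough side-by-side of the three accelerator logs above, the sketch below pulls the `ms/step` figure out of each Keras progress line and averages epochs 2 to 5. Skipping epoch 1 is my own choice, since the first epoch includes warm-up in every run. The log strings are trimmed copies of the lines above (progress bar, loss and accuracy removed), and `steady_state_ms` is a hypothetical helper name. Note that the TPU run's loss went to `nan` from epoch 4, so its figure says something about throughput only, not about a healthy training run.

```python
import re
from statistics import mean

# Matches the timing part of a Keras progress line, e.g. "- 49s 63ms/step".
STEP_RE = re.compile(r"- (\d+)s (\d+)ms/step")

def steady_state_ms(log: str) -> float:
    """Mean ms/step over all epochs except the first (warm-up) one."""
    steps = [int(m.group(2)) for m in STEP_RE.finditer(log)]
    return mean(steps[1:])

# Timing fields copied from the logs above; everything else is trimmed.
LOGS = {
    "M1 Max (with plugin)": """
        782/782 - 61s 65ms/step
        782/782 - 49s 63ms/step
        782/782 - 49s 63ms/step
        782/782 - 49s 63ms/step
        782/782 - 49s 62ms/step
    """,
    "Kaggle T4x2": """
        782/782 - 84s 43ms/step
        782/782 - 33s 42ms/step
        782/782 - 33s 42ms/step
        782/782 - 34s 43ms/step
        782/782 - 33s 42ms/step
    """,
    "Kaggle TPU": """
        782/782 - 56s 36ms/step
        782/782 - 28s 36ms/step
        782/782 - 28s 36ms/step
        782/782 - 28s 36ms/step
        782/782 - 28s 36ms/step
    """,
}

results = {name: steady_state_ms(log) for name, log in LOGS.items()}
baseline = results["M1 Max (with plugin)"]

for name, ms in results.items():
    # A ratio above 1 means the device is faster per step than the M1 Max GPU.
    print(f"{name}: {ms:.2f} ms/step, {baseline / ms:.2f}x vs M1 Max")
```

By this measure the T4x2 is about 1.5 times and the TPU about 1.7 times faster per step than the M1 Max GPU.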
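I call this a comparison, but I don't own a 4090 myself, so I looked for someone on Kaggle willing to help and received the following reply.
The upshot is that the 4090 is roughly three times as fast as the M1 Max. Results will of course vary a lot with the workload, but this is what you get with the code Apple itself publishes as the sample for its GPU plugin. The gap may narrow with the M2/M3 generation, but those chips are not supposed to be twice as fast as the M1, so it looks like a tough race for Apple.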