Metal Basics Explained - The Heart of the Apple GPU

Last updated at 2025-11-27, posted at 2025-11-27

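Series: A Complete Guide to the Apple Silicon AI Technology Stack
Difficulty: ★★★☆☆ (Intermediate)
Intended readers: People who want to learn the basics of GPU programming, and people who need extreme optimization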

TL;DR

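Metal is Apple's GPU programming API (the Apple counterpart to OpenGL/CUDA)
It supports both graphics and general-purpose computing (GPGPU)
It is the foundation of Apple's entire machine learning stack
You rarely use it directly, but knowing how it works deepens your understanding of optimization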

What is Metal?

So far in this series we have looked at MPS, MLX, and MPSGraph. Underneath all of them runs Metal.

Apple's official site describes it this way:

"Metal is a modern, tightly integrated graphics and compute API coupled with a powerful shading language designed so you can take full advantage of Apple silicon. The low-overhead model gives you direct control over each task the GPU performs, enabling you to maximize the efficiency of your graphics and compute software."

Source: Apple Developer - Metal

Put simply, Metal is Apple's counterpart to CUDA, DirectX, and Vulkan.

The history of Metal: graduating from OpenGL

Metal debuted in 2014 alongside iOS 8. According to Wikipedia:

"Metal is a low-level, low-overhead hardware-accelerated 3D graphic and compute shader API created by Apple, debuting in iOS 8. Metal combines functions similar to OpenGL and OpenCL in one API."

Source: Wikipedia - Metal (API)

Why Metal was needed

OpenGL is cross-platform, but it had problems:

Abstraction overhead - the price of a general-purpose API
Aging design - it dates from 1992 and is not optimized for modern GPUs
Driver complexity - behavior depends on each vendor's implementation

Because Metal targets only Apple platforms, it can be optimized closely for the hardware.

The evolution of Metal

Version	Year	Main features
Metal 1	2014	Debuted with iOS 8
Metal 2	2017	GPU-driven pipelines
Metal 3	2022	MetalFX, mesh shaders
Metal 4	2025	Deeper ML integration, TensorOps

Metal Shading Language: the language of GPU programming

To actually run code on the GPU you need a shading language. Metal has its own, MSL (Metal Shading Language):

"Metal is an object-oriented API that can be invoked using the Swift, Objective-C or C++17 programming languages. Full-blown GPU execution is controlled via the Metal Shading Language."

Source: Wikipedia - Metal (API)

MSL is based on C++14, so it is relatively approachable for anyone used to C++.

A simple MSL example (element-wise addition of two arrays)

#include <metal_stdlib>
using namespace metal;

kernel void add_arrays(
    device const float* a [[buffer(0)]],
    device const float* b [[buffer(1)]],
    device float* result [[buffer(2)]],
    uint index [[thread_position_in_grid]]
) {
    result[index] = a[index] + b[index];
}

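kernel - marks a function that runs in parallel on the GPU
device - data in the GPU-accessible device address space
[[buffer(N)]] - the buffer binding point (the index passed to setBuffer on the host side)
[[thread_position_in_grid]] - the index of the current thread within the grid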
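
To experiment with this kernel outside an Xcode project, one option is to compile the MSL source at runtime rather than loading a prebuilt library. The following is a minimal sketch under that assumption; the helper name `makeAddArraysPipeline` is hypothetical, and it only builds the pipeline and prints two of its properties.

```swift
import Metal

// MSL source for the add_arrays kernel, embedded as a string.
let addArraysSource = """
#include <metal_stdlib>
using namespace metal;

kernel void add_arrays(
    device const float* a [[buffer(0)]],
    device const float* b [[buffer(1)]],
    device float* result [[buffer(2)]],
    uint index [[thread_position_in_grid]]
) {
    result[index] = a[index] + b[index];
}
"""

/// Hypothetical helper: compiles the source at runtime and builds a compute pipeline.
func makeAddArraysPipeline(device: MTLDevice) throws -> MTLComputePipelineState {
    let library = try device.makeLibrary(source: addArraysSource, options: nil)
    guard let function = library.makeFunction(name: "add_arrays") else {
        fatalError("Kernel add_arrays not found in the compiled library")
    }
    return try device.makeComputePipelineState(function: function)
}

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Metal is not supported on this device")
}
let pipeline = try! makeAddArraysPipeline(device: device)
print("threadExecutionWidth:", pipeline.threadExecutionWidth)
print("maxTotalThreadsPerThreadgroup:", pipeline.maxTotalThreadsPerThreadgroup)
```

Runtime compilation is convenient for scripts and quick tests. An app would more likely ship shaders precompiled in its default library.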

Core concepts of Metal

Let's go through the main concepts from the Metal Programming Guide:

"Metal provides a single, unified programming interface and language for both graphics and data-parallel computation workloads. Metal enables you to integrate graphics and computation tasks much more efficiently without needing to use separate APIs and shader languages."

Source: Apple Developer - Metal Programming Guide

Main components

MTLDevice
    │
    ├── MTLCommandQueue (command queue)
    │       │
    │       └── MTLCommandBuffer (command buffer)
    │               │
    │               └── MTLComputeCommandEncoder (encoder)
    │
    ├── MTLLibrary (shader library)
    │       │
    │       └── MTLFunction (shader function)
    │
    ├── MTLComputePipelineState (pipeline state)
    │
    └── MTLBuffer / MTLTexture (GPU memory)

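MTLDevice - the object that represents the GPU
MTLCommandQueue - the queue that commands are submitted to
MTLCommandBuffer - a batch of commands
MTLComputePipelineState - a compiled shader
MTLBuffer / MTLTexture - data in GPU memory

The basic workflow in Swift

Let's walk through the flow of running a GPU computation with Metal: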

import Metal

// 1. Get the default GPU device
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Metal is not supported on this device")
}

// 2. Create a command queue
guard let commandQueue = device.makeCommandQueue() else {
    fatalError("Failed to create command queue")
}

// 3. Load the shader from the app's default library and build the pipeline
let library = device.makeDefaultLibrary()!
let function = library.makeFunction(name: "add_arrays")!
let pipelineState = try! device.makeComputePipelineState(function: function)

// 4. Prepare the input data
let count = 1024
let dataA = [Float](repeating: 1.0, count: count)
let dataB = [Float](repeating: 2.0, count: count)

// 5. Create the buffers (stride is the per-element size including padding)
let byteLength = count * MemoryLayout<Float>.stride
let bufferA = device.makeBuffer(
    bytes: dataA, 
    length: byteLength, 
    options: []
)!
let bufferB = device.makeBuffer(
    bytes: dataB, 
    length: byteLength, 
    options: []
)!
let bufferResult = device.makeBuffer(
    length: byteLength, 
    options: []
)!

// 6. Create the command buffer and encoder, then bind the pipeline and buffers
let commandBuffer = commandQueue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!

encoder.setComputePipelineState(pipelineState)
encoder.setBuffer(bufferA, offset: 0, index: 0)
encoder.setBuffer(bufferB, offset: 0, index: 1)
encoder.setBuffer(bufferResult, offset: 0, index: 2)

// 7. Set up the thread grid: one thread per element
// (dispatchThreads relies on non-uniform threadgroup support in the GPU)
let gridSize = MTLSize(width: count, height: 1, depth: 1)
let threadGroupSize = MTLSize(
    width: min(pipelineState.maxTotalThreadsPerThreadgroup, count), 
    height: 1, 
    depth: 1
)
encoder.dispatchThreads(gridSize, threadsPerThreadgroup: threadGroupSize)
encoder.endEncoding()

// 8. Submit the work and block until the GPU finishes
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

// 9. Read back the result
let resultPointer = bufferResult.contents().bindMemory(
    to: Float.self, 
    capacity: count
)
let result = Array(UnsafeBufferPointer(start: resultPointer, count: count))
print(result[0])  // 3.0 (1.0 + 2.0)
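
Step 7 uses dispatchThreads, which lets Metal handle grids that are not a multiple of the threadgroup size. On GPUs without non-uniform threadgroup support, the usual alternative is dispatchThreadgroups with a rounded-up group count. The sketch below shows that variant; the helper names `threadgroupCount` and `encodeUniformDispatch` are hypothetical, and it assumes the kernel is extended with an element-count argument so that surplus threads can return early.

```swift
import Metal

/// Number of threadgroups needed to cover `count` elements (ceiling division).
func threadgroupCount(for count: Int, threadsPerGroup: Int) -> Int {
    precondition(count >= 0 && threadsPerGroup > 0)
    return (count + threadsPerGroup - 1) / threadsPerGroup
}

/// Hypothetical helper: encodes a 1D dispatch using whole threadgroups only.
/// The kernel must compare its index against the element count and return early
/// when the index is out of range, because the last group may overshoot.
func encodeUniformDispatch(
    encoder: MTLComputeCommandEncoder,
    pipeline: MTLComputePipelineState,
    count: Int
) {
    guard count > 0 else { return }
    let width = min(pipeline.maxTotalThreadsPerThreadgroup, count)
    let groups = threadgroupCount(for: count, threadsPerGroup: width)
    encoder.dispatchThreadgroups(
        MTLSize(width: groups, height: 1, depth: 1),
        threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1)
    )
}

print(threadgroupCount(for: 1024, threadsPerGroup: 256))  // 4
print(threadgroupCount(for: 1000, threadsPerGroup: 256))  // 4 (the last group is partly idle)
print(threadgroupCount(for: 1025, threadsPerGroup: 256))  // 5
```

With 1025 elements and 256 threads per group, five groups launch 1280 threads, so 255 of them must be filtered out by the bounds check in the kernel.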

Metal 4: the big leap of 2025

Metal 4 deepens the integration with machine learning even further:

"The latest version of Metal is built to scale to the needs of modern apps. Metal 4 enables entirely new ways to integrate machine learning while also enabling you to encode commands and compile shaders more efficiently than ever."

Source: Apple Developer - Metal

Stronger ML integration

"Now you can tap into machine learning capabilities like MetalFX, run inference networks directly in your shaders, and implement the latest neural rendering techniques with Metal 4."

Source: Apple Developer - Machine Learning & AI

What does this mean? You can run ML inference in real time inside a game's rendering pipeline. For example:

MetalFX Upscaling - upscales low-resolution frames with ML
Neural rendering - NeRF-style techniques
Real-time image processing - denoising, style transfer

How Metal and MPS relate

MPS, covered earlier in this series, is built on top of Metal:

"Metal Performance Shaders is a highly optimized library of graphics functions that can help application developers achieve great performance at the same time decrease work on maintaining GPU family specific functions."

Source: Wikipedia - Metal (API)

Layers of the technology stack

[Application]
      ↓
[Core ML / MLX / PyTorch]
      ↓
[MPSGraph]
      ↓
[MPS]
      ↓
[Metal] ← You are here!
      ↓
[Apple GPU Hardware]

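Metal is "the foundation under every road". The diagram is a simplification: not every framework passes through every layer (MLX, for example, ships its own Metal kernels rather than going through MPS), but all of them ultimately run on Metal.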

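When machine learning developers touch Metal directly

Honestly, there are few occasions to write Metal directly for machine learning, because the high-level frameworks (Core ML, MLX, PyTorch) do it for you.

Even so, it is useful in situations like these:

1. Implementing custom kernels

When you need to write an operation the framework does not provide: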

#include <metal_stdlib>
using namespace metal;

// Example of a custom activation function
kernel void custom_activation(
    device const float* input [[buffer(0)]],
    device float* output [[buffer(1)]],
    uint index [[thread_position_in_grid]]
) {
    float x = input[index];
    // Swish: x * sigmoid(x)
    output[index] = x * (1.0f / (1.0f + exp(-x)));
}
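
When writing a custom kernel like this, it helps to have a CPU reference to compare the GPU output against. The following is a small sketch of such a check; `swishReference` and `allClose` are hypothetical helper names, and the default tolerance is an assumption that may need loosening if the shader is compiled with fast math.

```swift
import Foundation

/// CPU reference for the kernel above: swish(x) = x * sigmoid(x).
func swishReference(_ input: [Float]) -> [Float] {
    return input.map { x in x * (1.0 / (1.0 + exp(-x))) }
}

/// True when two arrays have the same length and match element-wise within `tolerance`.
func allClose(_ a: [Float], _ b: [Float], tolerance: Float = 1e-5) -> Bool {
    return a.count == b.count && zip(a, b).allSatisfy { abs($0 - $1) <= tolerance }
}

let reference = swishReference([-1.0, 0.0, 1.0])
print(reference)
```

After reading the GPU result back into a `[Float]`, a call such as `allClose(gpuOutput, swishReference(input))` gives a quick pass/fail signal, where `gpuOutput` stands for whatever array the host code produced.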

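2. Graphics × ML

When you embed ML in a rendering pipeline, you need control at the Metal level.

3. Extreme optimization

When every millisecond counts, writing directly in Metal and bypassing the framework's abstractions is an option.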
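
Before committing to hand-written Metal, it is worth measuring how long the GPU actually spends on the work. One simple approach, sketched below, reads the `gpuStartTime` and `gpuEndTime` properties of a completed command buffer. The helper name `gpuMilliseconds` is hypothetical, and the command buffer here is left empty as a placeholder for a real workload.

```swift
import Metal

/// GPU execution time of a completed command buffer, in milliseconds.
func gpuMilliseconds(of commandBuffer: MTLCommandBuffer) -> Double {
    return (commandBuffer.gpuEndTime - commandBuffer.gpuStartTime) * 1000.0
}

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let commandBuffer = queue.makeCommandBuffer() else {
    fatalError("Metal is not supported on this device")
}

// Encode the workload to be measured here (left empty in this sketch).
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

let elapsed = gpuMilliseconds(of: commandBuffer)
print("GPU time: \(elapsed) ms")
```

Comparing this number for a framework-based version and a hand-written kernel shows whether the extra effort pays off.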

Practical advice

An article on Towards Data Science is a useful reference:

"In general, and if possible, you'd probably be better off using the MPS framework (discussed later) for equivalent functionality when possible – it tends to be highly-optimized for common classes of GPU-aligned use cases (like matrix multiplication or neural networks)."

Source: Towards Data Science - Programming Apple GPUs through Go and Metal Shading Language

In short: try MPS first. If that is not enough, move down to Metal.
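
As an illustration of the "MPS first" route, here is a minimal sketch of a matrix multiplication using MPSMatrixMultiplication instead of a hand-written kernel. The matrix size and fill values are arbitrary choices for the example, not something the quoted article prescribes.

```swift
import Metal
import MetalPerformanceShaders

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue() else {
    fatalError("Metal is not supported on this device")
}

let n = 4  // arbitrary matrix size for this example
let rowBytes = n * MemoryLayout<Float>.stride
let descriptor = MPSMatrixDescriptor(
    rows: n, columns: n, rowBytes: rowBytes, dataType: .float32
)

// A is filled with 1.0 and B with 2.0, so every element of A x B is 1 * 2 * n = 8.
let a = [Float](repeating: 1.0, count: n * n)
let b = [Float](repeating: 2.0, count: n * n)
guard let bufferA = device.makeBuffer(bytes: a, length: n * rowBytes, options: []),
      let bufferB = device.makeBuffer(bytes: b, length: n * rowBytes, options: []),
      let bufferC = device.makeBuffer(length: n * rowBytes, options: []),
      let commandBuffer = queue.makeCommandBuffer() else {
    fatalError("Failed to allocate Metal resources")
}

// C = 1.0 * (A x B) + 0.0 * C, with no transposes.
let multiply = MPSMatrixMultiplication(
    device: device,
    transposeLeft: false, transposeRight: false,
    resultRows: n, resultColumns: n, interiorColumns: n,
    alpha: 1.0, beta: 0.0
)
multiply.encode(
    commandBuffer: commandBuffer,
    leftMatrix: MPSMatrix(buffer: bufferA, descriptor: descriptor),
    rightMatrix: MPSMatrix(buffer: bufferB, descriptor: descriptor),
    resultMatrix: MPSMatrix(buffer: bufferC, descriptor: descriptor)
)
commandBuffer.commit()
commandBuffer.waitUntilCompleted()

let c = bufferC.contents().bindMemory(to: Float.self, capacity: n * n)
print(c[0])  // 8.0
```

No shader source, pipeline state, or thread grid appears here. MPS selects a tuned kernel for the device, which is the main reason to try it before writing MSL by hand.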

Performance analysis with the Metal Debugger

Xcode ships with a powerful built-in Metal Debugger:

"Inspect, debug, and optimize your entire rendering pipeline with Metal debugger, from mesh shading to ray tracing to machine learning. Monitor performance in real time with the Metal performance HUD."

Source: Apple Developer - Metal

Main features

GPU Timeline - visualizes GPU activity over time
Shader Debugger - steps through shader execution
Memory Viewer - shows GPU memory usage
Performance HUD - real-time statistics
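
A captured frame is much easier to read when the Metal objects carry names. The sketch below labels a command buffer and an encoder and wraps a region of work in a debug group. The label strings are made up for illustration, and the encoder is left empty.

```swift
import Metal

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let commandBuffer = queue.makeCommandBuffer(),
      let encoder = commandBuffer.makeComputeCommandEncoder() else {
    fatalError("Metal is not supported on this device")
}

// Labels are free-form strings that the debugging tools display next to each object.
queue.label = "ML queue"
commandBuffer.label = "Inference pass"
encoder.label = "Activation encoder"

// Debug groups bracket related commands so they can be identified as one block.
encoder.pushDebugGroup("custom_activation")
// Set the pipeline state, bind buffers, and dispatch here (omitted in this sketch).
encoder.popDebugGroup()
encoder.endEncoding()

commandBuffer.commit()
commandBuffer.waitUntilCompleted()
print(commandBuffer.label ?? "unlabeled")
```

Labels have no effect on the computation itself, so they are usually safe to leave in place.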

A distinctive feature of Apple GPUs: tile-based rendering

Apple GPUs have a distinctive architecture: tile-based deferred rendering (TBDR):

"Macs using Apple silicon will feature Apple GPUs with a feature set combining what was previously available on macOS and iOS, and will be able to take advantage of features tailored to the tile based deferred rendering (TBDR) architecture of Apple GPUs."

Source: Wikipedia - Metal (API)

This brings:

Memory bandwidth savings - through use of tile-local memory
Power efficiency - unnecessary pixel work is skipped
Affinity with machine learning - work can be processed tile by tile

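Summary: Metal is "the foundation under every road"

As a machine learning engineer you may rarely write Metal code. But whether you use MPS, MLX, or Core ML, Metal is running underneath.

To understand how the GPU works and why Apple Silicon is strong at machine learning, it is worth knowing the basic concepts of Metal.

And when you truly need extreme optimization, remember that Metal is there as an option.

Read next

MPS Basics Explained - optimized kernels on top of Metal
MPSGraph Basics Explained - optimizing compute graphs
vDSP Basics Explained - the option beyond the GPU (CPU optimization)
Back to the series index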
