科学と神々株式会社 Advent Calendar 2025

LLM Quantization Day 23: Testing Strategy

Last updated at 2025-12-22. Posted at 2025-12-22.

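Why testing matters

"It works, so it's fine" is a dangerous way of thinking.

It may work today and break in tomorrow's refactoring.
It may work in your environment and fail in someone else's.
The happy path may work while the error paths are broken.

Tests are how you gain confidence that the code works correctly now, in the future, and in every situation you have thought to cover.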

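The test pyramid

llm-quantize uses the following test structure:

          /\
         /  \  E2E tests (few)
        /    \  - end-to-end checks of the whole flow
       /──────\
      /        \  Integration tests (some)
     /          \  - several modules working together
    /────────────\
   /              \  Unit tests (many)
  /                \  - individual functions and classes
 /──────────────────\

The lower the layer, the more tests it has; the higher the layer, the longer each test takes to run.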

Test layout

tests/
├── unit/                    # unit tests
│   ├── test_quantizers/     # quantizer tests
│   │   ├── test_gguf.py
│   │   ├── test_awq.py
│   │   ├── test_gptq.py
│   │   └── test_base.py
│   ├── test_checkpoint.py   # checkpoint tests
│   ├── test_progress.py     # progress reporting tests
│   └── test_converter.py    # conversion tests
├── integration/             # integration tests
│   └── test_quantization_flow.py
└── conftest.py              # shared fixtures

Unit testing principles

1. One test checks one thing

# Bad: one test checks several things
def test_quantizer():
    quantizer = GGUFQuantizer(...)
    assert quantizer.get_supported_levels() == [...]
    result = quantizer.quantize()
    assert result.is_valid
    assert result.file_size > 0

# Good: one test per behavior
def test_get_supported_levels():
    assert GGUFQuantizer.get_supported_levels() == [...]

def test_quantize_returns_valid_result():
    quantizer = GGUFQuantizer(...)
    result = quantizer.quantize()
    assert result.is_valid

def test_quantize_produces_nonzero_file():
    quantizer = GGUFQuantizer(...)
    result = quantizer.quantize()
    assert result.file_size > 0

2. Tests are independent of each other

# Bad: one test depends on another
class TestQuantizer:
    result = None  # shared through a class variable

    def test_quantize(self):
        TestQuantizer.result = quantizer.quantize()

    def test_validate(self):
        assert TestQuantizer.result.is_valid  # depends on the previous test

# Good: each test stands alone
class TestQuantizer:
    def test_quantize_is_valid(self):
        result = quantizer.quantize()
        assert result.is_valid
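
The `quantizer` object in the snippets above is left undefined. One way to keep tests independent is to have every test build its own instance through a small factory. The sketch below assumes a hypothetical `FakeQuantizer` and `FakeResult` that stand in for the real classes; neither name comes from llm-quantize.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical result object with the two fields the tests inspect."""
    is_valid: bool
    file_size: int


class FakeQuantizer:
    """Hypothetical stand-in for a real quantizer; it does no actual work."""

    def quantize(self) -> FakeResult:
        return FakeResult(is_valid=True, file_size=1024)


def make_quantizer() -> FakeQuantizer:
    # A new object on every call, so no state leaks between tests.
    return FakeQuantizer()


class TestQuantizer:
    def test_quantize_is_valid(self):
        result = make_quantizer().quantize()
        assert result.is_valid

    def test_quantize_produces_nonzero_file(self):
        result = make_quantizer().quantize()
        assert result.file_size > 0
```

Because neither test reads anything the other one wrote, they pass in any order and when run individually.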

3. Test names state what is being tested

# Bad
def test_1(): ...
def test_quantize(): ...

# Good
def test_quantize_returns_valid_result_for_7b_model(): ...
def test_quantize_raises_error_for_invalid_level(): ...
def test_log_respects_verbosity_quiet(): ...

Using fixtures

pytest fixtures let tests share their setup code:

# conftest.py
import pytest
# Assumption: ModelType and OutputFormat are importable from the same module
# as SourceModel and QuantizationConfig; adjust the import if they live elsewhere.
from llm_quantize.lib.data_models import (
    ModelType,
    OutputFormat,
    QuantizationConfig,
    SourceModel,
)

@pytest.fixture
def sample_source_model():
    """Model metadata for tests."""
    return SourceModel(
        model_path="test/model",
        model_type=ModelType.LOCAL_DIR,
        architecture="LlamaForCausalLM",
        parameter_count=7_000_000_000,
        dtype="float16",
        num_layers=32,
    )

@pytest.fixture
def sample_config():
    """Quantization settings for tests."""
    return QuantizationConfig(
        target_format=OutputFormat.GGUF,
        quantization_level="Q4_K_M",
        output_dir="/tmp/test_output",
    )

In a test:

def test_quantizer_initialization(sample_source_model, sample_config):
    quantizer = GGUFQuantizer(sample_source_model, sample_config)
    assert quantizer.source_model == sample_source_model

Using mocks

Replacing external dependencies with mocks keeps tests fast and stable:

from unittest.mock import MagicMock, patch

def test_quantize_calls_llama_cpp():
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = MagicMock(returncode=0)

        quantizer = GGUFQuantizer(...)
        quantizer.quantize()

        # Check that llama-quantize was invoked
        mock_run.assert_called()
        call_args = mock_run.call_args[0][0]
        assert "llama-quantize" in call_args[0]
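
The test above depends on `GGUFQuantizer`, so it cannot run in isolation. As a self-contained sketch of the same technique, the block below patches `subprocess.run` around a hypothetical wrapper, `run_llama_quantize`. The wrapper and its argument order are assumptions made for illustration, not the llm-quantize implementation.

```python
import subprocess
from unittest.mock import MagicMock, patch


def run_llama_quantize(src: str, dst: str, level: str) -> int:
    """Hypothetical wrapper: shell out to llama-quantize and return its exit code."""
    cmd = ["llama-quantize", src, dst, level]
    completed = subprocess.run(cmd, check=False)
    return completed.returncode


def test_run_llama_quantize_builds_expected_command():
    # Patch subprocess.run so that no real process is started.
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = MagicMock(returncode=0)
        exit_code = run_llama_quantize("in.gguf", "out.gguf", "Q4_K_M")

    assert exit_code == 0
    mock_run.assert_called_once()
    # The command list is the first positional argument of the call.
    cmd = mock_run.call_args.args[0]
    assert cmd == ["llama-quantize", "in.gguf", "out.gguf", "Q4_K_M"]
```

Asserting on the full command list, rather than only on the executable name, also catches arguments passed in the wrong order.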

Parametrized tests

Run the same test logic against several sets of parameters:

import pytest

@pytest.mark.parametrize("verbosity,should_show", [
    (Verbosity.QUIET, False),
    (Verbosity.NORMAL, True),
    (Verbosity.VERBOSE, True),
    (Verbosity.DEBUG, True),
])
def test_log_info_respects_verbosity(verbosity, should_show):
    reporter = ProgressReporter(verbosity=verbosity)
    reporter.log_info("Test message")

    if should_show:
        assert "Test message" in reporter.get_output()
    else:
        assert reporter.get_output() == ""

Instead of writing four separate tests, a single test covers all four cases.

Testing edge cases

Test the error paths as well as the happy path:

class TestCheckpointEdgeCases:
    def test_resume_with_corrupted_metadata(self, tmp_path):
        """Resuming from corrupted metadata."""
        metadata_file = tmp_path / "checkpoint_metadata.json"
        metadata_file.write_text("not valid json{")

        assert not Checkpoint.can_resume(tmp_path)

    def test_resume_with_empty_directory(self, tmp_path):
        """Resuming from an empty directory."""
        assert not Checkpoint.can_resume(tmp_path)

    def test_resume_with_config_mismatch(self, tmp_path, sample_config):
        """Resuming with a configuration that does not match."""
        checkpoint = Checkpoint(tmp_path)
        checkpoint.initialize(32, sample_config)

        different_config = sample_config.copy()
        different_config.quantization_level = "Q5_K_M"

        with pytest.raises(ValueError, match="Configuration mismatch"):
            Checkpoint.from_resume(tmp_path, different_config)
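
To show what code that satisfies the first two edge cases might look like, here is a minimal sketch. `can_resume` is a hypothetical free function standing in for `Checkpoint.can_resume`; only the metadata file name is taken from the test above, and the real implementation may check more than this.

```python
import json
from pathlib import Path

METADATA_NAME = "checkpoint_metadata.json"  # file name used in the test above


def can_resume(checkpoint_dir: Path) -> bool:
    """Hypothetical check: True only if the metadata file exists and parses."""
    metadata_file = checkpoint_dir / METADATA_NAME
    if not metadata_file.is_file():
        return False  # empty directory or missing file
    try:
        metadata = json.loads(metadata_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False  # unreadable file or corrupted JSON
    return isinstance(metadata, dict)


# tmp_path is pytest's built-in fixture: a fresh temporary directory per test.
def test_can_resume_rejects_empty_directory(tmp_path: Path):
    assert not can_resume(tmp_path)


def test_can_resume_rejects_corrupted_metadata(tmp_path: Path):
    (tmp_path / METADATA_NAME).write_text("not valid json{")
    assert not can_resume(tmp_path)


def test_can_resume_accepts_valid_metadata(tmp_path: Path):
    (tmp_path / METADATA_NAME).write_text(json.dumps({"completed_layers": 3}))
    assert can_resume(tmp_path)
```

The third test is the happy-path counterpart. Without it, a `can_resume` that always returned `False` would pass both edge-case tests.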

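Measuring coverage

# Run the tests with coverage (requires the pytest-cov plugin)
pytest --cov=llm_quantize --cov-report=html tests/

# View the results (open is the macOS command; use xdg-open on Linux)
open htmlcov/index.html

The coverage report shows which lines no test executes.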

The road to 97% coverage

llm-quantize has reached 97% coverage. The tricks are:

1. Think in branches

def process(value):
    if value > 0:      # branch 1
        return "positive"
    elif value < 0:    # branch 2
        return "negative"
    else:              # branch 3
        return "zero"

Three tests give 100% coverage:

test_process_positive()
test_process_negative()
test_process_zero()
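
Written out in full, the three tests could look like the sketch below. `process` is repeated so the block runs on its own; the input values are arbitrary picks, one per branch.

```python
def process(value):
    if value > 0:
        return "positive"
    elif value < 0:
        return "negative"
    else:
        return "zero"


def test_process_positive():
    assert process(5) == "positive"


def test_process_negative():
    assert process(-3) == "negative"


def test_process_zero():
    assert process(0) == "zero"
```

Each test exercises exactly one branch, so a coverage report that shows a missed line points directly at the missing test.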

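2. Test the exception paths

def test_raises_error_for_invalid_input():
    # None cannot be compared with 0, so process() above raises TypeError
    with pytest.raises(TypeError):
        process(None)

3. Revisit hard-to-reach code

Code that tests struggle to reach may not actually be used. Consider deleting or refactoring it.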

Tips: getting more out of tests

1. Treat tests as code too

# Refactoring, naming conventions, and the DRY principle apply to tests too

2. Run them automatically in CI

# .github/workflows/test.yml
- name: Run tests
  run: pytest --cov=llm_quantize

3. Start from a failing test (TDD)

def test_new_feature():
    assert new_feature() == expected  # fails at first
    # → implement → passes
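
As a concrete sketch of that cycle, suppose the new feature is a helper that reads the bit width out of a quantization level name. `bits_for_level` and its naming rule are assumptions made for this example, not part of llm-quantize. The tests are written first and fail until the function exists.

```python
import re


# Step 1 (red): written before bits_for_level existed, so they failed at first.
def test_bits_for_level_reads_leading_number():
    assert bits_for_level("Q4_K_M") == 4
    assert bits_for_level("Q8_0") == 8


def test_bits_for_level_rejects_unknown_level():
    # With pytest you would use pytest.raises; plain try/except keeps
    # this sketch free of third-party imports.
    try:
        bits_for_level("bogus")
    except ValueError:
        return
    raise AssertionError("expected ValueError")


# Step 2 (green): the smallest implementation that makes the tests pass.
def bits_for_level(level: str) -> int:
    """Hypothetical helper: 'Q4_K_M' -> 4, assuming names look like Q<bits>[_suffix]."""
    match = re.fullmatch(r"Q(\d+)(?:_.+)?", level)
    if match is None:
        raise ValueError(f"unknown quantization level: {level!r}")
    return int(match.group(1))
```

Once the tests are green, the implementation can be refactored freely, since the tests pin down the behavior.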

4. Avoid brittle tests

# Bad: depends on the machine the test runs on
assert result.path == "/home/user/output"

# Good: checks a relative property
assert result.path.endswith("output")
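
A slightly stricter variant of the same idea compares paths against a directory that the test itself controls. The sketch below assumes a hypothetical helper, `build_output_path`, and relies on pytest's built-in `tmp_path` fixture for the directory.

```python
from pathlib import Path


def build_output_path(output_dir: Path, model_name: str, level: str) -> Path:
    """Hypothetical helper: where the quantized file would be written."""
    return Path(output_dir) / f"{model_name}-{level}.gguf"


def test_output_path_is_inside_output_dir(tmp_path: Path):
    result = build_output_path(tmp_path, "model", "Q4_K_M")

    # Compare against the directory the test was given, not an absolute literal.
    assert result.parent == tmp_path
    assert result.name == "model-Q4_K_M.gguf"
```

Comparing `Path` objects also avoids mismatches caused by path separators, which string checks such as `endswith` can miss on Windows.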

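Coming up next

Day 24 covers performance optimization: improving memory efficiency and throughput.

Tests are insurance. Buying it after the accident is too late. Build tests in from the start and you can develop with confidence.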
