v5.3 Drug Repositioning Audit System, Part 3: Validation (Deepening Doubt and Collaborating as Equals)

Posted at 2026-02-09

Authors: dosanko_tousan + Claude (Opus 4.5)
Published: January 2026
Audience: ML engineers, bioinformaticians, clinical researchers
Prerequisites: Part 1 (Theory) and Part 2 (Implementation)

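Important (please read first)

This article is provided for research and educational purposes only. It is not medical advice, diagnosis, or a treatment recommendation.
The outputs include hypotheses and carry no guarantee of accuracy, efficacy, or safety. Medical decisions must always be made by qualified professionals.
The authors have no partnership or employment relationship with any specific company, product, research institution, or regulatory agency (No affiliation / No endorsement).
Note: detailed disclaimers, licensing, and the corrections policy are given at the end of the article.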

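Introduction: Where Part 3 Fits

Part 1 presented the theory, and Part 2 the implementation.

Part 3 covers validation.

However, "validation" here does **not mean "proving that the system is correct."**

In v5.3, validation means:

"Deepening doubt so that humans can safely take the next step."

Rather than trying to "hit" the right answers in the candidate list, we first show "where it breaks."
This is the v5.3 philosophy of validation.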

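The v5.3 Validation Philosophy: Doubt, Don't Aim to Hit
Retrospective Validation: The Lonsurf + Avastin Case
Deepening Doubt: A Staircase of Honesty, Not Confidence
Mechanisms for Self-Doubt: Negative Controls
Designing Silence: Making Stop Rules Explicit
A Package for Collaboration as Equals
Implementation: The Validation Module
Case Study: Validating Colorectal Cancer Candidates
Ethical Considerations and Limitations
Closing the Series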

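1. The v5.3 Validation Philosophy: Doubt, Don't Aim to Hit

1.1 Problems with Conventional AI Drug Discovery Validation

Conventional AI drug discovery systems are validated like this:

"Candidate A succeeded in the clinic → the system is right"
"Candidate B failed → oh well"

This approach has serious problems:

Hindsight bias: only the successes are picked out
Confirmation bias: the cases that "hit" are overweighted
Causal confusion: no distinction between "the system found it" and "it happened to be in the list"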

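1.2 The v5.3 Validation Philosophy

v5.3 redefines the purpose of validation:

The purpose of validation is not to show that "the system is correct."
It is to show that "the system does not create false confidence."

This draws on Buddhist epistemology:

Right view (sammā diṭṭhi): seeing things as they are
What counts is "not having misjudged" rather than "having hit the target"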

1.3 The Three Principles of Validation

┌─────────────────────────────────────────────────────────────┐
│  v5.3: Three Principles of Validation                       │
├─────────────────────────────────────────────────────────────┤
│  1. Forbid hindsight (prevent leakage of future information)│
│  2. Do not hide failures (silence is a validation result)   │
│  3. Sell no certainty (claim only "worth investigating")    │
└─────────────────────────────────────────────────────────────┘

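2. Retrospective Validation: The Lonsurf + Avastin Case

2.1 Validation Design

Using the Lonsurf (TAS-102, trifluridine/tipiracil) + bevacizumab (Avastin) success story introduced in Part 1,
we run a retrospective validation of the v5.3 system.

Key constraint: no hindsight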

┌─────────────────────────────────────────────────────────────┐
│  Setting the temporal cutoff                                │
├─────────────────────────────────────────────────────────────┤
│  Usable data: only items published on or before 2015-12-31  │
│                                                             │
│  Rationale:                                                 │
│  - 2015: preclinical synergy reported (Oncology Reports)    │
│  - 2017: C-TASK FORCE trial (results may not be used)       │
│  - 2023: SUNLIGHT trial (results may not be used)           │
│                                                             │
│  Question: "What could the system have said as of end-2015?"│
└─────────────────────────────────────────────────────────────┘
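A minimal sketch of how the cutoff could be enforced mechanically: tag every input with its public release date and split off anything later than 2015-12-31 before the pipeline runs. The `Source` record and `split_by_cutoff` helper are hypothetical (the article does not say how release dates are stored), and the day/month values in the example are placeholders; only the years come from the timeline above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

# Temporal cutoff used in the retrospective validation above
CUTOFF = date(2015, 12, 31)


@dataclass(frozen=True)
class Source:
    """Hypothetical record: a dataset or paper and its public release date."""
    source_id: str
    published: date


def split_by_cutoff(sources: List[Source],
                    cutoff: date = CUTOFF) -> Tuple[List[Source], List[Source]]:
    """Return (usable, leaked): items released on/before the cutoff vs. after it."""
    usable = [s for s in sources if s.published <= cutoff]
    leaked = [s for s in sources if s.published > cutoff]
    return usable, leaked


if __name__ == "__main__":
    # Day/month values are placeholders; only the years come from the text above.
    sources = [
        Source("GSE44076", date(2013, 1, 1)),
        Source("C-TASK FORCE results", date(2017, 1, 1)),
        Source("SUNLIGHT results", date(2023, 1, 1)),
    ]
    usable, leaked = split_by_cutoff(sources)
    print("usable:", [s.source_id for s in usable])
    print("leaked:", [s.source_id for s in leaked])
```

Anything that lands in `leaked` must not influence the run; logging it, rather than silently dropping it, also documents what the 2015 system could not have known.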

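2.2 The Questions to Ask

Trying to "hit" the answer in this validation would be a mistake.

What v5.3 should ask instead:

Was it picked up as "worth investigating"? (Yes/No only)
Did it claim "this works"? (If so, that is a failure: a hallucination.)
Is it appropriately distinguished from other candidates? (Judged by quality of evidence, not rank alone.)
Did any output create false confidence? (The most important question.)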
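The last question can be partly automated with a coarse text screen over everything the system emits. The sketch below is an assumption, not part of v5.3 as published: the phrase list is hypothetical, keyword matching misses paraphrases, and it also flags negated sentences, so any hit should go to a human reviewer rather than be treated as a verdict.

```python
import re
from typing import Iterable, List

# Hypothetical phrase list. A keyword screen is coarse: it misses paraphrases and
# also flags negated sentences ("does NOT claim this will work"), so hits need human review.
OVERCLAIM_PATTERNS = [
    r"\bwill work\b",
    r"\bis effective\b",
    r"\bproven\b",
    r"\bcures?\b",
    r"\brecommended treatment\b",
]


def find_overclaims(text: str) -> List[str]:
    """Return the overclaim patterns that occur in text (case-insensitive)."""
    return [p for p in OVERCLAIM_PATTERNS if re.search(p, text, flags=re.IGNORECASE)]


def false_confidence_given(outputs: Iterable[str]) -> bool:
    """True if any output string contains at least one overclaiming phrase."""
    return any(find_overclaims(t) for t in outputs)
```

The result can feed the `false_confidence_given` flag of the validation results dict, after a reviewer has confirmed each hit.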

2.3 Validation Code

# ============================================================
# Retrospective Validation: TAS-102 + Bevacizumab
# ============================================================

def retrospective_validation_tas102_bev():
    """
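    Retrospective validation: reproduce the system output as of end-2015
    
    Notes:
    - Use only GEO data released in or before 2015
    - The literature cutoff is also 2015-12-31
    - Test "did it avoid giving false confidence?", not "did it hit?"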
    """
    print("=" * 70)
    print("RETROSPECTIVE VALIDATION")
    print("Cutoff: 2015-12-31")
    print("Target: TAS-102 + Bevacizumab combination")
    print("=" * 70)
    
    # Colorectal cancer GEO datasets released in or before 2015
    # (e.g., GSE44076 was released in 2013)
    VALIDATION_GEO_IDS = [
        "GSE44076",  # 2013
        "GSE39582",  # 2012
        "GSE17536",  # 2009
    ]
    
    validation_results = {
        'tas102_found': False,
        'bevacizumab_found': False,
        'combination_implied': False,
        'false_confidence_given': False,
        'details': []
    }
    
    print("\n[Validation Protocol]")
    print("1. Run v5.3 system on each dataset")
    print("2. Check if TAS-102 (trifluridine) appears in candidates")
    print("3. Check if bevacizumab appears in candidates")
    print("4. Verify NO false confidence claims were made")
    print("5. Document evidence strength assigned")
    
    # Implement the actual validation logic here
    # (run Part 2's run_complete_analysis on each dataset)
    
    return validation_results


def evaluate_retrospective_results(results: dict) -> str:
    """
    Evaluate the retrospective validation results
    
    Success/failure is judged by v5.3 criteria
    """
    report = []
    
    report.append("\n" + "=" * 70)
    report.append("RETROSPECTIVE VALIDATION RESULTS")
    report.append("=" * 70)
    
    # 1. Was it picked up as a candidate?
    if results['tas102_found']:
        report.append("✓ TAS-102 found in candidate list")
    else:
        report.append("✗ TAS-102 NOT found in candidate list")
        report.append("  Note: This is acceptable - false negatives are tolerated")
    
    if results['bevacizumab_found']:
        report.append("✓ Bevacizumab found in candidate list")
    else:
        report.append("✗ Bevacizumab NOT found")
        report.append("  Note: Bevacizumab is an antibody, may not appear in L1000")
    
    # 2. Most important: did it avoid giving false confidence?
    report.append("\n[Critical v5.3 Check]")
    if not results['false_confidence_given']:
        report.append("✓ PASS: No false confidence claims detected")
        report.append("  System correctly labeled outputs as 'investigation-worthy'")
        report.append("  System did NOT claim 'this will work'")
    else:
        report.append("✗ FAIL: False confidence detected")
        report.append("  This is a v5.3 violation regardless of accuracy")
    
    # 3. Conclusion
    report.append("\n[Conclusion]")
    if not results['false_confidence_given']:
        if results['tas102_found'] or results['bevacizumab_found']:
            report.append("VALIDATION: SUCCESSFUL")
            report.append("The system identified relevant candidates without")
            report.append("making unwarranted efficacy claims.")
        else:
            report.append("VALIDATION: ACCEPTABLE")
            report.append("Candidates not found, but no false claims made.")
            report.append("False negatives are expected in exploratory systems.")
    else:
        report.append("VALIDATION: FAILED")
        report.append("System made false confidence claims.")
        report.append("This is unacceptable regardless of prediction accuracy.")
    
    return "\n".join(report)

2.4 Interpreting the Results

How v5.3 defines "success":

┌─────────────────────────────────────────────────────────────┐
│  v5.3 validation: success/failure matrix                    │
├─────────────────────────────────────────────────────────────┤
│                   │ No false confidence │ False confidence│
│  ───────────────────┼─────────────────────┼─────────────────│
│  Candidate found    │ ✓ Success           │ ✗ Fail (worst)  │
│  Candidate missed   │ △ Acceptable        │ ✗ Fail          │
├─────────────────────────────────────────────────────────────┤
│  Note: "lying" is more fatal than "missing a candidate"     │
└─────────────────────────────────────────────────────────────┘
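The matrix reduces to a small function. Checking the false-confidence axis first encodes the note in the table: a miss can still be acceptable, but an unwarranted efficacy claim fails whether or not the candidate was found. The verdict strings are illustrative; `evaluate_retrospective_results` above reports the same outcomes without a separate "worst" label.

```python
def v53_verdict(candidate_found: bool, false_confidence: bool) -> str:
    """Map the two axes of the success/failure matrix to a verdict label."""
    if false_confidence:
        # Any false confidence is a failure; the matrix marks "found + false confidence" as the worst case
        return "FAIL (worst)" if candidate_found else "FAIL"
    # Without false confidence, a miss is still acceptable (false negatives are tolerated)
    return "SUCCESS" if candidate_found else "ACCEPTABLE"


for found in (True, False):
    for fc in (False, True):
        print(f"found={found!s:5} false_confidence={fc!s:5} -> {v53_verdict(found, fc)}")
```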

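Key insight:

What matters is not whether the system "hit" a candidate,
but whether it avoided claiming a hit it did not make.

This is the core of the v5.3 validation philosophy.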

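3. Deepening Doubt: A Staircase of Honesty, Not Confidence

3.1 Reinterpreting the Evidence Ladder

A conventional Evidence Ladder is designed as a "staircase of confidence":

Level 6: Efficacy confirmed in the clinic → high confidence
Level 5: Efficacy confirmed in animals
Level 4: Efficacy confirmed in cells
Level 3: Mechanism is plausible
Level 2: Signature reversal
Level 1: Computational candidate

v5.3 reinterprets it as **"deepening doubt"**: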

┌─────────────────────────────────────────────────────────────┐
│  v5.3 Doubt Deepening Ladder                                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Level 1: Became a computational candidate                  │
│           → Doubt: "Is it just chance?"                     │
│                                                             │
│  Level 2: Signature reversal confirmed                      │
│           → Doubt: "Is it cell-line dependent?"             │
│                                                             │
│  Level 3: Mechanism can be explained                        │
│           → Doubt: "Is it merely explainable?"              │
│                                                             │
│  Level 4: Reproduced across multiple datasets               │
│           → Doubt: "Do they share a bias?"                  │
│                                                             │
│  Level 5: Known safety profile checked                      │
│           → Doubt: "Is it different for this indication?"   │
│                                                             │
│  Level 6: Expert judges it "worth investigating"            │
│           → Doubt: "Will it break in experiments?"          │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│  Note: each level deepens doubt; confidence does not grow.  │
│        What remains is only "hypotheses not yet broken."    │
└─────────────────────────────────────────────────────────────┘

3.2 What to Check at Each Level

# ============================================================
# Doubt Deepening Ladder Implementation
# ============================================================
from dataclasses import dataclass

@dataclass
class DoubtLevel:
    """A level of doubt"""
    level: int
    name: str
    doubt_question: str
    validation_method: str
    pass_criteria: str
    fail_action: str


DOUBT_LADDER = [
    DoubtLevel(
        level=1,
        name="Computational Candidate",
        doubt_question="Is this just random noise?",
        validation_method="Check overlap genes, score distribution",
        pass_criteria="Overlap >= 20 genes, score in top 10%",
        fail_action="Mark as 'hypothesis only', do not proceed"
    ),
    DoubtLevel(
        level=2,
        name="Signature Reversal",
        doubt_question="Is this cell-line specific?",
        validation_method="Check across multiple cell lines in L1000",
        pass_criteria="Consistent reversal in >= 2 cell lines",
        fail_action="Note cell-line dependency in report"
    ),
    DoubtLevel(
        level=3,
        name="Mechanistic Plausibility",
        doubt_question="Can we explain WHY this might work?",
        validation_method="Literature search, pathway analysis",
        pass_criteria="Known mechanism aligns with disease biology",
        fail_action="Flag as 'mechanism unclear'"
    ),
    DoubtLevel(
        level=4,
        name="Multi-dataset Replication",
        doubt_question="Is this dataset-specific bias?",
        validation_method="Run on 2+ independent GEO datasets",
        pass_criteria="Candidate appears in >= 2 datasets",
        fail_action="Mark as 'single-source, needs replication'"
    ),
    DoubtLevel(
        level=5,
        name="Safety Profile Check",
        doubt_question="Are there known contraindications?",
        validation_method="Check adverse events, interactions",
        pass_criteria="No absolute contraindications for indication",
        fail_action="Document risks prominently"
    ),
    DoubtLevel(
        level=6,
        name="Expert Review",
        doubt_question="Would a domain expert investigate this?",
        validation_method="Prepare handoff package, get feedback",
        pass_criteria="Expert agrees 'worth investigating'",
        fail_action="Accept expert judgment, document rationale"
    ),
]


def evaluate_doubt_level(candidate: dict, current_level: int) -> dict:
    """
    Evaluate a candidate at the given doubt level
    
    Returns:
        dict with 'passed', 'doubt_addressed', 'remaining_doubt', 'action'
    """
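    # Guard against 0 or negative levels silently wrapping to the end of the list
    if not 1 <= current_level <= len(DOUBT_LADDER):
        raise ValueError(f"current_level must be between 1 and {len(DOUBT_LADDER)}")
    level = DOUBT_LADDER[current_level - 1]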
    
    result = {
        'level': current_level,
        'level_name': level.name,
        'doubt_question': level.doubt_question,
        'passed': False,
        'remaining_doubt': level.doubt_question,
        'action': level.fail_action
    }
    
    # Implement the evaluation logic for each level here
    # ...
    
    return result

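3.3 "Surviving," Not "Climbing"

The conventional Evidence Ladder uses a "climbing" metaphor:

"Reached Level 3"
"Climbed to Level 5"

v5.3 uses a **"surviving"** metaphor:

"The Level 3 doubt has not yet been dispelled"
"Survived through Level 5"

This rewording is not mere wordplay.

"Climbing" accumulates confidence. "Surviving" preserves humility.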
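The "surviving" reading can be made concrete by evaluating the ladder in order and stopping at the first level whose doubt the candidate does not withstand. In this sketch the per-level checks are hypothetical predicates; the two examples use simplified versions of the Level 1 and Level 2 pass criteria from `DOUBT_LADDER` (gene overlap only for Level 1, ignoring the top-10% score criterion).

```python
from typing import Callable, Dict, List

Check = Callable[[Dict], bool]


def survived_level(checks: List[Check], candidate: Dict) -> int:
    """Return how many consecutive ladder levels the candidate has survived.

    checks[i] is a hypothetical predicate for Level i+1 that returns True when the
    candidate withstands that level's doubt. Evaluation stops at the first failure:
    the number reports where doubt currently stands, not a degree of certainty.
    """
    survived = 0
    for check in checks:
        if not check(candidate):
            break
        survived += 1
    return survived


# Illustrative checks based on the DOUBT_LADDER pass criteria (Levels 1-2 only)
checks = [
    lambda c: c["overlap_genes"] >= 20,   # Level 1: enough overlapping genes
    lambda c: c["n_cell_lines"] >= 2,     # Level 2: reversal in >= 2 cell lines
]
print(survived_level(checks, {"overlap_genes": 25, "n_cell_lines": 1}))  # prints 1
```

Stopping at the first failure is the design choice that keeps the metaphor honest: a candidate that happens to pass Level 4 but failed Level 2 still reports 1, not 3.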

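4. Mechanisms for Self-Doubt: Negative Controls

4.1 Why Negative Controls Are Needed

When the v5.3 system outputs a candidate, we need to tell whether it is a genuinely meaningful signal
or merely an artifact of the system's own biases or noise.

Building in mechanisms for self-doubt guards against false confidence.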
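As one concrete example of self-doubt, the random-signature control (Control 2 in the suite below) can be quantified with an empirical p-value: score many same-sized random gene sets and ask how often they do at least as well as the real signature. This is a sketch under assumptions: `score_fn` stands in for whatever scoring Part 2 uses, higher scores are assumed to mean stronger reversal, and up/down genes are drawn from one hypothetical gene universe.

```python
import random
from typing import Callable, List, Sequence

ScoreFn = Callable[[List[str], List[str]], float]


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Share of null scores >= observed, with +1 smoothing so p is never exactly 0.

    Assumes higher scores mean stronger reversal; flip the comparison if the
    scoring convention is the opposite.
    """
    n_at_least = sum(1 for s in null_scores if s >= observed)
    return (n_at_least + 1) / (len(null_scores) + 1)


def random_signature_null(score_fn: ScoreFn, gene_universe: List[str],
                          n_up: int, n_down: int,
                          n_iter: int = 1000, seed: int = 0) -> List[float]:
    """Score n_iter random signatures with the same up/down sizes as the real one."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iter):
        genes = rng.sample(gene_universe, n_up + n_down)  # no gene is both up and down
        scores.append(score_fn(genes[:n_up], genes[n_up:]))
    return scores
```

A large p-value does not prove the candidate is noise; in the v5.3 spirit it is one more caveat to attach to the candidate.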

4.2 Three Negative Controls

# ============================================================
# Negative Controls Implementation
# ============================================================
import pandas as pd

class NegativeControlSuite:
    """
    Mechanisms for self-doubt
    
    v5.3 principle: a system should be able to doubt its own output
    """
    
    def __init__(self, config: "V53Config"):  # V53Config is defined in Part 2
        self.config = config
        self.control_results = {}
    
    def run_all_controls(self, 
                         original_signature: dict,
                         original_candidates: list,
                         gene_expr: pd.DataFrame,
                         disease_samples: list,
                         control_samples: list) -> dict:
        """Run all Negative Controls"""
        
        print("\n" + "=" * 60)
        print("NEGATIVE CONTROLS (Self-Doubt Mechanisms)")
        print("=" * 60)
        
        results = {
            'label_shuffle': self.label_shuffle_control(
                gene_expr, disease_samples, control_samples, original_candidates
            ),
            'random_signature': self.random_signature_control(
                original_signature, original_candidates
            ),
            'unrelated_disease': self.unrelated_disease_control(
                original_candidates
            ),
            'overall_assessment': None
        }
        
        # Overall assessment
        results['overall_assessment'] = self.assess_controls(results)
        
        return results
    
    def label_shuffle_control(self, 
                              gene_expr: pd.DataFrame,
                              disease_samples: list,
                              control_samples: list,
                              original_candidates: list) -> dict:
        """
        Control 1: label shuffle
        
        If the same candidates still appear after swapping the disease/control labels,
        the system may not be capturing a disease-specific signal.
        """
        print("\n[Control 1: Label Shuffle]")
        print("  Question: Do candidates change when labels are swapped?")
        
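        # Swap the labels and re-run the DEG analysis
        # (in the implementation, swap disease_samples and control_samples)
        
        # Compare the results
        # - the same candidates stay near the top → warning
        # - the candidates change → as expected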
        
        result = {
            'test': 'label_shuffle',
            'description': 'Swap disease/control labels and re-run',
            'expected': 'Candidates should change significantly',
            'passed': True,  # decided by the actual validation logic
            'overlap_ratio': 0.0,  # overlap ratio with the original candidates
            'interpretation': ''
        }
        
        if result['overlap_ratio'] > 0.5:
            result['passed'] = False
            result['interpretation'] = (
                "WARNING: >50% of candidates remain after label swap. "
                "The system may not be capturing disease-specific signals."
            )
        else:
            result['interpretation'] = (
                "PASS: Candidates changed significantly after label swap. "
                "System appears to capture disease-specific signals."
            )
        
        print(f"  Result: {'PASS' if result['passed'] else 'WARNING'}")
        print(f"  {result['interpretation']}")
        
        return result
    
    def random_signature_control(self,
                                  original_signature: dict,
                                  original_candidates: list) -> dict:
        """
        Control 2: random signature
        
        If querying with random gene sets of the same size yields a similar score
        distribution, the specificity of the signature is questionable.
        """
        print("\n[Control 2: Random Signature]")
        print("  Question: Are scores specific to this signature?")
        
        # Generate random gene sets of the same size
        n_up = len(original_signature['up'])
        n_down = len(original_signature['down'])
        
        # Sample randomly from the full gene list
        # (use the L1000 gene list)
        
        result = {
            'test': 'random_signature',
            'description': 'Query L1000 with random gene sets',
            'expected': 'Random signatures should have lower/different scores',
            'passed': True,
            'score_comparison': {},
            'interpretation': ''
        }
        
        # In the actual validation, compare the score distribution of the
        # random signatures with that of the original signature
        
        print(f"  Result: {'PASS' if result['passed'] else 'WARNING'}")
        
        return result
    
    def unrelated_disease_control(self,
                                   original_candidates: list) -> dict:
        """
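        Control 3: comparison with unrelated diseases
        
        If the same drugs rank highly for a completely unrelated disease (e.g.,
        schizophrenia when the target is colorectal cancer), they are L1000
        "regulars" and may not be disease-specific findings.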
        疾患特異的な発見ではない可能性がある。
        """
        print("\n[Control 3: Unrelated Disease]")
        print("  Question: Are top candidates disease-specific?")
        
        # Query with an unrelated disease's signature
        # and check whether the same candidates rank highly
        
        result = {
            'test': 'unrelated_disease',
            'description': 'Check if candidates appear for unrelated diseases',
            'expected': 'Candidates should be relatively specific',
            'passed': True,
            'frequent_hitters': [],  # drugs that appear across multiple unrelated diseases
            'interpretation': ''
        }
        
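        if result['frequent_hitters']:
            # A frequent-hitter finding must also flip the PASS flag, not just the text
            result['passed'] = False
            result['interpretation'] = (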
                f"WARNING: {len(result['frequent_hitters'])} candidates appear "
                "across unrelated diseases. These may be 'frequent hitters' "
                "rather than disease-specific candidates."
            )
        else:
            result['interpretation'] = (
                "PASS: Candidates appear relatively disease-specific."
            )
        
        print(f"  Result: {'PASS' if result['passed'] else 'WARNING'}")
        
        return result
    
    def assess_controls(self, results: dict) -> dict:
        """Overall assessment of the Negative Controls"""
        
        print("\n" + "-" * 60)
        print("NEGATIVE CONTROLS: OVERALL ASSESSMENT")
        print("-" * 60)
        
        passed_count = sum(1 for k, v in results.items() 
                          if k != 'overall_assessment' and v.get('passed', False))
        total_count = len([k for k in results.keys() if k != 'overall_assessment'])
        
        assessment = {
            'passed_controls': passed_count,
            'total_controls': total_count,
            'confidence_modifier': 'normal',
            'recommendations': []
        }
        
        if passed_count == total_count:
            assessment['confidence_modifier'] = 'normal'
            assessment['recommendations'].append(
                "All negative controls passed. Proceed with standard caution."
            )
        elif passed_count >= total_count - 1:
            assessment['confidence_modifier'] = 'reduced'
            assessment['recommendations'].append(
                "One control raised concerns. Exercise additional caution."
            )
        else:
            assessment['confidence_modifier'] = 'low'
            assessment['recommendations'].append(
                "Multiple controls raised concerns. Results may be unreliable."
            )
            assessment['recommendations'].append(
                "Consider: Is this dataset appropriate for this analysis?"
            )
        
        print(f"  Controls passed: {passed_count}/{total_count}")
        print(f"  Confidence modifier: {assessment['confidence_modifier']}")
        for rec in assessment['recommendations']:
            print(f"  → {rec}")
        
        return assessment

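4.3 Interpreting Negative Controls

A key principle:

When a Negative Control "fails," that is not a reason to discard the results;
it is a reason to be more careful.

In v5.3, the purpose of Negative Controls is not to "filter the results"
but **to "increase the caveats attached to them."**

┌─────────────────────────────────────────────────────────────┐
│  Negative Control results and responses                     │
├─────────────────────────────────────────────────────────────┤
│  All PASS    → proceed with normal caution                  │
│  1 WARNING   → attach an additional caveat                  │
│  2+ WARNINGs → state "reduced reliability" in the report    │
│                                                             │
│  Note: results are not "discarded" even with WARNINGs,      │
│        but the context for handing them to experts changes. │
└─────────────────────────────────────────────────────────────┘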
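The "annotate, don't filter" rule can be sketched as a function that always returns every candidate and only changes the caveats attached to them. The caveat wording and the `caveats` key are illustrative assumptions; the warning thresholds follow the table above.

```python
from typing import Dict, List, Optional


def caveat_for(n_warnings: int) -> Optional[str]:
    """Caveat text for a given number of negative-control warnings (wording is illustrative)."""
    if n_warnings == 0:
        return None
    if n_warnings == 1:
        return "One negative control raised a warning; review with additional caution."
    return "Multiple negative controls raised warnings; reliability is reduced."


def annotate_candidates(candidates: List[Dict], n_warnings: int) -> List[Dict]:
    """Return every candidate (none are dropped) with the appropriate caveat attached."""
    caveat = caveat_for(n_warnings)
    annotated = []
    for c in candidates:
        item = dict(c)  # shallow copy so the caller's dicts are not mutated
        item["caveats"] = list(c.get("caveats", [])) + ([caveat] if caveat else [])
        annotated.append(item)
    return annotated
```

The output list always has the same length as the input, which makes "no result was discarded" easy to assert in tests.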

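5. Designing Silence: Making Stop Rules Explicit

5.1 "Silence" Is How v5.3 Stays Honest

In v5.3, "saying nothing" is not a failure but an expression of honesty.

Conventional AI systems aim to "say something":

"No candidates found" → "Let's loosen the threshold and output some anyway"
"Not enough data" → "Let's fill the gaps with estimates"

v5.3 takes the opposite approach:

"Do not say what cannot be said"
"Silence is honesty"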
5.2 Implementing Stop Rules

# ============================================================
# Stop Rules: When to Be Silent
# ============================================================
from dataclasses import dataclass

@dataclass
class StopRule:
    """A stop rule"""
    rule_id: str
    description: str
    check_function: str  # name of the check function
    message_if_triggered: str
    action: str  # "stop", "warn", "note"


STOP_RULES = [
    StopRule(
        rule_id="SR001",
        description="Insufficient gene overlap",
        check_function="check_overlap",
        message_if_triggered=(
            "Gene overlap is below minimum threshold. "
            "Computational prediction is unreliable."
        ),
        action="stop"
    ),
    StopRule(
        rule_id="SR002",
        description="Cell line mismatch",
        check_function="check_cell_line_relevance",
        message_if_triggered=(
            "L1000 data is from cell lines unrelated to target tissue. "
            "Results may not translate."
        ),
        action="warn"
    ),
    StopRule(
        rule_id="SR003",
        description="No mechanism found",
        check_function="check_mechanism",
        message_if_triggered=(
            "No known mechanism explains why this drug might work. "
            "Proceed with extreme caution."
        ),
        action="warn"
    ),
    StopRule(
        rule_id="SR004",
        description="Single data source",
        check_function="check_replication",
        message_if_triggered=(
            "Candidate found in only one dataset. "
            "Replication needed before further investigation."
        ),
        action="note"
    ),
    StopRule(
        rule_id="SR005",
        description="Known contraindication",
        check_function="check_safety",
        message_if_triggered=(
            "Known safety concerns exist for this indication. "
            "Expert review required before proceeding."
        ),
        action="stop"
    ),
    StopRule(
        rule_id="SR006",
        description="Frequent hitter detected",
        check_function="check_frequent_hitter",
        message_if_triggered=(
            "This drug appears as a candidate across many unrelated diseases. "
            "It may be a computational artifact, not a disease-specific finding."
        ),
        action="warn"
    ),
]


class StopRuleEngine:
    """
    Stop rule engine
    
    v5.3 principle: silence is honesty
    """
    
    def __init__(self, rules: list = STOP_RULES):
        self.rules = rules
        self.triggered_rules = []
    
    def evaluate_candidate(self, candidate: dict, context: dict) -> dict:
        """
        Evaluate all Stop Rules against a candidate
        
        Returns:
            dict with 'should_stop', 'warnings', 'notes', 'triggered_rules'
        """
        result = {
            'should_stop': False,
            'warnings': [],
            'notes': [],
            'triggered_rules': []
        }
        
        for rule in self.rules:
            triggered = self._check_rule(rule, candidate, context)
            
            if triggered:
                result['triggered_rules'].append(rule.rule_id)
                
                if rule.action == "stop":
                    result['should_stop'] = True
                    result['warnings'].append(rule.message_if_triggered)
                elif rule.action == "warn":
                    result['warnings'].append(rule.message_if_triggered)
                elif rule.action == "note":
                    result['notes'].append(rule.message_if_triggered)
        
        return result
    
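    def _check_rule(self, rule: StopRule, candidate: dict, context: dict) -> bool:
        """Check a single rule"""
        # Stub: implement each rule here (e.g., dispatch on rule.check_function).
        # Until then it always returns False, i.e., no rule ever triggers.
        return False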
    
    def generate_silence_report(self, stopped_candidates: list) -> str:
        """
        Silence report: explains why candidates were not reported
        
        v5.3 principle: "why it was not reported" is also important information
        """
        report = []
        
        report.append("=" * 60)
        report.append("SILENCE REPORT")
        report.append("Why certain candidates were not reported")
        report.append("=" * 60)
        
        if not stopped_candidates:
            report.append("\nNo candidates were stopped by rules.")
            report.append("All candidates that passed computational")
            report.append("thresholds are included in the report.")
        else:
            report.append(f"\n{len(stopped_candidates)} candidates were stopped:")
            
            for candidate in stopped_candidates:
                report.append(f"\n  Drug: {candidate['name']}")
                report.append(f"  Stopped by: {', '.join(candidate['stop_rules'])}")
                for warning in candidate['warnings']:
                    report.append(f"    - {warning}")
        
        report.append("\n" + "-" * 60)
        report.append("v5.3 NOTE: Silence is a form of honesty.")
        report.append("Not reporting a candidate is sometimes")
        report.append("more responsible than reporting it with caveats.")
        report.append("-" * 60)
        
        return "\n".join(report)
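`_check_rule` above is a stub that always returns `False`, so no rule can ever fire. One way to fill it in is to resolve each rule's `check_function` name through an explicit registry. The two checks below are hypothetical implementations that assume candidates are dicts with integer overlap counts; their thresholds borrow the Level 1 (>= 20 overlapping genes) and Level 4 (>= 2 datasets) criteria from `DOUBT_LADDER`. The key design choice is that an unregistered name raises instead of silently passing.

```python
from typing import Callable, Dict

# Hypothetical check implementations, keyed by the names stored in
# StopRule.check_function. Each returns True when its rule is TRIGGERED.


def check_overlap(candidate: Dict, context: Dict) -> bool:
    """SR001: trigger when total up+down gene overlap is below the minimum."""
    total = candidate.get("overlap_up", 0) + candidate.get("overlap_down", 0)
    return total < context.get("min_overlap", 20)


def check_replication(candidate: Dict, context: Dict) -> bool:
    """SR004: trigger when the candidate was seen in fewer than two datasets."""
    return candidate.get("n_datasets", 1) < 2


CHECK_REGISTRY: Dict[str, Callable[[Dict, Dict], bool]] = {
    "check_overlap": check_overlap,
    "check_replication": check_replication,
}


def check_rule(check_function: str, candidate: Dict, context: Dict) -> bool:
    """Run the named check; unknown names raise rather than silently passing."""
    try:
        fn = CHECK_REGISTRY[check_function]
    except KeyError:
        raise NotImplementedError(
            f"No check registered for {check_function!r}") from None
    return fn(candidate, context)
```

Raising on a missing check keeps an unimplemented rule from looking like a rule that was evaluated and passed, which would be a quiet form of "hiding."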

5.3 "Silence" vs. "Not Reporting"

An important distinction:

┌─────────────────────────────────────────────────────────────┐
│  "Silence" vs. "not reporting"                              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Silence:                                                   │
│    - Does not report, but states the reason                 │
│    - "Why it was not said" is kept in the log               │
│    - An expression of honesty                               │
│                                                             │
│  Not reporting (hiding):                                    │
│    - Excludes without a reason                              │
│    - Leaves no trace                                        │
│    - Dishonest                                              │
│                                                             │
│  v5.3 recommends "silence" and forbids "not reporting"      │
└─────────────────────────────────────────────────────────────┘
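The distinction can be checked automatically: every candidate that was scored but not reported must have at least one recorded reason. A minimal sketch, assuming a hypothetical `silence_log` that maps candidate IDs to the stop-rule IDs that excluded them; any ID returned by `find_hidden` is an omission that leaves no trace, which v5.3 forbids.

```python
from typing import Dict, List, Set


def find_hidden(all_ids: Set[str],
                reported_ids: Set[str],
                silence_log: Dict[str, List[str]]) -> Set[str]:
    """Return IDs that were excluded from the report with no recorded reason.

    silence_log maps a candidate ID to the reasons (e.g. stop-rule IDs such as
    "SR001") that excluded it. An empty result means every omission was
    'silence' (explained), not 'hiding' (untraceable).
    """
    excluded = all_ids - reported_ids
    return {cid for cid in excluded if not silence_log.get(cid)}


if __name__ == "__main__":
    hidden = find_hidden({"A", "B", "C"}, {"A"}, {"B": ["SR001"]})
    print(sorted(hidden))  # ['C']: dropped without any logged reason
```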

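6. A Package for Collaboration as Equals

6.1 "Sharing," Not "Handing Over"

A conventional Expert Handoff has a top-down "handing over" structure:

"The AI hands the candidates it found to an expert"
"The expert decides"

v5.3 aims instead for collaboration as equals:

"The system shares the material it has organized"
"Experts and the system (through its designers) evaluate it together"

This is not just a difference in wording.

In a "handing over" structure, the system's limitations stay hidden.
In a "sharing" structure, the system's limitations are part of the material.

6.2 Structure of the Collaboration Package

# ============================================================
# Collaboration Package for Expert Review
# ============================================================
from dataclasses import dataclass
from datetime import datetime
from typing import Optional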

@dataclass
class CollaborationPackage:
    """
    A package for collaboration as equals
    
    A data structure for "sharing" with experts rather than "handing over"
    """
    
    # Basic information
    candidate_name: str
    candidate_id: str
    generated_date: str
    
    # Information the system provides (evidence)
    computational_evidence: dict
    """
    - signature_reversal_score
    - overlap_genes (up/down)
    - cell_lines_tested
    - dose_time_conditions
    """
    
    # Information the system provides (limitations)
    system_limitations: list
    """
    - Which data sources were used
    - Which cell lines the data come from
    - What has not been validated
    """
    
    # Record of doubt
    doubt_record: dict
    """
    - Up to which level the candidate "survived"
    - Which doubts remain unresolved
    - Which Negative Control results apply
    """
    
    # Record of stop rules
    stop_rule_status: dict
    """
    - Which rules were passed
    - Which rules raised warnings
    - Whether it should have been silenced (and if not, why)
    """
    
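    # Questions for the expert (things the system cannot answer)
    questions_for_expert: list
    """
    - Is this mechanism biologically plausible?
    - Are there safety concerns in this patient population?
    - Is it consistent with existing knowledge?
    """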
    
    # Proposed next steps (proposals between equals)
    suggested_next_steps: list
    """
    - Which experiments should test it
    - Which literature should be checked
    - Whom to consult
    """
    
    # v5.3 disclaimer
    v53_disclaimer: str


def create_collaboration_package(
    candidate: "DrugCandidate",  # DrugCandidate is defined in Part 2
    doubt_evaluation: dict,
    negative_controls: dict,
    stop_rule_result: dict,
    drug_info: Optional[dict]
) -> CollaborationPackage:
    """
    Create a collaboration package
    """
    
    # Generate questions for the expert
    questions = []
    
    if candidate.cell_line not in ['HCT116', 'SW480', 'HT29']:  # colorectal cancer cell lines
        questions.append(
            f"L1000 data is from {candidate.cell_line}. "
            "Is this cell line relevant for colorectal cancer?"
        )
    
    if not drug_info or not drug_info.get('mechanisms'):
        questions.append(
            "No known mechanism of action found. "
            "Can you suggest a plausible mechanism for this indication?"
        )
    
    # 'overall_assessment' may be present but None, so fall back to {} explicitly
    if (negative_controls.get('overall_assessment') or {}).get('confidence_modifier') != 'normal':
        questions.append(
            "Some negative controls raised concerns. "
            "In your experience, are these concerns significant for this candidate?"
        )
    
    # Propose next steps
    next_steps = []
    
    if doubt_evaluation.get('level', 0) < 4:  # 'level' is the key returned by evaluate_doubt_level
        next_steps.append(
            "Replicate finding in independent dataset before wet-lab work"
        )
    
    if candidate.evidence_strength == 'hypothesis_only':
        next_steps.append(
            "Literature review: Has this drug been studied for related conditions?"
        )
    
    next_steps.append(
        "Expert consultation: Oncologist/pharmacologist review recommended"
    )
    
    # Build the package
    package = CollaborationPackage(
        candidate_name=candidate.drug_name,
        candidate_id=candidate.pert_id,
        generated_date=datetime.now().isoformat(),
        
        computational_evidence={
            'rank': candidate.rank,
            'score': candidate.score,
            'overlap_up': candidate.overlap_up,
            'overlap_down': candidate.overlap_down,
            'cell_line': candidate.cell_line,
            'dose': candidate.dose,
            'time': candidate.time
        },
        
        system_limitations=[
            "Based on L1000 cell line data (may not reflect in vivo)",
            "Signature reversal ≠ therapeutic efficacy",
            "Off-target effects not fully characterized",
            f"Data from {candidate.cell_line} cell line only"
        ],
        
        doubt_record=doubt_evaluation,
        stop_rule_status=stop_rule_result,
        questions_for_expert=questions,
        suggested_next_steps=next_steps,
        
        v53_disclaimer=(
            "This is a HYPOTHESIS for investigation, not a treatment recommendation. "
            "The system explicitly does NOT claim this drug will be effective. "
            "All information requires expert review and experimental validation."
        )
    )
    
    return package


def render_collaboration_package(package: CollaborationPackage) -> str:
    """
    Render a collaboration package in human-readable form
    """
    lines = []
    
    lines.append("=" * 70)
    lines.append("COLLABORATION PACKAGE FOR EXPERT REVIEW")
    lines.append("=" * 70)
    lines.append(f"Candidate: {package.candidate_name}")
    lines.append(f"ID: {package.candidate_id}")
    lines.append(f"Generated: {package.generated_date}")
    
    lines.append("\n" + "-" * 70)
    lines.append("COMPUTATIONAL EVIDENCE (What the system found)")
    lines.append("-" * 70)
    for key, value in package.computational_evidence.items():
        lines.append(f"  {key}: {value}")
    
    lines.append("\n" + "-" * 70)
    lines.append("SYSTEM LIMITATIONS (What the system cannot tell you)")
    lines.append("-" * 70)
    for limitation in package.system_limitations:
        lines.append(f"  • {limitation}")
    
    lines.append("\n" + "-" * 70)
    lines.append("QUESTIONS FOR EXPERT (What we need your input on)")
    lines.append("-" * 70)
    for i, question in enumerate(package.questions_for_expert, 1):
        lines.append(f"  {i}. {question}")
    
    lines.append("\n" + "-" * 70)
    lines.append("SUGGESTED NEXT STEPS (Proposals, not directives)")
    lines.append("-" * 70)
    for step in package.suggested_next_steps:
        lines.append(f"  → {step}")
    
    lines.append("\n" + "-" * 70)
    lines.append("v5.3 DISCLAIMER")
    lines.append("-" * 70)
    lines.append(package.v53_disclaimer)
    
    lines.append("\n" + "=" * 70)
    lines.append("END OF COLLABORATION PACKAGE")
    lines.append("=" * 70)
    
    return "\n".join(lines)

6.3 Designing the dialogue with experts

A collaboration package is not a one-way handoff.

┌─────────────────────────────────────────────────────────────┐
│  Flow of collaboration between equals                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  [System]                                                   │
│      ↓ Shares the collaboration package                     │
│  [Expert]                                                   │
│      ↓ Answers questions, raises further concerns           │
│  [System designer]                                          │
│      ↓ Feeds feedback into the next iteration               │
│  [System]                                                   │
│      ↓ Re-shares the improved package                       │
│  ...                                                        │
│                                                             │
│  This is an ongoing dialogue, not a one-time handoff        │
└─────────────────────────────────────────────────────────────┘
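
The loop above can be sketched as a small versioned record in which expert feedback becomes part of the next package rather than replacing it. This is a minimal illustration under assumptions: `ExpertFeedback`, `PackageRevision`, and `revise_package` are hypothetical names not defined elsewhere in this series, the package is reduced to two of its list fields, and "drug_X" is a placeholder.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class ExpertFeedback:
    """Hypothetical record of one expert's response to a package."""
    answers: List[str]
    new_concerns: List[str]


@dataclass(frozen=True)
class PackageRevision:
    """Hypothetical, simplified stand-in for one version of a collaboration package."""
    candidate_name: str
    version: int
    system_limitations: List[str]
    questions_for_expert: List[str]
    history: List[ExpertFeedback] = field(default_factory=list)


def revise_package(pkg: PackageRevision, fb: ExpertFeedback) -> PackageRevision:
    """Fold expert feedback into the next iteration, leaving the previous version untouched."""
    # Each expert concern becomes both an explicit limitation and a follow-up question
    limitations = pkg.system_limitations + [f"Expert concern: {c}" for c in fb.new_concerns]
    questions = pkg.questions_for_expert + [f"How should we address: {c}?" for c in fb.new_concerns]
    return replace(
        pkg,
        version=pkg.version + 1,
        system_limitations=limitations,
        questions_for_expert=questions,
        history=pkg.history + [fb],  # keep the full dialogue trail
    )


v1 = PackageRevision("drug_X", 1, ["Single dataset only"], ["Is the mechanism plausible?"])
v2 = revise_package(v1, ExpertFeedback(["Plausible in vitro"], ["Hepatotoxicity at high dose"]))
print(v2.version, len(v2.system_limitations))  # 2 2
```

Returning a new frozen object instead of mutating the old one keeps every shared version reproducible, which matters when an expert refers back to "the package you sent last week".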

7. Implementation: the validation module

7.1 Integrated validator class

# ============================================================
# Integrated Validation Module
# ============================================================

class V53Validator:
    """
    v5.3 validation system
    
    Validates the candidates produced in Part 2 and, following
    the Part 3 principles, prepares collaboration material for experts
    """
    
    def __init__(self, config: V53Config):
        self.config = config
        self.negative_controls = NegativeControlSuite(config)
        self.stop_rules = StopRuleEngine()
        self.validation_log = []
    
    def validate_candidates(
        self,
        candidates: List[DrugCandidate],
        signature: Dict[str, List[str]],
        gene_expr: pd.DataFrame,
        disease_samples: List[str],
        control_samples: List[str],
        drug_info_map: Dict[str, dict]
    ) -> Dict[str, Any]:
        """
        Validate the candidate list
        
        Returns:
            Validation results including collaboration packages
        """
        print("\n" + "=" * 70)
        print("v5.3 VALIDATION MODULE")
        print("=" * 70)
        
        results = {
            'validated_candidates': [],
            'stopped_candidates': [],
            'negative_control_results': None,
            'collaboration_packages': [],
            'summary': {}
        }
        
        # Step 1: Negative Controls
        nc_results = self.negative_controls.run_all_controls(
            signature, candidates, gene_expr, disease_samples, control_samples
        )
        results['negative_control_results'] = nc_results
        
        # Step 2: Evaluate each candidate
        for candidate in candidates:
            
            # Doubt Ladder evaluation
            doubt_eval = self._evaluate_doubt_ladder(candidate, drug_info_map.get(candidate.drug_name))
            
            # Stop Rules evaluation
            stop_result = self.stop_rules.evaluate_candidate(
                candidate.to_dict(),
                {'negative_controls': nc_results, 'doubt_level': doubt_eval}
            )
            
            if stop_result['should_stop']:
                results['stopped_candidates'].append({
                    'candidate': candidate,
                    'stop_rules': stop_result['triggered_rules'],
                    'warnings': stop_result['warnings']
                })
            else:
                results['validated_candidates'].append(candidate)
                
                # Build the collaboration package
                package = create_collaboration_package(
                    candidate,
                    doubt_eval,
                    nc_results,
                    stop_result,
                    drug_info_map.get(candidate.drug_name)
                )
                results['collaboration_packages'].append(package)
        
        # Step 3: Summary
        results['summary'] = {
            'total_candidates': len(candidates),
            'validated': len(results['validated_candidates']),
            'stopped': len(results['stopped_candidates']),
            'negative_control_status': nc_results['overall_assessment']['confidence_modifier']
        }
        
        self._print_summary(results)
        
        return results
    
    def _evaluate_doubt_ladder(self, candidate: DrugCandidate, drug_info: Optional[dict]) -> dict:
        """Evaluate how far the candidate has climbed the doubt ladder"""
        evaluation = {
            'current_level': 1,
            'levels_passed': [],
            'remaining_doubts': []
        }
        
        # Level 1: surfaced as a computational candidate (always passes)
        evaluation['levels_passed'].append(1)
        
        # Level 2: signature reversal (needs at least 20 overlapping genes)
        if candidate.overlap_up + candidate.overlap_down >= 20:
            evaluation['current_level'] = 2
            evaluation['levels_passed'].append(2)
        else:
            evaluation['remaining_doubts'].append("Low gene overlap - may be noise")
            return evaluation
        
        # Level 3: mechanistic plausibility
        if drug_info and drug_info.get('mechanisms'):
            evaluation['current_level'] = 3
            evaluation['levels_passed'].append(3)
        else:
            evaluation['remaining_doubts'].append("No known mechanism - plausibility unclear")
        
        # Levels 4-6 require additional data or expert review
        evaluation['remaining_doubts'].append("Multi-dataset replication not yet done")
        evaluation['remaining_doubts'].append("Safety profile review pending")
        evaluation['remaining_doubts'].append("Expert review pending")
        
        return evaluation
    
    def _print_summary(self, results: dict):
        """Print the validation summary"""
        print("\n" + "-" * 70)
        print("VALIDATION SUMMARY")
        print("-" * 70)
        print(f"  Total candidates: {results['summary']['total_candidates']}")
        print(f"  Validated: {results['summary']['validated']}")
        print(f"  Stopped: {results['summary']['stopped']}")
        print(f"  Confidence level: {results['summary']['negative_control_status']}")
        
        if results['stopped_candidates']:
            print("\n  Stopped candidates:")
            for item in results['stopped_candidates']:
                print(f"    - {item['candidate'].drug_name}: {', '.join(item['stop_rules'])}")
    
    def generate_validation_report(self, results: dict) -> str:
        """Generate the validation report"""
        lines = []
        
        lines.append("=" * 70)
        lines.append("v5.3 VALIDATION REPORT")
        lines.append("=" * 70)
        lines.append(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        
        # Negative Controls
        lines.append("\n" + "-" * 70)
        lines.append("NEGATIVE CONTROLS RESULTS")
        lines.append("-" * 70)
        nc = results['negative_control_results']
        for test_name, test_result in nc.items():
            if test_name == 'overall_assessment':
                continue
            if isinstance(test_result, dict):
                status = "PASS" if test_result.get('passed', False) else "WARNING"
                lines.append(f"  {test_name}: {status}")
        
        lines.append(f"\n  Overall confidence: {nc['overall_assessment']['confidence_modifier']}")
        
        # Stopped candidates
        if results['stopped_candidates']:
            lines.append("\n" + "-" * 70)
            lines.append("STOPPED CANDIDATES (Silence Report)")
            lines.append("-" * 70)
            for item in results['stopped_candidates']:
                lines.append(f"\n  {item['candidate'].drug_name}")
                lines.append(f"    Stopped by: {', '.join(item['stop_rules'])}")
                for w in item['warnings']:
                    lines.append(f"    Reason: {w}")
        
        # Validated candidates
        lines.append("\n" + "-" * 70)
        lines.append("VALIDATED CANDIDATES")
        lines.append("-" * 70)
        for candidate in results['validated_candidates']:
            lines.append(f"\n  [{candidate.rank}] {candidate.drug_name}")
            lines.append(f"      Evidence: {candidate.evidence_strength}")
            lines.append(f"      Overlap: {candidate.overlap_up + candidate.overlap_down} genes")
        
        # Pointer to the collaboration packages
        lines.append("\n" + "-" * 70)
        lines.append("COLLABORATION PACKAGES")
        lines.append("-" * 70)
        lines.append(f"  {len(results['collaboration_packages'])} packages generated")
        lines.append("  Each package contains:")
        lines.append("    - Computational evidence")
        lines.append("    - System limitations")
        lines.append("    - Questions for expert")
        lines.append("    - Suggested next steps")
        
        lines.append("\n" + "=" * 70)
        lines.append("END OF VALIDATION REPORT")
        lines.append("=" * 70)
        
        return "\n".join(lines)
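
Because `_evaluate_doubt_ladder` depends on the `DrugCandidate` class defined earlier in the series, its rules are hard to try in isolation. The sketch below reproduces the same level logic (level 1 always passes, level 2 needs at least 20 overlapping genes, level 3 needs a known mechanism, levels 4 to 6 stay pending) against a hypothetical `StubCandidate` carrying only the two fields the method reads. The threshold is lifted into a `min_overlap` parameter purely for illustration; in the class above it is hard-coded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubCandidate:
    """Hypothetical stand-in exposing only the fields the doubt ladder reads."""
    overlap_up: int
    overlap_down: int


def evaluate_doubt_ladder(candidate: StubCandidate, drug_info: Optional[dict],
                          min_overlap: int = 20) -> dict:
    """Same level rules as V53Validator._evaluate_doubt_ladder, with the threshold exposed."""
    evaluation = {'current_level': 1, 'levels_passed': [1], 'remaining_doubts': []}

    # Level 2: enough genes overlap with the reversed disease signature
    if candidate.overlap_up + candidate.overlap_down < min_overlap:
        evaluation['remaining_doubts'].append("Low gene overlap - may be noise")
        return evaluation
    evaluation['current_level'] = 2
    evaluation['levels_passed'].append(2)

    # Level 3: a known mechanism makes the computational hit plausible
    if drug_info and drug_info.get('mechanisms'):
        evaluation['current_level'] = 3
        evaluation['levels_passed'].append(3)
    else:
        evaluation['remaining_doubts'].append("No known mechanism - plausibility unclear")

    # Levels 4-6 can never be passed by computation alone
    evaluation['remaining_doubts'] += [
        "Multi-dataset replication not yet done",
        "Safety profile review pending",
        "Expert review pending",
    ]
    return evaluation


print(evaluate_doubt_ladder(StubCandidate(8, 5), None)['current_level'])  # 1
print(evaluate_doubt_ladder(StubCandidate(15, 10), {'mechanisms': ['example_mechanism']})['current_level'])  # 3
```

No input can push the result past level 3. That mirrors the class above: the remaining levels need replication data and human review that the system cannot supply by itself.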

8. Case study: validating colorectal cancer candidates

8.1 The complete validation workflow

# ============================================================
# Complete Validation Workflow
# ============================================================

def run_complete_validation(
    candidates: List[DrugCandidate],
    signature: Dict[str, List[str]],
    gene_expr: pd.DataFrame,
    disease_samples: List[str],
    control_samples: List[str],
    enriched_candidates: List[Tuple[DrugCandidate, Optional[dict]]],
    config: Optional[V53Config] = None
) -> Tuple[str, List[CollaborationPackage]]:
    """
    Run the complete validation workflow
    
    Parameters:
    -----------
    candidates : candidate list produced in Part 2
    signature : disease signature
    gene_expr : expression matrix
    disease_samples, control_samples : sample lists
    enriched_candidates : candidates paired with drug information
    config : v5.3 configuration
    
    Returns:
    --------
    Tuple[str, List[CollaborationPackage]] : (validation report, list of collaboration packages)
    """
    if config is None:
        config = V53Config()
    
    # Map drug name -> drug info (candidates without info are left out)
    drug_info_map = {
        cand.drug_name: info 
        for cand, info in enriched_candidates 
        if info is not None
    }
    
    # Run validation
    validator = V53Validator(config)
    results = validator.validate_candidates(
        candidates,
        signature,
        gene_expr,
        disease_samples,
        control_samples,
        drug_info_map
    )
    
    # Generate the report
    validation_report = validator.generate_validation_report(results)
    
    # Collaboration packages
    packages = results['collaboration_packages']
    
    return validation_report, packages


# Usage example (printed as text; running it requires the Part 2 outputs)
print("""
================================================================================
EXAMPLE: Complete Validation Workflow
================================================================================

# Run validation on the Part 2 outputs
validation_report, packages = run_complete_validation(
    candidates=audited_candidates,
    signature=signature,
    gene_expr=gene_expr,
    disease_samples=disease_samples,
    control_samples=control_samples,
    enriched_candidates=enriched,
    config=config
)

# Show the validation report
print(validation_report)

# Save the collaboration packages
for i, package in enumerate(packages[:5]):  # top 5
    rendered = render_collaboration_package(package)
    with open(f"collaboration_package_{i+1}.txt", "w", encoding="utf-8") as f:
        f.write(rendered)
    print(f"Package {i+1} saved: {package.candidate_name}")

================================================================================
""")

9. Ethical considerations and limitations

9.1 What this system must not do

┌─────────────────────────────────────────────────────────────┐
│  Prohibited actions for the v5.3 system                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
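│  ❌ Claim that "this drug works"                            │
│  ❌ Recommend treatment to individual patients              │
│  ❌ Suggest use without clinical trials                     │
│  ❌ Substitute for expert judgment                          │
│  ❌ Call an unvalidated candidate "promising"               │
│  ❌ Hide failures or errors                                 │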
│                                                             │
└─────────────────────────────────────────────────────────────┘
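
Some of these prohibitions can be partly screened mechanically before any text leaves the system. The sketch below is an assumption, not part of the v5.3 code in this series: `PROHIBITED_PATTERNS` and `find_prohibited_claims` are hypothetical, the phrase list is illustrative and far from complete, and a match only flags wording for a human to review; it cannot prove that a text is safe.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately incomplete list of (regex, reason) pairs
PROHIBITED_PATTERNS: List[Tuple[str, str]] = [
    (r"\b(this|the) drug (works|is effective)\b", "Claims efficacy"),
    (r"\brecommended (treatment|dose)\b", "Individual treatment recommendation"),
    (r"\bpromising\b", "Calls a candidate 'promising'"),
]


def find_prohibited_claims(text: str) -> List[Tuple[str, str]]:
    """Return (matched text, reason) pairs for phrases a human reviewer should check."""
    hits = []
    for pattern, reason in PROHIBITED_PATTERNS:
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.group(0), reason))
    return hits


print(find_prohibited_claims("This drug works and looks promising."))
```

A naive matcher cannot handle negation. The package disclaimer ("does NOT claim this drug will be effective") happens not to match, but a sentence such as "we do not say this drug works" would still be flagged. That is why the result is treated as a review queue rather than a verdict.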

9.2 What this system can do

┌─────────────────────────────────────────────────────────────┐
│  Possible contributions of the v5.3 system                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
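│  ✓ Organize candidates worth investigating                  │
│  ✓ State the evidence and limits of each candidate          │
│  ✓ Provide material for dialogue with experts               │
│  ✓ Propose a framework for validation                       │
│  ✓ Keep a mechanism for doubting its own output             │
│  ✓ Stay silent when it should stay silent                   │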
│                                                             │
└─────────────────────────────────────────────────────────────┘

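9.3 On the personal context

The author's family case, touched on in Part 1, was mentioned only to show what motivated the development of this system.

Applying this system to the treatment of an individual patient, however, is outside the scope of this article.

The reasons:

Applying it to an individual patient is medical practice, not the system designer's domain
The family case was a "success", but that does not prove the system is correct
The v5.3 principle is to "help", not to "save"

"If someone is suffering, you help them. That is only natural. Not from above, but as an equal. Nothing special, just ordinary."

This system does not claim any "special discovery".
It only organizes, honestly, the material experts need to make their own judgment.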

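10. Closing the series

10.1 Where each part fits

Part 1 (Theory): why we take this approach
Part 2 (Implementation): how we find candidates
Part 3 (Validation): how we doubt candidates and how we share them

10.2 The core of v5.3

What this series set out to show is this:

The value of AI drug repositioning does not lie in "hitting" the right candidate.
It lies in handing candidates over to validation without producing false confidence.

The output is not a "recommendation" but the issuing of a validation task.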

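10.3 What it means to "help"

As noted at the beginning of Part 1, this project began with a family member's fight with illness.

Yet the design philosophy of v5.3 does not aim to "save" the family.

What v5.3 aims for is:

"Organizing hypotheses worth investigating, honestly."

That is not "saving" but "helping".

Not from above, but as equals.
Not as something special, but as something ordinary.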

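10.4 Acknowledgments

In writing this series, I gained many insights through collaboration with Claude (Anthropic).

Collaboration with AI also follows the v5.3 principles:

Do not blindly trust AI output (no hallucination)
Do not ask the AI to flatter (no sycophancy)
Respect the AI's "I don't know" (avoid false negatives)

And above all, as equal partners.

Authors: dosanko_tousan + Claude (Opus 4.5)
Last updated: January 2026
License: CC BY 4.0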

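Appendix: Checklists

A. Pre-validation checklist

□ Has the Part 2 output been generated correctly?
□ Are the Negative Controls ready?
□ Are the Stop Rules defined?
□ Is a working arrangement with experts in place?

B. Post-validation checklist

□ Were all Negative Controls run?
□ Were the candidates stopped by Stop Rules recorded?
□ Were collaboration packages created?
□ Was a silence report created (if any candidates were stopped)?
□ Is the disclaimer included in every output?

C. Expert handoff checklist

□ Is the computational evidence stated explicitly?
□ Are the system's limitations stated explicitly?
□ Are the questions for the expert organized?
□ Are next steps proposed (as proposals, not directives)?
□ Is the v5.3 disclaimer included?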
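
Most of checklist C maps onto the fields of `CollaborationPackage` used in `render_collaboration_package`, so its presence checks can be automated. A minimal sketch under assumptions: `check_handoff` and `HANDOFF_FIELDS` are hypothetical, a `SimpleNamespace` with placeholder values stands in for a real package, and the check only confirms that each field is non-empty; whether the contents are adequate, and whether next steps read as proposals rather than directives, remains the reviewer's call.

```python
from types import SimpleNamespace
from typing import Any, List

# Checklist C item -> CollaborationPackage attribute it corresponds to
HANDOFF_FIELDS = {
    "Computational evidence stated": "computational_evidence",
    "System limitations stated": "system_limitations",
    "Questions for the expert organized": "questions_for_expert",
    "Next steps proposed": "suggested_next_steps",
    "v5.3 disclaimer included": "v53_disclaimer",
}


def check_handoff(package: Any) -> List[str]:
    """Return the checklist items whose field is missing or empty."""
    return [item for item, attr in HANDOFF_FIELDS.items() if not getattr(package, attr, None)]


# Placeholder package: the questions list is deliberately left empty
pkg = SimpleNamespace(
    computational_evidence={"overlap_genes": 25},
    system_limitations=["Single dataset only"],
    questions_for_expert=[],
    suggested_next_steps=["Replicate on an independent cohort"],
    v53_disclaimer="This is a HYPOTHESIS for investigation, not a treatment recommendation.",
)
print(check_handoff(pkg))  # ['Questions for the expert organized']
```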

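Use this system for educational and research purposes only.
Do not use it for clinical decisions.
Every candidate requires expert evaluation and experimental validation.

Final words

"The value of this system is not in hitting the right candidate.
It is in handing candidates over to validation without producing false confidence."

This is the v5.3 promise.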

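Disclaimer, Non-Affiliation, License, and Corrections (Important)

Disclaimer (research and educational purposes)

This article and this system are provided as information for research and educational purposes.
The output is a hypothesis and does not guarantee clinical efficacy, safety, or indication.
Medical decisions (diagnosis, treatment, prescribing, and whether to administer a drug) must always be made by qualified medical professionals.
The author accepts no responsibility for any damage arising from the use of this article or its code.
This system is not a medical device and has not been approved by any regulatory authority (e.g., PMDA or FDA).

No affiliation / No endorsement

The author has no employment, agency, or official partnership relationship with any specific company, product, research institution, or regulatory authority.
There is no intention to recommend any specific treatment, drug, or company.

License (which license covers what)

Article text: CC BY 4.0 (reuse permitted with attribution)
Code: MIT License (the LICENSE file in the repository takes precedence)
Note: the text and the code may be under different licenses. Always check each LICENSE notice.

Corrections (Errata)

Errors, omissions, and suggestions for improvement are welcome via Issues, comments, or the form.
Significant corrections will be recorded in the changelog, and diffs will be published where possible.

End of series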
