bravesoft Advent Calendar 2025

bravesoft株式会社

The Science of Going Viral, Explained with Math

Posted at 2025-12-09, last updated at 2025-12-09

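Introduction

This article tries to explain the phenomenon of "going viral" mathematically. The way a post spreads explosively on Twitter or TikTok can be described with the same mathematical models used for the spread of infectious diseases.

We apply the SIR model, an epidemiological model with nearly 100 years of research history, to information spreading on social media, and implement it in Python to examine the mechanics of going viral.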

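1. Exponential growth: the early stage of going viral

1.1 Mathematical basics

The simplest model of going viral is an exponential function:

N(t) = N₀ × r^t

N(t): spread count at time t (retweets, likes, and so on)
N₀: initial value (number of people who saw the original post)
r: spread factor (how many people each person passes the post to on average, per unit of time)
t: elapsed time

Example: each person shares with 2 people on average (r = 2), starting from N₀ = 1

After 1 hour: 2 people
After 2 hours: 4 people
After 3 hours: 8 people
After 10 hours: 1,024 people
After 20 hours: 1,048,576 people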

1.2 Implementation: visualizing exponential growth

import numpy as np
import matplotlib.pyplot as plt
import matplotlib

# Font settings (the fallback list includes Japanese-capable fonts)
plt.rcParams['font.sans-serif'] = ['Hiragino Sans', 'Yu Gothic', 'Meiryo', 'MS Gothic', 'DejaVu Sans']
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['axes.unicode_minus'] = False  # avoid garbled minus signs

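def exponential_growth(N0, r, t):
    """
    Exponential growth model
    
    Parameters:
    -----------
    N0 : int
        Initial spread count
    r : float
        Spread factor per unit of time (plays the role of the basic reproduction number)
    t : array
        Array of time points
    
    Returns:
    --------
    array : spread count at each time point
    """
    return N0 * (r ** t)

# Parameter settings
N0 = 1      # the initial post
r = 2       # spread factor (each person shares with 2 people)
t = np.arange(0, 15, 0.1)

# Compute
N = exponential_growth(N0, r, t)

# Visualize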
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Linear scale
ax1.plot(t, N, 'b-', linewidth=2, label=f'r={r}')
ax1.set_xlabel('Time (hours)', fontsize=12)
ax1.set_ylabel('Number of shares', fontsize=12)
ax1.set_title('Exponential Growth (Linear Scale)', fontsize=14, fontweight='bold')
ax1.grid(True, alpha=0.3)
ax1.legend()

# Log scale
ax2.semilogy(t, N, 'r-', linewidth=2, label=f'r={r}')
ax2.set_xlabel('Time (hours)', fontsize=12)
ax2.set_ylabel('Number of shares (log scale)', fontsize=12)
ax2.set_title('Exponential Growth (Log Scale)', fontsize=14, fontweight='bold')
ax2.grid(True, alpha=0.3)
ax2.legend()

plt.tight_layout()
plt.savefig('exponential_growth.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"Spread count after 10 hours: {exponential_growth(N0, r, 10):.0f}")
print(f"Spread count after 20 hours: {exponential_growth(N0, r, 20):.0f}")

On a log scale the curve becomes a straight line. This is the telltale signature of exponential growth.

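1.3 Why a log scale?

Taking the logarithm gives

log(N(t)) = log(N₀) + t × log(r)

This is the equation of a straight line with slope log(r). If real data plotted on a log scale falls on a straight line, you can conclude that it is growing exponentially.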
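As a minimal sketch of that check, assume hourly counts that follow N(t) = N₀ × r^t with a little multiplicative noise. The synthetic series and the helper name `fit_log_linear` are made up for illustration. Fitting a straight line to log N(t) recovers r from the slope, and R² indicates how close the points are to a line:

```python
import numpy as np

def fit_log_linear(t, counts):
    """Fit log(N) = log(N0) + t*log(r) and return (N0, r, r_squared)."""
    log_counts = np.log(counts)
    slope, intercept = np.polyfit(t, log_counts, 1)
    fitted = intercept + slope * t
    ss_res = np.sum((log_counts - fitted) ** 2)
    ss_tot = np.sum((log_counts - log_counts.mean()) ** 2)
    return np.exp(intercept), np.exp(slope), 1 - ss_res / ss_tot

# Synthetic observations (assumption): N0 = 1, r = 2, 5% log-normal noise
rng = np.random.default_rng(0)
t_obs = np.arange(0, 12)
counts_obs = 1 * 2.0 ** t_obs * np.exp(rng.normal(0, 0.05, size=t_obs.size))

N0_hat, r_hat, r2 = fit_log_linear(t_obs, counts_obs)
print(f"estimated N0 = {N0_hat:.2f}, r = {r_hat:.2f}, R^2 = {r2:.4f}")
```

An R² close to 1 on the log-transformed data suggests exponential growth. A curve that bends downward on the log plot suggests the growth is already slowing.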

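2. The SIR model: a more realistic model of going viral

2.1 Theoretical background

The SIR model was published by Kermack and McKendrick in 1927 and is established as a foundation of mathematical epidemiology. Beyond infectious diseases, it has been applied in many fields, including information spreading on social networks, the propagation of computer viruses, and crisis contagion in financial networks.

On real social media, growth does not continue forever, for the following reasons.

People who have already seen the post cannot be reached again (finite population)
Interest fades over time (saturation effect)
Follower counts are limited (network constraints)

2.2 Mathematical formulation

The SIR model divides the population into three groups.

S (Susceptible): people who have not seen the post yet
I (Infected): people who have seen the viral post and are spreading it
R (Recovered): people who are done with the post

System of differential equations: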

dS/dt = -βSI/N
dI/dt = βSI/N - γI
dR/dt = γI

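Parameter meanings:

β (beta): contact rate / infection rate. On social media it corresponds to how readily the post gets shared (a rate per unit of time rather than a probability)
γ (gamma): recovery rate / removal rate. On social media it corresponds to how fast interest fades (1/γ = average spreading period)
N: total population (potential reach)
R₀ = β/γ: basic reproduction number (how many people one person spreads the post to on average)

Key thresholds:

R₀ > 1: the spread grows (goes viral)
R₀ < 1: the spread dies out (does not go viral)
R₀ = 1: critical point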
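The threshold sits at R₀ = 1 because, early on, almost everyone is still susceptible (S ≈ N). The second equation then reduces to dI/dt ≈ (β − γ)I, which grows only when β > γ. A small sketch of that check follows; the helper name `initial_growth_rate` and the parameter values are illustrative assumptions:

```python
def initial_growth_rate(beta, gamma, S0, N):
    """Per-capita growth rate of I at t = 0: (dI/dt) / I = beta * S0 / N - gamma."""
    return beta * S0 / N - gamma

N_total = 1_000_000
S_start = N_total - 10  # assume 10 initial spreaders

for beta_val, gamma_val in [(0.5, 0.1), (0.15, 0.1), (0.08, 0.1)]:
    g = initial_growth_rate(beta_val, gamma_val, S_start, N_total)
    trend = "grows" if g > 0 else "shrinks"
    print(f"R0 = {beta_val / gamma_val:.1f}: initial rate {g:+.4f} per hour, so I {trend}")
```

The sign of this rate flips when β/γ crosses 1, which is the threshold listed above.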

2.3 Implementation: simulating the SIR model

from scipy.integrate import odeint
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Font settings (the fallback list includes Japanese-capable fonts)
plt.rcParams['font.sans-serif'] = ['Hiragino Sans', 'Yu Gothic', 'Meiryo', 'MS Gothic', 'DejaVu Sans']
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['axes.unicode_minus'] = False  # avoid garbled minus signs

def sir_model(y, t, beta, gamma, N):
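    """
    Differential equations of the SIR model
    
    Parameters:
    -----------
    y : list
        State vector [S, I, R]
    t : float
        Time
    beta : float
        Contact rate (how easily the post spreads)
    gamma : float
        Recovery rate (how fast interest fades)
    N : int
        Total population
    
    Returns:
    --------
    list : [dS/dt, dI/dt, dR/dt]
    """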
    S, I, R = y
    dS_dt = -beta * S * I / N
    dI_dt = beta * S * I / N - gamma * I
    dR_dt = gamma * I
    return [dS_dt, dI_dt, dR_dt]

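# Parameter settings (estimates based on real data)
N = 1000000      # total users (1 million)
I0 = 10          # initial spreaders (the first 10 people to see the post)
R0 = 0           # initial "done viewing" count (R at t=0, not the basic reproduction number)
S0 = N - I0 - R0 # people who have not seen the post yet

beta = 0.5       # contact rate
gamma = 0.1      # recovery rate (1/gamma = 10 hours is the average spreading period)
R_naught = beta / gamma  # basic reproduction number

print(f"Basic reproduction number R₀ = {R_naught:.2f}")
if R_naught > 1:
    print("R₀ > 1, so this post goes viral! 🚀")
else:
    print("R₀ <= 1, so the spread dies out.")

# Initial conditions
y0 = [S0, I0, R0]

# Time axis (0 to 100 hours)
t = np.linspace(0, 100, 1000)

# Solve numerically (odeint uses LSODA from ODEPACK, which switches between Adams and BDF methods)
solution = odeint(sir_model, y0, t, args=(beta, gamma, N))
S, I, R = solution.T

# Analyze the results
peak_idx = np.argmax(I)
peak_time = t[peak_idx]
peak_value = I[peak_idx]
final_reached = R[-1]

print(f"\n=== Simulation results ===")
print(f"Active spreaders at the peak: {peak_value:.0f}")
print(f"Time of the peak: {peak_time:.1f} hours")
print(f"Total people reached: {final_reached:.0f} ({final_reached/N*100:.1f}%)")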

# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# S, I, R over time
ax1.plot(t, S, 'b-', label='Not yet seen (S)', linewidth=2)
ax1.plot(t, I, 'r-', label='Actively spreading (I)', linewidth=2)
ax1.plot(t, R, 'g-', label='Done viewing (R)', linewidth=2)
ax1.set_xlabel('Elapsed time (hours)', fontsize=12)
ax1.set_ylabel('Number of people', fontsize=12)
ax1.set_title(f'SIR Model: Viral Spread Over Time (R₀={R_naught:.1f})', 
              fontsize=14, fontweight='bold')
ax1.legend(fontsize=10)
ax1.grid(True, alpha=0.3)

# Close-up of the number of active spreaders
ax2.plot(t, I, 'r-', linewidth=2.5)
ax2.axhline(y=peak_value, color='k', linestyle='--', alpha=0.5, 
            label=f'Peak: {peak_value:.0f} people @ {peak_time:.1f} h')
ax2.axvline(x=peak_time, color='k', linestyle='--', alpha=0.5)
ax2.fill_between(t, 0, I, alpha=0.3, color='red')
ax2.set_xlabel('Elapsed time (hours)', fontsize=12)
ax2.set_ylabel('Number of active spreaders', fontsize=12)
ax2.set_title('Peak of the Viral Spread', fontsize=14, fontweight='bold')
ax2.legend(fontsize=10)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('sir_model_simulation.png', dpi=150, bbox_inches='tight')
plt.show()

2.4 Parameter sensitivity analysis

Comparison across different R₀ values:

# Compare several scenarios
scenarios = [
    {'beta': 0.3, 'gamma': 0.1, 'label': 'R₀=3.0 (strong viral spread)'},
    {'beta': 0.2, 'gamma': 0.1, 'label': 'R₀=2.0 (medium viral spread)'},
    {'beta': 0.15, 'gamma': 0.1, 'label': 'R₀=1.5 (weak viral spread)'},
    {'beta': 0.08, 'gamma': 0.1, 'label': 'R₀=0.8 (spread fails)'},
]

plt.figure(figsize=(12, 6))

for scenario in scenarios:
    beta = scenario['beta']
    gamma = scenario['gamma']
    R0 = beta / gamma
    
    solution = odeint(sir_model, y0, t, args=(beta, gamma, N))
    S, I, R = solution.T
    
    plt.plot(t, I, linewidth=2.5, label=scenario['label'])

plt.xlabel('Time (hours)', fontsize=12)
plt.ylabel('Number of active sharers', fontsize=12)
plt.title('Effect of R₀ on Viral Spread', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.savefig('r0_sensitivity_analysis.png', dpi=150, bbox_inches='tight')
plt.show()
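R₀ determines the total reach as well as the height of the peak. For the SIR equations in section 2.2, when the initial number of spreaders is negligible, the fraction of people never reached, s∞, satisfies the standard final-size relation s∞ = exp(−R₀(1 − s∞)). The sketch below solves it by fixed-point iteration. The function name `final_reach_fraction`, the starting guess, and the iteration count are illustrative choices rather than part of the code above:

```python
import math

def final_reach_fraction(r0, iterations=500):
    """Fraction of the population eventually reached, from s = exp(-r0 * (1 - s))."""
    s = 0.5  # starting guess for the never-reached fraction
    for _ in range(iterations):
        s = math.exp(-r0 * (1.0 - s))
    return 1.0 - s

for r0_value in (3.0, 2.0, 1.5, 0.8):
    print(f"R0 = {r0_value:.1f}: final reach is about {final_reach_fraction(r0_value) * 100:.1f}% of N")
```

For R₀ below 1 the iteration converges to s∞ = 1, meaning essentially nobody beyond the initial spreaders is reached. This matches the "spread fails" scenario.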

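3. Simulation with realistic-looking data: classifying viral patterns

We generate synthetic data with characteristics close to real Twitter data and analyze it.

# Generate more realistic viral patterns
np.random.seed(42)
hours = np.arange(0, 72)  # 3 days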

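def generate_realistic_viral_pattern(peak_time, max_rt, decay_rate, noise_level=0.05):
    """
    Generate a realistic viral pattern as a cumulative RT series
    
    Parameters:
    -----------
    peak_time : float
        Time at which the hourly RT rate peaks
    max_rt : int
        Final cumulative RT count
    decay_rate : float
        Decay speed of the hourly RT rate after the peak
    noise_level : float
        Noise magnitude (coefficient of variation)
    """
    hourly_rate = []
    
    for h in hours:
        if h < peak_time:
            # Growth phase (quadratic ramp-up to the peak)
            rate = (h / peak_time) ** 2
        else:
            # Decline phase (exponential decay)
            rate = np.exp(-decay_rate * (h - peak_time))
        
        # Add noise (to mimic real-world variation)
        noise = np.random.normal(0, rate * noise_level)
        hourly_rate.append(max(0, rate + noise))
    
    # Accumulate the hourly rate and rescale so the series ends at max_rt;
    # the analyses below treat the result as a cumulative RT count
    cumulative_rt = np.cumsum(hourly_rate)
    return np.round(cumulative_rt / cumulative_rt[-1] * max_rt).astype(int)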

# Generate three viral patterns of different sizes
tweets = {
    'Mega-viral (100K RT)': generate_realistic_viral_pattern(12, 100000, 0.08),
    'Mid-viral (10K RT)': generate_realistic_viral_pattern(8, 10000, 0.15),
    'Mini-viral (1K RT)': generate_realistic_viral_pattern(6, 1000, 0.25),
}

# Comprehensive visualization
fig = plt.figure(figsize=(16, 12))
gs = fig.add_gridspec(3, 2, hspace=0.3, wspace=0.3)

# 1. Comparison on a linear scale
ax1 = fig.add_subplot(gs[0, 0])
for label, data in tweets.items():
    ax1.plot(hours, data, marker='o', label=label, linewidth=2, markersize=4, alpha=0.8)
ax1.set_xlabel('Elapsed time (hours)', fontsize=11)
ax1.set_ylabel('Cumulative RT count', fontsize=11)
ax1.set_title('Viral Pattern Comparison (Linear Scale)', fontsize=12, fontweight='bold')
ax1.legend(fontsize=9)
ax1.grid(True, alpha=0.3)

# 2. Comparison on a log scale
ax2 = fig.add_subplot(gs[0, 1])
for label, data in tweets.items():
    ax2.semilogy(hours, np.maximum(data, 1), marker='o', label=label, 
                 linewidth=2, markersize=4, alpha=0.8)
ax2.set_xlabel('Elapsed time (hours)', fontsize=11)
ax2.set_ylabel('Cumulative RT count (log scale)', fontsize=11)
ax2.set_title('Viral Pattern Comparison (Log Scale)', fontsize=12, fontweight='bold')
ax2.legend(fontsize=9)
ax2.grid(True, alpha=0.3)

# 3. Spread velocity (increase per hour)
ax3 = fig.add_subplot(gs[1, 0])
for label, data in tweets.items():
    velocity = np.diff(data)
    ax3.plot(hours[1:], velocity, marker='o', label=label, 
             linewidth=2, markersize=4, alpha=0.8)
ax3.set_xlabel('Elapsed time (hours)', fontsize=11)
ax3.set_ylabel('RT increase per hour', fontsize=11)
ax3.set_title('Spread Velocity Over Time', fontsize=12, fontweight='bold')
ax3.legend(fontsize=9)
ax3.grid(True, alpha=0.3)

# 4. Growth rate (relative to the previous hour)
ax4 = fig.add_subplot(gs[1, 1])
for label, data in tweets.items():
    growth_rate = np.diff(data) / (data[:-1] + 1) * 100
    ax4.plot(hours[1:], growth_rate, marker='o', label=label, 
             linewidth=2, markersize=4, alpha=0.8)
ax4.set_xlabel('Elapsed time (hours)', fontsize=11)
ax4.set_ylabel('Growth rate (%)', fontsize=11)
ax4.set_title('Hourly Growth Rate', fontsize=12, fontweight='bold')
ax4.legend(fontsize=9)
ax4.grid(True, alpha=0.3)
ax4.axhline(y=0, color='k', linestyle='--', alpha=0.3)

# 5. Cumulative distribution (share of the maximum reached)
ax5 = fig.add_subplot(gs[2, 0])
for label, data in tweets.items():
    normalized = data / max(data.max(), 1) * 100  # percentage of the maximum (guards against dividing by zero)
    ax5.plot(hours, normalized, marker='o', label=label, 
             linewidth=2, markersize=4, alpha=0.8)
ax5.set_xlabel('Elapsed time (hours)', fontsize=11)
ax5.set_ylabel('Cumulative reach (%)', fontsize=11)
ax5.set_title('Normalized Cumulative Distribution', fontsize=12, fontweight='bold')
ax5.axhline(y=50, color='r', linestyle='--', alpha=0.3, label='50% reached')
ax5.legend(fontsize=9)  # called after axhline so the 50% line appears in the legend
ax5.grid(True, alpha=0.3)

# 6. Statistical summary
ax6 = fig.add_subplot(gs[2, 1])
ax6.axis('off')

summary_text = "=== Statistical Summary ===\n\n"
for label, data in tweets.items():
    peak_idx = np.argmax(np.diff(data))
    peak_time = hours[peak_idx + 1]
    final_rt = data[-1]
    half_time_idx = np.where(data >= final_rt / 2)[0]
    half_time = hours[half_time_idx[0]] if len(half_time_idx) > 0 else 0
    
    summary_text += f"{label}:\n"
    summary_text += f"  Peak time: {peak_time}h\n"
    summary_text += f"  Final RT: {final_rt:,}\n"
    summary_text += f"  50% reached at: {half_time}h\n"
    summary_text += f"  Growth rate at peak: {np.diff(data)[peak_idx]:.0f} RT/h\n\n"

ax6.text(0.1, 0.9, summary_text, transform=ax6.transAxes, 
         fontsize=10, verticalalignment='top', fontfamily='monospace',
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.3))

plt.savefig('comprehensive_viral_analysis.png', dpi=150, bbox_inches='tight')
plt.show()

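4. Rules of going viral and prediction models

4.1 The importance of early momentum

The spreading speed during the first 6 to 12 hours after posting is a strong predictor of the eventual reach.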

def analyze_early_momentum(data, early_hours=6):
    """
    Analyze early momentum
    
    Parameters:
    -----------
    data : array
        Time series data
    early_hours : int
        Length of the early period (hours)
    
    Returns:
    --------
    dict : analysis results
    """
    early_rt = data[early_hours]
    final_rt = data[-1]
    momentum_ratio = final_rt / early_rt if early_rt > 0 else 0
    
    return {
        'early_rt': early_rt,
        'final_rt': final_rt,
        'momentum_ratio': momentum_ratio,
        # Equals final_rt by construction; a real forecast would apply a ratio learned from other posts
        'prediction': early_rt * momentum_ratio
    }

print("=== Early momentum analysis ===")
for label, data in tweets.items():
    result = analyze_early_momentum(data, early_hours=6)
    print(f"\n{label}:")
    print(f"  First 6 hours: {result['early_rt']:,} RT")
    print(f"  Final count: {result['final_rt']:,} RT")
    print(f"  Amplification: {result['momentum_ratio']:.1f}x")
4.2 Computing the half-life

We compute the half-life as a measure of how long a viral spread "lives".

def calculate_half_life(data):
    """
    Compute the half-life of a viral spread
    
    Parameters:
    -----------
    data : array
        Cumulative RT data
    
    Returns:
    --------
    int : half-life (hours after the velocity peak)
    """
    velocity = np.diff(data)  # increase per hour
    peak_idx = np.argmax(velocity)
    peak_velocity = velocity[peak_idx]
    half_velocity = peak_velocity / 2
    
    # Find the first hour after the peak where the velocity drops below half
    after_peak = velocity[peak_idx:]
    below_half = np.where(after_peak < half_velocity)[0]
    if below_half.size > 0:
        return int(below_half[0])
    # The velocity never halved within the observed window
    return len(after_peak)

print("\n=== Half-life analysis ===")
for label, data in tweets.items():
    half_life = calculate_half_life(data)
    print(f"{label}: half-life = {half_life} hours")
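If the hourly RT rate decays exponentially after the peak, proportional to exp(−λ·t), the half-life has the closed form ln 2 / λ. This gives a quick cross-check on the numerical values. The sketch below applies it to the three decay rates passed to the pattern generator in section 3 (0.08, 0.15, 0.25); the helper name `theoretical_half_life` is illustrative:

```python
import math

def theoretical_half_life(decay_rate):
    """Half-life (hours) of a rate that decays as exp(-decay_rate * t)."""
    return math.log(2) / decay_rate

for name, lam in [("Mega-viral", 0.08), ("Mid-viral", 0.15), ("Mini-viral", 0.25)]:
    print(f"{name}: decay rate {lam} per hour, theoretical half-life {theoretical_half_life(lam):.1f} h")
```

The numerical estimate works on hourly samples with noise, so it should land near these values without matching them exactly.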

5. Implementing a viral prediction system

5.1 Machine learning approach

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_absolute_percentage_error

class ViralPredictor:
    """
    Viral prediction system
    
    Predicts the final reach from early data
    """
    
    def __init__(self):
        self.history = []
        self.model = LinearRegression()
    
    def add_data_point(self, time, rt_count):
        """Add a data point"""
        self.history.append({'time': time, 'rt': rt_count})
    
    def predict_peak(self, method='exponential'):
        """
        Predict the timing and scale of the peak
        
        Parameters:
        -----------
        method : str
            'exponential' (the only method implemented; any other value returns None)
        
        Returns:
        --------
        dict : prediction results
        """
        if len(self.history) < 3:
            return None
        
        times = np.array([d['time'] for d in self.history])
        rts = np.array([d['rt'] for d in self.history])
        
        if method == 'exponential':
            # Log-transform, then fit a linear regression
            log_rts = np.log(np.maximum(rts, 1))
            self.model.fit(times.reshape(-1, 1), log_rts)
            
            # Extract the growth rate (slope of log RT against time)
            growth_rate = self.model.coef_[0]
            intercept = self.model.intercept_
            
            # Extrapolate 10 hours past the last observation
            future_time = times[-1] + 10
            predicted_log = self.model.predict([[future_time]])[0]
            predicted_peak = np.exp(predicted_log)
            
            # Classify
            if growth_rate > 0.3:
                status = "Signs of a strong viral spread!"
                confidence = "high"
            elif growth_rate > 0.15:
                status = "Medium viral spread in progress"
                confidence = "medium"
            elif growth_rate > 0.05:
                status = "Gradual spread"
                confidence = "medium"
            else:
                status = "Spread is slowing down"
                confidence = "low"
            
            # Evaluate goodness of fit with the R² score
            r2 = r2_score(log_rts, self.model.predict(times.reshape(-1, 1)))
            
            return {
                'status': status,
                'predicted_peak': predicted_peak,
                'growth_rate': growth_rate,
                'r_squared': r2,
                'confidence': confidence,
                # Hourly growth factor e^g, only a rough proxy: under SIR, R₀ = 1 + g/γ
                'basic_reproduction_number': np.exp(growth_rate)
            }
        
        return None
    
    def visualize_prediction(self, hours_ahead=20):
        """Visualize the prediction"""
        if len(self.history) < 3:
            print("Not enough data (at least 3 points are required)")
            return
        
        times = np.array([d['time'] for d in self.history])
        rts = np.array([d['rt'] for d in self.history])
        
        # Predict
        prediction = self.predict_peak()
        
        if prediction is None:
            return
        
        # Predicted curve into the future
        future_times = np.linspace(0, times[-1] + hours_ahead, 200)
        log_rts = np.log(np.maximum(rts, 1))
        self.model.fit(times.reshape(-1, 1), log_rts)
        predicted_log = self.model.predict(future_times.reshape(-1, 1))
        predicted = np.exp(predicted_log)
        
        # Rough confidence band (simplified)
        residuals = log_rts - self.model.predict(times.reshape(-1, 1))
        std_error = np.std(residuals)
        upper_bound = np.exp(predicted_log + 1.96 * std_error)
        lower_bound = np.exp(predicted_log - 1.96 * std_error)
        
        # Visualize
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
        
        # Left panel: predicted curve
        ax1.scatter(times, rts, color='red', s=100, label='Observed data', 
                   zorder=3, edgecolors='black', linewidths=1.5)
        ax1.plot(future_times, predicted, 'b-', linewidth=2.5, 
                label=f'Predicted (R²={prediction["r_squared"]:.3f})', alpha=0.8)
        ax1.fill_between(future_times, lower_bound, upper_bound, 
                         alpha=0.2, color='blue', label='95% Confidence interval')
        ax1.axvline(x=times[-1], color='gray', linestyle='--', 
                   alpha=0.5, label='Current time')
        ax1.set_xlabel('Elapsed time (hours)', fontsize=12)
        ax1.set_ylabel('RT count', fontsize=12)
        ax1.set_title('Viral Prediction Model', fontsize=14, fontweight='bold')
        ax1.legend(fontsize=10)
        ax1.grid(True, alpha=0.3)
        ax1.semilogy()
        
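        # Right panel: summary statistics built from the prediction dict
        stats_text = (
            "=== Prediction Summary ===\n\n"
            f"Status: {prediction['status']}\n"
            f"Growth rate: {prediction['growth_rate']:.3f} per hour\n"
            f"R²: {prediction['r_squared']:.3f}\n"
            f"Predicted RT (+10h): {prediction['predicted_peak']:,.0f}\n"
            f"Confidence: {prediction['confidence']}"
        )
        ax2.axis('off')
        ax2.text(0.1, 0.95, stats_text, transform=ax2.transAxes,
                fontsize=10, verticalalignment='top', fontfamily='monospace',
                bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.3))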
        
        plt.tight_layout()
        plt.savefig('viral_prediction.png', dpi=150, bbox_inches='tight')
        plt.show()

# Usage example: predict from the first 6 hours of data
predictor = ViralPredictor()

# Mimic the early stage of an actual viral spread
early_data = [
    (1, 50),
    (2, 150),
    (3, 500),
    (4, 1500),
    (5, 4000),
    (6, 9000)
]

for time, rt in early_data:
    predictor.add_data_point(time, rt)

# Run the prediction
prediction = predictor.predict_peak()
print(f"\nPrediction: {prediction['status']}")
print(f"Growth rate: {prediction['growth_rate']:.3f}")
print(f"Hourly growth factor (rough proxy for R₀): {prediction['basic_reproduction_number']:.2f}")
print(f"Predicted RT count 10 hours ahead: {prediction['predicted_peak']:,.0f}")
print(f"Model fit (R²): {prediction['r_squared']:.3f}")

# Visualize
predictor.visualize_prediction(hours_ahead=24)
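A fitted growth rate does not give R₀ on its own. Under the SIR equations of section 2.2, the early growth rate g equals β − γ, so R₀ = β/γ = 1 + g/γ, and converting g into R₀ requires a value for γ that the regression does not supply. A sketch, assuming γ = 0.1 per hour as in the earlier simulation (the function name `r0_from_growth_rate` is hypothetical):

```python
import math

def r0_from_growth_rate(growth_rate, gamma):
    """R0 implied by an early exponential growth rate g under SIR: R0 = 1 + g / gamma."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return 1.0 + growth_rate / gamma

gamma_assumed = 0.1  # assumption: interest lasts 10 hours on average

# Consistency check against the section 2.3 parameters: beta = 0.5, gamma = 0.1 gives g = 0.4
print(f"g = 0.4 per hour gives R0 = {r0_from_growth_rate(0.4, gamma_assumed):.1f}")

# A count that doubles every hour has g = ln 2
g_doubling = math.log(2)
print(f"hourly doubling gives R0 = {r0_from_growth_rate(g_doubling, gamma_assumed):.1f}")
```

The result is sensitive to the assumed γ, so it is best read as an order-of-magnitude estimate.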

5.2 Validating prediction accuracy

# Validate against the actual data
def validate_prediction(actual_data, predictor, test_point=6):
    """
    Validate the accuracy of the prediction model
    
    Parameters:
    -----------
    actual_data : array
        Actual data
    predictor : ViralPredictor
        Predictor
    test_point : int
        Point in time from which to predict
    """
    # Train on the early data
    for i in range(test_point):
        predictor.add_data_point(i, actual_data[i])
    
    # Predict
    prediction = predictor.predict_peak()
    
    # Actual final value
    # (predicted_peak is extrapolated 10 hours past the last training point,
    #  so the two values refer to different time horizons)
    actual_final = actual_data[-1]
    predicted_final = prediction['predicted_peak']
    
    # Compute the error
    if actual_final == 0:
        if predicted_final == 0:
            error_percentage = 0.0
            error_message = "0.0% (both are 0)"
        else:
            error_percentage = float('inf')
            error_message = "N/A (actual is 0, prediction is non-zero)"
    else:
        error_percentage = abs(predicted_final - actual_final) / actual_final * 100
        error_message = f"{error_percentage:.1f}%"
    
    print(f"\n=== Prediction accuracy ===")
    print(f"Predicted: {predicted_final:,.0f} RT")
    print(f"Actual: {actual_final:,.0f} RT")
    print(f"Error: {error_message}")
    print(f"R² score: {prediction['r_squared']:.3f}")
    
    return {
        'predicted': predicted_final,
        'actual': actual_final,
        'error_pct': error_percentage,
        'r_squared': prediction['r_squared']
    }

# Validate against each viral pattern
print("\n" + "="*50)
print("Prediction model accuracy validation")
print("="*50)

for label, data in tweets.items():
    print(f"\n{label}")
    predictor = ViralPredictor()
    validate_prediction(data, predictor, test_point=6)

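6. Model limitations and practical caveats

6.1 Theoretical limitations

The SIR model, and the models in this article, have the following limitations.

Homogeneity assumption: every user is assumed to behave the same way, but in reality influence varies widely
Time-invariant parameters: β and γ are assumed constant, but in reality they change over time
Simplified network structure: real social networks have complex structure
Algorithmic effects: the platform's recommendation algorithm is not taken into account

6.2 Practical recommendations

Recommendations when using this model in practice: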

# Practical parameter estimation guide
def estimate_parameters_from_real_data(early_rt_counts, early_hours=6):
    """
    Estimate realistic parameters from real data
    
    Parameters:
    -----------
    early_rt_counts : list
        Time series of early RT counts
    early_hours : int
        Observation period (currently unused; the whole series is fitted)
    
    Returns:
    --------
    dict : estimated parameters
    """
    times = np.arange(len(early_rt_counts))
    rts = np.array(early_rt_counts)
    
    # Estimate the growth rate with a log-linear regression
    log_rts = np.log(np.maximum(rts, 1))
    coeffs = np.polyfit(times, log_rts, 1)
    growth_rate = coeffs[0]
    
    # Convert to SIR model parameters
    # This is an empirical conversion (needs tuning); before clipping, beta/gamma is always 4
    beta_estimate = growth_rate * 2  # contact rate
    gamma_estimate = growth_rate * 0.5  # recovery rate
    
    # Clip to plausible ranges, then derive R0 from the clipped values so the three stay consistent
    beta = max(0.01, min(1.0, beta_estimate))  # limited to 0.01-1.0
    gamma = max(0.01, min(0.5, gamma_estimate))  # limited to 0.01-0.5
    
    return {
        'beta': beta,
        'gamma': gamma,
        'R0': beta / gamma,
        'growth_rate': growth_rate
    }

# Usage example
sample_data = [100, 300, 900, 2700, 8100, 24300]
params = estimate_parameters_from_real_data(sample_data)

print("\n=== Estimated parameters ===")
print(f"β (contact rate): {params['beta']:.3f}")
print(f"γ (recovery rate): {params['gamma']:.3f}")
print(f"R₀: {params['R0']:.2f}")
print(f"Growth rate: {params['growth_rate']:.3f}")

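7. Summary and future directions

7.1 Summary of this article

Exponential growth: the early stage of going viral can be approximated by an exponential function
Log scale: a straight line on a log plot indicates exponential growth
SIR model: captures the more realistic rise-and-fall shape of a viral spread, and is backed by nearly 100 years of research
Basic reproduction number R₀: a post goes viral when R₀ > 1 and dies out when R₀ < 1
Importance of early momentum: data from the first 6 hours helps predict the final reach
Limits of prediction: the model is simplified, and reality is more complex

7.2 More advanced directions

Network models: more detailed spreading models based on graph theory
Integrating machine learning: time series prediction with LSTMs or Transformers
Sentiment analysis: how the sentiment of a post affects its chance of going viral
Multi-platform: accounting for interactions between multiple social networks
Real data collection: empirical analysis using the Twitter API v2

Disclaimer: this article was written as a hobby project. The models are heavily simplified, and real social media is far more complex. I hope you enjoy it in the spirit of "huh, that's one way to think about it".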
