Parsing Gmail automatically with GAS: extracting structured data from notification emails with GmailApp.search and regular expressions

Posted at 2026-06-26

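Automatically extracting sales leads from notification emails

By parsing the unstructured text of an email body with regular expressions, you can extract sales lead information automatically.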

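Fetching messages and basic extraction patterns

Use GmailApp.search() to fetch the threads that match a query, read each message's plain-text body with getPlainBody(), and extract the required fields with regular expressions.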

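function extractLeadsFromMail() {
  // Fetch up to 10 unread threads whose subject contains "新着案件" ("new job").
  const threads = GmailApp.search('subject:"新着案件" is:unread', 0, 10);

  // getUserLabelByName() returns null for a missing label, so create it on first use.
  // '処理済み' means "processed".
  const doneLabel = GmailApp.getUserLabelByName('処理済み') || GmailApp.createLabel('処理済み');

  for (const thread of threads) {
    const messages = thread.getMessages();
    for (const msg of messages) {
      const body = msg.getPlainBody();

      // If the plain-text body is empty, fall back to the HTML body with tags stripped
      const plainBody = body.length > 0 ? body : htmlToPlainText(msg.getBody());

      // Job title: the rest of the line after "案件：" ("job")
      const jobMatch = plainBody.match(/案件[：:]\s*(.+?)(?:\n|$)/);
      const jobTitle = jobMatch ? jobMatch[1].trim() : null;

      // Budget: comma-grouped digits between "予算：" ("budget") and "円" ("yen")
      const budgetMatch = plainBody.match(/予算[：:]\s*(\d{1,3}(?:,\d{3})*)\s*円/);
      const budget = budgetMatch ? budgetMatch[1] : null;

      // Guess the organization from the sender's domain.
      // getFrom() may return "Name <user@example.com>", so stop before ">" or whitespace.
      const senderEmail = msg.getFrom();
      const domainMatch = senderEmail.match(/@([^>\s]+)/);
      const domain = domainMatch ? domainMatch[1] : null;

      Logger.log({
        jobTitle,
        budget,
        domain,
        senderEmail,
        timestamp: msg.getDate()
      });
    }

    // Apply the "processed" label once per thread, after all its messages are handled
    thread.addLabel(doneLabel);
  }
}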
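
The domain lookup above depends on the shape of the string returned by getFrom(), which is often a display name followed by the address in angle brackets rather than a bare address. As a minimal sketch, assuming that format, a helper like the following could separate the parts. parseSender is a hypothetical name for illustration, not part of the Apps Script API.

```javascript
// Hypothetical helper: split a From value into display name, address, and domain.
function parseSender(from) {
  // Prefer the address inside angle brackets; otherwise treat the whole string as the address.
  const bracketed = from.match(/<([^<>\s]+@[^<>\s]+)>/);
  const address = (bracketed ? bracketed[1] : from).trim();

  // The display name is whatever precedes the bracketed address, minus surrounding quotes.
  const name = bracketed
    ? from.slice(0, bracketed.index).trim().replace(/^"|"$/g, '')
    : '';

  // Lowercase the domain so the same organization always maps to the same key.
  const at = address.lastIndexOf('@');
  const domain = at >= 0 ? address.slice(at + 1).toLowerCase() : null;

  return { name, address, domain };
}

const sender = parseSender('"Sales Team" <leads@Example.co.jp>');
// sender.name    → 'Sales Team'
// sender.address → 'leads@Example.co.jp'
// sender.domain  → 'example.co.jp'
```

Inside the loop, this could be called as parseSender(msg.getFrom()).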

Fallback for HTML emails

For some HTML emails, getPlainBody() returns an empty string. In that case, fetch the HTML body with getBody() and strip the HTML tags.

function htmlToPlainText(html) {
  // Simplified conversion: strip all tags, then decode a few common HTML entities.
  // &amp; is decoded last so that "&amp;lt;" does not turn into "<".
  return html
    .replace(/<[^>]*>/g, '')
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&')
    .replace(/\r\n/g, '\n');
}
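
Stripping every tag also discards the line breaks that the extraction patterns rely on, since each pattern reads a field up to the end of its line. A possible variant, sketched here under the assumption that the notification emails separate fields with tags such as br, p, div, tr, or li, turns those tags into newlines before removing the rest. htmlToPlainTextWithBreaks is a hypothetical name.

```javascript
// Hypothetical variant of a tag stripper that preserves line structure.
function htmlToPlainTextWithBreaks(html) {
  return html
    .replace(/\r\n/g, '\n')
    .replace(/<br\s*\/?>/gi, '\n')                // explicit line breaks
    .replace(/<\/(p|div|tr|li|h[1-6])>/gi, '\n')  // end of a block-level element
    .replace(/<[^>]*>/g, '')                      // drop every remaining tag
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&')                       // decoded last on purpose
    .replace(/[ \t]+\n/g, '\n')                   // trailing spaces before a newline
    .replace(/\n{2,}/g, '\n')                     // collapse blank lines
    .trim();
}

const sampleHtml = '<div>案件：ECサイト改修</div><p>予算：1,200,000円<br>納期：8月末</p>';
const sampleText = htmlToPlainTextWithBreaks(sampleHtml);
// sampleText → '案件：ECサイト改修\n予算：1,200,000円\n納期：8月末'
```

This is still a regex-based approximation, not an HTML parser, so script or style blocks and numeric entities are not handled.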

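Tracking processed messages with labels

To avoid processing the same email twice, manage state with labels. Restricting the search to unprocessed messages also reduces the number of API calls.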

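function collectUnprocessedLeads() {
  // Exclude threads already labeled '処理済み' ("processed") or '要確認' ("needs review")
  const threads = GmailApp.search('-label:処理済み -label:要確認 subject:"案件" is:unread', 0, 50);

  if (threads.length === 0) {
    Logger.log('No messages to process');
    return [];
  }

  // getUserLabelByName() returns null for a missing label, so create it on first use.
  const doneLabel = GmailApp.getUserLabelByName('処理済み') || GmailApp.createLabel('処理済み');
  const reviewLabel = GmailApp.getUserLabelByName('要確認') || GmailApp.createLabel('要確認');

  const leads = [];

  for (const thread of threads) {
    // Only the first message of each thread is parsed
    const msg = thread.getMessages()[0];
    const extracted = extractFromBody(msg.getPlainBody());

    if (extracted.isValid) {
      leads.push(extracted);
      thread.addLabel(doneLabel);
    } else {
      // Incomplete format: apply only the "needs review" label
      thread.addLabel(reviewLabel);
    }
  }

  return leads;
}

function extractFromBody(body) {
  const jobMatch = body.match(/案件[：:]\s*(.+?)(?:\n|$)/);
  // Accept comma-grouped amounts such as "1,200,000" as well as plain digits
  const budgetMatch = body.match(/予算[：:]\s*(\d[\d,]*)/);

  return {
    jobTitle: jobMatch ? jobMatch[1].trim() : null,
    budget: budgetMatch ? parseInt(budgetMatch[1].replace(/,/g, ''), 10) : null,
    isValid: Boolean(jobMatch && budgetMatch)
  };
}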

Supporting multiple regular expression patterns

To extract a uniform data structure from emails in different formats, try several patterns in turn.

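function flexibleExtraction(body) {
  // Candidate patterns per field, tried in order; the first match wins.
  const patterns = {
    jobTitle: [
      /案件[：:]\s*(.+?)(?:\n|$)/,
      /プロジェクト[：:]\s*(.+?)(?:\n|$)/,
      /【(.+?)】.*案件/
    ],
    budget: [
      /予算[：:]\s*(\d{1,3}(?:,\d{3})*)\s*円/,
      /(\d+)\s*万円/,
      /\$(\d+)/
    ]
  };

  const result = {};

  for (const [field, regexArray] of Object.entries(patterns)) {
    result[field] = null; // stays null when no pattern matches
    for (const regex of regexArray) {
      const match = body.match(regex);
      if (match) {
        result[field] = match[1];
        break;
      }
    }
  }

  // Note: budget is the raw captured string; its unit (円, 万円, or $) depends on which pattern matched.
  return result;
}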
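
Because the budget patterns capture amounts in different units, the captured strings are not directly comparable. As a minimal sketch, assuming amounts are written either as plain yen ("1,200,000円") or in units of 万 (10,000 yen, as in "50万円"), a helper like the following could convert both to an integer number of yen. parseBudgetYen is a hypothetical name, and dollar amounts are deliberately left out because converting them would need an exchange rate.

```javascript
// Hypothetical helper: convert a budget expression to an integer amount in yen.
function parseBudgetYen(text) {
  // "50万円" or "1.5万円": 万 is a unit of 10,000
  const man = text.match(/(\d+(?:\.\d+)?)\s*万円/);
  if (man) return Math.round(parseFloat(man[1]) * 10000);

  // "1,200,000円" (comma-grouped) or "800000円" (plain digits)
  const yen = text.match(/(\d{1,3}(?:,\d{3})+|\d+)\s*円/);
  if (yen) return parseInt(yen[1].replace(/,/g, ''), 10);

  // No recognizable amount
  return null;
}

parseBudgetYen('予算：1,200,000円'); // → 1200000
parseBudgetYen('予算 50万円');        // → 500000
parseBudgetYen('予算未定');           // → null
```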

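Handling character encoding issues

When fetching email through the Gmail API, some Japanese text can come out garbled. getPlainBody() returns text that is already decoded, so additional conversion is unnecessary in most cases. When the body contains variant characters such as full-width spaces or full-width parentheses, normalize it explicitly before matching.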

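function normalizeText(text) {
  // Replace full-width spaces (U+3000) with half-width spaces
  text = text.replace(/　/g, ' ');
  
  // Replace full-width parentheses with half-width ones
  text = text.replace(/（/g, '(').replace(/）/g, ')');
  
  // Collapse consecutive newlines into one
  text = text.replace(/\n\n+/g, '\n');
  
  return text.trim();
}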
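
An alternative to listing each full-width character by hand is Unicode NFKC normalization, which maps full-width digits, colons, commas, parentheses, and the ideographic space to their ASCII counterparts in one call. The sketch below assumes the script runs on the V8 runtime, where String.prototype.normalize is available. normalizeForMatching is a hypothetical name.

```javascript
// Hypothetical helper: normalize text before applying extraction patterns.
function normalizeForMatching(text) {
  return text
    .normalize('NFKC')        // full-width ASCII variants and U+3000 become ASCII
    .replace(/\r\n/g, '\n')   // unify line endings
    .replace(/\n{2,}/g, '\n') // collapse consecutive newlines
    .trim();
}

const rawLine = '予算：１，２００，０００円　（税込）';
const normalizedLine = normalizeForMatching(rawLine);
// normalizedLine → '予算:1,200,000円 (税込)'
```

NFKC also rewrites other compatibility characters, for example half-width katakana and enclosed forms such as ①, so it is worth checking that no pattern depends on those forms before adopting it.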

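Implementation notes

If getPlainBody() returns an empty string, check the format of the email. GmailMessage.getRawContent() returns the raw message content, which lets you inspect the original format.
When using lookahead or lookbehind, keep in mind that Apps Script relies on the JavaScript regular expression engine of its runtime, so always test complex patterns before relying on them.
When processing a large volume of email, watch the Apps Script execution time limit (6 minutes) and cap the batch size.
Regular expressions are fragile when an email format changes, so check periodically whether the same sender has started using more than one format.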
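
One way to stay under the execution time limit, in addition to a fixed batch size, is to stop taking new items once a time budget has been used up. The following is only a sketch: processWithinBudget and its parameters are hypothetical names, and the clock is injectable so the logic can be checked without waiting.

```javascript
// Hypothetical helper: process items until a time budget (in milliseconds) is spent.
function processWithinBudget(items, handler, budgetMs, now = Date.now) {
  const start = now();
  let processed = 0;

  for (const item of items) {
    // Check before starting an item, so an item is never abandoned halfway
    if (now() - start >= budgetMs) break;
    handler(item);
    processed++;
  }

  return { processed, remaining: items.length - processed };
}

// Simulated clock: each item "takes" 100 ms, and the budget is 250 ms.
let fakeTime = 0;
const summary = processWithinBudget(
  [1, 2, 3, 4, 5],
  () => { fakeTime += 100; },
  250,
  () => fakeTime
);
// summary → { processed: 3, remaining: 2 }
```

In a script, the items would be the threads returned by GmailApp.search() and the budget something below 6 minutes, for example 5 * 60 * 1000. Threads left unlabeled are picked up by the next run.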

Reference: https://note.com/large_yarrow1156/n/nbbeaa3e6e1c8
