Python × Dify × RAGで学ぶ業務システム開発入門【第7回 Excel業務自動化編】

Last updated at 2026-06-20Posted at 2026-06-20

はじめに

前回は、複数テーブルのリレーション・JOIN・集計クエリを使った研修管理を実装しました。

前回の記事：Python × Dify × RAGで学ぶ業務システム開発入門【第6回研修管理編】

第7回では、Excel業務の自動化を実装します。

人事・研修業務で最も時間を取られる作業の1つが「Excelとの格闘」です。

現状の課題
  → 社員台帳Excelを手作業でシステムに入力している
  → 研修結果Excelから1行ずつコピペしている
  → 評価シートを1人ずつExcelで手作業作成している

本回で実現すること
  → Excelアップロード → pandas で読込 → DBに一括登録
  → DBのデータ → openpyxl で整形 → Excelファイルとしてダウンロード

本連載は「作って終わり」ではなく、なぜそう設計するのかを重視しています。
設計の意図を理解することで、他のシステムにも応用できる力が身につきます。

本記事で学ぶこと

項目	内容
pandas でのExcel読込	`pd.read_excel` でデータをDataFrameに変換する
データのバリデーション	Excel内のデータに不整合がないか検証する
DBへの一括登録	DataFrameのレコードをループでINSERTする
openpyxl での書き出し	DBのデータをExcelファイルとして出力する
ファイルアップロード	Flaskでファイルを受け取り安全に保存する
ファイルダウンロード	生成したExcelをブラウザからダウンロードさせる

pandas と openpyxl の役割分担

Excel処理にはpandasとopenpyxlの2つのライブラリを使います。
それぞれ得意分野が異なります。

ライブラリ	得意なこと	使う場面
pandas	データの読込・変換・集計	Excelインポート（読込→DB登録）
openpyxl	セルの書式設定・レイアウト調整	Excel出力（DB→帳票作成）

Excelインポート（取込）
  Excel → pandas.read_excel → DataFrame → DB登録
    pandasが得意：列名マッピング・型変換・欠損値処理

Excelエクスポート（出力）
  DB → データ取得 → openpyxl で整形 → Excelダウンロード
    openpyxlが得意：ヘッダー装飾・列幅調整・罫線・シート名設定

Excelインポートの全体フロー

利用者がExcelファイルをアップロード
  ↓
Flask がファイルを受け取り、uploads/ に保存
  ↓
拡張子チェック（.xlsx / .csv のみ許可）
  ↓
pandas でExcelを読み込み → DataFrame
  ↓
列名のマッピング（日本語ヘッダー → 英語カラム名）
  ↓
行ごとにバリデーション
  ↓ エラーあり
エラー行と理由を一覧表示。DB登録はしない
  ↓ エラーなし
DBにINSERT（重複はスキップ）
  ↓
結果メッセージ表示（登録件数 / スキップ件数）

実装：excel_service.py（インポート）

# services/excel_service.py
import os
import pandas as pd
from openpyxl import Workbook
from openpyxl.styles import Font, Alignment, PatternFill, Border, Side
from config import Config
from database import get_db


def allowed_file(filename: str) -> bool:
    """アップロードファイルの拡張子チェック"""
    return (
        "." in filename
        and filename.rsplit(".", 1)[1].lower() in Config.ALLOWED_EXTENSIONS
    )


# ============================================================
# 社員台帳インポート
# ============================================================

def import_employee_excel(file_path: str) -> dict:
    """
    社員台帳Excelを読み込んでDBに一括登録する

    想定するExcelの列構成:
        A列: 社員番号
        B列: 氏名
        C列: 部署
        D列: 役職
        E列: 入社日

    Returns:
        {
            "success_count": 登録成功件数,
            "skip_count":    スキップ件数（重複など）,
            "errors":        [エラーメッセージのリスト]
        }
    """
    result = {"success_count": 0, "skip_count": 0, "errors": []}

    try:
        # pandasでExcel読込
        df = pd.read_excel(file_path, header=0)
    except Exception as e:
        result["errors"].append(f"Excelファイルの読込に失敗しました: {str(e)}")
        return result

    # 列数チェック
    if len(df.columns) < 5:
        result["errors"].append(
            f"列数が不足しています（5列必要ですが{len(df.columns)}列しかありません）"
        )
        return result

    # 列名を英語にマッピング
    df.columns = ["employee_no", "name", "department", "position", "hire_date"]

    # 入社日を文字列に変換（NaT対策）
    df["hire_date"] = pd.to_datetime(df["hire_date"], errors="coerce")
    df["hire_date"] = df["hire_date"].apply(
        lambda x: x.strftime("%Y-%m-%d") if pd.notna(x) else None
    )

    conn = get_db()

    for idx, row in df.iterrows():
        row_num = idx + 2  # Excelの行番号（ヘッダー=1行目）

        # 必須チェック
        employee_no = str(row["employee_no"]).strip() if pd.notna(row["employee_no"]) else ""
        name = str(row["name"]).strip() if pd.notna(row["name"]) else ""

        if not employee_no:
            result["errors"].append(f"{row_num}行目: 社員番号が空です")
            continue
        if not name:
            result["errors"].append(f"{row_num}行目: 氏名が空です")
            continue

        # 重複チェック
        existing = conn.execute(
            "SELECT id FROM employee WHERE employee_no = ?",
            (employee_no,)
        ).fetchone()
        if existing:
            result["skip_count"] += 1
            continue

        # DB登録
        conn.execute(
            """
            INSERT INTO employee (employee_no, name, department, position, hire_date)
            VALUES (?, ?, ?, ?, ?)
            """,
            (
                employee_no,
                name,
                str(row["department"]).strip() if pd.notna(row["department"]) else None,
                str(row["position"]).strip() if pd.notna(row["position"]) else None,
                row["hire_date"],
            )
        )
        result["success_count"] += 1

    conn.commit()
    conn.close()
    return result

# ============================================================
# 研修結果インポート
# ============================================================

def import_training_excel(file_path: str, training_id: int) -> dict:
    """
    研修結果Excelを読み込んでDBに一括登録する

    想定するExcelの列構成:
        A列: 社員番号
        B列: 氏名（参考表示用・DB登録には使わない）
        C列: 出席率
        D列: 理解度
        E列: 課題スコア

    Returns:
        {
            "success_count": 登録成功件数,
            "skip_count":    スキップ件数,
            "errors":        [エラーメッセージのリスト]
        }
    """
    result = {"success_count": 0, "skip_count": 0, "errors": []}

    try:
        df = pd.read_excel(file_path, header=0)
    except Exception as e:
        result["errors"].append(f"Excelファイルの読込に失敗しました: {str(e)}")
        return result

    if len(df.columns) < 5:
        result["errors"].append(
            f"列数が不足しています（5列必要ですが{len(df.columns)}列しかありません）"
        )
        return result

    df.columns = ["employee_no", "name", "attendance_rate",
                  "understanding_level", "report_score"]

    conn = get_db()

    for idx, row in df.iterrows():
        row_num = idx + 2

        employee_no = str(row["employee_no"]).strip() if pd.notna(row["employee_no"]) else ""
        if not employee_no:
            result["errors"].append(f"{row_num}行目: 社員番号が空です")
            continue

        # 社員IDを取得
        emp = conn.execute(
            "SELECT id FROM employee WHERE employee_no = ?",
            (employee_no,)
        ).fetchone()
        if emp is None:
            result["errors"].append(
                f"{row_num}行目: 社員番号 {employee_no} はシステムに登録されていません"
            )
            continue

        employee_id = emp["id"]

        # 数値バリデーション
        try:
            attendance = float(row["attendance_rate"]) if pd.notna(row["attendance_rate"]) else 0.0
            understanding = int(row["understanding_level"]) if pd.notna(row["understanding_level"]) else 0
            report = int(row["report_score"]) if pd.notna(row["report_score"]) else 0
        except (ValueError, TypeError):
            result["errors"].append(f"{row_num}行目: 数値が不正です")
            continue

        # 既存レコードがあれば更新、なければ登録
        existing = conn.execute(
            """
            SELECT id FROM training_history
            WHERE employee_id = ? AND training_id = ?
            """,
            (employee_id, training_id)
        ).fetchone()

        if existing:
            conn.execute(
                """
                UPDATE training_history
                SET attendance_rate = ?, understanding_level = ?, report_score = ?
                WHERE id = ?
                """,
                (attendance, understanding, report, existing["id"])
            )
        else:
            conn.execute(
                """
                INSERT INTO training_history
                    (employee_id, training_id, attendance_rate,
                     understanding_level, report_score)
                VALUES (?, ?, ?, ?, ?)
                """,
                (employee_id, training_id, attendance, understanding, report)
            )

        result["success_count"] += 1

    conn.commit()
    conn.close()
    return result

実装：excel_service.py（エクスポート）

# ============================================================
# 評価シート出力
# ============================================================

def export_evaluation_excel(training_id: int) -> str:
    """
    研修の評価データをExcelファイルに出力する

    出力内容:
        社員番号 | 氏名 | 部署 | 出席率 | 理解度 | 課題スコア | 総合評価

    Returns:
        生成したExcelファイルのパス
    """
    conn = get_db()
    rows = conn.execute(
        """
        SELECT
            e.employee_no,
            e.name,
            e.department,
            th.attendance_rate,
            th.understanding_level,
            th.report_score
        FROM training_history th
        JOIN employee e ON th.employee_id = e.id
        WHERE th.training_id = ?
        ORDER BY e.employee_no
        """,
        (training_id,)
    ).fetchall()

    training = conn.execute(
        "SELECT training_name FROM training WHERE id = ?",
        (training_id,)
    ).fetchone()
    conn.close()

    training_name = training["training_name"] if training else "研修"

    # openpyxl でワークブック作成
    wb = Workbook()
    ws = wb.active
    ws.title = "評価シート"

    # --- ヘッダーのスタイル定義 ---
    header_font = Font(bold=True, size=11)
    header_fill = PatternFill(
        start_color="4472C4", end_color="4472C4", fill_type="solid"
    )
    header_font_white = Font(bold=True, size=11, color="FFFFFF")
    thin_border = Border(
        left=Side(style="thin"),
        right=Side(style="thin"),
        top=Side(style="thin"),
        bottom=Side(style="thin")
    )

    # --- タイトル行 ---
    ws.merge_cells("A1:G1")
    ws["A1"] = f"評価シート：{training_name}"
    ws["A1"].font = Font(bold=True, size=14)

    # --- ヘッダー行 ---
    headers = ["社員番号", "氏名", "部署", "出席率(%)",
               "理解度", "課題スコア", "総合評価"]
    for col, header in enumerate(headers, start=1):
        cell = ws.cell(row=3, column=col, value=header)
        cell.font = header_font_white
        cell.fill = header_fill
        cell.border = thin_border
        cell.alignment = Alignment(horizontal="center")

    # --- データ行 ---
    for row_idx, row in enumerate(rows, start=4):
        data = dict(row)
        # 総合評価 = (出席率 + 理解度 + 課題スコア) / 3
        total = round(
            (data["attendance_rate"] + data["understanding_level"]
             + data["report_score"]) / 3, 1
        )

        values = [
            data["employee_no"],
            data["name"],
            data["department"] or "―",
            data["attendance_rate"],
            data["understanding_level"],
            data["report_score"],
            total,
        ]
        for col, value in enumerate(values, start=1):
            cell = ws.cell(row=row_idx, column=col, value=value)
            cell.border = thin_border
            if col >= 4:  # 数値列は右寄せ
                cell.alignment = Alignment(horizontal="right")

    # --- 列幅調整 ---
    column_widths = [12, 14, 12, 12, 10, 12, 10]
    for col, width in enumerate(column_widths, start=1):
        ws.column_dimensions[chr(64 + col)].width = width

    # ファイル保存
    output_path = os.path.join(
        Config.OUTPUT_FOLDER,
        f"evaluation_{training_id}.xlsx"
    )
    wb.save(output_path)
    return output_path

実装：app.py（インポート・エクスポートルーティング）

# app.py（Excel インポート・エクスポート部分）
import os
from flask import send_file
from werkzeug.utils import secure_filename
from services.excel_service import (
    allowed_file, import_employee_excel,
    import_training_excel, export_evaluation_excel
)
from services.training_service import get_all_trainings


@app.route("/import", methods=["GET", "POST"])
@login_required
@roles_required("admin", "staff")
def import_excel():
    """Excelインポート画面"""
    trainings = get_all_trainings()  # 研修結果インポート用のプルダウン

    if request.method == "POST":
        # ファイルの存在チェック
        if "file" not in request.files:
            flash("ファイルが選択されていません", "error")
            return render_template("import.html", trainings=trainings)

        file = request.files["file"]
        if file.filename == "":
            flash("ファイルが選択されていません", "error")
            return render_template("import.html", trainings=trainings)

        # 拡張子チェック
        if not allowed_file(file.filename):
            flash("許可されていないファイル形式です（.xlsx または .csv のみ）", "error")
            return render_template("import.html", trainings=trainings)

        # ファイルを安全に保存
        filename = secure_filename(file.filename)
        file_path = os.path.join(app.config["UPLOAD_FOLDER"], filename)
        file.save(file_path)

        # インポート種別に応じて処理を分岐
        import_type = request.form.get("import_type", "employee")

        if import_type == "employee":
            result = import_employee_excel(file_path)
        elif import_type == "training":
            training_id = request.form.get("training_id")
            if not training_id:
                flash("研修を選択してください", "error")
                return render_template("import.html", trainings=trainings)
            result = import_training_excel(file_path, int(training_id))
        else:
            flash("不正なインポート種別です", "error")
            return render_template("import.html", trainings=trainings)

        # アップロードファイルを削除（一時ファイルのため）
        os.remove(file_path)

        # 結果メッセージ
        if result["errors"]:
            for err in result["errors"]:
                flash(err, "error")
        if result["success_count"] > 0:
            flash(f"{result['success_count']}件を登録しました", "success")
        if result["skip_count"] > 0:
            flash(f"{result['skip_count']}件をスキップしました（重複）", "info")

        return render_template("import.html", trainings=trainings)

    return render_template("import.html", trainings=trainings)


@app.route("/export/<int:training_id>")
@login_required
@roles_required("admin", "staff")
def export_excel(training_id):
    """評価シートExcelダウンロード"""
    output_path = export_evaluation_excel(training_id)
    return send_file(
        output_path,
        as_attachment=True,
        download_name=f"evaluation_{training_id}.xlsx",
        mimetype="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    )

secure_filename はファイル名から危険な文字（../ など）を除去します。
ユーザーが送信したファイル名をそのまま使うと、ディレクトリトラバーサル攻撃を受ける可能性があります。

実装：テンプレート（import.html）

{% extends "base.html" %}
{% block title %}Excelインポート - AI人事・研修システム{% endblock %}

{% block content %}
<div class="page-header">
  <h1>📥 Excelインポート</h1>
</div>

<div class="form-card">
  <form method="POST" enctype="multipart/form-data">

    <!-- インポート種別 -->
    <div class="form-group">
      <label>インポート種別 <span class="required">*</span></label>
      <div class="radio-group">
        <label>
          <input type="radio" name="import_type" value="employee" checked
                 onchange="document.getElementById('training-select').style.display='none'">
          社員台帳
        </label>
        <label>
          <input type="radio" name="import_type" value="training"
                 onchange="document.getElementById('training-select').style.display='block'">
          研修結果
        </label>
      </div>
    </div>

    <!-- 研修選択（研修結果の場合のみ表示） -->
    <div class="form-group" id="training-select" style="display:none;">
      <label for="training_id">対象研修</label>
      <select name="training_id" id="training_id">
        <option value="">-- 研修を選択 --</option>
        {% for tr in trainings %}
          <option value="{{ tr.id }}">{{ tr.training_name }}</option>
        {% endfor %}
      </select>
    </div>

    <!-- ファイル選択 -->
    <div class="form-group">
      <label for="file">Excelファイル <span class="required">*</span></label>
      <input type="file" id="file" name="file"
             accept=".xlsx,.csv" required>
      <small class="text-muted">
        対応形式：.xlsx / .csv（最大16MB）
      </small>
    </div>

    <button type="submit" class="btn btn-primary">インポート実行</button>
  </form>
</div>

<!-- Excelフォーマット説明 -->
<div class="info-card">
  <h3>📋 社員台帳Excelのフォーマット</h3>
  <table class="table table-compact">
    <thead>
      <tr><th>A列</th><th>B列</th><th>C列</th><th>D列</th><th>E列</th></tr>
    </thead>
    <tbody>
      <tr><td>社員番号</td><td>氏名</td><td>部署</td><td>役職</td><td>入社日</td></tr>
      <tr><td>EMP001</td><td>山田太郎</td><td>人事部</td><td>主任</td><td>2020/4/1</td></tr>
    </tbody>
  </table>

  <h3>📋 研修結果Excelのフォーマット</h3>
  <table class="table table-compact">
    <thead>
      <tr><th>A列</th><th>B列</th><th>C列</th><th>D列</th><th>E列</th></tr>
    </thead>
    <tbody>
      <tr><td>社員番号</td><td>氏名</td><td>出席率</td><td>理解度</td><td>課題スコア</td></tr>
      <tr><td>EMP001</td><td>山田太郎</td><td>90</td><td>85</td><td>95</td></tr>
    </tbody>
  </table>
</div>
{% endblock %}

ファイルアップロードのセキュリティ

ファイルアップロード機能にはセキュリティリスクが伴います。
本システムでは以下の対策を行っています。

対策	実装方法	防ぐ攻撃
拡張子チェック	`allowed_file()` で `.xlsx` / `.csv` のみ許可	悪意あるファイルの実行
ファイル名のサニタイズ	`secure_filename()` で危険な文字を除去	ディレクトリトラバーサル
ファイルサイズ制限	`MAX_CONTENT_LENGTH = 16MB`	DoS攻撃（巨大ファイル送信）
一時ファイルの削除	処理完了後に `os.remove()`	ディスク圧迫・情報漏洩

enctype="multipart/form-data" をフォームタグに付け忘れると、ファイルが送信されません。
ファイルアップロードフォームでは必ず設定してください。

動作確認

手順

# Flaskアプリを起動
python app.py

# ブラウザで http://localhost:5000/import にアクセス

テスト用Excelファイルの作成

動作確認用のExcelファイルを用意します。

# create_test_excel.py
"""テスト用Excelファイルを作成するスクリプト"""
import pandas as pd

# 社員台帳テスト
employee_data = {
    "社員番号": ["EMP001", "EMP002", "EMP003"],
    "氏名": ["山田太郎", "鈴木花子", "田中一郎"],
    "部署": ["人事部", "研修部", "営業部"],
    "役職": ["主任", "担当", "係長"],
    "入社日": ["2020-04-01", "2021-04-01", "2019-04-01"],
}
df_emp = pd.DataFrame(employee_data)
df_emp.to_excel("test_employee.xlsx", index=False)
print("test_employee.xlsx を作成しました")

# 研修結果テスト
training_data = {
    "社員番号": ["EMP001", "EMP002", "EMP003"],
    "氏名": ["山田太郎", "鈴木花子", "田中一郎"],
    "出席率": [100, 90, 85],
    "理解度": [85, 70, 75],
    "課題スコア": [95, 80, 70],
}
df_tr = pd.DataFrame(training_data)
df_tr.to_excel("test_training_result.xlsx", index=False)
print("test_training_result.xlsx を作成しました")

確認ポイント

操作	期待する動作
社員台帳Excelをアップロード	「3件を登録しました」と表示される
同じExcelを再アップロード	「3件をスキップしました（重複）」と表示される
社員番号が空の行を含むExcel	「X行目: 社員番号が空です」エラーが表示される
.txt ファイルをアップロード	「許可されていないファイル形式です」と表示される
研修結果Excelをアップロード	受講履歴が登録・更新される
評価シートExcel出力リンクを押す	Excelがダウンロードされる
ダウンロードしたExcelを開く	ヘッダーに色・罫線がついた帳票が表示される

今後の連載予定

回	タイトル	主な内容
第1回	業務システム全体設計編	システム概要・技術選定・アーキテクチャ
第2回	要求定義・要件定義編	業務分析・機能要件・非機能要件
第3回	ER図・画面設計編	テーブル設計・画面遷移図・ワイヤーフレーム
第4回	Flaskログイン機能編	セッション・パスワードハッシュ・認証ミドルウェア
第5回	社員管理CRUD編	一覧・登録・更新・削除・バリデーション
第6回	研修管理編	リレーション・集計・出席率自動計算
第7回	Excel業務自動化編（本記事）	pandas取込・openpyxl出力・テンプレート活用
第8回	Dify API連携編	プロンプト設計・API呼び出し・エラーハンドリング
第9回	RAG構築編	Knowledgeへの登録・Embedding・検索精度改善
第10回	FAQチャットボット編	チャットUI・会話履歴・ストリーミングレスポンス
第11回	GitHubチーム開発編	ブランチ戦略・Pull Request・コードレビュー
第12回	テスト・発表編	pytest・テスト設計・デモ・振り返り

おわりに

第7回では、pandasによるExcelインポートとopenpyxlによるExcelエクスポートを実装しました。

ポイントを振り返ります。

pandas読込：pd.read_excel でDataFrameに変換し、列名マッピング・型変換を行った
行単位のバリデーション：必須チェック・重複チェック・数値チェックを行単位で実施し、エラー行を特定した
UPSERT処理：研修結果インポートでは既存データがあれば更新、なければ新規登録するパターンを実装した
openpyxl出力：ヘッダー装飾・罫線・列幅調整を施した実務レベルの帳票を生成した
セキュリティ：拡張子チェック・ファイル名サニタイズ・サイズ制限・一時ファイル削除を実施した

次回は**「Dify API連携編」**として、受講者の評価データをDify APIに送信し、AI評価コメントを自動生成する機能を実装します。

次回 : Python × Dify × RAGで学ぶ業務システム開発入門【第8回 Dify API連携編】

参考リンク

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up