
Fetching Annual Securities Reports (有価証券報告書) with Python, 2024: FSA API Edition

Posted at 2019-07-07

Introduction

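Most of my IT career has been in the finance industry. One thing that always bugged me was that I had no personal work to show. You naturally sign an NDA (non-disclosure agreement), so you can't take anything home, and you can't point at something and say "this is my code!" And it's tough, right, writing programs after you get home. Still, lately I felt I really had to get moving, so I kept thinking about what to build, came up empty, thought again, came up empty again... Finally I decided to start with what I could do: working with publicly available corporate financial data. While digging around I found something that fetches annual securities reports, and I thought, "Oh, I've handled those before!" But it was written in R, and when I looked more closely there was a banner hanging there saying "It probably won't work if you just copy and paste it. Figure out which parts to change yourself." Well, I shouted "Chikisho! :muscle::fire:" like Koume Dayu and got a wild idea (in the good, shot-in-the-arm sense).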

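I've picked up a lot more skill since I first wrote this, so I'm finishing it off as an API version (DDD, to the point where the original is barely recognizable). I'm writing it as if grafting it onto my portfolio, so it could be much slimmer if you kept only the securities-report features, but I'll leave that for you readers to work out.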

https://github.com/duri0214/portfolio

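References

I was about to buy this book

Normally I'd have clicked "buy" right away, but here too I got a wild idea: if the code in front of me is exactly what the book builds, then "translating" that code is the more rational choice, both for experience points and for my wallet. The ability to read R and the ability to convert it to Python: what I got out of it was bigger than I expected. Compared with writing well-behaved programs while following a textbook, the experience you gain from translating a program someone wrote and threw away (?) has a lot more flavor.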

Creating the app

console
portfolio> python manage.py startapp securities
config/settings.py
INSTALLED_APPS = [
        :
+   "securities",
]
config/urls.py
urlpatterns = [
        :
+   path("securities/", include("securities.urls")),
    path("admin/", admin.site.urls),
    path("accounts/", include("django.contrib.auth.urls")),
] + static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
securities/urls.py(new)
from django.urls import path

from securities.views import IndexView

app_name = "securities"
urlpatterns = [
    path("", IndexView.as_view(), name="index"),
]

securities/views.py (replace the whole file)
from django.shortcuts import render
from django.views.generic import TemplateView

from securities.domain.service.xbrl import XbrlService


class IndexView(TemplateView):
    template_name = "securities/report/index.html"

    def get(self, request, *args, **kwargs):
        xbrl_service = XbrlService()

        return render(request, self.template_name, xbrl_service.to_dict())

securities/domain/__init__.py(new)
securities/domain/repository/__init__.py(new)
securities/domain/service/__init__.py(new)
securities/domain/valueobject/__init__.py(new)
securities/domain/service/xbrl.py(new)
class XbrlService:

    def to_dict(self, **kwargs):
        # placeholder for now; return an empty context so the template still renders
        return {}

securities/templates/securities/base.html(new)
{% load static %}
<!DOCTYPE html>
<html lang="ja">
<head>
    <!-- Global site tag (gtag.js) - Google Analytics -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=UA-43097095-9"></script>
    <script>
        window.dataLayer = window.dataLayer || [];

        function gtag() {
            dataLayer.push(arguments);
        }

        gtag('js', new Date());
        gtag('config', 'UA-43097095-9');
    </script>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">

    <title>有価証券報告書ビューア</title>

    <!-- bootstrap and css -->
    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.2.1/css/bootstrap.min.css"
          integrity="sha384-GJzZqFGwb1QTTN6wy59ffF1BuGJpLSa9DkKMp0DgiMDm4iYMj70gZWKYbI706tWS" crossorigin="anonymous">
    <link rel="stylesheet" href="{% static 'securities/css/base.css' %}">

    <!-- favicon -->
    <link rel="shortcut icon" href="{% static 'securities/s_s.ico' %}">

    <!-- for ajax -->
    <script>let myurl = {"base": "{% url 'vnm:index' %}", "login": "{% url 'login' %}"};</script>
</head>
<body>
<h1></h1>
<header>
    <nav class="navbar fixed-top navbar-expand-lg navbar-light bg-light">
        <a class="navbar-brand" href="#">Henojiya</a>
        <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent"
                aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
            <span class="navbar-toggler-icon"></span>
        </button>
        <div class="collapse navbar-collapse" id="navbarSupportedContent">
            <ul class="navbar-nav mr-auto">
                {% if user.is_authenticated %}
                    <li class="nav-link">{{ user.username }}さん</li>
                {% else %}
                    <li class="nav-link">ゲストさん</li>
                {% endif %}
                {% if user.is_authenticated %}
                    <li class="nav-link"><a href="{% url 'logout' %}">LOGOUT</a></li>
                {% else %}
                    <li class="nav-link"><a href="{% url 'login' %}">LOGIN</a></li>
                {% endif %}
                <select class="select2-1" onChange="location.href=value;">
                    <option></option>
                    <option value="{% url 'vnm:index' %}" selected>VIETNAM</option>
                    <option value="{% url 'mrk:index' %}">GMARKER</option>
                    <option value="{% url 'shp:index' %}">SHOPPING</option>
                    <option value="{% url 'war:index' %}">WAREHOUSE</option>
                    <option value="{% url 'txo:index' %}">TAXONOMY</option>
                    <option value="{% url 'soil:home' %}">SOIL ANALYSIS</option>
                    <option value="{% url 'securities:index' %}">SECURITIES REPORT</option>
                </select>
            </ul>
            <form class="form-inline my-2 my-lg-0">
                <input class="form-control mr-sm-2" type="search" placeholder="alt + / で検索" aria-label="Search"
                       accesskey="/">
                <button class="btn btn-outline-success my-2 my-sm-0" type="submit">Search</button>
            </form>
        </div>
    </nav>
    {% block header %}{% endblock %}
</header>

<div id="main">
    {% block content %}{% endblock %}
</div>
<footer>
    <p>© 2019 henojiya. / <a href="https://github.com/duri0214" target="_blank">github portfolio</a></p>
</footer>

<!-- Optional JavaScript -->
<!-- jQuery first, then Popper.js, then Bootstrap JS -->
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"
        integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo"
        crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.6/umd/popper.min.js"
        integrity="sha384-wHAiFfRlMFy6i5SRaxvfOCifBUQy1xHdJ/yoi7FRNXMRBu5WHdZYu1hA6ZOblgut"
        crossorigin="anonymous"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.2.1/js/bootstrap.min.js"
        integrity="sha384-B0UglyR+jN6CkvvICOB2joaf5I4l3gm9GU6Hc1og6Ls7i6U/mkkaduKaBhlAXv9k"
        crossorigin="anonymous"></script>

<!-- for select2 -->
<link href="https://cdn.jsdelivr.net/npm/select2@4.1.0-rc.0/dist/css/select2.min.css" rel="stylesheet"/>
<script src="https://cdn.jsdelivr.net/npm/select2@4.1.0-rc.0/dist/js/select2.min.js"></script>
<script>
    $(function () {
        $('.select2-1').select2({
            // Set the placeholder text for the control.
            placeholder: 'Please Select',
        });
    });
</script>
<link rel="stylesheet" href="{% static 'securities/css/index_select2.css' %}">
</body>
</html>

securities/templates/securities/report/index.html
{% extends "securities/base.html" %}
{% load static %}
{% load humanize %}
{% block content %}
    <div class="jumbotron">
        <h1 class="display-4">Let's analyze Securities Report!</h1>
        <p class="lead">it's interesting Securities Report</p>
        <hr class="my-4">
        <p>You can read the Securities Report</p>
    </div>

    <div class="container">
        <p>hello world</p>
    </div>
{% endblock %}

securities/static/securities/css/base.css
body {
    padding-top: 48px;
}

.footer {
    position: sticky;
    margin-top: 20px;
    bottom: 0;
    width: 100%;
    /* Set the fixed height of the footer here */
    height: 30px;
    background-color: #f5f5f5;
}

body > .container {
    padding: 60px 15px 0;
}

.footer > .container {
    padding-right: 15px;
    padding-left: 15px;
}

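Creating Company (the company master)

Building the upload mechanism

Download the EDINET code list by hand from the link at the top right of EDINET's securities-report search screen, then build a mechanism to upload it.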

In finance-speak, this is what you'd call the "lineup" (the roster of companies).

lib/zipfileservice.py
import zipfile
from pathlib import Path

from django.conf import settings
from django.core.files.uploadedfile import InMemoryUploadedFile


class ZipFileService:
    @staticmethod
    def handle_uploaded_zip(file: InMemoryUploadedFile, app_name: str) -> Path:
        """
        Save the uploaded file to the temporary folder media/{app_name} and extract it there
        Args:
            file: the file received from the request
            app_name: app name
        Returns:
            the folder the zip was extracted into
        """
        # Prepare the extraction folder
        upload_folder = Path(settings.MEDIA_ROOT) / app_name
        upload_folder.mkdir(parents=True, exist_ok=True)

        # Save the file
        destination_zip_path = upload_folder / "uploaded.zip"
        with destination_zip_path.open("wb+") as z:
            for chunk in file.chunks():
                z.write(chunk)

        # Extract the file
        with zipfile.ZipFile(destination_zip_path) as z:
            for info in z.infolist():
                info.filename = ZipFileService._convert_to_cp932(info.filename)
                z.extract(info, path=str(upload_folder))

        return upload_folder

    @staticmethod
    def _convert_to_cp932(folder_name: str) -> str:
        """
        Zip files created on Windows store names in cp932, which zipfile decodes
        as cp437 and garbles, so re-decode them as cp932

        See Also: https://qiita.com/tohka383/items/b72970b295cbc4baf5ab
        """
        return folder_name.encode("cp437").decode("cp932")
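The cp437 → cp932 conversion above assumes every member name was written in cp932, which is harmless for the ASCII file name inside the EDINET code list zip. If other zips might come through here, a slightly more defensive sketch checks the ZIP UTF-8 flag first (bit 11 of the general-purpose flags, exposed as `ZipInfo.flag_bits` in Python's `zipfile`) and only re-decodes names that were not flagged as UTF-8. `fix_zip_filename` is a hypothetical helper, not part of the original code.

```python
UTF8_FLAG = 0x800  # bit 11 of the ZIP general-purpose flags: "file name is UTF-8"


def fix_zip_filename(filename: str, flag_bits: int) -> str:
    """Undo cp437 mojibake on a zip member name, but only when the UTF-8 flag is off."""
    if flag_bits & UTF8_FLAG:
        # zipfile already decoded the name as UTF-8; leave it alone
        return filename
    try:
        # zipfile decoded the raw bytes as cp437; recover them and re-decode as cp932
        return filename.encode("cp437").decode("cp932")
    except UnicodeError:
        # not recoverable as cp932 (some other legacy encoding); keep the name as-is
        return filename


# simulate what zipfile hands back for a cp932 name written by a Windows tool
original = "有価証券報告書.csv"
garbled = original.encode("cp932").decode("cp437")

print(garbled == original)                          # False
print(fix_zip_filename(garbled, 0))                 # 有価証券報告書.csv
print(fix_zip_filename("EdinetcodeDlInfo.csv", 0))  # EdinetcodeDlInfo.csv
print(fix_zip_filename(original, UTF8_FLAG))        # 有価証券報告書.csv
```

Inside `handle_uploaded_zip`, the loop would then call `fix_zip_filename(info.filename, info.flag_bits)` in place of `_convert_to_cp932`.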

securities/domain/service/upload.py
import shutil
from pathlib import Path

from django.core.management import call_command

from lib.zipfileservice import ZipFileService


class UploadService:
    def __init__(self, request):
        self.request = request
        self.app_name = request.resolver_match.app_name

    def upload(self):
        upload_folder = ZipFileService.handle_uploaded_zip(
            self.request.FILES["file"], self.app_name
        )
        self.execute_command_and_cleanup(upload_folder)

    @staticmethod
    def execute_command_and_cleanup(upload_folder: Path):
        if upload_folder.exists():
            try:
                call_command("import_edinet_code", str(upload_folder))
            finally:
                # remove the temporary folder even if the import fails
                shutil.rmtree(upload_folder)

securities/views.py
from django.shortcuts import render
from django.urls import reverse_lazy
from django.views.generic import TemplateView, FormView

from securities.domain.service.upload import UploadService
from securities.forms import UploadForm


class IndexView(TemplateView):
    template_name = "securities/report/index.html"

    def get(self, request, *args, **kwargs):
        # xbrl_service = XbrlService()
        d = {}  # xbrl_service.to_dict()

        return render(request, self.template_name, d)


class EdinetCodeUploadView(FormView):
    template_name = "securities/edinet_code_upload/form.html"
    form_class = UploadForm
    success_url = reverse_lazy("securities:edinet_code_upload_success")

    def form_valid(self, form):
        service = UploadService(self.request)
        service.upload()
        return super().form_valid(form)


class EdinetCodeUploadSuccessView(TemplateView):
    template_name = "securities/edinet_code_upload/success.html"

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        return context

securities/urls.py
from django.urls import path

from securities.views import (
    IndexView,
    EdinetCodeUploadView,
    EdinetCodeUploadSuccessView,
)

app_name = "securities"
urlpatterns = [
    path("", IndexView.as_view(), name="index"),
    path(
        "edinet_code_upload/upload",
        EdinetCodeUploadView.as_view(),
        name="edinet_code_upload",
    ),
    path(
        "edinet_code_upload/success",
        EdinetCodeUploadSuccessView.as_view(),
        name="edinet_code_upload_success",
    ),
]

securities/forms.py
from django import forms
from django.forms import ClearableFileInput


class UploadForm(forms.Form):
    file = forms.FileField(widget=ClearableFileInput(attrs={"class": "form-control"}))
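`UploadForm` accepts any file, so an unrelated upload only fails later, inside `import_edinet_code`. As a hedged sketch, a small check like the hypothetical `looks_like_edinet_code_zip` below could run first. It only assumes what the import command itself relies on: the archive contains `EdinetcodeDlInfo.csv` at its top level.

```python
import io
import zipfile

EXPECTED_CSV = "EdinetcodeDlInfo.csv"  # the file import_edinet_code reads


def looks_like_edinet_code_zip(data: bytes) -> bool:
    """Return True if data is a zip archive with EdinetcodeDlInfo.csv at its top level."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as z:
        # exact match: the import command looks for the CSV directly in the extracted folder
        return EXPECTED_CSV in z.namelist()


# build a tiny archive in memory to try it out
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("EdinetcodeDlInfo.csv", "dummy")
print(looks_like_edinet_code_zip(buffer.getvalue()))  # True
print(looks_like_edinet_code_zip(b"not a zip"))       # False
```

One place to call it would be a `clean_file` method on `UploadForm` that raises `forms.ValidationError` when it returns False; that wiring is not shown here.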

securities/templates/securities/report/index.html
{% extends "securities/base.html" %}
{% load static %}
{% load humanize %}
{% block content %}
    <div class="jumbotron">
        <h1 class="display-4">Let's analyze Securities Report!</h1>
        <p class="lead">it's interesting Securities Report</p>
        <hr class="my-4">
        <p>You can read the Securities Report</p>
    </div>

    <div class="container">
        <p>hello world</p>
+       <a class="btn btn-outline-primary mb-3" href="{% url 'securities:edinet_code_upload' %}"
+          role="button">EDINETコードリスト取り込み</a>
    </div>
{% endblock %}

securities/templates/securities/edinet_code_upload/form.html
{% extends "securities/base.html" %}
{% load static %}
{% block header %}
    <nav style="--bs-breadcrumb-divider: '>';" aria-label="breadcrumb">
        <ol class="breadcrumb">
            <li class="breadcrumb-item"><a href="{% url 'securities:index' %}">Home</a></li>
            <li class="breadcrumb-item active" aria-current="page">Upload EDINETコードリスト</li>
        </ol>
    </nav>
{% endblock %}
{% block content %}
    <div class="container">
        <h1>EDINETコードリストのアップロード</h1>
        <p>EDINETコードリスト(Edinetcode_yyyymmdd.zip) を <a
                href="https://disclosure2.edinet-fsa.go.jp/weee0010.aspx#TXT_TITLE_CODE"
                target="_blank">ダウンロード</a>
            してアップロードしてください</p>
        <form method="post" enctype="multipart/form-data">
            {% csrf_token %}
            {{ form.as_p }}
            <button class="btn btn-outline-primary mb-3" type="submit">Upload</button>
        </form>
    </div>
{% endblock %}

securities/templates/securities/edinet_code_upload/success.html
{% extends "securities/base.html" %}
{% load static %}
{% block header %}
    <nav style="--bs-breadcrumb-divider: '>';" aria-label="breadcrumb">
        <ol class="breadcrumb">
            <li class="breadcrumb-item"><a href="{% url 'securities:index' %}">Home</a></li>
            <li class="breadcrumb-item"><a href="{% url 'securities:edinet_code_upload' %}">Upload
                EDINETコードリスト</a>
            </li>
            <li class="breadcrumb-item active" aria-current="page">Upload success</li>
        </ol>
    </nav>
{% endblock %}
{% block content %}
    <div class="container">
        <h1>Upload Successful</h1>
        <p>アップロードが完了しました!</p>
    </div>
{% endblock %}

Checking it


Processing the uploaded CSV

securities/models.py
from django.db import models


class Company(models.Model):
    edinet_code = models.CharField("EDINETコード", max_length=6, null=True)
    type_of_submitter = models.CharField("提出者種別", max_length=30, null=True)
    listing_status = models.CharField("上場区分", max_length=3, null=True)
    consolidated_status = models.CharField("連結の有無", max_length=1, null=True)
    capital = models.IntegerField("資本金", null=True)
    end_fiscal_year = models.CharField("決算日", max_length=6, null=True)
    submitter_name = models.CharField("提出者名", max_length=100, null=True)
    submitter_name_en = models.CharField("提出者名(英字)", max_length=100, null=True)
    submitter_name_kana = models.CharField(
        "提出者名(ヨミ)", max_length=100, null=True
    )
    address = models.CharField("所在地", max_length=255, null=True)
    submitter_industry = models.CharField("提出者業種", max_length=25, null=True)
    securities_code = models.CharField("証券コード", max_length=5, null=True)
    corporate_number = models.CharField("提出者法人番号", max_length=13, null=True)

    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
    
console
python manage.py makemigrations securities
python manage.py migrate
securities/management/commands/import_edinet_code.py
from pathlib import Path

import pandas as pd
from django.core.management.base import BaseCommand
from django.db import transaction

from securities.models import Company


def na(value):
    """Convert pandas NaN to None so it is stored as NULL"""
    return value if pd.notna(value) else None


class Command(BaseCommand):
    help = "Import the EDINET code list from CSV"

    def add_arguments(self, parser):
        parser.add_argument(
            "folder_path", type=str, help="Folder path containing CSV file"
        )

    def handle(self, *args, **options):
        folder_path = options["folder_path"]
        filename = "EdinetcodeDlInfo.csv"
        file_path = Path(folder_path) / filename
        if not file_path.exists():
            raise FileNotFoundError(f"File does not exist: {file_path}")

        # Note: the first row holds metadata such as `ダウンロード実行日...`, so skip it
        # dtype=str keeps codes as strings (columns with blanks would otherwise become floats)
        df = pd.read_csv(
            file_path,
            skiprows=1,
            encoding="cp932",
            dtype={
                "連結の有無": str,
                "決算日": str,
                "証券コード": str,
                "提出者法人番号": str,
            },
        )
        # Build the rows from line 3 onward (line 2 is the header)
        edinet_list = []
        for _, row in df.iterrows():
            edinet_list.append(
                Company(
                    edinet_code=na(row["EDINETコード"]),
                    type_of_submitter=na(row["提出者種別"]),
                    listing_status=na(row["上場区分"]),
                    consolidated_status=na(row["連結の有無"]),
                    capital=(int(row["資本金"]) if pd.notna(row["資本金"]) else None),
                    end_fiscal_year=na(row["決算日"]),
                    submitter_name=na(row["提出者名"]),
                    submitter_name_en=na(row["提出者名(英字)"]),
                    submitter_name_kana=na(row["提出者名(ヨミ)"]),
                    address=na(row["所在地"]),
                    submitter_industry=na(row["提出者業種"]),
                    securities_code=na(row["証券コード"]),
                    corporate_number=na(row["提出者法人番号"]),
                )
            )

        # Replace the table in one transaction so a failed import keeps the old data
        with transaction.atomic():
            Company.objects.all().delete()
            Company.objects.bulk_create(edinet_list)

        self.stdout.write(
            self.style.SUCCESS("Successfully imported all edinet code from CSV")
        )

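Downloading the XBRL

  • EDINET API (Version 2) has been available since April 1, 2024
  • It's more convenient now, with additions such as CSV delivery, but usage is almost the same as Version 1
  • You need to register for authentication
  • start_date can be any date up to 5 years before the day the program runs
  • params specifies "type": 2 to get the submitted-document list together with the metadata (1 returns the metadata only); narrowing the list down to securities reports happens afterwards
  • The submitted documents come back as a list in results, so loop over results
  • For each submitted document in results, read ordinanceCode (ordinance code) and formCode (form code)
    • Only documents with ordinanceCode 010 and formCode 030000, i.e. annual securities reports, are processed
  • The value object for the securities report is named securities_report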
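The filtering described in the bullets above (keep only results whose ordinanceCode is 010 and formCode is 030000) boils down to a few lines over the raw JSON. A minimal sketch on a made-up, trimmed response; the function name is hypothetical, and the original code does the same check on `ResponseData._Result` objects rather than dicts.

```python
SECURITIES_REPORT = ("010", "030000")  # (ordinanceCode, formCode) of an annual securities report


def securities_report_doc_ids(response: dict) -> list[str]:
    """Collect docIDs of annual securities reports from a documents.json (type=2) response."""
    return [
        result["docID"]
        for result in response.get("results", [])
        if (result.get("ordinanceCode"), result.get("formCode")) == SECURITIES_REPORT
    ]


# made-up response trimmed to the fields used here
response = {
    "metadata": {"status": "200", "message": "OK"},
    "results": [
        {"docID": "S100AAAA", "ordinanceCode": "010", "formCode": "030000"},
        {"docID": "S100BBBB", "ordinanceCode": "010", "formCode": "030001"},  # amended report
        {"docID": "S100CCCC", "ordinanceCode": "010", "formCode": "999999"},  # some other form
    ],
}
print(securities_report_doc_ids(response))  # ['S100AAAA']
```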

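Getting an API key

D-download, huh...

  • Add https://api.edinet-fsa.go.jp to Chrome's popup allowlist
  • Go to the registration page
    • Sign up now
    • Enter your email address
    • Enter the verification code
    • Set a password (I created one with Chrome's password generator)
    • Enter a phone number for multi-factor authentication
    • Enter the verification code
  • A popup opens in a new window
    • Save your affiliation, name, and phone number (no hyphens) as contact details
    • Note the API key in .env
    • Close the popup

Huh? The screen went blank after I registered! If that happens, the popup blocker is kicking in...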

API endpoint

https://api.edinet-fsa.go.jp/api/v2/documents.json

Example: https://api.edinet-fsa.go.jp/api/v2/documents.json?date=2023-04-01&type=2&Subscription-Key=ZZZ…ZZZ
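To check a key in the browser, or to see roughly what `requests` sends when given `params=`, the example URL can be built with `urllib.parse.urlencode`. `documents_json_url` is a hypothetical helper, and `ZZZ` stands in for a real key.

```python
from urllib.parse import urlencode

DOCUMENTS_JSON = "https://api.edinet-fsa.go.jp/api/v2/documents.json"


def documents_json_url(date: str, api_key: str, info_type: int = 2) -> str:
    """Build the document-list request URL (type=2: document list plus metadata)."""
    query = urlencode({"date": date, "type": info_type, "Subscription-Key": api_key})
    return f"{DOCUMENTS_JSON}?{query}"


print(documents_json_url("2023-04-01", "ZZZ"))
# https://api.edinet-fsa.go.jp/api/v2/documents.json?date=2023-04-01&type=2&Subscription-Key=ZZZ
```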

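Request parameters

Document list API

| Parameter | Item | Required | Value | Description |
|---|---|---|---|---|
| date | File date | | YYYY-MM-DD | File date of the submitted-document list to output (a date within the last 10 years, up to today) |
| type | Information to retrieve | | 1 or 2 | 1: metadata only, 2: submitted-document list and metadata |
| Subscription-Key | API key | | API key | Used for EDINET API authentication |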

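Document retrieval API

| Parameter | Item | Required | Value | Description |
|---|---|---|---|---|
| type | Documents needed | | 1, 2, 3, 4, or 5 | 1: submitted main document, audit report, and XBRL; 2: PDF; 3: alternative documents and attachments; 4: English files; 5: CSV |
| Subscription-Key | API key | | API key | Used for EDINET API authentication |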
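The original code keeps only the value it needs in a module constant (`SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT = 1`). If you want all five `type` values of the document retrieval API in one place, an `IntEnum` is one option; the class and function names here are hypothetical. The explicit `int()` matters because older Python versions stringify `IntEnum` members as `DocumentType.PDF` rather than `2`.

```python
from enum import IntEnum
from urllib.parse import urlencode


class DocumentType(IntEnum):
    """Values of the `type` parameter of the document retrieval API (documents/{docID})."""

    MAIN_DOCUMENT_AUDIT_REPORT_AND_XBRL = 1
    PDF = 2
    ALTERNATIVE_AND_ATTACHMENTS = 3
    ENGLISH = 4
    CSV = 5


def document_url(doc_id: str, doc_type: DocumentType, api_key: str) -> str:
    """Build the retrieval URL for one document."""
    query = urlencode({"type": int(doc_type), "Subscription-Key": api_key})
    return f"https://api.edinet-fsa.go.jp/api/v2/documents/{doc_id}?{query}"


print(document_url("S100AAAA", DocumentType.MAIN_DOCUMENT_AUDIT_REPORT_AND_XBRL, "ZZZ"))
# https://api.edinet-fsa.go.jp/api/v2/documents/S100AAAA?type=1&Subscription-Key=ZZZ
```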

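Interface specification

Pasting the tables from the manual's PDF into Excel breaks them, so I tidied them up by hand and turned them into Markdown. Quietly painful, lol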

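| No. | Item name | Item ID | Type | Characters (digits) | Description |
|---|---|---|---|---|---|
| 1 | Metadata | metadata | object | - | Identifier of the metadata. |
| 2 | Title | title | string | Full/half-width (18) | The name of the API. |
| 3 | Parameter | parameter | object | - | Identifier of the request parameters. |
| 4 | File date | date | string | Half-width (10) | The specified file date, in YYYY-MM-DD format. |
| 5 | Information type | type | string | Half-width (1) | The specified information type. |
| 6 | Result set | resultset | object | - | Identifier of the result set. |
| 7 | Count | count | number | Half-width (≤5) | Number of entries in the submitted-document list for the specified file date. |
| 8 | Document list update date/time | processDateTime | string | Half-width (16) | Date and time the submitted-document list for the specified file date was updated. It is updated even when the contents of the list have not changed. YYYY-MM-DD hh:mm format. |
| 9 | Status | status | string | Half-width (3) | A status listed in "3-3 Status codes" ("200" on success). |
| 10 | Message | message | string | Half-width (≤21) | A message listed in "3-3 Status codes" ("OK" on success). |
| 11 | Submitted-document list | results | array | - | Identifier of the submitted-document list. |
| - | Submitted document (repeated) | - | object | - | - |
| 12 | Sequence number | seqNumber | number | Half-width (≤5) | Sequence number per file date. See the note on sequence numbers for details. |
| 13 | Document management number (*1) | docID | string | Half-width (8) | The document management number. |
| 14 | Filer EDINET code (*1)(*2) | edinetCode | string | Half-width (6) | The filer's EDINET code. |
| 15 | Filer securities code (*2) | secCode | string | Half-width (5) | The filer's securities code. |
| 16 | Filer corporate number (*2) | JCN | string | Half-width (13) | The filer's corporate number. |
| 17 | Filer name (*2) | filerName | string | Full-width (≤128) | The filer's name. |
| 18 | Fund code (*1) | fundCode | string | Half-width (6) | The fund code. |
| 19 | Ordinance code (*1) | ordinanceCode | string | Half-width (3) | The ordinance code. |
| 20 | Form code (*1) | formCode | string | Half-width (6) | The form code. |
| 21 | Document type code (*1) | docTypeCode | string | Half-width (3) | The document type code. |
| 22 | Period start (*3) | periodStart | string | Half-width (10) | Start of the period (YYYY-MM-DD). |
| 23 | Period end (*3) | periodEnd | string | Half-width (10) | End of the period (YYYY-MM-DD). |
| 24 | Submission date/time | submitDateTime | string | Half-width (16) | Submission date and time (YYYY-MM-DD hh:mm). |
| 25 | Document description | docDescription | string | Full/half-width (≤147) | The string shown in the "提出書類" (submitted document) column of the document search results screen on the EDINET viewing site. |
| 26 | Issuer EDINET code (*1)(*2) | issuerEdinetCode | string | Half-width (6) | For large-shareholding reports, the EDINET code of the issuing company. |
| 27 | Subject EDINET code (*1)(*2) | subjectEdinetCode | string | Half-width (6) | For tender offers, the EDINET code of the target. |
| 28 | Subsidiary EDINET code (*1)(*2) | subsidiaryEdinetCode | string | Half-width (≤69) | Subsidiary EDINET codes. If there are several (up to 10), they are joined with "," (comma). |
| 29 | Reason for extraordinary report (*4) | currentReportReason | string | Full/half-width (≤1000) | Reasons for filing an extraordinary report. If there are several, they are joined with "," (comma). |
| 30 | Parent document management number (*1) | parentDocID | string | Half-width (8) | The parent document management number. |
| 31 | Operation date/time | opeDateTime | string | Half-width (16) | Date and time of "3-1-6 Correction of document information by Local Finance Bureau staff", "3-1-7 Non-disclosure of documents by Local Finance Bureau staff", or a magnetic-disk or paper submission (YYYY-MM-DD hh:mm). |
| 32 | Withdrawal status | withdrawalStatus | string | Half-width (1) | "1" for a withdrawal notice, "2" for a withdrawn document, otherwise "0". See 3-1-5 Withdrawal of documents. |
| 33 | Document info correction status | docInfoEditStatus | string | Half-width (1) | "1" for information on a correction made by Local Finance Bureau staff, "2" for the corrected document, otherwise "0". See 3-1-6 Correction of document information by Local Finance Bureau staff. |
| 34 | Disclosure status | disclosureStatus | string | Half-width (1) | "1" when Local Finance Bureau staff started non-disclosure, "2" for a document under non-disclosure, "3" when staff lifted non-disclosure, otherwise "0". See 3-1-7 Non-disclosure of documents by Local Finance Bureau staff. |
| 35 | XBRL flag | xbrlFlag | string | Half-width (1) | "1" if the document has XBRL, otherwise "0". |
| 36 | PDF flag (*5) | pdfFlag | string | Half-width (1) | "1" if the document has a PDF, otherwise "0". |
| 37 | Alternative document/attachment flag | attachDocFlag | string | Half-width (1) | "1" if the document has alternative documents or attachments, otherwise "0". |
| 38 | English file flag | englishDocFlag | string | Half-width (1) | "1" if the document has English files, otherwise "0". |
| 39 | CSV flag | csvFlag | string | Half-width (1) | "1" if the document has CSV files, otherwise "0". |
| 40 | Public inspection status | legalStatus | string | Half-width (1) | 1: under public inspection; 2: in an extension period (the statutory inspection period has ended but the document can still be viewed); 0: viewing period expired (the inspection period ended with no extension, the extension ended, or the document was withdrawn, so it can no longer be viewed; non-disclosure is not included). See 1-2-2 Range of data obtainable with the EDINET API. |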
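Per the specification above, a documents.json response carries `metadata.status` ("200" on success), `metadata.resultset.count`, and the `results` array. A hedged sketch of checking the status before trusting the list, working on plain dicts; `summarize_document_list` is a hypothetical name, and raising `RuntimeError` is just one possible choice.

```python
def summarize_document_list(payload: dict) -> tuple[int, list[str]]:
    """Return (count, docIDs) from a documents.json response; raise on a non-200 status."""
    metadata = payload.get("metadata") or {}
    status = metadata.get("status")
    if status != "200":
        raise RuntimeError(f"EDINET API returned status {status}: {metadata.get('message')}")
    count = (metadata.get("resultset") or {}).get("count", 0)
    doc_ids = [result.get("docID") for result in payload.get("results") or []]
    return count, doc_ids


# made-up payload trimmed to the fields used here
payload = {
    "metadata": {"status": "200", "message": "OK", "resultset": {"count": 2}},
    "results": [{"docID": "S100AAAA"}, {"docID": "S100BBBB"}],
}
print(summarize_document_list(payload))  # (2, ['S100AAAA', 'S100BBBB'])
```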

Download process

securities/domain/service/xbrl.py
+ import datetime
+ import logging
+ import os
+ from pathlib import Path
+ 
+ import requests
    :
+ from securities.domain.valueobject.edinet import CountingData, RequestData, ResponseData
+ SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT = 1
    :
class XbrlService:
+   def __init__(self, work_dir: Path):
+       self.work_dir = work_dir
+
    :
+   # Add everything from here down
    @staticmethod
    def _make_doc_id_list(request_data: RequestData) -> list[str]:
        def _process_results_data(results: list) -> list[str]:
            """
            Securities report: ordinanceCode == "010" and formCode == "030000"
            Amended securities report: ordinanceCode == "010" and formCode == "030001"
            Only securities reports are collected here
            """
            doc_id_list = []
            for result in results:
                if result.ordinance_code == "010" and result.form_code == "030000":
                    doc_id_list.append(result.doc_id)
            return doc_id_list

        securities_report_doc_list = []
        for day in request_data.day_list:
            url = "https://api.edinet-fsa.go.jp/api/v2/documents.json"
            params = {
                "date": day,
                "type": request_data.SECURITIES_REPORT_AND_META_DATA,
                "Subscription-Key": os.environ.get("EDINET_API_KEY"),
            }
            res = requests.get(url, params=params, timeout=30)
            res.raise_for_status()
            response_data = ResponseData(res.json())
            securities_report_doc_list.extend(
                _process_results_data(response_data.results)
            )
        return securities_report_doc_list

    def _download_xbrl_in_zip(self, securities_report_doc_list: list[str]):
        """
        params.type:
            1: submitted main document, audit report, and XBRL
            2: PDF
            3: alternative documents and attachments
            4: English files
            5: CSV
        """
        self.work_dir.mkdir(parents=True, exist_ok=True)
        denominator = len(securities_report_doc_list)
        for i, doc_id in enumerate(securities_report_doc_list):
            logging.info(f"{doc_id}: {i + 1}/{denominator}")
            url = f"https://api.edinet-fsa.go.jp/api/v2/documents/{doc_id}"
            params = {
                "type": SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT,
                "Subscription-Key": os.environ.get("EDINET_API_KEY"),
            }
            filename = self.work_dir / f"{doc_id}.zip"
            res = requests.get(url, params=params, stream=True, timeout=60)

            if res.status_code != 200:
                # don't abort the whole batch; log it and move on to the next document
                logging.warning(f"{doc_id}: download failed (HTTP {res.status_code})")
                continue
            with open(filename, "wb") as file:
                for chunk in res.iter_content(chunk_size=1024):
                    file.write(chunk)

    def download_xbrl(self):
        """
        Notes: In principle, a securities report must be filed within 3 months after the
        fiscal year-end (for a company with a March year-end, during June of the same year)
        """
        request_data = RequestData(
            start_date=datetime.date(2023, 11, 1),
            end_date=datetime.date(2023, 11, 9),
        )
        securities_report_doc_list = list(set(self._make_doc_id_list(request_data)))
        logging.info(f"number of lists:{len(securities_report_doc_list)}")
        logging.info(f"securities report doc list:{securities_report_doc_list}")

        self._download_xbrl_in_zip(securities_report_doc_list)
        print("download finish")
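With type=1 the API returns a zip per document. Assuming the usual layout, where the filer's XBRL instance sits under `XBRL/PublicDoc/` and the audit report's XBRL lives in a separate folder, this sketch lists the `.xbrl` files you would later hand to arelle, and skips payloads that do not start with the zip signature `PK`. The folder names are an assumption about EDINET's packaging, so check them against an actual download; `find_xbrl_members` is a hypothetical name.

```python
import io
import zipfile

ZIP_SIGNATURE = b"PK"


def find_xbrl_members(zip_bytes: bytes) -> list[str]:
    """List the .xbrl files under XBRL/PublicDoc/ in an EDINET document zip."""
    if not zip_bytes.startswith(ZIP_SIGNATURE):
        # not a zip at all (for example an error body); nothing to extract
        return []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        return [
            name
            for name in z.namelist()
            if name.startswith("XBRL/PublicDoc/") and name.endswith(".xbrl")
        ]


# build a tiny made-up archive in memory to show the filter
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("XBRL/PublicDoc/sample-asr.xbrl", "<xbrli:xbrl/>")
    z.writestr("XBRL/PublicDoc/sample-asr.xsd", "<schema/>")
    z.writestr("XBRL/AuditDoc/sample-aar.xbrl", "<xbrli:xbrl/>")
print(find_xbrl_members(buffer.getvalue()))  # ['XBRL/PublicDoc/sample-asr.xbrl']
```

For a file saved by `_download_xbrl_in_zip`, you would pass `Path(...).read_bytes()` instead of the in-memory buffer.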
securities/domain/valueobject/edinet.py
import datetime
from dataclasses import dataclass


@dataclass
class RequestData:
    SECURITIES_REPORT_AND_META_DATA = 2
    start_date: datetime.date
    end_date: datetime.date

    def __post_init__(self):
        if self.start_date > datetime.date.today():
            raise ValueError("start_date is in the future")
        if self.end_date > datetime.date.today():
            raise ValueError("end_date is in the future")
        if self.start_date > self.end_date:
            raise ValueError("start_date is later than end_date")

        self.doc_type = self.SECURITIES_REPORT_AND_META_DATA

        # Calculate day_list
        period = self.end_date - self.start_date
        self.day_list = []
        for d in range(int(period.days)):
            day = self.start_date + datetime.timedelta(days=d)
            self.day_list.append(day)
        self.day_list.append(self.end_date)
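`RequestData` builds `day_list` with a loop plus a trailing `append(self.end_date)`. The same inclusive range fits in one comprehension; here is a sketch with a hypothetical function name. Passing these `datetime.date` objects to `requests` as the `date` parameter works because the query encoder stringifies them as `YYYY-MM-DD`.

```python
import datetime


def inclusive_days(start: datetime.date, end: datetime.date) -> list[datetime.date]:
    """Every date from start to end, both ends included (same result as day_list)."""
    return [start + datetime.timedelta(days=d) for d in range((end - start).days + 1)]


days = inclusive_days(datetime.date(2023, 11, 1), datetime.date(2023, 11, 9))
print(len(days), days[0], days[-1])  # 9 2023-11-01 2023-11-09
```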

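Processing the XBRL and exporting to CSV

Note on installing arelle

The official arelle project is released on PyPI as arelle-release (pip install arelle-release).
You can also install it from arelle's GitHub, but it's better to use the official project.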
https://pypi.org/project/arelle-release/

Taxonomy element list

For the key names in target_keys, download EDINET's "taxonomy element list" and pick the figures you want to look at

target_keys = {
    "EDINETCodeDEI": "edinet_code",
    "FilerNameInJapaneseDEI": "filer_name_jp",
    "AverageAnnualSalaryInformationAboutReportingCompanyInformationAboutEmployees": "avg_salary",
    "AverageLengthOfServiceYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_years",
    "AverageLengthOfServiceMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_months",
    "AverageAgeYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_years",
    "AverageAgeMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_months",
    "NumberOfEmployees": "number_of_employees",
}
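Once arelle has loaded an instance document, each fact has an element name and a value. Assuming you have already reduced those to `(local name, value)` pairs, mapping them onto the `target_keys` names is a dictionary lookup. This sketch (hypothetical function name, trimmed key set, made-up facts) keeps the first value seen for each key, which is a simplification: elements such as NumberOfEmployees usually appear in several contexts (consolidated and non-consolidated, current and prior years), so real code should also filter on the context.

```python
TARGET_KEYS = {
    "EDINETCodeDEI": "edinet_code",
    "FilerNameInJapaneseDEI": "filer_name_jp",
    "NumberOfEmployees": "number_of_employees",
}


def pick_target_values(facts: list[tuple[str, str]]) -> dict[str, str]:
    """Map (element local name, value) pairs onto target_keys names, keeping the first hit."""
    picked: dict[str, str] = {}
    for local_name, value in facts:
        attr = TARGET_KEYS.get(local_name)
        if attr is not None and attr not in picked:
            picked[attr] = value
    return picked


# made-up facts; NumberOfEmployees appears twice, as it often does across contexts
facts = [
    ("EDINETCodeDEI", "E00001"),
    ("FilerNameInJapaneseDEI", "サンプル株式会社"),
    ("NumberOfEmployees", "1200"),
    ("NumberOfEmployees", "800"),
    ("NetSales", "1000000"),
]
print(pick_target_values(facts))
# {'edinet_code': 'E00001', 'filer_name_jp': 'サンプル株式会社', 'number_of_employees': '1200'}
```

Because the values use the same names as the `CountingData` fields, the resulting dict could be unpacked straight into `CountingData(**picked)`.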

So many sheets my eyes are spinning...

valueobject

securities/domain/valueobject/edinet.py
import datetime
from dataclasses import dataclass


@dataclass
class RequestData:
    SECURITIES_REPORT_AND_META_DATA = 2
    start_date: datetime.date
    end_date: datetime.date

    def __post_init__(self):
        if self.start_date > datetime.date.today():
            raise ValueError("start_date is in the future")
        if self.end_date > datetime.date.today():
            raise ValueError("end_date is in the future")
        if self.start_date > self.end_date:
            raise ValueError("start_date is later than end_date")

        self.doc_type = self.SECURITIES_REPORT_AND_META_DATA

        # Calculate day_list
        period = self.end_date - self.start_date
        self.day_list = []
        for d in range(int(period.days)):
            day = self.start_date + datetime.timedelta(days=d)
            self.day_list.append(day)
        self.day_list.append(self.end_date)


class ResponseData:
    class _Metadata:
        class _Parameter:
            def __init__(self, data: dict) -> None:
                self.date = data.get("date")
                self.type = data.get("type")

        class _ResultSet:
            def __init__(self, data: dict) -> None:
                self.count = data.get("count")

        def __init__(self, data: dict) -> None:
            self.title = data.get("title")
            self.parameter = self._Parameter(data.get("parameter") or {})
            self.result_set = self._ResultSet(data.get("resultset") or {})
            self.process_date_time = data.get("processDateTime")
            self.status = data.get("status")
            self.message = data.get("message")

    class _Result:
        def __init__(self, data):
            self.seq_number = data.get("seqNumber")
            self.doc_id = data.get("docID")
            self.edinet_code = data.get("edinetCode")
            self.sec_code = data.get("secCode")
            self.jcn = data.get("JCN")
            self.filer_name = data.get("filerName")
            self.fund_code = data.get("fundCode")
            self.ordinance_code = data.get("ordinanceCode")
            self.form_code = data.get("formCode")
            self.doc_type_code = data.get("docTypeCode")
            self.period_start = data.get("periodStart")
            self.period_end = data.get("periodEnd")
            self.submit_date_time = data.get("submitDateTime")
            self.doc_description = data.get("docDescription")
            self.issuer_edinet_code = data.get("issuerEdinetCode")
            self.subject_edinet_code = data.get("subjectEdinetCode")
            self.subsidiary_edinet_code = data.get("subsidiaryEdinetCode")
            self.current_report_reason = data.get("currentReportReason")
            self.parent_doc_id = data.get("parentDocID")
            self.ope_date_time = data.get("opeDateTime")
            self.withdrawal_status = data.get("withdrawalStatus")
            self.doc_info_edit_status = data.get("docInfoEditStatus")
            self.disclosure_status = data.get("disclosureStatus")
            self.xbrl_flag = data.get("xbrlFlag")
            self.pdf_flag = data.get("pdfFlag")
            self.attach_doc_flag = data.get("attachDocFlag")
            self.english_doc_flag = data.get("englishDocFlag")
            self.csv_flag = data.get("csvFlag")
            self.legal_status = data.get("legalStatus")

    def __init__(self, data):
        self.metadata = self._Metadata(data.get("metadata") or {})
        self.results = [self._Result(item) for item in data.get("results") or []]


@dataclass
class CountingData:
    edinet_code: str | None = None
    filer_name_jp: str | None = None
    industry_name: str | None = None
    avg_salary: str | None = None
    avg_tenure_years: str | None = None
    avg_tenure_months: str | None = None
    avg_age_years: str | None = None
    avg_age_months: str | None = None
    number_of_employees: str | None = None

    @property
    def avg_tenure_years_combined(self) -> str | None:
        if self.avg_tenure_months:
            avg_tenure_decimal = round(int(self.avg_tenure_months) / 12, 1)
            avg_tenure = int(self.avg_tenure_years) + avg_tenure_decimal
            return str(avg_tenure)
        return self.avg_tenure_years

    @property
    def avg_age_years_combined(self) -> str | None:
        if self.avg_age_months:
            age_years_decimal = round(int(self.avg_age_months) / 12, 1)
            age_years = int(self.avg_age_years) + age_years_decimal
            return str(age_years)
        return self.avg_age_years

    def to_list(self) -> list[str | None]:
        return [
            self.edinet_code,
            self.filer_name_jp,
            self.industry_name,
            self.avg_salary,
            self.avg_tenure_years_combined,
            self.avg_age_years_combined,
            self.number_of_employees,
        ]
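The *_combined properties above turn the separate years and months facts into decimal years (15 years 4 months → 15 + 4/12 ≈ 15.3). There is one edge case: if a report has a months fact but no years fact, int(self.avg_*_years) receives None and raises TypeError. Below is a minimal standalone sketch of the same arithmetic that treats a missing years value as 0. The helper name combine_years_months is hypothetical and is not part of the article's code.

```python
from __future__ import annotations


def combine_years_months(years: str | None, months: str | None) -> str | None:
    """Combine an XBRL (years, months) pair into decimal years, e.g. 15y 4m -> "15.3".

    Hypothetical helper mirroring CountingData.avg_*_combined, except that a
    missing years value counts as 0 instead of raising TypeError in int(None).
    """
    if not months:
        # No months value: return the years value unchanged (it may itself be None)
        return years
    whole_years = int(years) if years else 0
    # Add the fractional months first, then round once to one decimal place
    return str(round(whole_years + int(months) / 12, 1))


print(combine_years_months("15", "4"))   # 15.3
print(combine_years_months("15", None))  # 15
print(combine_years_months(None, "6"))   # 0.5
```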

repository

securities/domain/repository/edinet.py
from securities.models import Company


class EdinetRepository:
    @staticmethod
    def get_industry_name(edinet_code: str) -> str | None:
        try:
            company = Company.objects.get(edinet_code=edinet_code)
            return company.submitter_industry
        except Company.DoesNotExist:
            return None

service

securities/domain/service/xbrl.py
import datetime
import logging
import os
import shutil
import zipfile
from pathlib import Path

import pandas as pd
import requests
from arelle import Cntlr

from securities.domain.repository.edinet import EdinetRepository
from securities.domain.valueobject.edinet import CountingData, RequestData, ResponseData

SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT = 1


class XbrlService:
    def __init__(self, work_dir: Path):
        self.work_dir = work_dir
        self.temp_dir = self.work_dir / "temp"
        self.repository = EdinetRepository()

    @staticmethod
    def _make_doc_id_list(request_data: RequestData) -> list[str]:
        def _process_results_data(results: list) -> list[str]:
            """
            Annual securities report: ordinanceCode == "010" and formCode == "030000"
            Amended annual securities report: ordinanceCode == "010" and formCode == "030001"
            """
            doc_id_list = []
            for result in results:
                if result.ordinance_code == "010" and result.form_code == "030000":
                    doc_id_list.append(result.doc_id)
            return doc_id_list

        securities_report_doc_list = []
        for day in request_data.day_list:
            url = "https://api.edinet-fsa.go.jp/api/v2/documents.json"
            params = {
                "date": day,
                "type": request_data.SECURITIES_REPORT_AND_META_DATA,
                "Subscription-Key": os.environ.get("EDINET_API_KEY"),
            }
            res = requests.get(url, params=params)
            res.raise_for_status()
            response_data = ResponseData(res.json())
            securities_report_doc_list.extend(
                _process_results_data(response_data.results)
            )
        return securities_report_doc_list

    def _download_xbrl_in_zip(self, securities_report_doc_list):
        """
        params.type:
            1: Main submitted documents, audit report and XBRL
            2: PDF
            3: Substitute documents and attachments
            4: English-language files
            5: CSV
        """
        denominator = len(securities_report_doc_list)
        for i, doc_id in enumerate(securities_report_doc_list):
            logging.info(f"{doc_id}: {i + 1}/{denominator}")
            url = f"https://api.edinet-fsa.go.jp/api/v2/documents/{doc_id}"
            params = {
                "type": SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT,
                "Subscription-Key": os.environ.get("EDINET_API_KEY"),
            }
            filename = self.work_dir / f"{doc_id}.zip"
            res = requests.get(url, params=params, stream=True)

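            if res.status_code == 200:
                with open(filename, "wb") as file:
                    for chunk in res.iter_content(chunk_size=1024):
                        file.write(chunk)
            else:
                # Skip this document, but leave a trace instead of failing silently
                logging.warning(f"{doc_id}: download failed (HTTP {res.status_code})")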

    def download_xbrl(self):
        """
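        Notes: An annual securities report must, in principle, be filed within 3 months of the fiscal year end (for a company with a March year end, during June of the same year)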
        """
        request_data = RequestData(
            start_date=datetime.date(2023, 11, 1),
            end_date=datetime.date(2023, 11, 9),
        )
        securities_report_doc_list = list(set(self._make_doc_id_list(request_data)))
        logging.info(f"number of lists:{len(securities_report_doc_list)}")
        logging.info(f"securities report doc list:{securities_report_doc_list}")

        self._download_xbrl_in_zip(securities_report_doc_list)
        logging.info("download finish")

    def _unzip_files_and_extract_xbrl(self) -> list[str]:
        """
        Extracts the zip files in work_dir and returns a list of the XBRL files matching XBRL/PublicDoc/*.xbrl.
        Each zip file appears to contain exactly one such XBRL file.

        Example:
            >> service = XbrlService(work_dir=Path("/path/to/zip/directory"))
            >> result = service._unzip_files_and_extract_xbrl()
            >> print(result)
            ['/path/to/extracted/file1.xbrl', '/path/to/extracted/file2.xbrl']
        """

        zip_files = list(self.work_dir.glob("*.zip"))
        logging.info(f"number of zip files: {len(zip_files)}")
        for zip_file in zip_files:
            with zipfile.ZipFile(str(zip_file), "r") as zipf:
                zipf.extractall(str(self.temp_dir))
        xbrl_files = list(self.work_dir.glob("**/XBRL/PublicDoc/*.xbrl"))

        return [str(path) for path in xbrl_files]

    def _assign_attributes(self, counting_data: CountingData, facts):
        target_keys = {
            "EDINETCodeDEI": "edinet_code",
            "FilerNameInJapaneseDEI": "filer_name_jp",
            "AverageAnnualSalaryInformationAboutReportingCompanyInformationAboutEmployees": "avg_salary",
            "AverageLengthOfServiceYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_years",
            "AverageLengthOfServiceMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_months",
            "AverageAgeYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_years",
            "AverageAgeMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_months",
            "NumberOfEmployees": "number_of_employees",
        }
        for fact in facts:
            key_to_set = target_keys.get(fact.concept.qname.localName)
            if key_to_set:
                setattr(counting_data, key_to_set, fact.value)
                if key_to_set == "edinet_code":
                    counting_data.industry_name = self.repository.get_industry_name(
                        counting_data.edinet_code
                    )
                elif (
                    key_to_set == "number_of_employees"
                    and fact.contextID != "CurrentYearInstant_NonConsolidatedMember"
                ):
                    setattr(counting_data, "number_of_employees", None)
        return counting_data

    def make_counting_data(self) -> list[CountingData]:
        counting_list = []
        for xbrl_path in self._unzip_files_and_extract_xbrl():
            counting_data = CountingData()
            ctrl = Cntlr.Cntlr()
            model_xbrl = ctrl.modelManager.load(xbrl_path)
            logging.info(f"{Path(xbrl_path).name}")
            counting_data = self._assign_attributes(counting_data, model_xbrl.facts)
            counting_list.append(counting_data)
        shutil.rmtree(self.temp_dir)
        return counting_list

    def to_csv(self, data: list[CountingData], output_filename: str):
        employee_frame = pd.DataFrame(
            data=[x.to_list() for x in data],
            columns=[
                "EDINETCODE",
                "企業名",
                "業種",
                "平均年間給与(円)",
                "平均勤続年数(年)",
                "平均年齢(歳)",
                "従業員数(人)",
            ],
        )
        employee_frame.to_csv(str(self.work_dir / output_filename), encoding="cp932", index=False)
        logging.info(f"{self.work_dir / output_filename} が出力されました")


if __name__ == "__main__":
    # Prerequisite: the EDINET code list has already been uploaded
    home_dir = os.path.expanduser("~")
    service = XbrlService(work_dir=Path(home_dir, "Downloads/xbrlReport"))
    service.download_xbrl()
    service.to_csv(
        data=service.make_counting_data(),
        output_filename="output.csv",
    )
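In _assign_attributes, a NumberOfEmployees fact from any context other than CurrentYearInstant_NonConsolidatedMember first overwrites number_of_employees and then resets it to None. The final value therefore depends on fact order: it survives only if the non-consolidated fact happens to be the last NumberOfEmployees fact visited. The sketch below selects the value in an order-independent way. It uses a hypothetical Fact dataclass in place of Arelle's fact objects and models only the local name, context ID and value. Using "CurrentYearInstant" as the consolidated context ID is an illustrative choice.

```python
from __future__ import annotations

from dataclasses import dataclass

TARGET_CONTEXT = "CurrentYearInstant_NonConsolidatedMember"


@dataclass
class Fact:
    """Hypothetical stand-in for an Arelle fact (only the fields used here)."""

    local_name: str
    context_id: str
    value: str


def pick_number_of_employees(facts: list[Fact]) -> str | None:
    """Return the NumberOfEmployees value reported under TARGET_CONTEXT, if any.

    Facts under other contexts are skipped instead of overwriting the result,
    so the outcome does not depend on the order in which facts are visited.
    """
    for fact in facts:
        if fact.local_name == "NumberOfEmployees" and fact.context_id == TARGET_CONTEXT:
            return fact.value
    return None


facts = [
    Fact("NumberOfEmployees", TARGET_CONTEXT, "1200"),
    Fact("NumberOfEmployees", "CurrentYearInstant", "5400"),  # consolidated figure
]
print(pick_number_of_employees(facts))  # 1200
```

With this input order, the loop in _assign_attributes would end with number_of_employees = None, because the consolidated fact comes last. Inside _assign_attributes, the same idea amounts to skipping such facts before calling setattr.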

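Verification

First, just make sure it runs from the console. Once it does, what remains is adapting it to run under Django (calling it as a batch job, like the upload process, is probably the best fit).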
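When running from the console, note that XbrlService reports progress with logging.info, but the root logger defaults to the WARNING level. Unless something else has already configured logging, those messages (download counters, XBRL file names) will not appear. The following minimal sketch could go at the top of the if __name__ == "__main__": block:

```python
import logging
import sys

# The root logger's default level is WARNING, so logging.info(...) calls are
# dropped unless logging is configured. force=True (Python 3.8+) replaces any
# handlers that were already attached to the root logger.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
    stream=sys.stderr,
    force=True,
)

logging.info("download finish")  # now printed to stderr
```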

Refactoring

Create a Counting table and store the results in it

CSV is awkward to work with, after all.

securities/models.py
    :
class Counting(models.Model):
    company = models.ForeignKey(Company, on_delete=models.CASCADE, null=True)
    avg_salary = models.IntegerField("平均年間給与(円)", null=True)
    avg_tenure = models.FloatField("平均勤続年数(年)", null=True)
    avg_age = models.FloatField("平均年齢(歳)", null=True)
    number_of_employees = models.IntegerField("従業員数(人)", null=True)

    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

securities/domain/valueobject/edinet.py
@dataclass
class CountingData:
        :
    def to_entity(self, company_master: dict[str, Company]) -> Counting:
        return Counting(
            company=company_master[self.edinet_code],
            avg_salary=self.avg_salary,
            avg_tenure=self.avg_tenure_years_combined,
            avg_age=self.avg_age_years_combined,
            number_of_employees=self.number_of_employees,
        )

securities/domain/service/xbrl.py
if __name__ == "__main__":
    # Prerequisite: the EDINET code list has already been uploaded
    home_dir = os.path.expanduser("~")
    service = XbrlService(work_dir=Path(home_dir, "Downloads/xbrlReport"))
    service.download_xbrl()
-   service.to_csv(
-       data=service.make_counting_data(),
-       output_filename="output.csv",
-   )
+   company_mst = {entity.edinet_code: entity for entity in Company.objects.all()}
+   Counting.objects.bulk_create(
+       [x.to_entity(company_mst) for x in service.make_counting_data()]
+   )

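Add dates to the Counting model

As things stand, there is no way to tell which period the data belongs to. _make_doc_id_list currently returns list[str]; if I rework it to return list[ResponseData] instead, I should be able to get the dates. This turned into a much bigger overhaul than I expected...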

securities/models.py
class Counting(models.Model):
    company = models.ForeignKey(Company, on_delete=models.CASCADE, null=True)
+   period_start = models.DateField("期間(自)", null=True)
+   period_end = models.DateField("期間(至)", null=True)
+   submit_date = models.DateField("提出日時")
    avg_salary = models.IntegerField("平均年間給与(円)", null=True)
        :
        
securities/domain/valueobject/edinet.py
@dataclass
class CountingData:
        :
-   def to_entity(self, company_master: dict[str, Company]) -> Counting:
+   def to_entity(
+       self, doc_attr_dict: dict[str, ResponseData], company_master: dict[str, Company]
+   ) -> Counting:
+       response_data = doc_attr_dict[self.edinet_code]
+       submit_date = datetime.datetime.strptime(
+           response_data.results[0].submit_date_time, "%Y-%m-%d %H:%M"
+       )
        return Counting(
            company=company_master[self.edinet_code],
+           period_start=response_data.results[0].period_start,
+           period_end=response_data.results[0].period_end,
+           submit_date=submit_date,
            avg_salary=self.avg_salary,
            avg_tenure=self.avg_tenure_years_combined,
            avg_age=self.avg_age_years_combined,
            number_of_employees=self.number_of_employees,
        )

securities/domain/service/xbrl.py
class XbrlService:
        :
-   def _make_doc_id_list(request_data: RequestData) -> list[str]:
-       def _process_results_data(results: list) -> list[str]:
-           """
-           Annual securities report: ordinanceCode == "010" and formCode == "030000"
-           Amended annual securities report: ordinanceCode == "010" and formCode == "030001"
-           """
-           doc_id_list = []
-           for result in results:
-               if result.ordinance_code == "010" and result.form_code == "030000":
-                   doc_id_list.append(result.doc_id)
-           return doc_id_list
-
-       securities_report_doc_list = []
+   def _extract(request_data: RequestData) -> list[ResponseData]:
+       """
+       Extract the ResponseData entries that contain the target document type
+       Annual securities report: ordinanceCode == "010" and formCode == "030000"
+       Amended annual securities report: ordinanceCode == "010" and formCode == "030001"
+       """
+       securities_report_list = []

        for day in request_data.day_list:
            url = "https://api.edinet-fsa.go.jp/api/v2/documents.json"
            params = {
                "date": day,
                "type": request_data.SECURITIES_REPORT_AND_META_DATA,
                "Subscription-Key": os.environ.get("EDINET_API_KEY"),
            }
            res = requests.get(url, params=params)
            res.raise_for_status()
            response_data = ResponseData(res.json())
-           securities_report_doc_list.extend(
-               _process_results_data(response_data.results)
-           )
-       return securities_report_doc_list
+           for result in response_data.results:
+               if result.ordinance_code == "010" and result.form_code == "030000":
+                   logging.info(
+                       f"{day}, {result.filer_name}, edinet_code: {result.edinet_code}, doc_id: {result.doc_id}"
+                   )
+                   response_data.results = [result]
+                   securities_report_list.append(response_data)
+       return securities_report_list

-   def _download_xbrl_in_zip(self, securities_report_doc_list):
+   def _download_xbrl_in_zip(self, securities_report_list: list[ResponseData]):
        """
        params.type:
            1: Main submitted documents, audit report and XBRL
            2: PDF
            3: Substitute documents and attachments
            4: English-language files
            5: CSV
        """
-       denominator = len(securities_report_doc_list)
-       for i, doc_id in enumerate(securities_report_doc_list):
+       denominator = len(securities_report_list)
+       for i, securities_report in enumerate(securities_report_list):
+           doc_id = securities_report.results[0].doc_id
            logging.info(f"{doc_id}: {i + 1}/{denominator}")
                :
-   def download_xbrl(self):
+   def download_xbrl(self) -> dict[str, ResponseData]:
        """
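        Notes: An annual securities report must, in principle, be filed within 3 months of the fiscal year end (for a company with a March year end, during June of the same year)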
        """
        request_data = RequestData(
            start_date=datetime.date(2023, 11, 1),
            end_date=datetime.date(2023, 11, 9),
        )
-       securities_report_doc_list = list(set(self._make_doc_id_list(request_data)))
-       logging.info(f"number of lists:{len(securities_report_doc_list)}")
-       logging.info(f"securities report doc list:{securities_report_doc_list}")
-
-       self._download_xbrl_in_zip(securities_report_doc_list)
+       securities_report_list = self._extract(request_data)
+       self._download_xbrl_in_zip(securities_report_list)
        logging.info("download finish")
+
+       securities_report_dict = {}
+       for x in securities_report_list:
+           securities_report_dict[x.results[0].edinet_code] = x
+
+       return securities_report_dict
            :
if __name__ == "__main__":
    # Prerequisite: the EDINET code list has already been uploaded
    home_dir = os.path.expanduser("~")
    service = XbrlService(work_dir=Path(home_dir, "Downloads/xbrlReport"))
-   service.download_xbrl()
+   doc_attr_dict = service.download_xbrl()
    company_mst = {entity.edinet_code: entity for entity in Company.objects.all()}
    Counting.objects.bulk_create(
-       [x.to_entity(company_mst) for x in service.make_counting_data()]
+       [x.to_entity(doc_attr_dict, company_mst) for x in service.make_counting_data()]
    )
+   logging.info("bulk_create finish")
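to_entity() (and, later, delete_existing_records()) parses submitDateTime with strptime(..., "%Y-%m-%d %H:%M") and hands the resulting datetime to the submit_date DateField. A small shared helper could do the conversion once and return a plain date. The name parse_submit_date is hypothetical. Returning None for an empty value is an assumption, chosen to match the submit_date_time guard that is added to _extract later on.

```python
from __future__ import annotations

import datetime


def parse_submit_date(submit_date_time: str | None) -> datetime.date | None:
    """Convert an EDINET submitDateTime such as "2023-11-01 09:15" into a date.

    Uses the same "%Y-%m-%d %H:%M" format as to_entity(); the time of day is
    dropped because Counting.submit_date is a DateField. Empty values give None.
    """
    if not submit_date_time:
        return None
    return datetime.datetime.strptime(submit_date_time, "%Y-%m-%d %H:%M").date()


print(parse_submit_date("2023-11-01 09:15"))  # 2023-11-01
```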

Move it into the repository: delete by company and submitDate (submission date), then bulk_insert

Otherwise every rerun would insert the same rows all over again...

securities/models.py
class Counting(models.Model):
        :
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

+   class Meta:
+       unique_together = ["company", "submit_date"]

securities/domain/service/xbrl.py
    :
from securities.domain.repository.edinet import EdinetRepository
from securities.domain.valueobject.edinet import CountingData, RequestData, ResponseData
- from securities.models import Company, Counting
    :
if __name__ == "__main__":
    # Prerequisite: the EDINET code list has already been uploaded
    home_dir = os.path.expanduser("~")
    service = XbrlService(work_dir=Path(home_dir, "Downloads/xbrlReport"))
    doc_attr_dict = service.download_xbrl()
-   company_mst = {entity.edinet_code: entity for entity in Company.objects.all()}
-   Counting.objects.bulk_create(
-       [x.to_entity(doc_attr_dict, company_mst) for x in service.make_counting_data()]
-   )
+   service.repository.delete_existing_records(list(doc_attr_dict.values()))
+   service.repository.bulk_insert(doc_attr_dict, service.make_counting_data())
    logging.info("bulk_create finish")

securities/domain/repository/edinet.py
+ from datetime import datetime

+ from securities.domain.valueobject.edinet import ResponseData, CountingData
- from securities.models import Company
+ from securities.models import Company, Counting

class EdinetRepository:
        :
+   # Add everything below this line
    @staticmethod
    def delete_existing_records(response_data_list: list[ResponseData]):
        edinet_codes = [data.results[0].edinet_code for data in response_data_list]
        edinet_code_to_company = {
            company.edinet_code: company
            for company in Company.objects.filter(edinet_code__in=edinet_codes)
        }
        for data in response_data_list:
            edinet_code = data.results[0].edinet_code
            submit_date = datetime.strptime(
                data.results[0].submit_date_time, "%Y-%m-%d %H:%M"
            )
            company = edinet_code_to_company[edinet_code]
            Counting.objects.filter(company=company, submit_date=submit_date).delete()

    @staticmethod
    def bulk_insert(
        doc_attr_dict: dict[str, ResponseData], counting_data_list: list[CountingData]
    ):
        edinet_codes = [data.results[0].edinet_code for data in doc_attr_dict.values()]
        edinet_code_to_company = {
            company.edinet_code: company
            for company in Company.objects.filter(edinet_code__in=edinet_codes)
        }
        insert_objects = [
            x.to_entity(
                doc_attr_dict,
                edinet_code_to_company,
            )
            for x in counting_data_list
        ]
        Counting.objects.bulk_create(insert_objects)
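Running delete_existing_records followed by bulk_insert, backed by unique_together, amounts to a delete-then-insert (replace) load. The same idea can be shown without Django using the standard library's sqlite3. In this sketch, the table and column names are simplified stand-ins for Counting, and the EDINET codes are made up. The point is that running the same load twice leaves the row count unchanged.

```python
from __future__ import annotations

import sqlite3


def replace_countings(conn: sqlite3.Connection, rows: list[tuple[str, str, int]]) -> None:
    """Delete rows with the same (company_id, submit_date), then insert the new rows.

    Both statements run in one transaction: `with conn:` commits on success and
    rolls back on an exception, so a failed rerun does not lose the old rows.
    """
    with conn:
        conn.executemany(
            "DELETE FROM counting WHERE company_id = ? AND submit_date = ?",
            [(company_id, submit_date) for company_id, submit_date, _ in rows],
        )
        conn.executemany(
            "INSERT INTO counting (company_id, submit_date, avg_salary) VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE counting ("
    " company_id TEXT, submit_date TEXT, avg_salary INTEGER,"
    " UNIQUE (company_id, submit_date))"  # plays the role of unique_together
)
rows = [("E99990", "2023-06-28", 7000000), ("E99991", "2023-06-29", 6500000)]
replace_countings(conn, rows)
replace_countings(conn, rows)  # rerun: no IntegrityError, still two rows
row_count = conn.execute("SELECT COUNT(*) FROM counting").fetchone()[0]
print(row_count)  # 2
```

Without the DELETE, the second call would raise sqlite3.IntegrityError because of the UNIQUE constraint. In the Django version, delete_existing_records and bulk_insert currently run as separate steps. Wrapping both calls in django.db.transaction.atomic() would give similar all-or-nothing behaviour.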

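Look up the industry from the Company master

The original CSV-oriented code had to look up the industry itself, but now there is a company master table to take it from.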

securities/domain/valueobject/edinet.py
@dataclass
class CountingData:
    edinet_code: str | None = None
    filer_name_jp: str | None = None
-   industry_name: str | None = None
    avg_salary: str | None = None
    avg_tenure_years: str | None = None
    avg_tenure_months: str | None = None
    avg_age_years: str | None = None
    avg_age_months: str | None = None
    number_of_employees: str | None = None
        :
    def to_list(self) -> list[str | None]:
        return [
            self.edinet_code,
            self.filer_name_jp,
-           self.industry_name,
            self.avg_salary,
            self.avg_tenure_years_combined,
            self.avg_age_years_combined,
            self.number_of_employees,
        ]

securities/domain/repository/edinet.py
class EdinetRepository:
-   @staticmethod
-   def get_industry_name(edinet_code: str) -> str | None:
-       try:
-           company = Company.objects.get(edinet_code=edinet_code)
-           return company.submitter_industry
-       except Company.DoesNotExist:
-           return None

    @staticmethod
    def delete_existing_records(response_data_list: list[ResponseData]):
        :
securities/domain/service/xbrl.py
    :
from arelle import Cntlr
+ from django.core.exceptions import ObjectDoesNotExist

from securities.domain.repository.edinet import EdinetRepository
from securities.domain.valueobject.edinet import CountingData, RequestData, ResponseData
+ from securities.models import Company
    :
-   def _assign_attributes(self, counting_data: CountingData, facts):
+   @staticmethod
+   def _assign_attributes(counting_data: CountingData, facts):

        target_keys = {
            "EDINETCodeDEI": "edinet_code",
            "FilerNameInJapaneseDEI": "filer_name_jp",
            "AverageAnnualSalaryInformationAboutReportingCompanyInformationAboutEmployees": "avg_salary",
            "AverageLengthOfServiceYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_years",
            "AverageLengthOfServiceMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_months",  # noqa E501
            "AverageAgeYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_years",
            "AverageAgeMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_months",
            "NumberOfEmployees": "number_of_employees",
        }
        for fact in facts:
            key_to_set = target_keys.get(fact.concept.qname.localName)
            if key_to_set:
                setattr(counting_data, key_to_set, fact.value)
-               if key_to_set == "edinet_code":
-                   counting_data.industry_name = self.repository.get_industry_name(
-                       counting_data.edinet_code
-                   )
-               elif (
+               if (
                    key_to_set == "number_of_employees"
                    and fact.contextID != "CurrentYearInstant_NonConsolidatedMember"
                ):
                    setattr(counting_data, "number_of_employees", None)
        return counting_data
            :
    def to_csv(self, data: list[CountingData], output_filename: str):
+       all_companies = Company.objects.all()
+       new_data = []
+       for x in data:
+           try:
+               # If matching Company object is found, insert industry name to list
+               company = all_companies.get(edinet_code=x.edinet_code)
+               data_list = x.to_list()
+               data_list.insert(2, company.submitter_industry)
+           except ObjectDoesNotExist:
+               # If no matching Company object is found, insert None
+               data_list = x.to_list()
+               data_list.insert(2, None)
+           new_data.append(data_list)
+
        employee_frame = pd.DataFrame(
-           data=[x.to_list() for x in data],
+           data=new_data,
            columns=[
                "EDINETCODE",
                "企業名",
                "業種",
                "平均年間給与(円)",
                "平均勤続年数(年)",
                "平均年齢(歳)",
                "従業員数(人)",
            ],
        )
        employee_frame.to_csv(str(self.work_dir / output_filename), encoding="cp932", index=False)
        logging.info(f"{self.work_dir / output_filename} が出力されました")

if __name__ == "__main__":
        :
+   # Only for a quick check; remove afterwards
+   service.to_csv(
+       data=service.make_counting_data(),
+       output_filename="output.csv",
+   )
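With industry_name gone from CountingData, to_csv has to splice the industry into each row. The version above does this with all_companies.get(...) per row, which issues one query per company. An alternative sketch builds an edinet_code → industry dict once and inserts the industry at column index 2. In Django, such a dict could be built with dict(Company.objects.values_list("edinet_code", "submitter_industry")). Here, plain lists and a dict stand in for the model so the sketch runs on its own. The name insert_industry and the sample values are hypothetical.

```python
from __future__ import annotations


def insert_industry(
    rows: list[list[str | None]], industry_by_code: dict[str, str]
) -> list[list[str | None]]:
    """Insert the industry name as the 3rd column of each row.

    rows: CountingData.to_list() outputs, whose first element is the EDINET code.
    industry_by_code: EDINET code -> industry, built once up front.
    Companies missing from the master get None, like the ObjectDoesNotExist branch.
    """
    new_rows = []
    for row in rows:
        industry = industry_by_code.get(row[0])  # None if not in the master
        new_rows.append(row[:2] + [industry] + row[2:])
    return new_rows


rows = [["E99990", "Sample Co.", "7000000", "15.3", "41.9", "1200"]]
print(insert_industry(rows, {"E99990": "Information & Communication"}))
```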

Move the unzip logic into lib

securities/domain/service/xbrl.py
    :
import shutil
- import zipfile
from pathlib import Path

import pandas as pd
import requests
from arelle import Cntlr
from django.core.exceptions import ObjectDoesNotExist

+ from lib.zipfileservice import ZipFileService
from securities.domain.repository.edinet import EdinetRepository
    :
class XbrlService:
    :
    def _unzip_files_and_extract_xbrl(self) -> list[str]:
        """
        Extracts the zip files in work_dir and returns a list of the XBRL files matching XBRL/PublicDoc/*.xbrl.
        Each zip file appears to contain exactly one such XBRL file.

-       使用例:
-           >> obj = XbrlService()
-           >> result = obj.unzip_files_and_extract_xbrl('/path/to/zip/directory', '*.xbrl')
-           >> print(result)
+       Returns:
            ['/path/to/extracted/file1.xbrl', '/path/to/extracted/file2.xbrl']
        """
-
-       zip_files = list(self.work_dir.glob("*.zip"))
-       logging.info(f"number of zip files: {len(zip_files)}")
-       for zip_file in zip_files:
-           with zipfile.ZipFile(str(zip_file), "r") as zipf:
-               zipf.extractall(str(self.temp_dir))
+       ZipFileService.extract_zip_files(self.work_dir, self.temp_dir)
-       xbrl_files = list(self.work_dir.glob("**/XBRL/PublicDoc/*.xbrl"))
+       xbrl_files = list(self.temp_dir.glob("XBRL/PublicDoc/*.xbrl"))

        return [str(path) for path in xbrl_files]
lib/zipfileservice.py
    :
class ZipFileService:
        :       
+   @staticmethod
+   def extract_zip_files(source_dir: Path, target_dir: Path):
+       """
+       Extract every zip file in source_dir into target_dir.
+       """
+       source_dir_path = Path(source_dir)
+       target_dir_path = Path(target_dir)
+       target_dir_path.mkdir(parents=True, exist_ok=True)
+
+       zip_files = source_dir_path.glob("*.zip")
+       for zip_file in zip_files:
+           with zipfile.ZipFile(str(zip_file), "r") as zipf:
+               zipf.extractall(str(target_dir_path))
        :
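extract_zip_files unpacks every zip into the same target directory. Afterwards, nothing records which .xbrl came from which download, so later steps have to match reports using the EDINET code read from the XBRL. The variant below extracts each {doc_id}.zip into its own subfolder and returns a doc_id → XBRL paths mapping, which could let later steps join on doc_id instead. The name extract_xbrl_by_doc_id is hypothetical and is not part of lib/zipfileservice.py.

```python
from __future__ import annotations

import tempfile
import zipfile
from pathlib import Path


def extract_xbrl_by_doc_id(source_dir: Path, target_dir: Path) -> dict[str, list[Path]]:
    """Extract each <doc_id>.zip into target_dir/<doc_id>/ and map doc_id -> .xbrl paths."""
    result: dict[str, list[Path]] = {}
    for zip_file in sorted(Path(source_dir).glob("*.zip")):
        doc_id = zip_file.stem  # downloads are saved as f"{doc_id}.zip"
        out_dir = Path(target_dir) / doc_id
        out_dir.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(zip_file) as zipf:
            zipf.extractall(out_dir)
        # Same layout that _unzip_files_and_extract_xbrl globs for
        result[doc_id] = sorted(out_dir.glob("XBRL/PublicDoc/*.xbrl"))
    return result


# Usage with a throwaway zip (doc_id and file name are made up)
with tempfile.TemporaryDirectory() as tmp:
    work_dir = Path(tmp)
    with zipfile.ZipFile(work_dir / "S100TEST.zip", "w") as zipf:
        zipf.writestr("XBRL/PublicDoc/sample.xbrl", "<xbrl/>")
    mapping = extract_xbrl_by_doc_id(work_dir, work_dir / "temp")
    extracted = {doc_id: [p.name for p in paths] for doc_id, paths in mapping.items()}
print(extracted)  # {'S100TEST': ['sample.xbrl']}
```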

Remove duplicate doc_ids

securities/domain/service/xbrl.py
class XbrlService:
    def __init__(self, work_dir: Path):
        self.work_dir = work_dir
        self.temp_dir = self.work_dir / "temp"
        self.repository = EdinetRepository()

    @staticmethod
    def _extract(request_data: RequestData) -> list[ResponseData]:
        """
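-       Extract the ResponseData entries that contain the target document type
+       Extract the ResponseData entries that contain the target document type (duplicate doc_ids are excluded)
        Annual securities report: ordinanceCode == "010" and formCode == "030000"
        Amended annual securities report: ordinanceCode == "010" and formCode == "030001"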
        """
-       securities_report_list = []
+       securities_report_dict = {}
        for day in request_data.day_list:
            url = "https://api.edinet-fsa.go.jp/api/v2/documents.json"
            params = {
                "date": day,
                "type": request_data.SECURITIES_REPORT_AND_META_DATA,
                "Subscription-Key": os.environ.get("EDINET_API_KEY"),
            }
            res = requests.get(url, params=params)
            res.raise_for_status()
            response_data = ResponseData(res.json())
            for result in response_data.results:
                if result.ordinance_code == "010" and result.form_code == "030000":
                    logging.info(
-                       f"{day}, {result.filer_name}, edinet_code: {result.edinet_code}, doc_id: {result.doc_id}"
+                       f"{day}, "
+                       f"edinet_code: {result.edinet_code}, "
+                       f"doc_id: {result.doc_id}, "
+                       f"期間(自): {response_data.results[0].period_start}, "
+                       f"期間(至): {response_data.results[0].period_end}, "
+                       f"{result.filer_name}, "
                    )
                    response_data.results = [result]
-                   securities_report_list.append(response_data)
-       return securities_report_list
+           if (
+               response_data.results
+               and response_data.results[0].doc_id not in securities_report_dict
+           ):
+               securities_report_dict[response_data.results[0].doc_id] = response_data
+       return list(securities_report_dict.values())

            :

    def download_xbrl(self) -> dict[str, ResponseData]:
        """
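        Notes: An annual securities report must, in principle, be filed within 3 months of the fiscal year end (for a company with a March year end, during June of the same year)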
        """
        request_data = RequestData(
            start_date=datetime.date(2023, 11, 1),
-           end_date=datetime.date(2023, 11, 9),
+           end_date=datetime.date(2023, 11, 29),
        )
        securities_report_list = self._extract(request_data)
        self._download_xbrl_in_zip(securities_report_list)
        logging.info("download finish")

        securities_report_dict = {}
        for x in securities_report_list:
            securities_report_dict[x.results[0].edinet_code] = x

        return securities_report_dict

Move the date input out to the outermost layer

There is no point in hard-coding date literals deep inside the logic.

securities/domain/service/xbrl.py
class XbrlService:
    :
-   def download_xbrl(self) -> dict[str, ResponseData]:
+   def download_xbrl(self, request_data: RequestData) -> dict[str, ResponseData]:
        """
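        Notes: An annual securities report must, in principle, be filed within 3 months of the fiscal year end (for a company with a March year end, during June of the same year)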
        """
-       request_data = RequestData(
-           start_date=datetime.date(2023, 11, 1),
-           end_date=datetime.date(2023, 11, 29),
-       )
        securities_report_list = self._extract(request_data)
        self._download_xbrl_in_zip(securities_report_list)
        logging.info("download finish")

        securities_report_dict = {}
        for x in securities_report_list:
            securities_report_dict[x.results[0].edinet_code] = x

        return securities_report_dict

            :
if __name__ == "__main__":
    # Prerequisite: the EDINET code list has already been uploaded
    home_dir = os.path.expanduser("~")
    service = XbrlService(work_dir=Path(home_dir, "Downloads/xbrlReport"))
-   doc_attr_dict = service.download_xbrl()
+   doc_attr_dict = service.download_xbrl(
+       RequestData(
+           start_date=datetime.date(2023, 11, 1),
+           end_date=datetime.date(2023, 11, 29),
+       )
+   )
    service.repository.delete_existing_records(list(doc_attr_dict.values()))
    service.repository.bulk_insert(doc_attr_dict, service.make_counting_data())
    logging.info("bulk_create finish")
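Now that download_xbrl takes a RequestData, the period could also come from the command line instead of date literals in the __main__ block. The sketch below uses argparse. The --start/--end option names and the parse_period helper are hypothetical. The returned dates would then be passed to RequestData(start_date=..., end_date=...).

```python
from __future__ import annotations

import argparse
import datetime


def parse_period(argv: list[str] | None = None) -> tuple[datetime.date, datetime.date]:
    """Parse --start/--end (YYYY-MM-DD) into dates for RequestData."""
    parser = argparse.ArgumentParser(description="Download annual securities reports from EDINET")
    # date.fromisoformat turns "2022-11-01" into datetime.date(2022, 11, 1);
    # argparse reports a usage error if the conversion raises ValueError
    parser.add_argument("--start", type=datetime.date.fromisoformat, required=True)
    parser.add_argument("--end", type=datetime.date.fromisoformat, required=True)
    args = parser.parse_args(argv)
    if args.start > args.end:
        parser.error("--start must not be later than --end")
    return args.start, args.end


start, end = parse_period(["--start", "2022-11-01", "--end", "2023-10-31"])
print(start, end)  # 2022-11-01 2023-10-31
```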

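Fixes for issues found after running a full year of data

  • The documents filtered by _extract are downloaded
  • Every zip file in the download folder is then processed

So doc_attr_dict and counting_data_dict should contain exactly the same companies, yet it turned out that a KeyError occurs for some reason. I investigated quite a bit but could not figure it out. It only affects 4 records, so it should not be a problem.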

    :
jpsps100000-ssr-001_G13178-000_2023-06-06_01_2023-09-06.xbrl
jpsps100000-ssr-001_G14453-000_2022-08-16_01_2022-11-16.xbrl
E22688 の CountingData が doc_attr_dict に見つからなかったので CountingData から削除しました
E37857 の CountingData が doc_attr_dict に見つからなかったので CountingData から削除しました
E11541 の CountingData が doc_attr_dict に見つからなかったので CountingData から削除しました
None の CountingData が doc_attr_dict に見つからなかったので CountingData から削除しました
bulk_create finish
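One possible contributor to the mismatch is offered here as an unconfirmed guess. In _extract, the inner loop reassigns response_data.results = [result] on the single ResponseData object for that day, and the de-duplication if runs once per day after that loop. As written, at most one report per day is kept (the last match). On a day with no match, results[0] is simply the first document of the day, which need not be an annual securities report. Separately, _unzip_files_and_extract_xbrl processes every zip in work_dir, and only temp_dir is removed afterwards, so zips left over from earlier runs with other date ranges are parsed as well. The sketch below filters per result and de-duplicates by doc_id. It uses a hypothetical Result dataclass with only the fields used here; the doc IDs and the non-matching form code are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for ResponseData._Result with only the fields used here."""

    doc_id: str
    ordinance_code: str
    form_code: str
    submit_date_time: str | None = None


def filter_securities_reports(daily_results: list[list[Result]]) -> list[Result]:
    """Keep every annual securities report (ordinance "010", form "030000") once per doc_id.

    Each match is collected on its own, so several reports filed on the same day
    are all kept, and a day without matches contributes nothing.
    """
    by_doc_id: dict[str, Result] = {}
    for results in daily_results:  # one list per day, as returned by documents.json
        for result in results:
            if (
                result.ordinance_code == "010"
                and result.form_code == "030000"
                and result.submit_date_time  # same guard the article adds to _extract
                and result.doc_id not in by_doc_id
            ):
                by_doc_id[result.doc_id] = result
    return list(by_doc_id.values())


day1 = [
    Result("S100AAAA", "010", "030000", "2023-06-28 15:00"),
    Result("S100BBBB", "010", "030000", "2023-06-28 15:30"),
    Result("S100CCCC", "010", "000000", "2023-06-28 16:00"),  # made-up non-matching form
]
day2 = [Result("S100DDDD", "010", "000000", "2023-06-29 09:00")]
reports = filter_securities_reports([day1, day2, day1])  # day1 repeated: duplicates dropped
print([r.doc_id for r in reports])  # ['S100AAAA', 'S100BBBB']
```

If the rest of the pipeline needs one ResponseData per report, each match could get its own shallow copy (for example copy.copy(response_data), with results set on the copy) rather than reassigning results on the shared object.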
securities/domain/repository/edinet.py
+ import logging
    :
    @staticmethod
    def bulk_insert(
-       doc_attr_dict: dict[str, ResponseData], counting_data_list: list[CountingData]
+       doc_attr_dict: dict[str, ResponseData], counting_data_dict: dict[str, CountingData],
    ):
+       # Delete the CountingData instances whose 'edinet_code' is not in 'doc_attr_dict'
+       for edinet_code in list(counting_data_dict.keys()):
+           if edinet_code not in doc_attr_dict:
+               del counting_data_dict[edinet_code]
+               logging.warning(
+                   f"{edinet_code} の CountingData が doc_attr_dict に見つからなかったので CountingData から削除しました"
+               )
+
        edinet_codes = [data.results[0].edinet_code for data in doc_attr_dict.values()]
        edinet_code_to_company = {
            company.edinet_code: company
            for company in Company.objects.filter(edinet_code__in=edinet_codes)
        }
        insert_objects = [
            x.to_entity(
                doc_attr_dict,
                edinet_code_to_company,
            )
-           for x in counting_data_list
+           for x in counting_data_dict.values()
        ]
        Counting.objects.bulk_create(insert_objects)
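bulk_insert now warns about EDINET codes that are in counting_data_dict but not in doc_attr_dict. The opposite case, a report that was selected and downloaded but produced no CountingData under that code, passes silently. When chasing the KeyError described above, a small helper that reports both directions with set differences may help. The name diff_keys and the sample dicts are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable


def diff_keys(
    doc_attr_keys: Iterable[str | None], counting_keys: Iterable[str | None]
) -> tuple[set[str | None], set[str | None]]:
    """Return (codes only in doc_attr_dict, codes only in counting_data_dict)."""
    doc_codes = set(doc_attr_keys)
    counting_codes = set(counting_keys)
    return doc_codes - counting_codes, counting_codes - doc_codes


doc_attr_dict = {"E99990": "...", "E99991": "..."}  # made-up placeholders
counting_data_dict = {"E99990": "...", None: "..."}  # None: XBRL had no EDINETCodeDEI
only_in_docs, only_in_counting = diff_keys(doc_attr_dict, counting_data_dict)
print(only_in_docs, only_in_counting)  # {'E99991'} {None}
```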
securities/domain/service/xbrl.py
class XbrlService:
    def __init__(self, work_dir: Path):
        self.work_dir = work_dir
+       if not self.work_dir.exists():
+           self.work_dir.mkdir(parents=True, exist_ok=True)
        self.temp_dir = self.work_dir / "temp"
        self.repository = EdinetRepository()

    @staticmethod
    def _extract(request_data: RequestData) -> list[ResponseData]:
        """
        Extract the ResponseData entries that contain the target document type (duplicate doc_ids are excluded)
        Annual securities report: ordinanceCode == "010" and formCode == "030000"
        Amended annual securities report: ordinanceCode == "010" and formCode == "030001"
        """
            :
            for result in response_data.results:
                if result.ordinance_code == "010" and result.form_code == "030000":
-                   logging.info(
-                       f"{day}, "
-                       f"edinet_code: {result.edinet_code}, "
-                       f"doc_id: {result.doc_id}, "
-                       f"期間(自): {response_data.results[0].period_start}, "
-                       f"期間(至): {response_data.results[0].period_end}, "
-                       f"{result.filer_name}, "
-                   )
+                   logging.info(f"{day}, {result}")
                    response_data.results = [result]
            if (
                response_data.results
+               and response_data.results[0].submit_date_time
                and response_data.results[0].doc_id not in securities_report_dict
            ):
                securities_report_dict[response_data.results[0].doc_id] = response_data
        return list(securities_report_dict.values())

            :
            
-   def make_counting_data(self) -> list[CountingData]:
-       counting_list = []
+   def make_counting_data(self) -> dict[str, CountingData]:
+       counting_data_dict = {}
        for xbrl_path in self._unzip_files_and_extract_xbrl():
            counting_data = CountingData()
            ctrl = Cntlr.Cntlr()
            model_xbrl = ctrl.modelManager.load(xbrl_path)
            logging.info(f"{Path(xbrl_path).name}")
            counting_data = self._assign_attributes(counting_data, model_xbrl.facts)
-           counting_list.append(counting_data)
+           counting_data_dict[counting_data.edinet_code] = counting_data
        shutil.rmtree(self.temp_dir)
-       return counting_list
+       return counting_data_dict
            :
if __name__ == "__main__":
    # Prerequisite: the EDINET code list has already been uploaded
    home_dir = os.path.expanduser("~")
    service = XbrlService(work_dir=Path(home_dir, "Downloads/xbrlReport"))
    doc_attr_dict = service.download_xbrl(
        RequestData(
-           start_date=datetime.date(2023, 11, 1),
-           end_date=datetime.date(2023, 11, 29),
+           start_date=datetime.date(2022, 11, 1),
+           end_date=datetime.date(2023, 10, 31),
        )
    )
    service.repository.delete_existing_records(list(doc_attr_dict.values()))
    service.repository.bulk_insert(doc_attr_dict, service.make_counting_data())
    logging.info("bulk_create finish")
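Switching from a list to a dict keyed by `edinet_code` means a company that appears twice is stored only once (the later entry wins), and any key with no matching entry in `doc_attr_dict` can be dropped before `bulk_create`. A minimal sketch of that idea with plain dicts; `CountingStub` is a hypothetical stand-in dataclass, not the project's `CountingData` or `Counting` model.

```python
import logging
from dataclasses import dataclass


@dataclass
class CountingStub:
    # Hypothetical stand-in for CountingData: just the key and one value
    edinet_code: str
    avg_salary: int


def index_by_edinet_code(items: list[CountingStub]) -> dict[str, CountingStub]:
    """Keep one entry per edinet_code; a later duplicate overwrites an earlier one."""
    result: dict[str, CountingStub] = {}
    for item in items:
        result[item.edinet_code] = item
    return result


def drop_unmatched(
    counting_dict: dict[str, CountingStub], known_codes: set[str]
) -> dict[str, CountingStub]:
    """Remove entries whose edinet_code has no matching document metadata."""
    for code in list(counting_dict):  # copy the keys so we can delete while looping
        if code not in known_codes:
            del counting_dict[code]
            logging.warning(f"{code}: not found in doc_attr_dict, removed")
    return counting_dict


data = index_by_edinet_code(
    [CountingStub("E00001", 500), CountingStub("E00002", 600), CountingStub("E00001", 550)]
)
data = drop_unmatched(data, known_codes={"E00001"})
print(data)  # {'E00001': CountingStub(edinet_code='E00001', avg_salary=550)}
```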

securities/domain/valueobject/edinet.py
@dataclass
class RequestData:
    :
    class _Result:
        def __init__(self, data):
                :
-           self.xbrl_flag = data.get("xbrlFlag")
-           self.pdf_flag = data.get("pdfFlag")
-           self.attach_doc_flag = data.get("attachDocFlag")
-           self.english_doc_flag = data.get("englishDocFlag")
-           self.csv_flag = data.get("csvFlag")
+           self.xbrl_flag: bool = str(data.get("xbrlFlag")) == "1"
+           self.pdf_flag: bool = str(data.get("pdfFlag")) == "1"
+           self.attach_doc_flag: bool = str(data.get("attachDocFlag")) == "1"
+           self.english_doc_flag: bool = str(data.get("englishDocFlag")) == "1"
+           self.csv_flag: bool = str(data.get("csvFlag")) == "1"
            self.legal_status = data.get("legalStatus")
+
+       def __str__(self):
+           return (
+               f"Seq Number: {self.seq_number}, "
+               f"Doc ID: {self.doc_id}, "
+               f"Edinet Code: {self.edinet_code}, "
+               f"Sec Code: {self.sec_code}, "
+               f"JCN: {self.jcn}, "
+               f"Filer Name: {self.filer_name}, "
+               f"Fund Code: {self.fund_code}, "
+               f"Ordinance Code: {self.ordinance_code}, "
+               f"Form Code: {self.form_code}, "
+               f"Doc Type Code: {self.doc_type_code}, "
+               f"Period Start: {self.period_start}, "
+               f"Period End: {self.period_end}, "
+               f"Submit Date Time: {self.submit_date_time}, "
+               f"Doc Description: {self.doc_description}, "
+               f"Issuer Edinet Code: {self.issuer_edinet_code}, "
+               f"Subject Edinet Code: {self.subject_edinet_code}, "
+               f"Subsidiary Edinet Code: {self.subsidiary_edinet_code}, "
+               f"Current Report Reason: {self.current_report_reason}, "
+               f"Parent Doc ID: {self.parent_doc_id}, "
+               f"Ope Date Time: {self.ope_date_time}, "
+               f"Withdrawal Status: {self.withdrawal_status}, "
+               f"Doc Info Edit Status: {self.doc_info_edit_status}, "
+               f"Disclosure Status: {self.disclosure_status}, "
+               f"Xbrl Flag: {self.xbrl_flag}, "
+               f"PDF Flag: {self.pdf_flag}, "
+               f"Attach Doc Flag: {self.attach_doc_flag}, "
+               f"English Doc Flag: {self.english_doc_flag}, "
+               f"CSV Flag: {self.csv_flag}, "
+               f"Legal Status: {self.legal_status}"
+           )
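A caution about the flag fields: assuming the EDINET API returns them as the strings `"0"`/`"1"` (check the API specification to confirm), wrapping them in `bool()` turns `"0"` into `True`, because any non-empty string is truthy. Comparing against `"1"` avoids that; `to_flag` is a hypothetical helper showing the difference.

```python
def to_flag(value) -> bool:
    """Interpret an EDINET-style flag ("1"/"0", 1/0, or None) as a bool.

    Hypothetical helper; the string form "0"/"1" is an assumption about the API.
    """
    return str(value) == "1"


print(bool("0"))      # True  <- the pitfall: non-empty strings are truthy
print(to_flag("0"))   # False
print(to_flag("1"))   # True
print(to_flag(None))  # False
```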

Visualizing the XBRL data

repository

securities/domain/repository/plot.py
from django.db.models import Q, QuerySet, F

from securities.domain.valueobject.plot import RequestData
from securities.models import Counting


class PlotRepository:
    @staticmethod
    def get_period_data(request_data: RequestData) -> QuerySet:
        return (
            Counting.objects.select_related("company")
            .filter(
                Q(submit_date__gte=request_data.start_date)
                & Q(submit_date__lte=request_data.end_date)
            )
            .annotate(submitter_industry=F("company__submitter_industry"))
        )

    @staticmethod
    def get_period_data_for_specific_industry(
        request_data: RequestData, industry: str
    ) -> QuerySet:
        return (
            Counting.objects.select_related("company")
            .filter(
                Q(submit_date__gte=request_data.start_date)
                & Q(submit_date__lte=request_data.end_date)
                & Q(company__submitter_industry=industry)
            )
            .annotate(submitter_name=F("company__submitter_name"))
        )
        

valueobject

securities/domain/valueobject/plot.py
import datetime
from dataclasses import dataclass


@dataclass
class RequestData:
    start_date: datetime.date
    end_date: datetime.date

    def __post_init__(self):
        if self.start_date > datetime.date.today():
            raise ValueError("start_date is in the future")
        if self.end_date > datetime.date.today():
            raise ValueError("end_date is in the future")
        if self.start_date > self.end_date:
            raise ValueError("start_date is later than end_date")


@dataclass
class PlotParams:
    """
    Attributes:
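        x (str): Label for the x-axis of the plot
        title (str | None): Used as the graph title and in the file name
    Notes: The graphs used here are horizontal, so the single-axis version (i.e. this class) calls its field x; take care when switching to a vertical layout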
    """

    x: str
    title: str | None


@dataclass
class PlotParamsForKDE(PlotParams):
    """
    Attributes:
        y (str): Label for the y-axis of the plot
        color (str): Color used for the KDE plot
    """

    y: str
    color: str
    

service

securities/domain/service/plot.py
import datetime
import os
from abc import abstractmethod, ABC
from pathlib import Path

import pandas as pd
import seaborn
from django.db.models import QuerySet
from matplotlib import pyplot as plt, ticker

from securities.domain.repository.plot import PlotRepository
from securities.domain.valueobject.plot import (
    RequestData,
    PlotParams,
    PlotParamsForKDE,
)

COLUMN_COMPANY_NAME = "submitter_name"
COLUMN_INDUSTRY = "submitter_industry"
COLUMN_AVG_SALARY = "avg_salary"
COLUMN_AVG_TENURE = "avg_tenure"
COLUMN_AVG_AGE = "avg_age"
COMMON_FONT = ["IPAexGothic"]


class PlotServiceBase(ABC):
    def __init__(
        self,
        work_dir: Path,
        target_period: RequestData,
        display_title: bool,
        categorical_column: str = None,
    ):
        """
        Args:
            work_dir: Folder to work in
            target_period: Period to process
            display_title: Whether to display the graph title
            categorical_column: Column used to build the categorical labels
        """
        plt.rcParams["font.family"] = COMMON_FONT
        self.work_dir = work_dir
        if not self.work_dir.exists():
            self.work_dir.mkdir(parents=True, exist_ok=True)
        self._repository = PlotRepository()
        self.categorical_column = categorical_column
        self.clean_data = self._clean(self._get_target_data(target_period))
        self.display_title = display_title
        if self.categorical_column:
            self.categorical_labels_dict = self._get_labels_sorted_by_averages(
                self.clean_data
            )

    @abstractmethod
    def _get_target_data(self, target_period: RequestData) -> QuerySet:
        raise NotImplementedError

    @abstractmethod
    def _clean(self, query: QuerySet) -> pd.DataFrame:
        raise NotImplementedError

    def _get_labels_sorted_by_averages(
        self, clean_data: pd.DataFrame
    ) -> dict[str, list[str]]:
        """
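        Get three lists of labels, each sorted by the per-category average of one metric\n
        Returns: {COLUMN_AVG_SALARY: ['不動産業', 'サービス業', '情報・通信業', '水産・農林業', ... ], COLUMN_AVG_TENURE: [...], COLUMN_AVG_AGE: [...]}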
        """

        def _sort_labels_by_column_average(
            _data: pd.DataFrame, sort_on: str
        ) -> list[str]:
            sorted_df = (
                _data.groupby([self.categorical_column], as_index=False)
                .mean()
                .sort_values(sort_on)
            )
            return sorted_df[self.categorical_column].tolist()

        return {
            COLUMN_AVG_SALARY: _sort_labels_by_column_average(
                clean_data, sort_on=COLUMN_AVG_SALARY
            ),
            COLUMN_AVG_TENURE: _sort_labels_by_column_average(
                clean_data, sort_on=COLUMN_AVG_TENURE
            ),
            COLUMN_AVG_AGE: _sort_labels_by_column_average(
                clean_data, sort_on=COLUMN_AVG_AGE
            ),
        }

    def plot_all(self, plot_params_list: list[PlotParams | PlotParamsForKDE]):
        for plot_params in plot_params_list:
            self._plot(plot_params=plot_params)

    @abstractmethod
    def _plot(self, plot_params: PlotParams | PlotParamsForKDE):
        raise NotImplementedError

    @staticmethod
    def _configure_plot():
        # Note: gca() stands for "get current axes"
        ax = plt.gca()
        ax.spines["right"].set_visible(False)
        ax.spines["top"].set_visible(False)
        ax.yaxis.set_ticks_position("left")
        ax.xaxis.set_ticks_position("bottom")

    @abstractmethod
    def save(self, title: str):
        raise NotImplementedError


class BoxenPlotService(PlotServiceBase):
    def __init__(
        self, work_dir: Path, target_period: RequestData, categorical_column: str
    ):
        super().__init__(
            work_dir=work_dir,
            target_period=target_period,
            display_title=True,
            categorical_column=categorical_column,
        )

    def _get_target_data(self, target_period):
        return self._repository.get_period_data(target_period)

    def _clean(self, query: QuerySet) -> pd.DataFrame:
        return pd.DataFrame(
            list(
                query.values(
                    self.categorical_column,
                    COLUMN_AVG_SALARY,
                    COLUMN_AVG_TENURE,
                    COLUMN_AVG_AGE,
                )
            )
        ).dropna()

    def _plot(self, plot_params: PlotParams):
        plt.figure(figsize=(15, 10))
        seaborn.stripplot(
            x=plot_params.x,
            y=self.categorical_column,
            orient="h",
            data=self.clean_data,
            size=3,
            edgecolor="auto",
            order=self.categorical_labels_dict[plot_params.x],
        )
        ax = seaborn.boxenplot(
            x=plot_params.x,
            y=self.categorical_column,
            hue=self.categorical_column,  # TODO: omitting hue is deprecated, but adding hue makes the colors look off
            orient="h",
            data=self.clean_data,
            palette="rainbow",
            order=self.categorical_labels_dict[plot_params.x],
        )
        ax.grid(which="major", color="lightgray", ls=":", alpha=0.5)
        ax.xaxis.set_minor_locator(ticker.AutoMinorLocator())
        plt.xlabel(plot_params.x, fontsize=18)
        plt.ylabel(self.categorical_column, fontsize=16)
        if self.display_title:
            plt.title(plot_params.title, fontsize=24)
        self._configure_plot()
        self.save(plot_params.title)
        # plt.show()

    def save(self, title: str):
        plt.savefig(self.work_dir / f"boxen_plot_{title}.png")


class BarPlotService(PlotServiceBase):
    def __init__(
        self, work_dir: Path, target_period: RequestData, categorical_column: str
    ):
        super().__init__(
            work_dir=work_dir,
            target_period=target_period,
            display_title=True,
            categorical_column=categorical_column,
        )

    def _get_target_data(self, target_period):
        # TODO: the industry is fixed to the DB value "情報・通信業" for now (following the Qiita article)
        return self._repository.get_period_data_for_specific_industry(
            target_period, "情報・通信業"
        )

    def _clean(self, query: QuerySet) -> pd.DataFrame:
        return pd.DataFrame(
            list(
                query.values(
                    self.categorical_column,
                    COLUMN_AVG_SALARY,
                    COLUMN_AVG_TENURE,
                    COLUMN_AVG_AGE,
                )
            )
        ).dropna()

    def _plot(self, plot_params: PlotParams):
        # The 50 rows with the highest COLUMN_AVG_SALARY
        df_sort_by_salary = self.clean_data.sort_values(COLUMN_AVG_SALARY)[-50:]
        df_info_label_list_sort_by_salary = df_sort_by_salary[
            self.categorical_column
        ].tolist()
        plt.figure(figsize=(15, 12))
        ax = seaborn.barplot(
            x=self.categorical_column,
            y=plot_params.x,
            hue=self.categorical_column,  # TODO: omitting hue is deprecated, but adding hue makes the colors look off
            data=self.clean_data,
            palette="rocket",
            order=df_info_label_list_sort_by_salary,
        )
        seaborn.set(style="ticks")
        plt.xticks(rotation=90)
        plt.subplots_adjust(hspace=0.8, bottom=0.35)
        ax.grid(which="major", axis="y", color="lightgray", ls=":", alpha=0.5)
        ax.yaxis.set_major_formatter(
            plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x)))
        )
        plt.xlabel(self.categorical_column, fontsize=12)
        plt.ylabel(COLUMN_AVG_SALARY, fontsize=18)
        if self.display_title:
            plt.title(plot_params.title, fontsize=24)
        self._configure_plot()
        self.save(plot_params.title)
        # plt.show()

    def save(self, title: str):
        plt.savefig(self.work_dir / f"bar_plot_{title}.png")


class KernelDensityEstimationPlotService(PlotServiceBase):
    def __init__(self, work_dir: Path, target_period: RequestData):
        super().__init__(
            work_dir=work_dir,
            target_period=target_period,
            display_title=False,
        )

    def _get_target_data(self, target_period: RequestData) -> QuerySet:
        return self._repository.get_period_data(target_period)

    def _clean(self, query: QuerySet) -> pd.DataFrame:
        return pd.DataFrame(
            list(
                query.values(
                    COLUMN_AVG_SALARY,
                    COLUMN_AVG_TENURE,
                    COLUMN_AVG_AGE,
                )
            )
        ).dropna()

    def _plot(self, plot_params: PlotParamsForKDE):
        seaborn.jointplot(
            x=plot_params.x,
            y=plot_params.y,
            data=self.clean_data,
            kind="kde",
            color=plot_params.color,
        )
        if self.display_title:
            plt.title(plot_params.title, fontsize=24)
        self._configure_plot()
        self.save(plot_params.title)
        # plt.show()

    def save(self, title: str):
        plt.savefig(self.work_dir / f"kernel_density_estimation_plot_{title}.png")


if __name__ == "__main__":
    home_dir = os.path.expanduser("~")
    period = RequestData(
        start_date=datetime.date(2022, 11, 1),
        end_date=datetime.date(2023, 10, 31),
    )

    # plot1: boxen plot (box-and-whisker style)
    service = BoxenPlotService(
        work_dir=Path(home_dir, "Downloads/xbrlReport/plot"),
        target_period=period,
        categorical_column=COLUMN_INDUSTRY,
    )
    service.plot_all(
        [
            PlotParams(x=COLUMN_AVG_SALARY, title="業種別平均年間給与額"),
            PlotParams(x=COLUMN_AVG_TENURE, title="業種別平均勤続年数"),
            PlotParams(x=COLUMN_AVG_AGE, title="業種別平均年齢"),
        ]
    )

    # plot2: bar chart
    service = BarPlotService(
        work_dir=Path(home_dir, "Downloads/xbrlReport/plot"),
        target_period=period,
        categorical_column=COLUMN_COMPANY_NAME,
    )
    service.plot_all(
        [PlotParams(x=COLUMN_AVG_SALARY, title="情報・通信業界_平均年間給与TOP50")]
    )

    # plot3: kernel density estimation
    service = KernelDensityEstimationPlotService(
        work_dir=Path(home_dir, "Downloads/xbrlReport/plot"),
        target_period=period,
    )
    service.plot_all(
        [
            PlotParamsForKDE(
                x=COLUMN_AVG_TENURE,
                y=COLUMN_AVG_SALARY,
                color="#d9f2f8",
                title="平均勤続年数x平均年間給与",
            ),
            PlotParamsForKDE(
                x=COLUMN_AVG_AGE,
                y=COLUMN_AVG_SALARY,
                color="#fac8be",
                title="平均年齢x平均年間給与",
            ),
            PlotParamsForKDE(
                x=COLUMN_AVG_AGE,
                y=COLUMN_AVG_TENURE,
                color="#008000",
                title="平均年齢x平均勤続年数",
            ),
        ]
    )
    print("visualize finish")
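`_get_labels_sorted_by_averages` groups by the categorical column, averages each metric and sorts the group labels in ascending order, so that seaborn's `order=` lines the categories up by their mean. The same idea without pandas, as a sketch with made-up rows (the labels and numbers are illustrative only):

```python
from collections import defaultdict
from statistics import mean


def labels_sorted_by_average(rows: list[dict], group_key: str, sort_on: str) -> list[str]:
    """Return group labels ordered by the ascending mean of `sort_on` within each group."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[sort_on])
    return sorted(groups, key=lambda label: mean(groups[label]))


rows = [
    {"submitter_industry": "A", "avg_salary": 700},
    {"submitter_industry": "A", "avg_salary": 500},
    {"submitter_industry": "B", "avg_salary": 400},
    {"submitter_industry": "C", "avg_salary": 900},
]
# Means: A=600, B=400, C=900
print(labels_sorted_by_average(rows, "submitter_industry", "avg_salary"))  # ['B', 'A', 'C']
```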
    

Check

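Compared with the graphs in the original article, mine somehow look less vivid :rolling_eyes:
It looks like the colors are cycling? Not enough securities report data? Different seaborn settings?
Well, for now, just getting it rebuilt in DDD style and displaying at all is good enough for me.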

my output guide

Refactoring so it can be run from the front end

Well, getting this far is enough, but right now everything assumes you hit "Run" in the development environment.

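  • Right now it batch-processes securities reports for a given period. If I publish it as-is and lots of strangers start firing the API, that becomes a pain, so I'll add a DatePicker that shows the list of companies that can be processed, let users reserve them with checkboxes, and run the actual work in a daily batch
  • Add pagination to the list of companies that can be processed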
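For the pagination item, Django already ships `django.core.paginator.Paginator`; underneath, it is plain slicing. A framework-free sketch, where `paginate` is a hypothetical helper using 1-indexed page numbers like Django's:

```python
import math


def paginate(items: list, page: int, per_page: int) -> tuple[list, int]:
    """Return (items on the given 1-indexed page, total number of pages)."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # An empty list still has one (empty) page
    num_pages = max(1, math.ceil(len(items) / per_page))
    if not 1 <= page <= num_pages:
        raise ValueError(f"page must be between 1 and {num_pages}")
    start = (page - 1) * per_page
    return items[start : start + per_page], num_pages


companies = [f"E{n:05d}" for n in range(1, 26)]  # 25 dummy EDINET codes
print(paginate(companies, page=3, per_page=10))
# (['E00021', 'E00022', 'E00023', 'E00024', 'E00025'], 3)
```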

I do open a self Pull Request first before writing it up as an article, but this is major surgery again :rolling_eyes:

lib/zipfileservice.py

  • It used to create target_dir when it didn't exist; now it raises an exception instead
lib/zipfileservice.py(extract_zip_files fully replaced)
class ZipFileService:
        :
    @staticmethod
    def extract_zip_files(source_dir: Path, target_dir: Path):
        """
        Extract every zip file in the source directory into the target directory.
        """
        source_dir_path = Path(source_dir)
        target_dir_path = Path(target_dir)

        # Check if the source directory exists
        if not source_dir_path.exists():
            raise FileNotFoundError(f"The source directory {source_dir} does not exist")

        # Check if the target directory exists
        if not target_dir_path.exists():
            raise FileNotFoundError(f"The target directory {target_dir} does not exist")

        zip_files = source_dir_path.glob("*.zip")
        for zip_file in zip_files:
            with zipfile.ZipFile(str(zip_file), "r") as zip_f:
                zip_f.extractall(str(target_dir_path))
                    :
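A self-contained version of the extract-everything step, using `tempfile` so it can run anywhere. It follows the behavior described above (raise instead of creating missing directories); the function here is an illustration, not the project's `ZipFileService` itself, and the archive contents are made up.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zip_files(source_dir: Path, target_dir: Path) -> list[str]:
    """Extract every *.zip in source_dir into target_dir; both must already exist."""
    if not source_dir.exists():
        raise FileNotFoundError(f"The source directory {source_dir} does not exist")
    if not target_dir.exists():
        raise FileNotFoundError(f"The target directory {target_dir} does not exist")
    extracted: list[str] = []
    for zip_path in sorted(source_dir.glob("*.zip")):
        with zipfile.ZipFile(zip_path, "r") as zf:
            zf.extractall(target_dir)
            extracted.extend(zf.namelist())
    return extracted


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "src"), Path(tmp, "dst")
    src.mkdir()
    with zipfile.ZipFile(src / "sample.zip", "w") as zf:
        zf.writestr("XBRL/PublicDoc/sample.xbrl", "<xbrl/>")

    raised = False
    try:
        extract_zip_files(src, dst)  # dst does not exist yet
    except FileNotFoundError:
        raised = True

    dst.mkdir()
    result = extract_zip_files(src, dst)
    content = (dst / "XBRL/PublicDoc/sample.xbrl").read_text()

print(raised, result, content)  # True ['XBRL/PublicDoc/sample.xbrl'] <xbrl/>
```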

securities/domain/repository/edinet.py

securities/domain/repository/edinet.py(full replacement)
from django.db.models import Q

from securities.models import Counting, ReportDocument


class EdinetRepository:
    @staticmethod
    def delete_existing_records(report_doc_list: list[ReportDocument]) -> None:
        delete_conditions = Q()

        for report_doc in report_doc_list:
            # report_doc.company is already the Company instance, no extra lookup needed
            delete_conditions |= Q(
                company=report_doc.company, submit_date=report_doc.submit_date_time
            )

        # An empty Q() matches every row, so do nothing when there is nothing to delete
        if not delete_conditions:
            return

        Counting.objects.filter(delete_conditions).delete()
        

securities/domain/service/xbrl.py

securities/domain/service/xbrl.py(full replacement)
import logging
import os
import shutil
from datetime import datetime
from pathlib import Path

import requests
from arelle import Cntlr
from django.utils import timezone

from lib.zipfileservice import ZipFileService
from securities.domain.repository.edinet import EdinetRepository
from securities.domain.valueobject.edinet import CountingData, RequestData
from securities.models import ReportDocument, Company

SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT = 1


class XbrlService:
    def __init__(self):
        self.repository = EdinetRepository()
        self.companies = {
            company.edinet_code: company for company in Company.objects.all()
        }

    def fetch_report_doc_list(self, request_data: RequestData) -> list[ReportDocument]:
        """
        Args:
            request_data: APIへのリクエスト条件

        Returns:
            list[ReportDocument]: A list of ReportDocument objects.
        """
        report_doc_list: list[ReportDocument] = []
        for day in request_data.day_list:
            url = "https://api.edinet-fsa.go.jp/api/v2/documents.json"
            params = {
                "date": day,
                "type": request_data.SECURITIES_REPORT_AND_META_DATA,
                "Subscription-Key": os.environ.get("EDINET_API_KEY"),
            }
            res = requests.get(url, params=params)
            res.raise_for_status()

            for item in res.json().get("results", []):
                submit_date_string = item.get("submitDateTime")
                if submit_date_string is None:
                    continue
                ordinance_code = item.get("ordinanceCode")
                form_code = item.get("formCode")
                if not (ordinance_code == "010" and form_code == "030000"):
                    continue
                submit_date_time = timezone.make_aware(
                    datetime.strptime(submit_date_string, "%Y-%m-%d %H:%M")
                )
                ope_date_time_string = item.get("opeDateTime")
                ope_date_time = (
                    timezone.make_aware(
                        datetime.strptime(ope_date_time_string, "%Y-%m-%d %H:%M")
                    )
                    if ope_date_time_string
                    else None
                )
                edinet_code = item.get("edinetCode")
                if edinet_code not in self.companies:
                    continue

                report_doc = ReportDocument(
                    seq_number=item.get("seqNumber"),
                    doc_id=item.get("docID"),
                    ordinance_code=ordinance_code,
                    form_code=form_code,
                    period_start=item.get("periodStart"),
                    period_end=item.get("periodEnd"),
                    submit_date_time=submit_date_time,
                    doc_description=item.get("docDescription"),
                    ope_date_time=ope_date_time,
                    withdrawal_status=item.get("withdrawalStatus"),
                    doc_info_edit_status=item.get("docInfoEditStatus"),
                    disclosure_status=item.get("disclosureStatus"),
                    xbrl_flag=str(item.get("xbrlFlag")) == "1",
                    pdf_flag=str(item.get("pdfFlag")) == "1",
                    english_doc_flag=str(item.get("englishDocFlag")) == "1",
                    csv_flag=str(item.get("csvFlag")) == "1",
                    legal_status=item.get("legalStatus"),
                    company=self.companies[edinet_code],
                )
                report_doc_list.append(report_doc)
                logging.info(f"{day}, {report_doc}")
        return report_doc_list

    @staticmethod
    def download_xbrl(report_doc: ReportDocument, work_dir: Path) -> None:
        """
        Notes: In principle, a securities report must be filed within 3 months of the fiscal year-end (for a March year-end company, during June of the same year)
        """
        logging.info(f"Downloading {report_doc.doc_id}...")
        url = f"https://api.edinet-fsa.go.jp/api/v2/documents/{report_doc.doc_id}"
        params = {
            "type": SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT,
            "Subscription-Key": os.environ.get("EDINET_API_KEY"),
        }
        filename = work_dir / f"{report_doc.doc_id}.zip"
        res = requests.get(url, params=params, stream=True)
        res.raise_for_status()

        with open(filename, "wb") as file:
            for chunk in res.iter_content(chunk_size=1024):
                file.write(chunk)
        logging.info(f"Finished downloading {report_doc.doc_id}")

    @staticmethod
    def _assign_attributes(counting_data: CountingData, facts):
        target_keys = {
            "EDINETCodeDEI": "edinet_code",
            "FilerNameInJapaneseDEI": "filer_name_jp",
            "AverageAnnualSalaryInformationAboutReportingCompanyInformationAboutEmployees": "avg_salary",
            "AverageLengthOfServiceYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_years",
            "AverageLengthOfServiceMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_months",  # noqa E501
            "AverageAgeYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_years",
            "AverageAgeMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_months",
            "NumberOfEmployees": "number_of_employees",
        }
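        for fact in facts:
            key_to_set = target_keys.get(fact.concept.qname.localName)
            if not key_to_set:
                continue
            # Take the employee count only from the non-consolidated current-year context.
            # Skipping other contexts (instead of overwriting with None) keeps the result
            # independent of the order in which the facts appear.
            if (
                key_to_set == "number_of_employees"
                and fact.contextID != "CurrentYearInstant_NonConsolidatedMember"
            ):
                continue
            setattr(counting_data, key_to_set, fact.value)
        return counting_data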

    def make_counting_data(self, work_dir: Path) -> CountingData:
        temp_dir = Path(work_dir) / "temp"
        if not temp_dir.exists():
            temp_dir.mkdir(parents=True, exist_ok=True)

        ZipFileService.extract_zip_files(work_dir, temp_dir)
        xbrl_path = str(next(temp_dir.glob("XBRL/PublicDoc/*.xbrl")))

        ctrl = Cntlr.Cntlr()

        # TODO: on the VPS it gets stuck here... (question posted)
        model_xbrl = ctrl.modelManager.load(xbrl_path)
        # TODO: execution never reaches this point

        logging.info(f"  xbrl: {Path(xbrl_path).name}")
        counting_data = self._assign_attributes(
            counting_data=CountingData(), facts=model_xbrl.facts
        )
        shutil.rmtree(temp_dir)

        return counting_data
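`_assign_attributes` maps each fact's local element name to a `CountingData` attribute. A stdlib-only sketch of that mapping, where `Fact` is a hypothetical stand-in for Arelle's fact objects (it flattens `fact.concept.qname.localName` into `local_name`) and the facts and context IDs are made up for illustration. It keeps `NumberOfEmployees` only for the non-consolidated current-year context and skips every other context, so reordering the facts does not change the result.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    # Hypothetical stand-in: Arelle exposes these as fact.concept.qname.localName,
    # fact.contextID and fact.value
    local_name: str
    context_id: str
    value: str


TARGET_KEYS = {
    "EDINETCodeDEI": "edinet_code",
    "NumberOfEmployees": "number_of_employees",
}
EMPLOYEE_CONTEXT = "CurrentYearInstant_NonConsolidatedMember"


def assign_attributes(facts: list[Fact]) -> dict[str, str]:
    result: dict[str, str] = {}
    for fact in facts:
        key = TARGET_KEYS.get(fact.local_name)
        if key is None:
            continue
        # Ignore employee counts from other contexts (consolidated, prior years, ...)
        if key == "number_of_employees" and fact.context_id != EMPLOYEE_CONTEXT:
            continue
        result[key] = fact.value
    return result


facts = [
    Fact("EDINETCodeDEI", "FilingDateInstant", "E00001"),
    Fact("NumberOfEmployees", EMPLOYEE_CONTEXT, "1200"),
    Fact("NumberOfEmployees", "CurrentYearInstant", "5400"),  # consolidated: ignored
]
print(assign_attributes(facts))  # {'edinet_code': 'E00001', 'number_of_employees': '1200'}
```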
        

securities/domain/valueobject/edinet.py

securities/domain/valueobject/edinet.py(full replacement)
import datetime
from dataclasses import dataclass


@dataclass
class RequestData:
    SECURITIES_REPORT_AND_META_DATA = 2
    start_date: datetime.date
    end_date: datetime.date

    def __post_init__(self):
        if self.start_date > datetime.date.today():
            raise ValueError("start_date is in the future")
        if self.end_date > datetime.date.today():
            raise ValueError("end_date is in the future")
        if self.start_date > self.end_date:
            raise ValueError("start_date is later than end_date")

        self.doc_type = self.SECURITIES_REPORT_AND_META_DATA

        # Calculate day_list
        period = self.end_date - self.start_date
        self.day_list = []
        for d in range(int(period.days)):
            day = self.start_date + datetime.timedelta(days=d)
            self.day_list.append(day)
        self.day_list.append(self.end_date)


@dataclass
class CountingData:
    """
    CountingData

    Class that holds the numeric (count) data extracted from a report

    Attributes:
        edinet_code (str | None): The EDINET code of the entity.
        filer_name_jp (str | None): The name of the entity in Japanese.
        avg_salary (str | None): The average salary of the entity.
        avg_tenure_years (str | None): The average tenure of employees in years.
        avg_tenure_months (str | None): The average tenure of employees in months.
        avg_age_years (str | None): The average age of employees in years.
        avg_age_months (str | None): The average age of employees in months.
        number_of_employees (str | None): The number of employees in the entity.

    Properties:
        avg_tenure_years_combined (float): Average length of service in years, including months.
            If self.avg_tenure_months is set, it is divided by 12, rounded to one decimal,
            and added to avg_tenure_years as the fractional part.
            Otherwise avg_tenure_years is returned unchanged.

        avg_age_years_combined (float): Average age in years, including months.
            If self.avg_age_months is set, it is divided by 12, rounded to one decimal,
            and added to avg_age_years as the fractional part.
            Otherwise avg_age_years is returned unchanged.
    """

    edinet_code: str | None = None
    filer_name_jp: str | None = None
    avg_salary: int = 0
    avg_tenure_years: int = 0
    avg_tenure_months: int = 0
    avg_age_years: int = 0
    avg_age_months: int = 0
    number_of_employees: int = 0

    @property
    def avg_tenure_years_combined(self) -> float:
        if self.avg_tenure_months:
            avg_tenure_decimal = round(int(self.avg_tenure_months) / 12, 1)
            return int(self.avg_tenure_years) + avg_tenure_decimal
        return self.avg_tenure_years

    @property
    def avg_age_years_combined(self) -> float:
        if self.avg_age_months:
            age_years_decimal = round(int(self.avg_age_months) / 12, 1)
            return int(self.avg_age_years) + age_years_decimal
        return self.avg_age_years
        

securities/management/commands/daily_download_edinet.py

  • Add a batch that downloads the zip files through the FSA (EDINET) API
securities/management/commands/daily_download_edinet.py
import logging
import os
import shutil
from pathlib import Path

from django.core.management.base import BaseCommand

from config import settings
from securities.domain.service.xbrl import XbrlService
from securities.models import ReportDocument, Company, Counting


class Command(BaseCommand):
    help = "Download edinet data"

    def handle(self, *args, **options):
        report_doc_list = ReportDocument.objects.filter(download_reserved=True)[:20]

        work_dir = Path(settings.MEDIA_ROOT) / "securities"
        if not work_dir.exists():
            work_dir.mkdir(parents=True, exist_ok=True)

        companies = Company.objects.all()
        company_mst = {c.edinet_code: c for c in companies}

        service = XbrlService()
        service.repository.delete_existing_records(report_doc_list)

        counting_list: list[Counting] = []
        for report_doc in report_doc_list:
            service.download_xbrl(report_doc=report_doc, work_dir=work_dir)
            counting_data = service.make_counting_data(work_dir=work_dir)
            counting = Counting(
                period_start=report_doc.period_start,
                period_end=report_doc.period_end,
                submit_date=report_doc.submit_date_time,
                avg_salary=counting_data.avg_salary,
                avg_tenure=counting_data.avg_tenure_years_combined,
                avg_age=counting_data.avg_age_years_combined,
                number_of_employees=counting_data.number_of_employees,
                company=company_mst[report_doc.company.edinet_code],
            )
            counting_list.append(counting)
            os.remove(work_dir / f"{report_doc.doc_id}.zip")

        Counting.objects.bulk_create(counting_list)
        logging.info(f"Finished creating count data: {len(report_doc_list)}")
        shutil.rmtree(work_dir)

        ReportDocument.objects.filter(
            id__in=[report_document.id for report_document in report_doc_list]
        ).update(download_reserved=False)

        self.stdout.write(self.style.SUCCESS("Successfully download edinet data"))
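The command takes at most 20 reserved documents per run, processes them, and then clears `download_reserved`, so anything left over is picked up by the next daily run. The same drain pattern in plain Python, with a hypothetical `Doc` class standing in for `ReportDocument` (no downloading or database involved):

```python
from dataclasses import dataclass

BATCH_SIZE = 20  # same cap as the [:20] slice in the command


@dataclass
class Doc:
    doc_id: str
    download_reserved: bool = True


def run_daily_batch(docs: list[Doc]) -> list[str]:
    """Process up to BATCH_SIZE reserved docs and clear their reservation flag."""
    batch = [d for d in docs if d.download_reserved][:BATCH_SIZE]
    processed = []
    for doc in batch:
        processed.append(doc.doc_id)  # download + XBRL parsing would happen here
        doc.download_reserved = False
    return processed


docs = [Doc(f"S{n:07d}") for n in range(45)]
runs = 0
while any(d.download_reserved for d in docs):
    run_daily_batch(docs)
    runs += 1
print(runs)  # 3  (20 + 20 + 5)
```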
        

securities/models.py

  • Add a table to hold the document metadata returned by the document list API
securities/models.py(full replacement)
from django.db import models


class Company(models.Model):
    """
    Company master, built up from the documents fetched for the companies returned by the document list API

    Attributes:
        edinet_code (CharField): The EDINET code of the company.
        type_of_submitter (CharField): The type of submitter of the company.
        listing_status (CharField): The listing status of the company.
        consolidated_status (CharField): The consolidated status of the company.
        capital (IntegerField): The capital of the company.
        end_fiscal_year (CharField): The end fiscal year of the company.
        submitter_name (CharField): The name of the submitter of the company.
        submitter_name_en (CharField): The name of the submitter in English.
        submitter_name_kana (CharField): The name of the submitter in Kana.
        address (CharField): The address of the company.
        submitter_industry (CharField): The industry of the submitter.
        securities_code (CharField): The securities code of the company.
        corporate_number (CharField): The corporate number of the submitter.
        created_at (DateTimeField): The timestamp when the company was created.
        updated_at (DateTimeField): The timestamp when the company was last updated.
    """

    edinet_code = models.CharField(
        verbose_name="EDINETコード", max_length=6, null=True
    )
    type_of_submitter = models.CharField(
        verbose_name="提出者種別", max_length=30, null=True
    )
    listing_status = models.CharField(verbose_name="上場区分", max_length=3, null=True)
    consolidated_status = models.CharField(
        verbose_name="連結の有無", max_length=1, null=True
    )
    capital = models.IntegerField(verbose_name="資本金", null=True)
    end_fiscal_year = models.CharField(verbose_name="決算日", max_length=6, null=True)
    submitter_name = models.CharField(
        verbose_name="提出者名", max_length=100, null=True
    )
    submitter_name_en = models.CharField(
        verbose_name="提出者名(英字)", max_length=100, null=True
    )
    submitter_name_kana = models.CharField(
        verbose_name="提出者名(ヨミ)", max_length=100, null=True
    )
    address = models.CharField(verbose_name="所在地", max_length=255, null=True)
    submitter_industry = models.CharField(
        verbose_name="提出者業種", max_length=25, null=True
    )
    securities_code = models.CharField(
        verbose_name="証券コード", max_length=5, null=True
    )
    corporate_number = models.CharField(
        verbose_name="提出者法人番号", max_length=13, null=True
    )
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)


class ReportDocument(models.Model):
    """
    Document metadata returned by the document list API (提出書類一覧API).

    Attributes:
        seq_number (models.SmallIntegerField): The sequence number of the document.
        doc_id (models.CharField): The document's management number.
        ordinance_code (models.CharField): The ordinance code of the document.
        form_code (models.CharField): The code representing the document's form.
        period_start (models.DateField): The start date of the period covered by the document.
        period_end (models.DateField): The end date of the period covered by the document.
        submit_date_time (models.DateTimeField): The date and time when the document was submitted.
        doc_description (models.CharField): A brief description of the document.
        ope_date_time (models.DateTimeField): The date and time when the document was operated (nullable).
        withdrawal_status (models.CharField): The withdrawal status of the document.
        doc_info_edit_status (models.CharField): The modification status of the document information.
        disclosure_status (models.CharField): The disclosure status of the document.
        xbrl_flag (models.BooleanField): A flag indicating whether the document has an XBRL file.
        pdf_flag (models.BooleanField): A flag indicating whether the document has a PDF file.
        english_doc_flag (models.BooleanField): A flag indicating whether the document has an English file.
        csv_flag (models.BooleanField): A flag indicating whether the document has a CSV file.
        legal_status (models.BooleanField): The public inspection (縦覧) status of the document.
        download_reserved (models.BooleanField): Whether the document has been reserved for download.
        created_at (models.DateTimeField): The date and time when the document was created.
        updated_at (models.DateTimeField): The date and time when the document was last updated.

        company (ForeignKey): A foreign key to the associated Company object.
    """

    seq_number = models.SmallIntegerField(verbose_name="連番")
    doc_id = models.CharField(verbose_name="書類管理番号", max_length=8)
    ordinance_code = models.CharField(verbose_name="府令コード", max_length=3)
    form_code = models.CharField(verbose_name="様式コード", max_length=6)
    period_start = models.DateField(verbose_name="期間(自)")
    period_end = models.DateField(verbose_name="期間(至)")
    submit_date_time = models.DateTimeField(verbose_name="提出日時")
    doc_description = models.CharField(verbose_name="提出書類概要", max_length=147)
    ope_date_time = models.DateTimeField(verbose_name="操作日時", null=True)
    withdrawal_status = models.CharField(verbose_name="取下区分", max_length=1)
    doc_info_edit_status = models.CharField(
        verbose_name="書類情報修正区分", max_length=1
    )
    disclosure_status = models.CharField(verbose_name="開示不開示区分", max_length=1)
    xbrl_flag = models.BooleanField(verbose_name="XBRL有無フラグ")
    pdf_flag = models.BooleanField(verbose_name="PDF有無フラグ")
    english_doc_flag = models.BooleanField(verbose_name="英文ファイル有無フラグ")
    csv_flag = models.BooleanField(verbose_name="CSV有無フラグ")
    legal_status = models.BooleanField(verbose_name="縦覧区分")
    download_reserved = models.BooleanField(default=False)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    company = models.ForeignKey(Company, on_delete=models.CASCADE, null=True)

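    def __str__(self):
        # company is nullable, so guard against None before reading edinet_code
        edinet_code = self.company.edinet_code if self.company else "-"
        return f"{self.doc_id} - {edinet_code}"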


class Counting(models.Model):
    """
    Numeric data (計数データ) extracted from the securities report

    Attributes:
        period_start (DateField): The starting date of the period for which the data represents.
        period_end (DateField): The ending date of the period for which the data represents.
        submit_date (DateField): The date the report was submitted.
        avg_salary (IntegerField): The average annual salary of the company's employees in Japanese yen.
        avg_tenure (FloatField): The average length of employment in years.
        avg_age (FloatField): The average age of the employees in years.
        number_of_employees (IntegerField): The total number of employees in the company.
        created_at (DateTimeField): The date and time when the object was created.
        updated_at (DateTimeField): The date and time when the object was last updated.

        company (ForeignKey): A foreign key to the associated Company object.
    """

    period_start = models.DateField(verbose_name="期間(自)", null=True)
    period_end = models.DateField(verbose_name="期間(至)", null=True)
    submit_date = models.DateField(verbose_name="提出日時")
    avg_salary = models.IntegerField(verbose_name="平均年間給与(円)", null=True)
    avg_tenure = models.FloatField(verbose_name="平均勤続年数(年)", null=True)
    avg_age = models.FloatField(verbose_name="平均年齢(歳)", null=True)
    number_of_employees = models.IntegerField(verbose_name="従業員数(人)", null=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    company = models.ForeignKey(Company, on_delete=models.CASCADE, null=True)

    class Meta:
        unique_together = ["company", "submit_date"]
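To see how one entry of the document list API maps onto these fields, here is a small plain-Python sketch. It assumes the EDINET API v2 key names (`seqNumber`, `docID`, `edinetCode`, `docTypeCode`, `periodStart`, `periodEnd`, `submitDateTime`, `xbrlFlag`), that flags arrive as the strings `"1"`/`"0"`, and that `docTypeCode` `"120"` means 有価証券報告書. Please check these against the official spec. `ReportDocRow` and the helper functions are hypothetical and are not the `XbrlService` used in this article. Note that `periodStart`/`periodEnd` may be null for some document types (an assumption), while `ReportDocument.period_start`/`period_end` are declared without `null=True`.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass
class ReportDocRow:
    """Hypothetical plain-Python mirror of a few ReportDocument fields."""
    seq_number: int
    doc_id: str
    edinet_code: Optional[str]
    period_start: Optional[date]
    period_end: Optional[date]
    submit_date_time: datetime
    xbrl_flag: bool


def _to_date(value: Optional[str]) -> Optional[date]:
    # "2024-03-31" -> date(2024, 3, 31); null/empty -> None
    return date.fromisoformat(value) if value else None


def to_row(result: dict) -> ReportDocRow:
    """Convert one item of the API's "results" array (assumed v2 key names)."""
    return ReportDocRow(
        seq_number=int(result["seqNumber"]),
        doc_id=result["docID"],
        edinet_code=result.get("edinetCode"),
        period_start=_to_date(result.get("periodStart")),
        period_end=_to_date(result.get("periodEnd")),
        # submitDateTime is assumed to look like "2024-06-27 15:00"
        submit_date_time=datetime.strptime(result["submitDateTime"], "%Y-%m-%d %H:%M"),
        # Flags are assumed to arrive as the strings "1" / "0"
        xbrl_flag=result.get("xbrlFlag") == "1",
    )


def annual_reports_with_xbrl(results: list[dict]) -> list[ReportDocRow]:
    """Keep only docTypeCode "120" (assumed: 有価証券報告書) entries that have XBRL."""
    return [
        to_row(r)
        for r in results
        if r.get("docTypeCode") == "120" and r.get("xbrlFlag") == "1"
    ]
```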
        

securities/templates/securities/base.html

  • Settings for adding a calendar (date picker)
securities/templates/securities/base.html (additions)
    :
<!-- jQuery UI CSS for DatePicker -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/jqueryui/1.12.1/jquery-ui.css"/>
<!-- jQuery UI for DatePicker -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/jqueryui/1.12.1/jquery-ui.min.js"></script>
<script>
    $(function () {
        $("#start_date").datepicker({
            dateFormat: "yy-mm-dd"
        });
        $("#end_date").datepicker({
            dateFormat: "yy-mm-dd"
        });
    });
</script>
    :
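The two date pickers (`#start_date`, `#end_date`) define a range. jQuery UI's `dateFormat: "yy-mm-dd"` produces ISO dates such as `2024-04-01` (`yy` is the four-digit year). As far as I know, the EDINET document list endpoint takes a single `date` per request, so a range has to be fetched one day at a time. The sketch below parses the posted values and builds one URL per day. The endpoint URL, the `type=2` parameter, and the `Subscription-Key` parameter name are assumptions based on the public API v2 spec. `parse_range` and `document_list_urls` are hypothetical helpers, not the article's `XbrlService`.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumed EDINET API v2 document list endpoint (書類一覧API)
DOCUMENTS_URL = "https://api.edinet-fsa.go.jp/api/v2/documents.json"


def parse_range(start_str: str, end_str: str) -> tuple[date, date]:
    """Parse the two datepicker values ("yy-mm-dd" in jQuery UI == YYYY-MM-DD)."""
    start = date.fromisoformat(start_str)
    end = date.fromisoformat(end_str)
    if start > end:
        raise ValueError(f"start_date {start} is after end_date {end}")
    return start, end


def document_list_urls(start: date, end: date, api_key: str) -> list[str]:
    """One request URL per calendar day, start and end both inclusive."""
    urls = []
    day = start
    while day <= end:
        # type=2 is assumed to mean "metadata plus the document list"
        query = urlencode({"date": day.isoformat(), "type": 2, "Subscription-Key": api_key})
        urls.append(f"{DOCUMENTS_URL}?{query}")
        day += timedelta(days=1)
    return urls
```

Validating the range before the view deletes existing `ReportDocument` rows means a typo in the form cannot wipe the table.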

securities/templates/securities/report/index.html

securities/templates/securities/report/index.html (replace entirely)
{% extends "securities/base.html" %}
{% load static %}
{% load humanize %}
{% block content %}
    <div class="jumbotron">
        <h1 class="display-4">Let's analyze Securities Report!</h1>
        <p class="lead">it's interesting Securities Report</p>
        <hr class="my-4">
        <p>You can read the Securities Report</p>
    </div>

    <div class="container">
        <h2>1. 会社マスタを作成</h2>
        <a class="btn btn-outline-primary mb-3" href="{% url 'securities:edinet_code_upload' %}"
           role="button">EDINETコードリスト取り込み</a>

        <h2 class="mt-5">2. 書類一覧を取得</h2>
        <form method="post" action="{% url 'securities:index' %}">
            {% csrf_token %}
            <div class="form-row">
                <div class="form-group col-md-6">
                    <label for="start_date">開始日:</label>
                    <input type="text" id="start_date" name="start_date" class="form-control"
                           value="{{ start_date|date:'Y-m-d' }}">
                    {% if form.start_date.errors %}
                        <div class="text-danger">{{ form.start_date.errors }}</div>
                    {% endif %}
                </div>
                <div class="form-group col-md-6">
                    <label for="end_date">終了日:</label>
                    <input type="text" id="end_date" name="end_date" class="form-control"
                           value="{{ end_date|date:'Y-m-d' }}">
                    {% if form.end_date.errors %}
                        <div class="text-danger">{{ form.end_date.errors }}</div>
                    {% endif %}
                </div>
            </div>
            <button type="submit" class="btn btn-primary">指定した期間の書類一覧を取得</button>
        </form>

        <h2 class="mt-5">3. 有報をダウンロード予約する</h2>
        <button id="submit-for-reserve" class="btn btn-outline-primary">ダウンロード予約する</button>
        <a class="btn btn-outline-info{% if request.GET.reserved == 'yes' %} active{% endif %}"
           href="{% url 'securities:index' %}{% if request.GET.reserved != 'yes' %}?reserved=yes{% endif %}">
            ダウンロード予約済みリスト
        </a>
        <table class="table table-striped table-bordered">
            <thead class="bg-primary text-white">
            <tr>
                <th scope="col"></th>
                <th scope="col">Doc ID</th>
                <th scope="col">EDINET Code</th>
                <th scope="col">Sec Code</th>
                <th scope="col">Corp Number</th>
                <th scope="col">Filer Name</th>
                <th scope="col">Period Start</th>
                <th scope="col">Period End</th>
                <th scope="col">Submit Date Time</th>
                <th scope="col">Doc Description</th>
                <th scope="col">XBRL Flag</th>
            </tr>
            </thead>
            <tbody>
            {% for report_document in object_list %}
                <tr>
                    <td>{% if request.GET.reserved != 'yes' %}
                        <input type="checkbox" value="{{ report_document.id }}" class="report-checkbox">{% endif %}
                    </td>
                    <td>{{ report_document.doc_id }}</td>
                    <td>{{ report_document.company.edinet_code }}</td>
                    <td>{{ report_document.company.securities_code }}</td>
                    <td>{{ report_document.company.corporate_number }}</td>
                    <td>{{ report_document.company.submitter_name }}</td>
                    <td>{{ report_document.period_start }}</td>
                    <td>{{ report_document.period_end }}</td>
                    <td>{{ report_document.submit_date_time }}</td>
                    <td>{{ report_document.doc_description }}</td>
                    <td>{{ report_document.xbrl_flag }}</td>
                </tr>
            {% empty %}
                <tr>
                    <td colspan="11">No documents available.</td>
                </tr>
            {% endfor %}
            </tbody>
        </table>
        <nav aria-label="Page navigation example">
            <ul class="pagination">
                {% if page_obj.has_previous %}
                    <li class="page-item"><a class="page-link" href="?page=1{% if request.GET.reserved == 'yes' %}&amp;reserved=yes{% endif %}">First</a></li>
                    <li class="page-item"><a class="page-link"
                                             href="?page={{ page_obj.previous_page_number }}{% if request.GET.reserved == 'yes' %}&amp;reserved=yes{% endif %}">Previous</a></li>
                {% else %}
                    <li class="page-item disabled"><a class="page-link" href="#">First</a></li>
                    <li class="page-item disabled"><a class="page-link" href="#">Previous</a></li>
                {% endif %}
                <li class="page-item active"><a class="page-link" href="#">{{ page_obj.number }}</a></li>
                {% if page_obj.has_next %}
                    <li class="page-item"><a class="page-link" href="?page={{ page_obj.next_page_number }}{% if request.GET.reserved == 'yes' %}&amp;reserved=yes{% endif %}">Next</a>
                    </li>
                    <li class="page-item"><a class="page-link" href="?page={{ page_obj.paginator.num_pages }}{% if request.GET.reserved == 'yes' %}&amp;reserved=yes{% endif %}">Last</a>
                    </li>
                {% else %}
                    <li class="page-item disabled"><a class="page-link" href="#">Next</a></li>
                    <li class="page-item disabled"><a class="page-link" href="#">Last</a></li>
                {% endif %}
            </ul>
        </nav>
    </div>
    <script>
        document.querySelector('#submit-for-reserve').addEventListener('click', function () {
            const ids = Array.from(document.querySelectorAll('.report-checkbox:checked')).map(box => box.value);

            fetch('{% url 'securities:download_reserve' %}', {
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                    'X-CSRFToken': '{{ csrf_token }}',
                },
                body: JSON.stringify(ids)
            }).then(function (response) {
                if (response.ok) {
                    return response.json();
                }
                throw new Error('Network response was not ok');
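            }).then(function (json) {
                if (json.status === "success") {
                    location.reload();
                }
                console.log(json);
            }).catch(function (error) {
                // e.g. a 400 response when an ID no longer exists
                console.error(error);
            });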
        });
    </script>
{% endblock %}
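Pagination links that are written as a bare `?page=N` replace the whole query string, so any other filter in the URL (here `reserved=yes`) is lost when paging. This minimal sketch shows the general idea of rebuilding the query string while keeping the other parameters. `page_href` is a hypothetical helper. In a Django view, `request.GET.copy()` returns a mutable `QueryDict` whose `urlencode()` method can do the same job.

```python
from urllib.parse import urlencode


def page_href(page: int, query: dict[str, str]) -> str:
    """Build "?...&page=N" while keeping other filters such as reserved=yes."""
    # Drop any existing page value, then append the new one
    params = {k: v for k, v in query.items() if k != "page"}
    params["page"] = str(page)
    return "?" + urlencode(params)
```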

securities/urls.py

securities/urls.py (additions)
    EdinetCodeUploadView,
    EdinetCodeUploadSuccessView,
+   DownloadReserveView,
)

app_name = "securities"
urlpatterns = [
    path("", IndexView.as_view(), name="index"),
+   path("download_reserve/", DownloadReserveView.as_view(), name="download_reserve"),
    path(
        "edinet_code_upload/upload",
        EdinetCodeUploadView.as_view(),

securities/views.py

  • Split what used to be a single combined step (fetch the doc_id list, then download the zips) into separate steps
securities/views.py (replace entirely)
import json
from datetime import datetime

from dateutil.relativedelta import relativedelta
from django.core.exceptions import ObjectDoesNotExist
from django.http import JsonResponse
from django.shortcuts import redirect
from django.urls import reverse_lazy
from django.utils.decorators import method_decorator
from django.utils.timezone import now
from django.views import View
from django.views.decorators.csrf import ensure_csrf_cookie
from django.views.generic import TemplateView, FormView, ListView

from securities.domain.service.upload import UploadService
from securities.domain.service.xbrl import XbrlService
from securities.domain.valueobject.edinet import RequestData
from securities.forms import UploadForm
from securities.models import ReportDocument, Company


class IndexView(ListView):
    template_name = "securities/report/index.html"
    model = ReportDocument
    paginate_by = 10

    def get_queryset(self):
        queryset = super().get_queryset()
        if self.request.GET.get("reserved") == "yes":
            queryset = queryset.filter(download_reserved=True)
        else:
            queryset = queryset.filter(download_reserved=False)
        queryset = queryset.order_by("doc_id")
        return queryset

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        current_time = now()
        context["start_date"] = current_time - relativedelta(months=2)  # 2 month ago
        context["end_date"] = current_time - relativedelta(days=1)  # yesterday

        return context

    @staticmethod
    def post(request, **kwargs):
        if not Company.objects.exists():
            return redirect("securities:index")

        ReportDocument.objects.all().delete()

        start_date_str = request.POST.get("start_date")
        end_date_str = request.POST.get("end_date")
        start_date = datetime.strptime(start_date_str, "%Y-%m-%d").date()
        end_date = datetime.strptime(end_date_str, "%Y-%m-%d").date()
        service = XbrlService()
        report_document_list = service.fetch_report_doc_list(
            RequestData(start_date=start_date, end_date=end_date)
        )
        ReportDocument.objects.bulk_create(report_document_list)
        return redirect("securities:index")


@method_decorator(ensure_csrf_cookie, name="dispatch")
class DownloadReserveView(View):
    @staticmethod
    def post(request):
        ids = json.loads(request.body)
        for identifier in ids:
            try:
                doc = ReportDocument.objects.get(pk=identifier)
                doc.download_reserved = True
                doc.save()
            except ObjectDoesNotExist:
                return JsonResponse(
                    {"error": f"No ReportDocument exists with ID {identifier}"},
                    status=400,
                )
        return JsonResponse({"status": "success"})


class EdinetCodeUploadView(FormView):
    template_name = "securities/edinet_code_upload/form.html"
    form_class = UploadForm
    success_url = reverse_lazy("securities:edinet_code_upload_success")

    def form_valid(self, form):
        service = UploadService(self.request)
        service.upload()
        return super().form_valid(form)


class EdinetCodeUploadSuccessView(TemplateView):
    template_name = "securities/edinet_code_upload/success.html"

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        return context
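`DownloadReserveView` saves one document at a time, so when an unknown ID appears partway through the list, the documents before it are already flagged by the time the 400 response is returned. One way to avoid a partial update is to validate the whole payload first and then flag everything with a single `ReportDocument.objects.filter(pk__in=ids).update(download_reserved=True)`, the same `filter(...).update(...)` pattern the download command uses to reset the flag. The pure-Python part of that check is sketched below. `parse_reserve_payload` and `missing_ids` are hypothetical names. In the view, `existing` would come from something like `set(ReportDocument.objects.filter(pk__in=ids).values_list("pk", flat=True))`.

```python
import json


def parse_reserve_payload(body: bytes) -> list[int]:
    """Decode the JSON array posted by the "ダウンロード予約する" button."""
    data = json.loads(body)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of ids")
    # Checkbox values arrive as strings such as "12", so coerce them to int
    return [int(x) for x in data]


def missing_ids(requested: list[int], existing: set[int]) -> list[int]:
    """IDs the client sent that do not exist in the database (sorted, unique)."""
    return sorted(set(requested) - existing)
```

If `missing_ids(...)` is non-empty, return the 400 before touching the database; otherwise run the single `update()`.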
        

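TODO: Running Arelle on the Ubuntu server makes it freeze; I have asked about this and am waiting for an answer.
(The cron job on the Ubuntu server is currently stopped.)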

# crontab -e

25 18 * * * /var/www/html/venv/bin/python /var/www/html/portfolio/manage.py daily_download_edinet
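The crontab fields are minute, hour, day of month, month and day of week, so `25 18 * * *` runs `daily_download_edinet` every day at 18:25 in the server's local time zone. If the server clock is UTC, that is 03:25 JST the next morning. The sketch below computes the next firing time for such a daily line. `next_daily_run` is a hypothetical helper and ignores DST transitions.

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int = 18, minute: int = 25) -> datetime:
    """Next time a "<minute> <hour> * * *" crontab line fires after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's slot has already passed (or is right now), use tomorrow's
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```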

Refactoring the visualization part
