Introduction
Most of my IT career has been in the financial industry. The thing that always felt like a problem was having no deliverables of my own: you sign an NDA as a matter of course, so you can't take anything home, and you can't say "this is my code." And writing programs after you get home is rough. Even so, I recently decided I had to get something moving. I kept going around in circles about what to build without coming up with anything, so I settled on starting from what I could actually do: working with publicly available corporate financial data. While digging around I found an article on fetching securities reports (有価証券報告書) and thought, "oh, I've handled those at work" — except it was written in R, and on closer inspection there was a banner saying something like "this should not work as a straight copy-paste; figure out what to change yourself." I yelled "chikkishō!" like Komei Tayu, and it got me fired up (in the sense of a good shot in the arm).
I've picked up a lot of skill since I first wrote this, so I'm finishing it up as an API version (DDD to the point that the original shape is gone). I'm grafting it onto my own portfolio as I go, so it could be a lot slimmer if you kept only the securities-report feature; I'll leave that part for you readers to work out.
References
- RでXBRLデータを取得してみた
- 投資するための財務分析step1「財務情報XBRLを取得する」
- ElementTreeやlxmlで名前空間を含むXMLの要素を取得する
- @XBRLJapan さんのページ
I almost bought this book
Normally I'd have bought it on the spot, but here too I got ideas: if reading the book would just end up reproducing the source sitting in front of me, then "translating" that source is the more rational move, both for experience and for my wallet. The ability to read R and the ability to convert it to Python — what I got out of it was bigger than I expected. Compared with writing a well-behaved program while following a textbook, translating a program that someone wrote and (seemingly) threw away gives experience with a lot more flavor.
Creating the app
portfolio> python manage.py startapp securities
INSTALLED_APPS = [
:
+ "securities",
]
urlpatterns = [
:
+ path("securities/", include("securities.urls")),
path("admin/", admin.site.urls),
path("accounts/", include("django.contrib.auth.urls")),
] + static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)
from django.urls import path
from securities.views import IndexView
app_name = "securities"
urlpatterns = [
path("", IndexView.as_view(), name="index"),
]
from django.shortcuts import render
from django.views.generic import TemplateView
from securities.domain.service.xbrl import XbrlService
class IndexView(TemplateView):
template_name = "securities/report/index.html"
def get(self, request, *args, **kwargs):
xbrl_service = XbrlService()
return render(request, self.template_name, xbrl_service.to_dict())
class XbrlService:
def to_dict(self, **kwargs):
pass
{% load static %}
<!DOCTYPE html>
<html lang="ja">
<head>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-43097095-9"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag('js', new Date());
gtag('config', 'UA-43097095-9');
</script>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>有価証券報告書ビューア</title>
<!-- bootstrap and css -->
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.2.1/css/bootstrap.min.css"
integrity="sha384-GJzZqFGwb1QTTN6wy59ffF1BuGJpLSa9DkKMp0DgiMDm4iYMj70gZWKYbI706tWS" crossorigin="anonymous">
<link rel="stylesheet" href="{% static 'securities/css/base.css' %}">
<!-- favicon -->
<link rel="shortcut icon" href="{% static 'securities/s_s.ico' %}">
<!-- for ajax -->
<script>let myurl = {"base": "{% url 'vnm:index' %}", "login": "{% url 'login' %}"};</script>
</head>
<body>
<h1></h1>
<header>
<nav class="navbar fixed-top navbar-expand-lg navbar-light bg-light">
<a class="navbar-brand" href="#">Henojiya</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent"
aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav mr-auto">
{% if user.is_authenticated %}
<li class="nav-link">{{ user.username }}さん</li>
{% else %}
<li class="nav-link">ゲストさん</li>
{% endif %}
{% if user.is_authenticated %}
<li class="nav-link"><a href="{% url 'logout' %}">LOGOUT</a></li>
{% else %}
<li class="nav-link"><a href="{% url 'login' %}">LOGIN</a></li>
{% endif %}
<select class="select2-1" onChange="location.href=value;">
<option></option>
<option value="{% url 'vnm:index' %}" selected>VIETNAM</option>
<option value="{% url 'mrk:index' %}">GMARKER</option>
<option value="{% url 'shp:index' %}">SHOPPING</option>
<option value="{% url 'war:index' %}">WAREHOUSE</option>
<option value="{% url 'txo:index' %}">TAXONOMY</option>
<option value="{% url 'soil:home' %}">SOIL ANALYSIS</option>
<option value="{% url 'securities:index' %}">SECURITIES REPORT</option>
</select>
</ul>
<form class="form-inline my-2 my-lg-0">
<input class="form-control mr-sm-2" type="search" placeholder="alt + / で検索" aria-label="Search"
accesskey="/">
<button class="btn btn-outline-success my-2 my-sm-0" type="submit">Search</button>
</form>
</div>
</nav>
{% block header %}{% endblock %}
</header>
<div id="main">
{% block content %}{% endblock %}
</div>
<footer>
<p>© 2019 henojiya. / <a href="https://github.com/duri0214" target="_blank">github portfolio</a></p>
</footer>
<!-- Optional JavaScript -->
<!-- jQuery first, then Popper.js, then Bootstrap JS -->
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"
integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo"
crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.6/umd/popper.min.js"
integrity="sha384-wHAiFfRlMFy6i5SRaxvfOCifBUQy1xHdJ/yoi7FRNXMRBu5WHdZYu1hA6ZOblgut"
crossorigin="anonymous"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.2.1/js/bootstrap.min.js"
integrity="sha384-B0UglyR+jN6CkvvICOB2joaf5I4l3gm9GU6Hc1og6Ls7i6U/mkkaduKaBhlAXv9k"
crossorigin="anonymous"></script>
<!-- for select2 -->
<link href="https://cdn.jsdelivr.net/npm/select2@4.1.0-rc.0/dist/css/select2.min.css" rel="stylesheet"/>
<script src="https://cdn.jsdelivr.net/npm/select2@4.1.0-rc.0/dist/js/select2.min.js"></script>
<script>
$(function () {
$('.select2-1').select2({
// コントロールのプレースホルダを指定します。
placeholder: 'Please Select',
});
});
</script>
<link rel="stylesheet" href="{% static 'securities/css/index_select2.css' %}">
</body>
</html>
{% extends "securities/base.html" %}
{% load static %}
{% load humanize %}
{% block content %}
<div class="jumbotron">
<h1 class="display-4">Let's analyze Securities Report!</h1>
<p class="lead">it's interesting Securities Report</p>
<hr class="my-4">
<p>You can read the Securities Report</p>
</div>
<div class="container">
<p>hello world</p>
</div>
{% endblock %}
body {
padding-top: 48px;
}
.footer {
position: sticky;
margin-top: 20px;
bottom: 0;
width: 100%;
/* Set the fixed height of the footer here */
height: 30px;
background-color: #f5f5f5;
}
body > .container {
padding: 60px 15px 0;
}
.footer > .container {
padding-right: 15px;
padding-left: 15px;
}
Creating Company (the company master)
Building the upload mechanism
Manually download the EDINET code list (EDINETコードリスト) from the link at the top right of the EDINET annual securities report search screen.
Then let's build a mechanism to upload it.
import zipfile
from pathlib import Path
from django.conf import settings
from django.core.files.uploadedfile import InMemoryUploadedFile
class ZipFileService:
@staticmethod
def handle_uploaded_zip(file: InMemoryUploadedFile, app_name: str) -> Path:
"""
アップロードされたファイルを一時フォルダ media/{app_name} に保存
Args:
file: requestから受け取ったファイル
app_name: アプリ名
"""
# 解凍場所の用意
upload_folder = Path(settings.MEDIA_ROOT) / app_name
upload_folder.mkdir(parents=True, exist_ok=True)
# ファイルを保存
destination_zip_path = upload_folder / "uploaded.zip"
with destination_zip_path.open("wb+") as z:
for chunk in file.chunks():
z.write(chunk)
# ファイルを解凍
with zipfile.ZipFile(destination_zip_path) as z:
for info in z.infolist():
info.filename = ZipFileService._convert_to_cp932(info.filename)
z.extract(info, path=str(upload_folder))
return upload_folder
@staticmethod
def _convert_to_cp932(folder_name: str) -> str:
"""
WindowsでZipファイルを作成すると、文字化けが起こるので対応
See Also: https://qiita.com/tohka383/items/b72970b295cbc4baf5ab
"""
return folder_name.encode("cp437").decode("cp932")
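A quick way to see why that cp437 → cp932 round-trip is needed: zipfile decodes non-UTF-8 member names as cp437, so a Shift_JIS (cp932) file name comes out garbled until you re-encode it. A minimal sketch (the sample file name is made up for illustration):
# Hypothetical example: bytes of a cp932 (Shift_JIS) file name inside a zip created on Windows
raw = "有価証券報告書.xbrl".encode("cp932")
garbled = raw.decode("cp437")  # zipfile decodes names without the UTF-8 flag as cp437
print(garbled)  # mojibake
print(garbled.encode("cp437").decode("cp932"))  # -> 有価証券報告書.xbrl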
import shutil
from pathlib import Path
from django.core.management import call_command
from lib.zipfileservice import ZipFileService
class UploadService:
def __init__(self, request):
self.request = request
self.app_name = request.resolver_match.app_name
def upload(self):
upload_folder = ZipFileService.handle_uploaded_zip(
self.request.FILES["file"], self.app_name
)
self.execute_command_and_cleanup(upload_folder)
@staticmethod
def execute_command_and_cleanup(upload_folder: Path):
if upload_folder.exists():
call_command("import_edinet_code", str(upload_folder))
shutil.rmtree(upload_folder)
from django.shortcuts import render
from django.urls import reverse_lazy
from django.views.generic import TemplateView, FormView
from securities.domain.service.upload import UploadService
from securities.forms import UploadForm
class IndexView(TemplateView):
template_name = "securities/report/index.html"
def get(self, request, *args, **kwargs):
# xbrl_service = XbrlService()
d = {} # xbrl_service.to_dict()
return render(request, self.template_name, d)
class EdinetCodeUploadView(FormView):
template_name = "securities/edinet_code_upload/form.html"
form_class = UploadForm
success_url = reverse_lazy("securities:edinet_code_upload_success")
def form_valid(self, form):
service = UploadService(self.request)
service.upload()
return super().form_valid(form)
class EdinetCodeUploadSuccessView(TemplateView):
template_name = "securities/edinet_code_upload/success.html"
def get_context_data(self, **kwargs):
context = super().get_context_data(**kwargs)
# context["import_errors"] = SoilHardnessMeasurementImportErrors.objects.all()
return context
from django.urls import path
from securities.views import (
IndexView,
EdinetCodeUploadView,
EdinetCodeUploadSuccessView,
)
app_name = "securities"
urlpatterns = [
path("", IndexView.as_view(), name="index"),
path(
"edinet_code_upload/upload",
EdinetCodeUploadView.as_view(),
name="edinet_code_upload",
),
path(
"edinet_code_upload/success",
EdinetCodeUploadSuccessView.as_view(),
name="edinet_code_upload_success",
),
]
from django import forms
from django.forms import ClearableFileInput
class UploadForm(forms.Form):
file = forms.FileField(widget=ClearableFileInput(attrs={"class": "form-control"}))
{% extends "securities/base.html" %}
{% load static %}
{% load humanize %}
{% block content %}
<div class="jumbotron">
<h1 class="display-4">Let's analyze Securities Report!</h1>
<p class="lead">it's interesting Securities Report</p>
<hr class="my-4">
<p>You can read the Securities Report</p>
</div>
<div class="container">
<p>hello world</p>
+ <a class="btn btn-outline-primary mb-3" href="{% url 'securities:edinet_code_upload' %}"
+ role="button">EDINETコードリスト取り込み</a>
</div>
{% endblock %}
{% extends "securities/base.html" %}
{% load static %}
{% block header %}
<nav style="--bs-breadcrumb-divider: '>';" aria-label="breadcrumb">
<ol class="breadcrumb">
<li class="breadcrumb-item"><a href="{% url 'securities:index' %}">Home</a></li>
<li class="breadcrumb-item active" aria-current="page">Upload EDINETコードリスト</li>
</ol>
</nav>
{% endblock %}
{% block content %}
<div class="container">
<h1>EDINETコードリストのアップロード</h1>
<p>EDINETコードリスト(Edinetcode_yyyymmdd.zip) を <a
href="https://disclosure2.edinet-fsa.go.jp/weee0010.aspx#TXT_TITLE_CODE"
target="_blank">ダウンロード</a>
してアップロードしてください</p>
<form method="post" enctype="multipart/form-data">
{% csrf_token %}
{{ form.as_p }}
<button class="btn btn-outline-primary mb-3" type="submit">Upload</button>
</form>
</div>
{% endblock %}
{% extends "securities/base.html" %}
{% load static %}
{% block header %}
<nav style="--bs-breadcrumb-divider: '>';" aria-label="breadcrumb">
<ol class="breadcrumb">
<li class="breadcrumb-item"><a href="{% url 'securities:index' %}">Home</a></li>
<li class="breadcrumb-item"><a href="{% url 'securities:edinet_code_upload' %}">Upload
EDINETコードリスト</a>
</li>
<li class="breadcrumb-item active" aria-current="page">Upload success</li>
</ol>
</nav>
{% endblock %}
{% block content %}
<div class="container">
<h1>Upload Successful</h1>
<p>アップロードが完了しました!</p>
</div>
{% endblock %}
Check
Processing the uploaded CSV
from django.db import models
class Company(models.Model):
edinet_code = models.CharField("EDINETコード", max_length=6, null=True)
type_of_submitter = models.CharField("提出者種別", max_length=30, null=True)
listing_status = models.CharField("上場区分", max_length=3, null=True)
consolidated_status = models.CharField("連結の有無", max_length=1, null=True)
capital = models.IntegerField("資本金", null=True)
end_fiscal_year = models.CharField("決算日", max_length=6, null=True)
submitter_name = models.CharField("提出者名", max_length=100, null=True)
submitter_name_en = models.CharField("提出者名(英字)", max_length=100, null=True)
submitter_name_kana = models.CharField(
"提出者名(ヨミ)", max_length=100, null=True
)
address = models.CharField("所在地", max_length=255, null=True)
submitter_industry = models.CharField("提出者業種", max_length=25, null=True)
securities_code = models.CharField("証券コード", max_length=5, null=True)
corporate_number = models.CharField("提出者法人番号", max_length=13, null=True)
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
python manage.py makemigrations securities
python manage.py migrate
from pathlib import Path
import pandas as pd
from django.core.management.base import BaseCommand
from securities.models import Company
def na(value):
return value if pd.notna(value) else None
class Command(BaseCommand):
help = "Import edinet code upload from CSV"
def add_arguments(self, parser):
parser.add_argument(
"folder_path", type=str, help="Folder path containing CSV file"
)
def handle(self, *args, **options):
folder_path = options["folder_path"]
filename = "EdinetcodeDlInfo.csv"
file_path = Path(folder_path) / filename
if not file_path.exists():
raise FileNotFoundError(f"File does not exist: {file_path}")
Company.objects.all().delete()
# Note: 最初の行には `ダウンロード実行日...` のようなメタデータが入っているのでskip
df = pd.read_csv(
file_path,
skiprows=1,
encoding="cp932",
dtype={
"連結の有無": str,
"決算日": str,
"証券コード": str,
"提出者法人番号": str,
},
)
# 3行目以降のデータを保存
edinet_list = []
for _, row in df.iterrows():
edinet_list.append(
Company(
edinet_code=na(row["EDINETコード"]),
type_of_submitter=na(row["提出者種別"]),
listing_status=na(row["上場区分"]),
consolidated_status=na(row["連結の有無"]),
capital=(int(row["資本金"]) if pd.notna(row["資本金"]) else None),
end_fiscal_year=na(row["決算日"]),
submitter_name=na(row["提出者名"]),
submitter_name_en=na(row["提出者名(英字)"]),
submitter_name_kana=na(row["提出者名(ヨミ)"]),
address=na(row["所在地"]),
submitter_industry=na(row["提出者業種"]),
securities_code=na(row["証券コード"]),
corporate_number=na(row["提出者法人番号"]),
)
)
Company.objects.bulk_create(edinet_list)
self.stdout.write(
self.style.SUCCESS("Successfully imported all edinet code from CSV")
)
Downloading the XBRL
- EDINET API (Version 2) has been available since April 1, 2024
- It added things like CSV output, which is convenient, but usage is almost the same as Version 1
- Registration (authentication) is required
- start_date can be any date going back up to 5 years before the day the program runs
- "type": 2 is specified in params so that the list of submitted documents (which includes securities reports) is returned
- The submitted documents are listed under results in the response, so we loop over results
- For each entry in results, read ordinanceCode (ordinance code) and formCode (form code)
- Only documents with ordinanceCode "010" and formCode "030000", i.e. securities reports, are processed
- The value object for the securities report is named securities_report
Getting an API key
- Add https://api.edinet-fsa.go.jp to Chrome's pop-up allow list
- Go to the registration page
- Click "Sign up now"
- Enter your e-mail address
- Enter the verification code
- Set a password (generated with Chrome's password generator)
- Enter a phone number for multi-factor authentication
- Enter the verification code
- A pop-up opens in a new window
- Save your affiliation, name, and phone number (no hyphens) as contact info
- Copy the API key into .env
- Close the pop-up
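To check that the key you put in .env is actually visible to the code, here is a minimal sketch. It assumes the variable is named EDINET_API_KEY (the name used in the service code later) and that python-dotenv is installed; if your Django settings already load .env, the load_dotenv() call is unnecessary.
import os

from dotenv import load_dotenv  # assumption: pip install python-dotenv

load_dotenv()  # read .env into environment variables
api_key = os.environ.get("EDINET_API_KEY")
if not api_key:
    raise RuntimeError("EDINET_API_KEY is not set; check your .env")
print(f"key loaded (length={len(api_key)})")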
API endpoint
https://api.edinet-fsa.go.jp/api/v2/documents.json
Example: https://api.edinet-fsa.go.jp/api/v2/documents.json?date=2023-04-01&type=2&Subscription-Key=ZZZ…ZZZ
Request parameters
Document list API
パラメータ名 | 項目名 | 必須 | 設定値 | 説明 |
---|---|---|---|---|
date | ファイル日付 | ○ | YYYY-MM-DD | 出力対象とする提出書類一覧のファイル日付を指定します(10年間 < x < 本日) |
type | 取得情報 | - | 1 or 2 | 1: メタデータのみ, 2: 提出書類一覧及びメタデータ |
SubscriptionKey | APIキー | ○ | API キー | EDINET API の認証に利用します |
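As a minimal sketch of calling the document list API with the parameters above (the same request the service code below makes; EDINET_API_KEY is assumed to be set in the environment):
import os

import requests

res = requests.get(
    "https://api.edinet-fsa.go.jp/api/v2/documents.json",
    params={
        "date": "2023-11-01",  # file date (YYYY-MM-DD)
        "type": 2,  # 2: document list and metadata
        "Subscription-Key": os.environ["EDINET_API_KEY"],
    },
)
res.raise_for_status()
body = res.json()
print(body["metadata"]["resultset"]["count"])  # number of documents for that date
for doc in body.get("results", [])[:5]:
    print(doc["docID"], doc["ordinanceCode"], doc["formCode"], doc["filerName"])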
Document fetch API
パラメータ名 | 項目名 | 必須 | 設定値 | 説明 |
---|---|---|---|---|
type | 必要書類 | ○ | 1 or 2 or 3 or 4 or 5 | 1: 提出本文書、監査報告書およびxbrl, 2: PDF, 3: 代替書面・添付文書, 4: 英文ファイル, 5: CSV |
SubscriptionKey | APIキー | ○ | API キー | EDINET API の認証に利用します |
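And a matching sketch for the document fetch API, streaming the type=1 ZIP (main documents, audit report, and XBRL) to disk; the doc_id here is a placeholder you would take from the document list API:
import os

import requests

doc_id = "S100XXXX"  # placeholder docID
res = requests.get(
    f"https://api.edinet-fsa.go.jp/api/v2/documents/{doc_id}",
    params={"type": 1, "Subscription-Key": os.environ["EDINET_API_KEY"]},
    stream=True,
)
res.raise_for_status()
with open(f"{doc_id}.zip", "wb") as f:
    for chunk in res.iter_content(chunk_size=1024):
        f.write(chunk)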
Interface specification
Bringing the tables from the manual PDF into Excel wrecks the layout, so I had to clean them up by hand and turn them into markdown. Quietly painful, lol.
№ | 項目名 | 項目ID | 型 | 文字種(桁数) | 説明 |
---|---|---|---|---|---|
1 | メタデータ | metadata | object | -(-) | メタデータの識別子です。 | ||
2 | タイトル | title | string | 全半角(18) | APIの名称が出力されます。 | ||
3 | パラメータ | parameter | object | -(-) | リクエストパラメータの識別子です。 | ||
4 | ファイル日付 | date | string | 半角(10) | 指定したファイル日付が出力されます。YYYY-MM-DD形式 | ||
5 | 取得情報 | type | string | 半角(1) | 指定した取得情報が出力されます。 | ||
6 | 結果セット | resultset | object | -(-) | 結果セットの識別子です。 | ||
7 | 件数 | count | number | 半角(5以下) | 指定したファイル日付における提出書類一覧の件数が出力されます。 | ||
8 | 書類一覧更新日時 | processDateTime | string | 半角(16) | 提出書類一覧の内容に変更がない場合でも書類一覧更新日時は更新されます。YYYY-MM-DD hh:mm 形式 | ||
9 | ステータス | status | string | 半角(3) | 「3-3 ステータスコード」に記載のステータスが出力されます(リクエスト成功時は「200」)。 | ||
10 | メッセージ | message | string | 半角(21以下) | 「3-3 ステータスコード」に記載のメッセージが出力されます(リクエスト成功時は「OK」)。 | ||
11 | 提出書類一覧 | results | array | -(-) | 提出書類一覧の識別子です。 | ||
- | 提出書類(繰り返し) | - | object | -(-) | - | ||
12 | 連番 | seqNumber | number | 半角(5以下) | ファイル日付ごとの連番です。詳細は「注意の連番について」を参照してください。 | ||
13 | 書類管理番号(*1) | docID | string | 半角(8) | 書類管理番号が出力されます。 | ||
14 | 提出者 EDINET コード(*1)(*2) | edinetCode | string | 半角(6) | 提出者のEDINETコードが出力されます。 | ||
15 | 提出者証券コード(*2) | secCode | string | 半角(5) | 提出者の証券コードが出力されます。 | ||
16 | 提出者法人番号(*2) | JCN | string | 半角(13) | 提出者の法人番号が出力されます。 | ||
17 | 提出者名(*2) | filerName | string | 全角(128以下) | 提出者の名前が出力されます。 | ||
18 | ファンドコード(*1) | fundCode | string | 半角(6) | ファンドコードが出力されます。 | ||
19 | 府令コード(*1) | ordinanceCode | string | 半角(3) | 府令コードが出力されます。 | ||
20 | 様式コード(*1) | formCode | string | 半角(6) | 様式コードが出力されます。 | ||
21 | 書類種別コード(*1) | docTypeCode | string | 半角(3) | 書類種別コードが出力されます。 | ||
22 | 期間(自)(*3) | periodStart | string | 半角(10) | 期間(自)が出力されます。(YYYY-MM-DD形式) | ||
23 | 期間(至)(*3) | periodEnd | string | 半角(10) | 期間(至)が出力されます。(YYYY-MM-DD形式) | ||
24 | 提出日時 | submitDateTime | string | 半角(16) | 提出日時が出力されます。(YYYY-MM-DD hh:mm形式) | ||
25 | 提出書類概要 | docDescription | string | 全半角(147以下) | EDINETの閲覧サイトの書類検索結果画面において、「提出書類」欄に表示される文字列が出力されます。 | ||
26 | 発行会社EDINETコード(*1)(*2) | issuerEdinetCode | string | 半角(6) | 大量保有について発行会社のEDINETコードが出力されます。 | ||
27 | 対象EDINETコード(*1)(*2) | subjectEdinetCode | string | 半角(6) | 公開買付けについて対象となるEDINETコードが出力されます。 | ||
28 | 子会社EDINETコード(*1)(*2) | subsidiaryEdinetCode | string | 半角(69以下) | 子会社のEDINETコードが出力されます。複数存在する場合(最大10個)、","(カンマ)で結合した文字列が出力されます。 | ||
29 | 臨報提出事由(*4) | currentReportReason | string | 全半角(1000以下) | 臨時報告書の提出事由が出力されます。複数存在する場合、","(カンマ)で結合した文字列が出力されます。 | ||
30 | 親書類管理番号(*1) | parentDocID | string | 半角(8) | 親書類管理番号が出力されます。 | ||
31 | 操作日時 | opeDateTime | string | 半角(16) | 「3-1-6 財務局職員による書類情報修正」、「3-1-7 財務局職員による書類の不開示」、磁気ディスク提出及び紙面提出を行った日時が出力されます。(YYYY-MM-DD hh:mm形式) | ||
32 | 取下区分 | withdrawalStatus | string | 半角(1) | 取下書は"1"、取り下げられた書類は"2"、それ以外は"0"が出力されます。3-1-5 書類の取下げ | ||
33 | 書類情報修正区分 | docInfoEditStatus | string | 半角(1) | 財務局職員が書類を修正した情報は"1"、修正された書類は"2"、それ以外は"0"が出力されます。3-1-6 財務局職員による書類情報修正 | ||
34 | 開示不開示区分 | disclosureStatus | string | 半角(1) | 財務局職員によって書類の不開示を開始した情報は"1"、不開示とされている書類は"2"、財務局職員によって書類の不開示を解除した情報は3、それ以外は"0"が出力されます。3-1-7 財務局職員による書類の不開示 | ||
35 | XBRL有無フラグ | xbrlFlag | string | 半角(1) | 書類にXBRLがある場合は"1"、それ以外は"0"が出力されます。 | ||
36 | PDF有無フラグ(*5) | pdfFlag | string | 半角(1) | 書類にPDFがある場合は"1"、それ以外は"0"が出力されます。 | ||
37 | 代替書面・添付文書有無フラグ | attachDocFlag | string | 半角(1) | 書類に代替書面・添付文書がある場合は"1"、それ以外は"0"が出力されます。 | ||
38 | 英文ファイル有無フラグ | englishDocFlag | string | 半角(1) | 書類に英文ファイルがある場合は1、それ以外は"0"が出力されます。 | ||
39 | CSV有無フラグ | csvFlag | string | 半角(1) | 書類にCSVファイルがある場合は1、それ以外は"0"が出力されます。 | ||
40 | 縦覧区分 | legalStatus | string | 半角(1) | 1:縦覧中, 2:延長期間中(法定縦覧期間満了書類だが引き続き閲覧可能。), 0:閲覧期間満了(縦覧期間満了かつ延長期間なし、延長期間満了又は取下げにより閲覧できないもの。なお、不開示は含まない。)1-2-2 EDINET API で取得対象となるデータの範囲 |
Download process
+ import datetime
+ import logging
+ import os
+ import requests
:
+ from securities.domain.valueobject.edinet import CountingData, RequestData, ResponseData
+ SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT = 1
:
class XbrlService:
:
+ # ここから下を追加
@staticmethod
def _make_doc_id_list(request_data: RequestData) -> list[str]:
def _process_results_data(results: list) -> list[str]:
"""
有価証券報告書: ordinanceCode == "010" and formCode =="030000"
訂正有価証券報告書: ordinanceCode == "010" and formCode =="030001"
"""
doc_id_list = []
for result in results:
if result.ordinance_code == "010" and result.form_code == "030000":
doc_id_list.append(result.doc_id)
return doc_id_list
securities_report_doc_list = []
for day in request_data.day_list:
url = "https://api.edinet-fsa.go.jp/api/v2/documents.json"
params = {
"date": day,
"type": request_data.SECURITIES_REPORT_AND_META_DATA,
"Subscription-Key": os.environ.get("EDINET_API_KEY"),
}
res = requests.get(url, params=params)
res.raise_for_status()
response_data = ResponseData(res.json())
securities_report_doc_list.extend(
_process_results_data(response_data.results)
)
return securities_report_doc_list
def _download_xbrl_in_zip(self, securities_report_doc_list):
"""
params.type:
1: 提出本文書、監査報告書およびxbrl
2: PDF
3: 代替書面・添付文書
4: 英文ファイル
5: CSV
"""
denominator = len(securities_report_doc_list)
for i, doc_id in enumerate(securities_report_doc_list):
logging.info(f"{doc_id}: {i + 1}/{denominator}")
url = f"https://api.edinet-fsa.go.jp/api/v2/documents/{doc_id}"
params = {
"type": SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT,
"Subscription-Key": os.environ.get("EDINET_API_KEY"),
}
filename = self.work_dir / f"{doc_id}.zip"
res = requests.get(url, params=params, stream=True)
if res.status_code == 200:
with open(filename, "wb") as file:
for chunk in res.iter_content(chunk_size=1024):
file.write(chunk)
def download_xbrl(self):
"""
Notes: 有価証券報告書の提出期限は原則として決算日から3ヵ月以内(3月末決算の企業であれば、同年6月中)
"""
request_data = RequestData(
start_date=datetime.date(2023, 11, 1),
end_date=datetime.date(2023, 11, 9),
)
securities_report_doc_list = list(set(self._make_doc_id_list(request_data)))
logging.info(f"number of lists:{len(securities_report_doc_list)}")
logging.info(f"securities report doc list:{securities_report_doc_list}")
self._download_xbrl_in_zip(securities_report_doc_list)
print("download finish")
import datetime
from dataclasses import dataclass
@dataclass
class RequestData:
SECURITIES_REPORT_AND_META_DATA = 2
start_date: datetime.date
end_date: datetime.date
def __post_init__(self):
if self.start_date > datetime.date.today():
raise ValueError("start_date is in the future")
if self.end_date > datetime.date.today():
raise ValueError("end_date is in the future")
if self.start_date > self.end_date:
raise ValueError("start_date is later than end_date")
self.doc_type = self.SECURITIES_REPORT_AND_META_DATA
# Calculate day_list
period = self.end_date - self.start_date
self.day_list = []
for d in range(int(period.days)):
day = self.start_date + datetime.timedelta(days=d)
self.day_list.append(day)
self.day_list.append(self.end_date)
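A quick check of how day_list is built (both endpoints are included), assuming the RequestData above is importable from the value object module:
import datetime

from securities.domain.valueobject.edinet import RequestData

req = RequestData(start_date=datetime.date(2023, 11, 1), end_date=datetime.date(2023, 11, 3))
print(req.day_list)
# [datetime.date(2023, 11, 1), datetime.date(2023, 11, 2), datetime.date(2023, 11, 3)]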
Processing the XBRL and writing it out to CSV
A note on installing arelle
The official arelle project is published on PyPI.
You can also install from arelle's GitHub, but make sure to use the official package.
https://pypi.org/project/arelle-release/
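A minimal sketch of installing arelle and loading a single file; the load call is the same one make_counting_data uses later, and the path is a placeholder:
# pip install arelle-release
from arelle import Cntlr

ctrl = Cntlr.Cntlr()
model_xbrl = ctrl.modelManager.load("/path/to/XBRL/PublicDoc/sample.xbrl")  # placeholder path
for fact in list(model_xbrl.facts)[:10]:
    print(fact.concept.qname.localName, fact.value)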
Taxonomy element list
For the key names you pass to target_keys, download the "タクソノミ要素リスト" (taxonomy element list) and pick the figures you want to look at.
target_keys = {
"EDINETCodeDEI": "edinet_code",
"FilerNameInJapaneseDEI": "filer_name_jp",
"AverageAnnualSalaryInformationAboutReportingCompanyInformationAboutEmployees": "avg_salary",
"AverageLengthOfServiceYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_years",
"AverageLengthOfServiceMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_months",
"AverageAgeYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_years",
"AverageAgeMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_months",
"NumberOfEmployees": "number_of_employees",
}
valueobject
import datetime
from dataclasses import dataclass
@dataclass
class RequestData:
SECURITIES_REPORT_AND_META_DATA = 2
start_date: datetime.date
end_date: datetime.date
def __post_init__(self):
if self.start_date > datetime.date.today():
raise ValueError("start_date is in the future")
if self.end_date > datetime.date.today():
raise ValueError("end_date is in the future")
if self.start_date > self.end_date:
raise ValueError("start_date is later than end_date")
self.doc_type = self.SECURITIES_REPORT_AND_META_DATA
# Calculate day_list
period = self.end_date - self.start_date
self.day_list = []
for d in range(int(period.days)):
day = self.start_date + datetime.timedelta(days=d)
self.day_list.append(day)
self.day_list.append(self.end_date)
class ResponseData:
class _Metadata:
class _Parameter:
def __init__(self, data: dict) -> None:
self.date = data.get("date")
self.type = data.get("type")
class _ResultSet:
def __init__(self, data: dict) -> None:
self.count = data.get("count")
def __init__(self, data: dict) -> None:
self.title = data.get("title")
self.parameter = self._Parameter(data.get("parameter"))
self.result_set = self._ResultSet(data.get("resultset"))
self.process_date_time = data.get("processDateTime")
self.status = data.get("status")
self.message = data.get("message")
class _Result:
def __init__(self, data):
self.seq_number = data.get("seqNumber")
self.doc_id = data.get("docID")
self.edinet_code = data.get("edinetCode")
self.sec_code = data.get("secCode")
self.jcn = data.get("JCN")
self.filer_name = data.get("filerName")
self.fund_code = data.get("fundCode")
self.ordinance_code = data.get("ordinanceCode")
self.form_code = data.get("formCode")
self.doc_type_code = data.get("docTypeCode")
self.period_start = data.get("periodStart")
self.period_end = data.get("periodEnd")
self.submit_date_time = data.get("submitDateTime")
self.doc_description = data.get("docDescription")
self.issuer_edinet_code = data.get("issuerEdinetCode")
self.subject_edinet_code = data.get("subjectEdinetCode")
self.subsidiary_edinet_code = data.get("subsidiaryEdinetCode")
self.current_report_reason = data.get("currentReportReason")
self.parent_doc_id = data.get("parentDocID")
self.ope_date_time = data.get("opeDateTime")
self.withdrawal_status = data.get("withdrawalStatus")
self.doc_info_edit_status = data.get("docInfoEditStatus")
self.disclosure_status = data.get("disclosureStatus")
self.xbrl_flag = data.get("xbrlFlag")
self.pdf_flag = data.get("pdfFlag")
self.attach_doc_flag = data.get("attachDocFlag")
self.english_doc_flag = data.get("englishDocFlag")
self.csv_flag = data.get("csvFlag")
self.legal_status = data.get("legalStatus")
def __init__(self, data):
self.metadata = self._Metadata(data.get("metadata"))
self.results = [self._Result(item) for item in data.get("results", [])]
@dataclass
class CountingData:
edinet_code: str | None = None
filer_name_jp: str | None = None
industry_name: str | None = None
avg_salary: str | None = None
avg_tenure_years: str | None = None
avg_tenure_months: str | None = None
avg_age_years: str | None = None
avg_age_months: str | None = None
number_of_employees: str | None = None
@property
def avg_tenure_years_combined(self) -> str | None:
if self.avg_tenure_months:
avg_tenure_decimal = round(int(self.avg_tenure_months) / 12, 1)
avg_tenure = int(self.avg_tenure_years) + avg_tenure_decimal
return str(avg_tenure)
return self.avg_tenure_years
@property
def avg_age_years_combined(self) -> str | None:
if self.avg_age_months:
age_years_decimal = round(int(self.avg_age_months) / 12, 1)
age_years = int(self.avg_age_years) + age_years_decimal
return str(age_years)
return self.avg_age_years
def to_list(self) -> list[str | None]:
return [
self.edinet_code,
self.filer_name_jp,
self.industry_name,
self.avg_salary,
self.avg_tenure_years_combined,
self.avg_age_years_combined,
self.number_of_employees,
]
repository
from securities.models import Company
class EdinetRepository:
@staticmethod
def get_industry_name(edinet_code: str) -> str | None:
try:
company = Company.objects.get(edinet_code=edinet_code)
return company.submitter_industry
except Company.DoesNotExist:
return None
service
import datetime
import logging
import os
import shutil
import zipfile
from pathlib import Path
import pandas as pd
import requests
from arelle import Cntlr
from securities.domain.repository.edinet import EdinetRepository
from securities.domain.valueobject.edinet import CountingData, RequestData, ResponseData
SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT = 1
class XbrlService:
def __init__(self, work_dir: Path):
self.work_dir = work_dir
self.temp_dir = self.work_dir / "temp"
self.repository = EdinetRepository()
@staticmethod
def _make_doc_id_list(request_data: RequestData) -> list[str]:
def _process_results_data(results: list) -> list[str]:
"""
有価証券報告書: ordinanceCode == "010" and formCode =="030000"
訂正有価証券報告書: ordinanceCode == "010" and formCode =="030001"
"""
doc_id_list = []
for result in results:
if result.ordinance_code == "010" and result.form_code == "030000":
doc_id_list.append(result.doc_id)
return doc_id_list
securities_report_doc_list = []
for day in request_data.day_list:
url = "https://api.edinet-fsa.go.jp/api/v2/documents.json"
params = {
"date": day,
"type": request_data.SECURITIES_REPORT_AND_META_DATA,
"Subscription-Key": os.environ.get("EDINET_API_KEY"),
}
res = requests.get(url, params=params)
res.raise_for_status()
response_data = ResponseData(res.json())
securities_report_doc_list.extend(
_process_results_data(response_data.results)
)
return securities_report_doc_list
def _download_xbrl_in_zip(self, securities_report_doc_list):
"""
params.type:
1: 提出本文書、監査報告書およびxbrl
2: PDF
3: 代替書面・添付文書
4: 英文ファイル
5: CSV
"""
denominator = len(securities_report_doc_list)
for i, doc_id in enumerate(securities_report_doc_list):
logging.info(f"{doc_id}: {i + 1}/{denominator}")
url = f"https://api.edinet-fsa.go.jp/api/v2/documents/{doc_id}"
params = {
"type": SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT,
"Subscription-Key": os.environ.get("EDINET_API_KEY"),
}
filename = self.work_dir / f"{doc_id}.zip"
res = requests.get(url, params=params, stream=True)
if res.status_code == 200:
with open(filename, "wb") as file:
for chunk in res.iter_content(chunk_size=1024):
file.write(chunk)
def download_xbrl(self):
"""
Notes: 有価証券報告書の提出期限は原則として決算日から3ヵ月以内(3月末決算の企業であれば、同年6月中)
"""
request_data = RequestData(
start_date=datetime.date(2023, 11, 1),
end_date=datetime.date(2023, 11, 9),
)
securities_report_doc_list = list(set(self._make_doc_id_list(request_data)))
logging.info(f"number of lists:{len(securities_report_doc_list)}")
logging.info(f"securities report doc list:{securities_report_doc_list}")
self._download_xbrl_in_zip(securities_report_doc_list)
logging.info("download finish")
def _unzip_files_and_extract_xbrl(self) -> list[str]:
"""
指定されたディレクトリ内のzipファイルを解凍し、指定したパターンに一致するXBRLファイルのリストを返します。
xbrlファイルは各zipファイルに1つ、存在するようだ
使用例:
>> obj = XbrlService()
>> result = obj.unzip_files_and_extract_xbrl('/path/to/zip/directory', '*.xbrl')
>> print(result)
['/path/to/extracted/file1.xbrl', '/path/to/extracted/file2.xbrl']
"""
zip_files = list(self.work_dir.glob("*.zip"))
logging.info(f"number of zip files: {len(zip_files)}")
for zip_file in zip_files:
with zipfile.ZipFile(str(zip_file), "r") as zipf:
zipf.extractall(str(self.temp_dir))
xbrl_files = list(self.work_dir.glob("**/XBRL/PublicDoc/*.xbrl"))
return [str(path) for path in xbrl_files]
def _assign_attributes(self, counting_data: CountingData, facts):
target_keys = {
"EDINETCodeDEI": "edinet_code",
"FilerNameInJapaneseDEI": "filer_name_jp",
"AverageAnnualSalaryInformationAboutReportingCompanyInformationAboutEmployees": "avg_salary",
"AverageLengthOfServiceYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_years",
"AverageLengthOfServiceMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_months",
"AverageAgeYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_years",
"AverageAgeMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_months",
"NumberOfEmployees": "number_of_employees",
}
for fact in facts:
key_to_set = target_keys.get(fact.concept.qname.localName)
if key_to_set:
setattr(counting_data, key_to_set, fact.value)
if key_to_set == "edinet_code":
counting_data.industry_name = self.repository.get_industry_name(
counting_data.edinet_code
)
elif (
key_to_set == "number_of_employees"
and fact.contextID != "CurrentYearInstant_NonConsolidatedMember"
):
setattr(counting_data, "number_of_employees", None)
return counting_data
def make_counting_data(self) -> list[CountingData]:
counting_list = []
for xbrl_path in self._unzip_files_and_extract_xbrl():
counting_data = CountingData()
ctrl = Cntlr.Cntlr()
model_xbrl = ctrl.modelManager.load(xbrl_path)
logging.info(f"{Path(xbrl_path).name}")
counting_data = self._assign_attributes(counting_data, model_xbrl.facts)
counting_list.append(counting_data)
shutil.rmtree(self.temp_dir)
return counting_list
def to_csv(self, data: list[CountingData], output_filename: str):
employee_frame = pd.DataFrame(
data=[x.to_list() for x in data],
columns=[
"EDINETCODE",
"企業名",
"業種",
"平均年間給与(円)",
"平均勤続年数(年)",
"平均年齢(歳)",
"従業員数(人)",
],
)
employee_frame.to_csv(str(self.work_dir / output_filename), encoding="cp932", index=False)
logging.info(f"{self.work_dir} に {output_filename} が出力されました")
if __name__ == "__main__":
# 前提条件: EDINETコードリストのアップロード
home_dir = os.path.expanduser("~")
service = XbrlService(work_dir=Path(home_dir, "Downloads/xbrlReport"))
service.download_xbrl()
service.to_csv(
data=service.make_counting_data(),
output_filename="output.csv",
)
Check
For now it just needs to run from the console. Once it works, I'll shape it so it runs under Django (like the upload flow, calling it as a batch is probably the way to go).
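As a sketch of that "call it as a batch" idea, a hypothetical management command (the file name and argument handling are my assumptions, not part of the article's code) could wrap XbrlService like this:
# securities/management/commands/download_xbrl.py (hypothetical file name)
from pathlib import Path

from django.core.management.base import BaseCommand

from securities.domain.service.xbrl import XbrlService


class Command(BaseCommand):
    help = "Download XBRL securities reports and export them to CSV"

    def add_arguments(self, parser):
        parser.add_argument("work_dir", type=str, help="Working folder for the downloaded zips")

    def handle(self, *args, **options):
        service = XbrlService(work_dir=Path(options["work_dir"]))
        service.download_xbrl()
        service.to_csv(data=service.make_counting_data(), output_filename="output.csv")
        self.stdout.write(self.style.SUCCESS("XBRL download and CSV export finished"))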
Refactoring
Create a Counting table and store the data there
CSV is awkward to work with, after all.
:
class Counting(models.Model):
company = models.ForeignKey(Company, on_delete=models.CASCADE, null=True)
avg_salary = models.IntegerField("平均年間給与(円)", null=True)
avg_tenure = models.FloatField("平均勤続年数(年)", null=True)
avg_age = models.FloatField("平均年齢(歳)", null=True)
number_of_employees = models.IntegerField("従業員数(人)", null=True)
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
@dataclass
class CountingData:
:
def to_entity(self, company_master: dict[str, Company]) -> Counting:
return Counting(
company=company_master[self.edinet_code],
avg_salary=self.avg_salary,
avg_tenure=self.avg_tenure_years_combined,
avg_age=self.avg_age_years_combined,
number_of_employees=self.number_of_employees,
)
if __name__ == "__main__":
# 前提条件: EDINETコードリストのアップロード
home_dir = os.path.expanduser("~")
service = XbrlService(work_dir=Path(home_dir, "Downloads/xbrlReport"))
service.download_xbrl()
- service.to_csv(
- data=service.make_counting_data(),
- output_filename="output.csv",
- )
+ company_mst = {entity.edinet_code: entity for entity in Company.objects.all()}
+ Counting.objects.bulk_create(
+ [x.to_entity(company_mst) for x in service.make_counting_data()]
+ )
Add dates to the Counting model
As it stands, there is no way to tell which period the data belongs to. _make_doc_id_list currently returns list[str]; if I rework it to return list[ResponseData] instead, the dates should come along with it. It turned into a much bigger rework than I expected...
class Counting(models.Model):
company = models.ForeignKey(Company, on_delete=models.CASCADE, null=True)
+ period_start = models.DateField("期間(自)", null=True)
+ period_end = models.DateField("期間(至)", null=True)
+ submit_date = models.DateField("提出日時")
avg_salary = models.IntegerField("平均年間給与(円)", null=True)
:
@dataclass
class CountingData:
:
- def to_entity(self, company_master: dict[str, Company]) -> Counting:
+ def to_entity(
+ self, doc_attr_dict: dict[str, ResponseData], company_master: dict[str, Company]
+ ) -> Counting:
+ response_data = doc_attr_dict[self.edinet_code]
+ submit_date = datetime.datetime.strptime(
+ response_data.results[0].submit_date_time, "%Y-%m-%d %H:%M"
+ )
return Counting(
company=company_master[self.edinet_code],
+ period_start=response_data.results[0].period_start,
+ period_end=response_data.results[0].period_end,
+ submit_date=submit_date,
avg_salary=self.avg_salary,
avg_tenure=self.avg_tenure_years_combined,
avg_age=self.avg_age_years_combined,
number_of_employees=self.number_of_employees,
)
class XbrlService:
:
- def _make_doc_id_list(request_data: RequestData) -> list[str]:
- def _process_results_data(results: list) -> list[str]:
- """
- 有価証券報告書: ordinanceCode == "010" and formCode =="030000"
- 訂正有価証券報告書: ordinanceCode == "010" and formCode =="030001"
- """
- doc_id_list = []
- for result in results:
- if result.ordinance_code == "010" and result.form_code == "030000":
- doc_id_list.append(result.doc_id)
- return doc_id_list
-
- securities_report_doc_list = []
+ def _extract(request_data: RequestData) -> list[ResponseData]:
+ """
+ 特定の提出書類をもつ ResponseData を抽出する
+ 有価証券報告書: ordinanceCode == "010" and formCode =="030000"
+ 訂正有価証券報告書: ordinanceCode == "010" and formCode =="030001"
+ """
+ securities_report_list = []
for day in request_data.day_list:
url = "https://api.edinet-fsa.go.jp/api/v2/documents.json"
params = {
"date": day,
"type": request_data.SECURITIES_REPORT_AND_META_DATA,
"Subscription-Key": os.environ.get("EDINET_API_KEY"),
}
res = requests.get(url, params=params)
res.raise_for_status()
response_data = ResponseData(res.json())
- securities_report_doc_list.extend(
- _process_results_data(response_data.results)
- )
- return securities_report_doc_list
+ for result in response_data.results:
+ if result.ordinance_code == "010" and result.form_code == "030000":
+ logging.info(
+ f"{day}, {result.filer_name}, edinet_code: {result.edinet_code}, doc_id: {result.doc_id}"
+ )
+ response_data.results = [result]
+ securities_report_list.append(response_data)
+ return securities_report_list
- def _download_xbrl_in_zip(self, securities_report_doc_list):
+ def _download_xbrl_in_zip(self, securities_report_list: list[ResponseData]):
"""
params.type:
1: 提出本文書、監査報告書およびxbrl
2: PDF
3: 代替書面・添付文書
4: 英文ファイル
5: CSV
"""
- denominator = len(securities_report_doc_list)
- for i, doc_id in enumerate(securities_report_doc_list):
+ denominator = len(securities_report_list)
+ for i, securities_report in enumerate(securities_report_list):
+ doc_id = securities_report.results[0].doc_id
logging.info(f"{doc_id}: {i + 1}/{denominator}")
:
- def download_xbrl(self):
+ def download_xbrl(self) -> dict[str, ResponseData]:
"""
Notes: 有価証券報告書の提出期限は原則として決算日から3ヵ月以内(3月末決算の企業であれば、同年6月中)
"""
request_data = RequestData(
start_date=datetime.date(2023, 11, 1),
end_date=datetime.date(2023, 11, 9),
)
- securities_report_doc_list = list(set(self._make_doc_id_list(request_data)))
- logging.info(f"number of lists:{len(securities_report_doc_list)}")
- logging.info(f"securities report doc list:{securities_report_doc_list}")
-
- self._download_xbrl_in_zip(securities_report_doc_list)
+ securities_report_list = self._extract(request_data)
+ self._download_xbrl_in_zip(securities_report_list)
logging.info("download finish")
+
+ securities_report_dict = {}
+ for x in securities_report_list:
+ securities_report_dict[x.results[0].edinet_code] = x
+
+ return securities_report_dict
:
if __name__ == "__main__":
# 前提条件: EDINETコードリストのアップロード
home_dir = os.path.expanduser("~")
service = XbrlService(work_dir=Path(home_dir, "Downloads/xbrlReport"))
- service.download_xbrl()
+ doc_attr_dict = service.download_xbrl()
company_mst = {entity.edinet_code: entity for entity in Company.objects.all()}
Counting.objects.bulk_create(
- [x.to_entity(company_mst) for x in service.make_counting_data()]
+ [x.to_entity(doc_attr_dict, company_mst) for x in service.make_counting_data()]
)
+ logging.info("bulk_create finish")
Move this into the repository and delete by company and submitDate (submission date) before bulk_insert
Otherwise a rerun could keep inserting the same rows forever...
class Counting(models.Model):
:
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
+ class Meta:
+ unique_together = ["company", "submit_date"]
:
from securities.domain.repository.edinet import EdinetRepository
from securities.domain.valueobject.edinet import CountingData, RequestData, ResponseData
- from securities.models import Company, Counting
:
if __name__ == "__main__":
# 前提条件: EDINETコードリストのアップロード
home_dir = os.path.expanduser("~")
service = XbrlService(work_dir=Path(home_dir, "Downloads/xbrlReport"))
doc_attr_dict = service.download_xbrl()
- company_mst = {entity.edinet_code: entity for entity in Company.objects.all()}
- Counting.objects.bulk_create(
- [x.to_entity(doc_attr_dict, company_mst) for x in service.make_counting_data()]
- )
+ service.repository.delete_existing_records(list(doc_attr_dict.values()))
+ service.repository.bulk_insert(doc_attr_dict, service.make_counting_data())
logging.info("bulk_create finish")
+ from datetime import datetime
+ from securities.domain.valueobject.edinet import ResponseData, CountingData
- from securities.models import Company
+ from securities.models import Company, Counting
class EdinetRepository:
:
+ # ここから下を追加
@staticmethod
def delete_existing_records(response_data_list: list[ResponseData]):
edinet_codes = [data.results[0].edinet_code for data in response_data_list]
edinet_code_to_company = {
company.edinet_code: company
for company in Company.objects.filter(edinet_code__in=edinet_codes)
}
for data in response_data_list:
edinet_code = data.results[0].edinet_code
submit_date = datetime.strptime(
data.results[0].submit_date_time, "%Y-%m-%d %H:%M"
)
company = edinet_code_to_company[edinet_code]
Counting.objects.filter(company=company, submit_date=submit_date).delete()
@staticmethod
def bulk_insert(
doc_attr_dict: dict[str, ResponseData], counting_data_list: list[CountingData]
):
edinet_codes = [data.results[0].edinet_code for data in doc_attr_dict.values()]
edinet_code_to_company = {
company.edinet_code: company
for company in Company.objects.filter(edinet_code__in=edinet_codes)
}
insert_objects = [
x.to_entity(
doc_attr_dict,
edinet_code_to_company,
)
for x in counting_data_list
]
Counting.objects.bulk_create(insert_objects)
Look up the industry from the Company master
The original CSV-oriented processing had to look up the industry itself, but we have the company master now.
@dataclass
class CountingData:
edinet_code: str | None = None
filer_name_jp: str | None = None
- industry_name: str | None = None
avg_salary: str | None = None
avg_tenure_years: str | None = None
avg_tenure_months: str | None = None
avg_age_years: str | None = None
avg_age_months: str | None = None
number_of_employees: str | None = None
:
def to_list(self) -> list[str | None]:
return [
self.edinet_code,
self.filer_name_jp,
- self.industry_name,
self.avg_salary,
self.avg_tenure_years_combined,
self.avg_age_years_combined,
self.number_of_employees,
]
class EdinetRepository:
- @staticmethod
- def get_industry_name(edinet_code: str) -> str | None:
- try:
- company = Company.objects.get(edinet_code=edinet_code)
- return company.submitter_industry
- except Company.DoesNotExist:
- return None
@staticmethod
def delete_existing_records(response_data_list: list[ResponseData]):
:
:
from arelle import Cntlr
+ from django.core.exceptions import ObjectDoesNotExist
from securities.domain.repository.edinet import EdinetRepository
from securities.domain.valueobject.edinet import CountingData, RequestData, ResponseData
+ from securities.models import Company
:
- def _assign_attributes(self, counting_data: CountingData, facts):
+ @staticmethod
+ def _assign_attributes(counting_data: CountingData, facts):
target_keys = {
"EDINETCodeDEI": "edinet_code",
"FilerNameInJapaneseDEI": "filer_name_jp",
"AverageAnnualSalaryInformationAboutReportingCompanyInformationAboutEmployees": "avg_salary",
"AverageLengthOfServiceYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_years",
"AverageLengthOfServiceMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_months", # noqa E501
"AverageAgeYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_years",
"AverageAgeMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_months",
"NumberOfEmployees": "number_of_employees",
}
for fact in facts:
key_to_set = target_keys.get(fact.concept.qname.localName)
if key_to_set:
setattr(counting_data, key_to_set, fact.value)
- if key_to_set == "edinet_code":
- counting_data.industry_name = self.repository.get_industry_name(
- counting_data.edinet_code
- )
- elif (
+ if (
key_to_set == "number_of_employees"
and fact.contextID != "CurrentYearInstant_NonConsolidatedMember"
):
setattr(counting_data, "number_of_employees", None)
return counting_data
:
def to_csv(self, data: list[CountingData], output_filename: str):
+ all_companies = Company.objects.all()
+ new_data = []
+ for x in data:
+ try:
+ # If matching Company object is found, insert industry name to list
+ company = all_companies.get(edinet_code=x.edinet_code)
+ data_list = x.to_list()
+ data_list.insert(2, company.submitter_industry)
+ except ObjectDoesNotExist:
+ # If no matching Company object is found, insert None
+ data_list = x.to_list()
+ data_list.insert(2, None)
+ new_data.append(data_list)
+
employee_frame = pd.DataFrame(
- data=[x.to_list() for x in data],
+ data=new_data,
columns=[
"EDINETCODE",
"企業名",
"業種",
"平均年間給与(円)",
"平均勤続年数(年)",
"平均年齢(歳)",
"従業員数(人)",
],
)
employee_frame.to_csv(str(self.work_dir / output_filename), encoding="cp932")
logging.info(f"{self.work_dir} に {output_filename} が出力されました")
if __name__ == "__main__":
:
+ # 点検だけして消す
+ service.to_csv(
+ data=service.make_counting_data(),
+ output_filename="output.csv",
+ )
Move the unzip processing into lib
:
import shutil
- import zipfile
from pathlib import Path
import pandas as pd
import requests
from arelle import Cntlr
from django.core.exceptions import ObjectDoesNotExist
+ from lib.zipfileservice import ZipFileService
from securities.domain.repository.edinet import EdinetRepository
:
class XbrlService:
:
def _unzip_files_and_extract_xbrl(self) -> list[str]:
"""
指定されたディレクトリ内のzipファイルを解凍し、指定したパターンに一致するXBRLファイルのリストを返します。
xbrlファイルは各zipファイルに1つ、存在するようだ
- 使用例:
- >> obj = XbrlService()
- >> result = obj.unzip_files_and_extract_xbrl('/path/to/zip/directory', '*.xbrl')
- >> print(result)
+ Returns:
['/path/to/extracted/file1.xbrl', '/path/to/extracted/file2.xbrl']
"""
-
- zip_files = list(self.work_dir.glob("*.zip"))
- logging.info(f"number of zip files: {len(zip_files)}")
- for zip_file in zip_files:
- with zipfile.ZipFile(str(zip_file), "r") as zipf:
- zipf.extractall(str(self.temp_dir))
+ ZipFileService.extract_zip_files(self.work_dir, self.temp_dir)
- xbrl_files = list(self.work_dir.glob("**/XBRL/PublicDoc/*.xbrl"))
+ xbrl_files = list(self.temp_dir.glob("XBRL/PublicDoc/*.xbrl"))
return [str(path) for path in xbrl_files]
:
class ZipFileService:
:
+ @staticmethod
+ def extract_zip_files(source_dir: Path, target_dir: Path):
+ """
+ ソースディレクトリからのすべてのzipファイルをターゲットディレクトリに解凍します。
+ """
+ source_dir_path = Path(source_dir)
+ target_dir_path = Path(target_dir)
+ target_dir_path.mkdir(parents=True, exist_ok=True)
+
+ zip_files = source_dir_path.glob("*.zip")
+ for zip_file in zip_files:
+ with zipfile.ZipFile(str(zip_file), "r") as zipf:
+ zipf.extractall(str(target_dir_path))
:
Eliminate duplicate doc_id
class XbrlService:
def __init__(self, work_dir: Path):
self.work_dir = work_dir
self.temp_dir = self.work_dir / "temp"
self.repository = EdinetRepository()
@staticmethod
def _extract(request_data: RequestData) -> list[ResponseData]:
"""
- 特定の提出書類をもつ ResponseData を抽出する
+ 特定の提出書類をもつ ResponseData を抽出する(重複した doc_id は除外される)
有価証券報告書: ordinanceCode == "010" and formCode =="030000"
訂正有価証券報告書: ordinanceCode == "010" and formCode =="030001"
"""
- securities_report_list = []
+ securities_report_dict = {}
for day in request_data.day_list:
url = "https://api.edinet-fsa.go.jp/api/v2/documents.json"
params = {
"date": day,
"type": request_data.SECURITIES_REPORT_AND_META_DATA,
"Subscription-Key": os.environ.get("EDINET_API_KEY"),
}
res = requests.get(url, params=params)
res.raise_for_status()
response_data = ResponseData(res.json())
for result in response_data.results:
if result.ordinance_code == "010" and result.form_code == "030000":
logging.info(
- f"{day}, {result.filer_name}, edinet_code: {result.edinet_code}, doc_id: {result.doc_id}"
+ f"{day}, "
+ f"edinet_code: {result.edinet_code}, "
+ f"doc_id: {result.doc_id}, "
+ f"期間(自): {response_data.results[0].period_start}, "
+ f"期間(至): {response_data.results[0].period_end}, "
+ f"{result.filer_name}, "
)
response_data.results = [result]
- securities_report_list.append(response_data)
- return securities_report_list
+ if (
+ response_data.results
+ and response_data.results[0].doc_id not in securities_report_dict
+ ):
+ securities_report_dict[response_data.results[0].doc_id] = response_data
+ return list(securities_report_dict.values())
:
def download_xbrl(self) -> dict[str, ResponseData]:
"""
Notes: 有価証券報告書の提出期限は原則として決算日から3ヵ月以内(3月末決算の企業であれば、同年6月中)
"""
request_data = RequestData(
start_date=datetime.date(2023, 11, 1),
- end_date=datetime.date(2023, 11, 9),
+ end_date=datetime.date(2023, 11, 29),
)
securities_report_list = self._extract(request_data)
self._download_xbrl_in_zip(securities_report_list)
logging.info("download finish")
securities_report_dict = {}
for x in securities_report_list:
securities_report_dict[x.results[0].edinet_code] = x
return securities_report_dict
Move the date input out to the outermost layer
There's no point burying date literals deep inside the logic.
class XbrlService:
:
- def download_xbrl(self) -> dict[str, ResponseData]:
+ def download_xbrl(self, request_data: RequestData) -> dict[str, ResponseData]:
"""
Notes: 有価証券報告書の提出期限は原則として決算日から3ヵ月以内(3月末決算の企業であれば、同年6月中)
"""
- request_data = RequestData(
- start_date=datetime.date(2023, 11, 1),
- end_date=datetime.date(2023, 11, 29),
- )
securities_report_list = self._extract(request_data)
self._download_xbrl_in_zip(securities_report_list)
logging.info("download finish")
securities_report_dict = {}
for x in securities_report_list:
securities_report_dict[x.results[0].edinet_code] = x
return securities_report_dict
:
if __name__ == "__main__":
# 前提条件: EDINETコードリストのアップロード
home_dir = os.path.expanduser("~")
service = XbrlService(work_dir=Path(home_dir, "Downloads/xbrlReport"))
- doc_attr_dict = service.download_xbrl()
+ doc_attr_dict = service.download_xbrl(
+ RequestData(
+ start_date=datetime.date(2023, 11, 1),
+ end_date=datetime.date(2023, 11, 29),
+ )
+ )
service.repository.delete_existing_records(list(doc_attr_dict.values()))
service.repository.bulk_insert(doc_attr_dict, service.make_counting_data())
logging.info("bulk_create finish")
Fixes for things I noticed after running a full year's worth of data
- Download the documents filtered by _extract
- Process every zip file in the downloaded folder
Given that, doc_attr_dict and counting_data_dict should contain exactly the same companies, yet for some reason a key error occurs. I spent quite a while investigating but couldn't pin it down. It's only 4 records, so it shouldn't really matter.
:
jpsps100000-ssr-001_G13178-000_2023-06-06_01_2023-09-06.xbrl
jpsps100000-ssr-001_G14453-000_2022-08-16_01_2022-11-16.xbrl
E22688 の CountingData が doc_attr_dict に見つからなかったので CountingData から削除しました
E37857 の CountingData が doc_attr_dict に見つからなかったので CountingData から削除しました
E11541 の CountingData が doc_attr_dict に見つからなかったので CountingData から削除しました
None の CountingData が doc_attr_dict に見つからなかったので CountingData から削除しました
bulk_create finish
+ import logging
:
@staticmethod
def bulk_insert(
- doc_attr_dict: dict[str, ResponseData], counting_data_list: list[CountingData]
+ doc_attr_dict: dict[str, ResponseData], counting_data_dict: dict[str, CountingData],
):
+ # Delete the CountingData instances whose 'edinet_code' is not in 'doc_attr_dict'
+ for edinet_code in list(counting_data_dict.keys()):
+ if edinet_code not in doc_attr_dict:
+ del counting_data_dict[edinet_code]
+ logging.warning(
+ f"{edinet_code} の CountingData が doc_attr_dict に見つからなかったので CountingData から削除しました"
+ )
+
edinet_codes = [data.results[0].edinet_code for data in doc_attr_dict.values()]
edinet_code_to_company = {
company.edinet_code: company
for company in Company.objects.filter(edinet_code__in=edinet_codes)
}
insert_objects = [
x.to_entity(
doc_attr_dict,
edinet_code_to_company,
)
- for x in counting_data_list
+ for x in counting_data_dict.values()
]
Counting.objects.bulk_create(insert_objects)
class XbrlService:
def __init__(self, work_dir: Path):
self.work_dir = work_dir
+ if not self.work_dir.exists():
+ self.work_dir.mkdir(parents=True, exist_ok=True)
self.temp_dir = self.work_dir / "temp"
self.repository = EdinetRepository()
@staticmethod
def _extract(request_data: RequestData) -> list[ResponseData]:
"""
特定の提出書類をもつ ResponseData を抽出する(重複した doc_id は除外される)
有価証券報告書: ordinanceCode == "010" and formCode =="030000"
訂正有価証券報告書: ordinanceCode == "010" and formCode =="030001"
"""
:
for result in response_data.results:
if result.ordinance_code == "010" and result.form_code == "030000":
- logging.info(
- f"{day}, "
- f"edinet_code: {result.edinet_code}, "
- f"doc_id: {result.doc_id}, "
- f"期間(自): {response_data.results[0].period_start}, "
- f"期間(至): {response_data.results[0].period_end}, "
- f"{result.filer_name}, "
- )
+ logging.info(f"{day}, {result}")
response_data.results = [result]
if (
response_data.results
+ and response_data.results[0].submit_date_time
and response_data.results[0].doc_id not in securities_report_dict
):
securities_report_dict[response_data.results[0].doc_id] = response_data
return list(securities_report_dict.values())
:
- def make_counting_data(self) -> list[CountingData]:
- counting_list = []
+ def make_counting_data(self) -> dict[str, CountingData]:
+ counting_data_dict = {}
for xbrl_path in self._unzip_files_and_extract_xbrl():
counting_data = CountingData()
ctrl = Cntlr.Cntlr()
model_xbrl = ctrl.modelManager.load(xbrl_path)
logging.info(f"{Path(xbrl_path).name}")
counting_data = self._assign_attributes(counting_data, model_xbrl.facts)
- counting_list.append(counting_data)
+ counting_data_dict[counting_data.edinet_code] = counting_data
shutil.rmtree(self.temp_dir)
- return counting_list
+ return counting_data_dict
:
if __name__ == "__main__":
# 前提条件: EDINETコードリストのアップロード
home_dir = os.path.expanduser("~")
service = XbrlService(work_dir=Path(home_dir, "Downloads/xbrlReport"))
doc_attr_dict = service.download_xbrl(
RequestData(
- start_date=datetime.date(2023, 11, 1),
- end_date=datetime.date(2023, 11, 29),
+ start_date=datetime.date(2022, 11, 1),
+ end_date=datetime.date(2023, 10, 31),
)
)
service.repository.delete_existing_records(list(doc_attr_dict.values()))
service.repository.bulk_insert(doc_attr_dict, service.make_counting_data())
logging.info("bulk_create finish")
@dataclass
class RequestData:
:
class _Result:
def __init__(self, data):
:
- self.xbrl_flag = data.get("xbrlFlag")
- self.pdf_flag = data.get("pdfFlag")
- self.attach_doc_flag = data.get("attachDocFlag")
- self.english_doc_flag = data.get("englishDocFlag")
- self.csv_flag = data.get("csvFlag")
+ # フラグは "0"/"1" の文字列で返ってくるので、"1" と比較して bool 化する
+ self.xbrl_flag: bool = data.get("xbrlFlag") == "1"
+ self.pdf_flag: bool = data.get("pdfFlag") == "1"
+ self.attach_doc_flag: bool = data.get("attachDocFlag") == "1"
+ self.english_doc_flag: bool = data.get("englishDocFlag") == "1"
+ self.csv_flag: bool = data.get("csvFlag") == "1"
self.legal_status = data.get("legalStatus")
+
+ def __str__(self):
+ return (
+ f"Seq Number: {self.seq_number}, "
+ f"Doc ID: {self.doc_id}, "
+ f"Edinet Code: {self.edinet_code}, "
+ f"Sec Code: {self.sec_code}, "
+ f"JCN: {self.jcn}, "
+ f"Filer Name: {self.filer_name}, "
+ f"Fund Code: {self.fund_code}, "
+ f"Ordinance Code: {self.ordinance_code}, "
+ f"Form Code: {self.form_code}, "
+ f"Doc Type Code: {self.doc_type_code}, "
+ f"Period Start: {self.period_start}, "
+ f"Period End: {self.period_end}, "
+ f"Submit Date Time: {self.submit_date_time}, "
+ f"Doc Description: {self.doc_description}, "
+ f"Issuer Edinet Code: {self.issuer_edinet_code}, "
+ f"Subject Edinet Code: {self.subject_edinet_code}, "
+ f"Subsidiary Edinet Code: {self.subsidiary_edinet_code}, "
+ f"Current Report Reason: {self.current_report_reason}, "
+ f"Parent Doc ID: {self.parent_doc_id}, "
+ f"Ope Date Time: {self.ope_date_time}, "
+ f"Withdrawal Status: {self.withdrawal_status}, "
+ f"Doc Info Edit Status: {self.doc_info_edit_status}, "
+ f"Disclosure Status: {self.disclosure_status}, "
+ f"Xbrl Flag: {self.xbrl_flag}, "
+ f"PDF Flag: {self.pdf_flag}, "
+ f"Attach Doc Flag: {self.attach_doc_flag}, "
+ f"English Doc Flag: {self.english_doc_flag}, "
+ f"CSV Flag: {self.csv_flag}, "
+ f"Legal Status: {self.legal_status}"
+ )
Visualizing the XBRL data
repository
from django.db.models import Q, QuerySet, F
from securities.domain.valueobject.plot import RequestData
from securities.models import Counting
class PlotRepository:
@staticmethod
def get_period_data(request_data: RequestData) -> QuerySet:
return (
Counting.objects.select_related("company")
.filter(
Q(submit_date__gte=request_data.start_date)
& Q(submit_date__lte=request_data.end_date)
)
.annotate(submitter_industry=F("company__submitter_industry"))
)
@staticmethod
def get_period_data_for_specific_industry(
request_data: RequestData, industry: str
) -> QuerySet:
return (
Counting.objects.select_related("company")
.filter(
Q(submit_date__gte=request_data.start_date)
& Q(submit_date__lte=request_data.end_date)
& Q(company__submitter_industry=industry)
)
.annotate(submitter_name=F("company__submitter_name"))
)
valueobject
import datetime
from dataclasses import dataclass
@dataclass
class RequestData:
start_date: datetime.date
end_date: datetime.date
def __post_init__(self):
if self.start_date > datetime.date.today():
raise ValueError("start_date is in the future")
if self.end_date > datetime.date.today():
raise ValueError("end_date is in the future")
if self.start_date > self.end_date:
raise ValueError("start_date is later than end_date")
@dataclass
class PlotParams:
"""
Attributes:
x (str): プロットのx軸のラベル
title (str | None): グラフタイトルとファイル名に使用される
Notes: 使用するグラフが横軸なので single版(i.e. this) は x という名前になるので縦にするときは注意
"""
x: str
title: str | None
@dataclass
class PlotParamsForKDE(PlotParams):
"""
Attributes:
y (str): プロットのy軸のラベル
color (str): KDEプロットに使用する色
"""
y: str
color: str
service
import datetime
import os
from abc import abstractmethod, ABC
from pathlib import Path
import pandas as pd
import seaborn
from django.db.models import QuerySet
from matplotlib import pyplot as plt, ticker
from securities.domain.repository.plot import PlotRepository
from securities.domain.valueobject.plot import (
RequestData,
PlotParams,
PlotParamsForKDE,
)
COLUMN_COMPANY_NAME = "submitter_name"
COLUMN_INDUSTRY = "submitter_industry"
COLUMN_AVG_SALARY = "avg_salary"
COLUMN_AVG_TENURE = "avg_tenure"
COLUMN_AVG_AGE = "avg_age"
COMMON_FONT = ["IPAexGothic"]
class PlotServiceBase(ABC):
def __init__(
self,
work_dir: Path,
target_period: RequestData,
display_title: bool,
categorical_column: str = None,
):
"""
Args:
work_dir: 処理対象のフォルダ
target_period: 期間
display_title: グラフタイトル表示可否
categorical_column: カテゴリカルラベルを作るための列
"""
plt.rcParams["font.family"] = COMMON_FONT
self.work_dir = work_dir
if not self.work_dir.exists():
self.work_dir.mkdir(parents=True, exist_ok=True)
self._repository = PlotRepository()
self.categorical_column = categorical_column
self.clean_data = self._clean(self._get_target_data(target_period))
self.display_title = display_title
if self.categorical_column:
self.categorical_labels_dict = self._get_labels_sorted_by_averages(
self.clean_data
)
@abstractmethod
def _get_target_data(self, target_period: RequestData) -> QuerySet:
raise NotImplementedError
@abstractmethod
def _clean(self, query: QuerySet) -> pd.DataFrame:
raise NotImplementedError
def _get_labels_sorted_by_averages(
self, clean_data: pd.DataFrame
) -> dict[str, list[str]]:
"""
Get three lists of labels, each sorted by a different per-industry average.\n
Returns: {"avg_salary": ['不動産業', 'サービス業', '情報・通信業', ...], "avg_tenure": [...], "avg_age": [...]}
"""
def _sort_labels_by_column_average(
_data: pd.DataFrame, sort_on: str
) -> list[str]:
sorted_df = (
_data.groupby([self.categorical_column], as_index=False)
.mean()
.sort_values(sort_on)
)
return sorted_df[self.categorical_column].tolist()
return {
COLUMN_AVG_SALARY: _sort_labels_by_column_average(
clean_data, sort_on=COLUMN_AVG_SALARY
),
COLUMN_AVG_TENURE: _sort_labels_by_column_average(
clean_data, sort_on=COLUMN_AVG_TENURE
),
COLUMN_AVG_AGE: _sort_labels_by_column_average(
clean_data, sort_on=COLUMN_AVG_AGE
),
}
def plot_all(self, plot_params_list: list[PlotParams | PlotParamsForKDE]):
for plot_params in plot_params_list:
self._plot(plot_params=plot_params)
@abstractmethod
def _plot(self, plot_params: PlotParams | PlotParamsForKDE):
raise NotImplementedError
@staticmethod
def _configure_plot():
# Note: gca() は "get current axes" を意味する
ax = plt.gca()
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
ax.yaxis.set_ticks_position("left")
ax.xaxis.set_ticks_position("bottom")
@abstractmethod
def save(self, title: str):
raise NotImplementedError
class BoxenPlotService(PlotServiceBase):
def __init__(
self, work_dir: Path, target_period: RequestData, categorical_column: str
):
super().__init__(
work_dir=work_dir,
target_period=target_period,
display_title=True,
categorical_column=categorical_column,
)
def _get_target_data(self, target_period):
return self._repository.get_period_data(target_period)
def _clean(self, query: QuerySet) -> pd.DataFrame:
return pd.DataFrame(
list(
query.values(
self.categorical_column,
COLUMN_AVG_SALARY,
COLUMN_AVG_TENURE,
COLUMN_AVG_AGE,
)
)
).dropna()
def _plot(self, plot_params: PlotParams):
plt.figure(figsize=(15, 10))
seaborn.stripplot(
x=plot_params.x,
y=self.categorical_column,
orient="h",
data=self.clean_data,
size=3,
edgecolor="auto",
order=self.categorical_labels_dict[plot_params.x],
)
ax = seaborn.boxenplot(
x=plot_params.x,
y=self.categorical_column,
hue=self.categorical_column, # TODO: hueをつけないことがdeprecatedだが、hueをつけると色合いがおかしくなる
orient="h",
data=self.clean_data,
palette="rainbow",
order=self.categorical_labels_dict[plot_params.x],
)
ax.grid(which="major", color="lightgray", ls=":", alpha=0.5)
ax.xaxis.set_minor_locator(ticker.AutoMinorLocator())
plt.xlabel(plot_params.x, fontsize=18)
plt.ylabel(self.categorical_column, fontsize=16)
if self.display_title:
plt.title(plot_params.title, fontsize=24)
self._configure_plot()
self.save(plot_params.title)
# plt.show()
def save(self, title: str):
plt.savefig(self.work_dir / f"boxen_plot_{title}.png")
class BarPlotService(PlotServiceBase):
def __init__(
self, work_dir: Path, target_period: RequestData, categorical_column: str
):
super().__init__(
work_dir=work_dir,
target_period=target_period,
display_title=True,
categorical_column=categorical_column,
)
def _get_target_data(self, target_period):
# TODO: 業種はとりあえず db値"情報・通信業" で固定している(Qiita準拠)
return self._repository.get_period_data_for_specific_industry(
target_period, "情報・通信業"
)
def _clean(self, query: QuerySet) -> pd.DataFrame:
return pd.DataFrame(
list(
query.values(
self.categorical_column,
COLUMN_AVG_SALARY,
COLUMN_AVG_TENURE,
COLUMN_AVG_AGE,
)
)
).dropna()
def _plot(self, plot_params: PlotParams):
# COLUMN_AVG_SALARY が最も高い上位50の行
df_sort_by_salary = self.clean_data.sort_values(COLUMN_AVG_SALARY)[-50:]
df_info_label_list_sort_by_salary = df_sort_by_salary[
self.categorical_column
].tolist()
plt.figure(figsize=(15, 12))
ax = seaborn.barplot(
x=self.categorical_column,
y=plot_params.x,
hue=self.categorical_column, # TODO: hueをつけないことがdeprecatedだが、hueをつけると色合いがおかしくなる
data=self.clean_data,
palette="rocket",
order=df_info_label_list_sort_by_salary,
)
seaborn.set(style="ticks")
plt.xticks(rotation=90)
plt.subplots_adjust(hspace=0.8, bottom=0.35)
ax.grid(which="major", axis="y", color="lightgray", ls=":", alpha=0.5)
ax.yaxis.set_major_formatter(
plt.FuncFormatter(lambda x, loc: "{:,}".format(int(x)))
)
plt.xlabel(self.categorical_column, fontsize=12)
plt.ylabel(COLUMN_AVG_SALARY, fontsize=18)
if self.display_title:
plt.title(plot_params.title, fontsize=24)
self._configure_plot()
self.save(plot_params.title)
# plt.show()
def save(self, title: str):
plt.savefig(self.work_dir / f"bar_plot_{title}.png")
class KernelDensityEstimationPlotService(PlotServiceBase):
def __init__(self, work_dir: Path, target_period: RequestData):
super().__init__(
work_dir=work_dir,
target_period=target_period,
display_title=False,
)
def _get_target_data(self, target_period: RequestData) -> QuerySet:
return self._repository.get_period_data(target_period)
def _clean(self, query: QuerySet) -> pd.DataFrame:
return pd.DataFrame(
list(
query.values(
COLUMN_AVG_SALARY,
COLUMN_AVG_TENURE,
COLUMN_AVG_AGE,
)
)
).dropna()
def _plot(self, plot_params: PlotParamsForKDE):
seaborn.jointplot(
x=plot_params.x,
y=plot_params.y,
data=self.clean_data,
kind="kde",
color=plot_params.color,
)
if self.display_title:
plt.title(plot_params.title, fontsize=24)
self._configure_plot()
self.save(plot_params.title)
# plt.show()
def save(self, title: str):
plt.savefig(self.work_dir / f"kernel_density_estimation_plot_{title}.png")
if __name__ == "__main__":
home_dir = os.path.expanduser("~")
period = RequestData(
start_date=datetime.date(2022, 11, 1),
end_date=datetime.date(2023, 10, 31),
)
# plot1: 箱ひげ図
service = BoxenPlotService(
work_dir=Path(home_dir, "Downloads/xbrlReport/plot"),
target_period=period,
categorical_column=COLUMN_INDUSTRY,
)
service.plot_all(
[
PlotParams(x=COLUMN_AVG_SALARY, title="業種別平均年間給与額"),
PlotParams(x=COLUMN_AVG_TENURE, title="業種別平均勤続年数"),
PlotParams(x=COLUMN_AVG_AGE, title="業種別平均年齢"),
]
)
# plot2: 棒グラフ
service = BarPlotService(
work_dir=Path(home_dir, "Downloads/xbrlReport/plot"),
target_period=period,
categorical_column=COLUMN_COMPANY_NAME,
)
service.plot_all(
[PlotParams(x=COLUMN_AVG_SALARY, title="情報・通信業界_平均年間給与TOP50")]
)
# plot3: カーネル密度推定
service = KernelDensityEstimationPlotService(
work_dir=Path(home_dir, "Downloads/xbrlReport/plot"),
target_period=period,
)
service.plot_all(
[
PlotParamsForKDE(
x=COLUMN_AVG_TENURE,
y=COLUMN_AVG_SALARY,
color="#d9f2f8",
title="平均勤続年数x平均年間給与",
),
PlotParamsForKDE(
x=COLUMN_AVG_AGE,
y=COLUMN_AVG_SALARY,
color="#fac8be",
title="平均年齢x平均年間給与",
),
PlotParamsForKDE(
x=COLUMN_AVG_AGE,
y=COLUMN_AVG_TENURE,
color="#008000",
title="平均年齢x平均勤続年数",
),
]
)
print("visualize finish")
Checking the result
Compared with the plots in the original article, mine somehow lack vividness, and the colors look like they are looping. Not enough securities-report data? Different seaborn settings? Well, for now, just being able to rebuild this in DDD style and get the plots to display at all is good enough.
| my output | guide |
| --- | --- |
| *(plot images)* | *(plot images from the referenced article)* |
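My current guess about the washed-out, looping colors: once `hue=` is set to the same column as `y`, seaborn assigns colors from the hue levels' own order rather than from the `order=` argument, so the palette no longer follows the sorted labels. If that is what's happening, pinning `hue_order` to the same list and dropping the redundant legend might fix it; a sketch against the boxen plot above (assumes a recent seaborn, where `legend=` is accepted):

```python
ax = seaborn.boxenplot(
    x=plot_params.x,
    y=self.categorical_column,
    hue=self.categorical_column,
    hue_order=self.categorical_labels_dict[plot_params.x],  # keep colors aligned with the sort order
    order=self.categorical_labels_dict[plot_params.x],
    orient="h",
    data=self.clean_data,
    palette="rainbow",
    legend=False,  # the y axis already names the categories
)
```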
Refactoring so it can be driven from the front end
Well, getting this far would already be enough, but right now everything assumes I press "Run" in the development environment.
- The current code batch-processes securities reports for a specified period. Publishing it like that and letting an unspecified crowd hammer the API would be a pain, so I add a DatePicker, show a list of the filings that can be processed, let them be reserved with checkboxes, and have a daily batch do the actual downloading.
- The list of processable filings gets pagination.
For what it's worth, I make a self pull request before writing this up as an article, but this one is another major operation.
lib/zipfileservice.py
- It used to create target_dir when it didn't exist; now it raises an exception instead
class ZipFileService:
:
@staticmethod
def extract_zip_files(source_dir: Path, target_dir: Path):
"""
ソースディレクトリからのすべてのzipファイルをターゲットディレクトリに解凍します。
"""
source_dir_path = Path(source_dir)
target_dir_path = Path(target_dir)
# Check if the source directory exists
if not source_dir_path.exists():
raise FileNotFoundError(f"The source directory {source_dir} does not exist")
# Check if the target directory exists
if not target_dir_path.exists():
raise FileNotFoundError(f"The target directory {target_dir} does not exist")
zip_files = source_dir_path.glob("*.zip")
for zip_file in zip_files:
with zipfile.ZipFile(str(zip_file), "r") as zip_f:
zip_f.extractall(str(target_dir_path))
:
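A throwaway check of the new behaviour (a sketch using `tempfile`, not a test that lives in the repo):

```python
import tempfile
from pathlib import Path

from lib.zipfileservice import ZipFileService

with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp)
    missing = existing / "does_not_exist"
    try:
        ZipFileService.extract_zip_files(source_dir=existing, target_dir=missing)
    except FileNotFoundError as e:
        print(e)  # -> The target directory ... does not exist
```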
securities/domain/repository/edinet.py
from django.db.models import Q
from securities.models import Company, Counting, ReportDocument
class EdinetRepository:
@staticmethod
def delete_existing_records(report_doc_list: list[ReportDocument]) -> None:
delete_conditions = Q()
for report_doc in report_doc_list:
company = Company.objects.get(edinet_code=report_doc.company.edinet_code)
delete_conditions |= Q(
company=company, submit_date=report_doc.submit_date_time
)
Counting.objects.filter(delete_conditions).delete()
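Two small things I noticed while writing this up: `report_doc.company` is already a `Company` instance, so the per-row `Company.objects.get()` adds one extra query per document; and, as far as I can tell, an empty `Q()` matches everything, so calling this with an empty list would wipe the whole `Counting` table. A sketch of the same method with both points addressed (not what's committed):

```python
@staticmethod
def delete_existing_records(report_doc_list: list[ReportDocument]) -> None:
    if not report_doc_list:
        return  # an empty Q() below would otherwise match (and delete) every Counting row
    delete_conditions = Q()
    for report_doc in report_doc_list:
        # the FK already holds the Company, no extra look-up needed
        delete_conditions |= Q(
            company=report_doc.company, submit_date=report_doc.submit_date_time
        )
    Counting.objects.filter(delete_conditions).delete()
```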
securities/domain/service/xbrl.py
import logging
import os
import shutil
from datetime import datetime
from pathlib import Path
import requests
from arelle import Cntlr
from django.utils import timezone
from lib.zipfileservice import ZipFileService
from securities.domain.repository.edinet import EdinetRepository
from securities.domain.valueobject.edinet import CountingData, RequestData
from securities.models import ReportDocument, Company
SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT = 1
class XbrlService:
def __init__(self):
self.repository = EdinetRepository()
self.companies = {
company.edinet_code: company for company in Company.objects.all()
}
def fetch_report_doc_list(self, request_data: RequestData) -> list[ReportDocument]:
"""
Args:
request_data: APIへのリクエスト条件
Returns:
list[ReportDocument]: A list of ReportDocument objects.
"""
report_doc_list: list[ReportDocument] = []
for day in request_data.day_list:
url = "https://api.edinet-fsa.go.jp/api/v2/documents.json"
params = {
"date": day,
"type": request_data.SECURITIES_REPORT_AND_META_DATA,
"Subscription-Key": os.environ.get("EDINET_API_KEY"),
}
res = requests.get(url, params=params)
res.raise_for_status()
for item in res.json().get("results", []):
submit_date_string = item.get("submitDateTime")
if submit_date_string is None:
continue
ordinance_code = item.get("ordinanceCode")
form_code = item.get("formCode")
if not (ordinance_code == "010" and form_code == "030000"):
continue
submit_date_time = timezone.make_aware(
datetime.strptime(submit_date_string, "%Y-%m-%d %H:%M")
)
ope_date_time_string = item.get("opeDateTime")
ope_date_time = (
timezone.make_aware(
datetime.strptime(ope_date_time_string, "%Y-%m-%d %H:%M")
)
if ope_date_time_string
else None
)
edinet_code = item.get("edinetCode")
if edinet_code not in self.companies:
continue
report_doc = ReportDocument(
seq_number=item.get("seqNumber"),
doc_id=item.get("docID"),
ordinance_code=ordinance_code,
form_code=form_code,
period_start=item.get("periodStart"),
period_end=item.get("periodEnd"),
submit_date_time=submit_date_time,
doc_description=item.get("docDescription"),
ope_date_time=ope_date_time,
withdrawal_status=item.get("withdrawalStatus"),
doc_info_edit_status=item.get("docInfoEditStatus"),
disclosure_status=item.get("disclosureStatus"),
xbrl_flag=bool(item.get("xbrlFlag")),
pdf_flag=bool(item.get("pdfFlag")),
english_doc_flag=bool(item.get("englishDocFlag")),
csv_flag=bool(item.get("csvFlag")),
legal_status=item.get("legalStatus"),
company=self.companies[edinet_code],
)
report_doc_list.append(report_doc)
logging.info(f"{day}, {report_doc}")
return report_doc_list
@staticmethod
def download_xbrl(report_doc: ReportDocument, work_dir: Path) -> None:
"""
Notes: 有価証券報告書の提出期限は原則として決算日から3ヵ月以内(3月末決算の企業であれば、同年6月中)
"""
logging.info(f"{report_doc.doc_id} をダウンロード...")
url = f"https://api.edinet-fsa.go.jp/api/v2/documents/{report_doc.doc_id}"
params = {
"type": SUBMITTED_MAIN_DOCUMENTS_AND_AUDIT_REPORT,
"Subscription-Key": os.environ.get("EDINET_API_KEY"),
}
filename = work_dir / f"{report_doc.doc_id}.zip"
res = requests.get(url, params=params, stream=True)
res.raise_for_status()
with open(filename, "wb") as file:
for chunk in res.iter_content(chunk_size=1024):
file.write(chunk)
logging.info(f"{report_doc.doc_id} をダウンロード完了")
@staticmethod
def _assign_attributes(counting_data: CountingData, facts):
target_keys = {
"EDINETCodeDEI": "edinet_code",
"FilerNameInJapaneseDEI": "filer_name_jp",
"AverageAnnualSalaryInformationAboutReportingCompanyInformationAboutEmployees": "avg_salary",
"AverageLengthOfServiceYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_years",
"AverageLengthOfServiceMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_tenure_months", # noqa E501
"AverageAgeYearsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_years",
"AverageAgeMonthsInformationAboutReportingCompanyInformationAboutEmployees": "avg_age_months",
"NumberOfEmployees": "number_of_employees",
}
for fact in facts:
key_to_set = target_keys.get(fact.concept.qname.localName)
if key_to_set:
setattr(counting_data, key_to_set, fact.value)
if (
key_to_set == "number_of_employees"
and fact.contextID != "CurrentYearInstant_NonConsolidatedMember"
):
setattr(counting_data, "number_of_employees", None)
return counting_data
def make_counting_data(self, work_dir: Path) -> CountingData:
temp_dir = Path(work_dir) / "temp"
if not temp_dir.exists():
temp_dir.mkdir(parents=True, exist_ok=True)
ZipFileService.extract_zip_files(work_dir, temp_dir)
xbrl_path = str(next(temp_dir.glob("XBRL/PublicDoc/*.xbrl")))
ctrl = Cntlr.Cntlr()
# TODO: VPSここでつっかえてるな...(質問中)
model_xbrl = ctrl.modelManager.load(xbrl_path)
# TODO: ここまでこれていない
logging.info(f" xbrl: {Path(xbrl_path).name}")
counting_data = self._assign_attributes(
counting_data=CountingData(), facts=model_xbrl.facts
)
shutil.rmtree(temp_dir)
return counting_data
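About the two TODOs in `make_counting_data` (arelle freezing on the Ubuntu VPS): a common cause of arelle hanging is that it tries to download referenced taxonomy schemas over the network when they are not in its local cache, which can block indefinitely on a server with restricted outbound access. I haven't confirmed that this is the cause here, but forcing the web cache offline is a cheap thing to try (a sketch; `workOffline` is arelle's switch for its offline mode):

```python
ctrl = Cntlr.Cntlr()
# don't let arelle fetch taxonomy schemas from the internet;
# rely on the files inside the downloaded zip / the local cache instead
ctrl.webCache.workOffline = True
model_xbrl = ctrl.modelManager.load(xbrl_path)
```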
securities/domain/valueobject/edinet.py
import datetime
from dataclasses import dataclass
@dataclass
class RequestData:
SECURITIES_REPORT_AND_META_DATA = 2
start_date: datetime.date
end_date: datetime.date
def __post_init__(self):
if self.start_date > datetime.date.today():
raise ValueError("start_date is in the future")
if self.end_date > datetime.date.today():
raise ValueError("end_date is in the future")
if self.start_date > self.end_date:
raise ValueError("start_date is later than end_date")
self.doc_type = self.SECURITIES_REPORT_AND_META_DATA
# Calculate day_list
period = self.end_date - self.start_date
self.day_list = []
for d in range(int(period.days)):
day = self.start_date + datetime.timedelta(days=d)
self.day_list.append(day)
self.day_list.append(self.end_date)
@dataclass
class CountingData:
"""
CountingData
計数データを表すクラス
Attributes:
edinet_code (str | None): The EDINET code of the entity.
filer_name_jp (str | None): The name of the entity in Japanese.
avg_salary (str | None): The average salary of the entity.
avg_tenure_years (str | None): The average tenure of employees in years.
avg_tenure_months (str | None): The average tenure of employees in months.
avg_age_years (str | None): The average age of employees in years.
avg_age_months (str | None): The average age of employees in months.
number_of_employees (str | None): The number of employees in the entity.
Properties:
avg_tenure_years_combined (str | None): 従業員の合計平均勤続年数。
self.avg_tenure_months が存在する場合、平均在職期間の小数部分が計算されます。
self.avg_tenure_months を 12 で割って、avg_tenure_years に加算します。
結合された平均在職期間値の文字列表現を返します。
avg_age_years_combined (str | None): 従業員の平均年齢を合計した年数。
self.avg_age_months が指定されている場合、平均年齢の小数部分が計算されます。
self.avg_age_months を 12 で割って、avg_age_years に加算します。
結合された平均年齢値の文字列表現を返します。
"""
edinet_code: str | None = None
filer_name_jp: str | None = None
avg_salary: int = 0
avg_tenure_years: int = 0
avg_tenure_months: int = 0
avg_age_years: int = 0
avg_age_months: int = 0
number_of_employees: int = 0
@property
def avg_tenure_years_combined(self) -> float:
if self.avg_tenure_months:
avg_tenure_decimal = round(int(self.avg_tenure_months) / 12, 1)
return int(self.avg_tenure_years) + avg_tenure_decimal
return self.avg_tenure_years
@property
def avg_age_years_combined(self) -> float:
if self.avg_age_months:
age_years_decimal = round(int(self.avg_age_months) / 12, 1)
return int(self.avg_age_years) + age_years_decimal
return self.avg_age_years
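Given the two properties above, a months value gets folded into the years as a one-decimal fraction, for example:

```python
data = CountingData(
    avg_tenure_years=10, avg_tenure_months=6,
    avg_age_years=41, avg_age_months=9,
)
print(data.avg_tenure_years_combined)  # 10.5 (6 months -> 0.5 years)
print(data.avg_age_years_combined)     # 41.8 (9 months -> 0.75, rounded to 0.8)
```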
securities/management/commands/daily_download_edinet.py
- Added a batch command that downloads the zips via the FSA's EDINET API
import logging
import os
import shutil
from pathlib import Path
from django.core.management.base import BaseCommand
from config import settings
from securities.domain.service.xbrl import XbrlService
from securities.models import ReportDocument, Company, Counting
class Command(BaseCommand):
help = "Download edinet data"
def handle(self, *args, **options):
report_doc_list = ReportDocument.objects.filter(download_reserved=True)[:20]
work_dir = Path(settings.MEDIA_ROOT) / "securities"
if not work_dir.exists():
work_dir.mkdir(parents=True, exist_ok=True)
companies = Company.objects.all()
company_mst = {c.edinet_code: c for c in companies}
service = XbrlService()
service.repository.delete_existing_records(report_doc_list)
counting_list: list[Counting] = []
for report_doc in report_doc_list:
service.download_xbrl(report_doc=report_doc, work_dir=work_dir)
counting_data = service.make_counting_data(work_dir=work_dir)
counting = Counting(
period_start=report_doc.period_start,
period_end=report_doc.period_end,
submit_date=report_doc.submit_date_time,
avg_salary=counting_data.avg_salary,
avg_tenure=counting_data.avg_tenure_years,
avg_age=counting_data.avg_age_years_combined,
number_of_employees=counting_data.number_of_employees,
company=company_mst[report_doc.company.edinet_code],
)
counting_list.append(counting)
os.remove(work_dir / f"{report_doc.doc_id}.zip")
Counting.objects.bulk_create(counting_list)
logging.info(f"計数データ作成完了: {len(report_doc_list)}")
shutil.rmtree(work_dir)
ReportDocument.objects.filter(
id__in=[report_document.id for report_document in report_doc_list]
).update(download_reserved=False)
self.stdout.write(self.style.SUCCESS("Successfully download edinet data"))
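One thing I will probably bolt on later: as written, the first zip that fails to download or parse aborts the whole nightly run. A sketch of keeping the batch alive per document (same names as in the command above):

```python
for report_doc in report_doc_list:
    try:
        service.download_xbrl(report_doc=report_doc, work_dir=work_dir)
        counting_data = service.make_counting_data(work_dir=work_dir)
    except Exception:  # noqa: BLE001 - one bad document shouldn't kill the batch
        logging.exception(f"failed to process {report_doc.doc_id}")
        continue
    # ...build the Counting row and remove the zip exactly as above...
```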
securities/models.py
- Added tables to store the document line-up data returned by the document-list API
from django.db import models
class Company(models.Model):
"""
提出書類一覧APIで返ってくる顔ぶれから書類を取得してできあがる、企業マスタ
Attributes:
edinet_code (CharField): The EDINET code of the company.
type_of_submitter (CharField): The type of submitter of the company.
listing_status (CharField): The listing status of the company.
consolidated_status (CharField): The consolidated status of the company.
capital (IntegerField): The capital of the company.
end_fiscal_year (CharField): The end fiscal year of the company.
submitter_name (CharField): The name of the submitter of the company.
submitter_name_en (CharField): The name of the submitter in English.
submitter_name_kana (CharField): The name of the submitter in Kana.
address (CharField): The address of the company.
submitter_industry (CharField): The industry of the submitter.
securities_code (CharField): The securities code of the company.
corporate_number (CharField): The corporate number of the submitter.
created_at (DateTimeField): The timestamp when the company was created.
updated_at (DateTimeField): The timestamp when the company was last updated.
"""
edinet_code = models.CharField(
verbose_name="EDINETコード", max_length=6, null=True
)
type_of_submitter = models.CharField(
verbose_name="提出者種別", max_length=30, null=True
)
listing_status = models.CharField(verbose_name="上場区分", max_length=3, null=True)
consolidated_status = models.CharField(
verbose_name="連結の有無", max_length=1, null=True
)
capital = models.IntegerField(verbose_name="資本金", null=True)
end_fiscal_year = models.CharField(verbose_name="決算日", max_length=6, null=True)
submitter_name = models.CharField(
verbose_name="提出者名", max_length=100, null=True
)
submitter_name_en = models.CharField(
verbose_name="提出者名(英字)", max_length=100, null=True
)
submitter_name_kana = models.CharField(
verbose_name="提出者名(ヨミ)", max_length=100, null=True
)
address = models.CharField(verbose_name="所在地", max_length=255, null=True)
submitter_industry = models.CharField(
verbose_name="提出者業種", max_length=25, null=True
)
securities_code = models.CharField(
verbose_name="証券コード", max_length=5, null=True
)
corporate_number = models.CharField(
verbose_name="提出者法人番号", max_length=13, null=True
)
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
class ReportDocument(models.Model):
"""
提出書類一覧APIで返ってくる顔ぶれデータ
Attributes:
seq_number (models.SmallIntegerField): The sequence number of the document.
doc_id (models.CharField): The document's management number.
ordinance_code (models.CharField): The ordinance code of the document.
form_code (models.CharField): The code representing the document's form.
period_start (models.DateField): The start date of the period covered by the document.
period_end (models.DateField): The end date of the period covered by the document.
submit_date_time (models.DateTimeField): The date and time when the document was submitted.
doc_description (models.CharField): A brief description of the document.
ope_date_time (models.DateTimeField): The date and time when the document was operated (nullable).
withdrawal_status (models.CharField): The withdrawal status of the document.
doc_info_edit_status (models.CharField): The modification status of the document information.
disclosure_status (models.CharField): The disclosure status of the document.
xbrl_flag (models.BooleanField): A flag indicating whether the document has an XBRL file.
pdf_flag (models.BooleanField): A flag indicating whether the document has a PDF file.
english_doc_flag (models.BooleanField): A flag indicating whether the document has an English file.
csv_flag (models.BooleanField): A flag indicating whether the document has a CSV file.
legal_status (models.BooleanField): A flag indicating whether the document is available for public inspection (縦覧).
download_reserved (models.BooleanField): ダウンロード予約済みかどうか
created_at (models.DateTimeField): The date and time when the document was created.
updated_at (models.DateTimeField): The date and time when the document was last updated.
company (ForeignKey): A foreign key to the associated Company object.
"""
seq_number = models.SmallIntegerField(verbose_name="連番")
doc_id = models.CharField(verbose_name="書類管理番号", max_length=8)
ordinance_code = models.CharField(verbose_name="府令コード", max_length=3)
form_code = models.CharField(verbose_name="様式コード", max_length=6)
period_start = models.DateField(verbose_name="期間(自)")
period_end = models.DateField(verbose_name="期間(至)")
submit_date_time = models.DateTimeField(verbose_name="提出日時")
doc_description = models.CharField(verbose_name="提出書類概要", max_length=147)
ope_date_time = models.DateTimeField(verbose_name="操作日時", null=True)
withdrawal_status = models.CharField(verbose_name="取下区分", max_length=1)
doc_info_edit_status = models.CharField(
verbose_name="書類情報修正区分", max_length=1
)
disclosure_status = models.CharField(verbose_name="開示不開示区分", max_length=1)
xbrl_flag = models.BooleanField(verbose_name="XBRL有無フラグ")
pdf_flag = models.BooleanField(verbose_name="PDF有無フラグ")
english_doc_flag = models.BooleanField(verbose_name="英文ファイル有無フラグ")
csv_flag = models.BooleanField(verbose_name="CSV有無フラグ")
legal_status = models.BooleanField(verbose_name="縦覧区分")
download_reserved = models.BooleanField(default=False)
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
company = models.ForeignKey(Company, on_delete=models.CASCADE, null=True)
def __str__(self):
return f"{self.doc_id} - {self.company.edinet_code}"
class Counting(models.Model):
"""
計数データ
Attributes:
period_start (DateField): The starting date of the period for which the data represents.
period_end (DateField): The ending date of the period for which the data represents.
submit_date (DateField): The date and time when the data was submitted.
avg_salary (IntegerField): The average annual salary of the company's employees in Japanese yen.
avg_tenure (FloatField): The average length of employment in years.
avg_age (FloatField): The average age of the employees in years.
number_of_employees (IntegerField): The total number of employees in the company.
created_at (DateTimeField): The date and time when the object was created.
updated_at (DateTimeField): The date and time when the object was last updated.
company (ForeignKey): A foreign key to the associated Company object.
"""
period_start = models.DateField(verbose_name="期間(自)", null=True)
period_end = models.DateField(verbose_name="期間(至)", null=True)
submit_date = models.DateField(verbose_name="提出日時")
avg_salary = models.IntegerField(verbose_name="平均年間給与(円)", null=True)
avg_tenure = models.FloatField(verbose_name="平均勤続年数(年)", null=True)
avg_age = models.FloatField(verbose_name="平均年齢(歳)", null=True)
number_of_employees = models.IntegerField(verbose_name="従業員数(人)", null=True)
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
company = models.ForeignKey(Company, on_delete=models.CASCADE, null=True)
class Meta:
unique_together = ["company", "submit_date"]
securities/templates/securities/base.html
- Settings for adding the calendar (DatePicker)
:
<!-- jQuery UI CSS for DatePicker -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/jqueryui/1.12.1/jquery-ui.css"/>
<!-- jQuery UI for DatePicker -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/jqueryui/1.12.1/jquery-ui.min.js"></script>
<script>
$(function () {
$("#start_date").datepicker({
dateFormat: "yy-mm-dd"
});
$("#end_date").datepicker({
dateFormat: "yy-mm-dd"
});
});
</script>
:
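One caveat: jQuery UI's datepicker needs jQuery itself to be loaded earlier in base.html; if `$` is undefined on the page, that's the first thing to check.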
securities/templates/securities/report/index.html
{% extends "securities/base.html" %}
{% load static %}
{% load humanize %}
{% block content %}
<div class="jumbotron">
<h1 class="display-4">Let's analyze Securities Report!</h1>
<p class="lead">it's interesting Securities Report</p>
<hr class="my-4">
<p>You can read the Securities Report</p>
</div>
<div class="container">
<h2>1. 会社マスタを作成</h2>
<a class="btn btn-outline-primary mb-3" href="{% url 'securities:edinet_code_upload' %}"
role="button">EDINETコードリスト取り込み</a>
<h2 class="mt-5">2. 書類一覧を取得</h2>
<form method="post" action="{% url 'securities:index' %}">
{% csrf_token %}
<div class="form-row">
<div class="form-group col-md-6">
<label for="start_date">開始日:</label>
<input type="text" id="start_date" name="start_date" class="form-control"
value="{{ start_date|date:'Y-m-d' }}">
{% if form.start_date.errors %}
<div class="text-danger">{{ form.start_date.errors }}</div>
{% endif %}
</div>
<div class="form-group col-md-6">
<label for="end_date">終了日:</label>
<input type="text" id="end_date" name="end_date" class="form-control"
value="{{ end_date|date:'Y-m-d' }}">
{% if form.end_date.errors %}
<div class="text-danger">{{ form.end_date.errors }}</div>
{% endif %}
</div>
</div>
<button type="submit" class="btn btn-primary">指定した期間の書類一覧を取得</button>
</form>
<h2 class="mt-5">3. 有報をダウンロード予約する</h2>
<button id="submit-for-reserve" class="btn btn-outline-primary">ダウンロード予約する</button>
<a class="btn btn-outline-info{% if request.GET.reserved == 'yes' %} active{% endif %}"
href="{% url 'securities:index' %}{% if request.GET.reserved != 'yes' %}?reserved=yes{% endif %}">
ダウンロード予約済みリスト
</a>
<table class="table table-striped table-bordered">
<thead class="bg-primary text-white">
<tr>
<th scope="col"></th>
<th scope="col">Doc ID</th>
<th scope="col">EDINET Code</th>
<th scope="col">Sec Code</th>
<th scope="col">Corp Number</th>
<th scope="col">Filer Name</th>
<th scope="col">Period Start</th>
<th scope="col">Period End</th>
<th scope="col">Submit Date Time</th>
<th scope="col">Doc Description</th>
<th scope="col">XBRL Flag</th>
</tr>
</thead>
<tbody>
{% for report_document in object_list %}
<tr>
<td>{% if request.GET.reserved != 'yes' %}
<input type="checkbox" value="{{ report_document.id }}" class="report-checkbox">{% endif %}
</td>
<td>{{ report_document.doc_id }}</td>
<td>{{ report_document.company.edinet_code }}</td>
<td>{{ report_document.company.securities_code }}</td>
<td>{{ report_document.company.corporate_number }}</td>
<td>{{ report_document.company.submitter_name }}</td>
<td>{{ report_document.period_start }}</td>
<td>{{ report_document.period_end }}</td>
<td>{{ report_document.submit_date_time }}</td>
<td>{{ report_document.doc_description }}</td>
<td>{{ report_document.xbrl_flag }}</td>
</tr>
{% empty %}
<tr>
<td colspan="10">No documents available.</td>
</tr>
{% endfor %}
</tbody>
</table>
<nav aria-label="Page navigation example">
<ul class="pagination">
{% if page_obj.has_previous %}
<li class="page-item"><a class="page-link" href="?page=1">First</a></li>
<li class="page-item"><a class="page-link"
href="?page={{ page_obj.previous_page_number }}">Previous</a></li>
{% else %}
<li class="page-item disabled"><a class="page-link" href="#">First</a></li>
<li class="page-item disabled"><a class="page-link" href="#">Previous</a></li>
{% endif %}
<li class="page-item active"><a class="page-link" href="#">{{ page_obj.number }}</a></li>
{% if page_obj.has_next %}
<li class="page-item"><a class="page-link" href="?page={{ page_obj.next_page_number }}">Next</a>
</li>
<li class="page-item"><a class="page-link" href="?page={{ page_obj.paginator.num_pages }}">Last</a>
</li>
{% else %}
<li class="page-item disabled"><a class="page-link" href="#">Next</a></li>
<li class="page-item disabled"><a class="page-link" href="#">Last</a></li>
{% endif %}
</ul>
</nav>
</div>
<script>
document.querySelector('#submit-for-reserve').addEventListener('click', function () {
const ids = Array.from(document.querySelectorAll('.report-checkbox:checked')).map(box => box.value);
fetch('{% url 'securities:download_reserve' %}', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-CSRFToken': '{{ csrf_token }}',
},
body: JSON.stringify(ids)
}).then(function (response) {
if (response.ok) {
return response.json();
}
throw new Error('Network response was not ok');
}).then(function (json) {
if (json.status === "success") {
location.reload();
}
console.log(json);
});
});
</script>
{% endblock %}
securities/urls.py
EdinetCodeUploadView,
EdinetCodeUploadSuccessView,
+ DownloadReserveView,
)
app_name = "securities"
urlpatterns = [
path("", IndexView.as_view(), name="index"),
+ path("download_reserve/", DownloadReserveView.as_view(), name="download_reserve"),
path(
"edinet_code_upload/upload",
EdinetCodeUploadView.as_view(),
securities/views.py
- Split apart what used to be one lump of processing: fetch the doc_id list, then download the zips
import json
from datetime import datetime
from dateutil.relativedelta import relativedelta
from django.core.exceptions import ObjectDoesNotExist
from django.http import JsonResponse
from django.shortcuts import redirect
from django.urls import reverse_lazy
from django.utils.decorators import method_decorator
from django.utils.timezone import now
from django.views import View
from django.views.decorators.csrf import ensure_csrf_cookie
from django.views.generic import TemplateView, FormView, ListView
from securities.domain.service.upload import UploadService
from securities.domain.service.xbrl import XbrlService
from securities.domain.valueobject.edinet import RequestData
from securities.forms import UploadForm
from securities.models import ReportDocument, Company
class IndexView(ListView):
template_name = "securities/report/index.html"
model = ReportDocument
paginate_by = 10
def get_queryset(self):
queryset = super().get_queryset()
if self.request.GET.get("reserved") == "yes":
queryset = queryset.filter(download_reserved=True)
else:
queryset = queryset.filter(download_reserved=False)
queryset = queryset.order_by("doc_id")
return queryset
def get_context_data(self, **kwargs):
context = super().get_context_data(**kwargs)
current_time = now()
context["start_date"] = current_time - relativedelta(months=2) # 2 month ago
context["end_date"] = current_time - relativedelta(days=1) # yesterday
return context
@staticmethod
def post(request, **kwargs):
if not Company.objects.exists():
return redirect("securities:index")
ReportDocument.objects.all().delete()
start_date_str = request.POST.get("start_date")
end_date_str = request.POST.get("end_date")
start_date = datetime.strptime(start_date_str, "%Y-%m-%d").date()
end_date = datetime.strptime(end_date_str, "%Y-%m-%d").date()
service = XbrlService()
report_document_list = service.fetch_report_doc_list(
RequestData(start_date=start_date, end_date=end_date)
)
ReportDocument.objects.bulk_create(report_document_list)
return redirect("securities:index")
@method_decorator(ensure_csrf_cookie, name="dispatch")
class DownloadReserveView(View):
@staticmethod
def post(request):
ids = json.loads(request.body)
for identifier in ids:
try:
doc = ReportDocument.objects.get(pk=identifier)
doc.download_reserved = True
doc.save()
except ObjectDoesNotExist:
return JsonResponse(
{"error": f"No ReportDocument exists with ID {identifier}"},
status=400,
)
return JsonResponse({"status": "success"})
class EdinetCodeUploadView(FormView):
template_name = "securities/edinet_code_upload/form.html"
form_class = UploadForm
success_url = reverse_lazy("securities:edinet_code_upload_success")
def form_valid(self, form):
service = UploadService(self.request)
service.upload()
return super().form_valid(form)
class EdinetCodeUploadSuccessView(TemplateView):
template_name = "securities/edinet_code_upload/success.html"
def get_context_data(self, **kwargs):
context = super().get_context_data(**kwargs)
# context["import_errors"] = SoilHardnessMeasurementImportErrors.objects.all()
return context
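Side note on `DownloadReserveView`: saving one row at a time works, but the same thing can probably be done with one validation query plus one UPDATE; a sketch (not what's committed):

```python
@method_decorator(ensure_csrf_cookie, name="dispatch")
class DownloadReserveView(View):
    @staticmethod
    def post(request):
        ids = json.loads(request.body)
        queryset = ReportDocument.objects.filter(pk__in=ids)
        if queryset.count() != len(set(ids)):
            return JsonResponse(
                {"error": "unknown ReportDocument id included"}, status=400
            )
        queryset.update(download_reserved=True)
        return JsonResponse({"status": "success"})
```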
TODO: arelle freezes when I run it on the Ubuntu server; I'm still asking around about it
(the cron entry on the Ubuntu server is disabled for now)
# crontab -e
25 18 * * * /var/www/html/venv/bin/python /var/www/html/portfolio/manage.py daily_download_edinet