
Retrieving corporate data from gBizINFO

Last updated at 2021-11-09 / Posted at 2021-10-07

Overview of gBizINFO

gBizINFO
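gBizINFO is a service in which the Ministry of Economy, Trade and Industry (METI) organizes corporate data held by the government, links it to corporate numbers, and publishes it as open data. It went live in January 2017, was renamed from "法人インフォ" (Hojin Info) to "gBizINFO" in March 2020, and has had its features expanded step by step since then. The data covers roughly 5 million registered corporations. In addition to the corporate number, corporate name, and head office address, the site lets you search and browse in one place the corporate activity information that the government holds and publishes, such as contracts with ministries and commendations. "Corporations" here includes every organization that has been assigned a corporate number, such as administrative agencies and management associations. In July 2020 the gBizINFO system was migrated to the public cloud service Amazon Web Services (AWS).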

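1. Types of data

Government-held basic corporate information, notification and certification information, commendation information, subsidy information, procurement information, patent information, financial information, and workplace information.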

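1-1. Basic corporate information

Data items: corporate number, corporate name, head office address, representative's name, representative's title, capital stock, number of employees, number of employees (male), number of employees (female), industry code, business summary, date of establishment, year of founding, last update date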

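1-2. Notification and certification information

Data items: corporate number, corporate name, head office address, approval date, notification/certification name, target, category, enterprise scale, expiration date, ministry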

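1-3. Commendation information

Data items: corporate number, corporate name, head office address, approval date, commendation name, award target, category, ministry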

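1-4. Subsidy information

Data items: corporate number, corporate name, head office address, approval date, subsidy name, amount, target, ministry, list of joint signatories, subsidy funding source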

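1-5. Procurement information

Data items: corporate number, corporate name, head office address, order date, project name, amount, ministry, list of joint signatories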

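1-6. Patent information

Data items: corporate number, corporate name, head office address, application number, application date, patent classification code, design classification code, trademark classification code, title of invention / article to which the design applies / trademark for display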

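1-7. Financial information

Data items: corporate number, corporate name, head office address, accounting standard, fiscal year, net sales, operating revenue, operating receipts, gross operating revenue, ordinary revenue, net premiums written, ordinary profit or loss, net profit or loss for the period, capital stock, net assets, total assets, number of employees, and major shareholders 1 to 5, each with its shareholding ratio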

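1-8. Workplace information

Data items: corporate number, corporate name, head office address, average years of continuous service, average age of employees, average monthly overtime hours, proportion of female workers, number of female managers, total number of managers, number of female officers, total number of officers, number eligible for childcare leave (male), number eligible for childcare leave (female), number who took childcare leave (male), number who took childcare leave (female)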

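2. REST API

How to use the API
The information-provision REST API lets you search the corporate information held by gBizINFO for the corporate numbers of corporations that match given conditions. The same API also returns the corporate information (basic corporate information, subsidy information, notification and certification information, procurement information, patent information, financial information) for a given corporate number.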

Applying for API access
Apply for API access and obtain an API token.
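
As a rough illustration of how the token is used, the sketch below only builds the URL and headers for a search request and sends nothing. The header name `X-hojinInfo-api-token` and the base URL are taken from the code later in this article. The `name` query parameter and the helper `build_search_request` are assumptions made for illustration, so check the API specification before relying on them.

```python
from urllib.parse import urlencode

# Same base URL as the code later in this article (without the trailing slash)
BASE_URL = "https://info.gbiz.go.jp/hojin/v1/hojin"


def build_search_request(token, **conditions):
    """Return (url, headers) for a search request; nothing is sent here.

    `conditions` are passed through as query parameters. Which parameter
    names the API accepts (for example `name`) is an assumption in this sketch.
    """
    headers = {
        "Accept": "application/json",
        "X-hojinInfo-api-token": token,
    }
    query = urlencode(conditions)
    url = f"{BASE_URL}?{query}" if query else BASE_URL
    return url, headers


url, headers = build_search_request("XXXXXXX", name="example")
print(url)
```

The returned pair can be handed to any HTTP client, for example `requests.get(url, headers=headers, timeout=10)`.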

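3. Retrieving corporate data

3-1. Obtaining corporate numbers

National Tax Agency Corporate Number Publication Site: full-data download
Download the full data set of all published corporations from the National Tax Agency's Corporate Number Publication Site.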
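
Before sending numbers to the API it can help to drop values that cannot be valid corporate numbers, such as the `-1` placeholders produced by `fillna(-1)` in the code later in this article. The sketch below checks the 13-digit format and the leading check digit. The formula follows my understanding of the published check-digit rule (weights 1 and 2 alternating from the lowest digit of the 12-digit base number, then 9 minus the sum modulo 9), and the helper name `is_valid_corporate_number` is hypothetical.

```python
def is_valid_corporate_number(value):
    """Return True if value is 13 ASCII digits with a matching check digit."""
    s = str(value)
    if len(s) != 13 or not (s.isascii() and s.isdigit()):
        return False
    base = s[1:]  # 12-digit base number; s[0] is the check digit
    # Positions are counted from the lowest (rightmost) digit, starting at 1.
    # Odd positions get weight 1, even positions get weight 2.
    total = sum(
        int(d) * (1 if n % 2 == 1 else 2)
        for n, d in enumerate(reversed(base), start=1)
    )
    return int(s[0]) == 9 - (total % 9)


nums = [1010001000220, -1, 1010001062178]
valid = [n for n in nums if is_valid_corporate_number(n)]
print(valid)  # [1010001000220, 1010001062178]
```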

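3-2. Retrieving data with the REST API

Use the information-provision REST API to retrieve the corporate information (basic corporate information, subsidy information, notification and certification information, procurement information, patent information, financial information) for each corporation in the full data set.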
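
Each information type is requested from its own path under the corporate number. The sketch below collects the path suffixes and response keys that the code later in this article uses, and builds the request URL for one corporation. The names `INFO_TYPES` and `build_info_url` are hypothetical helpers, not part of the API.

```python
ENDPOINT_URL = "https://info.gbiz.go.jp/hojin/v1/hojin/"

# Path suffix -> key holding the records in the response,
# as used by the code later in this article
INFO_TYPES = {
    "certification": "certification",
    "commendation": "commendation",
    "finance": "finance",
    "patent": "patent",
    "procurement": "procurement",
    "subsidy": "subsidy",
    "workplace": "workplace_info",
}


def build_info_url(corporate_number, info_type):
    """Build the URL for one corporation and one information type."""
    if info_type not in INFO_TYPES:
        raise ValueError(f"unknown information type: {info_type}")
    return f"{ENDPOINT_URL}{corporate_number}/{info_type}"


print(build_info_url(1010001000220, "subsidy"))
```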

3-3. Development environment

OS: macOS Big Sur 11.1
Python: 3.9.2
Cython: 0.29.24

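3-4. Fetching the data

Read the full list of corporations and fetch the data for each corporate number from the gBizINFO REST API. Because loops in pure Python are slow, Cython is used here as a simple workaround. Note, however, that each iteration is dominated by the HTTP round trip, so the gain from Cython is likely to be small.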
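
Since most of the time in such a loop is spent waiting for HTTP responses, one alternative to Cython is to overlap the requests with a small thread pool. This is a different technique from the one used in this article, sketched here with only the standard library. `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real HTTP call so that the example runs offline. Keep `max_workers` small and check the API's terms of use for any rate limits.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(corp_nums, fetch_one, max_workers=4):
    """Call fetch_one(number) for every number using a thread pool.

    Results keep the input order; a call that raises yields None.
    """
    def safe_fetch(number):
        try:
            return fetch_one(number)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_fetch, corp_nums))


def fake_fetch(number):
    """Stand-in for an HTTP request; rejects negative placeholder numbers."""
    if number < 0:
        raise ValueError("invalid corporate number")
    return {"corporate_number": str(number)}


results = fetch_all([1010001000220, -1, 1010001062178], fake_fetch)
print(results)
```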

1. Enabling Cython

%load_ext Cython

2. Basic syntax

%%cython
# %%cython is a cell magic, so it must be the first line of the cell
import requests
import pandas as pd
import numpy as np
from tqdm import tqdm

# API token (replace with the token issued to you on application)
token = 'XXXXXXX'

# Load the full list of corporations
corp_nums = pd.read_csv("00_zenkoku_all_20210930.csv", header=None, sep=',', low_memory=False)
# Limit to 1,000 corporations for testing (remove the 1000 to fetch all)
corp_nums = list(corp_nums.iloc[0:1000,1].fillna(-1).astype(np.int64))

# Basic-information fields attached to every record
meta_cols = ['corporate_number', 'name', 'representative_name', 'postal_code',
             'location', 'capital_stock', 'employee_number', 'business_items',
             'update_date', 'date_of_establishment']

class gbizinfo:
    def __init__(self, token):
        self.headers = {
            "Accept": "application/json",
            "X-hojinInfo-api-token": token
        }
        self.endpoint_url = 'https://info.gbiz.go.jp/hojin/v1/hojin/'

    # Shared loop: request <endpoint_url><corporate number>/<path> for each number
    def _get_records(self, list corp_nums, path, record_key):
        cdef int i
        cdef list results
        results = []

        for i in tqdm(range(len(corp_nums))):
            try:
                r = requests.get(
                    url = self.endpoint_url + str(corp_nums[i]) + '/' + path,
                    headers = self.headers,
                    timeout = 10
                    )
                res = r.json()["hojin-infos"][0]
            except Exception:
                # Skip corporations whose request or response parsing failed
                continue
            # Keep only responses that contain the record key; json_normalize
            # raises KeyError for elements that lack the record path
            if record_key in res:
                results.append(res)
        df = pd.json_normalize(results, record_key, meta_cols, errors='ignore')
        return df

    # Notification and certification information
    def _get_corporate_certification(self, list corp_nums):
        return self._get_records(corp_nums, 'certification', 'certification')

    # Commendation information
    def _get_corporate_commendation(self, list corp_nums):
        return self._get_records(corp_nums, 'commendation', 'commendation')

    # Financial information
    def _get_corporate_finance(self, list corp_nums):
        return self._get_records(corp_nums, 'finance', 'finance')

    # Patent information
    def _get_corporate_patent(self, list corp_nums):
        return self._get_records(corp_nums, 'patent', 'patent')

    # Procurement information
    def _get_corporate_procurement(self, list corp_nums):
        return self._get_records(corp_nums, 'procurement', 'procurement')

    # Subsidy information
    def _get_corporate_subsidy(self, list corp_nums):
        return self._get_records(corp_nums, 'subsidy', 'subsidy')

    # Workplace information
    def _get_corporate_workplace(self, list corp_nums):
        return self._get_records(corp_nums, 'workplace', 'workplace_info')
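
To make the `pd.json_normalize(results, <record key>, [<meta fields>])` call above easier to follow, here is a rough pure-Python sketch of what it produces: one row per record in the list under the record key, with the corporation's basic fields copied onto each row. The payload is made up for illustration and `flatten_records` is a hypothetical helper. Unlike this sketch, `json_normalize` also flattens nested dictionaries into dotted column names.

```python
def flatten_records(hojin_infos, record_key, meta_keys):
    """Return one dict per record, with the parent's meta fields attached."""
    rows = []
    for info in hojin_infos:
        # Corporations without the record key contribute no rows
        for record in info.get(record_key) or []:
            row = dict(record)
            for key in meta_keys:
                row[key] = info.get(key)  # missing meta fields become None
            rows.append(row)
    return rows


# Made-up entries shaped like the "hojin-infos" items used above
sample = [
    {
        "corporate_number": "0000000000001",
        "name": "Example Corp",
        "subsidy": [
            {"title": "Program A", "amount": "100"},
            {"title": "Program B", "amount": "200"},
        ],
    },
    {"corporate_number": "0000000000002", "name": "No Records Inc"},
]

rows = flatten_records(sample, "subsidy", ["corporate_number", "name"])
for row in rows:
    print(row)
```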

3. Execution (retrieving subsidy information as an example)

# Retrieve notification and certification information
# df_gbizinfo_certification = gbizinfo(token=token)._get_corporate_certification(corp_nums)
# cols = ['corporate_number', 'name', 'representative_name', 'postal_code', 'location', 
#        'capital_stock', 'employee_number', 'business_items', 'update_date', 
#        'date_of_establishment', 'category', 'date_of_approval', 'enterprise_scale', 
#        'expiration_date', 'government_departments', 'target', 'title']
# df_gbizinfo_certification = df_gbizinfo_certification[cols]
# print(df_gbizinfo_certification)
# df_gbizinfo_certification.to_csv('gbizinfo_certification.csv', encoding='utf-8-sig')

# Retrieve commendation information
# df_gbizinfo_commendation = gbizinfo(token=token)._get_corporate_commendation(corp_nums)
# cols = ['corporate_number', 'name', 'representative_name', 'postal_code', 'location', 
#        'capital_stock', 'employee_number', 'business_items', 'update_date', 
#        'date_of_establishment', 'date_of_commendation', 'title', 'target', 'category', 
#        'government_departments']
# df_gbizinfo_commendation = df_gbizinfo_commendation[cols]
# print(df_gbizinfo_commendation)
# df_gbizinfo_commendation.to_csv('gbizinfo_commendation.csv', encoding='utf-8-sig')

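# Retrieve financial information
# (the column list below appears to be copied from the subsidy example;
#  adjust it to the fields actually returned for finance before use)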
# df_gbizinfo_finance = gbizinfo(token=token)._get_corporate_finance(corp_nums)
# cols = ['corporate_number', 'name', 'representative_name', 'postal_code', 'location', 
#        'capital_stock', 'employee_number', 'business_items', 'update_date', 
#        'date_of_establishment', 'date_of_approval', 'title', 'amount', 'target', 
#        'government_departments', 'note', 'joint_signatures', 'subsidy_resource']
# df_gbizinfo_finance = df_gbizinfo_finance[cols]
# print(df_gbizinfo_finance)
# df_gbizinfo_finance.to_csv('gbizinfo_finance2.csv', encoding='utf-8-sig')

# Retrieve patent information
# df_gbizinfo_patent = gbizinfo(token=token)._get_corporate_patent(corp_nums)
# cols = ['corporate_number', 'name', 'representative_name', 'postal_code', 'location', 
#        'capital_stock', 'employee_number', 'business_items', 'update_date', 
#        'date_of_establishment', 'application_date', 'application_number', 
#        'classifications', 'patent_type', 'title']
# df_gbizinfo_patent = df_gbizinfo_patent[cols]
# print(df_gbizinfo_patent)
# df_gbizinfo_patent.to_csv('gbizinfo_patent.csv', encoding='utf-8-sig')

# Retrieve procurement information
# df_gbizinfo_procurement = gbizinfo(token=token)._get_corporate_procurement(corp_nums)
# cols = ['corporate_number', 'name', 'representative_name', 'postal_code', 'location', 
#        'capital_stock', 'employee_number', 'business_items', 'update_date', 
#        'date_of_establishment', 'date_of_order', 'title', 'amount', 
#        'government_departments']
# df_gbizinfo_procurement = df_gbizinfo_procurement[cols]
# print(df_gbizinfo_procurement)
# df_gbizinfo_procurement.to_csv('gbizinfo_procurement.csv', encoding='utf-8-sig')

# Retrieve subsidy information
df_gbizinfo_subsidy = gbizinfo(token=token)._get_corporate_subsidy(corp_nums)
cols = ['corporate_number', 'name', 'representative_name', 'postal_code', 'location', 
        'capital_stock', 'employee_number', 'business_items', 'update_date', 
        'date_of_establishment', 'date_of_approval', 'title', 'amount', 'target', 
        'government_departments', 'note', 'joint_signatures', 'subsidy_resource']
df_gbizinfo_subsidy = df_gbizinfo_subsidy[cols]
print(df_gbizinfo_subsidy)
df_gbizinfo_subsidy.to_csv('gbizinfo_subsidy.csv', encoding='utf-8-sig')

# Retrieve workplace information
# df_gbizinfo_workplace = gbizinfo(token=token)._get_corporate_workplace(corp_nums)
# cols = ['corporate_number', 'name', 'representative_name', 'postal_code', 'location', 
#        'capital_stock', 'employee_number', 'business_items', 'update_date', 
#        'date_of_establishment', 'base_infos', 'compatibility_of_childcare_and_work', 
#        'women_activity_infos']
# df_gbizinfo_workplace = df_gbizinfo_workplace[cols]
# print(df_gbizinfo_workplace)
# df_gbizinfo_workplace.to_csv('gbizinfo_workplace.csv', encoding='utf-8-sig')

4. Sample output

A sample of the retrieved subsidy information is shown below. These rows were extracted purely as a data sample and carry no particular intent.

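Corporate number	Approval date	Subsidy name	Amount	Ministry
1010001000220	2020/06/05	Emergency Measures Project for Promoting Sales of Marine Products	¥6,177,130,640	MAFF
1010001062178	2020/05/22	Emergency Measures Project for Promoting Sales of Domestic Agricultural, Forestry and Fishery Products	¥10,208,376,000	MAFF
1010001112841	2011/01/01	Commissioned Expenses for Coordinating the Promotion of Nuclear Facility Siting	¥945,000	METI
1010001008767	2018/06/05	International Competitiveness Enhancement Promotion Project	¥14,040,000	MLIT
1010001008767	2019/03/27	International Competitiveness Enhancement Promotion Project	¥36,301,000	MLIT
1010001008767	2019/08/06	International Competitiveness Enhancement Promotion Project	¥16,595,000	MLIT
1010001008767	2020/03/27	International Competitiveness Enhancement Promotion Project	¥42,648,000	MLIT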
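
As a small follow-up, the sample rows above can be totalled per corporation once the amounts are parsed into integers. The sketch below copies the corporate numbers and amounts from the sample. `parse_yen` is a hypothetical helper that assumes amounts formatted like `¥1,234`.

```python
from collections import defaultdict

# (corporate number, amount) pairs copied from the sample above
rows = [
    ("1010001000220", "¥6,177,130,640"),
    ("1010001062178", "¥10,208,376,000"),
    ("1010001112841", "¥945,000"),
    ("1010001008767", "¥14,040,000"),
    ("1010001008767", "¥36,301,000"),
    ("1010001008767", "¥16,595,000"),
    ("1010001008767", "¥42,648,000"),
]


def parse_yen(text):
    """Convert a string such as '¥945,000' to the integer 945000."""
    return int(text.replace("¥", "").replace(",", ""))


totals = defaultdict(int)
for number, amount in rows:
    totals[number] += parse_yen(amount)

# Print corporations from the largest total to the smallest
for number, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(number, f"¥{total:,}")
```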

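5. Comments

Procurement and subsidy information were the most complete, and some of the data was interesting. For example, Otsuka Shokai (大塚商会) received an order of about 20 million yen for ink cartridges from the Ministry of Defense, and Nihon Shokuryo Shimbun (日本食糧新聞社) received about 10.2 billion yen in subsidies from the Ministry of Agriculture, Forestry and Fisheries (MAFF).
For commendation information, it was disappointing that the commendation year was missing in many records.
Workplace information is interesting in content, but at present a large share of the data is missing.
The fact that information on unlisted companies is also included is very useful, and further expansion of the data would be welcome.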
