Visualizing employment trends by sector in Japan's information and communications industry

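This article visualizes the trend in the number of employees by sector in Japan's information and communications industry, using the data tables from the 2020 (Reiwa 2) edition of the Information and Communications White Paper (情報通信白書).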

Download

wget https://www.soumu.go.jp/johotsusintokei/whitepaper/ja/r02/pdf/02data.pdf -O data.pdf

Program

import pandas as pd
import pdfplumber

with pdfplumber.open("data.pdf") as pdf:

    page = pdf.pages[4]

    # Crop the page to the region that contains the table
    crop = page.within_bbox((50, 100, page.width - 50, page.height - 400))

    table_settings = {
        "vertical_strategy": "lines",
        "horizontal_strategy": "lines",
        "intersection_x_tolerance": 15,
    }

    im = crop.to_image()

    # Draw the detected table lines and cells for a visual check
    im.debug_tablefinder(table_settings)

    table = crop.extract_table(table_settings)

    # Label the first two header cells: 中分類 (middle category) and 小分類 (sub-category)
    table[0][0] = "中分類"
    table[0][1] = "小分類"

    df = pd.DataFrame(table[1:], columns=table[0])

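    # Keep only the numbered sector rows (first cell starts with 1-9)
    # and drop the sub-category column
    df1 = df[
        df["中分類"].str.startswith(("1", "2", "3", "4", "5", "6", "7", "8", "9"))
    ].drop("小分類", axis=1)

    # Strip the leading number and dot from each sector name
    df1["中分類"] = df1["中分類"].str.lstrip("123456789.")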

    df1.set_index("中分類", inplace=True)

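    # Remove thousands separators and convert to integers
    # (pandas 2.1+ deprecates applymap in favor of DataFrame.map)
    df2 = df1.applymap(lambda s: s.replace(",", "")).astype(int)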

    df2
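
The row filter and number cleaning in the program can be tried without the PDF. The following is a minimal sketch in plain Python, using made-up rows (the labels and figures are hypothetical, not taken from the white paper) and a hypothetical helper `clean_rows`. It mirrors the same three steps: keep rows whose label starts with a digit 1 to 9, strip the leading number and dot, and turn comma-separated figures into integers.

```python
def clean_rows(rows):
    """Return {label: [int, ...]} for rows whose first cell starts with 1-9."""
    digits = tuple("123456789")
    cleaned = {}
    for label, *values in rows:
        # Skip rows that are not numbered sector rows (e.g. subtotals)
        if not label.startswith(digits):
            continue
        # Strip the leading number and dot, as str.lstrip does in the program
        name = label.lstrip("123456789.")
        # Remove thousands separators before converting to int
        cleaned[name] = [int(v.replace(",", "")) for v in values]
    return cleaned


# Hypothetical sample rows, for illustration only
sample_rows = [
    ["1.Sector A", "1,234", "1,300"],
    ["Subtotal", "9,999", "9,999"],
    ["2.Sector B", "567", "1,000,001"],
]
result = clean_rows(sample_rows)
print(result)
```

Note that `lstrip` removes a set of characters rather than a prefix, and `0` is not in that set, so a label such as `10.Sector J` would come out as `0.Sector J`. This only matters if a table has ten or more numbered rows.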

Table detection result

table.png
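
The crop region passed to `within_bbox` is a tuple `(x0, top, x1, bottom)`, measured in PDF points from the top-left corner of the page. As a rough sketch of the arithmetic, assuming an A4 portrait page of 595 x 842 points (an assumption; the real size comes from `page.width` and `page.height`), the hypothetical helper `crop_box` below reproduces the margins used in the program.

```python
def crop_box(width, height, left=50, top=100, right=50, bottom=400):
    """Build an (x0, top, x1, bottom) box by trimming margins from a page."""
    return (left, top, width - right, height - bottom)


# Assumed A4 portrait size in points; not read from the actual PDF
box = crop_box(595, 842)
print(box)  # (50, 100, 545, 442)
```

Under that assumption the crop keeps a region 495 points wide and 342 points tall near the top of the page. If the detected table in the debug image is cut off, widening these margins is the first thing to adjust.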

Visualization

import japanize_matplotlib
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.rcParams["figure.dpi"] = 200

df2.T.plot(grid=True)

plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left", borderaxespad=0, fontsize=8)

# Save the graph
plt.savefig("01.png", dpi=200, bbox_inches="tight")
plt.show()

01.png
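
`DataFrame.plot` draws one line per column and uses the index as the x-axis, which is presumably why the program transposes `df2` before plotting: after `.T`, the years become the index and each sector becomes a column. The orientation change can be illustrated in plain Python with made-up figures (the sector names, years, and numbers below are hypothetical).

```python
# Hypothetical data laid out like df2: one row per sector, one value per year
years = ["2015", "2016", "2017"]
by_sector = {
    "Sector A": [10, 11, 12],
    "Sector B": [20, 19, 18],
}

# Transpose: one entry per year, holding that year's value for each sector
by_year = {
    year: dict(zip(by_sector, column))
    for year, column in zip(years, zip(*by_sector.values()))
}
print(by_year["2016"])
```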
