More than 3 years have passed since last update.

Creating a CSV from the Tokyo PDF of national stage-assessment indicators (camelot)

pdfplumber
https://qiita.com/barobaro/items/75d076f4fbe9771a0b3a

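Someone on Twitter wrote that this PDF cannot be converted with camelot, so I built the CSV with camelot.

With lattice, setting "process_background=True" does extract the data, but the result is not laid out as a table, so it needs post-processing. (Too much hassle.)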

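Converting with stream

  • Specifying a table area in camelot does not narrow down the extraction range
  • With the default "edge_tol=50" the captured area includes content outside the table; raising it by 10 to "edge_tol=60" captured the table's extent
  • The row spacing is large, so I adjusted it with "row_tol=40"; I simply started at 10 and kept adding 10
  • "病床のひっ迫具合" (hospital bed strain) gets merged with "入院率" (hospitalization rate), so the Python version corrects this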
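The trial-and-error tuning described above can be sketched as a small loop over candidate values. This is only an illustration, not the procedure actually used for the article: the helper name `sweep_tolerances` and the candidate lists are hypothetical, and it assumes the PDF has already been saved locally. The `read_pdf` argument exists only so the loop can be tried with a stand-in reader; by default it uses `camelot.read_pdf`.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def sweep_tolerances(
    pdf_path: str,
    edge_tols: Iterable[int] = (50, 60),
    row_tols: Iterable[int] = (10, 20, 30, 40),
    read_pdf: Optional[Callable] = None,
) -> List[Tuple[int, int, Tuple[int, int]]]:
    """Return (edge_tol, row_tol, shape of the first table) per combination."""
    if read_pdf is None:
        # Imported lazily so the helper can also run with a stand-in reader.
        import camelot

        read_pdf = camelot.read_pdf

    row_tols = list(row_tols)  # reused for every edge_tol value
    results = []
    for edge_tol in edge_tols:
        for row_tol in row_tols:
            tables = read_pdf(
                pdf_path, flavor="stream", edge_tol=edge_tol, row_tol=row_tol
            )
            # (0, 0) marks "no table detected" for this combination.
            shape = tables[0].df.shape if len(tables) else (0, 0)
            results.append((edge_tol, row_tol, shape))
    return results


# Example (hypothetical file name, requires camelot and the PDF):
# for edge_tol, row_tol, shape in sweep_tolerances("data.pdf"):
#     print(edge_tol, row_tol, shape)
```

Comparing the reported shapes gives a quick hint of which settings produce a plausible grid; the extracted table still needs to be checked by eye.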

Download

!wget https://www.fukushihoken.metro.tokyo.lg.jp/iryo/kansen/corona_portal/info/kunishihyou.files/kuni0824.pdf -O data.pdf
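The `!wget` line above is a notebook shell escape. Outside a notebook, or where wget is not installed, the same download could be done with the Python standard library. This is a minimal sketch; the helper name `download` is hypothetical, and it assumes the URL is still reachable.

```python
import shutil
import urllib.request
from pathlib import Path

# Same URL as in the wget command above.
PDF_URL = (
    "https://www.fukushihoken.metro.tokyo.lg.jp/iryo/kansen/corona_portal/"
    "info/kunishihyou.files/kuni0824.pdf"
)


def download(url: str, dest: str = "data.pdf") -> Path:
    """Save the resource at `url` to `dest` and return the destination path."""
    dest_path = Path(dest)
    # Stream the response to disk instead of holding it all in memory.
    with urllib.request.urlopen(url) as response, dest_path.open("wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


# Example:
# download(PDF_URL)  # writes data.pdf in the current directory
```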

Command

camelot -p 1 -o data.csv -f csv stream -e 60 -r 40 data.pdf
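The command-line route does not include the "入院率" label fix that the Python version applies. One possible way to patch the exported CSV afterwards with only the standard library is sketched below. The helper names `fix_label` and `fix_csv` are hypothetical, and the sketch assumes the merged cell sits at the same position (row 6, column 0) as in the Python version, which should be verified against the actual file. The file the command writes may also carry a page/table suffix rather than being named exactly data.csv, so pass whichever file was actually produced.

```python
import csv
from typing import List, Sequence


def fix_label(
    rows: Sequence[Sequence[str]], row: int = 6, col: int = 0, label: str = "入院率"
) -> List[List[str]]:
    """Return a copy of `rows` with the cell at (row, col) replaced by `label`."""
    fixed = [list(r) for r in rows]
    fixed[row][col] = label  # assumed position, mirrors df.iat[6, 0]
    return fixed


def fix_csv(path: str, row: int = 6, col: int = 0, label: str = "入院率") -> None:
    """Rewrite the CSV at `path` in place with the corrected label."""
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(fix_label(rows, row, col, label))


# Example (hypothetical file name):
# fix_csv("data.csv")
```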

Python

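import camelot

# Stream flavor with the tolerances found above; strip_text removes
# spaces and newlines from every cell.
tables = camelot.read_pdf(
    "data.pdf", flavor="stream", edge_tol=60, row_tol=40, strip_text=" \n"
)

# The first detected table as a pandas DataFrame
df = tables[0].df

# The row label is merged with "病床のひっ迫具合", so overwrite it
df.iat[6, 0] = "入院率"

df

# Write without the index and without the numeric column header
df.to_csv("data.csv", index=False, header=False)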