Download
wget https://aivoice.jp/pdf/exVOICE_kotonoha.pdf
Converting from the command line with tabula-java
Install Java
# Download tabula-java
wget https://github.com/tabulapdf/tabula-java/releases/download/v1.0.4/tabula-1.0.4-jar-with-dependencies.jar -O tabula.jar
# -o: output file, -p: page range, -l: lattice mode (tables with ruled cell borders)
java -jar tabula.jar -o akane.csv -p 1-7 -l exVOICE_kotonoha.pdf
java -jar tabula.jar -o aoi.csv -p 8-14 -l exVOICE_kotonoha.pdf
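If you would rather drive the two tabula-java runs from Python, one option is to build each command line as a list and pass it to `subprocess.run`. The page ranges (1-7 to akane.csv, 8-14 to aoi.csv) and file names are taken from the commands above; the helper names `tabula_command` and `convert_all` and the `PAGE_RANGES` table are hypothetical, and the sketch assumes `java` is on the PATH and `tabula.jar` is in the current directory.

```python
from __future__ import annotations

import subprocess

# Page ranges taken from the commands above (output name -> pages)
PAGE_RANGES = {"akane": "1-7", "aoi": "8-14"}


def tabula_command(jar: str, pdf: str, pages: str, out: str) -> list[str]:
    """Build one tabula-java command line (hypothetical helper).

    -o: output file, -p: page range, -l: lattice mode
    """
    return ["java", "-jar", jar, "-o", out, "-p", pages, "-l", pdf]


def convert_all(jar: str = "tabula.jar", pdf: str = "exVOICE_kotonoha.pdf") -> None:
    """Run tabula-java once per page range; raises CalledProcessError if java fails."""
    for name, pages in PAGE_RANGES.items():
        subprocess.run(tabula_command(jar, pdf, pages, f"{name}.csv"), check=True)
```

Calling `convert_all()` should produce akane.csv and aoi.csv, just like running the two commands by hand. Passing a list instead of a single string avoids shell quoting issues with file names.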
Converting with pdfplumber
Installation
pip install pandas
pip install pdfplumber
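Convert to CSV
import pdfplumber
import pandas as pd

# Collect the rows of every page's table into one flat list
with pdfplumber.open("exVOICE_kotonoha.pdf") as pdf:
    data = []
    for page in pdf.pages:
        table = page.extract_table()
        if table:  # extract_table() returns None when a page has no table
            data.extend(table)

# Use the first row (the header) as the column names
dft = pd.DataFrame(data, columns=data[0])
# Every row whose "通しNo" cell equals the header text starts a new table
dfg = dft.groupby((dft["通しNo"] == "通しNo").cumsum())
# Drop each group's header row and renumber the index from 0
dfs = [g.iloc[1:].reset_index(drop=True) for _, g in dfg]
# Write each table to its own CSV; utf_8_sig adds a BOM so Excel reads the Japanese text correctly
for i, df in enumerate(dfs):
    df.to_csv(f"a_i_voice{i}.csv", encoding="utf_8_sig")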
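The pandas step above splits the flat row list wherever the header row ("通しNo") repeats. As a pandas-free alternative, the same idea can be sketched with only the standard library: locate the "通しNo" column in the first row, start a new table at every row whose cell in that column equals the header text, and write each table with the `csv` module. The function names `split_at_headers` and `write_tables` are hypothetical, and unlike `DataFrame.to_csv` this version does not write an index column.

```python
import csv


def split_at_headers(rows, key="通しNo"):
    """Split a flat list of rows into tables at each repeated header row.

    The first row must be a header containing `key`. Header rows are dropped,
    mirroring g.iloc[1:] in the pandas version. Returns (header, tables).
    """
    header = rows[0]
    col = header.index(key)  # position of the "通しNo" column
    tables = []
    for row in rows:
        if row[col] == key:
            tables.append([])  # a header row opens a new table
        else:
            tables[-1].append(row)
    return header, tables


def write_tables(header, tables, prefix="a_i_voice"):
    """Write each table to <prefix><i>.csv and return the list of paths."""
    paths = []
    for i, table in enumerate(tables):
        path = f"{prefix}{i}.csv"
        # utf-8-sig writes a BOM so Excel detects the UTF-8 Japanese text
        with open(path, "w", newline="", encoding="utf-8-sig") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(table)  # None cells from pdfplumber are written as empty fields
        paths.append(path)
    return paths
```

With `data` collected by pdfplumber as above, `header, tables = split_at_headers(data)` followed by `write_tables(header, tables)` should give the same split as the pandas code.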