csvで読み込んだデータを自動で加工する業務が増えてきました。
そこで、最近はpandasを利用して自動化を最近お試してやってます。
時間開くとどうやって使うんだっけ・・・?と良くなるので自分のメモとして
pandasを利用したcsvの読み取り、列行削除、xlsxへの出力までの流れを簡単にまとめました
#CSVの読み込み
まず列の名前を適当につけます。
名前つけてread_csvしないとなぜかエラーに・・・
そのあと、pd.read_csvで読み込み。
engine=pythonをつけると名前が日本語のcsvでも読み取れます。
import pandas as pd
col_names = [ 'c{0:02d}'.format(i) for i in range(11) ] #Name the column
df = pd.read_csv("./hoge.csv", engine='python', names=col_names) #file read
df.head(6)
c00 | c01 | c02 | c03 | c04 | c05 | c06 | c07 | c08 | c09 | c10 | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | < Measurement Data Text Output > | None | None | None | None | None | None | None | None | None | NaN |
1 | Date/Time | @date() | ch.1 | ch.2 | ch.3 | ch.4 | ch.5 | ch.6 | ch.7 | ch.8 | NaN |
2 | NaN | NaN | ゚C | %RH | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 2018/04/16 12:34'15 | 43206.5237847222 | 23.5000 | 25.0000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 2018/04/16 12:36'15 | 43206.5251736111 | 23.6000 | 25.0000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 | 2018/04/16 12:38'15 | 43206.5265625000 | 23.0000 | 25.0000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
#列の削除操作
4列目以降を削除します。
drop_row = ['c{0:02d}'.format(i) for i in range(3,11)] #select row number
df = df.drop(drop_row, axis=1) #delete row
df.head(6)
c00 | c01 | c02 | |
---|---|---|---|
0 | < Measurement Data Text Output > | None | None |
1 | Date/Time | @date() | ch.1 |
2 | NaN | NaN | ゚C |
3 | 2018/04/16 12:34'15 | 43206.5237847222 | 23.5000 |
4 | 2018/04/16 12:36'15 | 43206.5251736111 | 23.6000 |
5 | 2018/04/16 12:38'15 | 43206.5265625000 | 23.0000 |
#行の削除操作
1,2行を削除します。
df = df.drop([0,1], axis=0)#delete col
df.head(6)
c00 | c01 | c02 | |
---|---|---|---|
2 | NaN | NaN | ゚C |
3 | 2018/04/16 12:34'15 | 43206.5237847222 | 23.5000 |
4 | 2018/04/16 12:36'15 | 43206.5251736111 | 23.6000 |
5 | 2018/04/16 12:38'15 | 43206.5265625000 | 23.0000 |
6 | 2018/04/16 12:40'15 | 43206.5279513889 | 22.7000 |
7 | 2018/04/16 12:42'15 | 43206.5293402778 | 22.7000 |
#行番号振り直し
行を削除して番号がバラバラだとあとあと面倒になりそうなので、番号振り直し
df = df.reset_index(drop=True) #Renumber
df.head(6)
c00 | c01 | c02 | |
---|---|---|---|
0 | NaN | NaN | ゚C |
1 | 2018/04/16 12:34'15 | 43206.5237847222 | 23.5000 |
2 | 2018/04/16 12:36'15 | 43206.5251736111 | 23.6000 |
3 | 2018/04/16 12:38'15 | 43206.5265625000 | 23.0000 |
4 | 2018/04/16 12:40'15 | 43206.5279513889 | 22.7000 |
5 | 2018/04/16 12:42'15 | 43206.5293402778 | 22.7000 |
#xlsx形式で出力する
import openpyxl
df.to_excel('./test.xlsx', index=False, header=False) #file save
#まとめ
import pandas as pd
col_names = [ 'c{0:02d}'.format(i) for i in range(11) ] #Name the column
df = pd.read_csv("./hoge.csv", engine='python', names=col_names) #file read
drop_row = ['c{0:02d}'.format(i) for i in range(3,11)] #select row number
df = df.drop(drop_row, axis=1) #delete row
df = df.drop([0,1], axis=0)#delete col
df = df.reset_index(drop=True) #Renumber
import openpyxl
df.to_excel('./test.xlsx', index=False, header=False) #file save