More than 5 years have passed since last update.

【pandasのcsv基本操作】csvの読み取り列行削除 xlsx出力

Last updated at 2018-11-02Posted at 2018-11-02

csvで読み込んだデータを自動で加工する業務が増えてきました。
そこで、最近はpandasを利用して自動化を最近お試してやってます。

時間開くとどうやって使うんだっけ・・・？と良くなるので自分のメモとして
pandasを利用したcsvの読み取り、列行削除、xlsxへの出力までの流れを簡単にまとめました

CSVの読み込み

まず列の名前を適当につけます。
名前つけてread_csvしないとなぜかエラーに・・・
そのあと、pd.read_csvで読み込み。
engine=pythonをつけると名前が日本語のcsvでも読み取れます。

import pandas as pd
col_names = [ 'c{0:02d}'.format(i) for i in range(11) ] #Name the column
df = pd.read_csv("./hoge.csv", engine='python', names=col_names) #file read
df.head(6)

	c00	c01	c02	c03	c04	c05	c06	c07	c08	c09	c10
0	< Measurement Data Text Output >	None	None	None	None	None	None	None	None	None	NaN
1	Date/Time	@date()	ch.1	ch.2	ch.3	ch.4	ch.5	ch.6	ch.7	ch.8	NaN
2	NaN	NaN	ﾟC	%RH	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	2018/04/16 12:34'15	43206.5237847222	23.5000	25.0000	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	2018/04/16 12:36'15	43206.5251736111	23.6000	25.0000	NaN	NaN	NaN	NaN	NaN	NaN	NaN
5	2018/04/16 12:38'15	43206.5265625000	23.0000	25.0000	NaN	NaN	NaN	NaN	NaN	NaN	NaN

列の削除操作

4列目以降を削除します。

drop_row = ['c{0:02d}'.format(i) for i in range(3,11)] #select row number
df = df.drop(drop_row, axis=1) #delete row
df.head(6)

	c00	c01	c02
0	< Measurement Data Text Output >	None	None
1	Date/Time	@date()	ch.1
2	NaN	NaN	ﾟC
3	2018/04/16 12:34'15	43206.5237847222	23.5000
4	2018/04/16 12:36'15	43206.5251736111	23.6000
5	2018/04/16 12:38'15	43206.5265625000	23.0000

行の削除操作

1,2行を削除します。

df = df.drop([0,1], axis=0)#delete col
df.head(6)

	c00	c01	c02
2	NaN	NaN	ﾟC
3	2018/04/16 12:34'15	43206.5237847222	23.5000
4	2018/04/16 12:36'15	43206.5251736111	23.6000
5	2018/04/16 12:38'15	43206.5265625000	23.0000
6	2018/04/16 12:40'15	43206.5279513889	22.7000
7	2018/04/16 12:42'15	43206.5293402778	22.7000

行番号振り直し

行を削除して番号がバラバラだとあとあと面倒になりそうなので、番号振り直し

df = df.reset_index(drop=True) #Renumber
df.head(6)

	c00	c01	c02
0	NaN	NaN	ﾟC
1	2018/04/16 12:34'15	43206.5237847222	23.5000
2	2018/04/16 12:36'15	43206.5251736111	23.6000
3	2018/04/16 12:38'15	43206.5265625000	23.0000
4	2018/04/16 12:40'15	43206.5279513889	22.7000
5	2018/04/16 12:42'15	43206.5293402778	22.7000

xlsx形式で出力する

import openpyxl
df.to_excel('./test.xlsx', index=False, header=False) #file save

まとめ

import pandas as pd
col_names = [ 'c{0:02d}'.format(i) for i in range(11) ] #Name the column
df = pd.read_csv("./hoge.csv", engine='python', names=col_names) #file read

drop_row = ['c{0:02d}'.format(i) for i in range(3,11)] #select row number
df = df.drop(drop_row, axis=1) #delete row

df = df.drop([0,1], axis=0)#delete col
df = df.reset_index(drop=True) #Renumber

import openpyxl
df.to_excel('./test.xlsx', index=False, header=False) #file save

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

【pandasのcsv基本操作】csvの読み取り 列行削除 xlsx出力