LoginSignup
12
11

More than 5 years have passed since last update.

【pandasのcsv基本操作】csvの読み取り 列行削除 xlsx出力

Last updated at Posted at 2018-11-02

csvで読み込んだデータを自動で加工する業務が増えてきました。
そこで、最近はpandasを利用して自動化を最近お試してやってます。

時間開くとどうやって使うんだっけ・・・?と良くなるので自分のメモとして
pandasを利用したcsvの読み取り、列行削除、xlsxへの出力までの流れを簡単にまとめました

CSVの読み込み

まず列の名前を適当につけます。
名前つけてread_csvしないとなぜかエラーに・・・
そのあと、pd.read_csvで読み込み。
engine=pythonをつけると名前が日本語のcsvでも読み取れます。

import pandas as pd
col_names = [ 'c{0:02d}'.format(i) for i in range(11) ] #Name the column
df = pd.read_csv("./hoge.csv", engine='python', names=col_names) #file read
df.head(6)
c00 c01 c02 c03 c04 c05 c06 c07 c08 c09 c10
0 < Measurement Data Text Output > None None None None None None None None None NaN
1 Date/Time @date() ch.1 ch.2 ch.3 ch.4 ch.5 ch.6 ch.7 ch.8 NaN
2 NaN NaN ゚C %RH NaN NaN NaN NaN NaN NaN NaN
3 2018/04/16 12:34'15 43206.5237847222 23.5000 25.0000 NaN NaN NaN NaN NaN NaN NaN
4 2018/04/16 12:36'15 43206.5251736111 23.6000 25.0000 NaN NaN NaN NaN NaN NaN NaN
5 2018/04/16 12:38'15 43206.5265625000 23.0000 25.0000 NaN NaN NaN NaN NaN NaN NaN

列の削除操作

4列目以降を削除します。

drop_row = ['c{0:02d}'.format(i) for i in range(3,11)] #select row number
df = df.drop(drop_row, axis=1) #delete row
df.head(6)
c00 c01 c02
0 < Measurement Data Text Output > None None
1 Date/Time @date() ch.1
2 NaN NaN ゚C
3 2018/04/16 12:34'15 43206.5237847222 23.5000
4 2018/04/16 12:36'15 43206.5251736111 23.6000
5 2018/04/16 12:38'15 43206.5265625000 23.0000

行の削除操作

1,2行を削除します。

df = df.drop([0,1], axis=0)#delete col
df.head(6)
c00 c01 c02
2 NaN NaN ゚C
3 2018/04/16 12:34'15 43206.5237847222 23.5000
4 2018/04/16 12:36'15 43206.5251736111 23.6000
5 2018/04/16 12:38'15 43206.5265625000 23.0000
6 2018/04/16 12:40'15 43206.5279513889 22.7000
7 2018/04/16 12:42'15 43206.5293402778 22.7000

行番号振り直し

行を削除して番号がバラバラだとあとあと面倒になりそうなので、番号振り直し

df = df.reset_index(drop=True) #Renumber
df.head(6)
c00 c01 c02
0 NaN NaN ゚C
1 2018/04/16 12:34'15 43206.5237847222 23.5000
2 2018/04/16 12:36'15 43206.5251736111 23.6000
3 2018/04/16 12:38'15 43206.5265625000 23.0000
4 2018/04/16 12:40'15 43206.5279513889 22.7000
5 2018/04/16 12:42'15 43206.5293402778 22.7000

xlsx形式で出力する

import openpyxl
df.to_excel('./test.xlsx', index=False, header=False) #file save

  
  

まとめ

import pandas as pd
col_names = [ 'c{0:02d}'.format(i) for i in range(11) ] #Name the column
df = pd.read_csv("./hoge.csv", engine='python', names=col_names) #file read

drop_row = ['c{0:02d}'.format(i) for i in range(3,11)] #select row number
df = df.drop(drop_row, axis=1) #delete row

df = df.drop([0,1], axis=0)#delete col
df = df.reset_index(drop=True) #Renumber

import openpyxl
df.to_excel('./test.xlsx', index=False, header=False) #file save


12
11
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
12
11