
Python: CSV file encoding conversion, file name extraction, reading, output, and merge operations

Posted at 2019-12-16

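I could not find a single resource that covered the automated batch CSV input/output
and conversion process I built, so this is a memo for my own reference.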

Library imports

import csv
import pandas as pd
import numpy as np
import os
import glob
from pathlib import Path

Convert all Shift-JIS (sjis) files in a folder to UTF-8 and write them to a different folder

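###################################################
# Convert the Shift JIS files in a given folder to UTF-8.
# Read everything in the "in" folder, leave the originals untouched,
# and save the results to the "out" folder (which must already exist).
print("Shift JIS to UTF-8 Start!")
p = Path("./in")
files = list(p.glob("*.csv"))
for file in files:
    shift_jis_file = pd.read_csv(file, encoding='Shift_JISx0213')
    file_path = f'out/{file.name}'
    # to_csv writes UTF-8 by default; the row index is also written as the first column
    shift_jis_file.to_csv(file_path, encoding='utf-8')
print("Shift JIS to UTF-8 End!")
###################################################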
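
If the goal is only to change the encoding, the bytes can also be transcoded directly without parsing the file as CSV, which leaves the content and line endings otherwise unchanged. The following is a minimal sketch under that assumption; the helper name `transcode_folder` and its parameters are illustrative and not part of the script above.

```python
from pathlib import Path


def transcode_folder(src_dir, dst_dir, src_encoding="shift_jisx0213"):
    """Copy every *.csv in src_dir to dst_dir, re-encoded as UTF-8.

    Hypothetical helper: the originals in src_dir are left untouched.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    written = []
    for path in sorted(src.glob("*.csv")):
        # Decode the raw bytes with the source encoding, then encode as UTF-8.
        # Working on bytes keeps the original line endings as they are.
        text = path.read_bytes().decode(src_encoding)
        target = dst / path.name
        target.write_bytes(text.encode("utf-8"))
        written.append(target)
    return written
```

Calling `transcode_folder("./in", "./out")` would mirror the folder layout used in this memo. Unlike the pandas version, this does not add an index column to the output.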

Reading the names of the files to process
(here, the names of the files to process are listed in filelist.csv)

# Read the list of files to process
# (file names only, in a single column; read_csv treats the first line as the header)
filelist = pd.read_csv('./filelist.csv')

# Output folder for the converted files and path of the merged file
# (defined before the loop because the merge step after the loop also uses them)
outputfolder = './converted/'
mergedpath = './merged/tsestockdata.csv'

# Loop over the files
for index, row in filelist.iterrows():

    # Read the file name to process, one row at a time.
    # row.iloc[0] returns the bare file name, so no brackets or quotes need to be stripped.
    a_list = str(row.iloc[0])

    # Input and output paths: join the folder name and the file name
    inputpath = './out/' + a_list
    outputpath = outputfolder + a_list
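
The path handling above can also be written with `pathlib`, which joins folder and file name without any string clean-up. This is a minimal sketch assuming the same folder layout (`./out` and `./converted`); `build_paths` is a hypothetical helper name, and the sample file name with a date suffix is made up for illustration.

```python
from pathlib import Path


def build_paths(filename, in_dir="./out", out_dir="./converted"):
    """Return (input path, output path) for one file name (hypothetical helper)."""
    name = str(filename).strip()  # tolerate stray whitespace from the list file
    return Path(in_dir) / name, Path(out_dir) / name


inputpath, outputpath = build_paths("ranking_2019-12-16.csv")
print(inputpath.as_posix())   # out/ranking_2019-12-16.csv
print(outputpath.as_posix())  # converted/ranking_2019-12-16.csv
```

`Path` objects can be passed directly to `pd.read_csv` and `DataFrame.to_csv`, so the rest of the loop would not need string paths.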

Processing the actual raw data files (the files converted to UTF-8):
deleting unneeded rows and adding columns, such as adding a specific string to every row

    # Read the input file
    df = pd.read_csv(inputpath, header=None)

    # Extract a specific value and append it as the last column:
    # the 10 characters just before the ".csv" extension of the path
    tradingdate = inputpath[-14:-4]
    df['TradingDate'] = tradingdate

    ########## Delete unneeded rows and rename the columns

    # Drop the first two rows, then assign the final column names
    df = df.iloc[2:].reset_index(drop=True)
    df.columns = ['Rank', 'Code', 'Market','Company','EndingPrice','PriceChange','PercentChange','Volume','TradingDate']
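
The same clean-up can be pushed into `read_csv` itself. The sketch below assumes, as in the code above, that the first two rows are discarded, that the file has eight columns, and that the file name ends with a 10-character date before `.csv`; `load_ranking` and `COLUMNS` are hypothetical names introduced for illustration.

```python
from pathlib import Path

import pandas as pd

COLUMNS = ['Rank', 'Code', 'Market', 'Company', 'EndingPrice',
           'PriceChange', 'PercentChange', 'Volume']


def load_ranking(path):
    """Read one converted file, skipping its first two lines (hypothetical helper)."""
    path = Path(path)
    # skiprows=2 drops the first two lines; names= supplies the column labels.
    df = pd.read_csv(path, header=None, skiprows=2, names=COLUMNS)
    # Last 10 characters of the file name without its extension.
    df['TradingDate'] = path.stem[-10:]
    return df
```

One difference worth noting: because the header rows never enter the DataFrame, pandas can infer numeric dtypes for the data columns, whereas reading everything with `header=None` and dropping rows afterwards tends to leave them as strings.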

Writing the individual CSV files

    ########## Write the data with unneeded rows removed and columns renamed to CSV

    # The remaining to_csv arguments are left at their defaults.
    # pandas 1.5+ names the newline argument lineterminator (older versions use line_terminator).
    df.to_csv(outputpath, header=True, index=True, encoding='utf-8', lineterminator='\n')

    print("Convert File Processed:"+a_list)
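
In the `to_csv` call above, the `index` argument determines whether the row index is written as the first column, which affects how the file is read back later. The small comparison below uses a made-up two-column frame purely for illustration.

```python
import pandas as pd

# Tiny made-up frame, only to compare the two layouts.
df = pd.DataFrame({'Code': [1301], 'Volume': [12000]})

# With no path argument, to_csv returns the CSV text as a string.
with_index = df.to_csv(index=True).splitlines()
without_index = df.to_csv(index=False).splitlines()

print(with_index)     # [',Code,Volume', '0,1301,12000']
print(without_index)  # ['Code,Volume', '1301,12000']
```

With `index=True` the header starts with an empty field, and `read_csv` will load that column as `Unnamed: 0` unless `index_col=0` is passed.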

Merging the CSV files (a BI tool is easier to work with when it loads a single aggregated file)

########## Merge all CSV files in a given folder
DATA_PATH = outputfolder
All_Files = glob.glob('{}*.csv'.format(DATA_PATH))
list2 = []
for file in All_Files:
    list2.append(pd.read_csv(file))
df = pd.concat(list2, sort=False)
df.to_csv(mergedpath, encoding='utf_8')
print("Merge Process End!!")
##########
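
A slightly more defensive variant of the merge step is sketched below. It assumes the per-file CSVs were written with the row index as their first column, so it reads them with `index_col=0` to avoid carrying an `Unnamed: 0` column into the merged file; `merge_csv_folder` is a hypothetical helper name.

```python
from pathlib import Path

import pandas as pd


def merge_csv_folder(folder, merged_path):
    """Concatenate every *.csv in folder into one UTF-8 file (hypothetical helper)."""
    files = sorted(Path(folder).glob('*.csv'))  # sorted for a reproducible order
    if not files:
        raise FileNotFoundError(f'no CSV files found in {folder}')
    # index_col=0 treats the first column as the saved row index.
    frames = [pd.read_csv(f, index_col=0) for f in files]
    # ignore_index=True renumbers the rows instead of repeating each file's index.
    merged = pd.concat(frames, ignore_index=True, sort=False)
    merged_path = Path(merged_path)
    merged_path.parent.mkdir(parents=True, exist_ok=True)  # create ./merged if missing
    merged.to_csv(merged_path, index=False, encoding='utf-8')
    return merged
```

For the layout in this memo the call would be `merge_csv_folder('./converted', './merged/tsestockdata.csv')`. Keeping the merged file in a separate folder matters, since otherwise a second run would pick up the previous merge result as an input.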
