
Unifying multiple delimiters before reading a file

Python

Last updated at 2018-06-13, posted at 2018-06-12

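I worked out a way to handle a file that uses more than one delimiter character.
When a file mixes several delimiters, numpy.loadtxt cannot read it with a single delimiter setting, so the goal here is to get the file to load properly.

That said, the solution is not elegant at all...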

# File contents
# sample.txt
1,2,3,11 22 33 44,55 66 77,4
1,2,3,11 22 33 44,55 66 77,4
1,2,3,11 22 33 44,55 66 77,4
1,2,3,11 22 33 44,55 66 77,4

For example, suppose you have data like this and want to read it by splitting on both commas and spaces.

In that case...

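# `path` is the directory prefix that holds sample.txt (assumed to be defined beforehand)
# First, read the whole file as a single string
with open(path + "sample.txt", "r") as f:
    text = f.read()

# Replace commas with spaces so that the space is the only delimiter
text = text.replace(",", " ")

# Write the text back out (note: this overwrites the original sample.txt)
with open(path + "sample.txt", "w") as f:
    f.write(text)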

This is how I unified the delimiters.
With that done, the data can be loaded with numpy and similar tools.

import numpy as np

np.loadtxt(path + "sample.txt", delimiter=" ")

# array([[ 1.,  2.,  3., 11., 22., 33., 44., 55., 66., 77.,  4.],
#       [ 1.,  2.,  3., 11., 22., 33., 44., 55., 66., 77.,  4.],
#       [ 1.,  2.,  3., 11., 22., 33., 44., 55., 66., 77.,  4.],
#       [ 1.,  2.,  3., 11., 22., 33., 44., 55., 66., 77.,  4.]])
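
As a side note to the replace-based approach above, the same idea can be sketched with a regular expression when a file mixes more than two delimiters (commas, spaces, tabs). This is only a minimal sketch: `parse_line` is a hypothetical helper name introduced for illustration, and the sample text is embedded in the script so that it runs on its own without touching any file.

```python
import re


def parse_line(line):
    """Split one line on any run of commas and/or whitespace, then convert to floats."""
    fields = re.split(r"[,\s]+", line.strip())
    # Drop empty strings that would appear for blank input.
    return [float(field) for field in fields if field]


# Same contents as sample.txt, embedded here for illustration.
sample_text = "1,2,3,11 22 33 44,55 66 77,4\n" * 4

rows = [parse_line(line) for line in sample_text.splitlines() if line.strip()]
print(len(rows), len(rows[0]))  # 4 11
```

Unlike rewriting the file, this leaves the original data untouched, at the cost of doing the parsing by hand instead of letting numpy do it.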

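I am still a beginner and stumble even on reading and writing files, so I wrote this down as a note.

Addendum
I was shown a way to treat the string directly as a file, without writing it back to disk first, so I am adding it here.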

import numpy as np
from io import StringIO

# Read the file as a single string
with open(path + "sample.txt", "r") as f:
    text = f.read()
# Replace commas with spaces
text = text.replace(",", " ")
# Wrap the string in a file-like object
buffer = StringIO(text)
# Load it with numpy
np.loadtxt(buffer)

# array([[ 1.,  2.,  3., 11., 22., 33., 44., 55., 66., 77.,  4.],
#       [ 1.,  2.,  3., 11., 22., 33., 44., 55., 66., 77.,  4.],
#       [ 1.,  2.,  3., 11., 22., 33., 44., 55., 66., 77.,  4.],
#       [ 1.,  2.,  3., 11., 22., 33., 44., 55., 66., 77.,  4.]])

Treating the string as a file is the more elegant approach.
Thank you for the comment.
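
Going one step further in the same direction, np.loadtxt also accepts a generator of lines, so the commas can be replaced line by line while the file is being read. The following is a minimal sketch under a few assumptions: `load_mixed_delimiters` is a hypothetical helper name, and the sample file is recreated in a temporary directory so that the script is self-contained.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_mixed_delimiters(filename):
    """Load a numeric text file whose fields are separated by commas or spaces."""
    with open(filename, "r") as f:
        # Each line is normalized on the fly; the generator is fully
        # consumed by np.loadtxt before the file is closed.
        return np.loadtxt(line.replace(",", " ") for line in f)


# Recreate the sample data in a throwaway directory for illustration.
sample_text = "1,2,3,11 22 33 44,55 66 77,4\n" * 4
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.txt"
    sample_path.write_text(sample_text)
    data = load_mixed_delimiters(sample_path)

print(data.shape)  # (4, 11)
```

This avoids holding a second full copy of the text in memory, which may matter for large files; for a small file like the sample, the StringIO version is just as good.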
