
How to read a CSV containing only integers in Python

Posted at 2020-05-15

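I came across someone struggling with a hand-written double loop to read a CSV like the one below in Python, so here is a summary of four representative ways to read it.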

sample.csv

1,2,3
4,5,6
7,8,9
10,11,12

Note that StringIO is used here to emulate reading from a file; in real use, substitute the open function, a file name string, or similar as appropriate.
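As a rough sketch of that substitution, the snippet below reads the same data from a real file instead of StringIO. The temporary directory and the path variable csv_path are assumptions made only to keep the example self-contained; in practice you would point at your own file.

```python
import csv
import tempfile
from pathlib import Path

import numpy as np

# Assumed location: a temporary directory, used only so the sketch runs anywhere.
csv_path = Path(tempfile.mkdtemp()) / "sample.csv"
csv_path.write_text("1,2,3\n4,5,6\n7,8,9\n10,11,12\n")

# With the csv module, pass the file object returned by open().
# newline="" is what the csv module documentation recommends for file objects.
with open(csv_path, newline="") as csvfile:
    rows = [list(map(int, row)) for row in csv.reader(csvfile)]

# NumPy's loadtxt also accepts a file name string directly.
arr = np.loadtxt(str(csv_path), delimiter=",", dtype=int)
```

Both rows and arr hold the same 4x3 table of integers; only the way the input is supplied differs from the StringIO examples.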

Using the csv module

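This approach uses csv, a module in Python's standard library.
Running on the standard library alone may be an advantage, but in practice you would rarely keep purely numeric data as a nested list.

Note that each element is read as a string, so map(int, row) is applied to convert the values to integers.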

from io import StringIO
import csv
s = """1,2,3
4,5,6
7,8,9
10,11,12"""

with StringIO(s) as csvfile:
    csvreader = csv.reader(csvfile)
    rows = []
    for row in csvreader:
        rows.append(list(map(int, row)))

# or, as a list comprehension
with StringIO(s) as csvfile:
    csvreader = csv.reader(csvfile)
    rows = [list(map(int, row)) for row in csvreader]

# Convert the nested list to a NumPy array if needed
import numpy as np
arr = np.array(rows)

numpy

There are two functions, loadtxt and genfromtxt. Their interfaces are similar, but genfromtxt is slightly more capable; for example, it can fill in missing values.

numpy.loadtxt

from io import StringIO
import numpy as np

s = """1,2,3
4,5,6
7,8,9
10,11,12"""

with StringIO(s) as csvfile:
    arr = np.loadtxt(csvfile, delimiter=",", dtype=int)

numpy.genfromtxt

from io import StringIO
import numpy as np

s = """1,2,3
4,5,6
7,8,9
10,11,12"""

with StringIO(s) as csvfile:
    arr = np.genfromtxt(csvfile, delimiter=",", dtype=int)
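To illustrate the missing-value handling mentioned earlier, here is a minimal sketch using the filling_values parameter of genfromtxt. The sample data with an empty field and the choice of 0 as the replacement value are assumptions for illustration, not part of the original sample.csv.

```python
from io import StringIO
import numpy as np

# Assumed sample data: the middle field of the second row is empty.
s = """1,2,3
4,,6"""

with StringIO(s) as csvfile:
    # filling_values sets the value substituted for missing fields (0 here).
    arr = np.genfromtxt(csvfile, delimiter=",", dtype=int, filling_values=0)
```

The empty field is replaced with 0, so the result stays an integer array. np.loadtxt has no equivalent fill option, so it would likely fail on this input.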

pandas

np.genfromtxt can also handle missing values, but information on pandas may be easier to find.

pandas.read_csv

The CSV is treated as having no header row here, so the header=None option is passed.

from io import StringIO
import pandas as pd

s = """1,2,3
4,5,6
7,8,9
10,11,12"""

with StringIO(s) as csvfile:
    df = pd.read_csv(csvfile, header=None, dtype=int)
arr = df.values
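For the missing-value case mentioned at the start of this section, one possible approach in pandas is sketched below. The sample data with an empty field and the choice of 0 as the fill value are assumptions for illustration.

```python
from io import StringIO
import pandas as pd

# Assumed sample data: the middle field of the second row is empty.
s = """1,2,3
4,,6"""

with StringIO(s) as csvfile:
    # dtype=int is omitted on purpose: a column containing NaN
    # cannot be read directly as a plain integer column.
    df = pd.read_csv(csvfile, header=None)

# The empty field becomes NaN; fill it, then convert everything to int.
filled = df.fillna(0).astype(int)
arr = filled.values
```

Whether 0 is a sensible replacement depends on the data; dropping incomplete rows with dropna() is another option.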

If you know of any other convenient approaches, please let me know.
