More than 5 years have passed since last update.

Checking whether Python's readlines() really reads every line at once

Posted at 2018-12-26

readlines() is said to read the whole file at once, so let's verify that.
Checking the memory usage should settle it quickly.

Experiment

First, create a 1 GB file.

make_dummy.pl

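#!/usr/bin/perl
use strict;
use warnings;

# Write 1024*1024 lines of 1023 'x' characters plus "\n" (1 GiB in total)
open my $fh, '>', 'dummy.dat' or die "Cannot open dummy.dat: $!";
for (1..1024*1024) {
        print $fh 'x' x 1023 . "\n";
}
close $fh;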
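If you don't have Perl, the following is a rough Python equivalent of make_dummy.pl. It is a sketch. The function name make_dummy and its parameters are hypothetical. The defaults reproduce the same layout: 1024*1024 lines, each made of 1023 'x' characters and a newline.

```python
def make_dummy(path="dummy.dat", lines=1024 * 1024, width=1023):
    """Write `lines` lines, each `width` 'x' characters followed by a newline."""
    row = "x" * width + "\n"  # build the line once and reuse it
    # newline="\n" disables newline translation, so each row is exactly width + 1 bytes
    with open(path, "w", newline="\n") as fh:
        for _ in range(lines):
            fh.write(row)
```

Calling make_dummy() with the defaults should write the same 1 GiB dummy.dat as the Perl script. Pass smaller values for a quick trial run.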

read_dummy.py

#!/usr/bin/python36
# vim: set expandtab ts=4 :

counter = 0
with open('dummy.dat', mode='r') as f:
    # Toggle: set exactly one of the two branches below to 1
    if 1:
        for line in f:  # read one line at a time
            tmp = line
            counter += 1
    if 0:
        for line in f.readlines():  # read every line into a list first, then iterate
            tmp = line
            counter += 1

print("counter={}".format(counter))
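You can also make the same comparison inside a single Python process with the standard tracemalloc module. tracemalloc reports the peak memory allocated by Python, not the whole process RSS, so its numbers will not match /usr/bin/time. This is a sketch. The helper count_and_peak and the small 2000-line sample file are hypothetical, and they are chosen only so the comparison runs quickly.

```python
import os
import tempfile
import tracemalloc


def count_and_peak(path, use_readlines):
    """Count lines in `path` and return (count, peak traced bytes)."""
    tracemalloc.start()
    counter = 0
    with open(path, mode="r") as f:
        # readlines() builds a list of every line up front; f alone yields lines lazily
        lines = f.readlines() if use_readlines else f
        for _ in lines:
            counter += 1
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return counter, peak


results = {}
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "small_dummy.dat")
    with open(path, "w") as fh:
        fh.write(("x" * 1023 + "\n") * 2000)  # 2000 lines, about 2 MB
    for use_readlines in (False, True):
        results[use_readlines] = count_and_peak(path, use_readlines)
        print("readlines={}: counter={}, peak={} bytes".format(
            use_readlines, *results[use_readlines]))
```

Both runs count the same number of lines. The readlines() run should show a peak roughly the size of the whole file, while plain iteration stays near the size of the I/O buffers.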

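Results

GNU time's %M format prints the maximum resident set size of the process, in kilobytes.

// Case: for line in f:
/usr/bin/time -f "\n%M" ./read_dummy.py
counter=1048576
5744

// Case: for line in f.readlines():
/usr/bin/time -f "\n%M" ./read_dummy.py
counter=1048576
845092     ← huge!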
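If you want to take the %M measurement from Python instead of /usr/bin/time, the Unix-only resource module can report the peak RSS of finished child processes. This is a sketch, and the helper name child_max_rss is hypothetical. On Linux, ru_maxrss is in kilobytes like %M, but on macOS it is in bytes. RUSAGE_CHILDREN reports the largest child seen so far, so measure each case from a fresh Python process.

```python
import resource
import subprocess


def child_max_rss(cmd):
    """Run cmd to completion and return ru_maxrss over waited-for children.

    Linux: kilobytes; macOS: bytes. The value is the maximum over all
    children waited for so far, not just this one.
    """
    subprocess.run(cmd, check=True)  # wait for the child so its usage is counted
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
```

For example, child_max_rss([sys.executable, "read_dummy.py"]) (after import sys) should report roughly what /usr/bin/time -f "%M" prints for the same script.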

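Conclusion

readlines() really does load every line into memory at once: its peak memory was about 147 times that of iterating with for line in f:. Avoid readlines() when reading large files.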
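If you really do need lists of lines (for example, for batch processing), readlines() accepts a size hint. Once the lines read so far reach about that many characters, it stops reading more. Calling it in a loop keeps only one bounded batch in memory at a time. This is a sketch. The generator name iter_batches and the 1 MiB default are hypothetical choices. A hint of 0 or less means "no hint", so the whole file would be read again.

```python
def iter_batches(f, hint=1 << 20):
    """Yield lists of lines from an open file, roughly `hint` characters per list.

    f.readlines(hint) stops once the lines collected so far reach about
    `hint` in total size, so only one batch is held in memory at a time.
    """
    while True:
        batch = f.readlines(hint)
        if not batch:  # an empty list means end of file
            return
        yield batch
```

Inside with open('dummy.dat') as f:, the loop for batch in iter_batches(f): would then handle about 1 MiB of text per iteration instead of the whole 1 GiB.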

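References

Reading a file line by line in Python - Qiita

A smart way to read files in Python - oinume journal

Line-by-line iteration works because file objects implement __iter__() and __next__() (next() in Python 2). PEP 234 describes this iterator protocol in detail.

Measuring how much memory a process used - Qiita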
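The reference note about __iter__() and PEP 234 can be illustrated with a small example. In Python 3 the method is called __next__() (next() was the Python 2 name). A file object returns itself from __iter__(), and each next() call reads just one line. The class LineReader below is a hypothetical, minimal iterator written the same way. io.StringIO stands in for an open text file.

```python
import io


class LineReader:
    """Hypothetical minimal iterator that returns one line per __next__() call."""

    def __init__(self, f):
        self._f = f

    def __iter__(self):
        return self  # an iterator returns itself from __iter__()

    def __next__(self):
        line = self._f.readline()  # read just one line from the underlying file
        if not line:
            raise StopIteration  # end of file ends the for loop
        return line


f = io.StringIO("a\nb\nc\n")  # in-memory stand-in for an open text file
assert iter(f) is f           # a file object is its own iterator
first = next(f)               # consumes exactly one line: 'a\n'
rest = list(LineReader(f))    # ['b\n', 'c\n']
```

Because only one line is produced per step, for line in f: never needs more than the current line plus the I/O buffer. readlines(), by contrast, materializes the full list.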
