
[Python] Getting specific lines from a text file

Last updated at 2022-11-20, posted at 2022-11-07

Introduction

In this article, I'll try getting specific lines from a text file in Python.

Environment
- Python 3.11.0
- macOS Ventura

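Sample text file

Below is the sample text file I prepared for this article. The leading numbers are line numbers shown for reference and are not part of the file; the file ends with a newline after end.

sample.txt

1 sample text file
2 start
3 apple
4 banana
5 orange
6 end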

Getting a line by its line number

Getting it from a list of lines

The readlines method returns the file's contents as a list with one element per line.

with open("sample.txt", "r") as f:
	lines = f.readlines()
print(lines)

['sample text file\n', 'start\n', 'apple\n', 'banana\n', 'orange\n', 'end\n']

Each element keeps its trailing newline character, so let's strip it with rstrip.

lines_rstrip = [line.rstrip("\n") for line in lines]
print(lines_rstrip)

['sample text file', 'start', 'apple', 'banana', 'orange', 'end']

Alternatively, the following returns a list with the newline characters already removed.

with open("sample.txt", "r") as f:
	lines = f.read().splitlines()
print(lines)

['sample text file', 'start', 'apple', 'banana', 'orange', 'end']

Now let's get apple on line 3. We index into lines, the list with the newline characters removed, to get that line. (List indexes start at 0, so subtract 1 from the actual line number.)

target_line = lines[2]
print(target_line)

apple
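The off-by-one conversion is easy to get wrong, and a plain index such as lines[-1] silently returns the last line instead of failing. The following is a minimal sketch of a helper that takes a 1-based line number and checks the range. Note that get_line is a hypothetical name introduced here for illustration, and the sample file is recreated in a temporary directory only so that the snippet runs on its own.

```python
import tempfile
from pathlib import Path

# Same content as sample.txt in this article
SAMPLE = "sample text file\nstart\napple\nbanana\norange\nend\n"


def get_line(lines, line_no):
    """Return the line at 1-based line_no, or None if it is out of range.

    Hypothetical helper for illustration only.
    """
    if 1 <= line_no <= len(lines):
        return lines[line_no - 1]  # 1-based line number -> 0-based index
    return None


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text(SAMPLE)
    lines = path.read_text().splitlines()

third = get_line(lines, 3)
missing = get_line(lines, 10)
print(third)    # apple
print(missing)  # None
```

Returning None for an invalid line number is just one possible design; raising an exception would work equally well.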

Next, let's get multiple lines (lines 3 to 5).

target_lines = lines[2:5]
print(target_lines)

for line in target_lines:
	print(line)

['apple', 'banana', 'orange']
apple
banana
orange
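Slicing a list requires reading the whole file first. For large files, itertools.islice can apply the same slice directly to the file object, so reading stops once the last requested line has been consumed. This is a sketch under the assumption that line numbers are 1-based and inclusive; read_line_range is a hypothetical name, and the temporary file only makes the snippet self-contained.

```python
import tempfile
from itertools import islice
from pathlib import Path

SAMPLE = "sample text file\nstart\napple\nbanana\norange\nend\n"


def read_line_range(path, first, last):
    """Return lines first..last (1-based, inclusive) without newlines.

    Hypothetical helper for illustration only.
    """
    with open(path, "r") as f:
        # islice takes 0-based, end-exclusive positions, like list slicing
        return [line.rstrip("\n") for line in islice(f, first - 1, last)]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text(SAMPLE)
    picked = read_line_range(path, 3, 5)

print(picked)  # ['apple', 'banana', 'orange']
```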

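Getting it with linecache

Next, let's get line 3 (apple) with linecache.

The linecache module allows one to get any line from a Python source file, while attempting to optimize internally, using a cache, the common case where many lines are read from a single file. This is used by the traceback module to retrieve source lines for inclusion in the formatted traceback.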

import linecache

# Pass the line number directly as the second argument of getline
# Strip the newline character with rstrip
target_line = linecache.getline("sample.txt", 3).rstrip("\n")
print(target_line)

apple
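Two behaviors of linecache are worth keeping in mind: getline returns an empty string instead of raising an exception when the line or the file does not exist, and a file that has been read once is served from the cache until checkcache or clearcache is called. The sketch below illustrates both. The replacement file content is made up for the demonstration, and the sample file is recreated in a temporary directory so that the snippet runs on its own.

```python
import linecache
import tempfile
from pathlib import Path

SAMPLE = "sample text file\nstart\napple\nbanana\norange\nend\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text(SAMPLE)
    name = str(path)  # pass the file name as a plain string

    found = linecache.getline(name, 3)           # 'apple\n'
    out_of_range = linecache.getline(name, 99)   # '' (no exception)
    no_file = linecache.getline(str(Path(tmp) / "missing.txt"), 1)  # ''

    # The file is cached after the first read, so a later change is not
    # visible until the cache entry is refreshed.
    path.write_text("x\ny\ncherry\n")  # made-up content for this demo
    stale = linecache.getline(name, 3)           # still 'apple\n'
    linecache.checkcache(name)                   # drop the entry if the file changed
    fresh = linecache.getline(name, 3)           # 'cherry\n'

linecache.clearcache()  # release cached lines once they are no longer needed
```

Because an empty string can mean "no such line" as well as "no such file", it may be safer to check that the file exists separately when the distinction matters.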

Getting the first and last lines

First line

Let's get the first element from the list of lines.

with open("sample.txt", "r") as f:
    # Get the first element of the list
    first_line = f.readlines()[0].rstrip("\n")
print(first_line)

# or
with open("sample.txt", "r") as f:
    # Get the first element of the list
    first_line = f.read().splitlines()[0]
print(first_line)

sample text file
sample text file

The readline method works too. readline reads the file one line at a time, starting from the beginning.

with open("sample.txt", "r") as f:
    # Read the first line
	first_line = f.readline().rstrip("\n")
print(first_line)

sample text file

With linecache, passing 1 as the line number gets the first line.

import linecache

first_line = linecache.getline("sample.txt", 1).rstrip("\n")
print(first_line)

sample text file

Last line

The last line can be obtained from the list of lines by specifying the index of the last element (-1).

with open("sample.txt", "r") as f:
    # Get the last element of the list
    last_line = f.readlines()[-1].rstrip("\n")
print(last_line)

# or
with open("sample.txt", "r") as f:
    # Get the last element of the list
    last_line = f.read().splitlines()[-1]
print(last_line)

end
end
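Both variants above build a list of every line just to keep the last one, and they raise IndexError on an empty file. As an alternative sketch, collections.deque with maxlen=1 keeps only the most recent line while iterating. read_last_line is a hypothetical name used for illustration, and the temporary files exist only so that the snippet runs on its own.

```python
import tempfile
from collections import deque
from pathlib import Path

SAMPLE = "sample text file\nstart\napple\nbanana\norange\nend\n"


def read_last_line(path):
    """Return the last line without its newline, or None for an empty file.

    Hypothetical helper for illustration only.
    """
    with open(path, "r") as f:
        tail = deque(f, maxlen=1)  # keeps only the most recently read line
    return tail[0].rstrip("\n") if tail else None


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text(SAMPLE)
    empty_path = Path(tmp) / "empty.txt"
    empty_path.write_text("")

    last_line = read_last_line(path)
    last_of_empty = read_last_line(empty_path)

print(last_line)      # end
print(last_of_empty)  # None
```

The file is still read from start to finish, but only one line is held in memory at a time.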

Getting lines that contain a specific string

Let's get the lines that contain the keyword sample.

# Read the file one line at a time and check each line
with open("sample.txt", "r") as f:
    for line in f:
        if "sample" in line:
            print(line.rstrip("\n"))

# Or check each element of the list of lines
with open("sample.txt", "r") as f:
    lines = f.read().splitlines()

for line in lines:
    if "sample" in line:
        print(line)

sample text file
sample text file
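When searching by keyword, it is often useful to know where each match was found. The sketch below pairs each matching line with its 1-based line number by passing start=1 to enumerate. find_lines is a hypothetical name introduced here, and the second keyword ("an") is only an extra example that is not in the original article.

```python
import tempfile
from pathlib import Path

SAMPLE = "sample text file\nstart\napple\nbanana\norange\nend\n"


def find_lines(path, keyword):
    """Return (line_number, text) pairs for lines containing keyword.

    Hypothetical helper for illustration only.
    """
    with open(path, "r") as f:
        return [
            (line_no, line.rstrip("\n"))
            for line_no, line in enumerate(f, start=1)  # count lines from 1
            if keyword in line
        ]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text(SAMPLE)
    sample_hits = find_lines(path, "sample")
    an_hits = find_lines(path, "an")

print(sample_hits)  # [(1, 'sample text file')]
print(an_hits)      # [(4, 'banana'), (5, 'orange')]
```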

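Getting the lines between a start line and an end line specified by text

Let's specify the start and end lines by their text (start and end) and get the lines between them (lines 3 to 5).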

with open("sample.txt", "r") as f:
	lines = f.read().splitlines()

# Get the indexes of the start and end lines
start_index = lines.index("start")
end_index = lines.index("end")

# Get the list of lines between the start line and the end line
target_lines = lines[start_index + 1:end_index]
print(target_lines)

for line in target_lines:
    print(line)

['apple', 'banana', 'orange']
apple
banana
orange
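One thing to watch for with this approach is that list.index raises ValueError when the marker is not in the list. A minimal sketch of a more defensive version follows. between_markers is a hypothetical name, and returning an empty list for missing markers is an assumption made for this example rather than the only reasonable choice.

```python
SAMPLE = "sample text file\nstart\napple\nbanana\norange\nend\n"


def between_markers(lines, start="start", end="end"):
    """Return the lines strictly between the start and end marker lines.

    Hypothetical helper for illustration only; returns [] if a marker is missing.
    """
    try:
        start_index = lines.index(start)
        # Look for the end marker only after the start marker
        end_index = lines.index(end, start_index + 1)
    except ValueError:
        return []
    return lines[start_index + 1:end_index]


lines = SAMPLE.splitlines()
target_lines = between_markers(lines)
no_markers = between_markers(["apple", "banana"])

print(target_lines)  # ['apple', 'banana', 'orange']
print(no_markers)    # []
```

Searching for the end marker from start_index + 1 also avoids picking up an end line that happens to appear before the start line.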

The following approach also works.

with open("sample.txt", "r") as f:
    lines = f.read().splitlines()

for num, line in enumerate(lines):
    # Scan line by line; once the start line is found,
    # print every following line until the end line
    if line == "start":
        while True:
            num += 1
            if lines[num] == "end":
                break
            print(lines[num])

apple
banana
orange
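The same idea can also be written without indexes by tracking whether the scan is currently inside the block. The sketch below does this with a generator that reads the file one line at a time, so the whole file never has to be loaded. iter_between is a hypothetical name; if the end line never appears, this version simply yields everything up to the end of the file, which is an assumption made for this example.

```python
import tempfile
from pathlib import Path

SAMPLE = "sample text file\nstart\napple\nbanana\norange\nend\n"


def iter_between(path, start="start", end="end"):
    """Yield the lines after the start marker, up to but excluding the end marker.

    Hypothetical helper for illustration only.
    """
    inside = False
    with open(path, "r") as f:
        for raw in f:
            line = raw.rstrip("\n")
            if not inside:
                inside = line == start  # enter the block on the start line
            elif line == end:
                return                  # stop reading at the end line
            else:
                yield line


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_text(SAMPLE)
    between = list(iter_between(path))

print(between)  # ['apple', 'banana', 'orange']
```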
