Python Notes: Text Processing

Last updated at 2023-08-15, posted at 2023-07-22

Overview

Python notes on text processing.

file open / close

open()

Use open() to open a file.

>>> f = open('./xxx.txt')

The second argument specifies the mode.

mode	description
'r'	open a file for reading (default)
'w'	open a file for writing; creates the file if it does not exist, truncates it if it does
'x'	exclusive creation; the operation fails if the file already exists
'a'	open for appending; creates the file if it does not exist
't'	open in text mode (default)
'b'	open in binary mode
'+'	open a file for updating (reading and writing)

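Only one of 'r', 'w', 'a', 'x' can be specified.
Likewise, only one of 'b', 't' can be specified.
'+' is combined with 'r', 'w', 'a', or 'x':
- 'r+' : opens an existing file with the file position at the beginning. Raises an error if no file with the given name exists.
- 'w+' : creates a new file, or opens an existing file and deletes its contents. The file position is at the beginning.
- 'a+' : creates a new file, or opens an existing file for appending. The file position is at the end.
- 'x+' : creates a new file with the file position at the beginning. Fails if the file already exists.

With '+', the file is open for both reading and writing. Plain 'r' is read-only and plain 'w' is write-only.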
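The mode rules above can be tried out with a small sketch. It assumes a throwaway directory from tempfile and a hypothetical file name (modes.txt), neither of which comes from the original note.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes.txt")  # hypothetical file name

    # 'x' creates the file and fails if it already exists.
    with open(path, "x", encoding="utf-8") as f:
        f.write("first\n")
    try:
        open(path, "x", encoding="utf-8")
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # 'a' keeps the existing content and writes at the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write("second\n")

    # 'r+' opens an existing file for reading and writing, starting at the top.
    with open(path, "r+", encoding="utf-8") as f:
        after_append = f.read()

    # 'w' truncates the file before writing.
    with open(path, "w", encoding="utf-8") as f:
        f.write("third\n")
    with open(path, encoding="utf-8") as f:
        after_truncate = f.read()

print(exclusive_failed)
print(repr(after_append))
print(repr(after_truncate))
```

Note that 'w' silently discards existing content, so 'x' or 'a' is the safer choice when the file may already hold data.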

write to text file

>>> f = open('./xxx.txt', mode='w')

write to binary file

>>> f = open('./yyy.bin', 'wb')

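The encoding keyword argument specifies the text encoding. It is not the third positional argument (that one is buffering), so pass it by name. When omitted, the default can depend on the platform and locale (for example, it is often cp932 on Japanese Windows), so passing encoding='utf-8' explicitly is the safer habit.
- encoding = 'utf-8' | 'shift_jis' | 'euc_jp' | 'iso2022_jp'

>>> f = open('./legacy.txt', encoding='shift_jis')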
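As a rough illustration of why the encoding matters, the sketch below writes a Shift_JIS file and reads it back. The sample string and the file name sjis_sample.txt are assumptions made for the example.

```python
import os
import tempfile

text = "こんにちは"  # sample text, not from the original note

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sjis_sample.txt")  # hypothetical file name

    # Write the text encoded as Shift_JIS.
    with open(path, "w", encoding="shift_jis") as f:
        f.write(text)

    # Reading with the same encoding round-trips the text.
    with open(path, encoding="shift_jis") as f:
        decoded = f.read()

    # Binary mode returns the raw bytes without decoding.
    with open(path, "rb") as f:
        raw = f.read()

    # Reading with the wrong encoding fails for this content.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False

print(decoded, len(raw), utf8_ok)
```

Each of the five characters takes two bytes in Shift_JIS, which is why the raw length differs from the character count.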

f.close()

- Closes the file object.

>>> f = open("<file path>", "r")

...
>>> f.close()

file read / write

Consider reading a text file like the following.

# file.txt
alice
bob
cherry

f.readline()

Reads one line, up to and including the newline character.

>>> f = open('file.txt', 'r')

First call: line 1

>>> line = f.readline()
>>> line
'# file.txt\n'

Line 2

>>> line = f.readline()
>>> line
'alice\n'

Line 3

>>> line = f.readline()
>>> line
'bob\n'

f.readlines()

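Reads the whole file into a list, one element per line. Reading starts from the current file position, so the result below assumes a freshly opened file.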

>>> lines = f.readlines()
>>> lines
['# file.txt\n', 'alice\n', 'bob\n', 'cherry']
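The following sketch puts readline() and readlines() together and also shows iterating over the file object directly, which is a common alternative to readlines(). It recreates file.txt in a temporary directory, which is an assumption made so the example runs on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.txt")
    # Recreate the sample file from this note (no trailing newline).
    with open(path, "w", encoding="utf-8") as f:
        f.write("# file.txt\nalice\nbob\ncherry")

    with open(path, encoding="utf-8") as f:
        first = f.readline()        # consumes the first line
        remaining = f.readlines()   # continues from the current position

    # Iterating over the file yields one line at a time.
    with open(path, encoding="utf-8") as f:
        names = [line.rstrip("\n") for line in f if not line.startswith("#")]

print(repr(first))
print(remaining)
print(names)
```

Iterating line by line avoids holding the whole file in memory, which matters mostly for large files.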

f.read()

With an argument n, reads up to n characters.
Without an argument, reads everything.

>>> val = f.read()
>>> val
'# file.txt\nalice\nbob\ncherry'
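A short sketch of read(n) follows. It uses io.StringIO as a stand-in for an opened text file, which is an assumption made to keep the example self-contained; a file opened in text mode behaves the same way here.

```python
import io

# In-memory stand-in for open('file.txt', 'r').
f = io.StringIO("# file.txt\nalice\nbob\ncherry")

head = f.read(5)   # first 5 characters
rest = f.read()    # everything after the current position
at_end = f.read()  # at end of file, read() returns an empty string

print(repr(head))
print(repr(rest))
print(repr(at_end))
```

An empty string from read() is the usual signal that the end of the file has been reached.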

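with open(file path) as (variable): syntax

Using the with statement, the file is also closed (released) automatically.

with open('./a.txt') as f:
    ...  # processing

The file is closed automatically when execution leaves the indented with block.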
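To check the automatic close, the sketch below inspects f.closed inside and after a with block, and compares it with a manual try/finally close. The temporary directory and the file content are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "a.txt")

    with open(path, "w", encoding="utf-8") as f:
        f.write("hello\n")
        closed_inside = f.closed   # still open inside the block
    closed_after = f.closed        # closed once the block is left

    # Roughly equivalent manual version using try/finally.
    f2 = open(path, encoding="utf-8")
    try:
        content = f2.read()
    finally:
        f2.close()
    closed_manual = f2.closed

print(closed_inside, closed_after, closed_manual)
```

The with form also closes the file when an exception is raised inside the block, which is easy to forget in the manual version.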

f.write()

Open a file with open() and write the following string to it with write().

text = """FILE
ALICE
BOB
CHARLY
"""

with open('new.txt', 'w') as f:
    f.write(text)

Output (new.txt)

FILE
ALICE
BOB
CHARLY
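As a small check of the write step, the sketch below writes the same string to a temporary location and reads it back. In text mode, write() returns the number of characters written. The temporary directory is an assumption so that the example leaves no file behind.

```python
import os
import tempfile

text = """FILE
ALICE
BOB
CHARLY
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "new.txt")

    with open(path, "w", encoding="utf-8") as f:
        written = f.write(text)  # number of characters written

    # Read the file back and split it into lines without newlines.
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(written)
print(lines)
```

write() does not add a newline on its own, so the trailing newline in the output comes from the string itself.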

regular expression (re)

`re` module

Use the re module to work with regular expressions.

import

import re

`re.match()`

match() returns a match object when the pattern matches at the beginning of the string, and None otherwise.
re.match('pattern', 'target string')

>>> m = re.match(r'(\w+)\s(\w+)\s\w+', 'aaa bbb ccc')

match object

In the example above, m is the match object.

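`groups()` method

Returns the matched elements as a tuple.
Each part of the pattern enclosed in parentheses (a capture group) is stored in the tuple.
Call groups() to display the tuple.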

>>> m.groups()
('aaa', 'bbb')

`group()` method

group() returns an individual element matched by () in the pattern, selected by the index passed as its argument.

index=0 refers to the entire matched string.

>>> m.group(0)
'aaa bbb ccc'

index=1

>>> m.group(1)
'aaa'

index=2

>>> m.group(2)
'bbb'
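One point worth illustrating is what happens when nothing matches. The sketch below reuses the pattern from this section and assumes two extra input strings that are not in the original note. It also contrasts re.match() with re.search(), which scans the whole string.

```python
import re

pattern = r'(\w+)\s(\w+)\s\w+'

# Successful match: groups() can be unpacked like any tuple.
m = re.match(pattern, 'aaa bbb ccc')
first, second = m.groups()

# No match returns None, so check before calling group().
no_match = re.match(pattern, 'aaa')

# re.match() only tries the start of the string; re.search() scans forward.
padded = '  aaa bbb ccc'  # hypothetical input with leading spaces
at_start = re.match(pattern, padded)
anywhere = re.search(pattern, padded)

print(first, second)
print(no_match, at_start)
print(anywhere.group(0))
```

Calling group() on None raises AttributeError, so a guard such as `if m:` is the usual pattern before reading groups.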

`re.sub()`

re.sub('pattern', 'replacement', string) returns a new string with the matches replaced.

>>> s = 'aaa_aaa_bbb_bbb'
>>> s_new = re.sub('aaa', 'ABC', s)
>>> s_new
'ABC_ABC_bbb_bbb'

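The count argument limits the maximum number of replacements. Passing it by keyword is the safer form, since recent Python versions deprecate passing it positionally.

>>> s_new1 = re.sub('aaa', 'ABC', s, count=1)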
>>> s_new1
'ABC_aaa_bbb_bbb'
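The replacement can also refer back to capture groups with \1, \2, and so on. The sketch below is a small extension of the example above, not something from the original note. It also uses re.subn(), which returns the number of replacements made.

```python
import re

s = 'aaa_aaa_bbb_bbb'

# Swap the two captured parts of the first 'aaa_bbb' occurrence.
swapped = re.sub(r'(aaa)_(bbb)', r'\2_\1', s)

# subn() returns the new string and the number of replacements.
replaced, n = re.subn('aaa', 'ABC', s)

# count limits how many matches are replaced.
limited = re.sub('aaa', 'ABC', s, count=1)

print(swapped)
print(replaced, n)
print(limited)
```

Raw strings (the r prefix) keep backslashes such as \1 and \w from being interpreted as Python string escapes.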
