# [Python] Regular Expressions

Posted at 2022-08-10

Notes on using regular expressions in Python.

## Matching functions

### match

```python
import re

content = 'The high on August 7 is 35 degrees'

result = re.match(r'[A-Za-z]+', content)

if result:
    # Return the matched substring
    print(result.group())
    # Return the index where the match starts
    print(result.start())
    # Return the index just past the end of the match
    print(result.end())
    # Return start and end as a tuple
    print(result.span())
else:
    print("No match")

# >> The
# >> 0
# >> 3
# >> (0, 3)
```
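`re.match` only tries to match at the beginning of the string. The following is a minimal sketch of that behavior, reusing the sample sentence above; the variable names `at_start` and `anywhere` are only for illustration.

```python
import re

content = 'The high on August 7 is 35 degrees'

# The string does not start with a digit, so match() returns None
at_start = re.match(r'\d+', content)
print(at_start)

# search() scans the whole string and finds the first run of digits
anywhere = re.search(r'\d+', content)
print(anywhere.group())
```

This is why the `if result:` check matters: calling `.group()` on `None` would raise an `AttributeError`.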

### compile

```python
import re

content = 'The high on August 7 is 35 degrees'
# Compile the pattern once into a reusable pattern object
pattern = re.compile(r'[A-Za-z]+')
result = pattern.match(content)
if result:
    # Return the matched substring
    print(result.group())
    # Return the index where the match starts
    print(result.start())
    # Return the index just past the end of the match
    print(result.end())
    # Return start and end as a tuple
    print(result.span())
else:
    print("No match")

# >> The
# >> 0
# >> 3
# >> (0, 3)
```
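A compiled pattern object offers the same operations as the module-level functions, so it can be compiled once and reused. A small sketch, assuming the same sample sentence; the name `number` is just an illustrative choice.

```python
import re

content = 'The high on August 7 is 35 degrees'

# Compile once, then reuse the same pattern object for several operations
number = re.compile(r'\d+')

first = number.search(content)         # first match only
all_numbers = number.findall(content)  # every match as a list of strings
replaced = number.sub('N', content)    # replace every match

print(first.group())
print(all_numbers)
print(replaced)
```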

#### Comparing compiled and uncompiled patterns

With a compiled pattern

```python
import re
import time

contents = ['The high on August 7 is 35.8 degrees.', 'Tropical is a night tonight.', 'The high temperature on August 10 is supposed to be 40.59 degrees.']

# Compile once, outside the loop
pattern = re.compile(r'[a-zA-Z]+')

start = time.perf_counter()
for i in range(10**6):
    for content in contents:
        pattern.search(content)
end = time.perf_counter()
print(f'{end-start}sec')

# >> 0.7540798999980325sec
```

Without compiling

```python
import re
import time

contents = ['The high on August 7 is 35.8 degrees.', 'Tropical is a night tonight.', 'The high temperature on August 10 is supposed to be 40.59 degrees.']

# Plain pattern string, passed to re.search() on every call
pattern = r'[a-zA-Z]+'

start = time.perf_counter()
for i in range(10**6):
    for content in contents:
        re.search(pattern, content)
end = time.perf_counter()
print(f'{end-start}sec')

# >> 2.0059483000004548sec
```

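Precompiling pays off when the same pattern is applied many times, for example inside a loop. Note that the `re` module caches recently used patterns internally, so the gap mostly reflects per-call lookup overhead rather than recompiling the pattern each time.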
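The same comparison can also be sketched with the standard `timeit` module, which handles the timing loop itself. The helper names `with_compiled` and `without_compiled` are made up for this example, and the iteration count is kept small so it finishes quickly; absolute timings vary by machine.

```python
import re
import timeit

contents = ['The high on August 7 is 35.8 degrees.', 'Tropical is a night tonight.', 'The high temperature on August 10 is supposed to be 40.59 degrees.']

pattern_text = r'[a-zA-Z]+'
compiled = re.compile(pattern_text)


def with_compiled():
    # Uses the precompiled pattern object
    for content in contents:
        compiled.search(content)


def without_compiled():
    # Passes the pattern string to the module-level function each time
    for content in contents:
        re.search(pattern_text, content)


compiled_time = timeit.timeit(with_compiled, number=10_000)
uncompiled_time = timeit.timeit(without_compiled, number=10_000)
print(f'compiled:   {compiled_time:.4f} sec')
print(f'uncompiled: {uncompiled_time:.4f} sec')
```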

### search

```python
import re

content = 'The high on August 7 is 35 degrees'

result = re.search(r'\d+', content)

if result:
    # If the pattern matches in several places, only the first match is returned
    print(result.group())
    print(result.start())
    print(result.end())
    print(result.span())
else:
    print("No match")

# >> 7
# >> 19
# >> 20
# >> (19, 20)
```
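Since `search` stops at the first match, getting the position of every match needs a different function. One option from the standard `re` module is `re.finditer`, which yields a match object per match. A minimal sketch, with `spans` as an illustrative name:

```python
import re

content = 'The high on August 7 is 35 degrees'

# finditer() yields one match object per match, from left to right
spans = [(m.group(), m.span()) for m in re.finditer(r'\d+', content)]
print(spans)
# >> [('7', (19, 20)), ('35', (24, 26))]
```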

### findall

Returns all matching substrings as a list.

```python
import re

content = 'The high on August 7 is 35 degrees'

# Collect every run of digits in the string
result = re.findall(r'\d+', content)

print(result)

# >> ['7', '35']
```
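One detail worth keeping in mind: when the pattern contains capture groups, `findall` returns the groups instead of the whole match. The sketch below uses an illustrative "month name followed by a number" pattern that is not part of the original notes.

```python
import re

content = 'The high on August 7 is 35 degrees'

# No capture group: each item is the whole match
whole = re.findall(r'[A-Z][a-z]+ \d+', content)
print(whole)

# Two capture groups: each item is a tuple of the captured parts
pairs = re.findall(r'([A-Z][a-z]+) (\d+)', content)
print(pairs)
```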

### fullmatch

```python
import re

content = 'The high on August 7 is 35 degrees'

# fullmatch() succeeds only if the entire string matches the pattern
result = re.fullmatch(r'\d+', content)

if result:
    print(result.group())
else:
    print("No match")

# >> No match
```
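`re.fullmatch` succeeds only when the entire string matches the pattern, whereas `re.match` only needs the beginning to match. A small sketch contrasting the two on sample strings chosen for illustration:

```python
import re

pattern = r'\d+'
samples = ['35', '35 degrees']

# For each sample, record whether match() and fullmatch() succeed
results = {
    text: (re.match(pattern, text) is not None,
           re.fullmatch(pattern, text) is not None)
    for text in samples
}

for text, (matched, fully_matched) in results.items():
    print(f'{text!r}: match={matched}, fullmatch={fully_matched}')
```

This makes `fullmatch` a natural fit for validating that an input consists of nothing but the expected format.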

### split

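```python
import re

# Same sample string as in the sub example
content = 'Interesting 01 YouTube 1 called 2 Weather 3 News'

# Split the string at every run of digits.
# split() always returns a list, so no "No match" branch is needed.
result = re.split(r'\d+', content)

print(result)

# The spaces around each number are not part of the separator, so they remain
# >> ['Interesting ', ' YouTube ', ' called ', ' Weather ', ' News']
```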
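To get clean words without surrounding spaces, one possible approach is to make the whitespace part of the separator pattern. The sketch below assumes the same sample string used in the `sub` section, and also shows the `maxsplit` argument of `re.split`.

```python
import re

content = 'Interesting 01 YouTube 1 called 2 Weather 3 News'

# Treat each number together with its surrounding whitespace as the separator
words = re.split(r'\s*\d+\s*', content)
print(words)
# >> ['Interesting', 'YouTube', 'called', 'Weather', 'News']

# maxsplit limits how many splits are performed
head = re.split(r'\s*\d+\s*', content, maxsplit=1)
print(head)
# >> ['Interesting', 'YouTube 1 called 2 Weather 3 News']
```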

### sub (works like replace)

```python
import re

content = 'Interesting 01 YouTube 1 called 2 Weather 3 News'

# Replace each run of digits with X
result = re.sub(r'\d+', 'X', content)

print(result)

# >> Interesting X YouTube X called X Weather X News
```
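Unlike `str.replace`, the replacement in `re.sub` can refer back to the matched text or be computed by a function. A brief sketch of both forms; the names `bracketed` and `doubled` are only illustrative.

```python
import re

content = 'Interesting 01 YouTube 1 called 2 Weather 3 News'

# \g<0> in the replacement string stands for the whole match
bracketed = re.sub(r'\d+', r'[\g<0>]', content)
print(bracketed)
# >> Interesting [01] YouTube [1] called [2] Weather [3] News

# A function receives each match object and returns the replacement text
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), content)
print(doubled)
# >> Interesting 2 YouTube 2 called 4 Weather 6 News
```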

### subn

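Does the same job as `sub`.
Whereas `sub` returns a string, `subn` returns a tuple of the new string and the number of substitutions made.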

```python
import re

content = 'Interesting 01 YouTube 1 called 2 Weather 3 News'

# Replace each run of digits with X
result = re.subn(r'\d+', 'X', content)

print(result)

# >> ('Interesting X YouTube X called X Weather X News', 4)
```
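In practice the returned tuple is often unpacked right away. The sketch below also uses the `count` argument to cap the number of replacements; the variable names are chosen for illustration.

```python
import re

content = 'Interesting 01 YouTube 1 called 2 Weather 3 News'

# Unpack the new string and the number of replacements
new_text, n = re.subn(r'\d+', 'X', content)
print(new_text, n)

# count limits how many matches are replaced
limited_text, limited_n = re.subn(r'\d+', 'X', content, count=2)
print(limited_text, limited_n)
# >> Interesting X YouTube X called 2 Weather 3 News 2
```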
