# [Python] Regular Expressions

Notes on using regular expressions in Python with the `re` module.

## Matching functions

### match

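Tries to match the pattern only at the beginning of the string and returns a match object, or `None` if there is no match.

```python
import re

content = 'The high on August 7 is 35 degrees'

result = re.match(r'[A-Za-z]+', content)

if result:
    # The matched substring
    print(result.group())
    # Start index of the match
    print(result.start())
    # End index of the match (one past the last matched character)
    print(result.end())
    # (start, end) as a tuple
    print(result.span())
else:
    print("No match")

# >> The
# >> 0
# >> 3
# >> (0, 3)
```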
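To see what "only at the beginning" means in practice, here is a small sketch contrasting `re.match` with `re.search` on the same string (the digit pattern is just an illustration):

```python
import re

content = 'The high on August 7 is 35 degrees'

# re.match only looks at the start of the string, which begins with "The",
# so a digits-only pattern fails here.
m_match = re.match(r'\d+', content)
print(m_match)  # None

# re.search scans the whole string and returns the first match it finds.
m_search = re.search(r'\d+', content)
print(m_search.group())  # 7
```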

### compile

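`re.compile` turns the pattern into a pattern object whose methods (`match`, `search`, and so on) can be reused without passing the pattern each time.

```python
import re

content = 'The high on August 7 is 35 degrees'
pattern = re.compile(r'[A-Za-z]+')
result = pattern.match(content)
if result:
    # The matched substring
    print(result.group())
    # Start index of the match
    print(result.start())
    # End index of the match (one past the last matched character)
    print(result.end())
    # (start, end) as a tuple
    print(result.span())
else:
    print("No match")

# >> The
# >> 0
# >> 3
# >> (0, 3)
```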
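Compiled pattern objects also let you fix flags once and restrict where matching starts. A minimal sketch using `re.IGNORECASE` and the `pos` argument of `Pattern.match` (the lowercase pattern `august` and the start position 4 are arbitrary choices for illustration):

```python
import re

content = 'The high on August 7 is 35 degrees'

# Flags such as re.IGNORECASE are set once, when the pattern is compiled.
month = re.compile(r'august', re.IGNORECASE)
print(month.search(content).group())  # August

# Pattern.match accepts an optional start position (pos);
# index 4 is where "high" begins.
word = re.compile(r'[A-Za-z]+')
print(word.match(content, 4).group())  # high
```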

#### Comparing compiled and uncompiled patterns

With `re.compile`

```python
import re
import time

contents = ['The high on August 7 is 35.8 degrees.', 'Tropical is a night tonight.', 'The high temperature on August 10 is supposed to be 40.59 degrees.']

# Compile the pattern once, outside the loop
pattern = re.compile(r'[a-zA-Z]+')

start = time.perf_counter()
for _ in range(10**6):
    for content in contents:
        pattern.search(content)
end = time.perf_counter()
print(f'{end-start}sec')

# >> 0.7540798999980325sec
```

Without `re.compile`

```python
import re
import time

contents = ['The high on August 7 is 35.8 degrees.', 'Tropical is a night tonight.', 'The high temperature on August 10 is supposed to be 40.59 degrees.']

# Keep the pattern as a plain string and pass it to re.search on every call
pattern = r'[a-zA-Z]+'

start = time.perf_counter()
for _ in range(10**6):
    for content in contents:
        re.search(pattern, content)
end = time.perf_counter()
print(f'{end-start}sec')

# >> 2.0059483000004548sec
```

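Compiling pays off when the same pattern is used many times, for example inside a loop. Note that module-level functions such as `re.search` keep an internal cache of compiled patterns, so the pattern is not recompiled on every call; the gap above comes mostly from the overhead of that cache lookup on each call. The exact timings depend on the environment.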
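For a less noisy comparison, the standard `timeit` module can run each variant many times. A rough sketch; the helper names `with_compiled` and `with_module_function` are hypothetical, and the absolute numbers will differ from the ones above:

```python
import re
import timeit

contents = ['The high on August 7 is 35.8 degrees.', 'Tropical is a night tonight.', 'The high temperature on August 10 is supposed to be 40.59 degrees.']
pattern = re.compile(r'[a-zA-Z]+')

def with_compiled():
    # Reuse the precompiled pattern object
    for content in contents:
        pattern.search(content)

def with_module_function():
    # Let re.search look the pattern up in its internal cache each time
    for content in contents:
        re.search(r'[a-zA-Z]+', content)

# timeit calls each function `number` times and returns the total elapsed seconds.
t_compiled = timeit.timeit(with_compiled, number=100_000)
t_module = timeit.timeit(with_module_function, number=100_000)
print(f'compiled: {t_compiled:.3f}sec, re.search: {t_module:.3f}sec')
```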

### search

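Scans the whole string and returns a match object for the first location where the pattern matches.

```python
import re

content = 'The high on August 7 is 35 degrees'

result = re.search(r'\d+', content)

if result:
    # If there are several matches, only the first one is returned
    print(result.group())
    print(result.start())
    print(result.end())
    print(result.span())
else:
    print("No match")

# >> 7
# >> 19
# >> 20
# >> (19, 20)
```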
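Since `search` stops at the first match, `re.finditer` is one way to get the position of every match. A short sketch:

```python
import re

content = 'The high on August 7 is 35 degrees'

# re.finditer yields a match object for every match, in order,
# so the span of each match is available (not just the first).
matches = [(m.group(), m.span()) for m in re.finditer(r'\d+', content)]
print(matches)  # [('7', (19, 20)), ('35', (24, 26))]
```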

### findall

Returns all non-overlapping matches as a list of strings.

```python
import re

content = 'The high on August 7 is 35 degrees'

result = re.findall(r'\d+', content)

print(result)

# >> ['7', '35']
```
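When the pattern contains capturing groups, `findall` returns the groups instead of the whole match. A sketch with illustrative patterns:

```python
import re

content = 'The high on August 7 is 35 degrees'

# With one capturing group, each element is the text of that group.
temps = re.findall(r'(\d+) degrees', content)
print(temps)  # ['35']

# With several groups, each element is a tuple of the groups.
pairs = re.findall(r'([A-Za-z]+) (\d+)', content)
print(pairs)  # [('August', '7'), ('is', '35')]
```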

### fullmatch

Returns a match object only if the entire string matches the pattern.

```python
import re

content = 'The high on August 7 is 35 degrees'

result = re.fullmatch(r'\d+', content)

if result:
    print(result.group())
else:
    print("No match")

# >> No match
```
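A sketch of a successful `fullmatch`, plus one subtle difference from `re.match` with `$` (the test strings are made up for illustration):

```python
import re

# fullmatch succeeds only when the pattern covers the whole string.
is_number = re.fullmatch(r'\d+', '2024') is not None  # True
partial = re.fullmatch(r'\d+', '35 degrees')          # None: trailing text

# `$` also matches just before a trailing newline, so re.match with `$`
# accepts '123\n', while fullmatch rejects it.
with_dollar = re.match(r'\d+$', '123\n')
strict = re.fullmatch(r'\d+', '123\n')

print(is_number, partial, with_dollar.group(), strict)  # True None 123 None
```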

### split

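Splits the string at every match of the pattern and returns the pieces as a list.

```python
import re

content = 'Interesting 01 YouTube 1 called 2 Weather 3 News'

# Split at every run of digits.
# re.split always returns a list; if nothing matches, it holds the whole string.
result = re.split(r'\d+', content)

print(result)

# >> ['Interesting ', ' YouTube ', ' called ', ' Weather ', ' News']
```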
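The spaces left in the pieces can be absorbed into the separator pattern. This sketch also shows `maxsplit` and what a capturing group does (the pattern `\s*\d+\s*` and the string `'a1b22c'` are illustrative choices):

```python
import re

content = 'Interesting 01 YouTube 1 called 2 Weather 3 News'

# Including the surrounding whitespace in the separator strips it from the pieces.
words = re.split(r'\s*\d+\s*', content)
print(words)  # ['Interesting', 'YouTube', 'called', 'Weather', 'News']

# maxsplit limits the number of splits; the remainder stays in the last element.
head = re.split(r'\s*\d+\s*', content, maxsplit=2)
print(head)  # ['Interesting', 'YouTube', 'called 2 Weather 3 News']

# A capturing group keeps the separators in the result.
with_seps = re.split(r'(\d+)', 'a1b22c')
print(with_seps)  # ['a', '1', 'b', '22', 'c']
```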

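### sub (like `str.replace`, but pattern-based)

Replaces every match of the pattern with the given string and returns the new string.

```python
import re

content = 'Interesting 01 YouTube 1 called 2 Weather 3 News'

# Replace every run of digits with X
result = re.sub(r'\d+', 'X', content)

print(result)

# >> Interesting X YouTube X called X Weather X News
```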
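The replacement can also refer to captured groups or be computed by a function, and `count` limits the number of replacements. A sketch using the same `content`:

```python
import re

content = 'Interesting 01 YouTube 1 called 2 Weather 3 News'

# \1 in the replacement refers to the text captured by group 1.
bracketed = re.sub(r'(\d+)', r'<\1>', content)
print(bracketed)  # Interesting <01> YouTube <1> called <2> Weather <3> News

# A function receives each match object and returns the replacement string.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), content)
print(doubled)  # Interesting 2 YouTube 2 called 4 Weather 6 News

# count limits how many replacements are made (0, the default, means all).
first_only = re.sub(r'\d+', 'X', content, count=1)
print(first_only)  # Interesting X YouTube 1 called 2 Weather 3 News
```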

### subn

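Does the same job as `sub`, but while `sub` returns only the new string, `subn` returns a tuple `(new_string, number_of_substitutions)`.

```python
import re

content = 'Interesting 01 YouTube 1 called 2 Weather 3 News'

# Replace every run of digits with X
result = re.subn(r'\d+', 'X', content)

print(result)

# >> ('Interesting X YouTube X called X Weather X News', 4)
```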
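Because `subn` returns a tuple, it is convenient to unpack it, for example to check whether anything was replaced at all. A minimal sketch:

```python
import re

content = 'Interesting 01 YouTube 1 called 2 Weather 3 News'

# Unpack the (new_string, number_of_substitutions) tuple.
new_text, n = re.subn(r'\d+', 'X', content)
print(f'{n} replacements: {new_text}')
# 4 replacements: Interesting X YouTube X called X Weather X News

# When nothing matches, the count is 0 and the string is unchanged.
unchanged, n_none = re.subn(r'\d+', 'X', 'no digits here')
print(n_none)  # 0
```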

