[Python] Regular Expressions

Last updated at 2018-10-05, posted at 2016-01-01

String search

re.search(pattern, string)

Finds the first part of the string that matches the pattern.

The r prefix on the pattern marks it as a raw string, so backslashes are passed to re.search unchanged.

In [11]: import re

In [12]: text = 'Less but better'

In [13]: match = re.search(r'\s\w\w\w\s', text)

In [14]: print('found:', match.group())
found:  but 
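To illustrate why the r prefix matters, here is a minimal sketch comparing the same pattern written with and without it. The pattern `\bbut\b` and the variable names are chosen for illustration only and do not come from the original session.

```python
import re

# Without the r prefix, Python interprets '\b' as a backspace character (0x08)
# before the re module ever sees the pattern.
plain = '\bbut\b'
# With the r prefix, the backslashes reach re intact and '\b' means "word boundary".
raw = r'\bbut\b'

text = 'Less but better'

plain_match = re.search(plain, text)  # searches for backspace + 'but' + backspace
raw_match = re.search(raw, text)      # searches for 'but' as a whole word

print(len(plain), len(raw))  # the two pattern strings differ in length
print(plain_match)
print(raw_match.group())
```

The plain pattern finds nothing because the text contains no backspace characters, while the raw pattern finds the word.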

If nothing matches, re.search returns None, so check the result with an if statement before using it.

import re

text = 'Less but better'
match = re.search(r'\s\w\w\w\s', text)
if match:
    print('found:', match.group())
else:
    print('Not found')
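The None check can also be wrapped in a small helper so that callers never touch a missing match object. The function name `find_first` and its `default` parameter are hypothetical, introduced here only as a sketch.

```python
import re

def find_first(pattern, text, default=None):
    """Return the first substring matching pattern, or default if there is none."""
    match = re.search(pattern, text)
    return match.group() if match else default

found = find_first(r'\s\w\w\w\s', 'Less but better')
missing = find_first(r'\d+', 'Less but better', default='Not found')

print(repr(found))
print(missing)
```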

re.findall(pattern, string)

Returns every match as a list.

text = 'He who moves not forward, goes backward.'
matches = re.findall(r'\w{3}', text)

This extracts runs of three word characters. After each match, the search resumes right after it, so matches never overlap.

In [15]: print(matches)
['who', 'mov', 'not', 'for', 'war', 'goe', 'bac', 'kwa']

To extract whole words instead, use \w+ as follows.

In [16]: matches = re.findall(r'\w+', text)

In [17]: matches
Out[17]: ['He', 'who', 'moves', 'not', 'forward', 'goes', 'backward']
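If the goal is words that are exactly three letters long, rather than three-character chunks of longer words, one option is to add word boundaries around the pattern. The following is a sketch of that variation; the `\b` anchors are an addition, not part of the original example.

```python
import re

text = 'He who moves not forward, goes backward.'

# Three-character chunks, cut out of longer words as well.
chunks = re.findall(r'\w{3}', text)

# Only words that are exactly three characters long.
three_letter_words = re.findall(r'\b\w{3}\b', text)

print(chunks)
print(three_letter_words)
```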

re.match(pattern, string)

Checks whether the regular expression matches at the beginning of the string.

In [133]: text = 'Information is not knowledge.'

In [134]: match = re.match(r'I', text, re.M | re.I)

In [135]: print(match)
<re.Match object; span=(0, 1), match='I'>

In [136]: print(match.group())
I

In [137]: match = re.match(r'i', text)

In [138]: print(match)
None
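The difference between re.match and re.search is easiest to see when the pattern occurs somewhere other than the start of the string. A short sketch using the same sentence (variable names are illustrative):

```python
import re

text = 'Information is not knowledge.'

# re.match only tries position 0, so a word in the middle is not found.
at_start = re.match(r'not', text)

# re.search scans the whole string and finds it.
anywhere = re.search(r'not', text)

# re.I (re.IGNORECASE) makes the comparison case-insensitive.
ignore_case = re.match(r'information', text, re.I)

print(at_start)
print(anywhere.group(), anywhere.start())
print(ignore_case.group())
```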

Regular expressions

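Pattern	Meaning
a, A, 9	An ordinary character matches itself
.	Any single character except a newline
\w	Word character (a-z, A-Z, 0-9, and _)
\W	Any non-word character
\s	Whitespace character (space, tab, newline, return)
\S	Any non-whitespace character
\d	Digit (0-9)
\t	Tab
\n	Newline
\r	Carriage return
\b	Word boundary: the zero-width position between a word character and a non-word character, or at the start or end of the string
^	Start of the string
$	End of the string
\	Escapes a special character so that it matches literally (for example, \. matches a period)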
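A few entries from the table can be checked directly with re.findall and re.search. The sample strings below are made up for illustration.

```python
import re

# \d matches digits; + (one or more) groups them into runs.
digits = re.findall(r'\d+', 'Order 66 shipped on 2016-01-01.')

# . does not match a newline by default.
dots = re.findall(r'.', 'a\nb')

# \w includes the underscore but not the hyphen.
word_chars = re.findall(r'\w', 'a_1-b')

# \b keeps 'on' from matching inside 'upon' or 'once'.
whole_on = re.findall(r'\bon\b', 'on, upon, once, on')

# \. is an escaped period; $ anchors it to the end of the string.
final_period = re.search(r'\.$', 'Order 66 shipped on 2016-01-01.')

print(digits, dots, word_chars, whole_on, final_period.group())
```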

[ ]

[] denotes a character set. [abc] matches a single character that is a, b, or c.
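A brief sketch of character sets in use. Besides the plain [abc] form described above, it also shows ranges such as [a-c] and negation with [^...], which are standard syntax but are not covered in the original text.

```python
import re

# Any one of a, b, or c.
any_abc = re.findall(r'[abc]', 'backward')

# A range is shorthand for the same set.
range_abc = re.findall(r'[a-c]', 'backward')

# A leading ^ inside the brackets negates the set.
not_abc = re.findall(r'[^abc]', 'backward')

# A set can sit inside a longer pattern to allow spelling variants.
variants = re.findall(r'gr[ae]y', 'grey or gray')

print(any_abc, range_abc, not_abc, variants)
```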

Repetition

Pattern	Meaning
*	Zero or more repetitions
+	One or more repetitions
?	Zero or one repetition
{n}	Exactly n repetitions
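The four quantifiers can be compared side by side by applying each to the same input. The sample string 'a ab abb' is chosen only for illustration.

```python
import re

text = 'a ab abb'

star = re.findall(r'ab*', text)       # 'a' followed by zero or more 'b'
plus = re.findall(r'ab+', text)       # 'a' followed by one or more 'b'
optional = re.findall(r'ab?', text)   # 'a' followed by at most one 'b'
exactly = re.findall(r'ab{2}', text)  # 'a' followed by exactly two 'b'

print(star, plus, optional, exactly)
```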

group

Enclosing part of a pattern in () turns it into a group, which lets you extract individual parts of the match.

import re

text = 'Change before you have to.'

# Five groups: four words, then a final word that may contain a period.
match = re.search(r'(\w+)\s(\w+)\s(\w+)\s(\w+)\s([\w.]+)', text)
if match:
    print('found:', match.group())
else:
    print('Not found')

group() ... the entire matched text
groups() ... a tuple of all the captured groups
group(n) ... the n-th group. Numbering starts at 1, so the first group is group(1).

In [15]: print(match.group())
Change before you have to.

In [16]: print(match.groups())
('Change', 'before', 'you', 'have', 'to.')

In [17]: print(match.group(1))
Change
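As a small supplement to the three accessors above, the sketch below shows that group(0) is the same as group(), that group() accepts several indices at once, and that groups() can be unpacked into variables. It uses a shorter two-group pattern for readability.

```python
import re

match = re.search(r'(\w+)\s(\w+)', 'Change before you have to.')

whole = match.group(0)          # same as match.group()
pair = match.group(1, 2)        # several groups at once, returned as a tuple
first, second = match.groups()  # unpack all groups into variables

print(whole)
print(pair)
print(first, second)
```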

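Using groups with findall: when the pattern contains more than one group, findall returns a list of tuples, one tuple per match.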

In [18]: text = 'aaa:111 bbbb:2222 ccccc:33333'  # example input; the original omits it

In [19]: matches = re.findall(r'(\w+):(\d+)', text)

In [20]: matches
Out[20]: [('aaa', '111'), ('bbbb', '2222'), ('ccccc', '33333')]
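The list of tuples returned by findall converts naturally into a dictionary. The input string below is a hypothetical one chosen to be consistent with the output shown above, since the original does not give it.

```python
import re

# Hypothetical input, consistent with the output shown above.
text = 'aaa:111 bbbb:2222 ccccc:33333'

# Two groups: each match becomes a (name, number) tuple.
pairs = re.findall(r'(\w+):(\d+)', text)

# Build a lookup table, converting the digit strings to integers.
lookup = {name: int(number) for name, number in pairs}

# With a single group, findall returns plain strings instead of tuples.
names = re.findall(r'(\w+):\d+', text)

print(pairs)
print(lookup)
print(names)
```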

I referred to several web pages when writing this article.
