Extracting kaomoji (Japanese emoticons) with a regular expression in Python

Last updated at 2019-09-10, posted at 2019-09-06

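import re

# Match "(" ... ")" whose contents contain no hiragana, katakana, or kanji.
# Excluded ranges: hiragana (あ-ん), katakana (U+30A1-U+30F4), and kanji
# (CJK radicals, 々〆〇, CJK unified ideographs and their extensions).
pattern = re.compile(r'\([^あ-ん\u30A1-\u30F4\u2E80-\u2FDF\u3005-\u3007\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\U00020000-\U0002EBEF]+?\)')
text = "私は今幸せです(^^)"  # "I am happy right now (^^)"
res = pattern.findall(text)
print(res)  # -> ['(^^)']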

Explanation

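The pattern extracts the shortest match enclosed in () whose contents include no hiragana, katakana, or kanji.
The rule is meant to skip the ordinary use of () in Japanese text, so that only kaomoji are extracted.
An ordinary parenthetical such as つらい(とてもつらい) ("painful (very painful)") is therefore not matched.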
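To make the rule concrete, here is a small check that runs the same pattern against a few sample strings. The name `KAOMOJI` and the sample sentences are my own and only serve as illustration; the pattern itself is unchanged, just split across two string literals for readability.

```python
import re

# Same pattern as in the article: parentheses whose contents contain
# no hiragana, katakana, or kanji (lazy match up to the first ")").
KAOMOJI = re.compile(
    r'\([^あ-ん\u30A1-\u30F4\u2E80-\u2FDF\u3005-\u3007\u3400-\u4DBF'
    r'\u4E00-\u9FFF\uF900-\uFAFF\U00020000-\U0002EBEF]+?\)'
)

# Illustrative inputs (not from the original article, except the first and last).
samples = [
    "つらい(とてもつらい)",        # ordinary parenthetical only
    "つらい(とてもつらい)(T_T)",   # parenthetical followed by a kaomoji
    "私は今幸せです(^^)",          # kaomoji only
]

results = [KAOMOJI.findall(s) for s in samples]
for sample, found in zip(samples, results):
    print(sample, "->", found)
```

The first sample yields an empty list because the character right after "(" is hiragana, so the negated character class cannot match even once.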

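Conversely, more elaborate kaomoji that include kanji or hiragana, such as (^日^), cannot be extracted.
I hope you will adjust the rule as needed based on the results you get.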
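As one possible way to adjust the rule, the sketch below first collects every parenthesized span and then keeps those with at most a small number of Japanese characters. This is not the method from the article: `extract_relaxed`, `max_japanese`, and the helper patterns are hypothetical names, and the threshold of one Japanese character is an assumption you would want to tune on your own data.

```python
import re

# Character ranges treated as "Japanese" in the article's pattern.
JP = (r'あ-ん\u30A1-\u30F4\u2E80-\u2FDF\u3005-\u3007\u3400-\u4DBF'
      r'\u4E00-\u9FFF\uF900-\uFAFF\U00020000-\U0002EBEF')

STRICT = re.compile(rf'\([^{JP}]+?\)')   # the article's rule
PAREN = re.compile(r'\([^()]+\)')        # any non-nested parenthesized span
JP_CHAR = re.compile(rf'[{JP}]')         # a single Japanese character


def extract_relaxed(text, max_japanese=1):
    """Hypothetical variant: tolerate up to `max_japanese` Japanese characters."""
    found = []
    for candidate in PAREN.findall(text):
        inner = candidate[1:-1]
        n_jp = len(JP_CHAR.findall(inner))
        # Require at least one non-Japanese character so that "(日)" is rejected.
        if n_jp <= max_japanese and len(inner) > n_jp:
            found.append(candidate)
    return found


text = "今日は晴れ(^日^)でも眠い(とても眠い)(^^)"
strict_result = STRICT.findall(text)
relaxed_result = extract_relaxed(text)
print(strict_result)
print(relaxed_result)
```

The trade-off is more false positives: a short parenthetical such as (1日) would also pass this relaxed filter, so it is worth checking the output before relying on it.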

Articles I referred to

https://note.nkmk.me/python-re-regex-character-type/
http://www-creators.com/archives/1804
http://www-creators.com/archives/1827
