Extracting kaomoji (emoticons) with a regular expression in Python

Posted at 2019-09-06

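import re

# Match the shortest run enclosed in ( ) that contains no hiragana, katakana, or kanji.
m = re.compile(r'\([^あ-ん\u30A1-\u30F4\u2E80-\u2FDF\u3005-\u3007\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\U00020000-\U0002EBEF]+?\)')
text = "私は今幸せです(^^)"  # "I am happy now (^^)"
res = m.findall(text)
print(res) # -> ['(^^)']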

Explanation

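The pattern extracts the shortest match enclosed in () that contains no hiragana, katakana, or kanji.
The rule is meant to pick up only kaomoji while skipping the ordinary use of () in Japanese text.
e.g. つらい(とてもつらい), "It's hard (really hard)", is not matched.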
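To see the rule in action, here is a minimal sketch that runs the article's pattern over a few sample strings. The sample sentences other than the first two, and the name `KAOMOJI`, are made up for illustration.

```python
import re

# Same pattern as in the article; KAOMOJI is just an illustrative name.
KAOMOJI = re.compile(r'\([^あ-ん\u30A1-\u30F4\u2E80-\u2FDF\u3005-\u3007\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\U00020000-\U0002EBEF]+?\)')

samples = [
    "私は今幸せです(^^)",            # kaomoji only
    "つらい(とてもつらい)",          # ordinary Japanese parenthetical
    "やった(^o^)また明日(あした)",   # both kinds in one sentence (made-up sample)
]

# One list of matches per sample string.
results = [KAOMOJI.findall(s) for s in samples]
for s, r in zip(samples, results):
    print(s, "->", r)
# 私は今幸せです(^^) -> ['(^^)']
# つらい(とてもつらい) -> []
# やった(^o^)また明日(あした) -> ['(^o^)']
```

Parentheticals whose content starts with or contains Japanese characters are rejected as soon as the negated character class fails, so only the face-like runs survive.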

Conversely, more elaborate kaomoji that contain kanji or hiragana, such as (^日^), cannot be extracted.
I hope you will check your results and adjust the rule as needed.
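As one possible way to adjust the rule, the sketch below allows a small hand-picked set of Japanese characters inside the parentheses. This is not part of the original article: the names `JP`, `ALLOWED`, `original`, and `relaxed`, and the choice of 日 as an allowed character, are assumptions for illustration only.

```python
import re

# Character ranges the article's rule treats as Japanese text.
JP = r'あ-ん\u30A1-\u30F4\u2E80-\u2FDF\u3005-\u3007\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\U00020000-\U0002EBEF'

# The article's pattern, rebuilt from the JP ranges.
original = re.compile(r'\([^' + JP + r']+?\)')

# Hypothetical tweak: additionally accept a few explicitly listed characters.
ALLOWED = '日'
relaxed = re.compile(r'\((?:[^' + JP + r']|[' + ALLOWED + r'])+?\)')

text = "晴れた(^日^)つらい(とてもつらい)"  # made-up sample
print(original.findall(text))  # -> []
print(relaxed.findall(text))   # -> ['(^日^)']
```

The trade-off is that every character added to `ALLOWED` can also produce false positives: with this sketch, an ordinary parenthetical such as (日) would be reported as a kaomoji.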

Reference articles

https://note.nkmk.me/python-re-regex-character-type/
http://www-creators.com/archives/1804
http://www-creators.com/archives/1827
