
String operations in Python

Posted at 2014-07-16

##Summary of basic string operations in Python

I keep forgetting these, so here are my notes.

###Checking whether a string contains a substring

"ho" in "hoge" # True
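
As a small sketch of related checks (the variable `text` is only illustrative and not part of the original note): `in` is case-sensitive, `str.find` returns the position of the match or -1, and `str.startswith` tests a prefix.

```python
text = "hoge"

contains = "ho" in text           # substring test
contains_upper = "HO" in text     # the test is case-sensitive
position = text.find("ge")        # index of the first match
missing = text.find("xx")         # -1 when there is no match
has_prefix = text.startswith("ho")

print(contains, contains_upper, position, missing, has_prefix)
# True False 2 -1 True
```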

###Replacing and deleting substrings

"hogehoge".replace('h', 'k') # "kogekoge"
"hogehoge".replace('h', '') # "ogeoge"
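
If only some occurrences should change, `str.replace` takes an optional third argument limiting the number of replacements. For several single-character substitutions at once, `str.translate` is one option. The sketch below assumes Python 3; the names `s`, `first_only`, and `table` are illustrative.

```python
s = "hogehoge"

# Replace only the first occurrence of 'h'.
first_only = s.replace("h", "k", 1)

# Map 'h' to 'k' and delete 'g' in a single pass.
table = str.maketrans({"h": "k", "g": None})
translated = s.translate(table)

print(first_only)   # kogehoge
print(translated)   # koekoe
```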

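###Joining a list into a string (the elements must, of course, be convertible to strings)

l = [1, 2, 3]  # example list; any elements that str() accepts will do
"".join([str(x) for x in l]) # "123" (a plain string)
"\t".join([str(x) for x in l]) # "1\t2\t3" (one TSV row)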
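
The `str(x)` conversion matters because `join` raises `TypeError` when any element is not a string. A minimal sketch, using a made-up list `items`:

```python
items = [1, "two", 3.0]

# Joining non-string elements directly fails.
try:
    "".join(items)
    failed = False
except TypeError:
    failed = True

# Converting each element first works; a generator avoids the temporary list.
joined = "".join(str(x) for x in items)
tsv_row = "\t".join(map(str, items))

print(failed)          # True
print(joined)          # 1two3.0
print(repr(tsv_row))   # '1\ttwo\t3.0'
```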

###Counting occurrences of a substring

"hogehoge".count('h') # 2
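
One thing to keep in mind is that `str.count` counts non-overlapping occurrences. If overlapping matches are needed, a zero-width lookahead with `re` is one possible workaround. The names below are illustrative.

```python
import re

text = "aaaa"

# str.count does not count overlapping matches.
non_overlapping = text.count("aa")

# A lookahead matches at every position where "aa" starts.
overlapping = len(re.findall("(?=aa)", text))

print(non_overlapping)  # 2
print(overlapping)      # 3
```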

###Counting words (assumes the text has already been morphologically analyzed and is space-delimited)

from collections import Counter
Counter("hoge nga kuke".split()) # Counter({'hoge': 1, 'nga': 1, 'kuke': 1})
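
A `Counter` becomes more useful once words repeat. The following sketch uses a made-up input string; `most_common` returns the top entries, and a missing key yields 0 instead of raising `KeyError`.

```python
from collections import Counter

words = "hoge nga hoge kuke hoge nga".split()
counts = Counter(words)

top_two = counts.most_common(2)   # most frequent words first
absent = counts["missing"]        # unseen words count as 0

print(top_two)  # [('hoge', 3), ('nga', 2)]
print(absent)   # 0
```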

###Converting between characters and their ASCII codes (Unicode code points in Python 3)

ord('a') # 97
chr(97) # 'a'
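
In Python 3, `ord` and `chr` work on Unicode code points, so the same calls also apply to Japanese characters. A short sketch assuming Python 3:

```python
code = ord("あ")          # code point of HIRAGANA LETTER A
as_hex = hex(code)        # the usual U+XXXX notation, in hex
back = chr(0x3042)        # code point back to the character

print(code)    # 12354
print(as_hex)  # 0x3042
print(back)    # あ
```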

###Extracting Japanese with a regular expression (hiragana)

import re
jap = re.compile("[ぁ-ん]") # hiragana from ぁ (U+3041) to ん (U+3093); [あ-ん] would miss the small ぁ
print(jap.findall("ほげほげnga区毛")) # ['ほ', 'げ', 'ほ', 'げ']

For kanji and symbols, you have to work it out from the Unicode code point ranges.
Unicode (Wikipedia): http://ja.wikipedia.org/wiki/Unicode
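
As a rough sketch of what working with Unicode ranges can look like: the patterns below assume the basic Hiragana (U+3041 to U+3093), Katakana (U+30A1 to U+30FA, plus the long-vowel mark U+30FC), and CJK Unified Ideographs (U+4E00 to U+9FFF) ranges. These are my assumptions, not ranges given in the original note, and they do not cover extension blocks or every symbol.

```python
import re

# Assumed ranges; extend them if rarer characters need to be matched.
hiragana = re.compile("[\u3041-\u3093]+")
katakana = re.compile("[\u30a1-\u30fa\u30fc]+")
kanji = re.compile("[\u4e00-\u9fff]+")

text = "ほげホゲnga区毛"

found_hiragana = hiragana.findall(text)
found_katakana = katakana.findall(text)
found_kanji = kanji.findall(text)

print(found_hiragana)  # ['ほげ']
print(found_katakana)  # ['ホゲ']
print(found_kanji)     # ['区毛']
```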

*A side note
When handling Japanese in Python, use Python 3 (a reminder to myself).
Reason: http://www.pythonweb.jp/tutorial/string/index5.html
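
One likely reason, shown here as a sketch under the assumption of Python 3 and UTF-8: `str` is a sequence of Unicode characters, so length and indexing operate per character, while the encoded `bytes` form has a different length.

```python
s = "ほげ"

char_count = len(s)            # counts characters
first_char = s[0]              # indexing yields a whole character

encoded = s.encode("utf-8")    # explicit conversion to bytes
byte_count = len(encoded)      # each of these hiragana takes 3 bytes in UTF-8
round_trip = encoded.decode("utf-8")

print(char_count, first_char)  # 2 ほ
print(byte_count)              # 6
print(round_trip == s)         # True
```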