A word count in Python that counts only words beginning with an uppercase letter

Posted at 2017-06-01

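While writing natural language processing code in Python, I needed a counter for capitalized words only, so I am noting it down here so I don't forget.
・Counts words that begin with an uppercase letter
・Only words where the first letter alone is uppercase are counted
・An uppercase first letter followed by digits, and similar forms, is also counted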
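
The three rules above can be checked on a few tokens before running the full script. This is a minimal sketch that reuses the regular expression from this article; the sample tokens are hypothetical and chosen only to illustrate which forms are accepted and which are rejected.

```python
import re

# Same pattern as used in this article: a single uppercase letter, or one
# uppercase letter followed by lowercase letters and/or digits.
CAPITALIZED = re.compile(r"^[A-Z]$|^[A-Z][a-z0-9]+$")

# Hypothetical sample tokens (not from the article's text).
samples = ["Mitchell", "A", "B52", "NASA", "iPhone", "african", "African-Americans"]

# Keep only the tokens that satisfy the rules.
matched = [token for token in samples if CAPITALIZED.match(token)]
print(matched)  # ['Mitchell', 'A', 'B52']
```

All-uppercase tokens such as "NASA" are rejected because only the first letter may be uppercase, and "African-Americans" is rejected because the pattern does not allow a hyphen.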

Note: I wrote this in a Jupyter notebook, so the code has no language declaration or similar header.

import re
from collections import Counter

org_text = """
Of course a writer can write from the viewpoint of Southern slave owners who of course would be racist and 
view black people with, at best, benevolent paternalism, without her characters' 
attitudes necessarily being her own. But it's very apparent that Mitchell was wholly and uncritically sympathetic to her antebellum ancestors. 
The Old South was a graceful, chivalrous land where slavery was not a horrible and oppressive institution creating generations of misery and oppression, 
but a divinely-ordained means of preserving racial harmony. And for all that Mitchell, 
like her characters, probably considered herself to be kind and affectionate to all the African-Americans she knew personally, 
there isn't a single black character in the book who isn't an ignorant, semi-human ape -- which is literally how they are described. 
Even beloved Mammy is repeatedly compared to a monkey.
"""

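# Split the text into a list of tokens on whitespace, commas, periods, and parentheses
words = re.split(r'\s|\,|\.|\(|\)', org_text)

# Keep only the words that begin with an uppercase letter
r = re.compile(r"^[A-Z]$|^[A-Z][a-z0-9]+$")
dict_word = [x for x in words if r.match(x)]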

counter=Counter(dict_word)

for word,count in counter.most_common():
    print("%s,%d" % (word,count))
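
For reuse outside a notebook, the same steps can be wrapped in a function. The sketch below keeps the split pattern and the match pattern from the script above; the function name `count_capitalized` and the short sample sentence are assumptions made for illustration and are not part of the original memo.

```python
import re
from collections import Counter

# Patterns taken from the script above.
SPLIT_PATTERN = re.compile(r"\s|\,|\.|\(|\)")
CAPITALIZED = re.compile(r"^[A-Z]$|^[A-Z][a-z0-9]+$")


def count_capitalized(text):
    """Count tokens that start with an uppercase letter (hypothetical helper)."""
    tokens = SPLIT_PATTERN.split(text)
    # Empty strings produced by consecutive separators never match the pattern.
    return Counter(token for token in tokens if CAPITALIZED.match(token))


# Hypothetical sample sentence.
result = count_capitalized("The Old South was old. The end.")
for word, count in result.most_common():
    print("%s,%d" % (word, count))
```

Because the split pattern does not separate on hyphens or apostrophes, tokens such as "African-Americans" or "characters'" stay whole and are not counted. If those should be counted, the split pattern would need extra separators.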

