More than 5 years have passed since last update.

Looking at the Words of a Novel with a Word Cloud

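While studying natural language processing, I played around with WordCloud.
Specifying a mask lets you produce stylish graphics, and it is also handy as a visualization tool after morphological analysis.
The text is Rashomon (羅生門).
The Rashomon txt file itself comes from Aozora Bunko (青空文庫).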
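
Aozora Bunko distributes its text files as zip archives, and the script imports zipfile and urllib.request, so the text was presumably unpacked from a downloaded archive. The following is a minimal sketch of that unpacking step, not the original code. The helper name `read_text_from_zip` is hypothetical, and a tiny in-memory archive stands in for the real download so that no URL has to be assumed.

```python
import io
import zipfile


def read_text_from_zip(zip_bytes, encoding="shift_jis"):
    """Return the decoded contents of the first .txt member of a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Pick the first member whose name ends in .txt
        txt_names = [n for n in archive.namelist() if n.lower().endswith(".txt")]
        if not txt_names:
            raise ValueError("no .txt file found in the archive")
        return archive.read(txt_names[0]).decode(encoding)


# Build a tiny in-memory archive as a stand-in for a real download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample.txt", "羅生門".encode("shift_jis"))

sample_text = read_text_from_zip(buffer.getvalue())
print(sample_text)
```

With a real download, the bytes returned by `urllib.request.urlopen(...).read()` could be passed to the same helper.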

from janome.tokenizer import Tokenizer
import zipfile
import os.path, urllib.request as request
from wordcloud import WordCloud
import matplotlib.pyplot as plt
%matplotlib inline


# Read the file as raw bytes, then decode it as Shift_JIS
with open('rashomon.txt', 'rb') as f:
    bindata = f.read()
textdata = bindata.decode('shift_jis')
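
Aozora Bunko text files usually carry ruby readings in 《》 and editorial notes in ［＃…］, which would otherwise be tokenized along with the story. The sketch below shows one way such markup might be stripped before tokenizing. It is an assumption that the downloaded file contains this markup, the helper name `strip_aozora_markup` is hypothetical, and the sample sentence is only illustrative.

```python
import re


def strip_aozora_markup(text):
    """Remove ruby readings and editorial notes written in Aozora Bunko style."""
    text = re.sub(r"《[^》]*》", "", text)    # ruby readings such as 《げにん》
    text = re.sub(r"［＃[^］]*］", "", text)  # editorial notes such as ［＃...］
    text = text.replace("｜", "")             # marker for the start of a ruby base
    return text


sample = "一人の下人《げにん》が、羅生門《らしょうもん》の下で雨やみを待っていた。"
cleaned = strip_aozora_markup(sample)
print(cleaned)
```

The header and footer of the file (title, notes on the source edition) are not handled here and would need separate trimming.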


t=Tokenizer()
tokens=t.tokenize(textdata)

Extract the words. Japanese text comes out garbled unless a font is specified, so fpath is set to the path of a downloaded Noto Sans font.


# Keep only the base forms of nouns, verbs, and adverbs, joined by spaces
words = ""
for token in tokens:
    if token.part_of_speech.split(',')[0] in ['名詞', '動詞', '副詞']:
        words = words + " " + token.base_form

fpath="NotoSansCJKjp-hinted/NotoSansCJKjp-Black.otf"
wordcloud = WordCloud(background_color="white",width=800,height=500,font_path=fpath).generate(words)
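
As an alternative to building one long space-separated string, the base forms can be counted first, which also makes the most frequent words easy to inspect. This is only a sketch: `FakeToken` is a hypothetical stand-in exposing the two attributes read from janome tokens, and `count_base_forms` is an illustrative helper name.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for janome tokens: only the two attributes used here.
FakeToken = namedtuple("FakeToken", ["part_of_speech", "base_form"])


def count_base_forms(tokens, keep_pos=("名詞", "動詞", "副詞")):
    """Count base forms of tokens whose top-level part of speech is in keep_pos."""
    counts = Counter()
    for token in tokens:
        if token.part_of_speech.split(",")[0] in keep_pos:
            counts[token.base_form] += 1
    return counts


sample_tokens = [
    FakeToken("名詞,一般,*,*", "下人"),
    FakeToken("助詞,格助詞,一般,*", "が"),
    FakeToken("動詞,自立,*,*", "待つ"),
    FakeToken("名詞,一般,*,*", "下人"),
]
freqs = count_base_forms(sample_tokens)
print(freqs.most_common(2))
```

With the wordcloud package installed, the resulting counts could be passed to `WordCloud(font_path=fpath).generate_from_frequencies(freqs)` instead of calling `generate(words)`.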


plt.figure(figsize=(30,24))
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

[Figure: the generated word cloud]

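The verbs いる ("to be") and する ("to do") come out larger than I would like, but the characters 老婆 (the old woman) and 下人 (the servant) are, as expected, prominent.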
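
One possible way to keep いる and する from dominating is to drop them while collecting the words. The sketch below assumes a hand-picked stop word set and uses a hypothetical `FakeToken` stand-in for janome tokens. It is not part of the original experiment.

```python
from collections import namedtuple

# Hypothetical stand-in for janome tokens: only the two attributes used here.
FakeToken = namedtuple("FakeToken", ["part_of_speech", "base_form"])

# Assumed stop word list; extend it with other common verbs as needed.
STOP_WORDS = {"いる", "する"}


def join_content_words(tokens, keep_pos=("名詞", "動詞", "副詞"), stop_words=STOP_WORDS):
    """Join base forms with spaces, skipping unwanted parts of speech and stop words."""
    kept = [
        token.base_form
        for token in tokens
        if token.part_of_speech.split(",")[0] in keep_pos
        and token.base_form not in stop_words
    ]
    return " ".join(kept)


sample_tokens = [
    FakeToken("名詞,一般,*,*", "老婆"),
    FakeToken("動詞,自立,*,*", "いる"),
    FakeToken("名詞,一般,*,*", "下人"),
    FakeToken("動詞,自立,*,*", "する"),
    FakeToken("動詞,自立,*,*", "見る"),
]
filtered_words = join_content_words(sample_tokens)
print(filtered_words)
```

WordCloud also accepts a `stopwords` argument, which may be a simpler option when the text is passed through `generate`.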
