
Text Mining with Python ② Visualizing with Word Cloud

Last updated at 2017-03-13, posted at 2017-03-13

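Trying out text mining with Python (Python 3 is assumed).
The series proceeds in the following steps.

① Morphological analysis (previous article)
② Visualizing with Word Cloud (this article)
③ Running morphological analysis on a Japanese document and visualizing it with Word Cloud (next article)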

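What is a Word Cloud

According to the goo dictionary, it is "a technique that selects multiple words that appear frequently in a text and displays them at sizes corresponding to their frequency. (...) By varying not only the size of the characters but also their color, typeface, and orientation, the content of the text can be conveyed at a glance."

In short, something like this ↓.

User Local offers a free web service for this, but here I'll try doing it in Python.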
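The definition quoted above boils down to two steps: count how often each word occurs, then scale each word's display size by its count. The following is a minimal sketch of that idea using only the standard library. The function names, the linear scaling, and the size range are assumptions made for illustration; this is not how the word_cloud library computes sizes internally.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Count lowercased words and return the top_n most common as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


def font_sizes(freqs, min_size=10, max_size=60):
    """Scale counts linearly so the most frequent word gets max_size (illustrative only)."""
    top = max(count for _, count in freqs)
    return {word: max(min_size, round(max_size * count / top)) for word, count in freqs}


sample = "we will face challenges we will confront hardships but we will get the job done"
freqs = word_frequencies(sample, top_n=3)
sizes = font_sizes(freqs)
```

Here "we" and "will" each occur three times, so both receive the largest size, while a word that occurs once receives a third of it.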

Word Cloud libraries

For Python there appears to be word_cloud, published by Andreas Mueller, so I'll try that.

Installing word_cloud

It looks like it can be installed with pip.

sudo pip3 install wordcloud
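To confirm that the package is visible to the Python 3 interpreter you intend to use, a quick check along these lines may help. It is a small sketch using the standard library; the helper name `is_installed` is made up for this example, and the module name `wordcloud` is the one imported in the sample script.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


# The package is imported as "wordcloud" in the sample script.
wordcloud_available = is_installed("wordcloud")
```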

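It installed without any trouble, so let's try it out.
It seems the input has to be a string of words separated by half-width (ASCII) spaces, so for now I'll use English text: the opening of President Trump's inaugural address.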

wordcloud_sample.py

# coding: utf-8
from wordcloud import WordCloud

text = "Chief Justice Roberts, President Carter, President Clinton, President \
		Bush, President Obama, fellow Americans, and people of the world: \
		thank you. We, the citizens of America, are now joined in a great \
		national effort to rebuild our country and to restore its promise for \
		all of our people. \
		Together, we will determine the course of America and the world for \
		years to come. \
		We will face challenges. We will confront hardships. But we will get \
		the job done. \
		Every four years, we gather on these steps to carry out the orderly \
		and peaceful transfer of power, and we are grateful to President Obama \
		and First Lady Michelle Obama for their gracious aid throughout this \
		transition. They have been magnificent."

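# Configure the output image, then generate the cloud from the text.
# font_path is specific to this environment; point it at a font that exists on your system.
wordcloud = WordCloud(background_color="white",
	font_path="/usr/share/fonts/truetype/takao-gothic/TakaoPGothic.ttf",
	width=800, height=600).generate(text)

# Write the rendered image to a PNG file.
wordcloud.to_file("./wordcloud_sample.png")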

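In brief: create a WordCloud object, passing the settings for the output image as arguments; pass the string to be drawn to the generate() method to build the cloud; then write it to an image file with the to_file() method.
See the official reference for the constructor arguments.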
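The font_path in the sample is hard-coded to one machine's font location. As a hedged variation, the sketch below picks the first font file that actually exists and otherwise passes None, in which case WordCloud falls back to its bundled default font. The helper names `first_existing_font` and `build_wordcloud` and the candidate list are assumptions made for this example, not part of the library.

```python
import os

# Candidate paths are examples only; adjust them to fonts present on your system.
FONT_CANDIDATES = [
    "/usr/share/fonts/truetype/takao-gothic/TakaoPGothic.ttf",  # path used in the sample
]


def first_existing_font(candidates):
    """Return the first path that exists as a file, or None if none do."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def build_wordcloud(text, candidates=FONT_CANDIDATES):
    """Build a WordCloud with the same settings as the sample script."""
    # Imported here so the helper above can be used without the package installed.
    from wordcloud import WordCloud

    return WordCloud(
        background_color="white",
        font_path=first_existing_font(candidates),  # None selects the default font
        width=800,
        height=600,
    ).generate(text)
```

Calling `build_wordcloud(text).to_file("./wordcloud_sample.png")` would then write the image in the same way as the sample. Note that the default font may not contain Japanese glyphs, so a Japanese-capable font is still needed for Japanese text.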

Running wordcloud_sample.py produces the image shown at the top of this article.

python3 wordcloud_sample.py
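The sample works because English text is already separated by spaces. For input that is not, such as the list of tokens a morphological analyzer produces, the tokens need to be joined with half-width spaces first. A minimal sketch, assuming the tokens are already available as a Python list (the token list and the helper name `to_space_delimited` are hypothetical):

```python
def to_space_delimited(tokens):
    """Join tokens with single half-width spaces, skipping empty or blank ones."""
    return " ".join(token.strip() for token in tokens if token.strip())


# Hypothetical tokenizer output, for illustration only.
tokens = ["Python", "で", "テキスト", "マイニング"]
text = to_space_delimited(tokens)
```

The resulting string can be passed to generate() in place of the English text.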

Sites referenced
