
More than 3 years have passed since last update.

Trying out wordcloud to visualize the characteristics of COVID-19 case data


Overview

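  • Fetch the case information for novel coronavirus (COVID-19) infections in Japan
  • Run morphological analysis with MeCab
  • Visualize the characteristic words with wordcloud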

References

Novel coronavirus (COVID-19) case information

config

  • config.py
import re
import os

### MeCab
# part-of-speech IDs (node.posid, IPADIC numbering) whose words are kept
POS_LIST = [10, 11, 31, 32, 34]
POS_LIST.extend(list(range(36,50)))
POS_LIST.extend([59, 60, 62, 67])
# base forms that are always ignored
STOP_WORDS = ["する", "ない", "なる", "もう", "しよ", "でき", "なっ", "くっ", "やっ", "ある", "しれ", "思う", "今日", "それ", "これ", "あれ", "どれ", "どの", "NULL", "れる", "なり", "あっ", "できる", ""]
RE_ALPHABET = re.compile("^[0-9a-zA-Z0-9 .,*<>]+$") # only ASCII letters, digits, space and . , * < >
current_dir = os.getcwd()
# the word cloud image is written to the current working directory
OUTPUT_PNG_FILE = os.path.join(current_dir, "wordcloud.png")
  • Preprocessing

(omitted)
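The settings in config.py are applied token by token in the morphological analysis step. The sketch below restates that filter as a standalone predicate so the rule can be checked without MeCab installed. `keep_token` is a hypothetical name, STOP_WORDS is shortened, and the posid values in the sample tokens are arbitrary test values, not real IPADIC IDs.

```python
import re

# Settings copied from config.py (STOP_WORDS shortened for this example)
POS_LIST = [10, 11, 31, 32, 34] + list(range(36, 50)) + [59, 60, 62, 67]
STOP_WORDS = {"する", "ない", "なる", "ある", "これ", "それ", ""}
RE_ALPHABET = re.compile("^[0-9a-zA-Z .,*<>]+$")


def keep_token(base_form: str, posid: int) -> bool:
    """Hypothetical helper: True if a token passes every filter from config.py."""
    if RE_ALPHABET.match(base_form):  # ASCII-only tokens such as "COVID" or "*"
        return False
    if base_form in STOP_WORDS:  # frequent words that carry little meaning
        return False
    if len(base_form) <= 1:  # single characters
        return False
    return posid in POS_LIST  # only the selected part-of-speech IDs


# (base form, posid) pairs; the posid values are arbitrary test values
tokens = [("男性", 38), ("する", 31), ("COVID", 38), ("県", 38), ("マスク", 38), ("感染", 99)]
kept = [word for word, posid in tokens if keep_token(word, posid)]
print(kept)  # ['男性', 'マスク']
```

The order of the checks does not change the result; it only mirrors the order used in create_mecab_list.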

Morphological analysis

import MeCab
from config import POS_LIST, STOP_WORDS, RE_ALPHABET

def create_mecab_list(text_list):
	"""Return the base forms of the content words found in text_list."""
	mecab_list = []
	# use the mecab-ipadic-NEologd dictionary
	mecab = MeCab.Tagger("-Ochasen -d /usr/local/lib/mecab/dic/mecab-ipadic-neologd") # MacOS
	mecab.parse("") # common workaround for broken nodes from parseToNode in older mecab-python bindings
	for text in text_list:
		node = mecab.parseToNode(text)
		while node:
			# feature: [POS, POS subcategory 1, subcategory 2, subcategory 3, conjugation form, conjugation type, base form, reading, pronunciation]
			# e.g. 忙しく  形容詞,自立,*,*,形容詞・イ段,連用テ接続,忙しい,イソガシク,イソガシク
			morpheme = node.feature.split(",")[6]
			# skip ASCII-only tokens, stop words, single characters and unwanted parts of speech
			if (not RE_ALPHABET.match(morpheme)
					and morpheme not in STOP_WORDS
					and len(morpheme) > 1
					and node.posid in POS_LIST):
				mecab_list.append(morpheme)
			node = node.next
	return mecab_list
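create_mecab_list reads the base form from index 6 of node.feature. With IPADIC-style dictionaries, unknown words usually carry "*" in that field, so they match RE_ALPHABET and are silently dropped. If you would rather keep them, one option (not what the code above does) is to fall back to the surface form. A minimal sketch, assuming IPADIC-style feature strings; `base_form` is a hypothetical helper that works on plain strings, so it runs without MeCab.

```python
def base_form(surface: str, feature: str) -> str:
    """Hypothetical helper: base form (field 6) of an IPADIC-style feature string.

    Falls back to the surface form when the field is missing or "*",
    which is how unknown words are typically reported.
    """
    fields = feature.split(",")
    if len(fields) > 6 and fields[6] != "*":
        return fields[6]
    return surface


print(base_form("忙しく", "形容詞,自立,*,*,形容詞・イ段,連用テ接続,忙しい,イソガシク,イソガシク"))  # 忙しい
print(base_form("コロナ禍", "名詞,一般,*,*,*,*,*"))  # コロナ禍
```

Inside the loop this would replace the split with `morpheme = base_form(node.surface, node.feature)`. BOS/EOS nodes have an empty surface, so the length check still removes them.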

wordcloud

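import matplotlib
matplotlib.use('Agg') # non-GUI backend; nothing is shown on screen
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from config import STOP_WORDS, OUTPUT_PNG_FILE

def create_wordcloud(morphemes):
	# WordCloud.generate() expects one string, so a list of words is joined with spaces
	text = morphemes if isinstance(morphemes, str) else " ".join(morphemes)
	# a font with Japanese glyphs is required, otherwise the words render as empty boxes
	# fpath = "/usr/share/fonts/truetype/takao-gothic/TakaoPGothic.ttf" # Ubuntu
	fpath = "/System/Library/Fonts/ヒラギノ丸ゴ ProN W4.ttc" # Mac OS X
	wordcloud = WordCloud(
		background_color="whitesmoke",
		collocations=False,
		stopwords=set(STOP_WORDS),
		max_font_size=80,
		relative_scaling=.5,
		width=800,
		height=500,
		font_path=fpath
		).generate(text)
	# draw on a matplotlib figure (not saved; the PNG itself is written by to_file)
	plt.figure()
	plt.imshow(wordcloud)
	plt.axis("off")
	wordcloud.to_file(OUTPUT_PNG_FILE)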
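WordCloud.generate() splits the text with its own regular expression and counts the words itself. Because create_mecab_list already returns a word list, an alternative is to count it directly and pass the counts to WordCloud.generate_from_frequencies(). A stdlib-only sketch of the counting step; `word_frequencies` and the sample list are illustrative, not part of the original code.

```python
from collections import Counter


def word_frequencies(morphemes, stop_words=(), top_n=None):
    """Count a word list into a {word: count} dict, most frequent first.

    Hypothetical helper; top_n=None keeps every word.
    """
    counts = Counter(m for m in morphemes if m not in stop_words)
    return dict(counts.most_common(top_n))


morphemes = ["男性", "女性", "男性", "20代", "マスク", "男性", "女性"]
freqs = word_frequencies(morphemes)
print(freqs)  # {'男性': 3, '女性': 2, '20代': 1, 'マスク': 1}
```

The dict could then be drawn with `WordCloud(font_path=fpath, ...).generate_from_frequencies(freqs).to_file(OUTPUT_PNG_FILE)`. As far as I know, WordCloud's `stopwords` option is applied only while tokenizing text in generate(), which is why the sketch filters stop words itself. Higher counts are drawn in larger fonts.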

Results

[Image: the generated word cloud]

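  • More of the infected are 「男性」 (male) than 「女性」 (female)
    • → 「男性」 is drawn in a larger font than 「女性」
  • 「20代」 (people in their 20s) appears surprisingly often
  • 「マスク」 (masks) matter...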

Other COVID-19-related information
