
Counting the Most Frequent Words in the Bible with Python

Posted at 2020-07-16

0. Goal

This is the Python version of my previous article.

It counts the most frequent words in the Bible.

1. Environment

MacBook Pro
Python 3
Jupyter Notebook

2. Getting the Data

I am using a text file that is publicly available on GitHub:
bible.txt

Save it in a folder of your choice.

3. Launching Jupyter Notebook

Launch Jupyter Notebook from the folder where you saved the Bible text.
If you have not installed Jupyter Notebook yet, see another guide, for example:

[[May 2019] Installing Jupyter Notebook on a MacBook (macOS 10.14.4/Mojave)](https://qiita.com/inai/items/e9da22eb336f7f2cd375)

4. Code

read_bible.ipynb

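import re
import collections

path = "./bible.txt"

# Read the whole text file into a single string
with open(path, encoding="utf-8") as f:
    text = f.read()

# Clean the data: replace punctuation and spaces with a space
text = re.sub(r'[,.:;"?() ]', " ", text)
# Replace newlines with spaces (str.split() below would also handle them)
text = re.sub(r"\n", " ", text)
# Lowercase so that "The" and "the" count as the same word
text = text.lower()
# Split on whitespace into a list of words
words = text.split()
# Sorting is not required by Counter, but it makes words
# with equal counts appear in alphabetical order
words = sorted(words)

# Count the words; most_common() lists them from most to least frequent
counter = collections.Counter(words)
counter.most_common()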
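To check the cleaning steps without loading the whole file, the same pipeline can be wrapped in a function and run on a short string. This is a minimal sketch: `count_words` and `sample` are hypothetical names introduced here, and the sample text is typed in by hand rather than read from bible.txt. It also drops the separate newline replacement, because `str.split()` with no arguments already splits on newlines.

```python
import re
import collections

def count_words(text):
    """Apply the same cleaning steps as read_bible.ipynb to a string and count the words."""
    # Replace the same punctuation characters (and spaces) with a space
    text = re.sub(r'[,.:;"?() ]', " ", text)
    # Lowercase, then split on any whitespace (this also handles newlines)
    words = sorted(text.lower().split())
    return collections.Counter(words)

# Hypothetical sample text, typed in by hand
sample = "In the beginning God created the heaven and the earth.\nAnd the earth was without form, and void;"
counter = count_words(sample)
print(counter.most_common(3))  # → [('the', 4), ('and', 3), ('earth', 2)]
```

Passing a number, as in `counter.most_common(10)`, limits the output to the top 10 words, which is easier to read in a notebook than the full list.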

5. Notes

This was practice in string processing.
