0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 5 years have passed since last update.

言語処理100本ノック 第4章: 形態素解析 30. 形態素解析結果の読み込み

Posted at

実行環境
・MacOS
・AnacondaからのJupyter Notebook
・Python 3.7

30. 形態素解析結果の読み込み

形態素解析結果(neko.txt.mecab)を読み込むプログラムを実装せよ.ただし,各形態素は表層形(surface),基本形(base),品詞(pos),品詞細分類1(pos1)をキーとするマッピング型に格納し,1文を形態素(マッピング型)のリストとして表現せよ.第4章の残りの問題では,ここで作ったプログラムを活用せよ.

前回、以下の前処理したデータを使って続きの処理をする。

前回の処理

# ファイル読み込み
path = '/Users/shin/Documents/NLP/neko.txt'
with open(path) as f:
    l = f.readlines()
    print(l)

# Pythonで文字列のリスト(配列)の条件を満たす要素を抽出、置換
table = str.maketrans({
    '\n': '',
    '\u3000': '',
    ' ':' '
})
l_replace = [s.translate(table) for s in l]

# 空のlist削除
l_replace = [s for s in l_replace if s != '']

ここからが今回の処理

# 言語処理100本ノック2020: 第四章(形態素解析)
# https://qiita.com/TsumaR/items/40ebf328deeb66e3ba86

from pyknp import Juman
jumanpp = Juman()
result_list = []
for s in l_replace:
    morpheme_list = []
    result = jumanpp.analysis(s)
    for mrph in result.mrph_list(): # 各形態素にアクセス
        component_dict = {
            "surface":mrph.midasi,
            "base":mrph.genkei,
            "pos":mrph.hinsi,
            "pos1":mrph.bunrui
        }
        morpheme_list.append(component_dict)
    result_list.append(morpheme_list)

結果

[[{'surface': '一', 'base': '一', 'pos': '名詞', 'pos1': '数詞'}],
 [{'surface': '吾輩', 'base': '吾輩', 'pos': '名詞', 'pos1': '普通名詞'},
  {'surface': 'は', 'base': 'は', 'pos': '助詞', 'pos1': '副助詞'},
  {'surface': '猫', 'base': '猫', 'pos': '名詞', 'pos1': '普通名詞'},
  {'surface': 'である', 'base': 'だ', 'pos': '判定詞', 'pos1': '*'},
  {'surface': '。', 'base': '。', 'pos': '特殊', 'pos1': '句点'}],
~

ひとまず良さそう。

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?