
Today's work log: Python error (NLP 100 Exercise: 30), unresolved

Posted at 2019-01-23

NLP 100 Exercise 2015 (言語処理100本ノック 2015)

30. Reading the morphological analysis results

http://www.cl.ecei.tohoku.ac.jp/nlp100/
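"Implement a program that reads the morphological analysis result (neko.txt.mecab). Store each morpheme in a mapping type keyed by surface form (surface), base form (base), part of speech (pos), and part-of-speech subcategory 1 (pos1), and represent each sentence as a list of morphemes (mapping types). Use the program written here for the remaining problems in Chapter 4."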
Reference: 素人の言語処理100本ノック:30 (An amateur's NLP 100 Exercise: 30)
https://qiita.com/segavvy/items/1f517e06aa3bc5fc2316

# wget http://www.cl.ecei.tohoku.ac.jp/nlp100/data/neko.txt
--2019-01-23 07:03:00--  http://www.cl.ecei.tohoku.ac.jp/nlp100/data/neko.txt
Resolving www.cl.ecei.tohoku.ac.jp (www.cl.ecei.tohoku.ac.jp)... 130.34.192.83
Connecting to www.cl.ecei.tohoku.ac.jp (www.cl.ecei.tohoku.ac.jp)|130.34.192.83|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 965825 (943K) [text/plain]
Saving to: ‘neko.txt’

neko.txt                     100%[============================================>] 943.19K  3.63MB/s    in 0.3s    

2019-01-23 07:03:01 (3.63 MB/s) - ‘neko.txt’ saved [965825/965825]


# ./p30.py

(output omitted)
Traceback (most recent call last):
  File "./p30.py", line 42, in <module>
    for line in lines:
  File "./p30.py", line 25, in neco_lines
    res_cols = cols[1].split(',')
IndexError: list index out of range
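
The traceback points at `cols[1]`, so some line in neko.txt.mecab apparently has no tab in it. MeCab ends its output with an `EOS` marker line, which would be one such line. The following is a minimal reproduction, not the real file: `sample` is a hand-written stand-in in the IPA dictionary layout, and `second_column` is a hypothetical helper that mirrors the length check used in p30.py.

```python
# Hand-written sample lines (an assumption, not copied from neko.txt.mecab).
sample = [
    "猫\t名詞,一般,*,*,*,*,猫,ネコ,ネコ\n",
    "EOS\n",  # marker line without a tab character
]


def second_column(line):
    """Return the feature column, using the same check as p30.py."""
    cols = line.split('\t')
    if len(cols) < 2:
        pass  # `pass` does nothing, so execution falls through to cols[1]
    return cols[1]


print(second_column(sample[0]).split(',')[0])  # prints: 名詞

try:
    second_column(sample[1])
except IndexError as exc:
    print('IndexError:', exc)  # prints: IndexError: list index out of range
```

Splitting `"EOS\n"` on a tab yields a one-element list, so indexing position 1 fails exactly as in the traceback.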

The source is below (I added the first line so that the script can be run as a command).

#!/usr/bin/env python
# coding: utf-8

import MeCab
fname = 'neko.txt'
fname_parsed = 'neko.txt.mecab'

def parse_neko():
    with open(fname) as data_file, \
            open(fname_parsed, mode='w') as out_file:

        mecab = MeCab.Tagger()
        out_file.write(mecab.parse(data_file.read()))

def neco_lines():
    with open(fname_parsed) as file_parsed:

        morphemes = []
        for line in file_parsed:
            cols = line.split('\t')
            if(len(cols) < 2):
                #raise StopIteration
                #break
                pass
            res_cols = cols[1].split(',')

            morpheme = {
                'surface': cols[0],
                'base': res_cols[6],
                'pos': res_cols[0],
                'pos1': res_cols[1]
            }
            morphemes.append(morpheme)

            if res_cols[1] == '句点':
                yield morphemes
                morphemes = []

parse_neko()

lines = neco_lines()
for line in lines:
    print(line)
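
One possible fix, offered as a sketch rather than a confirmed solution: in `neco_lines` the `pass` inside the length check does not skip the line, so a line without a tab still reaches `cols[1]`. Replacing it with `continue` skips such lines. The function below is a hypothetical variant named `read_sentences` that takes any iterable of lines, so it can be tried without MeCab installed. The sample lines are hand-written in the IPA dictionary layout, and both the guard on the number of feature fields and the final `yield` are my own additions, not part of the original script.

```python
def read_sentences(lines):
    """Yield each sentence as a list of morpheme dicts."""
    morphemes = []
    for line in lines:
        cols = line.rstrip('\n').split('\t')
        if len(cols) < 2:
            continue  # skip the EOS marker and blank lines
        res_cols = cols[1].split(',')
        if len(res_cols) < 7:
            continue  # assumed guard for entries with fewer feature fields
        morphemes.append({
            'surface': cols[0],
            'base': res_cols[6],
            'pos': res_cols[0],
            'pos1': res_cols[1],
        })
        if res_cols[1] == '句点':  # a full stop closes the sentence
            yield morphemes
            morphemes = []
    if morphemes:
        yield morphemes  # addition: emit a trailing sentence with no full stop


# Hand-written sample in the IPA dictionary layout (an assumption).
sample = [
    "吾輩\t名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ\n",
    "は\t助詞,係助詞,*,*,*,*,は,ハ,ワ\n",
    "猫\t名詞,一般,*,*,*,*,猫,ネコ,ネコ\n",
    "。\t記号,句点,*,*,*,*,。,。,。\n",
    "EOS\n",
]

for sentence in read_sentences(sample):
    print([m['surface'] for m in sentence])
```

To run it against the real data, pass the open file object instead of `sample`, for example `read_sentences(open('neko.txt.mecab', encoding='utf-8'))`.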

Thank you very much for reading to the end.

Please press the like icon 💚 and follow me.
