Language Processing 100 Knocks 2015 (言語処理100本ノック 2015)
42. Displaying the source and destination chunks of each dependency
http://www.cl.ecei.tohoku.ac.jp/nlp100/
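"Extract the text of every source chunk and its destination chunk in tab-separated format. Do not output symbols such as punctuation marks."
An Amateur's Language Processing 100 Knocks: 42 (素人の言語処理100本ノック:42)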
https://qiita.com/segavvy/items/58894f76dba367b2925b
# cp ../chap03/neko.txt .
# ./p42.py
(earlier output omitted)
Traceback (most recent call last):
File "./p42.py", line 90, in neco_lines
raise StopIteration
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "./p42.py", line 94, in <module>
for chunks in neco_lines():
RuntimeError: generator raised StopIteration
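From Python 3.7 onward (PEP 479), a StopIteration raised inside a generator body is converted to RuntimeError, which matches the traceback above. The following is a minimal sketch of that behaviour, independent of CaboCha; the names old_style, new_style and drain are hypothetical and exist only for this illustration.

```python
def old_style():
    """Generator that ends by raising StopIteration explicitly."""
    yield 1
    raise StopIteration  # converted to RuntimeError on Python 3.7+


def new_style():
    """Generator that ends with a plain return."""
    yield 1
    return


def drain(gen_func):
    """Consume a generator and report how it ended."""
    try:
        return 'ok', list(gen_func())
    except RuntimeError as exc:
        return 'RuntimeError', str(exc)


if __name__ == '__main__':
    print(drain(old_style))
    print(drain(new_style))
```

On Python 3.7 or later, the first call should report a RuntimeError and the second should finish normally, so replacing the explicit raise with return (or simply letting the function end) is the usual way to finish a generator.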
The source is below (I added the first line because I wanted to run it as a command).
#!/usr/bin/env python
# coding: utf-8
import CaboCha
import re
fname = 'neko.txt'
fname_parsed = 'neko.txt.cabocha'
def parse_neko():
    with open(fname) as data_file, \
            open(fname_parsed, mode='w') as out_file:
        cabocha = CaboCha.Parser()
        for line in data_file:
            out_file.write(cabocha.parse(line).toString(CaboCha.FORMAT_LATTICE))
class Morph:
    def __init__(self, surface, base, pos, pos1):
        self.surface = surface
        self.base = base
        self.pos = pos
        self.pos1 = pos1
    def __str__(self):
        return 'surface[{}]\tbase[{}]\tpos[{}]\tpos1[{}]'\
            .format(self.surface, self.base, self.pos, self.pos1)
class Chunk:
    def __init__(self):
        self.morphs = []
        self.srcs = []
        self.dst = -1
    def __str__(self):
        surface = ''
        for morph in self.morphs:
            surface += morph.surface
        return '{}\tsrcs{}\tdst[{}]'.format(surface, self.srcs, self.dst)
    def normalized_surface(self):
        result = ''
        for morph in self.morphs:
            if morph.pos != '記号':
                result += morph.surface
        return result
def neco_lines():
    with open(fname_parsed) as file_parsed:
        chunks = dict()
        idx = -1
        for line in file_parsed:
            # End of sentence: yield the chunks collected so far, ordered by index
            if line == 'EOS\n':
                if len(chunks) > 0:
                    sorted_tuple = sorted(chunks.items(), key=lambda x: x[0])
                    yield list(zip(*sorted_tuple))[1]
                    chunks.clear()
                else:
                    yield []
            # Chunk header line: "* <index> <destination>D ..."
            elif line[0] == '*':
                cols = line.split(' ')
                idx = int(cols[1])
                dst = int(re.search(r'(.*?)D', cols[2]).group(1))
                if idx not in chunks:
                    chunks[idx] = Chunk()
                chunks[idx].dst = dst
                if dst != -1:
                    if dst not in chunks:
                        chunks[dst] = Chunk()
                    chunks[dst].srcs.append(idx)
            # Morpheme line: "<surface>\t<comma-separated features>"
            else:
                cols = line.split('\t')
                res_cols = cols[1].split(',')
                chunks[idx].morphs.append(
                    Morph(
                        cols[0],
                        res_cols[6],
                        res_cols[0],
                        res_cols[1]
                    )
                )
        # On Python 3.7+ (PEP 479) this line triggers the RuntimeError in the
        # traceback above; a plain `return` ends the generator instead.
        raise StopIteration
parse_neko()
for chunks in neco_lines():
    for chunk in chunks:
        if chunk.dst != -1:
            src = chunk.normalized_surface()
            dst = chunks[chunk.dst].normalized_surface()
            if src != '' and dst != '':
                print('{}\t{}'.format(src, dst))
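As a rough sketch of how the same parsing loop could end without raising StopIteration, the example below reads lattice-style lines from an in-memory string instead of running CaboCha. It is not the original author's code: the simplified Chunk dataclass, iter_sentences, dependency_pairs and the hand-written SAMPLE text are all assumptions made for illustration, and SAMPLE only approximates what CaboCha's lattice output looks like.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Simplified chunk: (surface, pos) pairs and the destination index."""
    morphs: list = field(default_factory=list)
    dst: int = -1

    def normalized_surface(self):
        # Join surfaces, skipping morphemes whose part of speech is a symbol
        return ''.join(s for s, pos in self.morphs if pos != '記号')


def iter_sentences(lines):
    """Yield one list of chunks per sentence from lattice-style lines."""
    chunks = {}
    idx = -1
    for line in lines:
        line = line.rstrip('\n')
        if line == 'EOS':
            yield [chunks[i] for i in sorted(chunks)]
            chunks = {}
        elif line.startswith('*'):
            cols = line.split(' ')
            idx = int(cols[1])
            chunk = chunks.setdefault(idx, Chunk())
            chunk.dst = int(re.search(r'(-?\d+)D', cols[2]).group(1))
        else:
            surface, feature = line.split('\t')
            chunks[idx].morphs.append((surface, feature.split(',')[0]))
    return  # a plain return (or falling off the end) finishes the generator


def dependency_pairs(chunks):
    """Return (source, destination) surface pairs, skipping empty ones."""
    pairs = []
    for chunk in chunks:
        if chunk.dst == -1:
            continue
        src = chunk.normalized_surface()
        dst = chunks[chunk.dst].normalized_surface()
        if src and dst:
            pairs.append((src, dst))
    return pairs


# Hand-written sample that imitates the lattice format (illustrative only)
SAMPLE = """\
* 0 1D 0/1 0.000000
吾輩\t名詞,代名詞,一般,*,*,*,吾輩,ワガハイ,ワガハイ
は\t助詞,係助詞,*,*,*,*,は,ハ,ワ
* 1 -1D 0/2 0.000000
猫\t名詞,一般,*,*,*,*,猫,ネコ,ネコ
で\t助動詞,*,*,*,特殊・ダ,連用形,だ,デ,デ
ある\t助動詞,*,*,*,五段・ラ行アル,基本形,ある,アル,アル
。\t記号,句点,*,*,*,*,。,。,。
EOS
"""

if __name__ == '__main__':
    for sentence in iter_sentences(SAMPLE.splitlines()):
        for src, dst in dependency_pairs(sentence):
            print('{}\t{}'.format(src, dst))
```

With this sample, the only dependency is from the first chunk to the second, and the trailing punctuation mark is dropped from the destination text.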
43
Traceback (most recent call last):
File "./p43.py", line 94, in neco_lines
raise StopIteration
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "./p43.py", line 98, in <module>
for chunks in neco_lines():
RuntimeError: generator raised StopIteration
45
Traceback (most recent call last):
File "./p45.py", line 100, in neco_lines
raise StopIteration
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "./p45.py", line 105, in <module>
for chunks in neco_lines():
RuntimeError: generator raised StopIteration
46
Traceback (most recent call last):
File "./p46.py", line 111, in neco_lines
raise StopIteration
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "./p46.py", line 116, in <module>
for chunks in neco_lines():
RuntimeError: generator raised StopIteration
47
Traceback (most recent call last):
File "./p47.py", line 120, in neco_lines
raise StopIteration
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "./p47.py", line 125, in <module>
for chunks in neco_lines():
RuntimeError: generator raised StopIteration
48
Traceback (most recent call last):
File "./p48.py", line 120, in neco_lines
raise StopIteration
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "./p48.py", line 125, in <module>
for chunks in neco_lines():
RuntimeError: generator raised StopIteration
49
Traceback (most recent call last):
File "./p49.py", line 133, in neco_lines
raise StopIteration
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "./p49.py", line 138, in <module>
for chunks in neco_lines():
RuntimeError: generator raised StopIteration