[Python] NLP 100 Exercises (言語処理100本ノック): Problem 59

Posted on 2019-04-19

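I could not find a solution online that uses a recursive regular expression, so I am posting one.
With the regex module it is easy to write. (I referred to another person's program for comparison.)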
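To make concrete what a recursive pattern is expected to find in a parse tree, here is a small standard-library sketch that collects every balanced parenthesized span with an explicit stack. This is a different technique from the regex approach of this post and is only meant as a reference for the expected spans; the function name `balanced_spans` and the sample string are made up for illustration.

```python
def balanced_spans(text):
    """Return every balanced (...) span in text (hypothetical helper).

    Inner spans are listed before the span that encloses them.
    """
    stack = []  # indices of '(' that have not been closed yet
    spans = []
    for i, ch in enumerate(text):
        if ch == '(':
            stack.append(i)
        elif ch == ')' and stack:
            start = stack.pop()
            spans.append(text[start:i + 1])
    return spans


# Hand-written sample in the bracketed style of a phrase-structure parse.
sample = '(NP (DT a) (NN field))'
for span in balanced_spans(sample):
    print(span)
```

For the sample string this prints `(DT a)`, `(NN field)`, and then the enclosing `(NP (DT a) (NN field))`, that is, nested spans come out before their parent.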

Program

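import xml.etree.ElementTree as ET

import regex as re  # third-party regex module; the standard re has no recursive patterns

# Group "a" matches one balanced (...) span; (?&a) recurses into it for nested spans.
pat_1 = re.compile(r'(?<a>\((?:[^()]+|(?&a))*\))')
# Captures the word of each leaf such as "(NN field)".
pat_2 = re.compile(r'\(.+? ([^()]+?)\)')

tree = ET.parse('nlp.txt.xml')
for parse in tree.iter('parse'):
    # captures('a') lists every span matched by group "a", nested ones included.
    for capture in pat_1.search(parse.text).captures('a'):
        # Keep only spans whose label starts with NP.
        if capture[1:3] != 'NP':
            continue

        # Print the leaf words of the noun phrase on one line.
        for match in pat_2.finditer(capture):
            print(match.group(1), end=' ')
        print()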

Note: this assumes the phrase-structure parse results from Stanford CoreNLP have been saved to nlp.txt.xml.
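As a rough picture of the input, the sketch below reads `parse` elements from an inline XML string with ElementTree. The XML fragment is a hand-written, heavily trimmed stand-in for the real file (an assumption about its shape, not a copy of actual CoreNLP output), and `SAMPLE_XML` is a made-up name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down fragment; the real file contains much more
# (tokens, dependencies, coreference) around each <parse> element.
SAMPLE_XML = """<root>
  <document>
    <sentences>
      <sentence id="1">
        <parse>(ROOT (NP (JJ Natural) (NN language) (NN processing))) </parse>
      </sentence>
    </sentences>
  </document>
</root>"""

root = ET.fromstring(SAMPLE_XML)

# iter('parse') walks the whole tree, so the nesting depth does not matter.
parses = [elem.text.strip() for elem in root.iter('parse')]
print(parses)
```

Because `iter` searches at any depth, the program only needs the tag name `parse` and not the full path to the element.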

Output (excerpt)

Natural language processing 
Wikipedia 
the free encyclopedia Natural language processing 
NLP 
the free encyclopedia Natural language processing -LRB- NLP -RRB- 
a field 
computer science 
a field of computer science 
artificial intelligence 
...
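Each line above is one NP span with its leaf words joined by spaces. As a rough illustration of that filter-and-flatten step, the sketch below applies the same NP check and the same leaf pattern to a few hand-written spans. The leaf pattern has no recursion, so the standard re module is enough here. The `np_words` helper and the sample list are made up for illustration.

```python
import re

# Same leaf pattern as pat_2 in the program.
leaf_pat = re.compile(r'\(.+? ([^()]+?)\)')


def np_words(span):
    """Return the leaf words of span if its label starts with NP, else None."""
    if span[1:3] != 'NP':
        return None
    return [m.group(1) for m in leaf_pat.finditer(span)]


# Hand-written spans standing in for what captures('a') would return.
spans = [
    '(DT a)',
    '(NN field)',
    '(NP (DT a) (NN field))',
    '(NP (NP (DT a) (NN field)) (PP (IN of) (NP (NN computer) (NN science))))',
]

for span in spans:
    words = np_words(span)
    if words is not None:
        print(' '.join(words))
```

With these samples the two NP spans print as `a field` and `a field of computer science`, while the two leaf spans are skipped by the label check.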