
NLP 100 Exercise (言語処理100本ノック): Chapter 1

Posted at 2018-02-23 (last updated at 2018-02-23)

I worked through Chapter 1: Warm-up of NLP 100 Exercise 2015 (言語処理100本ノック 2015),
a well-known tutorial for natural language processing (NLP).
See the linked page for the problem statements.

Note:
I confirmed that every answer ran correctly when I wrote the first draft,
but I may revise the answers later.

Environment

$ python
Python 3.6.2 |Anaconda custom (64-bit)| (default, Jul 20 2017, 13:14:59)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Answers

00: Reversing a string

p00.py

string = ''.join(c for c in reversed('stressed'))

Alternative solution

string = 'stressed'[::-1]

01: 「パタトクカシーー」

p01.py

# The 1st, 3rd, 5th and 7th characters sit at the even 0-based indices.
string = ''.join(c for idx, c in enumerate('パタトクカシーー') if idx % 2 == 0)

Alternative solution

string = 'パタトクカシーー'[::2]

02: 「パトカー」＋「タクシー」＝「パタトクカシーー」

p02.py

string = ''.join(c1 + c2 for c1, c2 in zip('パトカー', 'タクシー'))
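Exercises 01 and 02 are inverse operations, which gives a quick sanity check: interleaving the two words and then taking every second character should recover each word. The sketch below is only an illustration; the names `merged`, `odd_chars`, and `even_chars` are introduced here and are not part of the original answers.

```python
# Interleave the two words character by character (exercise 02).
merged = ''.join(c1 + c2 for c1, c2 in zip('パトカー', 'タクシー'))

# Step-2 slices undo the interleaving (exercise 01).
odd_chars = merged[::2]    # 1st, 3rd, 5th, 7th characters
even_chars = merged[1::2]  # 2nd, 4th, 6th, 8th characters

print(merged, odd_chars, even_chars)
```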

03: Pi

p03.py

import re

sentence = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
words = sentence.split(' ')
# Strip punctuation before counting the letters of each word.
lst = [len(re.sub(r'\W', '', word)) for word in words]
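As a quick check, the word lengths of this sentence should spell out the leading digits of pi. The following self-contained sketch joins the lengths into a digit string; the names `sentence`, `lengths`, and `digits` are chosen here for illustration.

```python
import re

sentence = ('Now I need a drink, alcoholic of course, after the heavy '
            'lectures involving quantum mechanics.')

# Count only word characters so that ',' and '.' do not inflate the length.
lengths = [len(re.sub(r'\W', '', word)) for word in sentence.split()]
digits = ''.join(str(n) for n in lengths)

print(digits)  # 314159265358979
```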

04: Chemical symbols

p04.py

string = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
str_lst = string.split()

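# 1-based positions of the words that contribute only their first letter.
one_lst = [1, 5, 6, 7, 8, 9, 15, 16, 19]
lst = [word[:1] if idx in one_lst else word[:2] for idx, word in enumerate(str_lst, start=1)]

# Map each extracted symbol to its 1-based word position.
dct = {word: idx for idx, word in enumerate(lst, start=1)}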
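To see what the mapping looks like, here is a minimal self-contained sketch that builds it in a single dict comprehension. The names `sentence`, `single`, and `symbols` are assumptions made for this example. Note that the 12th word, "Might", yields `Mi` rather than the real symbol Mg; that comes from the sentence given in the task, not from the code.

```python
sentence = ('Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations '
            'Might Also Sign Peace Security Clause. Arthur King Can.')

# 1-based positions whose word contributes only its first letter.
single = {1, 5, 6, 7, 8, 9, 15, 16, 19}

# Key: extracted symbol, value: 1-based position of the word.
symbols = {
    word[:1] if pos in single else word[:2]: pos
    for pos, word in enumerate(sentence.split(), start=1)
}

print(symbols)
```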

05: n-gram

p05.py

def char_ngram(string, n):
    """Character-based n-gram (spaces are removed before slicing)."""
    string = string.replace(' ', '')
    return [string[i:i+n] for i in range(len(string)-n+1)]

def word_ngram(string, n):
    """Word-based n-gram (each n-gram is a list of n words)."""
    words = string.split(' ')
    return [words[i:i+n] for i in range(len(words)-n+1)]
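Both functions rely on the same sliding-window slice, so one possible alternative is a single helper that accepts any sequence. `ngram` below is a hypothetical name, not part of the original answer; unlike `char_ngram`, it does not strip spaces by itself, so the caller decides how to preprocess the text.

```python
def ngram(seq, n):
    """Return every length-n window of seq (a string or a list)."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

text = 'I am an NLPer'

word_bigrams = ngram(text.split(' '), 2)        # windows over the word list
char_bigrams = ngram(text.replace(' ', ''), 2)  # windows over the characters

print(word_bigrams)
print(char_bigrams)
```

If the sequence is shorter than `n`, the range is empty and the helper returns an empty list.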

06: Sets

p06.py

from p05 import char_ngram

X = set(char_ngram('paraparaparadise', 2))
Y = set(char_ngram('paragraph', 2))

U = X.union(Y)         # union
I = X.intersection(Y)  # intersection
D = X.difference(Y)    # difference

# Check whether the bi-gram 'se' is contained in X and in Y.
for st in (X, Y):
    print('se' in st)
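The same set operations can also be written with the operators `|`, `&`, and `-`. The sketch below is self-contained so that it runs without `p05.py`; `bigrams` is a hypothetical helper introduced only for this example.

```python
def bigrams(text):
    """Set of character bi-grams of text (hypothetical helper)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

X = bigrams('paraparaparadise')
Y = bigrams('paragraph')

print(sorted(X | Y))         # union
print(sorted(X & Y))         # intersection
print(sorted(X - Y))         # difference
print('se' in X, 'se' in Y)  # True False
```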

07: Sentence generation from a template

p07.py

def gen_sentence(x, y, z):
    return f'{x}時の{y}は{z}'

print(gen_sentence(12, '気温', 22.4))

08: Cipher text

p08.py

def cipher(string):
    return ''.join(chr(219-ord(c)) if c.islower() else c for c in string)

string = "I couldn't believe that I could actually understand what I was reading"
print(string)
print(cipher(string))         # encoded
print(cipher(cipher(string))) # decoded
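The constant 219 works because `ord('a') + ord('z')` is 97 + 122 = 219, so `219 - ord(c)` mirrors each lowercase ASCII letter across the alphabet (a becomes z, b becomes y, and so on). Applying the mapping twice returns the original letter, which is why the same function both encodes and decodes. A small check of that reasoning, with `mirror` as a hypothetical name:

```python
from string import ascii_lowercase

def mirror(c):
    """Map a lowercase ASCII letter to its mirror image in the alphabet."""
    return chr(219 - ord(c))

# 219 is the sum of the code points of the two ends of the alphabet.
assert ord('a') + ord('z') == 219

mirrored = ''.join(mirror(c) for c in ascii_lowercase)
print(mirrored)  # zyxwvutsrqponmlkjihgfedcba

# Mirroring twice is the identity, so encoding and decoding coincide.
assert all(mirror(mirror(c)) == c for c in ascii_lowercase)
```

This reasoning assumes ASCII input: `str.islower()` is also true for non-ASCII lowercase letters, which the formula does not handle.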

09: Typoglycemia

p09.py

import random

def char_shuffle(word):
    """Shuffle the inner characters, keeping the first and last in place."""
    inner = list(word[1:-1])
    random.shuffle(inner)
    return word[0] + ''.join(inner) + word[-1]

string = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
str_lst = string.split(' ')
string = ' '.join(char_shuffle(word) if len(word) > 4 else word for word in str_lst)
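The task asks that the first and last characters of each word stay in place and that only words longer than four characters be shuffled. Because the output is random, it is easier to verify those properties than to compare against a fixed string. The sketch below assumes a hypothetical helper `shuffle_inner` and a seeded `random.Random` instance so that a run is repeatable; neither is part of the original answer.

```python
import random

def shuffle_inner(word, rng):
    """Shuffle all but the first and last characters of word."""
    if len(word) <= 4:
        return word
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + ''.join(inner) + word[-1]

rng = random.Random(0)  # fixed seed, an assumption for repeatability
text = "I couldn't believe that I could actually understand"
result = ' '.join(shuffle_inner(w, rng) for w in text.split(' '))
print(result)

# Property checks: ends preserved, same letters, short words untouched.
for before, after in zip(text.split(' '), result.split(' ')):
    assert before[0] == after[0] and before[-1] == after[-1]
    assert sorted(before) == sorted(after)
    assert len(before) > 4 or before == after
```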

09-EX: Typoglycemia (discarded)

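I misread the task and ended up with this discarded answer, which I am keeping here for the record.
Instead of shuffling characters, this code shuffles the order of the words that have five or more characters.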

p09-EX.py

import random

string = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
str_lst = string.split(' ')

idx_lst = [idx for idx, word in enumerate(str_lst) if len(word) > 4]
random.shuffle(idx_lst)

# Put the indices of the short words back at their original positions.
for idx in range(len(str_lst)):
    if idx not in idx_lst:
        idx_lst.insert(idx, idx)

string = ' '.join(str_lst[i] for i in idx_lst)

Answer 06 uses the char_ngram(string, n) function written in 05.
