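I worked through Chapter 1: Warm-up (第1章: 準備運動) of 言語処理100本ノック 2015,
a well-known tutorial on natural language processing (NLP).
See the linked page for the problem statements.
Note:
I confirmed that all of the code ran as of the first draft,
but I may revise the answers later at my own discretion.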
###Environment
$ python
Python 3.6.2 |Anaconda custom (64-bit)| (default, Jul 20 2017, 13:14:59)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
###Answers
00: Reversing a string
p00.py
string = ''.join(reversed('stressed'))
Alternative
string = 'stressed'[::-1]
01: 「パタトクカシーー」
p01.py
# The 1st, 3rd, 5th and 7th characters sit at the even 0-based indices
string = ''.join(c for idx, c in enumerate('パタトクカシーー') if idx % 2 == 0)
Alternative
string = 'パタトクカシーー'[::2]
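As a quick sanity check (my own addition, with illustrative variable names), the sketch below confirms that selecting the characters at even 0-based indices and slicing with a step of 2 pick the same characters, namely the 1st, 3rd, 5th and 7th.

```python
text = 'パタトクカシーー'

# Characters at 0-based indices 0, 2, 4, 6 (the 1st, 3rd, 5th, 7th characters).
by_index = ''.join(c for idx, c in enumerate(text) if idx % 2 == 0)

# The same selection via extended slicing with a step of 2.
by_slice = text[::2]

print(by_index, by_slice, by_index == by_slice)
# パトカー パトカー True
```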
02: 「パトカー」+「タクシー」=「パタトクカシーー」
p02.py
string = ''.join(c1 + c2 for c1, c2 in zip('パトカー', 'タクシー'))
03: Pi
p03.py
import re
string = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'.split(' ')
lst = [len(re.sub(r'\W', '', word)) for word in string]
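To show what this produces, here is a self-contained sketch of the same idea (the names `sentence` and `lengths` are mine). Punctuation is stripped before counting, so the word lengths read off the leading digits of pi.

```python
import re

sentence = ('Now I need a drink, alcoholic of course, '
            'after the heavy lectures involving quantum mechanics.')

# Remove non-word characters (commas, periods) before counting letters.
lengths = [len(re.sub(r'\W', '', word)) for word in sentence.split()]

print(lengths)
# [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```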
04: Chemical element symbols
p04.py
string = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
str_lst = string.split()
one_lst = [1, 5, 6, 7, 8, 9, 15, 16, 19]
lst = [word[:i] for word, i in zip(str_lst, [1 if i+1 in one_lst else 2 for i in range(len(str_lst))])]
# Map each extracted symbol to its 1-based word position
dct = {word: idx + 1 for idx, word in enumerate(lst)}
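The following is a minimal, self-contained sketch of the same mapping written as a plain loop, with a few lookups to check the result. The names `one_char_positions` and `symbol_to_position` are my own. Note that the sentence yields `Mi` (from "Might") at position 12, which is not the real symbol for element 12 (Mg).

```python
sentence = ('Hi He Lied Because Boron Could Not Oxidize Fluorine. '
            'New Nations Might Also Sign Peace Security Clause. Arthur King Can.')

# 1-based positions of the words that contribute only their first character.
one_char_positions = {1, 5, 6, 7, 8, 9, 15, 16, 19}

symbol_to_position = {}
for position, word in enumerate(sentence.split(), start=1):
    length = 1 if position in one_char_positions else 2
    symbol_to_position[word[:length]] = position

print(symbol_to_position['He'], symbol_to_position['K'], symbol_to_position['Mi'])
# 2 19 12
```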
05: n-gram
p05.py
def char_ngram(string, n):
    """ Character-based N-Gram """
    string = string.replace(' ', '')
    return [string[i:i+n] for i in range(len(string)-n+1)]

def word_ngram(string, n):
    """ Word-based N-Gram """
    string = string.split(' ')
    return [string[i:i+n] for i in range(len(string)-n+1)]
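For reference, here is how the two functions behave on a short input. The functions are repeated so that the block runs on its own, and the sentence 'I am an NLPer' is just an example input I chose for illustration. Note that the character version drops spaces before slicing, so bigrams such as 'Ia' span word boundaries.

```python
def char_ngram(string, n):
    """Character-based n-grams, ignoring spaces."""
    string = string.replace(' ', '')
    return [string[i:i+n] for i in range(len(string) - n + 1)]

def word_ngram(string, n):
    """Word-based n-grams, each returned as a list of words."""
    words = string.split(' ')
    return [words[i:i+n] for i in range(len(words) - n + 1)]

char_bigrams = char_ngram('I am an NLPer', 2)
word_bigrams = word_ngram('I am an NLPer', 2)

print(char_bigrams)
# ['Ia', 'am', 'ma', 'an', 'nN', 'NL', 'LP', 'Pe', 'er']
print(word_bigrams)
# [['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
```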
06: Sets
p06.py
from p05 import char_ngram
X = set(char_ngram('paraparaparadise', 2))
Y = set(char_ngram('paragraph', 2))
U = X.union(Y) # union
I = X.intersection(Y) # intersection
D = X.difference(Y) # difference
# Check whether the bigram 'se' is contained in X and in Y
for st in (X, Y):
    print('se' in st)
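To make the results concrete, the sketch below recomputes the three sets and prints them sorted. The helper `bigrams` is a hypothetical stand-in for `char_ngram(text, 2)` so that the block runs without importing p05; it is equivalent here only because neither input contains spaces.

```python
def bigrams(text):
    # Hypothetical stand-in for char_ngram(text, 2); assumes no spaces in text.
    return {text[i:i+2] for i in range(len(text) - 1)}

X = bigrams('paraparaparadise')
Y = bigrams('paragraph')

print(sorted(X | Y))  # union
# ['ad', 'ag', 'ap', 'ar', 'di', 'gr', 'is', 'pa', 'ph', 'ra', 'se']
print(sorted(X & Y))  # intersection
# ['ap', 'ar', 'pa', 'ra']
print(sorted(X - Y))  # difference
# ['ad', 'di', 'is', 'se']
print('se' in X, 'se' in Y)
# True False
```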
07: Template-based sentence generation
p07.py
def gen_sentence(x, y, z):
    return f'{x}時の{y}は{z}'
print(gen_sentence(12, '気温', 22.4))
08: Cipher text
p08.py
def cipher(string):
    return ''.join(chr(219-ord(c)) if c.islower() else c for c in string)
string = "I couldn't believe that I could actually understand what I was reading"
print(string)
print(cipher(string)) # encoded
print(cipher(cipher(string))) # decoded
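The reason the same function both encodes and decodes is that 219 equals `ord('a') + ord('z')`, so each lowercase letter is mirrored within the alphabet and applying the mapping twice returns the original. The sketch below checks this on a small input of my own choosing. It assumes ASCII text: `str.islower()` is also true for non-ASCII lowercase letters such as 'é', for which `219 - ord(c)` would go negative and `chr` would raise `ValueError`.

```python
def cipher(string):
    # Mirror lowercase letters (a<->z, b<->y, ...); leave everything else as is.
    return ''.join(chr(219 - ord(c)) if c.islower() else c for c in string)

# 219 is the sum of the code points of 'a' (97) and 'z' (122).
print(ord('a') + ord('z'))
# 219

sample = 'abc xyz, NLP!'
encoded = cipher(sample)
print(encoded)
# zyx cba, NLP!
print(cipher(encoded) == sample)
# True
```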
09: Typoglycemia
p09.py
import random
def char_shuffle(word):
    """ Shuffle the inner characters, keeping the first and last in place """
    inner = list(word[1:-1])
    random.shuffle(inner)
    return word[0] + ''.join(inner) + word[-1]

string = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
str_lst = string.split(' ')
# Words of four characters or fewer are left unchanged
string = ' '.join(char_shuffle(word) if len(word) > 4 else word for word in str_lst)
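The sketch below is a small check of the invariants I would expect, assuming the reading of the task in which the first and last characters of each word stay in place and only the inner characters are shuffled. The name `shuffle_inner` and the fixed seed are my own choices; the seed is there only to make the run repeatable.

```python
import random

def shuffle_inner(word, rng):
    """Shuffle everything except the first and last characters."""
    if len(word) <= 4:
        return word  # short words are left unchanged
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + ''.join(inner) + word[-1]

rng = random.Random(0)  # fixed seed, only for repeatability
results = {}
for word in ['believe', 'that', 'phenomenal']:
    out = shuffle_inner(word, rng)
    results[word] = out
    # Same first/last character and the same multiset of letters.
    print(word, out, out[0] == word[0], out[-1] == word[-1],
          sorted(out) == sorted(word))
```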
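09-EX: Typoglycemia (rejected)
I am laying to rest here a rejected answer that came from misreading the task.
This code shuffles the order of the words that are five or more characters long, leaving shorter words where they are.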
p09-EX.py
import random
string = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
str_lst = string.split(' ')
idx_lst = [idx for idx, word in enumerate(str_lst) if len(word) > 4]
random.shuffle(idx_lst)
# Put the index of each short word back at its original position
for idx in range(len(str_lst)):
    if idx not in idx_lst:
        idx_lst.insert(idx, idx)
string = ' '.join(str_lst[i] for i in idx_lst)
Comments
06 uses char_ngram(string, n), which was defined in 05.