
言語処理100本ノック 2020 (NLP 100 Exercise 2020), Chapter 1: Warm-up, solution notes

Posted at 2023-04-29

Introduction

Environment: Google Colaboratory

00. Reversed string

Obtain the string made by arranging the characters of "stressed" in reverse order (from the end to the beginning).

string = "stressed"
inversed = string[::-1]
inversed

desserts

01. 「パタトクカシーー」

From the string 「パタトクカシーー」, extract the 1st, 3rd, 5th, and 7th characters and concatenate them into a new string.

string = "パタトクカシーー"
extracted = string[::2]
extracted

パトカー

02. 「パトカー」＋「タクシー」＝「パタトクカシーー」

Interleave the characters of 「パトカー」 and 「タクシー」, starting from the first character of each, to obtain the string 「パタトクカシーー」.

pato = "パトカー"
taxi = "タクシー"
ptaatxoi = "".join([i+j for i, j in zip(pato, taxi)])
ptaatxoi

パタトクカシーー

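zip pairs up the characters at the same position, and each resulting tuple is unpacked and its two characters are concatenated.
join then merges the pieces into a single string.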
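
To make the intermediate step visible, the sketch below prints what zip yields before join flattens it. The names pairs and merged are only illustrative. Note that zip stops at the end of the shorter input, which does not matter here because both strings have four characters.

```python
pato = "パトカー"
taxi = "タクシー"

# zip pairs up the characters found at the same position in both strings.
pairs = list(zip(pato, taxi))
print(pairs)

# Each tuple is unpacked into (a, b), concatenated, and the pieces are joined.
merged = "".join(a + b for a, b in pairs)
print(merged)
```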

03. Pi

Split the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of (alphabetic) characters in each word, in order of appearance.

import re
string = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
string = re.sub("[,.]+", "", string)
wlist = string.split(" ")
wlen = [len(i) for i in wlist]
wlen

[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]

re.sub removes '.' and ','.
split breaks the string at each space to build a list of words.
len is then applied to each element of the word list.
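
The same steps can also be written so that only alphabetic characters are counted, which is what the problem statement literally asks for. This is a minimal sketch of that variant, not the approach used above; the helper name word_lengths is hypothetical.

```python
def word_lengths(sentence):
    """Return the number of alphabetic characters in each whitespace-separated word."""
    return [sum(ch.isalpha() for ch in word) for word in sentence.split()]

sentence = ("Now I need a drink, alcoholic of course, after the heavy "
            "lectures involving quantum mechanics.")
lengths = word_lengths(sentence)
print(lengths)
```

Because punctuation is simply not counted, no regular expression is needed in this version.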

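04. Element symbols

Split the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of every other word, then build an associative array (dictionary or map type) from each extracted string to the position of its word (its 1-based position in the sentence).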

string = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
one_alpahbet = [1, 5, 6, 7, 8, 9, 15, 16, 19]
wlist = string.split(" ")
wdict = {val[0] if id + 1 in one_alpahbet else val[:2]:id+1 for id, val in enumerate(wlist)}
wdict

{'H': 1,
 'He': 2,
 'Li': 3,
 'Be': 4,
 'B': 5,
 'C': 6,
 'N': 7,
 'O': 8,
 'F': 9,
 'Ne': 10,
 'Na': 11,
 'Mi': 12,
 'Al': 13,
 'Si': 14,
 'P': 15,
 'S': 16,
 'Cl': 17,
 'Ar': 18,
 'K': 19,
 'Ca': 20}

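The 1-based positions of the words from which only one character is taken are listed in one_alpahbet.
The word list is built as in 03, except that punctuation is not removed because only the leading characters are used.
A dict comprehension builds the key and the value.

key:val[0] if id + 1 in one_alpahbet else val[:2]
value:id+1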
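
For readers who find the comprehension dense, here is the same logic unrolled into a plain loop. It is a sketch with hypothetical names (one_letter_positions, symbol_to_position), and it uses enumerate with start=1 so that the +1 adjustment disappears.

```python
sentence = ("Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations "
            "Might Also Sign Peace Security Clause. Arthur King Can.")
one_letter_positions = {1, 5, 6, 7, 8, 9, 15, 16, 19}

symbol_to_position = {}
# enumerate(..., start=1) yields 1-based positions directly.
for position, word in enumerate(sentence.split(), start=1):
    length = 1 if position in one_letter_positions else 2
    symbol_to_position[word[:length]] = position

print(symbol_to_position)
```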

05. n-gram

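Write a function that builds n-grams from a given sequence (a string, a list, and so on). Use this function to obtain the word bi-grams and the character bi-grams of the sentence "I am an NLPer".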

seq = "I am an NLPer"

def n_gram(seq, n):
  return [seq[i:i+n] for i in range(len(seq) - n + 1)]

w_ngram = n_gram(seq.split(" "), 2)
c_ngram = n_gram(seq, 2)

print(f"word n-gram: {w_ngram}")
print(f"character n-gram: {c_ngram}")

word n-gram: [['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
character n-gram: ['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']

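The biggest hurdle was figuring out what an n-gram is in the first place: a run of n consecutive items (words or characters) taken from a sequence.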
https://qiita.com/kazmaw/items/4df328cba6429ec210fb
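
To see how the window size n changes the result, the following sketch runs the same sliding-window function for several values of n. The inputs chars and words are chosen only for illustration.

```python
def n_gram(seq, n):
    # Slide a window of width n over the sequence, one step at a time.
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

chars = "NLPer"
for n in (1, 2, 3):
    print(n, n_gram(chars, n))

words = "I am an NLPer".split()
trigrams = n_gram(words, 3)
print(trigrams)
```

When n is larger than the sequence length, range receives a non-positive argument and the function returns an empty list.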

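06. Sets

Find the sets of character bi-grams contained in "paraparaparadise" and "paragraph" as X and Y respectively, and compute the union, intersection, and difference of X and Y. In addition, check whether the bi-gram 'se' is contained in X and in Y.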

def n_gram(seq, n):
  return [seq[i:i+n] for i in range(len(seq) - n + 1)]

str1 = "paraparaparadise"
str2 = "paragraph"

X = set(n_gram(str1, 2))
Y = set(n_gram(str2, 2))

add = X | Y
mul = X & Y
sub = X - Y

print(
    f"add: {add}\n"
    f"mul: {mul}\n"
    f"sub: {sub}\n"
    f"'se' in X: {'se' in X}\n"
    f"'se' in Y: {'se' in Y}"
)

add: {'ad', 'ra', 'ph', 'gr', 'ap', 'se', 'ag', 'is', 'di', 'ar', 'pa'}
mul: {'ap', 'ra', 'ar', 'pa'}
sub: {'se', 'ad', 'is', 'di'}
'se' in X: True
'se' in Y: False

Use set for sets: the operators |, &, and - give the union, intersection, and difference.
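
The operators have method equivalents, and the difference depends on the order of its operands. The sketch below writes out the two bi-gram sets by hand (so it does not depend on n_gram) and sorts each result only to make the printed order reproducible.

```python
X = {"pa", "ar", "ra", "ap", "ad", "di", "is", "se"}  # bi-grams of "paraparaparadise"
Y = {"pa", "ar", "ra", "ag", "gr", "ap", "ph"}        # bi-grams of "paragraph"

print(sorted(X.union(Y)))         # same as X | Y
print(sorted(X.intersection(Y)))  # same as X & Y
print(sorted(X.difference(Y)))    # same as X - Y
print(sorted(Y.difference(X)))    # Y - X is a different set from X - Y
```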

07. Sentence generation with a template

Implement a function that takes the arguments x, y, and z and returns the string 「x時のyはz」. Then check the result for x=12, y="気温", z=22.4.

def template_sample(x, y, z):
  return f"{x}時の{y}は{z}"

x = 12
y = "気温"
z = 22.4

template_sample(x, y, z)

12時の気温は22.4

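I had already been using f-strings without thinking about it in the earlier problems.
Note that f-strings require Python 3.6 or later, so they are not available on 3.5 and earlier.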
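
On interpreters without f-strings, str.format produces the same result. This is a minimal sketch; the function name template_format is hypothetical.

```python
def template_format(x, y, z):
    # str.format works on Python versions that predate f-strings.
    return "{}時の{}は{}".format(x, y, z)

result = template_format(12, "気温", 22.4)
print(result)
```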

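08. Cipher text

Implement a function cipher that converts each character of a given string according to the following specification.

If the character is a lowercase English letter, replace it with the character whose code is (219 - character code)
Output any other character unchanged
Use this function to encrypt and decrypt an English message.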

def cipher(string):
  # Mirror lowercase English letters only; join the result back into a string.
  return "".join(chr(219 - ord(i)) if "a" <= i <= "z" else i for i in string)

sample = "AbCdEfG"
cipher(sample)

AyCwEuG

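The problem statement makes it hard to see what is going on. For a lowercase letter, the x-th letter from the start of the alphabet is replaced with the x-th letter from the end, because 219 = ord('a') + ord('z') = 97 + 122.
The mapping is a→z, b→y, c→x ... x→c, y→b, z→a.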
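
Because the mapping mirrors the alphabet, applying it twice returns the original text, so the same function serves for both encryption and decryption. The sketch below is self-contained and returns a string; the message is an arbitrary example.

```python
def cipher(text):
    # 219 == ord("a") + ord("z"), so 219 - ord(c) mirrors a..z onto z..a.
    return "".join(chr(219 - ord(c)) if "a" <= c <= "z" else c for c in text)

message = "Hello, World!"
encrypted = cipher(message)
decrypted = cipher(encrypted)  # the second application undoes the first
print(encrypted)
print(decrypted)
```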

09. Typoglycemia

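For a sequence of words separated by spaces, write a program that keeps the first and last characters of each word and randomly reorders the remaining characters. Words of length 4 or less are not reordered. Give it a suitable English sentence (for example, "I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind .") and check the result.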

import random
sample = "I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
wlist = sample.split(" ")
shuffled_wlist = [i[0] + "".join(random.sample(i[1:-1], len(i)-2)) + i[-1] if len(i) > 4 else i for i in wlist]
" ".join(shuffled_wlist)

I c’ndulot belviee that I cluod acalulty ustrannded what I was rdaneig : the pohneemanl peowr of the human mind .

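random.sample is used for the random reordering.
Note that random.shuffle cannot be applied to a string, because it shuffles in place and strings are immutable.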
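
random.shuffle can still be used if the middle characters are first copied into a list. The following is a sketch of that alternative, not the approach used above; the helper name shuffle_middle and the shortened sample sentence are assumptions made for illustration.

```python
import random

def shuffle_middle(word):
    # Words of length 4 or less are returned unchanged, as the problem requires.
    if len(word) <= 4:
        return word
    middle = list(word[1:-1])  # strings are immutable, so copy into a list
    random.shuffle(middle)     # shuffles the list in place and returns None
    return word[0] + "".join(middle) + word[-1]

random.seed(0)  # fixed seed only so that repeated runs give the same result
sentence = "I could actually understand what I was reading"
result = " ".join(shuffle_middle(w) for w in sentence.split(" "))
print(result)
```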
