
Trying out 言語処理100本ノック 2020 (NLP 100 Exercise 2020), part 1

Last updated at 2020-09-10. Posted at 2020-09-10.

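Introduction

These are my own answers to Chapter 1: Warm-up of 言語処理100本ノック 2020 (Rev 1) (NLP 100 Exercise 2020).
Some of my answers probably differ from what the problem statements intended; if you spot one, please leave a comment.

Environment
Google Colab

00 Reversed string

Obtain the string in which the characters of "stressed" are arranged in reverse order (from the end to the beginning).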

word= 'stressed'
reversed_word = word[::-1]
print(reversed_word)

Output

desserts

[Python] How to reverse a string
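
As a point of comparison with the slice above, the same result can also be sketched with the built-in reversed() and str.join(). The variable name reversed_word_alt is only chosen here for illustration.

```python
word = 'stressed'
# reversed() yields the characters from last to first; join() glues them back into a str
reversed_word_alt = ''.join(reversed(word))
print(reversed_word_alt)  # desserts
```

The slice is shorter, but reversed() also works on sequences that do not support slicing with a negative step.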

01 「パタトクカシーー」

Obtain the string formed by taking the 1st, 3rd, 5th and 7th characters of the string 「パタトクカシーー」 and concatenating them.

word = 'パタトクカシーー'
result = word[0]+word[2]+word[4]+word[6]
print(result)

Output

パトカー
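
The four explicit indexes can also be written as a single slice with a step of 2. This is a minimal sketch; result_by_slice and the_rest are names made up for this example.

```python
word = 'パタトクカシーー'
# Start at index 0 and take every second character (the 1st, 3rd, 5th and 7th)
result_by_slice = word[::2]
# Starting at index 1 instead gives the remaining characters
the_rest = word[1::2]
print(result_by_slice)  # パトカー
print(the_rest)         # タクシー
```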

02 「パトカー」＋「タクシー」＝「パタトクカシーー」

Obtain the string 「パタトクカシーー」 by alternately concatenating the characters of 「パトカー」 and 「タクシー」 from the beginning.

pato = 'パトカー'
taxi = 'タクシー'
patotaxi=''
# zip() pairs up the characters at the same position in both strings
for i,j in zip(pato,taxi):
  patotaxi += i+j
print(patotaxi)

Output

パタトクカシーー
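
The loop that appends to a string can also be expressed as one str.join() call over a generator expression. A minimal sketch, where patotaxi_joined is a name invented for this example:

```python
pato = 'パトカー'
taxi = 'タクシー'
# Each zip() pair contributes two characters; join() concatenates all pairs in order
patotaxi_joined = ''.join(p + t for p, t in zip(pato, taxi))
print(patotaxi_joined)  # パタトクカシーー
```

Note that zip() stops at the shorter input, so this only interleaves completely when both strings have the same length, as they do here.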

03 Pi

Split the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of (alphabetic) characters in each word, in order of appearance.

sentence = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
# Remove commas and the final period so that only alphabetic characters are counted
li = sentence.replace(',','').replace('.','').split(' ')
word_count_list =[len(i) for i in li]
print(word_count_list)

Output

[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]

Python: removing the beginning and end of a string (strip/lstrip/rstrip)
Removing part of a string in Python (strip, etc.)
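
Because the problem counts only alphabetic characters, punctuation attached to a word must not be included in its length. One way to handle that, sketched here with str.strip() on each word, is shown below; the name lengths is only used for this example.

```python
sentence = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'
# strip(',.') removes commas and periods from both ends of each word only
lengths = [len(w.strip(',.')) for w in sentence.split()]
print(lengths)  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```

Read in order, the numbers are the leading digits of pi, which is where the problem gets its title.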

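04 Atomic symbols

Split the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th and 19th words and the first two characters of every other word, then create an associative array (dictionary or map) from each extracted string to the position of its word (counting from 1 at the start of the sentence).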

def minus1(li):
  return li-1

def get_word(words_li,index,num):
  dic ={}
  for i in index:
    dic[words_li[i][0:num]] = i+1
  return dic

sentence = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
# Build a list of words by splitting on spaces
words_list = sentence.split(' ')
words_number = len(words_list)
# Positions (1-based) of the words from which a single character is taken
tmp = [1,5,6,7,8,9,15,16,19]
# List indexes are 0-based, so each index is [actual position - 1]
one_char_words_index =list(map(minus1,tmp))
# Every word not listed in tmp contributes two characters
two_char_words_index = [index for index in range(words_number) if index not in one_char_words_index]
# key:value = extracted string : position of the word
dic = get_word(words_list,one_char_words_index,1)
dic.update(get_word(words_list,two_char_words_index,2))
print(dic)

Output

{'H': 1, 'B': 5, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'P': 15, 'S': 16, 'K': 19, 'He': 2, 'Li': 3, 'Be': 4, 'Ne': 10, 'Na': 11, 'Mi': 12, 'Al': 13, 'Si': 14, 'Cl': 17, 'Ar': 18, 'Ca': 20}

Combining list comprehensions with if statements
Adding elements to a dictionary and merging dictionaries in Python
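
The same mapping can be built in a single pass with enumerate(), which avoids converting between 1-based positions and 0-based indexes. This is only a sketch of an alternative; one_char_positions and symbol_to_position are names introduced for this example.

```python
sentence = 'Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.'
# 1-based positions of the words that contribute a single character
one_char_positions = {1, 5, 6, 7, 8, 9, 15, 16, 19}

symbol_to_position = {}
# start=1 makes enumerate() count positions the way the problem statement does
for position, w in enumerate(sentence.split(), start=1):
    length = 1 if position in one_char_positions else 2
    symbol_to_position[w[:length]] = position

print(symbol_to_position)
```

Since dictionaries keep insertion order, the printed result lists the symbols in word order rather than grouping the one-character symbols first.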

05 n-gram

Write a function that builds n-grams from a given sequence (a string, a list, etc.). Use it to obtain the word bi-grams and character bi-grams of the sentence "I am an NLPer".

def get_ngram(li,n):
  n_gram =[]
  # Number of iterations is [number of items - n + 1]
  executions = len(li)-n+1
  start = 0
  end = n
  for _ in range(executions):
    n_gram.append(li[start:end])
    start +=1
    end +=1
  return n_gram

sentence = input("Please input sentence : ")
sentence = sentence.split(" ")
n_gram = get_ngram(sentence,2)
print(n_gram)

Output

Please input sentence : I have a pen. I have an apple.
[['I', 'have'], ['have', 'a'], ['a', 'pen.'], ['pen.', 'I'], ['I', 'have'], ['have', 'an'], ['an', 'apple.']]
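
The problem also asks for the word bi-grams and character bi-grams of "I am an NLPer". Because slicing works the same way on lists and strings, one function can serve both cases. A minimal sketch, with ngram, word_bigrams and char_bigrams as names chosen for this example:

```python
def ngram(seq, n):
    # Slide a window of width n over the sequence; works for both str and list
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

sentence = 'I am an NLPer'
word_bigrams = ngram(sentence.split(), 2)  # a list of words goes in
char_bigrams = ngram(sentence, 2)          # the raw string goes in
print(word_bigrams)  # [['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
print(char_bigrams)
```

The character bi-grams here include spaces, since the string is passed in unchanged.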

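06 Sets

Compute the sets of character bi-grams contained in "paraparaparadise" and "paragraph" as X and Y respectively, then compute the union, intersection and difference of X and Y. In addition, check whether the bi-gram "se" is contained in X and in Y.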

def get_char_ngram(word,n):
  char_n_gram =[]
  executions = len(word)-n+1
  start = 0
  end = n
  for _ in range(executions):
    char_n_gram.append(word[start:end])
    start +=1
    end +=1
  return set(char_n_gram)
n=2
X = 'paraparaparadise'
Y = 'paragraph'
X = get_char_ngram(X,n)
Y = get_char_ngram(Y,n)
print('Union : {}'.format(X | Y))
print('Intersection : {}'.format(X & Y))
print('Difference : {}'.format(X - Y))
# These operators only work on set objects

Output

Union : {'ap', 'pa', 'ag', 'ar', 'is', 'gr', 'se', 'di', 'ad', 'ph', 'ra'}
Intersection : {'ar', 'ap', 'pa', 'ra'}
Difference : {'ad', 'is', 'di', 'se'}

Set operations with the set type in Python (union, intersection, subset tests, etc.)
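
The last part of the problem, checking whether "se" is contained in X and in Y, can be done with the in operator. A minimal sketch; the helper char_bigrams is a name made up for this example, and it builds the sets with a set comprehension.

```python
def char_bigrams(word):
    # Set of all pairs of adjacent characters in the word
    return {word[i:i + 2] for i in range(len(word) - 1)}

X = char_bigrams('paraparaparadise')
Y = char_bigrams('paragraph')
print('se' in X)  # True
print('se' in Y)  # False
```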

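07 Template-based sentence generation

Implement a function that takes arguments x, y and z and returns the string 「x時のyはz」. Then check the result with x=12, y="気温", z=22.4.

def output(x,y,z):
  # Return the string, as the problem asks, and print it at the call site
  return "{}時の{}は{}".format(x,y,z)
x=12
y='気温'
z = 22.4
print(output(x,y,z))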

Output

12時の気温は22.4
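
On Python 3.6 or later the same template can be written as an f-string. A minimal sketch, where make_sentence is a name invented for this example:

```python
def make_sentence(x, y, z):
    # The f-string substitutes each argument into the template in place
    return f'{x}時の{y}は{z}'

print(make_sentence(12, '気温', 22.4))  # 12時の気温は22.4
```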

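08 Cipher text

Implement a function cipher that converts each character of a given string according to the following specification.
If the character is a lowercase English letter, replace it with the character whose code is (219 - character code)
Output any other character as is
Use this function to encrypt and decrypt an English message.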

lowercase = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
inp = input()
n = ''
for i in inp:
  if i in lowercase:
    n += chr(219 - ord(i))
    continue
  n += i
print(n)
# Input: abcDeFGhIhj

Output

zyxDvFGsIsq

Sorting out character encodings
Converting Unicode-escaped strings and byte sequences in Python
Converting between Unicode code points and characters in Python (chr, ord, \x, \u, \U)
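
The problem asks for a function named cipher and for both encryption and decryption. Since ord('a') + ord('z') is 219, the conversion maps a to z, b to y and so on, so applying it twice returns the original text. A minimal sketch of that, with the sample message chosen only for illustration:

```python
def cipher(text):
    # Only ASCII a-z is converted; str.islower() would also match letters such as 'é'
    return ''.join(chr(219 - ord(c)) if 'a' <= c <= 'z' else c for c in text)

message = 'Hello, World!'
encrypted = cipher(message)
decrypted = cipher(encrypted)  # the same function undoes the conversion
print(encrypted)  # Hvool, Wliow!
print(decrypted)  # Hello, World!
```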

09 Typoglycemia

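For a sequence of words separated by spaces, write a program that keeps the first and last characters of each word and randomly reorders the remaining characters. Words of length 4 or less are not reordered. Give it a suitable English sentence (for example, "I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind .") and check the result.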

import random

def change_char_order(li):
  result = []
  for i in li:
    # Words of length 4 or less are left unchanged
    if len(i) > 4:
      result.append(random_order(i))
      continue
    result.append(i)
  return result

def random_order(word):
  first = 1
  end = -1
  res = ''
  pre_inner_word = word[first:end]
  length = len(pre_inner_word)
  # Generate one random number per inner character
  random_list=[random.random() for i in range(length)]
  # Pair each random number with a character using zip()
  comb_list = zip(random_list,pre_inner_word)
  # Sorting by the random numbers shuffles the characters
  comb_list = sorted(comb_list,key=lambda x:x[0]) # x[0] is the random number, x[1] is the inner character
  # Unzip the pairs; the random numbers are no longer needed, so they go into a throwaway variable
  _,sorted_char = zip(*comb_list)
  for i in sorted_char:
    res += i
  return str(word[0]) + res + str(word[-1])

sentence = 'I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind .'
print(sentence)
sentence = sentence.split(' ')
result = change_char_order(sentence)
print(result)

Output

I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind .

# Joining the result back into a string was a hassle, so I skipped it
['I', 'clodn’ut', 'bleeive', 'that', 'I', 'colud', 'aacultly', 'uannretsdd', 'what', 'I', 'was', 'rinedag', ':', 'the', 'pneenmhoal', 'pweor', 'of', 'the', 'hamun', 'mind', '.']

Generating random floats and integers in Python with random, randrange, randint, etc.
Sorting multiple arrays at the same time in Python 3
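
Sorting by random keys works, but the standard library can shuffle directly. The sketch below swaps in random.sample() for the sort-based approach and joins the words back into one string. shuffle_inner and typo_text are names introduced for this example, and the output differs on every run.

```python
import random

def shuffle_inner(word):
    # Words of length 4 or less are returned unchanged, as the problem requires
    if len(word) <= 4:
        return word
    inner = list(word[1:-1])
    # random.sample with k == len(inner) returns a new list in random order
    shuffled = random.sample(inner, len(inner))
    return word[0] + ''.join(shuffled) + word[-1]

text = 'I couldn’t believe that I could actually understand what I was reading : the phenomenal power of the human mind .'
# Join the processed words back into a single string
typo_text = ' '.join(shuffle_inner(w) for w in text.split(' '))
print(typo_text)
```

A random permutation can happen to equal the original order, so some long words may occasionally come out unchanged.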

Thoughts

Writing articles is hard work. Next time I plan to list the methods used for each problem in a table.

2020/09/09
