
Trying 言語処理100本ノック (100 Language Processing Knocks), Chapter 1: Warm-up

Last updated at 2015-04-21, posted at 2015-04-10

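I've been writing almost nothing but Scala lately, but I suddenly have to work in Ruby, so I tried 言語処理100本ノック as a way to study.
I don't have a feel for Ruby yet, so the problems I couldn't solve off the top of my head are done in Scala.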

If you can tell me how something is usually written in Ruby, or a better way to write it, I'd appreciate a comment.

Caveats
※ These answers may be wrong.
※ I have no background in language processing, so I may have misread the problem statements in the first place.

Continued in Chapter 3: Regular Expressions.

Chapter 1: Warm-up

00. Reversing a string

Obtain the string made by arranging the characters of "stressed" in reverse order (from the end to the beginning).

'stressed'.reverse

01. 「パタトクカシーー」

Take the 1st, 3rd, 5th, and 7th characters of the string 「パタトクカシーー」 and concatenate them into a new string.

ptx = "パタトクカシーー"
[0,2,4,6].map { |i| ptx[i] }.join

02. 「パトカー」＋「タクシー」＝「パタトクカシーー」

Obtain the string 「パタトクカシーー」 by alternately concatenating the characters of 「パトカー」 and 「タクシー」 from the beginning.

["パトカー".chars, "タクシー".chars].transpose.map{ |i| i.join }.join
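
As a possible alternative, the same interleaving can be sketched with Array#zip. The helper name `interleave` is made up for illustration. Note that transpose raises an IndexError when the two strings differ in length, while zip pads a shorter second string with nil and drops the extra characters of a longer one, so neither approach fully handles unequal lengths.

```ruby
# Hypothetical helper (not from the original post): interleaves two strings
# character by character.
def interleave(a, b)
  # zip pairs the characters by index; flatten and join concatenate the pairs in order.
  a.chars.zip(b.chars).flatten.join
end

puts interleave("パトカー", "タクシー")  # prints パタトクカシーー
```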

03. Pi

Split the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and build a list of the number of (alphabetic) characters in each word, in order of appearance from the beginning.

"Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.".split(" ").map { |word| word.count("a-zA-Z") }

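04. Element symbols

Split the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. Take the first character of the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words and the first two characters of every other word, then build an associative array (dictionary or map type) from each extracted string to the position of its word (which word it is, counting from the beginning).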

"Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.".filter(_ != '.').split(" ").zipWithIndex.map( i => if(Set(1,5,6,7,8,9,15,16,19).contains(i._2)) (i._1.head, i._2) else (i._1.take(2), i._2)).toMap

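Correction
With the code above, zipWithIndex is 0-based, so the words that should contribute only their first character do not line up.
So I fixed it.
While I was at it, I switched the word extraction to a regular expression and pulled the target string out into a variable to make the code easier to read.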

val str = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."

// Store 1-based positions; take(1) keeps every key a String, and [A-Za-z] avoids the symbols that [A-z] also matches.
"[A-Za-z]+".r.findAllIn(str).zipWithIndex.map { case (w, i) => if (Set(1,5,6,7,8,9,15,16,19).contains(i + 1)) (w.take(1), i + 1) else (w.take(2), i + 1) }.toMap

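For comparison, here is a minimal Ruby sketch of problem 04, assuming the word positions are meant to be 1-based as in the problem statement. The names `SENTENCE`, `SINGLE_CHAR_POSITIONS`, and `element_map` are made up for illustration.

```ruby
SENTENCE = "Hi He Lied Because Boron Could Not Oxidize Fluorine. " \
           "New Nations Might Also Sign Peace Security Clause. Arthur King Can."

# 1-based positions of the words that contribute only their first letter.
SINGLE_CHAR_POSITIONS = [1, 5, 6, 7, 8, 9, 15, 16, 19]

# Hypothetical helper: maps each extracted prefix to its word position.
def element_map(sentence)
  sentence.scan(/[A-Za-z]+/).each_with_index.map { |word, i|
    pos = i + 1
    [word[0, SINGLE_CHAR_POSITIONS.include?(pos) ? 1 : 2], pos]
  }.to_h
end

p element_map(SENTENCE)
```

scan with a letters-only pattern drops the periods, so no separate punctuation filter is needed.
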
05. n-gram

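Write a function that builds n-grams from a given sequence (a string, a list, and so on). Use this function to obtain the word bi-grams and the character bi-grams of the sentence "I am an NLPer".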

def nGram(n:Int)(target:Array[String]) = {
    (0 to target.length - n).map(num => num until num + n).map(_.map(target(_)))
}

val biGram = nGram(2)_
val wordBiGram = biGram("I am an NLPer".split(" "))
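// map(_.toString) avoids the leading empty string that split("") produces on Java 7 and earlier
val charBiGram = biGram("I am an NLPer".replace(" ","").map(_.toString).toArray)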
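
In Ruby, the same idea can be sketched with Enumerable#each_cons, which yields every run of n consecutive elements. The helper name `n_gram` is made up for illustration, and the character bi-grams below drop spaces first to match the Scala version above, which is one interpretation of the problem.

```ruby
# Hypothetical helper: returns all n-grams of an Enumerable as arrays.
def n_gram(seq, n)
  seq.each_cons(n).to_a
end

sentence = "I am an NLPer"

# Word bi-grams: split on whitespace first.
word_bigrams = n_gram(sentence.split, 2)
# Character bi-grams: remove spaces, then join each pair back into a string.
char_bigrams = n_gram(sentence.delete(" ").chars, 2).map(&:join)

p word_bigrams  # prints [["I", "am"], ["am", "an"], ["an", "NLPer"]]
p char_bigrams  # prints ["Ia", "am", "ma", "an", "nN", "NL", "LP", "Pe", "er"]
```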

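06. Sets

Find the sets of character bi-grams contained in "paraparaparadise" and "paragraph" as X and Y respectively, then find the union, intersection, and difference of X and Y. In addition, check whether the bi-gram 'se' is contained in X and in Y.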

def biGram(str:String) = {
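    // toSet so that union/diff/intersect act as set operations rather than sequence (multiset) ones
    (0 to str.length - 2).map(num => num until num + 2).map(_.map(str(_)).mkString).toSet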
}

val X = biGram("paraparaparadise")
val Y = biGram("paragraph")

val union = X union Y
val diff = X diff Y
val intersect = X intersect Y
val xContainsSE = X contains "se"
val yContainsSE = Y contains "se"
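
A Ruby sketch of the same problem, using the standard library's Set so that duplicates are removed before the set operations. The helper name `char_bigram_set` is made up for illustration.

```ruby
require "set"

# Hypothetical helper: the set of character bi-grams in a string.
def char_bigram_set(str)
  str.chars.each_cons(2).map(&:join).to_set
end

x = char_bigram_set("paraparaparadise")
y = char_bigram_set("paragraph")

union        = x | y
intersection = x & y
difference   = x - y

p union.size           # prints 11
p intersection.to_a    # prints ["pa", "ar", "ra", "ap"]
p difference.to_a      # prints ["ad", "di", "is", "se"]
p x.include?("se")     # prints true
p y.include?("se")     # prints false
```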

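07. Generating a sentence from a template

Implement a function that takes arguments x, y, and z and returns the string 「x時のyはz」 ("y at x o'clock is z"). Then check the result with x=12, y="気温", z=22.4.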

template = ->(x,y,z){"#{x}時の#{y}は#{z}"}
template.call(12, "気温", 22.4)

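08. Cipher text

Implement a function cipher that converts each character of a given string according to the following specification.

If the character is a lowercase English letter, replace it with the character whose code is (219 - character code)
Otherwise, output the character unchanged
Use this function to encrypt and decrypt an English message.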

def cipher(mode, str)
    res = if mode == 'enc'
            str.chars.map do |c|
                if /[a-z]/ =~ c
                    [219 - c.unpack("C*")[0]].pack("C*")
                else
                    c
                end
            end
        elsif mode == 'dec'
            str.chars.map do |c|
                dec = [219 - c[0].unpack("C*")[0]].pack("C*")
                if /[a-z]/ =~ dec
                    dec
                else
                    c
                end
            end
        end
    res.join
end
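
Because 219 = 97 + 122 (the character codes of 'a' and 'z'), the conversion maps a to z, b to y, and so on, and applying it twice returns the original text. Under that observation, a single function without a mode argument may be enough. The sketch below is one such variant; the name `cipher_symmetric` is made up to avoid clashing with the cipher defined above.

```ruby
# Hypothetical simplified variant: replaces each lowercase ASCII letter c
# with the character whose code is 219 - c.ord, and leaves everything else as is.
def cipher_symmetric(str)
  str.gsub(/[a-z]/) { |c| (219 - c.ord).chr }
end

message   = "Hello, World!"
encrypted = cipher_symmetric(message)
decrypted = cipher_symmetric(encrypted)

puts encrypted  # prints Hvool, Wliow!
puts decrypted  # prints Hello, World!
```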

09. Typoglycemia

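For a sequence of words separated by spaces, write a program that keeps the first and last character of each word and randomly rearranges the order of the other characters. Words of length 4 or less are not rearranged. Give it a suitable English sentence (for example, "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .") and check the result.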

import scala.util.Random

def typoglycemia(str:String) = {
    str.split(" ").map { w =>
        if (w.length <= 4) w
        else w.head + Random.shuffle(w.substring(1, w.length - 1).toSeq).mkString + w.last
    }.mkString(" ")
}
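
For reference, a Ruby sketch of the same approach using Array#shuffle. It follows the Scala logic above: words of four characters or fewer pass through unchanged, and only the interior characters of longer words are shuffled.

```ruby
def typoglycemia(sentence)
  sentence.split(" ").map { |w|
    # Short words are returned as they are.
    next w if w.length <= 4
    # Keep the first and last characters; shuffle everything in between.
    w[0] + w[1..-2].chars.shuffle.join + w[-1]
  }.join(" ")
end

puts typoglycemia("I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .")
```

The output differs from run to run because the shuffle is random.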