
NLP 100 Exercise 2015 (言語処理100本ノック 2015) in JavaScript, Chapter 1: Warm-up


Last updated at 2015-08-06, posted at 2015-07-31

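I worked through the well-known NLP 100 Exercise 2015 (言語処理100本ノック 2015) in JS.
This post covers Chapter 1, the warm-up.
http://www.cl.ecei.tohoku.ac.jp/nlp100/

I ran everything on Node.js, but it should probably run in a browser without trouble as well.
I tried to write the code in as reusable a form as I could.

Please let me know if you spot any mistakes.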

Chapter 1: Warm-up

00. Reversing a string

Obtain the string in which the characters of "stressed" are arranged in reverse order (from the end to the beginning).

var answer = 'stressed'.split("").reverse().join("");
console.log(answer);

01. "パタトクカシーー"

Take the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" and concatenate them into a new string.

var answer = "";
for(var i in o = [0, 2, 4, 6]){
  answer += "パタトクカシーー".charAt(o[i]);
}
console.log(answer);

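Since this is language processing, I tried writing for...in in a Python-like way.
I will probably get told off for it, but it is easy to read and I quite like it. (Note that `o` is never declared, so this only runs in non-strict mode.)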
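
For comparison, here is a minimal sketch of the same extraction without for...in, assuming ES5 array methods are available. The helper name `pickChars` is hypothetical and does not appear in the original code.

```javascript
// Hypothetical helper: pick the characters at the given 0-based positions
// and join them into one string.
function pickChars(str, positions) {
  return positions.map(function (p) {
    return str.charAt(p);
  }).join("");
}

var picked = pickChars("パタトクカシーー", [0, 2, 4, 6]);
console.log(picked); // "パトカー"
```

Because the positions are passed in as an argument, the same function can be reused for other strings and index lists.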

02. "パトカー" + "タクシー" = "パタトクカシーー"

Obtain the string "パタトクカシーー" by alternately concatenating the characters of "パトカー" and "タクシー", starting from the beginning.

var answer = "";
for (var i = 0; i < "パトカー".length; i++) {
  answer += "パトカー".charAt(i) + "タクシー".charAt(i);
}
console.log(answer);

03. Pi

Split the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of (alphabetic) characters in each word, in order of appearance.

var list = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.".replace(/,|\./g,'').split(" ");
var answer = [];
for(var i = 0; i<list.length; i++ ){
  answer[i] = list[i].length;
}
console.log(answer);

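04. Element symbols

Split the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words.
For the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words take the first character; for all other words take the first two characters.
Then create an associative array (dictionary or map type) from each extracted string to the position of its word (its index counting from the first word).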

var list = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.".replace(/,|\./g,'').split(" ");
var answer = {};
for(var i = 0; i<list.length; i++ ){
  answer[String(i+1)] = list[i].slice(0,2);
}
for(var i in o = [1, 5, 6, 7, 8, 9, 15, 16, 19]){
  answer[o[i]] = answer[o[i]].charAt(0);
}
console.log(answer);

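I used the numbers as keys, but would it have been better to use the element symbols as keys? The task asks for a map from the extracted string to the word position, so strictly speaking the symbols should be the keys.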
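
A minimal sketch of the other direction, with the extracted symbols as keys and the 1-based word positions as values. The variable names `singles` and `symbolToPosition` are illustrative and not part of the original code.

```javascript
var sentence = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.";
// 1-based positions of the words that contribute only their first character.
var singles = [1, 5, 6, 7, 8, 9, 15, 16, 19];

// Map each extracted symbol to the 1-based position of its word.
var symbolToPosition = {};
sentence.replace(/[,.]/g, "").split(" ").forEach(function (word, idx) {
  var position = idx + 1;
  var length = singles.indexOf(position) !== -1 ? 1 : 2;
  symbolToPosition[word.slice(0, length)] = position;
});

console.log(symbolToPosition);
```

With this layout a lookup such as `symbolToPosition["He"]` gives the position directly.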

05. n-gram

Write a function that builds n-grams from a given sequence (a string, a list, and so on). Use this function to obtain the word bi-grams and character bi-grams of the sentence "I am an NLPer".

/**
* N-gram conversion
* @class Ngram
* @param {array | string} list String or array to process
* @param {Number} n Number of items to join
* @param {string} separator String used to join the items
* @property {array} word Array holding the word n-gram result
* @property {array} character Array holding the character n-gram result
*/
var Ngram = function(list,n,separator){
  this.list = list;
  this.n = n;
  this.word = [];
  this.character = [];
  this.separator = separator;
  this.init = function(){
    // If a string was given, convert it into an array of words
    if(typeof this.list == "string"){
      this.list = this.list.replace(/,|\./g,'').split(" ");
    }
    this.setWord();
    this.setCharacter();
  }
  this.setWord = function(){
    this.word = [];
    this.njoin(this.list,this.word);
  }
  this.setCharacter = function(){
    this.character = [];
    var strings = this.list.join("").split("");
    this.njoin(strings,this.character);
  }
  /**
  * @method njoin Builds n-grams from the input array and appends them to out_list
  * @param {array} in_list Array of strings to process
  * @param {array} out_list Array that receives the result
  */
  this.njoin = function(in_list,out_list){
    for(var i = 0; i<= in_list.length - this.n; i++){
      var t = [];
      for (var j=0; j<this.n; j++){
        t.push(in_list[i+j]);
      }
      out_list.push(t.join(this.separator));
    }
  }
  this.init();
}

var answer = new Ngram("I am an NLPer",2,"-");
console.log(answer.word);
console.log(answer.character);

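I do not really understand how word n-grams and character n-grams are used in practice, so this may not be the best implementation.
For example, I am not sure whether word n-grams and character n-grams are normally used together as a set, or whether a separator is needed at all.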
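
As one possible answer to those doubts, here is a minimal sketch of a single function that handles both cases and needs no separator. It relies only on the fact that strings and arrays both provide `length` and `slice()`. The name `ngram` is hypothetical, and unlike the `Ngram` class above this version keeps spaces in the character n-grams.

```javascript
// Hypothetical helper: returns every run of n consecutive items.
// Works on strings (items are characters) and arrays (items are elements).
function ngram(seq, n) {
  var result = [];
  for (var i = 0; i + n <= seq.length; i++) {
    result.push(seq.slice(i, i + n));
  }
  return result;
}

var text = "I am an NLPer";
var wordBigrams = ngram(text.split(" "), 2);
var charBigrams = ngram(text, 2);
console.log(wordBigrams); // [["I","am"],["am","an"],["an","NLPer"]]
console.log(charBigrams); // ["I "," a","am","m "," a","an","n "," N","NL","LP","Pe","er"]
```

Returning word n-grams as small arrays rather than joined strings leaves the choice of separator to the caller.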

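06. Sets

Let X and Y be the sets of character bi-grams contained in "paraparaparadise" and "paragraph" respectively, and find the union, intersection, and difference of X and Y. In addition, check whether the bi-gram 'se' is contained in X and in Y.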

/**
* @namespace ArrayHelper
* @method union Set union
* @method intersection Set intersection
* @method difference Set difference
* @method shuffle Random shuffle (in place)
*/
var ArrayHelper = {
  union:function(A,B){
    var T = A.concat(B);
    T = T.filter(function (x, i, self) {
      return self.indexOf(x) === i;
    });
    return T;
  },
  intersection:function(A,B){
    // Elements of A that also appear in B, without duplicates
    return A.filter(function (x, i, self) {
      return self.indexOf(x) === i && B.indexOf(x) !== -1;
    });
  },
  difference:function(A,B){
    // Elements of A that do not appear in B (A minus B), without duplicates
    return A.filter(function (x, i, self) {
      return self.indexOf(x) === i && B.indexOf(x) === -1;
    });
  },
  shuffle:function(array) {
    var m = array.length;
    var t, i;
    while (m) {
      i = Math.floor(Math.random() * m--);
      t = array[m];
      array[m] = array[i];
      array[i] = t;
    }
    return array;
  }
}

var X = new Ngram("paraparaparadise",2,"-");
var Y = new Ngram("paragraph",2,"-");
console.log(ArrayHelper.union(X.character,Y.character));
console.log(ArrayHelper.intersection(X.character,Y.character));
console.log(ArrayHelper.difference(X.character,Y.character));
console.log(Boolean(X.character.indexOf("s-e")+1));
console.log(Boolean(Y.character.indexOf("s-e")+1));

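I referred to the article below when building the union, intersection, and difference.
The filter function is handy.
http://qiita.com/cocottejs/items/7afe6d5f27ee7c36c61f

The function written for problem 05 is reused here.
ArrayHelper also contains shuffle, which is used in problem 09.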
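
As an alternative sketch, the same set operations can be written with the ES2015 `Set` type, assuming the runtime supports it. This is not how the article's ArrayHelper works. The helper `bigramSet` and the variable names are hypothetical, and the bi-grams here are plain two-character strings with no separator.

```javascript
// Hypothetical helper: character bi-grams of a string as a Set.
// Duplicate bi-grams collapse automatically.
function bigramSet(str) {
  var set = new Set();
  for (var i = 0; i + 2 <= str.length; i++) {
    set.add(str.slice(i, i + 2));
  }
  return set;
}

var setX = bigramSet("paraparaparadise");
var setY = bigramSet("paragraph");

// Union: everything in X, plus the members of Y that X lacks.
var unionXY = Array.from(setX).concat(
  Array.from(setY).filter(function (g) { return !setX.has(g); })
);
// Intersection: members of X that are also in Y.
var intersectionXY = Array.from(setX).filter(function (g) { return setY.has(g); });
// Difference X - Y: members of X that are not in Y.
var differenceXY = Array.from(setX).filter(function (g) { return !setY.has(g); });

console.log(unionXY, intersectionXY, differenceXY);
console.log(setX.has("se"), setY.has("se")); // true false
```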

07. Sentence generation from a template

Implement a function that takes arguments x, y, and z and returns the string "x時のyはz". Then check the result with x=12, y="気温", z=22.4.

var tmplSentence = function (x, y, z) {
  return x+"時の"+y+"は"+z;
}
console.log(tmplSentence(12,"気温",22.4));

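08. Cipher text

Implement a function cipher that converts each character of a given string according to the following specification.

If the character is a lowercase English letter, replace it with the character whose code is (219 - character code)
Output all other characters unchanged
Use this function to encrypt and decrypt an English message.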

/**
* Encryption
* @class Cipher
* @param {number} diffcode Value from which the character code is subtracted
* @param {RegExp} pattern Pattern matching the characters to convert
*/
var Cipher = function(diffcode,pattern){
  this.diffcode = diffcode;
  this.pattern = pattern;
  /**
  * Runs the conversion
  * @method process
  * @param {string} str String to encrypt or decrypt
  * @return {string} Converted string
  */
  this.process = function(str){
    var strings = str.split("");
    var encodeStrings = "";
    for (var i = 0; i < strings.length; i++) {
      if(strings[i].match(this.pattern)){
        encodeStrings += String.fromCharCode(this.diffcode - strings[i].charCodeAt(0));
      }else{
        encodeStrings += strings[i];
      }
    }
    return encodeStrings;
  }
}

var cipher = new Cipher(219,/^[a-z]+$/);
console.log(answer = cipher.process("Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."));
console.log(cipher.process(answer));
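
The same `process` call works for both encryption and decryption because 219 is the sum of the character codes of 'a' (97) and 'z' (122): the mapping mirrors the lowercase alphabet, so applying it twice returns the original. A minimal standalone sketch of that property follows. The function name `mirrorLowercase` is hypothetical and not part of the original code.

```javascript
// Hypothetical standalone version of the mapping used by Cipher above:
// each lowercase letter c becomes the letter with code 219 - code(c).
function mirrorLowercase(str) {
  return str.replace(/[a-z]/g, function (ch) {
    return String.fromCharCode(219 - ch.charCodeAt(0));
  });
}

var message = "Hello, World!";
var encrypted = mirrorLowercase(message);
console.log(encrypted);                  // "Hvool, Wliow!"
console.log(mirrorLowercase(encrypted)); // "Hello, World!"
```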

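09. Typoglycemia

Write a program that, for a sequence of words separated by spaces, keeps the first and last character of each word and randomly reorders the other characters. Words of length 4 or less are not reordered.
Give it a suitable English sentence (for example "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .") and check the result.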

/**
* Randomly reorders all characters except the first and last
* @method middleStringShuffle
* @param {string} word Source word
* @return {string} Converted string
*/
var middleStringShuffle = function(word) {
  var middles  = word.slice(1,-1).split("");
  // Depends only on the word argument, not on any outer variable
  return word.charAt(0) + ArrayHelper.shuffle(middles).join("") + word.charAt(word.length - 1);
}

var strings = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .".split(" ");
var answer = [];
for (var i = 0; i < strings.length; i++) {
  if(strings[i].length>4){
    answer.push(middleStringShuffle(strings[i]));
  }else{
    answer.push(strings[i]);
  }
}
answer = answer.join(" ");
console.log(answer);

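The array helper written for problem 06 (ArrayHelper) is reused here.

The source is also available here.
https://github.com/pppp606/nlp100_2015

Chapter 2 is about unix commands, so next I will skip it and try Chapter 3: regular expressions.

Chapter 3 is now up as well.
http://qiita.com/pppp403/items/08220614f3d69882b390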
