Pitting a pure Perl 6 NaiveBayes against Perl 5's NaiveBayes (via Inline::Perl5) with Bench

Posted 2016-12-18, last updated 2016-12-18

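Hello, this is the day-19 post of the Perl 6 Advent Calendar.

This time I'd like to show how to compare the speed of a module used through Inline::Perl5 with that of a pure Perl 6 module.

I'll use Bench, which also appeared in last year's advent calendar.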

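Setup

I compare the speed of my own p6-Naive-Bayes with that of Perl 5's NaiveBayes (Algorithm::NaiveBayes).
The test set is the example that appears in the Naive Bayes chapter of 情報検索の基礎 (the Japanese edition of Introduction to Information Retrieval).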

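Code

Done naively, the Perl 5 and Perl 6 modules clash because they have the same name.
On the Perl 5 side, the clash can be avoided by doing use Algorithm::NaiveBayes inside the code string passed to Inline::Perl5's run method. (If you know a better way, please tell me.)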

bayes.p6

use Bench;

my $repeat-times = 1000;
my $bench = Bench.new;

{ # Inline::Perl5
    use Inline::Perl5;
    my $p5 = Inline::Perl5.new;
    my $code = sub {
        my $result = $p5.run(q'
            use Algorithm::NaiveBayes;
            my $nb = Algorithm::NaiveBayes->new;
            $nb->add_instance(attributes => {Chinese => 2, Beijing => 1}, label => "China");
            $nb->add_instance(attributes => {Chinese => 2, Shanghai => 1}, label => "China");
            $nb->add_instance(attributes => {Chinese => 1, Macao => 1}, label => "China");
            $nb->add_instance(attributes => {Chinese => 1, Tokyo => 1, Japan => 1}, label => "Japan");
            $nb->train;
            $nb->predict(attributes => {Chinese => 3, Tokyo => 1, Japan => 1});
        ');
    }
    $bench.timethis($repeat-times,
                    $code,
                    title => 'P5');
}

{ # Perl 6 (plain text)

    my $code = sub {
        use Algorithm::NaiveBayes;
        my $nb = Algorithm::NaiveBayes.new();
        $nb.add-document("Chinese Beijing Chinese", "China");
        $nb.add-document("Chinese Chinese Shanghai", "China");
        $nb.add-document("Chinese Macao", "China");
        $nb.add-document("Tokyo Japan Chinese", "Japan");
        $nb.train();
        my @result = $nb.predict("Chinese Chinese Chinese Tokyo Japan");
    };

    $bench.timethis($repeat-times,
                    $code,
                    title => 'P6 (plain text)');
}

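{ # Perl 6 Hash

    # The word-count hashes are built once, outside the timed sub,
    # so splitting and counting words is not part of the measurement.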
    my %doc1 = "Chinese Beijing Chinese".words.Bag.hash;
    my %doc2 = "Chinese Chinese Shanghai".words.Bag.hash;
    my %doc3 = "Chinese Macao".words.Bag.hash;
    my %doc4 = "Tokyo Japan Chinese".words.Bag.hash;
    my %doc5 = "Chinese Chinese Chinese Tokyo Japan".words.Bag.hash;
    my $code = sub {
        use Algorithm::NaiveBayes;
        my $nb = Algorithm::NaiveBayes.new();
        $nb.add-document(%doc1, "China");
        $nb.add-document(%doc2, "China");
        $nb.add-document(%doc3, "China");
        $nb.add-document(%doc4, "Japan");
        $nb.train();
        my @result = $nb.predict(%doc5);
    };
    
    $bench.timethis($repeat-times,
                    $code,
                    title => 'P6 (Hash)');
}

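Results

Surprisingly, Perl 5's NaiveBayes came out about 6.3 times faster than the Perl 6 Hash version, and about 7 times faster than the plain-text version.
Does this mean that, even with the conversion cost of Inline::Perl5, going through Perl 5 is faster, at least for numerical-computation modules? (Or perhaps my implementation is simply not very good.)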

Output

$ perl6 bayes.p6
        P5: 0.4399 wallclock secs @ 2273.3782/s (n=1000)
P6 (plain text): 3.0702 wallclock secs @ 325.7150/s (n=1000)
 P6 (Hash): 2.7789 wallclock secs @ 359.8527/s (n=1000)
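
The speed ratios quoted for this run can be recomputed from the three output lines above. The following is a minimal sketch in Python, used here only as a neutral scripting language. The regular expression and the names parse and slowdown are assumptions made for this illustration, based solely on the output format shown, and are not part of Bench.

```python
import re

# The three lines printed by the benchmark run, copied verbatim.
BENCH_OUTPUT = """\
        P5: 0.4399 wallclock secs @ 2273.3782/s (n=1000)
P6 (plain text): 3.0702 wallclock secs @ 325.7150/s (n=1000)
 P6 (Hash): 2.7789 wallclock secs @ 359.8527/s (n=1000)
"""

# Assumed line shape: "<title>: <seconds> wallclock secs ..."
LINE = re.compile(r"^\s*(?P<title>.+?): (?P<secs>[\d.]+) wallclock secs")


def parse(output):
    """Map each benchmark title to its wallclock time in seconds."""
    times = {}
    for line in output.splitlines():
        match = LINE.match(line)
        if match:
            times[match.group("title")] = float(match.group("secs"))
    return times


times = parse(BENCH_OUTPUT)
baseline = times["P5"]

# How many times longer each variant took than the Inline::Perl5 run.
slowdown = {title: secs / baseline for title, secs in times.items()}

for title, ratio in slowdown.items():
    print(f"{title}: {ratio:.2f}x the P5 time")
```

These are ratios of wallclock time for a single run of 1000 iterations each, so they are a rough indication rather than a precise measurement.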

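Supplementary notes

What is NaiveBayes used for in the first place?

The main use of NaiveBayes is document classification. For example, it is used to decide whether a given email is spam. (That is the textbook explanation. Since it is a fairly classical algorithm, I suspect it is probably not what today's webmail systems actually use.)

About the computed results

Some of you may have noticed that the computed results differ from those of the Perl 5 version.
My p6-Naive-Bayes is verified against the example that appears in the Naive Bayes chapter of 情報検索の基礎.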

$ perl6 -e '(3/4 * (3/7) ** 3 * 1/14 * 1/14).log.say'
-8.10769031284391 # China
$ perl6 -e '(1/4 * (2/9) ** 3 * 2/9 * 2/9).log.say'
-8.90668134500126 # Japan
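
The fractions in the one-liners above (3/7, 1/14, 2/9) follow from a multinomial Naive Bayes model with add-one (Laplace) smoothing applied to the four training documents. The sketch below recomputes them from the raw text as an independent cross-check. It is written in Python, it is not the implementation used by p6-Naive-Bayes or Algorithm::NaiveBayes, and the function names train and score are hypothetical names chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
import math

# Training documents and labels, as used in the benchmark script.
TRAIN = [
    ("Chinese Beijing Chinese", "China"),
    ("Chinese Chinese Shanghai", "China"),
    ("Chinese Macao", "China"),
    ("Tokyo Japan Chinese", "Japan"),
]
TEST = "Chinese Chinese Chinese Tokyo Japan"


def train(docs):
    """Return class priors, per-class word counts, and the vocabulary."""
    doc_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in doc_counts}
    for text, label in docs:
        word_counts[label].update(text.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    priors = {label: Fraction(n, len(docs)) for label, n in doc_counts.items()}
    return priors, word_counts, vocab


def score(text, label, priors, word_counts, vocab):
    """Unnormalized P(label) * product of P(word | label), add-one smoothed."""
    total = sum(word_counts[label].values())
    p = priors[label]
    for word in text.split():
        # (count of word in class + 1) / (tokens in class + vocabulary size)
        p *= Fraction(word_counts[label][word] + 1, total + len(vocab))
    return p


priors, word_counts, vocab = train(TRAIN)
scores = {label: score(TEST, label, priors, word_counts, vocab) for label in priors}

for label, p in scores.items():
    print(label, p, math.log(float(p)))
```

With 8 tokens in the China documents, 3 in the Japan document, and a vocabulary of 6 words, P(Chinese | China) = (5+1)/(8+6) = 3/7 and P(Chinese | Japan) = (1+1)/(3+6) = 2/9, which are the factors in the one-liners. China gets the larger score, so the test document is classified as China.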

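That's all for the day-19 post of the Perl 6 Advent Calendar.

Addendum (2016/12/18):
・The setup slightly favored Perl 5, so I added a Hash version on the Perl 6 side.
・The Hash version was missing .words, so I added it and re-ran the measurements. Sorry about that.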
