1
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 1 year has passed since last update.

Rustでlevenshtein distance libraryをbenchmark

Last updated at Posted at 2023-09-26

Rustでlevenshtein距離を計算

GPTに問い合わせる前の前段処理や、翻訳メモリの実装などに使えないかとRustでlevenshtein距離(文字列の類似度)を計算しようと思ったのですが、色々ライブラリがあって、どれを使えば良いのかよく分かりませんでした。どの実装が速くて便利か、ベンチマークを取ってみようと書いたのがこちらです。

Cargo.toml

[package]
name = "levenshtein-benchmark"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
levenshtein = "1.0.5"
levenshtein_automata = "0.2.1"
edit-distance = "2.1.0"
levenshtein-diff = "0.2.4"
strsim = "0.10.0"

main.rs

//
// Levenshtein distance benchmarks
//
use levenshtein::levenshtein as levenshtein_rs;
use levenshtein_automata::{LevenshteinAutomatonBuilder};
use edit_distance::edit_distance;
use levenshtein_diff as levenshtein_diff;
use strsim::{levenshtein as strsim_levenshtein};

fn main() {

    //let string1 = "kitten";
    //let string2 = "sitting";
    let string1 = "日本語は難しいです";
    let string2 = "英語は難しいです";
    let test_times = 1000;

    println!("*levenshtein lib test* {}/{}", string1, string2);

    //levenshtein-rs
    {
        let _exec_time = Elapsed::start("levenshtein_rs");
        for _n in 1..=test_times {
            let _res = levenshtein_rs(string1, string2);
        }
        println!("distance by levenshtein-rs lib: {}", levenshtein_rs(string1, string2));
    }

    //edit_distance
    {
        let _exec_time = Elapsed::start("edit_distance");
        for _n in 1..=test_times {
            let _res = edit_distance(string1, string2);
        }
        println!("distance by edit_distance::edit_distance: {}", edit_distance(string1, string2));
    }

    //levenshtein_diff
    {
        let _exec_time = Elapsed::start("levenshtein_diff");
        for _n in 1..=test_times {
            let (_distance, _) = levenshtein_diff::distance(string1.as_bytes(), string2.as_bytes());
        }
        let (distance, _) = levenshtein_diff::distance(string1.as_bytes(), string2.as_bytes());
        println!("distance by levenshtein-diff: {:?}", distance);
    }

    //strsim
    {
        let _exec_time = Elapsed::start("strsim_levenshtein");
        for _n in 1..=test_times {
            let _res = strsim_levenshtein(string1, string2);
        }
        println!("distance by strsim_levenshtein: {:?}", strsim_levenshtein(string1, string2));
    }

    //levenshtein_automata
    {
        let _exec_time = Elapsed::start("levenshtein_automata");
        for _n in 1..=test_times {
            let lev_automaton_builder = LevenshteinAutomatonBuilder::new(2, true);
            let dfa = lev_automaton_builder.build_dfa(string1);
            let mut state = dfa.initial_state();
            for &b in string2.as_bytes() {
                state = dfa.transition(state, b);
            }
        }
        let lev_automaton_builder = LevenshteinAutomatonBuilder::new(2, true);
        let dfa = lev_automaton_builder.build_dfa(string1);
        let mut state = dfa.initial_state();
        for &b in string2.as_bytes() {
            state = dfa.transition(state, b);
        }
        println!("distance by levenshtein_automata: {:?}", dfa.distance(state));
    }

}

// https://stackoverflow.com/a/65493006
struct Elapsed(&'static str, std::time::SystemTime);

impl Drop for Elapsed {
    fn drop(&mut self) {
        println!(
            "operation {} finished for {} ms",
            self.0,
            self.1.elapsed().unwrap_or_default().as_millis()
        );
    }
}

impl Elapsed {
    pub fn start(op: &'static str) -> Elapsed {
        let now = std::time::SystemTime::now();

        Elapsed(op, now)
    }
}

結果

C:/***/.cargo/bin/cargo.exe run --color=always --package levenshtein-benchmark --bin levenshtein-benchmark
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target\debug\levenshtein-benchmark.exe`
*levenshtein lib test* 日本語は難しいです/英語は難しいです
distance by levenshtein-rs lib: 2
operation levenshtein_rs finished for 14 ms
distance by edit_distance::edit_distance: 2
operation edit_distance finished for 20 ms
distance by levenshtein-diff: 6
operation levenshtein_diff finished for 245 ms
distance by strsim_levenshtein: 2
operation strsim_levenshtein finished for 11 ms
distance by levenshtein_automata: Exact(2)
operation levenshtein_automata finished for 10302 ms

Process finished with exit code 0

結論

strsimlevenshtein-rs を使おう。

1
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
1
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?