
Comparing Japanese text conversion modules for Python 3

More than 3 years have passed since last update.

Ciao...†

Natural language processing always comes with preprocessing, and preprocessing can never be too fast. So I compared Japanese text conversion modules for Python 3.

What was compared

Full-width/half-width conversion and hiragana-to-katakana conversion, each measured on both a long and a short input string.
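
To make the two conversions concrete, here is a minimal sketch that uses only the standard library. It is not how any of the compared modules is implemented. The names `han_to_zen` and `hira_to_kata` are made up for illustration, and the sketch assumes only printable ASCII and the basic hiragana block need handling (half-width katakana is ignored).

```python
# Stand-in conversions built on str.translate; all names here are hypothetical.

# Printable ASCII '!'..'~' maps to its full-width form at a fixed offset of 0xFEE0.
# The space character is a special case and maps to the ideographic space U+3000.
_HAN_TO_ZEN = {cp: cp + 0xFEE0 for cp in range(0x21, 0x7F)}
_HAN_TO_ZEN[0x20] = 0x3000

# Hiragana U+3041..U+3096 sits exactly 0x60 below the matching katakana.
_HIRA_TO_KATA = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}


def han_to_zen(text: str) -> str:
    """Convert half-width ASCII characters to full-width."""
    return text.translate(_HAN_TO_ZEN)


def hira_to_kata(text: str) -> str:
    """Convert hiragana to katakana, leaving other characters untouched."""
    return text.translate(_HIRA_TO_KATA)


print(han_to_zen("Python3"))     # Python3
print(hira_to_kata("ひらがな"))   # ヒラガナ
```

The dedicated modules cover more cases than this (half-width katakana, voiced marks, selective conversion of digits or ASCII), which is part of why their speed differs.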

Modules compared

Results

Details are here

jaconv cnvk mojimoji zenhan rfZenHan mohayonao nkf
Short text, half-width → full-width: 27.1 µs 96.4 µs 5.04 µs 75.8 µs 222 µs 23 µs
Long text, half-width → full-width: 89.9 ms 38.6 ms 23.1 ms 360 ms 237 ms 95.4 ms
Short text, hiragana → katakana: 18.1 µs 79.1 µs 25.4 µs 23.2 µs
Long text, hiragana → katakana: 51.6 ms 41.8 ms 246 ms 98.6 ms

As you would expect from a module that uses Cython, mojimoji is fast. Among the pure-Python modules, jaconv performs well on short text, while cnvk seems to do better on long text.
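
Per-call timings of this kind can be collected with the standard `timeit` module. The following is only a sketch of such a harness, not the setup used for the numbers above: the input strings, the repeat counts, and the stand-in converter `hira_to_kata` are all assumptions, and you would swap in the function from the module you want to measure.

```python
import timeit


def hira_to_kata(text: str) -> str:
    # Stand-in converter (hypothetical); replace with the function under test.
    table = {cp: cp + 0x60 for cp in range(0x3041, 0x3097)}
    return text.translate(table)


def time_per_call(func, text: str, number: int) -> float:
    """Return the best average time per call, in seconds, over 3 repeats."""
    runs = timeit.repeat(lambda: func(text), number=number, repeat=3)
    return min(runs) / number


short_text = "ひらがなをカタカナに"    # assumed short input
long_text = short_text * 10_000       # assumed long input

short_s = time_per_call(hira_to_kata, short_text, number=1_000)
long_s = time_per_call(hira_to_kata, long_text, number=5)

print(f"short: {short_s * 1e6:.1f} µs per call")
print(f"long:  {long_s * 1e3:.2f} ms per call")
```

Measuring short and long inputs separately matters because fixed per-call overhead dominates on short strings, while per-character cost dominates on long ones, so the ranking of modules can flip between the two.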
