Getting Along with MeCab (Python)

Posted at 2024-03-29

These are the personal impressions of someone who only uses mecab-python3 for kana conversion.

Don't install unidic

The lookup tries unidic first. I, however, mostly use unidic_lite.

def try_import_unidic():
    """Import unidic or unidic-lite if available. Return dicdir.

    This is specifically for dictionaries installed via pip.
    """
    try:
        import unidic
        return unidic.DICDIR
    except ImportError:
        try:
            import unidic_lite
            return unidic_lite.DICDIR
        except ImportError:
            # This is OK, just give up.
            return

On top of that, MeCab does not catch the AttributeError, so if you forget to uninstall unidic, it will come back to puzzle you one day.

How do you support both?

try_import_unidic returns the location of the dictionary.
So, unless the user has specified a dictionary via arguments, you can tell which one is in use.
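As a minimal sketch of that idea, the dictionary flavor can be guessed from the returned path. The helper name `detect_dictionary` is hypothetical, and the sketch assumes the pip-installed packages keep their dictionary under a directory named after the package (`unidic` or `unidic_lite`); if your layout differs, the check needs adjusting.

```python
from pathlib import PurePath


def detect_dictionary(dicdir):
    """Guess the dictionary flavor from a dicdir path (hypothetical helper).

    Assumes the path contains a component named after the pip package.
    Returns "unidic", "unidic-lite", or None when it cannot tell.
    """
    if not dicdir:
        # try_import_unidic gives back nothing when neither package is installed.
        return None
    parts = PurePath(dicdir).parts
    # Check the more specific name first.
    if "unidic_lite" in parts:
        return "unidic-lite"
    if "unidic" in parts:
        return "unidic"
    return None
```

The path returned by try_import_unidic could be passed in directly, as long as the user has not supplied a dictionary of their own.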

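The CSV delimiter is a comma for unidic and a tab for unidic-lite.
The kana reading is field 9 for unidic and field 1 for unidic-lite. Field 0 is the word before conversion, but with unidic you have to split it on the tab to strip the part-of-speech information. (Incidentally, because this information is present, feeding unidic output into code written for unidic-lite breaks the output (text or audio) but raises no error.)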
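The field layout described above can be sketched as a small parser. The function name `surface_and_kana` and the sample lines are made up for illustration and are not real MeCab output; the split is naive and assumes no field contains the delimiter itself.

```python
def surface_and_kana(line, dictionary):
    """Return (surface, kana) from one output line (hypothetical helper).

    dictionary is "unidic" or "unidic-lite". Falls back to the surface
    form when the reading field is missing.
    """
    if dictionary == "unidic":
        fields = line.split(",")
        # Field 0 is "surface<TAB>first POS field"; keep only the surface.
        surface = fields[0].split("\t")[0]
        kana = fields[9] if len(fields) > 9 else surface
    else:
        fields = line.split("\t")
        surface = fields[0]
        kana = fields[1] if len(fields) > 1 else surface
    return surface, kana


# Illustrative lines shaped like the two formats (not captured from MeCab).
UNIDIC_LINE = "東京\t名詞,固有名詞,地名,一般,*,*,トウキョウ,トウキョウ,東京,トーキョー"
LITE_LINE = "東京\tトーキョー\tトウキョウ\t東京\t名詞-固有名詞-地名-一般"
```

Passing `UNIDIC_LINE` with `"unidic-lite"` reproduces the silent failure: no exception is raised, but the "kana" comes back as the whole comma-separated feature string.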
