
Creating a Hatena dictionary for SKK (with addendum)

Last updated at 2017-03-02, posted at 2017-03-01

I want to build a Hatena dictionary for SKK

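The day before yesterday I discovered SKK and was blown away. (Day 3 as a user, lol)
I didn't want to use a Google-based CGI server, so I chose to rely on dictionaries only.
The Hatena dictionary is supposed to be a good one.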

Ruby? Python will do

I don't even have a Ruby environment...

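There is plenty of Ruby code for this, but none in Python.
Writing the Python version took less than 10 minutes.
~~Loaded it into CorvusSKK → encoding error → much suffering~~
- ~~Built it at home and brought the dictionary over, but still got an error~~
- ~~Looks like a CorvusSKK (or Windows?) problem~~
SKKFEP looks like it makes building dictionaries easy.
Registering the dictionary with SKKFEP worked!
I'm sticking with SKKFEP!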

Addendum (2017/3/3)

I was shown how to import the dictionary into CorvusSKK.
- Thank you, @corvussolis.
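Judging from the commented-out lines in the script, what CorvusSKK needed was the pair of marker comments that open the two sections of an SKK dictionary. Every Hatena entry is okuri-nasi (a reading without okurigana), so the okuri-ari section simply stays empty. The following is a minimal sketch of that layout, assuming the entries are already (reading, candidate) pairs; `format_skk_dictionary` is a hypothetical helper name, not part of any library. Each marker must sit on its own line.

```python
def format_skk_dictionary(entries):
    """Return SKK dictionary text for a list of (reading, candidate) pairs.

    All entries are treated as okuri-nasi, so the okuri-ari section is
    left empty. Each marker comment ends with a newline of its own.
    """
    lines = [";; okuri-ari entries.", ";; okuri-nasi entries."]
    # Sort by reading (then candidate) in Unicode code-point order
    for reading, candidate in sorted(entries):
        lines.append("%s /%s/" % (reading, candidate))
    return "\n".join(lines) + "\n"


text = format_skk_dictionary([("はてな", "はてな"), ("えすけーけー", "SKK")])
print(text, end="")
```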

So here is the source, though it's nothing special.

make_skk_dic.py

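# coding=utf-8
import csv
import re

import numpy as np
import pandas as pd


def furi_del_norm(txt):
    # SKK readings must be hiragana, so a reading that contains katakana
    # anywhere is replaced with NaN and dropped later by dropna()
    if re.search(r"[ァ-ヶ]", txt):
        return np.nan
    return txt


def main():
    # keywordlist_furigana.csv: tab-separated "reading<TAB>keyword" lines in
    # EUC-JP, assumed to have no header row; undecodable bytes are skipped.
    # Text mode with default newline handling also turns "\r\n" into "\n".
    with open("keywordlist_furigana.csv", "r", encoding="euc_jp", errors="ignore") as file:
        df = pd.read_csv(file, sep="\t", header=None, names=["furi", "word"],
                         dtype=str, quoting=csv.QUOTE_NONE)

    df = df.dropna()
    # Trim surrounding whitespace (including any stray "\r") from both columns
    df["word"] = df["word"].str.strip()
    df["furi"] = df["furi"].str.strip()
    df["furi"] = df["furi"].apply(furi_del_norm)
    # Drop rows whose reading was rejected or that ended up empty
    df = df.replace("", np.nan).dropna()

    # Sort by reading (DataFrame.sort() no longer exists; use sort_values())
    df = df.sort_values(by="furi", ascending=True)

    # to_csv was no good here, so the lines are written by hand
    TMP_FILE_PATH = "SKK-JISHO.hatena"
    with open(TMP_FILE_PATH, "w", encoding="utf-8", errors="ignore") as file:
        # For CorvusSKK, enable the following lines (addendum: 2017/03/03)
        # file.write(";; okuri-ari entries.\n")
        # file.write(";; okuri-nasi entries.\n")
        for _, row in df.iterrows():
            file.write("%s /%s/\n" % (row["furi"], row["word"]))


if __name__ == "__main__":
    main()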
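The script writes one line per CSV row, so a reading shared by several Hatena keywords ends up on several lines, and a keyword containing `/` (the candidate separator) or `;` (the annotation marker) breaks the `reading /candidate/` format. SKK dictionaries normally keep every candidate for a reading on one line (`reading /cand1/cand2/`). The sketch below merges candidates per reading and escapes those characters, assuming the `(concat "...")` form with octal escapes seen in the SKK-JISYO files is understood by the IME in use; `escape_candidate` and `group_entries` are hypothetical names.

```python
def escape_candidate(word):
    """Escape a candidate that contains '/' or ';'.

    Assumed convention (as in SKK-JISYO files): wrap the word in an Emacs
    Lisp (concat "...") form and write special characters as octal escapes.
    Backslashes are escaped first so later escapes are not doubled.
    """
    if "/" not in word and ";" not in word:
        return word
    escaped = (
        word.replace("\\", "\\134")
        .replace('"', "\\042")
        .replace("/", "\\057")
        .replace(";", "\\073")
    )
    return '(concat "%s")' % escaped


def group_entries(pairs):
    """Merge (reading, word) pairs into one SKK line per reading.

    Candidates keep their first-seen order and duplicates are dropped;
    readings are sorted in Unicode code-point order.
    """
    grouped = {}
    for reading, word in pairs:
        candidates = grouped.setdefault(reading, [])
        candidate = escape_candidate(word)
        if candidate not in candidates:
            candidates.append(candidate)
    return [
        "%s /%s/" % (reading, "/".join(candidates))
        for reading, candidates in sorted(grouped.items(), key=lambda kv: kv[0])
    ]


lines = group_entries([
    ("はてな", "はてな"),
    ("はてな", "Hatena"),
    ("はてな", "はてな"),
    ("てぃーしーぴーあいぴー", "TCP/IP"),
])
for line in lines:
    print(line)
```

In make_skk_dic.py, the `iterrows()` loop could then be replaced by writing each element of `group_entries(zip(df["furi"], df["word"]))` followed by a newline.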
