
Trying out SudachiPy

Python

Posted at 2022-03-23 (last updated at 2022-03-23)

id	text
1	メロスは激怒した。
2	必ず、かの邪智暴虐の王を除かなければならぬと決意した。
3	メロスには政治がわからぬ。
4	メロスは、村の牧人である。
5	笛を吹き、羊と遊んで暮して来た。
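
The tokenization code that follows reads the sentences from a pandas DataFrame named `df`, but the article does not show how that DataFrame was created. As a minimal sketch, assuming the sample table is tab-separated with a header row, it could be loaded like this (the `TSV` string and the use of `io.StringIO` are illustrative assumptions, not the author's method):

```python
import io

import pandas as pd

# Assumption: the sample table above, stored as tab-separated text with a header row.
TSV = (
    "id\ttext\n"
    "1\tメロスは激怒した。\n"
    "2\t必ず、かの邪智暴虐の王を除かなければならぬと決意した。\n"
    "3\tメロスには政治がわからぬ。\n"
    "4\tメロスは、村の牧人である。\n"
    "5\t笛を吹き、羊と遊んで暮して来た。\n"
)

# Parse the text into a DataFrame with the columns "id" and "text".
df = pd.read_csv(io.StringIO(TSV), sep="\t")
print(df)
```

If the table lives in a file instead, passing the file path to `pd.read_csv` with `sep="\t"` should work the same way.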

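import pandas as pd
from sudachipy import dictionary, tokenizer

# df: pandas DataFrame holding the table above (columns: id, text).
# dict="full" requires the sudachidict_full package to be installed.
tokenizer_obj = dictionary.Dictionary(dict="full").create()
mode = tokenizer.Tokenizer.SplitMode.C  # mode C yields the longest units

doc = []
for text in df["text"]:
    tokens = tokenizer_obj.tokenize(text, mode)
    # Keep the normalized form of nouns (名詞) and verbs (動詞) only.
    d = [m.normalized_form() for m in tokens if m.part_of_speech()[0] in ("名詞", "動詞")]
    doc.append(d)

# Join each sentence's tokens into one space-separated string.
docs = pd.array([" ".join(d) for d in doc])
print(docs)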

#<StringArray>
#['メロス 激怒 為る',
#'邪知 暴虐 王 除く 成る 決意 為る',
#'メロス 政治 分かる',
#'メロス 村 牧人 有る',
#'笛 吹く 羊 遊ぶ 暮らす 来る']
#Length: 5, dtype: string
