Translating the Anime Titles in the Anime Recommendations Database into Japanese

Posted at 2024-03-22

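Introduction

In this article, I describe how I converted the English anime titles in the Anime Recommendations Database, an open dataset, into their Japanese titles (by brute force). I also publish the resulting JSON file so that you can use it as an English-to-Japanese title mapping.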

Dataset

Method

I just threw the titles at ChatGPT. Because this takes a while, I also ran part of the list through Claude.
Gemini and Copilot did not work well, possibly because of my prompt.

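Preparation

I extracted the titles of the TV series from the open dataset above and saved them to titles.txt, one title per line.
This file is what gets handed to GPT.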

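# df_anime: the dataset as a pandas DataFrame with Japanese column names
# ("視聴タイプ" = type, "タイトル" = title); keep only TV series
titles = df_anime[df_anime["視聴タイプ"] == "TV"]["タイトル"].tolist()
# Write one title per line; UTF-8 so titles such as "Gintama°" are stored correctly
with open("titles.txt", "w", encoding="utf-8") as file:
    for title in titles:
        file.write(title + "\n")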

Prompt used

After entering the prompt below, keep replying with [W] and the model produces the translations automatically, 40 titles at a time.

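#Goal
Convert the English anime titles in titles.txt into their Japanese titles

#Conditions
- Follow #Steps
- Output only the CSV file
- Communicate in Japanese
- Do not output explanations; only say "Starting the conversion"

#Steps
1. Output "Loading now"
2. Read the first 40 lines
3. Output "Next, converting"
4. Using #Output example as a reference, output the first 40 English titles as a dictionary that maps each one to its Japanese title
5. Ask whether to "proceed to the next batch (W)".
6. If proceeding, read lines 41 to 80 and return to step 3.
- Repeat this procedure 95 times, up to line 3787

#Output example
{
    "Fullmetal Alchemist: Brotherhood": "鋼の錬金術師 FULLMETAL ALCHEMIST",
    "Gintama°": "銀魂°",
    # (remaining entries omitted; add every title to this list)
}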
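The prompt walks through titles.txt in batches of 40 lines, and the 95 repetitions follow from the line count: ⌈3787 / 40⌉ = 95, with the last batch holding only 27 titles (lines 3761 to 3787). A minimal sketch of that batching arithmetic, assuming titles.txt holds one title per line; the function name `chunk_ranges` is hypothetical and not part of the original workflow:

```python
import math


def chunk_ranges(n_lines: int, chunk_size: int = 40) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) line ranges of at most chunk_size lines."""
    return [
        (start, min(start + chunk_size - 1, n_lines))
        for start in range(1, n_lines + 1, chunk_size)
    ]


ranges = chunk_ranges(3787)
print(len(ranges), math.ceil(3787 / 40))  # 95 95
print(ranges[0], ranges[1], ranges[-1])   # (1, 40) (41, 80) (3761, 3787)
```

If your copy of titles.txt has a different number of lines, count them first (for example `sum(1 for _ in open("titles.txt", encoding="utf-8"))`) and adjust the repeat count in the prompt accordingly.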

Combining the files

I saved each batch of output as a separate JSON file, so these files are merged into one.

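import os
import json

# Folder holding one JSON file per batch ("日本語訳リスト" means "Japanese translation list")
folder_path = "日本語訳リスト"

# Sort the file names so the merge order is deterministic (os.listdir order is arbitrary)
json_files = sorted(file for file in os.listdir(folder_path) if file.endswith(".json"))

combined_data = {}

for file in json_files:
    file_path = os.path.join(folder_path, file)
    with open(file_path, "r", encoding="utf-8") as f:
        try:
            data = json.load(f)
        except json.JSONDecodeError:
            # Skip batches that are not valid JSON instead of aborting the whole merge
            print("Error reading file:", file)
            continue
    # Add this batch; a later file overwrites an earlier one on duplicate English titles
    combined_data.update(data)

output_file = "combined_data.json"
with open(output_file, "w", encoding="utf-8") as f:
    # ensure_ascii=False keeps the Japanese characters readable in the output file
    json.dump(combined_data, f, ensure_ascii=False, indent=4)

print("Combined data saved as", output_file)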
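Because each batch was generated separately, and a batch file that fails to parse is skipped during the merge, some titles may end up without a translation. A minimal sketch for comparing the merged file against titles.txt, assuming both files were produced as described in this article; `load_titles`, `load_translations`, and `find_missing` are hypothetical helper names, and the sample data is toy data, not output from the actual run:

```python
import json


def load_titles(path: str = "titles.txt") -> list[str]:
    """Read one title per line (the format written in the preparation step)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def load_translations(path: str = "combined_data.json") -> dict[str, str]:
    """Load the merged {English title: Japanese title} object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_missing(titles: list[str], translations: dict[str, str]) -> list[str]:
    """Return titles with no entry in translations, in original order, without duplicates."""
    seen: set[str] = set()
    missing: list[str] = []
    for title in titles:
        if title not in translations and title not in seen:
            seen.add(title)
            missing.append(title)
    return missing


# Toy data: two entries from the prompt's output example plus one hypothetical title
sample_titles = ["Fullmetal Alchemist: Brotherhood", "Gintama°", "Hypothetical Title"]
sample_translations = {
    "Fullmetal Alchemist: Brotherhood": "鋼の錬金術師 FULLMETAL ALCHEMIST",
    "Gintama°": "銀魂°",
}
print(find_missing(sample_titles, sample_translations))  # ['Hypothetical Title']
```

On the real files, `find_missing(load_titles(), load_translations())` lists the titles that would need another pass through the model.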

Published on GitHub

Feel free to use it if it helps.
(The file has a lot of lines, so I link to it instead of pasting it here.)

Japanese translation of the Anime Recommendations Database by sugimochi97
https://gist.github.com/sugimochi97/76313f26b0c88e8db86f598534503354
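One simple way to use the published file is a dictionary lookup that falls back to the English title when no translation exists. A minimal sketch, assuming the Gist holds the same flat {English title: Japanese title} object that the merge script writes and that you have saved it locally as combined_data.json (the local file name is an assumption); `load_mapping` and `to_japanese` are hypothetical helper names:

```python
import json


def load_mapping(path: str = "combined_data.json") -> dict[str, str]:
    # Flat JSON object: {English title: Japanese title}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def to_japanese(title: str, mapping: dict[str, str]) -> str:
    """Return the Japanese title, or the original English title if none is known."""
    return mapping.get(title, title)


# Example with two entries taken from the prompt's output example
mapping = {
    "Fullmetal Alchemist: Brotherhood": "鋼の錬金術師 FULLMETAL ALCHEMIST",
    "Gintama°": "銀魂°",
}
print(to_japanese("Gintama°", mapping))       # 銀魂°
print(to_japanese("Unknown Title", mapping))  # Unknown Title
```

With pandas, the same lookup can be applied to a whole column, for example `df_anime["タイトル"].map(lambda t: to_japanese(t, mapping))`.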
