
Tips for getting Japanese output from Llama 2 on watsonx.ai

Posted at 2023-10-15, last updated at 2023-10-15

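watsonx.ai, the generative AI service IBM has offered since July 2023, had no Japanese-capable LLM at launch. IBM soon announced that it would bring in Meta's Llama 2, which handles Japanese, and Llama 2 has been available on watsonx.ai since September 8, 2023.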

For an overview of Llama 2, an LLM that handles Japanese, see Meta's site.

The paper also describes methods such as fine-tuning.

Japanese support for Granite, IBM's own language model, is planned for early 2024, so Llama 2 looks like the model to use for the rest of 2023.

The Granite LLM paper is also available, and it is a very substantial read.

Now, on watsonx.ai, Llama 2 used as-is (without any tuning) sometimes answers in English when the Japanese input is a single sentence.

The following request is one such case (the prompt asks "Where is Tokyo?"). The reply is often "I don't know".

# Single-sentence Japanese prompt; the reply often comes back in English.
curl "https://us-south.ml.cloud.ibm.com/ml/v1-beta/generation/text?version=2023-05-29" \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H 'Authorization: Bearer YOUR_ACCESS_TOKEN' \
  -d $'{
  "model_id": "meta-llama/llama-2-70b-chat",
  "input": "東京はどこですか？\\n",
  "parameters": {
    "decoding_method": "greedy",
    "max_new_tokens": 100,
    "min_new_tokens": 0,
    "stop_sequences": ["\\n"],
    "repetition_penalty": 1
  },
  "project_id": "7ef6d21a-2b41-4242-9e61-f0765e819325"
}'

As a first workaround, I recommend making the input two lines.
Whatever the quality of the answer, the output came back in Japanese.

# Two-line Japanese prompt; the reply came back in Japanese.
curl "https://us-south.ml.cloud.ibm.com/ml/v1-beta/generation/text?version=2023-05-29" \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H 'Authorization: Bearer YOUR_ACCESS_TOKEN' \
  -d $'{
  "model_id": "meta-llama/llama-2-70b-chat",
  "input": "東京はどこですか？\\n大阪はどこですか？\\n",
  "parameters": {
    "decoding_method": "greedy",
    "max_new_tokens": 100,
    "min_new_tokens": 0,
    "stop_sequences": ["\\n"],
    "repetition_penalty": 1
  },
  "project_id": "7ef6d21a-2b41-4242-9e61-f0765e819325"
}'

Adding few-shot examples to the prompt, a prompt-engineering technique, improves the results further.
The answer came back in Japanese and was reasonably sensible.

# Few-shot prompt: two solved question/answer pairs, then the open questions.
curl "https://us-south.ml.cloud.ibm.com/ml/v1-beta/generation/text?version=2023-05-29" \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H 'Authorization: Bearer YOUR_ACCESS_TOKEN' \
  -d $'{
  "model_id": "meta-llama/llama-2-70b-chat",
  "input": "東京はどこですか？\\n\\n日本の関東地方です。\\n\\n大阪はどこですか？\\n\\n日本の関西地方です。\\n\\n東京はどこですか？\\n大阪はどこですか？\\n",
  "parameters": {
    "decoding_method": "greedy",
    "max_new_tokens": 100,
    "min_new_tokens": 0,
    "stop_sequences": ["\\n"],
    "repetition_penalty": 1
  },
  "project_id": "7ef6d21a-2b41-4242-9e61-f0765e819325"
}'
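
Hand-escaping newlines inside a shell-quoted JSON body is easy to get wrong, so the few-shot request body could also be generated programmatically. The following is a minimal sketch, not part of the original article: `build_few_shot_prompt`, `build_payload`, and the `YOUR_PROJECT_ID` placeholder are hypothetical names introduced for illustration, while the model ID and parameters are copied from the request above.

```python
import json


def build_few_shot_prompt(examples, questions):
    """Join solved (question, answer) pairs, then append the open questions.

    Mirrors the layout of the few-shot request above: each example is
    followed by a blank line, and each open question ends with a newline.
    """
    parts = [f"{question}\n\n{answer}\n\n" for question, answer in examples]
    parts += [f"{question}\n" for question in questions]
    return "".join(parts)


def build_payload(prompt, project_id, model_id="meta-llama/llama-2-70b-chat"):
    """Build the request body with the same parameters as the curl call."""
    return {
        "model_id": model_id,
        "input": prompt,
        "parameters": {
            "decoding_method": "greedy",
            "max_new_tokens": 100,
            "min_new_tokens": 0,
            "stop_sequences": ["\n"],
            "repetition_penalty": 1,
        },
        "project_id": project_id,
    }


examples = [
    ("東京はどこですか？", "日本の関東地方です。"),
    ("大阪はどこですか？", "日本の関西地方です。"),
]
prompt = build_few_shot_prompt(examples, ["東京はどこですか？", "大阪はどこですか？"])

# json.dumps escapes the newlines, so no manual "\\n" juggling is needed.
# "YOUR_PROJECT_ID" is a placeholder (assumption), not a real project.
body = json.dumps(build_payload(prompt, "YOUR_PROJECT_ID"), ensure_ascii=False)
print(body)
```

The printed JSON could be saved to a file and passed to curl with `-d @body.json` in place of the inline body.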

The tip for getting Japanese output from Llama 2 on watsonx.ai is to use multi-line input with few-shot examples.
(This reflects the situation as of October 2023 and may well change.)
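
Because this behaviour may change, it can help to re-check from time to time whether a reply is actually in Japanese. The sketch below is a rough heuristic and not part of the original article: `contains_japanese` is a hypothetical helper, and it assumes that finding any hiragana, katakana, or kanji character is good enough. Kanji also appear in Chinese text, so treat the result as an indication only.

```python
import re

# Hiragana (U+3040-309F), katakana (U+30A0-30FF), and CJK unified
# ideographs (U+4E00-9FFF). This is an assumed, deliberately rough range.
_JAPANESE_CHARS = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def contains_japanese(text: str) -> bool:
    """Return True if the text has at least one Japanese-script character."""
    return _JAPANESE_CHARS.search(text) is not None


# Example replies taken from the article's prompts and described output.
print(contains_japanese("日本の関東地方です。"))
print(contains_japanese("I don't know"))
```

Applied to the generated text of each response, a check like this makes it easy to compare single-line, two-line, and few-shot prompts side by side.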
