Trying RWKV fine-tuning with the Dolly dataset on Databricks

Posted at 2023-04-16

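This is a further continuation of this earlier post.

What I originally wanted was to get as far as what this article describes.

The notebook is here.

The dataset is a Japanese translation of Dolly's databricks-dolly-15k. Many thanks to the person who created it!

The original data is in JSON format, so I convert it to plain text for ChatRWKV: one prompt/response block per record, each terminated by <|endoftext|>. The paths point to Databricks DBFS.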

Python

import json

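# Read a JSON file (UTF-8 is set explicitly because the data is Japanese)
def read_json(filename):
    with open(filename, "r", encoding="utf-8") as f:
        return json.load(f)

# Write a text file
def write_text(filename, text):
    with open(filename, "w", encoding="utf-8") as f:
        f.write(text)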

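# Build one training example in the instruction format
# (`input` shadows the builtin, but is kept to match the dataset's key name)
def generate_prompt(instruction, input, output):
    if input != "":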
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

# Instruction:
{instruction}

# Input:
{input}

# Response:
{output}
<|endoftext|>
"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

# Instruction:
{instruction}

# Response:
{output}
<|endoftext|>
"""

# Load the JSON (rwkv_dir_local is assumed to be defined earlier in the notebook)
dataset = read_json(f"{rwkv_dir_local}/databricks_dolly_15k_ja.json")

# Generate the text: one formatted block per non-empty record
output = ""
for item in dataset:
    if len(item.keys()) > 0:
        output += generate_prompt(
            item["instruction"].strip(),
            item["input"].strip(),
            item["output"].strip()
        )

# Write the output file
write_text(f"{rwkv_dir_local}/databricks_dolly_15k_ja.txt", output)
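To make the target format concrete, here is a minimal, self-contained sketch that applies the same template to two made-up records held in memory. The sample records and the helper name `format_record` are illustrative assumptions and not part of the original notebook; only the `instruction` / `input` / `output` keys and the template text come from the code above.

```python
# Illustrative sketch only: the records below are made up, and format_record
# is a hypothetical stand-in for generate_prompt in the notebook.

HEADER_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request."
)
HEADER_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request."
)

def format_record(instruction, context, response):
    """Return one training block terminated by <|endoftext|>."""
    parts = [HEADER_WITH_INPUT if context else HEADER_NO_INPUT, ""]
    parts += ["# Instruction:", instruction, ""]
    if context:
        # The "# Input:" section is emitted only when the record has context.
        parts += ["# Input:", context, ""]
    parts += ["# Response:", response, "<|endoftext|>"]
    return "\n".join(parts) + "\n"

sample = [
    {"instruction": "Summarize the text.", "input": "RWKV is an RNN.", "output": "RWKV is an RNN."},
    {"instruction": "Say hello.", "input": "", "output": "Hello!"},
]

# Join the blocks once instead of growing a string inside the loop.
corpus = "".join(
    format_record(r["instruction"].strip(), r["input"].strip(), r["output"].strip())
    for r in sample
    if r  # skip empty records, as the loop above does
)
print(corpus)
```

Printing a couple of blocks like this before writing the full file is a cheap way to check that the section markers and the `<|endoftext|>` separator land where expected.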

The rest is the same as before, except that the part that feeds in the prompt is now wrapped in a function.

Python

def dolly_generate(prompt):
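  # Build the Instruct prompt
  # (presumably the single-argument, inference-time generate_prompt from the
  # earlier notebook; the three-argument version above would raise a TypeError here)
  prompt = generate_prompt(prompt)
  #print("--[prompt]--\n" + prompt + "----")

  # Run the pipeline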
  result = pipeline.generate(prompt, token_count=200, args=args)
  return result
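The single-argument prompt builder called inside `dolly_generate` is not shown in this post. Below is a rough sketch of what such an inference-time helper could look like. It assumes the helper reuses the no-input template from the training data and stops right after the `# Response:` marker, so that the model writes the answer. The name `build_inference_prompt` is hypothetical.

```python
# Hypothetical helper: an assumption about the inference-time prompt shape,
# mirroring the no-input training template used earlier in this post.
def build_inference_prompt(instruction):
    """Format an instruction and leave the response open for the model."""
    instruction = instruction.strip()
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"# Instruction:\n{instruction}\n\n"
        "# Response:\n"
    )

prompt = build_inference_prompt("  人工知能の活用方法  ")
print(prompt)
```

Keeping the inference prompt aligned with the training format is usually worthwhile for instruction-tuned models, since the model has learned to answer after that exact marker.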

Python

# Prompt: "How to make use of artificial intelligence"
result = dolly_generate("人工知能の活用方法")
print(result)

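人工知能の活用方法は、特に一般的な活用方法があります。最初の活用方法は過去に書かれていますが、人工知能の活用方法は、一般的に書かれています。しかし、それは人工知能の活用方法です。

(Roughly, in English: "There are, in particular, common ways to make use of artificial intelligence. The first way of using it was written about in the past, but ways of using artificial intelligence are generally written about. However, that is a way of using artificial intelligence.")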

Hmm, for now this is at the "well, at least it runs" level. I'll keep trying more things.

Databricks Quickstart Guide

Databricks free trial
