
How to Implement the OpenAI Assistants API (Code Interpreter) in Python

Last updated at 2023-11-14 / Posted at 2023-11-14

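1. Introduction

The previous article explained in detail how to use OpenAI's Assistants API to create a basic AI assistant and generate answers to user questions. This article focuses on a more advanced use of the API: with Code Interpreter enabled, we use the file input/output feature to build an assistant that carries out more complex tasks. The implementation builds on the previous article, so please read it first if you have not already. (How to Implement the OpenAI Assistants API in Python)

Introducing the file input/output feature

This implementation uses a file named transformer.py as input. The file contains implementations of a vanilla Transformer, a Transformer Encoder, and a Transformer Decoder, and we use it to create an assistant that acts as a machine learning engineer. The assistant writes and runs code to answer questions about Transformer models.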

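Overview of the code

The code in this article covers the whole process: uploading a file, creating the assistant, creating a thread, sending and processing messages, and displaying the results.

Upload the file transformer.py
Create an assistant that acts as a machine learning engineer
Create a thread containing the user's question
Run the thread and check its status
Process the annotations contained in the messages
Display the final list of messages

Giving the assistant a file lets it handle more complex tasks, and the file input/output feature of the Assistants API lets it return its results as downloadable files rather than only as text.

The next section walks through the concrete steps for implementing this process.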

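2. Implementation

This section explains how to implement an AI assistant that makes use of files. Specifically, we use transformer.py to create an assistant that acts as a machine learning engineer and answers questions about Transformer models.

2.1 Uploading the file

First, upload the file that the assistant will reference. In this example, we upload transformer.py, which contains the Transformer model implementations.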

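from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Open transformer.py in binary mode and upload it for use by assistants
with open("transformer.py", "rb") as f:
    file = client.files.create(file=f, purpose="assistants")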

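This code opens transformer.py in binary mode and uploads it to the Assistants API. The upload returns a file object, and its ID (file.id) is what attaches the file to the assistant and to the message in the following steps.

2.2 Creating the assistant

Next, create an assistant that references the uploaded file. This assistant is responsible for writing and running code to answer questions about Transformer models.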

assistant = client.beta.assistants.create(
    name="Machine Learning Engineer",
    instructions="You are a machine learning engineer. Write and run code to answer questions about transformer models.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-1106-preview",
    file_ids=[file.id]
)
assistant_id = assistant.id
print("assistant_id:", assistant_id)

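2.3 Creating a thread and sending a message

Create a thread that contains the user's question for the assistant. Here, the user asks for an implementation of 10-class classification using a Transformer Encoder.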

thread = client.beta.threads.create(
    messages=[
        {
        "role": "user",
        "content": "Implement the 10-value classification of images with Transformer Encoder and output the code as a file. The file name can be anything.",
        "file_ids": [file.id]
        }
    ]
)
thread_id = thread.id
print("thread_id:", thread_id)

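2.4 Running the thread and processing messages

As in the previous article, once the run has started, wait for the assistant's response. During the run, the assistant references the file while working out its answer, and we retrieve the result when it finishes.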

run = client.beta.threads.runs.create(
    thread_id=thread_id,
    assistant_id=assistant_id,
)
run_id = run.id
print("run_id:", run_id)

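import time

# Statuses after which the run will make no further progress
terminal_statuses = {"completed", "failed", "cancelled", "expired"}

while True:
    # Fetch the latest run status
    run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
    print("run.status:", run.status)
    if run.status in terminal_statuses:
        break
    # Not finished yet, so wait a little before polling again
    time.sleep(5)

# Stop here instead of looping forever if the run did not succeed
if run.status != "completed":
    raise RuntimeError(f"Run ended with status {run.status!r}")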
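
The polling step can also be packaged as a small reusable helper with a deadline. The following is a minimal sketch, not part of the original implementation: the names wait_for_status, get_status, and TERMINAL_STATUSES are hypothetical, and the status strings are assumed to match the ones the run object reports. The status lookup is passed in as a function so the helper can be tried without calling the API.

```python
import time
from typing import Callable

# Assumed set of statuses after which a run stops changing
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def wait_for_status(
    get_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)


# Demo with a fake status sequence instead of a real API call
fake_statuses = iter(["queued", "in_progress", "completed"])
final_status = wait_for_status(lambda: next(fake_statuses), interval=0, sleep=lambda s: None)
print(final_status)
```

With the client from this article, the lookup could be passed as lambda: client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id).status.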

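2.5 Retrieving and processing messages

After the run completes, retrieve the message contents and process the annotations as needed. Files output by Code Interpreter are saved later using their file_id, so store each file_id together with its file_name in file_list.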

messages = client.beta.threads.messages.list(
  thread_id=thread_id
)

message_value_list = []
file_list = []  # (file_id, file_name) pairs for files generated by Code Interpreter

for message in messages:
    # Skip messages whose first content block is not text (for example, an image)
    if message.content[0].type != "text":
        continue
    message_content = message.content[0].text

    annotations = message_content.annotations

    citations = []
    files = []
    # Iterate over the annotations and add footnotes
    for index, annotation in enumerate(annotations):
        # Replace the annotated text with a footnote marker
        message_content.value = message_content.value.replace(annotation.text, f' [{index}]')

        # Collect a citation for file_path annotations (files created by Code Interpreter)
        if (file_path := getattr(annotation, 'file_path', None)):
            cited_file = client.files.retrieve(file_path.file_id)
            citations.append(f'[{index}] Click <here> to download {cited_file.filename}, file_id: {file_path.file_id}')

            # Record the file ID and the file name taken from the end of the annotation text
            files.append((file_path.file_id, annotation.text.split("/")[-1]))

    # Add this message's files to the overall list
    file_list.extend(files)

    # Append the footnotes to the end of the message before showing it to the user
    message_content.value += '\n' + '\n'.join(citations)
    message_value_list.append(message_content.value)
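
To see what the annotation handling does without calling the API, the same replacement logic can be run on stand-in objects. This is a minimal sketch: FakeAnnotation, FakeFilePath, and replace_annotations are hypothetical names for illustration only, and the sandbox:/mnt/data/... text is an assumed example of what a file_path annotation looks like.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FakeFilePath:
    # Stand-in for the file_path field of an annotation (hypothetical)
    file_id: str


@dataclass
class FakeAnnotation:
    # Stand-in for an annotation object (hypothetical)
    text: str
    file_path: Optional[FakeFilePath] = None


def replace_annotations(
    value: str, annotations: List[FakeAnnotation]
) -> Tuple[str, List[Tuple[str, str]]]:
    """Swap each annotated span for a footnote marker and collect (file_id, file_name) pairs."""
    files = []
    for index, annotation in enumerate(annotations):
        value = value.replace(annotation.text, f" [{index}]")
        file_path = getattr(annotation, "file_path", None)
        if file_path is not None:
            # The file name is the last path component of the annotated text
            files.append((file_path.file_id, annotation.text.split("/")[-1]))
    return value, files


text = "Download it here: sandbox:/mnt/data/model.py"
annotations = [FakeAnnotation("sandbox:/mnt/data/model.py", FakeFilePath("file-abc123"))]
new_text, files = replace_annotations(text, annotations)
print(new_text)
print(files)
```

Unlike the loop above, this version returns a new string instead of modifying the message object in place, which makes it easier to check in isolation.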

2.6 Downloading the files

Each element of file_list is a tuple of two items, file_id and file_name. Use the file_id to fetch the file, then save the output under file_name.

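for file_id, file_name in file_list:
    # Fetch the content of the file generated by Code Interpreter
    file_data = client.files.content(file_id)
    binary_data = file_data.content

    # Save it locally under the file name taken from the annotation
    with open(file_name, "wb") as f:
        f.write(binary_data)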
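
Because file_name comes from text produced by the assistant, it can be worth reducing it to a bare file name before writing to disk. The following is a minimal sketch under that assumption: safe_output_path and the downloads directory are hypothetical names, not part of the original code.

```python
from pathlib import Path, PurePosixPath


def safe_output_path(file_name: str, out_dir: str = "downloads") -> Path:
    """Return a path inside out_dir that keeps only the final component of file_name."""
    # Treat backslashes as separators too, then drop any directory part
    base = PurePosixPath(file_name.replace("\\", "/")).name
    if base in ("", ".", ".."):
        raise ValueError(f"unusable file name: {file_name!r}")
    return Path(out_dir) / base


print(safe_output_path("sandbox:/mnt/data/model.py"))
print(safe_output_path("../../etc/passwd", "out"))
```

In the download loop, the target could then be opened with open(safe_output_path(file_name), "wb") after creating the output directory.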

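3. Results

Once the run and the message processing had finished, the assistant had generated Transformer Encoder code for 10-class image classification, as the user requested, and output it as a file. The file was named transformer_encoder_image_classification.py and contained the following.

Contents of the generated file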

import torch
import torch.nn as nn
import torch.nn.functional as F
import copy

class TransformerEncoderLayer(nn.Module):
    def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1):
        super(TransformerEncoderLayer, self).__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
        self.linear1 = nn.Linear(d_model, dim_feedforward)
        self.dropout = nn.Dropout(dropout)
        self.linear2 = nn.Linear(dim_feedforward, d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout1 = nn.Dropout(dropout)
        self.dropout2 = nn.Dropout(dropout)
        self.relu = nn.ReLU()

    def forward(self, src, src_mask=None, src_key_padding_mask=None):
        src2 = self.self_attn(src, src, src, attn_mask=src_mask, key_padding_mask=src_key_padding_mask)[0]
        src = src + self.dropout1(src2)
        src = self.norm1(src)
        src2 = self.linear2(self.dropout(self.relu(self.linear1(src))))
        src = src + self.dropout2(src2)
        src = self.norm2(src)
        return src

class TransformerEncoder(nn.Module):
    def __init__(self, encoder_layer, num_layers):
        super(TransformerEncoder, self).__init__()
        self.layers = nn.ModuleList([copy.deepcopy(encoder_layer) for _ in range(num_layers)])

    def forward(self, src, mask=None, src_key_padding_mask=None):
        output = src
        for layer in self.layers:
            output = layer(output, src_mask=mask, src_key_padding_mask=src_key_padding_mask)
        return output

class ImageTransformer(nn.Module):
    def __init__(self, input_dim, d_model, nhead, num_encoder_layers, dim_feedforward, num_classes):
        super(ImageTransformer, self).__init__()
        self.conv = nn.Conv2d(in_channels=3, out_channels=d_model, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
        self.encoder_layer = TransformerEncoderLayer(d_model, nhead, dim_feedforward)
        self.transformer_encoder = TransformerEncoder(self.encoder_layer, num_encoder_layers)
        self.fc = nn.Linear(d_model, num_classes)

    def forward(self, x):
        x = self.conv(x)
        x = x.flatten(2)
        x = x.permute(2, 0, 1)
        x = self.transformer_encoder(x)
        x = x.mean(dim=0)
        x = self.fc(x)
        return x

def main():
    # Hyperparameters for the model, to be adjusted as needed.
    input_dim = (224, 224)  # Typically the spatial dimensions of the input image.
    d_model = 512  # Number of expected features in the input (required for the self-attention mechanism).
    nhead = 8  # Number of attention heads.
    num_encoder_layers = 6  # Number of sub-encoder-layers in the encoder.
    dim_feedforward = 2048  # Dimension of the feedforward network model.
    num_classes = 10  # Number of classes for classification.

    # Create a random image tensor to simulate one batch of image data.
    img_batch = torch.rand((8, 3, *input_dim))  # Simulated batch with batch_size = 8

    # Initialize the model.
    model = ImageTransformer(input_dim, d_model, nhead, num_encoder_layers, dim_feedforward, num_classes)

    # Forward pass of the model (without training just to demonstrate the functionality).
    out = model(img_batch)
    print(out.shape)  # Expected output: torch.Size([8, 10]) for the 10-class classification.

if __name__ == "__main__":
    main()

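This file defines a network for 10-class image classification using a Transformer model. It contains the TransformerEncoderLayer and TransformerEncoder classes and an ImageTransformer class for image classification. The main function sets the model's hyperparameters and runs a forward pass on random image data.
This result shows that the Assistants API can generate detailed code for a specific programming task and output it as a file.
The code will not run well as is, but it does implement 10-class classification with a Transformer Encoder exactly as the user requested, and being able to get a template this quickly is very useful.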
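
The article does not say why the generated code struggles as is. One possible factor, offered here as an assumption rather than a verified diagnosis, is the sequence length: the convolution only halves the 224x224 input, so every remaining pixel position becomes a token. The sketch below works through that arithmetic with the hyperparameters from main(), assuming the attention scores are materialized as float32 for every head.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-size formula for a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Values taken from the generated file: 224x224 input, 7x7 kernel, stride 2, padding 3
side = conv_output_size(224, kernel=7, stride=2, padding=3)
tokens = side * side  # sequence length after x.flatten(2)

batch_size, nhead = 8, 8
bytes_per_float32 = 4
# One (tokens x tokens) score matrix per head per sample
attn_bytes = batch_size * nhead * tokens * tokens * bytes_per_float32

print(side, tokens)  # 112 12544
print(f"{attn_bytes / 1e9:.1f} GB of attention scores per layer")
```

Under these assumptions a single encoder layer would need about 40 GB for the attention scores alone, which suggests that a larger patch size or stride would be needed before the template is practical.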

The following is the value of message_value_list (the assistant's responses).

To implement a 10-class image classification using a Transformer Encoder, we will develop a simple Transformer Encoder model suitable for image classification tasks in Python using PyTorch, which is a popular deep learning library. The Transformer Encoder will be adapted to handle 2D image data.

Since we'll be creating the code from scratch, I'll start by defining the required classes and functions, including the Transformer Encoder block, the overall model, and then save the resulting Python script to a file. We won't have access to external data or pre-trained models, so the provided code will be ready for training but won't be trained during this session.

Let's write the code for the Transformer Encoder model for image classification and save it as `transformer_encoder_image_classification.py`:

(The same code as above follows)
「
import torch
... 

class TransformerEncoderLayer(nn.Module):
    ...

class TransformerEncoder(nn.Module):
    ...

def main():
    ...

if __name__ == "__main__":
    main()
」

Let's now write the above Python code to a file named `transformer_encoder_image_classification.py`. After saving, we can also verify the content of the file.

annotations: []
citations: []
message_content: Implement the 10-value classification of images with Transformer Encoder and output the code as a file. The file name can be anything.

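4. Summary

This article showed how to use the file input/output feature of OpenAI's Assistants API. Specifically, we attached a file named transformer.py to an assistant and implemented the process of having it reference that file to generate and output Transformer Encoder code for 10-class image classification.

Key points

Uploading a file and configuring the assistant: We uploaded a file and used it as reference material for the assistant. This gives the assistant access to more specialized knowledge.
Creating a thread and sending a message: We created a thread for the assistant and had it handle a user request for a specific programming task.
Details of the generated code: The assistant generated detailed code for the specific programming task and output it as a file.

Conclusion

With Code Interpreter enabled, the Assistants API does more than generate text responses: it can reference specific files and produce code and other complex output based on them. This lets developers and researchers quickly build solutions tailored to a particular task. Using the Assistants API in this way broadens its potential applications in fields such as programming, data analysis, and machine learning.

OpenAI will likely keep extending the capabilities of its tools, including the Assistants API. Let's keep an eye on OpenAI's latest developments.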
