@rymlin

株式会社Relic

Running ChatGPT on Lambda, which recently gained streaming support

Posted at 2023-06-28

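What is streaming?

Streaming is a transfer method that delivers a result in small pieces instead of all at once.

The details

In HTTP/1.1 there are two ways to send a body: streamed or not streamed.
An ordinary HTTP/1.1 exchange sends the body all at once.
A streamed HTTP/1.1 exchange¹ sends the body in small pieces.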
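
To make "sending the body in small pieces" concrete, here is a minimal sketch of how HTTP/1.1 chunked transfer encoding frames a body: each piece is preceded by its byte length in hexadecimal, and a zero-length chunk marks the end. encodeChunked is a hypothetical helper written only for illustration; real HTTP servers and clients do this framing for you.

```javascript
// Hypothetical helper for illustration: frames string pieces the way
// HTTP/1.1 "Transfer-Encoding: chunked" does on the wire.
function encodeChunked(pieces) {
  const frames = pieces.map((piece) => {
    // The chunk size is the byte length of the piece, written in hexadecimal.
    const size = Buffer.byteLength(piece, 'utf8').toString(16);
    return `${size}\r\n${piece}\r\n`;
  });
  // A zero-length chunk followed by an empty line terminates the body.
  return frames.join('') + '0\r\n\r\n';
}
```

For example, encodeChunked(['hello\n', 'world']) produces two frames of 6 and 5 bytes followed by the terminating "0" chunk.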

HTTP/2, on the other hand, is a protocol designed around streams in the first place... I think.

Lambda supports streaming?

It was announced on 2023/4/7.
https://aws.amazon.com/jp/blogs/news/introducing-aws-lambda-response-streaming/

Before

The response payload is built only after the handler has finished completely, so the result is returned all at once.

export const lambdaHandler = async (event, context) => {
    await xxx; // placeholder for some asynchronous work
    await xxx; // nothing is returned until every await has finished
    return {
        'statusCode': 200,
        'body': JSON.stringify({
            message: 'hello world',
        })
    };
};

New feature

The result is returned in small pieces.
I think this is useful when the result takes a long time to generate or when the result is large.

export const lambdaHandler = awslambda.streamifyResponse(
    async (event, responseStream, context) => {
        responseStream.setContentType("text/plain");
        
        await new Promise(resolve => setTimeout(resolve, 500));
        responseStream.write('hello\n');
        await new Promise(resolve => setTimeout(resolve, 500));
        responseStream.write('world');
        
        responseStream.end();
    }
);

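Running the OpenAI API locally

This article uses the chat completion API.
You send a request carrying an API key and a question (prompt), and the AI's answer comes back.

To show the answer as if the AI were typing it: `stream=true`

On the official ChatGPT site, the AI answers your question as if it were typing.
To get that effect, specify stream=true.

With stream=false, the response comes back only after the AI has finished writing everything.
With stream=true, the AI's text comes back in small pieces.

Running it with curl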

$ curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

# The response comes back as a stream
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{"content":"!\n\n"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{"content":"As"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{"content":" an"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{"content":" AI"},"finish_reason":null}]}

...

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{"content":"?"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
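
Each `data:` line above carries one JSON chunk, and the text of the answer is spread across `choices[0].delta.content`. As a rough sketch, the pieces could be joined like this. collectAnswer is a hypothetical helper name, and it assumes the whole response text is already in hand as a single string; it does not handle lines that arrive split across network chunks.

```javascript
// Hypothetical helper for illustration: joins the delta.content pieces
// found in a complete stream=true response body.
function collectAnswer(sseText) {
  let answer = '';
  for (const line of sseText.split('\n')) {
    // Only "data: ..." lines carry a payload; blank separator lines are skipped.
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length);
    // The API marks the end of the stream with the literal string [DONE].
    if (payload === '[DONE]') break;
    const chunk = JSON.parse(payload);
    // The first and last chunks may have no content (role only, or an empty delta).
    answer += chunk.choices[0]?.delta?.content ?? '';
  }
  return answer;
}
```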

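Embedding it in Lambda

Caution

Use this at your own risk.
Take particular care with the parts marked "❗️❗️❗️".
I wrote this carefully, but please keep the following in mind:
・I may have misunderstood something
・It may not hold up in real-world use

Implementing the Lambda

OpenAI provides an official library for Node and you could use it, but it reportedly does not officially support streaming yet (github), so this time I use Node's fetch.

This implementation simply passes the OpenAI API response straight through.²

(Caution) ❗️❗️❗️
This sample does not implement it, but I think an authentication mechanism somewhere along the path is essential.
(Otherwise anyone who can invoke this Lambda can use ChatGPT.)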

import { Readable } from 'stream';

export const lambdaHandler = awslambda.streamifyResponse(
    async (event, responseStream, context) => {
        const gptResponse = await fetch('https://api.openai.com/v1/chat/completions', {
            method: 'POST',
            headers: {
                Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
                'Content-Type': 'application/json',
            },
            body: JSON.stringify({
                stream: true,
                model: 'gpt-3.5-turbo',
                messages: [{role: "user", content: "Hello world"}],
            }),
        });
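        // Convert the web ReadableStream returned by fetch into a Node.js Readable.
        const gptResponseStream = Readable.fromWeb(gptResponse.body);

        // Pass the OpenAI response through to the Lambda response stream unchanged.
        gptResponseStream.pipe(responseStream);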
    }
);

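pipe?

pipe (pipeline) is the way to connect an input stream to an output stream in Node.

According to the AWS blog post linked at the beginning, writing via pipeline is the recommended approach.
The responseStream.write() used in the hello world sample is, in fact, not recommended.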
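
A minimal sketch of the pipeline style, assuming Node's built-in stream/promises module. The collector is a stand-in for Lambda's responseStream so that the snippet can run locally, and createCollector and relay are hypothetical names. It uses require so that it runs as a standalone script; in an ES module the same functions can be imported with import instead.

```javascript
const { Readable, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Stand-in for responseStream: a Writable that records everything it receives.
function createCollector() {
  const chunks = [];
  const sink = new Writable({
    write(chunk, _encoding, callback) {
      chunks.push(chunk.toString());
      callback();
    },
  });
  return { sink, chunks };
}

// Connects source to destination. The returned promise resolves once all
// data has been written and rejects if either stream fails.
async function relay(source, destination) {
  await pipeline(source, destination);
}

// Local usage: stream two pieces into the collector and print the result.
const demo = createCollector();
relay(Readable.from(['hello\n', 'world']), demo.sink)
  .then(() => console.log(demo.chunks.join('')));
```

Inside the handler, the equivalent would presumably be `await pipeline(gptResponseStream, responseStream);` in place of the pipe() call, which also lets errors from either stream surface as a rejected promise.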

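Creating a function URL so we can call it with curl

The AWS blog post linked at the beginning lists these ways to invoke the Lambda implemented here:
・AWS SDK
・Lambda invoke API
・Function URL

Unfortunately, the following did not support this streaming response format at the time of writing:
・API Gateway (LAMBDA_PROXY)

This article uses a function URL with auth type NONE. ❗️❗️❗️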

Risks of using auth type NONE ❗️❗️❗️

(Reference)
Security and auth model for Lambda function URLs > Using the NONE auth type | AWS
https://docs.aws.amazon.com/ja_jp/lambda/latest/dg/urls-auth.html

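(This is my own interpretation.)
The function URL is left defenseless, so if it is hit by a DDoS attack, for example, the Lambda bill will shoot up and there is nothing you can do about it.
For that reason, I think setting the auth type to AWS_IAM is effectively a MUST.
Requests to a function URL with auth type AWS_IAM are made as signed requests.

(Reference)
Create a signed AWS API request | AWS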
https://docs.aws.amazon.com/ja_jp/IAM/latest/UserGuide/create-signed-request.html

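Creating the function URL

Go to "the Lambda function > Configuration > Function URL > Create".
Set it as follows and save.

Auth type
- AWS_IAM
- NONE (select this one) ❗️❗️❗️
Invoke mode
- BUFFERED
- RESPONSE_STREAM (select this one)

Running it with curl

The chat completion API response should come back as-is, in streaming form.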

$ curl https://xxx.lambda-url.ap-northeast-1.on.aws/ 

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{"content":"!\n\n"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{"content":"As"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{"content":" an"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{"content":" AI"},"finish_reason":null}]}

...

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{"content":"?"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1687778296,"model":"gpt-3.5-turbo-0301","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
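
When a program rather than curl reads this response, the network chunks do not necessarily line up with the `data:` lines, so a line can arrive split in two. As a hedged sketch, a client could buffer the bytes and emit only complete lines. readLines is a hypothetical helper, and it assumes the input is an async iterable of Uint8Array chunks.

```javascript
// Hypothetical helper for illustration: turns an async iterable of byte
// chunks into complete text lines, even when a line spans several chunks.
async function* readLines(byteChunks) {
  const decoder = new TextDecoder();
  let buffered = '';
  for await (const chunk of byteChunks) {
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    buffered += decoder.decode(chunk, { stream: true });
    const lines = buffered.split('\n');
    // The last element is an incomplete line (or ''), so keep it for later.
    buffered = lines.pop();
    for (const line of lines) yield line;
  }
  // Flush whatever is left once the input ends.
  buffered += decoder.decode();
  if (buffered !== '') yield buffered;
}
```

With Node 18 or later, I believe the body of a fetch response to the function URL can be passed in directly, since it can be consumed with for await.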

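Wrap-up

I was not used to working with streams, so this was a great learning experience.
The web increasingly calls for real-time behavior, so I would like to become comfortable with them.

¹ The response header specifies "Transfer-Encoding: chunked". See Transfer-Encoding - HTTP | MDN.
² If you want to process the stream along the way, use the Transform class.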
