[Google Cloud][Gemini] Trying Out Context Caching

Last updated at 2024-08-27 / Posted at 2024-08-26

Introduction

For everyday use I mostly reach for ChatGPT and Claude,
but for writing meeting minutes and explaining source code (entire repositories), Gemini is my only choice.

As the number of input tokens grows, cost becomes a concern,
but context caching looked like a way to optimize it, so I tried it out.

Using context caching indiscriminately can actually increase costs,
so I will also explain the threshold at which it starts to pay off.

What is context caching?

Trying out context caching

Creating a context cache

This is just the official sample as-is, but you can create a cache with the following:

import vertexai
import datetime

from vertexai.generative_models import Part
from vertexai.preview import caching

project_id = "XXXXX" # Replace with your project ID

vertexai.init(project=project_id, location="us-central1")

system_instruction = """
You are an expert researcher. You always stick to the facts in the sources provided, and never make up new facts.
Now look at these research papers, and answer the following questions.
"""

contents = [
    Part.from_uri(
        "gs://cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
        mime_type="application/pdf",
    ),
    Part.from_uri(
        "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
        mime_type="application/pdf",
    ),
]

cached_content = caching.CachedContent.create(
    model_name="gemini-1.5-pro-001",
    system_instruction=system_instruction,
    contents=contents,
    ttl=datetime.timedelta(minutes=60),
)

print(cached_content.name)

If you get the following error, enable the API.
You can also enable it from the Vertex AI page in the Cloud console.

status = StatusCode.PERMISSION_DENIED
details = "Permission denied on resource project ${PROJECT_ID}."

Using the context cache

You can use the context cache by adding just two lines to a regular request:

import vertexai

from vertexai.preview.generative_models import GenerativeModel
from vertexai.preview import caching

# TODO(developer): Update the lines below
project_id = "XXXXX" # Replace with your project ID
cache_id = "CACHE_ID" # Replace with the name printed when you created the cache

vertexai.init(project=project_id, location="us-central1")

cached_content = caching.CachedContent(cached_content_name=cache_id)

model = GenerativeModel.from_cached_content(cached_content=cached_content)

response = model.generate_content("What are the papers about?")

print(response.text)

(Reference)

Cost

Context caching is billed based on
"the number of cached tokens" and "how long the cache is kept".
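As a rough model built from the Google Cloud calculation example quoted later in this article, the totals for n requests can be written as follows. The notation is my own (an assumption, not Google's): c is the number of cached characters, T the TTL in hours, q and o the query and output characters per request, and p_in, p_s, p_c, p_out the per-character prices for regular input, cache storage (per hour), cached input, and output.

```latex
\begin{aligned}
C_{\mathrm{cache}}(n) &= \underbrace{c\,p_{\mathrm{in}}}_{\text{creation}}
  + \underbrace{c\,T\,p_{\mathrm{s}}}_{\text{storage}}
  + n\left(q\,p_{\mathrm{in}} + c\,p_{\mathrm{c}} + o\,p_{\mathrm{out}}\right) \\
C_{\mathrm{no\,cache}}(n) &= n\left((c + q)\,p_{\mathrm{in}} + o\,p_{\mathrm{out}}\right)
\end{aligned}
```

The first two terms of the cached cost are paid once regardless of how many requests you send, which is why a cache that is rarely used ends up more expensive.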

When context caching pays off

Because creating the cache has a cost of its own,
caching does not pay off if you rarely reuse the cache.
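The threshold can also be worked out directly. Subtracting the cached total from the non-cached total, the query and output costs cancel, caching adds a one-off c(p_in + T·p_s), and each request saves c(p_in − p_c). So caching is strictly cheaper once n > (p_in + T·p_s) / (p_in − p_c), independent of the context size. The sketch below assumes the Gemini 1.5 Pro prices from the Google Cloud example quoted later in this article; the function names are my own and the prices may have changed.

```python
import math

# Gemini 1.5 Pro prices in USD per 1,000 characters, taken from the Google Cloud
# example quoted in this article (<=128K context tier; prices may have changed).
INPUT_PER_1K = 0.00125           # regular text input
STORAGE_PER_1K_HOUR = 0.001125   # cache storage, per 1,000 characters per hour
CACHED_INPUT_PER_1K = 0.0003125  # cached input
OUTPUT_PER_1K = 0.00375          # text output


def cached_cost(n: int, c: int, q: int, o: int, ttl_hours: float) -> float:
    """Total USD for n requests that reuse a c-character cache."""
    one_off = c * INPUT_PER_1K + c * ttl_hours * STORAGE_PER_1K_HOUR
    per_request = q * INPUT_PER_1K + c * CACHED_INPUT_PER_1K + o * OUTPUT_PER_1K
    return (one_off + n * per_request) / 1000


def uncached_cost(n: int, c: int, q: int, o: int) -> float:
    """Total USD for n requests that resend the full context every time."""
    return n * ((c + q) * INPUT_PER_1K + o * OUTPUT_PER_1K) / 1000


def break_even_requests(ttl_hours: float) -> int:
    """Smallest n for which caching is strictly cheaper than not caching.

    Query and output costs are the same on both sides, and the context size c
    appears in both the one-off cost and the per-request saving, so it cancels.
    """
    one_off = INPUT_PER_1K + ttl_hours * STORAGE_PER_1K_HOUR
    saving_per_request = INPUT_PER_1K - CACHED_INPUT_PER_1K
    # Caching wins when n * saving_per_request > one_off.
    return math.floor(one_off / saving_per_request) + 1


for ttl in (1, 2, 24):
    print(ttl, break_even_requests(ttl))
```

With the 2-hour TTL from the example, caching becomes cheaper from the 4th request on: 3 requests cost about $1.11 cached vs $0.94 uncached, and 4 requests about $1.19 vs $1.26. A longer TTL raises the threshold, since storage is charged for the whole TTL.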

I wanted a visual feel for the costs, so I asked Claude.
(For this kind of thing I end up using Claude...)

I had it draw a graph based on Google Cloud's cost calculation example.

Extracting the variables for the graph

Here is a sample Gemini cost calculation.
Please extract the variables from the sample.

# Sample
Example cached cost calculation
If a user creates a 250,000 character cached context with a TTL of 2 hours and subsequently sends twenty separate requests to the Gemini 1.5 Pro model during those 2 hours, and each request has a 200-character query added to the cached context and 400 character output, the total charge is calculated as follows:
"""
Cache Creation cost:
250,000 input characters x ($0.00125 / 1000) = $0.3125 input cost.
Cache Storage cost:
250,000 characters x 2 hours = 500,000 total character hours;
500,000 total character hours x ($0.001125 / 1000) = $0.5625 storage cost.
Requests using cache cost:
200 characters x 20 requests = 4,000 total character inputs
250,000 cached characters * 20 requests = 5,000,000 total cached character inputs
4,000 total character inputs * ($0.00125 / 1000) = $.005 character input cost
5,000,000 total cached character inputs * ($0.0003125 / 1000) = $1.5625 cached input cost
$0.005 character input cost + $1.5625 cached input cost = $1.5675 total input cost
Output cost:
400 output characters x 20 prompts = 8,000 total output characters;
8,000 total output characters x ($0.00375 / 1000) = $0.03 output cost.
Total cost:
$0.3125 cache creation cost + $0.5625 cache storage cost + $1.5675 total input cost + $0.03 output cost = $2.4725 total cost.
"""

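To double-check the arithmetic in the Google Cloud example above, here is a small sketch that recomputes each component. The prices and scenario values are copied from the example; the variable names are my own.

```python
# Recomputes the Google Cloud "Example cached cost calculation".
# Prices are USD per character (Gemini 1.5 Pro, as given in the example).
INPUT = 0.00125 / 1000
STORAGE_PER_HOUR = 0.001125 / 1000
CACHED_INPUT = 0.0003125 / 1000
OUTPUT = 0.00375 / 1000

cached_chars = 250_000
ttl_hours = 2
requests = 20
query_chars = 200
output_chars = 400

creation = cached_chars * INPUT                         # cache creation
storage = cached_chars * ttl_hours * STORAGE_PER_HOUR   # cache storage for the TTL
query_input = query_chars * requests * INPUT            # new (non-cached) input
cached_input = cached_chars * requests * CACHED_INPUT   # cached input per request
output = output_chars * requests * OUTPUT               # output

total = creation + storage + query_input + cached_input + output
print(f"total: ${total:.4f}")
```

The components come out to $0.3125, $0.5625, $0.005, $1.5625, and $0.03, matching the example's total of $2.4725.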
Drawing the graph

Draw a graph in React with the number of requests on the X axis and cost on the Y axis.
Make variables 1, 2, 5, and 6 adjustable with range sliders.

Cached context characters: 250,000 characters
Cache TTL (Time To Live): 2 hours
Additional query characters per request: 200 characters
Output characters per request: 400 characters

Comparing against requests without caching

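Here is the price table for requests without caching.
Add the non-cached case to the graph you just made so the two can be compared.
For non-cached requests, count the cached context characters plus each request's query characters as input tokens.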

Model	Feature	Type	Price (≤ 128,000 context window)	Price (> 128,000 context window)
Gemini 1.5 Flash	Multimodal	Image input	$0.00002 / image	$0.00004 / image
		Video input	$0.00002 / second	$0.00004 / second
		Text input	$0.00001875 / 1,000 characters	$0.0000375 / 1,000 characters
		Audio input	$0.000002 / second	$0.000004 / second
		Text output	$0.000075 / 1,000 characters	$0.00015 / 1,000 characters
Gemini 1.5 Pro	Multimodal	Image input	$0.001315 / image	$0.00263 / image
		Video input	$0.001315 / second	$0.00263 / second
		Text input	$0.00125 / 1,000 characters	$0.0025 / 1,000 characters
		Audio input	$0.000125 / second	$0.00025 / second
		Text output	$0.00375 / 1,000 characters	$0.0075 / 1,000 characters
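For comparison, here is a sketch of the non-cached cost for the same scenario as the Google Cloud example, following the rule in the prompt above (context plus query counted as input on every request). It assumes the Gemini 1.5 Pro text prices from the ≤ 128,000 column of the table, which is also what the React component below uses; the function name is my own.

```python
# Non-cached cost: every request resends the full context plus the query.
# Prices: Gemini 1.5 Pro text input/output, <= 128,000 context window column.
TEXT_INPUT = 0.00125 / 1000   # USD per input character
TEXT_OUTPUT = 0.00375 / 1000  # USD per output character


def non_cached_cost(requests: int, context_chars: int = 250_000,
                    query_chars: int = 200, output_chars: int = 400) -> float:
    """Total USD for `requests` calls without a context cache."""
    input_cost = (context_chars + query_chars) * requests * TEXT_INPUT
    output_cost = output_chars * requests * TEXT_OUTPUT
    return input_cost + output_cost


print(f"${non_cached_cost(20):.4f}")
```

For 20 requests this comes to about $6.29, roughly 2.5 times the $2.4725 of the cached example.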

Output

If you'd like to run it locally, use the following:

import React, { useState, useMemo } from 'react';
import { LineChart, Line, XAxis, YAxis, CartesianGrid, Tooltip, Legend, ResponsiveContainer } from 'recharts';

const CostCalculatorGraph = () => {
  const [cachedContextChars, setCachedContextChars] = useState(250000);
  const [cacheTTL, setCacheTTL] = useState(2);
  const [queryChars, setQueryChars] = useState(200);
  const [outputChars, setOutputChars] = useState(400);

  const calculateCachedCost = (requests) => {
    const inputCost = 0.00125 / 1000;
    const storageCost = 0.001125 / 1000;
    const cachedInputCost = 0.0003125 / 1000;
    const outputCost = 0.00375 / 1000;

    const cacheCreationCost = cachedContextChars * inputCost;
    const cacheStorageCost = (cachedContextChars * cacheTTL * storageCost);
    const newInputCost = (queryChars * requests * inputCost);
    const cachedInputCostTotal = (cachedContextChars * requests * cachedInputCost);
    const outputCostTotal = (outputChars * requests * outputCost);

    return cacheCreationCost + cacheStorageCost + newInputCost + cachedInputCostTotal + outputCostTotal;
  };

  const calculateNonCachedCost = (requests) => {
    const inputCost = 0.00125 / 1000;
    const outputCost = 0.00375 / 1000;

    const inputCostTotal = ((cachedContextChars + queryChars) * requests * inputCost);
    const outputCostTotal = (outputChars * requests * outputCost);

    return inputCostTotal + outputCostTotal;
  };

  const data = useMemo(() => {
    return Array.from({ length: 50 }, (_, i) => ({
      requests: i + 1,
      cachedCost: calculateCachedCost(i + 1),
      nonCachedCost: calculateNonCachedCost(i + 1)
    }));
  }, [cachedContextChars, cacheTTL, queryChars, outputChars]);

  return (
    <div className="p-4">
      <h2 className="text-2xl font-bold mb-4">Gemini 1.5 Pro Cost Calculator Graph</h2>
      <p className="text-lg mb-4">Model: Gemini 1.5 Pro (≤128K context window)</p>
      <div className="mb-4">
        <label className="block mb-2">
          Cached Context Characters: {cachedContextChars}
          <input
            type="range"
            min="1000"
            max="250000"
            value={cachedContextChars}
            onChange={(e) => setCachedContextChars(Number(e.target.value))}
            className="w-full"
          />
        </label>
        <label className="block mb-2">
          Cache TTL (hours): {cacheTTL}
          <input
            type="range"
            min="1"
            max="24"
            value={cacheTTL}
            onChange={(e) => setCacheTTL(Number(e.target.value))}
            className="w-full"
          />
        </label>
        <label className="block mb-2">
          Query Characters: {queryChars}
          <input
            type="range"
            min="10"
            max="1000"
            value={queryChars}
            onChange={(e) => setQueryChars(Number(e.target.value))}
            className="w-full"
          />
        </label>
        <label className="block mb-2">
          Output Characters: {outputChars}
          <input
            type="range"
            min="10"
            max="1000"
            value={outputChars}
            onChange={(e) => setOutputChars(Number(e.target.value))}
            className="w-full"
          />
        </label>
      </div>
      <ResponsiveContainer width="100%" height={400}>
        <LineChart data={data}>
          <CartesianGrid strokeDasharray="3 3" />
          <XAxis dataKey="requests" />
          <YAxis />
          <Tooltip />
          <Legend />
          <Line type="monotone" dataKey="cachedCost" name="Cached Cost" stroke="#8884d8" />
          <Line type="monotone" dataKey="nonCachedCost" name="Non-Cached Cost" stroke="#82ca9d" />
        </LineChart>
      </ResponsiveContainer>
    </div>
  );
};

export default CostCalculatorGraph;

I have not verified the formulas.
Prices may change, so treat this as a reference only.

Closing thoughts

Context caching fits extremely well with Gemini's standout 2M-token context window. It is also easy to set up, so I recommend using it actively.

Since creating and maintaining a cache costs money,
I imagine it would work well to share caches across a team.

Claude's Artifacts feature is really handy, too.
