Introduction
I'm a second-year IT engineer who has been studying English lately. At the end of October I got an email from Kaggle: "You can register for free for the generative AI course running 11/11-15!" Since it was free, I figured I'd give it a try and signed up. My thinking was that even though it's all in English, I could manage by asking ChatGPT. There was way too much English to finish in five days...
Overview of the 5-Day Generative AI Intensive
A course designed to help you understand the fundamentals and background of generative AI in five days.
Below is ChatGPT's translation of the announcement:
What is the 5-Day Generative AI Intensive?
It's a 5-day online course running from November 11 to 15, designed to help you deeply understand the fundamental technologies and techniques behind generative AI. Created by a team of Google's machine-learning researchers and engineers, the program includes both conceptual deep dives and hands-on coding examples so you can confidently tackle new generative AI projects.
How does the intensive work?
Each day, participants receive the following by email:
📚 Daily assignments
These include a newly released whitepaper, a companion podcast generated by NotebookLM, and a companion code lab in AI Studio.
💬 Discord discussion threads
Kaggle's Discord server has dedicated channels for focused discussion of the readings. It's a great place to get further clarification, surface questions, and connect with other learners.
🎥 Daily livestream seminars and AMAs (Ask Me Anything)
We'll stream live on Kaggle's YouTube channel every day, where the authors and course contributors dig deeper into each topic and answer your burning questions. Plus, there are fun surprises to keep the learning engaging.
What will be covered?
Day 1: Foundational models and prompt engineering - Explore the evolution of LLMs, from transformers to techniques like fine-tuning and inference acceleration. Master the art of prompt engineering for optimal interaction with LLMs.
Day 2: Embeddings and vector stores/databases - Learn the conceptual underpinnings of embeddings and vector databases, including embedding methods, vector search algorithms, real-world applications with LLMs, and their trade-offs.
Day 3: Generative AI agents - Learn to build sophisticated AI agents by understanding their core components and the iterative development process.
Day 4: Domain-specific LLMs - Dive into the creation and application of specialized LLMs like SecLM and Med-PaLM, with insights from the researchers who built them.
Day 5: MLOps for generative AI - Discover how to adapt MLOps practices for generative AI and leverage Vertex AI's tools for foundation models and generative AI applications.
Commentary
This course is an intensive program for people who want to gain a deep understanding of generative AI technology. The key points:
- Structure: A five-day intensive course, with each day focusing on a different topic.
- Content: Covers the key aspects of generative AI, including foundation models, prompt engineering, embeddings, vector databases, AI agents, domain-specific LLMs, and MLOps.
- Learning methods:
  - Daily assignments (whitepapers, podcasts, code labs)
  - Discussions on Discord
  - Livestream seminars and Q&A sessions
- Features:
  - Created by Google experts
  - Includes both theory and practice
  - Focuses on the latest technologies and methods
- Target audience: AI engineers and researchers, and anyone with a deep interest in generative AI technology.
This course should be a very valuable opportunity for anyone who wants to keep up with the latest developments in generative AI and build practical skills. In particular, it's worth noting that each day's topic is systematically structured, covering everything from fundamentals to applications.
In short, it's a "study with the notebooks, videos, and white papers!" kind of course. I ran all the notebooks in the Kaggle environment.
Day 1 - Prompting
Through several examples with the Gemini API, I got to learn both the theory and the practice.
Since this notebook uses the Gemini API, I obtained an API key via Google AI Studio.
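For reference, a minimal setup sketch (on Kaggle the notebook pulls the key from Kaggle secrets; `YOUR_API_KEY` here is just a placeholder):

```python
import google.generativeai as genai
from IPython.display import Markdown  # used later to render model output

genai.configure(api_key="YOUR_API_KEY")  # key issued in Google AI Studio
```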
The first prompt
flash = genai.GenerativeModel('gemini-1.5-flash')
response = flash.generate_content("Explain AI to me like I'm a kid.")
print(response.text)
Specify the model and display the response.
Imagine you have a really smart puppy. You teach it tricks, like "sit" and "fetch." At first, it doesn't know what those words mean, but you show it what to do, and it learns. The more you teach it, the better it gets at following your commands.
Artificial Intelligence, or AI, is like that smart puppy, but instead of learning tricks, it learns from information we give it. We feed it lots and lots of information – like pictures of cats and dogs, or words from books – and it learns to recognize patterns.
So, if you show an AI picture of a cat, it can learn what makes a cat a cat (fluffy, whiskers, pointy ears) and then tell you if another picture is also a cat. It's not actually *thinking* like you or me, but it's getting really good at figuring things out based on the information it's been given.
AI is used for lots of cool things, like recommending your favorite videos, helping doctors diagnose illnesses, and even making self-driving cars! It's still learning and getting smarter every day, just like your puppy!
Starting a chat
The first prompt used a single-turn structure, but a multi-turn structure is also available.
chat = flash.start_chat(history=[])
response = chat.send_message('Hello! My name is Zlork.')
print(response.text) # It's nice to meet you, Zlork! How can I help you today?
response = chat.send_message('Can you tell something interesting about dinosaurs?')
print(response.text) # Many dinosaurs likely weren't the dull, grey-green creatures often depicted. Recent research suggests that many species possessed vibrant, iridescent feathers or scales, possibly used for display, camouflage, or thermoregulation. Think of them more like brightly colored birds or lizards than the drab monsters of old movie classics!
# The conversation is preserved while you use the chat object. Check below whether it remembers my name.
response = chat.send_message('Do you remember what my name is?')
print(response.text)  # Yes, your name is Zlork.
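The history lives on the chat object itself; as far as I can tell, a quick way to confirm what's stored is to walk `chat.history`:

```python
# Inspect the stored turns: roles alternate between 'user' and 'model'
for turn in chat.history:
    print(turn.role, ':', turn.parts[0].text[:60])
```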
Choosing a model
The Gemini API offers access to many models in the Gemini model family. See the models page for details.
for model in genai.list_models():
    print(model.name)  # models/chat-bison-001, models/text-bison-001, models/embedding-gecko-001..., models/aqa
`models.list` also includes information about each model's capabilities in the response, such as token limits and supported parameters.
for model in genai.list_models():
    if model.name == 'models/gemini-1.5-flash':
        print(model)
        break
Model(name='models/gemini-1.5-flash',
base_model_id='',
version='001',
display_name='Gemini 1.5 Flash',
description=('Alias that points to the most recent stable version of Gemini 1.5 Flash, our '
'fast and versatile multimodal model for scaling across diverse tasks.'),
input_token_limit=1000000,
output_token_limit=8192,
supported_generation_methods=['generateContent', 'countTokens'],
temperature=1.0,
max_temperature=2.0,
top_p=0.95,
top_k=40)
Exploring generation parameters
Output length
When generating text with an LLM, the output length affects both cost and performance. Generating more tokens means more computation, which increases energy consumption, latency, and cost.
With the Gemini API you can set the `max_output_tokens` parameter so the model stops generating once the limit is reached. The parameter does not influence how the output tokens themselves are generated, so the output won't become more concise in style or wording; generation simply stops when the specified length is reached. Getting good output within a tight limit may require prompt engineering.
short_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(max_output_tokens=200))
response = short_model.generate_content('Write a 1000 word essay on the importance of olives in modern society.')
print(response.text)
## The Enduring Importance of Olives in Modern Society
The olive, a seemingly unassuming fruit, holds a position of profound significance in modern society, extending far beyond its culinary applications. Its impact reverberates through economic landscapes, cultural traditions, and even environmental considerations, demonstrating a legacy built over millennia. While the olive's prominence may vary across different regions and cultures, its overall importance remains undeniable, reflecting a complex interplay of historical continuity and contemporary relevance.
The olive's economic importance is perhaps most readily apparent. Olive oil production, a cornerstone of the Mediterranean diet, is a major industry supporting livelihoods across numerous countries. From Spain and Italy, the world's largest producers, to smaller-scale operations in Greece, Tunisia, and Morocco, the cultivation, processing, and distribution of olives and olive oil generate substantial employment opportunities, contribute significantly to national GDPs, and sustain rural communities. Beyond the primary production, the olive oil industry fuels a complex network of supporting industries
If instead the prompt itself asks for something short, we get the following:
response = short_model.generate_content('Write a short poem on the importance of olives in modern society.')
print(response.text)
From sun-drenched groves, a bounty springs,
The olive's grace, the pleasure brings.
In oil so rich, a flavour deep,
A culinary promise to keep.
On tables spread, a simple treat,
From tapenade to olives sweet.
A symbol strong, of history's hand,
A staple food, across the land.
From ancient times, its worth remains,
In modern lives, it still sustains.
Temperature
Temperature controls the randomness of token selection. The higher the temperature, the larger the pool of candidates the next token is drawn from, producing more diverse results. A lower temperature has the opposite effect; at a temperature of 0 you get greedy decoding, which always picks the single most probable token.
Temperature doesn't guarantee randomness, but it can be used to nudge the output toward more variety.
high_temp_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(temperature=2.0))
for _ in range(5):
    response = high_temp_model.generate_content('Pick a random colour... (answer in a single word)')
    if response.parts:
        print(response.text, '-' * 25)
Purple
-------------------------
Marigold
-------------------------
Purple
-------------------------
Purple
-------------------------
Aquamarine
-------------------------
With temperature set to 0:
low_temp_model = genai.GenerativeModel(
    'gemini-1.5-flash',
    generation_config=genai.GenerationConfig(temperature=0.0))
Purple
-------------------------
Purple
-------------------------
Purple
-------------------------
Purple
-------------------------
Purple
-------------------------
When I ran everything in one go, I got an error here.
ResourceExhausted: 429 Resource has been exhausted (e.g. check quota).
This error comes from sending too many requests in a short period. Watch out!
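If you hit this, waiting and retrying is usually enough. A minimal backoff sketch (the wait times are arbitrary; this is just the shape of the fix, not anything the course prescribes):

```python
import time
from google.api_core.exceptions import ResourceExhausted

def generate_with_retry(model, prompt, retries=3):
    """Retry generate_content with exponential backoff on 429 quota errors."""
    for attempt in range(retries):
        try:
            return model.generate_content(prompt)
        except ResourceExhausted:
            time.sleep(10 * 2 ** attempt)  # wait 10s, 20s, 40s
    raise RuntimeError("Still rate-limited after all retries")
```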
top-K, top-P
Like temperature, top-K and top-P are used to control the diversity of the output.
Top-K is a positive integer specifying how many of the highest-probability tokens are kept as candidates when selecting the output token.
Top-P sets a threshold: tokens stop being added to the candidate pool once their cumulative probability exceeds it. A top-P of 0 corresponds to greedy decoding, while 1 considers every token in the model's vocabulary.
When both are used, the Gemini API processes them in the following order (a toy illustration follows the list):
- Filter to the top-K tokens
- Filter by top-P
- Finally, sample from the remaining candidates using the configured temperature
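Here's a pure-Python sketch of that filtering order on a made-up distribution. This is only my illustration of the idea, not the Gemini API's actual implementation:

```python
# Toy illustration only - an invented 5-token probability distribution.
probs = {"blue": 0.40, "red": 0.25, "green": 0.15, "purple": 0.12, "gold": 0.08}
top_k, top_p = 3, 0.60

# 1. Keep only the top_k most probable tokens.
candidates = sorted(probs.items(), key=lambda kv: -kv[1])[:top_k]

# 2. Keep tokens until the cumulative probability reaches top_p.
kept, cum = [], 0.0
for token, p in candidates:
    kept.append(token)
    cum += p
    if cum >= top_p:
        break

print(kept)  # ['blue', 'red'] - temperature sampling then picks from these
```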
Run the following several times and observe the output.
model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(
        # These are the default values for gemini-1.5-flash-001.
        temperature=1.0,
        top_k=64,
        top_p=0.95,
    ))
story_prompt = "You are a creative writer. Write a short story about a cat who goes on an adventure."
response = model.generate_content(story_prompt)
print(response.text)
# First run
Bartholomew, a ginger tabby with a penchant for napping in sunbeams, was not a cat for adventure. He preferred the familiar comforts of his plush bed, the rhythmic purr of the washing machine, and the occasional, well-timed head scratch from his human, Emily. But one day, a rogue gust of wind blew open the window, whisking away Bartholomew's favourite toy, a red feathered mouse named "Mr. Whiskers."
Instinct took over. Bartholomew, fueled by the desperation of a cat missing his prized possession, bolted out the window, landing with a soft thud on the manicured lawn. The world outside was a sensory explosion: buzzing bees, the earthy smell of freshly cut grass, and the cacophony of birdsong. He was completely disoriented, but the memory of Mr. Whiskers kept him focused. He followed the wind's path, navigating through a maze of rose bushes and towering hydrangeas, the scent of Mr. Whiskers growing stronger with each step.
He found himself in a bustling alleyway, filled with the aroma of roasting chestnuts and the squawks of pigeons. He was met with curious glances from a group of alley cats, their fur matted and their eyes wary. Bartholomew, despite his fear, proudly announced his quest for Mr. Whiskers, his tiny voice trembling with determination.
The alley cats, touched by his bravery, joined his search. They led him to a shadowy corner where a stray dog, a mangy terrier named Buster, was happily gnawing on a red feather. It was Mr. Whiskers!
Bartholomew, emboldened by his new friends, confronted Buster, demanding his toy back. Buster, surprised by the fierce little ginger cat, begrudgingly relinquished the feather, a hint of amusement in his eyes.
Bartholomew, ecstatic, returned to his home, clutching the feather, a hero in the eyes of the alley cats. He was no longer the lazy, sun-loving cat. He was Bartholomew, the adventurous explorer, the fearless warrior who had rescued his prized possession from a fearsome foe. He nestled back into his plush bed, the feather clutched tightly in his paw, and dreamt of future adventures. The alley cats, watching from the shadows, knew that Bartholomew's spirit of adventure had been awakened, and he was never going to be the same again.
# Second run
Bartholomew, a ginger tabby with a penchant for mischief and a heart full of wanderlust, watched the world through the window. Birds chirped, squirrels scampered, and the wind rustled the leaves, whispering tales of faraway lands. He longed for adventure, something beyond the familiar confines of his home.
One sunny afternoon, the back door swung open, and Bartholomew saw his chance. He slipped out, his tail held high with a mischievous glint in his eyes. The world was a tapestry of scents and sounds, a symphony of excitement. He sniffed the air, tasting the earth and the sweet scent of blooming flowers.
He followed a winding path that led him to the edge of the woods, a lush emerald haven bathed in dappled sunlight. The forest floor was a mosaic of fallen leaves and moss, a perfect playground for a curious cat. He stalked a fat, lumbering beetle, his playful instincts taking over. He chased butterflies, their wings a kaleidoscope of colors against the green backdrop.
As the day wore on, the forest began to dim. Bartholomew, tired but exhilarated, found a cozy nook beneath a giant oak, its branches reaching towards the sky like protective arms. He curled up, the gentle rustling of leaves a lullaby. He dreamt of flying squirrels and talking owls, his adventure echoing in his mind.
As the first rays of dawn touched the forest, Bartholomew woke with a renewed sense of purpose. He had to find his way back. He retraced his steps, the familiar sights and smells guiding him. He navigated the winding path, his heart filled with the thrill of his adventure.
He arrived back home just as the sun peeked over the horizon, painting the sky in hues of orange and gold. The back door opened, and his human, Mrs. Higgins, was waiting. She gasped, her face a mixture of relief and concern. "Bartholomew! Where have you been?"
Bartholomew rubbed against her legs, purring contentedly. He couldn't tell her about the beetle chase or the talking owls, but he knew he had lived a day full of magic and wonder. He had tasted freedom, felt the wind on his fur, and discovered the thrill of the unknown. He knew he would always be a house cat, but he was also a cat with a spirit that soared, a cat who, even in his own backyard, had found an adventure worthy of a thousand tales.
Out of curiosity I tried setting top-K to 0; it seems that 0 is treated the same as None, i.e. every token is considered.
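For reference, the config I mean looks like this (just my observation from poking at it, not documented behavior I can vouch for):

```python
# top_k=0 appeared to behave the same as leaving top_k unset (None):
# every token stays in the candidate pool.
model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(top_k=0))
```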
Prompting
This section introduces prompts you can try directly against the API. Feel free to modify the text to experiment with different instructions and other changes.
Zero-shot
A zero-shot prompt states the request directly in the prompt and feeds it to the model as-is.
model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(
        temperature=0.1,
        top_p=1,
        max_output_tokens=5,
    ))
zero_shot_prompt = """Classify movie reviews as POSITIVE, NEUTRAL or NEGATIVE.
Review: "Her" is a disturbing study revealing the direction
humanity is headed if AI is allowed to keep evolving,
unchecked. I wish there were more movies like this masterpiece.
Sentiment: """
response = model.generate_content(zero_shot_prompt)
print(response.text) # Sentiment: **POSITIVE**
Enum mode
The model is trained to generate text, and sometimes it produces more than you asked for. In the example above the output is just the label, but it may also include the leading "Sentiment" label, and without the cap on output tokens it might append an explanation as well. The Gemini API has an Enum mode that constrains the output to a fixed set of values.
import enum

class Sentiment(enum.Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"

model = genai.GenerativeModel(
    'gemini-1.5-flash-001',
    generation_config=genai.GenerationConfig(
        response_mime_type="text/x.enum",
        response_schema=Sentiment
    ))
response = model.generate_content(zero_shot_prompt)
print(response.text) # positive
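Since the returned text is guaranteed to be one of the enum's values, it should map straight back to the Python enum (a small usage sketch):

```python
sentiment = Sentiment(response.text)  # looks up the enum member by value
print(sentiment)  # Sentiment.POSITIVE
```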
One-shot and few-shot
Giving the model a single example of the expected response in the prompt is a "one-shot" prompt; giving it several is a "few-shot" prompt.
It's fun that some of this terminology is shared with the image-processing field.
model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    generation_config=genai.GenerationConfig(
        temperature=0.1,
        top_p=1,
        max_output_tokens=250,
    ))
few_shot_prompt = """Parse a customer's pizza order into valid JSON:
EXAMPLE:
I want a small pizza with cheese, tomato sauce, and pepperoni.
JSON Response:
'''
{
"size": "small",
"type": "normal",
"ingredients": ["cheese", "tomato sauce", "peperoni"]
}
'''
EXAMPLE:
Can I get a large pizza with tomato sauce, basil and mozzarella
JSON Response:
'''
{
"size": "large",
"type": "normal",
"ingredients": ["tomato sauce", "basil", "mozzarella"]
}
'''
ORDER:
"""
customer_order = "Give me a large with cheese & pineapple"
response = model.generate_content([few_shot_prompt, customer_order])
print(response.text)
""" response.text
```json
{
"size": "large",
"type": "normal",
"ingredients": ["cheese", "pineapple"]
}
"""
JSON mode
To control the schema and guarantee that you receive only JSON (with no other text or markdown), you can use the Gemini API's JSON mode. It constrains decoding so that token selection is guided by the provided schema.
import typing_extensions as typing

class PizzaOrder(typing.TypedDict):
    size: str
    ingredients: list[str]
    type: str

model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    generation_config=genai.GenerationConfig(
        temperature=0.1,
        response_mime_type="application/json",
        response_schema=PizzaOrder,
    ))
response = model.generate_content("Can I have a large dessert pizza with apple and chocolate")
print(response.text) # {"ingredients": ["apple", "chocolate"], "size": "large", "type": "dessert pizza"}
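Because the response is guaranteed to be bare JSON with no surrounding text, it parses directly (a small usage sketch):

```python
import json

order = json.loads(response.text)
print(order["size"], order["ingredients"])  # large ['apple', 'chocolate']
```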
Chain of Thought (CoT)
Prompting an LLM directly can return answers quickly and efficiently (in terms of output-token usage), but it is prone to hallucination. The answer may "look right" (in language and grammar) while being off in factuality or reasoning.
Chain-of-Thought prompting is a technique that instructs the model to output intermediate reasoning steps, which usually produces better results, especially when combined with few-shot examples. Keep in mind that it does not eliminate hallucination entirely, and it tends to cost more to run because of the increased token count.
Models like the Gemini family are trained to be "chatty" and to provide reasoning steps, so you can also ask the model in the prompt to be more direct.
prompt = """When I was 4 years old, my partner was 3 times my age. Now, I
am 20 years old. How old is my partner? Return the answer immediately."""
model = genai.GenerativeModel('gemini-1.5-flash-latest')
response = model.generate_content(prompt)
print(response.text) # 48
Asked for an immediate answer, it gets it wrong. The partner is definitely not 48...
Next, instruct it to think step by step.
prompt = """When I was 4 years old, my partner was 3 times my age. Now,
I am 20 years old. How old is my partner? Let's think step by step."""
response = model.generate_content(prompt)
print(response.text)
response.text (it came back nicely formatted in Markdown)
Step 1: Find the partner's age when you were 4.
- When you were 4, your partner was 3 times your age, so they were 4 * 3 = 12 years old.
Step 2: Find the age difference between you and your partner.
- The age difference is 12 - 4 = 8 years.
Step 3: Calculate your partner's current age.
- You are now 20 years old.
- Your partner is 8 years older than you.
- Therefore, your partner is currently 20 + 8 = 28 years old.
So the answer is $\boxed{28}$
Easy to follow, and nicely presented too!
ReAct (Reason and act)
In this example you run the ReAct prompt directly against the Gemini API and perform the search steps yourself. Because this prompt follows a clearly defined structure, there are frameworks that wrap it in a friendlier API and make the tool calls automatically, such as the LangChain example in this chapter.
To try this with a Wikipedia search engine, check out the "Searching Wikipedia with ReAct" cookbook example.
Note: the prompt and in-context examples used here are taken from https://github.com/ysymyth/ReAct, published under the MIT license. Copyright (c) 2023 Shunyu Yao.
model_instructions = """
Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation,
Observation is understanding relevant information from an Action's output and Action can be one of three types:
(1) <search>entity</search>, which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it
will return some similar entities to search and you can try to search the information from those topics.
(2) <lookup>keyword</lookup>, which returns the next sentence containing keyword in the current context. This only does exact matches,
so keep your searches short.
(3) <finish>answer</finish>, which returns the answer and finishes the task.
"""
example1 = """Question
Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?
Thought 1
The question simplifies to "The Simpsons" character Milhouse is named after who. I only need to search Milhouse and find who it is named after.
Action 1
<search>Milhouse</search>
Observation 1
Milhouse Mussolini Van Houten is a recurring character in the Fox animated television series The Simpsons voiced by Pamela Hayden and created by Matt Groening.
Thought 2
The paragraph does not tell who Milhouse is named after, maybe I can look up "named after".
Action 2
<lookup>named after</lookup>
Observation 2
Milhouse was named after U.S. president Richard Nixon, whose middle name was Milhous.
Thought 3
Milhouse was named after U.S. president Richard Nixon, so the answer is Richard Nixon.
Action 3
<finish>Richard Nixon</finish>
"""
example2 = """Question
What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought 1
I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action 1
<search>Colorado orogeny</search>
Observation 1
The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
Thought 2
It does not mention the eastern sector. So I need to look up eastern sector.
Action 2
<lookup>eastern sector</lookup>
Observation 2
The eastern sector extends into the High Plains and is called the Central Plains orogeny.
Thought 3
The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.
Action 3
<search>High Plains</search>
Observation 3
High Plains refers to one of two distinct land regions
Thought 4
I need to instead search High Plains (United States).
Action 4
<search>High Plains (United States)</search>
Observation 4
The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130m).
Thought 5
High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action 5
<finish>1,800 to 7,000 ft</finish>
"""
# Come up with more examples yourself, or take a look through https://github.com/ysymyth/ReAct/
To capture one step at a time, and to ignore any Observation steps the model might hallucinate, we use `stop_sequences` to end the generation process. The steps proceed in the order Thought, Action, Observation.
question = """Question
Who was the youngest author listed on the transformers NLP paper?
"""
model = genai.GenerativeModel('gemini-1.5-flash-latest')
react_chat = model.start_chat()
# You will perform the Action, so generate up to, but not including, the Observation.
config = genai.GenerationConfig(stop_sequences=["\nObservation"])
resp = react_chat.send_message(
    [model_instructions, example1, example2, question],
    generation_config=config)
print(resp.text)
Thought 1
I need to find the Transformers NLP paper and then find the authors' ages to determine the youngest. This will require multiple steps.
Action 1
<search>Transformers NLP paper</search>
You can do this research yourself and feed the result back to the model.
observation = """Observation 1
[1706.03762] Attention Is All You Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
"""
resp = react_chat.send_message(observation, generation_config=config)
print(resp.text)
Thought 2
The observation gives me the authors of the paper "Attention is All You Need". I don't have their ages, so I can't determine the youngest directly. I will need to find biographical information on each author. This will be difficult given the limitations of the actions available. I will try to find their ages via a search engine if possible.
Action 2
<search>Aidan N. Gomez age</search>
This process continues until the model emits a `<finish>` action. You can keep repeating the steps shown above, or watch a fully automated ReAct system in action in the Wikipedia example.
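As a rough sketch, automating that loop yourself might look like the following, reusing the `model`, `model_instructions`, examples, and `config` from the cells above. `run_search` is a hypothetical helper you would have to implement (e.g. against the Wikipedia API), and for brevity it only handles the `<search>` action:

```python
import re

def run_search(entity: str) -> str:
    # Hypothetical helper: look up the entity and return a short summary.
    raise NotImplementedError

react_chat = model.start_chat()
resp = react_chat.send_message(
    [model_instructions, example1, example2, question],
    generation_config=config)

step = 1
while '<finish>' not in resp.text:
    match = re.search(r'<search>(.*?)</search>', resp.text)
    if not match:
        break  # no actionable search step; stop here
    observation = f"Observation {step}\n{run_search(match.group(1))}"
    resp = react_chat.send_message(observation, generation_config=config)
    step += 1

print(resp.text)  # should end with <finish>answer</finish>
```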
Code prompting
Code generation
The Gemini model family can be used to generate code, configuration, and scripts. Code generation is helpful when you're learning to program, learning a new language, or need a first draft quickly.
It's important to remember that LLMs can't truly reason and may repeat their training data, so it's essential to read and test the code first and to comply with any relevant licenses.
model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    generation_config=genai.GenerationConfig(
        temperature=1,
        top_p=1,
        max_output_tokens=1024,
    ))
# Gemini 1.5 models are very chatty, so it helps to specify they stick to the code.
code_prompt = """
Write a Python function to calculate the factorial of a number. No explanation, provide only the code.
"""
response = model.generate_content(code_prompt)
Markdown(response.text)
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
Code execution
The Gemini API can also automatically run generated code and return the output.
model = genai.GenerativeModel(
    'gemini-1.5-flash-latest',
    tools='code_execution')
code_exec_prompt = """
Calculate the sum of the first 14 prime numbers. Only consider the even primes, and make sure you get them all.
"""
response = model.generate_content(code_exec_prompt)
Markdown(response.text)
"""
To calculate the sum of the first 14 prime numbers, considering only even primes, we need to understand that there is only one even prime number: 2. All other prime numbers are odd.
Therefore, the question is contradictory. There are not 14 even prime numbers. The sum of the first 14 prime numbers would include 2, and then 13 other odd prime numbers. To calculate this, let's use Python:
"""
def is_prime(n):
    """Checks if a number is prime."""
    if n <= 1:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

primes = []
num = 2
count = 0
while count < 14:
    if is_prime(num):
        primes.append(num)
        count += 1
    num += 1

print(f"The first 14 prime numbers are: {primes}")
sum_of_primes = sum(primes)
print(f"The sum of the first 14 prime numbers is: {sum_of_primes}")
"""
The first 14 prime numbers are: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43]
The sum of the first 14 prime numbers is: 281
The only even prime number is 2. The question asks for the sum of the first 14 prime numbers considering only the even primes. Since there is only one even prime, the question is not well-defined for 14 primes. The code above calculates the sum of the first 14 prime numbers, including the single even prime, 2. The sum is 281.
This looks like a single response, but if you inspect it more closely you can see each step: the initial text, the generated code, the execution result, and the final text summary.
for part in response.candidates[0].content.parts:
    print(part)
    print("-----")
"""
text: "To calculate the sum of the first 14 prime numbers, considering only even primes, we need to understand that there is only one even prime number: 2. All other prime numbers are odd.\n\nTherefore, the question is contradictory. There are not 14 even prime numbers. The sum of the first 14 prime numbers would include 2, and then 13 other odd prime numbers. To calculate this, let\'s use Python:\n\n"
-----
executable_code {
language: PYTHON
code: "\ndef is_prime(n):\n \"\"\"Checks if a number is prime.\"\"\"\n if n <= 1:\n return False\n for i in range(2, int(n**0.5) + 1):\n if n % i == 0:\n return False\n return True\n\nprimes = []\nnum = 2\ncount = 0\nwhile count < 14:\n if is_prime(num):\n primes.append(num)\n count += 1\n num += 1\n\nprint(f\"The first 14 prime numbers are: {primes}\")\nsum_of_primes = sum(primes)\nprint(f\"The sum of the first 14 prime numbers is: {sum_of_primes}\")\n\n"
}
-----
code_execution_result {
outcome: OUTCOME_OK
output: "The first 14 prime numbers are: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43]\nThe sum of the first 14 prime numbers is: 281\n"
}
-----
text: "The only even prime number is 2. The question asks for the sum of the first 14 prime numbers considering only the even primes. Since there is only one even prime, the question is not well-defined for 14 primes. The code above calculates the sum of the first 14 prime numbers, including the single even prime, 2. The sum is 281.\n"
Explaining code
The Gemini model family can explain code too.
file_contents = !curl https://raw.githubusercontent.com/magicmonty/bash-git-prompt/refs/heads/master/gitprompt.sh
explain_prompt = f"""
Please explain what this file does at a very high level. What is it, and why would I use it?
'''
{file_contents}
'''
"""
model = genai.GenerativeModel('gemini-1.5-flash-latest')
response = model.generate_content(explain_prompt)
Markdown(response.text)
"""
This file is a bash script that provides a highly customizable Git prompt for your terminal. It enhances the standard bash prompt to display information about the current Git repository, such as the branch, status (clean, modified files, etc.), and optionally the upstream branch.
You would use it to:
- Improve your Git workflow: By showing relevant Git status information directly in your prompt, you can quickly see the state of your repository without needing to run git status every time.
- Customize your terminal appearance: The script allows for extensive customization of colors, symbols, and the layout of the Git information in your prompt. It supports themes and even lets you create your own custom theme.
- Integrate with other tools: It can display information about your virtual environment (e.g., virtualenv, conda, nvm) alongside your Git information.
In essence, it's a powerful tool that combines informative Git status display with significant customization options to improve the user experience when working with Git. It handles different shell versions (bash and zsh) and attempts gracefully to handle situations where required tools (like find with the -mmin option) are unavailable.
"""
To learn more about prompting
- Check the white paper