
Building, Optimizing, and Evaluating RAG with DSPy


This post walks through [01] RAG: Retrieval-Augmented Generation | DSPy.

[01] RAG: Retrieval-Augmented Generation

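Retrieval-augmented generation (RAG) is an approach that lets an LLM tap into a large corpus of knowledge from external sources and query a knowledge store, so that it can find relevant passages or content and produce a well-refined response.

With RAG, an LLM can dynamically draw on real-time knowledge and give meaningful answers even on subjects it was not trained on. That capability comes at a price, though: setting up a refined RAG pipeline is considerably more complex. To cut down on that complexity, we turn to DSPy, which offers a seamless approach to setting up prompting pipelines.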

%pip install dspy-ai
dbutils.library.restartPython()

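Configuring the LM and RM

We start by setting up the language model (LM) and retrieval model (RM), which DSPy supports through a range of LM and RM APIs as well as local model hosting.

This notebook uses GPT-3.5 (gpt-3.5-turbo) and the ColBERTv2 retriever (a free server hosting a search index of Wikipedia 2017 "abstracts", which contains the first paragraph of each article in the 2017 dump). We configure the LM and RM within DSPy so that DSPy can call the respective module internally whenever it needs generation or retrieval.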

import os
os.environ["OPENAI_API_KEY"] = dbutils.secrets.get("demo-token-takaaki.yayoi", "openai_api_key")
import dspy

turbo = dspy.OpenAI(model='gpt-3.5-turbo')
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)

Loading the dataset

This tutorial uses the HotPotQA dataset, a collection of complex question-answer pairs that are typically answered in a multi-hop fashion. DSPy provides this dataset through the HotPotQA class.

from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that 'question' is the input. The other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)
(20, 50)

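Building signatures

Now that the data is loaded, let's define the signatures for the sub-tasks of the pipeline.

We could settle for a simple question input and answer output, but because we are building a RAG pipeline we also want to use contextual information retrieved from the ColBERT corpus. So we define our signature as: context, question --> answer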

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

We include short descriptions for the context and answer fields to give the model more robust guidance on what it will receive and what it should generate.
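
To make the role of the field descriptions concrete, here is a minimal sketch of how a signature could be turned into the format section of a prompt. It is only a toy: `ToyField`, `ToySignature`, and `format_template` are hypothetical names, and DSPy's real rendering logic is assumed to be more involved. The output mirrors the "Follow the following format." section of the prompt inspected later in this post, minus the Reasoning line that ChainOfThought adds.

```python
from dataclasses import dataclass, field


@dataclass
class ToyField:
    # Hypothetical stand-in for dspy.InputField / dspy.OutputField.
    name: str
    desc: str = ""  # optional hint shown in place of the placeholder


@dataclass
class ToySignature:
    # Hypothetical stand-in for a dspy.Signature subclass.
    instructions: str
    fields: list = field(default_factory=list)

    def format_template(self) -> str:
        """Render the instructions followed by one line per field."""
        lines = [self.instructions, "", "---", "", "Follow the following format.", ""]
        for f in self.fields:
            # Use the description when given, else a ${name} placeholder.
            placeholder = f.desc if f.desc else "${" + f.name + "}"
            lines.append(f"{f.name.capitalize()}: {placeholder}")
            lines.append("")
        return "\n".join(lines).rstrip()


generate_answer = ToySignature(
    instructions="Answer questions with short factoid answers.",
    fields=[
        ToyField("context", "may contain relevant facts"),
        ToyField("question"),
        ToyField("answer", "often between 1 and 5 words"),
    ],
)

template = generate_answer.format_template()
print(template)
```

The descriptions end up verbatim in the prompt, which is why even a short hint such as "often between 1 and 5 words" can steer the shape of the answers.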

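Building the pipeline

We build the RAG pipeline as a DSPy module, which requires two methods:

  • The __init__ method simply declares the sub-modules it needs: dspy.Retrieve and dspy.ChainOfThought. The latter is defined to implement our GenerateAnswer signature.
  • The forward method describes the control flow for answering the question with the modules at hand: given a question, it retrieves the top 3 relevant passages and feeds them as context for answer generation.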
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
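
The retrieve-then-generate control flow can be tried without an API key or a retrieval server. The sketch below is a toy under loud assumptions: `toy_retrieve` ranks passages by simple word overlap (a stand-in for ColBERTv2), `toy_generate` just returns the title of the top passage (a stand-in for the LM call), and the corpus is made up for illustration.

```python
import re


def words(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def toy_retrieve(question, corpus, k=3):
    """Rank passages by word overlap with the question (stand-in for ColBERTv2)."""
    q_words = words(question)
    ranked = sorted(corpus, key=lambda p: len(q_words & words(p)), reverse=True)
    return ranked[:k]


def toy_generate(context, question):
    """Stand-in for the LM call: report the title of the top-ranked passage."""
    return context[0].split(" | ")[0]


class ToyRAG:
    """Same shape as the RAG module above, with the toy components plugged in."""

    def __init__(self, corpus, num_passages=3):
        self.corpus = corpus
        self.num_passages = num_passages

    def forward(self, question):
        context = toy_retrieve(question, self.corpus, k=self.num_passages)
        answer = toy_generate(context, question)
        return {"context": context, "answer": answer}


# Made-up corpus in the same "title | text" layout as the Wikipedia abstracts.
corpus = [
    "Apple | An apple is a red or green fruit.",
    "Banana | A banana is a long yellow fruit.",
    "Bicycle | A bicycle has two wheels.",
    "Piano | A piano has black and white keys.",
]

result = ToyRAG(corpus).forward("Which fruit is long and yellow?")
print(result["answer"])
print(result["context"])
```

As in the real module, forward only wires the pieces together; swapping in a better retriever or generator does not change the control flow.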

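Optimizing the pipeline

Compiling the RAG program

Having defined the program, let's compile it. Compiling a program updates the parameters stored in each module. In our setting, this mainly takes the form of collecting and selecting good demonstrations to include in the prompt(s).

Compiling depends on three things:

  1. A training set. We use the 20 question-answer examples from trainset above.
  2. A metric for validation. We define a simple validate_context_and_answer that checks that the predicted answer is correct and that the retrieved context actually contains the answer.
  3. A specific teleprompter. The DSPy compiler includes a number of teleprompters that can optimize your programs.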
from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context actually contains that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

# Compile!
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
Bootstrapped 4 full traces after 12 examples in round 0.
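
The log line above reports how many training examples produced traces that passed the metric. The core idea can be sketched as a simple filter loop. This is a heavily simplified illustration, not DSPy's BootstrapFewShot implementation (which, among other things, is assumed to handle teacher programs, labeled demos, and multiple rounds); `bootstrap_demos`, `toy_program`, and the tiny training set are hypothetical.

```python
def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Run the program on training examples and keep the traces that pass the metric."""
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break  # enough demonstrations collected
        prediction = program(example["question"])
        if metric(example, prediction):
            # Keep the full trace (inputs plus outputs) as a demonstration.
            demos.append({**example, **prediction, "augmented": True})
    return demos


# Toy program: it only "knows" two answers.
KNOWN = {"2 + 2?": "4", "Capital of France?": "Paris"}


def toy_program(question):
    return {"answer": KNOWN.get(question, "unknown")}


def exact(example, prediction):
    """Toy metric: the prediction must equal the label exactly."""
    return example["answer"] == prediction["answer"]


trainset = [
    {"question": "2 + 2?", "answer": "4"},
    {"question": "Largest planet?", "answer": "Jupiter"},
    {"question": "Capital of France?", "answer": "Paris"},
]

demos = bootstrap_demos(toy_program, trainset, exact)
print(f"Bootstrapped {len(demos)} demos from {len(trainset)} examples")
```

Examples where the program's own output fails the metric are discarded, so only self-generated traces that are known to be good end up in the prompt.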

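Info

Teleprompters: Teleprompters are powerful optimizers that can take any program and learn to bootstrap and select effective prompts for its modules. Hence the name, which means "prompting at a distance".

Different teleprompters offer different trade-offs in how much they optimize for cost versus quality, among other things. The example above uses the simple default, BootstrapFewShot.

As an analogy, you can think of this as your training data, your loss function, and your optimizer in a standard DNN supervised-learning setup. There, SGD is a basic optimizer, while more sophisticated (and more expensive!) ones such as Adam or RMSProp also exist.

Executing the pipeline

Now that we have compiled our RAG program, let's try it out.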

# Ask any question you like of this simple RAG program.
my_question = "What castle did David Gregory inherit?"

# Get the prediction. It contains `pred.context` and `pred.answer`.
pred = compiled_rag(my_question)

# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")
Question: What castle did David Gregory inherit?
Predicted Answer: Kinnairdy Castle
Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 t...', 'David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University ...']

Excellent. How about we inspect the last prompt sent to the LM?

turbo.inspect_history(n=1)




Answer questions with short factoid answers.

---

Question: At My Window was released by which American singer-songwriter?
Answer: John Townes Van Zandt

Question: "Everything Has Changed" is a song from an album released under which record label ?
Answer: Big Machine Records

Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?
Answer: 1950

Question: Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20?
Answer: Aleem Sarwar Dar

Question: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?
Answer: "Outfield of Dreams"

Question: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?
Answer: Aleksandr Danilovich Aleksandrov

Question: The Organisation that allows a community to influence their operation or use and to enjoy the benefits arisingwas founded in what year?
Answer: 2010

Question: Tombstone stared an actor born May 17, 1955 known as who?
Answer: Bill Paxton

Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield
Answer: 1874

Question: which American actor was Candace Kita guest starred with
Answer: Bill Murray

Question: Which is taller, the Empire State Building or the Bank of America Tower?
Answer: The Empire State Building

Question: Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs?
Answer: Buena Vista Distribution

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: often between 1 and 5 words

---

Context:
[1] «Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. "Tae Kwon Do Times" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.»
[2] «Kwon Tae-man | Kwon Tae-man (born 1941) was an early Korean hapkido practitioner and a pioneer of the art, first in Korea and then in the United States. He formed one of the earliest dojang's for hapkido in the United States in Torrance, California, and has been featured in many magazine articles promoting the art.»
[3] «Hee Il Cho | Cho Hee Il (born October 13, 1940) is a prominent Korean-American master of taekwondo, holding the rank of 9th "dan" in the martial art. He has written 11 martial art books, produced 70 martial art training videos, and has appeared on more than 70 martial arts magazine covers. Cho won several national and international competitions as a taekwondo competitor, and has appeared in several films, including "Fight to Win", "Best of the Best", "Bloodsport II", and "Bloodsport III". He founded the Action International Martial Arts Association (AIMAA) in 1980, and is its President. Cho is a member of both "Black Belt" magazine's Hall of Fame and "Tae Kwon Do Times" magazine's Hall of Fame.»

Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?

Reasoning: Let's think step by step in order to produce the answer. We know that "Tae Kwon Do Times" has published articles by Scott Shaw, as mentioned in the context.

Answer: Tae Kwon Do Times

---

Context:
[1] «Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an American actress, producer, singer, comic book writer, and political activist. She made her film debut in the 1995 teen drama "Kids". Her subsequent film roles include "He Got Game", "Men in Black II", "25th Hour", "Rent", "Sin City", "Death Proof", "Seven Pounds", "", and "Top Five". Dawson has also provided voice-over work for Disney and DC.»
[2] «Sarai Gonzalez | Sarai Isaura Gonzalez (born 2005) is an American Latina child actress who made her professional debut at the age of 11 on the Spanish-language ""Soy Yo"" ("That's Me") music video by Bomba Estéreo. Cast as a "nerdy" tween with a "sassy" and "confident" attitude, her performance turned her into a "Latina icon" for "female empowerment, identity and self-worth". She subsequently appeared in two get out the vote videos for Latinos in advance of the 2016 United States elections.»
[3] «Gabriela (2001 film) | Gabriela is a 2001 American romance film, starring Seidy Lopez in the title role alongside Jaime Gomez as her admirer Mike. The film has been cited as an inspiration behind the Premiere Weekend Club, which supports Latino film-making.»

Question: Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino?

Reasoning: Let's think step by step in order to produce the answer. We know that Rosario Dawson made her film debut in the 1995 teen drama "Kids" and is also known for her political activism.

Answer: Rosario Dawson

---

Context:
[1] «Battle of Kursk | The Battle of Kursk was a Second World War engagement between German and Soviet forces on the Eastern Front near Kursk (450 km south-west of Moscow) in the Soviet Union during July and August 1943. The battle began with the launch of the German offensive, Operation Citadel (German: "Unternehmen Zitadelle" ), on 5 July, which had the objective of pinching off the Kursk salient with attacks on the base of the salient from north and south simultaneously. After the German offensive stalled on the northern side of the salient, on 12 July the Soviets commenced their Kursk Strategic Offensive Operation with the launch of Operation Kutuzov (Russian: Кутузов ) against the rear of the German forces in the northern side. On the southern side, the Soviets also launched powerful counterattacks the same day, one of which led to a large armoured clash, the Battle of Prokhorovka. On 3 August, the Soviets began the second phase of the Kursk Strategic Offensive Operation with the launch of Operation Polkovodets Rumyantsev (Russian: Полководец Румянцев ) against the German forces in the southern side of the Kursk salient.»
[2] «Operation Mars | Operation Mars, also known as the Second Rzhev-Sychevka Offensive Operation (Russian: Вторая Ржевско-Сычёвская наступательная операция), was the codename for an offensive launched by Soviet forces against German forces during World War II. It took place between 25 November and 20 December 1942 around the Rzhev salient in the vicinity of Moscow.»
[3] «Kholm Pocket | The Kholm Pocket (German: "Kessel von Cholm" ; Russian: Холмский котёл ) was the name given for the encirclement of German troops by the Red Army around Kholm south of Leningrad, during World War II on the Eastern Front, from 23 January 1942 until 5 May 1942. A much larger pocket was simultaneously surrounded in Demyansk, about 100 km to the northeast. These were the results of German retreat following their defeat during the Battle of Moscow.»

Question: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division?

Reasoning: Let's think step by step in order to produce the answer. We know that the German offensive that started the Battle of Kursk was called Operation Citadel (German: "Unternehmen Zitadelle").

Answer: Operation Citadel

---

Context:
[1] «Kerry Condon | Kerry Condon (born 4 January 1983) is an Irish television and film actress, best known for her role as Octavia of the Julii in the HBO/BBC series "Rome," as Stacey Ehrmantraut in AMC's "Better Call Saul" and as the voice of F.R.I.D.A.Y. in various films in the Marvel Cinematic Universe. She is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet."»
[2] «Corona Riccardo | Corona Riccardo (c. 1878October 15, 1917) was an Italian born American actress who had a brief Broadway stage career before leaving to become a wife and mother. Born in Naples she came to acting in 1894 playing a Mexican girl in a play at the Empire Theatre. Wilson Barrett engaged her for a role in his play "The Sign of the Cross" which he took on tour of the United States. Riccardo played the role of Ancaria and later played Berenice in the same play. Robert B. Mantell in 1898 who struck by her beauty also cast her in two Shakespeare plays, "Romeo and Juliet" and "Othello". Author Lewis Strang writing in 1899 said Riccardo was the most promising actress in America at the time. Towards the end of 1898 Mantell chose her for another Shakespeare part, Ophelia im Hamlet. Afterwards she was due to join Augustin Daly's Theatre Company but Daly died in 1899. In 1899 she gained her biggest fame by playing Iras in the first stage production of Ben-Hur.»
[3] «Judi Dench | Dame Judith Olivia "Judi" Dench, {'1': ", '2': ", '3': ", '4': "} (born 9 December 1934) is an English actress and author. Dench made her professional debut in 1957 with the Old Vic Company. Over the following few years, she performed in several of Shakespeare's plays in such roles as Ophelia in "Hamlet", Juliet in "Romeo and Juliet", and Lady Macbeth in "Macbeth". Although most of her work during this period was in theatre, she also branched into film work and won a BAFTA Award as Most Promising Newcomer. She drew strong reviews for her leading role in the musical "Cabaret" in 1968.»

Question: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." ?

Reasoning: Let's think step by step in order to produce the answer. We know that the actress in question played Ophelia in a Royal Shakespeare Company production of "Hamlet" and appeared in the short film "The Shore."

Answer: Kerry Condon

---

Context:
[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory's use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.»
[2] «Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 to 1006. In December 999, and again on February 2, 1002, he reinstituted and confirmed the possessions of the abbey and monks of Monte Cassino in Ascoli. In 1004, he fortified and expanded the castle of Dragonara on the Fortore. He gave it three circular towers and one square one. He also strengthened Lucera.»
[3] «David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University of Edinburgh, Savilian Professor of Astronomy at the University of Oxford, and a commentator on Isaac Newton's "Principia".»

Question: What castle did David Gregory inherit?

Reasoning: Let's think step by step in order to produce the answer. We know that David Gregory inherited Kinnairdy Castle.

Answer: Kinnairdy Castle


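Although we did not write any detailed instructions, we can see that DSPy was able to bootstrap this 3,000-token prompt for few-shot retrieval-augmented generation (four bootstrapped demonstrations in this run) with hard negative passages, and to use Chain-of-Thought reasoning, all within an extremely simply written program.

This illustrates the power of composition and learning. Of course, this was just generated by one particular teleprompter, which may or may not be perfect in each setting. DSPy gives you a large but systematic space of options to optimize and validate with respect to your program's quality and cost.

You can also easily inspect the learned objects themselves.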

for name, parameter in compiled_rag.named_predictors():
    print(name)
    print(parameter.demos[0])
    print()
generate_answer
Example({'augmented': True, 'context': ['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. "Tae Kwon Do Times" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.', "Kwon Tae-man | Kwon Tae-man (born 1941) was an early Korean hapkido practitioner and a pioneer of the art, first in Korea and then in the United States. He formed one of the earliest dojang's for hapkido in the United States in Torrance, California, and has been featured in many magazine articles promoting the art.", 'Hee Il Cho | Cho Hee Il (born October 13, 1940) is a prominent Korean-American master of taekwondo, holding the rank of 9th "dan" in the martial art. He has written 11 martial art books, produced 70 martial art training videos, and has appeared on more than 70 martial arts magazine covers. Cho won several national and international competitions as a taekwondo competitor, and has appeared in several films, including "Fight to Win", "Best of the Best", "Bloodsport II", and "Bloodsport III". He founded the Action International Martial Arts Association (AIMAA) in 1980, and is its President. Cho is a member of both "Black Belt" magazine\'s Hall of Fame and "Tae Kwon Do Times" magazine\'s Hall of Fame.'], 'question': 'Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?', 'rationale': 'produce the answer. We know that "Tae Kwon Do Times" has published articles by Scott Shaw, as mentioned in the context.', 'answer': 'Tae Kwon Do Times'}) (input_keys=None)

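Evaluating the pipeline

We can now evaluate our compiled_rag program on the dev set. Of course, this tiny set is not meant to be a reliable benchmark, but it is useful for illustration.

Let's evaluate the accuracy (exact match) of the predicted answers.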

from dspy.evaluate.evaluate import Evaluate

# Set up the `evaluate_on_hotpotqa` function. We'll use it several times below.
evaluate_on_hotpotqa = Evaluate(devset=devset, num_threads=1, display_progress=False, display_table=5)

# Evaluate the `compiled_rag` program with the `answer_exact_match` metric.
metric = dspy.evaluate.answer_exact_match
evaluate_on_hotpotqa(compiled_rag, metric=metric)
Average Metric: 27 / 50  (54.0%)
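
For intuition about what an exact-match score measures, the sketch below assumes SQuAD-style normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace) before comparing strings. The helpers `normalize`, `exact_match`, and `accuracy` are hypothetical, and DSPy's own answer_exact_match may differ in detail; the predictions and gold answers are toy data.

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """True if the normalized prediction equals any normalized gold answer."""
    return normalize(prediction) in {normalize(g) for g in gold_answers}


def accuracy(predictions, golds):
    """Fraction of predictions that exactly match one of their gold answers."""
    hits = sum(exact_match(p, g) for p, g in zip(predictions, golds))
    return hits / len(predictions)


# Toy predictions and gold answers, for illustration only.
predictions = ["Kinnairdy Castle", "The Empire State Building.", "Bill Murray"]
golds = [["Kinnairdy Castle"], ["Empire State Building"], ["Bill Paxton"]]

score = accuracy(predictions, golds)
print(f"Exact match: {score:.1%}")
```

Exact match is strict: a prediction that is correct but phrased differently from every gold answer still counts as a miss, which is worth keeping in mind when reading a score on a set of only 50 questions.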


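Evaluating retrieval

It can also be instructive to look at the accuracy of retrieval. There are several ways to do this; one simple option is to check whether the retrieved passages contain the answer.

We can make use of our dev set, which includes the gold titles that should be retrieved.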

def gold_passages_retrieved(example, pred, trace=None):
    gold_titles = set(map(dspy.evaluate.normalize_text, example['gold_titles']))
    found_titles = set(map(dspy.evaluate.normalize_text, [c.split(' | ')[0] for c in pred.context]))

    return gold_titles.issubset(found_titles)

compiled_rag_retrieval_score = evaluate_on_hotpotqa(compiled_rag, metric=gold_passages_retrieved)
Average Metric: 13 / 50  (26.0%)
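
The subset check in gold_passages_retrieved is strict: every gold title must appear among the retrieved passages. The sketch below replays that logic without DSPy, using plain lowercasing as a stand-in for dspy.evaluate.normalize_text. The passages are shortened from the David Gregory output shown earlier, while the gold-title sets are hypothetical and chosen only to show both outcomes.

```python
def title_of(passage):
    """Passages are laid out as 'title | text'; return the title part."""
    return passage.split(" | ")[0]


def gold_titles_retrieved(gold_titles, passages):
    """True only if every gold title is among the retrieved passage titles."""
    found = {title_of(p).lower() for p in passages}
    return {t.lower() for t in gold_titles} <= found


# Shortened versions of the passages retrieved for the David Gregory question.
passages = [
    "David Gregory (physician) | David Gregory was a Scottish physician and inventor.",
    "Gregory Tarchaneiotes | Gregory Tarchaneiotes was the long-reigning catepan of Italy.",
    "David Gregory (mathematician) | David Gregory was a Scottish mathematician and astronomer.",
]

# Hypothetical gold-title sets, for illustration only.
one_hop = ["David Gregory (physician)"]
two_hop = ["David Gregory (physician)", "Kinnairdy Castle"]

print(gold_titles_retrieved(one_hop, passages))
print(gold_titles_retrieved(two_hop, passages))
```

For a multi-hop question, missing a single supporting page fails the check even when the retrieved context is enough to produce the right answer, which is one reason a retrieval score can sit well below answer accuracy.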


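Although this simple compiled_rag program answers a decent fraction of the questions correctly (54% exact match on this small set), the quality of retrieval is much lower (26%).

This potentially suggests that the LM is often relying on knowledge it memorized during training to answer questions. To address this weak retrieval, let's explore a second program that involves more advanced search behavior.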
