Introduction
When you want to understand how an LLM application behaves, looking at the prompts actually fed to the LLM is a shortcut.
In this post we look at the prompts behind the following RAGAS metrics:
- faithfulness
- answer relevancy
- answer correctness
- context precision
- context recall
Basics
As a rule, RAGAS prompts are implemented as classes that inherit from PydanticPrompt.
PydanticPrompt has input_model, output_model, instruction, and examples as attributes.
It receives InputModel and OutputModel as generic type parameters, which are used in the type definitions of everything except instruction. InputModel and OutputModel must be classes that inherit from Pydantic's BaseModel.
class PydanticPrompt(BasePrompt, t.Generic[InputModel, OutputModel]):
# these are class attributes
input_model: t.Type[InputModel]
output_model: t.Type[OutputModel]
instruction: str
examples: t.List[t.Tuple[InputModel, OutputModel]] = []
PydanticPrompt's to_string method returns the final string that will be fed to the LLM.
Roughly speaking, the prompt is the instruction, the output format, the examples, and the input, concatenated in that order.
class PydanticPrompt(BasePrompt, t.Generic[InputModel, OutputModel]):
...
def to_string(self, data: t.Optional[InputModel] = None) -> str:
return (
f"{self.instruction}\n"
+ self._generate_output_signature()
+ "\n"
+ self._generate_examples()
+ "\n-----------------------------\n"
+ "\nNow perform the same with the following input\n"
+ (
"input: " + data.model_dump_json(indent=4, exclude_none=True) + "\n"
if data is not None
else "Input: (None)\n"
)
+ "Output: "
)
The class that manages each metric's prompt inherits from PydanticPrompt.
At class-definition time it specifies the input/output types, the instruction, and the examples.
class StatementGeneratorPrompt(
PydanticPrompt[StatementGeneratorInput, StatementGeneratorOutput]
):
instruction = "Given a question and an answer, analyze the complexity of each sentence in the answer. Break down each sentence into one or more fully understandable statements. Ensure that no pronouns are used in any statement. Format the outputs in JSON."
input_model = StatementGeneratorInput
output_model = StatementGeneratorOutput
examples = [
...
faithfulness
faithfulness evaluates whether the generated answer is grounded in the given context.
It consists of two stages:
- statement generation
- statement verification
We will look at each prompt template in turn.
Statement generation
The relevant code is below.
class StatementGeneratorInput(BaseModel):
question: str = Field(description="The question to answer")
answer: str = Field(description="The answer to the question")
class StatementGeneratorOutput(BaseModel):
statements: t.List[str] = Field(description="The generated statements")
class StatementGeneratorPrompt(
PydanticPrompt[StatementGeneratorInput, StatementGeneratorOutput]
):
instruction = "Given a question and an answer, analyze the complexity of each sentence in the answer. Break down each sentence into one or more fully understandable statements. Ensure that no pronouns are used in any statement. Format the outputs in JSON."
input_model = StatementGeneratorInput
output_model = StatementGeneratorOutput
examples = [
(
StatementGeneratorInput(
question="Who was Albert Einstein and what is he best known for?",
answer="He was a German-born theoretical physicist, widely acknowledged to be one of the greatest and most influential physicists of all time. He was best known for developing the theory of relativity, he also made important contributions to the development of the theory of quantum mechanics.",
),
StatementGeneratorOutput(
statements=[
"Albert Einstein was a German-born theoretical physicist.",
"Albert Einstein is recognized as one of the greatest and most influential physicists of all time.",
"Albert Einstein was best known for developing the theory of relativity.",
"Albert Einstein also made important contributions to the development of the theory of quantum mechanics.",
]
),
)
]
Calling to_string shows the concrete prompt.
# example matching the v0.3-series class names
from ragas.metrics._faithfulness import StatementGeneratorPrompt
print(StatementGeneratorPrompt().to_string())
Given a question and an answer, analyze the complexity of each sentence in the answer. Break down each sentence into one or more fully understandable statements. Ensure that no pronouns are used in any statement. Format the outputs in JSON.
Please return the output in a JSON format that complies with the following schema as specified in JSON Schema:
{"properties": {"statements": {"description": "The generated statements", "items": {"type": "string"}, "title": "Statements", "type": "array"}}, "required": ["statements"], "title": "StatementGeneratorOutput", "type": "object"}Do not use single quotes in your response but double quotes,properly escaped with a backslash.
--------EXAMPLES-----------
Example 1
Input: {
"question": "Who was Albert Einstein and what is he best known for?",
"answer": "He was a German-born theoretical physicist, widely acknowledged to be one of the greatest and most influential physicists of all time. He was best known for developing the theory of relativity, he also made important contributions to the development of the theory of quantum mechanics."
}
Output: {
"statements": [
"Albert Einstein was a German-born theoretical physicist.",
"Albert Einstein is recognized as one of the greatest and most influential physicists of all time.",
"Albert Einstein was best known for developing the theory of relativity.",
"Albert Einstein also made important contributions to the development of the theory of quantum mechanics."
]
}
-----------------------------
Now perform the same with the following input
Input: (None)
Output:
Paraphrasing the template: given a question and an answer, analyze the complexity of each sentence in the answer, break each sentence into one or more fully self-contained statements, make sure no statement uses a pronoun, and return the output as JSON conforming to the given schema (a list of strings), using double quotes rather than single quotes, properly escaped. Then come the examples, a separator, "Now perform the same with the following input", "Input: (None)", and "Output:".
From the user's point of view, you supply a question and an answer, which are converted into a JSON string of the form
{
"question": QUESTION,
"answer": ANSWER
}
and embedded into the "Input: (None)" slot shown above.
The output is a JSON string containing a list of strings.
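As a side note, you can render the fully assembled prompt for a concrete input by passing an input model to to_string. A minimal sketch, reusing the classes above with a made-up question/answer pair:
# a sketch: render the full prompt for a concrete (hypothetical) input;
# to_string(data) serializes the input model into the slot that otherwise
# reads "Input: (None)".
from ragas.metrics._faithfulness import (
    StatementGeneratorInput,
    StatementGeneratorPrompt,
)

data = StatementGeneratorInput(
    question="Who wrote Hamlet?",
    answer="It was written by William Shakespeare.",
)
print(StatementGeneratorPrompt().to_string(data))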
The instruction boils down to:
- split the answer into sentences
- break each sentence into statements
- make each statement fully self-contained, with no pronouns
The instruction looks broadly applicable, but depending on the target domain there is room for customization.
When you want to customize, knowing the concrete prompt template makes it much easier to experiment in a playground; one possible route is sketched below.
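A sketch of one way to do it: subclass the prompt class, override its class attributes, and print the rendered template to paste into a playground (the override text here is hypothetical):
# a sketch of one customization route: override instruction on a subclass
# and inspect the rendered template before wiring it into a metric.
from ragas.metrics._faithfulness import StatementGeneratorPrompt

class DomainStatementGeneratorPrompt(StatementGeneratorPrompt):
    # hypothetical domain-specific wording; adjust for your own use case
    instruction = (
        "Given a question and an answer, break the answer into fully "
        "self-contained statements. Do not use pronouns. Keep product "
        "names exactly as written. Format the outputs in JSON."
    )

print(DomainStatementGeneratorPrompt().to_string())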
For example, set the tail of the English template as follows and look at the output.
Now perform the same with the following input
Input: {"question": "水は100度で氷りますか?", "answer": "いいえ"}
Output:
{
"statements": [
"水は100度で氷りません。"
]
}
Looking at the answer alone, "いいえ" ("No") carries no information, but the model correctly supplements the meaning from the question ("Does water freeze at 100 degrees?") and generates a statement that stands on its own as a short sentence.
Statement verification
Statement verification judges whether each generated statement is grounded in the context.
The relevant code is below.
At first glance the class name NLIStatementPrompt seems unrelated to faithfulness, but it is presumably named for the NLI task of judging the relationship between a context and a statement.
class StatementFaithfulnessAnswer(BaseModel):
statement: str = Field(..., description="the original statement, word-by-word")
reason: str = Field(..., description="the reason of the verdict")
verdict: int = Field(..., description="the verdict(0/1) of the faithfulness.")
class NLIStatementOutput(BaseModel):
statements: t.List[StatementFaithfulnessAnswer]
class NLIStatementInput(BaseModel):
context: str = Field(..., description="The context of the question")
statements: t.List[str] = Field(..., description="The statements to judge")
class NLIStatementPrompt(PydanticPrompt[NLIStatementInput, NLIStatementOutput]):
instruction = "Your task is to judge the faithfulness of a series of statements based on a given context. For each statement you must return verdict as 1 if the statement can be directly inferred based on the context or 0 if the statement can not be directly inferred based on the context."
input_model = NLIStatementInput
output_model = NLIStatementOutput
examples = [
(
NLIStatementInput(
context="""John is a student at XYZ University. He is pursuing a degree in Computer Science. He is enrolled in several courses this semester, including Data Structures, Algorithms, and Database Management. John is a diligent student and spends a significant amount of time studying and completing assignments. He often stays late in the library to work on his projects.""",
statements=[
"John is majoring in Biology.",
"John is taking a course on Artificial Intelligence.",
"John is a dedicated student.",
"John has a part-time job.",
],
),
NLIStatementOutput(
statements=[
StatementFaithfulnessAnswer(
statement="John is majoring in Biology.",
reason="John's major is explicitly mentioned as Computer Science. There is no information suggesting he is majoring in Biology.",
verdict=0,
),
StatementFaithfulnessAnswer(
statement="John is taking a course on Artificial Intelligence.",
reason="The context mentions the courses John is currently enrolled in, and Artificial Intelligence is not mentioned. Therefore, it cannot be deduced that John is taking a course on AI.",
verdict=0,
),
StatementFaithfulnessAnswer(
statement="John is a dedicated student.",
reason="The context states that he spends a significant amount of time studying and completing assignments. Additionally, it mentions that he often stays late in the library to work on his projects, which implies dedication.",
verdict=1,
),
StatementFaithfulnessAnswer(
statement="John has a part-time job.",
reason="There is no information given in the context about John having a part-time job.",
verdict=0,
),
]
),
),
(
NLIStatementInput(
context="Photosynthesis is a process used by plants, algae, and certain bacteria to convert light energy into chemical energy.",
statements=[
"Albert Einstein was a genius.",
],
),
NLIStatementOutput(
statements=[
StatementFaithfulnessAnswer(
statement="Albert Einstein was a genius.",
reason="The context and statement are unrelated",
verdict=0,
)
]
),
),
]
Let's call to_string().
from ragas.metrics._faithfulness import NLIStatementPrompt
prompt = NLIStatementPrompt()
print(prompt.to_string())
Your task is to judge the faithfulness of a series of statements based on a given context. For each statement you must return verdict as 1 if the statement can be directly inferred based on the context or 0 if the statement can not be directly inferred based on the context.
Please return the output in a JSON format that complies with the following schema as specified in JSON Schema:
{"$defs": {"StatementFaithfulnessAnswer": {"properties": {"statement": {"description": "the original statement, word-by-word", "title": "Statement", "type": "string"}, "reason": {"description": "the reason of the verdict", "title": "Reason", "type": "string"}, "verdict": {"description": "the verdict(0/1) of the faithfulness.", "title": "Verdict", "type": "integer"}}, "required": ["statement", "reason", "verdict"], "title": "StatementFaithfulnessAnswer", "type": "object"}}, "properties": {"statements": {"items": {"$ref": "#/$defs/StatementFaithfulnessAnswer"}, "title": "Statements", "type": "array"}}, "required": ["statements"], "title": "NLIStatementOutput", "type": "object"}Do not use single quotes in your response but double quotes,properly escaped with a backslash.
--------EXAMPLES-----------
Example 1
Input: {
"context": "John is a student at XYZ University. He is pursuing a degree in Computer Science. He is enrolled in several courses this semester, including Data Structures, Algorithms, and Database Management. John is a diligent student and spends a significant amount of time studying and completing assignments. He often stays late in the library to work on his projects.",
"statements": [
"John is majoring in Biology.",
"John is taking a course on Artificial Intelligence.",
"John is a dedicated student.",
"John has a part-time job."
]
}
Output: {
"statements": [
{
"statement": "John is majoring in Biology.",
"reason": "John's major is explicitly mentioned as Computer Science. There is no information suggesting he is majoring in Biology.",
"verdict": 0
},
{
"statement": "John is taking a course on Artificial Intelligence.",
"reason": "The context mentions the courses John is currently enrolled in, and Artificial Intelligence is not mentioned. Therefore, it cannot be deduced that John is taking a course on AI.",
"verdict": 0
},
{
"statement": "John is a dedicated student.",
"reason": "The context states that he spends a significant amount of time studying and completing assignments. Additionally, it mentions that he often stays late in the library to work on his projects, which implies dedication.",
"verdict": 1
},
{
"statement": "John has a part-time job.",
"reason": "There is no information given in the context about John having a part-time job.",
"verdict": 0
}
]
}
Example 2
Input: {
"context": "Photosynthesis is a process used by plants, algae, and certain bacteria to convert light energy into chemical energy.",
"statements": [
"Albert Einstein was a genius."
]
}
Output: {
"statements": [
{
"statement": "Albert Einstein was a genius.",
"reason": "The context and statement are unrelated",
"verdict": 0
}
]
}
-----------------------------
Now perform the same with the following input
Input: (None)
Output:
In short, the instruction asks the model to judge the faithfulness of a series of statements against the given context, returning a verdict of 1 if a statement can be directly inferred from the context and 0 if it cannot.
At call time you supply statements (a list of strings) and context (a string). The context is essentially the retrieval results; if multiple results were retrieved, they need to be joined into a single string, as sketched below.
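A minimal sketch of preparing that input (the context strings here are made up; NLIStatementInput takes a single string, so multiple retrieval results are joined first):
# a sketch: join several retrieved contexts into the single context string
# that NLIStatementInput expects.
from ragas.metrics._faithfulness import NLIStatementInput, NLIStatementPrompt

retrieved_contexts = [
    "Mount Fuji is the highest mountain in Japan.",
    "Mount Everest is the highest mountain in the world.",
]
data = NLIStatementInput(
    context="\n".join(retrieved_contexts),
    statements=["The highest mountain in Japan is Mount Fuji."],
)
print(NLIStatementPrompt().to_string(data))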
They are embedded into the prompt roughly like this:
Input: {
"context": CONTEXT,
"statements": [
STATEMENT_1,
STATEMENT_2,
...
]
}
The output is a list of StatementFaithfulnessAnswer objects, shown below.
Each contains the statement, a reason, and a verdict. Since the reason is generated before the verdict, this doubles as a light form of chain of thought.
class StatementFaithfulnessAnswer(BaseModel):
statement: str = Field(..., description="the original statement, word-by-word")
reason: str = Field(..., description="the reason of the verdict")
verdict: int = Field(..., description="the verdict(0/1) of the faithfulness.")
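As a rough sketch of what happens downstream (the gist of the aggregation, not RAGAS' literal code), the raw JSON reply can be parsed with the output model, and the score is the fraction of statements judged as supported:
# a sketch (not RAGAS' literal code): parse the reply with the output
# model, then score = fraction of statements with verdict == 1.
from ragas.metrics._faithfulness import NLIStatementOutput

raw = (
    '{"statements": ['
    '{"statement": "s1", "reason": "supported", "verdict": 1},'
    '{"statement": "s2", "reason": "not supported", "verdict": 0}]}'
)
parsed = NLIStatementOutput.model_validate_json(raw)  # Pydantic v2 API
score = sum(s.verdict for s in parsed.statements) / len(parsed.statements)
print(score)  # 0.5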
As a test, let's give the playground the following input (a Japanese context saying that Mount Fuji is the highest mountain in Japan and Everest the highest in the world, with two Japanese statements to judge).
Now perform the same with the following input
Input: {
    "context": "富士山は日本一高い\nエベレストは世界一高い",
    "statements": [
        "日本一の山は富士山です",
        "中国1の山はエベレストです"
    ]
}
Output:
{
"statements": [
{
"statement": "日本一の山は富士山です",
"reason": "The context states that 富士山 (Mount Fuji) is the tallest in Japan (日本一高い), which directly supports this statement.",
"verdict": 1
},
{
"statement": "中国1の山はエベレストです",
"reason": "The context mentions エベレスト (Mount Everest) as the tallest in the world (世界一高い) but does not specify its relation to China or that it is the tallest in China. Therefore, this cannot be directly inferred.",
"verdict": 0
}
]
}
answer relevancy
answer relevancy literally means "relevance of the answer", but looking at the source code, it is really a metric for judging whether the answer is noncommittal, as in "I don't know".
The relevant code is below. The metric and class names waver between answer relevance, answer relevancy, and response relevance, which hints at some lingering indecision.
class ResponseRelevanceOutput(BaseModel):
question: str
noncommittal: int
class ResponseRelevanceInput(BaseModel):
response: str
class ResponseRelevancePrompt(
PydanticPrompt[ResponseRelevanceInput, ResponseRelevanceOutput]
):
instruction = """Generate a question for the given answer and Identify if answer is noncommittal. Give noncommittal as 1 if the answer is noncommittal and 0 if the answer is committal. A noncommittal answer is one that is evasive, vague, or ambiguous. For example, "I don't know" or "I'm not sure" are noncommittal answers"""
input_model = ResponseRelevanceInput
output_model = ResponseRelevanceOutput
examples = [
(
ResponseRelevanceInput(
response="""Albert Einstein was born in Germany.""",
),
ResponseRelevanceOutput(
question="Where was Albert Einstein born?",
noncommittal=0,
),
),
(
ResponseRelevanceInput(
response="""I don't know about the groundbreaking feature of the smartphone invented in 2023 as am unaware of information beyond 2022. """,
),
ResponseRelevanceOutput(
question="What was the groundbreaking feature of the smartphone invented in 2023?",
noncommittal=1,
),
),
]
Let's call to_string().
from ragas.metrics._answer_relevance import ResponseRelevancePrompt
prompt = ResponseRelevancePrompt()
print(prompt.to_string())
Generate a question for the given answer and Identify if answer is noncommittal. Give noncommittal as 1 if the answer is noncommittal and 0 if the answer is committal. A noncommittal answer is one that is evasive, vague, or ambiguous. For example, "I don't know" or "I'm not sure" are noncommittal answers
Please return the output in a JSON format that complies with the following schema as specified in JSON Schema:
{"properties": {"question": {"title": "Question", "type": "string"}, "noncommittal": {"title": "Noncommittal", "type": "integer"}}, "required": ["question", "noncommittal"], "title": "ResponseRelevanceOutput", "type": "object"}Do not use single quotes in your response but double quotes,properly escaped with a backslash.
--------EXAMPLES-----------
Example 1
Input: {
"response": "Albert Einstein was born in Germany."
}
Output: {
"question": "Where was Albert Einstein born?",
"noncommittal": 0
}
Example 2
Input: {
"response": "I don't know about the groundbreaking feature of the smartphone invented in 2023 as am unaware of information beyond 2022. "
}
Output: {
"question": "What was the groundbreaking feature of the smartphone invented in 2023?",
"noncommittal": 1
}
-----------------------------
Now perform the same with the following input
Input: (None)
Output:
In short, the instruction asks the model to generate a question for the given answer and to flag whether the answer is noncommittal (evasive, vague, or ambiguous, e.g. "I don't know"): 1 if noncommittal, 0 if committal.
Looking at the prompt alone, it is not obvious why a question needs to be generated at all (given the explanation "A noncommittal answer is one that is evasive, vague, or ambiguous. For example, "I don't know" or "I'm not sure" are noncommittal answers", the model could simply look at the response and output 0/1 directly). In fact, this question is generated several times per answer, each generation is compared to the true question via cosine similarity, and the similarities are averaged into the final score.
class ResponseRelevancy(MetricWithLLM, MetricWithEmbeddings, SingleTurnMetric):
...
def _calculate_score(
self, answers: t.Sequence[ResponseRelevanceOutput], row: t.Dict
) -> float:
question = row["user_input"]
gen_questions = [answer.question for answer in answers]
all_noncommittal = np.all([answer.noncommittal for answer in answers])
if all(q == "" for q in gen_questions):
logger.warning(
"Invalid JSON response. Expected dictionary with key 'question'"
)
score = np.nan
else:
cosine_sim = self.calculate_similarity(question, gen_questions)
score = cosine_sim.mean() * int(not all_noncommittal)
return score
The intuition is that the vaguer the answer, the more diverse the generated questions, and hence the lower their similarity to the true question; the metric thus captures the answer's ambiguity. However, for an answer like "I don't know whether Mount Fuji is the highest mountain in Japan", the generated questions would be nearly constant ("Is Mount Fuji the highest mountain in Japan?"), so the similarity to the true question would still be high. This is why, when every noncommittal flag emitted alongside the generated questions marks the answer as noncommittal, the score is forced to 0 regardless of the cosine similarity (the score = cosine_sim.mean() * int(not all_noncommittal) part). Personally I suspect that judging committal/noncommittal once over the whole answer would be more stable than judging it on every question generation, but that would mean maintaining two prompts, so perhaps RAGAS decided it was not worth the implementation hassle.
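A toy numeric walk-through of the scoring line above, with made-up cosine similarities for three generated questions:
import numpy as np

cosine_sim = np.array([0.91, 0.88, 0.90])  # similarity to the true question
noncommittal_flags = [0, 0, 1]             # per-generation noncommittal flags
all_noncommittal = np.all(noncommittal_flags)  # False: not every flag is set
score = cosine_sim.mean() * int(not all_noncommittal)
print(score)  # ~0.897; it would be 0.0 if every flag were 1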
answer correctness
answer correctness measures how closely the answer matches the ground truth.
The relevant code is below.
The class name CorrectnessClassifier is not especially intuitive, but it presumably comes from the class's role of classifying the answer and ground-truth statements into TP, FP, and FN.
class QuestionAnswerGroundTruth(BaseModel):
question: str
answer: list[str]
ground_truth: list[str]
class StatementsWithReason(BaseModel):
statement: str
reason: str
class ClassificationWithReason(BaseModel):
TP: list[StatementsWithReason]
FP: list[StatementsWithReason]
FN: list[StatementsWithReason]
class CorrectnessClassifier(
PydanticPrompt[QuestionAnswerGroundTruth, ClassificationWithReason]
):
instruction = "Given a ground truth and an answer statements, analyze each statement and classify them in one of the following categories: TP (true positive): statements that are present in answer that are also directly supported by the one or more statements in ground truth, FP (false positive): statements present in the answer but not directly supported by any statement in ground truth, FN (false negative): statements found in the ground truth but not present in answer. Each statement can only belong to one of the categories. Provide a reason for each classification."
input_model = QuestionAnswerGroundTruth
output_model = ClassificationWithReason
examples = [
(
QuestionAnswerGroundTruth(
question="What powers the sun and what is its primary function?",
answer=[
"The sun is powered by nuclear fission, similar to nuclear reactors on Earth.",
"The primary function of the sun is to provide light to the solar system.",
],
ground_truth=[
"The sun is powered by nuclear fusion, where hydrogen atoms fuse to form helium.",
"This fusion process in the sun's core releases a tremendous amount of energy.",
"The energy from the sun provides heat and light, which are essential for life on Earth.",
"The sun's light plays a critical role in Earth's climate system.",
"Sunlight helps to drive the weather and ocean currents.",
],
),
ClassificationWithReason(
TP=[
StatementsWithReason(
statement="The primary function of the sun is to provide light to the solar system.",
reason="This statement is somewhat supported by the ground truth mentioning the sun providing light and its roles, though it focuses more broadly on the sun's energy.",
)
],
FP=[
StatementsWithReason(
statement="The sun is powered by nuclear fission, similar to nuclear reactors on Earth.",
reason="This statement is incorrect and contradicts the ground truth which states that the sun is powered by nuclear fusion.",
)
],
FN=[
StatementsWithReason(
statement="The sun is powered by nuclear fusion, where hydrogen atoms fuse to form helium.",
reason="This accurate description of the sun’s power source is not included in the answer.",
),
StatementsWithReason(
statement="This fusion process in the sun's core releases a tremendous amount of energy.",
reason="This process and its significance are not mentioned in the answer.",
),
StatementsWithReason(
statement="The energy from the sun provides heat and light, which are essential for life on Earth.",
reason="The answer only mentions light, omitting the essential aspects of heat and its necessity for life, which the ground truth covers.",
),
StatementsWithReason(
statement="The sun's light plays a critical role in Earth's climate system.",
reason="This broader impact of the sun’s light on Earth's climate system is not addressed in the answer.",
),
StatementsWithReason(
statement="Sunlight helps to drive the weather and ocean currents.",
reason="The effect of sunlight on weather patterns and ocean currents is omitted in the answer.",
),
],
),
),
(
QuestionAnswerGroundTruth(
question="What is the boiling point of water?",
answer=[
"The boiling point of water is 100 degrees Celsius at sea level"
],
ground_truth=[
"The boiling point of water is 100 degrees Celsius (212 degrees Fahrenheit) at sea level.",
"The boiling point of water can change with altitude.",
],
),
ClassificationWithReason(
TP=[
StatementsWithReason(
statement="The boiling point of water is 100 degrees Celsius at sea level",
reason="This statement is directly supported by the ground truth which specifies the boiling point of water as 100 degrees Celsius at sea level.",
)
],
FP=[],
FN=[
StatementsWithReason(
statement="The boiling point of water can change with altitude.",
reason="This additional information about how the boiling point of water can vary with altitude is not mentioned in the answer.",
)
],
),
),
]
Let's call to_string().
from ragas.metrics._answer_correctness import CorrectnessClassifier
prompt = CorrectnessClassifier()
print(prompt.to_string())
Given a ground truth and an answer statements, analyze each statement and classify them in one of the following categories: TP (true positive): statements that are present in answer that are also directly supported by the one or more statements in ground truth, FP (false positive): statements present in the answer but not directly supported by any statement in ground truth, FN (false negative): statements found in the ground truth but not present in answer. Each statement can only belong to one of the categories. Provide a reason for each classification.
Please return the output in a JSON format that complies with the following schema as specified in JSON Schema:
{"$defs": {"StatementsWithReason": {"properties": {"statement": {"title": "Statement", "type": "string"}, "reason": {"title": "Reason", "type": "string"}}, "required": ["statement", "reason"], "title": "StatementsWithReason", "type": "object"}}, "properties": {"TP": {"items": {"$ref": "#/$defs/StatementsWithReason"}, "title": "Tp", "type": "array"}, "FP": {"items": {"$ref": "#/$defs/StatementsWithReason"}, "title": "Fp", "type": "array"}, "FN": {"items": {"$ref": "#/$defs/StatementsWithReason"}, "title": "Fn", "type": "array"}}, "required": ["TP", "FP", "FN"], "title": "ClassificationWithReason", "type": "object"}Do not use single quotes in your response but double quotes,properly escaped with a backslash.
--------EXAMPLES-----------
Example 1
Input: {
"question": "What powers the sun and what is its primary function?",
"answer": [
"The sun is powered by nuclear fission, similar to nuclear reactors on Earth.",
"The primary function of the sun is to provide light to the solar system."
],
"ground_truth": [
"The sun is powered by nuclear fusion, where hydrogen atoms fuse to form helium.",
"This fusion process in the sun's core releases a tremendous amount of energy.",
"The energy from the sun provides heat and light, which are essential for life on Earth.",
"The sun's light plays a critical role in Earth's climate system.",
"Sunlight helps to drive the weather and ocean currents."
]
}
Output: {
"TP": [
{
"statement": "The primary function of the sun is to provide light to the solar system.",
"reason": "This statement is somewhat supported by the ground truth mentioning the sun providing light and its roles, though it focuses more broadly on the sun's energy."
}
],
"FP": [
{
"statement": "The sun is powered by nuclear fission, similar to nuclear reactors on Earth.",
"reason": "This statement is incorrect and contradicts the ground truth which states that the sun is powered by nuclear fusion."
}
],
"FN": [
{
"statement": "The sun is powered by nuclear fusion, where hydrogen atoms fuse to form helium.",
"reason": "This accurate description of the sun’s power source is not included in the answer."
},
{
"statement": "This fusion process in the sun's core releases a tremendous amount of energy.",
"reason": "This process and its significance are not mentioned in the answer."
},
{
"statement": "The energy from the sun provides heat and light, which are essential for life on Earth.",
"reason": "The answer only mentions light, omitting the essential aspects of heat and its necessity for life, which the ground truth covers."
},
{
"statement": "The sun's light plays a critical role in Earth's climate system.",
"reason": "This broader impact of the sun’s light on Earth's climate system is not addressed in the answer."
},
{
"statement": "Sunlight helps to drive the weather and ocean currents.",
"reason": "The effect of sunlight on weather patterns and ocean currents is omitted in the answer."
}
]
}
Example 2
Input: {
"question": "What is the boiling point of water?",
"answer": [
"The boiling point of water is 100 degrees Celsius at sea level"
],
"ground_truth": [
"The boiling point of water is 100 degrees Celsius (212 degrees Fahrenheit) at sea level.",
"The boiling point of water can change with altitude."
]
}
Output: {
"TP": [
{
"statement": "The boiling point of water is 100 degrees Celsius at sea level",
"reason": "This statement is directly supported by the ground truth which specifies the boiling point of water as 100 degrees Celsius at sea level."
}
],
"FP": [],
"FN": [
{
"statement": "The boiling point of water can change with altitude.",
"reason": "This additional information about how the boiling point of water can vary with altitude is not mentioned in the answer."
}
]
}
-----------------------------
Now perform the same with the following input
Input: (None)
Output:
In short, the instruction asks the model to classify each statement as TP (present in the answer and directly supported by one or more ground-truth statements), FP (present in the answer but not supported by any ground-truth statement), or FN (present in the ground truth but missing from the answer), with exactly one category per statement and a reason for each classification.
It is hard to take in on a first read, but what it does is:
- classify each answer statement as TP or FP by comparing it with the ground truth
- classify the ground-truth statements that do not appear in the answer as FN
In other words, it measures how close the answer as a whole is to the ground truth as a whole in terms of information content.
If you substitute the retrieval results for the ground truth, you get a metric somewhat similar to faithfulness; the only real difference is whether you care about redundancy in the ground truth (the retrieval results).
The inputs are question (string), answer (list of strings), and ground_truth (list of strings). They are embedded into the prompt in the following form:
"question": QUESTION,
"answer": [ANSWER_1, ANSWER_2, ...],
"ground_truth": [GROUNDTRUTH_1, GROUNDTRUTH_2, ...]
}
The output is a JSON string containing three lists: TP, FP, and FN.
Each element of TP, FP, and FN contains a statement and a reason.
These values are used downstream to compute the score, as sketched below.
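A sketch of that downstream computation (the gist, not RAGAS' literal code): the TP/FP/FN counts yield an F1-style factuality score, which is then blended with an embedding similarity between answer and ground truth; the weighting and the similarity value below are illustrative:
# a sketch: F1 over the classified statements, blended with a hypothetical
# semantic-similarity term (the 0.75/0.25 weights are illustrative).
tp, fp, fn = 1, 1, 5  # counts from Example 1 above
f1 = tp / (tp + 0.5 * (fp + fn))  # 1 / (1 + 3) = 0.25
semantic_similarity = 0.8  # hypothetical embedding similarity
score = 0.75 * f1 + 0.25 * semantic_similarity
print(score)  # 0.3875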
context precision
The relevant code is below.
class QAC(BaseModel):
question: str = Field(..., description="Question")
context: str = Field(..., description="Context")
answer: str = Field(..., description="Answer")
class Verification(BaseModel):
reason: str = Field(..., description="Reason for verification")
verdict: int = Field(..., description="Binary (0/1) verdict of verification")
class ContextPrecisionPrompt(PydanticPrompt[QAC, Verification]):
name: str = "context_precision"
instruction: str = 'Given question, answer and context verify if the context was useful in arriving at the given answer. Give verdict as "1" if useful and "0" if not with json output.'
input_model = QAC
output_model = Verification
examples = [
(
QAC(
question="What can you tell me about Albert Einstein?",
context="Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. Best known for developing the theory of relativity, he also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass–energy equivalence formula E = mc2, which arises from relativity theory, has been called 'the world's most famous equation'. He received the 1921 Nobel Prize in Physics 'for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect', a pivotal step in the development of quantum theory. His work is also known for its influence on the philosophy of science. In a 1999 poll of 130 leading physicists worldwide by the British journal Physics World, Einstein was ranked the greatest physicist of all time. His intellectual achievements and originality have made Einstein synonymous with genius.",
answer="Albert Einstein, born on 14 March 1879, was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. He received the 1921 Nobel Prize in Physics for his services to theoretical physics.",
),
Verification(
reason="The provided context was indeed useful in arriving at the given answer. The context includes key information about Albert Einstein's life and contributions, which are reflected in the answer.",
verdict=1,
),
),
(
QAC(
question="who won 2020 icc world cup?",
context="The 2022 ICC Men's T20 World Cup, held from October 16 to November 13, 2022, in Australia, was the eighth edition of the tournament. Originally scheduled for 2020, it was postponed due to the COVID-19 pandemic. England emerged victorious, defeating Pakistan by five wickets in the final to clinch their second ICC Men's T20 World Cup title.",
answer="England",
),
Verification(
reason="the context was useful in clarifying the situation regarding the 2020 ICC World Cup and indicating that England was the winner of the tournament that was intended to be held in 2020 but actually took place in 2022.",
verdict=1,
),
),
(
QAC(
question="What is the tallest mountain in the world?",
context="The Andes is the longest continental mountain range in the world, located in South America. It stretches across seven countries and features many of the highest peaks in the Western Hemisphere. The range is known for its diverse ecosystems, including the high-altitude Andean Plateau and the Amazon rainforest.",
answer="Mount Everest.",
),
Verification(
reason="the provided context discusses the Andes mountain range, which, while impressive, does not include Mount Everest or directly relate to the question about the world's tallest mountain.",
verdict=0,
),
),
]
Let's call to_string().
from ragas.metrics._context_precision import ContextPrecisionPrompt
prompt = ContextPrecisionPrompt()
print(prompt.to_string())
Given question, answer and context verify if the context was useful in arriving at the given answer. Give verdict as "1" if useful and "0" if not with json output.
Please return the output in a JSON format that complies with the following schema as specified in JSON Schema:
{"properties": {"reason": {"description": "Reason for verification", "title": "Reason", "type": "string"}, "verdict": {"description": "Binary (0/1) verdict of verification", "title": "Verdict", "type": "integer"}}, "required": ["reason", "verdict"], "title": "Verification", "type": "object"}Do not use single quotes in your response but double quotes,properly escaped with a backslash.
--------EXAMPLES-----------
Example 1
Input: {
"question": "What can you tell me about Albert Einstein?",
"context": "Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. Best known for developing the theory of relativity, he also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass–energy equivalence formula E = mc2, which arises from relativity theory, has been called 'the world's most famous equation'. He received the 1921 Nobel Prize in Physics 'for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect', a pivotal step in the development of quantum theory. His work is also known for its influence on the philosophy of science. In a 1999 poll of 130 leading physicists worldwide by the British journal Physics World, Einstein was ranked the greatest physicist of all time. His intellectual achievements and originality have made Einstein synonymous with genius.",
"answer": "Albert Einstein, born on 14 March 1879, was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. He received the 1921 Nobel Prize in Physics for his services to theoretical physics."
}
Output: {
"reason": "The provided context was indeed useful in arriving at the given answer. The context includes key information about Albert Einstein's life and contributions, which are reflected in the answer.",
"verdict": 1
}
Example 2
Input: {
"question": "who won 2020 icc world cup?",
"context": "The 2022 ICC Men's T20 World Cup, held from October 16 to November 13, 2022, in Australia, was the eighth edition of the tournament. Originally scheduled for 2020, it was postponed due to the COVID-19 pandemic. England emerged victorious, defeating Pakistan by five wickets in the final to clinch their second ICC Men's T20 World Cup title.",
"answer": "England"
}
Output: {
"reason": "the context was useful in clarifying the situation regarding the 2020 ICC World Cup and indicating that England was the winner of the tournament that was intended to be held in 2020 but actually took place in 2022.",
"verdict": 1
}
Example 3
Input: {
"question": "What is the tallest mountain in the world?",
"context": "The Andes is the longest continental mountain range in the world, located in South America. It stretches across seven countries and features many of the highest peaks in the Western Hemisphere. The range is known for its diverse ecosystems, including the high-altitude Andean Plateau and the Amazon rainforest.",
"answer": "Mount Everest."
}
Output: {
"reason": "the provided context discusses the Andes mountain range, which, while impressive, does not include Mount Everest or directly relate to the question about the world's tallest mountain.",
"verdict": 0
}
-----------------------------
Now perform the same with the following input
Input: (None)
Output:
In short, the instruction asks: given the question, answer, and context, verify whether the context was useful in arriving at the given answer, and output a verdict of 1 if useful and 0 if not, as JSON.
At call time you supply question (string), context (string), and answer (string).
From the instruction alone this looks similar to faithfulness, but the class that uses this prompt feeds the retrieved_contexts (retrieval results) to context one at a time and aggregates the per-context verdicts, so what is computed amounts to precision (how many of the results predicted as relevant actually were), as sketched below.
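A sketch of that aggregation (the gist, not RAGAS' literal code): with one 0/1 verdict per retrieved context in rank order, the verdicts are combined as an average of precision@k over the positions judged useful:
# a sketch: average of precision@k over the contexts with verdict == 1.
verdicts = [1, 0, 1]  # hypothetical verdicts, one per retrieved context

precision_at_k = [
    sum(verdicts[: k + 1]) / (k + 1)  # precision among the top k+1 contexts
    for k in range(len(verdicts))
]
relevant = sum(verdicts)
score = (
    sum(p * v for p, v in zip(precision_at_k, verdicts)) / relevant
    if relevant
    else float("nan")
)
print(score)  # (1.0 + 2/3) / 2 = 0.8333...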
context recall
The relevant code is below.
class QCA(BaseModel):
question: str
context: str
answer: str
class ContextRecallClassification(BaseModel):
statement: str
reason: str
attributed: int
class ContextRecallClassifications(BaseModel):
classifications: t.List[ContextRecallClassification]
class ContextRecallClassificationPrompt(
PydanticPrompt[QCA, ContextRecallClassifications]
):
name: str = "context_recall_classification"
instruction: str = "Given a context, and an answer, analyze each sentence in the answer and classify if the sentence can be attributed to the given context or not. Use only 'Yes' (1) or 'No' (0) as a binary classification. Output json with reason."
input_model = QCA
output_model = ContextRecallClassifications
examples = [
(
QCA(
question="What can you tell me about albert Albert Einstein?",
context="Albert Einstein (14 March 1879 - 18 April 1955) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. Best known for developing the theory of relativity, he also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass-energy equivalence formula E = mc2, which arises from relativity theory, has been called 'the world's most famous equation'. He received the 1921 Nobel Prize in Physics 'for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect', a pivotal step in the development of quantum theory. His work is also known for its influence on the philosophy of science. In a 1999 poll of 130 leading physicists worldwide by the British journal Physics World, Einstein was ranked the greatest physicist of all time. His intellectual achievements and originality have made Einstein synonymous with genius.",
answer="Albert Einstein, born on 14 March 1879, was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. He received the 1921 Nobel Prize in Physics for his services to theoretical physics. He published 4 papers in 1905. Einstein moved to Switzerland in 1895.",
),
ContextRecallClassifications(
classifications=[
ContextRecallClassification(
statement="Albert Einstein, born on 14 March 1879, was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time.",
reason="The date of birth of Einstein is mentioned clearly in the context.",
attributed=1,
),
ContextRecallClassification(
statement="He received the 1921 Nobel Prize in Physics for his services to theoretical physics.",
reason="The exact sentence is present in the given context.",
attributed=1,
),
ContextRecallClassification(
statement="He published 4 papers in 1905.",
reason="There is no mention about papers he wrote in the given context.",
attributed=0,
),
ContextRecallClassification(
statement="Einstein moved to Switzerland in 1895.",
reason="There is no supporting evidence for this in the given context.",
attributed=0,
),
]
),
),
]
Let's call to_string().
from ragas.metrics._context_recall import ContextRecallClassificationPrompt
prompt = ContextRecallClassificationPrompt()
print(prompt.to_string())
Given a context, and an answer, analyze each sentence in the answer and classify if the sentence can be attributed to the given context or not. Use only 'Yes' (1) or 'No' (0) as a binary classification. Output json with reason.
Please return the output in a JSON format that complies with the following schema as specified in JSON Schema:
{"$defs": {"ContextRecallClassification": {"properties": {"statement": {"title": "Statement", "type": "string"}, "reason": {"title": "Reason", "type": "string"}, "attributed": {"title": "Attributed", "type": "integer"}}, "required": ["statement", "reason", "attributed"], "title": "ContextRecallClassification", "type": "object"}}, "properties": {"classifications": {"items": {"$ref": "#/$defs/ContextRecallClassification"}, "title": "Classifications", "type": "array"}}, "required": ["classifications"], "title": "ContextRecallClassifications", "type": "object"}Do not use single quotes in your response but double quotes,properly escaped with a backslash.
--------EXAMPLES-----------
Example 1
Input: {
"question": "What can you tell me about albert Albert Einstein?",
"context": "Albert Einstein (14 March 1879 - 18 April 1955) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. Best known for developing the theory of relativity, he also made important contributions to quantum mechanics, and was thus a central figure in the revolutionary reshaping of the scientific understanding of nature that modern physics accomplished in the first decades of the twentieth century. His mass-energy equivalence formula E = mc2, which arises from relativity theory, has been called 'the world's most famous equation'. He received the 1921 Nobel Prize in Physics 'for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect', a pivotal step in the development of quantum theory. His work is also known for its influence on the philosophy of science. In a 1999 poll of 130 leading physicists worldwide by the British journal Physics World, Einstein was ranked the greatest physicist of all time. His intellectual achievements and originality have made Einstein synonymous with genius.",
"answer": "Albert Einstein, born on 14 March 1879, was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time. He received the 1921 Nobel Prize in Physics for his services to theoretical physics. He published 4 papers in 1905. Einstein moved to Switzerland in 1895."
}
Output: {
"classifications": [
{
"statement": "Albert Einstein, born on 14 March 1879, was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time.",
"reason": "The date of birth of Einstein is mentioned clearly in the context.",
"attributed": 1
},
{
"statement": "He received the 1921 Nobel Prize in Physics for his services to theoretical physics.",
"reason": "The exact sentence is present in the given context.",
"attributed": 1
},
{
"statement": "He published 4 papers in 1905.",
"reason": "There is no mention about papers he wrote in the given context.",
"attributed": 0
},
{
"statement": "Einstein moved to Switzerland in 1895.",
"reason": "There is no supporting evidence for this in the given context.",
"attributed": 0
}
]
}
-----------------------------
Now perform the same with the following input
Input: (None)
Output:
In short, the instruction asks the model to take a context and an answer, analyze each sentence in the answer, and classify whether the sentence can be attributed to the given context, using only a binary Yes (1) / No (0), with the reason output as JSON.
Although the instruction mentions only the context and the answer, the question is in fact also supplied and factored into the evaluation. Unlike context_precision, when multiple retrieval results are available they are joined as appropriate and passed in together as a single context.
It resembles faithfulness. If pressed, the differences are roughly:
- faithfulness runs in two stages (statement generation and statement verification), while context_recall runs in one
- faithfulness is meant to evaluate the generated answer, while context_recall is meant to evaluate the retrieval results (which shows up as subtle differences between the prompts)
but I would say they are almost the same.
The output is a list called classifications, whose elements contain statement (string), reason (string), and attributed (0/1).
Downstream, the score is the mean of the attributed flags: the fraction of the answer's sentences that can be attributed to the context, as sketched below.
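A sketch of that aggregation, using the flags from Example 1 above:
attributed = [1, 1, 0, 0]  # the four attributed flags from Example 1
score = sum(attributed) / len(attributed)
print(score)  # 0.5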
Strictly speaking, recall means "of the items that are actually positive, the fraction predicted positive", so this metric is not a recall in the strict sense. It is all the more confusing because context_precision does behave like a precision. To be fair, with only the retrieval results as input, the total number of "actually positive" items is unknowable, so this is somewhat unavoidable; still, perhaps the name recall should have been avoided. If you force an interpretation, maybe the assumption is that the search corpus contains exactly one correct passage for each sentence of the answer...
Closing
That concludes our concrete look at the prompts behind the main metrics implemented in RAGAS.
Once you know that they are all built on PydanticPrompt, it should be easy to picture the prompts for the other metrics as well.