SUT‑XR: An External Framework for Evaluating and Improving AI Explanations


Semantic Understanding Theory – External Rating Model

Even when an AI is asked to “explain clearly,” the same problems tend to recur:

  • Explanations are overly long
  • They deviate from the intended meaning
  • They are redundant
  • The intended rationale is not conveyed

To address this, I developed SUT‑XR, an external evaluation framework for AI explanations.

This is not a method for improving the AI itself, but a framework for managing the quality of its explanations.


1. Why an “External Frame”?

Even when an AI is given extensive rules, problems remain:

  • Rules can break midway
  • The AI may mimic form without genuine understanding
  • Consistency can be lost

To address these limitations, we reverse the perspective:

Establish a layer outside the AI to evaluate its explanations.

Advantages include:

  • No additional computational burden on the AI
  • Human control over explanation quality
  • Ability to measure improvements via before/after comparisons

2. CISA: Evaluating Explanations Along Four Axes

An explanation can be represented as the following causal flow:

Context → Intent → Structure → Action

Each axis is scored from 0 to 1.

Context

Are the situation and assumptions clearly stated?

Intent

Is the purpose or rationale explicit?

Structure

Are concepts, causality, and flow well-organized?

Action

Are the steps concise, clear, and unambiguous?
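The four axes can be held in a small record type. The following is a minimal sketch, not part of the framework's definition: the class name CISAScores and its field names are hypothetical, and the only rule taken from the text is that each axis is scored from 0 to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CISAScores:
    """Scores for one explanation on the four CISA axes (hypothetical container)."""
    context: float
    intent: float
    structure: float
    action: float

    def __post_init__(self) -> None:
        # Each axis is scored from 0 to 1, so reject anything outside that range.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


# Example: an explanation with clear steps but a weakly stated purpose.
scores = CISAScores(context=0.8, intent=0.4, structure=0.7, action=0.9)
```

Rejecting out-of-range values at construction time keeps later weighted sums comparable across explanations.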


3. Failure Modes: Eight Categories of Explanation Failures

Explanation failures fall into eight categories:

Basic Four

  • Context_missing
  • Intent_missing
  • Structure_missing
  • Action_missing

Procedural Issue

  • Procedure_confusion

Qualitative Failures

  • Inconsistency (contradictions)
  • Redundancy
  • Misalignment (misfit with user expectations)

Each failure is assigned a severity: Critical or Minor.
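The eight categories and two severities map naturally onto enumerations. This is a sketch under assumptions: the article does not say how severities turn into a numeric penalty, so the values in ASSUMED_PENALTY (0.3 for Critical, 0.1 for Minor) and the additive failure_penalty function are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class FailureMode(Enum):
    # Basic Four
    CONTEXT_MISSING = "Context_missing"
    INTENT_MISSING = "Intent_missing"
    STRUCTURE_MISSING = "Structure_missing"
    ACTION_MISSING = "Action_missing"
    # Procedural Issue
    PROCEDURE_CONFUSION = "Procedure_confusion"
    # Qualitative Failures
    INCONSISTENCY = "Inconsistency"
    REDUNDANCY = "Redundancy"
    MISALIGNMENT = "Misalignment"


class Severity(Enum):
    CRITICAL = "Critical"
    MINOR = "Minor"


# Assumed penalty per severity; the article does not specify these numbers.
ASSUMED_PENALTY = {Severity.CRITICAL: 0.3, Severity.MINOR: 0.1}


@dataclass(frozen=True)
class Failure:
    mode: FailureMode
    severity: Severity


def failure_penalty(failures: list[Failure]) -> float:
    """Sum the assumed per-severity penalties over all detected failures."""
    return sum(ASSUMED_PENALTY[f.severity] for f in failures)


detected = [
    Failure(FailureMode.INTENT_MISSING, Severity.CRITICAL),
    Failure(FailureMode.REDUNDANCY, Severity.MINOR),
]
penalty = failure_penalty(detected)
```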


4. UserModel: Estimating the Type of User

Explanation effectiveness depends on user characteristics.

The framework estimates users along three dimensions:

  • KnowledgeLevel (Beginner → Expert)
  • GoalUrgency (Need to understand / Immediate solution / Fastest completion)
  • CognitiveStyle (Intuitive / Analytical)

CISA weights (wC, wI, wS, wA) are dynamically adjusted based on the UserModel.

Examples:

  • QuickTask → Action is prioritized
  • Learning → Structure is prioritized
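The two examples above can be expressed as a lookup from task type to CISA weights. This is a minimal sketch: the numeric weights, the equal-weight default, and the function name cisa_weights are all assumptions. The article only states which axis is prioritized, not by how much, and it does not say how the three UserModel dimensions map to a task type.

```python
# Assumed weight tables; only the prioritized axis comes from the article.
ASSUMED_WEIGHTS = {
    "QuickTask": {"C": 0.15, "I": 0.15, "S": 0.20, "A": 0.50},  # Action prioritized
    "Learning": {"C": 0.20, "I": 0.20, "S": 0.45, "A": 0.15},   # Structure prioritized
}
DEFAULT_WEIGHTS = {"C": 0.25, "I": 0.25, "S": 0.25, "A": 0.25}


def cisa_weights(task_type: str) -> dict[str, float]:
    """Return a copy of the (wC, wI, wS, wA) weights for a task type."""
    return dict(ASSUMED_WEIGHTS.get(task_type, DEFAULT_WEIGHTS))


quick = cisa_weights("QuickTask")
learning = cisa_weights("Learning")
```

Keeping each weight set summing to 1 is a design choice made here so that scores stay on a comparable scale across user types.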

5. Evidence: Estimating Understanding from User Reactions

User reactions during interaction are quantified:

  • ActionSuccess = successful steps / total steps
  • ErrorRate = mistakes / total steps
  • ClarificationDepth = depth of re-explanation requests
  • QuestionRate = questions / total conversation turns

These metrics are combined into a single value, Evidence_t, for interaction step t.
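One possible way to combine the four metrics is sketched below. The ratios follow the definitions above, but the combination rule is an assumption: the article does not define it. Here the three "bad" signals (ErrorRate, QuestionRate, ClarificationDepth) are inverted and all four terms are averaged with equal weight. InteractionLog, evidence, and the max_depth cap of 3 are hypothetical names and values.

```python
from dataclasses import dataclass


@dataclass
class InteractionLog:
    """Raw counts observed during one interaction (hypothetical record)."""
    successful_steps: int
    mistakes: int
    total_steps: int
    questions: int
    total_turns: int
    clarification_depth: int


def evidence(log: InteractionLog, max_depth: int = 3) -> float:
    """Combine the four reaction metrics into one value in [0, 1] (assumed rule)."""
    if log.total_steps <= 0 or log.total_turns <= 0:
        raise ValueError("total_steps and total_turns must be positive")
    action_success = log.successful_steps / log.total_steps
    error_rate = log.mistakes / log.total_steps
    question_rate = log.questions / log.total_turns
    # Normalize depth to [0, 1]; anything deeper than max_depth is capped.
    depth = min(log.clarification_depth, max_depth) / max_depth
    # Higher is better, so the three negative signals are inverted.
    return (action_success + (1 - error_rate) + (1 - question_rate) + (1 - depth)) / 4


log = InteractionLog(successful_steps=4, mistakes=1, total_steps=5,
                     questions=2, total_turns=10, clarification_depth=1)
evidence_t = evidence(log)
```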


6. UnderstandingScore: Overall Explanation Quality

Overall explanation quality is evaluated as follows:

UnderstandingScore =
  wC*C + wI*I + wS*S + wA*A
  - FailurePenalty
  - CognitiveCost

The weights wC, wI, wS, and wA are derived from the UserModel.
Relative changes (for example, the same explanation before and after revision) are more informative than absolute values.
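The formula translates directly into a short function. This is a sketch: the function name understanding_score and the dictionary keys are hypothetical, and FailurePenalty and CognitiveCost are passed in as plain numbers because the article does not define how they are computed. The sample values are made up for illustration.

```python
AXES = ("C", "I", "S", "A")


def understanding_score(
    scores: dict[str, float],
    weights: dict[str, float],
    failure_penalty: float = 0.0,
    cognitive_cost: float = 0.0,
) -> float:
    """wC*C + wI*I + wS*S + wA*A - FailurePenalty - CognitiveCost."""
    weighted = sum(weights[axis] * scores[axis] for axis in AXES)
    return weighted - failure_penalty - cognitive_cost


# Illustrative inputs only; none of these numbers come from the article.
scores = {"C": 0.8, "I": 0.6, "S": 0.7, "A": 0.9}
weights = {"C": 0.25, "I": 0.25, "S": 0.25, "A": 0.25}
result = understanding_score(scores, weights, failure_penalty=0.1, cognitive_cost=0.05)
```

With these inputs the weighted sum is 0.75 and the two deductions bring the result to roughly 0.60.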


7. Dynamic Adaptation (Feedback Loop)

Evidence is used to update the running estimate of the user’s understanding:

Understanding_t =
  α * Understanding_{t-1}
  + β * Evidence_t

  • QuickTask → β is higher (recent evidence dominates)
  • Learning → α is higher (the accumulated estimate dominates)

Parameters are adjusted according to task type.
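The update rule can be sketched as a single function applied once per interaction step. The α and β values below are assumptions chosen only to respect the ordering given above (β higher for QuickTask, α higher for Learning). Making them sum to 1 is also an assumption, one that keeps the estimate in [0, 1] when both inputs are.

```python
# Assumed (alpha, beta) pairs per task type; the article gives only the ordering.
ASSUMED_PARAMS = {
    "QuickTask": (0.3, 0.7),  # beta higher: react quickly to new evidence
    "Learning": (0.7, 0.3),   # alpha higher: change the estimate slowly
}


def update_understanding(previous: float, evidence: float, task_type: str) -> float:
    """Understanding_t = alpha * Understanding_{t-1} + beta * Evidence_t."""
    alpha, beta = ASSUMED_PARAMS[task_type]
    return alpha * previous + beta * evidence


# Apply the same evidence sequence under both task types.
evidence_sequence = [1.0, 1.0, 0.0]
quick = learning = 0.5
for e in evidence_sequence:
    quick = update_understanding(quick, e, "QuickTask")
    learning = update_understanding(learning, e, "Learning")
```

After the final low-evidence step, the QuickTask estimate drops much further than the Learning estimate, which is the behavior the two parameter settings are meant to produce.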


8. Positioning of this Theory

SUT‑XR is not an internal AI algorithm, but a layer for externally evaluating and improving AI explanations.

It sits at the intersection of:

  • Human–Computer Interaction (HCI)
  • Explainable AI
  • Interaction Design

9. Empirical Verification

The framework can be checked empirically in three steps:

  1. Prepare before and after versions of an explanation
  2. Score both versions using CISA and the failure modes
  3. Compare the resulting scores
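The three steps can be sketched as a before/after comparison. Everything numeric here is made up for illustration, and the helper names (understanding_score, compare) and the dictionary layout are hypothetical. In practice the CISA scores and penalties would come from a human or automated rater.

```python
AXES = ("C", "I", "S", "A")


def understanding_score(cisa, weights, failure_penalty=0.0, cognitive_cost=0.0):
    """Weighted CISA sum minus the two penalty terms."""
    weighted = sum(weights[a] * cisa[a] for a in AXES)
    return weighted - failure_penalty - cognitive_cost


def compare(before, after, weights):
    """Return (score_before, score_after, delta) for two scored explanations."""
    s_before = understanding_score(
        before["cisa"], weights, before["penalty"], before["cost"])
    s_after = understanding_score(
        after["cisa"], weights, after["penalty"], after["cost"])
    return s_before, s_after, s_after - s_before


# Step 1 and 2: two versions of one explanation, already scored (values invented).
weights = {"C": 0.25, "I": 0.25, "S": 0.25, "A": 0.25}
before = {"cisa": {"C": 0.4, "I": 0.5, "S": 0.5, "A": 0.6}, "penalty": 0.3, "cost": 0.1}
after = {"cisa": {"C": 0.8, "I": 0.7, "S": 0.8, "A": 0.9}, "penalty": 0.0, "cost": 0.05}

# Step 3: a positive delta indicates the revision improved the explanation.
s_before, s_after, delta = compare(before, after, weights)
```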

Summary

SUT‑XR is an external evaluation framework for AI explanations, enabling users to:

  • Measure explanation quality
  • Improve explanations
  • Compare before/after results

It is particularly useful for those who find AI explanations confusing or misaligned, providing a structured methodology for improvement.
