[Dify × Azure] Building an application that takes a URL as input and generates answers for a math workbook

Last updated at 2024-09-26, posted at 2024-07-03

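Introduction

In this article, Dify was used to build an application that generates answers for a math workbook.
For details on Dify, see the following.

The setup is described below.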

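Setup

The following math workbook was used.

The application runs in two steps:
① Run a Python script that uses Azure Document Intelligence to read the text from the image at a specified URL
② Copy the text from the resulting text file and paste it into the Dify chatbot

The code for step ① is explained first.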

# -*- coding: utf-8 -*-
import sys
import io

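# Re-wrap stdout/stderr as UTF-8 so non-ASCII text prints regardless of the console's default encoding
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding='utf-8')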

import os
from dotenv import load_dotenv
import re
from azure.core.credentials import AzureKeyCredential
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

load_dotenv()

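def extract_quoted_text(text):
    # Return the substrings enclosed in single quotes, dropping any that start with '[' or end with ']'
    pattern = r"'(.*?)'"
    matches = re.findall(pattern, text)
    return [match for match in matches if not match.startswith('[') and not match.endswith(']')]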

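def analyze_read(url):
    # Read the endpoint and key from environment variables (loaded from .env above)
    endpoint = os.getenv("DOCUMENTINTELLIGENCE_ENDPOINT")
    key = os.getenv("DOCUMENTINTELLIGENCE_API_KEY")

    document_intelligence_client = DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))

    # Start the prebuilt read (OCR) model on the image at the given URL
    poller = document_intelligence_client.begin_analyze_document(
        "prebuilt-read",
        AnalyzeDocumentRequest(url_source=url)
    )

    # Wait for the analysis to finish and return the result
    return poller.result()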

def write_utf8(file, text):
    file.write(text.encode('utf-8').decode('utf-8'))

def extract_text_content(result, output_file):
    with open(output_file, 'w', encoding='utf-8') as outfile:
        for page in result.pages:
            for line in page.lines:
                outfile.write(line.content + '\n')
            outfile.write('\n')  # insert a blank line between pages

def main():
    print("Enter the image URL:")
    url = input().strip()

    try:
        result = analyze_read(url)
        
        # Save the results in Markdown format
        with open("analysis_result.md", "w", encoding="utf-8-sig") as f:
            f.write("# Document analysis results\n\n")

            for page_num, page in enumerate(result.pages, start=1):
                f.write(f"## Page {page_num}\n\n")

                for line_num, line in enumerate(page.lines, start=1):
                    f.write(f"### Line {line_num}\n\n")
                    f.write(f"Text: `{line.content}`\n\n")
                    f.write(f"Word count: {len(line.content.split())}\n\n")
                    f.write(f"Position: {line.polygon}\n\n")

                    if hasattr(line, 'words'):
                        f.write("#### Words\n\n")
                        for word in line.words:
                            f.write(f"- `{word.content}` (confidence: {word.confidence:.2f})\n")
                    else:
                        f.write("#### Words\n\n")
                        for word in line.content.split():
                            f.write(f"- `{word}` (no confidence information)\n")

                    f.write("\n")
                
                # Collect the quoted words found anywhere on the page
                extracted_words = []
                for line in page.lines:
                    extracted_words.extend(extract_quoted_text(line.content))

                # Write this section only when some words were extracted
                if extracted_words:
                    f.write("### Extracted words\n\n")
                    for word in extracted_words:
                        f.write(f"- {word}\n")
                    f.write("\n")

                f.write("---\n\n")

            print("Results saved to analysis_result.md.")

        # Extract the text content and save it to a new file
        output_file = 'extracted_text.txt'
        extract_text_content(result, output_file)

        # Check the extracted text
        with open(output_file, 'r', encoding='utf-8') as f:
            content = f.read()
            if content:
                print(f"Text content saved to {output_file}.")
                print("Excerpt of the extracted text:")
                print(content[:1000] + "..." if len(content) > 1000 else content)
            else:
                print(f"Warning: {output_file} is empty.")

    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    main()

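main function:
Prompts the user for the URL of an image
Analyzes the image with the analyze_read function and saves the analysis results in Markdown format to analysis_result.md (*not used in this article)
Records the information for each page, line, and word in a structured form
Saves the extracted text to extracted_text.txt
Prints the beginning of the saved text (up to 1,000 characters)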
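
The page and line traversal that produces extracted_text.txt can be tried without calling Azure. The following is a minimal sketch: `pages_to_text` is a hypothetical helper that mirrors the loop in extract_text_content but returns a string, and `fake_result` is a made-up stand-in that only imitates the `pages`, `lines`, and `content` attributes used above.

```python
from types import SimpleNamespace


def pages_to_text(result):
    """Join every line of every page, adding a blank line after each page."""
    chunks = []
    for page in result.pages:
        for line in page.lines:
            chunks.append(line.content + "\n")
        chunks.append("\n")  # blank line between pages
    return "".join(chunks)


# Made-up stand-in for an analysis result with two pages (not real OCR output)
fake_result = SimpleNamespace(pages=[
    SimpleNamespace(lines=[
        SimpleNamespace(content="1. Solve x^2 = 4."),
        SimpleNamespace(content="2. Differentiate x^3."),
    ]),
    SimpleNamespace(lines=[
        SimpleNamespace(content="3. Integrate 2x."),
    ]),
])

text = pages_to_text(fake_result)
print(text)
```

Returning a string instead of writing a file makes it easy to inspect what would be pasted into the chatbot before any file is created.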

・Image of the workbook page

・Execution result in Anaconda Prompt
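
The script reads its endpoint and key with os.getenv, which returns None when a variable is not set. A small check before running may make such a misconfiguration easier to spot. This is only a sketch: `missing_settings` and `REQUIRED_VARS` are hypothetical names, and the example values are placeholders.

```python
import os

# The two variable names the script above reads with os.getenv
REQUIRED_VARS = ("DOCUMENTINTELLIGENCE_ENDPOINT", "DOCUMENTINTELLIGENCE_API_KEY")


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Placeholder environment with only the endpoint set
example_env = {"DOCUMENTINTELLIGENCE_ENDPOINT": "https://example.invalid/"}
print(missing_settings(example_env))
```

Calling `missing_settings()` with no argument checks the real process environment, so it could be run right after load_dotenv().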

Next, in step ②, the text in extracted_text.txt is copied and pasted into the Dify chatbot.
The chatbot was built with the Chatflow feature of "Chatbot".

The overall structure of the application in Dify is as follows.

The Start node takes as input the text written to extracted_text.txt.

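The LLM node solves the problems in the text received from the Start node.
Because TeX-formatted output is easier for users to read, the LLM is given the following instruction:

Rewrite the text using Markdown TeX notation, convert the math delimiters "\(" and "\)" to $$, and display it.
Then solve the problems.
Rewrite the answers using Markdown TeX notation as well, converting the math delimiters "\(" and "\)" to $$.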
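
In the application the delimiter conversion is left to the LLM through the prompt. For comparison, the same substitution can be sketched deterministically in Python. This is not part of the Dify app: `to_dollar_math` is a hypothetical helper, and it assumes each "\(" has a matching "\)" with no nesting.

```python
import re

# Matches \( ... \) lazily so each formula is converted on its own
_INLINE_MATH = re.compile(r"\\\((.+?)\\\)", re.DOTALL)


def to_dollar_math(text):
    """Replace \\( ... \\) delimiters with $$ ... $$, keeping the formula body as-is."""
    return _INLINE_MATH.sub(lambda m: "$$" + m.group(1) + "$$", text)


sample = r"Solve \(x^2 - 4 = 0\) and simplify \(\frac{2}{4}\)."
converted = to_dollar_math(sample)
print(converted)
```

A replacement function is used instead of a replacement string so that backslashes inside the formula (such as `\frac`) are copied through without being interpreted as escapes.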

The Answer node outputs the LLM's result as-is.

Results

When the text copied from extracted_text.txt is entered,

the chatbot solves the problems.

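Summary

In this article, Dify was used to build an application that generates answers for a math workbook.
As a next step, the plan is to build an application in which simply attaching a PDF of the math workbook extracts the text with Document Intelligence, feeds it to Dify, and has the LLM solve the problems.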
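
As one possible direction for that automation, the manual copy and paste step could be replaced by a call to the Dify app over HTTP. The sketch below is an assumption-laden illustration, not the author's implementation: it assumes the Chatflow app is published with API access, that the chat endpoint is `POST {base_url}/chat-messages` with the fields shown (as I understand Dify's chat application API), and that the Start node receives the text as the chat query. `build_chat_payload`, `send_to_dify`, the base URL, and the user ID are all hypothetical or placeholder values.

```python
import json
import urllib.request


def build_chat_payload(problem_text, user_id="local-user"):
    """Build the request body; field names are assumed from Dify's chat API."""
    return {
        "inputs": {},
        "query": problem_text,          # the text from extracted_text.txt
        "response_mode": "blocking",    # wait for the full answer
        "conversation_id": "",
        "user": user_id,                # placeholder identifier
    }


def send_to_dify(problem_text, api_key, base_url="https://api.dify.ai/v1"):
    """Send the text to the app and return the answer text (assumed 'answer' field)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/chat-messages",
        data=json.dumps(build_chat_payload(problem_text)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8")).get("answer", "")


# Only the payload is built here; no network request is made
payload = build_chat_payload("1. Solve x^2 = 4.")
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

With a real API key, `send_to_dify(open("extracted_text.txt", encoding="utf-8").read(), api_key)` would be the intended usage, subject to the assumptions above.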
