1. Overview



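2. Features and usage


  1. Speech transcription

    • While recognition is still in progress, the interim text is shown in a faint font (gray italics).
    • Once recognition is finalized, the text is committed and that input is complete.
  2. Real-time translation

    • Finalized text is translated immediately with GPT-4o-mini.
    • The prompt tells the model to return only the translation, which suppresses extraneous output.
  3. Real-time summarization

    • A summary is generated after every three translations.
    • Each summarization pass reuses the previous summary, so the content is built up cumulatively.
    • This makes long-winded conversations easier to summarize and keeps the content consistent.
    • Because the formatting is also reused on every pass, summary quality improves.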
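
The cumulative summarization described in item 3 can be sketched in Python as follows. This is a minimal illustration rather than the app's actual code: `RollingSummarizer`, `add_translation`, and `fake_summarize` are hypothetical names, and the injected `summarize` callable stands in for the GPT-4o-mini request so that the control flow runs without network access.

```python
from typing import Callable, List


class RollingSummarizer:
    """Hypothetical helper: refresh a cumulative summary every `every` translations."""

    def __init__(self, summarize: Callable[[str, str], str], every: int = 3) -> None:
        # `summarize(text, previous_summary)` stands in for the model call.
        self.summarize = summarize
        self.every = every
        self.translations: List[str] = []
        self.summary = ""

    def add_translation(self, line: str) -> bool:
        """Store one translated line; return True when the summary was refreshed."""
        self.translations.append(line)
        if len(self.translations) % self.every != 0:
            return False
        # Feed the previous summary back in so content and formatting carry over.
        self.summary = self.summarize("\n".join(self.translations), self.summary)
        return True


def fake_summarize(text: str, previous_summary: str) -> str:
    """Stand-in for the API call: reports how many lines it was given."""
    return f"{len(text.splitlines())} lines (previous: {previous_summary or 'none'})"
```

With `RollingSummarizer(fake_summarize)`, the first two calls to `add_translation` leave `summary` empty, and the third call triggers the first refresh. Passing the earlier summary back in is what lets the model keep the same headings and layout from one pass to the next.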

3. Code


  • app.py
  • templates/index.html

[ app.py ]

from flask import Flask, render_template, request, jsonify
import requests

app = Flask(__name__)

# Hard-coded API key (placeholder value; replace it with your own key)
API_KEY = "YOUR_OPENAI_API_KEY"

# Route to serve the frontend
@app.route('/')
def index():
    return render_template('index.html')

# Endpoint to process translations
@app.route('/translate', methods=['POST'])
def translate():
    data = request.get_json()
    message_history = data.get('messageHistory')
    api_url = "https://api.openai.com/v1/chat/completions"
    # The Japanese prompt below reads: "You are a subtitle translation tool for ServiceNow
    # meetings. Turn the following speech-recognition text into vivid, easy-to-read Japanese.
    # The result is displayed in the system, so remove everything except the result.
    # Do not output the 」「 bracket characters."
    prompt = "あなたはServiceNowの会議での字幕翻訳ツールです。次の音声文字認識テキストを臨場感あふれるかつ読みやすい日本語にしてください。結果をシステムに表示するため結果以外の文字は必ず削除してください。」「の記号は出力禁止。"
    payload = {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": prompt},
            # Assumption: the history sent by the client follows the system prompt
            *(message_history or [])
        ],
        "max_tokens": 1000
    }

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}"
    }

    try:
        response = requests.post(api_url, headers=headers, json=payload)
        response.raise_for_status()  # Turn HTTP error statuses into RequestException
        gpt_response = response.json()
        translated_text = gpt_response['choices'][0]['message']['content']
        return jsonify({'translation': translated_text})
    except requests.exceptions.RequestException as e:
        return jsonify({'error': str(e)}), 500

# Enhanced summarization logic combining previous translations and summary
@app.route('/summarize', methods=['POST'])
def summarize():
    data = request.get_json()
    text = data.get('text')  # Translations field content
    previous_summary = data.get('previousSummary')  # Summary field content
    api_url = "https://api.openai.com/v1/chat/completions"
    # Combined prompt for creating a new summary
    # The Japanese instruction reads: "Please create easy-to-read meeting minutes. Rich text:"
    prompt = f"読みやすい議事録を作成してください。リッチテキスト:\n{text}\n\nPrevious Summary:\n{previous_summary}"
    payload = {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": "Summarize the content below"},
            {"role": "user", "content": prompt}
        ],
        "max_tokens": 3000
    }

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}"
    }

    try:
        response = requests.post(api_url, headers=headers, json=payload)
        response.raise_for_status()  # Turn HTTP error statuses into RequestException
        gpt_response = response.json()
        summary_text = gpt_response['choices'][0]['message']['content'].strip()
        return jsonify({'summary': summary_text})
    except requests.exceptions.RequestException as e:
        return jsonify({'error': str(e)}), 500

# Run Flask app with debug mode enabled
if __name__ == '__main__':
    app.run(debug=True)
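
Both handlers above read `gpt_response['choices'][0]['message']['content']` directly. If the API answers with an error object instead of a completion, that lookup raises a `KeyError`, which the `RequestException` handler does not catch. A more defensive lookup could look like the sketch below. `extract_content` is a hypothetical helper, and the `{"error": {"message": ...}}` shape is an assumption about the error body.

```python
from typing import Any, Dict, Optional, Tuple


def extract_content(gpt_response: Dict[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Return (content, error); exactly one of the two is not None."""
    # Assumed error shape: {"error": {"message": "..."}}
    error = gpt_response.get("error")
    if error:
        message = error.get("message") if isinstance(error, dict) else str(error)
        return None, message or "unknown API error"
    try:
        content = gpt_response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return None, "unexpected response format"
    if not isinstance(content, str):
        return None, "empty response"
    return content.strip(), None
```

A handler could call `text, err = extract_content(response.json())` and return the JSON error branch whenever `err` is set, so the frontend's existing `data.error` path displays the message.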

[ templates/index.html ]

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Speech Recognition with GPT-4o-mini</title>
    <style>
        body { font-family: Arial, sans-serif; margin: 0; padding: 20px; display: flex; justify-content: center; height: 100vh; box-sizing: border-box; }
        .container { display: flex; flex-direction: column; width: 100%; max-width: 1600px; height: 100%; }
        .column-wrapper { display: flex; justify-content: space-between; gap: 20px; flex-grow: 1; height: 100%; }
        .column { flex: 1; padding: 10px; display: flex; flex-direction: column; height: 100%; }
        h2 { margin-top: 0; text-align: center; font-size: 1.5em; }
        #output, #gptResponse, #summary {
            flex-grow: 1;
            border: 1px solid #ccc;
            padding: 10px;
            margin-top: 10px;
            overflow-y: auto;  /* Scrollable fields */
            white-space: pre-wrap;
            height: 100%; /* Adjust height for scroll */
            font-size: 1em;  /* Adjust font size for better readability */
            line-height: 1.5em;  /* Adjust line height for better spacing */
        }
        .button-group { text-align: center; margin-top: 10px; display: flex; justify-content: center; gap: 10px; }
        button { padding: 10px 20px; background-color: #4CAF50; color: white; border: none; cursor: pointer; margin: 0 5px; }
        button:hover { background-color: #45a049; }
        button:disabled { background-color: #cccccc; cursor: not-allowed; }
        select { width: 100%; padding: 5px; margin-top: 5px; }
        .interim { color: gray; font-style: italic; }
        .error { color: red; font-weight: bold; }
        #summary {
            overflow-y: auto;
            height: 100%;
            white-space: normal;  /* Let the rendered Markdown wrap normally */
            background-color: #f8f9fa;
            padding: 10px;
            word-wrap: break-word;  /* Ensure long words are wrapped */
        }
        .full-width { width: 100%; text-align: center; margin-bottom: 10px; }
    </style>
    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>  <!-- marked.js CDN -->
</head>
<body>
    <div class="container">
        <div class="full-width">
            <select id="languageSelect">
                <option value="en-US">English</option>
            </select>
        </div>
        <div class="button-group">
            <button id="startButton">Start</button>
            <button id="stopButton" disabled>Stop</button>
            <button id="clearButton">Clear</button>
        </div>
        <div class="column-wrapper">
            <div class="column column-left">
                <h2>Speech Recognition</h2>
                <div id="output"></div>
            </div>
            <div class="column">
                <h2>GPT-4o-mini Translation</h2>
                <div id="gptResponse"></div>
            </div>
            <div class="column">
                <div id="summary"></div> <!-- Markdown will be rendered here -->
            </div>
        </div>
    </div>

    <script>
        const startButton = document.getElementById('startButton');
        const stopButton = document.getElementById('stopButton');
        const clearButton = document.getElementById('clearButton');
        const output = document.getElementById('output');
        const gptResponse = document.getElementById('gptResponse');
        const summary = document.getElementById('summary');
        const languageSelect = document.getElementById('languageSelect');

        let recognition;
        let finalTranscript = '';
        let messageHistory = [];
        let interimTranscript = '';
        let translationCount = 0;
        let accumulatedSummary = '';  // Stores the cumulative summary

        function startRecognition() {
            recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
            recognition.lang = languageSelect.value;
            recognition.interimResults = true;
            recognition.continuous = true;

            recognition.onresult = (event) => {
                interimTranscript = '';  // Reset interim transcript

                for (let i = event.resultIndex; i < event.results.length; i++) {
                    if (event.results[i].isFinal) {
                        finalTranscript += event.results[i][0].transcript + ' ';
                        updateOutput(finalTranscript, '');
                        processText(event.results[i][0].transcript);  // Only send finalized text
                    } else {
                        interimTranscript += event.results[i][0].transcript;
                    }
                }

                updateOutput(finalTranscript, interimTranscript);  // Update both final and interim texts
            };

            recognition.onerror = (event) => {
                console.error("Error: ", event.error);
            };

            recognition.start();
            startButton.disabled = true;
            stopButton.disabled = false;
        }

        function stopRecognition() {
            if (recognition) {
                recognition.stop();
                startButton.disabled = false;
                stopButton.disabled = true;
            }
        }

        function clearOutput() {
            output.textContent = '';
            gptResponse.textContent = '';
            summary.innerHTML = '';  // Clear rich text content
            finalTranscript = '';
            messageHistory = [];
            accumulatedSummary = '';  // Clear accumulated summary
            translationCount = 0;     // Reset translation count
        }

        async function processText(text) {
            messageHistory.push({ role: 'user', content: text });

            try {
                const response = await fetch('/translate', {
                    method: 'POST',
                    headers: { 'Content-Type': 'application/json' },
                    body: JSON.stringify({ messageHistory })
                });

                const data = await response.json();
                if (data.translation) {
                    gptResponse.innerHTML += `${data.translation}\n`;
                    messageHistory.push({ role: 'assistant', content: data.translation });

                    translationCount++;  // Count successful translations
                    if (translationCount % 3 === 0) {
                        await updateSummary();  // Refresh the summary every third translation
                    }
                } else if (data.error) {
                    gptResponse.innerHTML += `<span class="error">${data.error}</span>\n`;
                }
            } catch (error) {
                gptResponse.innerHTML += `<span class="error">Translation failed: ${error.message}</span>\n`;
            }

            autoScroll();  // Auto-scroll after translation
        }

        function updateOutput(finalText, interimText) {
            output.innerHTML = finalText;
            if (interimText) {
                output.innerHTML += `<span class="interim">${interimText}</span>`;
            }
        }

        function autoScroll() {
            output.scrollTop = output.scrollHeight;
            gptResponse.scrollTop = gptResponse.scrollHeight;
            summary.scrollTop = summary.scrollHeight;
        }

        async function updateSummary() {
            const allTranslations = gptResponse.innerHTML;
            const response = await fetch('/summarize', {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify({ text: allTranslations, previousSummary: accumulatedSummary })
            });

            const data = await response.json();
            if (data.summary) {
                accumulatedSummary = data.summary;
                summary.innerHTML = marked.parse(accumulatedSummary);  // Render the Markdown summary with marked.js
            } else if (data.error) {
                summary.innerHTML = `Error: ${data.error}`;
            }
        }

        startButton.addEventListener('click', startRecognition);
        stopButton.addEventListener('click', stopRecognition);
        clearButton.addEventListener('click', clearOutput);
    </script>
</body>
</html>
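
`updateSummary()` sends `gptResponse.innerHTML` to `/summarize`, so any `<span class="error">` markup and HTML entities in the translation pane reach the summarization prompt as well. One option is to reduce that string to plain text on the server before building the prompt. The sketch below uses only the Python standard library. `translations_to_text` and `_TranslationText` are hypothetical names, not part of the app as published, and dropping the error spans entirely is an assumption about what the summary should see.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _TranslationText(HTMLParser):
    """Collects text content, skipping anything inside <span class="error">."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts: List[str] = []
        self._skip_depth = 0  # greater than 0 while inside an error span

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._skip_depth or "error" in classes:
            self._skip_depth += 1

    def handle_endtag(self, tag: str) -> None:
        if tag == "span" and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data: str) -> None:
        if not self._skip_depth:
            self.parts.append(data)


def translations_to_text(markup: str) -> str:
    """Return the translation pane's content as plain text without error messages."""
    parser = _TranslationText()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

In `summarize()`, the value of `data.get('text')` could be passed through `translations_to_text` before it is interpolated into the prompt. Sending `gptResponse.textContent` from the browser would remove the tags too, but it would keep the error messages in the text.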

