議事録アプリケーションの開発

Last updated at 2024-10-23Posted at 2024-10-23

【議事録アプリケーション開発】

はじめに

このプロジェクトでは、高度な機密データに対応するための議事録アプリケーションを開発しました。本記事では、2つの異なるアプローチで開発されたアプリケーション、App AとApp Bの技術的な詳細と、それぞれがどのようにセキュリティと機能性のバランスを実現したかを詳しく紹介します。

プロジェクト背景

議事録の作成・管理が非常に重要。しかし、扱うデータが機密性の高いものが多いため、外部に依存するソリューションはリスクが伴います。そのため、内部データを安全に処理できるアプリケーションが求められていました。

用いた技術等

python
c#
vosk
pyaudio
tkinter
google cloud speach
chatgpt

本プロジェクトでは、次の2つのアプローチに基づく議事録アプリケーションを開発しました。

App A: 外部APIを活用して高精度の音声認識、文法チェック、翻訳機能を持つオンラインソリューション。
App B: 完全オフラインでデータを処理し、機密情報を外部に送信せずに音声認識を行うセキュアなオフラインソリューション。

App Aの詳細

App Aは、音声認識と文法チェック、翻訳機能を統合したリアルタイム議事録アプリケーションであり、Google Cloud Speech APIやOpenAI APIなど、外部サービスを活用しています。これにより、高精度な音声認識とテキスト生成、要約、翻訳、文法修正など、豊富な機能を実現しています。

主要な技術スタック

.NET Framework: Windows上でのアプリケーションのフレームワーク。
Google Cloud Speech API: 高精度な音声認識機能を提供。
NAudio: 音声データの録音や再生をサポート。
OpenAI GPT API: 文章の要約や文法チェック、翻訳に使用。
Vosk: リアルタイムでの音声認識エンジン。

App Aの機能概要

音声認識（Google Cloud Speech API使用）
音声ファイルを入力として取り込み、Google Cloud Speech APIを利用して音声を文字データに変換します。このAPIは日本語を含む複数の言語に対応しており、会議や対話の録音を自動的にテキストに変換します。

main.cs


private async void TranscribeAudioFile(string audioFilePath)
{
    var speechClient = SpeechClient.Create();
    var config = new RecognitionConfig
    {
        Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
        SampleRateHertz = 16000,  // 標準的な音声サンプルレート
        LanguageCode = "ja-JP",   // 日本語を使用
        EnableAutomaticPunctuation = true  // 自動句読点を有効化
    };

    var audioContent = RecognitionAudio.FromBytes(buffer);
    var response = await speechClient.RecognizeAsync(config, audioContent);

    foreach (var result in response.Results)
    {
        transcript.AppendLine(result.Alternatives[0].Transcript);
    }
}

この部分のコードでは、Google Cloud Speech APIを利用し、指定された音声ファイルから日本語での文字起こしを行っています。また、EnableAutomaticPunctuationを有効にすることで、句読点も自動で挿入されます。

2.リアルタイム音声認識（Vosk使用）

Voskライブラリを用いて、オフラインでのリアルタイム音声認識もサポートしています。これにより、APIの遅延やネットワーク接続に依存しない処理も可能です。

main.cs


waveIn = new WaveInEvent
{
    WaveFormat = new WaveFormat(16000, 1)  // 16kHz, モノラル
};

waveIn.DataAvailable += (sender, e) =>
{
    if (voskRecognizer.AcceptWaveform(e.Buffer, e.BytesRecorded))
    {
        var result = voskRecognizer.Result();
        transcriptTextBox.AppendText(result.Text + "\n");
    }
};

このコードは、リアルタイムで音声を受信し、その場でVoskを利用して文字起こしを行います。

3.文章の要約・翻訳（OpenAI API使用）

OpenAI GPT APIを使用し、生成された議事録の要約や翻訳を自動化します。以下はその実装例です。

main.cs


private async Task SummarizeTextAsync()
{
    var api = new OpenAIAPI(new APIAuthentication(openAiApiKey));
    var chatRequest = new ChatRequest
    {
        Model = "gpt-4",
        Messages = new ChatMessage[]
        {
            new ChatMessage(ChatMessageRole.System, "You are a helpful assistant."),
            new ChatMessage(ChatMessageRole.User, $"要約してください: {TranscriptTextBox.Text}")
        }
    };

    var result = await api.Chat.CreateChatCompletionAsync(chatRequest);

    if (result != null)
    {
        TranscriptTextBox.Text = result.Choices[0].Message.Content.Trim();
    }
}

4.議事録の保存

議事録の保存には、要約後のテキストと、要約前のテキストを保存することができる。これは、要約機能によって重要な文章が消えてしまう事態を防ぐことにあります。

App Aのメリット

外部サービスを活用した高精度の音声認識と文章生成機能。
自動句読点挿入と文法修正機能による効率的な議事録作成。
リアルタイムの音声処理機能により会議中の文字起こしが可能。

App Bの詳細

私が担当したApp Bは、完全にオフラインで機密データを安全に処理できるアプリケーションです。要件に合わせ、外部サービスに依存しない形で音声認識を行い、データの漏洩リスクを最小限に抑えています。

主要な技術スタック

Python: アプリケーション全体を実装。
Voskライブラリ: オフラインでの音声認識を実現。
Tkinter: シンプルなGUIを提供し、ユーザーが簡単に操作できる環境を構築。

App Bの機能概要

Voskモデルの初期化
Voskライブラリを利用して、音声認識モデルをローカルで使用します。下記のコードで、Voskの日本語モデルをロードし、アプリケーションで使用できる状態にしています。

main.py


def initialize_vosk_model():
    model_path = "/Users/username/Downloads/vosk-model-ja-0.22"
    vosk_model = Model(model_path)
    recognizer = VoskRecognizer(vosk_model, 16000)
    return recognizer

2.リアルタイム音声認識

App Bでは、音声データをリアルタイムで受信し、その場でVoskを使って音声を文字データに変換します。リアルタイム処理はネットワークに依存せず、完全にオフラインで行われます。

main.py


def on_data_available(self, event):
    if self.recognizer.AcceptWaveform(event.Buffer, event.BytesRecorded):
        result = json.loads(self.recognizer.Result())
        self.transcript_textbox.append(result['text'])

このコードでは、リアルタイムで音声認識を行い、結果を即座にテキストとして表示します。外部APIを使わないため、機密データを外部に送信することはありません。

3.シンプルなGUI

Tkinterを使って、使いやすいGUIを構築しました。これにより、金融庁の職員が簡単にアプリケーションを操作できるようになっています。

main.py


import tkinter as tk
from tkinter import filedialog

def create_gui():
    root = tk.Tk()
    root.title("オフライン議事録アプリ")

    transcript_textbox = tk.Text(root, height=20, width=80)
    transcript_textbox.pack()

    start_button = tk.Button(root, text="録音開始", command=start_recording)
    start_button.pack()

    root.mainloop()

App Bのメリット

完全にオフラインで動作するため、機密データが外部に漏洩するリスクがない。
簡単な操作性を提供し、厳しいセキュリティ基準を満たす。
オンライン接続なしで高精度の音声認識が可能。

app aのC#コード

main.cs


using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;
using System.Windows;
using Microsoft.Win32;
using Google.Cloud.Speech.V1;
using NAudio.Wave;
using Vosk;
using OpenAI_API;
using OpenAI_API.Chat;
using Newtonsoft.Json.Linq;

namespace transcript_generator_C
{
    public partial class MainWindow : Window
    {
        private readonly string jsonPath = @"path_to_secure_location";
        private readonly string openAiApiKey = "XXXXXXXXXXX";
        private bool isJapanese = false;

        private WaveInEvent waveIn;
        private VoskRecognizer voskRecognizer;
        private Model voskModel;
        private bool isTranscribing = false;

        public MainWindow()
        {
            InitializeComponent();
            InitializeVoskModel();
            PopulateAudioSourceComboBox();
            Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", jsonPath);
            UpdatePlaceholderText();
        }


        private void InitializeVoskModel()
        {
            string modelPath = @"voskModelPath";
            voskModel = new Model(modelPath);
            voskRecognizer = new VoskRecognizer(voskModel, 16000.0f);
        }

        private void PopulateAudioSourceComboBox()
        {
            for (int n = 0; n < WaveIn.DeviceCount; n++)
            {
                var deviceInfo = WaveIn.GetCapabilities(n);
                AudioSourceComboBox.Items.Add($"{n}: {deviceInfo.ProductName}");
            }

            if (AudioSourceComboBox.Items.Count > 0)
            {
                AudioSourceComboBox.SelectedIndex = 0;
            }
        }

        private async void Button_Click_SelectAndTranscribe(object sender, RoutedEventArgs e)
        {
            OpenFileDialog openFileDialog = new OpenFileDialog
            {
                Filter = "Audio files (*.wav;*.mp3)|*.wav;*.mp3|All files (*.*)|*.*"
            };

            if (openFileDialog.ShowDialog() == true)
            {
                AudioFilePathTextBox.Text = openFileDialog.FileName;
                await TranscribeAudioFile(openFileDialog.FileName);
            }
        }

   
        private async Task TranscribeAudioFile(string audioFilePath)
        {
            if (string.IsNullOrWhiteSpace(audioFilePath))
            {
                MessageBox.Show("Please select an audio file first.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                return;
            }

            var loadingWindow = new LoadingWindow();
            try
            {
                loadingWindow.Show();
                var speechClient = SpeechClient.Create();
                int segmentLength = 59;
                StringBuilder transcript = new StringBuilder();

                using (var reader = new WaveFileReader(audioFilePath))
                {
                    int frameRate = reader.WaveFormat.SampleRate;
                    int channels = reader.WaveFormat.Channels;
                    int totalFrames = (int)reader.Length / reader.WaveFormat.BlockAlign;
                    int totalDuration = totalFrames / frameRate;

                    for (int i = 0; i < totalDuration; i += segmentLength)
                    {
                        reader.CurrentTime = TimeSpan.FromSeconds(i);
                        int framesToRead = Math.Min(segmentLength * frameRate, totalFrames - i * frameRate);
                        byte[] buffer = new byte[framesToRead * reader.WaveFormat.BlockAlign];
                        int bytesRead = reader.Read(buffer, 0, buffer.Length);

                        var audioContent = RecognitionAudio.FromBytes(buffer);
                        var config = new RecognitionConfig
                        {
                            Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
                            SampleRateHertz = frameRate,
                            LanguageCode = "ja-JP",
                            EnableAutomaticPunctuation = true,
                            AudioChannelCount = channels,
                        };

                        var response = await speechClient.RecognizeAsync(config, audioContent);

                        foreach (var result in response.Results)
                        {
                            transcript.AppendLine(result.Alternatives[0].Transcript);
                        }
                    }
                }

                TranscriptTextBox.Text = transcript.ToString();
                UpdatePlaceholderText();
            }
            catch (Exception ex)
            {
                MessageBox.Show($"An error occurred during transcription: {ex.Message}", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
            }
            finally
            {
                loadingWindow.Close();
            }
        }

  
        private void Button_Click_RealTimeTranscribe(object sender, RoutedEventArgs e)
        {
            if (isTranscribing)
            {
                StopRealTimeTranscription();
                return;
            }

            isTranscribing = true;
            RealTimeTranscribeButton.Content = "Stop Transcribing";

            waveIn = new WaveInEvent
            {
                WaveFormat = new WaveFormat(16000, 1),
                DeviceNumber = AudioSourceComboBox.SelectedIndex  
            };
            waveIn.DataAvailable += OnDataAvailable;
            waveIn.StartRecording();
        }

        private void OnDataAvailable(object sender, WaveInEventArgs e)
        {
            if (voskRecognizer.AcceptWaveform(e.Buffer, e.BytesRecorded))
            {
                var result = voskRecognizer.Result();
                var resultText = ExtractTextFromJson(result);

                if (!string.IsNullOrEmpty(resultText))
                {
                    Dispatcher.Invoke(() =>
                    {
                        TranscriptTextBox.AppendText(resultText + "\n");
                        UpdatePlaceholderText();
                    });
                }
            }
        }

        private string ExtractTextFromJson(string jsonResult)
        {
            try
            {
                var jsonObject = JObject.Parse(jsonResult);
                return jsonObject["text"]?.ToString();
            }
            catch (Exception ex)
            {
                MessageBox.Show($"Error parsing JSON: {ex.Message}", "Parsing Error", MessageBoxButton.OK, MessageBoxImage.Error);
                return string.Empty;
            }
        }

        private void StopRealTimeTranscription()
        {
            isTranscribing = false;
            RealTimeTranscribeButton.Content = "Real-Time Transcribe";
            waveIn.StopRecording();
            waveIn.Dispose();
        }

    
        private void UpdatePlaceholderText()
        {
            if (string.IsNullOrWhiteSpace(TranscriptTextBox.Text))
            {
                TranscriptTextBox.Text = "Transcription results will be displayed here";
                TranscriptTextBox.Foreground = System.Windows.Media.Brushes.Gray;
            }
            else
            {
                TranscriptTextBox.Foreground = System.Windows.Media.Brushes.White;
            }
        }

        
        private void TranscriptTextBox_TextChanged(object sender, System.Windows.Controls.TextChangedEventArgs e)
        {
            if (TranscriptTextBox.Text == "Transcription results will be displayed here" && TranscriptTextBox.Foreground == System.Windows.Media.Brushes.Gray)
            {
                TranscriptTextBox.Clear();
                TranscriptTextBox.Foreground = System.Windows.Media.Brushes.White;
            }
            else if (string.IsNullOrWhiteSpace(TranscriptTextBox.Text))
            {
                UpdatePlaceholderText();
            }
        }

  
        private async void Button_Click_2(object sender, RoutedEventArgs e)
        {
            await PerformWithLoading(SummarizeTextAsync);
        }

      
        private async Task SummarizeTextAsync()
        {
            string transcript = TranscriptTextBox.Text;

            if (string.IsNullOrWhiteSpace(transcript) || transcript == "Transcription results will be displayed here")
            {
                MessageBox.Show("The transcript is empty. Please generate or provide a transcript first.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                return;
            }

            try
            {
                var api = new OpenAIAPI(new APIAuthentication(openAiApiKey));
                var chatRequest = new ChatRequest
                {
                    Model = "gpt-4o",
                    Messages = new ChatMessage[]
                    {
                        new ChatMessage(ChatMessageRole.System, "You are a helpful assistant."),
                        new ChatMessage(ChatMessageRole.User, $"Please summarize the following text in the langage of the original text: {transcript}")
                    }
                };

                var result = await api.Chat.CreateChatCompletionAsync(chatRequest);

                if (result != null && result.Choices.Count > 0)
                {
                    TranscriptTextBox.Text = result.Choices[0].Message.Content.Trim();
                    UpdatePlaceholderText();
                }
                else
                {
                    MessageBox.Show("No response from GPT API.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show($"An error occurred while summarizing: {ex.Message}", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
            }
        }

        private async void Button_Click_3(object sender, RoutedEventArgs e)
        {
            await PerformWithLoading(TranslateTextAsync);
        }

  
        private async Task TranslateTextAsync()
        {
            string transcript = TranscriptTextBox.Text;

            if (string.IsNullOrWhiteSpace(transcript) || transcript == "Transcription results will be displayed here")
            {
                MessageBox.Show("The transcript is empty. Please generate or provide a transcript first.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                return;
            }

            try
            {
                var api = new OpenAIAPI(new APIAuthentication(openAiApiKey));
                var chatRequest = new ChatRequest
                {
                    Model = "gpt-4o",
                    Messages = new ChatMessage[]
                    {
                        new ChatMessage(ChatMessageRole.System, "You are a helpful assistant."),
                        new ChatMessage(ChatMessageRole.User, $"Please translate the following text into English, or into Japanese if it's already in English: {transcript}")
                    }
                };

                var result = await api.Chat.CreateChatCompletionAsync(chatRequest);

                if (result != null && result.Choices.Count > 0)
                {
                    TranscriptTextBox.Text = result.Choices[0].Message.Content.Trim();
                    UpdatePlaceholderText();
                }
                else
                {
                    MessageBox.Show("No response from GPT API.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show($"An error occurred while translating: {ex.Message}", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
            }
        }

    
        private async void Button_Click_5(object sender, RoutedEventArgs e)
        {
            await PerformWithLoading(CleanTextAsync);
        }

  
        private async Task CleanTextAsync()
        {
            string transcript = TranscriptTextBox.Text;

            if (string.IsNullOrWhiteSpace(transcript) || transcript == "Transcription results will be displayed here")
            {
                MessageBox.Show("The transcript is empty. Please generate or provide a transcript first.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                return;
            }

            try
            {
                var api = new OpenAIAPI(new APIAuthentication(openAiApiKey));
                var chatRequest = new ChatRequest
                {
                    Model = "gpt-4o",
                    Messages = new ChatMessage[]
                    {
                        new ChatMessage(ChatMessageRole.System, "You are a helpful assistant."),
                        new ChatMessage(ChatMessageRole.User, $"Please correct any grammatical errors and erase unneccessary words in the following text in a langage of the text: {transcript}")
                    }
                };

                var result = await api.Chat.CreateChatCompletionAsync(chatRequest);

                if (result != null && result.Choices.Count > 0)
                {
                    TranscriptTextBox.Text = result.Choices[0].Message.Content.Trim();
                    UpdatePlaceholderText();
                }
                else
                {
                    MessageBox.Show("No response from GPT API.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show($"An error occurred while cleaning the text: {ex.Message}", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
            }
        }

       
        private async void Button_Click_6(object sender, RoutedEventArgs e)
        {
            await PerformWithLoading(ParaphraseTextAsync);
        }

        private async Task ParaphraseTextAsync()
        {
            string transcript = TranscriptTextBox.Text;

            if (string.IsNullOrWhiteSpace(transcript) || transcript == "Transcription results will be displayed here")
            {
                MessageBox.Show("The transcript is empty. Please generate or provide a transcript first.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                return;
            }

            try
            {
                var api = new OpenAIAPI(new APIAuthentication(openAiApiKey));
                var chatRequest = new ChatRequest
                {
                    Model = "gpt-4o",
                    Messages = new ChatMessage[]
                    {
                        new ChatMessage(ChatMessageRole.System, "You are a helpful assistant."),
                        new ChatMessage(ChatMessageRole.User, $"Please paraphrase the following text using financial jargon and only generate the result and you should do so in the original language of the text: {transcript}")
                    }
                };

                var result = await api.Chat.CreateChatCompletionAsync(chatRequest);

                if (result != null && result.Choices.Count > 0)
                {
                    TranscriptTextBox.Text = result.Choices[0].Message.Content.Trim();
                    UpdatePlaceholderText();
                }
                else
                {
                    MessageBox.Show("No response from GPT API.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show($"An error occurred while paraphrasing: {ex.Message}", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
            }
        }

     
        private void Button_Click_7(object sender, RoutedEventArgs e)
        {
            string transcript = TranscriptTextBox.Text;

            if (string.IsNullOrWhiteSpace(transcript) || transcript == "Transcription results will be displayed here")
            {
                MessageBox.Show("The transcript is empty. Please generate or provide a transcript first.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                return;
            }

            SaveFileDialog saveFileDialog = new SaveFileDialog
            {
                Filter = "Text file (*.txt)|*.txt",
                FileName = "transcript.txt"
            };

            if (saveFileDialog.ShowDialog() == true)
            {
                try
                {
                    File.WriteAllText(saveFileDialog.FileName, transcript);
                    MessageBox.Show("Transcript successfully saved!", "Success", MessageBoxButton.OK, MessageBoxImage.Information);
                }
                catch (Exception ex)
                {
                    MessageBox.Show($"An error occurred while saving the file: {ex.Message}", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                }
            }
        }

     
        private void Button_Click_4(object sender, RoutedEventArgs e)
        {
            if (!isJapanese)
            {
                SwitchToJapanese();
            }
            else
            {
                SwitchToEnglish();
            }
        }

        private void SwitchToJapanese()
        {
            SelectFileButton.Content = "ファイルを選択";
            SummarizeButton.Content = "要約";
            TranslateButton.Content = "翻訳";
            GrammarCheckButton.Content = "文法チェック";
            ParaphraseButton.Content = "専門用語に変更";
            ExportButton.Content = "文を摘出";
            Copy.Content = "コピー";
            GetPresentationIdeasButton.Content = "プレゼンのアイデアを得る";
            RealTimeTranscribeButton.Content = "リアルタイムで文字生成";

            if (string.IsNullOrWhiteSpace(AudioFilePathTextBox.Text) || AudioFilePathTextBox.Text == "File path will be displayed here")
            {
                AudioFilePathTextBox.Text = "ここにファイルパスが表示されます。";
            }
            if (string.IsNullOrWhiteSpace(TranscriptTextBox.Text) || TranscriptTextBox.Text == "Transcription results will be displayed here")
            {
                TranscriptTextBox.Text = "ここに文字起こし結果が表示されます。";
                TranscriptTextBox.Foreground = System.Windows.Media.Brushes.Gray;
            }

            Button1.Content = "English";
            MessageBox.Show("アプリのUIが日本語に変更されました!", "成功", MessageBoxButton.OK, MessageBoxImage.Information);
            isJapanese = true;
        }

        private void SwitchToEnglish()
        {
            SelectFileButton.Content = "Select file";
            SummarizeButton.Content = "Summarize";
            TranslateButton.Content = "Translate";
            GrammarCheckButton.Content = "Grammar Check";
            ParaphraseButton.Content = "Paraphrase";
            ExportButton.Content = "Export Text";
            Copy.Content = "Copy";
            GetPresentationIdeasButton.Content = "Get presentation ideas";
            RealTimeTranscribeButton.Content = "Real-Time Transcribe";

            if (string.IsNullOrWhiteSpace(AudioFilePathTextBox.Text) || AudioFilePathTextBox.Text == "ここにファイルパスが表示されます")
            {
                AudioFilePathTextBox.Text = "File path will be displayed here";
            }
            if (string.IsNullOrWhiteSpace(TranscriptTextBox.Text) || TranscriptTextBox.Text == "ここに文字起こし結果が表示されます")
            {
                TranscriptTextBox.Text = "Transcription results will be displayed here";
                TranscriptTextBox.Foreground = System.Windows.Media.Brushes.Gray;
            }

            Button1.Content = "Japanese";
            MessageBox.Show("The app's UI has been translated to English!", "Success", MessageBoxButton.OK, MessageBoxImage.Information);
            isJapanese = false;
        }

       
        private void Button_Click_8(object sender, RoutedEventArgs e)
        {
            string transcript = TranscriptTextBox.Text;

            if (string.IsNullOrWhiteSpace(transcript) || transcript == "Transcription results will be displayed here")
            {
                MessageBox.Show("The transcript is empty. Please generate or provide a transcript first.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
            }
            else
            {
                Clipboard.SetText(transcript);
 
            }
        }

      
        private async Task PerformWithLoading(Func<Task> action)
        {
            var loadingWindow = new LoadingWindow();
            try
            {
                loadingWindow.Show();
                await action();
            }
            finally
            {
                loadingWindow.Close();
            }
        }

     
        private async void Button_Click_GetPresentationIdeas(object sender, RoutedEventArgs e)
        {
            await PerformWithLoading(GetPresentationIdeasAsync);
        }

       
        private async Task GetPresentationIdeasAsync()
        {
            string transcript = TranscriptTextBox.Text;

            if (string.IsNullOrWhiteSpace(transcript) || transcript == "Transcription results will be displayed here")
            {
                MessageBox.Show("The transcript is empty. Please generate or provide a transcript first.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                return;
            }

            try
            {
                var api = new OpenAIAPI(new APIAuthentication(openAiApiKey));
                var chatRequest = new OpenAI_API.Chat.ChatRequest
                {
                    Model = "gpt-4o",
                    Messages = new[]
                    {
                        new OpenAI_API.Chat.ChatMessage(OpenAI_API.Chat.ChatMessageRole.System, "You are a helpful assistant."),
                        new OpenAI_API.Chat.ChatMessage(OpenAI_API.Chat.ChatMessageRole.User, $"Please generate presentation ideas based on the following text, and provide the instructions in the language of the text: {transcript}")
                    }
                };

                var result = await api.Chat.CreateChatCompletionAsync(chatRequest);

                if (result != null && result.Choices.Count > 0)
                {
                    TranscriptTextBox.Text = result.Choices[0].Message.Content.Trim();
                    UpdatePlaceholderText();
                }
                else
                {
                    MessageBox.Show("No response from GPT API.", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show($"An error occurred while generating presentation ideas: {ex.Message}", "Error", MessageBoxButton.OK, MessageBoxImage.Error);
            }
        }
    }
}




<Window x:Class="transcript_generator_C.LoadingWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        Title="Loading" Height="200" Width="400" WindowStartupLocation="CenterScreen" WindowStyle="None" ResizeMode="NoResize" Background="#282828">
    <Grid>
        <TextBlock Text="processing, please wait..." Foreground="White" HorizontalAlignment="Center" VerticalAlignment="Center" FontFamily="Calibri" FontSize="20" Margin="0,0,0,30"/>

        <ProgressBar IsIndeterminate="True" Width="300" Height="20" HorizontalAlignment="Center" VerticalAlignment="Bottom" Margin="0,0,0,20"/>
    </Grid>
</Window>




<Window x:Class="transcript_generator_C.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        mc:Ignorable="d"
        Title="Transcript Generator" Height="624" Width="1116" Background="#2C2F33" Foreground="White" WindowStartupLocation="CenterScreen">

    <Grid Margin="10">
        <Grid.RowDefinitions>
            <RowDefinition Height="Auto"/>
            <RowDefinition Height="Auto"/>
            <RowDefinition Height="*"/>
            <RowDefinition Height="Auto"/>
        </Grid.RowDefinitions>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="200"/>
            <ColumnDefinition Width="*"/>
        </Grid.ColumnDefinitions>

        
        <StackPanel Grid.Row="0" Grid.ColumnSpan="2" Orientation="Horizontal" HorizontalAlignment="Right" Margin="0,0,0,10">
            <Button x:Name="Button1" Content="Japanese" Width="100" Height="35" Margin="0,0,10,0" Click="Button_Click_4" Background="#99AAB5" BorderBrush="Transparent" Foreground="Black" FontWeight="Bold"/>
            <Button x:Name="SelectFileButton" Content="Select File and Transcribe" Width="200" Height="35" Click="Button_Click_SelectAndTranscribe" Background="#7289DA" BorderBrush="Transparent" Foreground="White" FontWeight="Bold"/>
        </StackPanel>

     
        <TextBox x:Name="AudioFilePathTextBox" Grid.Row="1" Grid.ColumnSpan="2" TextWrapping="Wrap" VerticalAlignment="Top" Height="30" Background="#99AAB5" BorderBrush="Transparent" Foreground="Black" IsReadOnly="True" Margin="0,10,0,10"/>

        <StackPanel Grid.Row="2" Grid.Column="0" VerticalAlignment="Top" HorizontalAlignment="Left" Width="180" Margin="0,0,20,0">
            <ComboBox x:Name="AudioSourceComboBox" Height="30" Margin="0,0,0,10" Background="#99AAB5" BorderBrush="Transparent" Foreground="Black" FontWeight="Bold"/>
            <Button x:Name="SummarizeButton" Content="Summarize" Height="35" Margin="0,0,0,10" Click="Button_Click_2" Background="#7289DA" BorderBrush="Transparent" Foreground="White" FontWeight="Bold"/>
            <Button x:Name="TranslateButton" Content="Translate" Height="35" Margin="0,0,0,10" Click="Button_Click_3" Background="#7289DA" BorderBrush="Transparent" Foreground="White" FontWeight="Bold"/>
            <Button x:Name="GrammarCheckButton" Content="Grammar Check" Height="35" Margin="0,0,0,10" Click="Button_Click_5" Background="#7289DA" BorderBrush="Transparent" Foreground="White" FontWeight="Bold"/>
            <Button x:Name="ParaphraseButton" Content="Paraphrase" Height="35" Margin="0,0,0,10" Click="Button_Click_6" Background="#7289DA" BorderBrush="Transparent" Foreground="White" FontWeight="Bold"/>
            <Button x:Name="ExportButton" Content="Export Text" Height="35" Margin="0,0,0,10" Click="Button_Click_7" Background="#7289DA" BorderBrush="Transparent" Foreground="White" FontWeight="Bold"/>
            <Button x:Name="GetPresentationIdeasButton" Content="Get Presentation Idea" Height="35" Margin="0,0,0,10" Click="Button_Click_GetPresentationIdeas" Background="#7289DA" BorderBrush="Transparent" Foreground="White" FontWeight="Bold"/>
        </StackPanel>

  
        <StackPanel Grid.Row="2" Grid.Column="1" VerticalAlignment="Top" HorizontalAlignment="Stretch" Margin="20,0,0,0">
            <TextBox x:Name="TranscriptTextBox" TextWrapping="Wrap" VerticalAlignment="Top" Height="400" Background="#23272A" BorderBrush="Transparent" Foreground="White" Margin="0,0,0,10" AcceptsReturn="True" VerticalScrollBarVisibility="Auto"/>
            <Button x:Name="Copy" Content="Copy Transcript" HorizontalAlignment="Right" Width="140" Height="38" Click="Button_Click_8" Background="#7289DA" BorderBrush="Transparent" Foreground="White" FontWeight="Bold"/>
        </StackPanel>

        <!-- Real-time Transcription Button -->
        <Button x:Name="RealTimeTranscribeButton" Content="Real-Time Transcribe" Grid.Row="3" Grid.Column="0" Width="180" Height="35" Click="Button_Click_RealTimeTranscribe" Background="#7289DA" BorderBrush="Transparent" Foreground="White" FontWeight="Bold"/>
    </Grid>
</Window>

app bのPythonコード

main.py


#!/usr/bin/env python3
import tkinter as tk
from tkinter import messagebox
from tkinter import filedialog
import pyaudio
import wave
import threading
import json
from vosk import Model, KaldiRecognizer
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
import queue
import sounddevice as sd
import sys
import re
from sumy.summarizers.text_rank import TextRankSummarizer
import spacy
import os

abs_path = os.path.dirname(__file__)

global stm_judge
stm_judge = 0

def wrap_text(text, max_length):
    lines = []
    current_line = ""

    for char in text:
        current_line += char
        if len(current_line) >= max_length and char in "。、「」":
            lines.append(current_line.strip())
            current_line = ""

    if current_line:
        lines.append(current_line.strip())

    return "\n".join(lines)


def add_commas(text):
    # Insert commas at natural clause boundaries
    text = re.sub(r'(ためには|ための政策|手がかりを得ることが|ですが|という|したがって|そのため|しかし|ので|から|が|けれど|ため|ながら|ので|おり|であり|とは)', r'\1、', text)
    # Handle special cases where commas might disrupt the flow
    text = re.sub(r'、([が])、', r'\1', text)  # Avoid unnecessary commas around "が"
    text = re.sub(r'([0-9]+)%([一-龥])', r'\1%、\2', text)  # Ensure proper formatting around percentages
    text = re.sub(r'([。、])([0-9]+%)', r'\1\2', text)  # Avoid placing commas before percentages
    return text
def add_punctuation(text):
    # Remove existing punctuation for clean processing
    text = text.replace("。", "").replace("、", "")
    # Apply the clause-based comma insertion
    text = add_commas(text)
    # Add periods at the end of recognized sentence-ending forms
    sentence_endings = ['でしょうか','みたいです','ます', 'です', 'だ', 'だった', 'ください', 'しましょう', 'たい', 'たく','である','いる','ましたね','できる','ある']
    for ending in sentence_endings:
        text = re.sub(r'({})'.format(ending), r'\1。', text)
    # Avoid redundant punctuation
    text = re.sub(r'、、', '、', text).replace('。', '。')
    return text.strip()



def calculate_average_line_length(original_text, default_line_length=20):
        # 改行がない場合も考慮し、文章全体の平均行長を計算
        words = original_text.split()  # 単語単位で分割
        if len(words) > 1:
            average_line_length = sum(len(word) for word in words) // len(words)
        else:
            average_line_length = len(original_text) if len(original_text) > 0 else default_line_length

        return average_line_length

def calculate_lines_reduced_text(original_text):
        # 元のテキストの文字数を取得
        original_length = len(original_text)

        # 削減後の文字数を計算（4割に削減）
        reduced_length = int(original_length * 0.4)

        # 平均の行長を計算
        line_length = calculate_average_line_length(original_text)

        # 削減後の行数を計算
        reduced_lines = (reduced_length + line_length - 1) // line_length

        return reduced_lines

class DocumentSummary:
    def __init__(self):
        '''
        コンストラクタ
        '''
        #アルゴリズムの一覧
        self.algorithm = {
            #'lex':LexRankSummarizer(),
            'txt':TextRankSummarizer(),
            #'red':ReductionSummarizer(),
            #'luh':LuhnSummarizer(),
            #'sum':SumBasicSummarizer(),
            #'kls':KLSummarizer(),
            #'lsa':LsaSummarizer(),
            #'edm':EdmundsonSummarizer()
        }
        #コーパスの利用有無
        self.use_corpus = True
        #要約の文書数
        self.sentences_count = 10
        #ストップワード
        self.stop_words = ['']
        #EdmundsonSummarizer用のポジティブキーワード
        self.bonus_words = ['']
        #EdmundsonSummarizer用のネガティブキーワード
        self.stigma_words = ['']
        #EdmundsonSummarizer用の無視キーワード
        self.null_words = ['']
        #要約対象のテキスト
        self.text = None
        #作成したコーパス
        self.coupus = None
        #コーパスの元文書
        self.originals = None
        #要約に適した加工が施されたテキスト情報
        self.parser = None


    def create_corpus(self, text):
        '''
        コーパスを作成する

        Parameters:
        --------
            text : str   コーパスを生成したいテキスト 
        '''
        nlp = spacy.load('ja_ginza')
        corpus = []
        originals = []
        doc = nlp(text)
        for s in doc.sents:
            originals.append(s)
            tokens = []
            for t in s:
                tokens.append(t.lemma_)
            corpus.append(' '.join(tokens))
        return corpus, originals

    def read_text(self, text):
        '''
        要約したいテキストを読み込む

        Parameters:
        --------
            text : str   要約したいテキスト  
        '''
        self.text = text
        if self.use_corpus:
            #コーパスの作成
            self.coupus, self.originals = self.create_corpus(self.text.replace('\r','').replace('\n','').replace('『','「').replace('』','」'))
            #連結したcorpusをトークナイズ
            self.parser = PlaintextParser.from_string(''.join(self.coupus), Tokenizer('japanese'))
        else:
            #テキストをそのままトークナイズ
            self.parser = PlaintextParser.from_string(self.text, Tokenizer('japanese'))

    def summarize(self, algo):
        '''
        文書を要約する

        Parameters:
        --------
            algo : str       アルゴリズム ('lex''red''luh','sum','kls'、など)
            stopwords:[str]  文書の終わりを識別する為の文字列をリストで指定
        '''

        #アルゴリズムの取得
        summarizer = self.algorithm[algo]
        if algo == 'edm':
            summarizer.bonus_words = ['']
            summarizer.stigma_words = ['']
            summarizer.null_words = ['']

        # スペースをストップワードを設定
        summarizer.stop_words = self.stop_words
        # 文書の要約
        summary = summarizer(document=self.parser.document, sentences_count=self.sentences_count)

        #要約した結果をリストに格納
        res = []
        for sentence in summary:
            if self.use_corpus:
                #特定の文字列が
                if sentence.__str__() in self.coupus:
                    res.append(self.originals[self.coupus.index(sentence.__str__())])
            else:
                res.append(sentence.__str__())

        return '\n'.join([str(x) for x in res])


class MinutesApp:
    def __init__(self, master):

        self.master = master
        self.master.title("議事録アプリ")

        # StringVarでフォルダパスを保持
        self.folder_path = tk.StringVar()

        self.is_recording = False
        self.frames = []

        # 左側に縦一列に配置するためのフレームを作成
        left_frame = tk.Frame(master)
        left_frame.pack(side=tk.LEFT, padx=10, pady=10, anchor=tk.N)

        self.start_button = tk.Button(left_frame,text="議事録開始", command=self.start_recording)
        self.start_button.pack(pady=10)

        self.stop_button = tk.Button(left_frame, text="議事録停止", command=self.stop_recording, state=tk.DISABLED)
        self.stop_button.pack(pady=10)

        self.summarize_button = tk.Button(left_frame, text="文章の要約", command=self.summarize_s)
        self.summarize_button.pack(pady=10)

        self.save_save_button = tk.Button(left_frame, text="議事録保存", command=self.save_save)
        self.save_save_button.pack(pady=10)

        self.save_mama_button = tk.Button(left_frame, text="要約前の議事録保存", command=self.save_mama)
        self.save_mama_button.pack(pady=10)

        # フォント設定
        font = ("Helvetica", 20)  # フォント名、サイズ
        # テキストボックスの作成
        self.text_box = tk.Text(master, wrap=tk.WORD, height=45, width=68,font=font)
        self.text_box.pack(pady=10)

        self.entry = tk.Entry(left_frame, textvariable=self.folder_path, width=30)
        self.entry.pack(pady=5)

        # フォルダ参照ボタンの作成
        browse_button = tk.Button(left_frame, text="議事録保存フォルダ参照", command=self.browse_folder)
        browse_button.pack(pady=5)

        # Entryウィジェットにテキストを挿入
        self.entry.insert(0, str(abs_path))

        self.p = pyaudio.PyAudio()
        self.stream = None

    def browse_folder(self):
        folder_selected = filedialog.askdirectory()  # フォルダを選択するダイアログを表示
        if folder_selected:  # フォルダが選択された場合
            self.folder_path.set(folder_selected)  # テキストボックスにフォルダパスを表示

    def save_save(self):
        final_sentence = self.text_box.get("1.0", tk.END)  # テキストボックスの内容を取得
        folder_path_text = self.entry.get()
        with open(str(folder_path_text) + "/Summarized.txt", mode='w') as f:
            f.write(str(final_sentence))
        messagebox.showinfo("議事録作成", f"議事録が保存されました")

    def save_mama(self):
        global sum_before
        folder_path_text = self.entry.get()
        if str(sum_before) == "":
            sum_before = self.text_box.get("1.0", tk.END)  # テキストボックスの内容を取得
        else:
            pass
        with open(str(folder_path_text) + "/non_Summarized.txt", mode='w') as f:
            f.write(str(sum_before))
        messagebox.showinfo("議事録作成", f"要約前の議事録が保存されました")

    def summarize_s(self):
        sentence = self.text_box.get("1.0", tk.END)  # テキストボックスの内容を取得
        # インスタンス作成
        sum = DocumentSummary()
        sum.use_corpus = True
        print(calculate_lines_reduced_text(sentence))
        sum.sentences_count = int(calculate_lines_reduced_text(sentence))

        # ここで要約したいテキストを直接指定
        filler = ['えー','あー','えっと','えーっと','えーと','うん、','うん','うーん','んー']
        sum.read_text(sentence)
        print("====要約 =================================")
        final_sentence = sum.summarize('txt')
        self.text_box.delete("1.0", tk.END)
        if "" == str(sum.summarize('txt')):
            for i in filler:
                final_sentence = sentence.replace(str(i), '')
            print("要約後テキスト : " + final_sentence)
            self.text_box.insert(tk.END, str(final_sentence))
            #messagebox.showinfo("要約するほどの分量がありません。", f"要約するほどの分量がありません。")
        else :
            for i in filler:
                final_sentence = final_sentence.replace(str(i), '')
            print("要約後テキスト : " + final_sentence)
            self.text_box.insert(tk.END, str(final_sentence) + "\n")
            messagebox.showinfo("要約が完了しました。", f"要約が完了しました。")
        global sum_before
        sum_before = sentence

    def start_recording(self):
        #if not self.is_recording:
            #self.is_recording = True
        self.text_box.delete("1.0", tk.END)
        self.start_button.config(state=tk.DISABLED)
        self.stop_button.config(state=tk.NORMAL)
        threading.Thread(target=self.real_time).start()
    def real_time(self):

        # get the samplerate - this is needed by the Kaldi recognizer

        device_info = sd.query_devices(sd.default.device[0], 'input')
        samplerate = int(device_info['default_samplerate'])

        # display the default input device
        print("===> Initial Default Device Number:{} Description: {}".format(sd.default.device[0], device_info))

        # setup queue and callback function
        q = queue.Queue()

        def recordCallback(indata, frames, time, status):
            if status:
                print(status, file=sys.stderr)
            q.put(bytes(indata))

        # build the model and recognizer objects.
        print("===> Build the model and recognizer objects.  This will take a few minutes.")
        model = Model(str(abs_path) + "/vosk-model-ja-0.22")
        recognizer = KaldiRecognizer(model, samplerate)
        recognizer.SetWords(False)

        print("===> Begin recording. Press Ctrl+C to stop the recording ")
        try:
            with sd.RawInputStream(dtype='int16',
                                channels=1,
                                callback=recordCallback):
                while True:
                    data = q.get()
                    if recognizer.AcceptWaveform(data):
                        recognizerResult = recognizer.Result()
                        # convert the recognizerResult string into a dictionary  
                        resultDict = json.loads(recognizerResult)
                        if resultDict.get("text", ""):
                            #rint(resultDict["text"])
                            tim = str((resultDict["text"]))
                            tim = tim.replace(' ', '')
                            #print(type(tim))
                            filler = ['えー','あー','えっと','えーっと','えーと','うん','うーん','んー']
                            for i in filler:
                                tim = tim.replace(str(i), '')
                            sentence = add_punctuation(str(tim))
                            max_length = 30
                            sentence = wrap_text(sentence, max_length)
                            print(sentence)
                            self.text_box.insert(tk.END, str(sentence) + "\n")
                        else:
                            pass
                    global stm_judge
                    if stm_judge == 1:
                        stm_judge = 0
                        print("リアルタイム文字起こし停止")
                        break

        except KeyboardInterrupt:
            print('===> Finished Recording')
        except Exception as e:
            print(str(e))

    def stop_recording(self):
        global stm_judge
        stm_judge = 1

        self.start_button.config(state=tk.NORMAL)
        self.stop_button.config(state=tk.DISABLED)
        sentence = self.text_box.get("1.0", tk.END)  # テキストボックスの内容を取得

        sentence = add_punctuation(str(sentence))

        max_length = 30

        sentence = wrap_text(sentence, max_length)

        print("文字起こし結果 : " + sentence)


if __name__ == "__main__":
    root = tk.Tk()
    root.geometry("800x600")
    app = MinutesApp(root)
    root.protocol("WM_DELETE_WINDOW")
    root.mainloop()

プロジェクトでの学び

セキュリティと機能性のバランスを取ることの重要性を強く感じました。特に、このような機密性の高いデータを扱う場合、オンラインソリューションが便利であっても、オフラインで安全に処理できるソリューションの必要性を再認識しました。

チームとして異なるアプローチでアプリケーションを同時に開発する経験は非常に有意義でした。私が担当したApp Bでは、外部APIを使わずにいかに高性能なアプリケーションを作るかに注力し、ニーズに応えることができました。

結論

議事録アプリケーション開発は、外部APIを活用した高機能なApp Aと、完全オフラインでセキュアなApp Bの2つのアプローチで、異なるニーズに応じた議事録アプリケーションを作成できた。今後もこの経験を活かし、技術とセキュリティの両立を目指したアプリケーション開発に挑戦していきたい。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up