Python script that saves a document's summary information to a JSON file

Python

Last updated at 2023-11-25 / Posted at 2023-11-25

Text_summary.py is a Python script that reads a text file, summarizes its contents, generates a title, extracts the most frequent keywords, and saves this information to a JSON file. Its main functions and flow are as follows:

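Reading the text file: the read_text_file function reads the text file at the given path and returns its contents.
Summarizing the text: the query_chatgpt_api function takes the input text and detects its language with langdetect. Based on the result, it builds a prompt asking for a summary in that language (Simplified Chinese, Japanese, or English as the fallback) and sends it to OpenAI's GPT-3.5 model (gpt-3.5-turbo) through the Chat Completions API. The function returns the raw HTTP response; main then extracts the summary, which is used in the later steps.
Generating the title: the generate_title_with_gpt function takes the summary, builds a prompt asking for a title based on it, and sends it to the GPT-3.5 model. The title is extracted from the response.
Extracting keywords: the extract_keywords function uses a regular expression to pull words out of the original text and the Counter class to find the most frequent ones. It returns the two most common words as a comma-separated keyword string.
Saving the results: finally, main builds a JSON object containing the summary (stored as the description), the generated title, and the extracted keywords, together with fixed YouTube upload settings (video file path, categoryId, privacyStatus), and saves it to the specified file path.
Running the program: the if __name__ == "__main__": block calls main when the script is run directly. main ties all of the steps above together and manages the overall process.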
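The language-based prompt selection in query_chatgpt_api and generate_title_with_gpt can also be expressed as a table lookup. The sketch below is a hypothetical refactoring, not part of the original script: SUMMARY_PROMPTS and build_summary_prompt are illustrative names, and mapping "zh-tw" (Traditional Chinese, which langdetect can also return and which the original script sends down the English branch) to the same Chinese prompt is an assumption. In the script, `language` would come from langdetect's `detect(text)`; setting `DetectorFactory.seed = 0` before calling it makes detection deterministic for the same input.

```python
# Hypothetical refactoring of the prompt selection; not the original code.
# `language` is expected to be a langdetect code such as "ja", "zh-cn", "en".
SUMMARY_PROMPTS = {
    "zh-cn": "请总结以下文本：\n\n",
    "zh-tw": "请总结以下文本：\n\n",  # assumption: reuse the zh-cn prompt
    "ja": "以下のテキストを要約してください：\n\n",
}
DEFAULT_SUMMARY_PROMPT = "Please summarize the following text:\n\n"


def build_summary_prompt(text: str, language: str) -> str:
    """Prefix `text` with the prompt for `language`, falling back to English."""
    return SUMMARY_PROMPTS.get(language, DEFAULT_SUMMARY_PROMPT) + text


if __name__ == "__main__":
    # In the script, the language code would come from langdetect.detect(text)
    print(build_summary_prompt("今日は晴れです。", "ja"))
    print(build_summary_prompt("Bonjour tout le monde.", "fr"))
```

With this layout, supporting another language means adding one dictionary entry, and the title prompts could live in a second dictionary of the same shape.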
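To see what the regex-plus-Counter approach in extract_keywords actually counts, here is a minimal variant. The lowercasing, the top_n parameter, and the optional stop_words filter are illustrative additions (the original counts words case-sensitively, always returns two, and filters nothing). Note that in Python's Unicode regular expressions `\w` also matches kanji and kana, so Japanese text without spaces comes back as whole phrases between punctuation marks rather than individual words; splitting Japanese into words generally needs a morphological analyzer.

```python
import re
from collections import Counter
from typing import Optional, Set

# Illustrative stop words; the original script does not filter anything.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def extract_keywords(text: str, top_n: int = 2,
                     stop_words: Optional[Set[str]] = None) -> str:
    """Return the top_n most frequent \\w+ tokens, comma-separated."""
    # Lowercase so "The" and "the" count as one word (the original does not)
    words = re.findall(r"\b\w+\b", text.lower())
    if stop_words:
        words = [w for w in words if w not in stop_words]
    # most_common() orders equal counts by first occurrence in the text
    return ", ".join(word for word, _ in Counter(words).most_common(top_n))


if __name__ == "__main__":
    sample = "The cat and the dog. The cat sleeps."
    print(extract_keywords(sample))                         # the, cat
    print(extract_keywords(sample, stop_words=STOP_WORDS))  # cat, dog
    # Unspaced Japanese: each run between punctuation marks is one token
    print(extract_keywords("今日は晴れです。今日は晴れです。明日も晴れ"))  # 今日は晴れです, 明日も晴れ
```

Without a stop-word list, English keywords tend to be function words such as "the", which is rarely what a video's tag list should contain.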

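The goal of this program is to use OpenAI's GPT models for NLP tasks such as text summarization and title generation, and to output the generated content in a format that is easy to store and reuse.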

import requests
import json
import os
from collections import Counter
import re
from langdetect import detect

def read_text_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()

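def query_chatgpt_api(text, api_key):
    # Detect the language of the input text (e.g. "ja", "zh-cn", "en")
    language = detect(text)
    # Choose a summarization prompt in the detected language; default to English
    if language == "zh-cn":
        prompt = "请总结以下文本：\n\n" + text
    elif language == "ja":
        prompt = "以下のテキストを要約してください：\n\n" + text
    else:
        prompt = "Please summarize the following text:\n\n" + text
    # Send the request to the GPT model
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "gpt-3.5-turbo",
            # The prompt is the user's request, so send it with the "user" role
            "messages": [{"role": "user", "content": prompt}]
        },
        timeout=60  # seconds; avoids hanging forever if the API does not respond
    )
    return response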

def generate_title_with_gpt(text, api_key):
    # Generate a title for the summary with the GPT model.
    # The prompt language follows the detected language of the summary.
    language = detect(text)
    if language == "zh-cn":
        prompt = "为以下摘要创建一个简洁有力的标题：\n\n" + text
    elif language == "ja":
        prompt = "以下の要約に対して簡潔で魅力的なタイトルを作成してください：\n\n" + text
    else:
        prompt = "Create a concise and compelling title for the following summary in its original language:\n\n" + text
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "gpt-3.5-turbo",
            # The prompt is the user's request, so send it with the "user" role
            "messages": [{"role": "user", "content": prompt}]
        },
        timeout=60  # seconds; avoids hanging forever if the API does not respond
    )
    return response

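def extract_keywords(text):
    # Extract keywords from the text by word frequency.
    # Note: \w also matches kanji and kana, so unspaced Japanese text yields whole phrases.
    words = re.findall(r'\b\w+\b', text)
    frequency = Counter(words)
    most_common = frequency.most_common(2)  # the two most frequent words
    return ", ".join([word for word, freq in most_common])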

def main():
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("OpenAI API key not found in environment variables")

    file_path = "/home/user/Dropbox/TTS/TTS_input.txt"
    text = read_text_file(file_path)

    response = query_chatgpt_api(text, api_key)
    if response.status_code == 200:
        data = response.json()
        if "choices" in data and len(data["choices"]) > 0:
            description = data["choices"][0]["message"]["content"]
        else:
            description = "No description available"
    else:
        description = "Error in API response"

    title_response = generate_title_with_gpt(description, api_key)
    if title_response.status_code == 200:
        data = title_response.json()
        if "choices" in data and len(data["choices"]) > 0:
            title = data["choices"][0]["message"]["content"].strip()
        else:
            title = "Default Title"
    else:
        title = "Error in Title Generation"

    keywords = extract_keywords(text)

    output_file_path = "/home/user/Dropbox/workflow/upload_Video/YT-settings.json"
    yt_settings = {
        "file": "/home/user/Dropbox/workflow/upload_Video/video.MP4",
        "title": title,
        "description": description,
        "keywords": keywords,
        "categoryId": "1",
        "privacyStatus": "public"
    }

    with open(output_file_path, 'w', encoding='utf-8') as output_file:
        json.dump(yt_settings, output_file, ensure_ascii=False, indent=4)

    print("YT-settings.json has been updated.")

if __name__ == "__main__":
    main()
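main() repeats the same `choices[0]["message"]["content"]` lookup for both the summary and the title. One way to factor it out is sketched below, assuming the response shape the script already relies on (a JSON object whose `choices` list holds items with `message.content`). build_chat_payload and extract_message_content are hypothetical helper names, the payload sends the prompt with the "user" role, and unlike the original the helper strips whitespace from the description as well as the title. Both functions are pure, so they can be tested without calling the API.

```python
from typing import Any, Dict


def build_chat_payload(prompt: str, model: str = "gpt-3.5-turbo") -> Dict[str, Any]:
    """Request body for the Chat Completions endpoint, prompt sent as a user message."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def extract_message_content(data: Dict[str, Any], default: str) -> str:
    """Return choices[0].message.content (stripped), or `default` if it is missing."""
    choices = data.get("choices") or []
    if not choices:
        return default
    content = (choices[0].get("message") or {}).get("content")
    return content.strip() if isinstance(content, str) else default


if __name__ == "__main__":
    # Hand-written sample in the shape main() already expects
    fake = {"choices": [{"message": {"role": "assistant", "content": " Sunny Days \n"}}]}
    print(extract_message_content(fake, "Default Title"))             # Sunny Days
    print(extract_message_content({"choices": []}, "Default Title"))  # Default Title
```

In main, after the status-code check, `title = extract_message_content(title_response.json(), "Default Title")` would then replace the nested if/else, and the summary could be handled the same way.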
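The script writes YT-settings.json with `ensure_ascii=False` so that Japanese or Chinese titles stay readable instead of being turned into `\uXXXX` escapes. The sketch below keeps that behavior and, as an added assumption not present in the original, writes to a temporary file in the same directory and then swaps it in with `os.replace`, so an interrupted run does not leave a half-written settings file for the upload step. save_settings is a hypothetical name.

```python
import json
import os
import tempfile


def save_settings(settings: dict, path: str) -> None:
    """Write `settings` to `path` as indented UTF-8 JSON without \\uXXXX escapes."""
    directory = os.path.dirname(os.path.abspath(path))
    # Assumption: write to a temp file in the same directory, then swap it in
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(settings, f, ensure_ascii=False, indent=4)
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        os.remove(tmp_path)  # do not leave the temp file behind on failure
        raise


if __name__ == "__main__":
    print(json.dumps({"title": "晴れ"}))                      # non-ASCII escaped as \uXXXX
    print(json.dumps({"title": "晴れ"}, ensure_ascii=False))  # {"title": "晴れ"}
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "YT-settings.json")
        save_settings({"title": "晴れ", "categoryId": "1"}, target)
        with open(target, encoding="utf-8") as f:
            print(f.read())
```

One side effect of this design: `mkstemp` creates the file readable only by its owner, and the final settings file keeps those permissions.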
