@sawata0324 (Sawata) posted at 2023-10-21

I want to split audio data into pieces of 25 MB or less in Google Colab

Q&A

Python whisper GoogleColaboratory ChatGPT-API

What I want to solve

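In Google Colaboratory, I am writing a script that connects the ChatGPT API with the Whisper API (speech-to-text) to produce meeting minutes from audio data.
The Whisper API only accepts audio files of 25 MB or less.
So I decided that when the loaded file exceeds 25 MB the script should compress it, and if it is still over 25 MB after compression, split it and send the pieces to Whisper one by one.
I am writing the code with help from ChatGPT (GPT-4) and the note article below, but I cannot figure out why it does not work.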
https://note.com/nyosubro/n/n450d2dc7cef1

I would appreciate it if you could tell me how to fix this.
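As a rough aid for choosing the split interval, the sketch below estimates how long a chunk can be at a given bitrate before it reaches the size limit. It assumes constant-bitrate audio. The names `WHISPER_LIMIT_BYTES`, `max_chunk_ms`, and the `safety` margin are hypothetical and not part of the original notebook. The byte limit is the one quoted in the error message further down.

```python
# Limit quoted in the API error message: 26,214,400 bytes (25 MiB).
WHISPER_LIMIT_BYTES = 26_214_400


def max_chunk_ms(bitrate_kbps: float,
                 limit_bytes: int = WHISPER_LIMIT_BYTES,
                 safety: float = 0.9) -> int:
    """Longest chunk duration (ms) whose encoded size stays under the limit.

    Assumes constant-bitrate audio. `safety` leaves headroom for container
    and upload overhead (an illustrative margin, not a documented figure).
    """
    bytes_per_ms = bitrate_kbps * 1000 / 8 / 1000  # kbit/s -> bytes per ms
    return int(limit_bytes * safety / bytes_per_ms)


if __name__ == "__main__":
    # 128 kbps MP3 versus 16-bit mono 44.1 kHz WAV (about 705.6 kbps).
    print("128 kbps:", max_chunk_ms(128), "ms")
    print("705.6 kbps:", max_chunk_ms(705.6), "ms")
```

By the same arithmetic, a 300-second chunk reaches 26,214,400 bytes only at roughly 699 kbps or more, which is closer to uncompressed WAV than to typical MP3 bitrates. If a 300-second chunk is rejected, it may be worth checking the format and bitrate in which the chunks are actually written.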

Problem / error encountered

---------------------------------------------------------------------------
APIError                                  Traceback (most recent call last)
<ipython-input-35-8087227227ad> in <cell line: 30>()
     29 transcripts = []
     30 for file in files_to_process:
---> 31     result = transcribe_audio(file)
     32     if result:
     33         transcripts.append(result)

4 frames
/usr/local/lib/python3.10/dist-packages/openai/api_requestor.py in _interpret_response_line(self, rbody, rcode, rheaders, stream)
    773         stream_error = stream and "error" in resp.data
    774         if stream_error or not 200 <= rcode < 300:
--> 775             raise self.handle_error_response(
    776                 rbody, rcode, resp.data, rheaders, stream_error=stream_error
    777             )

APIError: Maximum content size limit (26214400) exceeded (26401386 bytes read) {
  "error": {
    "message": "Maximum content size limit (26214400) exceeded (26401386 bytes read)",
    "type": "server_error",
    "param": null,
    "code": null
  }
} 413 {'error': {'message': 'Maximum content size limit (26214400) exceeded (26401386 bytes read)', 'type': 'server_error', 'param': None, 'code': None}} {'Date': 'Sat, 21 Oct 2023 03:20:58 GMT', 'Content-Type': 'application/json', 'Content-Length': '171', 'Connection': 'keep-alive', 'openai-processing-ms': '646', 'openai-version': '2020-10-01', 'strict-transport-security': 'max-age=15724800; includeSubDomains', 'x-ratelimit-limit-requests': '50', 'x-ratelimit-remaining-requests': '49', 'x-ratelimit-reset-requests': '1.2s', 'x-request-id': '7e51a31305a79f2f4b3ae514f5ec456c', 'CF-Cache-Status': 'DYNAMIC', 'Server': 'cloudflare', 'CF-RAY': '81965e13bc781877-ATL', 'alt-svc': 'h3=":443"; ma=86400'}
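The message reports a limit of 26,214,400 bytes and a request of 26,401,386 bytes, so at least one uploaded file was larger than expected. A minimal sketch of a pre-upload check is shown here. The helper `oversized` and the `MARGIN_BYTES` headroom are hypothetical, and the demo uses throwaway files rather than real audio.

```python
import os
import tempfile

# Limit quoted in the error message above (25 MiB).
API_LIMIT_BYTES = 26_214_400
# Illustrative headroom for multipart form overhead; not a documented value.
MARGIN_BYTES = 100_000


def oversized(paths, limit=API_LIMIT_BYTES - MARGIN_BYTES):
    """Return (path, size) pairs for files that are too large to upload."""
    sizes = [(p, os.path.getsize(p)) for p in paths]
    return [(p, size) for p, size in sizes if size > limit]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        small = os.path.join(tmp, "small.mp3")
        big = os.path.join(tmp, "big.mp3")
        with open(small, "wb") as f:
            f.write(b"\0" * 1_000)
        with open(big, "wb") as f:
            f.truncate(API_LIMIT_BYTES + 1)  # extend the file without writing data
        for file_path, size in oversized([small, big]):
            print(f"{os.path.basename(file_path)}: {size} bytes exceeds the limit")
```

Running such a check over `files_to_process` just before the transcription loop would show which file triggers the 413 response, and whether it is a split chunk or the compressed file itself.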

Source code

https://colab.research.google.com/drive/1C0yD-6PltgR_mzDituUv3JWnlpLelrgX?usp=share_link
(The ChatGPT API part works without problems.)

Excerpt of the code where the error occurs

import os
from os import path

from pydub import AudioSegment

# input_file, get_file_size, compress_audio, split_audio and transcribe_audio
# are defined elsewhere in the notebook.

compressed_file = "compressed.mp3"
output_folder = "output"

# Create the output folder
os.makedirs(output_folder, exist_ok=True)

# Try compression first
file_size = get_file_size(input_file)
compressed = False
if file_size > 25_000_000:
    compress_audio(input_file, compressed_file)
    file_size = get_file_size(compressed_file)
    compressed = True

# If the file is still over 25 MB after compression, split it
if file_size > 25_000_000:
    if not compressed:
        compress_audio(input_file, compressed_file)
    interval_ms = 300_000  # 300 seconds = 300,000 milliseconds
    split_audio(compressed_file, interval_ms, output_folder)

    duration_ms = len(AudioSegment.from_file(compressed_file, format="mp3"))
    stem = path.splitext(path.basename(compressed_file))[0]
    candidates = [f"{output_folder}/{stem}_{i}.mp3" for i in range(duration_ms // interval_ms + 1)]
    # The last index may not exist when the duration is an exact multiple of
    # interval_ms, so keep only the files that were actually written.
    files_to_process = [p for p in candidates if path.exists(p)]

elif compressed:
    files_to_process = [compressed_file]
else:
    files_to_process = [input_file]

# Transcribe
transcripts = []
for file in files_to_process:
    result = transcribe_audio(file)
    if result:
        transcripts.append(result)

# Write the transcription results to a text file
output_file = f"transcript_{input_file}.txt"
with open(output_file, "w", encoding="utf-8") as file:
    for transcript_line in transcripts:
        if transcript_line:
            # Strip surrounding whitespace and write each chunk's transcript on its own line.
            file.write(transcript_line.strip() + "\n")

print(f"Transcription results were written to {output_file}.")
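Since `split_audio` itself is not shown in the excerpt, the following is only a sketch of how chunk boundaries and file names could be derived so that the list of expected files matches what a splitter produces. `chunk_ranges` and `chunk_paths` are hypothetical helpers. The `<stem>_<index>.mp3` naming is taken from the excerpt, and the pydub call in the final comment assumes the chunks are exported as MP3.

```python
from os import path


def chunk_ranges(duration_ms: int, interval_ms: int):
    """Return (start_ms, end_ms) pairs covering [0, duration_ms)."""
    return [(start, min(start + interval_ms, duration_ms))
            for start in range(0, duration_ms, interval_ms)]


def chunk_paths(source_file: str, n_chunks: int, output_folder: str):
    """File names in the `<stem>_<index>.mp3` pattern used in the excerpt."""
    stem = path.splitext(path.basename(source_file))[0]
    return [f"{output_folder}/{stem}_{i}.mp3" for i in range(n_chunks)]


if __name__ == "__main__":
    ranges = chunk_ranges(duration_ms=600_000, interval_ms=300_000)
    # Two chunks here, whereas `600_000 // 300_000 + 1` would count three.
    print(ranges)
    print(chunk_paths("compressed.mp3", len(ranges), "output"))
    # With pydub, each chunk could then be written as, for example:
    #   audio[start:end].export(out_path, format="mp3")
```

Deriving both the slices and the file names from the same list of ranges keeps the two in step, so the transcription loop never asks for a chunk that was not written.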
