ChatGPT Advent Calendar 2024

I wrote a script that outputs Markdown files to help ChatGPT understand the contents of a repository

Posted at 2024-11-16 (last updated at 2024-11-16)

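I wrote a Python script that exports the contents of a GitHub repository as Markdown, a format that ChatGPT can read easily. With the exported files, you can ask ChatGPT to analyze the repository's code or discuss how to resolve an Issue.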

Script

The script is shown below. It recursively reads the files in the specified repository and writes them out as Markdown, starting a new output file whenever the current one would exceed 10 MB.

import os

def write_code_to_markdown_split(repo_path, output_dir, max_size_mb=10):
    """
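    Recursively read the source code in a repository and write it out
    as Markdown, split into files of at most max_size_mb MB each.
    A single source file is never split across output files.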
    """
    # Create the output directory if it does not exist yet
    os.makedirs(output_dir, exist_ok=True)

    current_file_size = 0
    file_counter = 1
    current_md_file = None

    def open_new_md_file():
        """
        Open a new Markdown file and return its handle and path.
        """
        nonlocal file_counter, current_file_size
        file_path = os.path.join(output_dir, f"repository_code_part_{file_counter}.md")
        md_file = open(file_path, "w", encoding="utf-8")
        md_file.write("# Repository Code Overview\n\n")
        md_file.write(f"Source: `{repo_path}`\n\n")
        current_file_size = 0
        file_counter += 1
        return md_file, file_path

    # Open the first Markdown file
    current_md_file, current_file_path = open_new_md_file()

    def write_file_content(file_path, relative_path):
        """
        Write one file's contents to the current Markdown file as a fenced code block.
        """
        nonlocal current_file_size, current_md_file

        header = f"## {relative_path}\n\n"
        try:
            with open(file_path, "r", encoding="utf-8") as f:
                content = f.read()
        except Exception as e:
            content = f"Unable to read file: {e}"

        code_block = f"```{os.path.splitext(file_path)[-1][1:]}\n{content}\n```\n\n"
        entry_size = len(header.encode("utf-8")) + len(code_block.encode("utf-8"))

        # If this entry does not fit in the current Markdown file, open a new one
        if current_file_size + entry_size > max_size_mb * 1024 * 1024:
            current_md_file.close()
            current_md_file, current_file_path = open_new_md_file()

        # Write the entry to the current Markdown file
        current_md_file.write(header)
        current_md_file.write(code_block)
        current_file_size += entry_size

    def process_directory(dir_path, relative_path):
        """
        Recursively process every file in the directory.
        """
        for item in sorted(os.listdir(dir_path)):
            item_path = os.path.join(dir_path, item)
            item_relative_path = os.path.join(relative_path, item)
            if os.path.isfile(item_path):
                write_file_content(item_path, item_relative_path)
            elif os.path.isdir(item_path):
                process_directory(item_path, item_relative_path)

    # Start processing the repository
    process_directory(repo_path, "")

    # Close the last Markdown file
    if current_md_file:
        current_md_file.close()

# Example usage
if __name__ == "__main__":
    repository_path = "./my_repository"  # Path to the repository
    output_directory = "./output_markdown"  # Directory for the generated Markdown files
    max_file_size_mb = 10  # Maximum size of each Markdown file, in MB
    write_code_to_markdown_split(repository_path, output_directory, max_file_size_mb)
    print(f"Markdown files created in: {output_directory}")
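
Because process_directory visits every entry returned by os.listdir, it also descends into directories such as .git and tries to read binary files as UTF-8. The following is a minimal sketch of a filter that could be checked before calling write_file_content. The helper name should_skip and both skip lists are assumptions made for illustration and are not part of the original script.

```python
import os

# Hypothetical skip lists; adjust them to the repository at hand.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}
SKIP_EXTENSIONS = {".png", ".jpg", ".gif", ".pdf", ".zip", ".pyc"}


def should_skip(path: str) -> bool:
    """Return True if any path component is a skipped directory name
    or the file extension is in the binary-extension list."""
    parts = os.path.normpath(path).split(os.sep)
    if any(part in SKIP_DIRS for part in parts):
        return True
    extension = os.path.splitext(path)[1].lower()
    return extension in SKIP_EXTENSIONS
```

Skipping these entries would also keep the output smaller, since unreadable files otherwise each add an error entry to the Markdown.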

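Script features

1. Exports the repository as Markdown

Each file's code is wrapped in a fenced code block (```) under its own header.
Each header is the file's path relative to the repository root, so the directory structure is preserved in a readable form.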
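
One detail worth illustrating is what happens when a source file itself contains a code fence, as Markdown files often do: a fixed three-backtick fence would be closed early. The sketch below formats a single entry the same way the script does (header plus fenced block, with the file extension as the fence language), but picks a fence longer than any backtick run in the content. The function name format_entry and the fence-length rule are assumptions added here, not the original script's behavior.

```python
import os
import re


def format_entry(relative_path: str, content: str) -> str:
    """Format one file as a Markdown header followed by a fenced code block."""
    # Use the file extension (without the dot) as the fence language.
    language = os.path.splitext(relative_path)[1][1:]
    # Make the fence one backtick longer than the longest backtick run
    # in the content, with a minimum of three backticks.
    longest_run = max((len(run) for run in re.findall(r"`+", content)), default=0)
    fence = "`" * max(3, longest_run + 1)
    return f"## {relative_path}\n\n{fence}{language}\n{content}\n{fence}\n\n"
```

For ordinary source files this produces the same output as the script; only files that contain backtick fences get a longer outer fence.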

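2. Splits the output into files of up to 10 MB

The output is split across several Markdown files so that no single file grows too large. This keeps each file under ChatGPT's file size limit (20 MB as cited at the time of writing).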
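
The splitting rule can be isolated as a small pure function, which makes it easier to reason about. The sketch below groups already-formatted entries into chunks by their UTF-8 byte size, since byte size rather than character count is what a file size limit measures. The name split_by_size is hypothetical. Unlike the script, this version does not start a new chunk while the current one is still empty, so an oversized first entry does not leave an empty file behind.

```python
def split_by_size(entries: list[str], max_bytes: int) -> list[list[str]]:
    """Group entries into chunks whose UTF-8 size stays within max_bytes.

    An entry that is larger than max_bytes on its own still gets a chunk
    to itself, so that chunk can exceed the limit.
    """
    chunks: list[list[str]] = [[]]
    current_size = 0
    for entry in entries:
        entry_size = len(entry.encode("utf-8"))
        # Start a new chunk only if the current one already holds something.
        if chunks[-1] and current_size + entry_size > max_bytes:
            chunks.append([])
            current_size = 0
        chunks[-1].append(entry)
        current_size += entry_size
    return chunks
```

With the script's default, max_bytes would be 10 * 1024 * 1024.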

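3. Error handling

If a file cannot be read, for example because it is binary or not UTF-8 encoded, the script records the error message in the Markdown output in place of the file's contents.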
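
The script catches Exception broadly. As a variation, the sketch below separates the two failure modes that are most likely in practice: content that is not valid UTF-8, and I/O errors such as a missing file or missing permissions. The function name read_text_or_note and the wording of the notes are assumptions for illustration.

```python
def read_text_or_note(file_path: str) -> str:
    """Return the file's text, or a short note if it cannot be read as UTF-8."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        # Typically a binary file or a text file in another encoding.
        return "Unable to read file: not valid UTF-8 (possibly binary)"
    except OSError as e:
        # Missing file, permission problem, and similar I/O failures.
        return f"Unable to read file: {e}"
```

Catching only these exceptions means that unexpected errors, such as a programming mistake, still surface instead of being written into the Markdown.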

Output example

For example, given a repository like this:

my_repository/
├── main.py
├── utils/
│   ├── helper.py
│   └── constants.py
└── data/
    └── sample.txt

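The generated Markdown files look like the following. The listing is abridged for illustration: utils/constants.py is omitted, entries actually appear in sorted path order, and a second file is created only once the current one would exceed the size limit. The fence language is the file extension, so .py files are tagged py.

repository_code_part_1.md

````markdown
# Repository Code Overview

Source: `./my_repository`

## main.py

```py
# Main script
def hello_world():
    print("Hello, world!")
```

## utils/helper.py

```py
# Helper functions
def add(a, b):
    return a + b
```
````

repository_code_part_2.md

````markdown
# Repository Code Overview

Source: `./my_repository`

## data/sample.txt

```txt
Sample text data.
```
````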

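Usage

1. Save the script locally.
2. Set repository_path to the path of the repository.
3. Set output_directory to the directory where the Markdown files should be saved.
4. Run the script. The split Markdown files are generated in the specified directory.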
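
Editing the variables in the script works, but the same settings could also be passed on the command line. The sketch below shows one way to do that with argparse. The function parse_args and the option names -o/--output-dir and --max-size-mb are hypothetical and are not part of the original script; the defaults mirror the values used in the example above.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments for the export script (hypothetical CLI)."""
    parser = argparse.ArgumentParser(
        description="Export a repository as Markdown files for ChatGPT."
    )
    parser.add_argument("repo_path", help="path to the repository to export")
    parser.add_argument(
        "-o", "--output-dir", default="./output_markdown",
        help="directory that receives the Markdown files",
    )
    parser.add_argument(
        "--max-size-mb", type=int, default=10,
        help="maximum size of each Markdown file in MB",
    )
    return parser.parse_args(argv)
```

The parsed values could then be passed on as write_code_to_markdown_split(args.repo_path, args.output_dir, args.max_size_mb).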

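Use cases

Working with ChatGPT:
- Passing the exported Markdown files to ChatGPT makes it more efficient to analyze an entire repository.
Code review support:
- The files can also be used to share an overview of the repository when a team reviews code.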

If you enjoyed this article, please buy me a coffee! ↓
https://buymeacoffee.com/takurot
