Trying Out "Agentless" for AI Code Generation

Last updated at 2025-02-01, posted at 2025-01-07

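Overview

A while ago Agentless was reported to have scored well on SWE-bench, so I took a look at how it works inside.

The explainer titled 「Agentless」という最新手法。LLMの新しい使い方。 ("Agentless, a cutting-edge technique. A new way to use LLMs.") is very easy to follow and helps a lot in understanding the internals.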

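Big picture

Localization: Use an LLM and embeddings to identify the relevant files and locations (File-level → Element-level → Line-level)
Repair: Generate multiple patches per issue, each consisting of line locations and a change diff
Validation and Selection: Run the regression tests (existing tests), generate multiple reproduction tests for the issue, and decide the final patch from these results

Steps

1. Preparation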

Repo setup

gh repo clone OpenAutoCoder/agentless

pip install -r requirements.txt

repo structure

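If the repo structures have not been downloaded, a program that generates the structure of each repo runs, which takes a long time, so download them beforehand from https://github.com/OpenAutoCoder/Agentless/releases/tag/v1.5.0

They hold the repository information for each issue.

Setting environment variables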

export OPENAI_API_KEY=xxxxxx
export PROJECT_FILE_LOC=~/Downloads/repo_structure/repo_structures
export PYTHONPATH=$PYTHONPATH:$(pwd)

~/Downloads/repo_structure/repo_structures is the path to the archive downloaded above, after unzipping it.
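
Before starting the longer steps, it can help to confirm that the variables above are actually set. The following is a minimal sketch of such a check; the function name check_env is my own and is not part of Agentless.

```python
import os
from pathlib import Path


def check_env(env=None):
    """Return a list of human-readable problems with the environment (empty if fine)."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    structures = env.get("PROJECT_FILE_LOC")
    if not structures:
        problems.append("PROJECT_FILE_LOC is not set")
    elif not Path(structures).expanduser().is_dir():
        # The variable should point at the unzipped repo_structures directory.
        problems.append(f"PROJECT_FILE_LOC is not a directory: {structures}")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print(problem)
```

An empty list means both variables look usable; it does not verify that the API key is valid.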

2. Localization

This time, I run it for a single example only: --target_id=django__django-10914

2.1. File level --file_level

2.1.1. File level --file_level - retrieve related files

python agentless/fl/localize.py --file_level --output_folder results/swe-bench-lite/file_level --num_threads 10 --skip_existing --target_id=django__django-10914

Prompt

Please look through the following GitHub problem description and Repository structure and provide a list of files that one would need to edit to fix the problem.

### GitHub Problem Description ###
Set default FILE_UPLOAD_PERMISSION to 0o644.
Description
	
Hello,
As far as I can see, the File Uploads documentation page does not mention any permission issues.
What I would like to see is a warning that in absence of explicitly configured FILE_UPLOAD_PERMISSIONS, the permissions for a file uploaded to FileSystemStorage might not be consistent depending on whether a MemoryUploadedFile or a TemporaryUploadedFile was used for temporary storage of the uploaded data (which, with the default FILE_UPLOAD_HANDLERS, in turn depends on the uploaded data size).
The tempfile.NamedTemporaryFile + os.rename sequence causes the resulting file permissions to be 0o0600 on some systems (I experience it here on CentOS 7.4.1708 and Python 3.6.5). In all probability, the implementation of Python's built-in tempfile module explicitly sets such permissions for temporary files due to security considerations.
I found mentions of this issue on GitHub, but did not manage to find any existing bug report in Django's bug tracker.


###

### Repository Structure ###
django/
    setup.py
    __init__.py
    __main__.py
    shortcuts.py
    conf/
        __init__.py
        global_settings.py
    ... (all files)

###

Please only provide the full path and return at most 5 files.
The returned files should be separated by new lines ordered by most to least important and wrapped with ```
For example:
```
file1.py
file2.py
```

Example result

```
django/core/files/storage.py
django/conf/global_settings.py
django/core/files/uploadhandler.py
django/core/files/uploadedfile.py
docs/conf.py
```
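
The prompt asks for at most five paths wrapped in a fenced block, so the reply has to be parsed back into a list. The sketch below shows one way to do that; extract_files is a hypothetical helper written for this article, not the parser Agentless itself uses.

```python
import re


def extract_files(reply: str, max_files: int = 5) -> list[str]:
    """Return the paths listed inside the first fenced block of an LLM reply."""
    match = re.search(r"```[^\n]*\n(.*?)```", reply, re.DOTALL)
    # Fall back to the whole reply if the model forgot the fence.
    body = match.group(1) if match else reply
    files = [line.strip() for line in body.splitlines() if line.strip()]
    return files[:max_files]


reply = "```\ndjango/core/files/storage.py\ndjango/conf/global_settings.py\n```"
print(extract_files(reply))
```

Capping the list at max_files mirrors the "at most 5 files" instruction in case the model returns more.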

2.1.2. File level --file_level - identify irrelevant folders --irrelevant

python agentless/fl/localize.py --file_level --output_folder results/swe-bench-lite/file_level_irrelevant --num_threads 10 --skip_existing --target_id=django__django-10914 --irrelevant

Prompt

Please look through the following GitHub problem description and Repository structure and provide a list of folders that are irrelevant to fixing the problem.
Note that irrelevant folders are those that do not need to be modified and are safe to ignored when trying to solve this problem.

### GitHub Problem Description ###
Set default FILE_UPLOAD_PERMISSION to 0o644.
Description
	
Hello,
As far as I can see, the File Uploads documentation page does not mention any permission issues.
What I would like to see is a warning that in absence of explicitly configured FILE_UPLOAD_PERMISSIONS, the permissions for a file uploaded to FileSystemStorage might not be consistent depending on whether a MemoryUploadedFile or a TemporaryUploadedFile was used for temporary storage of the uploaded data (which, with the default FILE_UPLOAD_HANDLERS, in turn depends on the uploaded data size).
The tempfile.NamedTemporaryFile + os.rename sequence causes the resulting file permissions to be 0o0600 on some systems (I experience it here on CentOS 7.4.1708 and Python 3.6.5). In all probability, the implementation of Python's built-in tempfile module explicitly sets such permissions for temporary files due to security considerations.
I found mentions of this issue on GitHub, but did not manage to find any existing bug report in Django's bug tracker.


###

### Repository Structure ###
django/
    setup.py
    __init__.py
    __main__.py
    shortcuts.py
    conf/
        __init__.py
     ... (all files)

###

Please only provide the full path.
Remember that any subfolders will be considered as irrelevant if you provide the parent folder.
Please ensure that the provided irrelevant folders do not include any important files needed to fix the problem
The returned folders should be separated by new lines and wrapped with ```
For example:
```
folder1/
folder2/folder3/
folder4/folder5/
```

Example result

```
django/conf/locale/
django/urls/
django/middleware/
django/db/
django/forms/
django/core/management/
django/core/cache/
django/core/mail/
django/core/servers/
django/core/serializers/
django/core/checks/
django/core/handlers/
django/utils/
django/templatetags/
django/template/
django/contrib/
django/dispatch/
django/apps/
django/views/
docs/
scripts/
```
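
Because the prompt says that naming a parent folder also covers its subfolders, the filter amounts to a path-prefix check. Here is a minimal sketch under that assumption; filter_irrelevant and the sample file list are mine, for illustration only.

```python
def filter_irrelevant(files: list[str], irrelevant_folders: list[str]) -> list[str]:
    """Drop every file that lives under one of the irrelevant folders."""
    # Normalise to a trailing slash so "django/db" does not also match "django/dbx/".
    prefixes = tuple(f if f.endswith("/") else f + "/" for f in irrelevant_folders)
    return [path for path in files if not path.startswith(prefixes)]


files = [
    "django/core/files/storage.py",
    "django/db/models/base.py",
    "django/conf/global_settings.py",
    "django/conf/locale/en/formats.py",
    "docs/conf.py",
]
irrelevant = ["django/conf/locale/", "django/db/", "docs/"]
kept = filter_irrelevant(files, irrelevant)
print(kept)
```

Note that django/conf/global_settings.py survives even though django/conf/locale/ is excluded, since only the subfolder was listed.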

2.1.3. File level retrieve - retrieve related files via embeddings

Retrieve related files with embeddings while filtering out the irrelevant folders identified in the previous step.

python agentless/fl/retrieve.py --index_type simple \
                                --filter_type given_files \
                                --filter_file results/swe-bench-lite/file_level_irrelevant/loc_outputs.jsonl \
                                --output_folder results/swe-bench-lite/retrievel_embedding \
                                --persist_dir embedding/swe-bench_simple \
                                --num_threads 10 \
                                --target_id=django__django-10914
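
Conceptually, embedding retrieval ranks the remaining files by how similar their vectors are to the issue text. The sketch below uses tiny hand-made vectors and plain cosine similarity to show the idea; the vectors, the retrieve function, and the top_k parameter are all assumptions for illustration, and the real retrieve.py works on chunked file contents with an actual embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, file_vecs, irrelevant_folders, top_k=3):
    """Rank files outside the irrelevant folders by similarity to the query."""
    prefixes = tuple(irrelevant_folders)
    candidates = {p: v for p, v in file_vecs.items() if not p.startswith(prefixes)}
    ranked = sorted(candidates, key=lambda p: cosine(query_vec, candidates[p]), reverse=True)
    return ranked[:top_k]


# Toy 3-dimensional "embeddings" (made up for this example).
file_vecs = {
    "django/core/files/storage.py": [0.9, 0.1, 0.0],
    "django/conf/global_settings.py": [0.7, 0.3, 0.1],
    "django/db/models/base.py": [0.95, 0.05, 0.0],
    "django/http/request.py": [0.1, 0.9, 0.2],
}
issue_vec = [1.0, 0.0, 0.0]
top = retrieve(issue_vec, file_vecs, ["django/db/"], top_k=2)
print(top)
```

The file under django/db/ would have scored highest, but it never enters the ranking because it was filtered out first.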

2.1.4. File level combine - merge the files found by the LLM and by embeddings

Take the top N related files from the LLM results and from the embedding results, and merge them.

python agentless/fl/combine.py  --retrieval_loc_file results/swe-bench-lite/retrievel_embedding/retrieve_locs.jsonl \
                                --model_loc_file results/swe-bench-lite/file_level/loc_outputs.jsonl \
                                --top_n 3 \
                                --output_folder results/swe-bench-lite/file_level_combined
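
One plausible reading of the merge is "take the top N from each list and drop duplicates". The sketch below implements exactly that and nothing more; I have not checked how combine.py orders or weights the two sources, so treat the ordering here as an assumption. The retrieved list is a made-up example.

```python
def combine(model_files, retrieved_files, top_n=3):
    """Merge the top_n entries of both lists, keeping first-seen order without duplicates."""
    merged = []
    for path in model_files[:top_n] + retrieved_files[:top_n]:
        if path not in merged:
            merged.append(path)
    return merged


model_files = [
    "django/core/files/storage.py",
    "django/conf/global_settings.py",
    "django/core/files/uploadhandler.py",
    "django/core/files/uploadedfile.py",
    "docs/conf.py",
]
# Hypothetical embedding result, for illustration only.
retrieved_files = [
    "django/conf/global_settings.py",
    "django/core/files/move.py",
    "django/core/files/storage.py",
]
print(combine(model_files, retrieved_files, top_n=3))
```

With top_n=3 the merged list can hold between three and six files, depending on how much the two sources overlap.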

2.2. Related element level --related_level

For the related files obtained from the LLM and embeddings in the previous step, retrieve the related code at the element level.

python agentless/fl/localize.py --related_level \
                                --output_folder results/swe-bench-lite/related_elements \
                                --top_n 3 \
                                --compress_assign \
                                --compress \
                                --start_file results/swe-bench-lite/file_level_combined/combined_locs.jsonl \
                                --num_threads 10 \
                                --skip_existing \
                                --target_id=django__django-10914

The prompt is:

Please provide the complete set of locations as either a class name, a function name, or a variable name.
Note that if you include a class, you do not need to list its specific methods.
You can include either the entire class or don't include the class name and instead include specific methods in the class.
### Examples:
```
full_path1/file1.py
function: my_function_1
class: MyClass1
function: MyClass2.my_method

full_path2/file2.py
variable: my_var
function: MyClass3.my_method
full_path3/file3.py
function: my_function_2
function: my_function_3
function: MyClass4.my_method_1
class: MyClass5
```

Return just the locations wrapped with ```.

Example of the elements actually found as places to change:

{
    "django/core/files/storage.py": ["class: FileSystemStorage"],
    "django/conf/global_settings.py": ["variable: FILE_UPLOAD_PERMISSIONS"], 
    "django/core/files/uploadhandler.py": [""]
}
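
The reply format in the prompt (a file path followed by "kind: name" lines) maps naturally onto a dict like the one above. The following sketch parses such a reply; parse_elements is a hypothetical helper, and the reply string is reconstructed from the example result rather than copied from a real log.

```python
KINDS = {"class", "function", "variable"}


def parse_elements(reply: str) -> dict[str, list[str]]:
    """Group 'kind: name' entries under the file path line that precedes them."""
    found: dict[str, list[str]] = {}
    current = None
    for raw in reply.splitlines():
        line = raw.strip()
        if not line or line.startswith("```"):
            continue
        kind = line.split(":", 1)[0]
        if ":" in line and kind in KINDS:
            if current is not None:
                found[current].append(line)
        else:
            current = line  # anything else is treated as a file path
            found.setdefault(current, [])
    return found


reply = (
    "```\ndjango/core/files/storage.py\nclass: FileSystemStorage\n\n"
    "django/conf/global_settings.py\nvariable: FILE_UPLOAD_PERMISSIONS\n```"
)
print(parse_elements(reply))
```

Entries that appear before any file path are ignored, which keeps a malformed reply from raising an error.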

2.3. Related line level --fine_grain_line_level

Generate samples of line-level edit locations from the related elements (this example creates 4 samples).

python agentless/fl/localize.py --fine_grain_line_level \
                                --output_folder results/swe-bench-lite/edit_location_samples \
                                --top_n 3 \
                                --compress \
                                --temperature 0.8 \
                                --num_samples 4 \
                                --start_file results/swe-bench-lite/related_elements/loc_outputs.jsonl \
                                --num_threads 10 \
                                --skip_existing \
                                --target_id=django__django-10914

Prompt (excerpt):

Please provide the class name, function or method name, or the exact line numbers that need to be edited.
The possible location outputs should be either "class", "function" or "line".

### Examples:
```
full_path1/file1.py
line: 10
class: MyClass1
line: 51

full_path2/file2.py
function: MyClass2.my_method
line: 12

full_path3/file3.py
function: my_function
line: 24
line: 156
```

Return just the location(s) wrapped with ```.

example result:

["```\ndjango/core/files/storage.py\nline: 260\nline: 217\n\ndjango/conf/global_settings.py\nline: 307\n```", "```\nfull_path1/django/core/files/storage.py\nline: 260\nline: 284\n\nfull_path2/django/conf/global_settings.py\nline: 307\n```", "```\ndjango/core/files/storage.py\nline: 260\n\ndjango/conf/global_settings.py\nline: 307\n```", "```\ndjango/conf/global_settings.py\nline: 307\n\ndjango/core/files/storage.py\nline: 260\n```"]
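
To compare the four samples, it is handy to parse each one into file and line numbers and count how often each location appears. The sketch below does this for the result above; the helper name, the stripping of a stray full_pathN/ prefix (which one sample copied from the prompt's example), and the tally itself are my own additions for inspection, not something Agentless is confirmed to do.

```python
import re
from collections import Counter


def parse_line_sample(sample: str) -> dict[str, list[int]]:
    """Turn one raw sample into {file path: [line numbers]}."""
    result: dict[str, list[int]] = {}
    current = None
    for raw in sample.splitlines():
        line = raw.strip()
        if line.endswith(".py"):
            # Drop a leading "full_path1/" style prefix echoed from the prompt example.
            current = re.sub(r"^full_path\d+/", "", line)
            result.setdefault(current, [])
        elif line.startswith("line:") and current is not None:
            result[current].append(int(line.split(":", 1)[1]))
    return result


samples = [
    "```\ndjango/core/files/storage.py\nline: 260\nline: 217\n\ndjango/conf/global_settings.py\nline: 307\n```",
    "```\nfull_path1/django/core/files/storage.py\nline: 260\nline: 284\n\nfull_path2/django/conf/global_settings.py\nline: 307\n```",
    "```\ndjango/core/files/storage.py\nline: 260\n\ndjango/conf/global_settings.py\nline: 307\n```",
    "```\ndjango/conf/global_settings.py\nline: 307\n\ndjango/core/files/storage.py\nline: 260\n```",
]
votes = Counter(
    (path, number)
    for sample in samples
    for path, numbers in parse_line_sample(sample).items()
    for number in numbers
)
for (path, number), count in votes.most_common():
    print(f"{path}:{number} appears in {count} sample(s)")
```

All four samples agree on storage.py line 260 and global_settings.py line 307, while lines 217 and 284 each appear only once.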

Format of loc_outputs.jsonl

instance_id: task ID of the issue
found_files: list of files localized by the model
additional_artifact_loc_file: raw output of the model during file-level localization
file_traj: trajectory of the model during file-level localization (e.g., # of tokens)
found_related_locs: dict of relevant code elements localized by the model
additional_artifact_loc_related: raw output of the model during relevant-code-level localization
related_loc_traj: trajectory of the model during relevant-code-level localization
found_edit_locs: dict of edit locations localized by the model
additional_artifact_loc_edit_location: raw output of the model during edit-location-level localization
edit_loc_traj: trajectory of the model during edit-location-level localization
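
Since the file is JSON Lines (one JSON object per line), a single issue can be looked up with the standard library alone. This is a minimal sketch; load_record is a hypothetical helper, and the sample record contains only a few of the fields listed above with made-up values.

```python
import json
import tempfile
from pathlib import Path


def load_record(jsonl_path, instance_id):
    """Return the first record whose instance_id matches, or None."""
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                record = json.loads(line)
                if record.get("instance_id") == instance_id:
                    return record
    return None


# Hypothetical record, written to a temporary file so the example is self-contained.
sample = {
    "instance_id": "django__django-10914",
    "found_files": ["django/core/files/storage.py", "django/conf/global_settings.py"],
    "found_related_locs": {"django/core/files/storage.py": ["class: FileSystemStorage"]},
}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "loc_outputs.jsonl"
    path.write_text(json.dumps(sample) + "\n", encoding="utf-8")
    record = load_record(path, "django__django-10914")
    missing = load_record(path, "no_such_id")

print(record["found_files"])
```

In practice the path would be something like results/swe-bench-lite/file_level/loc_outputs.jsonl from the steps above.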

2.4. Split per edit location --merge

Split the line-level edit location samples in edit_location_samples into one file per sample.

python agentless/fl/localize.py --merge \
                                --output_folder results/swe-bench-lite/edit_location_individual \
                                --top_n 3 \
                                --num_samples 4 \
                                --start_file results/swe-bench-lite/edit_location_samples/loc_outputs.jsonl \
                                --target_id=django__django-10914

The four samples generated in the example above are split into separate files, one per sample.

tree results/swe-bench-lite/edit_location_individual                                                                       
results/swe-bench-lite/edit_location_individual
├── args.json
├── loc_merged_0-0_outputs.jsonl
├── loc_merged_1-1_outputs.jsonl
├── loc_merged_2-2_outputs.jsonl
├── loc_merged_3-3_outputs.jsonl
└── localization_logs

2 directories, 5 files

3. Repair

Patch generation

Generate multiple patches per issue and decide the final patch by voting (this example writes to results/swe-bench-lite/repair_sample_1).

python agentless/repair/repair.py --loc_file results/swe-bench-lite/edit_location_individual/loc_merged_0-0_outputs.jsonl \
                                  --output_folder results/swe-bench-lite/repair_sample_1 \
                                  --loc_interval \
                                  --top_n=3 \
                                  --context_window=10 \
                                  --max_samples 10  \
                                  --cot \
                                  --diff_format \
                                  --gen_and_process \
                                  --num_threads 2 \
                                  --target_id=django__django-10914

This generates multiple patches.

Run the remaining three of the four samples as well:
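
Judging by the flag names, --context_window=10 together with --loc_interval suggests that the model is shown a window of lines around each edit location rather than the whole file. The sketch below illustrates that idea by expanding each line by 10 lines on both sides and merging windows that touch; this is my interpretation of the flags, not a description of repair.py.

```python
def context_intervals(lines, context_window=10, file_length=None):
    """Expand each line number into a (start, end) window and merge overlapping windows."""
    intervals = []
    for number in sorted(lines):
        start = max(1, number - context_window)
        end = number + context_window
        if file_length is not None:
            end = min(end, file_length)
        if intervals and start <= intervals[-1][1] + 1:
            # Overlaps or touches the previous window: extend it instead of adding a new one.
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
        else:
            intervals.append((start, end))
    return intervals


print(context_intervals([260, 217]))
print(context_intervals([260, 275]))
```

The first call gives two separate windows, (207, 227) and (250, 270), while the second merges into a single (250, 285) window because the two locations are close together.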

for i in {1..3}; do
    python agentless/repair/repair.py --loc_file results/swe-bench-lite/edit_location_individual/loc_merged_${i}-${i}_outputs.jsonl \
                                    --output_folder results/swe-bench-lite/repair_sample_$((i+1)) \
                                    --loc_interval \
                                    --top_n=3 \
                                    --context_window=10 \
                                    --max_samples 10  \
                                    --cot \
                                    --diff_format \
                                    --gen_and_process \
                                    --num_threads 2 \
                                    --target_id=django__django-10914
done

The generated patches are written to repair_sample_1, repair_sample_2, repair_sample_3, and repair_sample_4, respectively.
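
As far as I understand, --diff_format makes the model emit search/replace style edits instead of rewriting whole files. Assuming that is the case, applying one such edit boils down to an exact-match substitution, sketched below; apply_search_replace and the settings snippet are illustrative and are not taken from the Agentless code or from an actual generated patch.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply one search/replace edit, refusing ambiguous or missing matches."""
    occurrences = source.count(search)
    if occurrences != 1:
        raise ValueError(f"search text must match exactly once, found {occurrences}")
    return source.replace(search, replace, 1)


# Illustrative fragment in the spirit of django/conf/global_settings.py.
original = "FILE_UPLOAD_TEMP_DIR = None\nFILE_UPLOAD_PERMISSIONS = None\n"
patched = apply_search_replace(
    original,
    "FILE_UPLOAD_PERMISSIONS = None",
    "FILE_UPLOAD_PERMISSIONS = 0o644",
)
print(patched)
```

Requiring exactly one match is a design choice of this sketch: it turns a hallucinated or ambiguous search string into an explicit error instead of a silent wrong edit.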

4. Patch Validation and Selection

Below, specifying --instance_ids=django__django-10914 restricts the run to the issue targeted in this article.

4.1. Regression Test Selection

To run the regression tests, first select the relevant ones (tests that already exist in the repo).

python agentless/test/run_regression_tests.py --run_id generate_regression_tests \
                                              --output_file results/swe-bench-lite/passing_tests.jsonl \
                                              --instance_ids=django__django-10914

I ran this on macOS, but building the environment image failed at this step.

    raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: Error building image sweb.env.arm64.2baaea72acc974f6c02079:latest: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1
Check (logs/build_images/env/sweb.env.arm64.2baaea72acc974f6c02079__latest/build_image.log) for more information.
Building environment images: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:27<00:00, 27.58s/it]
1 environment images failed to build.
Found 0 existing instance images. Will reuse them.
  0%|                                                                                                                                                                                                                                    | 0/1 [00:00<?, ?it/s]Error building image django__django-10914: Environment image sweb.env.arm64.2baaea72acc974f6c02079:latest not found for django__django-10914
Check (logs/run_evaluation/generate_regression_tests/test/django__django-10914/run_instance.log) for more information.

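The log showed "No such image: sweb.env.arm64.2baaea72acc974f6c02079:latest".

Setting up https://github.com/swe-bench/SWE-bench looks like the fix, so I will handle that separately. As of 2025/01/08 I have not been able to run the commands below. (TODO)

Use the LLM to exclude tests that do not need to be run: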

python agentless/test/select_regression_tests.py --passing_tests results/swe-bench-lite/passing_tests.jsonl \
                                                 --output_folder results/swe-bench-lite/select_regression

Run the selected tests against the patches:

folder=results/swe-bench-lite/repair_sample_1
for num in {0..9..1}; do
    run_id_prefix=$(basename $folder); 
    python agentless/test/run_regression_tests.py --regression_tests results/swe-bench-lite/select_regression/output.jsonl \
                                                  --predictions_path="${folder}/output_${num}_processed.jsonl" \
                                                  --run_id="${run_id_prefix}_regression_${num}" --num_workers 10;
done

4.2. Reproduction test generation

Generate reproduction tests to check whether a patch actually resolves the issue. As with patch generation, multiple reproduction tests are generated.

python agentless/test/generate_reproduction_tests.py --max_samples 40 \
                                                     --output_folder results/swe-bench-lite/reproduction_test_samples \
                                                     --num_threads 10

Execute the generated reproduction tests:

for st in {0..36..4}; do   en=$((st + 3));   
        echo "Processing ${st} to ${en}";   
        for num in $(seq $st $en); do     
            echo "Processing ${num}";     
            python agentless/test/run_reproduction_tests.py --run_id="reproduction_test_generation_filter_sample_${num}" \
                                                            --test_jsonl="results/swe-bench-lite/reproduction_test_samples/output_${num}_processed_reproduction_test.jsonl" \
                                                            --num_workers 6 \
                                                            --testing;
done & done
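
The shell loop above walks the 40 samples in groups of four: the outer loop steps through 0, 4, ..., 36, and the inner loop runs the four samples of a group one after another. Because of the "&" after the inner "done", each group is sent to the background, so the ten groups run concurrently. The grouping itself can be sketched as follows (batches is a helper name I made up).

```python
def batches(total=40, size=4):
    """Split sample indices 0..total-1 into consecutive groups of `size`."""
    return [list(range(start, min(start + size, total))) for start in range(0, total, size)]


groups = batches()
print(len(groups), groups[0], groups[-1])
# prints: 10 [0, 1, 2, 3] [36, 37, 38, 39]
```

With --num_workers 6 per invocation and ten groups in flight, the machine may end up running many containers at once, so lowering the parallelism can be worthwhile on a laptop.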

Take a majority vote to select a single reproduction test per issue (per the Agentless README, this is generate_reproduction_tests.py with --select):

python agentless/test/generate_reproduction_tests.py --max_samples 40 \
                                                     --output_folder results/swe-bench-lite/reproduction_test_samples \
                                                     --output_file reproduction_tests.jsonl \
                                                     --select

Finally, evaluate the patches using the selected reproduction test:

folder=results/swe-bench-lite/repair_sample_1
for num in {0..9..1}; do
    run_id_prefix=$(basename $folder); 
    python agentless/test/run_reproduction_tests.py --test_jsonl results/swe-bench-lite/reproduction_test_samples/reproduction_tests.jsonl \
                                                    --predictions_path="${folder}/output_${num}_processed.jsonl" \
                                                    --run_id="${run_id_prefix}_reproduction_${num}" --num_workers 10;
done

4.3. Reranking and patch selection

Rerank the patches based on the regression test and reproduction test results, and select one:

python agentless/repair/rerank.py --patch_folder results/swe-bench-lite/repair_sample_1/,results/swe-bench-lite/repair_sample_2/,results/swe-bench-lite/repair_sample_3/,results/swe-bench-lite/repair_sample_4/ \
                                  --num_samples 40 \
                                  --deduplicate \
                                  --regression \
                                  --reproduction
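
The selection idea can be sketched as "prefer patches that pass more tests, then take the most common patch among them". The code below is a simplified illustration under that assumption; the field names regression_ok and reproduction_ok, the scoring, and the normalization are all hypothetical, and the real rerank.py may rank and deduplicate differently.

```python
from collections import Counter


def normalize(patch: str) -> str:
    """Strip trailing whitespace so cosmetically different patches compare equal."""
    return "\n".join(line.rstrip() for line in patch.strip().splitlines())


def select_patch(candidates):
    """Pick the most frequent patch among those passing the most test kinds."""
    if not candidates:
        return None

    def score(candidate):
        return int(candidate["regression_ok"]) + int(candidate["reproduction_ok"])

    best = max(score(c) for c in candidates)
    finalists = [normalize(c["patch"]) for c in candidates if score(c) == best]
    # Counter.most_common keeps first-seen order among ties.
    return Counter(finalists).most_common(1)[0][0]


candidates = [
    {"patch": "A", "regression_ok": True, "reproduction_ok": False},
    {"patch": "B ", "regression_ok": True, "reproduction_ok": True},
    {"patch": "B", "regression_ok": True, "reproduction_ok": True},
    {"patch": "C", "regression_ok": True, "reproduction_ok": True},
    {"patch": "A", "regression_ok": True, "reproduction_ok": False},
    {"patch": "A", "regression_ok": True, "reproduction_ok": False},
]
print(select_patch(candidates))
```

Patch A is the most frequent overall, but it fails the reproduction test, so the vote is taken only among B and C, and B wins with two occurrences.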

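Summary

Following Agentless's internals command by command gives a grasp of the overall flow, although I am still far from understanding the implementation itself.
1. Localization:
  1. Identify related files with the LLM --file_level
  2. Identify irrelevant folders with the LLM --irrelevant
  3. Identify related files with embeddings, filtered by the irrelevant folders retrieve.py
  4. Merge the LLM and embedding results combine.py
  5. Retrieve the related parts at the element level from the related files --related_level
  6. Generate the related parts at the line level from the related elements (the number of samples can be specified) --fine_grain_line_level
  7. Split into one file per sample --merge
2. Repair: Generate multiple patches per issue, each consisting of line locations and a change diff
  1. Generate multiple patches for each sample of related locations obtained in Localization
3. Validation and Selection:
  1. Run the regression tests (existing tests)
  2. Generate multiple reproduction tests for the issue and run them
  3. Decide the final patch from these results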

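TODO

Set up https://github.com/swe-bench/SWE-bench, which is not configured yet and is why the regression tests could not be run, then update this article
There is currently no documented way to use Agentless on repos outside SWE-bench, so make it possible to try it on an arbitrary repo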
