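Overview
Agentless made news a while back for its strong results on SWE-bench, so I took a look at how it works.
The article 「Agentless」という最新手法。LLMの新しい使い方。 (roughly "Agentless, the latest technique: a new way to use LLMs") is very easy to follow and helps a lot in understanding the internals.
The big picture
- Localization: use an LLM and embeddings to identify the relevant files and locations (file level → element level → line level)
- Repair: generate multiple patches per issue, each consisting of line locations and a diff
- Validation and Selection: run regression tests (the existing tests), generate multiple reproduction tests for the issue, and decide the final patch from the results
Steps
1. Preparation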
Repo setup
gh repo clone OpenAutoCoder/agentless
pip install -r requirements.txt
repo structure
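If the repo structures have not been downloaded, a program that generates the structure of each repo runs and takes a long time, so download them from https://github.com/OpenAutoCoder/Agentless/releases/tag/v1.5.0
They store the repository information for each issue.
Setting environment variables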
export OPENAI_API_KEY=xxxxxx
export PROJECT_FILE_LOC=~/Downloads/repo_structure/repo_structures
export PYTHONPATH=$PYTHONPATH:$(pwd)
~/Downloads/repo_structure/repo_structures is the path to the archive downloaded above, after unzipping.
2. Localization
This time I run everything for a single instance only, --target_id=django__django-10914, as an example.
2.1. File level --file_level
2.1.1. File level --file_level - get the relevant files
python agentless/fl/localize.py --file_level --output_folder results/swe-bench-lite/file_level --num_threads 10 --skip_existing --target_id=django__django-10914
Prompt
Please look through the following GitHub problem description and Repository structure and provide a list of files that one would need to edit to fix the problem.
### GitHub Problem Description ###
Set default FILE_UPLOAD_PERMISSION to 0o644.
Description
Hello,
As far as I can see, the File Uploads documentation page does not mention any permission issues.
What I would like to see is a warning that in absence of explicitly configured FILE_UPLOAD_PERMISSIONS, the permissions for a file uploaded to FileSystemStorage might not be consistent depending on whether a MemoryUploadedFile or a TemporaryUploadedFile was used for temporary storage of the uploaded data (which, with the default FILE_UPLOAD_HANDLERS, in turn depends on the uploaded data size).
The tempfile.NamedTemporaryFile + os.rename sequence causes the resulting file permissions to be 0o0600 on some systems (I experience it here on CentOS 7.4.1708 and Python 3.6.5). In all probability, the implementation of Python's built-in tempfile module explicitly sets such permissions for temporary files due to security considerations.
I found mentions of this issue on GitHub, but did not manage to find any existing bug report in Django's bug tracker.
###
### Repository Structure ###
django/
setup.py
__init__.py
__main__.py
shortcuts.py
conf/
__init__.py
global_settings.py
... (all files)
###
Please only provide the full path and return at most 5 files.
The returned files should be separated by new lines ordered by most to least important and wrapped with ```
For example:
```
file1.py
file2.py
```
Example result
```
django/core/files/storage.py
django/conf/global_settings.py
django/core/files/uploadhandler.py
django/core/files/uploadedfile.py
docs/conf.py
```
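The reply has to be turned back into a list of paths before the next step can use it. Below is a minimal sketch of that parsing, assuming the model follows the fenced format the prompt asks for; parse_fenced_lines is a hypothetical helper written for this post, not a function from Agentless.

```python
import re

FENCE = "`" * 3  # the triple-backtick marker the prompt asks the model to use


def parse_fenced_lines(response: str) -> list[str]:
    """Return the non-empty lines inside the first fenced block of a reply."""
    pattern = re.escape(FENCE) + r"[^\n]*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, response, re.DOTALL)
    # Fall back to the whole reply if the model forgot the fence.
    body = match.group(1) if match else response
    return [line.strip() for line in body.splitlines() if line.strip()]


reply = FENCE + "\ndjango/core/files/storage.py\ndjango/conf/global_settings.py\n" + FENCE
files = parse_fenced_lines(reply)
print(files)
```

The fallback to the raw reply is a design choice for robustness: a model occasionally drops the fence, and losing the whole answer in that case would be wasteful.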
2.1.2. File level --file_level - identify irrelevant folders --irrelevant
python agentless/fl/localize.py --file_level --output_folder results/swe-bench-lite/file_level_irrelevant --num_threads 10 --skip_existing --target_id=django__django-10914 --irrelevant
Prompt
Please look through the following GitHub problem description and Repository structure and provide a list of folders that are irrelevant to fixing the problem.
Note that irrelevant folders are those that do not need to be modified and are safe to ignored when trying to solve this problem.
### GitHub Problem Description ###
Set default FILE_UPLOAD_PERMISSION to 0o644.
Description
Hello,
As far as I can see, the File Uploads documentation page does not mention any permission issues.
What I would like to see is a warning that in absence of explicitly configured FILE_UPLOAD_PERMISSIONS, the permissions for a file uploaded to FileSystemStorage might not be consistent depending on whether a MemoryUploadedFile or a TemporaryUploadedFile was used for temporary storage of the uploaded data (which, with the default FILE_UPLOAD_HANDLERS, in turn depends on the uploaded data size).
The tempfile.NamedTemporaryFile + os.rename sequence causes the resulting file permissions to be 0o0600 on some systems (I experience it here on CentOS 7.4.1708 and Python 3.6.5). In all probability, the implementation of Python's built-in tempfile module explicitly sets such permissions for temporary files due to security considerations.
I found mentions of this issue on GitHub, but did not manage to find any existing bug report in Django's bug tracker.
###
### Repository Structure ###
django/
setup.py
__init__.py
__main__.py
shortcuts.py
conf/
__init__.py
... (all files)
###
Please only provide the full path.
Remember that any subfolders will be considered as irrelevant if you provide the parent folder.
Please ensure that the provided irrelevant folders do not include any important files needed to fix the problem
The returned folders should be separated by new lines and wrapped with ```
For example:
```
folder1/
folder2/folder3/
folder4/folder5/
```
Example result
```
django/conf/locale/
django/urls/
django/middleware/
django/db/
django/forms/
django/core/management/
django/core/cache/
django/core/mail/
django/core/servers/
django/core/serializers/
django/core/checks/
django/core/handlers/
django/utils/
django/templatetags/
django/template/
django/contrib/
django/dispatch/
django/apps/
django/views/
docs/
scripts/
```
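The folder list is only useful as a filter. As a rough sketch of how such a filter could work (my own illustration, not the Agentless code), the function below drops every file that lives under one of the irrelevant folders. The file list is a small hand-picked sample.

```python
def filter_irrelevant(files, irrelevant_folders):
    """Keep only the files that are not under any irrelevant folder."""
    # Normalise to a trailing slash so "docs" cannot match "docs_extra/".
    prefixes = tuple(
        folder if folder.endswith("/") else folder + "/"
        for folder in irrelevant_folders
    )
    return [path for path in files if not path.startswith(prefixes)]


# Sample inputs for illustration only.
files = [
    "django/core/files/storage.py",
    "django/conf/global_settings.py",
    "django/conf/locale/en/formats.py",
    "django/db/models/base.py",
    "docs/conf.py",
]
irrelevant = ["django/conf/locale/", "django/db/", "docs/"]

kept = filter_irrelevant(files, irrelevant)
print(kept)
```

Because a parent folder covers all of its subfolders, a simple prefix match is enough here.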
2.1.3. File level retrieve - get relevant files from embeddings
Retrieve relevant files with embeddings while filtering out the irrelevant files found in the previous step
python agentless/fl/retrieve.py --index_type simple \
--filter_type given_files \
--filter_file results/swe-bench-lite/file_level_irrelevant/loc_outputs.jsonl \
--output_folder results/swe-bench-lite/retrievel_embedding \
--persist_dir embedding/swe-bench_simple \
--num_threads 10 \
--target_id=django__django-10914
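To make the idea of embedding retrieval concrete, here is a toy sketch that ranks files by cosine similarity to the issue text. It is only an illustration under a big assumption: a bag-of-words count vector stands in for a real embedding model, and the file contents are made-up one-liners. It does not reflect how retrieve.py is implemented.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(issue, documents, top_k=2):
    """Return the top_k file paths most similar to the issue text."""
    query = embed(issue)
    ranked = sorted(
        documents,
        key=lambda path: cosine(query, embed(documents[path])),
        reverse=True,
    )
    return ranked[:top_k]


# Made-up file contents, for illustration only.
documents = {
    "django/core/files/storage.py": "class FileSystemStorage: file_permissions_mode = settings.FILE_UPLOAD_PERMISSIONS",
    "django/conf/global_settings.py": "FILE_UPLOAD_PERMISSIONS = None",
    "django/core/mail/message.py": "class EmailMessage: build and send an email message",
}
issue = "permissions for a file uploaded to FileSystemStorage without explicitly configured FILE_UPLOAD_PERMISSIONS"

top = retrieve(issue, documents)
print(top)
```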
2.1.4. File level combine - merge the files found by the LLM and by embeddings
Take the top N relevant files from the LLM result and from the embedding result, and merge them
python agentless/fl/combine.py --retrieval_loc_file results/swe-bench-lite/retrievel_embedding/retrieve_locs.jsonl \
--model_loc_file results/swe-bench-lite/file_level/loc_outputs.jsonl \
--top_n 3 \
--output_folder results/swe-bench-lite/file_level_combined
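The merge can be pictured as "take the top N of each list and drop duplicates". The sketch below assumes exactly that, with the LLM results placed first; the real ordering used by combine.py is not shown in this post, and the retrieved list is hypothetical.

```python
def combine_top_n(model_files, retrieved_files, top_n=3):
    """Merge the top_n entries of both rankings, keeping first occurrences."""
    merged = []
    for path in model_files[:top_n] + retrieved_files[:top_n]:
        if path not in merged:
            merged.append(path)
    return merged


# LLM result from the file-level step above.
model_files = [
    "django/core/files/storage.py",
    "django/conf/global_settings.py",
    "django/core/files/uploadhandler.py",
    "django/core/files/uploadedfile.py",
    "docs/conf.py",
]
# Hypothetical embedding result, for illustration only.
retrieved_files = [
    "django/core/files/storage.py",
    "django/core/files/move.py",
    "django/conf/global_settings.py",
]

combined = combine_top_n(model_files, retrieved_files, top_n=3)
print(combined)
```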
2.2. Related element level --related_level
Find the related code elements within the relevant files obtained in the previous step from the LLM and the embeddings
python agentless/fl/localize.py --related_level \
--output_folder results/swe-bench-lite/related_elements \
--top_n 3 \
--compress_assign \
--compress \
--start_file results/swe-bench-lite/file_level_combined/combined_locs.jsonl \
--num_threads 10 \
--skip_existing \
--target_id=django__django-10914
The prompt is:
Please provide the complete set of locations as either a class name, a function name, or a variable name.
Note that if you include a class, you do not need to list its specific methods.
You can include either the entire class or don't include the class name and instead include specific methods in the class.
### Examples:
```
full_path1/file1.py
function: my_function_1
class: MyClass1
function: MyClass2.my_method
full_path2/file2.py
variable: my_var
function: MyClass3.my_method
full_path3/file3.py
function: my_function_2
function: my_function_3
function: MyClass4.my_method_1
class: MyClass5
```
Return just the locations wrapped with ```.
Example of the elements actually found as edit locations:
{
"django/core/files/storage.py": ["class: FileSystemStorage"],
"django/conf/global_settings.py": ["variable: FILE_UPLOAD_PERMISSIONS"],
"django/core/files/uploadhandler.py": [""]
}
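The dict above is what you get after parsing the model's "file path, then class/function/variable lines" reply. A minimal sketch of such a parser follows, assuming the reply format from the prompt; parse_related_locations is a hypothetical helper, not the Agentless implementation.

```python
LOCATION_KINDS = {"class", "function", "variable"}


def parse_related_locations(text):
    """Group 'kind: name' lines under the file path that precedes them."""
    locations = {}
    current_file = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("`"):
            continue  # skip blank lines and code-fence markers
        kind, sep, name = line.partition(":")
        if sep and kind.strip() in LOCATION_KINDS:
            if current_file is not None:
                locations[current_file].append(f"{kind.strip()}: {name.strip()}")
        else:
            # Anything that is not a 'kind: name' line starts a new file.
            current_file = line
            locations.setdefault(current_file, [])
    return locations


reply = """
django/core/files/storage.py
class: FileSystemStorage

django/conf/global_settings.py
variable: FILE_UPLOAD_PERMISSIONS
"""

parsed = parse_related_locations(reply)
print(parsed)
```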
2.3. Related line level --fine_grain_line_level
Generate samples of line-level edit locations for the related elements (this example creates 4 samples)
python agentless/fl/localize.py --fine_grain_line_level \
--output_folder results/swe-bench-lite/edit_location_samples \
--top_n 3 \
--compress \
--temperature 0.8 \
--num_samples 4 \
--start_file results/swe-bench-lite/related_elements/loc_outputs.jsonl \
--num_threads 10 \
--skip_existing \
--target_id=django__django-10914
Prompt (excerpt):
Please provide the class name, function or method name, or the exact line numbers that need to be edited.
The possible location outputs should be either "class", "function" or "line".
### Examples:
```
full_path1/file1.py
line: 10
class: MyClass1
line: 51
full_path2/file2.py
function: MyClass2.my_method
line: 12
full_path3/file3.py
function: my_function
line: 24
line: 156
```
Return just the location(s) wrapped with ```.
example result:
["```\ndjango/core/files/storage.py\nline: 260\nline: 217\n\ndjango/conf/global_settings.py\nline: 307\n```", "```\nfull_path1/django/core/files/storage.py\nline: 260\nline: 284\n\nfull_path2/django/conf/global_settings.py\nline: 307\n```", "```\ndjango/core/files/storage.py\nline: 260\n\ndjango/conf/global_settings.py\nline: 307\n```", "```\ndjango/conf/global_settings.py\nline: 307\n\ndjango/core/files/storage.py\nline: 260\n```"]
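To see how much the four samples agree, the sketch below counts how many samples name each (file, line) pair. This is only an inspection aid I wrote for this post, not a step of Agentless. The samples are the ones above with the code fences omitted, and stripping a leading full_pathN/ segment (which the model copied from the prompt example) is my own assumption.

```python
import re
from collections import Counter

# The four samples from the result above, code fences omitted.
samples = [
    "django/core/files/storage.py\nline: 260\nline: 217\n\ndjango/conf/global_settings.py\nline: 307\n",
    "full_path1/django/core/files/storage.py\nline: 260\nline: 284\n\nfull_path2/django/conf/global_settings.py\nline: 307\n",
    "django/core/files/storage.py\nline: 260\n\ndjango/conf/global_settings.py\nline: 307\n",
    "django/conf/global_settings.py\nline: 307\n\ndjango/core/files/storage.py\nline: 260\n",
]


def count_edit_lines(samples):
    """Count how many times each (file, line) pair appears across samples."""
    votes = Counter()
    for sample in samples:
        current_file = None
        for raw in sample.splitlines():
            line = raw.strip()
            if line.startswith("line:"):
                if current_file is not None:
                    votes[(current_file, int(line.split(":", 1)[1]))] += 1
            elif line.startswith(("class:", "function:")):
                continue  # only line-level locations are counted here
            elif line and not line.startswith("`"):
                # Assumption: drop a placeholder prefix copied from the prompt.
                current_file = re.sub(r"^full_path\d+/", "", line)
    return votes


votes = count_edit_lines(samples)
for (path, lineno), count in votes.most_common():
    print(path, lineno, count)
```

In this example all four samples agree on storage.py line 260 and global_settings.py line 307, while lines 217 and 284 each appear in a single sample.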
Format of loc_outputs.jsonl
- instance_id: task ID of the issue
- found_files: list of files localized by the model
- additional_artifact_loc_file: raw output of the model during file-level localization
- file_traj: trajectory of the model during file-level localization (e.g., # of tokens)
- found_related_locs: dict of relevant code elements localized by the model
- additional_artifact_loc_related: raw output of the model during relevant-code-level localization
- related_loc_traj: trajectory of the model during relevant-code-level localization
- found_edit_locs: dict of edit locations localized by the model
- additional_artifact_loc_edit_location: raw output of the model during edit-location-level localization
- edit_loc_traj: trajectory of the model during edit-location-level localization
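Since the file is JSON Lines (one JSON object per line), it is easy to inspect with a few lines of Python. The sketch below writes a hypothetical record with a few of the fields listed above into a temporary file and reads it back; for a real run you would point load_jsonl at the file in your output folder.

```python
import json
import os
import tempfile


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical record that uses a few of the fields listed above.
record = {
    "instance_id": "django__django-10914",
    "found_files": ["django/core/files/storage.py", "django/conf/global_settings.py"],
    "found_related_locs": {"django/core/files/storage.py": ["class: FileSystemStorage"]},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "loc_outputs.jsonl")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    records = load_jsonl(path)

for item in records:
    print(item["instance_id"], item["found_files"])
```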
2.4. Split by edit location --merge
Split the line-level edit location samples (edit_location_samples) into one set per sample.
python agentless/fl/localize.py --merge \
--output_folder results/swe-bench-lite/edit_location_individual \
--top_n 3 \
--num_samples 4 \
--start_file results/swe-bench-lite/edit_location_samples/loc_outputs.jsonl \
--target_id=django__django-10914
The 4 samples generated in the example above are each written to their own file.
tree results/swe-bench-lite/edit_location_individual
results/swe-bench-lite/edit_location_individual
├── args.json
├── loc_merged_0-0_outputs.jsonl
├── loc_merged_1-1_outputs.jsonl
├── loc_merged_2-2_outputs.jsonl
├── loc_merged_3-3_outputs.jsonl
└── localization_logs
2 directories, 5 files
3. Repair
Generating patches
Generate multiple patches per issue and decide the final patch by voting (in this example the output goes to results/swe-bench-lite/repair_sample_1)
python agentless/repair/repair.py --loc_file results/swe-bench-lite/edit_location_individual/loc_merged_0-0_outputs.jsonl \
--output_folder results/swe-bench-lite/repair_sample_1 \
--loc_interval \
--top_n=3 \
--context_window=10 \
--max_samples 10 \
--cot \
--diff_format \
--gen_and_process \
--num_threads 2 \
--target_id=django__django-10914
This generates multiple patches for the sample (up to 10 here, per --max_samples 10).
Run the remaining three of the four samples as well
for i in {1..3}; do
python agentless/repair/repair.py --loc_file results/swe-bench-lite/edit_location_individual/loc_merged_${i}-${i}_outputs.jsonl \
--output_folder results/swe-bench-lite/repair_sample_$((i+1)) \
--loc_interval \
--top_n=3 \
--context_window=10 \
--max_samples 10 \
--cot \
--diff_format \
--gen_and_process \
--num_threads 2 \
--target_id=django__django-10914
done
The generated patches are written to repair_sample_1, repair_sample_2, repair_sample_3, and repair_sample_4 respectively.
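To picture what "a patch made of an edit location plus a diff" means, here is a small sketch that applies a search/replace style edit to a file's text and prints the resulting unified diff. The search/replace edit format and both helper functions are assumptions made for illustration; the two-line file content is a made-up excerpt, and the new value 0o644 comes from the issue title.

```python
import difflib


def apply_search_replace(source, search, replace):
    """Apply one edit, requiring the search text to match exactly once."""
    if source.count(search) != 1:
        raise ValueError("search text must match exactly once")
    return source.replace(search, replace)


def unified_diff(before, after, path):
    """Render the change as a git-style unified diff string."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


# Made-up excerpt of the settings file, for illustration only.
original = "FILE_UPLOAD_TEMP_DIR = None\nFILE_UPLOAD_PERMISSIONS = None\n"
patched = apply_search_replace(
    original,
    "FILE_UPLOAD_PERMISSIONS = None",
    "FILE_UPLOAD_PERMISSIONS = 0o644",
)
diff = unified_diff(original, patched, "django/conf/global_settings.py")
print(diff)
```

Requiring exactly one match is a deliberate choice: an edit whose search text is missing or ambiguous is rejected instead of silently producing a wrong patch.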
4. Patch Validation and Selection
Below, --instance_ids=django__django-10914 is specified so that only the issue targeted in this post is run.
4.1. Regression Test Selection
To run regression tests, first select the relevant regression tests (the ones that already exist in the repo)
python agentless/test/run_regression_tests.py --run_id generate_regression_tests \
--output_file results/swe-bench-lite/passing_tests.jsonl \
--instance_ids=django__django-10914
I ran this on macOS, but this step failed at "Building environment".
raise BuildImageError(image_name, str(e), logger) from e
swebench.harness.docker_build.BuildImageError: Error building image sweb.env.arm64.2baaea72acc974f6c02079:latest: The command '/bin/sh -c /bin/bash -c "source ~/.bashrc && /root/setup_env.sh"' returned a non-zero code: 1
Check (logs/build_images/env/sweb.env.arm64.2baaea72acc974f6c02079__latest/build_image.log) for more information.
Building environment images: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:27<00:00, 27.58s/it]
1 environment images failed to build.
Found 0 existing instance images. Will reuse them.
0%| | 0/1 [00:00<?, ?it/s]Error building image django__django-10914: Environment image sweb.env.arm64.2baaea72acc974f6c02079:latest not found for django__django-10914
Check (logs/run_evaluation/generate_regression_tests/test/django__django-10914/run_instance.log) for more information.
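The log says "No such image: sweb.env.arm64.2baaea72acc974f6c02079:latest".
Setting up https://github.com/swe-bench/SWE-bench looks like the way forward, so I will deal with that separately. As of 2025/01/08 I have not been able to run the steps below. (TODO)
Use the LLM to exclude tests that do not need to be run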
python agentless/test/select_regression_tests.py --passing_tests results/swe-bench-lite/passing_tests.jsonl \
--output_folder results/swe-bench-lite/select_regression
Run the selected tests against the patches
folder=results/swe-bench-lite/repair_sample_1
for num in {0..9..1}; do
run_id_prefix=$(basename $folder);
python agentless/test/run_regression_tests.py --regression_tests results/swe-bench-lite/select_regression/output.jsonl \
--predictions_path="${folder}/output_${num}_processed.jsonl" \
--run_id="${run_id_prefix}_regression_${num}" --num_workers 10;
done
4.2. Reproduction test generation
Generate reproduction tests to check whether a patch actually resolves the issue. As with patch generation, multiple reproduction tests are generated
python agentless/test/generate_reproduction_tests.py --max_samples 40 \
--output_folder results/swe-bench-lite/reproduction_test_samples \
--num_threads 10
Execute
for st in {0..36..4}; do en=$((st + 3));
echo "Processing ${st} to ${en}";
for num in $(seq $st $en); do
echo "Processing ${num}";
python agentless/test/run_reproduction_tests.py --run_id="reproduction_test_generation_filter_sample_${num}" \
--test_jsonl="results/swe-bench-lite/reproduction_test_samples/output_${num}_processed_reproduction_test.jsonl" \
--num_workers 6 \
--testing;
done & done
Take a majority vote to select one reproduction test per issue (the command below follows the Agentless README; it writes reproduction_tests.jsonl, which the next step reads)
python agentless/test/generate_reproduction_tests.py --max_samples 40 \
--output_folder results/swe-bench-lite/reproduction_test_samples \
--output_file reproduction_tests.jsonl \
--select
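The majority vote itself can be sketched in a few lines. The version below is my own illustration under stated assumptions: each candidate is a (test source, reproduces-the-issue flag) pair, only candidates that reproduce the issue are counted, and whitespace is normalised before counting. How Agentless normalises and votes is not covered in this post.

```python
from collections import Counter


def select_by_majority(candidates):
    """Pick the most frequent test among those that reproduce the issue.

    candidates: list of (test_source, reproduces_issue) tuples.
    Returns the normalised source of the winner, or None if nothing qualifies.
    """
    normalised = [" ".join(source.split()) for source, ok in candidates if ok]
    if not normalised:
        return None
    return Counter(normalised).most_common(1)[0][0]


# Hypothetical generated tests, for illustration only.
candidates = [
    ("assert mode == 0o644", True),
    ("assert  mode ==  0o644", True),   # same test, different spacing
    ("assert mode != 0o600", True),
    ("assert mode is None", False),     # did not reproduce the issue
]

selected = select_by_majority(candidates)
print(selected)
```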
Finally, evaluate the patches with the selected reproduction test
folder=results/swe-bench-lite/repair_sample_1
for num in {0..9..1}; do
run_id_prefix=$(basename $folder);
python agentless/test/run_reproduction_tests.py --test_jsonl results/swe-bench-lite/reproduction_test_samples/reproduction_tests.jsonl \
--predictions_path="${folder}/output_${num}_processed.jsonl" \
--run_id="${run_id_prefix}_reproduction_${num}" --num_workers 10;
done
4.3. Reranking and patch selection
Rerank the patches based on the regression test and reproduction test results, and select one
python agentless/repair/rerank.py --patch_folder results/swe-bench-lite/repair_sample_1/,results/swe-bench-lite/repair_sample_2/,results/swe-bench-lite/repair_sample_3/,results/swe-bench-lite/repair_sample_4/ \
--num_samples 40 \
--deduplicate \
--regression \
--reproduction
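As a mental model of the reranking, the sketch below deduplicates identical patches and then orders them by test results first and by how often the same patch was generated second. The scoring rule, the field names (patch, regression_ok, reproduction_ok), and the sample data are all assumptions for illustration; this is not the logic of rerank.py.

```python
from collections import Counter


def rerank(patches):
    """Deduplicate patches and sort them from most to least promising."""
    votes = Counter(p["patch"] for p in patches)
    # Identical patch text is assumed to have identical test results.
    unique = {p["patch"]: p for p in patches}

    def score(p):
        passes_both = p["regression_ok"] and p["reproduction_ok"]
        return (passes_both, p["regression_ok"], votes[p["patch"]])

    return sorted(unique.values(), key=score, reverse=True)


# Hypothetical candidates: patch text is abbreviated to a single letter.
patches = (
    [{"patch": "A", "regression_ok": True, "reproduction_ok": False}] * 3
    + [{"patch": "B", "regression_ok": True, "reproduction_ok": True}] * 2
    + [{"patch": "C", "regression_ok": False, "reproduction_ok": False}]
)

ranking = rerank(patches)
print([p["patch"] for p in ranking])
```

Here patch B wins even though patch A was generated more often, because B passes both kinds of tests; the vote count only breaks ties between patches with the same test results.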
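Summary
- Following Agentless command by command gives a good grasp of the overall flow, although I am still far from understanding the implementation itself
- Localization:
  - Identify relevant files with the LLM: --file_level
  - Identify irrelevant files with the LLM: --irrelevant
  - Identify relevant files with embeddings, filtered by the irrelevant files: retrieve.py
  - Merge the two results above: combine.py
  - Get the relevant parts at the element level from the relevant files: --related_level
  - Generate the relevant parts at the line level from the related elements (the number of samples can be specified): --fine_grain_line_level
  - Split into one file per sample: --merge
- Repair: generate multiple patches per issue, each consisting of line locations and a diff
  - Generate multiple patches for each sample of relevant locations obtained in Localization
- Validation and Selection:
  - Run regression tests (the existing tests)
  - Generate (multiple) and run reproduction tests for the issue
  - Decide the final patch from these results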
TODO
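- The regression tests could not be run because https://github.com/swe-bench/SWE-bench is not set up yet; set it up and update this post
- There is currently no documented way to use Agentless on repos outside SWE-bench, so make it possible to try it on an arbitrary repo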