Failed to successfully build Docker for SWE-bench #1

Evgeneus · 2025-03-08T09:19:30Z

Hello,

Thank you for your work! I was able to run the code, and I can see the agent performing actions such as running bash commands, thinking, etc.
However, I tried to build Docker environments for 3 repositories from the SWE-bench dataset, and all attempts failed: the agent runs various commands (for up to 100 turns), but it never manages to produce a Dockerfile that runs without errors.

Could you please try to reproduce this behaviour? Am I doing something wrong, or is the agent simply unable to handle these instances? Thank you!

python build_agent/main.py googleapis/google-cloud-python 54ff7fa12be62b7be0d913b36c4afc208d8404ac <path>/repo2run/build_agent/

python build_agent/main.py pandas-dev/pandas e51eb9eca56f31055309d83c592cbefa4a14e42c <path>/repo2run/build_agent/

python build_agent/main.py numpy/numpy c08d2647240555e730da7580374a61d8547a932e <path>/repo2run/build_agent/


kinesiatricssxilm14 · 2025-03-11T06:42:11Z

I tried your commands, but I was also unable to configure these repositories successfully within 100 rounds of conversation.

After analyzing the runs, I believe there are two reasons for this:

1. Some tests in these repositories cannot pass successfully in the first place (SWE-bench, for its part, does not require running all tests), whereas Repo2Run aims to run all tests, which can lead to failures.
2. Some of these repositories produce excessively long output logs, and Repo2Run truncates them directly, so the returned information is incomplete. As a result, the LLM cannot pinpoint the root cause of the errors.
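To make the second point concrete, here is a minimal Python sketch of how cutting a long log at a fixed length can hide the error. It is not Repo2Run's actual code: the function names, the 4000-character limit, and the sample log are hypothetical, and it assumes a naive head-only cut. Build and test tools often print the failure near the end of their output, so keeping only the beginning can drop exactly the line the LLM needs. Keeping both the head and the tail is shown as one possible alternative.

```python
# Hypothetical illustration only: these helpers are NOT Repo2Run's implementation.

def truncate_head(log: str, max_chars: int) -> str:
    """Naive truncation: keep only the first max_chars characters."""
    return log[:max(max_chars, 0)]


def truncate_head_tail(log: str, max_chars: int,
                       marker: str = "\n...[truncated]...\n") -> str:
    """Keep the beginning and the end of the log, dropping the middle."""
    if len(log) <= max_chars:
        return log
    budget = max_chars - len(marker)
    if budget <= 0:
        # Not enough room for the marker: keep only the tail of the log.
        return log[len(log) - max(max_chars, 0):]
    head = budget // 2
    tail = budget - head  # always >= 1 here, so log[-tail:] is never the whole log
    return log[:head] + marker + log[-tail:]


if __name__ == "__main__":
    # A long, made-up build log whose root cause only appears on the last line.
    lines = [f"Collecting package-{i}" for i in range(2000)]
    lines.append("ERROR: Could not build wheels for example-pkg")
    log = "\n".join(lines)
    print("ERROR" in truncate_head(log, 4000))       # False
    print("ERROR" in truncate_head_tail(log, 4000))  # True
```

The head-and-tail variant keeps the total length at `max_chars`, so it fits the same context budget while preserving the end of the log, where the error message usually is.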

Thank you for your feedback.

We will continue to improve our design. At this stage, our tool might not be that powerful yet. 😆

Evgeneus · 2025-03-12T14:16:27Z

Thanks a lot for trying to run my examples!
Understood, thank you!

Evgeneus · 2025-03-12T14:17:34Z

I have noticed that your code asks for the root password. Can you explain why? And is it possible to avoid using the root password?
