Failed to successfully build Docker for SWE-bench #1

Open
Evgeneus opened this issue Mar 8, 2025 · 3 comments

Evgeneus commented Mar 8, 2025

Hello,

Thank you for your work! I was able to run the code, and I can see the agent performing actions such as running bash commands and thinking.
However, I tried to build Docker environments for 3 repos from the SWE-bench dataset and all of them were unsuccessful: the agent runs various commands (up to 100 turns) but is not able to create a Dockerfile that runs without errors.

Can you please try to reproduce this behaviour? Am I doing something wrong, or is the agent simply unable to work correctly on these items? Thank you!

python build_agent/main.py googleapis/google-cloud-python 54ff7fa12be62b7be0d913b36c4afc208d8404ac <path>/repo2run/build_agent/
python build_agent/main.py pandas-dev/pandas e51eb9eca56f31055309d83c592cbefa4a14e42c <path>/repo2run/build_agent/
python build_agent/main.py numpy/numpy c08d2647240555e730da7580374a61d8547a932e <path>/repo2run/build_agent/

@kinesiatricssxilm14
Collaborator

I tried your commands, but I was also unable to configure these repositories successfully within 100 turns.

After analysis, I believe there are two reasons for this:

  1. These repositories contain some tests that cannot be run successfully. SWE-bench does not require running all tests, but Repo2Run aims to run every test, which can lead to failures.

  2. These repositories sometimes produce excessively long output logs, and Repo2Run truncates the context directly, so the returned information is incomplete. As a result, the LLM cannot pinpoint the root cause of the errors.
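
To illustrate the second point, here is a minimal sketch of one possible mitigation: keeping both the start and the end of a long log and dropping only the middle, since error summaries often appear near the end of the output. This is not Repo2Run's implementation; the function name `truncate_middle` and the parameters `max_chars` and `head_ratio` are hypothetical and chosen only for illustration.

```python
def truncate_middle(log: str, max_chars: int = 4000, head_ratio: float = 0.3) -> str:
    """Shorten a long log by dropping its middle and keeping head and tail.

    Hypothetical helper, not part of Repo2Run. `max_chars` is the budget for
    retained log text (the omission marker is added on top of it), and
    `head_ratio` is the share of that budget given to the start of the log.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    if not 0 <= head_ratio < 1:
        raise ValueError("head_ratio must be in [0, 1)")

    # Short logs are returned unchanged.
    if len(log) <= max_chars:
        return log

    head_len = int(max_chars * head_ratio)
    tail_len = max_chars - head_len  # always >= 1 because head_ratio < 1
    omitted = len(log) - head_len - tail_len

    marker = f"\n... [{omitted} characters omitted] ...\n"
    return log[:head_len] + marker + log[-tail_len:]


if __name__ == "__main__":
    # Simulated long output whose final line holds the actual error.
    long_log = "start of output\n" + "x" * 10000 + "\nError: root cause"
    print(truncate_middle(long_log, max_chars=100))
```

Whether this helps in practice depends on where a given build tool prints its errors, so the head/tail split would likely need tuning per repository.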

Thank you for your feedback.

We will continue to improve our design. At this stage, our tool might not be that powerful yet. 😆

@Evgeneus
Author

Evgeneus commented Mar 12, 2025

Thank you very much for trying to run my examples!
Understood, thank you!

@Evgeneus
Author

I have noticed that your code asks for the root password. Can you explain why? And is it possible to avoid using the root password?
