Thank you for your work! I was able to run the code, and I can see the agent performing actions such as running bash commands, thinking, etc.
However, I tried to build Docker images for 3 repos from the SWE-bench dataset and all attempts were unsuccessful: the agent runs various commands (up to 100 turns) but is not able to create a Dockerfile that runs without errors.
Can you please try to reproduce this behaviour? Am I doing something wrong, or is the agent simply unable to handle these repos? Thank you!
I followed your steps and was also unable to configure these repositories successfully within 100 turns.
After analysis, I believe there are two reasons for this:
1. Some tests in these repositories cannot be run successfully in the first place (SWE-bench itself does not require every test to run), whereas Repo2Run aims to run all tests, so it ends in failure.
2. Some commands in these repositories produce excessively long output logs, and Repo2Run truncates the context directly, so the returned information is incomplete. As a result, the LLM cannot pinpoint the root cause of the errors.
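To illustrate the truncation issue, here is a minimal sketch of one common alternative to cutting a log off at a fixed length: keep both the head and the tail and elide the middle, since the error summary usually sits at the end of the output. This is not Repo2Run's actual code; the function name `truncate_middle` and the `max_chars` parameter are hypothetical and chosen only for this example.

```python
def truncate_middle(log: str, max_chars: int = 4000) -> str:
    """Keep the first and last parts of a long log and elide the middle.

    Hypothetical helper for illustration only; not part of Repo2Run.
    """
    if len(log) <= max_chars:
        return log  # short logs are returned unchanged

    head_len = max_chars // 2
    tail_len = max_chars - head_len
    omitted = len(log) - head_len - tail_len

    head = log[:head_len]
    # Slice from an explicit index so tail_len == 0 yields an empty tail.
    tail = log[len(log) - tail_len:]
    marker = f"\n... [{omitted} characters omitted] ...\n"
    return head + marker + tail


if __name__ == "__main__":
    long_log = "setup line\n" * 1000 + "ERROR: build failed\n"
    print(truncate_middle(long_log, max_chars=200))
```

With this approach the final error lines survive even when the log is far larger than the budget, at the cost of losing whatever happened in the middle of the run.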
Thank you for your feedback.
We will continue to improve our design. At this stage, our tool might not be that powerful yet. 😆