Add --run-in-docker to skill-validator to run Copilot CLI in a docker container #176
Draft
caaavik-msft wants to merge 1 commit into dotnet:main
Summary
This PR adds an optional Docker execution mode to `skill-validator` so agent runs, judges, and setup commands can execute in an isolated container instead of directly on the host machine.
Motivation
The main use case is local development, but it might also be useful in CI if we want to build on top of it. While building some skills, I found that some weaker models made destructive changes to my host system to accomplish the task (e.g. reinstalling .NET). With this change, agents and judges run inside a container that can access only the files they need, which are bind-mounted from the host machine. It does not add any network isolation.
Implementation
This makes use of the `--headless` mode for running copilot, as described in https://github.com/github/copilot-sdk/blob/main/docs/guides/setup/backend-services.md.
It requires a `GITHUB_TOKEN` to be present so that it can be passed into the container and used to authenticate to the Copilot API. The `README` has an example explaining that you can get this token with `gh auth token`. For people with multiple gh accounts (e.g. personal and enterprise), you can also run `gh auth token --user <name>`.
A Dockerfile is included in the repo to use as the base image.
This ensures that we use the exact same Copilot CLI binary that ships with the SDK. The SDK version is resolved programmatically inside the SkillValidator, so the two stay in sync. It places the copilot binary at `/usr/local/bin/copilot` inside the container.
To handle path translation in docker mode, all temp/work directories are placed inside a single directory in the temp folder, and that entire directory is mounted into the container read-write. This makes it easy to map paths between their host and container equivalents when needed. Skill directories are mounted into the container read-only, and only the directories being evaluated are mounted.
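The path translation described above can be sketched as follows. This is only an illustration in Python, not the code in this PR (which lives in the .NET SkillValidator). The container mount points `/work` and `/skills/<n>`, and the function names, are hypothetical, and the sketch assumes POSIX-style host paths.

```python
from pathlib import PurePosixPath

# Hypothetical mount point inside the container; the PR does not state the real one.
CONTAINER_WORK_ROOT = PurePosixPath("/work")


def to_container_path(host_path: str, host_root: str) -> str:
    """Translate a host path under host_root to its container equivalent."""
    # relative_to raises ValueError when host_path is not under host_root.
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(host_root))
    return str(CONTAINER_WORK_ROOT / relative)


def to_host_path(container_path: str, host_root: str) -> str:
    """Translate a container path under the work root back to the host."""
    relative = PurePosixPath(container_path).relative_to(CONTAINER_WORK_ROOT)
    return str(PurePosixPath(host_root) / relative)


def mount_args(host_root: str, skill_dirs: list[str]) -> list[str]:
    """Build docker -v arguments: work root read-write, skill dirs read-only."""
    args = ["-v", f"{host_root}:{CONTAINER_WORK_ROOT}:rw"]
    for index, skill_dir in enumerate(skill_dirs):
        # Hypothetical container location for each evaluated skill directory.
        args += ["-v", f"{skill_dir}:/skills/{index}:ro"]
    return args
```

Because every work directory sits under one mounted root, translation in either direction is a prefix swap, which is the main benefit of the single-directory layout.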
The container uses a randomised port, `-p 0:4321`, which is resolved later using `docker port`. The container is always cleaned up after finishing, including on `ProcessExit` and `CancelKeyPress` events.
Future Extensibility
I have a proof of concept working locally, which I chose not to push for now to keep this PR simple. It runs all agents inside their own containers rather than having a single container that runs all agents and judges. This would reduce the risk of an agent modifying the environment and affecting other evaluations, if that sounds desirable, but it does mean that each agent would use a separate `CopilotClient` rather than a single shared `CopilotClient`.
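For reference, the port resolution step mentioned under Implementation could look roughly like the sketch below. It is written in Python for illustration only and is not the PR's code. The function name is hypothetical, and the sketch assumes `docker port` prints lines such as `0.0.0.0:49153`, `[::]:49153`, or `4321/tcp -> 0.0.0.0:49153`.

```python
def parse_host_port(docker_port_output: str) -> int:
    """Extract the first host port from `docker port` style output."""
    for line in docker_port_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Accept both "4321/tcp -> 0.0.0.0:49153" and the shorter "0.0.0.0:49153".
        address = line.split("->")[-1].strip()
        # The port follows the last colon, which also covers "[::]:49153".
        _, _, port = address.rpartition(":")
        if port.isdigit():
            return int(port)
    raise ValueError("no host port found in docker port output")
```

Taking the first parsable line is a simplification: when both IPv4 and IPv6 bindings are listed, this sketch assumes they share the same host port.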