Standalone test-controller by bloodearnest · Pull Request #1353 · opensafely-core/job-runner

bloodearnest · 2026-03-05T11:34:49Z

Add a way to run a simple test controller

Runs both webapi and service, with a simple cli interface, in order to make it
easy todo integration testing in backend-server and other places.

The main points of interest:

it runs both webapi and controller in the same container
it ships the test-controller script in the docker image so it can be easily used elsewhere
we use it in job-runner's own functional tests, so we are sure it's working

rebkwok

Mostly I think I'm in favour of this. It certainly makes the functional test set up, and will make backend server setup simpler. Added some comments about additional documentation.

This means that we can run a real job on a real backend, right? So in TPP for example, we'd run the test-controller with backend tpp and the real agent's token. Then we'd add a job on the controller. How does the real agent pick it up? It's running pointing at the real controller endpoint. In the functional tests, we run a test agent too, which is pointing at the right controller endpoint. Or is there a step missing in the docs and to run on a real backend you'd also run a test agent?

If it is running with the real agent, how do we make sure that it isn't sending job data that job-server is going to pick up and complain about? (Maybe I've misunderstood, and this is intended to be separate from the production agent)

I also can't get the functional test to run locally, with this branch I get a connection error:

...
test-agent-1  | During handling of the above exception, another exception occurred:
test-agent-1  | 
test-agent-1  | Traceback (most recent call last):
test-agent-1  |   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
test-agent-1  |     return _run_code(code, main_globals, None,
test-agent-1  |   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
test-agent-1  |     exec(code, run_globals)
test-agent-1  |   File "/app/agent/service.py", line 42, in <module>
test-agent-1  |     main()
test-agent-1  |   File "/app/agent/service.py", line 36, in main
test-agent-1  |     agent_main()
test-agent-1  |   File "/app/agent/main.py", line 29, in main
test-agent-1  |     active_tasks = handle_tasks(api)
test-agent-1  |   File "/app/agent/main.py", line 38, in handle_tasks
test-agent-1  |     active_tasks = task_api.get_active_tasks()
test-agent-1  |   File "/app/agent/task_api.py", line 40, in get_active_tasks
test-agent-1  |     agent_tasks = get_json("tasks/")["tasks"]
test-agent-1  |   File "/app/agent/task_api.py", line 17, in get_json
test-agent-1  |     return request_json("GET", path)
test-agent-1  |   File "/app/agent/task_api.py", line 29, in request_json
test-agent-1  |     response = session.request(method, url, data=data, headers=headers)
test-agent-1  |   File "/opt/venv/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
test-agent-1  |     resp = self.send(prep, **send_kwargs)
test-agent-1  |   File "/opt/venv/lib/python3.10/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 388, in instrumented_send
test-agent-1  |     raise exception.with_traceback(exception.__traceback__)
test-agent-1  |   File "/opt/venv/lib/python3.10/site-packages/opentelemetry/instrumentation/requests/__init__.py", line 323, in instrumented_send
test-agent-1  |     result = wrapped_send(
test-agent-1  |   File "/opt/venv/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
test-agent-1  |     r = adapter.send(request, **kwargs)
test-agent-1  |   File "/opt/venv/lib/python3.10/site-packages/requests/adapters.py", line 677, in send
test-agent-1  |     raise ConnectionError(e, request=request)
test-agent-1  | requests.exceptions.ConnectionError: HTTPConnectionPool(host='test-controller', port=8000): Max retries exceeded with url: /test/tasks/ (Caused by NewConnectionError("HTTPConnection(host='test-controller', port=8000): Failed to establish a new connection: [Errno 111] Connection refused"))
test-controller-1  | 2026-03-09 09:36:16.386Z created new db at /tmp/test-controller-workdir/db.sqlite 
test-controller-1  | Dummy controller started with:
test-controller-1  | BACKEND=test
test-controller-1  | CONTROLLER_TASK_API_TOKEN=pass
test-controller-1  | CONTROLLER_TASK_API_ENDPOINT=http://localhost:8000/
test-controller-1  | [2026-03-09 09:36:16 +0000] [7] [INFO] Starting gunicorn 25.1.0
test-controller-1  | [2026-03-09 09:36:16 +0000] [7] [INFO] Listening at: http://0.0.0.0:8000 (7)
test-controller-1  | [2026-03-09 09:36:16 +0000] [7] [INFO] Using worker: sync
test-controller-1  | [2026-03-09 09:36:16 +0000] [32] [INFO] Booting worker with pid: 32
test-controller-1  | [2026-03-09 09:36:16 +0000] [7] [ERROR] Control server error: [Errno 13] Permission denied
test-controller-1  | [2026-03-09 09:36:16 +0000] [32] [INFO] Worker spawned (pid: 32)
test-controller-1  | [2026-03-09 09:36:16 +0000] [34] [INFO] Booting worker with pid: 34
test-controller-1  | [2026-03-09 09:36:16 +0000] [34] [INFO] Worker spawned (pid: 34)
test-controller-1  | [2026-03-09 09:36:16 +0000] [37] [INFO] Booting worker with pid: 37
test-controller-1  | [2026-03-09 09:36:16 +0000] [37] [INFO] Worker spawned (pid: 37)
test-controller-1  | [2026-03-09 09:36:16 +0000] [39] [INFO] Booting worker with pid: 39
test-controller-1  | [2026-03-09 09:36:16 +0000] [39] [INFO] Worker spawned (pid: 39)
test-controller-1  | [2026-03-09 09:36:16 +0000] [41] [INFO] Booting worker with pid: 41
test-controller-1  | [2026-03-09 09:36:16 +0000] [41] [INFO] Worker spawned (pid: 41)
test-controller-1  | 2026-03-09 09:36:16.763Z ctrl controller.service started 
test-controller-1  | 2026-03-09 09:36:16.763Z ctrl controller.service ticks disabled 
test-controller-1  | 2026-03-09 09:36:16.763Z ctrl jobrunner.run loop started 
test-controller-1  | 2026-03-09 09:36:21.051Z ctrl Job executing on the backend status=StatusCode.INITIATED workspace=test action=generate_dataset id=oj4cwy6xliqbgdpp
error: Recipe `functional-test` failed with exit code 1

rebkwok · 2026-03-09T09:44:47Z

DEVELOPERS.md

+
+The production image provides a `test-controller` command for running a dummy
+controller (web API + controller loop) with ephemeral local state. It is used
+in this repo's functional tests, but also can be used by other code than need


Suggested change

in this repo's functional tests, but also can be used by other code than need

in this repo's functional tests, but also can be used by other code that needs

rebkwok · 2026-03-09T09:46:01Z

DEVELOPERS.md

+The command requires exactly two arguments:
+
+1. backend name
+2. token value


We have various tokens in env variables, might be worth specifying which one we need here

rebkwok · 2026-03-09T09:58:27Z

docker/scripts/test-controller.sh

@@ -0,0 +1,56 @@
+#!/usr/bin/env bash


It'd be good to add some more documentation into this script itself (but I know this is still draft so maybe that was a plan for later, if we decide to go ahead with it). Thinking of similar scripts that allow running jobs in real backends, like the ehrql debug one

Totally worth adding some more documentation, will do.

The primary purpose of this script is not to run it in real production backend. It's to run it in test environments, like the backend-server tests, and testing any AMI images we build actually work. When we moved to controller/agent split, the lack of ability to run a test job easily meant that the backend-server tests were significantly weakened. This has caused me issues with NIoT, as config mistakes that the tests would have caught if they'd run a job had to be hit in the real NIoT environment. And I consider it essential for reliably publishing AMI images.

The only scenario I anticipate it being useful in a real backend is when we're:
a) manually bootstrapping something (i.e. not AMI image based deployment)
b) we don't have an external controller to talk to (staging or real).

NIoT's deployment happens to hit that situation, but I don't expect EMIS or any other backend to.

OK, this makes sense. Maybe just some words about that in the docs then? The fact that the script is included in the image implies that it might be used in production.

Oh, also, when I run the functional test locally, the test-controller container doesn't get removed. Should it?

Yeah, will add some words. Something along the lines that test-controller script is shipped in the prod image so can be used wholesale, but is not intended to be used in production.

If we really wanted to, I guess we could publish an alternate image with the script, but I don't think its worth it?

I'll look into the left over container, I hadn't spotted that, thanks.

The test script can be used as docker CMD, and takes a backend name and token, and then sets up an ephemeral controller that you can point a local agent to. It runs both the webapp and the service, using the same run-one-constantly technique we use in airlock. It overrides a bunch of configuration to make it a simpler and ephemeral controller. You can then use cli.add_job as normal to add a job to the controller.

This simplifies our docker-compose set up, and runs a little faster too. The motivation for test-controller script was be easier to run a test controller setup in other projects (e.g. backend-server, or manually testing backends), so it's important that job-runner also uses it, so that we can detect breakages and fix them at source.

Allows standalone use

This makes it easy to work with ephemeral test backends. You can still override with <BACKEND>_MAX_{DB_}_WORKERS as normal. I'm not 100% sure we should have per-backend default workers, tbh, I think it might be better to require explicit config. The failure stat would be to still be able to run jobs, but not as many. However, this PR is about test controller, so I'm punting on that.

evansd

This seems reasonable to me too. I was worried this might end up being more invasive but the actual prod changes are pretty tiny.

bloodearnest force-pushed the test-controller-standalone branch 2 times, most recently from c98b87f to 46eb5fd Compare March 5, 2026 14:18

bloodearnest marked this pull request as draft March 6, 2026 13:39

rebkwok reviewed Mar 9, 2026

View reviewed changes

bloodearnest added 5 commits March 9, 2026 10:26

Allow the controller tick thread to be disabled

070235b

Add documentation for new test-controller command

1411769

Allows standalone use

bloodearnest force-pushed the test-controller-standalone branch from 46eb5fd to 6896112 Compare March 9, 2026 10:26

evansd approved these changes Mar 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Standalone test-controller#1353

Standalone test-controller#1353
bloodearnest wants to merge 5 commits intomainfrom
test-controller-standalone

bloodearnest commented Mar 5, 2026 •

edited

Loading

Uh oh!

rebkwok left a comment

Uh oh!

rebkwok Mar 9, 2026

Uh oh!

rebkwok Mar 9, 2026

Uh oh!

rebkwok Mar 9, 2026

Uh oh!

bloodearnest Mar 9, 2026

Uh oh!

rebkwok Mar 9, 2026

Uh oh!

rebkwok Mar 9, 2026

Uh oh!

bloodearnest Mar 9, 2026

Uh oh!

evansd left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	in this repo's functional tests, but also can be used by other code than need
	in this repo's functional tests, but also can be used by other code that needs

Conversation

bloodearnest commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rebkwok left a comment

Choose a reason for hiding this comment

Uh oh!

rebkwok Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

rebkwok Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

rebkwok Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

bloodearnest Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

rebkwok Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

rebkwok Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

bloodearnest Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

evansd left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

bloodearnest commented Mar 5, 2026 •

edited

Loading