This repository was archived by the owner on Sep 9, 2025. It is now read-only.
1 file changed, +28 -0 lines changed
1+ # CI/CD with EC2 Runners
2+
3+ ## Problem
4+
5+ Projects run E2E tests on EC2 runners for small, medium, and large jobs. (Some
6+ projects may have other names for such jobs, like `smoke` in `training`.)
7+
8+ These runners provide access to accelerated hardware (e.g., GPUs) needed to
9+ run compute-intensive processes.
10+
11+ Instances with such hardware are in limited supply, and their availability
12+ depends on current demand among all EC2 users in a particular availability zone.
13+ As a result, a requested instance type is sometimes unavailable, and jobs that
14+ rely on it fail.
15+
16+ ## Solution
17+
18+ Availability varies by availability zone (AZ). If one AZ is busy, we can try
19+ another.
20+
21+ For this, a new
22+ [launch-ec2-runner-with-fallback](https://github.com/instructlab/ci-actions/tree/main/actions/launch-ec2-runner-with-fallback)
23+ action was implemented in the `ci-actions` repository. When adopted, this action
24+ walks through the AZs and requests an instance in each one until a request succeeds.
25+
26+ All projects that rely on AWS EC2 runners should adopt the
27+ `launch-ec2-runner-with-fallback` action in all of their jobs to avoid fluke test
28+ failures.
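
The fallback idea from the Solution section can be illustrated with a minimal Python sketch. This is not the code of the `launch-ec2-runner-with-fallback` action, only the general pattern it describes. The names `launch_with_fallback`, `CapacityError`, and `fake_request`, as well as the zone names, are hypothetical and exist only for illustration. A real launcher would call the EC2 API instead of the stand-in function.

```python
from typing import Callable, Iterable, Optional, Tuple


class CapacityError(Exception):
    """Hypothetical error: the zone cannot supply the instance type right now."""


def launch_with_fallback(
    zones: Iterable[str],
    request_instance: Callable[[str], str],
) -> Optional[Tuple[str, str]]:
    """Try each zone in order and return (zone, instance_id) for the first success.

    Returns None when every zone reports insufficient capacity.
    """
    for zone in zones:
        try:
            instance_id = request_instance(zone)
        except CapacityError:
            # This zone is busy, so move on to the next one.
            continue
        return zone, instance_id
    return None


def fake_request(zone: str) -> str:
    """Stand-in launcher (assumption): only one example zone has capacity."""
    if zone != "us-east-2c":
        raise CapacityError(zone)
    return f"i-demo-{zone}"


if __name__ == "__main__":
    print(launch_with_fallback(["us-east-2a", "us-east-2b", "us-east-2c"], fake_request))
```

Passing the launcher in as a callable keeps the retry loop separate from any particular cloud client, which also makes the loop easy to test without AWS credentials.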