Commit 1c38894

Require launch-ec2-runner-with-fallback use for all ec2 runners
Signed-off-by: Ihar Hrachyshka <[email protected]>
1 parent d45f91e commit 1c38894

1 file changed: +28 -0 lines changed
docs/ci/ec2-runners.md

@@ -0,0 +1,28 @@
# CI/CD with EC2 Runners

## Problem

Projects run E2E tests on EC2 runners for small, medium, and large jobs. (Some
projects may use other names for such jobs, like `smoke` in `training`.)

These runners provide access to accelerated hardware (e.g., GPUs) for running
compute-intensive processes.

Access to instances with such hardware is sometimes limited: it depends on the
current demand among all EC2 users in a particular zone. As a result, a
requested instance type is sometimes unavailable, and jobs that rely on it
fail.

## Solution

Availability varies from zone to zone. If one zone has no capacity, we can try
another zone.

For this, a new
[launch-ec2-runner-with-fallback](https://github.com/instructlab/ci-actions/tree/main/actions/launch-ec2-runner-with-fallback)
action was implemented in the `ci-actions` repository. If adopted, this action
will walk through availability zones (AZs) and request an instance in each one
until a request succeeds.

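The fallback idea can be illustrated with a minimal Python sketch. This is not
the action's actual implementation: `launch_with_fallback`, `CapacityError`,
`fake_launch`, and the zone names are all hypothetical, chosen only to show the
"try each zone until one succeeds" loop.

```python
from typing import Callable, Optional, Sequence


class CapacityError(Exception):
    """Hypothetical error: a zone cannot supply the requested instance type."""


def launch_with_fallback(zones: Sequence[str], launch: Callable[[str], str]) -> str:
    """Try each zone in order and return the first instance ID obtained.

    `launch` is a caller-supplied function (an assumption of this sketch) that
    requests an instance in one zone and raises CapacityError on failure.
    """
    last_error: Optional[CapacityError] = None
    for zone in zones:
        try:
            return launch(zone)
        except CapacityError as err:
            # This zone is out of capacity; remember why and try the next one.
            last_error = err
    raise RuntimeError(f"no capacity in any of: {', '.join(zones)}") from last_error


def fake_launch(zone: str) -> str:
    """Stand-in launcher for illustration: only one zone has capacity."""
    if zone != "us-east-2c":
        raise CapacityError(zone)
    return f"i-fake-{zone}"


if __name__ == "__main__":
    print(launch_with_fallback(["us-east-2a", "us-east-2b", "us-east-2c"], fake_launch))
```

With the stand-in launcher, the first two zones are skipped and the third one
returns an instance ID. If every zone fails, the sketch raises an error rather
than retrying, so the job still fails but only after all zones were tried.
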
All projects that rely on AWS EC2 runners should adopt the
`launch-ec2-runner-with-fallback` action in every job that uses them, to avoid
spurious test failures caused by capacity shortages.
