1 file changed: +28 -0 lines changed
1
+ # CI/CD with EC2 Runners
2
+
3
+ ## Problem
4
+
5
+ Projects run E2E tests on EC2 runners for small, medium, and large jobs. (Some
6
+ projects may have other names for such jobs, like `smoke` in `training`.)
7
+
8
+ These runners are used to get access to accelerated hardware (e.g., GPUs) to
9
+ run compute-intensive processes.
10
+
11
+ Access to instances with such hardware is sometimes limited and depends on the
12
+ current demand among all EC2 users in a particular zone. This means that
13
+ the requested instance types are sometimes unavailable, which makes jobs that
14
+ rely on these instances fail.
15
+
16
+ ## Solution
17
+
18
+ Availability varies from zone to zone. If a zone is busy, we can try
19
+ another zone.
20
+
21
+ For this, a new
22
+ [launch-ec2-runner-with-fallback](https://github.com/instructlab/ci-actions/tree/main/actions/launch-ec2-runner-with-fallback)
23
+ action was implemented in the `ci-actions` repository. If adopted, this action will
24
+ walk through availability zones (AZs), requesting an instance in each AZ until one request succeeds.
25
+
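The zone-by-zone fallback described above can be sketched roughly as follows. This is an illustrative sketch only, not the action's actual implementation: `launch_with_fallback`, `CapacityError`, `fake_launch`, the zone names, and the instance ID format are all hypothetical names made up for the example, and the real EC2 request is replaced by a stand-in function.

```python
from typing import Callable, Iterable, Optional


class CapacityError(Exception):
    """Hypothetical error: the zone cannot supply the requested instance."""


def launch_with_fallback(
    zones: Iterable[str],
    try_launch: Callable[[str], str],
) -> Optional[str]:
    """Try each zone in order and return the first instance ID obtained.

    Returns None if no zone could provide an instance.
    """
    for zone in zones:
        try:
            return try_launch(zone)
        except CapacityError:
            # This zone is busy; fall through to the next one.
            continue
    return None


def fake_launch(zone: str) -> str:
    # Stand-in for a real EC2 request (assumption): only one zone has capacity.
    if zone != "us-east-2c":
        raise CapacityError(zone)
    return f"i-demo-{zone}"


result = launch_with_fallback(
    ["us-east-2a", "us-east-2b", "us-east-2c"], fake_launch
)
```

Passing the launcher in as a callable keeps the fallback loop separate from the cloud API, so the loop can be exercised without AWS credentials. If every zone raises `CapacityError`, the function returns `None`, and the caller can then decide whether to fail the job.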
26
+ All projects that rely on AWS EC2 runners should adopt the
27
+ `launch-ec2-runner-with-fallback` action in all of their jobs to avoid spurious test
28
+ failures.