Knowledge share: SecureDrop Continuous Integration
the problem:
- ci has been a frequent pain point in terms of flakiness, reliability, and performance
- renew collaboration with infra
current state: we run the staging VM environment for the app and mon servers: we perform a clean install (which pulls in the ansible code for admins), build the debian packages, and install them in the VMs. we also fetch app-test artifacts. in effect, we are saying that every PR (actually, every commit!) should run it.
- do we need to run this if the ansible code didn't change? - yes, because the AppArmor rules need to be checked
- how does it work?
- when circleci job starts, we run the vagrant-based VM setup on google cloud to make sure the VM setup used by developers still works. (there was another reason that conor mentioned that i missed)
- ignores `i18n-` branches
- `develop/devops/gce-nested/gce-start.sh`: script provisions the cloud vms. explicitly pins `ci-nested-virt-buster-IMAGE_NUMBER` and pre-fetches the VM images so we don't waste more wall time when we run `vagrant up`
- should we run this on hardware?
- when do we need to run this? [focus on this] - we don't need to run this on every commit
- this is a pain during the release process
more on how this works:
- we `rebaseontarget` to make sure it's run against the latest `develop` branch, then run staging tests on GCE (see the `ci-go` script, which is a wrapper for everything: `gce-start.sh`, `gce-runner.sh`, `gce-stop.sh`), and then destroy all our `securedrop-ci`-tagged VMs. the tag also lets a cron job (over in infra) poll for VMs with this label that have been running for longer than 6 hours and destroy them, so we're not charged a bunch of $ for them
- `gce-runner.sh` does SSH bootstrapping, then (lines 60-61) we `make build-debs-notest` and `make staging` (probably takes ~30 minutes to provision the system), then we verify the state of our provisioned VMs
- after the GCE/GCP run, there's a brief step to extract test results in a machine-readable format.
- after test results are stored, the environment is torn down completely, regardless of pass or fail.
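The flow above (start, run, always stop, plus a safety-net cleanup of long-running labelled VMs) can be sketched roughly as follows. This is a minimal illustration, not the actual `ci-go` script or the infra cron job: the `run_ci` helper, the `Vm` record, and `find_stale_vms` are hypothetical names, and the three steps are passed in as plain callables rather than calling any real GCE API. Only the `securedrop-ci` label and the 6-hour threshold come from the notes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List

CI_LABEL = "securedrop-ci"    # label mentioned in the notes
MAX_AGE = timedelta(hours=6)  # age threshold mentioned in the notes


def run_ci(start: Callable[[], None],
           runner: Callable[[], None],
           stop: Callable[[], None]) -> bool:
    """Run start then runner; always call stop, whether the run passed or failed."""
    try:
        start()
        runner()
        return True
    except Exception:
        return False
    finally:
        stop()  # teardown happens regardless of the outcome above


@dataclass
class Vm:
    """Hypothetical VM record for illustration; not a real GCE API type."""
    name: str
    labels: frozenset
    created: datetime


def find_stale_vms(vms: Iterable[Vm], now: datetime) -> List[str]:
    """Return names of CI-labelled VMs that have been running longer than MAX_AGE."""
    return [vm.name for vm in vms
            if CI_LABEL in vm.labels and now - vm.created > MAX_AGE]
```

The `finally` block covers the normal case; the age-based sweep only matters when a job dies before its own teardown runs.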
questions:
- should we move away from circleci now that github has nested virtualization support?
- we should look into shaving some time off from running google cloud platform (gcp)
- should we use circleci orbs for filtering for when a job should be run?
- could we look at diffs and determine what should be run?
- file and branch filtering already helps us determine what we should run. how would orbs improve this? it might be more maintainable, but perhaps not a performance improvement
- we're still mostly interested in performance improvements so research is needed to see if it actually shaves off time with environmental setup
- could someone give some background info on `circleci/` vs `cimg/` images?
- `cimg/` images are newer circleci-maintained images, which we should be using
- one area for infra to dig into?
- we're losing a lot of time on container builds (in `.circleci/config.yml`)
- we could also combine some of these test steps, e.g. is there a reason to run `make build-debs-notest` on circleci and gcp as well?
- the debs do not have a commit hash appended - just the version.
- we could start building debs in nightlies and then ci could pull from apt-test, like we do in securedrop workstation land. orrr just build them once in a job and share them.
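One of the questions above asks whether diffs could determine what should be run. A rough sketch of that idea is below. The job names `app-tests`, `translation-tests`, and `lint` appear in these notes, but the path patterns in `JOB_PATTERNS` are made up for illustration and are not SecureDrop's real layout; `jobs_for_diff` is a hypothetical helper. Note also that the notes say the staging run is still needed when the ansible code is unchanged (AppArmor rules), so any real mapping would need care.

```python
from fnmatch import fnmatch
from typing import Dict, Iterable, List, Set

# Hypothetical mapping from job name to path patterns (assumption, for illustration only).
JOB_PATTERNS: Dict[str, List[str]] = {
    "app-tests": ["securedrop/*"],
    "translation-tests": ["securedrop/translations/*"],
    "lint": ["*"],  # assume lint runs for any change
}


def jobs_for_diff(changed_files: Iterable[str]) -> Set[str]:
    """Return the jobs whose patterns match at least one changed file."""
    files = list(changed_files)
    return {
        job
        for job, patterns in JOB_PATTERNS.items()
        # fnmatch's "*" also matches "/", so "securedrop/*" covers nested paths
        if any(fnmatch(path, pattern) for path in files for pattern in patterns)
    }
```

The list of changed files could come from something like `git diff --name-only` against the target branch. Whether this beats the existing file and branch filtering on wall time would still need measuring.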
- current state: `app-tests` is parallelized via `--split-by=timings` in the `.circleci/config.yml` (line 107)
- there might be a more sophisticated way to parallelize
- this is the only place in ci where we use the `parallelism` tag (is this correct?)
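To make the idea of timing-based splitting concrete, here is a small sketch of one common approach: assign the slowest tests first, each to the node with the least accumulated time. This is an illustrative greedy heuristic, not a description of how circleci implements `--split-by=timings`; the function name and the timing data format are assumptions.

```python
import heapq
from typing import Dict, List


def split_by_timings(timings: Dict[str, float], nodes: int) -> List[List[str]]:
    """Greedily spread tests over `nodes` buckets, balancing total recorded seconds."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    buckets: List[List[str]] = [[] for _ in range(nodes)]
    # Min-heap of (accumulated seconds, node index): the least-loaded node pops first.
    heap = [(0.0, index) for index in range(nodes)]
    heapq.heapify(heap)
    # Slowest tests first; ties broken by name so the result is deterministic.
    for name, seconds in sorted(timings.items(), key=lambda kv: (-kv[1], kv[0])):
        total, index = heapq.heappop(heap)
        buckets[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return buckets
```

With a handful of very slow tests, the overall wall time is bounded by the slowest single test, so adding more nodes stops helping past that point.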
- lint is taking an unexpectedly long time, probably because of environmental setup
- `parallelism: 20` for `translation-tests`
- we need to bump this up each time we add a new language
- if a branch is prefixed with `i18n-` then this ci job is run
- there is a `devops/scripts` script somewhere that determines if the `translation-tests` is run
- instead, circleci itself could decide whether this job should run, which could save ~5 minutes
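The two translation-test points above (the `i18n-` branch check, and the parallelism value tracking the number of languages) can be sketched as follows. Both helpers are hypothetical and are not the existing `devops/scripts` logic. The sharding function shows one way a fixed node count could cover a growing list of locales, which is an assumption about a possible alternative, not the current setup.

```python
from typing import List, Sequence

I18N_PREFIX = "i18n-"  # branch prefix mentioned in the notes


def should_run_translation_tests(branch: str) -> bool:
    """True when the branch name starts with the i18n- prefix."""
    return branch.startswith(I18N_PREFIX)


def locales_for_node(locales: Sequence[str], node_index: int, node_total: int) -> List[str]:
    """Return the slice of locales that a given parallel node would handle."""
    if not 0 <= node_index < node_total:
        raise ValueError("node_index must be in range(node_total)")
    # Sort first so every node sees the same ordering, then take every node_total-th item.
    return sorted(locales)[node_index::node_total]
```

On circleci, the node index and total are exposed to each parallel container as the `CIRCLE_NODE_INDEX` and `CIRCLE_NODE_TOTAL` environment variables, which is where the two integer arguments would come from in practice.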