
Commit ad4be4c

Update Readme to new composer and remove old composer deployment (#3858)
* Update mentions to point at the new Composer instance and bucket
* Remove deployment to the old Composer
1 parent: 83b492c · commit: ad4be4c

4 files changed: 6 additions, 17 deletions

.github/workflows/deploy-airflow.yml

Lines changed: 2 additions & 13 deletions

@@ -30,22 +30,11 @@ jobs:
         with:
           files: 'airflow/requirements.txt'

-      - name: Deploy Airflow dependencies to old Composer
-        if: steps.changed-requirements.outputs.any_changed == 'true'
-        run: gcloud composer environments update calitp-airflow2-prod-composer2-patch --update-pypi-packages-from-file airflow/requirements.txt --location us-west2 --project cal-itp-data-infra
-
-      - name: Deploy Airflow dependencies to new Composer
+      - name: Deploy Airflow dependencies to Composer
         if: steps.changed-requirements.outputs.any_changed == 'true'
         run: gcloud composer environments update calitp-airflow2-prod-composer2-20250402 --update-pypi-packages-from-file airflow/requirements.txt --location us-west2 --project cal-itp-data-infra

-      - name: Push Airflow code to old Composer
-        run: |
-          gsutil -m rsync -d -c -r airflow/dags gs://$AIRFLOW_BUCKET/dags
-          gsutil -m rsync -d -c -r airflow/plugins gs://$AIRFLOW_BUCKET/plugins
-        env:
-          AIRFLOW_BUCKET: "us-west2-calitp-airflow2-pr-88ca8ec6-bucket"
-
-      - name: Push Airflow code to new Composer
+      - name: Push Airflow code to Composer
         run: |
           gsutil -m rsync -d -c -r airflow/dags gs://$AIRFLOW_BUCKET/dags
           gsutil -m rsync -d -c -r airflow/plugins gs://$AIRFLOW_BUCKET/plugins
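After a workflow run like the one above, a quick sanity check might look like the following sketch (not part of this commit). It assumes `gcloud` and `gsutil` are authenticated against the `cal-itp-data-infra` project, and it uses the bucket name documented in `airflow/README.md` below.

```bash
# Sketch of a post-deploy check; assumes authenticated gcloud/gsutil.
# Confirm the Composer environment is healthy after the dependency update.
gcloud composer environments describe calitp-airflow2-prod-composer2-20250402 \
  --location us-west2 --project cal-itp-data-infra \
  --format="value(state)"

# Spot-check that DAG code landed in the bucket Composer watches
# (bucket name as documented in airflow/README.md).
gsutil ls gs://us-west2-calitp-airflow2-pr-f6bb9855-bucket/dags | head
```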

README.md

Lines changed: 1 addition & 1 deletion

@@ -60,7 +60,7 @@ happy.
 Generally we try to configure things via environment variables. In the Kubernetes
 world, these get configured via Kustomize overlays ([example](./kubernetes/apps/overlays/gtfs-rt-archiver-v3-prod/archiver-channel-vars.yaml)).
 For Airflow jobs, we currently use hosted Google Cloud Composer which has a
-[user interface](https://console.cloud.google.com/composer/environments/detail/us-west2/calitp-airflow2-prod-composer2-patch/variables)
+[user interface](https://console.cloud.google.com/composer/environments/detail/us-west2/calitp-airflow2-prod-composer2-20250402/variables)
 for editing environment variables. These environment variables also have to be
 injected into pod operators as needed via Gusty YAML or similar. If you are
 running Airflow locally, the [docker compose file](./airflow/docker-compose.yaml)
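A note on the Composer variables mentioned in the hunk above: the linked console page is the documented path, but the same environment variables can also be read and set from the command line. A minimal sketch, assuming an authenticated `gcloud`; `CALITP_EXAMPLE_VAR` is a placeholder, not a real variable:

```bash
# List the environment variables currently set on the production Composer environment.
gcloud composer environments describe calitp-airflow2-prod-composer2-20250402 \
  --location us-west2 --project cal-itp-data-infra \
  --format="yaml(config.softwareConfig.envVariables)"

# Set or update a single variable (CALITP_EXAMPLE_VAR is a hypothetical name).
# Composer applies the change to the whole environment, which can take several minutes.
gcloud composer environments update calitp-airflow2-prod-composer2-20250402 \
  --location us-west2 --project cal-itp-data-infra \
  --update-env-variables=CALITP_EXAMPLE_VAR=some-value
```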

airflow/README.md

Lines changed: 2 additions & 2 deletions

@@ -93,10 +93,10 @@ docker compose run airflow tasks test unzip_and_validate_gtfs_schedule_hourly va

 ## Deploying Changes to Production

-We have a [GitHub Action](../.github/workflows/deploy-airflow.yml) that runs when PRs touching this directory merge to the `main` branch. The GitHub Action updates the requirements sourced from [requirements.txt](./requirements.txt) and syncs the [DAGs](./dags) and [plugins](./plugins) directories to the bucket that Composer watches for code/data to parse. As of 2024-02-12, this bucket is `us-west2-calitp-airflow2-pr-88ca8ec6-bucket`.
+We have a [GitHub Action](../.github/workflows/deploy-airflow.yml) that runs when PRs touching this directory merge to the `main` branch. The GitHub Action updates the requirements sourced from [requirements.txt](./requirements.txt) and syncs the [DAGs](./dags) and [plugins](./plugins) directories to the bucket that Composer watches for code/data to parse. As of 2025-04-03, this bucket is `us-west2-calitp-airflow2-pr-f6bb9855-bucket`.

 ### Upgrading Airflow Itself

-Our production Composer instance is called [calitp-airflow2-prod-composer2-patch](https://console.cloud.google.com/composer/environments/detail/us-west2/calitp-airflow2-prod-composer2-patch/monitoring); its configuration (including worker count, Airflow config overrides, and environment variables) is manually managed through the web console. When scoping upcoming upgrades to the specific Composer-managed Airflow version we use in production, it can be helpful to grab the corresponding list of requirements from the [Cloud Composer version list](https://cloud.google.com/composer/docs/concepts/versioning/composer-versions), copy it into `requirements-composer-[COMPOSER_VERSION_NUMBER]-airflow-[AIRFLOW_VERSION_NUMBER].txt`, change [Dockerfile.composer](./Dockerfile.composer) to reference that file (deleting the previous equivalent) and modify the `FROM` statement at the top to grab the correct Airflow and Python versions for that Composer version, and build the image locally.
+Our production Composer instance is called [calitp-airflow2-prod-composer2-20250402](https://console.cloud.google.com/composer/environments/detail/us-west2/calitp-airflow2-prod-composer2-20250402/monitoring); its configuration (including worker count, Airflow config overrides, and environment variables) is manually managed through the web console. When scoping upcoming upgrades to the specific Composer-managed Airflow version we use in production, it can be helpful to grab the corresponding list of requirements from the [Cloud Composer version list](https://cloud.google.com/composer/docs/concepts/versioning/composer-versions), copy it into `requirements-composer-[COMPOSER_VERSION_NUMBER]-airflow-[AIRFLOW_VERSION_NUMBER].txt`, change [Dockerfile.composer](./Dockerfile.composer) to reference that file (deleting the previous equivalent) and modify the `FROM` statement at the top to grab the correct Airflow and Python versions for that Composer version, and build the image locally.

 It is desirable to keep our local testing image closely aligned with the production image, so the `FROM` statement in our automatically deployed [Dockerfile](./Dockerfile) should always be updated after a production Airflow upgrade to reflect the same Airflow version and Python version that are being run in the Composer-managed production environment.
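The upgrade paragraph above ends with building the image locally; a rough sketch of that step, run from the repository root, with a placeholder image tag:

```bash
# Build the Composer-parity image locally (the tag name is a placeholder).
cd airflow
docker build -f Dockerfile.composer -t calitp-airflow-composer-local .

# Optionally confirm the Airflow and Python versions baked into the image,
# entrypoint permitting, so they can be compared against the target Composer release.
docker run --rm calitp-airflow-composer-local airflow version
docker run --rm calitp-airflow-composer-local python --version
```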

runbooks/workflow/deprecation-stored-files.md

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ Occasionally, we want to assess our Google Cloud Storage buckets for outdatednes

 3. For the non-test buckets that constitute the deprecation candidate list, the path forward relies on investigation of internal project configuration and conversation with data stakeholders. Some data may need to be retained because it is frequently accessed despite being infrequently updated (NTD data or static website assets, for instance). Some data may need to be retained rather than deleted because it represents raw data collected once that can't otherwise be recovered, or to conform with regulatory requirements, or to provide a window for future research access. Each of the following steps should be taken to determine which path to take:

-   - Search the source code of the [data-infra repository](https://github.com/cal-itp/data-infra), the [data-analyses repository](https://github.com/cal-itp/data-analyses), and the [reports repository](https://github.com/cal-itp/reports) for the name of the bucket, as well as the environment variables [set in Cloud Composer](https://console.cloud.google.com/composer/environments/detail/us-west2/calitp-airflow2-prod-composer2-patch/variables?project=cal-itp-data-infra). If you find it referenced anywhere, investigate whether the reference is in active use. For an extra step of safety, you could also search the entire Cal-ITP GitHub organization's source code via GitHub's web user interface.
+   - Search the source code of the [data-infra repository](https://github.com/cal-itp/data-infra), the [data-analyses repository](https://github.com/cal-itp/data-analyses), and the [reports repository](https://github.com/cal-itp/reports) for the name of the bucket, as well as the environment variables [set in Cloud Composer](https://console.cloud.google.com/composer/environments/detail/us-west2/calitp-airflow2-prod-composer2-20250402/variables?project=cal-itp-data-infra). If you find it referenced anywhere, investigate whether the reference is in active use. For an extra step of safety, you could also search the entire Cal-ITP GitHub organization's source code via GitHub's web user interface.
    - Note: [External tables](https://cloud.google.com/bigquery/docs/external-tables) in BigQuery, created from GCS objects via [our `create_external_tables` DAG](https://b2062ffca77d44a28b4e05f8f5bf4996-dot-us-west2.composer.googleusercontent.com/dags/create_external_tables/grid) in Airflow, do not produce read or write data that shows up in the GCS request count metric we used in step one. If you find a reference to a deprecation candidate bucket within the [`create_external_tables` subfolder](https://github.com/cal-itp/data-infra/tree/main/airflow/dags/create_external_tables) of the data-infra repository, you should check [BigQuery audit logs](https://cloud.google.com/bigquery/docs/reference/auditlogs/#data_access_data_access) to see whether people are querying the external tables that rely on the deprecation candidate bucket (and if so, eliminate it from the deprecation list).
    - Post in `#data-warehouse-devs` and any other relevant channels in Slack (this may vary by domain; for example, if investigating a bucket related to GTFS quality, you may post in `#gtfs-quality`). Ask whether anybody knows of ongoing use of the bucket(s) in question. If there are identifiable stakeholders who aren't active in Slack, like external research partners, reach out to them directly.
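For the bucket-name search described in the runbook hunk above, a terminal-based sweep might look like the following sketch; the bucket name is a placeholder, and `git`, `grep`, and an authenticated `gcloud` are assumed to be available:

```bash
# Placeholder bucket name; substitute the deprecation candidate under investigation.
BUCKET="example-deprecation-candidate-bucket"

# Search the three repositories named in the runbook for the bucket name.
for repo in data-infra data-analyses reports; do
  git clone --depth 1 "https://github.com/cal-itp/${repo}.git" "/tmp/${repo}"
  grep -rn --exclude-dir=.git "${BUCKET}" "/tmp/${repo}" || echo "No matches in ${repo}"
done

# Check whether the bucket appears among the Composer environment variables.
gcloud composer environments describe calitp-airflow2-prod-composer2-20250402 \
  --location us-west2 --project cal-itp-data-infra \
  --format="yaml(config.softwareConfig.envVariables)" \
  | grep -n "${BUCKET}" || echo "No matches in Composer environment variables"
```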
