Make scheduler jobs configurable (#503)
Conversation
E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3099/
Result: ✅ SUCCESS
Parameters: BIRDHOUSE_DEPLOY_BRANCH=configurable-crontab, DACCS_IAC_BRANCH=master, DACCS_CONFIGS_BRANCH=master, PAVICS_E2E_WORKFLOW_TESTS_BRANCH=, PAVICS_SDI_BRANCH=master, DESTROY_INFRA_ON_EXIT=true, PAVICS_HOST=https://host-140-91.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/68/
NOTEBOOK TEST RESULTS
E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3107/
Result: ✅ SUCCESS
Parameters: BIRDHOUSE_DEPLOY_BRANCH=configurable-crontab, DACCS_IAC_BRANCH=master, DACCS_CONFIGS_BRANCH=master, PAVICS_E2E_WORKFLOW_TESTS_BRANCH=, PAVICS_SDI_BRANCH=master, DESTROY_INFRA_ON_EXIT=true, PAVICS_HOST=https://host-140-154.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/74/
NOTEBOOK TEST RESULTS
tlvu left a comment:
Very nice PR. I spotted a suspicious change below.
Have you tried enabling all 6 new .env files and checking that the cron daemon fires the jobs according to schedule (i.e., no syntax errors in the generated jobs)?
E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3109/
Result: ❌ FAILURE
Parameters: BIRDHOUSE_DEPLOY_BRANCH=configurable-crontab, DACCS_IAC_BRANCH=master, DACCS_CONFIGS_BRANCH=master, PAVICS_E2E_WORKFLOW_TESTS_BRANCH=, PAVICS_SDI_BRANCH=master, DESTROY_INFRA_ON_EXIT=true, PAVICS_HOST=https://host-140-154.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/76/
NOTEBOOK TEST RESULTS
E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3110/
Result: ❌ FAILURE
Parameters: BIRDHOUSE_DEPLOY_BRANCH=configurable-crontab, DACCS_IAC_BRANCH=master, DACCS_CONFIGS_BRANCH=master, PAVICS_E2E_WORKFLOW_TESTS_BRANCH=, PAVICS_SDI_BRANCH=master, DESTROY_INFRA_ON_EXIT=true, PAVICS_HOST=https://host-140-133.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/77/
NOTEBOOK TEST RESULTS
E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3112/
Result: ✅ SUCCESS
Parameters: BIRDHOUSE_DEPLOY_BRANCH=configurable-crontab, DACCS_IAC_BRANCH=master, DACCS_CONFIGS_BRANCH=master, PAVICS_E2E_WORKFLOW_TESTS_BRANCH=, PAVICS_SDI_BRANCH=master, DESTROY_INFRA_ON_EXIT=true, PAVICS_HOST=https://host-140-91.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/79/
NOTEBOOK TEST RESULTS
E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3170/
Result: ❌ FAILURE
Parameters: BIRDHOUSE_DEPLOY_BRANCH=configurable-crontab, DACCS_IAC_BRANCH=master, DACCS_CONFIGS_BRANCH=master, PAVICS_E2E_WORKFLOW_TESTS_BRANCH=, PAVICS_SDI_BRANCH=master, DESTROY_INFRA_ON_EXIT=true, PAVICS_HOST=https://host-140-91.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results
Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/124/
NOTEBOOK TEST RESULTS
E2E Test Results
DACCS-iac Pipeline Results
Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3180/
Result: ❌ FAILURE
Parameters: BIRDHOUSE_DEPLOY_BRANCH=configurable-crontab, DACCS_IAC_BRANCH=master, DACCS_CONFIGS_BRANCH=master, PAVICS_E2E_WORKFLOW_TESTS_BRANCH=, PAVICS_SDI_BRANCH=master, DESTROY_INFRA_ON_EXIT=true, PAVICS_HOST=https://host-140-91.rdext.crim.ca
Can you please let me know if you're OK with this PR so that we can merge it, since a few others depend on this one? If you're not going to have time to review it further, can we pull it in and make additional changes later if needed? I don't think this will affect PAVICS since you're still running version 1.42.1.
tlvu left a comment:
Very sorry for the delay. I took a few days off the past 2 weeks.
The only thing I didn't like was the job repetition for all the "deploy-data" jobs (deploy xclim and raven testdata). These are just example usages of the "deploy-data" style of job.
In-house we have a ton of other "deploy-data" jobs, and it's much more convenient to update them to match any deploy-data code changes when all the jobs are generated from the same template, instead of having to separately update 10+ jobs in 6-7 different external repos. Having one common template will make job updates completely transparent for all external jobs in all external repos.
I agree this is not a show stopper. I will send a separate PR for this. I was hoping to send that PR sooner, but the back-compat config-var problems prevented me from really testing autodeploy, so my attention went to fixing autodeploy first.
I quickly skimmed over the PR again and there are some env vars removed that I am not sure about; I probably missed them during my first review. The rest is fine.
    environment:
      COMPOSE_DIR: ${PWD}
      BIRDHOUSE_AUTODEPLOY_DEPLOY_KEY_ROOT_DIR: ${BIRDHOUSE_AUTODEPLOY_DEPLOY_KEY_ROOT_DIR}
      CODE_OWNERSHIP: ${BIRDHOUSE_AUTODEPLOY_CODE_OWNERSHIP:-}
Is it normal that all the env vars are removed? I can't remember at the moment, but is it possible one of the jobs that can be enabled on-demand uses those?
They're added in templated files so they're not needed as environment variables in the scheduler container directly.
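As an illustration of this pattern (the variable names, paths, and substitution mechanism below are assumptions for the sketch, not the actual birdhouse templating code), a value can be baked into the job file when the template is instantiated instead of being passed to the container through its `environment:` section:

```shell
# Hypothetical template line and variables; birdhouse uses its own templating
# mechanism, this sketch just substitutes with sed to show the effect.
export BIRDHOUSE_AUTODEPLOY_FREQUENCY='5 * * * *'
export COMPOSE_DIR='/opt/birdhouse'

# A job template as it might appear in the source tree:
template='${BIRDHOUSE_AUTODEPLOY_FREQUENCY} root COMPOSE_DIR=${COMPOSE_DIR} /autodeploy.sh'

# Instantiate the template: the values end up inside the generated crontab,
# so the scheduler container no longer needs these variables in its environment.
printf '%s\n' "$template" \
    | sed -e "s|\${BIRDHOUSE_AUTODEPLOY_FREQUENCY}|$BIRDHOUSE_AUTODEPLOY_FREQUENCY|" \
          -e "s|\${COMPOSE_DIR}|$COMPOSE_DIR|"
```

The design trade-off is that changing one of these values now requires re-instantiating the templates rather than just restarting the container.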
@mishaschwartz
@mishaschwartz I've merged my other fix back-compat branch with in
But I am not getting the existing jobs back in the
I found this:
Basically all the new job
It's as if all the new
I am in my back-compat config, so maybe it's due to this. Is there something special/additional to do in the case of back-compat?
It could be related to this (#508 (comment)). Did you bring the stack down before you migrated to the new version? Is this code not working for you? That addition was meant to avoid any situation where docker created empty directories in place of templated files.
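A minimal sketch of the guard being described (an assumption about the approach, not the actual `birdhouse-compose.sh` code): before re-instantiating templates, any directory that docker created in place of a templated file is removed.

```shell
# Simulate the failure mode in a scratch directory: docker bind-mounts a
# path that does not exist yet, so the daemon creates it as a directory.
tmpdir=$(mktemp -d)
target="$tmpdir/deploy-data-job.cron"   # hypothetical templated-file path
mkdir -p "$target"                      # what docker would have left behind

# Guard: if the template target exists as a directory, remove it so the
# templating step can write a regular file there.
if [ -d "$target" ]; then
    echo "removing directory created in place of templated file: $target"
    rm -rf "$target"
fi

# Re-instantiate the template (stand-in content for the sketch).
printf '1 1 * * * root /deploy-data.sh\n' > "$target"
```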
Did not bring the entire stack down, but I did delete the
That comment says to stop the docker daemon, then delete the instantiated template files, then restart the docker daemon. But I did not start/stop the docker daemon, and those problematic files are all the new jobs
OK, there's nothing special about those files; they're just template files like we have elsewhere in the stack. I'm not really sure what would be different about these ones, but in general the code I mentioned here (#503 (comment)) in birdhouse-compose.sh should handle the case where directories are created. If you figure out why that isn't working as intended, please let me know so we can add a fix. In the meantime, I'd recommend always bringing the stack down before you update the source code, so you know you're starting fresh and docker doesn't get into an unexpected state.
There was a power outage yesterday so I restarted clean with
I know, this is so weird. Nothing in the PR looks like it would trigger this weird behavior. Am I the only person with all of these enabled at the same time? Maybe it's the combination of all of them that triggers this behavior?
What is in the non-empty directories?
I have everything enabled too
So the issue is that it's owned by root?
Oh true ... but that's not supposed to happen since I have
That's probably because it was created by the volume mount, because the instantiated template file is not there. And the instantiated file was not there because the parent is also owned by root. Weird permission problem.
I can now finally start the stack.
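For anyone hitting the same thing, the ownership check can be sketched like this (paths are hypothetical, and reproducing actual root-owned directories requires root privileges, so the demo only exercises the `find` filter):

```shell
# In a real checkout you would look for directories NOT owned by you, e.g.:
#   find ./birdhouse -type d ! -user "$(id -un)"
# Those are the candidates created by the docker daemon (which runs as root)
# when a bind-mount source was missing.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/components/scheduler"   # owned by the current user

# Everything here is ours, so the suspect list is empty:
find "$tmpdir" -type d ! -user "$(id -un)"
```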
…n and as a scheduler job) (#532)

## Overview

### Changes

- Add `backup` command in `bin/birdhouse` to backup and restore data to a restic repository

  This allows users to backup and restore:
  - application data, user data, and log data for all components
  - birdhouse logs
  - docker container logs
  - the local environment file

  Restoring data either involves restoring it to a named volume (determined by `BIRDHOUSE_BACKUP_VOLUME`) or, in the case of user data and application data, overwriting the current data with the backup. For full details run the `bin/birdhouse backup --help` command.

  Backups are stored in a [restic](https://restic.readthedocs.io/en/stable/) repository, which can be configured by creating a file at `BIRDHOUSE_BACKUP_RESTIC_ENV_FILE` (default: `birdhouse/restic.env`) containing the [environment variables](https://restic.readthedocs.io/en/stable/040_backup.html#environment-variables) necessary for restic to create and access a repository (see `birdhouse/restic.env.example` for details).

  The backup and restore commands can be further customized by setting any of the following variables:
  - `BIRDHOUSE_BACKUP_SSH_KEY_DIR`: the location of a directory that contains an SSH key used to access a remote machine where the restic repository is hosted. Required if accessing a restic repository using the sftp protocol.
  - `BIRDHOUSE_BACKUP_RESTIC_BACKUP_ARGS`: additional options to pass to the `restic backup` command when running the `birdhouse backup create` command. For example: `BIRDHOUSE_BACKUP_RESTIC_BACKUP_ARGS='--skip-if-unchanged --exclude-file "file-i-do-not-want-backedup.py"'`
  - `BIRDHOUSE_BACKUP_RESTIC_FORGET_ARGS`: additional options to pass to the `restic forget` command after running the backup job. This allows you to ensure that restic deletes old backups according to your backup retention policy. If this is set, restic will also run the `restic prune` command after every backup to clean up old backup files. For example, to keep backups daily for 1 week, weekly for 1 month, and monthly for a year: `BIRDHOUSE_BACKUP_RESTIC_FORGET_ARGS='--keep-daily=7 --keep-weekly=4 --keep-monthly=12'`

- Add scheduler job to automatically backup data

  Create a new scheduler job at `optional-components/scheduler-job-backup` which runs the `bin/birdhouse backup create` command at regular intervals to ensure that the birdhouse stack's data is regularly backed up. To configure this job you may set the following variables:
  - `SCHEDULER_JOB_BACKUP_FREQUENCY`: cron schedule for when to run this scheduler job (default is `'1 1 * * *'`, i.e. at 1:01 am daily)
  - `SCHEDULER_JOB_BACKUP_ARGS`: extra arguments to pass to the `bin/birdhouse backup create` command when backing up data. By default this backs up everything (default is `'-a \* -u \* -l \* --birdhouse-logs --local-env-file'`)

- Add `configs --print-log-command` option in `bin/birdhouse`

  This allows users to print a command that can be used to load the birdhouse logging functions in the current process. This is very similar to `bin/birdhouse configs --print-config-command` except that it only loads the logging functions. Example usage:

  ```sh
  eval $(bin/birdhouse configs --print-log-command)
  log INFO 'here is an example log message'
  ```

  It is important to have a distinct option to load just the log commands because functions are not inherited by subprocesses, which means that if you do something like:

  ```sh
  eval $(bin/birdhouse configs --print-config-command)
  log INFO 'this one works'
  sh -c 'log ERROR "this one does not"'
  ```

  the log command in the subprocess does not work. We would have to re-run the `eval` in the subprocess, which would unnecessarily redefine all the existing configuration variables. Instead we can now do this:

  ```sh
  eval $(bin/birdhouse configs --print-log-command)
  log INFO 'this one works'
  sh -c 'eval $(bin/birdhouse configs --print-log-command); log INFO "this one does work now"'
  ```

  which is much quicker and does not require redefining all configuration variables.

  Note: this was introduced as a helper for the `bin/birdhouse backup` commands but was made part of the public interface because it is potentially very useful for other scripts that want to use the birdhouse logging mechanism. For example, the `components/weaver/post-docker-compose-up` script defines its own logging functions, which could now be easily replaced using this method.

- Add `BIRDHOUSE_COMPOSE_TEMPLATE_SKIP` environment variable to explicitly skip rebuilding template files if `true`

  This gives us the option to skip rebuilding template files even if the command passed to `bin/birdhouse compose` is `up` or `restart`. This is essentially the opposite of `BIRDHOUSE_COMPOSE_TEMPLATE_FORCE`. This option is necessary when running a command while the birdhouse stack is already running and we don't want to change the template files for the running stack.

### Fixes

- Replace non-portable `sed -z` option

  The `birdhouse/scripts/get-services-json.include.sh` script invokes `sed` with the `-z` flag. The `-z` flag is non-standard and is not supported by several widely used versions of `sed`. This became apparent when the script was run by the `optional-components/scheduler-job-backup` job, which runs in an alpine-based docker container.

## Changes

**Non-breaking changes**
- Adds several new command line options
- Adds a new scheduler job
- Minor bug fixes and improvements

**Breaking changes**
- None

## Related Issue / Discussion

- This depends on #503

## Additional Information

Links to other issues or sources.

## CI Operations

<!--
The test suite can be run using a different DACCS config with ``birdhouse_daccs_configs_branch: branch_name`` in the PR description.
To globally skip the test suite regardless of the commit message use ``birdhouse_skip_ci`` set to ``true`` in the PR description.
Using ``[<cmd>]`` (with the brackets) where ``<cmd> = skip ci`` in the commit message will override ``birdhouse_skip_ci`` from the PR description.
Such a commit command can be used to override the PR description behavior for a specific commit update.
However, a commit message cannot 'force run' a PR for which the description turns off the CI.
To run the CI, the PR should instead be updated with a ``true`` value, and a running message can be posted in following PR comments to trigger tests once again.
-->

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false
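The `sed -z` portability problem mentioned in the Fixes section above can be sketched as follows (the input and the join operation are hypothetical examples; the actual invocation in `get-services-json.include.sh` is not shown in this thread):

```shell
# GNU sed's `-z` reads the input as NUL-delimited records, which lets a
# substitution like `s/\n//g` join lines. `-z` is not in POSIX and is
# reported not to work in some minimal sed builds, such as the one in the
# alpine-based job container. POSIX `tr` performs the same join portably.
input='{"services":
  [{"name": "weaver"}]}'

# Non-portable: printf '%s' "$input" | sed -z 's/\n//g'
joined=$(printf '%s' "$input" | tr -d '\n')
printf '%s\n' "$joined"
```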
Overview
The scheduler component automatically enables three jobs (autodeploy, logrotate, notebookdeploy). If someone wants to use the scheduler component but does not want these jobs, there is no obvious way to disable any one of them.
This change makes it possible to enable/disable jobs as required by the user and adds documentation to explain how to do this.
This change also converts the existing jobs to optional components. This makes the jobs more in line with the way the stack is deployed (since version 1.24.0) and ensures that settings set as environment variables in the local environment file are less sensitive to the order in which they were declared.
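For illustration, enabling individual jobs could then look like the following `env.local` fragment (the variable and component names here are assumptions for the sketch; check the repository's `optional-components/` directory and documentation for the real ones):

```shell
# Hypothetical env.local fragment: each scheduler job is enabled by adding
# its optional component, instead of being switched on implicitly by the
# scheduler component itself.
BIRDHOUSE_EXTRA_CONF_DIRS="
    ./optional-components/scheduler-job-logrotate
    ./optional-components/scheduler-job-notebookdeploy
"
```

Leaving a job's component out of the list is then all it takes to disable it.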
Breaking Change:
- Existing jobs are now defined in the optional-components subdirectory.
Deprecations
- Deprecates the BIRDHOUSE_AUTODEPLOY_EXTRA_SCHEDULER_JOBS variable. Users should create additional jobs by adding them as custom components instead.
What about... ?
- Disabling a job by setting its schedule to a '#' string? However, this is not obvious to the user and is unreliable since it is not documented.
Changes
Non-breaking changes
Breaking changes
Related Issue / Discussion
Additional Information
I would really like to use the scheduler for other things but I do not want to have to enable the autodeploy mechanism.
It seems like the scheduler component was designed with the autodeploy mechanism in mind, since that is added by default, but it could be much more useful if it were more configurable.
CI Operations
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false