-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add resources for ope-notebook-culler cronjob #3
base: main
Are you sure you want to change the base?
Conversation
One another minor suggestion, can you add the steps to run this cronjob in the readme file? |
Also updated README.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
docker/src/ope-notebook-culler.sh
Outdated
echo $(date) | ||
|
||
for notebook in $running_notebooks; do | ||
oc delete notebook.kubeflow.org $notebook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment won't delete the pvcs connected to the notebooks. When the same student relaunch a notebook, will that pvc be automatically assigned to the user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I think about it, oc delete
command will equally kill all notebooks no matter how long they have been running. One user may just start a notebook 1 minute before. Then the cronjob kills it immediately. That's not what we expect. Can you make the cronjob job run every 24 hours, and then only kill notebooks that are running more than 48 hours?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of these assumptions are correct, the pvc does need to be deleted and we don't want to delete notebooks that have been running less than 48 hours. The script I had before checked the running time of the notebook and deleted the pvc and notebook, maybe that can be used, I will copy it below:
`
#!/bin/sh
set_time=24 hours
time_limit=$(($(date -u +"%s") - $(date -d '-$set_time' +%s)))
running_notebooks=$(oc get notebook -o custom-columns=:metadata.name)
for notebook in $running_notebooks; do
start_time=$(oc get notebook $notebook -o jsonpath='{.status.containerState.running.startedAt}')
running_time=$(($(date -u +"%s") - $(date -u -d "$start_time" +"%s")))
if [ "$running_time" -gt "$time_limit" ]; then
echo "Shutting down pod $notebook"
oc delete notebook $notebook
oc delete pvc $notebook
fi
done
`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dystewart is it necessary to delete the whole notebook, or can the StatefulSet be scaled to 0 instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@computate If I'm not mistaken, when you scale the StatefulSet to 0 the pod is terminated but then starts back up again. That's why we delete the whole notebook because it ensures the notebook, statefulset and pod are actually deleted.
@dystewart can you please provide a description for this PR, and link it to an issue? Is this related to nerc-project/operations#337? |
Updated description and linked associated issue |
Also made requested updates to code @DanNiESh |
Do you think that we can adjust the PR to "shut down" the notebooks in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please adjust this PR to only affect certain classes and not CS506? See the Ensure that notebooks in Lance Galletti's CS506 course are not deleted and have a longer uptime issue for more details.
Dylan is also working on another issue to distinguish users in rhods-notebooks to their course group. Once that is completed, we can customize this cronjob to only apply to cs210 and ece440 classes which I think could be a follow-up PR if we don't want this PR to be blocked by that issue. |
@DanNiESh as this PR currently is configured to delete all notebooks in the |
@computate Gotcha! Once Dylan updates the PR to "shutdown notebooks" rather than delete them, we'll test and verify it in ope-testing namespace. Do you think "shutdown" notebooks works for Lance Galletti? Ultimately we will modify this cronjob to only apply to cs210 and ece440 classes once this issue is resolved. |
File formatting to make git happy Formatting
@computate @DanNiESh I have updated the cronjob to utilize the code we were provided and tested in ope-rhods-testing and confirmed its working. @DanNiESh will confirm |
Looks good! I tested and verified this cronjob worked. Some bugs I got during testing the README and cronjob:
|
cronjobs/notebok-culler/cronjob.yaml
Outdated
#threshold to stop running notebooks. Currently set to 24 hours | ||
cutoff_time=86400 | ||
current_time=$(date +%s) | ||
notebooks=$(oc get notebooks -n ope-rhods-testing-1fef2f -o jsonpath='{range .items[?(@.status.containerState.running)]}{.metadata.name}{" "}{.metadata.namespace}{" "}{.status.containerState.running.startedAt}{"\n"}{end}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be rhods-notebooks rather than ope-rhods-testing-1fef2f
docker/src/ope-notebook-culler.sh
Outdated
#threshold to stop running notebooks. Currently set to 24 hours | ||
cutoff_time=86400 | ||
current_time=$(date +%s) | ||
notebooks=$(oc get notebooks -n ope-rhods-testing-1fef2f -o jsonpath='{range .items[?(@.status.containerState.running)]}{.metadata.name}{" "}{.metadata.namespace}{" "}{.status.containerState.running.startedAt}{"\n"}{end}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same. Not ope-rhods-testing-1fef2f
docker/Dockerfile
Outdated
@@ -0,0 +1,6 @@ | |||
FROM quay.io/operate-first/opf-toolbox:v0.8.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this Dockerfile still needed?
docker/src/ope-notebook-culler.sh
Outdated
@@ -0,0 +1,24 @@ | |||
#!/bin/bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this script being used anywhere?
README.md
Outdated
1. Ensure you followed the steps above | ||
2. Verify the cronjob ope-notebook-culler exists | ||
``` | ||
oc get cronjob ope-notebook-culler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new cronjob name is nb-culler. So the comand should be changed tooc get cronjob nb-culler
README.md
Outdated
|
||
3. Run: | ||
``` | ||
kubectl create -n rhods-notebooks job --from=cronjob/ope-notebook-culler ope-notebook-culler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same. nb-culler
README.md
Outdated
Alternatively, to run the script immediately: | ||
|
||
1. Ensure you followed the steps above | ||
2. Verify the cronjob ope-notebook-culler exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nb-culler
README.md
Outdated
|
||
### ope-notebook-culler | ||
|
||
This cronjob runs once every 24 hours at 7am, removing all notebooks & pvcs from the rhods-notebooks namespace. To add resources to the rhods-notebooks namespace: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we just shutdown notebooks and pvcs
README.md
Outdated
oc project rhods-notebooks | ||
``` | ||
|
||
3. From cronjobs/ope-notebook-culler/ directory run: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The directory name has changed.
Also removes unused docker resources after switching cronjob logic to mirror redhat internal's Updates README.md to reflect all these changes
@dystewart Can you update the script to shutdown notebooks in specific classes and also resolve the commit conflicts? |
Addresses: nerc-project/operations#337
This PR adds: