Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add resources for ope-notebook-culler cronjob #3

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

dystewart
Copy link
Collaborator

@dystewart dystewart commented Jan 18, 2024

Addresses: nerc-project/operations#337

This PR adds:

  1. a cronjob to delete notebooks in the rhods-notebooks namespace that have been running for 48 hours straight or more.
  2. the docker resources for the cronjob
  3. updates to the README.md to include new cronjob

@DanNiESh DanNiESh self-requested a review January 19, 2024 13:17
cronjobs/ope-notebook-culler/cronjob.yaml Outdated Show resolved Hide resolved
cronjobs/ope-notebook-culler/cronjob.yaml Outdated Show resolved Hide resolved
cronjobs/ope-notebook-culler/cronjob.yaml Outdated Show resolved Hide resolved
@DanNiESh
Copy link
Collaborator

One another minor suggestion, can you add the steps to run this cronjob in the readme file?

@dystewart
Copy link
Collaborator Author

Also updated README.md

Copy link
Collaborator

@IsaiahStapleton IsaiahStapleton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
echo $(date)

for notebook in $running_notebooks; do
oc delete notebook.kubeflow.org $notebook
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment won't delete the pvcs connected to the notebooks. When the same student relaunch a notebook, will that pvc be automatically assigned to the user?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I think about it, oc delete command will equally kill all notebooks no matter how long they have been running. One user may just start a notebook 1 minute before. Then the cronjob kills it immediately. That's not what we expect. Can you make the cronjob job run every 24 hours, and then only kill notebooks that are running more than 48 hours?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these assumptions are correct, the pvc does need to be deleted and we don't want to delete notebooks that have been running less than 48 hours. The script I had before checked the running time of the notebook and deleted the pvc and notebook, maybe that can be used, I will copy it below:

`
#!/bin/sh

    set_time=24 hours

    time_limit=$(($(date -u +"%s") - $(date -d '-$set_time' +%s)))

    running_notebooks=$(oc get notebook -o custom-columns=:metadata.name)

    for notebook in $running_notebooks; do

      start_time=$(oc get notebook $notebook -o jsonpath='{.status.containerState.running.startedAt}')
        
      running_time=$(($(date -u +"%s") - $(date -u -d "$start_time" +"%s")))
      
      if [ "$running_time" -gt "$time_limit" ]; then
        echo "Shutting down pod $notebook"
        oc delete notebook $notebook
        oc delete pvc $notebook
      fi 
    done

`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dystewart is it necessary to delete the whole notebook, or can the StatefulSet be scaled to 0 instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@computate If I'm not mistaken, when you scale the StatefulSet to 0 the pod is terminated but then starts back up again. That's why we delete the whole notebook because it ensures the notebook, statefulset and pod are actually deleted.

@computate
Copy link
Member

@dystewart can you please provide a description for this PR, and link it to an issue? Is this related to nerc-project/operations#337?

@dystewart
Copy link
Collaborator Author

Updated description and linked associated issue

@dystewart
Copy link
Collaborator Author

Also made requested updates to code @DanNiESh

README.md Outdated Show resolved Hide resolved
cronjobs/ope-notebook-culler/cronjob.yaml Outdated Show resolved Hide resolved
@computate
Copy link
Member

Do you think that we can adjust the PR to "shut down" the notebooks in the rhods-notebooks namespace rather than delete them as the issue nerc-project/operations#337 describes, based on the sample culler code provided in Slack?

Copy link
Member

@computate computate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DanNiESh
Copy link
Collaborator

Can we please adjust this PR to only affect certain classes and not CS506? See the Ensure that notebooks in Lance Galletti's CS506 course are not deleted and have a longer uptime issue for more details.

Dylan is also working on another issue to distinguish users in rhods-notebooks to their course group. Once that is completed, we can customize this cronjob to only apply to cs210 and ece440 classes which I think could be a follow-up PR if we don't want this PR to be blocked by that issue.

@computate
Copy link
Member

computate commented Jan 23, 2024

@DanNiESh as this PR currently is configured to delete all notebooks in the rhods-notebooks namespace, including the CS506 notebooks which are required to be available all semester, I consider it a blocker if Lance Galletti is not onboard with the changes that will be applied to his notebooks in this PR.

@DanNiESh
Copy link
Collaborator

@computate Gotcha! Once Dylan updates the PR to "shutdown notebooks" rather than delete them, we'll test and verify it in ope-testing namespace. Do you think "shutdown" notebooks works for Lance Galletti? Ultimately we will modify this cronjob to only apply to cs210 and ece440 classes once this issue is resolved.

@dystewart
Copy link
Collaborator Author

@computate @DanNiESh I have updated the cronjob to utilize the code we were provided and tested in ope-rhods-testing and confirmed its working. @DanNiESh will confirm

@DanNiESh
Copy link
Collaborator

DanNiESh commented Jan 24, 2024

Looks good! I tested and verified this cronjob worked. Some bugs I got during testing the README and cronjob:

  1. I noticed that you changed the directory name to cronjobs/notebok-culler, is notebok-culler a typo?
  2. When I try to run kubectl create -n ope-rhods-testing-1fef2f job --from=cronjob/notebok-culler nb-culler, I got the error Error from server (NotFound): cronjobs.batch "notebok-culler" not found

#threshold to stop running notebooks. Currently set to 24 hours
cutoff_time=86400
current_time=$(date +%s)
notebooks=$(oc get notebooks -n ope-rhods-testing-1fef2f -o jsonpath='{range .items[?(@.status.containerState.running)]}{.metadata.name}{" "}{.metadata.namespace}{" "}{.status.containerState.running.startedAt}{"\n"}{end}')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be rhods-notebooks rather than ope-rhods-testing-1fef2f

#threshold to stop running notebooks. Currently set to 24 hours
cutoff_time=86400
current_time=$(date +%s)
notebooks=$(oc get notebooks -n ope-rhods-testing-1fef2f -o jsonpath='{range .items[?(@.status.containerState.running)]}{.metadata.name}{" "}{.metadata.namespace}{" "}{.status.containerState.running.startedAt}{"\n"}{end}')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same. Not ope-rhods-testing-1fef2f

@@ -0,0 +1,6 @@
FROM quay.io/operate-first/opf-toolbox:v0.8.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this Dockerfile still needed?

@@ -0,0 +1,24 @@
#!/bin/bash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this script being used anywhere?

README.md Outdated
1. Ensure you followed the steps above
2. Verify the cronjob ope-notebook-culler exists
```
oc get cronjob ope-notebook-culler
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new cronjob name is nb-culler. So the comand should be changed tooc get cronjob nb-culler

README.md Outdated

3. Run:
```
kubectl create -n rhods-notebooks job --from=cronjob/ope-notebook-culler ope-notebook-culler
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same. nb-culler

README.md Outdated
Alternatively, to run the script immediately:

1. Ensure you followed the steps above
2. Verify the cronjob ope-notebook-culler exists
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nb-culler

README.md Outdated

### ope-notebook-culler

This cronjob runs once every 24 hours at 7am, removing all notebooks & pvcs from the rhods-notebooks namespace. To add resources to the rhods-notebooks namespace:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we just shutdown notebooks and pvcs

README.md Outdated
oc project rhods-notebooks
```

3. From cronjobs/ope-notebook-culler/ directory run:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The directory name has changed.

Also removes unused docker resources after switching cronjob logic to mirror redhat internal's
Updates README.md to reflect all these changes
@DanNiESh
Copy link
Collaborator

@dystewart Can you update the script to shutdown notebooks in specific classes and also resolve the commit conflicts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants