-
Notifications
You must be signed in to change notification settings - Fork 3
Fix problem with completed observations blocking scheduling new ones #65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| elif observation['state'] == 'COMPLETED': | ||
| # If the observation is already completed, set its end time so we don't block | ||
| # scheduling other observations using that end time as a reason | ||
| observation['end'] = starts_before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, starts_before is always in the future when this is being run, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, starts_before is the estimated scheduling run completion time, always in the future usually ~1-5 min or so (based on how long a run takes). ends_after is when this current scheduling run began, probably 0-1 min in the past.
jchate6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense. Thanks for updating this.
|
If I'm reading it right, calling _get_running_observations() is modifying the schedule. This is an 'unexpected consequence' or 'hidden side effect' anti-pattern; a method called get shouldn't modify the underlying data. Maybe you could change the method name or feel free to ignore me because you know the scheduler infinitely better than I do. |
markBowman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with one question.
_get_running_observations is getting the set of observations that are currently "running" based on their start/end times overlapping the current scheduling runs runtime. Modifying the observations pulled down from the API isn't modifying the actual observations in the Observation Portal (its not pushed back up) - its just a local modification because downstream we use the end time to decide when we can schedule on that resource this run, and I want to prevent us from using the end time when the observation is already complete. I understand it's a bit convoluted though - I was just trying to make the least changes possible to the scheduler to fix it so I have less risk of breaking other things. |
Yeah, I get that it's not changing the observation portal and I agree that minimum changes is good. |
This came from a problem Nikolaus reported last month: https://lcogt.slack.com/archives/C0ANPF49F/p1749233652771689.
The problem was that completed observations that finished early, such that their end time was still in the future past the current scheduling cutoff, would block other requests from getting scheduled on that resource until their scheduled end time, rather than allowing other requests to get scheduled there since it finished early. I believe this change should fix that behavior setting the internal observation end time for completed "running" observations to be the scheduling cutoff time for that run. We need to still have the completed observations in that list since that is used for other things like preventing a request that just completed from getting rescheduled that same run it completed in.