Skip to content

Conversation

@jnation3406
Copy link
Member

This came from a problem Nikolaus reported last month: https://lcogt.slack.com/archives/C0ANPF49F/p1749233652771689.

The problem was that completed observations that finished early, such that their end time was still in the future past the current scheduling cutoff, would block other requests from getting scheduled on that resource until their scheduled end time, rather than allowing other requests to get scheduled there since it finished early. I believe this change should fix that behavior setting the internal observation end time for completed "running" observations to be the scheduling cutoff time for that run. We need to still have the completed observations in that list since that is used for other things like preventing a request that just completed from getting rescheduled that same run it completed in.

elif observation['state'] == 'COMPLETED':
# If the observation is already completed, set its end time so we don't block
# scheduling other observations using that end time as a reason
observation['end'] = starts_before
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, starts_before is always in the future when this is being run, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, starts_before is the estimated scheduling run completion time, always in the future usually ~1-5 min or so (based on how long a run takes). ends_after is when this current scheduling run began, probably 0-1 min in the past.

Copy link

@jchate6 jchate6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. Thanks for updating this.

@markBowman
Copy link
Contributor

If I'm reading it right, calling _get_running_observations() is modifying the schedule. This is an 'unexpected consequence' or 'hidden side effect' anti-pattern; a method called get shouldn't modify the underlying data. Maybe you could change the method name or feel free to ignore me because you know the scheduler infinitely better than I do.

Copy link
Contributor

@markBowman markBowman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with one question.

@jnation3406
Copy link
Member Author

If I'm reading it right, calling _get_running_observations() is modifying the schedule. This is an 'unexpected consequence' or 'hidden side effect' anti-pattern; a method called get shouldn't modify the underlying data. Maybe you could change the method name or feel free to ignore me because you know the scheduler infinitely better than I do.

_get_running_observations is getting the set of observations that are currently "running" based on their start/end times overlapping the current scheduling runs runtime. Modifying the observations pulled down from the API isn't modifying the actual observations in the Observation Portal (its not pushed back up) - its just a local modification because downstream we use the end time to decide when we can schedule on that resource this run, and I want to prevent us from using the end time when the observation is already complete. I understand it's a bit convoluted though - I was just trying to make the least changes possible to the scheduler to fix it so I have less risk of breaking other things.

@markBowman
Copy link
Contributor

If I'm reading it right, calling _get_running_observations() is modifying the schedule. This is an 'unexpected consequence' or 'hidden side effect' anti-pattern; a method called get shouldn't modify the underlying data. Maybe you could change the method name or feel free to ignore me because you know the scheduler infinitely better than I do.

_get_running_observations is getting the set of observations that are currently "running" based on their start/end times overlapping the current scheduling runs runtime. Modifying the observations pulled down from the API isn't modifying the actual observations in the Observation Portal (its not pushed back up) - its just a local modification because downstream we use the end time to decide when we can schedule on that resource this run, and I want to prevent us from using the end time when the observation is already complete. I understand it's a bit convoluted though - I was just trying to make the least changes possible to the scheduler to fix it so I have less risk of breaking other things.

Yeah, I get that it's not changing the observation portal and I agree that minimum changes is good.
I approved the PR as it stands. - Future-you is probably less prone to confusion than future-me.

@jnation3406 jnation3406 merged commit 4cd235c into main Jul 22, 2025
11 checks passed
@jnation3406 jnation3406 deleted the fix/adjust_scheduling_cutoff branch July 22, 2025 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants