-
Notifications
You must be signed in to change notification settings - Fork 126
APP-10064: Add "continuous" scheduling option #5495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ee7b1e8 to
b3e23c6
Compare
robot/client/client.go
Outdated
| } | ||
|
|
||
| // JobManager returns nil. | ||
| func (rc *RobotClient) JobManager() *jobmanager.JobManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cheukt can we brainstorm on ways to quickly get rid of this Robot interface type? Does localRobot really have to be private?
These kinds of code changes slow down work and are simply harmful/sharp edges for consumers of our code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think localRobot has to be private, no.
b3e23c6 to
d852c24
Compare
d852c24 to
7be7108
Compare
7be7108 to
7fa2dcd
Compare
7fa2dcd to
cb31787
Compare
cb31787 to
04d6842
Compare
76175dd to
3892a3a
Compare
| protoConfig.LogConfiguration = &pb.LogConfiguration{Level: strings.ToLower(jc.LogConfiguration.Level.String())} | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
App will need an RDK bump (containing API bump) after this PR is merged for the web editor to support LogConfiguration.
3892a3a to
2c3223d
Compare
| func (jm *JobManager) createJobFunction(jc config.JobConfig, continuous bool) func(ctx context.Context) error { | ||
| jobLogger := jm.logger.Sublogger(jc.Name) | ||
| if jc.LogConfiguration != nil { | ||
| jobLogger.SetLevel(jc.LogConfiguration.Level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we test changes to the log level?
b3e51bd to
ab3179a
Compare
ab3179a to
9659108
Compare
9659108 to
0d96311
Compare
…remove JobManager() from robot interface
0d96311 to
609ecbd
Compare
benjirewis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Haven't looked at tests yet, but had some initial comments/questions.
| Resource string `json:"resource"` | ||
| Method string `json:"method"` | ||
| Command map[string]any `json:"command,omitempty"` | ||
| LogConfiguration *resource.LogConfig `json:"log_configuration,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since LogConfig's are now shared between jobs and resources, maybe resource is no longer the correct package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there are tests associated with this file; worth adding a test there for roundtripping.
robot/jobmanager/jobmanager.go
Outdated
| VerbosityLevel: 0, | ||
| } | ||
| jobLogger.CInfo(jm.ctx, "Job triggered") | ||
| // Job added for scheduling; may not immediately run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like a fair change in messaging, but did you change it because you noticed a gap between the emission of this log line and the job actually running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. If you run a job that sleeps for 10 seconds every 1s, you'd previously get "Job triggered" messages every second even though the extra jobs will either be skipped (in cron mode) or queued (in duration mode).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this may not be correct...It's something I noticed early on while starting to learn the JobManager, but I may have just misread it. I can't repro on this branch. I'll investigate again on the current main and revert to Job triggered if I can't repro there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you run a job that sleeps for 10 seconds every 1s, you'd previously get "Job triggered" messages every second even though the extra jobs will either be skipped (in cron mode) or queued (in duration mode).
Actually, this may not be correct...
What behavior are you seeing for that setup you described? I'm just curious what actually happens. Does gocron actually start a new job every 1s even though the last one is still sleeping? Is that different for cron/duration mode? I regret that I never fully grokked what happened in these cases with the original implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(replied earlier in other comment; moving here)
My earlier analysis is off--I was using fake sensor's GetReadings as my test job which has a mutex, so it looked like "Job triggered" would sometimes be queued. I'll change it back to "Job triggered", but there is a related corner case here which I'm fixing now:
I've updated GetReadings to only sleep 10s and return (removed the mutex).
I've scheduled the job to run at duration 1000ms, and later updated to 1001ms.
It starts returning every 10s as expected (at second 21, 31, 41), with the blocked runs getting queued by gocron.
When I update the schedule at second 44 while a job is currently running, it immediately launches the next one. So we briefly have 2 instances of the job running that complete at seconds 41 and 45. Then it shifts to running at 55, 05, 15...
This is because when we modify an old job, we completely remove the old one from gocron (which doesn't cancel the running instance) and create a new one, but gocron also supports updating an existing job with jm.scheduler.Update instead of NewJob.
2025-11-21T15:04:21.439Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}}
2025-11-21T15:04:21.440Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered
2025-11-21T15:04:31.449Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}}
2025-11-21T15:04:31.451Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered
2025-11-21T15:04:41.460Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}}
2025-11-21T15:04:41.463Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered
2025-11-21T15:04:44.047Z INFO rdk.job_manager jobmanager/jobmanager.go:317 Job modified {"name":"sensor-reading"}
2025-11-21T15:04:45.050Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered
2025-11-21T15:04:51.474Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}}
2025-11-21T15:04:55.057Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}}
2025-11-21T15:04:55.058Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered
2025-11-21T15:05:05.067Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}}
2025-11-21T15:05:05.068Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered
2025-11-21T15:05:15.084Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}}
2025-11-21T15:05:15.084Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered
2025-11-21T15:05:25.089Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha; discussed a bit offline (I think this is what you were referring to in the Slack thread); that's an interesting case of 2 instances of the job running at the same time. Is the issue there just that time.Sleep(10 * time.Second) won't examine the context to see that it's been canceled, so an old job can stick around? If that's the case, there's probably not much we can do there, as the only way to really "stop" a goroutine is to cancel its context and cross your fingers that it has code to look at ctx.Err() or ctx.Done() and exit (or, of course, stop your entire program).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is the issue we're discussing on Slack, and also connected to your jobFunc := func(_ context.Context) question: if we were to use a ctx here, then the running job could be cancelled when we modify it, but because we only use jm.ctx throughout jobFunc, it only cancels the running job if JobManager is shuting down. Now the new question is when we update a JobConfig (schedule, log_configuration, etc.), do we want to cancel the previously running job or let it finish?
538887a to
79a5f55
Compare
I recommend reviewing by single commit. Commits from #5493 will disappear from this diff once merged.
"schedule": "continuous",that will run as fast as your system allows, and does not cause unnecessary stress on gocron.log_configuration: {"level": "..."}for each job.