APP-10064: Add "continuous" scheduling option #5495

aldenh-viam · 2025-11-18T21:45:36Z

I recommend reviewing by single commit. Commits from #5493 will disappear from this diff once merged.

Add support for "schedule": "continuous", that will run as fast as your system allows, and does not cause unnecessary stress on gocron.
- Changing from non-continuous to continuous schedule and vice-versa is supported without needing to restart.
Lowered "Job Added" and "Job succeeded" log messages to debug level, and include job name.
Support log_configuration: {"level": "..."} for each job.

dgottlieb · 2025-11-19T14:02:23Z

robot/client/client.go

 }

+// JobManager returns nil.
+func (rc *RobotClient) JobManager() *jobmanager.JobManager {


@cheukt can we brainstorm on ways to quickly get rid of this Robot interface type? Does localRobot really have to be private?

These kinds of code changes slow down work and are simply harmful/sharp edges for consumers of our code.

I don't think localRobot has to be private, no.

aldenh-viam · 2025-11-19T21:22:04Z

config/proto_conversions.go

+		protoConfig.LogConfiguration = &pb.LogConfiguration{Level: strings.ToLower(jc.LogConfiguration.Level.String())}
+	}


App will need an RDK bump (containing API bump) after this PR is merged for the web editor to support LogConfiguration.

robot/impl/jobmanager_test.go

cheukt · 2025-11-20T16:47:17Z

robot/jobmanager/jobmanager.go

+func (jm *JobManager) createJobFunction(jc config.JobConfig, continuous bool) func(ctx context.Context) error {
 	jobLogger := jm.logger.Sublogger(jc.Name)
+	if jc.LogConfiguration != nil {
+		jobLogger.SetLevel(jc.LogConfiguration.Level)


can we test changes to the log level?

…remove JobManager() from robot interface

…dded

benjirewis

Nice! Haven't looked at tests yet, but had some initial comments/questions.

benjirewis · 2025-11-20T19:42:35Z

config/config.go

+	Resource         string              `json:"resource"`
+	Method           string              `json:"method"`
+	Command          map[string]any      `json:"command,omitempty"`
+	LogConfiguration *resource.LogConfig `json:"log_configuration,omitempty"`


Since LogConfig's are now shared between jobs and resources, maybe resource is no longer the correct package.

benjirewis · 2025-11-20T19:43:15Z

config/proto_conversions.go

I believe there are tests associated with this file; worth adding a test there for roundtripping.

robot/jobmanager/jobmanager.go

benjirewis · 2025-11-20T20:32:41Z

robot/jobmanager/jobmanager.go

 			VerbosityLevel: 0,
 		}
-		jobLogger.CInfo(jm.ctx, "Job triggered")
+		// Job added for scheduling; may not immediately run.


That seems like a fair change in messaging, but did you change it because you noticed a gap between the emission of this log line and the job actually running?

Yes, exactly. If you run a job that sleeps for 10 seconds every 1s, you'd previously get "Job triggered" messages every second even though the extra jobs will either be skipped (in cron mode) or queued (in duration mode).

Actually, this may not be correct...It's something I noticed early on while starting to learn the JobManager, but I may have just misread it. I can't repro on this branch. I'll investigate again on the current main and revert to Job triggered if I can't repro there.

If you run a job that sleeps for 10 seconds every 1s, you'd previously get "Job triggered" messages every second even though the extra jobs will either be skipped (in cron mode) or queued (in duration mode).

Actually, this may not be correct...

What behavior are you seeing for that setup you described? I'm just curious what actually happens. Does gocron actually start a new job every 1s even though the last one is still sleeping? Is that different for cron/duration mode? I regret that I never fully grokked what happened in these cases with the original implementation.

(replied earlier in other comment; moving here)

My earlier analysis is off--I was using fake sensor's GetReadings as my test job which has a mutex, so it looked like "Job triggered" would sometimes be queued. I'll change it back to "Job triggered", but there is a related corner case here which I'm fixing now:

I've updated GetReadings to only sleep 10s and return (removed the mutex).
I've scheduled the job to run at duration 1000ms, and later updated to 1001ms.

It starts returning every 10s as expected (at second 21, 31, 41), with the blocked runs getting queued by gocron.
When I update the schedule at second 44 while a job is currently running, it immediately launches the next one. So we briefly have 2 instances of the job running that complete at seconds 41 and 45. Then it shifts to running at 55, 05, 15...

This is because when we modify an old job, we completely remove the old one from gocron (which doesn't cancel the running instance) and create a new one, but gocron also supports updating an existing job with jm.scheduler.Update instead of NewJob.

2025-11-21T15:04:21.439Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}} 2025-11-21T15:04:21.440Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered 2025-11-21T15:04:31.449Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}} 2025-11-21T15:04:31.451Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered 2025-11-21T15:04:41.460Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}} 2025-11-21T15:04:41.463Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered 2025-11-21T15:04:44.047Z INFO rdk.job_manager jobmanager/jobmanager.go:317 Job modified {"name":"sensor-reading"} 2025-11-21T15:04:45.050Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered 2025-11-21T15:04:51.474Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}} 2025-11-21T15:04:55.057Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}} 2025-11-21T15:04:55.058Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered 2025-11-21T15:05:05.067Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}} 2025-11-21T15:05:05.068Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered 2025-11-21T15:05:15.084Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}} 2025-11-21T15:05:15.084Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:207 Job triggered 2025-11-21T15:05:25.089Z INFO rdk.job_manager.sensor-reading jobmanager/jobmanager.go:222 Job succeeded {"response":{"readings":{"a":1,"b":2,"c":3}}}

Gotcha; discussed a bit offline (I think this is what you were referring to in the Slack thread); that's an interesting case of 2 instances of the job running at the same time. Is the issue there just that time.Sleep(10 * time.Second) won't examine the context to see that it's been canceled, so an old job can stick around? If that's the case, there's probably not much we can do there, as the only way to really "stop" a goroutine is to cancel its context and cross your fingers that it has code to look at ctx.Err() or ctx.Done() and exit (or, of course, stop your entire program).

Yes, this is the issue we're discussing on Slack, and also connected to your jobFunc := func(_ context.Context) question: if we were to use a ctx here, then the running job could be cancelled when we modify it, but because we only use jm.ctx throughout jobFunc, it only cancels the running job if JobManager is shuting down. Now the new question is when we update a JobConfig (schedule, log_configuration, etc.), do we want to cancel the previously running job or let it finish?

robot/jobmanager/jobmanager.go

viambot added the safe to test This pull request is marked safe to test from a trusted zone label Nov 18, 2025

aldenh-viam force-pushed the jobcontinuous branch from ee7b1e8 to b3e23c6 Compare November 18, 2025 21:49

viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 18, 2025

dgottlieb reviewed Nov 19, 2025

View reviewed changes

aldenh-viam force-pushed the jobcontinuous branch from b3e23c6 to d852c24 Compare November 19, 2025 16:33