Conversation

@dt dt commented Oct 6, 2025

No description provided.

@dt dt requested review from sumeerbhola and tbg October 6, 2025 22:17

@petermattis petermattis left a comment


I'm good with experimenting with this Go runtime enhancement. I definitely want to see experimental evidence of the benefit.

PS: We should probably update the print in schedtrace to include sched.bgqsize.

if gp.lockedm != 0 || gp.m.lockedg != 0 || gp.m.locks > 0 {
return
}
if sched.runqsize > 0 || (gp.m.p != 0 && !runqempty(gp.m.p.ptr())) {


Don't you need to hold sched.lock when checking sched.runqsize and sched.bgqsize? Hmm, it looks like runqsize is sometimes checked without the lock and sometimes with, though it is always modified with the lock held. It seems like the runtime authors just assume this is understood. Worth dropping a comment here about the safety.

Member Author


Yeah, I had a similar question when I was reading findRunnable and noticed reads of runqsize without holding the lock and without atomics, and indeed specifically asked ChatGPT to look for all such cases and outline what's going on. My take-away was that this is done where the compiler isn't going to hoist the load entirely out of a loop or something, so it will actually be a load that's just at the mercy of cache-coherence delays, and that slightly stale read is good enough for the sort of "first pass" decisions it is used for, with critical decisions then falling back to a locked read, like the one at the bottom of findRunnable, if the unlocked reads didn't find anything. Yielding seems like just such a case, where it is more important we be cheap than perfect, as we'll just yield a few checks later once our cache updates.

}

gp := getg()
// Don't yield if locked to an OS thread or holding runtime locks.


I thought runtime locks were only held during internal runtime operations. That is, runtime locks can't be held if application code is calling BackgroundYield. Is that check purely defensive?

Member Author


Yeah, just an abundance of caution, since it would not be good to park while holding one. But I don't really see how you could, unless the runtime itself were to use this, e.g. in the GC loop (which currently uses a similar yieldIfBusy utility).

backgroundyield_slow(gp)
}

// Keep the heavy work (timeout check, park, locking) out of the inline path.


Nit: checkTimeouts is a no-op except for the JS runtime (i.e. it isn't "heavy work").

Member Author


Ah, good to know. That said, since I only need to call it when I'm actually going to park, vs as part of the decision to park, I think it still belongs in the separate function, to keep the inlinable check function to the minimum required for the check.

@rickystewart rickystewart force-pushed the cockroach-go1.23.12 branch 3 times, most recently from 84fef0d to ec86954 on October 9, 2025 21:21
runqsize int32
// Global background-yield queue: goroutines that voluntarily yielded
// while the scheduler was busy. Does NOT contribute to runqsize.
bgq gQueue


(just clarifying) So there is no per-P run queue for these (like p.runq) because:

  • We expect the number of such background goroutines to be few?
  • Even if they are not few, they are only run when the foreground goroutines are not runnable, which should be rarer (when P utilization is high, and if it is low, this goroutine wouldn't have had to yield), so grabbing them from the global queue is ok wrt concurrent performance?

//
// If there are any idle Ps this is a noop.
//
// If there are no idle Ps and the global run queue has runnable goroutines waiting,


What if the schedt.runq is empty but one of the p.runqs is non-empty? Will it keep running? I suppose we don't want to incur the synchronization of looking at each p.runq. Should it at least look at its P's runq?

// Fast path: tiny, inlineable checks only.

// Check if we need to yield at all and early exit fast if not.
if sched.npidle.Load() > 0 {


Given we already have a global atomic for npidle, should we consider adding one for npWithNonEmptyRunQ? That would eliminate the unfairness from my previous comment.

Member


What do you mean here?


We are deciding to add this goroutine to bgq when there is no idle P and:

  • The global runq is non-empty, or
  • This P's runq is non-empty

Both the above could be false, but some other P could have a non-empty runq (which, if this P became idle, it would steal from). This is not desirable. We can fix this by having a schedt.npWithNonEmptyRunQ atomic: each P when it transitions from runq empty to non-empty would increment this atomic, and decrement on the reverse transition. The second bullet above would change to npWithNonEmptyRunQ > 0.

Member Author


Yeah, I figured that if the global runq is empty and this G has nothing in its local runq, that's cheap to check. I was initially reluctant to add a new atomic, or anything that needs to be maintained in non-background paths, in case this is a patch we have to carry on our fork, thus trying to stick to just what we already have: npidle, the global queue, and the local queue, but not other Ps' queues.

This has me thinking: if we were willing to leave a little utilization on the table, we could just use npidle < 1 as our signal. Then we can infer that all the runqs, global and local, are empty, or could be if they wanted to be, since we're leaving a whole P idle, and that possibly has even better latency characteristics than waiting for a runq to become non-empty and jumping out of the way at the last minute. But again, at the cost of leaving a whole P on the table. But maybe that's OK (still better utilization than today for maxprocs >= 4).

Member Author


I guess another option is to make a num-runnable atomic (if we have to keep it on a fork, it isn't that hard to grep for the casgstatus(runnable) calls) and stop looking at either npidle or runqsize?

if gp.lockedm != 0 || gp.m.lockedg != 0 || gp.m.locks > 0 {
return
}
if sched.runqsize > 0 || (gp.m.p != 0 && !runqempty(gp.m.p.ptr())) {


Looks like we are checking the local P's runq, so ignore part of my earlier comment.

// Yielded goroutines were runnable but voluntarily deprioritized themselves
// to waiting by calling BackgroundYield. If we have nothing else runnable
// we can bring a yielded goroutine back to runnable and run it.
if sched.bgqsize != 0 {


Does the go compiler know something special about certain fields and makes their unsynchronized reads "less stale"?

Member Author


Not that I can find. The runtime, and the scheduler in particular, uses these a-bit-stale reads liberally (though in important things like findRunnable there are locked reads backstopping them after the fast paths are exhausted). I was, uh, curious about this too. As far as I can tell -- and what ChatGPT tells me as well -- they're just good old unlocked, unsynchronized reads. They're int32, so there's no worry about tearing a write (on 32-bit), so staleness is the only concern. They aren't in loops out of which the load might be hoisted by the compiler, so each is an actual load, just up to MESI or whatever I guess. Though I did my initial testing with a cruder version of this patch, before I cleaned it up for a PR, and one of my cleanups was to split it up to ensure the cheap checks could be inlined; I wonder if that was a mistake, since if the check is inlined into a for loop -- the place we expect this to be called -- maybe the load does get hoisted and then never sees new values? I guess I should re-test with the optimized version.

// scheduled for at least the past duration. This allows the calling goroutine
// to offer some degree of fairness among goroutines that opt in to yielding,
// as otherwise yielding is only done based on the (non-background) run queue.
//


Given a goroutine is only treated as a background goroutine when it successfully yields, I wonder whether one can arrange things such that it never yields. Say it keeps transitioning out of the running state after every 1ms of CPU consumption for some IO, and passes 2ms as this parameter. It will not yield to the goroutines in bgq, yes? If yes, that may justify keeping this fairness behavior out of the scheduler, since it can be accomplished by waiting for re-admission via AC.

Member Author

@dt dt Oct 10, 2025


Correct, it would never yield to background work if it kept coming back from some unscheduled state. I was thinking that if it was unscheduled, that probably means whatever background work was at the head of the queue got a chance to run anyway, thanks to the caller unscheduling for whatever it blocked on, even if not thanks to it deliberately yielding; then when our unfair caller comes back, there is a non-bg runq entry, so it yields to that. So the fairness mechanism is there just in case nothing else blocks it.

defer runtime.GOMAXPROCS(orig)

runtime.GOMAXPROCS(target)
runtime.Gosched()
Member


what's this for?

}
}

// RunBackgroundYieldQueueCheck exercises the background queue enqueue/dequeue
Member


I'm confused by this method. What does it exercise? What does success mean? Why does it return skipped==true if there's something in the bgq? Why is this only used in a test that doesn't seem to do anything?

Member Author


Can't use testing.T, or testing at all, in the runtime package -- only in the runtime_test external test package. So the common pattern, if you want to test anything non-exported, seems to be a RunX func (in a _test.go file that is in package runtime, not the _test package) that exercises the internal code or returns an exported version, then a usually very thin TestX in the external runtime_test package that calls it.

That's the reason for the overall setup, but I'll go back and review the actual logic in this one: I added these last couple of tests while just chasing coverage % on the train, so there might be some room for cleanup/commenting here.
