Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: iterators cannot be composed without incurring extra heap allocations (val + func literals) #69539

Open
coxley opened this issue Sep 19, 2024 · 9 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@coxley
Copy link

coxley commented Sep 19, 2024

Go version

go 1.23 / go1.24-fc968d4 / go1.24-165bf24

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/coxley/Library/Caches/go-build'
GOENV='/Users/coxley/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/coxley/.go/pkg/mod'
GOOS='darwin'
GOPATH='/Users/coxley/.go'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.23.1/libexec'
GOSUMDB='off'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.23.1/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/coxley/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/var/folders/3s/w_9thc5x5zj6d5vyf_9j9xw00000gn/T/tmp.vEBTr6BuI7/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/3s/w_9thc5x5zj6d5vyf_9j9xw00000gn/T/go-build3137110682=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Runnable benchmark: https://go.dev/play/p/1J9H8AL2aGM

Created two types:

type Slice[T any] []T

type Collection[T any] struct {
  underlying Slice[T]
}

This simulates an actual use-case I have: an AVL tree being embedded in a graph data structure. Iterating from a higher-order type should be as performant as iterating from the foundational one.

When moving away from callback-style iteration (avl.Each(func(key int64, val *Edge) { /* ... */ })) to idiomatic iterators, I noticed a non-trivial performance regression. Granted, some of this iteration is in a tight-loop so maybe it isn't affecting every program, but after digging I noticed that higher-order iterators' func literals escape to the heap while the lowest-order one's do not.

The benchmark link above demonstrates. I'm not sure if this is #66469, #69015, or simply related. But the first issue didn't get any traction which was before the 1.23 release.

What did you see happen?

  • A basic iter.Seq style iterator performs 0 allocations.
  • Another iter.Seq style iterator on that dispatches to the other incurs allocations — 6 in the go.dev/play example.

What did you expect to see?

Equivalent performance of both.

The only current option is to duplicate iteration logic instead of composing iterators.

@gabyhelp
Copy link

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 19, 2024
@coxley coxley changed the title cmd/compile: Nested iterators escape to heap when base iterator does not cmd/compile: Nested iterators cannot be composed without incurring extra heap allocations (val + func literals) Sep 19, 2024
@coxley coxley changed the title cmd/compile: Nested iterators cannot be composed without incurring extra heap allocations (val + func literals) cmd/compile: Iterators cannot be composed without incurring extra heap allocations (val + func literals) Sep 19, 2024
@coxley
Copy link
Author

coxley commented Sep 19, 2024

I had hoped "cmd/compile: fix wrong esacpe analysis for rangefunc" would fix it, but it appears the composed version has the same cost in both go 1.23.1 and master.

> go version
go version go1.23.1 darwin/arm64

> go build -gcflags='-m=3' main.go 2>&1 | grep 'inline Collection\[go.shape.int\].Values'
./main.go:59:6: can inline Collection[go.shape.int].Values with cost 17 as: method(Collection[go.shape.int]) func(*[8]uintptr) iter.Seq[go.shape.int] { return func literal }
./main.go:60:9: cannot inline Collection[go.shape.int].Values.func1: function too complex: cost 189 exceeds budget 80
./main.go:61:3: cannot inline Collection[go.shape.int].Values.func1-range1: function too complex: cost 142 exceeds budget 80
./main.go:46:9: can inline Collection[go.shape.int].Values.func1.Slice[go.shape.int].All.1 with cost 72 as: func(func(int, go.shape.int) bool) { for loop }

> /Users/coxley/projects/go/bin/go version
go version devel go1.24-165bf24 2024-09-19 00:40:50 +0000 darwin/arm64

> /Users/coxley/projects/go/bin/go build -gcflags='-m=3' main.go 2>&1 | grep 'inline Collection\[go.shape.int\].Values'
./main.go:59:6: can inline Collection[go.shape.int].Values with cost 17 as: method(Collection[go.shape.int]) func(*[8]uintptr) iter.Seq[go.shape.int] { return func literal }
./main.go:60:9: cannot inline Collection[go.shape.int].Values.func1: function too complex: cost 189 exceeds budget 80
./main.go:61:3: cannot inline Collection[go.shape.int].Values.func1-range1: function too complex: cost 142 exceeds budget 80
./main.go:46:9: can inline Collection[go.shape.int].Values.func1.Slice[go.shape.int].All.1 with cost 72 as: func(func(int, go.shape.int) bool) { for loop }

@nemith
Copy link
Contributor

nemith commented Sep 19, 2024

Moving the underlying iterator out of the closure seems to remove the allocations but it is still a lot slower.

func (s Collection[T]) Values() iter.Seq[T] {
	underlyingIter := s.underlying.All()
	return func(yield func(T) bool) {
		for _, v := range underlyingIter {
			if !yield(v) {
				return
			}
		}
	}
}
$ go run main.go
Direct:        5	       200.0 ns/op        0 B/op	       0 allocs/op
Nested:        5	      5033 ns/op        0 B/op	       0 allocs/op

@cagedmantis cagedmantis changed the title cmd/compile: Iterators cannot be composed without incurring extra heap allocations (val + func literals) cmd/compile: iterators cannot be composed without incurring extra heap allocations (val + func literals) Sep 20, 2024
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 20, 2024
@cagedmantis cagedmantis added this to the Go1.24 milestone Sep 20, 2024
@cuonglm
Copy link
Member

cuonglm commented Sep 23, 2024

I had hoped "cmd/compile: fix wrong esacpe analysis for rangefunc" would fix it, but it appears the composed version has the same cost in both go 1.23.1 and master.

The fix is about wrong analysis of escape analysis, so it won't change the inlining cost.

@dr2chase
Copy link
Contributor

dr2chase commented Oct 23, 2024

Not entirely sure what's gone wrong here -- when compiled "-m", it helpfully reports that it can inline all the things with recent closure-inline-enhancing CLs applied, but it is not faster.

Correction -- it is NOT inlining all the things.

@dr2chase
Copy link
Contributor

Here's another benchmark -- if nothing else, assigning to "_" in the loop body gets optimized away. I'd also rather write these as go-test benchmarks, which reduces the need to check to see if something non-standard is going on.

package foo_test

import (
	"iter"
	"testing"
)

var u int

func BenchmarkDirect(b *testing.B) {
	s := Slice[int](make([]int, 64))
	for range b.N {
		for _, v := range s.All() {
			u = v
		}
	}
}

func BenchmarkNestedA(b *testing.B) {
	collection := Collection[int]{Slice[int](make([]int, 64))}
	for range b.N {
		for v := range collection.ValuesA() {
			u = v
		}
	}
}

func BenchmarkNestedB(b *testing.B) {
	collection := Collection[int]{Slice[int](make([]int, 64))}
	for range b.N {
		for v := range collection.ValuesB() {
			u = v
		}
	}
}

func BenchmarkNestedACall(b *testing.B) {
	collection := Collection[int]{Slice[int](make([]int, 64))}
	yield := func(i int) bool {
		u = i
		return true
	}
	for range b.N {
		collection.ValuesA()(yield)
	}
}

func BenchmarkNestedBCall(b *testing.B) {
	collection := Collection[int]{Slice[int](make([]int, 64))}
	yield := func(i int) bool {
		u = i
		return true
	}
	for range b.N {
		collection.ValuesB()(yield)
	}
}

type Slice[T any] []T

func (s Slice[T]) All() iter.Seq2[int, T] {
	return func(yield func(int, T) bool) {
		for i, v := range s {
			if !yield(i, v) {
				return
			}
		}
	}
}

type Collection[T any] struct {
	underlying Slice[T]
}

func (s Collection[T]) ValuesA() iter.Seq[T] {
	return func(yield func(T) bool) {
		for _, v := range s.underlying.All() {
			if !yield(v) {
				return
			}
		}
	}
}

func (s Collection[T]) ValuesB() iter.Seq[T] {
	underlyingIter := s.underlying.All()
	return func(yield func(T) bool) {
		for _, v := range underlyingIter {
			if !yield(v) {
				return
			}
		}
	}
}

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/622415 mentions this issue: cmd/compile: spell "go.runtime" correctly for inline "cheap" test

gopherbot pushed a commit that referenced this issue Oct 24, 2024
Updates #69539.

Change-Id: I40885e9c23f35772f8ace645044afee0d55b70b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/622415
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Robert Griesemer <[email protected]>
@coxley
Copy link
Author

coxley commented Oct 24, 2024

@dr2chase Sorry for the non-standard benchmark — only did that so it was viewable in go.dev/play.

@dr2chase
Copy link
Contributor

dr2chase commented Nov 5, 2024

This is a dupe of #69411 -- same bug, same root cause. The inliner needs to get better at closures.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Development

No branches or pull requests

8 participants