Detect cases where first eval is slower than subsequent evals #102

Open
LilithHafner opened this issue May 12, 2024 · 2 comments

@LilithHafner
Owner

If I have something like @b rand(1000) sort!, the first eval is much slower than subsequent evals within a given sample, because after the first eval the input is already sorted. This violates benchmarking assumptions and produces misleading results. For example, @b rand(1000) sort! reports an unrealistically fast runtime, while @b rand(100_000) sort! is realistic.

See: compintell/Mooncake.jl#140

julia> @be rand(100_000) sort!
Benchmark: 100 samples with 1 evaluation
min    761.379 μs (6 allocs: 789.438 KiB)
median 871.046 μs (6 allocs: 789.438 KiB)
mean   890.113 μs (6 allocs: 789.438 KiB, 2.74% gc time)
max    1.223 ms (6 allocs: 789.438 KiB, 14.46% gc time)

julia> @be rand(1000) sort!
Benchmark: 2943 samples with 7 evaluations
min    2.345 μs (0.86 allocs: 1.429 KiB)
median 3.208 μs (0.86 allocs: 1.429 KiB)
mean   4.221 μs (0.86 allocs: 1.434 KiB, 0.25% gc time)
max    701.837 μs (0.86 allocs: 1.714 KiB, 98.49% gc time)
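
The fractional allocation count in the second benchmark is a useful hint. A minimal arithmetic sketch, assuming (this is an inference from the output above, not something Chairmarks reports directly) that only the first eval of each sample allocates, and that it makes the same 6 allocations seen in the single-eval benchmark:

```julia
# Assumption: only the first eval in a sample allocates (6 allocations),
# and the remaining evals sort already-sorted data without allocating.
allocs_first_eval = 6
evals_per_sample = 7          # "7 evaluations" in the output above

# The reported figure is the average over all evals in a sample.
avg_allocs = allocs_first_eval / evals_per_sample
println(round(avg_allocs; digits=2))
```

The result rounds to 0.86, which matches the reported 0.86 allocs and is consistent with the first eval doing different work from the other six.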
@gdalle

gdalle commented May 13, 2024

I guess most of these cases can be detected by systematically running a second evaluation after the first one? Of course, it's debatable whether the benefit outweighs the cost.
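
A rough sketch of what such a check could look like, using only Base. The helper `first_vs_second` and the threshold of 2 are hypothetical illustrations, not part of Chairmarks, and a single pair of timings is noisy, so a real check would probably need to repeat the measurement:

```julia
# Hypothetical helper (not a Chairmarks API): run `f` twice on the same
# setup value and compare the time of the first and second evaluation.
function first_vs_second(setup, f)
    x = setup()
    t0 = time_ns()
    f(x)                      # first eval: sees the fresh input
    t1 = time_ns()
    f(x)                      # second eval: sees whatever the first left behind
    t2 = time_ns()
    t_first = Int(t1 - t0)
    t_second = Int(t2 - t1)
    # Guard against a zero denominator on very coarse clocks.
    return (t_first = t_first, t_second = t_second,
            ratio = t_first / max(t_second, 1))
end

# Call once to exclude compilation time, then measure.
first_vs_second(() -> rand(1000), sort!)
r = first_vs_second(() -> rand(1000), sort!)

# Arbitrary illustrative threshold: flag if the first eval is over 2x slower.
suspicious = r.ratio > 2
println(r, " suspicious = ", suspicious)
```

For `sort!` on random input, the second call sorts already-sorted data, so the ratio would be expected to come out well above 1, though the exact value depends on the machine.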

@LilithHafner
Owner Author

LilithHafner commented Jan 5, 2025

For seconds=0.1 (the default), we'll choose to run only a single eval if the runtime is greater than about 0.02% of the budget. In this case, there isn't actually an issue because evals=1. If the runtime is less than 0.02% of the budget, then it should be pretty cheap to perform this check.
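
To make the threshold concrete, here is the arithmetic as a small sketch. The variable names are illustrative only, and the 0.02% figure is the approximate value quoted above:

```julia
seconds = 0.1                 # default time budget
fraction = 0.02 / 100         # "about 0.02% of the budget"

threshold_s = seconds * fraction
threshold_us = threshold_s * 1e6   # convert seconds to microseconds

println("single-eval threshold is roughly ", round(threshold_us; digits=1), " μs")
```

That works out to roughly 20 μs. It lines up with the output earlier in the thread: rand(100_000) sort! at several hundred microseconds ran with 1 evaluation, while rand(1000) sort! ran with 7.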

For higher budgets, the situation is even better. For lower budgets, it seems reasonable to perform fewer sanity checks.

This is all assuming that runtime is dominated by evaluating the target function rather than by Chairmarks plumbing or by the setup or teardown functions.
