Improve runtime measures for criterion plot and benchmarking plots #547
Comments
Very nice proposal! 🎉 This definitely fills a small but relevant gap. Some comments:
Regarding your question, I am unsure whether I understand it correctly. If, for example, I have a benchmark with two functions that have different runtimes of their derivative, I could use the …
Yes, function time would be a fair comparison but it is hardware specific and not fully reproducible. In benchmarking you often want to get reproducible results and potentially even compare benchmark results generated on different computers. So we need the CostModel solution to work for benchmarks as well and unfortunately there could be cases where each problem has a different cost model.
Hi, is this still relevant? Can I work on this?
Hi @spline2hg, we are already working on this in #553 and it is too hard for a first-time contributor. We'll upload more issues with the "good first issue" tag in the next few days. We started with #556.
Thanks for the update! We'll try to work on #561 and keep an eye out for more good first issues. Looking forward to contributing!
Current Situation / Problem you want to solve
The proposal in this issue concerns the functions `criterion_plot`, `profile_plot` and `convergence_plot`.

- `criterion_plot` uses the number of function evaluations (`n_evaluations`) as runtime measure.
- `profile_plot` and `convergence_plot` have a `runtime_measure` argument that lets the user switch between `n_evaluations`, `n_batches`, and `walltime`.

Each runtime measure serves a purpose. `n_evaluations` and `n_batches` measure important aspects but also have a big drawback: they exclusively focus on objective functions and ignore all time that is spent on evaluating derivatives. This is not a problem as long as only derivative-free or only derivative-based optimizers are compared, but as soon as one compares a derivative-free with a derivative-based optimizer it becomes misleading.

Describe the solution you'd like
Step 1: Introduce new runtime measures
All relevant functions will get a `runtime_measure` argument which can be:

- `"function_time"` (default): The time spent in evaluations of the user-provided functions `fun`, `jac`, and `fun_and_jac`. Similar to `n_evaluations`, this will ignore the overhead of calculations done in the optimizer.
- `"batch_function_time"`: The time that would have been spent in evaluations of user-provided functions if all evaluations of the same batch were done in parallel (without parallelization overhead).
- `"walltime"`: The actual time spent (reflecting actual optimizer overheads, parallelization overheads, ...).

We also keep the legacy measures `"n_evaluations"` and `"n_batches"`.
Step 2: Introduce an optional cost model

While `"function_time"` and `"batch_function_time"` make it possible to ignore optimizer overhead, they are neither deterministic nor comparable across machines. To achieve that, we optionally allow a user to pass a `CostModel` as `runtime_measure`. Using a `CostModel` makes it possible to reproduce all existing measures except for walltime. Moreover, it allows reproducible and hardware-agnostic runtime measures for almost any situation.

A cost model looks as follows:
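A minimal sketch of such a cost model, assuming a frozen dataclass with the attributes and method described below (the types and defaults are illustrative, not the final implementation):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Illustrative sketch; attribute and method names follow the proposal."""

    fun: float | None = None          # cost of one fun evaluation; None -> use measured time
    jac: float | None = None          # cost of one jac evaluation
    fun_and_jac: float | None = None  # cost of one fun_and_jac evaluation
    label: str = "Function time"      # used as x-axis label in plots

    def aggregate_batch_times(self, times: list[float]) -> float:
        """Combine the times within one batch; the default assumes no parallelization."""
        return sum(times)
```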
The attributes `fun`, `jac`, and `fun_and_jac` allow a user to provide runtimes of the user-provided functions. Those could be actual times in seconds or normalized values (e.g. 1 for `fun`). None means that an actual measured runtime is used.

The attribute `label` is used as the x-axis label in plots.

The method `aggregate_batch_times` takes a list of times (which might be measured runtimes or replaced times based on the other attributes) and returns a scalar value. The default implementation assumes that no parallelization is used.

To see the cost model in action, let's reproduce a few existing measures:
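For illustration, cost models along these lines could mimic the legacy measures, building on the `CostModel` sketch above (the concrete values and batch aggregation are assumptions, not code from the proposal):

```python
# Uses the CostModel sketch from above. Values are assumed for illustration.

# Mimic n_evaluations: each fun evaluation counts as 1, derivative time is ignored.
n_evaluations_model = CostModel(
    fun=1, jac=0, fun_and_jac=0, label="Number of function evaluations"
)


# Mimic n_batches: every batch counts as 1, no matter what it contains.
class BatchCountModel(CostModel):
    def aggregate_batch_times(self, times: list[float]) -> float:
        return 1


n_batches_model = BatchCountModel(
    fun=1, jac=0, fun_and_jac=0, label="Number of batches"
)
```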
The zero values for `jac` and `fun_and_jac` make the problems of `n_evaluations` and `n_batches` very apparent.

Potential variations
- `aggregate_batch_times` could be a callable attribute so users don't have to subclass `CostModel` to change it (a sketch follows below).
- `n_batches` and `n_evaluations` could be deprecated and only be available by using the `CostModel`.
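A sketch of the first variation, again building on the assumed `CostModel` sketch rather than on actual proposal code:

```python
from dataclasses import dataclass
from typing import Callable


# Variant of the CostModel sketch above: the aggregation is a plain callable
# field instead of a method, so users can swap it without subclassing.
@dataclass(frozen=True)
class CostModel:
    fun: float | None = None
    jac: float | None = None
    fun_and_jac: float | None = None
    label: str = "Function time"
    aggregate_batch_times: Callable[[list[float]], float] = sum


# e.g. treat a batch as taking as long as its slowest evaluation:
parallel_model = CostModel(fun=1.0, jac=1.0, aggregate_batch_times=max)
```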
Questions