Draft: Implement box constraints for the Nelder Mead solver, to increase its robustness on messy data where bounds are known. #64
Conversation
Implement box constraints for the Nelder Mead solver, to increase its robustness on messy data where bounds are known.
PS: I have not touched the stats, but it might be useful to track how often the vertices get clipped to the bounds.
This looks really good!
Agreed with all your comments: I think it's important to have some tests for this. In particular for when the evolving state is some complicated PyTree -- I find it's really easy to mess up the tree-map manipulations here!
I'd also be very happy to have a statistic tracking how often clipping occurs, if that would be useful to you.
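For instance, a test along these lines could pin down that the tree structure survives the bound handling. This is a minimal sketch only; the test name and bound layout are made up, not the PR's actual API:

```python
import jax.numpy as jnp
import jax.tree_util as jtu

def test_clip_preserves_pytree_structure():
    # A deliberately awkward PyTree of parameters, with matching lower bounds.
    y = {"a": jnp.array([0.5, -2.0]), "b": (jnp.array(3.0), jnp.array([-1.0, 4.0]))}
    lower = {"a": jnp.zeros(2), "b": (jnp.array(0.0), jnp.zeros(2))}

    clipped = jtu.tree_map(lambda leaf, lo: jnp.clip(leaf, lo), y, lower)

    # The tree structure must survive the tree-map manipulation...
    assert jtu.tree_structure(clipped) == jtu.tree_structure(y)
    # ...and every leaf must satisfy its bound element-wise.
    for leaf, lo in zip(jtu.tree_leaves(clipped), jtu.tree_leaves(lower)):
        assert jnp.all(leaf >= lo)
```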
```
@@ -171,6 +177,24 @@ def init(
    + self.adelta
    + self.rdelta * leaf[relative_indices]
)
if lower is not None:
    broadcast_leaves = jtu.tree_map(
        lambda a, b: jnp.clip(a, a_min=b),
```
I think `a_min` is deprecated in favour of just `min` or something. (IIRC)
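For reference (worth double-checking against the JAX version pinned here): newer JAX releases did rename `a_min`/`a_max` to `min`/`max` in `jnp.clip`. Passing the bound positionally works with either signature:

```python
import jax.numpy as jnp

a = jnp.array([-1.0, 0.5, 2.0])
lo = jnp.array([0.0, 0.0, 0.0])

# Older JAX reads the second positional argument as `a_min`, newer JAX as
# `min`, so passing it positionally sidesteps the keyword rename entirely.
clipped = jnp.clip(a, lo)  # -> [0.0, 0.5, 2.0]
```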
```
jnp.array(broadcast_leaves),
jnp.broadcast_to(
    jnp.array(lower_bounds[index]),
    shape=jnp.array(broadcast_leaves).shape,
```
I think the `jnp.array(broadcast_leaves)` should be unnecessary; from the code above it already looks to be an array?
All great points, thank you!
Hi! I managed to reformulate the objective function and now it works super well even without bounds (and using a different solver). Do you think bounded optimization should be a per-solver implementation, or could there be a more abstract formulation that works on any pytree, and could be added to any solver?
Doing this properly for trust region methods is a bit more involved. In my experience https://epubs.siam.org/doi/10.1137/0806023 is the best approach. We did some benchmarking in https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010322, which only looks at reflective strategies, not truncations. Fides also implements truncations, and from what I recall this was either worse or as good as the reflective strategies.
Honestly, I'm not sure! Constrained optimization was out of scope when we first wrote Optimistix, and we've not revisited it since. If there can be a general approach to it then that would be very useful. I suspect there probably is something -- if nothing else, truncating/reflecting at the end of every step is a thing you should always be able to do!

@FFroehlich -- that's super interesting to read! If we can support multiple strategies that may be best...
Hi @FFroehlich and @patrick-kidger, thank you for your input, and thank you for the references! Very interesting indeed. It makes sense that reflection should be better - seems like it has a higher chance of ending up somewhere where the direction of descent does not immediately lead back to the bounds. Here are some thoughts:

- I would define […]
- Very minor point on notation: I prefer `clip` over `truncate`.
- In terms of tree structures, is there anything other than vectors and parametric models that I need to consider? So far I have only optimized over those. (The parametric model is a small linear model in my case, implemented as an […].)

Enjoy your days!
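To make the clip-versus-reflect distinction concrete, here is a hedged element-wise sketch on a single leaf. The function names are made up (this is not the PR's implementation), and it assumes finite bounds:

```python
import jax.numpy as jnp

def clip_to_bounds(y, lower, upper):
    # Projects each element onto [lower, upper]; overshooting points stick
    # to the boundary, where descent may immediately point back out.
    return jnp.clip(y, lower, upper)

def reflect_at_bounds(y, lower, upper):
    # Reflects overshooting elements back into the interior, so a step that
    # crosses a bound lands strictly inside rather than on the boundary.
    # (Single reflection only; assumes the overshoot is smaller than the box
    # width, otherwise a modulo-style fold would be needed.)
    y = jnp.where(y < lower, 2 * lower - y, y)
    y = jnp.where(y > upper, 2 * upper - y, y)
    return y

y = jnp.array([-0.5, 0.3, 1.2])
print(clip_to_bounds(y, 0.0, 1.0))     # [0.  0.3 1. ]
print(reflect_at_bounds(y, 0.0, 1.0))  # [0.5 0.3 0.8]
```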
In terms of API, I'd probably suggest making it a callable all the way, and supporting nothing else. That would mean something like […]. We can then also define helper functions like […].

Agreed on 'clip' over 'truncate'.
Great idea, a callable would make it much cleaner! Then stuff like checking whether lower and upper are each defined can be handled inside of that, and the solvers themselves can remain blissfully ignorant of all the details, only passing a point and getting one back. I'd propose something like this:

```python
class AbstractBoundaryMap(...):
    ...

    @abc.abstractmethod
    def _map_with_bounds(self, y):
        """Very good description."""

    def __call__(self, y: PyTree[Array]) -> PyTree[Array]:
        return self._map_with_bounds(y)


class ClippingBoundaryMap(AbstractBoundaryMap):
    ...


class ReflectiveBoundaryMap(AbstractBoundaryMap):
    ...
```

How about calling the callable `boundary_map`?
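For illustration, a standalone `ClippingBoundaryMap` could look roughly like this. This is a sketch using Equinox (which Optimistix builds on); the constructor and bound layout are assumptions, and for nested `y` the bounds would need a matching tree structure:

```python
import equinox as eqx
import jax.numpy as jnp
import jax.tree_util as jtu
from jaxtyping import Array, PyTree

class ClippingBoundaryMap(eqx.Module):
    # Bounds as PyTrees matching `y` (or single-leaf scalars, as below).
    lower: PyTree
    upper: PyTree

    def __call__(self, y: PyTree[Array]) -> PyTree[Array]:
        # Element-wise projection of every leaf onto [lower, upper].
        return jtu.tree_map(jnp.clip, y, self.lower, self.upper)

boundary_map = ClippingBoundaryMap(lower=jnp.array(0.0), upper=jnp.array(1.0))
print(boundary_map(jnp.array([-0.3, 0.4, 1.7])))  # [0.  0.4 1. ]
```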
That sounds good to me! Why have separate `_map_with_bounds` and `__call__` methods?

@FFroehlich what do you think?

One thing I'm not sure on is how far we want to go with this. The make-a-step-and-then-adjust-it approach is reasonable for simple problems, but in general constrained optimisation is a much larger topic than this. If we later decide that we want to implement e.g. interior point methods, is that a thing we could do without having painted ourselves into a corner here?
Ah right! I will do away with the private method. And I was indeed focused on the more immediate problem of modifying the proposed step.

That does not sound like a boundary map to me; it sounds like a solver-specific option again, called in […]. Then all solvers in which simple constraints are reasonable could support these, with a shared implementation thereof.
The paper I sent uses a two-pronged approach. On the one hand, there is the step-then-project/adjust approach to make sure that constraints are always satisfied. I agree with everything implemented so far, and also that this is likely the best approach for simple problems. On the other hand, there is the change to the trust region subproblem. What effectively happens is that the boundary constraint also adds a transformation of the optimisation variables, which effectively results in a damping term in the quadratic subproblem, similar to what happens with the interior point approach (although you don't have any slack variables). This has the advantage that it also adds some mild regularisation to the problem, which is likely to be helpful for ill-conditioned problems, but implementation will be a bit more complex (though I see how this could be implemented rather broadly by passing a (pre-conditioning) variable transformation).
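For readers following along, the Coleman-Li affine-scaling construction looks roughly as follows; this is sketched from memory of the linked papers, so consult them for the precise statement. The scaling vector measures the distance to the bound the gradient points toward:

```latex
v_i(x) =
\begin{cases}
  x_i - u_i    & \text{if } g_i(x) < 0   \text{ and } u_i < \infty, \\
  x_i - l_i    & \text{if } g_i(x) \ge 0 \text{ and } l_i > -\infty, \\
  -1           & \text{if } g_i(x) < 0   \text{ and } u_i = \infty, \\
  \phantom{-}1 & \text{if } g_i(x) \ge 0 \text{ and } l_i = -\infty.
\end{cases}
```

With $D(x) = \mathrm{diag}(|v(x)|^{1/2})$ and the scaled step $\hat{s} = D^{-1} s$, the trust region subproblem becomes, approximately,

```latex
\min_{\hat{s}} \; (Dg)^\top \hat{s}
  + \tfrac{1}{2}\, \hat{s}^\top \bigl( D H D + \mathrm{diag}(g)\, J^{|v|} \bigr) \hat{s}
\quad \text{s.t.} \quad \|\hat{s}\| \le \Delta,
```

where $J^{|v|}$ is the (diagonal) Jacobian of $|v|$. The $\mathrm{diag}(g)\, J^{|v|}$ term is positive semidefinite by construction, which is the damping/regularisation mentioned above.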
Thank you for pointing that out, @FFroehlich. I gave the paper introducing Fides a deep read now :) I think the required transformation of the optimization variables and the Hessian could be an extra method of a `BoundaryMap`. However, I also noticed this:

> In each case, […]

This sounds to me as though the solver checks i), ii) and iii) and then picks the best option. This would be an argument against separate `BoundaryMap` classes.

Have you checked how i), ii) and iii) perform separately? I'm not sure how to interpret Fig. 3 D and E from the Fides paper. Do you expect strong performance differences between the solver picking among the three options, compared to sticking to a single option during a solve?
Yes, that sounds reasonable.

That's where things get more complicated. It's probably helpful to carefully read https://link.springer.com/article/10.1007/BF01582221. From what I recall, they include the (truncated) gradient step ((ii) above) as it is necessary to prove convergence. In contrast to (iii), which rescales all elements of *p*, the truncation strategy in Fig. 3 D/E is actually an element-wise truncation at the boundary (replacing (i)), just for the parameters that hit the boundary. I don't think there is any theoretical justification for including (iii); it is more of a practical thing, since the corresponding step is computed anyway. We simply stuck with the implementation that fmincon/lsqnonlin and ls_trf were using, in order to be able to compare to those methods. My impression is that overall there are a lot of empirical choices in these algorithms, with little evidence that they are in any way optimal. Also, contemporary optimisation problems likely have different characteristics from the typical benchmarks of the 90s. You could try to just use a single option. You could try to only use a reflective/truncation strategy. I would expect that both work reasonably well, as optimisation close to boundaries is just generally difficult, even with the discussed strategies. If you are interested in trying some things out, we would have everything in place to easily run optimistix on the PEtab benchmark collection.
100%! Amazing, I didn't know that interfacing to the PEtab benchmarks was already possible. I'll update with more detail on your other remarks once I have been able to give them thorough consideration. Thank you for the thoughtful reply!
Just a brief update: I have not forgotten about this :) I did read the papers you sent, @FFroehlich. I then branched out into more general literature on constrained optimisation, consulting Nocedal & Wright in particular. I have some things to wrap up in the coming two weeks, but plan to return to this topic then - I think there is potential for some useful abstractions here, since a lot of common ingredients are repeated across different approaches. (I do agree that it seems as though people make some empirical choices there - and then run with them if convergence can be proven. Optimality of one approach among many is much harder to show, as @FFroehlich has already pointed out.)
Closing this - thanks for the input, everyone!
I'm using Nelder-Mead to fit a variance model to residuals. On real data, the solve diverges frequently. I followed the advice in #45 and implemented bounds in the `options` passed to the solver.

This is my first-ever pull request, and I would not be surprised if there is a much more elegant way to solve this, but I wanted to get your opinion early on in the process.

As for tests in `tests/`: should it go into `test_minimise` from the eponymous test module?
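For what it's worth, a minimal end-to-end test could look like this. The `lower`/`upper` option keys are an assumption about this PR's interface and may need adjusting:

```python
import jax.numpy as jnp
import optimistix as optx

def test_nelder_mead_respects_bounds():
    # Simple bowl with its minimum at (1, 1), inside the box [0, 2]^2.
    def fn(y, args):
        return jnp.sum((y - 1.0) ** 2)

    solver = optx.NelderMead(rtol=1e-6, atol=1e-6)
    sol = optx.minimise(
        fn,
        solver,
        jnp.array([0.2, 1.8]),
        # Assumed option names for this PR's bound handling; may differ.
        options=dict(lower=jnp.array([0.0, 0.0]), upper=jnp.array([2.0, 2.0])),
        max_steps=10_000,
    )
    # The returned solution (and ideally every iterate) satisfies the bounds.
    assert jnp.all(sol.value >= 0.0) and jnp.all(sol.value <= 2.0)
```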