Draft: Implement box constraints for the Nelder Mead solver, to increase its robustness on messy data where bounds are known. #64
Conversation
Implement box constraints for the Nelder Mead solver, to increase its robustness on messy data where bounds are known.
PS: I have not touched the stats, but it might be useful to track how often the vertices get clipped to the bounds.
This looks really good!
Agreed with all your comments: I think it's important to have some tests for this. In particular for when the evolving state is some complicated PyTree -- I find it's really easy to mess up the tree-map manipulations here!
I'd also be very happy to have a statistic tracking how often clipping occurs, if that would be useful to you.
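For instance, a test along these lines could pin down that the tree structure survives the bound handling. This is a minimal sketch only; the test name and bound layout are made up, not the PR's actual API:

```python
import jax.numpy as jnp
import jax.tree_util as jtu

def test_clip_preserves_pytree_structure():
    # A deliberately awkward PyTree of parameters, with matching lower bounds.
    y = {"a": jnp.array([0.5, -2.0]), "b": (jnp.array(3.0), jnp.array([-1.0, 4.0]))}
    lower = {"a": jnp.zeros(2), "b": (jnp.array(0.0), jnp.zeros(2))}

    clipped = jtu.tree_map(lambda leaf, lo: jnp.clip(leaf, lo), y, lower)

    # The tree structure must survive the tree-map manipulation...
    assert jtu.tree_structure(clipped) == jtu.tree_structure(y)
    # ...and every leaf must satisfy its bound element-wise.
    for leaf, lo in zip(jtu.tree_leaves(clipped), jtu.tree_leaves(lower)):
        assert jnp.all(leaf >= lo)
```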
```
@@ -171,6 +177,24 @@ def init(
    + self.adelta
    + self.rdelta * leaf[relative_indices]
)
if lower is not None:
    broadcast_leaves = jtu.tree_map(
        lambda a, b: jnp.clip(a, a_min=b),
```
I think `a_min` is deprecated in favour of just `min` or something. (IIRC)
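For reference (worth double-checking against the JAX version pinned here): newer JAX releases did rename `a_min`/`a_max` to `min`/`max` in `jnp.clip`. Passing the bound positionally works with either signature:

```python
import jax.numpy as jnp

a = jnp.array([-1.0, 0.5, 2.0])
lo = jnp.array([0.0, 0.0, 0.0])

# Older JAX reads the second positional argument as `a_min`, newer JAX as
# `min`, so passing it positionally sidesteps the keyword rename entirely.
clipped = jnp.clip(a, lo)  # -> [0.0, 0.5, 2.0]
```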
```
jnp.array(broadcast_leaves),
jnp.broadcast_to(
    jnp.array(lower_bounds[index]),
    shape=jnp.array(broadcast_leaves).shape,
```
I think the `jnp.array(broadcast_leaves)` should be unnecessary; from the code above it already looks to be an array?
All great points, thank you!
Hi! I managed to reformulate the objective function and now it works super well even without bounds (and using a different solver). Do you think bounded optimization should be a per-solver implementation, or could there be a more abstract formulation that works on any pytree, and could be added to any solver?
Doing this properly for trust region methods is a bit more involved. In my experience https://epubs.siam.org/doi/10.1137/0806023 is the best approach. We did some benchmarking in https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010322, which only looks at reflective strategies, not truncations. Fides also implements truncations, and from what I recall this was either worse or as good as the reflective strategies.
Honestly, I'm not sure! Constrained optimization was out of scope when we first wrote Optimistix, and we've not revisited it since. If there can be a general approach to it then that would be very useful. I suspect there probably is something -- if nothing else, truncating/reflecting at the end of every step is a thing you should always be able to do!

@FFroehlich -- that's super interesting to read! If we can support multiple strategies that may be best...
Hi @FFroehlich and @patrick-kidger, thank you for your input, and thank you for the references! Very interesting indeed. It makes sense that reflection should be better - seems like it has a higher chance of ending up somewhere where the direction of descent does not immediately lead back to the bounds. Here are some thoughts:

- I would define […]
- Very minor point on notation: I prefer `clip` over `truncate`.
- In terms of tree structures, is there anything other than vectors and parametric models that I need to consider? So far I have only optimized over those. (The parametric model is a small linear model in my case, implemented as an […].)

Enjoy your days!
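To make the clip-versus-reflect distinction concrete, here is a hedged element-wise sketch on a single leaf. The function names are made up (this is not the PR's implementation), and it assumes finite bounds:

```python
import jax.numpy as jnp

def clip_to_bounds(y, lower, upper):
    # Projects each element onto [lower, upper]; overshooting points stick
    # to the boundary, where descent may immediately point back out.
    return jnp.clip(y, lower, upper)

def reflect_at_bounds(y, lower, upper):
    # Reflects overshooting elements back into the interior, so a step that
    # crosses a bound lands strictly inside rather than on the boundary.
    # (Single reflection only; assumes the overshoot is smaller than the box
    # width, otherwise a modulo-style fold would be needed.)
    y = jnp.where(y < lower, 2 * lower - y, y)
    y = jnp.where(y > upper, 2 * upper - y, y)
    return y

y = jnp.array([-0.5, 0.3, 1.2])
print(clip_to_bounds(y, 0.0, 1.0))     # [0.  0.3 1. ]
print(reflect_at_bounds(y, 0.0, 1.0))  # [0.5 0.3 0.8]
```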
In terms of API, I'd probably suggest making it a callable all the way, and supporting nothing else. That would mean something like […]. We can then also define helper functions like […].

Agreed on 'clip' over 'truncate'.
Great idea, a callable would make it much cleaner! Then stuff like checking whether lower and upper are each defined can be handled inside of that, and the solvers themselves can remain blissfully ignorant of all the details, only passing a point and getting one back. I'd propose something like this:

```python
class AbstractBoundaryMap(...):
    ...

    @abc.abstractmethod
    def _map_with_bounds(self, y):
        """Very good description."""

    def __call__(self, y: PyTree[Array]) -> PyTree[Array]:
        return self._map_with_bounds(y)


class ClippingBoundaryMap(AbstractBoundaryMap):
    ...


class ReflectiveBoundaryMap(AbstractBoundaryMap):
    ...
```

How about calling the callable `boundary_map`?
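For illustration, a standalone `ClippingBoundaryMap` could look roughly like this. This is a sketch using Equinox (which Optimistix builds on); the constructor and bound layout are assumptions, and for nested `y` the bounds would need a matching tree structure:

```python
import equinox as eqx
import jax.numpy as jnp
import jax.tree_util as jtu
from jaxtyping import Array, PyTree

class ClippingBoundaryMap(eqx.Module):
    # Bounds as PyTrees matching `y` (or single-leaf scalars, as below).
    lower: PyTree
    upper: PyTree

    def __call__(self, y: PyTree[Array]) -> PyTree[Array]:
        # Element-wise projection of every leaf onto [lower, upper].
        return jtu.tree_map(jnp.clip, y, self.lower, self.upper)

boundary_map = ClippingBoundaryMap(lower=jnp.array(0.0), upper=jnp.array(1.0))
print(boundary_map(jnp.array([-0.3, 0.4, 1.7])))  # [0.  0.4 1. ]
```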
That sounds good to me! Why have separate `_map_with_bounds` and `__call__` methods?

@FFroehlich what do you think?

One thing I'm not sure on is how far we want to go with this. The make-a-step-and-then-adjust-it approach is reasonable for simple problems, but in general constrained optimisation is a much larger topic than this. If we later decide that we want to implement e.g. interior point methods, is that a thing we could do without having painted ourselves into a corner here?
Ah right! I will do away with the private method. And I was indeed focused on the more immediate problem of modifying the proposed step.

That does not sound like a boundary map to me; it sounds like a solver-specific option again, called in […]. Then all solvers in which simple constraints are reasonable could support these, with a shared implementation thereof.
The paper I sent uses a two-pronged approach. On the one hand, there is the step-then-project/adjust approach to make sure that constraints are always satisfied. I agree with everything implemented so far, and also that this is likely the best approach for simple problems. On the other hand, there is the change to the trust region subproblem. What effectively happens is that the boundary constraint also adds a transformation of the optimisation variables, which effectively results in a damping term in the quadratic subproblem, similar to what happens with the interior point approach (although you don't have any slack variables). This has the advantage that it also adds some mild regularisation to the problem, which is likely to be helpful for ill-conditioned problems, but implementation will be a bit more complex (though I see how this could be implemented rather broadly by passing a (pre-conditioning) variable transformation).
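For readers following along, the Coleman-Li affine-scaling construction looks roughly as follows; this is sketched from memory of the linked papers, so consult them for the precise statement. The scaling vector measures the distance to the bound the gradient points toward:

```latex
v_i(x) =
\begin{cases}
  x_i - u_i    & \text{if } g_i(x) < 0   \text{ and } u_i < \infty, \\
  x_i - l_i    & \text{if } g_i(x) \ge 0 \text{ and } l_i > -\infty, \\
  -1           & \text{if } g_i(x) < 0   \text{ and } u_i = \infty, \\
  \phantom{-}1 & \text{if } g_i(x) \ge 0 \text{ and } l_i = -\infty.
\end{cases}
```

With $D(x) = \mathrm{diag}(|v(x)|^{1/2})$ and the scaled step $\hat{s} = D^{-1} s$, the trust region subproblem becomes, approximately,

```latex
\min_{\hat{s}} \; (Dg)^\top \hat{s}
  + \tfrac{1}{2}\, \hat{s}^\top \bigl( D H D + \mathrm{diag}(g)\, J^{|v|} \bigr) \hat{s}
\quad \text{s.t.} \quad \|\hat{s}\| \le \Delta,
```

where $J^{|v|}$ is the (diagonal) Jacobian of $|v|$. The $\mathrm{diag}(g)\, J^{|v|}$ term is positive semidefinite by construction, which is the damping/regularisation mentioned above.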
Thank you for pointing that out, @FFroehlich. I gave the paper introducing Fides a deep read now :) I think the required transformation of the optimization variables and the Hessian could be an extra method of a `BoundaryMap`. However, I also noticed this:

> In each case, […]

This sounds to me as though the solver checks i), ii) and iii) and then picks the best option. This would be an argument against separate `BoundaryMap` classes.

Have you checked how i), ii) and iii) perform separately? I'm not sure how to interpret Fig. 3 D and E from the Fides paper. Do you expect strong performance differences between the solver picking among the three options, compared to sticking to a single option during a solve?
Yes, that sounds reasonable.

That's where things get more complicated. It's probably helpful to carefully read https://link.springer.com/article/10.1007/BF01582221. From what I recall, they include the (truncated) gradient step ((ii) above) as it is necessary to prove convergence. In contrast to (iii), which rescales all elements of *p*, the truncation strategy in Fig. 3 D/E is actually an element-wise truncation at the boundary (replacing (i)), just for the parameters that hit the boundary. I don't think there is any theoretical justification for including (iii); it is more of a practical thing, since the corresponding step is computed anyway. We simply stuck with the implementation that fmincon/lsqnonlin and ls_trf were using, in order to be able to compare to those methods. My impression is that overall there are a lot of empirical choices in these algorithms, with little evidence that they are in any way optimal. Also, contemporary optimisation problems likely have different characteristics from the typical benchmarks of the 90s. You could try to just use a single option. You could try to only use a reflective/truncation strategy. I would expect that both work reasonably well, as optimisation close to boundaries is just generally difficult, even with the discussed strategies. If you are interested in trying some things out, we would have everything in place to easily run optimistix on the PEtab benchmark collection.
100%! Amazing, I didn't know that interfacing to the PEtab benchmarks was already possible. I'll update with more detail on your other remarks once I have been able to give them thorough consideration. Thank you for the thoughtful reply!
Just a brief update: I have not forgotten about this :) I did read the papers you sent, @FFroehlich. I then branched out into more general literature on constrained optimisation, consulting Nocedal & Wright in particular. I have some things to wrap up in the coming two weeks, but plan to return to this topic then - I think there is potential for some useful abstractions here, since a lot of common ingredients are repeated across different approaches. (I do agree that it seems as though people make some empirical choices there - and then run with them if convergence can be proven. Optimality of one approach among many is much harder to show, as @FFroehlich has already pointed out.)
Closing this - thanks for the input, everyone!
I'm using Nelder-Mead to fit a variance model to residuals. On real data, the solve diverges frequently. I followed the advice in #45 and implemented bounds in the `options` passed to the solver.

This is my first-ever pull request, and I would not be surprised if there is a much more elegant way to solve this, but I wanted to get your opinion early on in the process.

As for tests in `tests/`: should it go into `test_minimise` from the eponymous test module?
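For what it's worth, a minimal end-to-end test could look like this. The `lower`/`upper` option keys are an assumption about this PR's interface and may need adjusting:

```python
import jax.numpy as jnp
import optimistix as optx

def test_nelder_mead_respects_bounds():
    # Simple bowl with its minimum at (1, 1), inside the box [0, 2]^2.
    def fn(y, args):
        return jnp.sum((y - 1.0) ** 2)

    solver = optx.NelderMead(rtol=1e-6, atol=1e-6)
    sol = optx.minimise(
        fn,
        solver,
        jnp.array([0.2, 1.8]),
        # Assumed option names for this PR's bound handling; may differ.
        options=dict(lower=jnp.array([0.0, 0.0]), upper=jnp.array([2.0, 2.0])),
        max_steps=10_000,
    )
    # The returned solution (and ideally every iterate) satisfies the bounds.
    assert jnp.all(sol.value >= 0.0) and jnp.all(sol.value <= 2.0)
```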