
Enable parallel big-M calculation for gdp.mbigm transformation #3641

Open

wants to merge 24 commits into main

Conversation

@sadavis1 (Contributor) commented Jun 23, 2025

Fixes # (n/a)

Summary/Motivation:

The multiple big-M transformation tends to slow down as model size increases because the number of required subsolver runs grows linearly or quadratically. This change parallelizes the M calculation using Python's multiprocessing module. The threading module was tried first but, due to previously discussed issues, was replaced with multiprocessing.
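At a high level the change looks something like the following minimal sketch; the function and job names here are hypothetical, not the PR's actual code:

import multiprocessing as mp

def _compute_M(constraint_name, disjunct_name):
    # In the real transformation each job solves a small subproblem with the
    # configured subsolver; here it just returns a placeholder value.
    return (constraint_name, disjunct_name, 0.0)

def calculate_Ms(jobs, threads):
    if threads in (0, 1):
        # Single-threaded fallback: no pool, no pickling requirement.
        return [_compute_M(*job) for job in jobs]
    ctx = mp.get_context('spawn')  # the start method is configurable (see below)
    with ctx.Pool(processes=threads) as pool:
        # starmap unpacks each (constraint, disjunct) job tuple into arguments
        return pool.starmap(_compute_M, jobs)

if __name__ == '__main__':
    print(calculate_Ms([('c1', 'd1'), ('c1', 'd2')], threads=2))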

Changes proposed in this PR:

  • Rework the flow of the gdp.mbigm transformation: calculate all M values at once using a multiprocessing.Pool.starmap call.
  • Switch the gdp.mbigm big-M calculation from using the primal bound to using the dual bound, since the primal bound is not mathematically correct in the presence of numerical error (whereas any dual bound is valid).
  • Add configuration option 'use_primal_bound' to re-enable the old behavior, in case a solver that cannot provide a dual bound (such as ipopt) is used.
  • Add configuration option 'threads' to control the number of multiprocessing workers (see the usage sketch after this list):
    • By default, use os.cpu_count() - 1 workers. Note that this default is potentially harmful; for example, it may consume more Gurobi license tokens at once, and on Windows the model must now be pickleable (see below).
    • When set to 0 or 1, do not use multiprocessing and revert to fully single-threaded operation.
  • Add configuration option 'process_spawn_mechanism' to determine how worker processes are spawned (a sketch of the default selection follows this list):
    • Each start method described in the Python docs is available.
    • The default is chosen dynamically: 'spawn' on Windows, 'fork' on Unix, unless we can detect that multiple threads are running, in which case we use 'forkserver' instead.
    • When using 'spawn' or 'forkserver', models must be pickled in order to hand them to the worker processes. We depend on dill in this case, since models often contain nested functions and, in my testing, do not reliably pickle without it. I think this code leads to a nested pickle, but I'm not sure whether anything can be done about that.
  • Fix a bug in bigm_mixin.py (name 'logger' used without being defined)
  • contrib/solver/factory.py: Set the name class attribute of the LegacySolverWrapper-derived class to the solver's legacy_name. This is so that code like this:
from pyomo.environ import SolverFactory
solver = SolverFactory('gurobi_direct_v2')
solver_effective_copy = SolverFactory(solver.name, options=solver.options)

will behave as intended (as solvers do not reliably pickle even with dill). This does not affect code that gets the original contrib Solver directly from the pyomo.contrib.solver.common.factory.SolverFactory.

  • Add a check that we are not being passed a contrib Solver object unless it is a LegacySolverWrapper (such an object does have a .solve attribute, but the code will choke later in several places if we do not reject it).
  • Add tests to ensure that big-M calculation functions properly with the different start methods. These are essentially copy-pastes of test_calculated_Ms_correct with parameters to the solve() call altered.
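For reference, here is a hedged usage sketch of the new options on a toy GDP model. The option names come from the list above, but the exact keyword spelling, the defaults noted in comments, and the 'solver' argument are assumptions rather than code copied from this PR:

from pyomo.environ import (
    ConcreteModel, Objective, SolverFactory, TransformationFactory, Var
)
from pyomo.gdp import Disjunction

m = ConcreteModel()
m.x = Var(bounds=(0, 10))
m.disjunction = Disjunction(expr=[[m.x <= 2], [m.x >= 8]])
m.obj = Objective(expr=m.x)

TransformationFactory('gdp.mbigm').apply_to(
    m,
    solver=SolverFactory('gurobi'),        # subsolver used for the M subproblems
    threads=4,                             # default: os.cpu_count() - 1
    use_primal_bound=False,                # default in this PR: use the dual bound
    process_spawn_mechanism='forkserver',  # or 'spawn' / 'fork'
)

Setting threads to 0 or 1 skips the pool entirely, which also sidesteps the pickling requirement on Windows.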
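And a small sketch of the dynamic default start-method selection described above; the helper name is hypothetical and the thread-count check is just one plausible way to detect a multithreaded process:

import sys
import threading

def _default_start_method():
    # 'spawn' is the only option that behaves reliably on Windows.
    if sys.platform.startswith('win'):
        return 'spawn'
    # fork() in an already-multithreaded process is unsafe, so prefer
    # 'forkserver' when we can see more than one live thread.
    if threading.active_count() > 1:
        return 'forkserver'
    return 'fork'

print(_default_start_method())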

Since this is a performance change, I ran a test on the medium-sized instance gdp_col from GDPlib, using baron as the subsolver.
(Figure: test_gdp_col_fork, transformation time vs. number of workers using the 'fork' start method)
It looks roughly like f(x) = 1/x, if you squint in such a way that you cannot see the bottom of the chart. This instance transformed in 145 seconds on the current main branch, so this is not a regression in the single-threaded case. Naturally, things are slightly slower when using 'spawn', but I do at least make sure we only pickle the model once per worker (and hopefully only once total? I'm not sure how multiprocessing works on the inside, but it really should cache these).

I also tested the small instance jobshop, to ensure nothing horrible happened.
(Figure: test_jobshop_fork, transformation time for the small jobshop instance)
On the current main branch, this model transforms in 0.36 seconds so again there is no regression.

Finally, there seems to be a bug when using this transformation with gurobi_direct v1. It works fine with the other solvers I've tried, so I suspect it's a bug in that interface, but I haven't tracked it down yet. This combination also has errors on the current main branch, but they're different errors, so it's hard to know if I've changed anything in that regard.

Legal Acknowledgement

By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution:

  1. I agree my contributions are submitted under the BSD license.
  2. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

@emma58 self-requested a review June 23, 2025 19:26
@@ -23,7 +27,7 @@ def _convert_M_to_tuple(M, constraint, disjunct=None):
         else:
             try:
                 M = (-M, M)
-            except:
+            except Exception:
Contributor:
You touched this line so now I get to snark. It's good practice, if catching a general exception, to raise the original exception as well so folks can directly inspect it.

Contributor Author:
Don't we already do this? There is a raise statement after we log.

        (transBlock, algebraic_constraint) = self._setup_transform_disjunctionData(
            obj, root_disjunct
        )

    def _transform_disjunctionDatas(
Contributor:
Is it possible to break this up? This function is huge.

Contributor Author:
It's even worse than it looks, because _setup_jobs_for_disjunction is basically just the inner body of this function, transposed out so it would be less offensively indented (notice it mutates something like four of its parameters). The problem is that the whole transformation is basically a big ball of state until it's done, and I don't know whether it can really avoid being that way. Emma's version was a lot nicer because it handled the disjunctions one by one, but I can't do that if I want to use threads effectively.

All that said, I will look and see if I can separate any more of this out in a reasonably clean way.

Contributor Author:

I moved the multiprocessing pool setup into its own instance method, which slightly improves the situation.

@sadavis1 (Contributor Author):
After discussion today, I have reverted the change to the LegacySolverWrapper class names and switched from sending the solver name to sending the solver class in order to recreate the solver. This assumes that all solver classes (besides contrib solvers without the wrapper, which we reject for other reasons) can be correctly constructed with the single named argument options.

Also, using this on Windows now depends on dill even more, because solvers can be instances of nested classes, so that's another dill.dumps(). I went ahead and completed the trio by dill-ing the options parameter too -- who knows, maybe there's a way to pass a nested function into one.
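To illustrate why dill is needed at all (an assumed minimal example, not code from this PR): the standard pickle module rejects nested functions and classes, while dill serializes them by value:

import pickle
import dill

def make_rule(bound):
    # Nested functions like this are common as constraint rules on models.
    def rule(x):
        return x <= bound
    return rule

rule = make_rule(10)

try:
    pickle.dumps(rule)
except (AttributeError, pickle.PicklingError) as err:
    print(f"pickle fails: {err}")

restored = dill.loads(dill.dumps(rule))  # dill captures the closure by value
print(restored(7))  # True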
