The trajectory optimization methods presented so far compute a complete control trajectory from an initial state to a final time or state. Once computed, this trajectory is executed without modification, making these methods fundamentally open-loop. The control function $\mathbf{u}(t)$ depends only on time: it never consults the state the system actually reaches.
Real systems face modeling errors, external disturbances, and measurement noise that accumulate over time. A precomputed trajectory becomes increasingly irrelevant as these perturbations push the actual system state away from the predicted path. The solution is to incorporate feedback, making control decisions that respond to the current state rather than blindly following a predetermined schedule. While dynamic programming provides the theoretical framework for deriving feedback policies through value functions and Bellman equations, there exists a more direct approach that leverages the trajectory optimization methods already developed.
Model Predictive Control (MPC) creates a feedback controller by repeatedly solving trajectory optimization problems. Rather than computing a single trajectory for the entire task duration, MPC solves a finite-horizon problem at each time step, starting from the current measured state. The controller then applies only the first control action from this solution before repeating the entire process. This strategy transforms any trajectory optimization method into a feedback controller.
The defining characteristic of MPC is its receding horizon strategy. At each time step, the controller solves an optimization problem looking a fixed duration into the future, but this prediction window constantly moves forward in time. The horizon "recedes" because it always starts from the current time and extends forward by the same amount.
Consider the discrete-time optimal control problem in Bolza form:

$$
\min_{\mathbf{u}_{0:T-1}} \; c_f(\mathbf{x}_T) + \sum_{k=0}^{T-1} c(\mathbf{x}_k, \mathbf{u}_k)
\quad \text{s.t.} \quad \mathbf{x}_{k+1} = \mathbf{f}(\mathbf{x}_k, \mathbf{u}_k), \quad \mathbf{x}_0 \;\text{given}.
$$
At time step $t$, MPC solves a truncated version of this problem over the window $\{t, \ldots, t+N\}$, using the measured state $\mathbf{x}_t$ as the initial condition.
This receding horizon principle enables feedback without computing an explicit policy. By constantly updating predictions based on current measurements, MPC naturally corrects for disturbances and model errors. The apparent waste of computing but not using most of the trajectory is actually the mechanism that provides robustness.
The choice of prediction horizon depends on the control objective. We distinguish between three cases, each requiring different mathematical formulations.
For stabilization problems where the system must operate indefinitely around an equilibrium, the true objective is the infinite-horizon cost:

$$
\min_{\mathbf{u}_{0:\infty}} \; \sum_{k=0}^{\infty} c(\mathbf{x}_k, \mathbf{u}_k).
$$
Since this cannot be solved directly, MPC approximates it with a finite-horizon surrogate:

$$
\min_{\mathbf{u}_{0:N-1}} \; V_f(\mathbf{x}_N) + \sum_{k=0}^{N-1} c(\mathbf{x}_k, \mathbf{u}_k)
\quad \text{s.t.} \quad \mathbf{x}_N \in \mathcal{X}_f.
$$
The terminal cost $V_f(\mathbf{x}_N)$ approximates $\sum_{k=N}^{\infty} c(\mathbf{x}_k, \mathbf{u}_k)$, the cost-to-go beyond the horizon. The terminal constraint $\mathbf{x}_N \in \mathcal{X}_f$ keeps the final predicted state inside a region where this approximation is trustworthy.
For tasks ending at a known final time $t_f$, the prediction window cannot extend past the end of the task.
The MPC formulation must adapt as time progresses:
As the task approaches completion, the horizon shrinks and the terminal cost switches from the approximation $c_T$ to the true final cost $c_f$.
Some systems operate on repeating cycles where the optimal behavior depends on the time of day, week, or season. Consider a commercial building where heating costs are higher at night, electricity prices vary hourly, and occupancy patterns repeat daily. The MPC controller must account for these periodic patterns while planning over a finite horizon.
For tasks with period $T_p$, the problem data repeat: the costs and constraints at time $t$ match those at time $t + T_p$.
The cost function changes based on the phase of the cycle at which each predicted step falls.
The complete MPC procedure implements the receding horizon principle through repeated optimization:
:label: alg-mpc-complete
**Input:**
- Nominal prediction horizon $N$
- Sampling period $\Delta t$
- Task type: {infinite, finite with duration $t_f$, periodic with period $T_p$}
- Cost functions and dynamics
- Constraints
**Procedure:**
1. Initialize time $t \leftarrow 0$
2. Measure initial state $\mathbf{x}_{\text{current}} \leftarrow \mathbf{x}(t)$
3. **While** task continues:
4. **Determine effective horizon and costs:**
- If infinite task:
- $N_{\text{eff}} \leftarrow N$
- Use terminal cost $V_f$ and constraint $\mathcal{X}_f$
- If finite task:
- $N_{\text{eff}} \leftarrow \min(N, \lfloor(t_f - t)/\Delta t\rfloor)$
- If $t + N_{\text{eff}}\Delta t = t_f$: use final cost $c_f$
- Otherwise: use approximation $c_T$
- If periodic task:
- $N_{\text{eff}} \leftarrow N$
- Adjust costs/constraints based on phase
5. **Solve optimization:**
Minimize over $\mathbf{u}_{0:N_{\text{eff}}-1}$ subject to dynamics, constraints, and $\mathbf{x}_0 = \mathbf{x}_{\text{current}}$
6. **Apply receding horizon control:**
- Extract $\mathbf{u}^*_0$ from solution
- Apply to system for duration $\Delta t$
- Measure new state
- Advance time: $t \leftarrow t + \Delta t$
7. **End While**
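The loop above can be sketched in a few lines. The double-integrator dynamics, weights, horizon, and input bounds below are illustrative choices rather than values from the text; what matters is the structure: solve a finite-horizon problem, apply only the first input, shift the solution as a warm start, and repeat.

```python
# Minimal receding-horizon loop on a double integrator (illustrative setup).
import numpy as np
from scipy.optimize import minimize

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # discrete double-integrator dynamics
B = np.array([[0.5 * dt**2], [dt]])
Q, R = np.diag([10.0, 1.0]), 0.1        # stage-cost weights (illustrative)
N = 10                                   # prediction horizon

def horizon_cost(u_seq, x0):
    """Simulate the horizon and accumulate quadratic stage costs."""
    x, cost = x0.copy(), 0.0
    for u in u_seq:
        cost += x @ Q @ x + R * u**2
        x = A @ x + B.flatten() * u
    return cost + 10.0 * (x @ Q @ x)    # crude terminal cost

def mpc_step(x0, u_warm):
    """Solve the finite-horizon problem from the measured state."""
    res = minimize(horizon_cost, u_warm, args=(x0,), method="L-BFGS-B",
                   bounds=[(-2.0, 2.0)] * N)   # input constraints
    return res.x

# Closed loop: apply only the first input, then shift for warm-starting.
x = np.array([2.0, 0.0])
u_warm = np.zeros(N)
for _ in range(30):
    u_seq = mpc_step(x, u_warm)
    x = A @ x + B.flatten() * u_seq[0]          # first action only
    u_warm = np.append(u_seq[1:], u_seq[-1])    # shifted warm start

print(np.linalg.norm(x))
```

The warm start (shifting the previous solution by one step) is what makes each re-solve cheap in practice.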
For many regulation and tracking problems, the nonlinear dynamics and costs we encounter can be approximated locally by linear and quadratic functions. The basic idea is to linearize the system around the current operating point and approximate the cost with a quadratic form. This reduces each MPC subproblem to a quadratic program (QP), which can be solved reliably and very quickly using standard solvers.
Suppose the true dynamics are nonlinear, $\mathbf{x}_{k+1} = f(\mathbf{x}_k, \mathbf{u}_k)$.
Around a nominal trajectory $(\bar{\mathbf{x}}_k, \bar{\mathbf{u}}_k)$, a first-order Taylor expansion gives
$$
\mathbf{x}_{k+1} \approx f(\bar{\mathbf{x}}_k,\bar{\mathbf{u}}_k)
+ \mathbf{A}_k(\mathbf{x}_k - \bar{\mathbf{x}}_k)
+ \mathbf{B}_k(\mathbf{u}_k - \bar{\mathbf{u}}_k),
$$
with Jacobians $\mathbf{A}_k = \partial f / \partial \mathbf{x}$ and $\mathbf{B}_k = \partial f / \partial \mathbf{u}$, both evaluated at $(\bar{\mathbf{x}}_k, \bar{\mathbf{u}}_k)$.
Similarly, if the stage cost is nonlinear,
we approximate it quadratically near the nominal point:
$$
c(\mathbf{x}_k,\mathbf{u}_k) \;\approx\; \|\mathbf{x}_k - \mathbf{x}_k^{\text{ref}}\|_{\mathbf{Q}_k}^2
+ \|\mathbf{u}_k - \mathbf{u}_k^{\text{ref}}\|_{\mathbf{R}_k}^2,
$$
with positive semidefinite weighting matrices $\mathbf{Q}_k$ and positive definite $\mathbf{R}_k$.
The resulting MPC subproblem has the form
$$
\begin{aligned}
\min_{\mathbf{x}_{0:N},\,\mathbf{u}_{0:N-1}} \quad & \|\mathbf{x}_N - \mathbf{x}_N^{\text{ref}}\|_{\mathbf{P}}^2
+ \sum_{k=0}^{N-1} \left( \|\mathbf{x}_k - \mathbf{x}_k^{\text{ref}}\|_{\mathbf{Q}_k}^2
+ \|\mathbf{u}_k - \mathbf{u}_k^{\text{ref}}\|_{\mathbf{R}_k}^2 \right) \\
\text{s.t.} \quad & \mathbf{x}_{k+1} = \mathbf{A}_k \mathbf{x}_k + \mathbf{B}_k \mathbf{u}_k + \mathbf{d}_k, \\
& \mathbf{u}_{\min} \leq \mathbf{u}_k \leq \mathbf{u}_{\max}, \\
& \mathbf{x}_{\min} \leq \mathbf{x}_k \leq \mathbf{x}_{\max}, \\
& \mathbf{x}_0 = \mathbf{x}_{\text{current}},
\end{aligned}
$$
where $\mathbf{d}_k = f(\bar{\mathbf{x}}_k, \bar{\mathbf{u}}_k) - \mathbf{A}_k \bar{\mathbf{x}}_k - \mathbf{B}_k \bar{\mathbf{u}}_k$ collects the affine offset from the linearization and $\mathbf{P}$ is the terminal weight.
Because the dynamics are now linear and the cost quadratic, this optimization problem is a convex quadratic program. Quadratic programs are attractive in practice: they can be solved at kilohertz rates with mature numerical methods, making them the backbone of many real-time MPC implementations.
At each MPC step, the controller updates its linearization around the new operating point, constructs the local QP, and solves it. The process repeats, with the linear model and quadratic cost refreshed at every reoptimization. Despite the approximation, this yields a closed-loop controller that inherits the fast computation of QPs while retaining the ability to track trajectories of the underlying nonlinear system.
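To sketch how such a subproblem is assembled, the snippet below builds the standard "condensed" prediction matrices for a linear system and solves the resulting QP in closed form. State and input bounds are deliberately dropped so the QP reduces to a single linear solve; the system and weights are illustrative.

```python
# Condensed QP behind linear MPC (bounds omitted, so it is one linear solve).
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
n, m, N = 2, 1, 15
Q = np.diag([5.0, 1.0]); R = 0.1 * np.eye(m); P = 10.0 * Q

# Stack predictions: X = Sx @ x0 + Su @ U with X = [x_1; ...; x_N].
Sx = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(N)])
Su = np.zeros((N * n, N * m))
for i in range(N):
    for j in range(i + 1):
        Su[i*n:(i+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, i - j) @ B

Qbar = np.kron(np.eye(N), Q)
Qbar[-n:, -n:] = P                       # terminal weight on x_N
Rbar = np.kron(np.eye(N), R)

def lmpc_action(x0):
    """Solve the unconstrained condensed QP; return the first input."""
    H = Su.T @ Qbar @ Su + Rbar          # QP Hessian
    f = Su.T @ Qbar @ Sx @ x0            # QP linear term
    U = np.linalg.solve(H, -f)
    return U[:m]

x = np.array([1.0, 0.0])
for _ in range(40):
    x = A @ x + B @ lmpc_action(x)       # receding-horizon execution
print(np.linalg.norm(x))
```

With box constraints added, `H` and `f` would instead be handed to a QP solver, but the matrix construction is identical.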
The finite-horizon approximation in MPC brings a new challenge: the controller cannot see consequences beyond the horizon. Without proper design, this myopia can destabilize even simple systems. The solution is to carefully encode information about the infinite-horizon problem into the finite-horizon optimization through its terminal conditions.
Before diving into the mathematics, we should first establish what "stability" means and which tasks these theoretical guarantees address, as the notion of stability varies significantly across different control objectives.
The terminal conditions provide different types of guarantees depending on the control objective. For regulation problems, where the task is to drive the state to a fixed equilibrium $(\mathbf{x}_\mathrm{eq}, \mathbf{u}_\mathrm{eq})$ (often shifted to the origin), the stability guarantee is asymptotic stability: starting sufficiently close to the equilibrium, we have $\mathbf{x}_k \to \mathbf{x}_\mathrm{eq}$ while constraints remain satisfied throughout the trajectory (recursive feasibility). This requires the stage cost to be positive definite with respect to the equilibrium, so that cost decrease implies convergence.
When tracking a constant setpoint, the task becomes following a constant reference $(\mathbf{x}_\mathrm{ref},\mathbf{u}_\mathrm{ref})$ that solves the steady-state equations. This problem is handled by working in error coordinates $\tilde{\mathbf{x}}=\mathbf{x}-\mathbf{x}_\mathrm{ref}$ and $\tilde{\mathbf{u}}=\mathbf{u}-\mathbf{u}_\mathrm{ref}$, transforming the tracking problem into a regulation problem for the error system. The stability guarantee becomes asymptotic tracking, meaning $\tilde{\mathbf{x}}_k \to \mathbf{0}$, i.e., $\mathbf{x}_k \to \mathbf{x}_\mathrm{ref}$.
The terminal conditions we discuss below primarily address regulation and constant reference tracking. Time-varying tracking and economic MPC require additional techniques such as tube MPC and dissipativity theory.
To provide theoretical guarantees, the finite-horizon MPC problem is augmented with three interconnected components: the terminal cost $V_f$, the terminal constraint set $\mathcal{X}_f$, and a local controller $\kappa_f$ defined on $\mathcal{X}_f$.
These components must satisfy specific compatibility conditions to provide theoretical guarantees:
:label: thm-mpc-stability
Consider the MPC problem with terminal cost $V_f$, terminal set $\mathcal{X}_f$, and local controller $\kappa_f$. If the following conditions hold:
**Control invariance**: For all $\mathbf{x} \in \mathcal{X}_f$, we have $\mathbf{f}(\mathbf{x}, \kappa_f(\mathbf{x})) \in \mathcal{X}_f$ (the set is invariant) and $\mathbf{g}(\mathbf{x}, \kappa_f(\mathbf{x})) \leq \mathbf{0}$ (constraints remain satisfied).
**Lyapunov decrease**: For all $\mathbf{x} \in \mathcal{X}_f$:
$$V_f(\mathbf{f}(\mathbf{x}, \kappa_f(\mathbf{x}))) - V_f(\mathbf{x}) \leq -\ell(\mathbf{x}, \kappa_f(\mathbf{x}))$$
where $\ell$ is the stage cost.
Then the MPC controller achieves recursive feasibility (if the problem is feasible at time $k$, it remains feasible at time $k+1$), asymptotic stability to the target equilibrium for regulation problems, and monotonic cost decrease along trajectories until the target is reached.
The finite-horizon MPC value function $V_N(\mathbf{x})$ can then be compared with the infinite-horizon optimal value $V_\infty(\mathbf{x})$.
The upper bound $V_\infty(\mathbf{x}) \le V_N(\mathbf{x})$ follows from the terminal conditions: by the Lyapunov decrease inequality, extending any finite-horizon solution with the local controller $\kappa_f$ produces an infinite-horizon policy whose total cost does not exceed $V_N(\mathbf{x})$.
The interesting question is bounding the approximation error $V_N(\mathbf{x}) - V_\infty(\mathbf{x})$ from above.
Let $(\mathbf{u}_0^*, \mathbf{u}_1^*, \ldots)$ denote the infinite-horizon optimal control sequence with corresponding state trajectory $(\mathbf{x}_0^*, \mathbf{x}_1^*, \ldots)$, where $\mathbf{x}_{k+1}^* = \mathbf{f}(\mathbf{x}_k^*, \mathbf{u}_k^*)$.
Now consider what happens when we truncate this optimal sequence at horizon $N$ and use it as a candidate for the finite-horizon problem:

$$
V_N(\mathbf{x}_0) \;\le\; \sum_{k=0}^{N-1} \ell(\mathbf{x}_k^*, \mathbf{u}_k^*) + V_f(\mathbf{x}_N^*),
$$
where $V_f(\mathbf{x}_N^*)$ approximates the tail cost $\sum_{k=N}^{\infty} \ell(\mathbf{x}_k^*, \mathbf{u}_k^*)$.
Since the infinite-horizon cost splits into the first $N$ stage costs plus the tail, subtracting $V_\infty(\mathbf{x}_0)$ from both sides leaves

$$
V_N(\mathbf{x}_0) - V_\infty(\mathbf{x}_0) \;\le\; V_f(\mathbf{x}_N^*) - \sum_{k=N}^{\infty} \ell(\mathbf{x}_k^*, \mathbf{u}_k^*).
$$
This bound shows that the approximation error depends on how well the terminal cost $V_f$ matches the true tail cost at the state $\mathbf{x}_N^*$ reached after $N$ steps.
Once the basic idea of receding-horizon control is clear, it is helpful to see how the same backbone accommodates many variations. In every case, we transcribe the continuous-time optimal control problem into a nonlinear program of the form

$$
\min_{\mathbf{x}_{0:N},\,\mathbf{u}_{0:N-1}} \; c_N(\mathbf{x}_N) + \sum_{k=0}^{N-1} c(\mathbf{x}_k, \mathbf{u}_k)
\quad \text{s.t.} \quad \mathbf{x}_{k+1} = \mathbf{f}(\mathbf{x}_k, \mathbf{u}_k), \quad \mathbf{g}(\mathbf{x}_k, \mathbf{u}_k) \le \mathbf{0}, \quad \mathbf{x}_0 = \mathbf{x}_{\text{current}}.
$$
The components in this NLP come from discretizing the continuous-time problem with a fixed horizon $N$ and sampling period $\Delta t$.
The most common setup is reference tracking. Here, we are given time-varying target trajectories $\mathbf{x}_k^{\text{ref}}$ and $\mathbf{u}_k^{\text{ref}}$, and the stage cost penalizes deviations from them.
When dynamics are linear and constraints are polyhedral, this yields a convex quadratic program at each time step.
In regulation tasks, we aim to bring the system back to an equilibrium point $(\mathbf{x}_{\text{eq}}, \mathbf{u}_{\text{eq}})$ and keep it there.
To guarantee stability, it is common to include a terminal constraint $\mathbf{x}_N \in \mathcal{X}_f$ or a terminal penalty $V_f$ of the kind discussed above.
Not all systems operate around a reference. Sometimes the goal is to optimize a true economic objective (e.g., energy cost, revenue, efficiency) directly. This gives rise to economic MPC, where the cost functions reflect real operational performance.
There is no reference trajectory here. The cost function determines the optimal behavior. In this setting, standard stability arguments no longer apply automatically, and one must be careful to add terminal penalties or constraints that ensure the closed-loop system remains well-behaved.
Some systems are exposed to external disturbances or small errors in the model. In those cases, we want the controller to make decisions that will still work no matter what happens, as long as the disturbances stay within some known bounds. This is the idea behind robust MPC.
Instead of planning a single trajectory, the controller plans a "nominal" path (what would happen in the absence of any disturbance) and then adds a feedback correction to react to whatever disturbances actually occur. This looks like:

$$
\mathbf{u}_k = \bar{\mathbf{u}}_k + \mathbf{K}(\mathbf{x}_k - \bar{\mathbf{x}}_k),
$$
where $\bar{\mathbf{x}}_k$ and $\bar{\mathbf{u}}_k$ form the nominal plan and $\mathbf{K}$ is a pre-designed stabilizing feedback gain.
Because we know the worst-case size of the disturbance, we can estimate how far the real state might drift from the plan, and "shrink" the constraints accordingly. The result is that the nominal plan is kept safely away from constraint boundaries, so even if the system gets pushed around, it stays inside limits. This is often called tube MPC because the true trajectory stays inside a tube around the nominal one.
The main benefit is that we can handle uncertainty without solving a complicated worst-case optimization at every time step. All the uncertainty is accounted for in the offline design of the feedback gain and the constraint tightening.
If disturbances are random rather than adversarial, a natural goal is to optimize expected cost while enforcing constraints probabilistically. This gives rise to stochastic MPC, in which:

- The cost becomes an expectation:

  $$ \mathbb{E} \left[ c(\mathbf{x}_N) + \sum_{k=0}^{N-1} w_k\, c(\mathbf{x}_k, \mathbf{u}_k) \right] $$

- Constraints are allowed to be violated with small probability:

  $$ \mathbb{P}[\mathbf{g}(\mathbf{x}_k, \mathbf{u}_k) \leq \mathbf{0}] \geq 1 - \varepsilon $$
In practice, expectations are approximated using a finite set of disturbance scenarios drawn ahead of time. For each scenario, the system dynamics are simulated forward using the same control inputs, and the scenario costs are averaged.
Despite appearances, this is not dynamic programming. There is no value function or tree of all possible paths. There is only a finite set of futures chosen a priori, and optimized over directly. This scenario-based approach is common in energy systems such as hydro scheduling, where inflows are uncertain but sample trajectories can be generated from forecasts.
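A minimal sketch of this scenario approach, on an illustrative scalar system: disturbance sequences are sampled once, ahead of time, and a single input sequence is optimized against the average cost over all of them.

```python
# Scenario-based approximation of the expected cost (illustrative system).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, n_scen = 8, 20
a, b = 0.9, 0.5                              # scalar linear dynamics
W = rng.normal(0.0, 0.1, size=(n_scen, N))   # pre-sampled disturbance scenarios

def scenario_cost(u, x0):
    """Average quadratic cost over all scenarios, shared input sequence."""
    total = 0.0
    for w in W:
        x = x0
        for k in range(N):
            total += x**2 + 0.1 * u[k]**2
            x = a * x + b * u[k] + w[k]      # same u, scenario-specific noise
        total += x**2
    return total / n_scen

res = minimize(scenario_cost, np.zeros(N), args=(2.0,))
u_star = res.x
print(u_star[0])   # the applied (first) action
```

The key point: the decision variables are shared across scenarios, so the optimizer hedges against all sampled futures at once rather than planning one tree per outcome.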
Risk constraints are typically enforced across all scenarios or encoded using risk measures like CVaR. For example, one might penalize violations that occur in the worst
When systems involve discrete switches (e.g., on/off valves, mode selection, or combinatorial logic), the MPC problem must include integer or binary variables. These show up in constraints such as $\delta_k \in \{0, 1\}$,
along with mode-dependent dynamics and costs. The resulting formulation is a mixed-integer nonlinear program (MINLP). The receding-horizon idea is the same, but each solve is more expensive due to the combinatorial nature of the decision space.
Large-scale systems often consist of interacting subsystems. Distributed MPC decomposes the global NLP into smaller ones that run in parallel, with coordination constraints enforcing consistency across shared variables.
Each subsystem solves a local problem over its own state and input variables, then exchanges information with neighbors. Coordination can be done via primal–dual methods, ADMM, or consensus schemes, but each local block looks like a standard MPC problem.
In practice, we may not know the true model exactly. Adaptive MPC addresses this by estimating uncertain model parameters online from measured data. The parameter estimates are refreshed as new measurements arrive, and the updated model is used at the next reoptimization, so the controller gradually compensates for its own model mismatch.
The trajectory optimization methods we have studied assume perfect models and deterministic dynamics. In practice, however, MPC controllers must operate in environments where models are approximate, disturbances are unpredictable, and computational resources are limited. The mathematical elegance of optimal control must yield to the engineering reality of robust operation: perfect optimality matters less than reliable operation. This philosophy permeates industrial MPC applications. A controller that achieves 95% performance 100% of the time is superior to one that achieves 100% performance 95% of the time and fails catastrophically the remaining 5%. Airlines accept suboptimal fuel consumption over missed approaches, power grids tolerate efficiency losses to prevent blackouts, and chemical plants sacrifice yield for safety. By designing for failure, we aim to create MPC systems that degrade gracefully rather than fail catastrophically, maintaining safety and stability even when the impossible is asked of them.
Consider a wind farm where MPC controllers coordinate individual turbines to maximize overall power production while minimizing wake interference. Each turbine can adjust both its thrust coefficient (through blade pitch) and yaw angle to redirect its wake away from downstream turbines. At each time step, the controller must therefore solve a coupled optimization over the pitch and yaw settings of every turbine.
Now suppose an unexpected wind direction change occurs, shifting the incoming wind vector by 30 degrees. The current state and the previously planned yaw trajectories no longer match the wind field, and constraints that were comfortably satisfied a moment ago may now be violated or even mutually incompatible.
This scenario illustrates the fundamental challenge of real-time MPC: constraint incompatibility. When disturbances push the system into states from which recovery appears impossible, or when reference trajectories demand physically impossible maneuvers, the intersection of all constraint sets becomes empty. Model mismatch compounds this problem as prediction errors accumulate over the horizon.
Even when feasible solutions exist, computational constraints can prevent their discovery. A control loop running at 100 Hz allows only 10 milliseconds per iteration. If the solver requires 15 milliseconds to converge, we face an impossible choice: delay the control action and risk destabilizing the system, or apply an unconverged iterate that may violate critical constraints.
A third failure mode involves numerical instabilities: ill-conditioned matrices, rank deficiency, or division by zero in the linear algebra routines. These failures are particularly problematic because they occur sporadically, triggered by specific state configurations that create near-singular conditions in the optimization problem.
The first approach to handling infeasibility recognizes that not all constraints carry equal importance. A chemical reactor's temperature must never exceed the runaway threshold: this is a hard constraint that cannot be violated. However, maintaining temperature within an optimal efficiency band is merely desirable. This can be treated as a soft constraint that we prefer to satisfy but can relax when necessary.
This hierarchy motivates reformulating the optimization problem using slack variables:

$$
\min_{\mathbf{u}_{0:N-1},\,\boldsymbol{\epsilon}} \; J(\mathbf{x}_{0:N}, \mathbf{u}_{0:N-1}) + \rho \sum_{k} \|\boldsymbol{\epsilon}_k\|_1
\quad \text{s.t.} \quad \mathbf{g}_{\text{hard}}(\mathbf{x}_k, \mathbf{u}_k) \le \mathbf{0}, \quad \mathbf{g}_{\text{soft}}(\mathbf{x}_k, \mathbf{u}_k) \le \boldsymbol{\epsilon}_k, \quad \boldsymbol{\epsilon}_k \ge \mathbf{0}.
$$
The penalty weights determine how readily each soft constraint is relaxed: the larger the weight, the closer the constraint behaves to a hard one.
Rather than treating constraints as binary hard/soft categories, we can establish a constraint hierarchy that enables graceful degradation:
As conditions deteriorate, the controller abandons objectives in reverse priority order, maintaining safety even when optimality becomes impossible.
When even soft constraints prove insufficient (perhaps due to catastrophic solver failure or corrupted problem structure) we need feasibility restoration that finds any feasible point regardless of optimality:
This formulation temporarily relaxes even the dynamics constraints, finding the "least infeasible" solution. It answers the question: if we must violate something, what is the minimal violation required? Once feasibility is restored, we can warm-start the original problem from this point.
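The minimal-violation idea can be illustrated with a small linear program: introduce one slack per constraint and minimize the total slack. The two constraints below ($x \ge 1$ and $x \le 0$) are deliberately inconsistent, so any point must violate something; the LP finds the cheapest violation.

```python
# Feasibility restoration as an LP: minimize the L1 norm of the slacks.
import numpy as np
from scipy.optimize import linprog

A = np.array([[-1.0], [1.0]])   # -x <= -1 (i.e., x >= 1)  and  x <= 0
b = np.array([-1.0, 0.0])
n, m = 1, 2

# Decision vector z = [x, eps]; minimize sum(eps) s.t. A x - eps <= b, eps >= 0.
c = np.concatenate([np.zeros(n), np.ones(m)])
A_ub = np.hstack([A, -np.eye(m)])
bounds = [(None, None)] * n + [(0, None)] * m

res = linprog(c, A_ub=A_ub, b_ub=b, bounds=bounds)
x_restored, eps = res.x[:n], res.x[n:]
print(res.fun)   # minimal total violation (here 1.0)
```

The restored point can then warm-start the original problem once conditions improve, exactly as described above.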
Rather than reacting to infeasibility after it occurs, we can prevent it by filtering references through a reference governor. Consider an aircraft following waypoints. Instead of passing waypoints directly to the MPC, the governor asks: what is the closest approachable reference from our current state?
The governor performs a line search between the current state (always feasible since staying put requires no action) and the desired reference (potentially infeasible). This guarantees the MPC always receives feasible problems while making maximum progress toward the goal.
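The line search can be sketched as a bisection over the step fraction toward the desired reference. The box-shaped admissible set below is an illustrative stand-in for a real feasibility check (which would involve the MPC constraints).

```python
# Reference-governor sketch: bisect toward the desired reference while
# staying inside an (illustrative) admissible set.
import numpy as np

def admissible(r):
    """Stand-in feasibility test: reference must lie in a unit box."""
    return bool(np.all(np.abs(r) <= 1.0))

def govern(x, r_desired, iters=30):
    """Largest step from x toward r_desired that remains admissible."""
    if admissible(r_desired):
        return r_desired
    lo, hi = 0.0, 1.0          # fraction 0 = stay put (always feasible)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if admissible(x + mid * (r_desired - x)):
            lo = mid
        else:
            hi = mid
    return x + lo * (r_desired - x)

r = govern(np.array([0.0, 0.0]), np.array([3.0, 0.0]))
print(r)   # clipped at the admissible boundary
```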
For computational efficiency, we can pre-compute the maximal output admissible set: the set of all references that the constrained closed-loop system can track indefinitely without violating constraints.
Online, the governor simply projects the desired reference onto this admissible set.
When MPC fails entirely (due to solver crashes, timeouts, or numerical failures) we need backup controllers that require minimal computation while guaranteeing stability and keeping the system away from dangerous regions.
The standard approach uses a pre-computed local LQR controller around the equilibrium:

$$
\mathbf{u} = \mathbf{u}_{\text{eq}} + \mathbf{K}_{\text{LQR}}(\mathbf{x} - \mathbf{x}_{\text{eq}}),
$$
where $\mathbf{K}_{\text{LQR}}$ is the LQR gain and $\mathbf{P}$ is the cost-to-go matrix obtained from the associated Riccati equation.
The region $\mathcal{X}_{\text{LQR}} = \{\mathbf{x} : (\mathbf{x} - \mathbf{x}_{\text{eq}})^\top \mathbf{P} (\mathbf{x} - \mathbf{x}_{\text{eq}}) \leq \alpha\}$ represents the largest invariant set where LQR is guaranteed to work.
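The backup gain and its certificate matrix can be computed offline with one discrete-time Riccati solve; the system matrices and weights below are illustrative.

```python
# Offline computation of the LQR backup controller (illustrative system).
import numpy as np
from scipy.linalg import solve_discrete_are

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q, R = np.diag([1.0, 1.0]), np.array([[0.1]])

P = solve_discrete_are(A, B, Q, R)                  # cost-to-go matrix
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # gain: u = -K (x - x_eq)

# Certificate: the closed loop must be Schur stable (eigenvalues inside the
# unit circle), so V(x) = x' P x decreases along trajectories.
rho = max(abs(np.linalg.eigvals(A - B @ K)))
print(rho)
```

Both `K` and `P` are stored; online, evaluating the backup law is a single matrix-vector product, and membership in the invariant ellipsoid is one quadratic-form evaluation.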
Production MPC systems rarely rely on a single solver. Instead, they implement a cascade of increasingly conservative controllers that trade optimality for reliability:
```python
def get_control(self, x, time_budget):
    """Multi-level cascade for robust real-time control."""
    time_remaining = time_budget

    # Level 1: Full nonlinear MPC
    if time_remaining > 5e-3:  # 5 ms minimum
        try:
            u, solve_time, converged = self.solve_nmpc(x, time_remaining)
            time_remaining -= solve_time
            if converged:
                return u
        except Exception:
            pass  # solver failure: fall through to the next level

    # Level 2: Simplified linear MPC
    if time_remaining > 1e-3:  # 1 ms minimum
        try:
            # Linearize around the current state
            A, B = self.linearize_dynamics(x)
            u, solve_time = self.solve_lmpc(x, A, B, time_remaining)
            return u
        except Exception:
            pass

    # Level 3: Explicit MPC lookup
    if time_remaining > 1e-4:  # 0.1 ms minimum
        region = self.find_critical_region(x)
        if region is not None:
            return self.explicit_control_law[region](x)

    # Level 4: LQR backup
    if self.in_lqr_region(x):
        return self.K_lqr @ (x - self.x_eq)

    # Level 5: Emergency safe mode
    return self.emergency_stop(x)
```

Each level trades optimality for reliability: Level 1 provides optimal but computationally expensive control, Level 2 offers suboptimal but faster solutions, Level 3 provides pre-computed instant evaluation, Level 4 ensures stabilizing control without tracking, and Level 5 implements safe shutdown.
Even when using backup controllers, we can maintain solution continuity through persistent warm-starting:
The shift operation takes a successful MPC solution and moves it forward by one time step, appending a terminal action: $[\mathbf{u}_1^{(k)}, \mathbf{u}_2^{(k)}, \ldots, \mathbf{u}_{N-1}^{(k)}, \kappa_f(\mathbf{x}_N^{(k)})]$. This shifted sequence provides natural temporal continuity for the next optimization.
When MPC fails and backup control is applied, the lift operation extends the single backup action into a full-horizon input sequence (for example, by repeating it across the horizon), so that the next MPC solve still receives a usable warm start.
The propagate operation maintains a "virtual" trajectory by continuing to evolve the previous solution as if it were still being executed, even when the actual system follows backup control. This forward simulation keeps the warm-start temporally aligned and relevant for when MPC recovers.
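The shift and lift operations are simple array manipulations. In the sketch below, the placeholder `kappa_f` stands in for the terminal controller of the text, and the dimensions are illustrative.

```python
# Warm-start bookkeeping: shift and lift operations (illustrative sketch).
import numpy as np

def shift(u_seq, x_terminal, kappa_f):
    """Move a successful plan forward one step, appending kappa_f at the end."""
    return np.vstack([u_seq[1:], kappa_f(x_terminal)])

def lift(u_backup, N):
    """Extend a single backup action into a full-horizon warm start."""
    return np.tile(u_backup, (N, 1))

kappa_f = lambda x: -0.5 * x[:1]           # placeholder local controller
u_seq = np.arange(5, dtype=float).reshape(5, 1)
shifted = shift(u_seq, np.array([2.0, 0.0]), kappa_f)
print(shifted.ravel())   # [1. 2. 3. 4. -1.]
```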
Consider a continuous stirred tank reactor (CSTR) in which an exothermic reaction must be controlled.
The MPC must keep the reactor temperature below the runaway threshold at all times; this limit is a hard constraint.
When the cooling system partially fails, the available control authority shrinks. The controller must then relax soft objectives, such as operating in the optimal efficiency band, to keep the hard safety constraint satisfiable.
Real-time model predictive control places strict limits on computation. In applications such as adaptive optics, the controller must run at kilohertz rates. A sampling frequency of 1000 Hz allows only one millisecond per step to compute and apply a control input. This makes efficiency a first-class concern.
The structure of MPC lends itself naturally to optimization reuse. Each time step requires solving a problem with the same dynamics and constraints. Only the initial state, forecasts, or reference signals change. Instead of treating each instance as a new problem, we can frame MPC as a parametric optimization problem and focus on how the solution evolves with the parameter.
We begin with a general optimization problem indexed by a parameter $\boldsymbol{\theta}$:

$$
\min_{\mathbf{x}} \; f(\mathbf{x}; \boldsymbol{\theta})
\quad \text{s.t.} \quad \mathbf{g}(\mathbf{x}; \boldsymbol{\theta}) \le \mathbf{0}, \quad \mathbf{h}(\mathbf{x}; \boldsymbol{\theta}) = \mathbf{0}.
$$
For each value of $\boldsymbol{\theta}$, both the optimizer $\mathbf{x}^\star(\boldsymbol{\theta})$ and the optimal value depend on the parameter.
When the problem is smooth and regular, the Karush–Kuhn–Tucker (KKT) conditions characterize optimality:
$$
\begin{aligned}
\nabla_{\mathbf{x}} f(\mathbf{x}; \boldsymbol{\theta})
+ \nabla_{\mathbf{x}} \mathbf{g}(\mathbf{x}; \boldsymbol{\theta})^\top \boldsymbol{\lambda}
+ \nabla_{\mathbf{x}} \mathbf{h}(\mathbf{x}; \boldsymbol{\theta})^\top \boldsymbol{\nu} &= 0, \\
\mathbf{g}(\mathbf{x}; \boldsymbol{\theta}) \le 0, \quad \boldsymbol{\lambda} \ge 0, \quad \lambda_i g_i(\mathbf{x}; \boldsymbol{\theta}) &= 0, \\
\mathbf{h}(\mathbf{x}; \boldsymbol{\theta}) &= 0.
\end{aligned}
$$
If the active set remains fixed over changes in $\boldsymbol{\theta}$, the maps $\boldsymbol{\theta} \mapsto \mathbf{x}^\star(\boldsymbol{\theta})$ and $\boldsymbol{\theta} \mapsto (\boldsymbol{\lambda}^\star, \boldsymbol{\nu}^\star)$ are differentiable.
In linear and quadratic programming, this structure becomes even more tractable. Consider a linear program whose cost vector and right-hand side depend affinely on $\boldsymbol{\theta}$.
Each active set determines a basis and thus a region in parameter space over which the optimizer is an affine function of $\boldsymbol{\theta}$.
Similarly, in strictly convex quadratic programs,
each active set again leads to an affine optimizer, with piecewise-affine global structure and a piecewise-quadratic value function.
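The piecewise-affine structure is easiest to see on the smallest possible parametric QP, $\min_x \tfrac{1}{2}(x-\theta)^2$ subject to $-1 \le x \le 1$ (an illustrative example, not from the text): each active set, lower bound active, none, or upper bound active, contributes one affine piece, and the full solution map is a clip.

```python
# Piecewise-affine solution map of a tiny parametric QP.
import numpy as np
from scipy.optimize import minimize_scalar

def qp_solution(theta):
    """Closed-form parametric optimizer: a clip, i.e., three affine pieces."""
    return np.clip(theta, -1.0, 1.0)

thetas = np.array([-2.0, -0.5, 0.0, 0.7, 3.0])

# Cross-check each closed-form piece against a numerical solve.
for th in thetas:
    res = minimize_scalar(lambda x: 0.5 * (x - th) ** 2,
                          bounds=(-1.0, 1.0), method="bounded")
    assert abs(res.x - qp_solution(th)) < 1e-4

print(qp_solution(thetas))
```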
Parametric programming focuses on the structure of the map $\boldsymbol{\theta} \mapsto \mathbf{x}^\star(\boldsymbol{\theta})$ itself.
We often meet equations of the form $F(\mathbf{y}, \boldsymbol{\theta}) = \mathbf{0}$, where $\mathbf{y}$ collects the unknowns and $\boldsymbol{\theta}$ the parameters. The implicit function theorem (IFT) says that if the Jacobian $\partial F / \partial \mathbf{y}$ is invertible at a solution $(\mathbf{y}^\star, \boldsymbol{\theta}^\star)$, then locally there is a unique, differentiable solution map $\boldsymbol{\theta} \mapsto \mathbf{y}^\star(\boldsymbol{\theta})$ with

$$
\frac{\partial \mathbf{y}^\star}{\partial \boldsymbol{\theta}} = -\left( \frac{\partial F}{\partial \mathbf{y}} \right)^{-1} \frac{\partial F}{\partial \boldsymbol{\theta}}.
$$

In words: if the square Jacobian in $\mathbf{y}$ is nonsingular at a solution, the solution varies smoothly with the parameter, and its sensitivity is obtained by solving one linear system.
Return to the parametric problem above and write the KKT equations, restricted to the active set $\mathcal{A}$, as a single residual
$$
F(y,\boldsymbol{\theta}) \;=\; \begin{bmatrix}
\nabla_{\mathbf{x}} f(\mathbf{x};\boldsymbol{\theta})
+ \nabla_{\mathbf{x}} \mathbf{g}(\mathbf{x};\boldsymbol{\theta})^\top \boldsymbol{\lambda}
+ \nabla_{\mathbf{x}} \mathbf{h}(\mathbf{x};\boldsymbol{\theta})^\top \boldsymbol{\nu} \\
\mathbf{h}(\mathbf{x};\boldsymbol{\theta}) \\
\mathbf{g}_\mathcal{A}(\mathbf{x};\boldsymbol{\theta})
\end{bmatrix} \;=\; \mathbf{0}.
$$
Here $y = (\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\nu})$ collects the primal and dual variables, and $\mathbf{g}_\mathcal{A}$ stacks the active inequality constraints.
To invoke the IFT, we need the Jacobian of $F$ with respect to $y$ to be nonsingular at the solution. Standard sufficient conditions are:
- **LICQ** (Linear Independence Constraint Qualification) at $(\mathbf{x}^\star,\boldsymbol{\theta}^\star)$: the gradients of all active constraints are linearly independent.
- **Second-order sufficiency** on the critical cone: the Lagrangian Hessian is positive definite on feasible directions.
- **Strict complementarity** (optional but convenient): each active inequality has a strictly positive multiplier.
Under these conditions, the KKT matrix $K = \frac{\partial F}{\partial y}(y^\star, \boldsymbol{\theta}^\star)$ is nonsingular. Here $K$ plays the role of the square Jacobian in the implicit function theorem.
The right-hand side sensitivity to parameters is
$$
G \;=\; \frac{\partial F}{\partial \boldsymbol{\theta}}(y^\star,\boldsymbol{\theta}^\star) \;=\;
\begin{bmatrix}
\nabla_{\boldsymbol{\theta}}\nabla_{\mathbf{x}} f
+ \sum_{i\in\mathcal{A}} \lambda_i^\star \nabla_{\boldsymbol{\theta}}\nabla_{\mathbf{x}} g_i
+ \sum_j \nu_j^\star \nabla_{\boldsymbol{\theta}}\nabla_{\mathbf{x}} h_j \\
\nabla_{\boldsymbol{\theta}} \mathbf{h} \\
\nabla_{\boldsymbol{\theta}} \mathbf{g}_\mathcal{A}
\end{bmatrix}_{(\mathbf{x}^\star,\boldsymbol{\theta}^\star)} .
$$
IFT then gives local differentiability of the optimizer and multipliers:

$$
\frac{\partial y^\star}{\partial \boldsymbol{\theta}} = -K^{-1} G.
$$
The formula above is valid as long as the active set $\mathcal{A}$ does not change in a neighborhood of $\boldsymbol{\theta}^\star$; across active-set changes, the solution map remains continuous but is only piecewise differentiable.
In parametric MPC, $\boldsymbol{\theta}$ typically contains the current state and reference signals, so these sensitivities describe how the optimal plan shifts as new measurements arrive.
We start with a smooth root-finding problem $F(y) = 0$. Newton's method iterates

$$
y^{+} = y - \left( \frac{\partial F}{\partial y}(y) \right)^{-1} F(y),
$$

or equivalently solves the linearized system $\frac{\partial F}{\partial y}(y)\, \Delta y = -F(y)$ and sets $y^{+} = y + \Delta y$.
Convergence is local and fast when the Jacobian is nonsingular and the initial guess is close.
Now suppose the root depends on a parameter: $F(y, \theta) = 0$. We want the solution path $y^\star(\theta)$ as $\theta$ varies. At a known solution $(y^\star, \theta^\star)$, differentiate $F(y^\star(\theta), \theta) = 0$ with respect to $\theta$. If $\partial F / \partial y$ is invertible there, then

$$
\frac{d y^\star}{d \theta} = -\left( \frac{\partial F}{\partial y} \right)^{-1} \frac{\partial F}{\partial \theta}.
$$
This is exactly the implicit differentiation formula. Continuation uses it as a predictor:
Then a few corrector steps apply Newton to $F(y, \theta_{\text{new}}) = 0$, starting from the predicted point.
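A scalar example of the predictor-corrector pattern, using an illustrative residual $F(y,\theta) = y^3 + y - \theta$ whose Jacobian in $y$ is always nonsingular:

```python
# Predictor-corrector continuation on a scalar residual (illustrative).
import numpy as np

F   = lambda y, th: y**3 + y - th
Fy  = lambda y, th: 3 * y**2 + 1     # Jacobian in y (always positive)
Fth = lambda y, th: -1.0             # sensitivity to the parameter

y, th = 0.0, 0.0                     # known solution: F(0, 0) = 0
for _ in range(4):
    dth = 0.5
    # Predictor: first-order step along the path (implicit-function formula).
    y = y + dth * (-Fth(y, th) / Fy(y, th))
    th += dth
    # Corrector: a few Newton steps at the new parameter value.
    for _ in range(5):
        y -= F(y, th) / Fy(y, th)

print(th, y, F(y, th))   # theta = 2.0, root near y = 1, residual ~ 0
```

Because the predictor lands close to the new root, the corrector converges in very few Newton steps, which is exactly the warm-starting effect exploited across MPC sampling instants.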
For parametric KKT systems, set $F$ to the KKT residual, $K = \partial F / \partial y$, and $G = \partial F / \partial \theta$.
Continuation then becomes:
- **Predictor:** $y_{\text{pred}} = y^\star + (-K^{-1} G)\, \Delta\theta$.
- **Corrector:** a few Newton/SQP steps on the KKT equations at the new $\theta$.
In MPC, this yields efficient warm starts across time: the parameter (the measured state and references) changes only slightly from one sampling instant to the next, so the previous solution plus a predictor step is an excellent initial guess.
The idea of reusing structure across similar optimization problems is not exclusive to parametric programming. In machine learning, a related concept known as amortized optimization aims to reduce the cost of repeated inference by replacing explicit optimization with a function that has been learned to approximate the solution map. This approach shifts the computational burden from online solving to offline training.
The goal is to construct a function that maps problem parameters directly to (approximate) optimal solutions, trained offline so that online inference reduces to a single forward pass.
Amortized optimization has emerged in several contexts:
- In probabilistic inference, where variational autoencoders (VAEs) amortize the computation of posterior distributions across a dataset.
- In meta-learning, where the objective is to learn a model that generalizes across tasks by internalizing how to adapt.
- In hyperparameter optimization, where learning a surrogate model can guide the search over configuration space efficiently.
This perspective has also begun to influence control. Recent work investigates how to amortize nonlinear MPC (NMPC) policies into neural networks. The training data come from solving many instances of the underlying optimal control problem offline. The resulting neural policy can then be evaluated at a small fraction of the cost of an online NMPC solve.
Compared to explicit MPC, which partitions the parameter space and stores exact solutions region by region, amortized control smooths over the domain by learning an approximate policy globally. It is less precise, but scalable to high-dimensional problems where enumeration of regions is impossible.
Neural network amortization is advantageous due to the expressivity of these models. However, the challenge is ensuring constraint satisfaction and safety, which are hard to guarantee with unconstrained neural approximators. Hybrid approaches attempt to address this by combining a neural warm-start policy with a final projection step, or by embedding the network within a constrained optimization layer. Other strategies include learning structured architectures that respect known physics or control symmetries.
Consider a fixed horizon $N$ and collect everything that varies from one problem instance to the next (the current state, references, and so on) into a parameter vector $\boldsymbol{\theta}$.
The applied action is $\pi^\star(\boldsymbol{\theta}) := \mathbf{u}_0^\star(\boldsymbol{\theta})$. Our goal is to learn a fast surrogate mapping $\hat{\pi}_\phi:\boldsymbol{\theta}\mapsto \hat{\mathbf{u}}_0 \approx \pi^\star(\boldsymbol{\theta})$ that can be evaluated in microseconds, optionally followed by a safety projection layer.
Supervised learning from oracle solutions.
One first samples parameters $\boldsymbol{\theta}^{(i)}$ and solves the corresponding optimal control problems offline to obtain oracle actions $\pi^\star(\boldsymbol{\theta}^{(i)})$. The resulting dataset of pairs $(\boldsymbol{\theta}^{(i)}, \pi^\star(\boldsymbol{\theta}^{(i)}))$ is then used to train a neural network by regression.
Once trained, the network acts as a surrogate for the optimizer, providing instantaneous evaluations that approximate the MPC law.
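In miniature, the pipeline looks as follows. The "oracle" here is an illustrative linear control law standing in for an expensive MPC solve, so plain least-squares regression recovers it exactly; for a genuine NMPC law one would substitute a neural network for the linear regressor, but the sample-solve-fit structure is identical.

```python
# Supervised amortization in miniature (linear oracle, least-squares surrogate).
import numpy as np

rng = np.random.default_rng(1)
K = np.array([[0.8, 1.2]])                   # oracle gain (stand-in for MPC)
oracle = lambda th: -(K @ th.T).T            # "expensive" solve per parameter

thetas = rng.normal(size=(200, 2))           # sampled problem parameters
actions = oracle(thetas)                     # offline oracle solutions

# Train the surrogate: least-squares fit of theta -> u.
W, *_ = np.linalg.lstsq(thetas, actions, rcond=None)

theta_new = np.array([[1.0, -0.5]])
print(theta_new @ W, oracle(theta_new))      # surrogate vs. oracle
```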
This problem explores the control of propofol infusion in total intravenous anesthesia (TIVA). Our presentation follows the problem formulation developed by {cite:t}`Sawaguchi2008`. The primary objective is to maintain the desired level of unconsciousness while minimizing adverse reactions and ensuring quick recovery after surgery.
The level of unconsciousness is measured by the Bispectral Index (BIS), which is obtained using an electroencephalography (EEG) device. The BIS ranges from 0 (no cerebral activity) to 100 (fully awake); values between 40 and 60 are generally considered appropriate for general anesthesia.
The goal is to design a control system that regulates the infusion rate of propofol to maintain the BIS within the target range. This can be formulated as an optimal control problem:
Where:

- $u(t)$ is the propofol infusion rate (mg/kg/h)
- $x_1$, $x_2$, and $x_3$ are the drug concentrations in different body compartments
- $x_e$ is the effect-site concentration
- $k_{ij}$ are rate constants for drug transfer between compartments
- $BIS(t)$ is the Bispectral Index
- $\lambda$ is a regularization parameter penalizing excessive drug use
- $E_0$, $E_{\text{max}}$, $EC_{50}$, and $\gamma$ are parameters of the pharmacodynamic model
The specific dynamics model used in this problem is the so-called pharmacokinetic-pharmacodynamic (PK/PD) model and consists of three main components:
1. **Pharmacokinetic model**, which describes how the drug distributes through the body over time. It is based on a three-compartment model:
   - Central compartment (blood and well-perfused organs)
   - Shallow peripheral compartment (muscle and other tissues)
   - Deep peripheral compartment (fat)
2. **Effect-site model**, which represents the delay between drug concentration in the blood and its effect on the brain.
3. **Pharmacodynamic model**, which relates the effect-site concentration to the observed BIS.
The propofol infusion control problem presents several interesting challenges from a research perspective. First, there are delays: the drug takes time to move between body compartments, and the BIS measurement itself lags the true effect. These delays can lead to instability if not properly addressed in the control design.
Furthermore, every patient is different. Hence, we cannot simply learn a single controller offline and hope that it will generalize to an entire patient population. We account for this variability through Model Predictive Control (MPC), dynamically adapting to model mismatch through replanning. How a patient reacts to a given dose also varies, and dosing must be carefully controlled to avoid overdose; this adds an additional layer of complexity, since we have to incorporate safety constraints. Finally, the patient's state might suddenly change, for example due to surgical stimuli, and the controller must adapt quickly to compensate for the disturbance to the system.
:tags: [hide-input]
# label: fig-mpc-propofol
# caption: Closed-loop MPC for propofol infusion keeps the Bispectral Index near the target (top), regulates infusion rates (middle), and tracks the effect-site concentration (bottom).
%config InlineBackend.figure_format = 'retina'
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt
# Apply book style
try:
import scienceplots
plt.style.use(['science', 'notebook'])
except (ImportError, OSError):
pass # Use matplotlib defaults
class Patient:
def __init__(self, age, weight):
self.age = age
self.weight = weight
self.set_pk_params()
self.set_pd_params()
    def set_pk_params(self):
        # Compartment volumes (L) and clearances (L/min), scaled by patient
        # weight and age following a Schnider-style covariate model
self.v1 = 4.27 * (self.weight / 70) ** 0.71 * (self.age / 30) ** (-0.39)
self.v2 = 18.9 * (self.weight / 70) ** 0.64 * (self.age / 30) ** (-0.62)
self.v3 = 238 * (self.weight / 70) ** 0.95
self.cl1 = 1.89 * (self.weight / 70) ** 0.75 * (self.age / 30) ** (-0.25)
self.cl2 = 1.29 * (self.weight / 70) ** 0.62
self.cl3 = 0.836 * (self.weight / 70) ** 0.77
self.k10 = self.cl1 / self.v1
self.k12 = self.cl2 / self.v1
self.k13 = self.cl3 / self.v1
self.k21 = self.cl2 / self.v2
self.k31 = self.cl3 / self.v3
self.ke0 = 0.456
def set_pd_params(self):
self.E0 = 100
self.Emax = 100
self.EC50 = 3.4
self.gamma = 3
def pk_model(x, u, patient):
    # Three-compartment pharmacokinetics plus first-order effect-site dynamics
x1, x2, x3, xe = x
dx1 = -(patient.k10 + patient.k12 + patient.k13) * x1 + patient.k21 * x2 + patient.k31 * x3 + u / patient.v1
dx2 = patient.k12 * x1 - patient.k21 * x2
dx3 = patient.k13 * x1 - patient.k31 * x3
dxe = patient.ke0 * (x1 - xe)
return np.array([dx1, dx2, dx3, dxe])
def pd_model(ce, patient):
return patient.E0 - patient.Emax * (ce ** patient.gamma) / (ce ** patient.gamma + patient.EC50 ** patient.gamma)
def simulate_step(x, u, patient, dt):
x_next = x + dt * pk_model(x, u, patient)
bis = pd_model(x_next[3], patient)
return x_next, bis
def objective(u, x0, patient, dt, N, target_bis):
    # Roll the model out over the horizon, accumulating squared BIS tracking
    # error plus a quadratic penalty on drug use (regularization weight 0.1)
    x = x0.copy()
    total_cost = 0
    for i in range(N):
        x, bis = simulate_step(x, u[i], patient, dt)
        total_cost += (bis - target_bis)**2 + 0.1 * u[i]**2
    return total_cost
def mpc_step(x0, patient, dt, N, target_bis):
u0 = 10 * np.ones(N) # Initial guess
bounds = [(0, 20)] * N # Infusion rate between 0 and 20 mg/kg/h
result = minimize(objective, u0, args=(x0, patient, dt, N, target_bis),
method='SLSQP', bounds=bounds)
return result.x[0] # Return only the first control input
def run_mpc_simulation(patient, T, dt, N, target_bis):
    steps = int(T / dt)
    x = np.zeros((steps+1, 4))
    bis = np.zeros(steps+1)
    u = np.zeros(steps)
    bis[0] = pd_model(x[0, 3], patient)  # awake baseline before any drug
    for i in range(steps):
        # Add noise to the current state to simulate real-world measurement uncertainty
        x_noisy = x[i] + np.random.normal(0, 0.01, size=4)
        # Use the noisy state for MPC planning
        u[i] = mpc_step(x_noisy, patient, dt, N, target_bis)
        # Evolve the true state using the deterministic model
        x[i+1], bis[i+1] = simulate_step(x[i], u[i], patient, dt)
    return x, bis, u
# Set up the problem
patient = Patient(age=40, weight=70)
T = 120 # Total time in minutes
dt = 0.5 # Time step in minutes
N = 20 # Prediction horizon
target_bis = 50 # Target BIS value
# Run MPC simulation
x, bis, u = run_mpc_simulation(patient, T, dt, N, target_bis)
# Plot results
t = np.arange(0, T+dt, dt)
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(10, 12), sharex=True)
ax1.plot(t, bis)
ax1.set_ylabel('BIS')
ax1.set_ylim(0, 100)
ax1.axhline(y=target_bis, color='r', linestyle='--')
ax2.plot(t[:-1], u)
ax2.set_ylabel('Infusion Rate (mg/kg/h)')
ax3.plot(t, x[:, 3])
ax3.set_ylabel('Effect-site Concentration (µg/mL)')
ax3.set_xlabel('Time (min)')
plt.tight_layout()
print(f"Initial BIS: {bis[0]:.2f}")
print(f"Final BIS: {bis[-1]:.2f}")
print(f"Mean infusion rate: {np.mean(u):.2f} mg/kg/h")
print(f"Final effect-site concentration: {x[-1, 3]:.2f} µg/mL")
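# The covariate scaling above means PK parameters, and hence the optimal
# infusion policy, differ across patients. As a quick sketch (reusing only
# the set_pk_params scaling formulas, with hypothetical patient values),
# compare the elimination rate constant k10 for two patients:
def k10_for(age, weight):
    v1 = 4.27 * (weight / 70) ** 0.71 * (age / 30) ** (-0.39)
    cl1 = 1.89 * (weight / 70) ** 0.75 * (age / 30) ** (-0.25)
    return cl1 / v1

print(f"k10 at (25 y, 60 kg): {k10_for(25, 60):.4f} 1/min")
print(f"k10 at (70 y, 80 kg): {k10_for(70, 80):.4f} 1/min")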