Skip to content

Latest commit

 

History

History
89 lines (65 loc) · 4.68 KB

notes.md

File metadata and controls

89 lines (65 loc) · 4.68 KB

To-do

  • Test acrobot with constraints enforced in forward pass. Why doesn't it work?

    • Bug: in forward_pass if we use us (clamped) instead of u_curr (unclamped), we can't solve the problem. Should we even have to clamp a lot? Isn't boxQP supposed to take care of this? See lines for clamping in forward_pass()
      • K matrix in direction of clamped inputs should be zero. TODO check this
      • limits are crossed without applying control gains (just feedforward), but only by a bit - TODO check K matrix
  • Fix calculate_cxu - why do we calculate the final one?

  • Optimize

    • run in MPC mode on perturbed model - how fast is it with warm-start?
    • speed up derivatives - openmp currently turned off because it was leading to some non-deterministic (?) behavior. See #pragma lines in derivatives.cpp
    • try callgrind (on linux; having some issues on laptop)
    • replace virtual function call to dynamics with template/policy
    • Speed up backward pass - what else can be done?
    • parallelize line search with openmp? how often does it end on the first few iterations?
    • Turn off assertions in a clean way
  • Implement other methods, compare

  • Save notes of derivations somewhere

  • Read extension papers - parallel (ethz thesis)

  • Solidify understanding - read thru notes and lectures again

  • clean up code and formatting

  • python script for plotting output - finalize with commandline arguments

  • additional documentation for future me


  • Try a more complex model - mujoco?
  • debuggers: gdb, rr, sublime integration
  • remove error codes - use enums

Algorithm

Questions
  • Connection between DDP and backprop
  • Understand pros/cons vs other methods.
Pros/cons vs other trajectory optimization algorithms
  • iLQR/DDP searches in space of control trajectories; it solves an $m$-dimensional problem $N$ times instead of solving a single problem of size $mN$ (as a direct method would). So complexity is $O(Nm^3)$ vs. $O(N^3m^3)$.
    • Note that this ignores the cost of calculating the derivatives - this takes up the bulk of the time in iLQR if finite-differences is used. Ideally, we would have analytic derivatives of dynamics model.
    • This also ignores methods for explotining sparsity in direct methods
  • Intermediate results of shooting methods (of which iLQR is one) are always strictly feasible and "dynamics constraints" are unnecessary.
  • Direct methods tend to find optima easier than indirect/shooting methods. Shooting methods are much more sensitive to local minima.
  • Shooting methods make it easy to use warm-starts.
  • Another advantage of iLQR is that it finds a set of local linear controllers around the desired trajectory, which can be used to keep the trajectory from drifting

From Tassa 2014: "The direct approach discards the temporal structure and is forced to search in a constrained space which is slower, however it is far easier to find better optima through continuation. The indirect approach is faster and better suited for warm-starting, but is far more sensitive to local minima."

Misc

  • For iLQR, undesired states are usually encoded as repulsive "potential fields" in the cost function. We can't do this for the control input limits, which is why the boxQP is necessary (the "control-limited" part of the paper title)

Misc notes

  • cmake -DCMAKE_BUILD_TYPE=Debug .. to compile with debug flags
  • Find TODO:
grep -nr TODO src
grep -n TODO include/*

References

Other implementations
Misc