
RL Path Planning#117

Draft
MinhxNguyen7 wants to merge 69 commits into main from path-planning

Conversation

MinhxNguyen7 (Contributor) commented Feb 7, 2025

Description

  • Implementation of path planning using model-free reinforcement learning.
    • Model-free RL is necessary because the trajectory and collision cost functions are not differentiable w.r.t. the actor model (although this conclusion should be challenged).
  • TD3, PPO, and SAC will be tried and compared.
    • The training methods should be interchangeable when using stable_baselines3 and gymnasium.
  • The system outputs waypoints which can either be passed to ArduPilot or to our own controller.
  • This can be used in the future to solve other problems with RL, e.g., mission planning.

Technical Details

State Space

  • Current velocity, acceleration, and jerk.
  • Destination location relative to the current position.
  • Some representation of obstacles.

Action Space

  • The action space is the position delta and velocity between the current and next waypoint.
  • This can then be added to the current position and velocity to be used as world-frame waypoints.
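
A minimal sketch of that transformation; the flat action layout is an assumption, not something the PR fixes:

```python
import numpy as np

def action_to_waypoint(position, velocity, action):
    """Turn a policy action into an absolute world-frame waypoint.

    `action` is assumed to be a flat vector [position delta (3),
    velocity delta (3)] in world coordinates.
    """
    next_position = position + action[:3]
    next_velocity = velocity + action[3:6]
    return next_position, next_velocity

pos, vel = np.zeros(3), np.array([1.0, 0.0, 0.0])
wp_pos, wp_vel = action_to_waypoint(
    pos, vel, np.array([0.5, 0.0, 0.1, 0.2, 0.0, 0.0]))
```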

Reward Function

The reward function is a weighted sum of the functions below.

Trajectory Cost

  • This is an estimate of the "control effort."
  • We estimate this with the weighted sum of velocity, acceleration, and snap.
  • Since we don't actually know the trajectory that our controller will generate, we generate a smooth trajectory by minimizing snap as a proxy.
  • The min-snap trajectory is piecewise-polynomial, so we can find its derivatives and integrate over their squares (to keep the values positive).
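
The "integrate over the squared derivatives" step can be sketched with numpy's polynomial utilities; a real min-snap trajectory would repeat this per polynomial piece and per axis:

```python
import numpy as np
from numpy.polynomial import Polynomial

def derivative_energy(coeffs, order, t0, t1):
    """Integral of the squared order-th derivative of a polynomial
    trajectory segment over [t0, t1], e.g. order=1 for velocity,
    2 for acceleration, 4 for snap."""
    d = Polynomial(coeffs).deriv(order)
    sq = d * d          # squaring keeps the integrand non-negative
    anti = sq.integ()   # antiderivative of the squared derivative
    return anti(t1) - anti(t0)

# p(t) = t^2 -> velocity 2t, squared 4t^2, integral over [0, 1] = 4/3.
e = derivative_energy([0.0, 0.0, 1.0], order=1, t0=0.0, t1=1.0)
```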

Collision Cost

  • The perception of obstacles can be based on raw particles from particle filtering or aggregated Gaussian tracks.

Raw Particles

  • We can pass a sample of raw particles to the models as part of the state.
  • The collision cost would be the minimum distance of the trajectory to each particle.
  • The function of the distance between the trajectory and a particle w.r.t. time is simple.
    • Since each particle has position, velocity, and maybe acceleration, their motion is polynomial over time.
    • Since the trajectory is also a polynomial over time, the function of the distance between a particle and the planned trajectory can be computed by subtracting the functions, i.e., subtracting the polynomial coefficients.
  • However, finding the minimum of that distance function is somewhat non-trivial.
    • We'll just sample it.
  • This allows us to skip track aggregation, but it increases sensitivity to changes in particle distribution.
    • This can happen if we tweak the algorithm or change the detection model.
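
A sketch of that sampled minimum distance, assuming constant-velocity particles and ascending polynomial coefficients per axis (an acceleration estimate would add one more coefficient):

```python
import numpy as np

def min_distance_sampled(traj_coeffs, particle_pos, particle_vel,
                         t1, n_samples=64):
    """Minimum distance between a polynomial trajectory and a
    constant-velocity particle over [0, t1], found by sampling.

    traj_coeffs: float array of shape (3, k); row i holds the ascending
    polynomial coefficients of axis i of the planned trajectory.
    """
    # The particle's motion is itself a degree-1 polynomial, so
    # subtracting coefficients gives the relative-position polynomial.
    k = traj_coeffs.shape[1]
    rel = traj_coeffs.copy()
    rel[:, 0] -= particle_pos
    rel[:, 1] -= particle_vel
    t = np.linspace(0.0, t1, n_samples)
    powers = t[None, :] ** np.arange(k)[:, None]  # (k, n) Vandermonde-style
    positions = rel @ powers                      # relative position, (3, n)
    return float(np.min(np.linalg.norm(positions, axis=0)))

# Example: stationary trajectory at the origin vs. a particle flying through it.
traj = np.zeros((3, 3))  # zero polynomial on each axis
d = min_distance_sampled(traj, np.array([1.0, 0.0, 0.0]),
                         np.array([-1.0, 0.0, 0.0]), t1=2.0, n_samples=65)
```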

Gaussian Tracks

  • We need to aggregate tracks first, which is highly non-trivial and difficult to parallelize.
  • Training another NN might be an option.
  • This needs more thought.

Distance/Time to Target

  • The distance to the target is trivial to compute, as well as differentiable.
    • Therefore, this can skip the critic.
  • We need some cost to encourage time efficiency.
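
Put together, the overall reward might look like the following; the weights are placeholders, since the PR leaves the actual weighting to tuning:

```python
# Hypothetical weights; the PR leaves the actual weighting to tuning.
W_TRAJ, W_COLLISION, W_DIST, W_TIME = 0.1, 10.0, 1.0, 0.05

def reward(traj_cost, collision_cost, dist_to_target, elapsed_time):
    """Weighted sum of the terms above. Costs enter negatively, and a
    per-second time penalty encourages finishing quickly."""
    return -(W_TRAJ * traj_cost
             + W_COLLISION * collision_cost
             + W_DIST * dist_to_target
             + W_TIME * elapsed_time)

r = reward(traj_cost=2.0, collision_cost=0.0,
           dist_to_target=1.0, elapsed_time=4.0)
```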

Tasks

  • Define algorithm(s) and work breakdown.
    • Actor-Critic setup.
    • Reward functions.
      • Trajectory cost.
      • Collision cost.
      • Distance to target reward.
      • Time efficiency cost/reward.
  • Implement min-snap trajectory generation.
  • Implement reward functions.
    • Trajectory cost.
      • Non-linearity to encode velocity/acceleration/jerk limits (optional).
    • Collision cost.
    • Distance/time to target.
  • Implement training system and models.
    • Define and implement gymnasium.Env (representation of environment, i.e., inputs and outputs).
      • State (self trajectory, simulation of other drones).
        • Particle filtering simulation (optional).
      • State-to-observation transformation.
      • Integration/calculation of reward function.
      • Action-state transition.
    • Define and implement models.
      • Actor.
      • Critic.
      • Feature extractor(s).
        • Network(s) shared between actor and critic to comprehend the observation space.
        • Look into deep learning for 3D point clouds/LiDAR.
    • Training setup.
    • Logging and visualizations.
  • Train and tune.
    • Try DDPG.
    • Try PPO.
    • Try SAC.

Test Plan

  • TBA

Issues

Resources

RL Algorithms

Trajectory Interpolation

Libraries

Point-Cloud Feature Extraction

MinhxNguyen7 changed the title Path planning DDPG Path Planning Feb 12, 2025
EricPedley (Member) commented Feb 12, 2025

Since we don't actually know the trajectory that our controller will generate, we generate a smooth trajectory by minimizing snap as a proxy.

Does whatever we're using to generate the min-snap trajectories also take into account the collision avoidance cost? I don't get how using the control cost for a min-snap trajectory helps us here.

Also, on the note of track association, if we figure it out I don't think we need RL, we can use graph of convex sets: https://underactuated.mit.edu/trajopt.html#example9. And for doing track association it might be worth looking into methods for lidar object detection because that's basically the same problem as track association for the particle filter. IDK how hard you expect doing this with RL to be, but if you wanna consider more options before going ahead it might be worth looking into.

MinhxNguyen7 (Author) commented Feb 12, 2025

Does whatever we're using to generate the min-snap trajectories also take into account the collision avoidance cost?

No, but the polynomial trajectory will be used to compute the collision cost, so as long as the min-snap interpolation roughly represents the real-life path of the drone if given those granular waypoints, it should be fine. I.e., the model will learn to work around the interpolator.

if we figure it out I don't think we need RL, we can use graph of convex sets

I've only skimmed the chapter, but these methods only work for static obstacles, right? If so, it would require us to both do track association and form convex regions for the trajectory to optimize around.

it might be worth looking into methods for lidar object detection because that's basically the same problem as track association for the particle filter

I'll take a look, but I think the most promising approach would be to use a NN. Do note, though, that lidar and particle filtering result in substantially different point distributions.

IDK how hard you expect doing this with RL to be, but if you wanna consider more options before going ahead it might be worth looking into.

I don't anticipate the RL being more difficult than setting up these optimizations, especially considering the fact that they're even further outside of my wheelhouse.

Dat-Bois (Contributor) commented Feb 12, 2025

Looks good to me. I'm a bit worried about computational effort, since we may want some form of MPC (most likely very simple local-horizon planning): when going off track, it may be more optimal to regenerate the trajectory from that point than to maneuver back onto the original track. Maybe we can characterize the vehicle dynamics in the cost functions (i.e., inertia matrix, and hard velocity/acceleration constraints; not sure if you had hard constraints), which could produce a more physically trackable trajectory and avoid MPC.

I'm not super familiar with using RL for trajectory generation so it may be much faster than I expect.

MinhxNguyen7 (Author) commented Feb 13, 2025

worried a bit about computational effort

The training might be a little bit computationally involved, but the actor (which generates the waypoints) will be a relatively small neural network which should be much smaller than YOLO. I wouldn't expect generating a whole trajectory with many points to take more time than one image detection. The plan is to continuously regenerate the trajectory from the current position to the destination.

Maybe we can characterize the vehicle dynamics into the cost functions (ie inertial matrix, velocity and accel hard constraints (not sure if you had hard constraints))

Yes, the "control cost" will be a weighted sum of the velocity and acceleration. I can make the cost non-linear to reflect hard-ish limits, which I think would be better than a cost-cliff.
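
One option for that non-linearity (purely illustrative; the actual shape is open) is a softplus-style penalty that stays near zero below the limit and grows roughly linearly past it:

```python
import numpy as np

def soft_limit_cost(value, limit, sharpness=10.0):
    """Smooth penalty: ~0 below `limit`, ~linear growth above it.
    A differentiable alternative to a hard cost cliff."""
    # softplus((|value| - limit) * sharpness) / sharpness,
    # computed via logaddexp for numerical stability.
    x = (abs(value) - limit) * sharpness
    return float(np.logaddexp(0.0, x)) / sharpness

low = soft_limit_cost(1.0, limit=5.0)    # well under the limit: ~0
high = soft_limit_cost(10.0, limit=5.0)  # 5 over the limit: ~5
```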

EricPedley marked this pull request as draft May 2, 2025 15:43