FW-MAV Optimal Control: BBO & Deep Reinforcement Learning

Welcome to the repository for my advanced aerospace control systems project. This research focuses on bridging the gap between traditional optimization and modern AI by designing optimal flight controllers for Flapping Wing Micro Air Vehicles (FW-MAV).

Part 1: The Problem Statement

Objective: To design an optimal flight controller for an FW-MAV using both Black Box Optimization (BBO) and Deep Reinforcement Learning (DRL). The drone must navigate from an origin point A, where $\mathbf{x}(0) = [0, 0]$, to a target point B, where $\mathbf{x}_B = [20, 15]$ m.

Environmental Dynamics & Constraints

  • The Drone: Point-mass of $m = 10$ g.
  • Actuation: A propulsive force bounded by $F_P \in [0, 0.7]$ N, which can be vectored at an angle $\alpha \in [-180^\circ, 180^\circ]$. The propulsive force vector is defined as:

$$\mathbf{F}_P = F_P [\sin(\alpha), \cos(\alpha)]^T$$

  • External Forces: Gravity ($g = 9.81 \text{ m/s}^2$) and a constant horizontal wind of $U_\infty = 10 \text{ m/s}$. The gravity vector is:

$$\mathbf{F}_G = [0, -mg]$$

  • Aerodynamics: Drag is proportional to the square of the relative velocity between the drone and the wind. Given $c_D = 0.001 \text{ N s}^2/\text{m}^2$ and wind vector $\mathbf{v}_w = [U_\infty, 0]$, the drag force is:

$$\mathbf{F}_D = c_D (\mathbf{v}_w - \mathbf{v}(t)) |\mathbf{v}_w - \mathbf{v}(t)|$$

  • System Dynamics: The flight path is governed by the following second-order differential equation:

$$m \frac{d^2\mathbf{x}}{dt^2} = \mathbf{F}_P + \mathbf{F}_G + \mathbf{F}_D$$
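To make the force balance concrete, here is a minimal Python sketch of the resulting acceleration. The constants come directly from the problem statement; the function name and structure are illustrative, not the repository's actual code.

```python
import numpy as np

# Constants from the problem statement
M = 0.010                        # mass [kg] (10 g)
G = 9.81                         # gravitational acceleration [m/s^2]
C_D = 0.001                      # drag coefficient [N s^2/m^2]
V_W = np.array([10.0, 0.0])      # wind vector [U_inf, 0] in m/s

def acceleration(v, F_P, alpha):
    """Right-hand side of m * x'' = F_P + F_G + F_D, divided by m.

    v     -- current velocity [vx, vy] in m/s
    F_P   -- thrust magnitude, bounded to [0, 0.7] N
    alpha -- thrust vectoring angle in radians
    """
    thrust = F_P * np.array([np.sin(alpha), np.cos(alpha)])
    gravity = np.array([0.0, -M * G])
    v_rel = V_W - v                               # wind velocity relative to the drone
    drag = C_D * v_rel * np.linalg.norm(v_rel)    # quadratic drag along the relative wind
    return (thrust + gravity + drag) / M
```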

The Goal: The flight policy must balance three competing objectives: reaching the target as quickly as possible, consuming minimal energy, and arriving with near-zero velocity to ensure a safe landing.

FW-MAV Free Body Diagram


Part 2: The 5-Step Execution Plan

Step 1: Simulating the Physical Environment

Before applying any intelligent control, a rigorous physical simulation must be established.

Action: Implement the system of Ordinary Differential Equations (ODEs) defining the drone's dynamics in Python using scipy.integrate.

Validation: Verify the environment against theory by testing static control limits, for example checking the terminal vertical velocity in free fall against its analytical value.
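A minimal sketch of such a simulation with scipy.integrate.solve_ivp, assuming a state vector [x, y, vx, vy] and a fixed control input; this is not the repository's actual code, and the free-fall case mirrors the terminal velocity check described above.

```python
import numpy as np
from scipy.integrate import solve_ivp

M, G, C_D = 0.010, 9.81, 0.001       # mass [kg], gravity [m/s^2], drag coeff [N s^2/m^2]
V_W = np.array([10.0, 0.0])          # wind vector [m/s]

def rhs(t, state, F_P, alpha):
    """ODE right-hand side for the state [x, y, vx, vy]."""
    v = state[2:]
    thrust = F_P * np.array([np.sin(alpha), np.cos(alpha)])
    gravity = np.array([0.0, -M * G])
    v_rel = V_W - v
    drag = C_D * v_rel * np.linalg.norm(v_rel)
    return np.concatenate([v, (thrust + gravity + drag) / M])

# Validation: with zero thrust the drone falls until drag balances weight,
# so the vertical speed should approach v_t = sqrt(m * g / c_D) ≈ 9.9 m/s.
sol = solve_ivp(rhs, (0.0, 30.0), [0.0, 0.0, 0.0, 0.0], args=(0.0, 0.0), max_step=0.01)
print(sol.y[3, -1], -np.sqrt(M * G / C_D))   # simulated vs theoretical vertical velocity
```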

Step 2: Formulating the Reward Function

In optimal control and RL, the agent learns entirely through the reward signal.

Action: Define a continuous cost function that penalizes energy consumption, long flight times, distance to the target, and high landing velocities. The reward function $R(\mathbf{w})$ is defined over episode duration $T$ as:

$$R(\mathbf{w}) = -c_1 \int_0^T F_P^2(t)\, dt - c_2 t_F^2 - c_3 \|\mathbf{x}(T) - \mathbf{x}_B\|_2^2 - c_4 v_F$$
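A sketch of this reward evaluated on one simulated episode, assuming the simulator returns sampled time, thrust, position, and velocity histories; the default coefficient values are illustrative placeholders for the tuning discussed below.

```python
import numpy as np

def episode_reward(t, F_P, x, v, x_B, c=(1.0, 0.01, 1.0, 1.0)):
    """Scalar reward R(w) for one simulated episode.

    t   -- sample times; t[-1] is the flight time t_F
    F_P -- thrust magnitude at each sample time
    x   -- positions, shape (len(t), 2)
    v   -- velocities, shape (len(t), 2)
    x_B -- target position, here [20, 15] m
    c   -- penalty coefficients (c1, c2, c3, c4)
    """
    c1, c2, c3, c4 = c
    # Trapezoidal approximation of the energy integral ∫ F_P^2 dt
    energy = np.sum(0.5 * (F_P[1:]**2 + F_P[:-1]**2) * np.diff(t))
    t_F = t[-1]
    miss = np.sum((x[-1] - x_B)**2)       # squared distance to the target at time T
    v_F = np.linalg.norm(v[-1])           # landing speed
    return -c1 * energy - c2 * t_F**2 - c3 * miss - c4 * v_F
```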

Challenge: The penalty coefficients ($c_1, c_2, c_3, c_4$) must be balanced carefully to prevent degenerate behaviors, for example crashing into the target at high speed in order to minimize flight time.

Step 3: Baseline Solution via Black Box Optimization (BBO)

To benchmark the Neural Network, the problem is first solved using a parameterized approach.

Action: Define a polynomial control policy in which the thrust is a function of the drone's displacement from the target $\mathbf{x}_B$:

$$\mathbf{F}_P = w_1(\mathbf{x}(t) - \mathbf{x}_B) + w_2(\mathbf{x}(t) - \mathbf{x}_B)^2 + w_3(\mathbf{x}(t) - \mathbf{x}_B)^3 + \dots$$

Optimization: Use gradient-free algorithms (such as Nelder-Mead) or gradient-based methods (such as BFGS with gradients estimated by finite differences) to find the static polynomial weights $\mathbf{w}$ that yield the highest reward.
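A minimal sketch of this BBO loop with scipy.optimize.minimize. The simulate helper below is a compact illustrative rollout (forward Euler integration, fixed penalty coefficients, element-wise cubic policy), not the repository's implementation.

```python
import numpy as np
from scipy.optimize import minimize

M, G, C_D = 0.010, 9.81, 0.001
V_W = np.array([10.0, 0.0])
X_B = np.array([20.0, 15.0])

def simulate(w, dt=0.01, t_max=10.0):
    """Roll out the cubic polynomial policy and return the episode reward."""
    x, v, energy = np.zeros(2), np.zeros(2), 0.0
    for _ in range(int(t_max / dt)):
        e = x - X_B                                   # displacement from the target
        F = w[0] * e + w[1] * e**2 + w[2] * e**3      # element-wise polynomial policy
        F_mag = min(np.linalg.norm(F), 0.7)           # respect F_P in [0, 0.7] N
        F = F_mag * F / (np.linalg.norm(F) + 1e-12)
        v_rel = V_W - v
        a = (F + np.array([0.0, -M * G]) + C_D * v_rel * np.linalg.norm(v_rel)) / M
        x = x + v * dt                                # forward Euler step
        v = v + a * dt
        energy += F_mag**2 * dt
    # Illustrative penalty coefficients c1..c4
    return -1.0 * energy - 0.01 * t_max**2 - 1.0 * np.sum((x - X_B)**2) - 1.0 * np.linalg.norm(v)

# Gradient-free search: minimizing -R(w) maximizes the reward
res = minimize(lambda w: -simulate(w), x0=np.zeros(3), method="Nelder-Mead")
print(res.x, -res.fun)   # best weights and the reward they achieve
```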

Step 4: Advanced Control via Deep Reinforcement Learning (DRL)

The controller is then upgraded from a static equation to an adaptable Neural Network.

Action: Wrap the validated physics simulator in a standard Gymnasium environment.

Algorithm: Train an agent with Proximal Policy Optimization (PPO), chosen for its training stability and native support for continuous action spaces.
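A condensed sketch of such an environment and training loop, using, for example, the stable-baselines3 implementation of PPO; the class, reward shaping, and hyperparameters below are illustrative assumptions, not the repository's code.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class FWMAVEnv(gym.Env):
    """Minimal Gymnasium wrapper around the point-mass dynamics (illustrative)."""

    def __init__(self, dt=0.02, t_max=10.0):
        super().__init__()
        # Action: thrust magnitude in [0, 0.7] N and vectoring angle in [-pi, pi]
        self.action_space = spaces.Box(low=np.array([0.0, -np.pi]),
                                       high=np.array([0.7, np.pi]), dtype=np.float32)
        # Observation: position and velocity [x, y, vx, vy]
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        self.dt, self.t_max = dt, t_max

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state, self.t = np.zeros(4, dtype=np.float32), 0.0
        return self.state.copy(), {}

    def step(self, action):
        F_P, alpha = float(action[0]), float(action[1])
        x, v = self.state[:2], self.state[2:]
        v_rel = np.array([10.0, 0.0]) - v
        a = (F_P * np.array([np.sin(alpha), np.cos(alpha)])
             + np.array([0.0, -0.010 * 9.81])
             + 0.001 * v_rel * np.linalg.norm(v_rel)) / 0.010
        v = v + a * self.dt
        x = x + v * self.dt
        self.state = np.concatenate([x, v]).astype(np.float32)
        self.t += self.dt
        # Dense shaping reward: penalize energy use and distance to the target
        reward = -F_P**2 * self.dt - 0.01 * np.sum((x - [20.0, 15.0])**2) * self.dt
        terminated = bool(np.linalg.norm(x - [20.0, 15.0]) < 0.5)
        truncated = self.t >= self.t_max
        return self.state.copy(), float(reward), terminated, truncated, {}

model = PPO("MlpPolicy", FWMAVEnv(), verbose=1)
model.learn(total_timesteps=100_000)
```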

Step 5: Results and Trajectory Analysis

The final evaluation compares the two approaches.

Action: Plot the flight trajectories generated by the BBO policy and the DRL policy on a 2D graph. These outputs are saved in the Figures folder, starting with Figure_10.png for this analysis block.

Metrics: Compare the two controllers on flight time ($t_F$), total energy consumed ($\int F_P^2 \, dt$), and terminal velocity ($v_F$) to determine which delivers the better overall flight profile.
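A sketch of the comparison plot, assuming each policy's rollout has been exported as an (N, 2) array of positions; the CSV file names are placeholders, not files in this repository, and the Figures folder is assumed to exist.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data sources: one [x, y] row per time step for each policy
bbo_path = np.loadtxt("bbo_trajectory.csv", delimiter=",")
drl_path = np.loadtxt("drl_trajectory.csv", delimiter=",")

plt.plot(bbo_path[:, 0], bbo_path[:, 1], label="BBO (polynomial policy)")
plt.plot(drl_path[:, 0], drl_path[:, 1], label="DRL (PPO)")
plt.scatter([0, 20], [0, 15], c=["green", "red"], zorder=3)  # origin A and target B
plt.xlabel("x [m]")
plt.ylabel("y [m]")
plt.legend()
plt.title("BBO vs. DRL flight trajectories")
plt.savefig("Figures/Figure_10.png", dpi=150)
```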
