Welcome to the repository for my advanced aerospace control systems project. This research focuses on bridging the gap between traditional optimization and modern AI by designing optimal flight controllers for Flapping Wing Micro Air Vehicles (FW-MAV).
## Objective
To design an optimal flight controller for an FW-MAV using both Black Box Optimization (BBO) and Deep Reinforcement Learning (DRL). The drone must navigate from an origin point to a designated target while contending with gravity, aerodynamic drag, and a constant headwind.
## Environmental Dynamics & Constraints
- **The Drone:** A point mass of $m = 10$ g.
- **Actuation:** A propulsive force bounded by $F_P \in [0, 0.7]$ N, which can be vectored at an angle $\alpha \in [-180^\circ, 180^\circ]$. The propulsive force vector is:

$$\mathbf{F}_P = F_P \begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix}$$

- **External Forces:** Gravity ($g = 9.81 \text{ m/s}^2$) and a constant horizontal wind of $U_\infty = 10 \text{ m/s}$. The gravity vector is:

$$\mathbf{F}_g = \begin{bmatrix} 0 \\ -mg \end{bmatrix}$$

- **Aerodynamics:** Drag is proportional to the square of the relative velocity between the drone and the wind. Given $c_D = 0.001 \text{ N s}^2/\text{m}^2$ and wind vector $\mathbf{v}_w = [U_\infty, 0]$, the drag force is:

$$\mathbf{F}_D = -c_D \,\|\mathbf{v} - \mathbf{v}_w\|\,(\mathbf{v} - \mathbf{v}_w)$$

- **System Dynamics:** The flight path is governed by the following second-order differential equation:

$$m\ddot{\mathbf{x}} = \mathbf{F}_P + \mathbf{F}_g + \mathbf{F}_D$$
## The Goal

The flight policy must balance three competing objectives: reach the target as fast as possible, consume the minimum amount of energy, and arrive with near-zero velocity to ensure a safe landing.
Before applying any intelligent control, a rigorous physical simulation must be established.

- **Action:** Implement the system of Ordinary Differential Equations (ODEs) defining the drone's kinematics in Python using `scipy.integrate`.
- **Validation:** Verify the environment's accuracy by testing static control limits, such as comparing the simulated terminal vertical velocity against its theoretical value.
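The simulation and validation steps above can be sketched as follows. This is a minimal version using `scipy.integrate.solve_ivp` with a constant thrust input (the real policy will vary thrust over time); the zero-thrust terminal-velocity check follows directly from balancing gravity against drag, $mg = c_D v_t^2$, giving $v_t = \sqrt{mg/c_D} \approx 9.9$ m/s.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Physical constants from the project description.
M = 0.010          # drone mass [kg]
G = 9.81           # gravitational acceleration [m/s^2]
C_D = 0.001        # drag coefficient [N s^2/m^2]
U_INF = 10.0       # horizontal wind speed [m/s]
V_WIND = np.array([U_INF, 0.0])

def dynamics(t, state, F_p, alpha):
    """State = [x, y, vx, vy]; constant thrust F_p at angle alpha (illustrative input)."""
    vel = state[2:]
    v_rel = vel - V_WIND
    thrust = F_p * np.array([np.cos(alpha), np.sin(alpha)])
    gravity = np.array([0.0, -M * G])
    drag = -C_D * np.linalg.norm(v_rel) * v_rel
    acc = (thrust + gravity + drag) / M
    return np.concatenate([vel, acc])

# Validation: with zero thrust and zero horizontal velocity relative to the wind,
# the drone falls at the theoretical terminal speed sqrt(m*g/c_D) ≈ 9.9 m/s.
state0 = np.array([0.0, 0.0, U_INF, 0.0])   # drifting with the wind, dropped from rest
sol = solve_ivp(dynamics, (0.0, 30.0), state0, args=(0.0, 0.0), max_step=0.05)
v_terminal = np.sqrt(M * G / C_D)
print(v_terminal, -sol.y[3, -1])            # both should be ≈ 9.9 m/s
```

Initializing the horizontal velocity at $U_\infty$ zeroes the horizontal relative velocity, so drag acts purely vertically and the terminal-speed check isolates the vertical dynamics.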
In optimal control and RL, the agent learns entirely through the reward signal.

- **Action:** Define a continuous cost function that penalizes energy consumption, long flight times, distance to the target, and high landing velocities.
- **Challenge:** Delicately balancing the penalty coefficients so that no single term dominates and the policy does not sacrifice one objective for another.
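A minimal sketch of such a cost function is below. The specific weights `w_t`, `w_e`, `w_d`, `w_v` are illustrative placeholders, not the project's tuned values; these are exactly the coefficients that reward engineering must balance.

```python
def episode_cost(t_flight, energy, dist_final, v_final,
                 w_t=1.0, w_e=10.0, w_d=5.0, w_v=2.0):
    """Scalar cost for one flight episode (lower is better).
    The weights w_* are hypothetical and would be tuned empirically."""
    return (w_t * t_flight          # penalize long flight times
            + w_e * energy          # penalize actuation energy
            + w_d * dist_final      # penalize missing the target
            + w_v * v_final**2)     # penalize hard landings

# Example: a 12 s flight using 3 J, landing 0.5 m off target at 0.8 m/s.
print(episode_cost(12.0, 3.0, 0.5, 0.8))
```

Squaring the landing speed makes the penalty grow sharply for hard touchdowns while staying nearly flat near zero, which matches the "near-zero landing velocity" objective.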
To benchmark the Neural Network, the problem is first solved using a parameterized approach.

- **Action:** Define a polynomial control policy where the thrust output is a function of the drone's distance to the target.
- **Optimization:** Use gradient-free algorithms (such as Nelder-Mead) or gradient-based methods (such as BFGS with gradients computed via finite differences) to find the static polynomial weights that minimize the cost.
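The BBO pattern can be sketched with `scipy.optimize.minimize`. The rollout cost below is a toy stand-in for the full flight simulation (a hypothetical reference thrust schedule rather than a real rollout), but the structure — polynomial policy, scalar cost, gradient-free search — is the one described above.

```python
import numpy as np
from scipy.optimize import minimize

def thrust_policy(dist, w):
    """Polynomial thrust schedule, clipped to the actuator limits [0, 0.7] N."""
    return np.clip(np.polyval(w, dist), 0.0, 0.7)

def rollout_cost(w):
    """Stand-in for the full simulation: a toy cost preferring full thrust
    far from the target and zero thrust at the target (assumed shape)."""
    d = np.linspace(0.0, 10.0, 50)
    desired = 0.7 * (d / 10.0)              # hypothetical reference schedule
    return np.sum((thrust_policy(d, w) - desired) ** 2)

w0 = np.array([0.0, 0.05, 0.1])             # quadratic policy: w[0]*d^2 + w[1]*d + w[2]
res = minimize(rollout_cost, w0, method="Nelder-Mead",
               options={"maxiter": 2000, "xatol": 1e-8, "fatol": 1e-8})
print(res.x, res.fun)
```

In the real pipeline, `rollout_cost` would run the validated ODE simulator and return the episode cost; Nelder-Mead only needs function evaluations, which is why it pairs naturally with a black-box simulator.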
The controller is then upgraded from a static equation to an adaptable Neural Network.

- **Action:** Wrap the validated physics simulator into a standard Gymnasium environment.
- **Algorithm:** Train an agent using Proximal Policy Optimization (PPO). PPO is chosen because it is highly stable and well suited to continuous action spaces.
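A minimal environment skeleton following the Gymnasium API conventions (`reset -> (obs, info)`, `step -> (obs, reward, terminated, truncated, info)`) is sketched below in plain NumPy. The time step, target location, and reward shaping are assumptions for illustration; the real class would inherit `gymnasium.Env` and declare `Box` action/observation spaces.

```python
import numpy as np

class FWMAVEnv:
    """Skeleton FW-MAV environment. Physics constants match the simulator;
    DT, TARGET, and the reward terms are illustrative placeholders."""

    M, G, C_D, U_INF = 0.010, 9.81, 0.001, 10.0
    DT = 0.05                        # integration step [s] (assumed)
    TARGET = np.array([50.0, 0.0])   # hypothetical target location [m]

    def reset(self, seed=None):
        self.state = np.zeros(4)     # [x, y, vx, vy], launched from the origin
        return self.state.copy(), {}

    def step(self, action):
        f_p = np.clip(action[0], 0.0, 0.7)          # thrust magnitude [N]
        alpha = np.clip(action[1], -np.pi, np.pi)   # thrust angle [rad]
        pos, vel = self.state[:2], self.state[2:]
        v_rel = vel - np.array([self.U_INF, 0.0])
        force = (f_p * np.array([np.cos(alpha), np.sin(alpha)])
                 + np.array([0.0, -self.M * self.G])
                 - self.C_D * np.linalg.norm(v_rel) * v_rel)
        vel = vel + self.DT * force / self.M        # semi-implicit Euler step
        pos = pos + self.DT * vel
        self.state = np.concatenate([pos, vel])
        dist = np.linalg.norm(pos - self.TARGET)
        reward = -self.DT - 0.1 * f_p * self.DT     # time + energy penalties
        terminated = bool(dist < 0.5)
        if terminated:
            reward -= np.linalg.norm(vel)           # landing-speed penalty
        return self.state.copy(), reward, terminated, False, {}

env = FWMAVEnv()
obs, _ = env.reset()
obs, r, term, trunc, _ = env.step(np.array([0.7, 0.0]))
```

With Gymnasium and Stable-Baselines3 installed, training would then be a call along the lines of `PPO("MlpPolicy", env).learn(total_timesteps=...)`.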
The final evaluation compares the two approaches.

- **Action:** Plot the flight trajectories generated by the BBO policy versus the DRL policy on a 2D graph. These outputs are saved in the `Figures` folder, starting with `Figure_10.png` for this analysis block.
- **Metrics:** Compare the two controllers on flight time, energy consumption, and landing velocity.
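The plotting step can be sketched with matplotlib. The two trajectories below are synthetic placeholders standing in for the logged BBO and DRL rollouts; only the output path (`Figures/Figure_10.png`) comes from the project description.

```python
import os
import numpy as np
import matplotlib
matplotlib.use("Agg")               # headless backend, suitable for saving figures
import matplotlib.pyplot as plt

# Placeholder trajectories standing in for logged rollouts (hypothetical shapes).
t = np.linspace(0.0, 1.0, 100)
bbo_xy = np.column_stack([50 * t, 5 * np.sin(np.pi * t)])
drl_xy = np.column_stack([50 * t, 8 * t * (1 - t)])

os.makedirs("Figures", exist_ok=True)
fig, ax = plt.subplots()
ax.plot(bbo_xy[:, 0], bbo_xy[:, 1], label="BBO (polynomial policy)")
ax.plot(drl_xy[:, 0], drl_xy[:, 1], "--", label="DRL (PPO policy)")
ax.scatter([0, 50], [0, 0], marker="x", color="k", label="origin / target")
ax.set_xlabel("x [m]")
ax.set_ylabel("y [m]")
ax.legend()
fig.savefig(os.path.join("Figures", "Figure_10.png"), dpi=150)
```

Overlaying both trajectories on one set of axes makes the qualitative differences (path curvature, approach angle at the target) immediately visible before comparing the scalar metrics.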
