Authors:
- Kaarel Kõomägi
- Rainer Vana
- Jaan Otter
We chose the flight delays dataset because it was the most interesting one we found. We also wanted the project to be fun but challenging, and this is one of the largest datasets we could find.
The objective of this project is to explore machine learning algorithms in depth by developing a robust predictive model. The goal is to accurately determine the likelihood of a flight delay and estimate the expected duration of the delay based on various influencing factors.
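These two goals can be framed as a classification target (is the flight delayed?) and a regression target (by how many minutes?). The sketch below shows one way to derive both from the raw data. It assumes the CSV has an `ARR_DELAY` column holding the arrival delay in minutes, left empty for cancelled or diverted flights. It also uses the 15-minute threshold from the US Bureau of Transportation Statistics on-time definition. Both are assumptions, and the notebooks may define their targets differently. The helper names `make_targets` and `targets_from_csv` are hypothetical, and only the standard library is used, so pandas is not required.

```python
import csv
import io
from typing import Optional

# Assumed threshold: a flight counts as delayed if it arrives
# 15 or more minutes after its scheduled arrival time.
DELAY_THRESHOLD_MIN = 15.0


def make_targets(arr_delay: str) -> Optional[tuple[int, float]]:
    """Turn one raw ARR_DELAY value into (is_delayed, delay_minutes).

    Returns None when the value is missing (e.g. cancelled or diverted
    flights) so that such rows can be dropped before training.
    """
    if arr_delay.strip() == "":
        return None
    minutes = float(arr_delay)
    # Classification target: 1 if the delay meets the threshold, else 0.
    is_delayed = int(minutes >= DELAY_THRESHOLD_MIN)
    # Regression target: delay in minutes, with early arrivals clipped to 0.
    return is_delayed, max(minutes, 0.0)


def targets_from_csv(text: str) -> list[tuple[int, float]]:
    """Build both targets for every usable row of a CSV with an ARR_DELAY column."""
    reader = csv.DictReader(io.StringIO(text))
    targets = []
    for row in reader:
        result = make_targets(row["ARR_DELAY"])
        if result is not None:
            targets.append(result)
    return targets
```

For example, `targets_from_csv("FL_DATE,ARR_DELAY\n2013-01-01,-4.0\n2013-01-01,22.0\n2013-01-02,\n")` returns `[(0, 0.0), (1, 22.0)]`: the early arrival is on time with a clipped delay of 0, and the row with a missing delay is dropped. Clipping early arrivals to zero is only one design choice; keeping negative values is equally valid if early arrivals matter to the model.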
- Data processing.ipynb (preprocesses the data)
- Machine_Learning.ipynb (builds and evaluates the models)
- EDA.ipynb (exploratory data analysis)
- Download the dataset from https://www.kaggle.com/datasets/yuanyuwendymu/airline-delay-and-cancellation-data-2009-2018.
- Extract 2013.csv from the download and place it in the project's root directory.
- Open "Data processing.ipynb" and run all cells.
- Next, open "EDA.ipynb" and run all cells.
- Finally, open "Machine_Learning.ipynb" and run all cells.
- The last cells of "Machine_Learning.ipynb" show how well the models performed, along with some graphs.
"Data processing.ipynb" must be run first because it formats the dataset for the other notebooks. After that, "EDA.ipynb" and "Machine_Learning.ipynb" can be run in either order.
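The run order can also be automated instead of clicking through each notebook. The sketch below assumes Jupyter is installed and the `jupyter` command is on the PATH. It first checks that 2013.csv is in the root directory, then executes "Data processing.ipynb" before the two independent notebooks, using the standard `jupyter nbconvert --to notebook --execute --inplace` command. The helpers `build_commands` and `run_all` are hypothetical and are not part of this repository.

```python
import subprocess
from pathlib import Path

# Notebook names as listed in this README. Data processing must run first
# because it formats the dataset that the other two notebooks read.
FIRST = "Data processing.ipynb"
INDEPENDENT = ["EDA.ipynb", "Machine_Learning.ipynb"]


def build_commands(root: Path) -> list[list[str]]:
    """Return one nbconvert command per notebook, in a valid execution order."""
    commands = []
    for name in [FIRST, *INDEPENDENT]:
        commands.append([
            "jupyter", "nbconvert",
            "--to", "notebook", "--execute", "--inplace",
            str(root / name),
        ])
    return commands


def run_all(root: Path = Path(".")) -> None:
    """Check that 2013.csv is in the root directory, then execute each notebook."""
    if not (root / "2013.csv").is_file():
        raise FileNotFoundError(
            "Place 2013.csv from the Kaggle dataset in the root directory first."
        )
    for cmd in build_commands(root):
        # check=True stops the run as soon as a notebook fails.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all()
```

Because `--inplace` writes the executed outputs back into each notebook, the result cells and graphs can still be viewed in Jupyter afterwards.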