Analyzing and overcoming the curse of dimensionality and exploring various gradient descent techniques with implementations in R
This project was completed in partial fulfilment of the course FOUNDATIONS OF DATA SCIENCE (CS F320) offered during Second Semester 2019-20 at BITS Pilani, Pilani Campus.
The aim of this project was to analyse high-dimensional data (HDD) by identifying and implementing techniques to mitigate the problems that occur with HDD. The subsequent part of the project dealt with common issues with Gradient Descent and ways to overcome them.
In this project we first looked at the various problems that arise with high-dimensional data and then proposed solutions to mitigate them.
These solutions were implemented in R, and the results indicate that they are effective in mitigating these problems.
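To make one of the core problems concrete, here is a small illustrative R sketch (not taken from the project code) showing distance concentration: as the dimension grows, the gap between the nearest and farthest pair of random points shrinks relative to the distances themselves, which undermines distance-based methods.

```r
# Illustrative sketch: distance concentration in high dimensions.
# For points drawn uniformly at random, the relative gap between the
# nearest and farthest pair shrinks as the dimension d grows.
set.seed(42)

relative_contrast <- function(d, n = 100) {
  X <- matrix(runif(n * d), nrow = n, ncol = d)  # n random points in [0,1]^d
  dists <- as.vector(dist(X))                    # all pairwise Euclidean distances
  (max(dists) - min(dists)) / min(dists)         # relative contrast
}

for (d in c(2, 10, 100, 1000)) {
  cat(sprintf("d = %4d : relative contrast = %.3f\n", d, relative_contrast(d)))
}
```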
We then shifted our focus to Gradient Descent (GD) and observed several problems: for a non-convex loss function the optimizer can get stuck at a local minimum, plain GD runs slowly on large datasets, and the learning rate must be chosen suitably.
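For context, a minimal sketch of plain (batch) gradient descent for least-squares linear regression might look like the following; the function and variable names are illustrative, not the project's actual code.

```r
# Minimal batch gradient descent for least-squares linear regression.
# X: design matrix (with intercept column), y: response, eta: learning rate.
gradient_descent <- function(X, y, eta = 0.01, n_iter = 1000) {
  w <- rep(0, ncol(X))                        # initialise weights at zero
  for (i in seq_len(n_iter)) {
    grad <- t(X) %*% (X %*% w - y) / nrow(X)  # gradient of mean squared error
    w <- w - eta * grad                       # step against the gradient
  }
  as.vector(w)
}

# Toy usage: recover the coefficients of y = 1 + 2x (plus noise).
set.seed(1)
x <- runif(200)
X <- cbind(1, x)
y <- 1 + 2 * x + rnorm(200, sd = 0.1)
gradient_descent(X, y, eta = 0.1, n_iter = 5000)  # close to c(1, 2)
```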
A few successful variations of GD exist to overcome these issues, such as stochastic gradient descent (SGD), mini-batch gradient descent, and gradient descent with momentum.
We saw that the cost is often highly sensitive to some directions in parameter space and insensitive to others. The momentum algorithm can mitigate these issues somewhat, but does so at the expense of introducing another hyperparameter.
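The standard momentum update is easy to express in the same sketch style; `alpha` below is the additional hyperparameter mentioned above (the decay rate of the accumulated velocity), and the sketch is again illustrative rather than the project's exact code.

```r
# Gradient descent with momentum: a velocity term accumulates past
# gradients, damping oscillations along steep directions and speeding
# up progress along flat ones. alpha is the extra hyperparameter.
gd_momentum <- function(X, y, eta = 0.01, alpha = 0.9, n_iter = 1000) {
  w <- rep(0, ncol(X))
  v <- rep(0, ncol(X))                          # velocity
  for (i in seq_len(n_iter)) {
    grad <- t(X) %*% (X %*% w - y) / nrow(X)
    v <- alpha * v - eta * grad                 # update velocity
    w <- w + v                                  # move along the velocity
  }
  as.vector(w)
}
```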
We then discussed a number of incremental (or mini-batch-based) methods that adapt the learning rates of model parameters; the best-known examples of this family are AdaGrad, RMSProp, and Adam.
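As one example of this family, here is an illustrative sketch of the Adam update rule in R, following the standard published formulation rather than necessarily the exact code used in the project; each parameter effectively receives its own step size, scaled by running estimates of the gradient's first and second moments.

```r
# Adam: adapts a per-parameter learning rate from running estimates of
# the first moment (m) and second moment (s) of the gradient.
adam <- function(X, y, eta = 0.001, beta1 = 0.9, beta2 = 0.999,
                 eps = 1e-8, n_iter = 5000) {
  w <- rep(0, ncol(X))
  m <- rep(0, ncol(X))   # first-moment estimate
  s <- rep(0, ncol(X))   # second-moment estimate
  for (t in seq_len(n_iter)) {
    grad <- as.vector(t(X) %*% (X %*% w - y) / nrow(X))
    m <- beta1 * m + (1 - beta1) * grad
    s <- beta2 * s + (1 - beta2) * grad^2
    m_hat <- m / (1 - beta1^t)   # bias correction
    s_hat <- s / (1 - beta2^t)
    w <- w - eta * m_hat / (sqrt(s_hat) + eps)
  }
  w
}

# Usage mirrors gradient_descent above, e.g. adam(X, y, eta = 0.01)
```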
Having implemented these solutions in R, we found that the family of algorithms with adaptive learning rates performed fairly robustly and emerged as the clear winner.
The choice of which algorithm to use, at this point, seems to depend largely on the user's familiarity with the algorithm (for ease of hyperparameter tuning).
Please read the report, as it contains detailed information about the datasets used, methods implemented, results obtained, and further discussion.
For any doubts, don't hesitate to contact me at [email protected]
If you find our work helpful, do not forget to ⭐ the repository!