This repository is the final project of NTU 2023 Spring Database Management System – from SQL to NoSQL. Our group members include 林天行, 黃千睿, 黃品翰, 王雅茵, 王睿謙.
In this project, we propose an extended MongoDB platform that not only supports the original MongoDB queries but can also perform:
- Data Exploration
- Data Preprocessing
- Machine Learning
- Create a virtual environment (optional):
    conda create --name mongodb
    conda activate mongodb
- Install packages:
    pip install -r requirements.txt
- Run MongoDB at localhost:27017 and import the data set beforehand. You can modify the file path in data_init.py to import your own data. Note: the train/test collections we provide are meant for predicting 'Danceability', the label column of this dataset. Errors may occur when trying to predict other columns, since the test collection doesn't contain the 'Danceability' column.
    python3 backend/data_init.py
- Start the server on a machine:
    python3 backend/server.py
- Start the client.
  For Mac users:
    brew install yarn
    brew install node
    cd frontend/
    yarn install
    yarn start
  For Linux users:
    sudo apt-get install nodejs
    sudo apt install npm
    npm install --global yarn
    cd frontend/
    yarn install
    yarn start
  For Windows users:
    Please use WSL2 instead of native Windows.
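As a rough illustration of the data import step above, the sketch below parses a CSV file into documents and, optionally, inserts them into a collection. It is not the project's actual `backend/data_init.py`: the function names, the `demo_db`/`train` names, and the sample CSV are hypothetical, and the insert step assumes `pymongo` is installed and MongoDB is listening on localhost:27017.

```python
import csv
import io


def _convert(value):
    """Turn a CSV cell into None (empty), a float (numeric), or the original string."""
    if value is None or value.strip() == "":
        return None
    try:
        return float(value)
    except ValueError:
        return value


def csv_to_documents(text_stream):
    """Read CSV rows from a text stream and return one dict per row."""
    reader = csv.DictReader(text_stream)
    return [{key: _convert(val) for key, val in row.items()} for row in reader]


def import_documents(docs, db_name="demo_db", collection_name="train"):
    """Insert documents into MongoDB (hypothetical names; requires pymongo)."""
    from pymongo import MongoClient  # imported lazily so parsing works without it

    client = MongoClient("mongodb://localhost:27017")
    result = client[db_name][collection_name].insert_many(docs)
    return len(result.inserted_ids)


# Tiny made-up sample standing in for a real data file.
sample = io.StringIO("Energy,Danceability,Artist\n0.8,5,Foo\n,3,Bar\n")
documents = csv_to_documents(sample)
print(documents)
```

Empty cells become `None` so that a later missing-value check can find them; calling `import_documents(documents)` would then write the rows to the local MongoDB instance.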
- Users send requests to a remote server (the MongoDB service runs on this remote server).
- The remote server performs the users' requests on data from MongoDB.
- The remote server sends the results back to the client.
We can deploy all the machine learning models on the remote server and use its computing resources to process datasets and train models. The client side only needs to send requests and receive results from the server.
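The request flow described above can be sketched in a few lines. This is a simplified illustration only, not the project's `backend/server.py`: the operation names, the JSON message shape, and the in-memory `FAKE_COLLECTION` (standing in for data read from MongoDB) are all assumptions, and the "network" is replaced by a direct function call.

```python
import json

# In-memory stand-in for data the server would read from MongoDB.
FAKE_COLLECTION = [
    {"Energy": 0.8, "Danceability": 5},
    {"Energy": None, "Danceability": 3},
]


def count_documents(docs, params):
    """Return how many documents the collection holds."""
    return {"count": len(docs)}


def count_missing(docs, params):
    """Return how many documents have no value for the requested column."""
    column = params["column"]
    return {"missing": sum(1 for d in docs if d.get(column) is None)}


# Maps an operation name sent by the client to the server-side function.
OPERATIONS = {"count": count_documents, "missing": count_missing}


def handle_request(raw_request):
    """Server side: decode the JSON request, run the operation, encode the reply."""
    request = json.loads(raw_request)
    operation = OPERATIONS.get(request.get("operation"))
    if operation is None:
        return json.dumps({"ok": False, "error": "unknown operation"})
    result = operation(FAKE_COLLECTION, request.get("params", {}))
    return json.dumps({"ok": True, "result": result})


def send_request(operation, **params):
    """Client side: build the request and decode the reply (no real network here)."""
    raw_reply = handle_request(json.dumps({"operation": operation, "params": params}))
    return json.loads(raw_reply)


print(send_request("missing", column="Energy"))
```

The client only builds a small request and reads the reply; all work on the data happens inside `handle_request`, which mirrors the idea of keeping computation on the server.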
We also designed a UI that lets clients decide which operations to perform and how to perform them. This way, even users without machine learning or computer science knowledge can easily do data exploration, data preprocessing, and machine learning by clicking through the UI instead of using the command line.
There are two parts of this project: the original MongoDB part and the extended MongoDB part.
Original MongoDB implementation:
Extended MongoDB implementation:
- Original MongoDB queries:
- Data Exploration: show feature distributions
- Data Exploration: show missing values
- Machine Learning:
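To make the two data exploration items in the list above more concrete, here is a minimal sketch of how missing values and a feature distribution could be computed from a list of documents. The function names and sample documents are hypothetical and do not come from the project's code; with `pymongo`, the documents would typically be obtained with something like `list(collection.find({}, {"_id": 0}))`.

```python
from collections import Counter


def missing_value_counts(docs):
    """Count, per column, how many documents lack the field or store None."""
    columns = {key for doc in docs for key in doc}
    return {col: sum(1 for doc in docs if doc.get(col) is None) for col in sorted(columns)}


def histogram(docs, column, bins=4):
    """Bucket the numeric values of one column into equal-width bins."""
    values = [doc[column] for doc in docs if isinstance(doc.get(column), (int, float))]
    if not values:
        return []
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # avoid zero width when all values are equal
    # The maximum value would land one past the last bin, so clamp it.
    counts = Counter(min(int((v - low) / width), bins - 1) for v in values)
    return [(low + i * width, low + (i + 1) * width, counts.get(i, 0)) for i in range(bins)]


# Made-up documents for illustration.
docs = [
    {"Energy": 0.0, "Danceability": 1},
    {"Energy": 0.5},
    {"Energy": 1.0, "Danceability": None},
    {"Energy": None, "Danceability": 4},
]
print(missing_value_counts(docs))
print(histogram(docs, "Energy", bins=2))
```

Each histogram entry is a `(bin_start, bin_end, count)` tuple, which a front end could render directly as a bar chart.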
Furthermore, our UI lists the available databases, collections, and predict columns that users can choose from:
Show databases list:
Show collections list:
Show predict columns list:
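The three lists above could be produced on the server roughly as follows. This is only a sketch under assumptions: the helper names and `sample_size` parameter are hypothetical, `client` is expected to be a `pymongo.MongoClient`, and treating every non-`_id` field as a candidate predict column is a guess rather than the project's confirmed behavior.

```python
def list_databases(client):
    """Names of all databases visible to a pymongo MongoClient."""
    return client.list_database_names()


def list_collections(client, db_name):
    """Names of the collections inside one database."""
    return client[db_name].list_collection_names()


def list_predict_columns(docs):
    """Column names that appear in the documents, excluding MongoDB's `_id`."""
    columns = {key for doc in docs for key in doc if key != "_id"}
    return sorted(columns)


def list_predict_columns_from_db(client, db_name, collection_name, sample_size=100):
    """Inspect up to `sample_size` documents of a collection to collect its columns."""
    docs = client[db_name][collection_name].find().limit(sample_size)
    return list_predict_columns(docs)


# Made-up documents; no database connection is needed for this part.
sample_docs = [{"_id": 1, "Energy": 0.8, "Danceability": 5}, {"_id": 2, "Tempo": 120}]
print(list_predict_columns(sample_docs))
```

Sampling a limited number of documents keeps the column lookup cheap on large collections, at the cost of possibly missing fields that appear only in later documents.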