ProjectQuora

Dànae Canillas Sánchez & Xavier Rubiés Cullell

In this project, we study the dataset released by Quora for its Kaggle competition, with the goal of detecting duplicate questions. We fine-tune pretrained transformer models from the Hugging Face library and report results for several models (BERT, XLNet, DistilBERT, ...) and for the hyperparameter combinations we tried. Finally, we explore what the resulting sentence embeddings capture.
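Exploring sentence embeddings, for example searching for the most similar question, comes down to comparing vectors. The following is a minimal sketch. It assumes each question has already been mapped to a fixed-size embedding, such as a pooled transformer output, and that cosine similarity is the comparison metric. The README does not say which metric the project uses. `cosine_similarity` and `most_similar` are hypothetical helper names, and the 3-d vectors are toy stand-ins for real embeddings.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query: List[float],
                 embeddings: Dict[str, List[float]],
                 k: int = 1) -> List[Tuple[str, float]]:
    """Return the k sentences whose embeddings are closest to `query` (hypothetical helper)."""
    scored = [(sentence, cosine_similarity(query, vec))
              for sentence, vec in embeddings.items()]
    scored.sort(key=lambda item: item[1], reverse=True)  # highest similarity first
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real transformer sentence embeddings.
    toy = {
        "How do I learn Python?": [0.9, 0.1, 0.0],
        "What is the best way to learn Python?": [0.8, 0.2, 0.1],
        "Why is the sky blue?": [0.0, 0.1, 0.9],
    }
    print(most_similar([0.85, 0.15, 0.05], toy, k=2))
```

Cosine similarity ignores vector length. This is a common choice when comparing embeddings whose norms carry little meaning. With real embeddings, a brute-force scan like this is fine for small sets, and larger collections usually call for a vectorized or indexed search.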

The dataset is taken from the Quora Question Pairs competition on Kaggle:
https://www.kaggle.com/c/quora-question-pairs

plots: Folder that contains the plots generated in class_visualization.ipynb

2d_pca.html

2d_tsne.html

3d_pca.html

3d_tsne.html

report: Deliverables

imgs: Images contained in POE_Final_Project_Quora_CanillasRubies.pdf

Hyperparameters_Study.pdf: Table of the hyperparameter experiments

POE_Final_Project_Quora_CanillasRubies.pdf: Deliverable report

POE_Initial_Plan.pdf: First deliverable

Presentacio-XavierDanae.pdf: Intermediate project presentation

src: Folder containing script files

data: CSV files

train.csv: Raw data

sentences.csv: Table with questions and tokenizations (from BERT)

class-consistency.ipynb: Prediction consistency study

class_visualization.ipynb: Generates plots

data_analysis.ipynb: Exploratory data analysis

input_net.py: Generates the model input

main.py: Model training and validation

most_similar_sentence.ipynb: Most similar sentence search

table_generation.ipynb: Generates sentences.csv

utils.py: Contains auxiliary functions

.gitignore: Files excluded from version control
README.md: Project Documentation
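input_net.py is described only as generating the model input. For a BERT-style duplicate-question classifier, the usual input packs both questions into one sequence, `[CLS] q1 [SEP] q2 [SEP]`. The sequence comes with segment (token type) ids and an attention mask. The sketch below is hypothetical: `encode_pair`, `toy_tokenize` and `max_len` are illustrative names, not taken from the repository, and whitespace splitting stands in for the model's real WordPiece tokenizer. In practice, a Hugging Face tokenizer called with both questions produces these fields directly. Models such as XLNet arrange special tokens differently, and DistilBERT does not use token type ids.

```python
from typing import Dict, List

CLS, SEP, PAD = "[CLS]", "[SEP]", "[PAD]"


def toy_tokenize(text: str) -> List[str]:
    """Stand-in for a real WordPiece tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def encode_pair(q1: str, q2: str, max_len: int = 16) -> Dict[str, List]:
    """Encode a question pair as [CLS] q1 [SEP] q2 [SEP], truncated and padded to max_len."""
    if max_len < 3:
        raise ValueError("max_len must leave room for the three special tokens")
    t1, t2 = toy_tokenize(q1), toy_tokenize(q2)
    budget = max_len - 3  # tokens left after [CLS] and two [SEP]
    # Drop tokens from the end of the longer question until the pair fits
    # (similar in spirit to a "longest first" truncation strategy).
    while len(t1) + len(t2) > budget:
        if len(t1) >= len(t2):
            t1.pop()
        else:
            t2.pop()
    tokens = [CLS] + t1 + [SEP] + t2 + [SEP]
    # Segment 0 covers [CLS] q1 [SEP]; segment 1 covers q2 [SEP].
    segment_ids = [0] * (len(t1) + 2) + [1] * (len(t2) + 1)
    attention_mask = [1] * len(tokens)  # 1 = real token, 0 = padding
    padding = max_len - len(tokens)
    return {
        "tokens": tokens + [PAD] * padding,
        "token_type_ids": segment_ids + [0] * padding,
        "attention_mask": attention_mask + [0] * padding,
    }


if __name__ == "__main__":
    print(encode_pair("How do I learn Python?", "Best way to learn Python?", max_len=16))
```

Fixing every example to the same `max_len` lets pairs be stacked into a batch. The attention mask tells the model to ignore the padding positions.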
