Deep Natural Language Processing Project: Topic Modeling

This repository focuses on topic modeling techniques that leverage BERT-based keyword extraction. We explore three main approaches:

Domain Adaptation – Applying keyword extraction in a specialized domain (e.g., agriculture).
Multilingual Extension – Handling documents in the Greek language.
NER-based Preprocessing – Using Named Entity Recognition to filter key entities before extracting keywords.
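
For readers new to the idea, here is a minimal sketch of embedding-based keyword ranking, the general technique behind BERT-style keyword extraction. It is not this repository's code. The `toy_embed` function is a hypothetical stand-in (character-trigram counts) so the example runs without downloading a model; in a real pipeline a sentence-embedding model would take its place. The function names and the four-character candidate filter are assumptions made for illustration only.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a BERT encoder: character-trigram counts."""
    padded = f" {text.lower()} "
    return dict(Counter(padded[i:i + 3] for i in range(len(padded) - 2)))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_keywords(
    doc: str,
    embed: Callable[[str], Vector] = toy_embed,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank candidate words by similarity between their vector and the document's."""
    # Candidates: unique words of four or more characters (an arbitrary filter).
    candidates = sorted(set(re.findall(r"\w{4,}", doc.lower())))
    doc_vec = embed(doc)
    scored = [(word, cosine(embed(word), doc_vec)) for word in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    text = "Crop rotation improves soil health and crop yield"
    for word, score in rank_keywords(text):
        print(f"{word}: {score:.3f}")
```

Passing the embedding function as a parameter keeps the ranking logic independent of the model, so the same sketch applies whether the documents are English or Greek.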

Repository Structure

domain_adaptation/ covers the agriculture-domain adaptation approach.
multilingual/ includes all code for multilingual (Greek) modeling.
ner_preprocessing/ implements NER-based entity filtering.
utils/ contains shared utilities such as logging and helper functions.

How to Run

Install Dependencies

pip install spacy nltk scikit-learn requests
pip install torch sentence-transformers keybert thefuzz
python -m spacy download el_core_news_sm
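
To confirm the installation succeeded before running a pipeline, a small check like the following can help. It is a sketch, not part of the repository: the `REQUIRED` mapping and `missing_packages` helper are hypothetical names, and the mapping simply pairs each package from the commands above with the module name it is imported under.

```python
from importlib.util import find_spec
from typing import Dict, List

# pip package name -> importable module name (these differ for some packages).
REQUIRED: Dict[str, str] = {
    "spacy": "spacy",
    "nltk": "nltk",
    "scikit-learn": "sklearn",
    "requests": "requests",
    "torch": "torch",
    "sentence-transformers": "sentence_transformers",
    "keybert": "keybert",
    "thefuzz": "thefuzz",
    # The spaCy Greek model is installed as a regular Python package.
    "el_core_news_sm": "el_core_news_sm",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose module cannot be located."""
    return [name for name, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing: " + ", ".join(missing) if missing else "All dependencies found.")
```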

Running via main.py

We provide a single entry point in main.py that accepts a parameter specifying which approach to run:
```
python main.py --approach domain
```
Runs the domain adaptation pipeline.
```
python main.py --approach multilingual
```
Runs the multilingual (Greek) pipeline.
```
python main.py --approach ner
```
Runs the NER-based preprocessing pipeline.

Inside main.py, these commands map to the corresponding scripts in their respective folders.
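
The mapping from the --approach value to a pipeline could look roughly like the sketch below. This is an assumption about the structure, not a copy of main.py: the `run_*` functions are hypothetical placeholders standing in for whatever each folder actually exposes.

```python
import argparse
from typing import Callable, Dict, List, Optional


# Hypothetical placeholders; the real pipelines live in the respective folders.
def run_domain() -> str:
    return "domain adaptation pipeline"


def run_multilingual() -> str:
    return "multilingual (Greek) pipeline"


def run_ner() -> str:
    return "NER-based preprocessing pipeline"


PIPELINES: Dict[str, Callable[[], str]] = {
    "domain": run_domain,
    "multilingual": run_multilingual,
    "ner": run_ner,
}


def main(argv: Optional[List[str]] = None) -> str:
    """Parse --approach and call the matching pipeline function."""
    parser = argparse.ArgumentParser(description="Keyword extraction pipelines")
    parser.add_argument(
        "--approach",
        choices=sorted(PIPELINES),
        required=True,
        help="which pipeline to run",
    )
    args = parser.parse_args(argv)
    return PIPELINES[args.approach]()


if __name__ == "__main__":
    print("Running:", main())
```

Restricting the argument with `choices` makes argparse reject an unknown approach with a usage message instead of failing later inside a pipeline.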

Results

During each run, the code may generate:

Logs: Training and validation logs for model performance tracking.
Metrics: Precision, Recall, F1 scores for keyword extraction.
Comparison: We compare the final results (baseline vs. extended approaches) in our final report.
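
As a reference for how the metrics above are commonly computed for keyword extraction, here is a minimal sketch that compares a predicted keyword set with a gold set. It assumes exact matching after lowercasing and trimming, which may differ from the matching used in this repository (for example, fuzzy matching would count near-duplicates as hits). The function name is hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall_f1(
    predicted: Iterable[str], gold: Iterable[str]
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 with case-insensitive exact matching."""
    pred = {keyword.strip().lower() for keyword in predicted}
    ref = {keyword.strip().lower() for keyword in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    # F1 is the harmonic mean; define it as 0 when there are no matches.
    if true_positives == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = precision_recall_f1(
        ["soil", "crop rotation", "tractor"],
        ["Soil", "crop rotation", "irrigation", "yield"],
    )
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```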


Marileni/Keyword-extraction-with-BERT
