Marileni/Keyword-extraction-with-BERT

Deep Natural Language Processing Project: Topic Modeling

This repository focuses on topic modeling techniques that leverage BERT-based keyword extraction. We explore three main approaches:

  1. Domain Adaptation – Applying keyword extraction in a specialized domain (e.g., agriculture).
  2. Multilingual Extension – Handling documents in the Greek language.
  3. NER-based Preprocessing – Using Named Entity Recognition to filter key entities before extracting keywords.

Repository Structure

  • domain_adaptation/ covers the agriculture-domain adaptation approach.
  • multilingual/ includes all code for multilingual (Greek) modeling.
  • ner_preprocessing/ implements NER-based entity filtering.
  • utils/ contains shared utilities such as logging and helper functions.

How to Run

  1. Install Dependencies

    pip install spacy nltk scikit-learn requests
    pip install torch sentence-transformers keybert thefuzz
    python -m spacy download el_core_news_sm
    
  2. Running via main.py

    main.py is the single entry point. Its --approach argument selects which pipeline to run:

    python main.py --approach domain

    Runs the domain adaptation pipeline.

    python main.py --approach multilingual

    Runs the multilingual (Greek) pipeline.

    python main.py --approach ner

    Runs the NER-based preprocessing pipeline.

    main.py dispatches each --approach value to the corresponding script in its folder.
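
The dispatch from --approach to a pipeline could look roughly like the sketch below. It is an illustration only, not the actual contents of main.py: the functions run_domain, run_multilingual and run_ner are hypothetical placeholders, since the real entry points inside each folder are not documented here. Only the --approach flag and its three values come from the commands above.

```python
import argparse


# Hypothetical placeholders (assumption): the real pipelines live in
# domain_adaptation/, multilingual/ and ner_preprocessing/.
def run_domain():
    return "domain adaptation pipeline"


def run_multilingual():
    return "multilingual (Greek) pipeline"


def run_ner():
    return "NER-based preprocessing pipeline"


# Maps each documented --approach value to the function that runs it.
PIPELINES = {
    "domain": run_domain,
    "multilingual": run_multilingual,
    "ner": run_ner,
}


def build_parser():
    parser = argparse.ArgumentParser(
        description="Run one keyword-extraction approach."
    )
    parser.add_argument(
        "--approach",
        required=True,
        choices=sorted(PIPELINES),
        help="Which pipeline to run.",
    )
    return parser


def main(argv=None):
    # argv=None makes argparse read the real command line (sys.argv[1:]).
    args = build_parser().parse_args(argv)
    result = PIPELINES[args.approach]()
    print(f"Finished: {result}")
    return result
```

In a real script, main() would be called under the usual `if __name__ == "__main__":` guard. Restricting the argument with `choices` makes argparse reject unknown approaches with a usage message instead of failing later in the dispatch.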

Results

During each run, the code may generate:

  • Logs: Training and validation logs for model performance tracking.
  • Metrics: Precision, Recall, F1 scores for keyword extraction.
  • Comparison: The final report compares the baseline against the extended approaches.
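
As a reference for how the metrics above are usually defined for keyword extraction, here is a minimal sketch that scores a predicted keyword list against a gold list. It assumes exact matching after lowercasing and trimming whitespace. The repository's own evaluation may use a different matching rule, and the function name keyword_prf and the sample keywords are made up for illustration.

```python
def keyword_prf(predicted, gold):
    """Return (precision, recall, f1) for two keyword lists.

    Assumption: keywords match only if they are identical after
    lowercasing and stripping surrounding whitespace.
    """
    pred = {k.strip().lower() for k in predicted if k.strip()}
    ref = {k.strip().lower() for k in gold if k.strip()}

    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative data only, not results from this project.
predicted = ["soil moisture", "Crop Yield", "irrigation", "tractor"]
gold = ["crop yield", "irrigation", "fertilizer"]
precision, recall, f1 = keyword_prf(predicted, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Two of the four predicted keywords appear in the gold list, so precision is 2/4 and recall is 2/3. Using sets means duplicate keywords in either list are counted once.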
