This project was developed as part of an LLM training module, as a practical exercise in building an assistant tool for Human Resources (RH in French, hence the repository name) using Retrieval-Augmented Generation (RAG) techniques.
It provides a pipeline to:
- Build a RAG from HR-related data,
- Query the assistant with natural language,
- Interact via a simple interface,
- Evaluate the quality of answers against a dataset.
The project uses uv for environment and dependency management.
1. Clone the repository and switch to the `organized` branch:

   ```bash
   git clone https://github.com/DidiCi/Projet_assistantRH_LLM.git
   cd Projet_assistantRH_LLM
   ```

2. Install the environment with uv:

   ```bash
   uv sync
   ```
3. Set up configuration (a loading sketch follows this list):

   - Obtain a Google API key and save it in a `.env` file at the project root:

     ```
     GOOGLE_API_KEY=your_api_key_here
     ```

   - Input/output folders and other options can be configured in `rag/config.py`.
4. Prepare the data: place your CV files (the PDF documents to be analyzed) in:

   ```
   data/raw/
   ```
Create the RAG:

```bash
uv run python rag/main.py
```

You can also ask a direct question when running it:

```bash
uv run python rag/main.py --question "Qui parle italien?"
```

("Qui parle italien?" is French for "Who speaks Italian?")
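Under the hood, answering such a question follows the usual retrieve-then-generate pattern. The sketch below is a minimal, self-contained illustration of that pattern; the `google-generativeai` package and the Gemini model names are assumptions here, and the project's actual chunking, retrieval, and prompting live in `rag/` and will differ.

```python
# A minimal retrieve-then-generate loop, NOT the project's pipeline.
# Assumes the google-generativeai package and GOOGLE_API_KEY in the environment.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Stand-in corpus: in the real pipeline these would be chunks
# extracted from the PDFs in data/raw/.
chunks = [
    "CV A: 5 years of Python experience, speaks Italian and French.",
    "CV B: project manager, speaks English and Spanish.",
]

def embed(texts, task_type):
    # models/text-embedding-004 is an assumption; the project may use another model.
    return [
        genai.embed_content(model="models/text-embedding-004",
                            content=t, task_type=task_type)["embedding"]
        for t in texts
    ]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

question = "Qui parle italien?"
doc_vecs = embed(chunks, "retrieval_document")
[q_vec] = embed([question], "retrieval_query")

# Retrieve the most relevant chunk, then let the model answer from that context.
best = max(range(len(chunks)), key=lambda i: cosine(q_vec, doc_vecs[i]))
model = genai.GenerativeModel("gemini-1.5-flash")
prompt = f"Answer using only this context:\n{chunks[best]}\n\nQuestion: {question}"
print(model.generate_content(prompt).text)
```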
Once the RAG is created, launch the Streamlit interface:

```bash
uv run streamlit run app/interface.py
```

This provides a user-friendly way to interact with the assistant.
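For orientation, a Streamlit page of the same general shape might look like the sketch below; the real interface is `app/interface.py`, and the `answer` stub here is a placeholder, not the project's API.

```python
# Illustrative sketch only; the project's real UI is app/interface.py.
import streamlit as st

def answer(question: str) -> str:
    # Placeholder: the real app would query the RAG built by rag/main.py.
    return f"(stub) You asked: {question}"

st.title("Assistant RH")
question = st.text_input("Ask a question about the CVs")
if question:
    st.write(answer(question))
```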
The project includes tools to evaluate the RAG’s answers.
Define your test set by editing `evaluation/evaluation_dataset.json`, adding questions and their expected answers.
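The schema is defined by the evaluation scripts; as a purely hypothetical illustration (the field names are assumptions, so check the existing file for the real ones), an entry could look like:

```json
[
  {
    "question": "Qui parle italien?",
    "expected_answer": "The candidate in CV A speaks Italian."
  }
]
```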
Then run the evaluation pipeline:

```bash
uv run python evaluation/evaluation_llm.py
uv run python evaluation/evaluation_score.py
```
This will generate scores and metrics about the assistant’s accuracy and relevance.
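To give a feel for what a scoring pass can compute, here is a toy sketch that rates generated answers against expected ones with a standard-library string-similarity ratio. The real pipeline (`evaluation_llm.py` followed by `evaluation_score.py`) is likely more sophisticated, and the `generated_answer` field used here is an assumption.

```python
# Toy scoring sketch; NOT the project's evaluation_score.py.
import json
from difflib import SequenceMatcher

def similarity(expected: str, generated: str) -> float:
    # Ratio in [0, 1]; 1.0 means the two strings match exactly.
    return SequenceMatcher(None, expected.lower(), generated.lower()).ratio()

# Hypothetical layout: entries as sketched above, plus a generated_answer
# field added by a previous run of the assistant.
with open("evaluation/evaluation_dataset.json", encoding="utf-8") as f:
    dataset = json.load(f)

scores = [
    similarity(item["expected_answer"], item.get("generated_answer", ""))
    for item in dataset
]
print(f"mean similarity: {sum(scores) / max(len(scores), 1):.2f}")
```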
| Path | Description |
|---|---|
| `app/` | Streamlit interface for interacting with the assistant. |
| `rag/` | Core RAG implementation (retrieval, embeddings, pipeline). |
| `rag/config.py` | Configuration file for input/output folders and settings. |
| `data/raw/` | Folder where input CVs must be placed. |
| `evaluation/` | Scripts and datasets for evaluating RAG answers. |
| `evaluation/evaluation_dataset.json` | JSON dataset of questions & answers for evaluation. |
| `.env` | Must contain the Google API key. |
| `pyproject.toml`, `uv.lock` | Project dependencies managed by uv. |
- uv
- Python (version specified in `.python-version`)
- Google API key

Dependencies are automatically installed via `uv sync`.
This repository was created as part of an LLM training module, to practice:
- Using RAG for domain-specific assistants,
- Managing configurations and pipelines,
- Evaluating model performance systematically,
- Building a minimal interactive application.