The exercise introduces modern approaches to Question Answering using Retrieval Augmented Generation (RAG) with LLMs and vector databases.
Objectives (8 points):
- Set up the QA environment:
  - Install Ollama and select an appropriate LLM
  - Configure the Qdrant vector database (or a vector DB of your choosing)
  - Install the necessary Python packages for embedding generation
- Find a PDF file of your choosing, for example a publication or a CV.
- Write the procedures needed for the RAG pipeline, using the LangChain library:
  - Load the PDF file using `PyPDFLoader`.
  - Split the documents into appropriate chunks using `RecursiveCharacterTextSplitter`.
  - Generate and store the embeddings in the Qdrant database.
- Design and implement the RAG pipeline with `LCEL`. As a reference, use this detailed guide created by the LangChain community: RAG. The next steps should involve:
  - Create query embedding generation
  - Implement semantic search in Qdrant
  - Design prompt templates for context integration
  - Build response generation with the LLM

  Hint: You don't need to build it from scratch. Many of these steps are already automated by the LCEL pipeline definition.
- Implement basic retrieval strategies (semantic search).
- Create a basic QA prompt.
- Determine 5 evaluation queries:
  - Choose a few questions whose answers you have confirmed yourself.
  - Compare the performance of RAG with that of the pure LLM response.
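
The loading, splitting, indexing and LCEL steps listed above can be sketched roughly as follows. This is a minimal sketch, not a reference solution: it assumes the `langchain-community`, `langchain-text-splitters`, `langchain-ollama`, `langchain-qdrant` and `pypdf` packages are installed, that an Ollama server is running locally, and that the embedding and chat models named below have been pulled. The file name, collection name, model names, chunk sizes and prompt wording are all placeholders chosen for illustration.

```python
"""Minimal RAG sketch with LangChain, Ollama and Qdrant (all names are assumptions)."""

# Assumed placeholders: swap in your own file, models and collection name.
PDF_PATH = "document.pdf"
EMBEDDING_MODEL = "nomic-embed-text"
CHAT_MODEL = "llama3.1"
COLLECTION = "rag_exercise"

PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say that you do not know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)


def format_docs(docs):
    """Join retrieved chunks into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_vector_store(pdf_path=PDF_PATH, chunk_size=1000, chunk_overlap=200):
    """Load the PDF, split it into chunks, and index them in Qdrant."""
    # Third-party imports are kept local so the helpers above stay importable
    # even when the optional packages are missing.
    from langchain_community.document_loaders import PyPDFLoader
    from langchain_ollama import OllamaEmbeddings
    from langchain_qdrant import QdrantVectorStore
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    pages = PyPDFLoader(pdf_path).load()  # one Document per page
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    chunks = splitter.split_documents(pages)
    # ":memory:" keeps the index in-process; pass url="http://localhost:6333"
    # instead of location to use a running Qdrant server.
    return QdrantVectorStore.from_documents(
        chunks,
        OllamaEmbeddings(model=EMBEDDING_MODEL),
        location=":memory:",
        collection_name=COLLECTION,
    )


def build_rag_chain(vector_store, k=4):
    """Compose retriever, prompt, LLM and output parser with LCEL."""
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.runnables import RunnablePassthrough
    from langchain_ollama import ChatOllama

    retriever = vector_store.as_retriever(search_kwargs={"k": k})
    prompt = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    llm = ChatOllama(model=CHAT_MODEL, temperature=0)
    # The retriever embeds the query and runs the semantic search itself.
    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )
```

With those assumptions in place, `build_rag_chain(build_vector_store()).invoke("...")` should return the answer as a plain string. Query embedding and semantic search are not written out by hand here because the retriever handles both, which is the automation the hint about LCEL refers to.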
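
To compare RAG against the bare LLM on the evaluation queries, it can help to run both over the same questions and record the answers side by side. The sketch below is one hypothetical way to organise this. `EvalCase`, `contains_all` and `compare` are illustrative names, the keyword check is a deliberately crude stand-in for manual judgement, and the two answer functions are assumed to be callables that map a question string to an answer string (for example the `invoke` method of an LCEL chain and a call to the LLM alone).

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class EvalCase:
    question: str
    expected_keywords: list[str]  # facts you verified in the PDF yourself


def contains_all(answer: str, keywords: Iterable[str]) -> bool:
    """Crude check: every expected keyword appears in the answer (case-insensitive)."""
    lowered = answer.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def compare(
    cases: list[EvalCase],
    rag_answer: Callable[[str], str],
    llm_answer: Callable[[str], str],
) -> list[dict]:
    """Run both systems on every case and collect answers and hit flags."""
    rows = []
    for case in cases:
        rag, llm = rag_answer(case.question), llm_answer(case.question)
        rows.append(
            {
                "question": case.question,
                "rag": rag,
                "llm": llm,
                "rag_ok": contains_all(rag, case.expected_keywords),
                "llm_ok": contains_all(llm, case.expected_keywords),
            }
        )
    return rows


if __name__ == "__main__":
    # Stub answer functions stand in for the real RAG chain and the bare LLM.
    cases = [EvalCase("Who wrote the report?", ["Kowalska"])]
    rows = compare(
        cases,
        rag_answer=lambda q: "The report was written by Anna Kowalska.",
        llm_answer=lambda q: "I do not know who wrote it.",
    )
    for row in rows:
        print(row["question"], "| RAG ok:", row["rag_ok"], "| LLM ok:", row["llm_ok"])
```

Keeping the raw answers in each row matters more than the boolean flags: keyword matching misses paraphrases and cannot detect a fluent but wrong answer, so the final judgement should still be made by reading both responses.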
Questions (2 points):
- How does RAG improve the quality and reliability of LLM responses compared to pure LLM generation?
- What are the key factors affecting RAG performance (chunk size, embedding quality, prompt design)?
- How does the choice of vector database and embedding model impact system performance?
- What are the main challenges in implementing a production-ready RAG system?
- How can the system be improved to handle complex queries requiring multiple document lookups?
Hints:
- Careful chunk size selection is crucial for relevant context retrieval
- To select an LLM for answer generation, you can consult the LLM leaderboard or the Polish LLM Leaderboard.
- To select a model for retrieval (embedding generation), you can consult the Embedding leaderboard or the Polish embedding leaderboard.
- Consider implementing re-ranking of retrieved documents
- Prompt engineering significantly impacts answer quality
- Caching can greatly improve system performance during development
- Consider using metadata filtering to improve retrieval precision
- The choice of embedding model affects both accuracy and speed
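
The caching hint can be illustrated with a small wrapper that avoids recomputing embeddings for text it has already seen, which matters when the same PDF is re-indexed many times during development. This is a minimal sketch under stated assumptions: `CachedEmbedder` is a hypothetical name, the cache lives only in memory, and `embed_fn` stands for whatever function turns a string into a vector (for instance an embedding model's `embed_query` method). LangChain also provides its own `CacheBackedEmbeddings` wrapper, which may be a better fit in practice.

```python
import hashlib
from typing import Callable, Sequence


class CachedEmbedder:
    """Memoise an embedding function by the SHA-256 hash of the input text."""

    def __init__(self, embed_fn: Callable[[str], Sequence[float]]):
        self._embed_fn = embed_fn
        self._cache: dict[str, Sequence[float]] = {}
        self.misses = 0  # number of times the underlying model was called

    def embed(self, text: str) -> Sequence[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._embed_fn(text)
        return self._cache[key]


if __name__ == "__main__":
    # Toy embedding (text length and vowel count) stands in for a real model.
    def toy(text: str) -> list[float]:
        return [float(len(text)), float(sum(c in "aeiou" for c in text))]

    embedder = CachedEmbedder(toy)
    embedder.embed("same chunk")
    embedder.embed("same chunk")  # served from the cache
    print(embedder.misses)  # prints 1
```

The cache must be cleared whenever the embedding model changes, since vectors from different models are not comparable.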