This project demonstrates Retrieval-Augmented Generation (RAG) using Embabel Agent with Apache Lucene for vector storage and Spring Shell for interaction.
API Key: Set at least one LLM provider API key as an environment variable:
# For OpenAI (GPT models)
export OPENAI_API_KEY=sk-...
# For Anthropic (Claude models)
export ANTHROPIC_API_KEY=sk-ant-...

The model configured in application.yml determines which key is required. The default configuration uses OpenAI.
Java: Java 21+ is required.
- Set your API key (see above)
- Run the shell:
./scripts/shell.sh
- Ingest a document:
ingest
- Start chatting:
chat
Run the shell script to start Embabel under Spring Shell:
./scripts/shell.sh

You can also run the main class, com.embabel.examples.ragbot.RagShellApplication, directly from your IDE.
| Command | Description |
|---|---|
| ingest [url] | Ingest a URL into the RAG store. Uses Apache Tika to parse content hierarchically and chunks it for vector storage. Default URL is the text of the recent Australia Social Media ban for under 16s. Documents are only ingested if they don't already exist. |
| zap | Clear all documents from the Lucene index. Returns the count of deleted documents. |
| chunks | Display all stored chunks with their IDs and content. Useful for debugging what content has been indexed. |
| chat | Start an interactive chat session where you can ask questions about ingested content. |
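The ingest and zap semantics described in the table (skip documents that already exist, return a count on clear) can be illustrated with a small in-memory stand-in. This is only a sketch: `InMemoryDocStore` and its methods are hypothetical names, it does not use the Embabel or Lucene APIs, and it assumes that "document" means one ingested URL.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the RAG store; not the Embabel or Lucene API.
public class InMemoryDocStore {
    // Maps a document URL to the chunks produced from it, in insertion order.
    private final Map<String, List<String>> chunksByUrl = new LinkedHashMap<>();

    /** Stores the chunks only if the URL is new; returns true when something was ingested. */
    public boolean ingest(String url, List<String> chunks) {
        if (chunksByUrl.containsKey(url)) {
            return false; // already present: skip, mirroring the documented ingest behaviour
        }
        chunksByUrl.put(url, List.copyOf(chunks));
        return true;
    }

    /** Returns every stored chunk across all documents, similar in spirit to the chunks command. */
    public List<String> chunks() {
        return chunksByUrl.values().stream().flatMap(List::stream).toList();
    }

    /** Removes everything and returns how many documents were deleted, similar in spirit to zap. */
    public int zap() {
        int deleted = chunksByUrl.size();
        chunksByUrl.clear();
        return deleted;
    }
}
```

Ingesting the same URL twice leaves the store unchanged the second time, which is why re-running ingest against the default URL is harmless.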
# Start the shell
./scripts/shell.sh
# Ingest a document
ingest https://example.com/document
# View what was indexed
chunks
# Chat with the RAG-powered assistant
chat
> What does this document say about X?
# Clear the index when done
zap

┌─────────────────────────────────────────────────────────────────────────────┐
│ Spring Shell │
│ │
│ > chat │
│ > What penalties apply to social media platforms? │
└─────────────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ AgentProcess │
│ │
│ Starts when chat begins. Manages conversation state and action dispatch. │
│ Listens for triggers (UserMessage) and invokes matching @Action methods. │
└─────────────────────────────────────┬───────────────────────────────────────┘
│ UserMessage triggers
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ @Action: ChatActions.respond() │
│ │
│ Fired on each user message. Uses Ai interface to build request: │
│ context.ai() │
│ .withLlm(...) │
│ .withReference(toolishRag) ◄── ToolishRag added as LLM tool │
│ .withTemplate("ragbot") │
│ .respondWithSystemPrompt(conversation, ...) │
└─────────────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ Ai Interface │
│ │
│ • Renders system prompt from Jinja template │
│ • Packages ToolishRag as tool definition for LLM │
│ • Sends request to LLM provider (OpenAI / Anthropic) │
└─────────────────────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ LLM (GPT / Claude) │
│ │
│ Receives prompt + tool definitions. Decides to call tools as needed: │
│ │
│ "I need to search for penalty information..." │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Tool Call: vectorSearch("penalties social media platforms") │ │
│ └─────────────────────────────────┬───────────────────────────────────┘ │
│ │ │
└─────────────────────────────────────┼───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ ToolishRag → LuceneSearchOperations │
│ │
│ • Converts query to embedding vector │
│ • Searches ./.lucene-index for similar chunks │
│ • Returns relevant content to LLM │
└─────────────────────────────────────┬───────────────────────────────────────┘
│
▼
LLM generates final response
using retrieved context
│
▼
Response sent to user
Flow Summary:
- User types chat → AgentProcess starts and manages the session
- User sends a message → triggers @Action(trigger = UserMessage.class)
- ChatActions.respond() builds request via Ai interface, adding ToolishRag with .withReference()
- Ai packages prompt + tool definitions, sends to LLM
- LLM decides to call a ToolishRag tool to search for relevant content
- The ToolishRag tool queries Lucene index, returns matching chunks to LLM
- LLM generates response using retrieved context → sent back to user
- Loop continues for each new message until user exits
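The retrieval step in the flow above (embed the query, find similar chunks, return the best matches) can be sketched as a brute-force cosine-similarity search. This is an illustration under assumptions, not how Lucene implements vector search: `VectorSearchSketch`, the three-dimensional vectors and the chunk ids are all made up for the example, and a real embedding service would produce the vectors.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative only: brute-force similarity search over in-memory vectors.
public class VectorSearchSketch {

    /** Cosine similarity between two non-zero vectors of equal length. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the ids of the topK chunks most similar to the query, best match first. */
    public static List<String> search(double[] query, Map<String, double[]> chunkVectors, int topK) {
        return chunkVectors.entrySet().stream()
                // Negate the score so that the highest similarity sorts first.
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(query, e.getValue())))
                .limit(topK)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // Made-up vectors standing in for embedded chunks.
        Map<String, double[]> index = Map.of(
                "penalties", new double[] {0.9, 0.1, 0.0},
                "definitions", new double[] {0.1, 0.9, 0.0},
                "commencement", new double[] {0.0, 0.1, 0.9});
        double[] query = {1.0, 0.2, 0.0}; // stands in for the embedded user question
        System.out.println(search(query, index, 2)); // prints [penalties, definitions]
    }
}
```

A dedicated index such as Lucene avoids scanning every vector, but the ranking idea is the same: the chunks whose vectors point in the most similar direction to the query are returned to the LLM.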
RAG is configured in RagConfiguration.java:
@Bean
LuceneSearchOperations luceneSearchOperations(
        ModelProvider modelProvider,
        RagbotProperties properties) {
    var embeddingService = modelProvider.getEmbeddingService(DefaultModelSelectionCriteria.INSTANCE);
    var luceneSearchOperations = LuceneSearchOperations
            .withName("docs")
            .withEmbeddingService(embeddingService)
            .withChunkerConfig(properties.chunkerConfig())
            .withIndexPath(Paths.get("./.lucene-index"))
            .buildAndLoadChunks();
    return luceneSearchOperations;
}

Key aspects:
- Lucene with disk persistence: The vector index is stored at ./.lucene-index, surviving application restarts
- Embedding service: Uses the configured ModelProvider to get an embedding service for vectorizing content
- Configurable chunking: Content is split into chunks with configurable size (default 800 chars), overlap (default 50 chars), and optional section title inclusion
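To make the size and overlap settings concrete, here is a minimal sketch of fixed-window chunking with overlap. It is a simplification under assumptions: `ChunkerSketch` is a hypothetical class, it splits purely by character count, and the project's actual chunker (which also works with document sections and titles) may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical character-window chunker; not the chunker used by the project.
public class ChunkerSketch {

    /**
     * Splits text into windows of at most maxChunkSize characters, where each
     * window repeats the last overlapSize characters of the previous one.
     */
    public static List<String> chunk(String text, int maxChunkSize, int overlapSize) {
        if (maxChunkSize <= 0 || overlapSize < 0 || overlapSize >= maxChunkSize) {
            throw new IllegalArgumentException("need 0 <= overlapSize < maxChunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = maxChunkSize - overlapSize; // how far each window advances
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }
}
```

With a size of 800 and an overlap of 50, a 2000-character text yields three chunks starting at offsets 0, 750 and 1500. The overlap means a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.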
Chunking properties can be configured via application.yml:
ragbot:
  chunker-config:
    max-chunk-size: 800
    overlap-size: 100

The chatbot is created in ChatConfiguration.java:
@Bean
Chatbot chatbot(AgentPlatform agentPlatform) {
    return AgentProcessChatbot.utilityFromPlatform(agentPlatform);
}

The AgentProcessChatbot.utilityFromPlatform() method creates a chatbot that automatically discovers all @Action
methods in @EmbabelComponent classes. Any action with a matching trigger becomes eligible to be called when
appropriate messages arrive.
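The idea of discovering annotated methods and invoking those whose trigger type matches an incoming message can be sketched with plain Java reflection. Everything here is hypothetical and exists only for illustration: `OnTrigger`, `UserText`, `SystemNotice` and `DemoActions` are invented names, and this is not how Embabel implements @Action discovery or dispatch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class TriggerDispatchSketch {

    // Hypothetical stand-in for an action annotation; NOT Embabel's @Action.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface OnTrigger {
        Class<?> value();
    }

    // Hypothetical event types.
    public record UserText(String content) {}
    public record SystemNotice(String content) {}

    // Hypothetical component holding two actions with different triggers.
    public static class DemoActions {
        public final List<String> log = new ArrayList<>();

        @OnTrigger(UserText.class)
        public void respond(UserText message) {
            log.add("respond:" + message.content());
        }

        @OnTrigger(SystemNotice.class)
        public void audit(SystemNotice notice) {
            log.add("audit:" + notice.content());
        }
    }

    /** Invokes every annotated public method whose trigger type matches the event; returns the count. */
    public static int dispatch(Object component, Object event) {
        int invoked = 0;
        for (Method method : component.getClass().getMethods()) {
            OnTrigger trigger = method.getAnnotation(OnTrigger.class);
            if (trigger != null && trigger.value().isInstance(event)) {
                try {
                    method.invoke(component, event);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("action failed: " + method.getName(), e);
                }
                invoked++;
            }
        }
        return invoked;
    }
}
```

Dispatching a `UserText` event runs only `respond`, and an event of an unrelated type runs nothing. The component never registers its methods explicitly, which is the property the paragraph above describes.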
Chat actions are defined in ChatActions.java:
@EmbabelComponent
public class ChatActions {

    private final ToolishRag toolishRag;
    private final RagbotProperties properties;

    public ChatActions(SearchOperations searchOperations, RagbotProperties properties) {
        this.toolishRag = new ToolishRag(
                "sources",
                "Sources for answering user questions",
                searchOperations);
        this.properties = properties;
    }

    @Action(canRerun = true, trigger = UserMessage.class)
    void respond(Conversation conversation, ActionContext context) {
        var assistantMessage = context.ai()
                .withLlm(properties.chatLlm())
                .withReference(toolishRag)
                .withTemplate("ragbot")
                .respondWithSystemPrompt(conversation, Map.of(
                        "properties", properties
                ));
        context.sendMessage(conversation.addMessage(assistantMessage));
    }
}

Key concepts:
- @EmbabelComponent: Marks the class as containing agent actions that can be discovered by the platform
- @Action annotation:
  - trigger = UserMessage.class: This action is invoked whenever a UserMessage is received in the conversation
  - canRerun = true: The action can be executed multiple times (for each user message)
- ToolishRag as LLM reference:
  - Wraps the SearchOperations (Lucene index) as a tool the LLM can use
  - When .withReference(toolishRag) is called, the LLM can search the RAG store to find relevant content
  - The LLM decides when to use this tool based on the user's question
- Response flow:
  - User sends a message (triggering the action)
  - The action builds an AI request with the RAG reference
  - The LLM may call the RAG tool to retrieve relevant chunks
  - The LLM generates a response using retrieved context
  - The response is added to the conversation and sent back
Chatbot prompts are managed using Jinja templates rather than inline strings. This is best practice for chatbots because:
- Prompts grow complex: Chatbots require detailed system prompts covering persona, guardrails, objectives, and behavior guidelines
- Separation of concerns: Prompt engineering can evolve independently from Java code
- Reusability: Common elements (guardrails, personas) can be shared across different chatbot configurations
- Configuration-driven: Switch personas or objectives via application.yml without code changes
The template system separates two concerns:
- Objective: What the chatbot should accomplish - the task-specific instructions and domain expertise (e.g., analyzing legal documents, answering technical questions)
- Voice: How the chatbot should communicate - the persona, tone, and style of responses (e.g., formal lawyer, Shakespearean, sarcastic)
This separation allows mixing and matching. You could have a "legal" objective answered in the voice of Shakespeare, Monty Python, or a serious lawyer - without duplicating the legal analysis instructions in each persona template.
src/main/resources/prompts/
├── ragbot.jinja # Main template entry point
├── elements/
│ ├── guardrails.jinja # Safety and content restrictions
│ └── personalization.jinja # Dynamic persona/objective loader
├── personas/ # HOW to communicate (voice/style)
│ ├── clause.jinja # Serious legal expert
│ ├── shakespeare.jinja # Elizabethan style
│ ├── monty_python.jinja # Absurdist humor
│ └── ...
└── objectives/ # WHAT to accomplish (task/domain)
└── legal.jinja # Legal document analysis
The main template ragbot.jinja composes the system prompt from reusable elements:
{% include "elements/guardrails.jinja" %}
{% include "elements/personalization.jinja" %}

The personalization.jinja template dynamically includes persona and objective based on configuration:
{% set persona_template = "personas/" ~ properties.voice().persona() ~ ".jinja" %}
{% include persona_template %}
{% set objective_template = "objectives/" ~ properties.objective() ~ ".jinja" %}
{% include objective_template %}

Templates are invoked using .withTemplate() and passing bindings:
context.ai()
        .withLlm(properties.chatLlm())
        .withReference(toolishRag)
        .withTemplate("ragbot")
        .respondWithSystemPrompt(
                conversation,
                Map.of(
                        "properties", properties
                ));

The properties object (a Java record) is accessible in templates. Jinjava supports calling record accessor methods
with properties.voice().persona() syntax for nested records.
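The way nested record accessors resolve to template paths can be mirrored in plain Java. This sketch assumes a record shape inferred from the accessor chain and the configuration keys shown in this document; `Props` and `Voice` are hypothetical stand-ins rather than the project's actual RagbotProperties definition, and the string concatenation simply imitates what the Jinja `~` operator does in personalization.jinja.

```java
public class TemplatePathSketch {

    // Hypothetical records shaped after properties.voice().persona() and properties.objective().
    public record Voice(String persona, int maxWords) {}
    public record Props(Voice voice, String objective) {}

    /** Builds the persona template path the same way the Jinja template concatenates it. */
    public static String personaTemplate(Props properties) {
        return "personas/" + properties.voice().persona() + ".jinja";
    }

    /** Builds the objective template path the same way the Jinja template concatenates it. */
    public static String objectiveTemplate(Props properties) {
        return "objectives/" + properties.objective() + ".jinja";
    }

    public static void main(String[] args) {
        Props properties = new Props(new Voice("clause", 30), "legal");
        System.out.println(personaTemplate(properties));   // prints personas/clause.jinja
        System.out.println(objectiveTemplate(properties)); // prints objectives/legal.jinja
    }
}
```

Because the path is derived from configuration values, a persona name with no matching file under prompts/personas/ would leave the include with nothing to resolve, so the configured name and the file name need to match exactly.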
To create a new persona, add a .jinja file to prompts/personas/ and reference it by name in application.yml.
See Configuration Reference for all available settings.
All configuration is externalized in application.yml, allowing behavior changes without code modifications.
ragbot:
  # RAG chunking settings
  chunker-config:
    max-chunk-size: 800 # Maximum characters per chunk
    overlap-size: 100 # Overlap between chunks for context continuity
  # LLM model selection and hyperparameters
  chat-llm:
    model: gpt-4.1-mini # Model to use for chat responses
    temperature: 0.0 # 0.0 = most deterministic, higher = more creative
  # Voice controls HOW the chatbot communicates
  voice:
    persona: clause # Which persona template to use (personas/*.jinja)
    max-words: 30 # Hint for response length
  # Objective controls WHAT the chatbot accomplishes
  objective: legal # Which objective template to use (objectives/*.jinja)
embabel:
  agent:
    shell:
      # Redirect logging during chat sessions
      redirect-log-to-file: true

When redirect-log-to-file is true, console logging is redirected to a file during chat sessions, providing a cleaner
chat experience. Logs are written to:
logs/chat-session.log
To monitor logs while chatting, open a separate terminal and tail the log file:
tail -f logs/chat-session.log

This is useful for debugging RAG retrieval, seeing which chunks are being returned, and monitoring LLM API calls.
To change the chatbot's personality, simply update the persona value:
ragbot:
  voice:
    persona: shakespeare # Now responds in Elizabethan English

To use a different LLM:
ragbot:
  chat-llm:
    model: gpt-4.1 # Use the larger GPT-4.1 instead
    temperature: 0.7 # More creative responses

No code changes required - just restart the application.
