
Commit 2fa19f2

Bug fixes and README update

1 parent 66753d7 commit 2fa19f2

8 files changed: +150 -18 lines

Dockerfile_parser_api
+1 -1

```diff
@@ -20,7 +20,7 @@ RUN pip install gradio==4.44.1 requests==2.32.3
 # Install cpu version for lightweight Docker Image
 RUN pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu

-ARG CACHEBUST=3
+ARG CACHEBUST=4
 RUN pip install git+https://github.com/bibekyess/FastRAG.git

 COPY fastrag/api.py /app/api.py
```
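Bumping `CACHEBUST` invalidates Docker's layer cache from that `ARG` onward, so the following `pip install` of the git dependency is re-run instead of being served from cache. The same effect can be had without editing the Dockerfile by passing the value at build time (the image tag here is illustrative):

```bash
# A changed build-arg value busts the cache for every later layer,
# forcing a fresh install of FastRAG from GitHub.
docker build --build-arg CACHEBUST=$(date +%s) \
  -f Dockerfile_parser_api -t fastrag-parser-api .
```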

README.md
+96 -4

```diff
@@ -1,11 +1,103 @@
 # FastRAG
-A simple RAG application that is optimized to run fast on general grade PCs

-# TODO
-- [ ] Make stubbs test

-`pip install llama-index-embeddings-huggingface` may install unnecessary nvidia-cuda libraries be careful to install cpuonly stuffs
```

The new README body:

# FastRAG

FastRAG is a simple Retrieval-Augmented Generation (RAG) application optimized for fast performance on general-grade PCs. It provides a chatbot interface that leverages vector-based search and large language models (LLMs) for answering questions and interacting with document-based data.

---

### 🚀 Launch API and Demo Locally

To get started with FastRAG locally, follow these steps:

1. Clone the repository:
```bash
git clone https://github.com/bibekyess/FastRAG.git
```

2. Navigate to the project directory:
```bash
cd FastRAG
```

3. Build and launch the containers:
```bash
docker compose up --build
```

This will start the FastRAG API and demo with all necessary services.
---

### 🛠️ API Endpoints

The FastRAG application exposes several API endpoints (a `curl` sketch follows the list):

1. **Get Conversation History**
   - **Method**: `GET`
   - **Endpoint**: `/conversation-history`
   - **Parameters**:
     - `collection_name` (str): Name of the collection to fetch history from.
     - `limit` (int): Number of history entries to return. Default is 10.

2. **Add to Conversation History**
   - **Method**: `POST`
   - **Endpoint**: `/conversation-history`
   - **Body**:
     - `collection_name` (str): Name of the collection to store the history in.
     - `query` (str): User input query.
     - `response_text` (str): AI response.

3. **Parse Document**
   - **Method**: `POST`
   - **Endpoint**: `/parse`
   - **Parameters**:
     - `file` (UploadFile): The document to be parsed.
     - `index_id` (str): Index name for the document. Default is `files`.
     - `splitting_type` (Literal['raw', 'md']): Splitting type for the document. Default is `raw` (splitting based on the chunk settings).

4. **Chat with the Bot**
   - **Method**: `POST`
   - **Endpoint**: `/chat`
   - **Body**:
     - `user_input` (str): The user's query.
     - `index_id` (str): The index to search. Default is `"files"`.
     - `llm_text` (str): The LLM model to use. Default is `"local"`.
     - `dense_top_k` (int): The number of top results to return from the vector search. Default is 5.
     - `upgrade_user_input` (bool): Whether to rewrite the user input using the conversation history. Default is `False`.
     - `stream` (bool): Flag to enable streaming of results. Default is `True`.
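A minimal smoke test of the `/chat` endpoint with `curl`, assuming the parser API's port 8090 from the compose file is published on localhost (the question text is only an example):

```bash
# Stream an answer from the default "files" index; --no-buffer prints
# tokens as they arrive instead of waiting for the full response.
curl --no-buffer -X POST http://localhost:8090/chat \
  -H "Content-Type: application/json" \
  -d '{
        "user_input": "What is this document about?",
        "index_id": "files",
        "llm_text": "local",
        "dense_top_k": 5,
        "upgrade_user_input": false,
        "stream": true
      }'
```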
### 🖥️ User Interface

- **Gradio UI**: FastRAG features a simple Gradio-based user interface for interacting with the chatbot.
- **Real-time Chat**: Users can upload a document and ask questions in real time; previous conversations are stored and used for context-based improvements. [Providing the option to upload a document is in progress.]

---

### 🗃️ Storage and Database

- **QdrantDB**: The vector embeddings and chatbot conversation history are stored in QdrantDB, which allows the chatbot to use previous conversation context for improved responses.
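Stored turns can be pulled back through the GET endpoint listed above; the collection name here is hypothetical:

```bash
curl "http://localhost:8090/conversation-history?collection_name=chat_history&limit=5"
```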
---

### ⚡ Model Backend

- **Model**: [bartowski/Llama-3.2-3B-Instruct-GGUF](https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF)

---

### ⏱️ Latency Tracking

- **UI Display**: The latency of the chatbot's response is displayed in the Gradio interface.
- **Logging**: Detailed logs of latency and other events are saved for debugging and performance monitoring.

---
### 🧾 Document Parsing Options

FastRAG offers two options for segmenting documents into chunks (a sliding-window sketch follows this section):

1. **Raw Format**: Allows experimenting with various chunk sizes, strides, and overlap settings for raw text parsing.
2. **Markdown Format**: Segments the document based on semantic information, creating more context-aware chunks.

---
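As an illustration of the raw option, a minimal sketch of sliding-window chunking; this is not the project's actual splitter, and the chunk size and overlap values are arbitrary:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters,
    so content cut at one boundary appears whole in a neighbouring chunk."""
    stride = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), stride)]

# Example: ~11 kB of text yields 24 overlapping 512-character chunks.
print(len(chunk_text("lorem ipsum " * 900)))
```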

__init__.py

Whitespace-only changes.

docker-compose.yaml
+2 -1

```diff
@@ -10,7 +10,8 @@ services:
     environment:
       - PARSER_API_URL=http://parser-api:8090/chat
       - CONVERSATION_HISTORY_URL=http://parser-api:8090/conversation-history
-
+    volumes:
+      - ./logs:/app/logs

   parser-api:
     build:
```
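With the new `./logs` bind mount, the chat log that `demo_ui.py` writes inside the container (see the `log_to_file` addition below) lands on the host, so it can be followed while the stack runs:

```bash
tail -f logs/chatbot_log.txt
```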

fastrag/api.py
+7 -7

```diff
@@ -27,10 +27,10 @@
 logger.addHandler(file_handler)

 # Set up stream handler to print logs to the terminal
-stream_handler = logging.StreamHandler()
-stream_handler.setLevel(logging.INFO)
-stream_handler.setFormatter(formatter)
-logger.addHandler(stream_handler)
+# stream_handler = logging.StreamHandler()
+# stream_handler.setLevel(logging.INFO)
+# stream_handler.setFormatter(formatter)
+# logger.addHandler(stream_handler)


 qdrant_url = os.getenv("QDRANT_URL", "http://0.0.0.0:6333")
@@ -154,7 +154,7 @@ async def parse(file: UploadFile = File(...), index_id: str="files", splitting_t
     return base_retriever.add_documents_to_index(documents=documents, index_id=index_id)


-def llamacpp_inference(prompt, n_predict=128, temperature=0.7, top_p=0.95, stop=None, stream=True):
+def llamacpp_inference(prompt, n_predict=512, temperature=0.7, top_p=0.95, stop=None, stream=True):
     url = os.getenv("LLAMACPP_URL", "http://localhost:8088/completion")

     payload = {
@@ -220,11 +220,11 @@ async def chat(request: ChatRequest):
         passed_llm_prompt = LLM_PROMPT.format(context_str=context_str, query_str=user_input)
         logger.info(f"passed llm prompt: {str(passed_llm_prompt)}")
         if stream:
-            streamer = llamacpp_inference(passed_llm_prompt, n_predict=200, temperature=0.3, stream=stream)
+            streamer = llamacpp_inference(passed_llm_prompt, n_predict=512, temperature=0.4, stream=stream)

             return StreamingResponse(streamer, media_type="text/plain")
         else:
-            response = llamacpp_inference(passed_llm_prompt, n_predict=200, temperature=0.3, stream=stream)
+            response = llamacpp_inference(passed_llm_prompt, n_predict=512, temperature=0.4, stream=stream)
             return {'response': response}
```
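The commit raises `n_predict` from 128/200 to 512 and nudges the chat temperature from 0.3 to 0.4. For reference, a minimal sketch of the request `llamacpp_inference` builds, assuming a llama.cpp server on the default URL from the code:

```python
import requests

# Non-streaming completion against the llama.cpp /completion endpoint;
# the field values mirror the new defaults from this commit.
payload = {
    "prompt": "Answer briefly: what does RAG stand for?",
    "n_predict": 512,    # max tokens to generate (was 128/200)
    "temperature": 0.4,
    "top_p": 0.95,
    "stream": False,
}
resp = requests.post("http://localhost:8088/completion", json=payload)
print(resp.json()["content"])
```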

fastrag/demo_ui.py
+20 -0

```diff
@@ -4,11 +4,27 @@
 import logging
 import time
 from time import perf_counter
+from datetime import datetime
+

 logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
 logger = logging.getLogger(__name__)


+def log_to_file(question, response, latency):
+    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+
+    log_entry = f"Timestamp: {timestamp}\n"
+    log_entry += f"Question: {question}\n"
+    log_entry += f"Response: {response}\n"
+    log_entry += f"Latency: {latency:.4f} seconds\n"
+    log_entry += "-" * 50 + "\n"
+
+    # Append mode, so earlier entries are kept across sessions
+    with open("./logs/chatbot_log.txt", "a") as file:
+        file.write(log_entry)
+
+
 def call_chat_api(user_input):
     url = os.getenv("PARSER_API_URL", "http://localhost:8080/chat")
     headers = {
@@ -20,6 +36,7 @@ def call_chat_api(user_input):
         "index_id": "files",
         "llm_text": "local",
         "dense_top_k": 4,
+        "upgrade_user_input": True,
         "stream": True
     }

@@ -52,6 +69,9 @@ def chat(chatbot_history):
     elapsed_time = end_time-start_time
     logger.info(f"Chat API executed in {elapsed_time:.4f} seconds")

+    # Log the question, response and latency to a .txt file
+    log_to_file(query, response_text, elapsed_time)
+
     yield chatbot_history, f"## Latency of Last Response: {elapsed_time:.4f} seconds"
```

fastrag/notebooks/qdrant_sandbox.ipynb
+21 -1

```diff
@@ -31,6 +31,21 @@
     "qdrant_database.create_collection(collection_name)"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "\n",
+    "# qdrant_database.qdrant_client.scroll(\n",
+    "# collection_name=collection_name,\n",
+    "# limit=100000,\n",
+    "# order_by=\"id\"\n",
+    "# )"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -40,14 +55,19 @@
     "qdrant_database.load_recent_responses(collection_name=collection_name, limit=20)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
     "# qdrant_database.add_response(collection_name, \"what are you doing?\", \"I am studying\")\n",
-    "qdrant_database.add_response(collection_name, \"tell me about Bib\", \"It is a country.\")"
+    "qdrant_database.add_response(collection_name, \"tell me about Bib\", \"It is a boy.\")"
    ]
   },
   {
```

fastrag/utilities/qdrant_database.py
+3 -4

```diff
@@ -48,11 +48,10 @@ def add_response(self, collection_name, query, response_text):
     def load_recent_responses(self, collection_name, limit: int=10):
         search_results = self.qdrant_client.scroll(
             collection_name = collection_name,
+            limit=10e16,  # Retrieve every point
             with_payload=True
-        )[0]
-
-        print(len(search_results))
-
+        )
+        search_results = search_results[0]
         return [response.payload for response in search_results[-limit:]]

     def delete_collection(self, collection_name):
```
