Task 2

A FastAPI-based Retrieval-Augmented Generation (RAG) service that combines document retrieval with text generation.

Step 1:

  1. Create a conda environment with the requirements.txt file

TIP: Check this example for how to use slurm to create a conda environment.

conda create -n rag python=3.10 -y
conda activate rag
git clone https://github.com/ed-aisys/edin-mls-25-spring.git
cd edin-mls-25-spring/task-2
pip install -r requirements.txt

  2. Run the service

python serving_rag.py

  3. Test the service
curl -X POST "http://localhost:8000/rag" -H "Content-Type: application/json" -d '{"query": "Which animals can hover in the air?"}'

Note:
If you encounter issues while downloading model checkpoints on a GPU machine, try the following workaround:

  1. Manually download the model on the host machine:
conda activate rag
huggingface-cli download <model_name>
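
The same pre-download can also be done from Python via the huggingface_hub library (a small sketch; keep <model_name> pointed at whatever model serving_rag.py actually loads):

from huggingface_hub import snapshot_download

snapshot_download(repo_id="<model_name>")   # stores the files in the local Hugging Face cache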

Step 2:

  1. Create a new script (Bash or Python) to test the service at different request rates. A reference implementation is TraceStorm; a minimal sketch is given below.
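
For reference, here is a minimal Python load generator (a sketch, not a TraceStorm replacement: it assumes the Step 1 service is running on localhost:8000, and the request rates, duration, and thread-pool size are arbitrary knobs to tune, not prescribed values):

# load_test.py (hypothetical name) -- open-loop load generator sketch
import concurrent.futures
import statistics
import time

import requests

URL = "http://localhost:8000/rag"
PAYLOAD = {"query": "Which animals can hover in the air?"}
REQUEST_RATES = [1, 2, 5, 10]   # requests per second to try
DURATION_S = 30                 # seconds to hold each rate

def send_request():
    start = time.perf_counter()
    resp = requests.post(URL, json=PAYLOAD, timeout=60)
    resp.raise_for_status()
    return time.perf_counter() - start

def run_at_rate(rate, duration_s):
    latencies = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
        futures = []
        for _ in range(rate * duration_s):
            futures.append(pool.submit(send_request))
            time.sleep(1.0 / rate)   # fire at a fixed rate, regardless of responses
        for f in concurrent.futures.as_completed(futures):
            latencies.append(f.result())
    return latencies

if __name__ == "__main__":
    for rate in REQUEST_RATES:
        lat = run_at_rate(rate, DURATION_S)
        print(f"rate={rate} req/s  mean={statistics.mean(lat):.2f}s  "
              f"p95={statistics.quantiles(lat, n=20)[18]:.2f}s")

Running the same script against both the original and the optimized service gives directly comparable latency numbers for Step 3.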

Step 3:

  1. Implement a request queue to handle concurrent requests

A potential design (a minimal sketch follows this list):

  * Create a request queue
  * Put incoming requests into the queue instead of processing them directly
  * Start a background thread that listens on the request queue
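
A minimal, self-contained sketch of this design is shown below. The names (request_queue, worker_loop, the dummy process_query) are illustrative, and process_query stands in for the actual retrieval + generation pipeline in serving_rag.py:

# queue_sketch.py (hypothetical name) -- request queue + background worker
import queue
import threading
import time

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
request_queue = queue.Queue()   # holds (query, result_holder, done_event) tuples

class QueryRequest(BaseModel):
    query: str

def process_query(query: str) -> str:
    # Stand-in for the real RAG pipeline (retrieval + generation).
    time.sleep(0.5)
    return f"answer to: {query}"

def worker_loop():
    # Background thread: take one request at a time off the queue, run the
    # pipeline, then wake up the handler that is waiting on the event.
    while True:
        query, holder, done = request_queue.get()
        holder["result"] = process_query(query)
        done.set()

threading.Thread(target=worker_loop, daemon=True).start()

@app.post("/rag")
def rag_endpoint(payload: QueryRequest):
    # Enqueue the request instead of processing it inline, then block until
    # the background worker has produced a result.
    done = threading.Event()
    holder = {}
    request_queue.put((payload.query, holder, done))
    done.wait()
    return {"query": payload.query, "result": holder["result"]}

It can be run with uvicorn (e.g. uvicorn queue_sketch:app --port 8000) and tested with the same curl command as in Step 1.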

  2. Implement a batch processing mechanism

  * Take up to MAX_BATCH_SIZE requests from the queue, or stop waiting once MAX_WAITING_TIME has elapsed
  * Process the batched requests together (a sketch follows)
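
A sketch of such a batching loop is shown below; it would replace the one-request-at-a-time worker_loop from the previous sketch. MAX_BATCH_SIZE and MAX_WAITING_TIME are tuning knobs, and process_batch is assumed to run the pipeline over a list of queries and return results in the same order:

import queue
import time

MAX_BATCH_SIZE = 8
MAX_WAITING_TIME = 0.1   # seconds to wait for a batch to fill up

def batch_worker_loop(request_queue, process_batch):
    while True:
        # Block until at least one request arrives, then start a batch.
        batch = [request_queue.get()]
        deadline = time.time() + MAX_WAITING_TIME
        # Keep pulling until the batch is full or the waiting budget expires.
        while len(batch) < MAX_BATCH_SIZE:
            remaining = deadline - time.time()
            if remaining <= 0:
                break
            try:
                batch.append(request_queue.get(timeout=remaining))
            except queue.Empty:
                break
        # Run the pipeline once over the whole batch, then wake each caller.
        queries = [query for query, _, _ in batch]
        results = process_batch(queries)
        for (query, holder, done), result in zip(batch, results):
            holder["result"] = result
            done.set()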

  3. Measure the performance of the optimized system compared to the original service

  4. Draw a conclusion