
ResponseError for using Llama model to evaluate #1539

Closed
RyanTree-HS opened this issue Oct 19, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@RyanTree-HS

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

https://www.youtube.com/watch?v=Ts2wDG6OEko&t=190s

This is the reference link. I copied exactly the same code to run; running Llama 3 with Ollama works, but when it reaches the evaluate function it gives:

Evaluating: 0%| | 0/2 [00:00<?, ?it/s]Exception raised in Job[0]: ResponseError()
Evaluating: 50%|█████████████████████████████████████████████████████████ | 1/2 [02:00<02:00, 120.95s/it]Exception raised in Job[1]: ResponseError()
Evaluating: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [02:11<00:00, 65.58s/it]

Code to Reproduce:

# Imports inferred from the referenced tutorial; exact module paths may vary by LangChain/Ragas version.
from datasets import load_dataset
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas import evaluate
from ragas.metrics import context_precision
from ragas.run_config import RunConfig

dataset = load_dataset("explodinggradients/amnesty_qa", "english_v2", trust_remote_code=True)
dataset_subset = dataset["eval"].select(range(2))

llm = ChatOllama(model="llama3")
embedding_model = OllamaEmbeddings(model="llama3")

result = evaluate(
    dataset=dataset_subset,
    llm=llm,
    embeddings=embedding_model,
    metrics=[context_precision],
    run_config=RunConfig(timeout=600.0, max_workers=128),
)

Ragas version:
Python version:


@RyanTree-HS added the bug label on Oct 19, 2024

dosubot bot commented Oct 19, 2024

The ResponseError() you're encountering during the evaluation phase with the Llama 3 model using ChatOllama and OllamaEmbeddings could be due to several factors, although specific causes aren't detailed in the Ragas documentation. Here are some potential areas to investigate:

  1. API Rate Limits: Ensure that the API you're using for the Llama 3 model isn't being rate-limited. The RunConfig settings with max_workers=128 might be too high, leading to too many concurrent requests. Try reducing max_workers to a lower number, such as 16 or 32, to see if this resolves the issue (see the sketch below).

  2. Timeout Settings: The timeout parameter is set to 600 seconds, which is quite high. While this allows for long-running requests, it might not be necessary. Consider reducing the timeout to a more typical value, like 180 seconds, to see if it affects the error.

  3. Error Handling: Check if there are any specific error messages or logs associated with the ResponseError(). These might provide more insight into what is causing the error.

  4. Dataset Compatibility: Although there are no known compatibility issues between the explodinggradients/amnesty_qa dataset and the Llama 3 model, ensure that the dataset is correctly formatted and compatible with the evaluation function.

  5. Model and Embedding Configuration: Double-check the configuration and initialization of ChatOllama and OllamaEmbeddings to ensure they are set up correctly for the Llama 3 model.

If these suggestions don't resolve the issue, consider isolating parts of the code to identify where the error originates, or consult any available logs for more detailed error information.
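
A minimal sketch of suggestions 1 and 2, reusing the llm, embedding_model, and dataset_subset objects from the reproduction code above; the reduced max_workers and timeout values are illustrative, not prescriptive:

from ragas import evaluate
from ragas.metrics import context_precision
from ragas.run_config import RunConfig

# Lower concurrency so a local Ollama server isn't flooded with parallel
# requests, and use a more typical per-request timeout.
conservative_config = RunConfig(timeout=180.0, max_workers=16)

result = evaluate(
    dataset=dataset_subset,          # same two-row subset as above
    llm=llm,                         # same ChatOllama instance as above
    embeddings=embedding_model,      # same OllamaEmbeddings instance as above
    metrics=[context_precision],
    run_config=conservative_config,
)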


@RyanTree-HS
Author

Device 0: NVIDIA GeForce RTX 4090 Laptop GPU, compute capability 8.9, VMM: yes
llm_load_tensors: ggml ctx size = 0.32 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CUDA_Host buffer size = 69.08 MiB
llm_load_tensors: CUDA0 buffer size = 1463.27 MiB
time=2024-10-19T22:29:30.355+08:00 level=INFO source=server.go:632 msg="waiting for server to become available" status="llm server loading model"
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 2560.00 MiB
llama_new_context_with_model: KV self size = 2560.00 MiB, K (f16): 1280.00 MiB, V (f16): 1280.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 0.81 MiB
llama_new_context_with_model: CUDA0 compute buffer size = 548.00 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 21.01 MiB
llama_new_context_with_model: graph nodes = 1095
llama_new_context_with_model: graph splits = 2
INFO [wmain] model loaded | tid="29352" timestamp=1729348170
time=2024-10-19T22:29:30.856+08:00 level=INFO source=server.go:637 msg="llama runner started in 0.75 seconds"
[GIN] 2024/10/19 - 22:29:31 | 200 | 1.195007s | 127.0.0.1 | POST "/api/generate"
[GIN] 2024/10/19 - 22:32:11 | 200 | 0s | 127.0.0.1 | GET "/"
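
As a sanity check outside of Ragas, a direct LangChain call can confirm the server responds; the import path below assumes langchain_community and is only illustrative:

from langchain_community.chat_models import ChatOllama

# Direct call to the local Ollama server, bypassing Ragas entirely.
llm = ChatOllama(model="llama3")
print(llm.invoke("Reply with the single word: pong").content)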

@jjmachan
Member

This is a duplicate of #1170, so I'm closing it for now.
Right now Ragas does not work well with Ollama, but some people have gotten it working. I'll take a deeper look this week and hopefully solve it.
