Support multi language testset generation #439

Closed
grauvictor opened this issue Jan 8, 2024 · 18 comments
@grauvictor (Contributor) commented Jan 8, 2024

Testset generation doesn't seem to support multiple languages.

One solution would be to use an adapt function to translate the generation prompts, the same way it is done for metrics:

```python
def adapt(self, language: str, cache_dir: t.Optional[str] = None) -> None:
    assert self.llm is not None, "LLM is not set"
    logger.info(f"Adapting Faithfulness metric to {language}")
    self.long_form_answer_prompt = self.long_form_answer_prompt.adapt(
        language, self.llm, cache_dir
    )
    self.nli_statements_message = self.nli_statements_message.adapt(
        language, self.llm, cache_dir
    )
```
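For illustration, the adaptation pattern proposed above can be sketched with stub classes (all names here, such as `FakeLLM` and `FaithfulnessLike`, are hypothetical and not ragas APIs): each prompt translates its text into the target language through an LLM, with an optional cache so repeated adaptations are free.

```python
# Illustrative sketch of the "adapt" pattern, NOT ragas code: a prompt
# translates its instruction into the target language via an LLM, with an
# optional cache so repeated adaptations reuse earlier results.
import typing as t


class FakeLLM:
    """Stub translator; a real implementation would call an actual LLM."""

    TRANSLATIONS = {("german", "Answer the question."): "Beantworte die Frage."}

    def translate(self, text: str, language: str) -> str:
        return self.TRANSLATIONS.get((language, text), text)


class Prompt:
    def __init__(self, instruction: str) -> None:
        self.instruction = instruction

    def adapt(self, language: str, llm: FakeLLM,
              cache: t.Optional[dict] = None) -> "Prompt":
        key = (language, self.instruction)
        if cache is not None and key in cache:
            return cache[key]  # reuse a previously adapted prompt
        adapted = Prompt(llm.translate(self.instruction, language))
        if cache is not None:
            cache[key] = adapted
        return adapted


class FaithfulnessLike:
    """Metric-like holder of prompts, mirroring the snippet above."""

    def __init__(self, llm: FakeLLM) -> None:
        self.llm = llm
        self.long_form_answer_prompt = Prompt("Answer the question.")

    def adapt(self, language: str, cache: t.Optional[dict] = None) -> None:
        assert self.llm is not None, "LLM is not set"
        self.long_form_answer_prompt = self.long_form_answer_prompt.adapt(
            language, self.llm, cache
        )


metric = FaithfulnessLike(FakeLLM())
metric.adapt("german")
print(metric.long_form_answer_prompt.instruction)  # Beantworte die Frage.
```

The same adapt-and-cache shape would apply to the question-evolution prompts used in testset generation.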

@jjmachan (Member) commented Jan 8, 2024

Yes, we are going to bring that same functionality to testset generation too. I'm glad you brought it up 🙂.
We also had a few ideas in mind about this; could we run them by you sometime when you're free?

@grauvictor (Contributor, author) commented Jan 8, 2024

> yes, we are going to bring that same functionality to testset generation too - I'm glad you brought it up 🙂. we also had a few ideas in our mind about this as well, could we run them by you sometime your free?

Thanks, of course :)

@Gr33nLight commented Feb 7, 2024

Hello! I stumbled onto the same issue; are there any updates on this, or maybe a workaround? Thanks!! @jjmachan @grauvictor
I'm specifically using generate_with_langchain_docs to generate the initial test data.

@drdsgvo commented Feb 12, 2024

I would also be interested in a multi-language feature, as I am working with German texts.
Is there any update on this issue?

@shahules786 (Member) commented Feb 12, 2024
Hey guys, yes, this is now possible with language adaptation for test generation. Just follow the guide here and enjoy :) Install ragas from source before doing so.
@Gr33nLight @drdsgvo @grauvictor

shahules786 self-assigned this Feb 12, 2024
@drdsgvo commented Feb 13, 2024

Thank you very much for your quick and helpful answer.
I tried it out with the German language and German Wikipedia articles.
Of the 6 answers generated, 3 are not usable:

- 1 answer to a generated conditional question is empty
- 1 answer to a generated conditional question is (in English!): "Sorry, I cannot translate a negative number. Please provide a valid input"
- 1 answer to a generated reasoning question is (in English!): "Sorry I cannot answer this question as the information provided in the context does not mention... <here comes a question-specific problem>"

The other 3 answers are in German and as expected. All questions are good.

But with 3 out of 6 generated answers being non-answers, the approach is not feasible as of now.
Any ideas?
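While the generation itself is being fixed, one possible stopgap (a hedged sketch, not part of ragas; the refusal patterns are illustrative and would need tuning) is to filter out generated rows whose answer is empty or an English-language refusal:

```python
# Sketch of a workaround: drop generated rows whose answer is empty or a
# canned English refusal ("Sorry, I cannot ..."). The pattern is illustrative
# and should be extended to match whatever refusals your model emits.
import re

REFUSAL_PATTERN = re.compile(r"^\s*sorry[,]?\s+i\s+cannot", re.IGNORECASE)


def is_usable(answer: str) -> bool:
    """Return True if the answer is non-empty and not a canned refusal."""
    if not answer or not answer.strip():
        return False
    return REFUSAL_PATTERN.match(answer) is None


rows = [
    {"answer": ""},
    {"answer": "Sorry, I cannot translate a negative number."},
    {"answer": "Die Hauptstadt von Deutschland ist Berlin."},
]
usable = [r for r in rows if is_usable(r["answer"])]
print(len(usable))  # 1
```

This only salvages the usable half of a run; it does not address why the refusals are generated in the first place.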

@shahules786 (Member) commented Feb 13, 2024

Hey @drdsgvo , which models are you using?

@drdsgvo commented Feb 13, 2024

> Hey @drdsgvo , which models are you using?

Just the default ones. I did not change anything; I copy-pasted the example code from your guide with some minor adaptations (like changing the language to 'german' instead of 'hindi').

@shahules786 (Member) commented Feb 13, 2024
Thanks, bro @drdsgvo, I will work on it. I would also love to chat with you to understand your application for it, if you're free sometime this week or later: calendly

@drdsgvo commented Feb 13, 2024

> Thanks, bro @drdsgvo , I will work on it. Would also love to chat with you to understand your application for it, if you're free sometime this week or later. calendly

With the synthetic questions and answers generated, we want to see whether we can train an LLM from scratch for question answering with a given context.

@shahules786 (Member) commented Feb 15, 2024

Hey @drdsgvo, I just raised a PR for #599 fixing some evolution flows, which solves this problem (90% fill rate). Here is a sample of Spanish data I generated from Wikipedia:
[Screenshot: sample of generated Spanish testset data]

@drdsgvo commented Feb 15, 2024

> Hey, @drdsgvo just raised a PR for #599 fixing some evolution flows which solves this problem (90% filling rate). Here is a sample of Spanish data I generated from Wikipedia

Thank you very much for your quick response and coding. I tried it (updated to the latest ragas version, no modifications in my code).
An exception is raised at line 75 of fe0bcc4.

Exception message:
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
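For context, that message is NumPy's standard error when a multi-element array is used where Python expects a single boolean, which suggests the fix in the library is to state the intent with `.any()` or `.all()` explicitly. A minimal reproduction:

```python
# Reproducing the error: using a multi-element NumPy array in a boolean
# context raises this exact ValueError.
import numpy as np

a = np.array([True, False])
try:
    if a:  # ambiguous: should "if a" mean any() or all()?
        pass
except ValueError as e:
    print(e)  # prints the truth-value ambiguity message

# The fix is to make the intent explicit:
print(a.any())  # True
print(a.all())  # False
```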

@c-yeh (Contributor) commented Feb 19, 2024

Facing the same problem as @drdsgvo, on Japanese docs.

@shahules786 (Member) commented Feb 19, 2024

Hey @c-yeh @drdsgvo, sorry guys, that was poor testing on my part. I have merged a fix for it. Can you please install it from source and try again? We will make a release later this week.

If you have any concerns or ideas on how to improve this feature, feel free to bug me here.

@c-yeh (Contributor) commented Feb 20, 2024

Thanks for the quick response, @shahules786.

Using 0.1.2.dev8+gc18c7f4, the generator.generate_with_langchain_docs() call seems to proceed, but it's been 30 minutes now and it neither errors out nor finishes. Is it supposed to take this long for test_size=10 (~1k docs in total)?

```python
testset = generator.generate_with_langchain_docs(
    documents=lc_docs,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
```

@shahules786 (Member) commented Feb 20, 2024

@c-yeh can you decrease the number of docs (to <100) and try again?
I would recommend gradually increasing the number of docs and the test size once you have some results.

@c-yeh (Contributor) commented Feb 20, 2024

@shahules786

> can you decrease the number of docs to (<100) and try again?

Tried with 50 docs, and generator.generate_with_langchain_docs() finished in ~4 min.
Edit: up to 200 docs finished in ~4 min;
500 docs + test_size=10 = 10 min;
500 docs + test_size=20 = 12 min.

So at least it no longer errors out like before. However, the time growth appears to be non-linear.

> I would recommend to gradually increase the number of docs and test size

Do you mean there is possibly a point beyond which the generation time starts to explode?
I think normally we'd expect a linear time increase with respect to test_size (and the number of docs, up to a limit), right?

What is a reasonable/normal time for the generation of test_size=10?

@c-yeh (Contributor) commented Feb 20, 2024

I found the reason for the slowness above and have raised an issue: #642
