Support multi language testset generation #439
Comments
yes, we are going to bring that same functionality to testset generation too - I'm glad you brought it up 🙂. |
Thx, of course :) |
Hello! I ran into the same issue. Are there any updates on this, or maybe a workaround? Thanks!! @jjmachan @grauvictor |
I would also be interested in a multi-language feature, as I am working with German texts. |
Hey guys, yes, this is now possible with language adaptation for test generation. Just follow the guide here and enjoy :) Install ragas from source before doing so. |
Thank you very much for your quick and helpful answer. One answer to a generated conditional question is empty. The other 3 answers are in German, as expected. All questions are good. But with 3 out of 6 generated answers being no-answers, the approach is not feasible as of now. |
Hey @drdsgvo , which models are you using? |
Just the default ones. I did not change anything and copy-pasted the example code from your guide with some minor adaptations (like changing the language to 'german' instead of 'hindi'). |
With the synthetic questions and answers generated, we want to see if we can train an LLM from scratch for question answering with a given context. |
Thank you very much for your quick response and coding. I tried it (updated to latest ragas version, no modifications in my code). Exception message: |
Facing the same problem as @drdsgvo on Japanese docs. |
@shahules786 Using testset = generator.generate_with_langchain_docs(
documents=lc_docs, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}) |
@c-yeh can you decrease the number of docs to (<100) and try again? |
Tried with 50 docs. So at least it seems not to error out like before. However, the time growth appears to be nonlinear.
Do you mean there is possibly a point over which the generation time starts to explode? What is a reasonable/normal time for the generation of |
I found the reason for the slowness above and have raised an issue: #642 |
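For anyone who wants to check whether generation time really grows nonlinearly with the number of documents, here is a minimal sketch. It is not part of ragas: `growth_exponent` is a hypothetical helper, and the timings in the example are made-up placeholders, not measurements from this thread. It fits a straight line to log(time) against log(number of docs); a slope near 1 suggests roughly linear growth, and a clearly larger slope suggests superlinear growth.

```python
import math
from typing import Sequence


def growth_exponent(sizes: Sequence[float], times: Sequence[float]) -> float:
    """Least-squares slope of log(time) versus log(size).

    Hypothetical helper for illustration only; it is not a ragas function.
    """
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs of equal length")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("sizes must not all be identical")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / var


# Placeholder numbers (assumed, not measured): doc counts and seconds per run.
doc_counts = [10, 20, 40]
seconds = [1.0, 4.0, 16.0]
exponent = growth_exponent(doc_counts, seconds)
print(f"estimated growth exponent: {exponent:.2f}")
```

To use it with real data, time a few `generate_with_langchain_docs` runs at different document counts and pass the recorded counts and durations in place of the placeholders.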
Testset generation doesn't seem to support multiple languages.
One solution would be to use an adapt function to translate the generation scripts in the same way as is done for metrics: ragas/src/ragas/metrics/_faithfulness.py, lines 203 to 212 in 27e48b0.
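To make the proposal concrete, here is a rough sketch of what such an adapt step could look like for generation prompts. Everything in it is an assumption for illustration: `PromptTemplate`, `adapt`, and `fake_translate` are hypothetical names and do not reproduce the ragas implementation referenced above. In a real setup the translation callable would wrap an LLM call; here it is a stand-in so the example runs without API keys.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class PromptTemplate:
    """Hypothetical stand-in for a generation prompt (not the ragas Prompt class)."""
    name: str
    instruction: str
    language: str = "english"


# A translator takes (text, target_language) and returns the translated text.
Translator = Callable[[str, str], str]
Cache = Dict[Tuple[str, str], PromptTemplate]


def adapt(prompt: PromptTemplate, language: str,
          translate: Translator, cache: Cache) -> PromptTemplate:
    """Return a copy of `prompt` in `language`, translating at most once per language."""
    key = (prompt.name, language)
    if key not in cache:
        cache[key] = replace(
            prompt,
            instruction=translate(prompt.instruction, language),
            language=language,
        )
    return cache[key]


def fake_translate(text: str, language: str) -> str:
    """Placeholder translator (assumption): a real one would call an LLM."""
    return f"[{language}] {text}"


question_prompt = PromptTemplate(
    name="seed_question",
    instruction="Generate a question from the given context.",
)
cache: Cache = {}
german_prompt = adapt(question_prompt, "german", fake_translate, cache)
print(german_prompt.instruction)
```

Caching the adapted prompt per (prompt name, language) pair means the translation cost is paid once rather than on every generation call, and the original English prompt is left unchanged.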