
[R-302] Support function-calling / json mode / structured generation for testset generation #1532

Open
ahgraber opened this issue Oct 18, 2024 · 2 comments

ahgraber (Contributor) commented Oct 18, 2024

Describe the Feature
Most service APIs now support enforcing output schemas through function calling, JSON mode, or structured generation.
It would be really useful to have an option that uses the service API to enforce schema constraints, rather than hoping the model's chat responses follow the expected format.

Why is the feature important for you?
With OpenAI, synthetic generation works flawlessly 99% of the time.
With Anthropic or Llama models, I get frequent parse errors, which trigger retries and ultimately fail. This burns a lot of tokens (and therefore $).
Concretely, generating a testset of 100 questions, gpt-4o-mini uses ~660k input tokens and produces ~13k output tokens. When I attempt to generate a testset from the same knowledge graph with Anthropic Claude 3.5 Sonnet, the generation fails with parse errors, but I still end up using ~850k input and ~22.5k output tokens due to the retries!

Additional context
Given that most of the responses are being parsed with Pydantic, it should be fairly trivial to turn the desired Pydantic object into a JSON Schema (hint: openai provides openai.pydantic_function_tool() to convert Pydantic models into the OpenAI-compatible subset of JSON Schema).
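
For illustration, a minimal sketch of that conversion (assuming the openai v1.x Python SDK and Pydantic v2; `QAPair` is a hypothetical stand-in for ragas' internal models, not something from the codebase):

```python
# A minimal sketch, assuming the openai v1.x SDK and Pydantic v2.
# QAPair is a hypothetical stand-in for one of ragas' internal models.
from openai import OpenAI, pydantic_function_tool
from pydantic import BaseModel, Field

class QAPair(BaseModel):
    question: str = Field(description="Generated question")
    answer: str = Field(description="Answer grounded in the source document")

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Generate one QA pair from: <document text>"}],
    tools=[pydantic_function_tool(QAPair)],  # Pydantic model -> OpenAI tool schema (strict mode)
    tool_choice="required",                  # force the model to call the tool
)
# With strict function calling, the arguments conform to the schema,
# so parsing back into the Pydantic model should not fail.
args = resp.choices[0].message.tool_calls[0].function.arguments
qa = QAPair.model_validate_json(args)
```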

R-302

@ahgraber ahgraber added the enhancement New feature or request label Oct 18, 2024
@dosubot dosubot bot added the module-testsetgen Module testset generation label Oct 18, 2024
jjmachan (Member) commented
@ahgraber thanks for the suggestion - we should definitely make this the default for the providers that support it

ref: https://python.langchain.com/v0.1/docs/modules/model_io/chat/structured_output/
something built on top of this should work
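
For example, a rough sketch on top of the linked LangChain API (assuming langchain-openai is installed; `QAPair` is again a hypothetical placeholder model):

```python
# A rough sketch using LangChain's structured-output wrapper, assuming
# langchain-openai is installed; QAPair is a hypothetical placeholder.
from langchain_openai import ChatOpenAI
from pydantic import BaseModel

class QAPair(BaseModel):
    question: str
    answer: str

llm = ChatOpenAI(model="gpt-4o-mini")
structured_llm = llm.with_structured_output(QAPair)  # uses function calling under the hood
qa = structured_llm.invoke("Generate one QA pair from: <document text>")
print(type(qa))  # -> QAPair, already parsed and validated
```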

also would love to chat sometime with you, Alex, and get more feedback. I've sent you an email to connect. Are you on Discord, btw?

cheers ❤️
Jithin

@jjmachan jjmachan added the linear Created by Linear-GitHub Sync label Oct 20, 2024
@jjmachan jjmachan changed the title Support function-calling / json mode / structured generation for testset generation [R-302] Support function-calling / json mode / structured generation for testset generation Oct 20, 2024
@jjmachan jjmachan added this to the v.26 milestone Oct 22, 2024
@jjmachan jjmachan self-assigned this Oct 22, 2024
@jjmachan jjmachan modified the milestones: v.26, v.27 Oct 28, 2024
ahgraber (Contributor, Author) commented
Related:

For evals, many of the prompts seem to request a numeric answer (Context Recall -> "Attributed=0/1", Context Precision -> "Verdict=0/1").
While structured generation would work here, perhaps an even better option would be constraining the output to just the tokens '0' and '1'?
OpenAI supports this with the logit_bias parameter (see the OpenAI docs and AAAzzam's Twitter thread); I'm not sure how it's integrated into LangChain/LlamaIndex, or whether it's supported for all/most models. A sketch of the idea follows below.
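
As a sketch of the logit_bias idea (assumptions: the openai SDK plus a tiktoken version that knows the gpt-4o tokenizer; the prompt is illustrative, not the actual ragas prompt):

```python
# A sketch of constraining output to the tokens '0' and '1' via logit_bias,
# assuming the openai SDK and tiktoken; the prompt is illustrative only.
import tiktoken
from openai import OpenAI

enc = tiktoken.encoding_for_model("gpt-4o-mini")
# Look up the token ids for '0' and '1' and bias them to the maximum (+100),
# which effectively restricts sampling to those tokens.
bias = {str(enc.encode(t)[0]): 100 for t in ("0", "1")}

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Can the statement be attributed to the context? Answer 0 or 1.",
    }],
    logit_bias=bias,
    max_tokens=1,  # a single token: the verdict
)
print(resp.choices[0].message.content)  # '0' or '1'
```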

@jjmachan jjmachan modified the milestones: v.27, v.28 Nov 4, 2024