[R-302] Support function-calling / json mode / structured generation for testset generation #1532
Labels: enhancement (New feature or request) · linear (Created by Linear-GitHub Sync) · module-testsetgen (Module testset generation)
Describe the Feature
Most service APIs now support enforcing schema outputs through function calling, json mode, or structured generation.
It would be very useful to have an option that uses the service API to enforce schema constraints, rather than hoping chat prompts follow the expected format.
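For illustration, a minimal sketch of what such an option could produce (the `response_format` shape follows OpenAI's structured-outputs API; the schema, model name, and `build_request` helper are hypothetical placeholders, and the payload is shown as a plain dict so the example stays self-contained):

```python
import json

# Illustrative schema for one generated test sample; in practice this would
# come from the Pydantic models testset generation already uses.
SAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "ground_truth": {"type": "string"},
    },
    "required": ["question", "ground_truth"],
    "additionalProperties": False,
}


def build_request(prompt: str, schema: dict) -> dict:
    """Build a chat-completion payload that asks the service API to enforce
    the schema, instead of hoping the model follows prompt instructions."""
    return {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
        # OpenAI's structured-outputs shape; other providers (e.g. tool use
        # on Anthropic) expose the same idea with different field names.
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "testset_sample",
                "schema": schema,
                "strict": True,
            },
        },
    }


payload = build_request("Generate one question from the given chunk.", SAMPLE_SCHEMA)
print(json.dumps(payload["response_format"], indent=2))
```

With this, the API itself guarantees the output parses against the schema, so the downstream Pydantic validation step cannot fail on malformed JSON.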
Why is the feature important for you?
With OpenAI, synthetic generation works flawlessly 99% of the time.
With Anthropic or Llama models, I get frequent parse errors, which trigger retries and ultimately fail. This burns a lot of tokens (and therefore money).
Concretely, generating a testset of 100 questions with gpt-4o-mini uses ~660k input tokens and produces ~13k output tokens. When I attempt to generate a testset from the same knowledge graph with Claude 3.5 Sonnet, generation fails with parse errors, but I still end up using ~850k input and ~22.5k output tokens because of the retries!
Additional context
Given that most of the responses are already parsed with Pydantic, it should be fairly trivial to turn the desired Pydantic object into a JSON schema (hint: openai provides `openai.pydantic_function_tool()` to convert Pydantic models to an OpenAI-compatible subset of JSON schema).
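As a sketch of the hint above: given a JSON schema (which Pydantic's `model_json_schema()` already emits for any model), wrapping it into an OpenAI-compatible function tool is only a few lines. The tool-definition field names below follow OpenAI's tool-calling API; the `pydantic_schema_to_tool` helper and the example schema are hypothetical stand-ins, shown without a `pydantic` dependency to keep the snippet self-contained:

```python
def pydantic_schema_to_tool(name: str, schema: dict, description: str = "") -> dict:
    """Wrap a JSON schema (e.g. from SomeModel.model_json_schema()) into an
    OpenAI-style function tool definition, roughly what
    openai.pydantic_function_tool() produces."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": schema,
            "strict": True,
        },
    }


# Example: the kind of schema Pydantic would emit for a simple QA model.
qa_schema = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "answer": {"type": "string"},
    },
    "required": ["question", "answer"],
    "additionalProperties": False,
}

tool = pydantic_schema_to_tool("qa_pair", qa_schema, "One generated QA pair")
print(tool["function"]["name"])  # → qa_pair
```

The resulting dict can be passed in the `tools` list of a chat-completion request, so the provider constrains generation to the schema instead of relying on prompt-following.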