Conversation

@ncurado ncurado commented Nov 19, 2025

Summary

This PR improves the OpenAI backend in UltraRAG by:

  1. Making the handling of unsupported parameters more robust for newer OpenAI models (e.g. reasoning models like o3-mini) by dropping parameters that those models do not accept.
  2. Adding a dedicated docs page for OpenAI backend usage, including configuration examples and notes on parameter compatibility.

Motivation

When running examples/rag_full.yaml with the OpenAI backend and newer models, I hit repeated 400 errors from the OpenAI Chat Completions API such as:

  • Unknown parameter: 'chat_template_kwargs'.
  • Unsupported parameter: 'top_p' is not supported with this model.

These errors occurred during the generation.generate step when calling client.chat.completions.create. They are not specific to my environment; they arise from the OpenAI API rejecting parameters that are either internal to local backends or not supported by certain models (notably reasoning models).

The goal of this PR is to make the OpenAI backend "just work" in these cases, while keeping behavior simple and predictable.


Changes

  1. OpenAI backend: drop unsupported parameters

File: servers/generation/src/generation.py

For backend == "openai":

  • Do not send chat_template_kwargs to OpenAI.
  • Drop top_p and top_k from sampling parameters for OpenAI backends to avoid unsupported_parameter errors from models like o3-mini.
  • Map max_tokens to max_completion_tokens, so existing configs keep working with models that require the newer parameter name.

The behavior for vLLM and HF backends is unchanged.
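The parameter handling described above can be sketched as a small filter applied just before calling client.chat.completions.create. This is an illustrative sketch only; the helper name and exact keys are assumptions, and the real logic lives in servers/generation/src/generation.py:

```python
def sanitize_openai_params(params: dict) -> dict:
    """Drop parameters the OpenAI Chat Completions API may reject.

    Illustrative sketch; not the literal code in generation.py.
    """
    cleaned = dict(params)
    # chat_template_kwargs is internal to local backends (vLLM/HF);
    # OpenAI rejects it with "Unknown parameter".
    cleaned.pop("chat_template_kwargs", None)
    # Some models (e.g. o3-mini) reject top_p/top_k with
    # an unsupported_parameter error.
    cleaned.pop("top_p", None)
    cleaned.pop("top_k", None)
    # Newer models expect max_completion_tokens instead of max_tokens.
    if "max_tokens" in cleaned:
        cleaned["max_completion_tokens"] = cleaned.pop("max_tokens")
    return cleaned
```

With this in place, a config carrying vLLM-style sampling parameters passes through unchanged for vLLM/HF, and only the OpenAI code path applies the filter.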

  2. New documentation: OpenAI backend usage

File: docs/openai-backend.md

Adds a new documentation page that covers:

  • Prerequisites and environment variables (LLM_API_KEY, RETRIEVER_API_KEY).
  • Example servers/generation/parameter.yaml for backend: openai.
  • Parameter compatibility section explaining that some models (e.g. o3-mini) do not support top_p, and that UltraRAG drops chat_template_kwargs, top_p and top_k for OpenAI backends.
  • Example of the 400 error that is avoided by this behavior.
  • Optional configuration for using OpenAI embeddings in the retriever.
  • A quick checklist for OpenAI-only setups.

Impact and compatibility

  • Existing users of the OpenAI backend should see fewer 400 errors, especially when using newer reasoning models or when top_p / top_k are configured in YAML.
  • For models that do support top_p/top_k, the current behavior is conservative: those params are not sent for OpenAI backends. If there is interest in exposing them conditionally per model, I'm happy to adjust based on maintainer feedback.
  • vLLM and HF backends are unaffected by these changes.

Testing

  • Ran ultrarag build and ultrarag run with examples/rag_full.yaml using the OpenAI backend.
  • Verified that the previous 400 errors (chat_template_kwargs, top_p) no longer occur and the pipeline runs to completion.
  • Confirmed that the new docs are self-contained and do not modify project-level behavior.

If you'd like additional tests or want the docs integrated into a specific docs navigation system, I'm happy to follow your conventions.

@xhd0728 xhd0728 self-assigned this Nov 20, 2025

xhd0728 commented Nov 20, 2025

Thanks for raising this and for the clear explanation of your use case!

This issue is actually caused by the official OpenAI API rejecting certain unsupported sampling parameters, which leads to the upstream error you encountered. As a temporary workaround, you can manually edit the built *_parameters.yaml and remove the unsupported entries (e.g., top_k or chat_template_kwargs when using the OpenAI API).
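As a concrete illustration of that workaround, a built parameters file could be trimmed as follows. The field names here are assumptions for illustration; the exact keys depend on the contents of your built *_parameters.yaml:

```yaml
# Before: built *_parameters.yaml with vLLM-style sampling parameters
# sampling_params:
#   temperature: 0.7
#   top_p: 0.9
#   top_k: 40
#   chat_template_kwargs:
#     enable_thinking: false

# After: entries the OpenAI API rejects are removed by hand
sampling_params:
  temperature: 0.7
```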

Thanks again for your contribution to UltraRAG! :)
