align responses api payload to previous chat/completion by filip-komarzyniec · Pull Request #77 · IBM/ai4rag

filip-komarzyniec · 2026-06-24T10:06:10Z

Description

Per each generated rag pattern there is also a ready to use out-of-the-box payload for Responses API requests. In order for it to be reliable it has to be strictly aligned with what's being sent during each iteration of the AutoRAG optimization process to Chat/Completion endpoint.

Motivation

First phase of switching to Responses API (from Chat/Completion)
requests sent during optimization process (Chat/Completion) must match those sent afterwards (Responses)

Changes

modified responses_template key in each genrated rag pattern
exposed, in rag_params dict, temperature and max_completion_tokens parameters of the used generation models
related tests changes

Testing

How did you test this?

Checklist

Tests added/updated
Documentation updated
Code follows style guide
All checks passing

…to align with previous payload of chat/completion; relevant tests changes Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

…nker, deleted neural weights and unnecessary fields) Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com> Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com> Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

…search cases; related tests update Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com> Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

Mateusz-Switala

lgtm

LukaszCmielowski · 2026-06-24T13:17:38Z

AI-assisted review: #77 — Responses API payload alignment

Blocking

1. Tests vs implementation drift

On cc7c1f6, pattern_builder.py and test_pattern_builder.py disagree:

Test	Code
`ranking_options["max_num_results"]` in `test_adds_responses_template`	`max_num_results` on `tools[0]` only; `ranking_options` has `weights`
`test_simple_retrieval`: `ranker: "auto"`, `neural: 0.0`	`else` branch: `weights: {vector, keyword}` only
Hybrid tests: `max_num_results` in `ranking_options`	RRF/weighted branches omit it

Align to one schema (suggest: max_num_results on tool; ranking_options for ranker fields only) and run pytest tests/unit/ai4rag/assets_generator/test_pattern_builder.py.

2. Export parity — PR #75 rules not exported

This PR maps only system_message_text into system input. #75 puts RAG rules (grounding, citations, English-only) in user_message_text. Exported Responses payload will drop them unless we also fix content placement:

Option A: Move rule text to system_message_text in correct prompt templates #75; export picks it up here.
Option B: Merge static user_message_text prefix (before Documents: / {reference_documents}) into exported system input.

Without A or B, this fixes payload shape, not prompt parity.

3. Document HPO vs Responses contract

Phase	System	User
HPO	`system_message_text`	Full `user_message_text` (rules + chunks + question)
Responses	system `input`	Question only; chunks via `file_search`

Add a brief comment in pattern_builder.py — byte-for-byte parity is not expected; static instructions must match.

Suggestions

Confirm no consumers still read responses_template.instructions (breaking change).
Clarify whether tool_choice.tools: [{}] is intentional.
Add a test fixture with correct prompt templates #75-shaped user_message_text; assert exported system input contains grounding + citation keywords (fails today; passes after Option A/B).

Merge order

Merge correct prompt templates #75 (prompt content).
Rebase align responses api payload to previous chat/completion #77, fix test drift, add export parity (A or B).
Diff exported pattern.json vs HPO messages for static instruction parity.

filip-komarzyniec · 2026-06-24T14:15:59Z

@LukaszCmielowski:

Ad.1) cc7c1f6 does not exist, most likely it has already been fixed. If there was a difference, the repo-level checks would fail,

Ad.2) How can I incorporate things from a not yet merged PR? Please propose the desired merging order

Ad.3) I'm not really following. The only "contract" between HPO and Responses phases is that the Responses call should resemble as closely as possible the chat/completion requests made during HPO phase. It has been achieved based on the multiple dedicated jira efforts. The instruction also do match (to most possible extent). What exactly should I document then?

As per the suggestions:

AFAIK there are no consumers of responses_template from the generated patterns,
tool_choice.tools: [{}] -- it works (tool usage is enforce) and it's taken from the OGX API docs. I'm not sure what's the intent behind such format. The docs are very vague and I've opened a separate issue for that: Improve the documented API schema ogx-ai/ogx#6176

-

Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com> Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com> Assisted-by: Claude Code Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>

Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com> Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Co-authored-by: Jakub Walaszczyk <jwalaszc@redhat.com> Assisted-by: Claude Code Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>

…entation publish script. Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com> Assisted-by: Claude Code Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>

This reverts commit 5217636. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

…#83) * fix(assets): merge HPO user prompt rules into Responses export system input Map Chat Completions prompts to responses_template without losing HPO-tuned rules or duplicating OGX file_search runtime text. Add build_responses_system_input() to merge non-redundant static supplements from user_message_text (RAG rules, answer length, language policy) into input[role=system], while stripping {reference_documents}, {question}, and citation instructions that OGX injects via annotation_prompt_params. Wire build_pattern_json() to use the merged system text (detected_language is still applied on generation before export). Add unit tests for export parity across all default model families and legacy prompt patterns. RHOAIENG-71231 Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor * black formatting Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor * ruff fix Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor * fix(assets): add fallback for empty system input in Responses export Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor * apply the feedback from Mateusz fix(assets): address PR review for Responses export system input merge Reorganize OGX runtime phrase lists into citation, grounding, and file-search groups with derived combined views. Tighten grounding deduplication so persona-only system prompts no longer suppress valid user supplements (e.g. Granite RAG block). Simplify OGX runtime stripping to a single pass, collapse whitespace after phrase removal, and remove unused citation helpers. Add unit tests for hybrid alpha=1.0 fallthrough, partial OGX sentence removal, explicit vs persona grounding detection, export-parity system input, and required generation fields. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor * refactor(assets): address staff review on Responses export prompt mapping efactor(assets): address staff review on Responses export prompt mapping Remove unreachable citation dedup branch, simplify citation-line detection to use _CITATION_PREFIXES/_CITATION_SUBSTRINGS directly, and derive _system_has_grounding_policy from _GROUNDING_PREFIXES for a single source of truth. Rename _USER_RAG_SCAFFOLD_PREFIXES to _USER_RAG_GROUNDING_PREFIXES and document _join_answer_scaffold_blocks scope. Add direct tests for _is_placeholder_only_export and grounding-prefix coverage; remove duplicate empty-input test. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor --------- Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>

Mateusz-Switala · 2026-06-29T11:50:31Z

Claude Code Review: `RHOAIENG-71231` — Chat Completions → Responses API Parameter Mapping

code review

Reviewer role: Principal Staff Engineer
Branch: RHOAIENG-71231-define-the-proper-mapping-of-previous-chat-completions-requests-parameters-to-responses-requests-ones
Base: main

1. High-Level Summary

Goal: Map HPO (Hyperparameter Optimization) Chat Completions prompt parameters to the exported Responses API input[system] format. The core challenge is that HPO prompt templates embed OGX-owned runtime-injection strings (citation instructions, retrieval grounding rules, document slots) that must be stripped before being exported as a static Responses API system prompt — otherwise the API payload would duplicate text that OGX injects at inference time via file_search.

Scope:

~420 lines of new production code in pattern_builder.py (a new build_responses_system_input() entry-point + ~15 private helpers)
2-line change in experiment.py to propagate temperature and max_completion_tokens into the generation dict
~320 lines of new tests in test_pattern_builder.py

2. Confirmed Issues & Critical Risks

`[HIGH]` — `tool_choice` schema is structurally invalid

File: pattern_builder.py:460

"tool_choice": {"mode": "required", "tools": [{}], "type": "file_search"},

"tools": [{}] is an array containing one empty object. This matches no documented shape of the OpenAI/RHOAI Responses API. The tool_choice object for forced file-search should likely be {"type": "file_search"} (or whatever the deployment-specific schema requires). An empty dict inside the tools array will either silently be ignored or cause a 400-validation error from the API at runtime — with no test catching it since tests only assert the dict value, not validate it against the API spec.

Fix: Verify the exact tool_choice contract with the Responses API team and replace the literal. At minimum add an integration-level test or schema-validation fixture so this doesn't silently drift again.

`[HIGH]` — `_system_has_grounding_policy` uses substring (`in`) while all other checkers use prefix match

File: pattern_builder.py:303

normalized = system.lower()
return any(prefix.lower() in normalized for prefix in _GROUNDING_PREFIXES)

Every other function that checks _GROUNDING_PREFIXES (e.g. _sentence_is_ogx_duplicative, _should_skip_redundant_user_line) uses startswith. This one uses in — a full-text substring scan. A system prompt like "Use only relevant information from documents below the line." or "All documents do not contain user PII" would incorrectly register as having a grounding policy, causing user-template grounding lines to be over-suppressed.

Fix:

def _system_has_grounding_policy(system: str) -> bool:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", system)]
    return any(
        any(sent.lower().startswith(p.lower()) for p in _GROUNDING_PREFIXES)
        for sent in sentences
    )

`[HIGH]` — `_strip_ogx_runtime_instructions` double-replaces and risks over-stripping

File: pattern_builder.py:170-172

for phrase in _SYSTEM_GROUNDING_PHRASES:
    text = text.replace(phrase, "").replace(phrase.rstrip("."), "")

For phrase = "Answer using ONLY the provided documents.", after the first .replace(phrase, "") removes it (if present), the second .replace(phrase.rstrip("."), "") then scans for the substring "Answer using ONLY the provided documents" in the already-modified string. If another sentence like "Answer using ONLY the provided documents reliably and succinctly." exists, the rstrip version could partially match it, silently excising content mid-sentence.

Fix: Apply both replacements independently on the original text value before compounding:

for phrase in _SYSTEM_GROUNDING_PHRASES:
    text = text.replace(phrase, "")
    text = text.replace(phrase.rstrip("."), "")

Or, preferably, use a single regex with word-boundary anchoring to avoid mid-sentence partial matches.

`[HIGH]` — `temperature` and `max_completion_tokens` may be `None`; no guard in the builder

File: experiment.py:335-336, pattern_builder.py:458-459

# experiment.py
"temperature": foundation_model.params.temperature,
"max_completion_tokens": foundation_model.params.max_completion_tokens,

# pattern_builder.py
"max_output_tokens": generation["max_completion_tokens"],
"temperature": generation["temperature"],

If params.temperature or params.max_completion_tokens is None (e.g. for models with optional sampling), the Responses API will receive "temperature": null. The API contract for null-vs-omitted is often not the same. No test exercises the None case. The test_build_pattern_json_requires_generation_model_id test covers a missing model_id but not missing temperature/max_completion_tokens.

Fix: Guard in build_pattern_json:

if generation.get("temperature") is not None:
    rt["temperature"] = generation["temperature"]
if generation.get("max_completion_tokens") is not None:
    rt["max_output_tokens"] = generation["max_completion_tokens"]

Also add a test:

def test_omits_temperature_when_none():
    pattern = _make_pattern()
    pattern["settings"]["generation"]["temperature"] = None
    build_pattern_json(pattern)
    assert "temperature" not in pattern["settings"]["responses_template"]

`[MEDIUM]` — `_normalize_answer_scaffold` produces malformed output for non-canonical scaffold order

File: pattern_builder.py:159-162

def _normalize_answer_scaffold(line: str) -> str:
    normalized = line.replace(", with citations", "").replace("with citations", "")
    return _collapse_whitespace(normalized)

For "Answer (with citations, max 150 words):", the first replace removes nothing (the substring is "with citations," not ", with citations"), so the second replace removes "with citations", yielding "Answer (, max 150 words):" — a leading orphaned comma. The test only covers the canonical form "Answer (max 150 words, with citations):".

Fix: Use a regex:

def _normalize_answer_scaffold(line: str) -> str:
    normalized = re.sub(r",?\s*with citations,?\s*", "", line)
    return _collapse_whitespace(normalized)

`[MEDIUM]` — `ranker_alpha != 1` mixes integer and float comparison semantics

File: pattern_builder.py:482

elif search_mode == "hybrid" and ranker_strategy == "weighted" and ranker_alpha is not None and ranker_alpha != 1:

The comment immediately below says "ranker_alpha == 1.0 intentionally falls through" but the guard checks != 1 (integer literal). Python's 1.0 == 1 is True, so behaviour is correct, but a future reader (or linter rule) may flag this as a type inconsistency, and someone maintaining the logic may change it to != 1.0 believing they're doing a cleanup when they're not.

Fix: Use a float literal and a tolerance for floating-point safety:

import math
elif ... and ranker_alpha is not None and not math.isclose(ranker_alpha, 1.0):

`[MEDIUM]` — `_USER_RAG_GROUNDING_PREFIXES` and `_USER_GROUNDING_SKIP_PREFIXES` overlap with `_GROUNDING_PREFIXES` without documented rationale

File: pattern_builder.py:85-93

There are three separate prefix collections for grounding-like suppression with non-obvious differences in where each is applied:

_GROUNDING_PREFIXES: used in sentence-level and system detection
_USER_GROUNDING_SKIP_PREFIXES: a subset, used only in pass-2 user-line filtering
_USER_RAG_GROUNDING_PREFIXES: only suppressed in pass-1 when system_has_grounding is True

A new contributor has no way to know when to add a new OGX phrase to which list. The module-level comment says "update these lists" but doesn't explain the tripartite structure.

Fix: Add inline comments to each list explaining the specific pipeline stage where it applies, and why the membership criteria differ.

3. Maintainability & Readability

`[MEDIUM]` — Two-pass filtering bleeds concerns across pass boundaries

The docstring for build_responses_system_input describes a "Pass 1 / Pass 2" two-pass architecture. In practice, pass-1 (_filter_static_user_for_responses) also strips citation lines via _is_citation_related_line, which is a pass-2 concern. And pass-2 (_adapt_static_user_for_responses_export) also checks _USER_GROUNDING_SKIP_PREFIXES, which overlaps with pass-1 deduplication. The abstraction is leaky — what the caller sees as two distinct semantics is, at the line level, the same logic duplicated with slightly different predicate sets.

Recommendation: Either commit fully to the two-pass model with clear, non-overlapping predicate sets, or collapse to a single-pass predicate union and document the single comprehensive filter.

`[LOW]` — `_is_citation_related_line` checks `_CITATION_PREFIXES` then `_HPO_CITATION_FRAGMENTS` redundantly

File: pattern_builder.py:123-133

_CITATION_PREFIXES contains "You MUST cite sources". _HPO_CITATION_FRAGMENTS contains strings that all start with "You MUST cite sources ...". The startswith(_CITATION_PREFIXES) check on line 129 will always catch any _HPO_CITATION_FRAGMENTS hit before the second check on line 131 fires. The second check is unreachable for those fragments.

`[LOW]` — Module-level constants encode brittle external knowledge without a link to the source of truth

The extensive constant lists (_CITATION_PREFIXES, _FILE_SEARCH_MARKERS, etc.) are described as needing to stay in sync with benchmarking/rag/config.yaml. There is no automated check that these are in sync. Any OGX config change that adds a new injection phrase silently bypasses all filtering without any test failure.

Recommendation: Add a CI step or at least a comment pointing to the specific YAML keys that govern each list, so reviewers know what to check when config.yaml changes.

`[LOW]` — Test case numbering is out of order

File: test_pattern_builder.py, test_build_responses_system_input_handles_empty_inputs

The inline comments label the cases "Case 1", "Case 2", "Case 4", "Case 3" — Case 4 appears before Case 3 in source order. Minor, but breaks readability.

`[LOW]` — `build_pattern_json` mutates its input and returns it — the return value adds no information

The function signature implies a transformation returning a new value. In practice pattern is mutated in-place and then returned. The return value is the same object. Either the function should be renamed/documented to make the mutation explicit, or it should create and return a deep copy.

4. Test Coverage Gaps

Gap	Severity
`temperature=None` and `max_completion_tokens=None` in `generation`	High
`tool_choice` value not schema-validated against the actual API contract	High
`_normalize_answer_scaffold` with non-canonical argument order (e.g. `"Answer (with citations, max 150 words):"`)	Medium
`_system_has_grounding_policy` false-positive: system prompt contains `_GROUNDING_PREFIXES` text as a non-leading embedded substring	Medium
`_strip_ogx_runtime_instructions` over-stripping: a phrase partially matching `phrase.rstrip(".")` embedded mid-sentence	Medium
`ranker_alpha` passed as integer `1` (not float `1.0`)	Low
`_filter_ogx_duplicative_sentences` with a sentence ending in `?` or `!`	Low
`build_pattern_json` called with `detected_language` set — verifies that `build_responses_system_input` is called after `detected_language` is stored	Low

5. Actionable Fixes

Fix 1 — `tool_choice` schema (`HIGH`)

Rationale: Protects against a silent API contract violation.

# Current (line 460) — suspicious empty object in tools array:
"tool_choice": {"mode": "required", "tools": [{}], "type": "file_search"},

# Proposed — verify the actual API spec, then use explicit shape:
"tool_choice": {"type": "file_search"},  # or as per Responses API documentation

Fix 2 — `_system_has_grounding_policy` over-matching (`HIGH`)

Rationale: Prevents spurious suppression of user supplements when a system prompt mentions documents in a non-grounding context.

def _system_has_grounding_policy(system: str) -> bool:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", system)]
    return any(
        any(sent.lower().startswith(p.lower()) for p in _GROUNDING_PREFIXES)
        for sent in sentences
    )

Fix 3 — `_strip_ogx_runtime_instructions` double-replace risk (`HIGH`)

Rationale: Eliminates mid-sentence partial-match over-stripping.

for phrase in _SYSTEM_GROUNDING_PHRASES:
    text = text.replace(phrase, "")
    text = text.replace(phrase.rstrip("."), "")
text = _collapse_whitespace(text)

Fix 4 — Guard `None` temperature/tokens (`HIGH`)

Rationale: Prevents sending null to the Responses API for optional fields.

# In build_pattern_json, after building the base template dict:
if generation.get("temperature") is not None:
    pattern["settings"]["responses_template"]["temperature"] = generation["temperature"]
if generation.get("max_completion_tokens") is not None:
    pattern["settings"]["responses_template"]["max_output_tokens"] = generation["max_completion_tokens"]

Add corresponding test:

def test_omits_temperature_when_none():
    pattern = _make_pattern()
    pattern["settings"]["generation"]["temperature"] = None
    build_pattern_json(pattern)
    assert "temperature" not in pattern["settings"]["responses_template"]

Fix 5 — `_normalize_answer_scaffold` regex (`MEDIUM`)

Rationale: Handles non-canonical comma order without producing orphaned punctuation.

def _normalize_answer_scaffold(line: str) -> str:
    normalized = re.sub(r",?\s*with citations,?\s*", "", line)
    return _collapse_whitespace(normalized)

Fix 6 — `ranker_alpha` floating-point comparison (`MEDIUM`)

Rationale: Makes floating-point intent explicit and avoids future type-confusion bugs.

import math
elif (
    search_mode == "hybrid"
    and ranker_strategy == "weighted"
    and ranker_alpha is not None
    and not math.isclose(ranker_alpha, 1.0)
):

Fix 7 — Test case label order (`LOW`)

File: test_pattern_builder.py

Renumber the inline comments to # Case 1, # Case 2, # Case 3, # Case 4 in source order inside test_build_responses_system_input_handles_empty_inputs.

Fix 8 — `while "\n\n\n"` loop (`LOW`)

Rationale: Consistent with the existing re.sub in _collapse_whitespace; handles any run of 3+ newlines in one pass.

# Replace (pattern_builder.py:194-195):
while "\n\n\n" in result:
    result = result.replace("\n\n\n", "\n\n")

# With:
result = re.sub(r"\n{3,}", "\n\n", result)

…previous-chat-completions-requests-parameters-to-responses-requests-ones Resolve conflicts: - ai4rag/__init__.py: Use version 0.8.2 from main - pyproject.toml: Use docling-core 2.83.1 from main - pattern_builder.py: Remove detected_language parameter (handled in PR #81), keep PR #77 responses_template structure - uv.lock: Regenerate from main Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

- Replace language_autodetect=False with language='English' - Update test expectations to match unified RAG instruction format - Remove checks for deprecated model-specific fragments Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

Remove extra blank line between variable assignment and dict creation. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

Extract 260 lines of prompt filtering logic from pattern_builder.py into new prompt_filters.py module for better separation of concerns. Changes: - Create ai4rag/components/assets_generator/prompt_filters.py - Move all OGX deduplication constants and functions - Add comprehensive module docstring - Make functions public (remove underscore prefixes) - Update pattern_builder.py (487 → 316 lines, -35%) - Import filtering functions from prompt_filters - Remove duplicated filtering logic - Focus on pattern building responsibilities - Update tests to import from new module Benefits: - Clear separation: pattern building vs text filtering - Better maintainability: OGX phrase lists in one place - Easier testing: filter logic independently testable - Reduced cognitive load when reviewing pattern code All 35 tests passing. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

LukaszCmielowski · 2026-06-30T09:48:03Z

@Mateusz-Switala updates were made - please re-review.

Mateusz-Switala · 2026-06-30T10:51:21Z

@Mateusz-Switala updates were made - please re-review.

@LukaszCmielowski It seems that not all suggestions from #77 (comment) have been applied or addressed.

Critical fixes: - Remove legacy template support code (only support {reference_documents} format) - Fix _system_has_grounding_policy to use sentence-level startswith instead of substring Prevents false positives like "All documents do not contain PII" - Add None guards for temperature and max_completion_tokens Prevents sending null values to Responses API - Replace while loop with single-pass regex in prompt_filters.py Test coverage: - Add test_omits_temperature_when_none - Add test_omits_max_output_tokens_when_none - Add test_system_grounding_detection_no_false_positive_on_embedded_substring - Update test_extract_static_user_pure_text_no_slots for modern templates only - Rename test_build_responses_system_input_legacy_user_prefix to test_build_responses_system_input_strips_ogx_prefix All 38 tests passing. Black formatting passing. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

LukaszCmielowski · 2026-06-30T11:06:48Z

@Mateusz-Switala noticed that - the commit above should address it

Mateusz-Switala · 2026-07-01T08:21:37Z

@LukaszCmielowski I see that problem with tool_choice schema was not addressed, nor fixed applied. Could you at least explain why you think this is the correct schema and should stay?

Change headings metadata from list to string format using " > " separator. This makes the metadata more readable and consistent with other formats. Before: metadata["headings"] = ["Chapter 1", "Section 1.1"] After: metadata["headings"] = "Chapter 1 > Section 1.1" Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

Update tests to handle headings as string ("A > B") instead of list. Extract first heading from the string using .split(" > ")[0]. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

Applied 5 fixes from Mateusz's review comments: 1. Fix tool_choice schema (CRITICAL) - Changed from: {"mode": "required", "tools": [{}], "type": "file_search"} - Changed to: {"type": "file_search"} - Matches OGX SDK authoritative type definition 2. Update test assertion for tool_choice - Updated test to match corrected schema 3. Remove citation constant duplication (DRY) - Import _RAG_CITATION_INSTRUCTION from model_props.py - Remove duplicate HPO_CITATION_INSTRUCTION - Single source of truth for citation instruction 4. Document prefix constants - Add inline comments explaining usage of each constant group - Documents two-pass filtering architecture - Helps contributors understand which list to update 5. Fix normalize_answer_scaffold regex - Use regex to handle all comma orderings - Handles both ", with citations" and "with citations," correctly - No orphaned commas in output All 89 tests passing. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

…previous-chat-completions-requests-parameters-to-responses-requests-ones

Mateusz-Switala

lgtm

LukaszCmielowski · 2026-07-01T14:43:41Z

EXTENDED LEADERBOARD WITH RESPONSES API
============================================================================================================================================
  Pattern Retrieval  Chunks Faithfulness Answer Correctness LLM Judge Citation Rate Multilingual % Responses-LLMJudge
 Pattern2    simple      10        53.8%              50.6%     79.1%         63.6%          36.4%              76.8%
 Pattern4    simple      10        43.5%              49.2%     80.9%         59.1%           9.1%              77.1%
 Pattern3    simple      10        42.3%              44.0%     75.5%         68.2%          27.3%              77.0%
 Pattern7    simple       3        42.1%              36.5%     75.5%         63.6%          22.7%              78.2%
 Pattern1    simple      10        42.0%              47.8%     74.5%         77.3%          27.3%              75.2%
Pattern10    simple       5        40.9%              35.7%     74.5%         59.1%          31.8%              81.8%
 Pattern8    simple       3        40.7%              34.3%     76.4%         81.8%          31.8%              77.3%
 Pattern5    simple       3        40.5%              41.0%     74.5%         81.8%          31.8%              80.0%
 Pattern6    simple      10        40.1%              48.0%     77.3%         68.2%          40.9%              80.0%
 Pattern9    simple       5        38.8%              36.4%     74.5%         59.1%          36.4%              79.1%

============================================================================================================================================
COMPARISON: AI4RAG vs Responses API
============================================================================================================================================
AI4RAG LLM Judge (mean):     76.3%
Responses API LLM Judge:      78.3%
Delta:                        +2.0%

Responses API success rate:   96.4% (212/220 answers)

filip-komarzyniec requested review from Mateusz-Switala and jakub-walaszczyk June 24, 2026 10:06

Mateusz-Switala reviewed Jun 24, 2026

View reviewed changes

Comment thread ai4rag/components/assets_generator/pattern_builder.py Outdated

Comment thread ai4rag/components/assets_generator/pattern_builder.py Outdated

filip-komarzyniec force-pushed the RHOAIENG-71231-define-the-proper-mapping-of-previous-chat-completions-requests-parameters-to-responses-requests-ones branch 2 times, most recently from cc7c1f6 to beb3a1c Compare June 24, 2026 11:43

filip-komarzyniec requested a review from Mateusz-Switala June 24, 2026 11:45

filip-komarzyniec force-pushed the RHOAIENG-71231-define-the-proper-mapping-of-previous-chat-completions-requests-parameters-to-responses-requests-ones branch from ddf12c6 to 8420482 Compare June 24, 2026 12:01

Mateusz-Switala approved these changes Jun 24, 2026

View reviewed changes

jakub-walaszczyk and others added 5 commits June 24, 2026 21:33

Downgrade docling-core version (#78)

be54757

Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com> Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

Release 0.8.1

14bd1a9

Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com> Assisted-by: Claude Code Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>

feature: Update prompt templates (#75)

e336828

Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com> Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Co-authored-by: Jakub Walaszczyk <jwalaszc@redhat.com> Assisted-by: Claude Code Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>

ci: Remove mike dependency, documentation versioning and update docum…

b97a102

…entation publish script. Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com> Assisted-by: Claude Code Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>

Revert "feature: Update prompt templates (#75)"

71b5247

This reverts commit 5217636. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

LukaszCmielowski requested a review from Mateusz-Switala June 29, 2026 10:29

LukaszCmielowski force-pushed the RHOAIENG-71231-define-the-proper-mapping-of-previous-chat-completions-requests-parameters-to-responses-requests-ones branch from 42ee158 to 94086d3 Compare June 29, 2026 15:46

LukaszCmielowski added 4 commits June 30, 2026 10:19

style: fix black formatting

f0f298b

Remove extra blank line between variable assignment and dict creation. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

jakub-walaszczyk approved these changes Jun 30, 2026

View reviewed changes

Mateusz-Switala reviewed Jul 1, 2026

View reviewed changes

Comment thread ai4rag/components/assets_generator/prompt_filters.py Outdated

LukaszCmielowski and others added 4 commits July 1, 2026 12:52

test(chunking): update tests for string-based headings format

a9d89e4

Update tests to handle headings as string ("A > B") instead of list. Extract first heading from the string using .split(" > ")[0]. Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com> Assisted-by: Cursor

Merge branch 'main' into RHOAIENG-71231-define-the-proper-mapping-of-…

c032107

…previous-chat-completions-requests-parameters-to-responses-requests-ones

Mateusz-Switala self-requested a review July 1, 2026 14:40

Mateusz-Switala approved these changes Jul 1, 2026

View reviewed changes

LukaszCmielowski merged commit d86e672 into main Jul 1, 2026
4 checks passed

LukaszCmielowski deleted the RHOAIENG-71231-define-the-proper-mapping-of-previous-chat-completions-requests-parameters-to-responses-requests-ones branch July 1, 2026 14:44

Uh oh!

Conversation

filip-komarzyniec commented Jun 24, 2026

Description

Motivation

Changes

Testing

Checklist

Uh oh!

Uh oh!

Uh oh!

Mateusz-Switala left a comment

Choose a reason for hiding this comment

Uh oh!

LukaszCmielowski commented Jun 24, 2026

AI-assisted review: #77 — Responses API payload alignment

Blocking

Suggestions

Merge order

Uh oh!

filip-komarzyniec commented Jun 24, 2026

Uh oh!

Mateusz-Switala commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review: RHOAIENG-71231 — Chat Completions → Responses API Parameter Mapping

1. High-Level Summary

2. Confirmed Issues & Critical Risks

[HIGH] — tool_choice schema is structurally invalid

[HIGH] — _system_has_grounding_policy uses substring (in) while all other checkers use prefix match

[HIGH] — _strip_ogx_runtime_instructions double-replaces and risks over-stripping

[HIGH] — temperature and max_completion_tokens may be None; no guard in the builder

[MEDIUM] — _normalize_answer_scaffold produces malformed output for non-canonical scaffold order

[MEDIUM] — ranker_alpha != 1 mixes integer and float comparison semantics

[MEDIUM] — _USER_RAG_GROUNDING_PREFIXES and _USER_GROUNDING_SKIP_PREFIXES overlap with _GROUNDING_PREFIXES without documented rationale

3. Maintainability & Readability

[MEDIUM] — Two-pass filtering bleeds concerns across pass boundaries

[LOW] — _is_citation_related_line checks _CITATION_PREFIXES then _HPO_CITATION_FRAGMENTS redundantly

[LOW] — Module-level constants encode brittle external knowledge without a link to the source of truth

[LOW] — Test case numbering is out of order

[LOW] — build_pattern_json mutates its input and returns it — the return value adds no information

4. Test Coverage Gaps

5. Actionable Fixes

Fix 1 — tool_choice schema (HIGH)

Fix 2 — _system_has_grounding_policy over-matching (HIGH)

Fix 3 — _strip_ogx_runtime_instructions double-replace risk (HIGH)

Fix 4 — Guard None temperature/tokens (HIGH)

Fix 5 — _normalize_answer_scaffold regex (MEDIUM)

Fix 6 — ranker_alpha floating-point comparison (MEDIUM)

Fix 7 — Test case label order (LOW)

Fix 8 — while "\n\n\n" loop (LOW)

Uh oh!

LukaszCmielowski commented Jun 30, 2026

Uh oh!

Mateusz-Switala commented Jun 30, 2026

Uh oh!

LukaszCmielowski commented Jun 30, 2026

Uh oh!

Mateusz-Switala commented Jul 1, 2026

Uh oh!

Uh oh!

Mateusz-Switala left a comment

Choose a reason for hiding this comment

Uh oh!

LukaszCmielowski commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Mateusz-Switala commented Jun 29, 2026 •

edited

Loading

Claude Code Review: `RHOAIENG-71231` — Chat Completions → Responses API Parameter Mapping

`[HIGH]` — `tool_choice` schema is structurally invalid

`[HIGH]` — `_system_has_grounding_policy` uses substring (`in`) while all other checkers use prefix match

`[HIGH]` — `_strip_ogx_runtime_instructions` double-replaces and risks over-stripping

`[HIGH]` — `temperature` and `max_completion_tokens` may be `None`; no guard in the builder

`[MEDIUM]` — `_normalize_answer_scaffold` produces malformed output for non-canonical scaffold order

`[MEDIUM]` — `ranker_alpha != 1` mixes integer and float comparison semantics

`[MEDIUM]` — `_USER_RAG_GROUNDING_PREFIXES` and `_USER_GROUNDING_SKIP_PREFIXES` overlap with `_GROUNDING_PREFIXES` without documented rationale

`[MEDIUM]` — Two-pass filtering bleeds concerns across pass boundaries

`[LOW]` — `_is_citation_related_line` checks `_CITATION_PREFIXES` then `_HPO_CITATION_FRAGMENTS` redundantly

`[LOW]` — Module-level constants encode brittle external knowledge without a link to the source of truth

`[LOW]` — Test case numbering is out of order

`[LOW]` — `build_pattern_json` mutates its input and returns it — the return value adds no information

Fix 1 — `tool_choice` schema (`HIGH`)

Fix 2 — `_system_has_grounding_policy` over-matching (`HIGH`)

Fix 3 — `_strip_ogx_runtime_instructions` double-replace risk (`HIGH`)

Fix 4 — Guard `None` temperature/tokens (`HIGH`)

Fix 5 — `_normalize_answer_scaffold` regex (`MEDIUM`)

Fix 6 — `ranker_alpha` floating-point comparison (`MEDIUM`)

Fix 7 — Test case label order (`LOW`)

Fix 8 — `while "\n\n\n"` loop (`LOW`)