Skip to content

align responses api payload to previous chat/completion#77

Merged
LukaszCmielowski merged 18 commits into
mainfrom
RHOAIENG-71231-define-the-proper-mapping-of-previous-chat-completions-requests-parameters-to-responses-requests-ones
Jul 1, 2026
Merged

align responses api payload to previous chat/completion#77
LukaszCmielowski merged 18 commits into
mainfrom
RHOAIENG-71231-define-the-proper-mapping-of-previous-chat-completions-requests-parameters-to-responses-requests-ones

Conversation

@filip-komarzyniec

Copy link
Copy Markdown
Collaborator

Description

Per each generated rag pattern there is also a ready to use out-of-the-box payload for Responses API requests. In order for it to be reliable it has to be strictly aligned with what's being sent during each iteration of the AutoRAG optimization process to Chat/Completion endpoint.

Motivation

  • First phase of switching to Responses API (from Chat/Completion)
  • requests sent during optimization process (Chat/Completion) must match those sent afterwards (Responses)

Changes

  • modified responses_template key in each genrated rag pattern
  • exposed, in rag_params dict, temperature and max_completion_tokens parameters of the used generation models
  • related tests changes

Testing

How did you test this?

Checklist

  • Tests added/updated
  • Documentation updated
  • Code follows style guide
  • All checks passing

…to align with previous payload of chat/completion; relevant tests changes

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
Comment thread ai4rag/components/assets_generator/pattern_builder.py Outdated
Comment thread ai4rag/components/assets_generator/pattern_builder.py Outdated
…nker, deleted neural weights and unnecessary fields)

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
@filip-komarzyniec filip-komarzyniec force-pushed the RHOAIENG-71231-define-the-proper-mapping-of-previous-chat-completions-requests-parameters-to-responses-requests-ones branch 2 times, most recently from cc7c1f6 to beb3a1c Compare June 24, 2026 11:43
…search cases; related tests update

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

Signed-off-by: Filip Komarzyniec <fkomarzy@redhat.com>

rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
@filip-komarzyniec filip-komarzyniec force-pushed the RHOAIENG-71231-define-the-proper-mapping-of-previous-chat-completions-requests-parameters-to-responses-requests-ones branch from ddf12c6 to 8420482 Compare June 24, 2026 12:01

@Mateusz-Switala Mateusz-Switala left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@LukaszCmielowski

Copy link
Copy Markdown
Collaborator

AI-assisted review: #77 — Responses API payload alignment

Blocking

1. Tests vs implementation drift

On cc7c1f6, pattern_builder.py and test_pattern_builder.py disagree:

Test Code
ranking_options["max_num_results"] in test_adds_responses_template max_num_results on tools[0] only; ranking_options has weights
test_simple_retrieval: ranker: "auto", neural: 0.0 else branch: weights: {vector, keyword} only
Hybrid tests: max_num_results in ranking_options RRF/weighted branches omit it

Align to one schema (suggest: max_num_results on tool; ranking_options for ranker fields only) and run pytest tests/unit/ai4rag/assets_generator/test_pattern_builder.py.

2. Export parity — PR #75 rules not exported

This PR maps only system_message_text into system input. #75 puts RAG rules (grounding, citations, English-only) in user_message_text. Exported Responses payload will drop them unless we also fix content placement:

  • Option A: Move rule text to system_message_text in correct prompt templates #75; export picks it up here.
  • Option B: Merge static user_message_text prefix (before Documents: / {reference_documents}) into exported system input.

Without A or B, this fixes payload shape, not prompt parity.

3. Document HPO vs Responses contract

Phase System User
HPO system_message_text Full user_message_text (rules + chunks + question)
Responses system input Question only; chunks via file_search

Add a brief comment in pattern_builder.py — byte-for-byte parity is not expected; static instructions must match.


Suggestions

  • Confirm no consumers still read responses_template.instructions (breaking change).
  • Clarify whether tool_choice.tools: [{}] is intentional.
  • Add a test fixture with correct prompt templates #75-shaped user_message_text; assert exported system input contains grounding + citation keywords (fails today; passes after Option A/B).

Merge order

  1. Merge correct prompt templates #75 (prompt content).
  2. Rebase align responses api payload to previous chat/completion #77, fix test drift, add export parity (A or B).
  3. Diff exported pattern.json vs HPO messages for static instruction parity.

@filip-komarzyniec

Copy link
Copy Markdown
Collaborator Author

@LukaszCmielowski:

Ad.1) cc7c1f6 does not exist, most likely it has already been fixed. If there was a difference, the repo-level checks would fail,

Ad.2) How can I incorporate things from a not yet merged PR? Please propose the desired merging order

Ad.3) I'm not really following. The only "contract" between HPO and Responses phases is that the Responses call should resemble as closely as possible the chat/completion requests made during HPO phase. It has been achieved based on the multiple dedicated jira efforts. The instruction also do match (to most possible extent). What exactly should I document then?

As per the suggestions:

  • AFAIK there are no consumers of responses_template from the generated patterns,
  • tool_choice.tools: [{}] -- it works (tool usage is enforce) and it's taken from the OGX API docs. I'm not sure what's the intent behind such format. The docs are very vague and I've opened a separate issue for that: Improve the documented API schema ogx-ai/ogx#6176
Screenshot 2026-06-24 at 16 14 41 -

jakub-walaszczyk and others added 5 commits June 24, 2026 21:33
Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com>
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com>
Assisted-by: Claude Code
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com>
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Co-authored-by: Jakub Walaszczyk <jwalaszc@redhat.com>
Assisted-by: Claude Code
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
…entation publish script.

Signed-off-by: Jakub Walaszczyk <jwalaszc@redhat.com>
Assisted-by: Claude Code
Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
This reverts commit 5217636.

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
…#83)

* fix(assets): merge HPO user prompt rules into Responses export system input

Map Chat Completions prompts to responses_template without losing HPO-tuned
rules or duplicating OGX file_search runtime text.

Add build_responses_system_input() to merge non-redundant static supplements
from user_message_text (RAG rules, answer length, language policy) into
input[role=system], while stripping {reference_documents}, {question}, and
citation instructions that OGX injects via annotation_prompt_params.

Wire build_pattern_json() to use the merged system text (detected_language
is still applied on generation before export). Add unit tests for export
parity across all default model families and legacy prompt patterns.

RHOAIENG-71231

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

* black formatting

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

* ruff fix

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

* fix(assets): add fallback for empty system input in Responses export

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

* apply the feedback from Mateusz

fix(assets): address PR review for Responses export system input merge

Reorganize OGX runtime phrase lists into citation, grounding, and
file-search groups with derived combined views. Tighten grounding
deduplication so persona-only system prompts no longer suppress valid
user supplements (e.g. Granite RAG block). Simplify OGX runtime
stripping to a single pass, collapse whitespace after phrase removal,
and remove unused citation helpers.

Add unit tests for hybrid alpha=1.0 fallthrough, partial OGX sentence
removal, explicit vs persona grounding detection, export-parity system
input, and required generation fields.

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

* refactor(assets): address staff review on Responses export prompt mapping

efactor(assets): address staff review on Responses export prompt mapping
Remove unreachable citation dedup branch, simplify citation-line detection
to use _CITATION_PREFIXES/_CITATION_SUBSTRINGS directly, and derive
_system_has_grounding_policy from _GROUNDING_PREFIXES for a single source
of truth. Rename _USER_RAG_SCAFFOLD_PREFIXES to _USER_RAG_GROUNDING_PREFIXES
and document _join_answer_scaffold_blocks scope.
Add direct tests for _is_placeholder_only_export and grounding-prefix
coverage; remove duplicate empty-input test.

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor

---------

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
@Mateusz-Switala

Mateusz-Switala commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Claude Code Review: RHOAIENG-71231 — Chat Completions → Responses API Parameter Mapping

code review

Reviewer role: Principal Staff Engineer
Branch: RHOAIENG-71231-define-the-proper-mapping-of-previous-chat-completions-requests-parameters-to-responses-requests-ones
Base: main


1. High-Level Summary

Goal: Map HPO (Hyperparameter Optimization) Chat Completions prompt parameters to the exported Responses API input[system] format. The core challenge is that HPO prompt templates embed OGX-owned runtime-injection strings (citation instructions, retrieval grounding rules, document slots) that must be stripped before being exported as a static Responses API system prompt — otherwise the API payload would duplicate text that OGX injects at inference time via file_search.

Scope:

  • ~420 lines of new production code in pattern_builder.py (a new build_responses_system_input() entry-point + ~15 private helpers)
  • 2-line change in experiment.py to propagate temperature and max_completion_tokens into the generation dict
  • ~320 lines of new tests in test_pattern_builder.py

2. Confirmed Issues & Critical Risks

[HIGH]tool_choice schema is structurally invalid

File: pattern_builder.py:460

"tool_choice": {"mode": "required", "tools": [{}], "type": "file_search"},

"tools": [{}] is an array containing one empty object. This matches no documented shape of the OpenAI/RHOAI Responses API. The tool_choice object for forced file-search should likely be {"type": "file_search"} (or whatever the deployment-specific schema requires). An empty dict inside the tools array will either silently be ignored or cause a 400-validation error from the API at runtime — with no test catching it since tests only assert the dict value, not validate it against the API spec.

Fix: Verify the exact tool_choice contract with the Responses API team and replace the literal. At minimum add an integration-level test or schema-validation fixture so this doesn't silently drift again.


[HIGH]_system_has_grounding_policy uses substring (in) while all other checkers use prefix match

File: pattern_builder.py:303

normalized = system.lower()
return any(prefix.lower() in normalized for prefix in _GROUNDING_PREFIXES)

Every other function that checks _GROUNDING_PREFIXES (e.g. _sentence_is_ogx_duplicative, _should_skip_redundant_user_line) uses startswith. This one uses in — a full-text substring scan. A system prompt like "Use only relevant information from documents below the line." or "All documents do not contain user PII" would incorrectly register as having a grounding policy, causing user-template grounding lines to be over-suppressed.

Fix:

def _system_has_grounding_policy(system: str) -> bool:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", system)]
    return any(
        any(sent.lower().startswith(p.lower()) for p in _GROUNDING_PREFIXES)
        for sent in sentences
    )

[HIGH]_strip_ogx_runtime_instructions double-replaces and risks over-stripping

File: pattern_builder.py:170-172

for phrase in _SYSTEM_GROUNDING_PHRASES:
    text = text.replace(phrase, "").replace(phrase.rstrip("."), "")

For phrase = "Answer using ONLY the provided documents.", after the first .replace(phrase, "") removes it (if present), the second .replace(phrase.rstrip("."), "") then scans for the substring "Answer using ONLY the provided documents" in the already-modified string. If another sentence like "Answer using ONLY the provided documents reliably and succinctly." exists, the rstrip version could partially match it, silently excising content mid-sentence.

Fix: Apply both replacements independently on the original text value before compounding:

for phrase in _SYSTEM_GROUNDING_PHRASES:
    text = text.replace(phrase, "")
    text = text.replace(phrase.rstrip("."), "")

Or, preferably, use a single regex with word-boundary anchoring to avoid mid-sentence partial matches.


[HIGH]temperature and max_completion_tokens may be None; no guard in the builder

File: experiment.py:335-336, pattern_builder.py:458-459

# experiment.py
"temperature": foundation_model.params.temperature,
"max_completion_tokens": foundation_model.params.max_completion_tokens,

# pattern_builder.py
"max_output_tokens": generation["max_completion_tokens"],
"temperature": generation["temperature"],

If params.temperature or params.max_completion_tokens is None (e.g. for models with optional sampling), the Responses API will receive "temperature": null. The API contract for null-vs-omitted is often not the same. No test exercises the None case. The test_build_pattern_json_requires_generation_model_id test covers a missing model_id but not missing temperature/max_completion_tokens.

Fix: Guard in build_pattern_json:

if generation.get("temperature") is not None:
    rt["temperature"] = generation["temperature"]
if generation.get("max_completion_tokens") is not None:
    rt["max_output_tokens"] = generation["max_completion_tokens"]

Also add a test:

def test_omits_temperature_when_none():
    pattern = _make_pattern()
    pattern["settings"]["generation"]["temperature"] = None
    build_pattern_json(pattern)
    assert "temperature" not in pattern["settings"]["responses_template"]

[MEDIUM]_normalize_answer_scaffold produces malformed output for non-canonical scaffold order

File: pattern_builder.py:159-162

def _normalize_answer_scaffold(line: str) -> str:
    normalized = line.replace(", with citations", "").replace("with citations", "")
    return _collapse_whitespace(normalized)

For "Answer (with citations, max 150 words):", the first replace removes nothing (the substring is "with citations," not ", with citations"), so the second replace removes "with citations", yielding "Answer (, max 150 words):" — a leading orphaned comma. The test only covers the canonical form "Answer (max 150 words, with citations):".

Fix: Use a regex:

def _normalize_answer_scaffold(line: str) -> str:
    normalized = re.sub(r",?\s*with citations,?\s*", "", line)
    return _collapse_whitespace(normalized)

[MEDIUM]ranker_alpha != 1 mixes integer and float comparison semantics

File: pattern_builder.py:482

elif search_mode == "hybrid" and ranker_strategy == "weighted" and ranker_alpha is not None and ranker_alpha != 1:

The comment immediately below says "ranker_alpha == 1.0 intentionally falls through" but the guard checks != 1 (integer literal). Python's 1.0 == 1 is True, so behaviour is correct, but a future reader (or linter rule) may flag this as a type inconsistency, and someone maintaining the logic may change it to != 1.0 believing they're doing a cleanup when they're not.

Fix: Use a float literal and a tolerance for floating-point safety:

import math
elif ... and ranker_alpha is not None and not math.isclose(ranker_alpha, 1.0):

[MEDIUM]_USER_RAG_GROUNDING_PREFIXES and _USER_GROUNDING_SKIP_PREFIXES overlap with _GROUNDING_PREFIXES without documented rationale

File: pattern_builder.py:85-93

There are three separate prefix collections for grounding-like suppression with non-obvious differences in where each is applied:

  • _GROUNDING_PREFIXES: used in sentence-level and system detection
  • _USER_GROUNDING_SKIP_PREFIXES: a subset, used only in pass-2 user-line filtering
  • _USER_RAG_GROUNDING_PREFIXES: only suppressed in pass-1 when system_has_grounding is True

A new contributor has no way to know when to add a new OGX phrase to which list. The module-level comment says "update these lists" but doesn't explain the tripartite structure.

Fix: Add inline comments to each list explaining the specific pipeline stage where it applies, and why the membership criteria differ.


3. Maintainability & Readability

[MEDIUM] — Two-pass filtering bleeds concerns across pass boundaries

The docstring for build_responses_system_input describes a "Pass 1 / Pass 2" two-pass architecture. In practice, pass-1 (_filter_static_user_for_responses) also strips citation lines via _is_citation_related_line, which is a pass-2 concern. And pass-2 (_adapt_static_user_for_responses_export) also checks _USER_GROUNDING_SKIP_PREFIXES, which overlaps with pass-1 deduplication. The abstraction is leaky — what the caller sees as two distinct semantics is, at the line level, the same logic duplicated with slightly different predicate sets.

Recommendation: Either commit fully to the two-pass model with clear, non-overlapping predicate sets, or collapse to a single-pass predicate union and document the single comprehensive filter.


[LOW]_is_citation_related_line checks _CITATION_PREFIXES then _HPO_CITATION_FRAGMENTS redundantly

File: pattern_builder.py:123-133

_CITATION_PREFIXES contains "You MUST cite sources". _HPO_CITATION_FRAGMENTS contains strings that all start with "You MUST cite sources ...". The startswith(_CITATION_PREFIXES) check on line 129 will always catch any _HPO_CITATION_FRAGMENTS hit before the second check on line 131 fires. The second check is unreachable for those fragments.


[LOW] — Module-level constants encode brittle external knowledge without a link to the source of truth

The extensive constant lists (_CITATION_PREFIXES, _FILE_SEARCH_MARKERS, etc.) are described as needing to stay in sync with benchmarking/rag/config.yaml. There is no automated check that these are in sync. Any OGX config change that adds a new injection phrase silently bypasses all filtering without any test failure.

Recommendation: Add a CI step or at least a comment pointing to the specific YAML keys that govern each list, so reviewers know what to check when config.yaml changes.


[LOW] — Test case numbering is out of order

File: test_pattern_builder.py, test_build_responses_system_input_handles_empty_inputs

The inline comments label the cases "Case 1", "Case 2", "Case 4", "Case 3" — Case 4 appears before Case 3 in source order. Minor, but breaks readability.


[LOW]build_pattern_json mutates its input and returns it — the return value adds no information

The function signature implies a transformation returning a new value. In practice pattern is mutated in-place and then returned. The return value is the same object. Either the function should be renamed/documented to make the mutation explicit, or it should create and return a deep copy.


4. Test Coverage Gaps

Gap Severity
temperature=None and max_completion_tokens=None in generation High
tool_choice value not schema-validated against the actual API contract High
_normalize_answer_scaffold with non-canonical argument order (e.g. "Answer (with citations, max 150 words):") Medium
_system_has_grounding_policy false-positive: system prompt contains _GROUNDING_PREFIXES text as a non-leading embedded substring Medium
_strip_ogx_runtime_instructions over-stripping: a phrase partially matching phrase.rstrip(".") embedded mid-sentence Medium
ranker_alpha passed as integer 1 (not float 1.0) Low
_filter_ogx_duplicative_sentences with a sentence ending in ? or ! Low
build_pattern_json called with detected_language set — verifies that build_responses_system_input is called after detected_language is stored Low

5. Actionable Fixes

Fix 1 — tool_choice schema (HIGH)

Rationale: Protects against a silent API contract violation.

# Current (line 460) — suspicious empty object in tools array:
"tool_choice": {"mode": "required", "tools": [{}], "type": "file_search"},

# Proposed — verify the actual API spec, then use explicit shape:
"tool_choice": {"type": "file_search"},  # or as per Responses API documentation

Fix 2 — _system_has_grounding_policy over-matching (HIGH)

Rationale: Prevents spurious suppression of user supplements when a system prompt mentions documents in a non-grounding context.

def _system_has_grounding_policy(system: str) -> bool:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", system)]
    return any(
        any(sent.lower().startswith(p.lower()) for p in _GROUNDING_PREFIXES)
        for sent in sentences
    )

Fix 3 — _strip_ogx_runtime_instructions double-replace risk (HIGH)

Rationale: Eliminates mid-sentence partial-match over-stripping.

for phrase in _SYSTEM_GROUNDING_PHRASES:
    text = text.replace(phrase, "")
    text = text.replace(phrase.rstrip("."), "")
text = _collapse_whitespace(text)

Fix 4 — Guard None temperature/tokens (HIGH)

Rationale: Prevents sending null to the Responses API for optional fields.

# In build_pattern_json, after building the base template dict:
if generation.get("temperature") is not None:
    pattern["settings"]["responses_template"]["temperature"] = generation["temperature"]
if generation.get("max_completion_tokens") is not None:
    pattern["settings"]["responses_template"]["max_output_tokens"] = generation["max_completion_tokens"]

Add corresponding test:

def test_omits_temperature_when_none():
    pattern = _make_pattern()
    pattern["settings"]["generation"]["temperature"] = None
    build_pattern_json(pattern)
    assert "temperature" not in pattern["settings"]["responses_template"]

Fix 5 — _normalize_answer_scaffold regex (MEDIUM)

Rationale: Handles non-canonical comma order without producing orphaned punctuation.

def _normalize_answer_scaffold(line: str) -> str:
    normalized = re.sub(r",?\s*with citations,?\s*", "", line)
    return _collapse_whitespace(normalized)

Fix 6 — ranker_alpha floating-point comparison (MEDIUM)

Rationale: Makes floating-point intent explicit and avoids future type-confusion bugs.

import math
elif (
    search_mode == "hybrid"
    and ranker_strategy == "weighted"
    and ranker_alpha is not None
    and not math.isclose(ranker_alpha, 1.0)
):

Fix 7 — Test case label order (LOW)

File: test_pattern_builder.py

Renumber the inline comments to # Case 1, # Case 2, # Case 3, # Case 4 in source order inside test_build_responses_system_input_handles_empty_inputs.


Fix 8 — while "\n\n\n" loop (LOW)

Rationale: Consistent with the existing re.sub in _collapse_whitespace; handles any run of 3+ newlines in one pass.

# Replace (pattern_builder.py:194-195):
while "\n\n\n" in result:
    result = result.replace("\n\n\n", "\n\n")

# With:
result = re.sub(r"\n{3,}", "\n\n", result)

@LukaszCmielowski LukaszCmielowski force-pushed the RHOAIENG-71231-define-the-proper-mapping-of-previous-chat-completions-requests-parameters-to-responses-requests-ones branch from 42ee158 to 94086d3 Compare June 29, 2026 15:46
…previous-chat-completions-requests-parameters-to-responses-requests-ones

Resolve conflicts:
- ai4rag/__init__.py: Use version 0.8.2 from main
- pyproject.toml: Use docling-core 2.83.1 from main
- pattern_builder.py: Remove detected_language parameter (handled in PR #81), keep PR #77 responses_template structure
- uv.lock: Regenerate from main

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
- Replace language_autodetect=False with language='English'
- Update test expectations to match unified RAG instruction format
- Remove checks for deprecated model-specific fragments

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
Remove extra blank line between variable assignment and dict creation.

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
Extract 260 lines of prompt filtering logic from pattern_builder.py into
new prompt_filters.py module for better separation of concerns.

Changes:
- Create ai4rag/components/assets_generator/prompt_filters.py
  - Move all OGX deduplication constants and functions
  - Add comprehensive module docstring
  - Make functions public (remove underscore prefixes)

- Update pattern_builder.py (487 → 316 lines, -35%)
  - Import filtering functions from prompt_filters
  - Remove duplicated filtering logic
  - Focus on pattern building responsibilities

- Update tests to import from new module

Benefits:
- Clear separation: pattern building vs text filtering
- Better maintainability: OGX phrase lists in one place
- Easier testing: filter logic independently testable
- Reduced cognitive load when reviewing pattern code

All 35 tests passing.

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
@LukaszCmielowski

Copy link
Copy Markdown
Collaborator

@Mateusz-Switala updates were made - please re-review.

@Mateusz-Switala

Copy link
Copy Markdown
Collaborator

@Mateusz-Switala updates were made - please re-review.

@LukaszCmielowski It seems that not all suggestions from #77 (comment) have been applied or addressed.

Critical fixes:
- Remove legacy template support code (only support {reference_documents} format)
- Fix _system_has_grounding_policy to use sentence-level startswith instead of substring
  Prevents false positives like "All documents do not contain PII"
- Add None guards for temperature and max_completion_tokens
  Prevents sending null values to Responses API
- Replace while loop with single-pass regex in prompt_filters.py

Test coverage:
- Add test_omits_temperature_when_none
- Add test_omits_max_output_tokens_when_none
- Add test_system_grounding_detection_no_false_positive_on_embedded_substring
- Update test_extract_static_user_pure_text_no_slots for modern templates only
- Rename test_build_responses_system_input_legacy_user_prefix to
  test_build_responses_system_input_strips_ogx_prefix

All 38 tests passing. Black formatting passing.

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
@LukaszCmielowski

Copy link
Copy Markdown
Collaborator

@Mateusz-Switala noticed that - the commit above should address it

@Mateusz-Switala

Copy link
Copy Markdown
Collaborator

@LukaszCmielowski I see that problem with tool_choice schema was not addressed, nor fixed applied. Could you at least explain why you think this is the correct schema and should stay?

Comment thread ai4rag/components/assets_generator/prompt_filters.py Outdated
LukaszCmielowski and others added 4 commits July 1, 2026 12:52
Change headings metadata from list to string format using " > " separator.
This makes the metadata more readable and consistent with other formats.

Before: metadata["headings"] = ["Chapter 1", "Section 1.1"]
After:  metadata["headings"] = "Chapter 1 > Section 1.1"

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
Update tests to handle headings as string ("A > B") instead of list.
Extract first heading from the string using .split(" > ")[0].

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
Applied 5 fixes from Mateusz's review comments:

1. Fix tool_choice schema (CRITICAL)
   - Changed from: {"mode": "required", "tools": [{}], "type": "file_search"}
   - Changed to: {"type": "file_search"}
   - Matches OGX SDK authoritative type definition

2. Update test assertion for tool_choice
   - Updated test to match corrected schema

3. Remove citation constant duplication (DRY)
   - Import _RAG_CITATION_INSTRUCTION from model_props.py
   - Remove duplicate HPO_CITATION_INSTRUCTION
   - Single source of truth for citation instruction

4. Document prefix constants
   - Add inline comments explaining usage of each constant group
   - Documents two-pass filtering architecture
   - Helps contributors understand which list to update

5. Fix normalize_answer_scaffold regex
   - Use regex to handle all comma orderings
   - Handles both ", with citations" and "with citations," correctly
   - No orphaned commas in output

All 89 tests passing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Signed-off-by: Lukasz Cmielowski <lcmielow@redhat.com>
Assisted-by: Cursor
…previous-chat-completions-requests-parameters-to-responses-requests-ones
@Mateusz-Switala Mateusz-Switala self-requested a review July 1, 2026 14:40

@Mateusz-Switala Mateusz-Switala left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@LukaszCmielowski

Copy link
Copy Markdown
Collaborator
EXTENDED LEADERBOARD WITH RESPONSES API
============================================================================================================================================
  Pattern Retrieval  Chunks Faithfulness Answer Correctness LLM Judge Citation Rate Multilingual % Responses-LLMJudge
 Pattern2    simple      10        53.8%              50.6%     79.1%         63.6%          36.4%              76.8%
 Pattern4    simple      10        43.5%              49.2%     80.9%         59.1%           9.1%              77.1%
 Pattern3    simple      10        42.3%              44.0%     75.5%         68.2%          27.3%              77.0%
 Pattern7    simple       3        42.1%              36.5%     75.5%         63.6%          22.7%              78.2%
 Pattern1    simple      10        42.0%              47.8%     74.5%         77.3%          27.3%              75.2%
Pattern10    simple       5        40.9%              35.7%     74.5%         59.1%          31.8%              81.8%
 Pattern8    simple       3        40.7%              34.3%     76.4%         81.8%          31.8%              77.3%
 Pattern5    simple       3        40.5%              41.0%     74.5%         81.8%          31.8%              80.0%
 Pattern6    simple      10        40.1%              48.0%     77.3%         68.2%          40.9%              80.0%
 Pattern9    simple       5        38.8%              36.4%     74.5%         59.1%          36.4%              79.1%

============================================================================================================================================
COMPARISON: AI4RAG vs Responses API
============================================================================================================================================
AI4RAG LLM Judge (mean):     76.3%
Responses API LLM Judge:      78.3%
Delta:                        +2.0%

Responses API success rate:   96.4% (212/220 answers)

@LukaszCmielowski LukaszCmielowski merged commit d86e672 into main Jul 1, 2026
4 checks passed
@LukaszCmielowski LukaszCmielowski deleted the RHOAIENG-71231-define-the-proper-mapping-of-previous-chat-completions-requests-parameters-to-responses-requests-ones branch July 1, 2026 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants