fix: emit Langfuse generations for root RLM calls #99

Merged
namastex888 merged 1 commit into main from fix/rlmx-root-generation-langfuse-observations
May 29, 2026

Conversation

@namastex888
Collaborator

@namastex888 namastex888 commented May 29, 2026

Summary

  • emit Langfuse GENERATION create/update events for root RLM model calls
  • cover both normal root iterations and forced final-answer calls
  • include model, input/output, usageDetails, costDetails, and latency metadata
  • add regression coverage in recursive trace tests

Operator impact

Previously, root-only RLMX traces showed observations=0 in Langfuse even while local stats recorded real LLM work. Root model calls now appear in Langfuse as named generation observations:

  • Model call — root iteration N
  • Model call — forced final answer
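
To make the shape of these observations concrete, here is a minimal sketch of the create/update event pair for one root call. The envelope (`id`, `type`, `timestamp`, `body`) and the body field names follow the Langfuse ingestion event shape as assumed here and should be checked against the API reference. The helper names and argument shapes are hypothetical and are not taken from `src/langfuse.ts`.

```typescript
// Envelope shared by both events (assumed shape, see the note above).
interface IngestionEvent {
  id: string;
  type: "generation-create" | "generation-update";
  timestamp: string;
  body: Record<string, unknown>;
}

// Hypothetical helper: event queued just before the model call starts.
function buildGenerationCreate(args: {
  eventId: string;
  generationId: string;
  traceId: string;
  name: string;
  model: string;
  input: unknown;
  startedAt: Date;
}): IngestionEvent {
  const startTime = args.startedAt.toISOString();
  return {
    id: args.eventId,
    type: "generation-create",
    timestamp: startTime,
    body: {
      id: args.generationId,
      traceId: args.traceId,
      name: args.name,
      model: args.model,
      input: args.input,
      startTime,
    },
  };
}

// Hypothetical helper: event queued once the model call has returned.
function buildGenerationUpdate(args: {
  eventId: string;
  generationId: string;
  traceId: string;
  output: unknown;
  usageDetails: Record<string, number>;
  totalCost: number;
  startedAt: Date;
  endedAt: Date;
}): IngestionEvent {
  const endTime = args.endedAt.toISOString();
  return {
    id: args.eventId,
    type: "generation-update",
    timestamp: endTime,
    body: {
      id: args.generationId, // same observation id as the create event
      traceId: args.traceId,
      output: args.output,
      endTime,
      usageDetails: args.usageDetails,
      costDetails: { total: args.totalCost },
      // Latency travels as metadata, derived from the two timestamps.
      metadata: { durationMs: args.endedAt.getTime() - args.startedAt.getTime() },
    },
  };
}
```

The update event reuses the generation id from the create event, which is what lets the backend merge the two into a single observation.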

Verification

  • npm run build
  • node --test dist/tests/recursive-trace.test.js
  • focused suite: 15 pass / 0 fail
  • npm run check
  • npm test — 376 pass / 0 fail
  • live Langfuse smoke:
    • trace/run id: b5ebbd98-ad21-415e-8a66-86259cf7e9eb
    • project path: /project/cmpprjzwq001yvn07marcb6sg/traces/b5ebbd98-ad21-415e-8a66-86259cf7e9eb
    • generation observations: 2
    • Langfuse total cost: $0.01321075
    • model: anthropic/claude-opus-4-8

Notes

  • This PR intentionally only includes the root-generation trace hardening files and generated dist artifacts for those files.
  • The working tree has unrelated local changes from broader RLMX/University work that are not part of this PR.

Summary by CodeRabbit

  • New Features

    • Added recording of root generation lifecycle events for LLM calls with model, input, output, usage, and error tracking.
    • LLM calls are now instrumented with tracing integration.
  • Tests

    • Added test coverage for root generation tracing events.

@coderabbitai

coderabbitai Bot commented May 29, 2026

Walkthrough

This PR extends Langfuse tracing support in rlmx by adding root generation lifecycle event recording. It defines data contracts for root generation start/end payloads, implements corresponding methods in LangfuseTraceRecorder, integrates tracing into the main RLM iteration loop and forced-final-answer path, and validates event emission with a new test.

Changes

Root Generation Tracing

| Layer / File(s) | Summary |
| --- | --- |
| Root Generation Data Contract (src/langfuse.ts) | Exported RootGenerationStartData and RootGenerationEndData interfaces define the input payload structure for recording root generation lifecycle events, including model, input, iteration, output, duration, usage, and error metadata. |
| Root Generation Recorder Methods (src/langfuse.ts) | rootGenerationStart() and rootGenerationEnd() methods on LangfuseTraceRecorder generate IDs, enqueue generation-create and generation-update Langfuse events with timestamps, and compute token totals and cost aggregates from provided usage data. |
| Main Iteration Loop Tracing (src/rlm.ts) | Main RLM iteration loop wraps the llmComplete() call with rootGenerationStart() before execution and rootGenerationEnd() after, capturing model, message list, iteration number, response, duration, and usage stats. |
| Forced Final Answer Tracing (src/rlm.ts) | forceFinalAnswer() function signature extended to accept langfuse recorder and iteration parameter (default 0); the forced-final llmComplete() call is wrapped with root generation start/end using constructed messages and iteration metadata. |
| Root Generation Event Test (tests/recursive-trace.test.ts) | Test validates that LangfuseTraceRecorder emits correct generation-create and generation-update event payloads with trace/model linkage, input/output, token usage including cache read/write, and cost details. |
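
The "token totals and cost aggregates" step in the recorder row can be pictured with a small sketch. The `ModelUsage` field names, the `cache_read`/`cache_write` keys, and the choice to count cache tokens in the total are all assumptions made for illustration; the real mapping in `src/langfuse.ts` may differ.

```typescript
// Hypothetical usage shape; the field names in rlmx's real usage type may differ.
interface ModelUsage {
  inputTokens?: number;
  outputTokens?: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
  totalCost?: number;
}

// Map provider usage onto flat usageDetails / costDetails records.
// Missing fields default to 0 so a call without usage still yields a well-formed payload.
function summarizeUsage(usage: ModelUsage | undefined): {
  usageDetails: Record<string, number>;
  costDetails: Record<string, number>;
} {
  const input = usage?.inputTokens ?? 0;
  const output = usage?.outputTokens ?? 0;
  const cacheRead = usage?.cacheReadTokens ?? 0;
  const cacheWrite = usage?.cacheWriteTokens ?? 0;
  return {
    usageDetails: {
      input,
      output,
      cache_read: cacheRead,
      cache_write: cacheWrite,
      // Assumption: cache tokens are counted in the total.
      total: input + output + cacheRead + cacheWrite,
    },
    costDetails: { total: usage?.totalCost ?? 0 },
  };
}
```

Defaulting to zero keeps the payload shape stable, at the price of making "no usage reported" indistinguishable from "zero tokens used".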

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • automagik-dev/rlmx#96: Extends Langfuse recursive tracing with child generation lifecycle methods, providing complementary parent trace and span instrumentation that works alongside this PR's root generation tracing.

Poem

🐰 A root generation blooms so bright,
Langfuse captures every flight,
Loop and forced path both now trace,
Events flowing into place,
Testing ensures the metrics gleam! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: emit Langfuse generations for root RLM calls' directly and specifically describes the main change: adding Langfuse generation event emissions for root RLM model calls. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6d0c50fc11

Comment thread src/rlm.ts
const llmStartMs = Date.now();
const generationId = langfuse.rootGenerationStart({
name: `Model call — root iteration ${iteration + 1}`,
input: messages,

P2 Badge Snapshot generation input before queueing

Passing the live messages array here means the queued generation-create event does not preserve the prompt that was actually sent for this LLM call. rootGenerationStart stores the object reference and the batch is JSON-serialized only later during langfuse.flush(), while the loop appends assistant/user messages and mutates the last user message for soft-limit nudges before that flush. In multi-iteration runs, earlier generation inputs will therefore show the final mutated conversation (often including that generation's own output) instead of the per-call input, corrupting the Langfuse trace.
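
The failure mode described in this comment can be reproduced in isolation. The sketch below uses a hypothetical `DeferredQueue` as a stand-in for a recorder that stores payloads immediately but serializes them only at flush time; it is not the actual `LangfuseTraceRecorder`.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Stand-in for a recorder that stores payloads now and serializes them only at flush time.
class DeferredQueue {
  private events: Array<{ name: string; input: Message[] }> = [];

  enqueue(name: string, input: Message[]): void {
    this.events.push({ name, input }); // keeps a reference, not a copy
  }

  flush(): string {
    const json = JSON.stringify(this.events);
    this.events = [];
    return json;
  }
}

const queue = new DeferredQueue();
const messages: Message[] = [{ role: "user", content: "question" }];

// Iteration 1: queue the generation input, then the loop appends to the conversation.
queue.enqueue("iteration 1", messages);
messages.push({ role: "assistant", content: "first answer" });

// Iteration 2: the same live array is queued again.
queue.enqueue("iteration 2", messages);
messages.push({ role: "assistant", content: "second answer" });

// Serialization happens only now, so both events carry the final 3-message array.
const flushed = JSON.parse(queue.flush()) as Array<{ name: string; input: Message[] }>;
const inputLengths = flushed.map((event) => event.input.length);
```

Here `inputLengths` ends up as `[3, 3]` rather than the `[1, 2]` a per-call capture would give, and the first event's input even contains that call's own answer.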


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request integrates Langfuse tracing for root generations in the RLM loop, capturing start and end events for both standard iterations and forced final answers, and adds corresponding test coverage. The reviewer suggests wrapping the forced final answer LLM call in a try...catch block to prevent generations from being left dangling in Langfuse if an error occurs. Additionally, they recommend adding a top-level cost field to the Langfuse payload to ensure proper cost tracking in the Langfuse UI.

Comment thread src/rlm.ts
Comment on lines +724 to +744
  const generationId = langfuse?.rootGenerationStart({
    name: "Model call — forced final answer",
    input: forceMessages,
    model: `${config.model.provider}/${config.model.model}`,
    iteration,
  });
  const llmStartMs = Date.now();
  const response = await llmComplete(forceMessages, config.model, {
    signal,
    cacheConfig,
    thinkingLevel: config.gemini.thinkingLevel,
    outputSchema: config.output.schema,
    geminiConfig: config.gemini,
  });
  if (generationId) {
    langfuse?.rootGenerationEnd(generationId, {
      output: response.text,
      durationMs: Date.now() - llmStartMs,
      usage: response.usage,
    });
  }

high

Similar to the main loop iteration, if llmComplete throws an error during the forced final answer call, the generation is left dangling in Langfuse. Wrapping this call in a try...catch block ensures the generation is correctly closed with an error status.

  const generationId = langfuse?.rootGenerationStart({
    name: "Model call — forced final answer",
    input: forceMessages,
    model: `${config.model.provider}/${config.model.model}`,
    iteration,
  });
  const llmStartMs = Date.now();
  let response: any;
  try {
    response = await llmComplete(forceMessages, config.model, {
      signal,
      cacheConfig,
      thinkingLevel: config.gemini.thinkingLevel,
      outputSchema: config.output.schema,
      geminiConfig: config.gemini,
    });
    if (generationId) {
      langfuse?.rootGenerationEnd(generationId, {
        output: response.text,
        durationMs: Date.now() - llmStartMs,
        usage: response.usage,
      });
    }
  } catch (err) {
    if (generationId) {
      langfuse?.rootGenerationEnd(generationId, {
        output: null,
        durationMs: Date.now() - llmStartMs,
        isError: true,
        errorMessage: err instanceof Error ? err.message : String(err),
      });
    }
    throw err;
  }

Comment thread src/langfuse.ts
Comment on lines +122 to +124
costDetails: {
total: data.usage?.totalCost ?? 0,
},

medium

Langfuse uses the standard top-level cost field (a number) to track and display generation costs in the UI. Passing only costDetails might result in the cost not being displayed correctly in standard Langfuse cost columns. Adding cost at the top level of the payload ensures standard cost tracking works out of the box.

      cost: data.usage?.totalCost ?? 0,
      costDetails: {
        total: data.usage?.totalCost ?? 0,
      },


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/rlm.ts (1)

435-447: ⚡ Quick win

Root generations are not closed/flushed on error or non-final exits.

flush() is only invoked inside finalize(). If llmComplete() throws, control jumps to the catch (Lines 680-700) which never calls langfuse.flush(); the empty-abort path (Lines 650-669) also skips it. In both cases the queued trace-create/generation-create events are silently dropped, so the very root generations this PR adds won't surface on timeouts/errors. Additionally, the isError/errorMessage fields on RootGenerationEndData are never populated, so a failed root call is never closed with an error status.

Consider wrapping the call in try/catch to emit rootGenerationEnd({ ..., isError: true, errorMessage }), and ensure flush() runs on all exit paths.
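
One way to picture this suggestion is a small wrapper that always closes the generation and always flushes. The `GenerationRecorder` interface and the `tracedCall` helper below are hypothetical and reduced to the calls discussed in this review; they are not the actual `LangfuseTraceRecorder` surface.

```typescript
// Hypothetical minimal recorder surface, limited to what this sketch needs.
interface GenerationRecorder {
  rootGenerationStart(data: { name: string; input: unknown }): string;
  rootGenerationEnd(
    id: string,
    data: { output: unknown; durationMs: number; isError?: boolean; errorMessage?: string },
  ): void;
  flush(): Promise<void>;
}

// Run a model call so the generation is closed on success and on failure,
// and queued events are flushed on every exit path.
async function tracedCall<T>(
  recorder: GenerationRecorder,
  name: string,
  input: unknown,
  call: () => Promise<T>,
): Promise<T> {
  const generationId = recorder.rootGenerationStart({ name, input });
  const startedMs = Date.now();
  try {
    const result = await call();
    recorder.rootGenerationEnd(generationId, {
      output: result,
      durationMs: Date.now() - startedMs,
    });
    return result;
  } catch (err) {
    // Close the generation with an error status before propagating.
    recorder.rootGenerationEnd(generationId, {
      output: null,
      durationMs: Date.now() - startedMs,
      isError: true,
      errorMessage: err instanceof Error ? err.message : String(err),
    });
    throw err;
  } finally {
    await recorder.flush();
  }
}
```

Flushing after every call is a simplification for the sketch. In a loop that batches events, the flush would more plausibly sit in a single outer `finally`. Note also that a `flush()` that throws inside `finally` would replace the original error, so a real implementation may want to guard it.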

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rlm.ts` around lines 435 - 447, Wrap the llmComplete(...) call in a
try/catch/finally so that on any error or early-abort you call
langfuse.rootGenerationEnd(generationId, {..., isError: true, errorMessage:
err.message}) before rethrowing/handling, and ensure langfuse.flush() is invoked
in the finally (or on all exit paths including the empty-abort branch) so queued
trace-create/generation-create events are flushed; update the non-error/normal
path to still call rootGenerationEnd(...) with usage/output and then flush, and
ensure the finalize() behavior is preserved or moved into the finally block to
always populate RootGenerationEndData and call flush().

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2124e5d3-13e7-47f5-a759-66d3faa3214f

📥 Commits

Reviewing files that changed from the base of the PR and between d18743f and 6d0c50f.

⛔ Files ignored due to path filters (4)
  • dist/src/langfuse.d.ts is excluded by !**/dist/**
  • dist/src/langfuse.js is excluded by !**/dist/**
  • dist/src/rlm.js is excluded by !**/dist/**
  • dist/tests/recursive-trace.test.js is excluded by !**/dist/**
📒 Files selected for processing (3)
  • src/langfuse.ts
  • src/rlm.ts
  • tests/recursive-trace.test.ts

Comment thread src/rlm.ts
Comment on lines +429 to +434
const generationId = langfuse.rootGenerationStart({
name: `Model call — root iteration ${iteration + 1}`,
input: messages,
model: `${config.model.provider}/${config.model.model}`,
iteration,
});

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Snapshot messages when starting the root generation.

input: messages stores a reference to the live array. Events are only serialized later in flush() (JSON.stringify(batch)), but messages keeps getting mutated each iteration (messages.push(...) and the nudge lastMsg.content += nudge). As a result every generation-create will serialize the final full conversation as its input, not the input at that iteration — defeating the per-iteration capture.

🐛 Capture a snapshot at start time
       const generationId = langfuse.rootGenerationStart({
         name: `Model call — root iteration ${iteration + 1}`,
-        input: messages,
+        input: [...messages],
         model: `${config.model.provider}/${config.model.model}`,
         iteration,
       });
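
Note that the spread copy guards against later `messages.push(...)` calls but, being shallow, still shares the message objects themselves, so the in-place nudge (`lastMsg.content += nudge`) would still leak into earlier generation inputs. The sketch below compares a shallow and a deep snapshot. The `Message` shape and variable names are hypothetical, and the JSON round trip is just one deep-copy option that works for plain data (`structuredClone` is another where the runtime provides it).

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

const conversation: Message[] = [{ role: "user", content: "question" }];

// Shallow copy: a new array that still points at the same message objects.
const shallowSnapshot: Message[] = [...conversation];

// Deep copy via a JSON round trip, which is safe here because messages are plain data.
const deepSnapshot = JSON.parse(JSON.stringify(conversation)) as Message[];

// Later loop activity: append a message, then nudge the existing user message in place.
conversation.push({ role: "assistant", content: "answer" });
const firstMessage = conversation[0];
if (firstMessage) {
  firstMessage.content += " [nudge]";
}

// Both snapshots keep their original length of 1, but only the deep one
// keeps the original content of the first message.
const shallowContent = shallowSnapshot[0]?.content;
const deepContent = deepSnapshot[0]?.content;
```

After the mutations, `shallowContent` is `"question [nudge]"` while `deepContent` is still `"question"`, which is why the agent prompt's wording ("create a deep copy") is the safer reading of this fix.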
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/rlm.ts` around lines 429 - 434, The generation-create currently passes a
live reference (messages) to langfuse.rootGenerationStart so later serialization
in flush() captures the mutated final conversation instead of the per-iteration
input; fix by snapshotting messages when calling langfuse.rootGenerationStart
(create a deep copy of the messages array/objects — e.g., clone each message
object or use a safe deep-clone) and pass that snapshot as input; ensure this
change is applied where generationId is created (langfuse.rootGenerationStart)
so subsequent flush()/JSON.stringify(batch) serializes the immutable
per-iteration input rather than the live messages array.

@namastex888 namastex888 merged commit 21a55cf into main May 29, 2026
8 checks passed
namastex888 added a commit that referenced this pull request May 30, 2026
Squash-merge by Drogo after all checks green. Follow-up to PR #99 root-generation Langfuse fix.
2 participants