feat(observability): add Omni curated lifecycle spans #688

Merged
namastex888 merged 1 commit into dev from fix/omni-curated-lifecycle-spans-20260531T215628Z
May 31, 2026

@namastex888
Contributor

Adds minimal curated lifecycle spans for Omni inbound/dispatch/outbound/turn flow with PII-safe lifecycle attributes and strengthens Agno session/trace propagation tests.

Local guards:
- bun test packages/api/src/plugins/tests/agent-dispatcher.test.ts packages/core/src/providers/tests/agno-client.test.ts → 86 pass / 0 fail
- pre-push typecheck → 21 successful / 21 total
- pre-push full suite → 3964 pass / 292 skip / 0 fail across 307 files

HML gate intent: deploy this exact SHA, send two controlled Gupshup/WhatsApp messages in the same session, query Langfuse by sessionId, and return the Langfuse session URL plus ordered trace IDs.

@coderabbitai

coderabbitai Bot commented May 31, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d11ea582-f45c-48f3-9ed5-132250234aef

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces human lifecycle observability helpers and OpenTelemetry span wrapping for agent dispatch flows, ensuring sensitive PII and secrets are redacted from attributes. It also adds corresponding tests, including verification of trace context propagation in streaming runs. The review feedback suggests using a nullish check in setTextLifecycleAttributes to safely handle potential null values, and refining the error handling in withLifecycleSpan to guarantee that any OpenTelemetry SDK errors do not disrupt the core application flow if the business logic executes successfully.

Comment on lines +158 to +163
function setTextLifecycleAttributes(
  attributes: Record<string, LifecycleAttributeValue>,
  prefix: 'input' | 'output',
  value: string | undefined,
): void {
  if (value === undefined) return;
medium

To prevent potential runtime TypeError crashes if value is null (which can happen with database fields or external payloads even if typed as string | undefined), use a nullish check if (value == null) instead of strict equality to undefined.

Suggested change
- function setTextLifecycleAttributes(
-   attributes: Record<string, LifecycleAttributeValue>,
-   prefix: 'input' | 'output',
-   value: string | undefined,
- ): void {
-   if (value === undefined) return;
+ function setTextLifecycleAttributes(
+   attributes: Record<string, LifecycleAttributeValue>,
+   prefix: 'input' | 'output',
+   value: string | undefined | null,
+ ): void {
+   if (value == null) return;
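
A minimal sketch of how the nullish guard behaves, for illustration only. The `LifecycleAttributeValue` alias, the function body, and the `.present` / `.length` attribute names are assumptions; the PR's real implementation is not shown in this thread.

```typescript
// Assumed alias; the PR's real LifecycleAttributeValue definition is not shown.
type LifecycleAttributeValue = string | number | boolean;

function setTextLifecycleAttributes(
  attributes: Record<string, LifecycleAttributeValue>,
  prefix: 'input' | 'output',
  value: string | null | undefined,
): void {
  // `== null` is true for both null and undefined, and for nothing else.
  if (value == null) return;
  // Hypothetical body: record metadata about the text, never the text itself.
  attributes[`${prefix}.present`] = true;
  attributes[`${prefix}.length`] = value.length;
}

const attributes: Record<string, LifecycleAttributeValue> = {};
setTextLifecycleAttributes(attributes, 'input', null);      // skipped
setTextLifecycleAttributes(attributes, 'input', undefined); // skipped
setTextLifecycleAttributes(attributes, 'output', 'hello');  // recorded
console.log(attributes);
```

An empty string is not nullish, so it still passes the guard; whether that is desirable depends on how the real function treats empty text.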

Comment on lines +233 to +260
async function withLifecycleSpan<T>(
  name: string,
  attributes: Record<string, LifecycleAttributeValue>,
  fn: () => Promise<T>,
): Promise<T> {
  let callbackStarted = false;
  try {
    const tracer = trace.getTracer('omni.agent-dispatcher');
    return await tracer.startActiveSpan(name, { attributes }, async (span) => {
      callbackStarted = true;
      try {
        const result = await fn();
        span.setStatus({ code: SpanStatusCode.OK });
        return result;
      } catch (error) {
        span.recordException(error instanceof Error ? error : new Error(String(error)));
        span.setStatus({ code: SpanStatusCode.ERROR, message: error instanceof Error ? error.message : String(error) });
        throw error;
      } finally {
        span.end();
      }
    });
  } catch (error) {
    if (callbackStarted) throw error;
    log.warn('Lifecycle span wrapper failed before dispatch; continuing without span', { spanName: name });
    return fn();
  }
}
medium

If an error occurs within the OpenTelemetry SDK itself (e.g., during span.end() or span finalization) after the business logic (fn()) has successfully completed, the current implementation will rethrow the OTel error because callbackStarted is true. Instrumentation should be completely fail-safe and never disrupt the core application flow. Consider tracking whether the business logic itself threw an error, and only rethrow if the error originated from fn().

async function withLifecycleSpan<T>(
  name: string,
  attributes: Record<string, LifecycleAttributeValue>,
  fn: () => Promise<T>,
): Promise<T> {
  let businessError: unknown = undefined;
  let hasBusinessError = false;
  try {
    const tracer = trace.getTracer('omni.agent-dispatcher');
    return await tracer.startActiveSpan(name, { attributes }, async (span) => {
      try {
        const result = await fn();
        span.setStatus({ code: SpanStatusCode.OK });
        return result;
      } catch (error) {
        hasBusinessError = true;
        businessError = error;
        span.recordException(error instanceof Error ? error : new Error(String(error)));
        span.setStatus({ code: SpanStatusCode.ERROR, message: error instanceof Error ? error.message : String(error) });
        throw error;
      } finally {
        try {
          span.end();
        } catch (otelError) {
          log.warn('Failed to end lifecycle span', { error: String(otelError) });
        }
      }
    });
  } catch (error) {
    if (hasBusinessError) {
      throw businessError;
    }
    log.warn('Lifecycle span wrapper failed; continuing without span', { spanName: name, error: String(error) });
    return fn();
  }
}
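
Reading the suggestion above, two edge cases seem to remain: an exception from `span.setStatus` after `fn()` has resolved lands in the inner `catch` and is treated as a business error, and a failure raised by `startActiveSpan` after the callback has finished would reach the outer `catch` and invoke `fn()` a second time. The sketch below is one possible way to close both gaps by recording the outcome of `fn()` separately from every span call. It is not the PR's code: `SpanLike`, `TracerLike`, `safely`, and `withFailSafeSpan` are hypothetical names, and the tracer interface is a simplified stand-in for the OpenTelemetry API (no span options argument) so that the block is self-contained.

```typescript
// Simplified stand-ins for the OpenTelemetry span/tracer surface used here.
interface SpanLike {
  setStatus(status: { code: number; message?: string }): void;
  recordException(error: Error): void;
  end(): void;
}
interface TracerLike {
  startActiveSpan<T>(name: string, fn: (span: SpanLike) => Promise<T>): Promise<T>;
}

// Numeric values match OpenTelemetry's SpanStatusCode.OK and SpanStatusCode.ERROR.
const STATUS_OK = 1;
const STATUS_ERROR = 2;

type Outcome<T> =
  | { kind: 'pending' }
  | { kind: 'ok'; value: T }
  | { kind: 'error'; error: unknown };

// Runs one instrumentation step and swallows anything it throws.
function safely(step: () => void): void {
  try {
    step();
  } catch {
    // Instrumentation must never affect the caller.
  }
}

async function withFailSafeSpan<T>(
  tracer: TracerLike,
  name: string,
  fn: () => Promise<T>,
): Promise<T> {
  // Held in an object so the outcome written inside the callback is visible here.
  const state: { outcome: Outcome<T> } = { outcome: { kind: 'pending' } };
  try {
    await tracer.startActiveSpan(name, async (span) => {
      try {
        state.outcome = { kind: 'ok', value: await fn() };
        safely(() => span.setStatus({ code: STATUS_OK }));
      } catch (error) {
        // Only fn() can reach this branch, because every span call is wrapped.
        state.outcome = { kind: 'error', error };
        const asError = error instanceof Error ? error : new Error(String(error));
        safely(() => span.recordException(asError));
        safely(() => span.setStatus({ code: STATUS_ERROR, message: asError.message }));
      } finally {
        safely(() => span.end());
      }
    });
  } catch {
    // The tracer itself failed; the recorded outcome below decides what happens.
  }
  const outcome = state.outcome;
  if (outcome.kind === 'ok') return outcome.value;
  if (outcome.kind === 'error') throw outcome.error;
  return fn(); // fn never started, so run it exactly once without a span
}
```

The trade-off in this sketch is that tracer failures are silent; a real version would presumably log them, as both snippets above do with `log.warn`.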


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 14d2f24d4a

Comment on lines +192 to +194
attributes['khal.session_id'] = sessionId;
attributes['langfuse.session.id'] = sessionId;
attributes['session.id'] = sessionId;
P1: Avoid exporting raw session IDs to spans

For the default per_chat/per_user strategies, computeSessionId() returns the raw external chat/user ID (for WhatsApp this is often a phone-bearing JID, and per_thread embeds it too), but these new lifecycle attributes send that value verbatim to OTel/Langfuse. This bypasses the hashing/redaction used for chatId and leaks PII into observability data whenever an instance does not supply an explicit safe Khal session ID. Hash or redact these fields, or emit the raw value only when it is known to be non-PII.
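
One way to act on this comment could be to pseudonymize the session ID before it reaches any span attribute. The sketch below is an assumption-laden illustration, not the repository's code: `pseudonymizeSessionId`, `setSessionAttributes`, the key handling, and the sample JID are all hypothetical, and the existing chatId hashing helper mentioned above is not shown in this thread. It uses a keyed HMAC from Node's `node:crypto` module, since a plain hash of a phone number is easy to brute-force.

```typescript
import { createHmac } from 'node:crypto';

// Keyed hash: without the key, a low-entropy input such as a phone number
// cannot be recovered by hashing every candidate value.
function pseudonymizeSessionId(rawSessionId: string, secretKey: string): string {
  return createHmac('sha256', secretKey).update(rawSessionId).digest('hex');
}

// Hypothetical helper: writes the same pseudonymized ID to all three attributes.
function setSessionAttributes(
  attributes: Record<string, string>,
  rawSessionId: string,
  secretKey: string,
): void {
  const safeId = pseudonymizeSessionId(rawSessionId, secretKey);
  attributes['khal.session_id'] = safeId;
  attributes['langfuse.session.id'] = safeId;
  attributes['session.id'] = safeId;
}

const attributes: Record<string, string> = {};
// Made-up JID and key, for illustration only.
setSessionAttributes(attributes, '15550100@s.whatsapp.net', 'example-key');
console.log(attributes['session.id']);
```

Because the same input and key always produce the same digest, traces from one conversation would still group under one session. Any lookup by sessionId, such as the HML gate described in the PR description, would then need to use the pseudonymized value.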

@namastex888 namastex888 merged commit b40691a into dev May 31, 2026
10 checks passed
@namastex888 namastex888 deleted the fix/omni-curated-lifecycle-spans-20260531T215628Z branch May 31, 2026 22:04