feat: eval JSONL support, SQLite session store, --db CLI arg by lewing · Pull Request #14 · lewing/dotnet-replay

lewing · 2026-02-24T16:39:47Z

What's new on �valj

Eval JSONL Support

Parse and display waza/Arena evaluation transcripts (.jsonl format)
Auto-detect eval format via \IsEvalFormat()\ checking for \�val.start\ events
\EvalData\ class tracks test suites with per-test-case results, assertion feedback, real-time progress, and tool usage

SQLite Session Store

Load sessions from Copilot CLI \session-store.db\ databases
Read-only access (\Mode=ReadOnly) for safe concurrent reads

--db\ CLI Arg & DB Polling

--db \ flag to specify an external session database
Auto-detect .db\ file extension as positional argument
Live polling in browser mode — re-queries DB every 5 seconds for new/updated sessions
Updated help text

Tests

5 new subprocess-based tests for --db\ feature (\DbPathTests.cs)
All 67 tests passing

Team: Deckard (Lead), Batty (Core Dev), Rachael (Tester) + Scribe + Ralph Universe: Blade Runner Seeded with project history from session store Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add --db <path> flag to load sessions from external session-store.db - .db files passed as positional args auto-route to browser mode - Browser scan thread now polls DB every 5s for new sessions - Only incremental queries (WHERE updated_at > last_seen) for efficiency - Skip filesystem/Claude scan when using external DB Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Five subprocess-based tests covering: - --db with nonexistent file - --db with valid DB but piped output (redirected TTY) - Positional .db file auto-detection - --db missing argument error - Help text includes --db documentation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This pull request adds support for evaluating JSONL transcripts from waza/Arena evaluation runs and introduces a new Squad team configuration system. However, the PR has significant discrepancies between its description and actual implementation.

Changes:

Adds eval JSONL format detection and parsing with auto-detection via IsEvalFormat() checking for eval.start events
Implements EvalData and EvalCaseResult classes to track test suites with per-test-case results, assertions, and tool usage
Adds Squad team configuration files (.squad/) for project organization with Blade Runner-themed agent personas
Tests for --db CLI argument (though the actual feature is not implemented in this PR)

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
tests/DbPathTests.cs	Adds 5 tests for --db flag behavior, but the actual --db feature is not implemented in the code
replay.cs	Implements comprehensive eval JSONL support including format detection, parsing, rendering, streaming, and pager integration
.squad/team.md	Introduces Squad team roster with Blade Runner characters and project context
.squad/skills/squad-conventions/SKILL.md	Documents Squad CLI conventions (unrelated to replay.cs functionality)
.squad/routing.md	Defines work routing rules for the Squad team
.squad/identity/wisdom.md	Empty template for team learnings
.squad/identity/now.md	Documents current focus area mentioning features not in this PR
.squad/decisions.md	Claims --db and DB polling features are completed (but they're not in the code)
.squad/ceremonies.md	Defines team meeting patterns
.squad/casting/registry.json	Registers three Blade Runner agents
.squad/casting/policy.json	Defines casting policy for agent universes
.squad/casting/history.json	Records universe assignment history
.squad/agents/*/history.md	Agent-specific project context and history files
.squad/agents/*/charter.md	Agent role definitions and boundaries

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-24T17:00:58Z

replay.cs

+void ProcessEvalEvent(EvalData eval, string line)
+{
+    JsonDocument doc;
+    try { doc = JsonDocument.Parse(line); }
+    catch { return; }
+
+    var root = doc.RootElement;
+    var evType = SafeGetString(root, "type");
+    var data = root.TryGetProperty("data", out var d) ? d : default;
+    if (data.ValueKind == JsonValueKind.Undefined) return;
+
+    var currentCase = eval.Cases.LastOrDefault(c => c.Name == eval.CurrentCase);
+
+    switch (evType)
+    {
+        case "case.start":
+            var newCase = new EvalCaseResult
+            {
+                Name = SafeGetString(data, "case"),
+                Prompt = SafeGetString(data, "prompt")
+            };
+            eval.CurrentCase = newCase.Name;
+            eval.Cases.Add(newCase);
+            break;
+
+        case "message":
+            currentCase?.MessageAccumulator.Append(SafeGetString(data, "content"));
+            break;
+
+        case "tool.start":
+            if (currentCase is not null)
+            {
+                var toolName = SafeGetString(data, "tool_name");
+                var toolId = SafeGetString(data, "tool_call_id");
+                if (!currentCase.ToolsUsed.Contains(toolName))
+                    currentCase.ToolsUsed.Add(toolName);
+            }
+            break;
+
+        case "tool.complete":
+            if (currentCase is not null)
+            {
+                var tcName = SafeGetString(data, "tool_name");
+                var tcId = SafeGetString(data, "tool_call_id");
+                double durMs = 0;
+                if (data.TryGetProperty("duration_ms", out var dur))
+                    dur.TryGetDouble(out durMs);
+                currentCase.ToolEvents.Add((tcName, tcId, durMs));
+            }
+            break;
+
+        case "assertion.result":
+            if (currentCase is not null)
+            {
+                var fb = SafeGetString(data, "feedback");
+                currentCase.AssertionFeedback = string.IsNullOrEmpty(currentCase.AssertionFeedback)
+                    ? fb : currentCase.AssertionFeedback + "; " + fb;
+            }
+            break;
+
+        case "case.complete":
+            if (currentCase is not null)
+            {
+                if (data.TryGetProperty("passed", out var p))
+                    currentCase.Passed = p.GetBoolean();
+                if (data.TryGetProperty("duration_ms", out var dm))
+                { dm.TryGetDouble(out var dmv3); currentCase.DurationMs = dmv3; }
+                if (data.TryGetProperty("tool_call_count", out var tcc) && tcc.TryGetInt32(out var tccv))
+                    currentCase.ToolCallCount = tccv;
+                if (data.TryGetProperty("response_length", out var rl) && rl.TryGetInt32(out var rlv))
+                    currentCase.ResponseLength = rlv;
+            }
+            eval.CurrentCase = null;
+            break;
+
+        case "eval.complete":
+            if (data.TryGetProperty("passed", out var ep) && ep.TryGetInt32(out var epv))
+                eval.TotalPassed = epv;
+            if (data.TryGetProperty("failed", out var ef) && ef.TryGetInt32(out var efv))
+                eval.TotalFailed = efv;
+            if (data.TryGetProperty("skipped", out var es) && es.TryGetInt32(out var esv))
+                eval.TotalSkipped = esv;
+            if (data.TryGetProperty("total_duration_ms", out var etd))
+            { etd.TryGetDouble(out var etdv3); eval.TotalDurationMs = etdv3; }
+            if (data.TryGetProperty("total_tool_calls", out var etc2) && etc2.TryGetInt32(out var etc2v))
+                eval.TotalToolCalls = etc2v;
+            break;
+
+        case "error":
+            if (currentCase is not null)
+                currentCase.Error = SafeGetString(data, "message");
+            break;
+    }
+}


This function contains significant code duplication with the ParseEvalData function (lines 1160-1282). Both functions handle the same event types ("case.start", "tool.start", "tool.complete", "assertion.result", "case.complete", "eval.complete", "error") with nearly identical logic. The main difference is that ProcessEvalEvent operates on an existing EvalData object while ParseEvalData creates a new one. Consider extracting the common event processing logic into a shared helper function to improve maintainability and reduce the risk of bugs when updating one but not the other.

Copilot · 2026-02-24T17:00:58Z

replay.cs

+    // If eval.complete wasn't emitted, compute from cases
+    if (eval.TotalPassed == 0 && eval.TotalFailed == 0 && eval.Cases.Count > 0)
+    {
+        eval.TotalPassed = eval.Cases.Count(c => c.Passed == true);
+        eval.TotalFailed = eval.Cases.Count(c => c.Passed == false);
+        eval.TotalDurationMs = eval.Cases.Sum(c => c.DurationMs);
+        eval.TotalToolCalls = eval.Cases.Sum(c => c.ToolCallCount);
+    }


The fallback logic for computing totals when eval.complete wasn't emitted has a logic issue. It only computes totals when "eval.TotalPassed == 0 && eval.TotalFailed == 0", which means if the eval.complete event was emitted with TotalPassed=1 and TotalFailed=0, or any non-zero values, the actual case counts won't be recomputed. However, if eval.complete was emitted with incorrect values or partial data, those values will be used without validation. Consider changing the condition to check for the presence of the eval.complete event explicitly (e.g., using a flag) rather than inferring it from zero totals, as a legitimate eval could have 0 passed and 0 failed tests.

Copilot · 2026-02-24T17:00:58Z

tests/DbPathTests.cs

+using System.Text.Json;
+using Xunit;
+
+namespace ReplayTests;
+
+public class DbPathTests
+{
+    [Fact]
+    public void DbFlag_NonExistentFile_ShowsError()
+    {
+        var nonExistentDb = Path.Combine(Path.GetTempPath(), $"nonexistent-{Guid.NewGuid()}.db");
+        var (stdout, stderr) = RunReplayWithArgs($"--db {nonExistentDb}");
+
+        // Should show an error on stderr (app can't browse when output is redirected)
+        Assert.Contains("Error: Cannot use --db in redirected output", stderr);
+    }
+
+    [Fact]
+    public void DbFlag_ValidDbButPipedOutput_ShowsRedirectionError()
+    {
+        var tempDb = CreateEmptyDbFile();
+        var (stdout, stderr) = RunReplayWithArgs($"--db {tempDb}");
+
+        Assert.Contains("Error: Cannot use --db in redirected output", stderr);
+
+        // Clean up
+        try { File.Delete(tempDb); } catch { }
+    }
+
+    [Fact]
+    public void PositionalDbFile_AutoDetection_BehavesLikeDbFlag()
+    {
+        var tempDb = CreateEmptyDbFile();
+        var (stdout, stderr) = RunReplayWithArgs(tempDb);
+
+        // Should behave like --db: show redirected output error
+        Assert.Contains("Error: Cannot use --db in redirected output", stderr);
+
+        // Clean up
+        try { File.Delete(tempDb); } catch { }
+    }
+
+    [Fact]
+    public void DbFlag_MissingArgument_ShowsError()
+    {
+        var (stdout, stderr) = RunReplayWithArgs("--db");
+
+        Assert.Contains("Error: --db requires a file path", stderr);
+    }
+
+    [Fact]
+    public void HelpText_IncludesDbFlag()
+    {
+        var (stdout, stderr) = RunReplayWithArgs("--help");
+
+        Assert.Contains("--db", stdout);
+        Assert.Contains("Browse sessions from an external session-store.db file", stdout);
+    }
+
+    // Helper to create an empty DB file
+    private string CreateEmptyDbFile()
+    {
+        var path = Path.Combine(Path.GetTempPath(), $"test-{Guid.NewGuid()}.db");
+        // Create an empty file (doesn't need to be a valid SQLite DB for these tests)
+        File.WriteAllText(path, "");
+        return path;
+    }
+
+    private static readonly string ReplayCs = Path.GetFullPath(
+        Path.Combine(AppContext.BaseDirectory, "..", "..", "..", "..", "replay.cs"));
+
+    private (string stdout, string stderr) RunReplayWithArgs(string args)
+    {
+        var startInfo = new System.Diagnostics.ProcessStartInfo
+        {
+            FileName = "dotnet",
+            Arguments = $"run {ReplayCs} -- {args}",
+            RedirectStandardOutput = true,
+            RedirectStandardError = true,
+            RedirectStandardInput = true, // Need to redirect input too for proper TTY detection
+            UseShellExecute = false,
+            CreateNoWindow = true,
+            WorkingDirectory = Path.GetDirectoryName(ReplayCs)!
+        };
+
+        using var process = System.Diagnostics.Process.Start(startInfo);
+        if (process == null) throw new InvalidOperationException("Failed to start process");
+
+        process.StandardInput.Close(); // Close stdin immediately
+        var stdout = process.StandardOutput.ReadToEnd();
+        var stderr = process.StandardError.ReadToEnd();
+        process.WaitForExit();
+
+        return (stdout, stderr);
+    }
+}


The tests are checking for error messages related to the --db flag ("Error: Cannot use --db in redirected output", "Error: --db requires a file path"), but the actual --db functionality does not appear to be implemented in the replay.cs code changes included in this PR. The PR description mentions "--db CLI arg & DB Polling" as a feature, but there's no corresponding code in the diff to parse --db from command line arguments or to handle it. Either the implementation is missing from this PR, or the tests are testing未来 functionality that doesn't exist yet.

Copilot · 2026-02-24T17:00:59Z

.squad/decisions.md

+Two features implemented for session database management:
+
+1. **Periodic DB Re-query in Browser Mode**: Background scan thread now polls SQLite DB every 5 seconds for new sessions. Tracks known session IDs in `HashSet<string>` to avoid duplicates. Only re-polls DB; filesystem and Claude Code scans remain one-shot.
+
+2. **`--db <path>` CLI Argument**: Enables browsing sessions from external session-store.db files. CLI parsing treats `.db` files as dbPath. `LoadSessionsFromDb` and `BrowseSessions` accept optional `dbPathOverride` parameter. When external DB is used, filesystem fallback is skipped.


The decision describes two features that were "Completed", including a "--db CLI Argument" feature. However, the actual implementation of the --db CLI argument is missing from the replay.cs changes in this PR. The decision claims the feature enables "browsing sessions from external session-store.db files" with CLI parsing treating ".db files as dbPath" and accepting a "dbPathOverride parameter", but none of this code appears in the diff. This creates an inconsistency between what's documented as completed and what's actually in the code.

Copilot · 2026-02-24T17:00:59Z

replay.cs

    return false;
 }

+bool IsEvalFormat(string path)


The PR adds comprehensive eval JSONL support including IsEvalFormat detection, ParseEvalData parsing, StreamEvalEvents streaming, RenderEvalHeaderLines/RenderEvalContentLines rendering, OutputEvalSummary output, ProcessEvalEvent live updates, and integration with the interactive pager and stats command. However, there are no tests for this new functionality. Given that other features in the codebase have comprehensive test coverage (as evidenced by 67 total tests mentioned in the PR), the eval functionality should also have test coverage including tests for format detection, parsing various eval event types, handling edge cases like missing fields, and integration with different output modes.

Copilot · 2026-02-24T17:00:59Z

replay.cs

+                break;
+            case "case.start":
+                Console.WriteLine($"\n{Bold($"━━━ {SafeGetString(data, "case")}")} {Dim(timeStr)}");
+                Console.WriteLine(Dim($"    Prompt: {SafeGetString(data, "prompt")[..Math.Min(100, SafeGetString(data, "prompt").Length)]}..."));


Potential IndexOutOfRangeException when truncating prompt string. The code uses [..Math.Min(100, SafeGetString(data, "prompt").Length)] but calls SafeGetString(data, "prompt") twice, which could return different lengths if the data changes between calls (unlikely but possible in concurrent scenarios). More importantly, if the prompt is shorter than 100 characters, the ellipsis "..." will still be appended incorrectly. The ellipsis should only be added when the string is actually truncated. Recommend storing the result in a variable and checking the length before adding "...".

Suggested change

Console.WriteLine(Dim($" Prompt: {SafeGetString(data, "prompt")[..Math.Min(100, SafeGetString(data, "prompt").Length)]}..."));

var prompt = SafeGetString(data, "prompt");

const int maxPromptLength = 100;

var displayPrompt = prompt.Length > maxPromptLength

? prompt[..maxPromptLength] + "..."

: prompt;

Console.WriteLine(Dim($" Prompt: {displayPrompt}"));

Copilot · 2026-02-24T17:01:00Z

replay.cs

+    lines.Add(Row(Bold("  📋 Eval Suite: " + d.Suite)));
+    lines.Add(mid);
+    if (d.Description != "") lines.Add(Row($"  {Dim(d.Description.TrimEnd())}"));
+    lines.Add(Row($"  Cases:    {Bold(d.Cases.Count.ToString())}/{d.CaseCount}  " +


When displaying cases progress, the format shows "Cases.Count/CaseCount" (e.g., "3/5"). However, if the "case_count" field is missing or not parsed from the "eval.start" event, CaseCount will remain at its default value of 0, resulting in output like "3/0" which is confusing and misleading. Consider either checking if CaseCount is 0 and omitting the denominator in that case (showing just "3 cases" instead), or defaulting CaseCount to Cases.Count when it's not explicitly set in the eval.start event.

Suggested change

lines.Add(Row($" Cases: {Bold(d.Cases.Count.ToString())}/{d.CaseCount} " +

int displayCaseCount = d.CaseCount == 0 ? d.Cases.Count : d.CaseCount;

lines.Add(Row($" Cases: {Bold(d.Cases.Count.ToString())}/{displayCaseCount} " +

Copilot · 2026-02-24T17:01:00Z

replay.cs

+            if (parsed is null) return PagerAction.Quit;
+            headerLines = RenderEvalHeaderLines(parsed);
+            contentLines = RenderEvalContentLines(parsed, filterType, expandTools);
+            return RunInteractivePager(headerLines, contentLines, parsed, isJsonlFormat: false, noFollow: !noFollow);


The noFollow parameter passed to RunInteractivePager is negated (!noFollow) for eval format on line 982, which seems inconsistent with the other formats. For JSONL format (line 990), the parameter is "isClaude || noFollow" (no negation of noFollow), and for Waza format (line 997) it's hardcoded to "true". The variable noFollow comes from the CLI argument --no-follow, so when a user specifies --no-follow, noFollow=true, and passing !noFollow=false would actually enable following, which is the opposite of what the user requested. This appears to be a logic error. The parameter should likely be just "noFollow" without negation to match the user's intent.

Suggested change

return RunInteractivePager(headerLines, contentLines, parsed, isJsonlFormat: false, noFollow: !noFollow);

return RunInteractivePager(headerLines, contentLines, parsed, isJsonlFormat: false, noFollow: noFollow);

replay.cs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

lewing and others added 3 commits February 23, 2026 19:45

docs(squad): hire Blade Runner team for dotnet-replay

5cdb24b

Team: Deckard (Lead), Batty (Core Dev), Rachael (Tester) + Scribe + Ralph Universe: Blade Runner Seeded with project history from session store Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

lewing requested a review from Copilot February 24, 2026 16:54

Copilot started reviewing on behalf of lewing February 24, 2026 16:55 View session

Copilot AI reviewed Feb 24, 2026

View reviewed changes

Update replay.cs

ad2d714

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

lewing merged commit 6af7826 into main Feb 24, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: eval JSONL support, SQLite session store, --db CLI arg#14

feat: eval JSONL support, SQLite session store, --db CLI arg#14
lewing merged 4 commits intomainfrom
evalj

lewing commented Feb 24, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 24, 2026

Uh oh!

Copilot AI Feb 24, 2026

Uh oh!

Copilot AI Feb 24, 2026

Uh oh!

Copilot AI Feb 24, 2026

Uh oh!

Copilot AI Feb 24, 2026

Uh oh!

Copilot AI Feb 24, 2026

Uh oh!

Copilot AI Feb 24, 2026

Uh oh!

Copilot AI Feb 24, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-                Console.WriteLine(Dim($"    Prompt: {SafeGetString(data, "prompt")[..Math.Min(100, SafeGetString(data, "prompt").Length)]}..."));
+                var prompt = SafeGetString(data, "prompt");
+                const int maxPromptLength = 100;
+                var displayPrompt = prompt.Length > maxPromptLength
+                    ? prompt[..maxPromptLength] + "..."
+                    : prompt;
+                Console.WriteLine(Dim($"    Prompt: {displayPrompt}"));

	lines.Add(Row($" Cases: {Bold(d.Cases.Count.ToString())}/{d.CaseCount} " +
	int displayCaseCount = d.CaseCount == 0 ? d.Cases.Count : d.CaseCount;
	lines.Add(Row($" Cases: {Bold(d.Cases.Count.ToString())}/{displayCaseCount} " +

	return RunInteractivePager(headerLines, contentLines, parsed, isJsonlFormat: false, noFollow: !noFollow);
	return RunInteractivePager(headerLines, contentLines, parsed, isJsonlFormat: false, noFollow: noFollow);

Conversation

lewing commented Feb 24, 2026

What's new on �valj

Eval JSONL Support

SQLite Session Store

--db\ CLI Arg & DB Polling

Tests

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants