Prevent placeholder discussion posts in daily compiler quality reports #31252

Merged
pelikhan merged 7 commits into main from copilot/fix-daily-compiler-quality-body
May 9, 2026

Contributor

Copilot AI commented May 9, 2026

daily-compiler-quality was intermittently publishing discussions with a literal placeholder body (test) because the first emitted create_discussion item was accepted and posted. This change hardens both the workflow prompt contract and safe-outputs enforcement so short/draft bodies fail instead of being published.

  • Workflow contract + policy (daily-compiler-quality)

    • Added explicit output contract to emit exactly one final create_discussion item and avoid draft/placeholder emissions.
    • Added safe-outputs.create-discussion.min-body-length: 200 in the workflow to enforce a non-trivial report body.
  • Safe-outputs runtime guard (create_discussion handler)

    • Added support for min_body_length in actions/setup/js/create_discussion.cjs.
    • Handler now rejects discussion creation when body length is below configured minimum, returning a failed safe-output result (which fails the safe_outputs job).
  • Compiler/config/schema plumbing

    • Added min-body-length to CreateDiscussionsConfig and handler config emission.
    • Extended workflow schema (main_workflow_schema.json) to validate safe-outputs.create-discussion.min-body-length.
    • Updated schema/reference documentation (including frontmatter and safe-outputs specification docs) with the new field.
  • Targeted coverage

    • Added/updated tests for:
      • handler config propagation of min_body_length
      • create_discussion runtime rejection for too-short bodies
safe-outputs:
  create-discussion:
    category: "audits"
    title-prefix: "[daily-compiler-quality] "
    max: 1
    min-body-length: 200
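
The runtime guard summarized above could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in actions/setup/js/create_discussion.cjs: the function names `checkMinBodyLength` and `handleCreateDiscussion`, the injected `createDiscussion` callback, and the `{ success, error }` result shape are all hypothetical. Only the `min_body_length` config key comes from this PR.

```javascript
// Hypothetical sketch of a minimum-body-length guard. Names and the result
// shape are assumptions, not the actual create_discussion.cjs implementation.

/**
 * Returns an error message when the body is too short, otherwise null.
 * Length is measured on the trimmed body, before any footer is appended.
 */
function checkMinBodyLength(body, minBodyLength) {
  if (minBodyLength <= 0) return null; // guard disabled
  const length = String(body ?? "").trim().length;
  if (length < minBodyLength) {
    return `Discussion body is too short (${length} < ${minBodyLength} characters)`;
  }
  return null;
}

/**
 * Handles one create_discussion item. `createDiscussion` stands in for the
 * real GitHub API call and is only invoked when the guard passes.
 */
async function handleCreateDiscussion(item, config, createDiscussion) {
  const error = checkMinBodyLength(item.body, config.min_body_length ?? 0);
  if (error) {
    return { success: false, error }; // nothing is published
  }
  const url = await createDiscussion(item.title, item.body);
  return { success: true, url };
}
```

With `min_body_length: 200`, a body of `test` would return a failed result without the callback ever being invoked, which matches the behavior the new JavaScript test is described as asserting.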

Copilot AI and others added 4 commits May 9, 2026 17:34
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/625b4883-e50b-4978-a6a4-a55380399c1a

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
This reverts commit ac1c8f4.

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
…workflow

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/625b4883-e50b-4978-a6a4-a55380399c1a

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix daily compiler quality discussion body placeholder" to "Prevent placeholder discussion posts in daily compiler quality reports" May 9, 2026
Copilot AI requested a review from gh-aw-bot May 9, 2026 17:54
@github-actions github-actions Bot mentioned this pull request May 9, 2026
@pelikhan pelikhan marked this pull request as ready for review May 9, 2026 19:04
Copilot AI review requested due to automatic review settings May 9, 2026 19:04
Collaborator

pelikhan commented May 9, 2026

@copilot update schemas, instructions

Contributor

Copilot AI left a comment

Pull request overview

Hardens daily-compiler-quality discussion publishing by adding an explicit “one final discussion output” contract and enforcing a minimum discussion body length at safe-outputs runtime to prevent placeholder posts (e.g., test) from being published.

Changes:

  • Added min-body-length support to the create_discussion safe-output config and validated it in the workflow schema/docs.
  • Implemented a min_body_length guard in the actions/setup/js/create_discussion.cjs handler to fail safe-outputs on too-short bodies.
  • Added targeted tests for config propagation and the new runtime rejection behavior.

| File | Description |
| --- | --- |
| pkg/workflow/create_discussion.go | Adds MinBodyLength to the create-discussion workflow config struct. |
| pkg/workflow/compiler_safe_outputs_handlers.go | Emits min_body_length into handler config when configured. |
| pkg/workflow/compiler_safe_outputs_config_test.go | Tests handler config propagation for min_body_length. |
| pkg/parser/schemas/main_workflow_schema.json | Validates safe-outputs.create-discussion.min-body-length in workflow schema. |
| docs/src/content/docs/reference/safe-outputs.md | Documents min-body-length usage for create-discussion. |
| actions/setup/js/create_discussion.cjs | Enforces min_body_length at runtime and fails on too-short bodies. |
| actions/setup/js/create_discussion_sanitization.test.cjs | Adds test ensuring too-short bodies fail and do not call GitHub API. |
| .github/workflows/daily-compiler-quality.md | Adds safe-outputs config + explicit output contract for discussions. |
| .github/workflows/daily-compiler-quality.lock.yml | Regenerates locked workflow with new safe-outputs config (and includes an env export change). |

Copilot's findings

  • Files reviewed: 9/9 changed files
  • Comments generated: 2

export GH_AW_ENGINE="copilot"
export GH_AW_MCP_CLI_SERVERS='["safeoutputs","serena"]'
echo 'GH_AW_MCP_CLI_SERVERS=["safeoutputs","serena"]' >> "$GITHUB_ENV"
echo GH_AW_MCP_CLI_SERVERS='["safeoutputs","serena"]' >> "$GITHUB_ENV"
Comment on lines 16 to 19
TitlePrefix string `yaml:"title-prefix,omitempty"`
Category string `yaml:"category,omitempty"` // Discussion category ID or name
MinBodyLength int `yaml:"min-body-length,omitempty"` // Minimum required discussion body length before footer/markers
Labels []string `yaml:"labels,omitempty"` // Labels to attach to discussions and match when closing older ones
Contributor

github-actions Bot commented May 9, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 75/100

⚠️ Acceptable, with suggestions

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 2 |
| ✅ Design tests (behavioral contracts) | 2 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 1 (50%) |
| Duplicate test clusters | 0 |
| Test inflation detected | Yes (Go test file) |
| 🚨 Coding-guideline violations | 0 |

Test Classification Details

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| it("should fail when body is below configured minimum length") | actions/setup/js/create_discussion_sanitization.test.cjs:170 | ✅ Design | Error path; verifies observable rejection behavior and absence of API call |
| Table row "create discussion minimum body length" in TestHandlerConfigBooleanFields | pkg/workflow/compiler_safe_outputs_config_test.go:1138 | ✅ Design | Verifies min_body_length is correctly serialized into handler config |

Flagged Tests — Suggestions for Improvement

⚠️ Table row "create discussion minimum body length" — no error path

Classification: Design test (config serialization)
Issue: Tests that MinBodyLength: 200 serializes correctly, but does not cover edge cases such as MinBodyLength: 0 (disabled/default) or negative values.
What design invariant does this test enforce? That min_body_length is faithfully propagated through the configuration pipeline.
What would break if deleted? The min_body_length value could silently stop being passed to the handler, breaking the rejection behavior.
Suggested improvement: Add a second table row testing MinBodyLength: 0 (or absent) to confirm the field is omitted or zero-valued by default, completing the contract.

⚠️ Test inflation detected in Go test file

compiler_safe_outputs_config_test.go added 11 lines vs compiler_safe_outputs_handlers.go adding 1 line (ratio ≈ 11:1, threshold 2:1). This is common for table-driven test expansion and does not affect test quality — flagged for completeness only.


Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 1 test (table row, unit //go:build !integration)
  • 🟨 JavaScript (*.test.cjs): 1 test (vitest)

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). Both new tests verify behavioral contracts. The JS test is particularly strong — it asserts rejection behavior and the absence of an external API call on invalid input.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §25609356275

🧪 Test quality analysis by Test Quality Sentinel · ● 5M ·

Contributor

@github-actions github-actions Bot left a comment

✅ Test Quality Sentinel: 75/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%). Both new tests verify behavioral contracts.

Contributor

@github-actions github-actions Bot left a comment

Skills-Based Review 🧠

Applied /diagnose and /tdd — this is a bug fix (placeholder bodies being posted) with targeted new runtime guard and test coverage.


[/diagnose] Root cause assessment

The root cause (first emitted create_discussion accepted regardless of content) is addressed at two independent layers: prompt contract and runtime min_body_length guard. This defence-in-depth is good engineering — a single guard at either layer alone would be insufficient.

One concern: the check in create_discussion.cjs measures length from processedBody.trim() and names the variable preSanitizeBodyLength. If processedBody has already been through normalisation (whitespace stripping, header/footer injection, etc.) before this point, the measured length may differ meaningfully from the raw user-supplied body. The schema description says "before footer/metadata" — a comment or test making the measurement point explicit would prevent future confusion.

[/tdd] Test coverage gaps

The new test "should fail when body is below configured minimum length" is a solid tracer bullet, but several boundary and edge cases are missing:

  1. Boundary at exactly minBodyLength — a body of exactly 200 characters should pass. The current tests don't verify this boundary, which is where off-by-one errors hide.

  2. Guard-disabled path (min_body_length: 0) — a handler configured with min_body_length: 0 should skip the check entirely. The implementation evaluates config.min_body_length ? parseInt(...) : 0, so a configured 0 is falsy and falls through correctly — but there is no test asserting this path.

  3. Invalid config (negative value) — the handler throws when minBodyLength < 0, but there is no test verifying the error message and failure path for min_body_length: -1.

  4. Go test naming — the new test case for MinBodyLength is added inside TestHandlerConfigBooleanFields. Integer fields and boolean fields have different semantics; this makes the test name misleading. Consider a dedicated TestHandlerConfigIntegerFields table or renaming the parent to something neutral like TestHandlerConfigFieldEmission.
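
The first three gaps listed above could be pinned down with a small table of cases, sketched here. `validateBody` is a hypothetical stand-in for the handler's guard, written only so the cases can run on their own. The real tests would presumably exercise create_discussion.cjs through vitest instead, and the handler's exact error message and trimming behavior are assumptions here.

```javascript
// Hypothetical stand-in for the handler's guard; the real logic lives in
// actions/setup/js/create_discussion.cjs and may differ in detail.
function validateBody(body, minBodyLength) {
  if (!Number.isInteger(minBodyLength) || minBodyLength < 0) {
    throw new Error(`Invalid min_body_length: ${minBodyLength}`);
  }
  // A minimum of 0 disables the guard; otherwise compare the trimmed length.
  return minBodyLength === 0 || body.trim().length >= minBodyLength;
}

const cases = [
  { name: "one below minimum is rejected", body: "x".repeat(199), min: 200, ok: false },
  { name: "exactly at minimum is accepted", body: "x".repeat(200), min: 200, ok: true },
  { name: "guard disabled with 0", body: "test", min: 0, ok: true },
  { name: "whitespace padding does not count", body: " ".repeat(300) + "test", min: 200, ok: false },
];

// Collect the names of any cases whose outcome differs from the expectation.
const failures = cases
  .filter(c => validateBody(c.body, c.min) !== c.ok)
  .map(c => c.name);

// A negative configuration should be reported as an error, not silently ignored.
let negativeRejected = false;
try {
  validateBody("anything", -1);
} catch {
  negativeRejected = true;
}

console.log(failures.length === 0 && negativeRejected ? "all boundary cases hold" : failures);
```

The 199/200 pair is the one that would catch an off-by-one, for example `<=` written where `<` was intended.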

[/diagnose] Parsing defensiveness

const minBodyLength = config.min_body_length ? parseInt(String(config.min_body_length), 10) : 0;

The String(...) conversion defends against non-string inputs, but the config value arrives from JSON where numbers are always number type. parseInt(String(200), 10) is correct but redundant. A cleaner form: Number(config.min_body_length ?? 0) (or Math.trunc(...) if floats need guarding), which also pushes NaN naturally into the downstream validation check. Minor readability point only.
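
To make the comparison concrete, the sketch below runs both forms over a few inputs. `parseWithParseInt` mirrors the line quoted above and `parseWithNumber` follows the reviewer's suggestion. `isValidMinimum` is a hypothetical downstream check and is not taken from the handler.

```javascript
// The form quoted in the review.
function parseWithParseInt(config) {
  return config.min_body_length ? parseInt(String(config.min_body_length), 10) : 0;
}

// The alternative the reviewer suggests, with Math.trunc to drop fractions.
function parseWithNumber(config) {
  return Math.trunc(Number(config.min_body_length ?? 0));
}

// Hypothetical downstream validation: NaN and negatives are both rejected.
function isValidMinimum(value) {
  return Number.isInteger(value) && value >= 0;
}

const samples = [200, 0, undefined, "200", -1, "abc", 12.7, "200px"];
for (const raw of samples) {
  const config = { min_body_length: raw };
  const a = parseWithParseInt(config);
  const b = parseWithNumber(config);
  // Columns: raw input, parseInt result, Number result, validity of each.
  console.log(JSON.stringify(raw), a, b, isValidMinimum(a), isValidMinimum(b));
}
```

The two forms agree on typical JSON numbers. They differ on strings with trailing text: `parseInt` reads the leading digits of `"200px"` and returns 200, while `Number` yields `NaN`, which a validation step like the one assumed here would then reject.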


Positive highlights

  • ✅ Defence-in-depth: both prompt contract and runtime enforcement guard against the bug — neither layer alone would be reliable.
  • ✅ The test verifies that no GraphQL createDiscussion mutation is called on rejection — asserting at the correct integration seam.
  • ✅ Schema, Go struct, handler config emission, docs, and compiled lock file are all updated consistently.
  • ✅ min-body-length is a reusable feature, not a hard-coded hack for one workflow.

Verdict

The fix is correct and the approach is sound. The main gap is boundary and edge-case tests on the new guard itself (body at exactly 200 chars, guard disabled at 0, negative config). These are quick to add and would lock in the guard's contract. Happy to merge once those boundary tests are in place, or they can be tracked as an immediate follow-up.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 4.4M

Copilot AI and others added 2 commits May 9, 2026 19:15
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Contributor Author

Copilot AI commented May 9, 2026

> @copilot update schemas, instructions

Updated in commits 2823ace and c71f71c: added min-body-length coverage in schema/spec documentation (frontmatter-full.md, safe-outputs-specification.md) and aligned the example value to 200.

Copilot AI requested a review from pelikhan May 9, 2026 19:17
@pelikhan pelikhan merged commit 0d12df9 into main May 9, 2026
@pelikhan pelikhan deleted the copilot/fix-daily-compiler-quality-body branch May 9, 2026 19:27

Development

Successfully merging this pull request may close these issues.

[deep-report] Daily Compiler Quality discussion body is literal "test" placeholder (2-day regression)

4 participants