[ADR] Static -- @assert: annotation linting (W025) #52

Description

@Pawansingh3889

What
A new warning rule, W025 assertion-malformed, that scans SQL files for -- @assert: comments and flags any whose predicate doesn't parse. The rule does not execute the assertion; execution belongs to dbt tests, a downstream runner, or a follow-up companion tool.

Why
sql-sop catches dangerous SQL. It says nothing about whether the data the query returns is sane. That gap gets filled today by:

dbt tests — only if you use dbt
Great Expectations / Soda — new vendor, DB connection required
nothing, which is most teams
-- @assert: comments are the lightest possible way to write a data assertion next to the query that produces the data. The problem is that nothing currently reads them, so people write whatever they feel like. By owning the syntax, sql-sop makes them machine-checkable. Execution is somebody else's problem.

Grammar (v1)
-- @assert: <predicate>

predicate := row_count <op> <int>
           | unique(<col>)
           | not_null(<col>)
           | <col> <op> <literal>
op        := = | != | < | <= | > | >=
col       := (<bare> .)? <bare>        e.g. weight, users.batch_id
bare      := [A-Za-z_][A-Za-z0-9_]*
literal   := <int> | <float> | '<string>' | "<string>"
Examples that parse:

-- @assert: row_count > 0
-- @assert: row_count >= 100
-- @assert: unique(batch_id)
-- @assert: unique(orders.batch_id)
-- @assert: not_null(weight)
-- @assert: weight > 0
-- @assert: status = 'paid'
Examples the rule fires on:

-- @assert: row_count
-- @assert: row_count > zero
-- @assert: unique batch_id
-- @assert: weight is positive
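The grammar and examples above can be sketched as a single regular expression. This is a minimal illustration, not the code on the feature branch: the names `PREDICATE` and `predicate_parses` are hypothetical, and it assumes that `row_count` is reserved (never treated as a column name), that `row_count` compares only against a non-negative integer, and that numeric literals may carry a leading minus sign.

```python
import re

# Building blocks for the v1 grammar, as read from the examples above.
BARE = r"[A-Za-z_][A-Za-z0-9_]*"
COL = rf"(?:{BARE}\.)?{BARE}"      # at most one level of qualification
OP = r"(?:<=|>=|!=|=|<|>)"         # two-character operators listed first
LITERAL = r"(?:-?\d+(?:\.\d+)?|'[^']*'|\"[^\"]*\")"  # assumption: optional minus sign

PREDICATE = re.compile(
    rf"""
    (?: row_count \s* {OP} \s* \d+            # row_count <op> <int>
      | unique    \s* \( \s* {COL} \s* \)     # unique(<col>)
      | not_null  \s* \( \s* {COL} \s* \)     # not_null(<col>)
      | (?!row_count\b) {COL} \s* {OP} \s* {LITERAL}  # assumption: row_count is reserved
    )
    """,
    re.VERBOSE,
)


def predicate_parses(text: str) -> bool:
    """Return True if `text` is a well-formed v1 predicate (hypothetical helper)."""
    return PREDICATE.fullmatch(text.strip()) is not None


for example in ["row_count >= 100", "unique(orders.batch_id)", "status = 'paid'",
                "row_count > zero", "unique batch_id", "weight is positive"]:
    print(f"{example!r:28} parses={predicate_parses(example)}")
```

Using `fullmatch` rather than `match` is what rejects trailing junk such as `weight is positive`: a prefix that looks like a column is not enough.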
Whitespace
Tolerant. Extra whitespace anywhere between --, @assert, :, and the predicate is fine. Leading indentation on the comment line is fine. Trailing whitespace before end-of-line is ignored. All three of these read the same to the rule:

-- @assert: unique(id)
--   @assert  :   unique(id)
    -- @assert: unique(id)
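One way to get this tolerance is to allow optional whitespace between every token of the marker and to trim the predicate before parsing it. The sketch below is illustrative only: `MARKER` and `extract_predicate` are hypothetical names, and matching the marker case-insensitively is an assumption the proposal does not state.

```python
import re
from typing import Optional

# Optional whitespace before "--", between "--" and "@assert", around ":",
# and after the predicate. Assumption: the marker is case-insensitive.
MARKER = re.compile(
    r"^\s*--\s*@assert\s*:\s*(?P<predicate>.*?)\s*$",
    re.IGNORECASE,
)


def extract_predicate(line: str) -> Optional[str]:
    """Return the trimmed predicate text, or None if the line is not an assertion comment."""
    m = MARKER.match(line)
    return m.group("predicate") if m else None


variants = [
    "-- @assert: unique(id)",
    "--\t@assert :unique(id)",
    "      --   @assert:   unique(id)   ",
]
# All three spellings reduce to the same predicate text.
print({extract_predicate(v) for v in variants})
```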
Identifiers — what's in, what's out
In for v1: bare identifiers and one level of qualification — weight, users.batch_id.

Out for v1: quoted identifiers in any dialect.

standard SQL "batch-id"
T-SQL [batch id]
MySQL `batch_id`
Reason: quoted identifiers are uncommon in assertion targets (you usually assert on clean column names), the three quote styles each need separate handling, and getting it wrong is a false-positive that costs every downstream user. Easy to add in v1.1 if a real user asks.
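As a rough illustration of where the v1 line falls, the `bare` pattern from the grammar can be combined into a column pattern that accepts at most one qualifier. `COLUMN` and `is_v1_column` are hypothetical names for this sketch. Rejecting three-part names such as `db.users.batch_id` follows from "one level of qualification", though the proposal does not list that case explicitly.

```python
import re

BARE = r"[A-Za-z_][A-Za-z0-9_]*"               # the grammar's `bare` rule
COLUMN = re.compile(rf"(?:{BARE}\.)?{BARE}")   # optional single qualifier


def is_v1_column(name: str) -> bool:
    """Return True for bare or singly qualified identifiers only."""
    return COLUMN.fullmatch(name) is not None


candidates = [
    "weight",              # bare
    "users.batch_id",      # one level of qualification
    '"batch-id"',          # standard SQL quoting, out for v1
    "[batch id]",          # T-SQL quoting, out for v1
    "`batch_id`",          # MySQL quoting, out for v1
    "db.users.batch_id",   # two qualifiers, assumed out for v1
]
for candidate in candidates:
    print(f"{candidate:20} {is_v1_column(candidate)}")
```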

Scope discipline
Hard line, per GOVERNANCE.md: the rule does not execute anything. No DB connection. No credentials. Pure static check on the comment text.

Does not require assertions to be present. Files without -- @assert: comments are unaffected.
Does not check that the predicate is true against data. That's the executor's job, not the linter's.
No new runtime dependency.
Anchor
The -- marker is anchored to the start of the line (optional leading whitespace). This skips the case where -- @assert: appears as a substring inside another comment, for example a doc comment that mentions the rule. Trade-off: an inline assertion on the same line as SQL (SELECT ...; -- @assert: row_count > 0) won't be picked up by v1. I'd rather lose that case than fire on prose.
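The anchoring trade-off can be seen in a small scanner over a whole file. This is a sketch under assumptions, not the drafted rule: `ASSERT_LINE` and `find_assertions` are hypothetical names, and case-insensitive matching is assumed. It uses `[ \t]*` rather than `\s*` so a match can never spill across a newline, which keeps the reported line numbers correct.

```python
import re
from typing import List, Tuple

# With re.MULTILINE, ^ matches at the start of every line, so the marker
# must be the first non-blank thing on its line.
ASSERT_LINE = re.compile(
    r"^[ \t]*--[ \t]*@assert[ \t]*:[ \t]*(?P<predicate>.*)$",
    re.MULTILINE | re.IGNORECASE,  # assumption: marker is case-insensitive
)


def find_assertions(sql_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, predicate_text) for each start-of-line assertion comment."""
    found = []
    for m in ASSERT_LINE.finditer(sql_text):
        line_no = sql_text.count("\n", 0, m.start()) + 1
        found.append((line_no, m.group("predicate").strip()))
    return found


sample = """\
-- Doc note: write checks as -- @assert: row_count > 0
SELECT id FROM orders; -- @assert: row_count > 0
    -- @assert: unique(id)
SELECT id FROM users;
"""
print(find_assertions(sample))  # -> [(3, 'unique(id)')]
```

Line 1 (prose that mentions the syntax) and line 2 (inline assertion after SQL) are both skipped. Only the indented comment on line 3 is picked up.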

Implementation
Already drafted on feature/W025-assertion-malformed:

sql_guard/rules/warnings.py — AssertionMalformed
tests/fixtures/assertions.sql — malformed examples
tests/test_new_rules.py — 22 tests: well-formed pass, malformed fire, embedded-mention regression, qualified identifiers, whitespace tolerance
tests/test_rules.py, tests/test_fluent.py — count assertions bumped
CHANGELOG.md — entry under [Unreleased]
README.md — rule table row, Key Numbers 43 → 44, 255 → 277
Open questions
Anchor placement. Start-of-line only (as drafted) vs anywhere on the line. Mid-line is more flexible but invites the prose false-positive. Lean: start-of-line.
Multiple predicates per comment. -- @assert: row_count > 0 AND unique(id) is handy, but doubles the parser surface. Lean: no for v1.
Naming. @assert vs @check vs @expect. @assert matches Python's assert keyword. Lean: @assert.
Quoted identifiers. Defer to v1.1 (see "Identifiers"). Confirm or push back.
What I rejected
Execute the assertion inside sql-sop. Needs DB connection, breaks the "rule-based, not AI" line, walks straight into Great Expectations / Soda territory.
Make this dbt-only. dbt has its own tests; the value of -- @assert: is that it works for plain SQL files too.
No grammar — leave it freeform. Freeform is what gets written today, and nothing reads it. The grammar is the entire point.
