What
A new warning rule, W025 assertion-malformed, that scans for -- @Assert: comments in SQL files and flags any whose predicate doesn't parse. The rule does not execute the assertion — execution belongs to dbt tests, a downstream runner, or a follow-up companion tool.
Why
sql-sop catches dangerous SQL. It says nothing about whether the data the query returns is sane. That gap gets filled today by:
dbt tests — only if you use dbt
Great Expectations / Soda — new vendor, DB connection required
nothing, which is most teams
-- @Assert: comments are the lightest possible way to write a data assertion next to the query that produces the data. The problem is that nothing currently reads them, so people write whatever they feel like. By owning the syntax, sql-sop makes them machine-checkable. Execution is somebody else's problem.
Grammar (v1)
-- @Assert: <predicate>
predicate := row_count <op> <int>
           | unique(<col>)
           | not_null(<col>)
           | <col> <op> <literal>
op        := = | != | < | <= | > | >=
col       := <bare> ( . <bare> )?    e.g. weight, users.batch_id
bare      := [A-Za-z_][A-Za-z0-9_]*
literal   := <int> | <float> | '<string>' | "<string>"
Examples that parse:
-- @Assert: row_count > 0
-- @Assert: row_count >= 100
-- @Assert: unique(batch_id)
-- @Assert: unique(orders.batch_id)
-- @Assert: not_null(weight)
-- @Assert: weight > 0
-- @Assert: status = 'paid'
Examples the rule fires on:
-- @Assert: row_count
-- @Assert: row_count > zero
-- @Assert: unique batch_id
-- @Assert: weight is positive
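To make the accept/reject examples above concrete, here is a minimal sketch of a predicate checker built from regular expressions. It is not the drafted AssertionMalformed code. The names BARE, COL, OP, LITERAL, PREDICATE and predicate_parses are hypothetical, and the shapes assumed for integers and literals (digits, optional decimal part, single- or double-quoted strings) are my reading of the grammar rather than something the proposal states.

```python
import re

# Illustrative building blocks; every name here is an assumption.
BARE = r"[A-Za-z_][A-Za-z0-9_]*"
COL = rf"{BARE}(?:\.{BARE})?"          # bare, or one level of qualification
OP = r"(?:<=|>=|!=|=|<|>)"             # two-character operators listed first
INT = r"\d+"
# Assumption: a literal is an integer, a decimal, or a quoted string.
LITERAL = r"(?:\d+(?:\.\d+)?|'[^']*'|\"[^\"]*\")"

PREDICATE = re.compile(
    rf"""
    (?:
        row_count \s* {OP} \s* {INT}             # row_count compared to an integer
      | (?:unique|not_null) \( \s* {COL} \s* \)  # unique(col) or not_null(col)
      | {COL} \s* {OP} \s* {LITERAL}             # column compared to a literal
    )
    """,
    re.VERBOSE,
)


def predicate_parses(text: str) -> bool:
    """Return True if the whole predicate text matches the assumed v1 grammar."""
    return PREDICATE.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    for sample in ["row_count > 0", "unique(orders.batch_id)", "unique batch_id"]:
        print(sample, "->", predicate_parses(sample))
```

One thing this sketch does not do is reserve row_count as a keyword, so row_count can also match as an ordinary column name in the last alternative. A real implementation would need to decide whether that is acceptable.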
Whitespace
Tolerant. Extra whitespace anywhere between --, @Assert, :, and the predicate is fine. Leading indentation on the comment line is fine. Trailing whitespace before end-of-line is ignored. All three of these read the same to the rule:
-- @Assert: unique(id)
--   @Assert  :   unique(id)
    -- @Assert: unique(id)
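The whitespace tolerance described here could be expressed as a single pattern with optional whitespace between each token. This is a sketch under assumptions: ASSERT_COMMENT and extract_predicate are hypothetical names, and the marker is matched exactly as spelled in this proposal.

```python
import re

# Hypothetical matcher: optional whitespace before "--", between "--" and
# "@Assert", around ":", and after the predicate.
ASSERT_COMMENT = re.compile(r"^\s*--\s*@Assert\s*:\s*(?P<predicate>.*?)\s*$")


def extract_predicate(line: str):
    """Return the predicate text with surrounding whitespace removed, or None."""
    match = ASSERT_COMMENT.match(line)
    return match.group("predicate") if match else None


if __name__ == "__main__":
    variants = [
        "-- @Assert: unique(id)",
        "--   @Assert  :   unique(id)",
        "    -- @Assert: unique(id)   ",
    ]
    for line in variants:
        print(repr(extract_predicate(line)))
```

All three variants yield the same predicate text, so whatever parses the predicate never sees the spacing differences.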
Identifiers — what's in, what's out
In for v1: bare identifiers and one level of qualification — weight, users.batch_id.
Out for v1: quoted identifiers in any dialect.
standard SQL "batch-id"
T-SQL [batch id]
MySQL `batch_id`
Reason: quoted identifiers are uncommon in assertion targets (you usually assert on clean column names), the three quote styles each need separate handling, and getting it wrong produces a false positive that costs every downstream user. Easy to add in v1.1 if a real user asks.
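As an illustration of the in/out split, a column check for v1 might look like the sketch below. COLUMN and is_v1_column are hypothetical names. It also assumes that "one level of qualification" means at most one dot, so a three-part name is rejected.

```python
import re

BARE = r"[A-Za-z_][A-Za-z0-9_]*"
# A bare identifier, optionally qualified by exactly one prefix.
COLUMN = re.compile(rf"{BARE}(?:\.{BARE})?")


def is_v1_column(name: str) -> bool:
    """Return True for bare or singly qualified identifiers, False otherwise."""
    return COLUMN.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["weight", "users.batch_id", '"batch-id"', "[batch id]", "`batch_id`"]:
        print(name, "->", is_v1_column(name))
```

Because none of the three quote characters is in the bare-identifier character class, every quoted form fails without any dialect-specific handling.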
Scope discipline
Hard line, per GOVERNANCE.md: the rule does not execute anything. No DB connection. No credentials. Pure static check on the comment text.
Does not require assertions to be present. Files without -- @Assert: comments are unaffected.
Does not check that the predicate is true against data. That's the executor's job, not the linter's.
No new runtime dependency.
Anchor
-- is anchored to the start of the line (optional leading whitespace). This skips the case where -- @Assert: appears as a substring inside another comment — for example, a doc comment that mentions the rule. Trade-off: an inline assertion on the same line as SQL (SELECT ...; -- @Assert: row_count > 0) won't be picked up by v1. I'd rather lose that case than fire on prose.
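The anchoring trade-off can be seen in a small sketch that scans a whole SQL string and reports only the lines where the assertion comment starts the line. ANCHORED and assertion_lines are hypothetical names, not the code on the feature branch.

```python
import re
from typing import List

# "^" with re.MULTILINE matches at each line start. [ \t]* is used instead of
# \s* so the indentation match cannot spill over a preceding blank line.
ANCHORED = re.compile(r"^[ \t]*--\s*@Assert\s*:", re.MULTILINE)


def assertion_lines(sql: str) -> List[int]:
    """Return 1-based line numbers whose first token is an assertion comment."""
    return [sql.count("\n", 0, m.start()) + 1 for m in ANCHORED.finditer(sql)]


if __name__ == "__main__":
    sql = "\n".join([
        "-- @Assert: row_count > 0",                        # picked up
        "SELECT * FROM orders; -- @Assert: row_count > 0",  # inline: skipped
        "-- Note: write checks as -- @Assert: <predicate>", # prose mention: skipped
        "    -- @Assert: unique(id)",                       # indented: picked up
    ])
    print(assertion_lines(sql))
```

The inline assertion on the second line is the case v1 gives up. The prose mention on the third line is the false positive that the anchor avoids.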
Implementation
Already drafted on feature/W025-assertion-malformed:
sql_guard/rules/warnings.py — AssertionMalformed
tests/fixtures/assertions.sql — malformed examples
tests/test_new_rules.py — 22 tests: well-formed pass, malformed fire, embedded-mention regression, qualified identifiers, whitespace tolerance
tests/test_rules.py, tests/test_fluent.py — count assertions bumped
CHANGELOG.md — entry under [Unreleased]
README.md — rule table row, Key Numbers 43 → 44, 255 → 277
Open questions
Anchor placement. Start-of-line only (as drafted) vs anywhere on the line. Mid-line is more flexible but invites the prose false-positive. Lean: start-of-line.
Multiple predicates per comment. -- @Assert: row_count > 0 AND unique(id) — handy, but doubles the parser surface. Lean: no for v1.
Naming. @Assert vs @check vs @expect. @Assert mirrors Python's assert keyword. Lean: @Assert.
Quoted identifiers. Defer to v1.1 (see "Identifiers"). Confirm or push back.
What I rejected
Execute the assertion inside sql-sop. Needs DB connection, breaks the "rule-based, not AI" line, walks straight into Great Expectations / Soda territory.
Make this dbt-only. dbt has its own tests; the value of -- @Assert: is that it works for plain SQL files too.
No grammar — leave it freeform. Freeform is what gets written today, and nothing reads it. The grammar is the entire point.