Skip to content

Commit

Permalink
feat: decouple conformance failure from pytest failure (substrait-io#139
Browse files Browse the repository at this point in the history
)

This PR implements the main part of substrait-io#127, the RFC proposing to decouple
CI failure from spec adherence. Previously, the tests would record the
desired result (Substrait plans, execution results, etc.) in a snapshot
(using `pytest-snapshot`) and make the test fail if the system under
test produced a different result. In this PR, the result of that
comparison is now treated as the test "outcome", which, itself, is
snapshot and only a change in test *outcome* will now lead to CI
failure. This allows to keep CI green at all times even though some
systems do not have perfect conformance to the Substrait spec. CI only
fails if unexpected things happen: bugs are introduced in the test
infrastructure or systems that were previously conformant in a test are
now not conformant anymore or vice versa (for example, after upgrading
that system to a new version).

Above logic is implemented by wrapping the "inner"/"first" level of
snapshots into a function, `check_match`, that, instead of raising an
exception if the snapshots don't match, returns a Boolean indicating if
the snapshots match. That Boolean is then the outcome of the test, which
is snapshot using the normal snapshotting mechanism (which leads to test
failure if the snapshots differ).

Also, exceptions produced by producers and consumers due to unsupported
plan features are now caught by the test logic and recorded as test
"outcome". This way, if a previously unsupported feature is supported by
a new version of system under test after an upgrade, the outcome of the
test changes from "failure" to "pass" and needs to be changed explicitly
by changing the snapshot.

Signed-off-by: Ingo Müller <[email protected]>
  • Loading branch information
ingomueller-net authored Dec 11, 2024
1 parent 4fb0bd6 commit 0db7759
Show file tree
Hide file tree
Showing 1,416 changed files with 1,768 additions and 72 deletions.
4 changes: 1 addition & 3 deletions .github/workflows/support_matrix.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,9 +41,7 @@ jobs:

- name: Run functions tests
working-directory: ./substrait_consumer/tests/functional/extension_functions
# Make this target succeed always as failing tests are part of the outcome
# and should be processed by the next step.
run: pytest -v --tb=no --csv ${{ matrix.target }}r_pytest_output.csv --csv-delimiter ';' --csv-columns 'id,status' -m ${{ matrix.target }}_substrait_snapshot || true
run: pytest -v --tb=no --csv ${{ matrix.target }}r_pytest_output.csv --csv-delimiter ';' --csv-columns 'id,status,properties_as_columns' -m ${{ matrix.target }}_substrait_snapshot

- name: Parse results
working-directory: ./substrait_consumer/tests/functional/extension_functions
Expand Down
45 changes: 38 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,23 @@ and substrait function tests (which test individual extension functions). Test d
using DuckDB at the start of the test class using the `prepare_tpch_parquet_data` fixture,
which is located in `substrait_consumer/conftest.py`.

It is important to understand that there are two levels of "testing" in this repository:

1. We want to determine whether producers and consumers produce the plans,
results, schemas, error messages, etc. given a certain input. This is
arguably the main purpose of this repository.
2. We want to test whether the testing infrastructure (which is non-trivial) is
behaving as intended and no bugs have been introduced on that level (such as
loading a test case that doesn't exist or an exception in the testing
logic).

The rationale of this repository is to make `pytest` and, hence, CI pass if and
only if the *second* class of tests passes, i.e., if the testing infrastructure
runs as expected and, thus, *irrespective of whether or not consumers and
consumers adhere to the Substrait spec.* The goal is, thus, to have CI green at
all times. To understand the level of conformance of producers and consumers,
the [support matrix](SUPPORT_MATRIX.md) is used instead.

# Setup
Create and activate your conda environment with python3.9:
```commandline
Expand Down Expand Up @@ -143,8 +160,7 @@ to be used later on for verification as well as an input to the consumer tests.
The consumer tests read the saved substrait plan snapshots and generate results. These
results are saved as a snapshot to be used for verification.

If there is a mismatch between results and a saved snapshot, the test will fail and a diff
will be generated.
If there is a mismatch between results and a saved snapshot, the result will be considered incorrect.

Substrait function test files are located in the `substrait_consumer/functional/extension_functions` folder.

Expand Down Expand Up @@ -196,6 +212,16 @@ IBIS_SCALAR = {

## Updating Snapshots

For each test, there are typically two snapshots, one for each of the levels
we are testing: (1) the snapshot of the behavior under test such as a Substrait
plan, a query result, etc. and (2) the "outcome" of whether or not the system
under test behaved as expected. A typical test, thus, consists of running the
system under test, comparing the output with the first snapshot (where a
diverging answer will *not* lead to immediate test failure), and then a
comparison of that outcome with the previously registered outcome (where a
diverging answer *will* lead to test failure). When updating snapshots we,
thus, need to be conscious about which of the two snapshots we update.

### Substrait Plan Snapshots
Each producer has its own set of substrait plan snapshots that are stored in the `*_snapshots`
directory under `substrait_consumer/tests/functional/extension_functions/`
Expand All @@ -204,8 +230,9 @@ cd substrait_consumer/tests/functional/extension_functions/boolean_snapshots
ls
DuckDBProducer IbisProducer IsthmusProducer
cd DuckDBProducer
and_plan.json bool_or_plan.json or_plan.json
bool_and_plan.json not_plan.json xor_plan.json
and-duckdb_outcome.txt bool_and_plan.json not-duckdb_outcome.txt or_plan.json
and_plan.json bool_or-duckdb_outcome.txt not_plan.json xor-duckdb_outcome.txt
bool_and-duckdb_outcome.txt bool_or_plan.json or-duckdb_outcome.txt xor_plan.json
```

Substrait plan snapshots are used to verify that producers are able to generate substrait plans.
Expand All @@ -221,7 +248,10 @@ for a single test by specifying the test file:
```commandline
pytest -m produce_substrait_snapshot --producer isthmus --snapshot-update test_arithmetic_functions.py
```

Note that this updates *both* snapshots, the one with the expected behavior and
the one recording whether or not the system behaved that way. After updating
the snapshots, be sure that the new snapshots correspond to what you want and
manually intervene if necessary.

### Results Snapshots
Results Snapshots are generated by running the SQL query corresponding to the function under
Expand All @@ -237,8 +267,9 @@ function grouping snapshots folder.
```commandline
cd substrait_consumer/tests/functional/extension_functions/boolean_snapshots/function_test_results
ls
and_result.txt bool_or_result.txt or_result.txt
bool_and_result.txt not_result.txt xor_result.txt
and-datafusion-acero_outcome.txt and-duckdb-acero_outcome.txt and-ibis-acero_outcome.txt and_outcome.txt
and-datafusion-datafusion_outcome.txt and-duckdb-duckdb_outcome.txt and-isthmus-acero_outcome.txt and_result.txt
...
```


Expand Down
147 changes: 111 additions & 36 deletions substrait_consumer/functional/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,38 @@
)


def check_match(
snapshot: Snapshot, value: str | bytes, snapshot_name: str | Path
) -> bool:
"""
Returns True iff the given `value` matches the snapshot according to `assert_match`.
Calls `snapshot.assert_match(value, snapshot_name)` and returns True if the
snapshot matches, returns False if it doesn't, and reraises any other exception
that may occur.
Parameters:
snapshot:
`snapshot` test fixture from pytest-snapshot
value:
Value to check against the snapshot
snapshot_name:
Path to the file containing the snapshot
Returns:
True if `value` matches the content of the file in `snapshot_name`, False
otherwise.
"""
try:
snapshot.assert_match(value, snapshot_name)
except AssertionError as e:
if str(e).startswith("value does not match the expected value in"):
return False
raise e

return True


def check_subtrait_function_names(
substrait_plan: dict,
expected_function_name,
Expand Down Expand Up @@ -52,6 +84,7 @@ def check_subtrait_function_names(
def generate_snapshot_results(
test_name: str,
snapshot: Snapshot,
record_property,
db_con: DuckDBPyConnection,
local_files: dict[str, str],
named_tables: dict[str, str],
Expand Down Expand Up @@ -79,7 +112,22 @@ def generate_snapshot_results(
producer = DuckDBProducer()
producer.setup(db_con, local_files, named_tables)

duckdb_result = producer.run_sql_query(sql_query[0])
group, name = test_name.split(":")
outcome_path = f"{name}_outcome.txt"
result_path = f"{name}_result.txt"

if "relation" in group:
snapshot.snapshot_dir = RELATION_SNAPSHOT_DIR / group / "relation_test_results"
else:
snapshot.snapshot_dir = FUNCTION_SNAPSHOT_DIR / group / "function_test_results"

try:
duckdb_result = producer.run_sql_query(sql_query[0])
except BaseException as e:
record_property("outcome", str(type(e)))
snapshot.assert_match(str(type(e)), outcome_path)
return

duckdb_result = duckdb_result.rename_columns(
list(map(str.lower, duckdb_result.column_names))
)
Expand All @@ -88,17 +136,16 @@ def generate_snapshot_results(
duckdb_result_data.extend(column.data)
duckdb_result_data.extend([' '])
str_result_data = '\n'.join(map(str, duckdb_result_data))
group, name = test_name.split(":")
if "relation" in group:
snapshot.snapshot_dir = RELATION_SNAPSHOT_DIR / group / "relation_test_results"
else:
snapshot.snapshot_dir = FUNCTION_SNAPSHOT_DIR / group / "function_test_results"
snapshot.assert_match(str_result_data, f"{name}_result.txt")

match_result = check_match(snapshot, str_result_data, result_path)
record_property("outcome", str(match_result))
snapshot.assert_match(str(match_result), outcome_path)


def substrait_producer_sql_test(
test_name: str,
snapshot: Snapshot,
record_property,
db_con: DuckDBPyConnection,
local_files: dict[str, str],
named_tables: dict[str, str],
Expand Down Expand Up @@ -136,31 +183,50 @@ def substrait_producer_sql_test(
producer.setup(db_con, local_files, named_tables)
sql_query, supported_producers = sql_query

group, name = test_name.split(":")
outcome_path = f"{name}-{producer.name()}_outcome.txt"
plan_path = f"{name}_plan.json"

if "relation" in group:
snapshot.snapshot_dir = RELATION_SNAPSHOT_DIR / group / type(producer).__name__
else:
snapshot.snapshot_dir = FUNCTION_SNAPSHOT_DIR / group / type(producer).__name__

# Convert the SQL/Ibis expression to a substrait query plan
if isinstance(producer, IbisProducer):
if ibis_expr:
substrait_plan = producer.produce_substrait(sql_query, validate, ibis_expr(*args))
try:
substrait_plan = producer.produce_substrait(
sql_query, validate, ibis_expr(*args)
)
except BaseException as e:
record_property("outcome", str(type(e)))
snapshot.assert_match(str(type(e)), outcome_path)
return
else:
pytest.xfail("ibis expression currently undefined")
else:
if type(producer) in supported_producers:
substrait_plan = producer.produce_substrait(sql_query, validate)
try:
substrait_plan = producer.produce_substrait(sql_query, validate)
except BaseException as e:
record_property("outcome", str(type(e)))
snapshot.assert_match(str(type(e)), outcome_path)
return
else:
pytest.xfail(
f"{producer.name()} does not support the following SQL: {sql_query}"
)

group, name = test_name.split(":")
if "relation" in group:
snapshot.snapshot_dir = RELATION_SNAPSHOT_DIR / group / type(producer).__name__
else:
snapshot.snapshot_dir = FUNCTION_SNAPSHOT_DIR / group / type(producer).__name__
snapshot.assert_match(str(substrait_plan), f"{name}_plan.json")
match_result = check_match(snapshot, str(substrait_plan), plan_path)
record_property("outcome", str(match_result))
snapshot.assert_match(str(match_result), outcome_path)


def substrait_consumer_sql_test(
test_name: str,
snapshot: Snapshot,
record_property,
db_con: DuckDBPyConnection,
local_files: dict[str, str],
named_tables: dict[str, str],
Expand Down Expand Up @@ -197,31 +263,40 @@ def substrait_consumer_sql_test(
consumer.setup(db_con, local_files, named_tables)

group, name = test_name.split(":")
snopshot_dir = RELATION_SNAPSHOT_DIR if "relation" in group else FUNCTION_SNAPSHOT_DIR
plan_path = (
snopshot_dir
/ group
/ type(producer).__name__
/ f"{name}_plan.json"
snapshot_dir = (
RELATION_SNAPSHOT_DIR if "relation" in group else FUNCTION_SNAPSHOT_DIR
)
results_dir = (
"relation_test_results" if "relation" in group else "function_test_results"
)
if plan_path.is_file():
substrait_plan = plan_path.read_text()
snapshot.snapshot_dir = snapshot_dir / group / results_dir
plan_path = snapshot_dir / group / type(producer).__name__ / f"{name}_plan.json"
outcome_path = f"{name}-{producer.name()}-{consumer.name()}_outcome.txt"
result_path = f"{name}_result.txt"

results_dir = "relation_test_results" if "relation" in group else "function_test_results"
snapshot.snapshot_dir = snopshot_dir / group / results_dir
if not plan_path.is_file():
pytest.skip(f"No substrait plan exists for {producer.name()}:{name}")

substrait_plan = plan_path.read_text()

try:
actual_result = consumer.run_substrait_query(substrait_plan)
actual_result = actual_result.rename_columns(
list(map(str.lower, actual_result.column_names))
).columns
result_list = []
for column in actual_result:
result_list.extend(column.data)
result_list.extend([' '])
str_result = '\n'.join(map(str, result_list))
snapshot.assert_match(str_result, f"{name}_result.txt")
except BaseException as e:
record_property("outcome", str(type(e)))
snapshot.assert_match(str(type(e)), outcome_path)
return

else:
pytest.skip(f"No substrait plan exists for {producer.name()}:{name}")
actual_result = actual_result.rename_columns(
list(map(str.lower, actual_result.column_names))
).columns
result_list = []
for column in actual_result:
result_list.extend(column.data)
result_list.extend([" "])
str_result = "\n".join(map(str, result_list))
match_result = check_match(snapshot, str_result, result_path)
record_property("outcome", str(match_result))
snapshot.assert_match(str(match_result), outcome_path)


def load_custom_duckdb_table(db_connection):
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit 0db7759

Please sign in to comment.