feat: decouple conformance failure from pytest failure (substrait-io#139

) This PR implements the main part of substrait-io#127, the RFC proposing to decouple CI failure from spec adherence. Previously, the tests would record the desired result (Substrait plans, execution results, etc.) in a snapshot (using `pytest-snapshot`) and make the test fail if the system under test produced a different result. In this PR, the result of that comparison is now treated as the test "outcome", which, itself, is snapshot and only a change in test *outcome* will now lead to CI failure. This allows to keep CI green at all times even though some systems do not have perfect conformance to the Substrait spec. CI only fails if unexpected things happen: bugs are introduced in the test infrastructure or systems that were previously conformant in a test are now not conformant anymore or vice versa (for example, after upgrading that system to a new version). Above logic is implemented by wrapping the "inner"/"first" level of snapshots into a function, `check_match`, that, instead of raising an exception if the snapshots don't match, returns a Boolean indicating if the snapshots match. That Boolean is then the outcome of the test, which is snapshot using the normal snapshotting mechanism (which leads to test failure if the snapshots differ). Also, exceptions produced by producers and consumers due to unsupported plan features are now caught by the test logic and recorded as test "outcome". This way, if a previously unsupported feature is supported by a new version of system under test after an upgrade, the outcome of the test changes from "failure" to "pass" and needs to be changed explicitly by changing the snapshot. Signed-off-by: Ingo Müller <[email protected]>
ingomueller-net · Dec 11, 2024 · 0db7759 · 0db7759
1 parent 4fb0bd6
commit 0db7759
Show file tree

Hide file tree

Showing 1,416 changed files with 1,768 additions and 72 deletions.
diff --git a/.github/workflows/support_matrix.yml b/.github/workflows/support_matrix.yml
@@ -41,9 +41,7 @@ jobs:
 
     - name: Run functions tests
       working-directory: ./substrait_consumer/tests/functional/extension_functions
-      # Make this target succeed always as failing tests are part of the outcome
-      # and should be processed by the next step.
-      run: pytest -v --tb=no --csv ${{ matrix.target }}r_pytest_output.csv --csv-delimiter ';' --csv-columns 'id,status' -m ${{ matrix.target }}_substrait_snapshot || true
+      run: pytest -v --tb=no --csv ${{ matrix.target }}r_pytest_output.csv --csv-delimiter ';' --csv-columns 'id,status,properties_as_columns' -m ${{ matrix.target }}_substrait_snapshot
 
     - name: Parse results
       working-directory: ./substrait_consumer/tests/functional/extension_functions

diff --git a/README.md b/README.md
@@ -32,6 +32,23 @@ and substrait function tests (which test individual extension functions). Test d
 using DuckDB at the start of the test class using the `prepare_tpch_parquet_data` fixture, 
 which is located in `substrait_consumer/conftest.py`.
 
+It is important to understand that there are two levels of "testing" in this repository:
+
+1. We want to determine whether producers and consumers produce the plans,
+   results, schemas, error messages, etc. given a certain input. This is
+   arguably the main purpose of this repository.
+2. We want to test whether the testing infrastructure (which is non-trivial) is
+   behaving as intended and no bugs have been introduced on that level (such as
+   loading a test case that doesn't exist or an exception in the testing
+   logic).
+
+The rationale of this repository is to make `pytest` and, hence, CI pass if and
+only if the *second* class of tests passes, i.e., if the testing infrastructure
+runs as expected and, thus, *irrespective of whether or not consumers and
+consumers adhere to the Substrait spec.* The goal is, thus, to have CI green at
+all times. To understand the level of conformance of producers and consumers,
+the [support matrix](SUPPORT_MATRIX.md) is used instead.
+
 # Setup
 Create and activate your conda environment with python3.9:
 ```commandline
@@ -143,8 +160,7 @@ to be used later on for verification as well as an input to the consumer tests.
 The consumer tests read the saved substrait plan snapshots and generate results. These
 results are saved as a snapshot to be used for verification.
 
-If there is a mismatch between results and a saved snapshot, the test will fail and a diff
-will be generated.
+If there is a mismatch between results and a saved snapshot, the result will be considered incorrect.
 
 Substrait function test files are located in the `substrait_consumer/functional/extension_functions` folder.
 
@@ -196,6 +212,16 @@ IBIS_SCALAR = {
 
 ## Updating Snapshots
 
+For each test, there are typically two snapshots, one for each of the levels
+we are testing: (1) the snapshot of the behavior under test such as a Substrait
+plan, a query result, etc. and (2) the "outcome" of whether or not the system
+under test behaved as expected. A typical test, thus, consists of running the
+system under test, comparing the output with the first snapshot (where a
+diverging answer will *not* lead to immediate test failure), and then a
+comparison of that outcome with the previously registered outcome (where a
+diverging answer *will* lead to test failure). When updating snapshots we,
+thus, need to be conscious about which of the two snapshots we update.
+
 ### Substrait Plan Snapshots
 Each producer has its own set of substrait plan snapshots that are stored in the `*_snapshots`
 directory under `substrait_consumer/tests/functional/extension_functions/`
@@ -204,8 +230,9 @@ cd substrait_consumer/tests/functional/extension_functions/boolean_snapshots
 ls
 DuckDBProducer		IbisProducer		IsthmusProducer
 cd DuckDBProducer
-and_plan.json		bool_or_plan.json	or_plan.json
-bool_and_plan.json	not_plan.json		xor_plan.json
+and-duckdb_outcome.txt       bool_and_plan.json          not-duckdb_outcome.txt  or_plan.json
+and_plan.json                bool_or-duckdb_outcome.txt  not_plan.json           xor-duckdb_outcome.txt
+bool_and-duckdb_outcome.txt  bool_or_plan.json           or-duckdb_outcome.txt   xor_plan.json
 ```
 
 Substrait plan snapshots are used to verify that producers are able to generate substrait plans.
@@ -221,7 +248,10 @@ for a single test by specifying the test file:
 ```commandline
 pytest -m produce_substrait_snapshot --producer isthmus --snapshot-update test_arithmetic_functions.py
 ```
-
+Note that this updates *both* snapshots, the one with the expected behavior and
+the one recording whether or not the system behaved that way. After updating
+the snapshots, be sure that the new snapshots correspond to what you want and
+manually intervene if necessary.
 
 ### Results Snapshots
 Results Snapshots are generated by running the SQL query corresponding to the function under
@@ -237,8 +267,9 @@ function grouping snapshots folder.
 ```commandline
 cd substrait_consumer/tests/functional/extension_functions/boolean_snapshots/function_test_results
 ls
-and_result.txt		bool_or_result.txt	or_result.txt
-bool_and_result.txt	not_result.txt		xor_result.txt
+and-datafusion-acero_outcome.txt       and-duckdb-acero_outcome.txt   and-ibis-acero_outcome.txt     and_outcome.txt
+and-datafusion-datafusion_outcome.txt  and-duckdb-duckdb_outcome.txt  and-isthmus-acero_outcome.txt  and_result.txt
+...
 ```
 
 

diff --git a/substrait_consumer/functional/common.py b/substrait_consumer/functional/common.py
@@ -22,6 +22,38 @@
 )
 
 
+def check_match(
+    snapshot: Snapshot, value: str | bytes, snapshot_name: str | Path
+) -> bool:
+    """
+    Returns True iff the given `value` matches the snapshot according to `assert_match`.
+
+    Calls `snapshot.assert_match(value, snapshot_name)` and returns True if the
+    snapshot matches, returns False if it doesn't, and reraises any other exception
+    that may occur.
+
+    Parameters:
+        snapshot:
+            `snapshot` test fixture from pytest-snapshot
+        value:
+            Value to check against the snapshot
+        snapshot_name:
+            Path to the file containing the snapshot
+
+    Returns:
+        True if `value` matches the content of the file in `snapshot_name`, False
+        otherwise.
+    """
+    try:
+        snapshot.assert_match(value, snapshot_name)
+    except AssertionError as e:
+        if str(e).startswith("value does not match the expected value in"):
+            return False
+        raise e
+
+    return True
+
+
 def check_subtrait_function_names(
     substrait_plan: dict,
     expected_function_name,
@@ -52,6 +84,7 @@ def check_subtrait_function_names(
 def generate_snapshot_results(
     test_name: str,
     snapshot: Snapshot,
+    record_property,
     db_con: DuckDBPyConnection,
     local_files: dict[str, str],
     named_tables: dict[str, str],
@@ -79,7 +112,22 @@ def generate_snapshot_results(
     producer = DuckDBProducer()
     producer.setup(db_con, local_files, named_tables)
 
-    duckdb_result = producer.run_sql_query(sql_query[0])
+    group, name = test_name.split(":")
+    outcome_path = f"{name}_outcome.txt"
+    result_path = f"{name}_result.txt"
+
+    if "relation" in group:
+        snapshot.snapshot_dir = RELATION_SNAPSHOT_DIR / group / "relation_test_results"
+    else:
+        snapshot.snapshot_dir = FUNCTION_SNAPSHOT_DIR / group / "function_test_results"
+
+    try:
+        duckdb_result = producer.run_sql_query(sql_query[0])
+    except BaseException as e:
+        record_property("outcome", str(type(e)))
+        snapshot.assert_match(str(type(e)), outcome_path)
+        return
+
     duckdb_result = duckdb_result.rename_columns(
         list(map(str.lower, duckdb_result.column_names))
     )
@@ -88,17 +136,16 @@ def generate_snapshot_results(
         duckdb_result_data.extend(column.data)
         duckdb_result_data.extend([' '])
     str_result_data = '\n'.join(map(str, duckdb_result_data))
-    group, name = test_name.split(":")
-    if "relation" in group:
-        snapshot.snapshot_dir = RELATION_SNAPSHOT_DIR / group / "relation_test_results"
-    else:
-        snapshot.snapshot_dir = FUNCTION_SNAPSHOT_DIR / group / "function_test_results"
-    snapshot.assert_match(str_result_data, f"{name}_result.txt")
+
+    match_result = check_match(snapshot, str_result_data, result_path)
+    record_property("outcome", str(match_result))
+    snapshot.assert_match(str(match_result), outcome_path)
 
 
 def substrait_producer_sql_test(
     test_name: str,
     snapshot: Snapshot,
+    record_property,
     db_con: DuckDBPyConnection,
     local_files: dict[str, str],
     named_tables: dict[str, str],
@@ -136,31 +183,50 @@ def substrait_producer_sql_test(
     producer.setup(db_con, local_files, named_tables)
     sql_query, supported_producers = sql_query
 
+    group, name = test_name.split(":")
+    outcome_path = f"{name}-{producer.name()}_outcome.txt"
+    plan_path = f"{name}_plan.json"
+
+    if "relation" in group:
+        snapshot.snapshot_dir = RELATION_SNAPSHOT_DIR / group / type(producer).__name__
+    else:
+        snapshot.snapshot_dir = FUNCTION_SNAPSHOT_DIR / group / type(producer).__name__
+
     # Convert the SQL/Ibis expression to a substrait query plan
     if isinstance(producer, IbisProducer):
         if ibis_expr:
-            substrait_plan = producer.produce_substrait(sql_query, validate, ibis_expr(*args))
+            try:
+                substrait_plan = producer.produce_substrait(
+                    sql_query, validate, ibis_expr(*args)
+                )
+            except BaseException as e:
+                record_property("outcome", str(type(e)))
+                snapshot.assert_match(str(type(e)), outcome_path)
+                return
         else:
             pytest.xfail("ibis expression currently undefined")
     else:
         if type(producer) in supported_producers:
-            substrait_plan = producer.produce_substrait(sql_query, validate)
+            try:
+                substrait_plan = producer.produce_substrait(sql_query, validate)
+            except BaseException as e:
+                record_property("outcome", str(type(e)))
+                snapshot.assert_match(str(type(e)), outcome_path)
+                return
         else:
             pytest.xfail(
                 f"{producer.name()} does not support the following SQL: {sql_query}"
             )
 
-    group, name = test_name.split(":")
-    if "relation" in group:
-        snapshot.snapshot_dir = RELATION_SNAPSHOT_DIR / group / type(producer).__name__
-    else:
-        snapshot.snapshot_dir = FUNCTION_SNAPSHOT_DIR / group / type(producer).__name__
-    snapshot.assert_match(str(substrait_plan), f"{name}_plan.json")
+    match_result = check_match(snapshot, str(substrait_plan), plan_path)
+    record_property("outcome", str(match_result))
+    snapshot.assert_match(str(match_result), outcome_path)
 
 
 def substrait_consumer_sql_test(
     test_name: str,
     snapshot: Snapshot,
+    record_property,
     db_con: DuckDBPyConnection,
     local_files: dict[str, str],
     named_tables: dict[str, str],
@@ -197,31 +263,40 @@ def substrait_consumer_sql_test(
     consumer.setup(db_con, local_files, named_tables)
 
     group, name = test_name.split(":")
-    snopshot_dir = RELATION_SNAPSHOT_DIR if "relation" in group else FUNCTION_SNAPSHOT_DIR
-    plan_path = (
-        snopshot_dir
-        / group
-        / type(producer).__name__
-        / f"{name}_plan.json"
+    snapshot_dir = (
+        RELATION_SNAPSHOT_DIR if "relation" in group else FUNCTION_SNAPSHOT_DIR
+    )
+    results_dir = (
+        "relation_test_results" if "relation" in group else "function_test_results"
     )
-    if plan_path.is_file():
-        substrait_plan = plan_path.read_text()
+    snapshot.snapshot_dir = snapshot_dir / group / results_dir
+    plan_path = snapshot_dir / group / type(producer).__name__ / f"{name}_plan.json"
+    outcome_path = f"{name}-{producer.name()}-{consumer.name()}_outcome.txt"
+    result_path = f"{name}_result.txt"
 
-        results_dir = "relation_test_results" if "relation" in group else "function_test_results"
-        snapshot.snapshot_dir = snopshot_dir / group / results_dir
+    if not plan_path.is_file():
+        pytest.skip(f"No substrait plan exists for {producer.name()}:{name}")
+
+    substrait_plan = plan_path.read_text()
+
+    try:
         actual_result = consumer.run_substrait_query(substrait_plan)
-        actual_result = actual_result.rename_columns(
-            list(map(str.lower, actual_result.column_names))
-        ).columns
-        result_list = []
-        for column in actual_result:
-            result_list.extend(column.data)
-            result_list.extend([' '])
-        str_result = '\n'.join(map(str, result_list))
-        snapshot.assert_match(str_result, f"{name}_result.txt")
+    except BaseException as e:
+        record_property("outcome", str(type(e)))
+        snapshot.assert_match(str(type(e)), outcome_path)
+        return
 
-    else:
-        pytest.skip(f"No substrait plan exists for {producer.name()}:{name}")
+    actual_result = actual_result.rename_columns(
+        list(map(str.lower, actual_result.column_names))
+    ).columns
+    result_list = []
+    for column in actual_result:
+        result_list.extend(column.data)
+        result_list.extend([" "])
+    str_result = "\n".join(map(str, result_list))
+    match_result = check_match(snapshot, str_result, result_path)
+    record_property("outcome", str(match_result))
+    snapshot.assert_match(str(match_result), outcome_path)
 
 
 def load_custom_duckdb_table(db_connection):

diff --git a/...nctions/approximation_snapshots/DataFusionProducer/approx_distinct-datafusion_outcome.txt b/...nctions/approximation_snapshots/DataFusionProducer/approx_distinct-datafusion_outcome.txt
diff --git a/...functions/approximation_snapshots/DuckDBProducer/approx_count_distinct-duckdb_outcome.txt b/...functions/approximation_snapshots/DuckDBProducer/approx_count_distinct-duckdb_outcome.txt
diff --git a/...roximation_snapshots/function_test_results/approx_count_distinct-duckdb-acero_outcome.txt b/...roximation_snapshots/function_test_results/approx_count_distinct-duckdb-acero_outcome.txt
diff --git a/...oximation_snapshots/function_test_results/approx_count_distinct-duckdb-duckdb_outcome.txt b/...oximation_snapshots/function_test_results/approx_count_distinct-duckdb-duckdb_outcome.txt
diff --git a/...functions/approximation_snapshots/function_test_results/approx_count_distinct_outcome.txt b/...functions/approximation_snapshots/function_test_results/approx_count_distinct_outcome.txt
diff --git a/...pproximation_snapshots/function_test_results/approx_distinct-datafusion-acero_outcome.txt b/...pproximation_snapshots/function_test_results/approx_distinct-datafusion-acero_outcome.txt
diff --git a/...nsion_functions/approximation_snapshots/function_test_results/approx_distinct_outcome.txt b/...nsion_functions/approximation_snapshots/function_test_results/approx_distinct_outcome.txt
diff --git a/...sion_functions/arithmetic_decimal_snapshots/DataFusionProducer/avg-datafusion_outcome.txt b/...sion_functions/arithmetic_decimal_snapshots/DataFusionProducer/avg-datafusion_outcome.txt
diff --git a/...sion_functions/arithmetic_decimal_snapshots/DataFusionProducer/max-datafusion_outcome.txt b/...sion_functions/arithmetic_decimal_snapshots/DataFusionProducer/max-datafusion_outcome.txt
diff --git a/...sion_functions/arithmetic_decimal_snapshots/DataFusionProducer/min-datafusion_outcome.txt b/...sion_functions/arithmetic_decimal_snapshots/DataFusionProducer/min-datafusion_outcome.txt
diff --git a/...sion_functions/arithmetic_decimal_snapshots/DataFusionProducer/sum-datafusion_outcome.txt b/...sion_functions/arithmetic_decimal_snapshots/DataFusionProducer/sum-datafusion_outcome.txt
diff --git a/...al/extension_functions/arithmetic_decimal_snapshots/DuckDBProducer/add-duckdb_outcome.txt b/...al/extension_functions/arithmetic_decimal_snapshots/DuckDBProducer/add-duckdb_outcome.txt
diff --git a/...al/extension_functions/arithmetic_decimal_snapshots/DuckDBProducer/avg-duckdb_outcome.txt b/...al/extension_functions/arithmetic_decimal_snapshots/DuckDBProducer/avg-duckdb_outcome.txt
diff --git a/...extension_functions/arithmetic_decimal_snapshots/DuckDBProducer/divide-duckdb_outcome.txt b/...extension_functions/arithmetic_decimal_snapshots/DuckDBProducer/divide-duckdb_outcome.txt
diff --git a/...al/extension_functions/arithmetic_decimal_snapshots/DuckDBProducer/max-duckdb_outcome.txt b/...al/extension_functions/arithmetic_decimal_snapshots/DuckDBProducer/max-duckdb_outcome.txt
diff --git a/...al/extension_functions/arithmetic_decimal_snapshots/DuckDBProducer/min-duckdb_outcome.txt b/...al/extension_functions/arithmetic_decimal_snapshots/DuckDBProducer/min-duckdb_outcome.txt
diff --git a/...xtension_functions/arithmetic_decimal_snapshots/DuckDBProducer/modulus-duckdb_outcome.txt b/...xtension_functions/arithmetic_decimal_snapshots/DuckDBProducer/modulus-duckdb_outcome.txt
diff --git a/...tension_functions/arithmetic_decimal_snapshots/DuckDBProducer/multiply-duckdb_outcome.txt b/...tension_functions/arithmetic_decimal_snapshots/DuckDBProducer/multiply-duckdb_outcome.txt
diff --git a/...tension_functions/arithmetic_decimal_snapshots/DuckDBProducer/subtract-duckdb_outcome.txt b/...tension_functions/arithmetic_decimal_snapshots/DuckDBProducer/subtract-duckdb_outcome.txt
diff --git a/...al/extension_functions/arithmetic_decimal_snapshots/DuckDBProducer/sum-duckdb_outcome.txt b/...al/extension_functions/arithmetic_decimal_snapshots/DuckDBProducer/sum-duckdb_outcome.txt
diff --git a/...functions/arithmetic_decimal_snapshots/function_test_results/add-duckdb-acero_outcome.txt b/...functions/arithmetic_decimal_snapshots/function_test_results/add-duckdb-acero_outcome.txt
diff --git a/...unctions/arithmetic_decimal_snapshots/function_test_results/add-duckdb-duckdb_outcome.txt b/...unctions/arithmetic_decimal_snapshots/function_test_results/add-duckdb-duckdb_outcome.txt
diff --git a/...al/extension_functions/arithmetic_decimal_snapshots/function_test_results/add_outcome.txt b/...al/extension_functions/arithmetic_decimal_snapshots/function_test_results/add_outcome.txt
diff --git a/...tions/arithmetic_decimal_snapshots/function_test_results/avg-datafusion-acero_outcome.txt b/...tions/arithmetic_decimal_snapshots/function_test_results/avg-datafusion-acero_outcome.txt
diff --git a/.../arithmetic_decimal_snapshots/function_test_results/avg-datafusion-datafusion_outcome.txt b/.../arithmetic_decimal_snapshots/function_test_results/avg-datafusion-datafusion_outcome.txt
diff --git a/...functions/arithmetic_decimal_snapshots/function_test_results/avg-duckdb-acero_outcome.txt b/...functions/arithmetic_decimal_snapshots/function_test_results/avg-duckdb-acero_outcome.txt
diff --git a/...unctions/arithmetic_decimal_snapshots/function_test_results/avg-duckdb-duckdb_outcome.txt b/...unctions/arithmetic_decimal_snapshots/function_test_results/avg-duckdb-duckdb_outcome.txt
diff --git a/...al/extension_functions/arithmetic_decimal_snapshots/function_test_results/avg_outcome.txt b/...al/extension_functions/arithmetic_decimal_snapshots/function_test_results/avg_outcome.txt
diff --git a/...ctions/arithmetic_decimal_snapshots/function_test_results/divide-duckdb-acero_outcome.txt b/...ctions/arithmetic_decimal_snapshots/function_test_results/divide-duckdb-acero_outcome.txt