Skip to content

Conversation

jayshrivastava
Copy link
Collaborator

@jayshrivastava jayshrivastava commented Sep 29, 2025

test: add sqllogictest runner

This change adds a sqllogictest runner + CLI which runs .slt files in the tests/sqllogictest directory. Right
now, the runner uses a hardcoded distributed config (ex. with_network_shuffle_tasks(2) etc.) but can be
extended in the future. Using sqllogictest will make it much easier to write tests (especially for explain
and explain (analyze)). explain (analyze) tests will be added in a future commit as it is still being
implemented.

Also, this change deletes tests/distributed_aggregation.rs and moves the test cases to .slt files.

Documentation:


ci: add sqllogictest files to github ci

We now run any sqllogictest commands from tests/sqllogictest.sh in CI


misc: add .vscode dir to .gitignore

@jayshrivastava jayshrivastava marked this pull request as ready for review September 29, 2025 21:03
@jayshrivastava jayshrivastava force-pushed the js/sqllogictest branch 2 times, most recently from 3a10cd2 to 3c2d05e Compare September 29, 2025 21:35
Copy link
Collaborator

@robtandy robtandy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI looks like his passed, but clicking on the individual tests and inspecting output shows,

...
❌ tests/sqllogictest/explain.slt: FAILED
...

which looked like it might have been missed.

@jayshrivastava jayshrivastava force-pushed the js/sqllogictest branch 2 times, most recently from 0272567 to 82ed70e Compare September 30, 2025 01:09
@jayshrivastava jayshrivastava changed the title Js/sqllogictest test: add sqllogictest runner Sep 30, 2025
@jayshrivastava jayshrivastava force-pushed the js/sqllogictest branch 2 times, most recently from 743b276 to 35d428b Compare September 30, 2025 01:30
This change adds a sqllogictest runner + CLI which runs .slt files in the tests/sqllogictest directory. Right
now, the runner uses a hardcoded distributed config (ex. `with_network_shuffle_tasks(2)` etc.) but can be
extended in the future. Using sqllogictest will make it much easier to write tests (especially for `explain`
and `explain (analyze)`). `explain (analyze)` tests will be added in a future commit as it is still being
implemented.

Also, this change deletes `tests/distributed_aggregation.rs` and moves the test cases to `.slt` files.

Documentation:
- https://sqlite.org/sqllogictest/doc/trunk/about.wiki
- https://github.com/risinglightdb/sqllogictest-rs
We now run any sqllogictest commands from tests/sqllogictest.sh in CI
@jayshrivastava
Copy link
Collaborator Author

@robtandy Fixed! Good catch

Copy link
Collaborator

@gabotechs gabotechs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice to see more efforts towards testing 😄 , although I'm going to try to persuade you to not go down this path:

From sqllogictest definition itself:

Sqllogictest is a program designed to verify that an SQL database engine computes correct results by comparing the results to identical queries from other SQL database engines.

Here are my reason to think sqllogictests are not suitable for testing this codebase:

  • In this project, we care about bringing distributed capabilities to DataFusion, but it's not in our scope to test DataFusion correctness itself, that's done already upstream in their sqllogictest.

  • This approach is not suitable for performing assertions over big pieces of data, where we want to assert results against normal DataFusion regardless of what they are (e.g. assert_eq!(df_results, df_distributed_results)).

  • We lose insta as our snapshotting tool in favor of manually maintaining assertions over big portions of text. We can no longer do cargo insta accept for accepting changes in snapshots.

  • Knowing that we still need our current snapshot testing, we would be maintaining two paths of testing with overlapping intentions, when we could reinforce on one.

  • This approach requires adding additional custom scripts, pipelines, executable binaries and overall overhead to the project.

  • It's more code to maintain. In terms of amount of code, this approach would pay off if we were to add a lot of sqllogictests, but I don't think we want or should do it, we should focus on testing things like:

    • Are the metrics for each node reconstructed across the network?
    • Are we able to propagate config properly across network boundaries?
    • Are we able to encode/decode custom user nodes?

Answering to the PR description:

Using sqllogictest will make it much easier to write tests

I'm almost certain that writing tests this way is going to be far more complicated than using https://github.com/mitsuhiko/insta as we do in the other tests. I'd strongly recommend to get familiar and comfortable with the tool, and then reconsider if sqllogictest would help.

In short:

I'd redirect the efforts towards reinforcing on the current integration testing tooling in order to make it even more ergonomic so that we can bring more tests if needed, but I'd advocated towards not using sqllogictests in this project.

@jayshrivastava
Copy link
Collaborator Author

Ahh I didn't realize we had a solution for this. https://github.com/mitsuhiko/insta looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants