
Add support for the SDQL dialect #391

Open
tmarzec wants to merge 14 commits into edin-dal:main from tmarzec:sdql-mlir


tmarzec commented Dec 25, 2025

TL;DR

This PR:

  • Adds an SDQL dialect following Shaikhha et al. (2022, doi: 10.1145/3527333).
  • Introduces DictionaryType and RecordType with custom parsing rules.
  • Implements core SDQL operations for dictionaries, records, and sum, with filecheck examples under dialects/sdql/.
  • Adds a basic interpreter for a small, type-restricted subset of the dialect.

Related: amirsh/sdql_query#16 lets sdql_query_public emit MLIR for this dialect.

Details

This PR introduces an implementation of the SDQL dialect that follows Shaikhha et al. (2022), "Functional collection programming with semi-ring dictionaries" (doi: 10.1145/3527333), and is aligned with the reference implementation in the sdql_query_public repository.

Types

The PR adds two types, DictionaryType and RecordType (builtin/Builtin.scala), together with custom parsing rules for them (parse/AttrParser.scala).
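To illustrate what a custom parsing rule for a nested dictionary type involves, here is a minimal Python sketch of a recursive-descent parser. It handles only the dictionary<K, V> spelling that appears in this PR's sum example, plus bare scalar names such as i32. The classes Scalar and Dictionary and the function parse_type are hypothetical names for this illustration; they do not mirror the code in parse/AttrParser.scala, and the record type syntax is not covered because this description does not show it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scalar:
    name: str  # e.g. "i32" or "f16"


@dataclass(frozen=True)
class Dictionary:
    key: object    # Scalar or Dictionary
    value: object  # Scalar or Dictionary


def parse_type(text: str):
    """Parse `i32` or a possibly nested `dictionary<K, V>` into a small tree."""
    pos = 0

    def skip_ws():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def expect(ch):
        nonlocal pos
        skip_ws()
        if pos >= len(text) or text[pos] != ch:
            raise ValueError(f"expected {ch!r} at offset {pos}")
        pos += 1

    def ident():
        nonlocal pos
        skip_ws()
        start = pos
        while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
            pos += 1
        if start == pos:
            raise ValueError(f"expected a type name at offset {pos}")
        return text[start:pos]

    def parse():
        name = ident()
        if name != "dictionary":
            return Scalar(name)  # any other identifier is treated as a scalar
        expect("<")
        key = parse()
        expect(",")
        value = parse()
        expect(">")
        return Dictionary(key, value)

    result = parse()
    skip_ws()
    if pos != len(text):
        raise ValueError(f"unexpected trailing input at offset {pos}")
    return result
```

For example, parse_type("dictionary<i32, dictionary<i32, f16>>") yields a Dictionary whose value is itself a Dictionary, which is the nesting a real parser also has to handle.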

Operations and examples

The dialect, defined in sdql/Sdql.scala, provides operations for most SDQL concepts: dictionary construction and lookup (sdql.empty_dictionary, sdql.create_dictionary, sdql.lookup_dictionary, sdql.dictionary_add), record construction and access (sdql.create_record, sdql.access_record, sdql.concat), and collection aggregation via sdql.sum with sdql.yield. Usage examples are available under filecheck/dialects/sdql.
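This description does not spell out the runtime meaning of sdql.dictionary_add and sdql.lookup_dictionary. Assuming they follow the semi-ring dictionary semantics of the cited paper (values under equal keys are added pointwise, and a missing key reads as the zero of the value type), their behavior can be modeled in Python roughly as follows. The function names are illustrative only and are not part of ScaIR.

```python
def add_values(a, b):
    """Add two SDQL-like values: dictionaries merge pointwise, scalars use +."""
    if isinstance(a, dict) and isinstance(b, dict):
        return dictionary_add(a, b)
    return a + b


def dictionary_add(left: dict, right: dict) -> dict:
    """Assumed semantics: keys present on both sides have their values added."""
    result = dict(left)  # copy so neither input is mutated
    for key, value in right.items():
        result[key] = add_values(result[key], value) if key in result else value
    return result


def lookup(d: dict, key, zero=0):
    """Assumed semantics: a missing key yields the zero of the value type."""
    return d.get(key, zero)
```

The recursion in add_values is what lets nested dictionaries such as dictionary<i32, dictionary<i32, i32>> be added with the same rule.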

Two representation choices are worth noting. The let ... in construct from SDQL has no operation of its own, because binding a value and then using it is already expressed by the ordinary sequential flow of an MLIR program. The sum(x in e1) e2 construct is modeled as an operation with a region whose block receives the key and the value as arguments:

// sum (x in d) x.val

%d = sdql.empty_dictionary : dictionary<i32, f16>

%res = sdql.sum %d : dictionary<i32, f16> -> f16 {
^bb0(%k: i32, %v: f16):
  sdql.yield %v : f16
}
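As a reading aid for the snippet above, the following Python sketch models what such a region-based sum is expected to compute: the block runs once per key/value pair and the yielded results are added together. It assumes that summing over an empty dictionary gives the additive zero of the result type, and the name sdql_sum is hypothetical; this is not the PR's interpreter code.

```python
def sdql_sum(d: dict, body, zero):
    """Model of sdql.sum: `body(k, v)` plays the role of the region's block,
    and its return value plays the role of the sdql.yield operand."""
    acc = zero
    for k, v in d.items():
        acc = acc + body(k, v)
    return acc


# sum (x in d) x.val over an empty dictionary, as in the MLIR snippet
empty_result = sdql_sum({}, lambda k, v: v, 0.0)

# the same body over a non-empty dictionary
filled_result = sdql_sum({1: 1.5, 2: 2.5}, lambda k, v: v, 0.0)
```

Passing the zero explicitly stands in for the result type annotation (f16 in the snippet), which is what tells the operation which additive identity to start from.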

Some operations include custom verifiers that perform additional type checking.

TPC-H coverage

To check that this implementation of the SDQL dialect is expressive enough, the code generator in sdql_query_public was extended (see amirsh/sdql_query#16) with an option to emit MLIR that uses the dialect. It emits MLIR for 19 of the 22 TPC-H queries; the remaining 3 fail because of type-promotion issues that are pending further investigation. All 19 generated files typecheck with scair-opt (e.g., scair-opt ./tests/filecheck/dialects/sdql/tpch-gen/q1.mlir). The generated queries are in the tests/filecheck/dialects/sdql/tpch-gen directory and can be regenerated with a script available in amirsh/sdql_query#16. The script run_scair_opt.sh, added in this PR, automates typechecking those files.
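The batch typecheck described above can be sketched in Python as a loop that runs the tool on each generated file and collects the ones that fail. This is not the content of run_scair_opt.sh: the function name typecheck_all is hypothetical, and the sketch assumes that scair-opt is on the PATH, takes the file path as its only argument (as in the example command above), and signals a failed check with a non-zero exit code.

```python
import subprocess
from pathlib import Path


def typecheck_all(directory, tool=("scair-opt",)):
    """Run `tool <file>` on every .mlir file in `directory`.

    Returns the list of paths for which the tool exited with a non-zero
    status (assumed here to mean that parsing or verification failed).
    """
    failures = []
    for path in sorted(Path(directory).glob("*.mlir")):
        result = subprocess.run(
            [*tool, str(path)], capture_output=True, text=True
        )
        if result.returncode != 0:
            failures.append(path)
    return failures
```

A call such as typecheck_all("tests/filecheck/dialects/sdql/tpch-gen") would then return an empty list when every generated query typechecks.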

Interpreter

This PR also adds a very basic interpreter for the new dialect (see interpreter/src/main/scala-3/Dialects/Sdql.scala, with examples in tests/filecheck/interpreter/sdql). It supports sdql.empty_dictionary, sdql.sum, sdql.yield, sdql.create_dictionary, sdql.lookup_dictionary, sdql.create_record, and sdql.access_record, and is heavily constrained on types: only i32 and dictionary<i32, i32> are supported. To run an example, use ./mill tools.runTool.run ./tests/filecheck/interpreter/sdql/dictionary_sum.sdql.

tmarzec marked this pull request as ready for review on December 27, 2025, 15:10