Add support for the SDQL dialect #391
Open
tmarzec wants to merge 14 commits into
TL;DR
This PR:
- `DictionaryType` and `RecordType` with custom parsing rules.
- `dialects/sdql/`.
Related: sdql_query_public can emit MLIR for this dialect in amirsh/sdql_query#16.
Details
This PR introduces an implementation of the SDQL dialect following Shaikhha et al. (2022), "Functional collection programming with semi-ring dictionaries" (doi: 10.1145/3527333), aligned with the reference implementation in the sdql_query_public repository.
Types
It introduces two types, `DictionaryType` and `RecordType` (builtin/Builtin.scala), with custom parsing rules (parse/AttrParser.scala).
Operations and examples
The dialect introduces operations for most SDQL concepts (see sdql/Sdql.scala), including dictionary construction and lookup (`sdql.empty_dictionary`, `sdql.create_dictionary`, `sdql.lookup_dictionary`, `sdql.dictionary_add`), record construction and access (`sdql.create_record`, `sdql.access_record`, `sdql.concat`), and collection aggregation via `sdql.sum` with `sdql.yield`. Usage examples are available under filecheck/dialects/sdql.
Two representations deserve comment. The `let in` construct from SDQL doesn't have its own operation, since it follows the standard imperative flow of execution of MLIR programs. The `sum(x in e) e` construct is modeled using a region/block.
Some operations include custom verification mechanisms/type checking.
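As a rough illustration of what the region-based `sum` computes, here is a small Python model of the semantics described in the cited paper. It is not the PR's MLIR or Scala code: the names `sdql_add` and `sdql_sum` are hypothetical, the body of the region is stood in for by a Python function whose return value plays the role of `sdql.yield`, and the caller is assumed to supply the right additive identity (`0` for scalars, `{}` for dictionaries).

```python
def sdql_add(a, b):
    """Semi-ring addition: scalars add numerically, dictionaries add pointwise."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, val in b.items():
            # Keys present on both sides have their values added recursively.
            out[key] = sdql_add(out[key], val) if key in out else val
        return out
    return a + b


def sdql_sum(d, body, zero):
    """Model of sum(<k, v> in d) body: run the body once per entry and
    accumulate what it yields, starting from the additive identity."""
    acc = zero
    for key, val in d.items():
        acc = sdql_add(acc, body(key, val))
    return acc


prices = {1: 10, 2: 20, 3: 30}

# Scalar aggregation: add up all values.
total = sdql_sum(prices, lambda k, v: v, 0)
print(total)  # 60

# Dictionary-valued aggregation: each iteration yields a singleton dictionary.
doubled = sdql_sum(prices, lambda k, v: {k: 2 * v}, {})
print(doubled)  # {1: 20, 2: 40, 3: 60}
```

The same accumulation loop handles both the scalar and the dictionary-valued case, because only the addition changes. That is the property that lets a single region-carrying operation cover the different uses of `sum`.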
TPC-H coverage
To ensure this implementation of the SDQL dialect is expressive enough, the codegen in sdql_query_public was extended (see amirsh/sdql_query#16) with an option to emit MLIR code that uses the dialect, for validation. The codegen successfully emits MLIR code for 19 out of 22 TPC-H queries; the remaining 3 fail due to issues with type promotion, pending further investigation. All 19 emitted queries typecheck successfully using `scair-opt` (e.g., `scair-opt ./tests/filecheck/dialects/sdql/tpch-gen/q1.mlir`). All generated queries are in the `tests/filecheck/dialects/sdql/tpch-gen` directory, and can be regenerated using a script available in amirsh/sdql_query#16. The script `run_scair_opt.sh` (added in this PR) automates typechecking those files.
Interpreter
Another contribution of this PR is a very basic interpreter for the new dialect (see interpreter/src/main/scala-3/Dialects/Sdql.scala, with examples in tests/filecheck/interpreter/sdql). It supports `sdql.empty_dictionary`, `sdql.sum`, `sdql.yield`, `sdql.create_dictionary`, `sdql.lookup_dictionary`, `sdql.create_record`, and `sdql.access_record`, and is heavily constrained on types (it only supports `i32` and `dictionary<i32, i32>`). To run, use `./mill tools.runTool.run ./tests/filecheck/interpreter/sdql/dictionary_sum.sdql`.
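To give a feel for what an interpreter restricted to `i32` and `dictionary<i32, i32>` has to do, here is a toy Python sketch of op-by-op evaluation over an SSA-style environment. It is not the Scala interpreter from this PR. The tuple encoding of operations, the `run` function, the alternating key/value operand layout for `sdql.create_dictionary`, and returning `0` for a missing key are all assumptions made for illustration.

```python
def run(ops):
    """Evaluate a straight-line list of (result, op_name, operands) tuples."""
    env = {}  # result name -> i32 value or dict[int, int]
    for result, name, operands in ops:
        # String operands name earlier results; integers are i32 literals.
        args = [env[o] if isinstance(o, str) else o for o in operands]
        if name == "sdql.empty_dictionary":
            env[result] = {}
        elif name == "sdql.create_dictionary":
            # Assumed layout: operands alternate key, value, key, value, ...
            env[result] = dict(zip(args[0::2], args[1::2]))
        elif name == "sdql.lookup_dictionary":
            dictionary, key = args
            # Assumed: a missing key reads as the additive identity 0.
            env[result] = dictionary.get(key, 0)
        else:
            raise NotImplementedError(name)
    return env


program = [
    ("%d", "sdql.create_dictionary", [1, 10, 2, 20]),
    ("%v", "sdql.lookup_dictionary", ["%d", 2]),
    ("%m", "sdql.lookup_dictionary", ["%d", 7]),
]
env = run(program)
print(env["%v"], env["%m"])  # 20 0
```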