Skip to content

Latest commit

 

History

History
334 lines (239 loc) · 10.6 KB

README.adoc

File metadata and controls

334 lines (239 loc) · 10.6 KB

PartiQL Transpiler

Scribe is a compiler framework for the PartiQL SQL dialect. It is considered experimental and is under active development.

Upgrading PartiQL

Scribe will depend on a PartiQL major-version, but for now we shadow partiql-lang-kotlin to a single dependency.

# From partiql-lang-kotlin produce a JAR

./gradlew :partiql-lang:shadowJar

cp ./partiql-lang/build/libs/partiql-lang-kotlin-version.jar ./path/to/Scribe/libs

# Also copy the local plugin (we should consider bundling with SPI)

./gradlew :plugins:partiql-local:installDist

cp ./plugins/partiql-local/build/libs/partiql-local-version.jar ./path/to/Scribe/libs

Terms

  • SQL — Specifically the SQL-99 Data Query Language specification, colloquially select-from-where

  • Dialect — An implementation of the SQL language specification

  • Target — A representation of some computation

  • Catalog — Schemas, tables, types, functions, and operators available in a target

Usage

Scribe leverages PartiQL’s plugin system and planner to produce a resolved and typed logical query plan. This plan is passed to a target implementation to be transformed to the domain specific output.

Note
Much of the transpiler involves manipulating both the AST and Plan which are PartiQL intermediate representations. This PartiQL AST README has tips on working with these structures.
Creating the Transpiler
// PartiQL's plugin system is how you provide tables and schemas to the planner (à la Trino).
val plugin = MyPlugin()

// Instantiate the transpiler once. It can be re-used!
val transpiler = PartiQLTranspiler(listOf(plugin))

Suppose you have some table

CREATE TABLE orders (
    order_id   STRING PRIMARY KEY, -- PartiQL STRING type
    ordered_at DATETIME NOT NULL   -- PartiQL DATETIME type
);

How do you express this query for a different SQL engine like Trino?

// Get all orders in the last 30 days
val query = """
    SELECT order_id FROM orders
    WHERE ordered_at > date_diff(day, -30, UTCNOW())
"""

// TrinoTarget holds the translation rules from a PartiQL plan to Trino SQL
val target = TrinoTarget()

// Planner session, assuming your table `orders` exists in the "default" catalog
val session = PartiQLPlanner.Session(
    queryId = "readme_query_id",
    userId = "readme_user_id",
    currentCatalog = "default",
)

// Invoke the transpiler
val result = transpiler.transpile(query, target, session)

println(result.value)
// Output:
//   SELECT orders.order_id AS order_id FROM orders AS orders
//   WHERE orders.ordered_at > date_add('day', -30, at_timezone(current_timestamp, 'UTC'))

Shell

There’s a shell which can be useful for rapid experimentation. You can also attach a debugger to this shell.

Note
You may load the TPC-DS schemas in catalogs/ or provide your own catalog.
Debugging
# Install
./gradlew install

# Execute
./build/install/scribe/bin/scribe --help

Usage: scribe [-hV] [--catalog=catalog] -d=DIR

The PartiQL Transpiler Debug REPL
This REPL is used for debugging the transpiler
      --catalog=catalog   Catalog, use `default` .. by default
                              Default: default
  -d=DIR                    Database root directory
  -h, --help                Show this help message and exit.
  -V, --version             Print version information and exit.


./build/install/scribe/bin/scribe -d ./catalogs --catalog=tpc_ds

 ____ ____ ____ ____ ____ ____
||S |||C |||R |||I |||B |||E ||
||__|||__|||__|||__|||__|||__||
|/__\|/__\|/__\|/__\|/__\|/__\|


scribe ▶ .h

.h                  Print this message
.s                  Print command history
.q                  Disconnect
.d                  Describe catalog
.dt table           Describe table
.debug on|off       Toggle debug printing
.t target           Get/Set the transpiler target
.clear              Clear screen
Note
You can attach a debugger with JAVA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=localhost:5050 …​

Overview

Scribe is a framework to plug different compilation backends. Perhaps this project should be renamed to BYOB (bring your own backend). For now, we only provide SQL source-to-source compilation (hence "transpile"), but you could conceive of several non-SQL targets such as:

Producing SQL

For now, Scribe provides two simple SQL text targets. Each dialect is quite similar (hence dialect) so much of the base translation from PartiQL’s logical plan to an SQL AST is captured by org.partiql.transpiler.sql.SqlTransform.

This applies a transformation of relational algebra to an SQL AST just like Calcite’s RelToSqlConverter; however, this is currently more limited than Calcite’s.

Much of the differences between dialects comes down to scalar functions, but it’s often the case that each dialect has functions with similar functionality albeit different names. This is shown in the earlier UTCNOW() example.

Common Interfaces

The most useful interfaces to implement for an SQL target are

  • SqlTarget<T> — Base transpiler target interface

  • SqlTarget — Base SqlTarget<String> implementation for an SQL dialect target

  • SqlCalls — Ruleset for rewriting scalar calls

  • SqlTransform — Ruleset for RelToSql conversion

Development

Let’s work through an example of developing our own SQL target using SQLite as the target. How might we transpile?

SELECT CAST(a AS STRING) FROM T

With basic familiarity of SQLite, we know that STRING is not a valid type name, and we should replace it with TEXT. How do we express this in a transpilation target?

Tutorial

Extend SqlTarget
public object SQLiteTarget : SqlTarget() {

    override val target: String = "SQLite"

    // Using SQLite3
    override val version: String = "3"

    // Override the default call ruleset with the SQLiteCalls ruleset
    override fun getCalls(onProblem: Problemhandler): SqlCalls = SQLiteCalls()

    // No need to rewrite the plan, return as is
    override fun rewrite(plan: PartiQLPlan, onProblem: ProblemCallback) = plan
}
Note
I’m conflicted on how to pass the problem handler to SqlCalls, so that’s subject to change.
Provide Scalar Function Ruleset
@OptIn(PartiQLValueExperimental::class)
public class SQLiteCalls : SqlCalls() {

    /**
    * SqlCalls has many open functions which you can extend to override for edge cases.
    */
    override fun rewriteCast(type: PartiQLValueType, args: SqlArgs): Expr = Ast.create {
        if (type == PartiQLValueType.STRING) {
            // do something special for `CAST(.. AS STRING)`
            Ast.create { exprCast(args[0].expr, typeCustom("TEXT")) }
        } else {
            return super.rewriteCast(type, args)
        }
    }
}

This is reasonable, but what about replacing all occurrences of STRING with TEXT? It would be a cumbersome to track down all the places a type might be used (like this IS special form is another).

We can actually also extend how SQL is rendered to text via an extendable query printing framework. See Pull #1183. You can provide the pretty-printer a Dialect which contains base behavior for translating from an AST to a Block tree where the Block tree is a basic formatting structure.

Let’s implement SQLiteDialect and wire it to our SQLiteTarget.

Defining a Dialect
public object SQLiteDialect : SqlDialect() {

    override fun visitTypeString(node: Type.String, head: SqlBlock) =
        SqlBlock.Link(head, SqlBlock.Text("TEXT"))
}
Providing the Dialect

All this says is during the fold from an AST to Block tree, is to append the string "TEXT" to the tree. We can use this dialect for our target by overriding the dialect field.

public object SQLiteTarget : SqlTarget() {

    // ... same as before

    // hook up the pretty-printer rules
    override val dialect = SQLiteDialect
}

Testing

PartiQL Scribe has a simple testing framework whereby each target asserts its desired output against a shared set of input queries (defined in test/resources/inputs/).

If you wish to add a new test; please add in one of the .sql files of test/resources/inputs/ with a unique name.

Inputs

All tests within a directory are flattened; you may define multiple tests in one file.

-- Tests are named with the macro `--#[my-test-name]`

--#[readme-example-00]
SELECT header FROM readme;

-- be sure to terminate a statement with `;`

--#[readme-example-01]
SELECT x, y, z FROM T
WHERE x BETWEEN y AND z;

Outputs

Similar to inputs, you’ll see that expected test outputs are stored in test/resources/outputs. The default test suite will produce a junit test for each expected output. You may implement additional junit tests for negative testing.

Please see test / org.partiql.scribe.targets.PartiQLTargetSuite as an example.

Appendix

I. PartiQL Value Schema Language

Testing schemas are described using a modified version of the Avro JSON schema. The changes are (1) it’s Ion and (2) we use the PartiQL type names.

Basic Type Schema Examples
// type name atomic types
"int"

// type list for union types
[ "int", "null" ]

// Collection Type
{
  type: "bag",  // valid values "bag", "list", "sexp"
  items: type
}

// Struct Type
{
  type: "struct",
  fields: [
    {
      name: "foo",
      type: type
    },
    // ....
  ]
}
Important
For now, we omit constraints such as open/closed structs.

II. PartiQL FS Plugin

Note
Directories are nested schemas; files represent table schema where the table name is the file name (without .ion).

It will eventually be merged with mockdb. The only difference is that is memoizes parsed schemas and uses PartiQL Value Schema syntax.