-
Notifications
You must be signed in to change notification settings - Fork 5.4k
Type system
Wojciech Biela edited this page Apr 7, 2016
·
16 revisions
(jotting down ideas for formalizing the type system)
A type expression can be described by the following language:
TypeExpression
: Constant
| TypeConstructorInvocation
| Variable
| Tuple
Constant
: Integer
| TypeName
TypeConstructorInvocation
: TypeName '(' TypeExpression (',' TypeExpression)* ')'
TypeName
: IDENTIFIER
Variable
: IDENTIFIER # TODO: solve ambiguity with TypeName
Tuple
: '{' (TypeExpression (',' TypeExpression)*)? '}
Some examples are:
bigint
array(bigint)
map(varchar(10), bigint)
array(a)
->({array(a), array(a)}, array(a))
The last entry is the type of a function that takes two arrays of some arbitrary type as arguments and returns an array of that same type (e.g., a 2-arg concat function). This form will be necessary for representing the type of lambdas.
A "type constructor invocation" is the instantiation of what we currently call a "parametric type"
High level todo-list around typesystem. Subject to change:
- Unification of type parameters (type parameters, literal parameters, type calculation parameters, named type parameters for ROWs etc) - https://github.com/facebook/presto/issues/3938
- apply/synchronise 3 patches in some order (and rebase the other 2):
@cberner's "Improved framework for writing parametric functions" https://github.com/facebook/presto/pull/3926@pnowojski's work on unification of type parameters https://github.com/facebook/presto/pull/4005 and followup work still in progressDECIMAL related annotation changes for function definitions https://github.com/Teradata/presto/commit/55bdf2e5ad5196278ab03a377ae63c924dc8aeec
finalize parametrized VARCHAR and DECIMAL pull requests- Allow SqlType to only reference types, type variables and constants in
TypeSignatureParameter
. CurrentlyTypeLiteralCalculation
is also allowed. Introduce a TypeConstraint annotation to express the relationship between the arguments of the input types and the output types. - (re)design a well defined API to register and query about coercion, type hierarchy and other type metadata
- Currently type logic is scattered around the system
- static methods in FunctionRegistry
- static methods in TypeUtils
- Some goals:
- Have a TypeHierarchy object which is passed around the system and exposes type hierarchy information. Each type would register itself in TypeHierarchyBuilder with information about available coercions, type-only coercions etc.
- Some functionality now exposed as static methods should be exposed in more objective manner as methods of Type/TypeSignature interfaces. E.g. TypeUtils.resolveCalculatedType.
- Merge Type and ParametricType interfaces
- Currently type logic is scattered around the system
- Type inference (not bottom-up but "real", e.g. Hindley-Milner algorithm) https://www21.in.tum.de/~nipkow/pubs/aplas11.pdf ("Extending Hindley-Milner Type Inference with Coercive Structural Subtyping")
- lambdas (based on "real" type inference)
- extract signature parser as separate entity
- feed it with the type metadata API handle/object reference
- Make it ANTLR based
- rework ROW type syntax
- support computation of the return type of a vararg function with literal parameters
- extend supported functions set for DECIMAL
- add coercion decimal -> double, decimal -> bigint
- benchmark/improve performance for long DECIMALs (>64bits)
- make DECIMAL default type for literals (eg. "1.0")