-
Notifications
You must be signed in to change notification settings - Fork 13.7k
[FLINK-38104][table] add table api support for model ml_predict #27108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| * | ||
| * @return the context resolved model metadata. | ||
| */ | ||
| ContextResolvedModel getModel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twalthr , this forces ContextResolvedModel to be PublicEvolving. I guess one way to avoid it is to remove this method from Model and just add it in ModelImpl
074a935 to
982f128
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution. I left some comments.
| * runtime configuration options such as max-concurrent-operations, timeout, and execution mode | ||
| * settings. | ||
| * | ||
| * <p>Common runtime options include: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not easy for the community to maintain the javadoc about these options. How about we just link the MLPredictRuntimeConfigOptions.
| // lit() is not serializable to sql. | ||
| if (options.isEmpty()) { | ||
| return tableEnvironment.fromCall( | ||
| "ML_PREDICT", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use org.apache.flink.table.functions.BuiltInFunctionDefinitions#ML_PREDICT.getName() instead?
|
|
||
| @Override | ||
| public Table predict(Table table, ColumnList inputColumns, Map<String, String> options) { | ||
| // Use Expressions.map() instead of Expression.lit() to create a MAP literal since |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: Expressions.lit() ?
...-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/internal/ModelImpl.java
Show resolved
Hide resolved
...k-table-api-java/src/main/java/org/apache/flink/table/api/internal/TableEnvironmentImpl.java
Show resolved
Hide resolved
|
|
||
| private final String name; | ||
| private final ContextResolvedModel model; | ||
| private final TableEnvironment env; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need TableEnv here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used in https://github.com/apache/flink/pull/27108/files#diff-7c9c46d35bb24b9f87d4c993274ef8c0054894c86c72c8afcc0c4f6398562fdeR1570. Same pattern for TableReferenceExpression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I see FieldReferenceExpression doesn't have table environment. I don't see any need to include the table env. How about we relaxing the limit and add it back when we really need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mainly for validation I think. See
https://github.com/apache/flink/blob/master/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/api/internal/TableEnvironmentImpl.java#L1527-L1531
I think for FieldReferenceExpression, it just contains field name and type, no need to validate environment I guess. How about I add more comments here similar to table?
...able-api-java/src/main/java/org/apache/flink/table/expressions/ModelReferenceExpression.java
Show resolved
Hide resolved
|
|
||
| public abstract R visit(TableReferenceExpression tableReference); | ||
|
|
||
| public abstract R visit(ModelReferenceExpression modelReferenceExpression); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add else if branch in line 30?
| new int[0]); | ||
| inputStack.add(relBuilder.build()); | ||
| return tableArgCall; | ||
| } else if (resolvedArg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a good idea to use instance of here. It's better we reuse ExpressionConverter to convert these expression. How about let ExpressionConverterextends ResolvedExpressionVisitor<RexNode>?
...c/test/java/org/apache/flink/table/planner/plan/nodes/exec/stream/MLPredictTestPrograms.java
Show resolved
Hide resolved
824f5f1 to
208c58d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments
| throw new ValidationException("Anonymous models cannot be serialized."); | ||
| } | ||
|
|
||
| return "MODEL " + model.getIdentifier().asSerializableString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the name is lost after searilization. I am not sure whether we need to restore the object from the string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name are not used for TableReferenceExpression and literal argument name as well. These are not serialized as named argument call. I don't see this is restored from serialized string, looks mainly to convert it sql query looks
|
|
||
| private final String name; | ||
| private final ContextResolvedModel model; | ||
| private final TableEnvironment env; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I see FieldReferenceExpression doesn't have table environment. I don't see any need to include the table env. How about we relaxing the limit and add it back when we really need it.
| this.typeFactory = (FlinkTypeFactory) relBuilder.getRexBuilder().getTypeFactory(); | ||
| this.dataTypeFactory = | ||
| unwrapContext(relBuilder.getCluster()).getCatalogManager().getDataTypeFactory(); | ||
| this.inputStack = new java.util.ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this.inputStack = new ArrayList<>();
| } | ||
| final RexTableArgCall tableArgCall = | ||
| new RexTableArgCall(rowType, inputStack.size(), partitionKeys, new int[0]); | ||
| inputStack.add(relBuilder.build()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we maintain the stack and do the relBuilder.build() in the QueryOperationConverter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to do since inputStack depends on this call. Maybe that's why TableReferenceExpression was handled in QueryOperationConverter in the first place
| "from", | ||
| "registerFunction", | ||
| "fromCall", | ||
| "fromModelPath", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a tickect about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of the change
Add table api support for Model and ml_predict function in https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API
Brief change log
Modelinterface andModelImplimplementation for model and ml_predictfromModelPathandfromto constructModelfromTableEnvironmentModelReferenceExpressionand handle it inQueryOperationConverterContextResolvedModelVerifying this change
Unit and Integration test
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes)Documentation