-
Notifications
You must be signed in to change notification settings - Fork 231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved N-1 join query performance for DW SQL #2631
base: main
Are you sure you want to change the base?
Conversation
/// </summary> | ||
/// <returns></returns> | ||
[TestMethod] | ||
public async override Task DeeplyNestedManyToOneJoinQuery() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be having JSON PATH applied in this case for the outer query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the idea here is to do one step further to make sure the results we generated using JSON PATH is same as the results generated previously using STRING AGG to ensure backward compatibility.
@@ -53,6 +53,10 @@ public static void Init() | |||
VerifierSettings.IgnoreMember<RuntimeConfig>(options => options.EnableAggregation); | |||
// Ignore the EnableAggregation as that's unimportant from a test standpoint. | |||
VerifierSettings.IgnoreMember<GraphQLRuntimeOptions>(options => options.EnableAggregation); | |||
// Ignore the EnableDwNto1JoinOpt as that's unimportant from a test standpoint. | |||
VerifierSettings.IgnoreMember<RuntimeConfig>(options => options.EnableDwNto1JoinOpt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding to runtime config makes this publicly facing and something users should be aware of @JerryNixon on his inputs here.
@@ -11,7 +11,8 @@ public record GraphQLRuntimeOptions(bool Enabled = true, | |||
bool AllowIntrospection = true, | |||
int? DepthLimit = null, | |||
MultipleMutationOptions? MultipleMutationOptions = null, | |||
bool EnableAggregation = true) | |||
bool EnableAggregation = true, | |||
bool EnableDwNto1JoinOpt = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dab users would not see the benefits of this perf improvement and it is a very unintuitive thing to go to config file and enable. @aaronburtle any other way of having a feature flag apart from adding it in the runtime config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can flip this flag to always true once we've verified with customers. So that everyone else can benefit from it without flip it in config file
@aaronburtle , pls let me know if there is a better way to achieve the purpose without making it public configuration to everyone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DAB users will benefit if they use dwsql with Synapse or fabric warehouses/SAE. It doesnt need to be a public configuration though. We should enable this optimization by a default after building confidence in GraphQL workload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{
"feature-flags": {
"enable-sqldw-join-rewrite": true
}
"runtime": {
...
}
}
Great. Let's not document these or add them to dab validate
or dab config
. I think feature flags are temporary and might be removed later. So, docs would confuse the matter. I think updating JSON schema for feature-flags
still makes sense so schema validation can still pass. Meaning, allowing /anything/ inside feature-flags
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest update: looks like we dont necessarily need this option altogether in the config/schema. So, @lingxiao-microsoft is looking at if he can get away without adding the feature-flags
section altogether
@@ -15,8 +15,17 @@ namespace Azure.DataApiBuilder.Core.Resolvers | |||
public class DwSqlQueryBuilder : BaseSqlQueryBuilder, IQueryBuilder | |||
{ | |||
private static DbCommandBuilder _builder = new SqlCommandBuilder(); | |||
private readonly bool _enableNto1JoinOpt; | |||
private const string FOR_JSON_SUFFIX = " FOR JSON PATH, INCLUDE_NULL_VALUES"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for other constants we dont prefix it with space (see the WITHOUT_ARRAY_WRAPPER) should probably stick to ensure we follow same format and there are not ghost spaces added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually needed similar to a FROM
, else we will get error like '"Incorrect syntax near 'ASCFOR'."'
/// </summary> | ||
/// <param name="structure"></param> | ||
/// <returns></returns> | ||
private string BuildWithJsonFunc(SqlQueryStructure structure) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is this should exactly mimic the logic for mssql file, any chance of code reuse to prevent having to maintain two version os essentially the same TSQL code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid point, I was thinking the same.
There are 3 differences comparing with mssql
- dwsql does not support mutation, so there is no
MultipleCreateOption
available - dwsql never adopts the AddEscapeToLikeClauses
- mssql can directly append the FOR JSON to their sub-query and use it for recursive calls, for dwsql, it has to be treated differently.
I've built a new base class BaseTSqlQueryBuilder
for dwsql and mssql to better concise the code
/// </summary> | ||
/// <param name="structure">Sql query structure to build query on</param> | ||
/// <returns></returns> | ||
protected virtual string BuildJsonPath(SqlQueryStructure structure) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Centralizing the functions so it can be used or override by both dwsql and mssql query builder.
/// Base query builder class for T-SQL engine | ||
/// Can be used by dwsql and mssql | ||
/// </summary> | ||
public abstract class BaseTSqlQueryBuilder : BaseSqlQueryBuilder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've built a new base class BaseTSqlQueryBuilder
for dwsql and mssql to better concise the code
@@ -113,6 +113,17 @@ internal GraphQLRuntimeOptionsConverter(bool replaceEnvVar) | |||
throw new JsonException($"Unexpected type of value entered for enable-aggregation: {reader.TokenType}"); | |||
} | |||
|
|||
break; | |||
case "enable-dw-nto1joinopt": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on our discussion, this may not be needed.
Why make this change?
Back 1 year ago, as DW SQL does not support
JSON PATH
for converting the execution to json format. Hence we had to useSTRING_AGG
as the workaround.Recently, we've noticed
JSON PATH
is now supported for outer query in DW, and we can useJSON_OBJECT
+JSON_PATH
to address the json conversion for N-to-1 relations, which can optimize the performance.For N-N relations, we're still looking into any resolutions as
JSON_ARRAYAGG
does not provide much performance improvements.For other scenarios when joins are not needed for a simple SELECT, we will
JSON PATH
instead ofSTRING_AGG
for better performance.What is this change?
This PR covers
JSON_OBJECT
to generate the columns for sub-queries and appliedJSON PATH
to handle outer query, which fully replace the need ofSTRING_AGG
.JSON PATH
to replace the need ofSTRING_AGG
for better performance as well. This will have impact on aggregations, non-join queries and pagination.How was this tested?
Unit Tests
Integration Tests
Manual Testing - Join Scenarios
Query 1-1 relation - As expected, optimization applied
Query N-1 relation - As expected, optimization applied
Query 1-N Relation - As expected, optimization not applied
Query N-N Relation - As expected, optimization not applied
Other Scenarios
We've applied the
JSON PATH
when there is no join in the query to replace theSTRING_AGG
for better performance.Aggregation
Non-Join Query
Pagination
N to 1, total items: 3


N to N


