Improved N-1 join query performance for DW SQL #2631

Open: wants to merge 7 commits into base: main


lingxiao-microsoft (Contributor) commented Mar 21, 2025

Why make this change?

  • About a year ago, DW SQL did not support FOR JSON PATH for converting query results to JSON, so we had to use STRING_AGG as a workaround.

  • We recently noticed that DW now supports FOR JSON PATH on the outer query, so we can use JSON_OBJECT together with FOR JSON PATH to produce the JSON for N-to-1 relationships, which improves performance.

  • For N-N relationships we are still investigating options, because JSON_ARRAYAGG does not provide much performance improvement.

  • For other scenarios, where a simple SELECT needs no joins, we will use FOR JSON PATH instead of STRING_AGG for better performance.

What is this change?

This PR covers

  1. Introduces a feature flag to safeguard the changes. The flag defaults to False when not provided, to avoid regressions, and will be removed once the changes are validated in production with scoped audiences.
  2. In the DW query builder, uses JSON_OBJECT to generate the columns of sub-queries and applies FOR JSON PATH to the outer query, which fully replaces STRING_AGG.
  3. For non-join queries (where relationships don't need to be handled), also uses FOR JSON PATH instead of STRING_AGG for better performance. This affects aggregations, non-join queries, and pagination.
  4. Adds helper functions to the unit test module that make it easy to compare GraphQL results with DB engine results for deeply nested queries.
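A minimal sketch of the query shape item 2 describes, assuming a hypothetical books to publisher N-1 relationship (the table, column, type, and method names are invented for illustration and are not DAB's actual DwSqlQueryBuilder API). The related row is rendered with JSON_OBJECT inside a correlated sub-query, and only the outermost SELECT carries FOR JSON PATH, INCLUDE_NULL_VALUES. Wrapping the sub-query in JSON_QUERY is an assumption made here so the nested object is embedded as JSON rather than escaped as a string.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: types, table names, and parameters are illustrative and
// are not DAB's DwSqlQueryBuilder API.
public sealed record JsonColumn(string Key, string Expr);

public static class DwNto1QuerySketch
{
    private const string FOR_JSON_SUFFIX = " FOR JSON PATH, INCLUDE_NULL_VALUES";

    // Correlated sub-query for the "1" side of an N-1 relationship. It yields at
    // most one row, rendered as a single object by JSON_OBJECT. JSON_QUERY is an
    // assumption so the outer FOR JSON PATH embeds it as JSON, not as a string.
    public static string NestedObject(IEnumerable<JsonColumn> cols, string from, string where)
    {
        string pairs = string.Join(", ", cols.Select(c => $"'{c.Key}': {c.Expr}"));
        return $"JSON_QUERY((SELECT TOP 1 JSON_OBJECT({pairs}) FROM {from} WHERE {where}))";
    }

    // Outer query: plain columns plus the nested object; only this level uses FOR JSON PATH.
    public static string OuterQuery(IEnumerable<JsonColumn> cols, string nestedName, string nestedSql, string from)
    {
        string select = string.Join(", ", cols.Select(c => $"{c.Expr} AS [{c.Key}]"));
        return $"SELECT {select}, {nestedSql} AS [{nestedName}] FROM {from}{FOR_JSON_SUFFIX}";
    }
}
```

Keeping FOR JSON out of the sub-query is the point of the design: the per-row JSON is produced by JSON_OBJECT, and the single FOR JSON PATH on the outer query replaces the STRING_AGG concatenation.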

How was this tested?

  • Unit Tests

    • Because this change does not introduce new scenarios, we mainly added test cases to increase coverage of M-M and M-1 join queries.
  • Integration Tests

Manual Testing - Join Scenarios

  • 1-1 relationship query: optimization applied, as expected.
  • N-1 relationship query: optimization applied, as expected.
  • 1-N relationship query: optimization not applied, as expected.
  • N-N relationship query: optimization not applied, as expected.
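The four manual results above follow one rule: the JSON_OBJECT path applies only when the nested field resolves to a single related row (1-1 and N-1), while 1-N and N-N relationships return lists and keep the existing behavior. A hypothetical sketch of that decision; the enum and method names are invented for illustration, and the flagEnabled parameter stands in for the PR's feature flag.

```csharp
// Hypothetical illustration; these types are not part of DAB.
public enum Cardinality { One, Many }

public static class Nto1Rule
{
    // The relationship is written source-to-target, e.g. N-1 is (Many, One).
    // Only the target side matters: a One target means the nested GraphQL
    // field is a single object that JSON_OBJECT can render.
    public static bool UsesJsonObject(Cardinality source, Cardinality target, bool flagEnabled)
        => flagEnabled && target == Cardinality.One;
}
```

With the flag off, every case falls back to the previous STRING_AGG path, which is what makes the default-False flag a safe rollout switch.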

Other Scenarios

When a query has no join, we now use FOR JSON PATH instead of STRING_AGG for better performance.

  • Aggregation
  • Non-join query
  • Pagination: N to 1 (total items: 3) and N to N

lingxiao-microsoft changed the title from "Use json_object to improve generated DW query performance" to "Improve N-1 join query performance for DW" on Mar 21, 2025.
lingxiao-microsoft changed the title from "Improve N-1 join query performance for DW" to "Improved N-1 join query performance for DW SQL" on Mar 21, 2025.
/// </summary>
/// <returns></returns>
[TestMethod]
public async override Task DeeplyNestedManyToOneJoinQuery()
Contributor

Should FOR JSON PATH be applied to the outer query in this case?

lingxiao-microsoft (Contributor Author), Mar 26, 2025

Yes. The idea here is to go one step further and verify that the results generated with FOR JSON PATH match the results previously generated with STRING_AGG, to ensure backward compatibility.
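One way to check backward compatibility like this is to compare the two JSON payloads semantically rather than as raw strings, because FOR JSON PATH and the STRING_AGG workaround may differ in property order or number formatting. The sketch below uses System.Text.Json; the helper name and its rules (property order ignored, array order significant, numbers compared by value) are assumptions, not the PR's actual test helpers.

```csharp
using System.Linq;
using System.Text.Json;

// Hypothetical test helper, not the PR's implementation.
public static class JsonCompare
{
    public static bool AreEquivalent(string a, string b)
    {
        using JsonDocument da = JsonDocument.Parse(a);
        using JsonDocument db = JsonDocument.Parse(b);
        return Equivalent(da.RootElement, db.RootElement);
    }

    private static bool Equivalent(JsonElement x, JsonElement y)
    {
        if (x.ValueKind != y.ValueKind) return false;

        switch (x.ValueKind)
        {
            case JsonValueKind.Object:
                // Property order is ignored; names and values must match exactly.
                // Note: ToDictionary throws if a payload repeats a property name.
                var xp = x.EnumerateObject().ToDictionary(p => p.Name, p => p.Value);
                var yp = y.EnumerateObject().ToDictionary(p => p.Name, p => p.Value);
                return xp.Count == yp.Count
                    && xp.All(kv => yp.TryGetValue(kv.Key, out var v) && Equivalent(kv.Value, v));
            case JsonValueKind.Array:
                // Array order is significant (row order matters for ORDER BY queries).
                return x.GetArrayLength() == y.GetArrayLength()
                    && x.EnumerateArray().Zip(y.EnumerateArray()).All(p => Equivalent(p.First, p.Second));
            case JsonValueKind.Number:
                // Compare by value so 1 and 1.0 match; fall back to raw text if out of range.
                return x.TryGetDecimal(out decimal dx) && y.TryGetDecimal(out decimal dy)
                    ? dx == dy
                    : x.GetRawText() == y.GetRawText();
            case JsonValueKind.String:
                return x.GetString() == y.GetString();
            default:
                // True, False, and Null carry no further data once kinds match.
                return true;
        }
    }
}
```

Because a property with a null value and a missing property compare as different, a check like this would also catch a dropped INCLUDE_NULL_VALUES option.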

@@ -53,6 +53,10 @@ public static void Init()
VerifierSettings.IgnoreMember<RuntimeConfig>(options => options.EnableAggregation);
// Ignore the EnableAggregation as that's unimportant from a test standpoint.
VerifierSettings.IgnoreMember<GraphQLRuntimeOptions>(options => options.EnableAggregation);
// Ignore the EnableDwNto1JoinOpt as that's unimportant from a test standpoint.
VerifierSettings.IgnoreMember<RuntimeConfig>(options => options.EnableDwNto1JoinOpt);
Contributor

Adding this to the runtime config makes it public-facing, something users would need to be aware of. Looping in @JerryNixon for his input here.

@@ -11,7 +11,8 @@ public record GraphQLRuntimeOptions(bool Enabled = true,
bool AllowIntrospection = true,
int? DepthLimit = null,
MultipleMutationOptions? MultipleMutationOptions = null,
-bool EnableAggregation = true)
+bool EnableAggregation = true,
+bool EnableDwNto1JoinOpt = false)
Contributor

DAB users would not see the benefits of this performance improvement, and having to go to the config file to enable it is very unintuitive. @aaronburtle, is there another way to have a feature flag besides adding it to the runtime config?

lingxiao-microsoft (Contributor Author), Mar 26, 2025

We can flip this flag to always true once we've verified it with customers, so that everyone else benefits without having to set it in the config file.

@aaronburtle, please let me know if there is a better way to achieve this without making it a public configuration option for everyone.

Contributor

DAB users will benefit if they use dwsql with Synapse or Fabric warehouses/SAE. It doesn't need to be a public configuration, though. We should enable this optimization by default after building confidence in the GraphQL workload.

JerryNixon (Contributor), Mar 27, 2025

{
    "feature-flags": {
        "enable-sqldw-join-rewrite": true
    },
    "runtime": {
        ...
    }
}

Great. Let's not document these or add them to dab validate or dab config. Feature flags are temporary and might be removed later, so documentation would only confuse matters. Updating the JSON schema for feature-flags still makes sense so that schema validation passes, meaning the schema should allow anything inside feature-flags.
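If a free-form feature-flags section like the one proposed above were adopted, reading a flag could look like this minimal System.Text.Json sketch. The section and flag names are taken from the proposal; the FeatureFlagReader helper itself is hypothetical. A missing section, a missing flag, or a non-boolean value all fall back to the default (false), which matches the goal of keeping the optimization off unless explicitly enabled, and unknown keys are simply ignored.

```csharp
using System.Text.Json;

// Hypothetical helper; "feature-flags" and the flag name come from the review proposal.
public static class FeatureFlagReader
{
    public static bool IsEnabled(string configJson, string flagName, bool defaultValue = false)
    {
        using JsonDocument doc = JsonDocument.Parse(configJson);
        JsonElement root = doc.RootElement;

        // Every lookup is optional: any missing or mistyped piece falls through to the default.
        if (root.ValueKind == JsonValueKind.Object
            && root.TryGetProperty("feature-flags", out JsonElement flags)
            && flags.ValueKind == JsonValueKind.Object
            && flags.TryGetProperty(flagName, out JsonElement value)
            && (value.ValueKind == JsonValueKind.True || value.ValueKind == JsonValueKind.False))
        {
            return value.GetBoolean();
        }

        return defaultValue;
    }
}
```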

Contributor

Latest update: it looks like we don't need this option in the config/schema at all, so @lingxiao-microsoft is checking whether he can avoid adding the feature-flags section altogether.

@@ -15,8 +15,17 @@ namespace Azure.DataApiBuilder.Core.Resolvers
public class DwSqlQueryBuilder : BaseSqlQueryBuilder, IQueryBuilder
{
private static DbCommandBuilder _builder = new SqlCommandBuilder();
private readonly bool _enableNto1JoinOpt;
private const string FOR_JSON_SUFFIX = " FOR JSON PATH, INCLUDE_NULL_VALUES";
Contributor

nit: other constants (see WITHOUT_ARRAY_WRAPPER) are not prefixed with a space. We should probably stick to the same format so that no stray spaces are added.

Contributor Author

The leading space is actually needed, just as it is for FROM; otherwise we get an error like "Incorrect syntax near 'ASCFOR'".
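A short illustration of why the leading space matters: an ORDER BY clause can end in ASC, so appending the suffix without a separator fuses the tokens into ASCFOR. The constant is copied from the diff; the class and method names are hypothetical. JoinClauses shows one way to satisfy the reviewer's nit instead, keeping constants space-free and inserting the separator where fragments are joined.

```csharp
using System.Linq;

// Hypothetical illustration; only the constant's value comes from the PR.
public static class SuffixSpacing
{
    // As declared in the PR, including the leading space.
    public const string FOR_JSON_SUFFIX = " FOR JSON PATH, INCLUDE_NULL_VALUES";

    // Plain concatenation relies on the leading space to separate "ASC" from "FOR".
    public static string Append(string query) => query + FOR_JSON_SUFFIX;

    // Alternative: fragments carry no padding; the separator is added at the join site.
    public static string JoinClauses(params string[] clauses)
        => string.Join(" ", clauses.Select(c => c.Trim()).Where(c => c.Length > 0));
}
```

Either approach produces the same SQL; the second keeps all constants in one format at the cost of routing every concatenation through a join helper.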

/// </summary>
/// <param name="structure"></param>
/// <returns></returns>
private string BuildWithJsonFunc(SqlQueryStructure structure)
Contributor

My understanding is that this should exactly mimic the logic in the mssql file. Is there any chance of reusing code, so we don't have to maintain two versions of essentially the same T-SQL code?

lingxiao-microsoft (Contributor Author), Mar 26, 2025

Valid point; I was thinking the same.

There are three differences compared with mssql:

  1. dwsql does not support mutations, so no MultipleCreateOption is available.
  2. dwsql never uses AddEscapeToLikeClauses.
  3. mssql can append FOR JSON directly to its sub-queries and reuse them in recursive calls; dwsql has to handle this differently.

I've added a new base class, BaseTSqlQueryBuilder, shared by dwsql and mssql to make the code more concise.
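A minimal sketch of how the three differences could map onto a shared base class using the template-method pattern. BaseTSqlQueryBuilder and BuildJsonPath are named in the PR, but the classes and members below (the capability properties, BuildOuter, and BuildNestedObject) are hypothetical stand-ins, not the actual DAB code. The mssql variant appends FOR JSON directly to its sub-query, while the dwsql variant renders the sub-query with JSON_OBJECT and leaves FOR JSON to the outer query.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch; only the BaseTSqlQueryBuilder/BuildJsonPath names come from the PR.
public abstract class BaseTSqlQueryBuilderSketch
{
    protected const string FOR_JSON_SUFFIX = " FOR JSON PATH, INCLUDE_NULL_VALUES";

    // Differences 1 and 2 expressed as capability switches (mssql overrides both).
    public virtual bool SupportsMultipleCreate => false;
    public virtual bool EscapesLikeClauses => false;

    // Shared by both engines: the outermost SELECT is serialized with FOR JSON PATH.
    public string BuildOuter(string select) => BuildJsonPath(select);

    protected virtual string BuildJsonPath(string select) => select + FOR_JSON_SUFFIX;

    // Difference 3: each engine decides how a nested single-row sub-query becomes JSON.
    public abstract string BuildNestedObject(IReadOnlyList<(string Key, string Expr)> cols, string from);
}

public sealed class MsSqlQueryBuilderSketch : BaseTSqlQueryBuilderSketch
{
    public override bool SupportsMultipleCreate => true;
    public override bool EscapesLikeClauses => true;

    // mssql appends FOR JSON directly to the sub-query; WITHOUT_ARRAY_WRAPPER
    // (assumed here) turns the one-row result into an object instead of an array.
    public override string BuildNestedObject(IReadOnlyList<(string Key, string Expr)> cols, string from)
    {
        string select = string.Join(", ", cols.Select(c => $"{c.Expr} AS [{c.Key}]"));
        return "(" + BuildJsonPath($"SELECT {select} FROM {from}") + ", WITHOUT_ARRAY_WRAPPER)";
    }
}

public sealed class DwSqlQueryBuilderSketch : BaseTSqlQueryBuilderSketch
{
    // dwsql renders the row with JSON_OBJECT; FOR JSON stays on the outer query only.
    public override string BuildNestedObject(IReadOnlyList<(string Key, string Expr)> cols, string from)
    {
        string pairs = string.Join(", ", cols.Select(c => $"'{c.Key}': {c.Expr}"));
        return $"(SELECT TOP 1 JSON_OBJECT({pairs}) FROM {from})";
    }
}
```

Only the engine-specific step is abstract; everything both engines share (the suffix constant and outer FOR JSON PATH) lives once in the base class, which is the duplication the reviewer asked to avoid.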

/// </summary>
/// <param name="structure">Sql query structure to build query on</param>
/// <returns></returns>
protected virtual string BuildJsonPath(SqlQueryStructure structure)
lingxiao-microsoft (Contributor Author), Mar 27, 2025

Centralizing these functions so they can be used or overridden by both the dwsql and mssql query builders.

/// Base query builder class for T-SQL engine
/// Can be used by dwsql and mssql
/// </summary>
public abstract class BaseTSqlQueryBuilder : BaseSqlQueryBuilder
Contributor Author

I've added a new base class, BaseTSqlQueryBuilder, shared by dwsql and mssql to make the code more concise.

@@ -113,6 +113,17 @@ internal GraphQLRuntimeOptionsConverter(bool replaceEnvVar)
throw new JsonException($"Unexpected type of value entered for enable-aggregation: {reader.TokenType}");
}

break;
case "enable-dw-nto1joinopt":
Contributor

Based on our discussion, this may not be needed.


5 participants