
Conversation


@kevintang2022 kevintang2022 commented Nov 26, 2025

Description

Add the checkQueryIntegrity permission check for Presto Spark queries so that they get the same validation behavior as regular Presto queries.

Motivation and Context

The query string is currently not validated for Presto Spark queries, so this must be fixed to match the permission-checking behavior of regular Presto queries.

Impact

No impact on open source code unless the default implementation of checkQueryIntegrity is overridden in the access control.
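
For context, checkQueryIntegrity is a pluggable access-control hook, and the change is invisible to open-source deployments precisely because the stock implementation does not reject anything by default. The following is a minimal, hypothetical sketch of that pattern, using simplified stand-in types rather than the real Presto SPI:

```java
import java.util.Map;

// Hypothetical, simplified stand-in for the access-control hook discussed in this PR;
// only the shape of the call is mirrored, not the real Presto interfaces.
interface QueryIntegrityChecker
{
    // Default implementation is a no-op, so adding the call changes nothing
    // unless a deployment plugs in its own checker.
    default void checkQueryIntegrity(String user, String sql, Map<String, String> extraProperties)
    {
    }
}

// Example of a deployment-specific override that rejects queries missing an integrity token.
class TokenValidatingChecker
        implements QueryIntegrityChecker
{
    @Override
    public void checkQueryIntegrity(String user, String sql, Map<String, String> extraProperties)
    {
        // "query-token" is an invented property name, used purely for illustration.
        if (!extraProperties.containsKey("query-token")) {
            throw new SecurityException("Query integrity token missing for user " + user);
        }
    }
}
```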

Test Plan

Existing permission-checking behavior is maintained. The dedicated checkQueryIntegrity tests, which verify that the check is invoked at least once, still pass.
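
The "invoked at least once" style of test mentioned above can be pictured as wiring a counting access-control stub into the execution factory. A hypothetical, self-contained sketch (all names invented for illustration; not the real test harness):

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stub illustrating the "called at least once" assertion style described in the
// test plan; it shows only the counting pattern such tests rely on.
class CountingAccessControlStub
{
    private final AtomicInteger integrityChecks = new AtomicInteger();

    // Mirrors the shape of the call added in this PR; a real test would override the
    // corresponding method on the access-control implementation handed to the factory.
    public void checkQueryIntegrity(String user, String sql, Map<String, String> extraProperties)
    {
        integrityChecks.incrementAndGet();
    }

    public int integrityCheckCount()
    {
        return integrityChecks.get();
    }
}
```

The assertion in such a test then reduces to checking that integrityCheckCount() is at least 1 after the query has been submitted.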

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

Differential Revision: D87871753

Summary:
Sapphire is currently not calling checkQueryIntegrity.

checkQueryIntegrity is needed for DPAS authorization, and it is also needed for the DPAS shadow call.

@kevintang2022 kevintang2022 requested review from a team and shrinidhijoshi as code owners November 26, 2025 22:34
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Nov 26, 2025

sourcery-ai bot commented Nov 26, 2025

Reviewer's Guide

Adds query integrity validation via accessControl.checkQueryIntegrity for Sapphire Presto Spark executions, ensuring DPAS-related authorization is enforced for both DDL/control statements and regular queries before planning or execution proceeds.

Sequence diagram for Sapphire PrestoSpark query execution with query integrity validation

sequenceDiagram
    actor User
    participant Client as PrestoClient
    participant Factory as PrestoSparkQueryExecutionFactory
    participant Access as AccessControl
    participant Planner as QueryPlanner
    participant DDLTask as DDLDefinitionTask

    User->>Client: submit SQL
    Client->>Factory: create(session, sql, ...)
    activate Factory
    Factory->>Factory: prepareQuery(sql)
    Factory->>Factory: getQueryType(preparedQuery)

    alt queryType is DATA_DEFINITION or CONTROL
        Factory->>Access: checkQueryIntegrity(identity, accessControlContext, sql, emptyMap, emptyMap)
        Access-->>Factory: integrityOk
        Factory->>DDLTask: execute(preparedStatement, ...)
        DDLTask-->>Factory: ddlResult
        Factory-->>Client: PrestoSparkDataDefinitionExecution
    else explainTypeValidate
        Factory-->>Client: accessControlChecker.createExecution(...)
    else regular query
        Factory->>Access: checkQueryIntegrity(identity, accessControlContext, sql, emptyMap, emptyMap)
        Access-->>Factory: integrityOk
        Factory->>Planner: createQueryPlan(session, preparedQuery, ...)
        Planner-->>Factory: planAndMore
        Factory-->>Client: PrestoSparkQueryExecution
    end
    deactivate Factory

Class diagram for PrestoSparkQueryExecutionFactory and AccessControl usage with checkQueryIntegrity

classDiagram
    class PrestoSparkQueryExecutionFactory {
        +IPrestoSparkQueryExecution create(Session session, PreparedQuery preparedQuery, String sql, QueryStateTimer queryStateTimer, WarningCollector warningCollector, Object sparkContext)
    }

    class AccessControl {
        +checkQueryIntegrity(Identity identity, AccessControlContext accessControlContext, String sql, Map parameters, Map extraProperties)
    }

    class Session {
        +Identity getIdentity()
        +AccessControlContext getAccessControlContext()
    }

    class Identity
    class AccessControlContext

    class DDLDefinitionTask {
        +execute(...)
    }

    class QueryPlanner {
        +createQueryPlan(Session session, PreparedQuery preparedQuery, WarningCollector warningCollector, VariableAllocator variableAllocator, PlanNodeIdAllocator planNodeIdAllocator, Object sparkContext, String sql)
    }

    PrestoSparkQueryExecutionFactory --> AccessControl : uses
    PrestoSparkQueryExecutionFactory --> Session : uses
    Session --> Identity : returns
    Session --> AccessControlContext : returns
    PrestoSparkQueryExecutionFactory --> DDLDefinitionTask : uses for DDL_CONTROL
    PrestoSparkQueryExecutionFactory --> QueryPlanner : uses for regular queries

Flow diagram for query type handling with query integrity checks in PrestoSparkQueryExecutionFactory

flowchart TD
    A["create(session, preparedQuery, sql, ...)"] --> B["Determine queryType from preparedQuery"]

    B --> C{"queryType is DATA_DEFINITION or CONTROL?"}

    C -- Yes --> D["End analysis timer"]
    D --> E["accessControl.checkQueryIntegrity(identity, accessControlContext, sql, emptyMap, emptyMap)"]
    E --> F["Lookup DDLDefinitionTask for statement class"]
    F --> G["Return PrestoSparkDataDefinitionExecution"]

    C -- No --> H{"preparedQuery.isExplainTypeValidate()?"}

    H -- Yes --> I["Return accessControlChecker.createExecution(...) (EXPLAIN VALIDATE)"]

    H -- No --> J["End analysis timer (existing)"]
    J --> K["accessControl.checkQueryIntegrity(identity, accessControlContext, sql, emptyMap, emptyMap)"]
    K --> L["Build VariableAllocator and PlanNodeIdAllocator"]
    L --> M["queryPlanner.createQueryPlan(...)"]
    M --> N["Return normal PrestoSpark query execution"]
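
In plain Java, the branching above can be condensed into the following self-contained sketch. All types and names here are simplified stand-ins rather than the real Presto Spark classes; only the placement of the two checkQueryIntegrity calls mirrors the PR.

```java
import java.util.Map;
import java.util.Optional;

// Simplified model of the flowchart above; hypothetical stand-in types, not the real
// PrestoSparkQueryExecutionFactory. Only the placement of the integrity checks mirrors the PR.
final class QueryExecutionFlowSketch
{
    enum QueryType { DATA_DEFINITION, CONTROL, SELECT }

    interface AccessControl
    {
        void checkQueryIntegrity(String identity, String sql, Map<String, String> extraProperties);
    }

    private final AccessControl accessControl;

    QueryExecutionFlowSketch(AccessControl accessControl)
    {
        this.accessControl = accessControl;
    }

    String create(String identity, String sql, Optional<QueryType> queryType, boolean explainTypeValidate)
    {
        if (queryType.isPresent() && (queryType.get() == QueryType.DATA_DEFINITION || queryType.get() == QueryType.CONTROL)) {
            // DDL/CONTROL branch: the integrity check runs before the data-definition execution is built.
            accessControl.checkQueryIntegrity(identity, sql, Map.of());
            return "data definition execution";
        }
        if (explainTypeValidate) {
            // EXPLAIN VALIDATE branch: no integrity check is added in this PR
            // (a review comment below asks whether it should be).
            return "explain-validate execution";
        }
        // Regular query branch: the integrity check runs before query planning.
        accessControl.checkQueryIntegrity(identity, sql, Map.of());
        return "regular query execution";
    }
}
```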

File-Level Changes

Change: Invoke query integrity validation for DDL/control queries before creating PrestoSparkDataDefinitionExecution.
  • After ending analysis for DDL/control queries, call accessControl.checkQueryIntegrity with the session identity, access control context, SQL text, and empty property maps.
  • Ensure that DDL/control execution now performs the same query token validation used elsewhere in the system.
  Files: presto-spark-base/src/main/java/com/facebook/presto/spark/PrestoSparkQueryExecutionFactory.java

Change: Invoke query integrity validation for non-DDL queries in the main planning path before creating the query plan.
  • In the non-EXPLAIN, non-DDL path, add a call to accessControl.checkQueryIntegrity prior to planning via queryPlanner.createQueryPlan.
  • Use empty maps for the additional property parameters, aligning with existing checkQueryIntegrity call signatures and keeping the behavior focused on token validation.
  Files: presto-spark-base/src/main/java/com/facebook/presto/spark/PrestoSparkQueryExecutionFactory.java



@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • The accessControl.checkQueryIntegrity(...) call is duplicated in two branches; consider extracting a small helper (e.g., checkQueryIntegrity(session, sql)) to avoid copy‑paste and keep future changes consistent (a possible shape is sketched after this list).
  • In the DDL/CONTROL branch you call queryStateTimer.endAnalysis() before checkQueryIntegrity, whereas in other flows integrity checks may conceptually be part of analysis; consider aligning the ordering with the non‑Spark query path so timing/metrics semantics remain consistent.
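
One possible shape for the helper suggested in the first point, assuming it lives inside the existing PrestoSparkQueryExecutionFactory and reuses that class's accessControl field and imports (a sketch, not code from this PR):

```java
// Hypothetical private helper inside PrestoSparkQueryExecutionFactory, per the suggestion above;
// it only centralizes the call that currently appears in two branches.
private void checkQueryIntegrity(Session session, String sql)
{
    accessControl.checkQueryIntegrity(
            session.getIdentity(),
            session.getAccessControlContext(),
            sql,
            ImmutableMap.of(),
            ImmutableMap.of());
}
```

Both branches would then call checkQueryIntegrity(session, sql) instead of repeating the five-argument call.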

Sourcery also left an inline comment on the following hunk in PrestoSparkQueryExecutionFactory.java:

    Optional<QueryType> queryType = StatementUtils.getQueryType(preparedQuery.getStatement().getClass());
    if (queryType.isPresent() && (queryType.get() == QueryType.DATA_DEFINITION || queryType.get() == QueryType.CONTROL)) {
        queryStateTimer.endAnalysis();
        accessControl.checkQueryIntegrity(session.getIdentity(), session.getAccessControlContext(), sql, ImmutableMap.of(), ImmutableMap.of());

🚨 question (security): Consider whether query integrity checks should also apply to EXPLAIN VALIDATE and other non-DDL/non-DML statement types for consistency.

Currently checkQueryIntegrity is invoked for DDL/CONTROL and the general query path, but not in the isExplainTypeValidate branch. If this check is intended to protect against malformed or tampered SQL, please confirm whether EXPLAIN VALIDATE (and any other specialized branches) should also invoke it so they cannot bypass these protections.
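
If the maintainers decide that EXPLAIN VALIDATE should be covered as well, the change would presumably be the same call mirrored into that branch. A hypothetical sketch, not part of this PR:

```java
// Hypothetical: inside the isExplainTypeValidate branch of PrestoSparkQueryExecutionFactory,
// mirroring the two call sites added by this PR before delegating to accessControlChecker.
if (preparedQuery.isExplainTypeValidate()) {
    accessControl.checkQueryIntegrity(
            session.getIdentity(),
            session.getAccessControlContext(),
            sql,
            ImmutableMap.of(),
            ImmutableMap.of());
    // ... then return accessControlChecker.createExecution(...) exactly as today
}
```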

@kevintang2022 kevintang2022 changed the title Enable validate query token on Sapphire Enable checkQueryIntegrity permissions check from Presto spark Nov 26, 2025
@kevintang2022 kevintang2022 changed the title Enable checkQueryIntegrity permissions check from Presto spark fix: enable checkQueryIntegrity permissions check from Presto spark Dec 1, 2025
@kevintang2022 kevintang2022 changed the title fix: enable checkQueryIntegrity permissions check from Presto spark fix: Add checkQueryIntegrity permissions check from Presto spark Dec 1, 2025

@singcha singcha left a comment


LGTM!
