-
Notifications
You must be signed in to change notification settings - Fork 5.5k
fix: Add checkQueryIntegrity permissions check from Presto spark #26709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Summary: Sapphire is currently not calling checkQueryIntegrity. checkQueryIntegrity is needed for DPAS authorization, and it is also needed for DPAS shadow call Differential Revision: D87871753
Reviewer's guide (collapsed on small PRs)Reviewer's GuideAdds query integrity validation via accessControl.checkQueryIntegrity for Sapphire Presto Spark executions, ensuring DPAS-related authorization is enforced for both DDL/control statements and regular queries before planning or execution proceeds. Sequence diagram for Sapphire PrestoSpark query execution with query integrity validationsequenceDiagram
actor User
participant Client as PrestoClient
participant Factory as PrestoSparkQueryExecutionFactory
participant Access as AccessControl
participant Planner as QueryPlanner
participant DDLTask as DDLDefinitionTask
User->>Client: submit SQL
Client->>Factory: create(session, sql, ...)
activate Factory
Factory->>Factory: prepareQuery(sql)
Factory->>Factory: getQueryType(preparedQuery)
alt queryType is DATA_DEFINITION or CONTROL
Factory->>Access: checkQueryIntegrity(identity, accessControlContext, sql, emptyMap, emptyMap)
Access-->>Factory: integrityOk
Factory->>DDLTask: execute(preparedStatement, ...)
DDLTask-->>Factory: ddlResult
Factory-->>Client: PrestoSparkDataDefinitionExecution
else explainTypeValidate
Factory-->>Client: accessControlChecker.createExecution(...)
else regular query
Factory->>Access: checkQueryIntegrity(identity, accessControlContext, sql, emptyMap, emptyMap)
Access-->>Factory: integrityOk
Factory->>Planner: createQueryPlan(session, preparedQuery, ...)
Planner-->>Factory: planAndMore
Factory-->>Client: PrestoSparkQueryExecution
end
deactivate Factory
Class diagram for PrestoSparkQueryExecutionFactory and AccessControl usage with checkQueryIntegrityclassDiagram
class PrestoSparkQueryExecutionFactory {
+IPrestoSparkQueryExecution create(Session session, PreparedQuery preparedQuery, String sql, QueryStateTimer queryStateTimer, WarningCollector warningCollector, Object sparkContext)
}
class AccessControl {
+checkQueryIntegrity(Identity identity, AccessControlContext accessControlContext, String sql, Map parameters, Map extraProperties)
}
class Session {
+Identity getIdentity()
+AccessControlContext getAccessControlContext()
}
class Identity
class AccessControlContext
class DDLDefinitionTask {
+execute(...)
}
class QueryPlanner {
+createQueryPlan(Session session, PreparedQuery preparedQuery, WarningCollector warningCollector, VariableAllocator variableAllocator, PlanNodeIdAllocator planNodeIdAllocator, Object sparkContext, String sql)
}
PrestoSparkQueryExecutionFactory --> AccessControl : uses
PrestoSparkQueryExecutionFactory --> Session : uses
Session --> Identity : returns
Session --> AccessControlContext : returns
PrestoSparkQueryExecutionFactory --> DDLDefinitionTask : uses for DDL_CONTROL
PrestoSparkQueryExecutionFactory --> QueryPlanner : uses for regular queries
Flow diagram for query type handling with query integrity checks in PrestoSparkQueryExecutionFactoryflowchart TD
A["create(session, preparedQuery, sql, ...)"] --> B["Determine queryType from preparedQuery"]
B --> C{"queryType is DATA_DEFINITION or CONTROL?"}
C -- Yes --> D["End analysis timer"]
D --> E["accessControl.checkQueryIntegrity(identity, accessControlContext, sql, emptyMap, emptyMap)"]
E --> F["Lookup DDLDefinitionTask for statement class"]
F --> G["Return PrestoSparkDataDefinitionExecution"]
C -- No --> H{"preparedQuery.isExplainTypeValidate()?"}
H -- Yes --> I["Return accessControlChecker.createExecution(...) (EXPLAIN VALIDATE)"]
H -- No --> J["End analysis timer (existing)"]
J --> K["accessControl.checkQueryIntegrity(identity, accessControlContext, sql, emptyMap, emptyMap)"]
K --> L["Build VariableAllocator and PlanNodeIdAllocator"]
L --> M["queryPlanner.createQueryPlan(...)"]
M --> N["Return normal PrestoSpark query execution"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- The
accessControl.checkQueryIntegrity(...)call is duplicated in two branches; consider extracting a small helper (e.g.,checkQueryIntegrity(session, sql)) to avoid copy‑paste and keep future changes consistent. - In the DDL/CONTROL branch you call
queryStateTimer.endAnalysis()beforecheckQueryIntegrity, whereas in other flows integrity checks may conceptually be part of analysis; consider aligning the ordering with the non‑Spark query path so timing/metrics semantics remain consistent.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `accessControl.checkQueryIntegrity(...)` call is duplicated in two branches; consider extracting a small helper (e.g., `checkQueryIntegrity(session, sql)`) to avoid copy‑paste and keep future changes consistent.
- In the DDL/CONTROL branch you call `queryStateTimer.endAnalysis()` before `checkQueryIntegrity`, whereas in other flows integrity checks may conceptually be part of analysis; consider aligning the ordering with the non‑Spark query path so timing/metrics semantics remain consistent.
## Individual Comments
### Comment 1
<location> `presto-spark-base/src/main/java/com/facebook/presto/spark/PrestoSparkQueryExecutionFactory.java:673` </location>
<code_context>
Optional<QueryType> queryType = StatementUtils.getQueryType(preparedQuery.getStatement().getClass());
if (queryType.isPresent() && (queryType.get() == QueryType.DATA_DEFINITION || queryType.get() == QueryType.CONTROL)) {
queryStateTimer.endAnalysis();
+ accessControl.checkQueryIntegrity(session.getIdentity(), session.getAccessControlContext(), sql, ImmutableMap.of(), ImmutableMap.of());
DDLDefinitionTask<?> task = (DDLDefinitionTask<?>) ddlTasks.get(preparedQuery.getStatement().getClass());
return new PrestoSparkDataDefinitionExecution(task, preparedQuery.getStatement(), transactionManager, accessControl, metadata, session, queryStateTimer, warningCollector, sql);
</code_context>
<issue_to_address>
**🚨 question (security):** Consider whether query integrity checks should also apply to EXPLAIN VALIDATE and other non-DDL/non-DML statement types for consistency.
Currently `checkQueryIntegrity` is invoked for DDL/CONTROL and the general query path, but not in the `isExplainTypeValidate` branch. If this check is intended to protect against malformed or tampered SQL, please confirm whether EXPLAIN VALIDATE (and any other specialized branches) should also invoke it so they cannot bypass these protections.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| Optional<QueryType> queryType = StatementUtils.getQueryType(preparedQuery.getStatement().getClass()); | ||
| if (queryType.isPresent() && (queryType.get() == QueryType.DATA_DEFINITION || queryType.get() == QueryType.CONTROL)) { | ||
| queryStateTimer.endAnalysis(); | ||
| accessControl.checkQueryIntegrity(session.getIdentity(), session.getAccessControlContext(), sql, ImmutableMap.of(), ImmutableMap.of()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚨 question (security): Consider whether query integrity checks should also apply to EXPLAIN VALIDATE and other non-DDL/non-DML statement types for consistency.
Currently checkQueryIntegrity is invoked for DDL/CONTROL and the general query path, but not in the isExplainTypeValidate branch. If this check is intended to protect against malformed or tampered SQL, please confirm whether EXPLAIN VALIDATE (and any other specialized branches) should also invoke it so they cannot bypass these protections.
singcha
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Description
Add checkQueryIntegrity permission check validation for Presto spark queries to give it the same behavior as Presto queries.
Motivation and Context
The query string is not validated currently for Presto spark queries, so this must be fixed in order to match the permissions checking behavior of Presto queries.
Impact
No impact on open source code unless the default implementation of checkQueryIntegrity is overridden in the access control.
Test Plan
Existing behavior is maintained for permissions checking. The dedicated checkQueryIntegrity tests that check for at least one call to the check still pass.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
Differential Revision: D87871753