-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat: Allow flexible thread pool monitoring for Spiller cpu pool #26695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideRefactors the spiller thread pool handling in PrestoServer to separately track a CPU-thread-pool executor pointer for stall monitoring while keeping a generic executor handle, and updates all server initialization and wiring to use the new pointer. Sequence diagram for PrestoServer spiller CPU pool initialization and wiringsequenceDiagram
participant PrestoServer
participant SystemConfig
participant SpillerExecutor as folly_CPUThreadPoolExecutor
participant TaskManager
PrestoServer->>SystemConfig: getSpillerNumCpuThreadsHwMultiplier()
SystemConfig-->>PrestoServer: multiplier
PrestoServer->>PrestoServer: compute numSpillerCpuThreads
alt numSpillerCpuThreads > 0
PrestoServer->>SpillerExecutor: create CPUThreadPoolExecutor(numSpillerCpuThreads, NamedThreadFactory)
PrestoServer->>PrestoServer: set spillerCpuExecutor_ = SpillerExecutor
PrestoServer->>PrestoServer: set spillerExecutor_ (std_unique_ptr_folly_Executor)
end
PrestoServer->>TaskManager: create TaskManager(driverCpuExecutor_, httpSrvCpuExecutor_, spillerCpuExecutor_)
TaskManager-->>PrestoServer: TaskManager instance
Class diagram for updated PrestoServer spiller executorsclassDiagram
class PrestoServer {
- folly_CPUThreadPoolExecutor* driverCpuExecutor_
- std_unique_ptr_folly_Executor spillerExecutor_
- folly_CPUThreadPoolExecutor* spillerCpuExecutor_
+ void initializeThreadPools()
+ void run()
+ void createTaskManager()
}
class TaskManager {
+ TaskManager(folly_CPUThreadPoolExecutor* driverCpuExecutor, folly_Executor* httpSrvCpuExecutor, folly_CPUThreadPoolExecutor* spillerCpuExecutor)
}
PrestoServer --> TaskManager : creates
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- spillerCpuExecutor_ is never initialized when numSpillerCpuThreads == 0, but is still passed into TaskManager in createTaskManager(), so it should be explicitly initialized to nullptr and guarded wherever it is used.
- Having both spillerExecutor_ and spillerCpuExecutor_ pointing at the same pool introduces a risk of them getting out of sync; consider simplifying the ownership model (e.g., always using spillerExecutor_.get() where a raw pointer is needed or centralizing the cast) to avoid duplication.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- spillerCpuExecutor_ is never initialized when numSpillerCpuThreads == 0, but is still passed into TaskManager in createTaskManager(), so it should be explicitly initialized to nullptr and guarded wherever it is used.
- Having both spillerExecutor_ and spillerCpuExecutor_ pointing at the same pool introduces a risk of them getting out of sync; consider simplifying the ownership model (e.g., always using spillerExecutor_.get() where a raw pointer is needed or centralizing the cast) to avoid duplication.
## Individual Comments
### Comment 1
<location> `presto-native-execution/presto_cpp/main/PrestoServer.cpp:548-555` </location>
<code_context>
PRESTO_STARTUP_LOG(INFO)
- << "Spiller CPU executor '" << spillerExecutor_->getName() << "', has "
- << spillerExecutor_->numThreads() << " threads.";
+ << "Spiller CPU executor '" << spillerCpuExecutor_->getName()
+ << "', has " << spillerCpuExecutor_->numThreads() << " threads.";
} else {
PRESTO_STARTUP_LOG(INFO) << "Spill executor was not configured.";
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Align the null-check with the pointer being dereferenced to avoid potential null dereferences.
The guard condition checks `spillerExecutor_ != nullptr`, but the block dereferences `spillerCpuExecutor_`. If these ever get out of sync (e.g., `spillerExecutor_` set while `spillerCpuExecutor_` is left null), this will crash. Please either guard on `spillerCpuExecutor_` or ensure both pointers are checked before dereferencing `spillerCpuExecutor_`.
```suggestion
}
if (spillerCpuExecutor_ != nullptr) {
PRESTO_STARTUP_LOG(INFO)
<< "Spiller CPU executor '" << spillerCpuExecutor_->getName()
<< "', has " << spillerCpuExecutor_->numThreads() << " threads.";
} else {
PRESTO_STARTUP_LOG(INFO) << "Spill executor was not configured.";
}
auto* asyncDataCache = velox::cache::AsyncDataCache::getInstance();
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
…db#26695) Summary: as title Differential Revision: D87850298
a4b9441 to
68e0e7b
Compare
Summary: as title Differential Revision: D87850298
68e0e7b to
a0b943f
Compare
| // folly::CPUThreadPoolExecutor. The executor is stored as abstract type to | ||
| // provide flexibility of thread pool monitoring. The underlying | ||
| // folly::CPUThreadPoolExecutor can be obtained through 'spillerCpuExecutor_'. | ||
| std::unique_ptr<folly::Executor> spillerExecutor_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @duxiao1212 for this code. I'm not sure I follow the motivation for this change. Please can you help explain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right @aditi-pandit - the original type worked. However, we need to change it to std::unique_ptr<folly::Executor> because the build system allows wrapping executors with monitoring capabilities.
Background
While investigating stuck driver issues, we identified the need for thread monitoring capability on the spiller thread executor. This helps detect and diagnose thread stalls during spill operations.
Implementation
By storing as the base type Executor, the executor can be wrapped with monitoring.
While keeping the typed spillerCpuExecutor_ pointer allows PeriodicTaskManager to access monitoring methods.
This follows the same pattern as driverExecutor_ and enables consistent thread stall detection across execution contexts.
Summary: as title
Differential Revision: D87850298
Release Notes