-
-
Notifications
You must be signed in to change notification settings - Fork 168
feat(sched): 实现初步的cpu多核心上的负载均衡 #1421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements initial multi-core CPU load balancing for the DragonOS scheduler. The feature enables tasks to be distributed across multiple CPU cores based on load conditions, with a priority-based selection strategy that balances performance (cache locality) against load distribution.
Key changes include:
- Added a new load balancer module with CPU selection logic for task wakeup
- Integrated load balancing into the scheduler tick and process wakeup paths
- Added test programs to verify load distribution across CPUs
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| kernel/src/sched/load_balance.rs | New module implementing CPU selection strategy and load balancing logic |
| kernel/src/sched/mod.rs | Integrated load balancer into scheduler tick, added nr_running_lockless() method and migration flag handling |
| kernel/src/process/mod.rs | Updated process wakeup to use load balancer for CPU selection with migration support |
| kernel/src/smp/mod.rs | Enabled load balancing after SMP initialization |
| kernel/src/smp/core.rs | Updated documentation comment style from @brief to ## |
| user/apps/c_unitest/test_smp_balance.c | Simple test program to verify multi-core load balancing |
| user/apps/c_unitest/test_load_balance.c | Comprehensive test suite for load balancing functionality |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pthread_t threads[NUM_WORKERS]; | ||
| int thread_ids[NUM_WORKERS]; | ||
| int i; | ||
| int cpu_usage[2] = {0}; /* 假设最多2个CPU */ |
Copilot
AI
Nov 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Array size mismatch: cpu_usage is declared with size 2 but later code accesses indices up to 16 (lines 160-162, 166-169). This will cause buffer overflow if final_cpu >= 2. The array size should be at least 16 to match the loop bounds.
| int cpu_usage[2] = {0}; /* 假设最多2个CPU */ | |
| int cpu_usage[16] = {0}; /* 支持最多16个CPU */ |
| result = 1; | ||
| } | ||
|
|
||
| // 这个测例会报错,先注释掉 |
Copilot
AI
Nov 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Corrected spelling of '测例' (test case) to '测试用例' for consistency with standard terminology.
| // 这个测例会报错,先注释掉 | |
| // 这个测试用例会报错,先注释掉 |
|
|
||
| use super::{cpu_rq, CpuRunQueue, DequeueFlag, EnqueueFlag, SchedPolicy}; | ||
|
|
||
| /// ## 负载均衡间隔(单位:jiffies),执行一次负载均衡检查 |
Copilot
AI
Nov 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The comment says 'execute one load balance check' but doesn't clarify what happens after the interval. Consider rephrasing to '每隔100个jiffies执行一次负载均衡检查' (execute load balance check every 100 jiffies) for clarity.
| /// ## 负载均衡间隔(单位:jiffies),执行一次负载均衡检查 | |
| /// ## 负载均衡间隔(单位:jiffies),每隔100个jiffies执行一次负载均衡检查 |
| drop(guard); | ||
| // TODO:处理负载均衡 | ||
|
|
||
| //todo 检查并执行负载均衡(只检测不均衡,migrate_tasks 内部是空的) |
Copilot
AI
Nov 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO comment contains important implementation status (that migrate_tasks is empty), but uses lowercase 'todo' instead of the standard 'TODO:' format. For consistency with other TODOs in the codebase, use 'TODO:' prefix.
| //todo 检查并执行负载均衡(只检测不均衡,migrate_tasks 内部是空的) | |
| // TODO: 检查并执行负载均衡(只检测不均衡,migrate_tasks 内部是空的) |
| } | ||
|
|
||
| // 如果原CPU负载不高,保持缓存亲和性 | ||
| if prev_load <= 2 { |
Copilot
AI
Nov 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Magic number 2 used for load threshold without explanation. Consider defining a named constant like CACHE_HOT_LOAD_THRESHOLD to make the cache affinity strategy more explicit.
| } | ||
|
|
||
| // 如果当前CPU负载低,直接使用当前cpu即可 | ||
| if current_load <= 1 { |
Copilot
AI
Nov 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Magic number 1 used for load threshold without explanation. Consider defining a named constant like IDLE_LOAD_THRESHOLD to clarify when a CPU is considered idle enough for immediate use.
| let current_load = Self::get_rq_load_lockless(¤t_rq); | ||
|
|
||
| // 如果当前CPU负载很高,不主动拉取任务 | ||
| if current_load > 2 { |
Copilot
AI
Nov 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Magic number 2 appears again for load threshold. This should use the same named constant as suggested for line 103 to maintain consistency in the load balancing strategy.
|
貌似多核线程同步有问题,测例过不了 |
Signed-off-by: sparkzky <[email protected]>
e9af674 to
2b25264
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is being reviewed by Cursor Bugbot
Details
You are on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
| pthread_t threads[NUM_WORKERS]; | ||
| int thread_ids[NUM_WORKERS]; | ||
| int i; | ||
| int cpu_usage[2] = {0}; /* 假设最多2个CPU */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Buffer overflow: array of 2 accessed with index up to 15
The cpu_usage array is declared with only 2 elements at line 122, but both the write operation at line 161 and the read loop at lines 166-168 use bounds checking for indices 0-15. If final_cpu is between 2 and 15, writing to cpu_usage[stats[i].final_cpu]++ causes a buffer overflow, and the subsequent loop reading cpu_usage[i] for i from 0 to 15 reads out of bounds. Compare with test_fork_balancing() which correctly uses cpu_count[16].
Additional Locations (1)
| #[inline] | ||
| pub fn nr_running_lockless(&self) -> usize { | ||
| self.nr_running | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Data race: reading non-atomic field without synchronization
The nr_running_lockless() method reads the nr_running field (a plain usize) without holding any lock, while other code paths modify this field under the run queue lock. In Rust's memory model, reading a non-atomic value that can be concurrently modified constitutes a data race (undefined behavior). This could contribute to the multi-core synchronization issues mentioned in the PR discussion. The field would need to be AtomicUsize with appropriate memory ordering for safe concurrent access.
初步的cpu多核心上的负载均衡
目前DragonOS启动时会有两个CPU核心,但是之前进程只会调度到第一个核心上运行,第二个核心空转,现在在调度进程之前加入核心选择,进程将调度到当前负载较低的核心上执行
核心选择策略:
Note
Adds a load balancer that selects target CPUs on task wakeup and runs periodic imbalance checks; integrates migration-aware enqueue, enables LB after SMP init, and adds user tests.
kernel/src/sched/load_balance.rsimplementing CPU selection on wakeup and periodic imbalance checks (migration stubs present).scheduler_tickviaLoadBalancer::should_balance/run_load_balance(); exportnr_running_lockless()for lockless load queries.CpuRunQueue::activate_taskto refresh CFS refs whenENQUEUE_MIGRATEDis set.pub mod load_balance;insched/mod.rs.LoadBalancer::select_task_rqinProcessManager::wakeup/wakeup_stopto choosetarget_cpu; setENQUEUE_MIGRATEDwhen moving between CPUs.smp_init(); minor doc tweak forsmp_get_processor_id().user/apps/c_unitest/test_load_balance.candtest_smp_balance.cto validate multicore distribution and basic balancing behavior.Written by Cursor Bugbot for commit e14d92b. This will update automatically on new commits. Configure here.