DAOS-18705 rebuild: set rebuild flag before creating rebuild_pool_tls #17941

Merged
gnailzenh merged 17 commits into master from liang/b_agg_epc
Apr 28, 2026

@gnailzenh gnailzenh commented Apr 8, 2026

  • Set the rebuild flag before creating rebuild_pool_tls; otherwise aggregation
    can progress to a higher epoch than rebuild.
  • Aggregation no longer does a full scan after rebuild.
  • Fix an rpt refcount leak in rebuild_tgt_scan_handler().
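
The ordering issue in the first bullet can be illustrated with a small, deterministic sketch. It is not the DAOS implementation: `pool_sketch`, `agg_pick_upper()` and `start_rebuild()` are hypothetical names, the "fence" field merely stands in for whatever creating rebuild_pool_tls publishes, and the race is modelled as a single aggregation tick that lands between the two rebuild start-up steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified per-pool state; not the real DAOS structures. */
struct pool_sketch {
	bool     rebuilding;    /* set while a rebuild is in flight */
	uint64_t rebuild_fence; /* epoch aggregation should stay below; 0 = none */
	uint64_t agg_hi;        /* upper epoch bound last chosen by aggregation */
};

/* Aggregation refreshes its upper bound only when no rebuild is flagged. */
static uint64_t
agg_pick_upper(struct pool_sketch *p, uint64_t now)
{
	if (!p->rebuilding)
		p->agg_hi = now;
	return p->agg_hi;
}

/*
 * Start a rebuild with one aggregation tick interleaved between the two
 * steps. Returns the aggregation upper bound after start-up completes.
 */
static uint64_t
start_rebuild(struct pool_sketch *p, uint64_t fence, uint64_t agg_now,
	      bool flag_first)
{
	assert(!p->rebuilding); /* sketch assumes no rebuild is already running */

	if (flag_first) {
		p->rebuilding = true;        /* step 1: raise the flag */
		agg_pick_upper(p, agg_now);  /* tick sees the flag, stays put */
		p->rebuild_fence = fence;    /* step 2: publish the fence */
	} else {
		p->rebuild_fence = fence;    /* fence published first */
		agg_pick_upper(p, agg_now);  /* tick sees no flag, refreshes */
		p->rebuilding = true;        /* flag raised too late */
	}
	return p->agg_hi;
}
```

With the flag raised first, the interleaved tick leaves the aggregation bound where it was; with the fence published first, the same tick moves the bound past the fence, which is the situation the bullet describes.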

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

- stop refreshing aggregation epoch while rebuilding
- set rebuilding flag before setting rebuild fence

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
@gnailzenh gnailzenh requested review from a team as code owners April 8, 2026 04:56

github-actions Bot commented Apr 8, 2026

Ticket title is 'DAOS 2.6.5: Interrupt rebuild with reintegration and interrupt with exclude with active IO'
Status is 'In Progress'
Labels: 'test_2.6.5,testp1'
https://daosio.atlassian.net/browse/DAOS-18705

Signed-off-by: Wang Shilong <shilong.wang@hpe.com>
liuxuezhao previously approved these changes Apr 8, 2026
Comment thread src/rebuild/rebuild_internal.h Outdated
/* local rebuild epoch mainly to constrain the VOS aggregation
* to make sure aggregation will not cross the epoch
/*
* XX: remove this.

Do you mean this will be replaced by another method?

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
@gnailzenh gnailzenh changed the title DAOS-18705 rebuild: stop refreshing aggregation epoch while rebuilding DAOS-18705 rebuild: set rebuild flag before creating rebuild_pool_tls Apr 9, 2026
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>

;

epoch_range.epr_lo = epoch_min != 0 ? epoch_min + 1 : 0;
if (i == 0)

[minor] Perhaps add a comment explaining that even when epr_lo is set to zero here, filter optimizations are applied later, and that this is used to guarantee aggregation cannot cross a snapshot.
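
As a worked illustration of the `epr_lo` line quoted above, here is a minimal sketch. It assumes `epoch_min` is the highest epoch already accounted for, with 0 meaning "none yet"; that reading is an assumption, and `agg_epr_lo()` is a hypothetical helper, not a function in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lower bound of the next aggregation range, mirroring the quoted
 * expression: start just above epoch_min, or from 0 when epoch_min is 0.
 */
static uint64_t
agg_epr_lo(uint64_t epoch_min)
{
	assert(epoch_min != UINT64_MAX); /* epoch_min + 1 must not wrap */
	return epoch_min != 0 ? epoch_min + 1 : 0;
}
```

For example, `agg_epr_lo(41)` yields 42, while `agg_epr_lo(0)` yields 0, the case the review comment asks to have documented.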

wangshilong previously approved these changes Apr 13, 2026
@wangshilong wangshilong requested a review from liuxuezhao April 13, 2026 08:22
NiuYawei previously approved these changes Apr 13, 2026
liuxuezhao previously approved these changes Apr 13, 2026
@daosbuild3
Collaborator

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17941/12/execution/node/640/log

@daosbuild3
Collaborator

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17941/12/execution/node/630/log

@daosbuild3
Collaborator

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17941/12/execution/node/765/log

gnailzenh added a commit that referenced this pull request Apr 17, 2026
…#17941) (#17962)

- set rebuild flag before creating rebuild_pool_tls, otherwise aggregation
  can progress to higher epoch than rebuild.
- aggregation doesn't do full scan anymore after rebuild
- fix a rpt refcount leak in rebuild_tgt_scan_handler()

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Co-authored-by: Wang Shilong <shilong.wang@hpe.com>
Reviewed-by: Xuezhao Liu <xuezhao.liu@hpe.com>
Reviewed-by: Niu Yawei <yawei.niu@hpe.com>
gnailzenh and others added 2 commits April 17, 2026 11:04
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Signed-off-by: Wang Shilong <shilong.wang@hpe.com>
@gnailzenh gnailzenh dismissed stale reviews from liuxuezhao, NiuYawei, and wangshilong via 82a7a0f April 17, 2026 03:04
wangshilong previously approved these changes Apr 17, 2026
liuxuezhao previously approved these changes Apr 17, 2026
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
@gnailzenh gnailzenh dismissed stale reviews from liuxuezhao and wangshilong via 9e128f4 April 17, 2026 04:04

gnailzenh added a commit that referenced this pull request Apr 28, 2026
…#17941) (#17985)

set rebuild flag before creating rebuild_pool_tls, otherwise aggregation
can progress to higher epoch than rebuild.
aggregation doesn't do full scan anymore after rebuild
fix a rpt refcount leak in rebuild_tgt_scan_handler()

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Co-authored-by: Wang Shilong <shilong.wang@hpe.com>
Reviewed-by: Xuezhao Liu <xuezhao.liu@hpe.com>
@gnailzenh gnailzenh merged commit 379be16 into master Apr 28, 2026
35 of 36 checks passed
@gnailzenh gnailzenh deleted the liang/b_agg_epc branch April 28, 2026 06:26