Add reference implementation for parallel_phase feature #1570

Open
wants to merge 32 commits into base: master

Conversation

isaevil
Contributor

@isaevil isaevil commented Nov 26, 2024

Description

Add a comprehensive description of proposed changes

Fixes # - issue number(s) if exists

Type of change

Choose one or multiple, leave empty if none of the other choices apply

Add a respective label(s) to PR if you have permissions

  • bug fix - change that fixes an issue
  • new feature - change that adds functionality
  • tests - change in tests
  • infrastructure - change in infrastructure and CI
  • documentation - documentation update

Tests

  • added - required for new features and some bug fixes
  • not needed

Documentation

  • updated in # - add PR number
  • needs to be updated
  • not needed

Breaks backward compatibility

  • Yes
  • No
  • Unknown

Notify the following users

List users with @ to send notifications

Other information

@isaevil isaevil changed the title Add reference implementation of parallel_block feature Add reference implementation for parallel_block feature Nov 26, 2024
@isaevil
Contributor Author

isaevil commented Nov 27, 2024

@akukanov @aleksei-fedotov @vossmjp Could you please take a look at the PR in terms of implementation and ABI?

P.S. Tests are still WIP.

Contributor

@aleksei-fedotov aleksei-fedotov left a comment

Not a complete review, just a couple of starters to think about.

src/tbb/waiters.h Outdated
src/tbb/arena.h Outdated
src/tbb/arena.h Outdated
include/oneapi/tbb/task_arena.h Outdated
@isaevil isaevil changed the title Add reference implementation for parallel_block feature Add reference implementation for parallel_phase feature Dec 4, 2024
Contributor

@aleksei-fedotov aleksei-fedotov left a comment

A bunch of comments from me. I have not looked at the tests yet. I also have not finished reviewing the PHASE_* switching logic because of an issue I found.

include/oneapi/tbb/task_arena.h Outdated
include/oneapi/tbb/task_arena.h Outdated
include/oneapi/tbb/task_arena.h Outdated
src/tbb/arena.h Outdated
src/tbb/arena.h
src/tbb/arena.cpp Outdated
src/tbb/arena.h
src/tbb/arena.h Outdated
src/tbb/arena.h Outdated
src/tbb/waiters.h Outdated
include/oneapi/tbb/task_arena.h Outdated
include/oneapi/tbb/task_arena.h Outdated
@isaevil isaevil marked this pull request as ready for review December 13, 2024 15:19
src/tbb/arena.cpp Outdated
src/tbb/arena.cpp
src/tbb/arena.h Outdated
src/tbb/arena.h Outdated
src/tbb/arena.h Outdated
test/tbb/test_task_arena.cpp Outdated
test/tbb/test_task_arena.cpp Outdated
test/tbb/test_task_arena.cpp Outdated
Contributor

In addition, I think the tests need to cover the following:

  • DELAYED_LEAVE is restored once a new task is submitted.
    • Here I think we can test that a worker's return after a one-time fast leave generally takes longer than the subsequent returns made after new tasks are submitted.
  • Nested parallel phases with combinations of DELAYED and (ONE TIME) FAST leaves.

src/tbb/waiters.h Outdated
@isaevil isaevil force-pushed the dev/pavelkumbrasev/parallel_block branch from b93e59e to 398d16b Compare December 17, 2024 15:07
src/tbb/arena.cpp Outdated
Contributor

@aleksei-fedotov aleksei-fedotov left a comment

I think the PR looks good enough as preview functionality. Please treat my other suggestions and minor remarks as things to consider; perhaps we can close on them before the feature is fully supported.

I approve it. You might also want to implement Alexey's suggestion of a single fast_leave_policy_flag, though, because the approach currently in the patch can be implemented later if we ever have more than two leave policies.

src/tbb/waiters.h Outdated
src/tbb/arena.h Outdated
test/tbb/test_task_arena.cpp Outdated
src/tbb/waiters.h Outdated
src/tbb/arena.h Outdated
src/tbb/arena.h
Signed-off-by: Isaev, Ilya <[email protected]>
dnmokhov
dnmokhov previously approved these changes Dec 20, 2024
Contributor

@dnmokhov dnmokhov left a comment

LGTM

Contributor

@akukanov akukanov left a comment

My comments below are not critical; feel free to commit as-is and address later.

include/oneapi/tbb/task_arena.h Outdated
return fast_policy_set ? leave_policy::fast : leave_policy::automatic;
}

int leave_policy_to_traits(leave_policy lp) const {
Contributor

Rename to leave_policy_trait?

Contributor Author

Done.

Comment on lines 306 to 310
task_arena(int max_concurrency_ = automatic, unsigned reserved_for_masters = 1,
priority a_priority = priority::normal)
: task_arena_base(max_concurrency_, reserved_for_masters, a_priority)
priority a_priority = priority::normal
#if __TBB_PREVIEW_PARALLEL_PHASE
, leave_policy lp = leave_policy::automatic
#endif
Contributor

Indentation seems a bit misaligned.

Contributor Author

Fixed.

src/tbb/arena.h Outdated
static const std::uint64_t DELAYED_LEAVE = 1 << 2;
static const std::uint64_t PARALLEL_PHASE = 1 << 3;

std::atomic<std::uint64_t> my_state{0};
Contributor

I'm not sure a 64-bit atomic is really needed. Even with only 32 bits, 2^29 seems more than enough for the number of simultaneous parallel phases. The benefit would be better support for 32-bit platforms, though maybe it's negligible.

Contributor Author

Changed it to uintptr_t.
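
For illustration, the bit budget discussed in this thread can be sketched as follows. This is a minimal, self-contained sketch and not the code in the patch: `phase_state`, `enter_phase`, `exit_phase`, `active_phases`, `flags`, and `counter_bits` are hypothetical names. It assumes the three flag bits and the `PARALLEL_PHASE = 1 << 3` increment quoted in the snippet above, with the whole state stored in one `std::uintptr_t`.

```cpp
#include <atomic>
#include <climits>
#include <cstddef>
#include <cstdint>

// Flag layout as quoted in the review snippet; the low three bits are flags.
constexpr std::uintptr_t FAST_LEAVE          = 1;
constexpr std::uintptr_t ONE_TIME_FAST_LEAVE = 1 << 1;
constexpr std::uintptr_t DELAYED_LEAVE       = 1 << 2;
// Each active parallel phase adds this increment, so the reference
// counter occupies every bit above the flags.
constexpr std::uintptr_t PARALLEL_PHASE      = 1 << 3;
constexpr std::uintptr_t FLAG_MASK           = PARALLEL_PHASE - 1;
constexpr unsigned       FLAG_BITS           = 3;

// Hypothetical stand-in for the arena's state word (not the oneTBB class).
class phase_state {
    std::atomic<std::uintptr_t> my_state{0};
public:
    // Default (sequentially consistent) ordering keeps the sketch simple.
    void enter_phase() { my_state.fetch_add(PARALLEL_PHASE); }
    void exit_phase()  { my_state.fetch_sub(PARALLEL_PHASE); }

    // Number of phases currently registered.
    std::uintptr_t active_phases() const { return my_state.load() >> FLAG_BITS; }
    // Flag bits only, with the counter masked out.
    std::uintptr_t flags() const { return my_state.load() & FLAG_MASK; }

    // Bits left for the counter once the flags are accounted for.
    static constexpr std::size_t counter_bits() {
        return sizeof(std::uintptr_t) * CHAR_BIT - FLAG_BITS;
    }
};
```

With a 32-bit `std::uintptr_t`, `counter_bits()` evaluates to 29, which matches the 2^29 estimate in the comment above.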


std::atomic<std::uint64_t> my_state{0};
public:
void set_initial_state(tbb::task_arena::leave_policy lp) {
Contributor

Since this is essentially the constructor, please add a comment that this method is required to be called soon after construction, and is not thread-safe. Or maybe consider converting it into a real constructor.

Contributor Author

Added a comment.

src/tbb/arena.h Outdated
}
}

void restore_default_policy_if_needed() {
Contributor

I would just call it reset (or reset_if_needed) for better encapsulation.

Contributor Author

@isaevil isaevil Jan 15, 2025

Done. Renamed it to reset_if_needed.

src/tbb/arena.h Outdated
Comment on lines 184 to 186
static const std::uint64_t FAST_LEAVE = 1;
static const std::uint64_t ONE_TIME_FAST_LEAVE = 1 << 1;
static const std::uint64_t DELAYED_LEAVE = 1 << 2;
Contributor

In principle, you only need 2 bits, not 3. One bit could be set at construction, indicating "default" fast leave; the other could be set for one-time fast leave. Both bits unset would mean delayed leave. I think that might also simplify the implementation logic.

Contributor Author

@isaevil isaevil Jan 15, 2025

I didn't see how it would change the implementation logic, but at least it leaves one more bit for parallel phase ref counting :)
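
A rough sketch of the two-bit encoding proposed above may make the trade-off easier to see. It is an assumption-laden illustration and not the merged implementation: `DEFAULT_FAST_LEAVE`, `is_fast_leave`, and `reset_one_time` are hypothetical names, and the reference-count increment is assumed to move down to the third bit.

```cpp
#include <cstdint>

// Hypothetical two-flag layout: both bits unset means delayed leave.
constexpr std::uintptr_t DEFAULT_FAST_LEAVE  = 1;       // set once at construction
constexpr std::uintptr_t ONE_TIME_FAST_LEAVE = 1 << 1;  // set on a one-time fast-leave request
// With only two flags, the ref-count increment starts one bit lower.
constexpr std::uintptr_t PARALLEL_PHASE      = 1 << 2;

// Workers leave quickly if either flag is set.
constexpr bool is_fast_leave(std::uintptr_t state) {
    return (state & (DEFAULT_FAST_LEAVE | ONE_TIME_FAST_LEAVE)) != 0;
}

// Clearing the one-time flag falls back to whatever default was chosen
// at construction, without touching the reference counter.
constexpr std::uintptr_t reset_one_time(std::uintptr_t state) {
    return state & ~ONE_TIME_FAST_LEAVE;
}
```

In this layout the delayed-leave state needs no bit of its own, which is where the extra counter bit comes from.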

@@ -95,6 +95,11 @@ TBB_EXPORT void __TBB_EXPORTED_FUNC isolate_within_arena(d1::delegate_base& d, s
TBB_EXPORT void __TBB_EXPORTED_FUNC enqueue(d1::task&, d1::task_arena_base*);
TBB_EXPORT void __TBB_EXPORTED_FUNC enqueue(d1::task&, d1::task_group_context&, d1::task_arena_base*);
TBB_EXPORT void __TBB_EXPORTED_FUNC submit(d1::task&, d1::task_group_context&, arena*, std::uintptr_t);

#if __TBB_PREVIEW_PARALLEL_PHASE
TBB_EXPORT void __TBB_EXPORTED_FUNC register_parallel_phase(d1::task_arena_base*, std::uintptr_t);
Contributor

Is there a reason to use "register" and "unregister" in the names for the entry points? To me "registration" does not imply the start and end of the active time in the region, but might be used to indicate that a region just exists. For example, an athlete might register for a race but that doesn't mean they've started running.

Contributor Author

I think the main idea was to use more generic names that do not depend on the feature's API. That way, if the API changes dramatically, the entry point names will still make some sense.

auto median_automatic = utils::median(times_automatic.begin(), times_automatic.end());
auto median_fast = utils::median(times_fast.begin(), times_fast.end());

WARN_MESSAGE(median_automatic < median_fast,
Contributor

Are all of the tests guaranteed to work (i.e. not have warnings) for hybrid systems where automatic is not delayed leave?

Contributor Author

@isaevil isaevil Jan 15, 2025

That is a good question. I suppose that only one test case (Parallel Phase retains workers in task_arena) would produce the expected results on hybrid systems. The other tests don't make much sense on such configurations, but I think that is fine since these are warnings, not asserts.
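
To make the warning-versus-assert point concrete, here is a small sketch of a median-based timing comparison. The helpers `median_of` and `automatic_is_faster` are hypothetical; the real `utils::median` in the test suite may be implemented differently, and the test itself reports the result through `WARN_MESSAGE` rather than a return value.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the upper median of a non-empty sample.
// The vector is taken by value so the caller's data keeps its order.
template <typename T>
T median_of(std::vector<T> samples) {
    auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Returns true when the timing expectation holds. A caller would log a
// warning, not fail the test, when this returns false, since the
// expectation may not apply on every hardware configuration.
inline bool automatic_is_faster(const std::vector<double>& times_automatic,
                                const std::vector<double>& times_fast) {
    return median_of(times_automatic) < median_of(times_fast);
}
```

Comparing medians rather than means keeps a single slow outlier run from flipping the result.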

isaevil and others added 2 commits January 15, 2025 15:26
Co-authored-by: Alexey Kukanov <[email protected]>
Signed-off-by: Isaev, Ilya <[email protected]>
Signed-off-by: Isaev, Ilya <[email protected]>
@isaevil isaevil force-pushed the dev/pavelkumbrasev/parallel_block branch from c05d9a7 to 515fd33 Compare January 15, 2025 14:34
Signed-off-by: Isaev, Ilya <[email protected]>
Signed-off-by: Isaev, Ilya <[email protected]>
Signed-off-by: Isaev, Ilya <[email protected]>
@isaevil
Contributor Author

isaevil commented Jan 15, 2025

@aleksei-fedotov @akukanov @vossmjp @pavelkumbrasev @dnmokhov I have updated the RFC document to include some technical details and the conditions for leaving the experimental stage.

6 participants