Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix the execution order of entites in executors to favor insertion order #2537

Open
wants to merge 6 commits into
base: rolling
Choose a base branch
from

Conversation

wjwwood
Copy link
Member

@wjwwood wjwwood commented May 23, 2024

In specific circumstances, multiple entities (of the same kind) in an executor can be ready at the same time and the executor must arbitrarily decide which to execute first, and in Iron and before, this fell back to the order in which they were added, but in Jazzy we (inadvertently) changed this and it became related to the hash order, subtly changing the order of execution that some folks had come to rely on, see: #2532

While this ordering was not guaranteed, this pr aims to restore the previous behavior for consistency and to provide a less surprising result, but the order is still not a guarantee nor is it the same for all executor types (e.g. the EventsExecutor is consistent, but is instead based on the trigger order of the entities, not their insertion/addition order).

The new test is also in a separate pr here: #2536 so it can be more easily back ported to previous versions. I think it should pass on them, but I still need to test it. See that pr for updates.

This approach will not work for backporting as it breaks ABI, but I believe I could make a version that doesn't break ABI (it will be as ugly as past non-ABI breaking changes). I just want to make sure we want to follow through with this before committing resources to that attempt.

This pr does incur some additional overhead, but I believe it should be relatively low, considering what we're storing. It does, however, slow down insertion and erasure from the collections which could be a concern in the tight loop of spin. I will do some more experimentation related to this, but the memory overhead of storing the entity pointers of the collection twice (once as an unordered map for fast random access and once in a vector for maintaining insertion order) should be relatively small. It does increase the memory footprint by 2x, but the footprint was small to begin with (three pointers per entity).

I looked at using existing "ordered maps" but all of them (that I could see) either provided the desired API but lost the fast lookup of the hash or did what I did here and maintained the information twice in two different data structures.

So I think this is a reasonable approach, but I'm up for discussion on alternatives.

@wjwwood wjwwood added the bug Something isn't working label May 23, 2024
@wjwwood wjwwood self-assigned this May 23, 2024
@wjwwood wjwwood requested review from ivanpauno and hidmic as code owners May 23, 2024 00:38
Comment on lines -75 to +97
for (auto it = update_to.begin(); it != update_to.end(); ) {
for (auto it = update_to.begin_ordered(); it != update_to.end_ordered(); ) {
if (update_from.count(it->first) == 0) {
auto entity = it->second.entity.lock();
if (entity) {
on_removed(entity);
}
it = update_to.erase(it);
it = update_to.erase_ordered(it);
} else {
++it;
}
}
for (auto it = update_from.begin(); it != update_from.end(); ++it) {
for (auto it = update_from.begin_ordered(); it != update_from.end_ordered(); ++it) {
if (update_to.count(it->first) == 0) {
auto entity = it->second.entity.lock();
if (entity) {
on_added(entity);
}
update_to.insert(*it);
bool inserted = update_to.insert(*it);
// Should never be false, so this is a defensive check, mark unused too
// in order to avoid a warning in release builds.
assert(inserted);
RCUTILS_UNUSED(inserted);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the important part of the changes that affect the order of execution when tied, because this is what controls the order in which they are added to the wait set in the case of the wait set based executors, i.e. everything except the EventsExecutor.

Comment on lines +107 to +115
/// Type of the map used for random access
using MapType = std::unordered_map<const EntityKeyType *, CollectionEntry<EntityValueType>>;

/// Type of the vector for insertion order access
// Note, we cannot use typename MapType::value_type because it makes the first
// item in the pair const, which prevents copy assignment of the pair, which
// prevents std::vector::erase from working later...
using VectorType = std::vector<std::pair<const EntityKeyType *,
CollectionEntry<EntityValueType>>>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This detail was annoying, the MapType::value_type makes the first argument of the pair const (would be const EntityKeyType * const in this case), which prevents the use of the operator= on the pair, which prevents std::vector::erase from working...

Anyway, this is essentially a std::vector<typename MapType::value_type> except for that detail...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just store the iterator returned by std::map::insert ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they can be invalidated by future insertions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also considered storing the pointer (which is the key for the map) only, but then when iterating in order you need to do a lookup in the map for each one you want to use, which is made redundant if you just store the information twice. In fact this conclusion is what several of the "ordered dict in c++" projects I looked at online ended up using.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because they can be invalidated by future insertions.

Nope, not for std::map :
https://en.cppreference.com/w/cpp/container/map/insert

No iterators or references are invalidated. If the insertion is successful, pointers and references to the element obtained while it is held in the node handle are invalidated, and pointers and references obtained to that element before it was extracted become valid.(since C++17)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the point is to use the unordered_map so that you get the better look up performance (Constant on average, worst case linear in the size of the container. for unordered_map vs Logarithmic in the size of the container. for map), and in that case I interpret it to mean that rehashing will invalidate them:

If after the operation the new number of elements is greater than old max_load_factor() * bucket_count() a rehashing takes place.
If rehashing occurs (due to the insertion), all iterators are invalidated. Otherwise (no rehashing), iterators are not invalidated. If the insertion is successful, pointers and references to the element obtained while it is held in the node handle are invalidated, and pointers and references obtained to that element before it was extracted become valid.(since C++17)

-- https://en.cppreference.com/w/cpp/container/unordered_map/insert

Which is also the conclusion of this SO answer (which I was using to try and confirm my interpretation): https://stackoverflow.com/a/54004916


For what it's worth, I considered using std::map and its iterators too.
However, the conclusion I came to is that whether it's a vector of iterators (pointers) or a vector of triplets of pointers, the impact is not much different on memory, even at relatively large scales. Instead, I thought preserving the cpu time optimizations was better for the executor, given the feedback we've consistently gotten, but that's a generalization.

Comment on lines 44 to +49
auto remove_entities = [](auto & collection) -> size_t {
size_t removed = 0;
for (auto it = collection.begin(); it != collection.end(); ) {
for (auto it = collection.begin_ordered(); it != collection.end_ordered(); ) {
if (it->second.entity.expired()) {
++removed;
it = collection.erase(it);
it = collection.erase_ordered(it);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change was not strictly necessary, but I reasoned it was slightly faster to provide the iterator for erasure to the vector and look up the iterator for the map, rather than the other way around, which is what erase_ordered does.

Comment on lines -514 to +528
collection.insert(
bool inserted = collection.insert(
{
this->notify_waitable_.get(),
{this->notify_waitable_, weak_group_ptr}
});
// Explicitly ignore if the notify waitable was not inserted because that means
// it was already inserted, which happens initially as it is explicitly added
// in the constructor as well as every time the collection is reset, so on
// the first reset there is a second insertion attempt.
// We could check before trying to insert, but that would require a "find" call
// on each refresh, which is expensive, and otherwise it would require additional
// state in this class to detect the initial case where it is added twice.
// Therefore we just insert and ignore it if it fails (the only way it fails
// is when a duplicate is inserted).
RCUTILS_UNUSED(inserted);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an oddity I found with the events executor, where it was always inserting the notify waitable twice on one occasion (first refresh after construction), and it wasn't checking the return value of insert, which if we had checked would have occasionally been false meaning it wasn't inserted. This didn't end up mattering, but it did feel weird and broke some things when I added the vector storage along side the map.

Anyway I landed on this approach where we continue to add it twice sometimes, but we just ignore it when it wasn't inserted as the fact that it has been inserted is important, not the order in which it was, at least in the specific case of this notify waitable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this was a bug.
I opened a PR to fix it #2564

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks for checking up on it. I guess I'll wait for that pr to merge and then update this. I have to check into timer execution order on Windows for these new tests anyways.

@thomasmoore-torc
Copy link

As I've been looking into how all of this works, it seems to me that the bulk of the complexity associated with the ExecutorEntitiesCollection is to allow the Executor class to handle callback groups. So far as I can tell, the ExecutorEntitiesCollection essentially serves two purposes:

  1. To act as a staging area for entities to be added to the waitset
  2. To associate a callback group with an entity

As the SingleThreadedExecutor only needs the first purpose and the StaticSingleThreadedExecutor needs neither, would it be worth considering the "don't pay for what you don't use" principal and refactoring some of the purpose-specific functionality out of the base Executor class and either directly into the executor implementations that require that functionality or into intermediate implementations that can be used by multiple executors as needed?

@wjwwood
Copy link
Member Author

wjwwood commented May 23, 2024

As the SingleThreadedExecutor only needs the first purpose and the StaticSingleThreadedExecutor needs neither

I don't think that's actually true. The SingleThreadedExecutor does actually consider the callback groups, but in a very limited fashion. There's actually a test that disables individual callback groups explicitly, and checks that executors do not execute from them. Now whether or not that's useful isn't clear to me, but as it stands CallbackGroups are a non-optional part of the executor related APIs, and for better worse we cannot just ignore them. Now we could change that, but it would require changing some of the semantics, but that's not something we can do in a stable release, in my opinion. But we can consider it for the future.

From a technical angle, there's nothing that requires that any executor, including the two you mentioned, use the Entity Collection classes, it was just convenient. However, in my experience you cannot remove much if anything that's there and still handle ownership mechanics correctly, e.g. nodes hold pointers to callback groups which holds pointers to entities and you have to handle gracefully removing them when the user lets any go out of scope, but you have to hold ownership in the executor while using them, etc... So feel free to try and optimize them, but just be aware that there be dragons, lol.

Also, despite the name, the StaticSingleThreadedExecutor does actually need to do both 1 and 2 that you listed. And in fact after the recent refactoring there's little to no practical difference between it and the SingleThreadedExecutor, we just didn't manage to get it deprecated in the chaos of getting everything else done before the API freeze this time. (someone disagree with me if you think otherwise)

@alsora
Copy link
Collaborator

alsora commented May 29, 2024

As the SingleThreadedExecutor only needs the first purpose and the StaticSingleThreadedExecutor needs neither

As discussed in the client library WG, an important feature (that is currently achieved through callback groups) is to allow to add only some of the node's entities to an executor

This pr does incur some additional overhead, but I believe it should be relatively low, considering what we're storing. It does, however, slow down insertion and erasure from the collections which could be a concern in the tight loop of spin

We usually decided to not focus too much on the overhead of insertion/deletion (compared to the overhead of the fast-loop: selection and execution of entities).
Most of the ROS 2 systems will setup their nodes and then just run with a static graph,

@wjwwood
Copy link
Member Author

wjwwood commented May 30, 2024

We usually decided to not focus too much on the overhead of insertion/deletion (compared to the overhead of the fast-loop: selection and execution of entities).
Most of the ROS 2 systems will setup their nodes and then just run with a static graph,

I'm not sure if we only ever insert/delete when the user adds or removes something, e.g. copying or updating a collection can happen even when the user hasn't added or removed anything, but maybe I'm wrong about that. We have two collections in the executor base class which are swapped (sort of) at specific intervals:

// Update each of the groups of entities in the current collection, adding or removing
// from the wait set as necessary.
current_collection_.timers.update(
collection.timers,
[this](auto timer) {wait_set_.add_timer(timer);},
[this](auto timer) {wait_set_.remove_timer(timer);});
current_collection_.subscriptions.update(
collection.subscriptions,
[this](auto subscription) {
wait_set_.add_subscription(subscription, kDefaultSubscriptionMask);
},
[this](auto subscription) {
wait_set_.remove_subscription(subscription, kDefaultSubscriptionMask);
});
current_collection_.clients.update(
collection.clients,
[this](auto client) {wait_set_.add_client(client);},
[this](auto client) {wait_set_.remove_client(client);});
current_collection_.services.update(
collection.services,
[this](auto service) {wait_set_.add_service(service);},
[this](auto service) {wait_set_.remove_service(service);});
current_collection_.guard_conditions.update(
collection.guard_conditions,
[this](auto guard_condition) {wait_set_.add_guard_condition(guard_condition);},
[this](auto guard_condition) {wait_set_.remove_guard_condition(guard_condition);});
current_collection_.waitables.update(
collection.waitables,
[this](auto waitable) {wait_set_.add_waitable(waitable);},
[this](auto waitable) {wait_set_.remove_waitable(waitable);});

Anyway, maybe @mjcarroll can expand on that.

This is still something I'm going to investigate before pushing to merge this.

@alsora
Copy link
Collaborator

alsora commented May 30, 2024

Well, rebuilding the collection only when something changes was the purpose of the "notifier waitable" and the "entities collection" classes, and that's how they were used in the events executor and static single threaded executor.
Unless we did something wrong while expanding them to the single threaded executor, i would expect the same behavior.

@wjwwood
Copy link
Member Author

wjwwood commented Jun 13, 2024

Well, rebuilding the collection only when something changes was the purpose of the "notifier waitable" and the "entities collection" classes, and that's how they were used in the events executor and static single threaded executor.
Unless we did something wrong while expanding them to the single threaded executor, i would expect the same behavior.

That was my hope, but I wanted to confirm which I am going to do soon hopefully.

@mjcarroll
Copy link
Member

Unless we did something wrong while expanding them to the single threaded executor, i would expect the same behavior.

That is the behavior unless something was done incorrectly.

If the set of the entities in the collection stays constant, the waitset should not be rebuilt.

@wjwwood
Copy link
Member Author

wjwwood commented Jun 18, 2024

I closed #2536 in favor of this pr (it was just the test, with no fix), but cross-posting here for visibility since there were some good discussions on that pr too.

@alsora
Copy link
Collaborator

alsora commented Jun 19, 2024

@wjwwood did you have a chance to check if the collection was correctly functioning? i.e. it's rebuilt only when something changes.
P.S. this is something we should be able to easy unit-test.

Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm, just a minor comment for the test.

"reverse call order",
{
{"waitable", {false, reverse, forward}},
{"subscription", {false, reverse, forward}},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we skip this test, i think the same test is executed twice.

Suggested change
{"subscription", {false, reverse, forward}},
{"subscription", {true, reverse, forward}},

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked again, it doesn't seem like we're doing the same test twice (not exactly), so maybe you could explain why you think that?

If we were to disable this test, then we should do {true, {},{}} also, as the call order vs expected order don't matter if it is skipped.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think call_order is not used in the test of the subscription (maybe cannot be used), so it always check against expected_execution_order which is forward. in that case, we are doing the same test twice?

@wjwwood
Copy link
Member Author

wjwwood commented Jun 25, 2024

I can confirm that the insertions only occur when entities are created or destroyed, so the overhead of this fix should be minimal.

I just added print statements during insertion as well as some to show what steps of the test we're in:

FINDME >> start executor test, before executor creation
FINDME >> before adding callback group
insert(0xac797083e690) insert(0xac7970886c00) insert(0xac79708aeb30) insert(0xac79708c2780) insert(0xac79708c1970) insert(0xac79708d45d0) insert(0xac79708fcbe0) insert(0xac79708fbd30) insert(0xac79708fc410) insert(0xac7970934f10) insert(0xac7970947cf0) insert(0xac797095e870) insert(0xac7970972220) insert(0xac79709706a0) insert(0xac7970997dd0) insert(0xac7970996ca0) insert(0xac79709a9c00) insert(0xac79709bcc50) insert(0xac79709e3910) insert(0xac79709f6800) insert(0xac797089b4a0) insert(0xac797089b300) insert(0xac79708ae480) insert(0xac7970629a60) insert(0xac79708c2050) insert(0xac7970629820) insert(0xac79708d4cc0) insert(0xac79706295e0) insert(0xac79708e7840) insert(0xac7970629310) insert(0xac79708fc570) insert(0xac79708fc440) insert(0xac797090f270) insert(0xac797090f140) insert(0xac7970921d30) insert(0xac79708fc220) insert(0xac7970934770) insert(0xac7970934640) insert(0xac7970947660) insert(0xac7970628770) insert(0xac797095e280) insert(0xac79706284a0) insert(0xac7970971c30) insert(0xac79706267e0) insert(0xac79709847d0) insert(0xac79706265e0) insert(0xac7970997740) insert(0xac79708adb90) insert(0xac79709aa780) insert(0xac79709aa5c0) insert(0xac79709bd650) insert(0xac79709bd490) insert(0xac79709d05b0) insert(0xac79709d03f0) insert(0xac79709e3280) insert(0xac79709e30c0) insert(0xac79709f6170) insert(0xac79709f5fb0) insert(0xac7970a09170) insert(0xac7970a08fb0) insert(0xac79707ff8a0) insert(0xac797083e690) insert(0xac7970886c00) insert(0xac79708aeb30) insert(0xac79708c2780) insert(0xac79708c1970) insert(0xac79708d45d0) insert(0xac79708fcbe0) insert(0xac79708fbd30) insert(0xac79708fc410) insert(0xac7970934f10) insert(0xac7970947cf0) insert(0xac797095e870) insert(0xac7970972220) insert(0xac79709706a0) insert(0xac7970997dd0) insert(0xac7970996ca0) insert(0xac79709a9c00) insert(0xac79709bcc50) insert(0xac79709e3910) insert(0xac79709f6800) insert(0xac797089b4a0) insert(0xac797089b300) insert(0xac79708ae480) insert(0xac7970629a60) insert(0xac79708c2050) insert(0xac7970629820) insert(0xac79708d4cc0) insert(0xac79706295e0) insert(0xac79708e7840) insert(0xac7970629310) insert(0xac79708fc570) insert(0xac79708fc440) insert(0xac797090f270) insert(0xac797090f140) insert(0xac7970921d30) insert(0xac79708fc220) insert(0xac7970934770) insert(0xac7970934640) insert(0xac7970947660) insert(0xac7970628770) insert(0xac797095e280) insert(0xac79706284a0) insert(0xac7970971c30) insert(0xac79706267e0) insert(0xac79709847d0) insert(0xac79706265e0) insert(0xac7970997740) insert(0xac79708adb90) insert(0xac79709aa780) insert(0xac79709aa5c0) insert(0xac79709bd650) insert(0xac79709bd490) insert(0xac79709d05b0) insert(0xac79709d03f0) insert(0xac79709e3280) insert(0xac79709e30c0) insert(0xac79709f6170) insert(0xac79709f5fb0) insert(0xac7970a09170) insert(0xac7970a08fb0) insert(0xac79707ff8a0) FINDME >> spinning to complete executor test: 0 / 20
FINDME >> spinning to complete executor test: 1 / 20
FINDME >> spinning to complete executor test: 2 / 20
FINDME >> spinning to complete executor test: 3 / 20
FINDME >> spinning to complete executor test: 4 / 20
FINDME >> spinning to complete executor test: 5 / 20
FINDME >> spinning to complete executor test: 6 / 20
FINDME >> spinning to complete executor test: 7 / 20
FINDME >> spinning to complete executor test: 8 / 20
FINDME >> spinning to complete executor test: 9 / 20
FINDME >> spinning to complete executor test: 10 / 20
FINDME >> spinning to complete executor test: 11 / 20
FINDME >> spinning to complete executor test: 12 / 20
FINDME >> spinning to complete executor test: 13 / 20
FINDME >> spinning to complete executor test: 14 / 20
FINDME >> spinning to complete executor test: 15 / 20
FINDME >> spinning to complete executor test: 16 / 20
FINDME >> spinning to complete executor test: 17 / 20
FINDME >> spinning to complete executor test: 18 / 20
FINDME >> spinning to complete executor test: 19 / 20
FINDME >> stop executor test

You can see that it does some insertions when the callback group is added, but not between spins, which was expected.

Signed-off-by: William Woodall <[email protected]>
@wjwwood
Copy link
Member Author

wjwwood commented Jun 25, 2024

P.S. this is something we should be able to easy unit-test.

I tested it with print statements, and I started to add a unit test, but checking it as a unit test became pretty difficult, since it's not just testing a unit, but rather two working together (with different results for each pair), e.g. STE with the EntityCollection vs the MTE with the EntityCollection vs the EventsExecutor with the EntityCollection. I tried to add a test that just used some methods of the EntityCollection, but it's not set up for introspection, so I kind of ran out of time to make the changes needed to properly test this. Probably with a derived class only used in the test or perhaps some hooks that can be installed just for testing.

So unfortunately I'll have to leave that for others or future me to investigate.

if (update_to.count(it->first) == 0) {
auto entity = it->second.entity.lock();
if (entity) {
on_added(entity);
}
update_to.insert(*it);
bool inserted = update_to.insert(*it);
Copy link
Contributor

@jmachowinski jmachowinski Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why insert an outdated entity? I guess this should be in the if(entity) ?

@@ -72,34 +73,47 @@ void update_entities(
std::function<void(const typename CollectionType::EntitySharedPtr &)> on_removed
)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here is a potential bug. If between two call of update_entities the order of elements in update_from changes, the order in update_to will not change. I am not really sure if this might happen though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants