Fix panic when specializing materials for entities spawned in PostUpdate #19064

Open: wants to merge 1 commit into main

grind086 (Contributor) commented May 5, 2025

Objective

If an entity requiring specialization is spawned during PostUpdate after its material's check_entities_needing_specialization system has run, but before CheckVisibility, a panic occurs during material specialization. This happens because specialization assumes that every entity visible in a view is present in the specialization ticks map, but in this scenario the entity isn't added to the map until the next frame.
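A minimal std-only Rust sketch of the invariant described above, assuming entities are plain u32 ids and ticks are u64 values (hypothetical stand-ins, not Bevy's Entity or Tick types). The helper entities_missing_ticks is hypothetical; it reports the visible entities for which an unwrap on the ticks-map lookup would panic.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for Bevy's Entity and change Tick types.
type Entity = u32;
type Tick = u64;

/// Returns the visible entities that have no entry in the specialization
/// ticks map, i.e. the ones whose `ticks.get(..).unwrap()` would panic.
fn entities_missing_ticks(visible: &[Entity], ticks: &HashMap<Entity, Tick>) -> Vec<Entity> {
    visible
        .iter()
        .copied()
        .filter(|entity| !ticks.contains_key(entity))
        .collect()
}

fn main() {
    let mut ticks: HashMap<Entity, Tick> = HashMap::new();
    // Entity 1 was seen by check_entities_needing_specialization earlier in PostUpdate.
    ticks.insert(1, 10);
    // Entity 2 was spawned after that system ran but before CheckVisibility,
    // so it is visible this frame without a recorded tick.
    let visible = [1, 2];
    println!("would panic for: {:?}", entities_missing_ticks(&visible, &ticks));
}
```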

Fixes #19048. This may also fix the related #18980, but I wasn't able to reproduce that one and it could be unrelated.

Edit by Alice: Fixes #18980 too!

Solution

Move the check_entities_needing_specialization systems to Last. This ensures that they always run after CheckVisibility.
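To see why running the check in Last helps, here is a toy single-frame simulation in std-only Rust. It assumes one fixed order for the unordered PostUpdate systems and uses simplified stand-ins for the real systems (the World struct, Placement enum and frame_is_ok helper are hypothetical); it is not how Bevy's scheduler actually executes systems.

```rust
use std::collections::{HashMap, HashSet};

/// Where check_entities_needing_specialization is scheduled in this toy model.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Placement {
    PostUpdate,
    Last,
}

#[derive(Default)]
struct World {
    spawned: Vec<u32>,        // entities that currently exist
    ticks: HashMap<u32, u64>, // stand-in for the specialization ticks map
    visible: HashSet<u32>,    // stand-in for the visible-entities list
}

/// Stand-in: records a tick for every entity it has not seen yet.
fn check_entities_needing_specialization(world: &mut World, tick: u64) {
    for &entity in &world.spawned {
        world.ticks.entry(entity).or_insert(tick);
    }
}

/// Stand-in: treats every existing entity as visible.
fn check_visibility(world: &mut World) {
    world.visible = world.spawned.iter().copied().collect();
}

/// Simulates one frame; true means every visible entity has a tick
/// when specialization runs (no unwrap panic).
fn frame_is_ok(placement: Placement) -> bool {
    let mut world = World { spawned: vec![1], ..Default::default() };

    // PostUpdate, in one possible order of the unordered systems.
    if placement == Placement::PostUpdate {
        check_entities_needing_specialization(&mut world, 1);
    }
    world.spawned.push(2); // a user system spawns a new mesh entity
    check_visibility(&mut world);

    // Last runs after every PostUpdate system.
    if placement == Placement::Last {
        check_entities_needing_specialization(&mut world, 1);
    }

    // Rendering: specialization looks up each visible entity.
    world.visible.iter().all(|entity| world.ticks.contains_key(entity))
}

fn main() {
    for placement in [Placement::PostUpdate, Placement::Last] {
        println!("{placement:?}: ok = {}", frame_is_ok(placement));
    }
}
```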

Testing

Confirmed the reproduction from #19048 is fixed, and ran several 3d and 2d examples with no apparent change.

alice-i-cecile added this to the 0.16.1 milestone May 5, 2025
alice-i-cecile (Member)

(edited your PR description to avoid accidentally closing the related issue)

alice-i-cecile added the C-Bug, A-Rendering, P-Crash, D-Straightforward, and S-Needs-Review labels May 5, 2025

Henauxg (Contributor) left a comment

I'm not sure that just moving check_entities_needing_specialization to the Last schedule is correct. Some systems expect it to run in PostUpdate. For example, in material.rs in bevy_pbr:

if self.shadows_enabled {
    app.add_systems(
        PostUpdate,
        check_light_entities_needing_specialization::<M>
            .after(check_entities_needing_specialization::<M>),
    );
}

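check_light_entities_needing_specialization writes to EntitiesNeedingSpecialization after check_entities_needing_specialization has cleared it. If only the latter moves to Last, that .after() constraint no longer orders anything, since the two systems would be in different schedules, and the clear in Last would wipe out the entries written by the light system.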

  1. We could also move these systems to the Last schedule, but there may be other reasons for them to be in PostUpdate that I haven't worked out yet.
  2. Alternatively, we could explicitly add .after(VisibilitySystems::CheckVisibility) as an ordering constraint on check_entities_needing_specialization.
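The interaction with check_light_entities_needing_specialization can be illustrated with a toy model, assuming EntitiesNeedingSpecialization behaves like a list that check_entities_needing_specialization clears and refills while the light system only appends (hypothetical simplifications of the real systems). If the clearing system ends up running after the light system, the light entries are lost.

```rust
/// Hypothetical stand-in for the EntitiesNeedingSpecialization resource.
#[derive(Default, Debug)]
struct EntitiesNeedingSpecialization {
    entities: Vec<u32>,
}

/// Toy check_entities_needing_specialization: clears, then records changed meshes.
fn check_entities(res: &mut EntitiesNeedingSpecialization, changed_meshes: &[u32]) {
    res.entities.clear();
    res.entities.extend_from_slice(changed_meshes);
}

/// Toy check_light_entities_needing_specialization: appends without clearing.
fn check_light_entities(res: &mut EntitiesNeedingSpecialization, lit_meshes: &[u32]) {
    res.entities.extend_from_slice(lit_meshes);
}

/// Runs both systems in the given order and returns the final contents.
fn run(light_after_clear: bool) -> Vec<u32> {
    let mut res = EntitiesNeedingSpecialization::default();
    if light_after_clear {
        check_entities(&mut res, &[1]);
        check_light_entities(&mut res, &[7]);
    } else {
        check_light_entities(&mut res, &[7]);
        check_entities(&mut res, &[1]); // the clear drops entity 7
    }
    res.entities
}

fn main() {
    println!("light after clear: {:?}", run(true));
    println!("clear after light: {:?}", run(false));
}
```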

@tychedelia, do you know which option would be best?

Notes:

  • The AssetChanged docs contain an error: they say it runs in Last, while it actually runs in PostUpdate. I opened a PR to fix it.

-    PostUpdate,
-    check_entities_needing_specialization::<M>.after(AssetEvents),
-);
+.add_systems(Last, check_entities_needing_specialization::<M>);

Contributor

Going with option 2 as described above

Suggested change
-.add_systems(Last, check_entities_needing_specialization::<M>);
+.add_systems(
+    PostUpdate,
+    check_entities_needing_specialization::<M>
+        .after(AssetEvents)
+        .after(VisibilitySystems::CheckVisibility),
+);
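Option 2 relies on ordering constraints inside a single schedule. As a rough illustration, here is a toy topological sort (Kahn's algorithm) over hypothetical system names, where each (later, earlier) pair plays the role of .after(). Bevy's scheduler also builds a dependency graph, but its executor may run unconstrained systems in any order, so this sketch only produces one valid order and is not Bevy's implementation.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Returns one order of `systems` (assumed unique) that satisfies every
/// `(later, earlier)` pair, or `None` on a cycle or an unknown system name.
fn topo_order<'a>(systems: &[&'a str], after: &[(&'a str, &'a str)]) -> Option<Vec<&'a str>> {
    let mut indegree: BTreeMap<&'a str, usize> = systems.iter().map(|&s| (s, 0)).collect();
    let mut successors: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(later, earlier) in after {
        if !indegree.contains_key(earlier) {
            return None;
        }
        successors.entry(earlier).or_default().push(later);
        *indegree.get_mut(later)? += 1;
    }

    // Start with systems that have no "after" constraints.
    let mut queue: VecDeque<&'a str> = VecDeque::new();
    for &s in systems {
        if indegree[s] == 0 {
            queue.push_back(s);
        }
    }

    let mut order = Vec::with_capacity(systems.len());
    while let Some(system) = queue.pop_front() {
        order.push(system);
        for &next in successors.get(system).into_iter().flatten() {
            let remaining = indegree.get_mut(next)?;
            *remaining -= 1;
            if *remaining == 0 {
                queue.push_back(next);
            }
        }
    }

    // Any system left out is part of a cycle.
    (order.len() == systems.len()).then_some(order)
}

fn main() {
    let systems = ["asset_events", "check_visibility", "check_entities_needing_specialization"];
    let after = [
        ("check_entities_needing_specialization", "asset_events"),
        ("check_entities_needing_specialization", "check_visibility"),
    ];
    println!("{:?}", topo_order(&systems, &after));
}
```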

Contributor Author

This was actually my initial choice for a fix, but @tychedelia mentioned on Discord that moving the system to Last might be safer if it turns out there's no benefit to keeping it in PostUpdate. I think they were going to take a look when they had time.

Contributor

(In that case, we should at least also move check_light_entities_needing_specialization to Last in bevy_pbr.)

Comment on lines 287 to +293
.add_systems(
PostUpdate,
(
mark_meshes_as_changed_if_their_materials_changed::<M>.ambiguous_with_all(),
check_entities_needing_specialization::<M>.after(AssetEvents),
)
mark_meshes_as_changed_if_their_materials_changed::<M>
.ambiguous_with_all()
.after(mark_3d_meshes_as_changed_if_their_assets_changed),
);
)
.add_systems(Last, check_entities_needing_specialization::<M>);

Contributor

With option 2 described above, this would look like:

.add_systems(
    PostUpdate,
    (
        mark_3d_meshes_as_changed_if_their_assets_changed,
        mark_meshes_as_changed_if_their_materials_changed::<M>.ambiguous_with_all(),
        check_entities_needing_specialization::<M>,
    )
        .chain()
        .after(AssetEvents)
        .after(VisibilitySystems::CheckVisibility),
);

The chaining is not strictly necessary; we could just add the new .after() constraint to check_entities_needing_specialization.
But that raises another point I'd like to discuss. Currently, check_entities_needing_specialization checks for the following, perhaps needlessly:

Or<(
    Changed<Mesh3d>,
    AssetChanged<Mesh3d>,
    Changed<MeshMaterial3d<M>>,
    AssetChanged<MeshMaterial3d<M>>,
)>,

But isn't that exactly what mark_3d_meshes_as_changed_if_their_assets_changed and mark_meshes_as_changed_if_their_materials_changed are there for (marking Mesh3d as changed)? If so, chaining them and checking only Changed<Mesh3d> in check_entities_needing_specialization would probably be simpler and achieve the same goal.
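The suggestion can be checked with a toy model, assuming one boolean per entity for Changed<Mesh3d> and one for the asset-change case (hypothetical simplifications; Bevy actually compares change ticks). A "Changed only" check notices an asset change only if the marking system ran first, which chaining would guarantee; the full Or filter notices it either way.

```rust
/// Hypothetical per-entity change flags.
struct MeshEntity {
    changed_mesh3d: bool, // stand-in for Changed<Mesh3d>
    asset_changed: bool,  // stand-in for AssetChanged<Mesh3d> / AssetChanged<MeshMaterial3d<M>>
}

/// Toy version of the mark_*_as_changed systems: turns an asset change into a Mesh3d change.
fn mark_as_changed(entities: &mut [MeshEntity]) {
    for entity in entities.iter_mut() {
        if entity.asset_changed {
            entity.changed_mesh3d = true;
        }
    }
}

/// Does the specialization check notice the change this frame?
fn detects_change(marking_runs_first: bool, use_or_filter: bool) -> bool {
    let mut entities = vec![MeshEntity { changed_mesh3d: false, asset_changed: true }];
    let needs_specialization = |e: &MeshEntity| {
        if use_or_filter {
            e.changed_mesh3d || e.asset_changed
        } else {
            e.changed_mesh3d
        }
    };
    if marking_runs_first {
        mark_as_changed(&mut entities);
        entities.iter().any(needs_specialization)
    } else {
        let seen = entities.iter().any(needs_specialization);
        mark_as_changed(&mut entities);
        seen
    }
}

fn main() {
    for (first, or_filter) in [(true, false), (false, false), (false, true)] {
        println!("marking first: {first}, Or filter: {or_filter} -> {}", detects_change(first, or_filter));
    }
}
```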

Also:

  • with the current PR mark_meshes_as_changed_if_their_materials_changed and mark_3d_meshes_as_changed_if_their_assets_changed are not properly scheduled after AssetEvents
  • on main, mark_meshes_as_changed_if_their_materials_changed is not either, which could explain why all the Or parameters were introduced in check_entities_needing_specialization

Contributor Author

I think you may have misread something here. mark_3d_meshes_as_changed_if_their_assets_changed is from bevy_render, and isn't added here. The only ordering constraint related to it in either main or this PR should be mark_meshes_as_changed_if_their_materials_changed.after(mark_3d_meshes_as_changed_if_their_assets_changed) unless there's some other interaction happening.

The minimal change from main to ensure ordering relative to CheckVisibility would just be:

.add_systems(
    PostUpdate,
    (
        mark_meshes_as_changed_if_their_materials_changed::<M>.ambiguous_with_all(),
        check_entities_needing_specialization::<M>
            .after(AssetEventSystems)
            .after(VisibilitySystems::CheckVisibility),
    )
        .after(mark_3d_meshes_as_changed_if_their_assets_changed)
);

mark_3d_meshes_as_changed_if_their_assets_changed is also already explicitly ordered before asset events here:

mark_3d_meshes_as_changed_if_their_assets_changed
    .ambiguous_with(VisibilitySystems::CalculateBounds)
    .before(AssetEventSystems),

Contributor

You're right about mark_3d_meshes_as_changed_if_their_assets_changed; it's only used for ordering in bevy_pbr, my bad.
I'm still surprised, though, that mark_3d_meshes_as_changed_if_their_assets_changed is ordered before AssetEvents even though it reads the asset events. If anyone knows why, I'd gladly take an explanation.
Even worse, mark_meshes_as_changed_if_their_materials_changed also reads asset events. Its execution order is not deterministic, as it's only ordered after mark_3d_meshes_as_changed_if_their_assets_changed, which itself runs before AssetEvents. So mark_meshes_as_changed_if_their_materials_changed can end up running before or after AssetEvents depending on scheduling (and, as a result, marking meshes as modified or not depending on scheduling).

Member

Looks to me like you just uncovered another bug

Member

Thanks for thinking about this deeply. These do indeed seem sketchy to me. I think both of these systems were implemented before the AssetChanged filter was added, so they are ad-hoc implementations of the same thing. For mark_3d_meshes_as_changed_if_their_assets_changed, we should be able to simply add that filter in extract_meshes_for_gpu_building instead. mark_meshes_as_changed_if_their_materials_changed is a bit more complicated because it requires the material type, so it needs to stay in place for now, with a potential fix after #18075.

janhohenheim (Member), May 8, 2025

I think there's a chance that this scheduling ambiguity I just ran into is also caused by this.

janhohenheim (Member) commented May 6, 2025

From reading this, I'm fairly sure this fixes my issue #18980 as well, so let's close that one on merging :)

tychedelia (Member) commented May 7, 2025

For context, the reason these were added to PostUpdate instead of Last (including moving AssetEvents to PostUpdate) was to hide some latency of the change detection table scans behind the check_visibility system and any other engine internal bookkeeping we tend to do there. Benchmarking at the time showed an improvement.

Last is the safer choice in some ways, but it still creates the possibility of this bug if people spawn things in Last without ordering them relative to AssetEvents (i.e. if we were to move AssetEvents back to Last). I originally chose the unwrap to enforce the invariant that we always have a detected change tick for new entities, and have defended that choice recently precisely because it helps uncover issues like this. However, I hadn't fully considered that users might in practice spawn new meshes/materials even in Last. It's a little weird, but the schedule exists for a reason.

Given we've rooted out a lot of the cold specialization bugs, I could be convinced that we maybe could swap the unwrap with a very noisy error log that points out the user is likely doing something wrong.
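A minimal sketch of what swapping the unwrap for a loud error could look like, in plain std Rust: the ticks map is a HashMap<u32, u64> stand-in, eprintln! stands in for Bevy's error! macro, and the function name specialize_visible is hypothetical. Visible entities without a tick are reported and skipped instead of crashing the app.

```rust
use std::collections::HashMap;

/// Skips (and loudly reports) visible entities with no recorded
/// specialization tick instead of panicking via `unwrap`.
fn specialize_visible(visible: &[u32], ticks: &HashMap<u32, u64>) -> Vec<(u32, u64)> {
    let mut specialized = Vec::new();
    for &entity in visible {
        let Some(&tick) = ticks.get(&entity) else {
            // In Bevy this would presumably be an `error!` log; eprintln! keeps the sketch std-only.
            eprintln!(
                "entity {entity} is visible but has no specialization tick; \
                 was it spawned after check_entities_needing_specialization ran?"
            );
            continue;
        };
        specialized.push((entity, tick));
    }
    specialized
}

fn main() {
    let mut ticks = HashMap::new();
    ticks.insert(1, 5);
    println!("{:?}", specialize_visible(&[1, 2], &ticks));
}
```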

tychedelia (Member) left a comment

If we move these to Last, we should also revert the change that moved AssetEvents into PostUpdate. I'd like to see some benchmarking too, if possible.

Trashtalk217 added the S-Waiting-on-Author label and removed the S-Needs-Review label May 16, 2025
alice-i-cecile changed the milestone from 0.16.1 to 0.16.2 May 30, 2025
mockersf changed the milestone from 0.16.2 to 0.17 Aug 18, 2025

alice-i-cecile (Member)

Hi @grind086, do you have time to do the requested changes right now? This has ended up in the 0.17 milestone, but I'm tempted to cut it unless it's fixed up in the next couple of days.

mirsella (Contributor)

Hey, I would really love for this to be in 0.17, as this crash is prominent in my Wasm builds.

alice-i-cecile (Member)

@mirsella, are you up for adopting this PR then? I'd like to get this in too, but we need help finishing this work.

mirsella (Contributor) commented Aug 20, 2025

As much as I would like to contribute at this level, sadly my knowledge of Bevy internals is nowhere near enough to understand the implications of each internal system ordering.

Also, I now remember that when I tried this fix, the crash still happened, so I removed the panic instead, to be sure:

let Some(entity_tick) = entity_specialization_ticks.get(visible_entity) else {
    continue;
};

Now I've tried 0.16.1 again without these patches, trying to get it to crash again, but couldn't reproduce it this time :(

But I can help if it's about moving AssetEvents back to Last, or with this:

Given we've rooted out a lot of the cold specialization bugs, I could be convinced that we maybe could swap the unwrap with a very noisy error log that points out the user is likely doing something wrong.

which is basically what I did in my fork, minus the log.

mirsella (Contributor)

What if I do a small PR for 0.17 that just removes the panic and uses an error! log instead, and this PR, which solves the source issue, can be completed at a later time?

alice-i-cecile (Member)

What if I do a small PR for 0.17 that just removes the panic and uses an error! log instead, and this PR, which solves the source issue, can be completed at a later time?

I would be very happy to review that PR :)

Labels: A-Rendering, C-Bug, D-Straightforward, P-Crash, S-Waiting-on-Author
Projects: Status: No status
Development: successfully merging this pull request may close these issues:
  • Option::unwrap panic in specialize_material2d_meshes
  • Wasm build hits panic on unwrap in specialize_material_meshes
8 participants