Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rp2: Fix lost CYW43 WiFi events when using both cores. #16915

Merged
merged 3 commits into from
Mar 26, 2025

Conversation

projectgus
Copy link
Contributor

@projectgus projectgus commented Mar 12, 2025

Summary

Closes #16779. Thanks to @coencoensmeets for a clear and easy to reproduce bug report, and to @Gadgetoid for doing some deep digging to zero in on the root cause.

There's a very odd but predictable sequence of events that breaks Wi-Fi when using both cores:

  1. CPU1 calls pendsv_suspend() - for example sleep() causes a softtimer node to be inserted, which calls pendsv_suspend().
  2. CYW43 sends wakeup IRQ. CPU0 GPIO IRQ handler schedules PendSV and disables the GPIO IRQ on CPU0, to re-enable after cyw43_poll() runs and completes.
  3. CPU0 PendSV_Handler runs, sees pendsv is suspended, exits.
  4. CPU1 calls pendsv_resume() and pendsv_resume() sees PendSV is pending and triggers it on CPU1.
  5. CPU1 runs PendSV_Handler, runs cyw43_poll(), and at the end it re-enables the pin IRQ but now on CPU1.

However CPU1 has the overall GPIO IRQ line disabled, so the CYW43 interrupt never runs again...

Fix is to specifically only enable the GPIO interrupt on CPU0. This isn't supported in pico-sdk but appears to be supported by the hardware.

Plus two follow-up commits:

  • Refactor out the cyw43_has_pending global flag to instead be a check of the pendsv table. This was necessary for the previous version of this PR. Not necessary now, but seems like a worthwhile change.
  • Enable PendSV interrupt at the correct priority on both CPUs, and even if CYW43 is not in use. Also adds some comments in PendSV explaining how it can run on either CPU.

This work was funded through GitHub Sponsors.

Testing

  • Ran the test Python program from the linked issue on both RPI_PICO_W and RPI_PICO2_W boards, verified the problem goes away.
  • Ran the multi_wlan test between RPI_PICO_W and RPI_PICO2_W and an ESP32, verified tests still pass.

Trade-offs and Alternatives

  • First version of this PR instead changed the interrupt to rising edge, so it didn't need to be disabled. This appears to work fine, and simplifies the code, but actually the CYW43 interrupt pin on RP2 is muxed with the data pin so the GPIO edge interrupt was triggering spuriously at a very high frequency. Oops!
  • @Gadgetoid has a PR rp2: pendsv add support for CPU affinity #16788 which adds support for CPU affinity in PendSV callbacks. We could add this support, although we don't really need the feature (everything that calls pendsv_schedule_dispatch() runs on CPU0).
  • As @Gadgetoid also suggested in rp2: pendsv add support for CPU affinity #16788, we could change PendSV to only ever fire on CPU0. In fact, comments in pendsv.c suggest that's the current expectation. I actually implemented this fix first, see projectgus@47723da . That is still more complex than this fix as it needs a cross-core PendSV trigger for when CPU1 does pendsv_resume(). However it might be easier to reason about overall.

Copy link

github-actions bot commented Mar 12, 2025

Code size report:

   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
     mimxrt:    +0 +0.000% TEENSY40
        rp2:   -12 -0.001% RPI_PICO_W[incl -4(bss)]
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

@projectgus

This comment was marked as outdated.

@Gadgetoid
Copy link
Contributor

Thank you for delving deeper into this and detailing the cause, I didn't quite understand why what I was seeing was happening, but now I do.

CYW43 never fails to surface the wildest bugs!

@dpgeorge dpgeorge added this to the release-1.25.0 milestone Mar 12, 2025
@projectgus

This comment was marked as outdated.

@projectgus
Copy link
Contributor Author

New version pushed which keeps the level-triggered interrupt but hand-rolls a function to only ever enable it on CPU0.

Have also a raised a feature request in pico-sdk to have this in the GPIO API: raspberrypi/pico-sdk#2354

@projectgus projectgus force-pushed the bugfix/rp2_multicore_wifi branch 2 times, most recently from 0854938 to 6a72c58 Compare March 18, 2025 05:23
@dpgeorge
Copy link
Member

The latest changes look really good. I will test them.

@projectgus projectgus force-pushed the bugfix/rp2_multicore_wifi branch from 6a72c58 to 069cd2f Compare March 24, 2025 23:08
@dpgeorge
Copy link
Member

I've done extensive tests of this PR:

  • tested RPI_PICO_W, RPI_PICO2_W and RPI_PICO2_W-RISCV
  • ran full test suite, all possible tests, including network and BLE
  • ran iperf3 test on RPI_PICO2_W-RISCV

I didn't see any issues or regressions.

There's a very odd but predictable sequence of events that breaks Wi-Fi
when using both cores:

1) CPU1 calls pendsv_suspend() - for example sleep() causes
   a softtimer node to be inserted, which calls pendsv_suspend().
2) CYW43 sends wakeup IRQ. CPU0 GPIO IRQ handler schedules PendSV
   and disables the GPIO IRQ on CPU0, to re-enable after
   cyw43_poll() runs and completes.
3) CPU0 PendSV_Handler runs, sees pendsv is suspended, exits.
4) CPU1 calls pendsv_resume() and pendsv_resume() sees PendSV
   is pending and triggers it on CPU1.
5) CPU1 runs PendSV_Handler, runs cyw43_poll(), and at the end
   it re-enables the IRQ *but now on CPU1*.

However CPU1 has GPIO IRQs disabled, so the CYW43 interrupt never runs
again...

The fix in this commit is to always enable/disable the interrupt on CPU0.
This isn't supported by the pico-sdk, but it is supported by the hardware.

Fixes issue micropython#16779.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <[email protected]>
A better indication of whether a cyw43 event is pending is the actual flag
in the PendSV handler table. (If this fails, could also use the GPIO
interrupt enabled register bit).

This commit was needed of a previous version of the fix in the parent
commit, but it turned out not strictly necessary for the current version.
However, it's still a good clean up.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <[email protected]>
Changes:
- Move setting of PendSV priority to pendsv_init().
- Call pendsv_init() from CPU1 as well, to ensure priority is the same.

This work was funded through GitHub Sponsors.

Signed-off-by: Angus Gratton <[email protected]>
@dpgeorge dpgeorge force-pushed the bugfix/rp2_multicore_wifi branch from 069cd2f to 35d4d2d Compare March 26, 2025 13:09
@dpgeorge dpgeorge merged commit 35d4d2d into micropython:master Mar 26, 2025
8 checks passed
@dpgeorge
Copy link
Member

Thanks for splitting this up into a set of very clean, independent commits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RP2: Pico 2 W Not connecting with thread running
3 participants