Skip to content

Add multicore example #24

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 16, 2025
Merged

Add multicore example #24

merged 5 commits into from
Apr 16, 2025

Conversation

jonathanpallant
Copy link
Contributor

Adds a multi-core example for MPS3-AN536.

Builds on #23, which adds testing for MPS3-AN536

cc @robamu

@jonathanpallant jonathanpallant force-pushed the add-multicore-example branch from 1c913ae to 9b9875d Compare April 4, 2025 17:53
@thejpster
Copy link
Contributor

Oops. Forgot to update the expected output for the smp test.

CAS test passed
CS Mutex test passed

@thejpster
Copy link
Contributor

Idea - use TCM to store the stack. The Core 0 stack can be set by assigning _stack_top in memory.x to the top of either ATCM0, BTCM0 or CTCM0. We can pass a pointer to the top of ATCM1, BTCM1 or CTCM1 when starting Core 1. See https://github.com/qemu/qemu/blob/53f3a13ac1069975ad47cf8bd05cc96b4ac09962/hw/arm/mps3r.c#L157.

I'm still looking for a better spin lock variable than the FPGA LED control register.

@thejpster
Copy link
Contributor

There is no defined secondary boot protocol for Linux for the AN536, because real hardware has a restriction that atomic operations between the two CPUs do not function correctly, and so true SMP is not possible. Link

Amazing. Apologies to anyone who tries this example on a real MPS3 loaded with the AN536 image.

I guess we could use the GICv3 and send an interrupt to wake up the second core.

@robamu
Copy link
Contributor

robamu commented Apr 5, 2025

What about wfi/wfe and sev?

Really cool that you added a multi-core Mutex. I am completely unfamiliar with correctly using/implementing SMP for bare-metal apps. I know that AMP is also common on the Zynq7000, because you can just use the FPGA a a mailbox mechanism (circumventing issues like cache coherency, proper multi-core locks etc.), and just run 2 single-core apps on each processor.
I guess issues like cache coherency still be problematic even with a correctly implemented multi-core mutex?

@thejpster
Copy link
Contributor

wfe doesn't work in QEMU, at least not with GDB connected. It just goes straight over the instruction.

@thejpster
Copy link
Contributor

I guess issues like cache coherency still be problematic

In theory, you configure the MPU to mark the appropriate address ranges as cacheable or not cacheable, and the hardware should just sort everything else out?

@jonathanpallant jonathanpallant force-pushed the add-multicore-example branch from 5a66532 to cb1357f Compare April 5, 2025 14:41
@jonathanpallant
Copy link
Contributor Author

Switched to using TCM for the stacks, and squashed to remove all the redundant 'struct Stack' code.

@jonathanpallant
Copy link
Contributor Author

Wait, this might not be sound. The setup_stacks function assumes that LR is preserved, but we switch modes a bunch of times and that switches us to a different LR if we end up in sys mode but we didn't arrive in sys mode? We should store the incoming LR somewhere.

@jonathanpallant jonathanpallant force-pushed the add-multicore-example branch from cb1357f to f52c823 Compare April 9, 2025 12:29
@jonathanpallant
Copy link
Contributor Author

Rebased after #23 came in

@jonathanpallant jonathanpallant force-pushed the add-multicore-example branch from f52c823 to ff75c22 Compare April 9, 2025 12:34
@jonathanpallant
Copy link
Contributor Author

Wait, this might not be sound. The setup_stacks function assumes that LR is preserved, but we switch modes a bunch of times and that switches us to a different LR if we end up in sys mode but we didn't arrive in sys mode? We should store the incoming LR somewhere.

Hopefully I fixed this.

We need our SMP applications to be able to intialise the stacks on
the secondary cores, so we break that function out separately. I also
allocated some space on Armv8-R for the HYP mode stack, and drew a
diagram to visualize where the stacks are.
@jonathanpallant
Copy link
Contributor Author

Rebased now #27 is in, and fixed up some minor typos.

let core_id = crate::asm::core_id();

let locked_already = loop {
match CORE_SPIN_LOCK.compare_exchange(
Copy link
Contributor

@robamu robamu Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are on ARM, so maybe compare_exchange_weak works as well? I'd need to check the Armv7a ref manual again, but IIRC, it uses these special instructions (LL/SC) to load and store .

Copy link
Contributor Author

@jonathanpallant jonathanpallant Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think spurious failure here would be bad, as we might believe we already hold the lock when it in fact we took the lock. Then we would unlock incorrectly.

But I'm happy for someone to argue that it would in fact be correct.

Copy link
Contributor

@robamu robamu Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right. It probably only works when there is a re-try on all error cases.

match CORE_SPIN_LOCK.compare_exchange(
UNLOCKED,
core_id,
atomic::Ordering::Acquire,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to understand why this does not have to be AcqRel .. is it because of the compiler fence after the loop?

Copy link
Contributor Author

@jonathanpallant jonathanpallant Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mara's example at https://marabos.nl/atomics/building-spinlock.html uses Acquire, Relaxed, as does spin.

As I understand it, we need a load-Acquire when locking the spin-lock, which works with the the store-Release when the spin-lock is unlocked to ensure that the unlock completes before the lock can be taken. Because the lock can only be locked by one thing at a time, there is no subsequent load-Acquire of the spin-lock that we need to ensure happens after the spin-lock is locked. But my grasp on this is tenuous at best.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'll definitely read that. I read Jons Rust for Rustacean book about that, but its a very tough subject..

@jonathanpallant jonathanpallant merged commit 6f287c1 into main Apr 16, 2025
57 checks passed
@jonathanpallant jonathanpallant deleted the add-multicore-example branch April 16, 2025 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants