-
Notifications
You must be signed in to change notification settings - Fork 8
Add multicore example #24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1c913ae
to
9b9875d
Compare
Oops. Forgot to update the expected output for the smp test.
|
Idea - use TCM to store the stack. The Core 0 stack can be set by assigning _stack_top in memory.x to the top of either ATCM0, BTCM0 or CTCM0. We can pass a pointer to the top of ATCM1, BTCM1 or CTCM1 when starting Core 1. See https://github.com/qemu/qemu/blob/53f3a13ac1069975ad47cf8bd05cc96b4ac09962/hw/arm/mps3r.c#L157. I'm still looking for a better spin lock variable than the FPGA LED control register. |
Amazing. Apologies to anyone who tries this example on a real MPS3 loaded with the AN536 image. I guess we could use the GICv3 and send an interrupt to wake up the second core. |
What about Really cool that you added a multi-core Mutex. I am completely unfamiliar with correctly using/implementing SMP for bare-metal apps. I know that AMP is also common on the Zynq7000, because you can just use the FPGA a a mailbox mechanism (circumventing issues like cache coherency, proper multi-core locks etc.), and just run 2 single-core apps on each processor. |
wfe doesn't work in QEMU, at least not with GDB connected. It just goes straight over the instruction. |
In theory, you configure the MPU to mark the appropriate address ranges as cacheable or not cacheable, and the hardware should just sort everything else out? |
5a66532
to
cb1357f
Compare
Switched to using TCM for the stacks, and squashed to remove all the redundant 'struct Stack' code. |
Wait, this might not be sound. The setup_stacks function assumes that LR is preserved, but we switch modes a bunch of times and that switches us to a different LR if we end up in sys mode but we didn't arrive in sys mode? We should store the incoming LR somewhere. |
cb1357f
to
f52c823
Compare
Rebased after #23 came in |
f52c823
to
ff75c22
Compare
Hopefully I fixed this. |
Uses an atomic variable as a spin-lock.
7c3f48e
to
f1c4582
Compare
We need our SMP applications to be able to intialise the stacks on the secondary cores, so we break that function out separately. I also allocated some space on Armv8-R for the HYP mode stack, and drew a diagram to visualize where the stacks are.
46e9f8d
to
e444249
Compare
Rebased now #27 is in, and fixed up some minor typos. |
let core_id = crate::asm::core_id(); | ||
|
||
let locked_already = loop { | ||
match CORE_SPIN_LOCK.compare_exchange( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are on ARM, so maybe compare_exchange_weak
works as well? I'd need to check the Armv7a ref manual again, but IIRC, it uses these special instructions (LL/SC) to load and store .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think spurious failure here would be bad, as we might believe we already hold the lock when it in fact we took the lock. Then we would unlock incorrectly.
But I'm happy for someone to argue that it would in fact be correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right. It probably only works when there is a re-try on all error cases.
match CORE_SPIN_LOCK.compare_exchange( | ||
UNLOCKED, | ||
core_id, | ||
atomic::Ordering::Acquire, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to understand why this does not have to be AcqRel
.. is it because of the compiler fence after the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mara's example at https://marabos.nl/atomics/building-spinlock.html uses Acquire, Relaxed
, as does spin
.
As I understand it, we need a load-Acquire when locking the spin-lock, which works with the the store-Release when the spin-lock is unlocked to ensure that the unlock completes before the lock can be taken. Because the lock can only be locked by one thing at a time, there is no subsequent load-Acquire of the spin-lock that we need to ensure happens after the spin-lock is locked. But my grasp on this is tenuous at best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I'll definitely read that. I read Jons Rust for Rustacean book about that, but its a very tough subject..
Adds a multi-core example for MPS3-AN536.
Builds on #23, which adds testing for MPS3-AN536
cc @robamu