-
Notifications
You must be signed in to change notification settings - Fork 199
Transient boot selection #2050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Transient boot selection #2050
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update fn boot_preferences
to read our transient preference (GitHub won't let me comment on code not in the PR)
4864f92
to
66acc49
Compare
66acc49
to
6e9da78
Compare
83fc549
to
d94e656
Compare
done |
d94e656
to
f1e94f5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once the MGS side is ready I will hit approve
8d68a85
to
87a91ed
Compare
See bootleby:src/lib.rs for the receiving end of the override selection.
Given the current API for setting the transient boot preference there is an implementation choice. While setting the transient selection to the alternate image (currently non-active) is the common use case, we also need to define what happens when the transient preference is specified to be the currently active RoT Hubris image. The simple case is that setting transient to active will clear the transient selection. That is the choice this PR makes. The other case, setting an actual transient preference to the active gives rise to some more complex behavior that is just annoying. Assume that A is active, B is the pending persistent preference, and A is also the transient preference. On the first reset, the ROM will promote the CFPA scratch page to active, thus making B the persistent preference. However, Bootleby will use the transient preference of A to select the active image. The RoT will then be running Hubris image A. Bootleby has cleared the transient selection. On the second reset, there is no transient selection or pending persistent. Bootleby will select Hubris image B. These scenarios have been tested and analyzed and there is no motivation found for enabling this sequence. If, after merging this PR, it is found that we do want to enable this sequence of two resets in a row running different images, we can change the API to an enum of `Select(SlotId)` or `ClearSelection`. --- Hubris issue 2093 was found in testing where setting the persistent preference to different values in succession leads to the inability to clear a pending persistent boot preference. This may be needed to recover from a previous failed/abandoned RoT Hubris update where one wants to avoid additional reset operations. This is a corner case with a workaround, but it is also against the intent of the API.
87a91ed
to
a780177
Compare
Fixes from review. Fix memory alias for RoT Hubris 'B' image's access to transient boot override setting. `component_update_prep_image` was using `bootstate` to determine active image. The `same_image` function is a more direct test and does not depend on `bootstate` being valid.
a780177
to
e74fe24
Compare
stage0_handoff::RotSlot::B => SlotId::B, | ||
}; | ||
|
||
// If the requested slot is the same as the active slot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These semantics seem confusing / non-obvious. I think it would make more sense to explicitly return an error (InvalidPreferredSlotId
?) if someone tries to set the transient override to be the same as the active slot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for the case where a previous update attempt has been abandoned or failed and left a transient preference that the new session does not want. We want to enable resetting an existing transient preference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This semantic came out of a fault insertion test where this type of abandoned session was simulated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still seems non ideal for the developer user case. I brought up not clearing the transient setting without some indication before #2050 (comment) albeit in a slightly different place. I'd rather have an explicit way to clear the transient setting if we want that behavior. This would require MGS changes but I'd argue this is a gap in how the original API was designed.
The use of (This is admittedly not a new problem introduced in the PR, but the PR adds another case of it) I think the right way to handle this is using an Here's an (untested) patch which does just that: diff --git a/app/lpc55xpresso/app.toml b/app/lpc55xpresso/app.toml
index cc03b0de53..48b479bacb 100644
--- a/app/lpc55xpresso/app.toml
+++ b/app/lpc55xpresso/app.toml
@@ -51,7 +51,8 @@
max-sizes = {flash = 26720, ram = 16704}
stacksize = 8192
start = true
-sections = {bootstate = "usbsram", transient_override = "override"}
+sections = {bootstate = "usbsram"}
+extern-regions = ["transient_override"]
uses = ["flash_controller", "hash_crypt"]
notifications = ["flash-irq", "hashcrypt-irq"]
interrupts = {"flash_controller.irq" = "flash-irq", "hash_crypt.irq" = "hashcrypt-irq"}
diff --git a/app/oxide-rot-1/app-dev.toml b/app/oxide-rot-1/app-dev.toml
index 50a945c016..9d45ad973f 100644
--- a/app/oxide-rot-1/app-dev.toml
+++ b/app/oxide-rot-1/app-dev.toml
@@ -55,7 +55,8 @@
priority = 3
stacksize = 8192
start = true
-sections = {bootstate = "usbsram", transient_override = "override"}
+sections = {bootstate = "usbsram"}
+extern-regions = ["transient_override"]
uses = ["flash_controller", "hash_crypt"]
notifications = ["flash-irq", "hashcrypt-irq"]
interrupts = {"flash_controller.irq" = "flash-irq", "hash_crypt.irq" = "hashcrypt-irq"}
diff --git a/app/oxide-rot-1/app.toml b/app/oxide-rot-1/app.toml
index 9e12953bad..0c45eb254d 100644
--- a/app/oxide-rot-1/app.toml
+++ b/app/oxide-rot-1/app.toml
@@ -43,7 +43,8 @@
priority = 3
stacksize = 8192
start = true
-sections = {bootstate = "usbsram", transient_override = "override"}
+sections = {bootstate = "usbsram"}
+extern-regions = ["transient_override"]
uses = ["flash_controller", "hash_crypt"]
notifications = ["flash-irq", "hashcrypt-irq"]
interrupts = {"flash_controller.irq" = "flash-irq", "hash_crypt.irq" = "hashcrypt-irq"}
diff --git a/app/rot-carrier/app.toml b/app/rot-carrier/app.toml
index 3072a96e1c..74c20d7d64 100644
--- a/app/rot-carrier/app.toml
+++ b/app/rot-carrier/app.toml
@@ -43,7 +43,8 @@
# TODO size this appropriately
stacksize = 8192
start = true
-sections = {bootstate = "usbsram", transient_override = "override"}
+sections = {bootstate = "usbsram"}
+extern-regions = ["transient_override"]
uses = ["flash_controller", "hash_crypt"]
notifications = ["flash-irq", "hashcrypt-irq"]
interrupts = {"flash_controller.irq" = "flash-irq", "hash_crypt.irq" = "hashcrypt-irq"}
diff --git a/chips/lpc55/memory.toml b/chips/lpc55/memory.toml
index 0125777412..c6322ef47d 100644
--- a/chips/lpc55/memory.toml
+++ b/chips/lpc55/memory.toml
@@ -93,7 +93,7 @@
execute = false
dma = true
-[[override]]
+[[transient_override]]
name = "a"
address = 0x2003ffe0
size = 32
@@ -101,7 +101,7 @@
write = true
execute = false
-[[override]]
+[[transient_override]]
name = "b"
address = 0x2003ffe0
size = 32
diff --git a/drv/lpc55-update-server/src/main.rs b/drv/lpc55-update-server/src/main.rs
index 8f78b91c42..3cc2898d46 100644
--- a/drv/lpc55-update-server/src/main.rs
+++ b/drv/lpc55-update-server/src/main.rs
@@ -41,10 +41,6 @@
#[link_section = ".bootstate"]
static BOOTSTATE: MaybeUninit<[u8; 0x1000]> = MaybeUninit::uninit();
-#[used]
-#[link_section = ".transient_override"]
-static mut TRANSIENT_OVERRIDE: MaybeUninit<[u8; 32]> = MaybeUninit::uninit();
-
#[derive(Copy, Clone, PartialEq)]
enum UpdateState {
NoUpdate,
@@ -1416,22 +1412,39 @@
RotBootStateV2::load_from_addr(addr)
}
+extern "C" {
+ // Symbols injected by the linker.
+ //
+ // This requires adding `extern-regions = ["transient_override"]` to the
+ // task config
+ pub static mut __REGION_TRANSIENT_OVERRIDE_BASE: [u32; 0];
+}
+
fn set_transient_override(preference: [u8; 32]) {
- // Safety: Data is consumed by Bootleby on next boot.
- // There are no concurrent writers possible.
- // Calling this function multiple times is ok.
- // Bootleby is careful to vet contents before acting.
- unsafe {
- TRANSIENT_OVERRIDE.write(preference);
+ // SAFETY: populated by the linker, getting the address is fine
+ let override_addr =
+ unsafe { __REGION_TRANSIENT_OVERRIDE_BASE.as_ptr() } as *mut u8;
+ for (i, p) in preference.iter().enumerate() {
+ // SAFETY: this points to a valid region of RAM that is otherwise unused
+ // by Rust, so we can write to it.
+ unsafe {
+ core::ptr::write_volatile(override_addr.wrapping_add(i), *p);
+ }
}
}
fn get_transient_override() -> [u8; 32] {
- // Safety: Data is consumed by Bootleby on next boot.
- // There are no concurrent writers possible.
- // Bootleby consumes and resets TRANSIENT_OVERRIDE.
- // The client may be verifying state set during update flows.
- unsafe { TRANSIENT_OVERRIDE.assume_init() }
+ // SAFETY: populated by the linker, getting the address is fine
+ let override_addr =
+ unsafe { __REGION_TRANSIENT_OVERRIDE_BASE.as_ptr() } as *const u8;
+ let mut out = [0u8; 32];
+ for (i, p) in out.iter_mut().enumerate() {
+ // SAFETY: values are guaranteed to be initialized by stage0
+ unsafe {
+ *p = core::ptr::read_volatile(override_addr.wrapping_add(i));
+ }
+ }
+ out
}
// Preference constants are taken from bootleby:src/lib.rs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(marking this as "Request changes" to discuss returning an error if the persistent slot matches the active slot)
Co-authored-by: Matt Keeter <[email protected]>
Co-authored-by: Matt Keeter <[email protected]>
stage0_handoff::RotSlot::B => SlotId::B, | ||
}; | ||
|
||
// If the requested slot is the same as the active slot, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still seems non ideal for the developer user case. I brought up not clearing the transient setting without some indication before #2050 (comment) albeit in a slightly different place. I'd rather have an explicit way to clear the transient setting if we want that behavior. This would require MGS changes but I'd argue this is a gap in how the original API was designed.
// If we did allow setting the active slot as the transient | ||
// preference, then we get this confusing 2-boot scenario when | ||
// pending persistent preference is also used: | ||
// the next reset returns us to the original image and the 2nd | ||
// reset has us running the new/alternate image. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement seems strange in contrast to "This is for the case where a previous update attempt has been abandoned or failed and left a transient preference that the new session does not want" from above. If we are running active slot A
and alternate slot is B
, an incomplete/aborted update will always be targeting B
. Why do we have a persistent setting for B
if the update was aborted?
Automated update should take advantage of transient boot selection. This is the last piece to make that feature available.
See RFD 374 for a description and bootleby source for the logic being driven.