Making panic-never a requirement or convention for rust-embedded libraries where feasible #551

Open
@mitchmindtree

TL;DR

It would be great if rust-embedded adopted panic-never as a standard for libraries. I found it impossible to take advantage of panic-never while also using the rust-embedded libraries necessary to build my first non-trivial Rust embedded project, due to frequent uses of panic! throughout those libraries. This is totally understandable, as Rust itself provides very little tooling to avoid panics and almost encourages them (e.g. indexing, slicing). While panic! is very useful for quick iteration in software, it can be detrimental to firmware without significant tooling/logging in the panic handler. That isn't always feasible for embedded projects, especially given the severe lack of context in the panic handler compared to regular error handling. The newish no-panic crate may help with retrofitting panic-never, and const generics may help to avoid common causes of panicking code branches.


Context & Motivation

This suggestion comes from my experience writing firmware for a kinetic artwork over the past few months. I finally have had some time to reflect and thought I'd open an issue here to get others' thoughts on this :) It seems particularly good timing considering that avoiding Rust panics seems to be the hot topic for landing Rust support in the Linux kernel today https://lkml.org/lkml/2021/4/14/1099.

This was one of my first major firmware projects, involving a pretty tall stack of protocols including SPI for LEDs, I2C for time of flight, one-wire UART for motor driver control and Ethernet for real-time TCP/IP communication with the master software. Completing this project within the deadline that we had would have been impossible without all the awesome existing work in the rust-embedded ecosystem. The fact that I could include an Ethernet bootloader, serialization between software and firmware, a real-time scheduler and more thanks to existing work made the project possible! Naturally, this required leaning on quite a few dependencies, including postcard (and in turn serde), rtic, smoltcp and loads of others.

During early prototype testing, I quickly learned just how drastic panic!ing could be in firmware compared to my experience with writing software, particularly when controlling a large number of motors attached to expensive parts. This led me to search for solutions to ensure that I could avoid panicking entirely, which led me to the panic-never crate.

After a few days of commenting out the entire project and trying to add modules back one by one with panic-never included, I quickly realised that, while I could track down and address all of the panic! sites in my own code, it would be impossible for me to track down and address all panic! sites throughout all the dependencies that I required for the project to function - especially considering the limited, cryptic linker errors that panic-never could provide, resulting in an approach that consisted of commenting everything out and re-adding parts one at a time until the linker error showed up.

Following the realisation that I would have to accept the possibility of panic!s, I began work on a custom panic handler. Easily the largest problem with the custom panic handler was the lack of context, and not knowing what state the device was in when the panic occurred... This led to the need for moving parts of the application state into global state. This was necessary to 1. send some indication of an error back to the master via Ethernet (provided it was even possible to do so in the panicking state) and 2. disable the motor via UART! This was of utmost importance as the motor driver has its own step generator, and if the last thing it received was some high velocity before the panic, then there was nothing else stopping it from endlessly driving out the motors until someone freaks out and cuts the power 😱

Beyond the obvious reasons why moving state into a global context was unpleasant, I was using RTIC to handle scheduling. RTIC requires managing state in a certain way in order for its priority task system to function in a safe-yet-efficient manner. This meant lots of acrobatics with mutexes and critical sections in order to expose the necessary networking and motor state to the panic handler through a global context, much of which I'm still uncertain is actually safe to this day.


I want to acknowledge that all of these problems are ultimately our own fault. Specifically, for cornering ourselves by accepting a timeline for a project that meant I simply couldn't both 1. take advantage of many of the awesome existing crates throughout the Rust ecosystem that were necessary to make such a sophisticated project possible in a short amount of time and 2. actually review all of these dependencies and develop enough familiarity with their source to guarantee there could be no panic! conditions throughout. It is this choice that led to the need for the aforementioned hacks and awkward panic handling solution.

That said, I think it is at least worth checking whether or not it is possible to have our cake and eat it too by investigating the feasibility of having panic-never as a standard practice for embedded libraries. I cannot tell you how much of a relief it would be to know for certain that it simply wasn't possible to panic, particularly when the firmware is moving 100s of motors around on an artist's budget that provides very little room for repairs 😂 While custom panic handlers help, by default they provide almost no context about the state of the system during a panic, and they encourage some serious anti-patterns in order to handle those cases.

no-panic

I think perhaps this is more achievable now that no-panic exists, allowing for a more granular approach to narrowing down panic! sites, also with slightly better error messages. The function attribute approach allows for achieving a panic-less codebase one function at a time, without having to solve everything at once as is required with panic-never alone.

Indexing, slicing and const generics

I think const generics may also play a large role in making this possible. Perhaps the sneakiest and most prevalent culprit for introducing panicking code is Rust's core indexing and slicing operations. This is especially frustrating when most embedded code works with fixed-size arrays, where the author performing the indexing/slicing knows that it is safe to do so and that it is impossible for the panic to actually occur. I wonder if we can come up with some const-generic based approach to bounds checking for indexing and slicing of fixed-size arrays that avoids the need for generating panicking branches.
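To make the idea a little more concrete, here is a minimal sketch of what such an approach might look like using only the standard library. All of the names (`AssertInBounds`, `get_const`, `first_n`, `read_u16_be`) are hypothetical and made up for illustration; they are not from any existing crate. The first helper moves the bounds check to compile time when both the index and the array length are constants. The second turns a slice into a fixed-size array reference via `Option` rather than a panicking slice expression. I have not verified the generated code against panic-never or no-panic, so treat the "no panicking branch" part as an expectation for optimised builds rather than a guarantee.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: evaluating `OK` fails compilation when `I >= N`.
struct AssertInBounds<const I: usize, const N: usize>;

impl<const I: usize, const N: usize> AssertInBounds<I, N> {
    const OK: () = assert!(I < N, "index out of bounds for fixed-size array");
}

/// Returns element `I` of a fixed-size array, with the bounds check
/// performed at compile time instead of at run time.
pub fn get_const<const I: usize, const N: usize, T>(arr: &[T; N]) -> &T {
    // Referencing the constant forces the check for this `I`/`N` pair.
    let () = AssertInBounds::<I, N>::OK;
    // `I < N` is now a compile-time fact. Assumption: an optimised build
    // folds the remaining runtime check away (not verified here).
    &arr[I]
}

/// Views the first `N` elements of a slice as a fixed-size array.
/// Returns `None` instead of panicking when the slice is too short.
pub fn first_n<const N: usize, T>(slice: &[T]) -> Option<&[T; N]> {
    let head = slice.get(..N)?;
    <&[T; N]>::try_from(head).ok()
}

/// Example use: decode a big-endian `u16` from the start of a frame.
/// `N = 2` is inferred from the type annotation on `bytes`.
pub fn read_u16_be(frame: &[u8]) -> Option<u16> {
    let bytes: &[u8; 2] = first_n(frame)?;
    Some(u16::from_be_bytes(*bytes))
}
```

With these definitions, `get_const::<2, 4, u32>(&arr)` compiles for a four-element array, while `get_const::<4, 4, u32>(&arr)` is rejected at build time rather than panicking on the device. The obvious limitation is that this only helps when the index is known at compile time; runtime indices still need something like `get` and explicit handling of the `None` case.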

rustc

Another approach might be to instead focus on landing support for avoiding panics in rustc itself? While no-panic is already a big improvement over panic-never, it is still a long way from having a nicely formatted call stack with line-numbered links to the source code of each function call that leads to each panic. I have yet to investigate existing proposals for such a tool.


My aim with this issue is mostly to begin a discussion. I'm curious to hear others' thoughts, e.g. have you had similar experiences? Is this a worthy/practical goal? Or perhaps infeasible for reasons I haven't touched on yet?
