diff --git a/STYLE.md b/STYLE.md index 88d57d2dda98..9d4accdfe606 100644 --- a/STYLE.md +++ b/STYLE.md @@ -226,9 +226,23 @@ collapsed or removed entirely from the slide. - Where to pause and engage the class with questions. -- Speaker notes are not a script for the instructor. When teaching the course, - instructors only have a short time to glance at the notes. Don't include full - paragraphs for the instructor to read out loud. +- Speaker notes should serve as a quick reference for instructors, not a + verbatim script. Because instructors have limited time to glance at notes, the + content should be concise and easy to scan. + + **Avoid** long, narrative paragraphs meant to be read aloud: + > **Bad:** _"In this example, we define a trait named `StrExt`. This trait has + > a single method, `is_palindrome`, which takes a `&self` receiver and returns + > a boolean value indicating if the string is the same forwards and + > backwards..."_ + + **Instead, prefer** bullet points with background information or actionable + **teaching prompts**: + > **Good:** + > + > - Note: The `Ext` suffix is a common convention. + > - Ask: What happens if the `use` statement is removed? + > - Demo: Comment out the `use` statement to show the compiler error. - Nevertheless, include all of the necessary teaching prompts for the instructor in the speaker notes. 
Unlike the main content, the speaker notes don't have to diff --git a/src/SUMMARY.md b/src/SUMMARY.md index dfd360edf6b4..06f62e009670 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -445,6 +445,13 @@ - [Serializer: implement Struct](idiomatic/leveraging-the-type-system/typestate-pattern/typestate-generics/struct.md) - [Serializer: implement Property](idiomatic/leveraging-the-type-system/typestate-pattern/typestate-generics/property.md) - [Serializer: Complete implementation](idiomatic/leveraging-the-type-system/typestate-pattern/typestate-generics/complete.md) + - [Token Types](idiomatic/leveraging-the-type-system/token-types.md) + - [Permission Tokens](idiomatic/leveraging-the-type-system/token-types/permission-tokens.md) + - [Token Types with Data: Mutex Guards](idiomatic/leveraging-the-type-system/token-types/mutex-guard.md) + - [Branded pt 1: Variable-specific tokens](idiomatic/leveraging-the-type-system/token-types/branded-01-motivation.md) + - [Branded pt 2: `PhantomData` and Lifetime Subtyping](idiomatic/leveraging-the-type-system/token-types/branded-02-phantomdata.md) + - [Branded pt 3: Implementation](idiomatic/leveraging-the-type-system/token-types/branded-03-impl.md) + - [Branded pt 4: Branded types in action.](idiomatic/leveraging-the-type-system/token-types/branded-04-in-action.md) --- diff --git a/src/idiomatic/leveraging-the-type-system/token-types.md b/src/idiomatic/leveraging-the-type-system/token-types.md new file mode 100644 index 000000000000..c3b2168d3cc8 --- /dev/null +++ b/src/idiomatic/leveraging-the-type-system/token-types.md @@ -0,0 +1,72 @@ +--- +minutes: 15 +--- + +# Token Types + +Types with private constructors can be used to act as proof of invariants. + + +```rust,editable +pub mod token { + // A public type with private fields behind a module boundary. 
+    pub struct Token { proof: () }
+
+    pub fn get_token() -> Option<Token> {
+        Some(Token { proof: () })
+    }
+}
+
+pub fn protected_work(token: token::Token) {
+    println!("We have a token, so we can make assumptions.")
+}
+
+fn main() {
+    if let Some(token) = token::get_token() {
+        // We have a token, so we can do this work.
+        protected_work(token);
+    } else {
+        // We could not get a token, so we can't call `protected_work`.
+    }
+}
+```
+
+
+
+
+- Motivation: We want to be able to restrict users' access to functionality
+  until they've performed a specific task.
+
+  We can do this by defining a type the API consumer cannot construct on their
+  own, through the privacy rules of structs and modules.
+
+  [Newtypes](./newtype-pattern.md) use the privacy rules in a similar way, to
+  restrict construction unless a value is guaranteed to hold up an invariant at
+  runtime.
+
+- Ask: What is the purpose of the `proof: ()` field here?
+
+  Without `proof: ()`, `Token` would have no private fields and users would be
+  able to construct values of `Token` arbitrarily.
+
+  Demonstrate: Try to construct the token manually in `main` and show the
+  compilation error.
+
+  Demonstrate: Remove the `proof` field from `Token` to show how users would be
+  able to construct `Token` if it had no private fields.
+
+- By putting the `Token` type behind a module boundary (`token`), users outside
+  that module can't construct the value on their own as they don't have
+  permission to access the `proof` field.
+
+  The API developer gets to define methods and functions that produce these
+  tokens. The user does not.
+
+  The token becomes a proof that one has met the API developer's conditions of
+  access for those tokens.
+
+- Ask: How might an API developer accidentally introduce ways to circumvent
+  this?
+
+  Expect answers like "serialization implementations", other parser/"from
+  string" implementations, or an implementation of `Default`.
+
+
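The `Default` answer to the last prompt can be shown concretely. The following is a minimal sketch, not part of the slide: the `leaky` module, `forge_token`, and the always-failing `get_token` are hypothetical names chosen for illustration. It shows that deriving `Default` hands every caller a public constructor even though the `proof` field stays private.

```rust
mod leaky {
    // Hypothetical token type for illustration. The field is still private,
    // but `#[derive(Default)]` generates a public `Token::default()`.
    #[derive(Default)]
    pub struct Token {
        proof: (),
    }

    // The intended gatekeeper. In this sketch the check always fails.
    pub fn get_token() -> Option<Token> {
        None
    }
}

// Work that is supposed to require a successfully obtained token.
fn protected_work(_token: leaky::Token) -> &'static str {
    "protected work ran"
}

// Obtains a token without ever passing the check in `get_token`.
fn forge_token() -> leaky::Token {
    // `leaky::Token { proof: () }` would be rejected here: `proof` is private.
    leaky::Token::default()
}
```

Calling `protected_work(forge_token())` succeeds although `leaky::get_token()` never hands out a token, which is why a token type should only implement traits that cannot be used to construct it.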
diff --git a/src/idiomatic/leveraging-the-type-system/token-types/branded-01-motivation.md b/src/idiomatic/leveraging-the-type-system/token-types/branded-01-motivation.md
new file mode 100644
index 000000000000..29e4e628eb1f
--- /dev/null
+++ b/src/idiomatic/leveraging-the-type-system/token-types/branded-01-motivation.md
@@ -0,0 +1,78 @@
+---
+minutes: 10
+---
+
+# Variable-Specific Tokens (Branding 1/4)
+
+What if we want to tie a token to a specific variable?
+
+```rust,editable
+struct Bytes {
+    bytes: Vec<u8>,
+}
+struct ProvenIndex(usize);
+
+impl Bytes {
+    fn get_index(&self, ix: usize) -> Option<ProvenIndex> {
+        if ix < self.bytes.len() { Some(ProvenIndex(ix)) } else { None }
+    }
+    fn get_proven(&self, token: &ProvenIndex) -> u8 {
+        unsafe { *self.bytes.get_unchecked(token.0) }
+    }
+}
+
+fn main() {
+    let data_1 = Bytes { bytes: vec![0, 1, 2] };
+    if let Some(token_1) = data_1.get_index(2) {
+        data_1.get_proven(&token_1); // Works fine!
+
+        // let data_2 = Bytes { bytes: vec![0, 1] };
+        // data_2.get_proven(&token_1); // Out of bounds! Can we prevent this?
+    }
+}
+```
+
+
+
+- What if we want to tie a token to a _specific variable_ in our code? Can we do
+  this in Rust's type system?
+
+- Motivation: We want to have a Token Type that represents a known, valid index
+  into a byte array.
+
+  Once we have these proven indexes we would be able to avoid bounds checks
+  entirely, as the tokens would act as the _proof of an existing index_.
+
+  Since the index is known to be valid, `get_proven()` can skip the bounds
+  check.
+
+  In this example there's nothing stopping the proven index of one array being
+  used on a different array. If an index is out of bounds in this case, it is
+  undefined behavior.
+
+- Demonstrate: Uncomment the `let data_2 = ...` and
+  `data_2.get_proven(&token_1);` lines.
+
+  The out-of-bounds access is undefined behavior. A debug build may stop with a
+  panic, but that is not guaranteed. We want to prevent this "crossover" of
+  token types for indexes at compile time.
+
+- Ask: How might we try to do this?
+
+  Expect students to not reach a good implementation from this, but be willing
+  to experiment and follow through on suggestions.
+
+- Ask: What are the alternatives, and why are they not good enough?
+
+  Expect runtime checking of index bounds, especially as both `Vec::get` and
+  `Bytes::get_index` already use runtime checking.
+
+  Runtime bounds checking does not prevent the erroneous crossover in the first
+  place; it only turns an out-of-bounds access into a panic or a `None`.
+
+- The kind of token-association we will be doing here is called Branding. This
+  is an advanced technique that expands applicability of token types to more API
+  designs.
+
+- [`GhostCell`](https://plv.mpi-sws.org/rustbelt/ghostcell/paper.pdf) is a
+  prominent user of this technique; later slides will touch on it.
+
+
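The runtime-checking alternative discussed in the notes can be sketched as follows. This is an illustrative variant, not the slide's code: `CheckedIndex` and `get_checked` are hypothetical names, and the unsafe access is swapped for the bounds-checked `slice::get`.

```rust
struct Bytes {
    bytes: Vec<u8>,
}

// Hypothetical stand-in for `ProvenIndex` in this sketch.
struct CheckedIndex(usize);

impl Bytes {
    fn get_index(&self, ix: usize) -> Option<CheckedIndex> {
        if ix < self.bytes.len() { Some(CheckedIndex(ix)) } else { None }
    }

    // Safe alternative to `get_proven`: the bounds check is repeated on every
    // access, so the token no longer saves any work.
    fn get_checked(&self, token: &CheckedIndex) -> Option<u8> {
        self.bytes.get(token.0).copied()
    }
}
```

A token obtained from one `Bytes` value is still accepted by another: the mistake compiles and is only noticed at runtime, and only when the index happens to be out of range for the second value.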
diff --git a/src/idiomatic/leveraging-the-type-system/token-types/branded-02-phantomdata.md b/src/idiomatic/leveraging-the-type-system/token-types/branded-02-phantomdata.md
new file mode 100644
index 000000000000..bec625170ef8
--- /dev/null
+++ b/src/idiomatic/leveraging-the-type-system/token-types/branded-02-phantomdata.md
@@ -0,0 +1,175 @@
+---
+minutes: 30
+---
+
+# `PhantomData` and Lifetime Subtyping (Branding 2/4)
+
+Idea:
+
+- Use a lifetime as a unique brand for each token.
+- Make lifetimes sufficiently distinct so that they don't implicitly convert
+  into each other.
+
+
+```rust,editable
+use std::marker::PhantomData;
+
+#[derive(Default)]
+struct InvariantLifetime<'id>(PhantomData<&'id ()>); // The main focus
+
+struct Wrapper<'a> { value: u8, invariant: InvariantLifetime<'a> }
+
+fn lifetime_separator<T>(value: u8, f: impl for<'a> FnOnce(Wrapper<'a>) -> T) -> T {
+    f(Wrapper { value, invariant: InvariantLifetime::default() })
+}
+
+fn try_coerce_lifetimes<'a>(left: Wrapper<'a>, right: Wrapper<'a>) {}
+
+fn main() {
+    lifetime_separator(1, |wrapped_1| {
+        lifetime_separator(2, |wrapped_2| {
+            // We want this to NOT compile
+            try_coerce_lifetimes(wrapped_1, wrapped_2);
+        });
+    });
+}
+```
+
+
+
+
+
+
+- In Rust, lifetimes can have subtyping relations between one another.
+
+  This kind of relation allows the compiler to determine if one lifetime
+  outlives another.
+
+  Determining if a lifetime outlives another also allows us to say _the shortest
+  common lifetime is the one that ends first_.
+
+  This is useful in many cases, as it means two different lifetimes can be
+  treated as if they were the same in the regions where they overlap.
+
+  This is usually what we want. But here we want to use lifetimes as a way to
+  distinguish values, so that we can say a token only applies to a single
+  variable without having to create a newtype for every single variable we
+  declare.
+
+- **Goal**: We want two lifetimes for which the Rust compiler cannot determine
+  whether one outlives the other.
+
+  We are using `try_coerce_lifetimes` as a compile-time check to see if the
+  lifetimes have a common shorter lifetime (AKA being subtyped).
+
+- Note: This slide compiles as written. By the end of this slide it should only
+  compile when the `try_coerce_lifetimes` call is commented out.
+
+- There are two important parts of this code:
+  - The `impl for<'a>` bound on the closure passed to `lifetime_separator`.
+  - The way lifetimes are used in the parameter for `PhantomData`.
+
+## `for<'a>` bound on a Closure
+
+- We are using `for<'a>` as a way of introducing a lifetime generic parameter to
+  a function type and requiring the body of the function to work for all
+  possible lifetimes.
+
+  What this also does is remove some ability of the compiler to make assumptions
+  about that specific lifetime for the function argument, as it must meet Rust's
+  borrow checking rules regardless of the "real" lifetime its arguments are
+  going to have. The caller substitutes in the actual lifetime; the function
+  itself cannot.
+
+  This is analogous to a forall (∀) quantifier in mathematics, or the way we
+  introduce `<T>` as type variables, but only for lifetimes in trait bounds.
+
+  When we write a function generic over a type `T`, we can't determine that type
+  from within the function itself. Even if we call a function
+  `fn foo<T, U>(first: T, second: U)` with two arguments of the same type, the
+  body of this function cannot determine if `T` and `U` are the same type.
+
+  This also prevents _the API consumer_ from defining a lifetime themselves,
+  which would allow them to circumvent the restrictions we want to impose.
+
+## PhantomData and Lifetime Variance
+
+- We already know `PhantomData`, which can introduce a formal no-op usage of an
+  otherwise unused type or lifetime parameter.
+
+- Ask: What can we do with `PhantomData`?
+
+  Expect mentions of the Typestate pattern, tying together the lifetimes of
+  owned values.
+
+- Ask: In other languages, what is subtyping?
+
+  Expect mentions of inheritance, being able to use a value of type `B` when
+  asked for a value of type `A` because `B` is a "subtype" of `A`.
+
+- Rust does have subtyping! But only for lifetimes.
+
+  Ask: If one lifetime is a subtype of another lifetime, what might that mean?
+
+  A lifetime is a "subtype" of another lifetime when it _outlives_ that other
+  lifetime.
+
+- The way that lifetimes used by `PhantomData` behave depends not only on where
+  the lifetime "comes from" but on how the reference is defined too.
+
+  The reason this compiles is that the
+  [**Variance**](https://doc.rust-lang.org/stable/reference/subtyping.html#r-subtyping.variance)
+  of the lifetime inside of `InvariantLifetime` is too lenient.
+
+  Note: Do not expect to get students to understand variance entirely here, just
+  treat it as a kind of ladder of restrictiveness on the ability of lifetimes to
+  establish subtyping relations.
+
+
+
+- Ask: How can we make it more restrictive? How do we make a reference type more
+  restrictive in Rust?
+
+  Expect or demonstrate: Making it `&'id mut ()` instead. This will not be
+  enough!
+
+  We need to use a
+  [**Variance**](https://doc.rust-lang.org/stable/reference/subtyping.html#r-subtyping.variance)
+  on lifetimes where subtyping cannot be inferred except on _identical
+  lifetimes_. That is, the only subtype of `'a` the compiler can know is `'a`
+  itself.
+
+  Note: Again, do not try to get the whole class to understand variance. Treat
+  it as a ladder of restrictiveness for now.
+
+  Demonstrate: Step through `&'id ()` (covariant in lifetime and type),
+  `&'id mut ()` (covariant in lifetime, invariant in type), `*mut &'id mut ()`
+  (invariant in lifetime and type), and finally `*mut &'id ()` (invariant in
+  lifetime but not type).
+
+  Those last two should not compile, which means we've finally found candidates
+  for how to bind lifetimes to `PhantomData` so they can't be compared to one
+  another in this context.
+
+  Reason: `*mut` means
+  [mutable raw pointer](https://doc.rust-lang.org/reference/types/pointer.html#r-type.pointer.raw).
+  Rust has mutable pointers! But you cannot reason about them in safe Rust.
+  Making this a mutable raw pointer to a reference that has a lifetime removes
+  the compiler's ability to subtype that lifetime, because `*mut T` is
+  invariant in `T`.
+
+- Wrap up: We've introduced ways to stop the compiler from deciding that
+  lifetimes are "similar enough" by choosing a Variance for a lifetime in
+  `PhantomData` that is restrictive enough to prevent this slide from compiling.
+
+  That is, we can now create variables that can exist in the same scope as each
+  other, but whose types are automatically made different from one another
+  per-variable without much boilerplate.
+
+## More to Explore
+
+- The `for<'a>` quantifier is not just for function types. It is a
+  [**Higher-ranked trait bound**](https://doc.rust-lang.org/reference/subtyping.html?search=Hiher#r-subtype.higher-ranked).
+
+
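The "ladder of restrictiveness" can be reduced to two rungs in isolation. The sketch below is an illustration under assumed names (`Covariant`, `Invariant`, `shorten`, and `keep` are hypothetical and not part of the slide): a covariant marker lets a longer lifetime be shortened implicitly, while the invariant marker only accepts the identical lifetime.

```rust
use std::marker::PhantomData;

// Covariant in `'id`: a longer lifetime can be shortened implicitly.
struct Covariant<'id>(PhantomData<&'id ()>);

// Invariant in `'id`: `*mut T` is invariant in `T`, so the lifetime must
// match exactly.
struct Invariant<'id>(PhantomData<*mut &'id ()>);

// Compiles: `Covariant<'long>` is a subtype of `Covariant<'short>`.
fn shorten<'short, 'long: 'short>(value: Covariant<'long>) -> Covariant<'short> {
    value
}

// Rejected by the compiler if uncommented, because the lifetime of an
// `Invariant` cannot be shortened:
// fn shorten_invariant<'short, 'long: 'short>(
//     value: Invariant<'long>,
// ) -> Invariant<'short> {
//     value
// }

// The identity conversion is the only one an invariant lifetime permits.
fn keep<'id>(value: Invariant<'id>) -> Invariant<'id> {
    value
}
```

Both markers are zero-sized, so choosing the stricter variance costs nothing at runtime.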
diff --git a/src/idiomatic/leveraging-the-type-system/token-types/branded-03-impl.md b/src/idiomatic/leveraging-the-type-system/token-types/branded-03-impl.md
new file mode 100644
index 000000000000..e2b10b8b9672
--- /dev/null
+++ b/src/idiomatic/leveraging-the-type-system/token-types/branded-03-impl.md
@@ -0,0 +1,86 @@
+---
+minutes: 10
+---
+
+# Implementing Branded Types (Branding 3/4)
+
+Constructing branded types is different to how we construct non-branded types.
+
+```rust
+# use std::marker::PhantomData;
+#
+# #[derive(Default)]
+# struct InvariantLifetime<'id>(PhantomData<*mut &'id ()>);
+struct ProvenIndex<'id>(usize, InvariantLifetime<'id>);
+
+struct Bytes<'id>(Vec<u8>, InvariantLifetime<'id>);
+
+impl<'id> Bytes<'id> {
+    fn new<T>(
+        // The data we want to modify in this context.
+        bytes: Vec<u8>,
+        // The function that uniquely brands the lifetime of a `Bytes`
+        f: impl for<'a> FnOnce(Bytes<'a>) -> T,
+    ) -> T {
+        f(Bytes(bytes, InvariantLifetime::default()))
+    }
+
+    fn get_index(&self, ix: usize) -> Option<ProvenIndex<'id>> {
+        if ix < self.0.len() { Some(ProvenIndex(ix, InvariantLifetime::default())) }
+        else { None }
+    }
+
+    fn get_proven(&self, ix: &ProvenIndex<'id>) -> u8 {
+        debug_assert!(ix.0 < self.0.len());
+        unsafe { *self.0.get_unchecked(ix.0) }
+    }
+}
+```
+
+
+
+- Motivation: We want to have "proven indexes" for a type, and we don't want
+  those indexes to be usable by different variables of the same type. We also
+  don't want those indexes to escape a scope.
+
+  Our Branded Type will be `Bytes`: a byte array.
+
+  Our Branded Token will be `ProvenIndex`: an index known to be in range.
+
+- There are several notable parts to this implementation:
+  - `new` does not return a `Bytes`, instead asking for "starting data" and a
+    use-once closure that is passed a `Bytes` when it is called.
+  - That `new` function has a `for<'a>` on its trait bound.
+  - We have both a getter for an index and a getter for a value at a proven
+    index.
+
+- Ask: Why does `new` not return a `Bytes`?
+
+  Answer: Because we need `Bytes` to have a unique lifetime controlled by the
+  API.
+
+- Ask: So what if `new()` returned `Bytes`, what is the specific harm that it
+  would cause?
+
+  Answer: Think about the signature of that hypothetical `new()` method:
+
+  `fn new<'a>() -> Bytes<'a> { ... }`
+
+  This would allow the API user to choose what the lifetime `'a` is, removing
+  our ability to guarantee that the lifetimes between different instances of
+  `Bytes` are unique and unable to be subtyped to one another.
+
+- Ask: Why do we need both a `get_index` and a `get_proven`?
+
+  Expect "Because we can't know if an index is occupied at compile time"
+
+  Ask: Then what's the point of the proven indexes?
+
+  Answer: Avoiding bounds checking while keeping knowledge of what indexes are
+  occupied specific to individual variables, unable to erroneously be used on
+  the wrong one.
+
+  Note: The focus is not only on avoiding overuse of bounds checks, but also on
+  preventing that "crossover" of indexes.
+
+
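The harm of a `new` that returns `Bytes` directly can be demonstrated with a small sketch. The constructor `new_unbranded`, the accessor `get_checked`, and the function `crossover` are hypothetical names for illustration; the accessor is bounds-checked here (unlike `get_proven` in the slide) so that the demonstration itself stays free of undefined behavior.

```rust
use std::marker::PhantomData;

#[derive(Default)]
struct InvariantLifetime<'id>(PhantomData<*mut &'id ()>);

struct ProvenIndex<'id>(usize, InvariantLifetime<'id>);
struct Bytes<'id>(Vec<u8>, InvariantLifetime<'id>);

impl<'id> Bytes<'id> {
    // Hypothetical constructor: the caller, not the API, chooses `'id`.
    fn new_unbranded(bytes: Vec<u8>) -> Bytes<'id> {
        Bytes(bytes, InvariantLifetime::default())
    }

    fn get_index(&self, ix: usize) -> Option<ProvenIndex<'id>> {
        if ix < self.0.len() {
            Some(ProvenIndex(ix, InvariantLifetime::default()))
        } else {
            None
        }
    }

    // Bounds-checked on purpose in this sketch.
    fn get_checked(&self, ix: &ProvenIndex<'id>) -> Option<u8> {
        self.0.get(ix.0).copied()
    }
}

// Both values receive the same brand because the caller picks `'static`.
fn crossover() -> Option<u8> {
    let bytes_1: Bytes<'static> = Bytes::new_unbranded(vec![4, 5, 1]);
    let bytes_2: Bytes<'static> = Bytes::new_unbranded(vec![4, 2]);
    let index_1 = bytes_1.get_index(2)?;
    // Type-checks even though `index_1` was proven against `bytes_1`.
    bytes_2.get_checked(&index_1)
}
```

With the closure-based `new` from the slide, the two `Bytes` values would carry distinct brands and the last line of `crossover` would be a compile error instead.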
diff --git a/src/idiomatic/leveraging-the-type-system/token-types/branded-04-in-action.md b/src/idiomatic/leveraging-the-type-system/token-types/branded-04-in-action.md
new file mode 100644
index 000000000000..10201d9588aa
--- /dev/null
+++ b/src/idiomatic/leveraging-the-type-system/token-types/branded-04-in-action.md
@@ -0,0 +1,102 @@
+---
+minutes: 15
+---
+
+# Branded Types in Action (Branding 4/4)
+
+```rust,editable
+use std::marker::PhantomData;
+
+#[derive(Default)]
+struct InvariantLifetime<'id>(PhantomData<*mut &'id ()>);
+struct ProvenIndex<'id>(usize, InvariantLifetime<'id>);
+
+struct Bytes<'id>(Vec<u8>, InvariantLifetime<'id>);
+
+impl<'id> Bytes<'id> {
+    fn new<T>(
+        // The data we want to modify in this context.
+        bytes: Vec<u8>,
+        // The function that uniquely brands the lifetime of a `Bytes`
+        f: impl for<'a> FnOnce(Bytes<'a>) -> T,
+    ) -> T {
+        f(Bytes(bytes, InvariantLifetime::default()))
+    }
+
+    fn get_index(&self, ix: usize) -> Option<ProvenIndex<'id>> {
+        if ix < self.0.len() {
+            Some(ProvenIndex(ix, InvariantLifetime::default()))
+        } else {
+            None
+        }
+    }
+
+    fn get_proven(&self, ix: &ProvenIndex<'id>) -> u8 {
+        self.0[ix.0]
+    }
+}
+
+fn main() {
+    let result = Bytes::new(vec![4, 5, 1], move |mut bytes_1| {
+        Bytes::new(vec![4, 2], move |mut bytes_2| {
+            let index_1 = bytes_1.get_index(2).unwrap();
+            let index_2 = bytes_2.get_index(1).unwrap();
+            bytes_1.get_proven(&index_1);
+            bytes_2.get_proven(&index_2);
+            // bytes_2.get_proven(&index_1); // ❌🔨
+            "Computations done!"
+        })
+    });
+    println!("{result}");
+}
+```
+
+
+
+- Now that the implementation is ready, we can write a program where token
+  types that are proofs of existing indexes cannot be shared between variables.
+
+- Demonstration: Uncomment the `bytes_2.get_proven(&index_1);` line and show
+  that it does not compile when we use indexes from different variables.
+
+- Ask: What operations can we perform that we can guarantee would produce a
+  proven index?
+
+  Expect a "push" implementation, suggested demo:
+
+  ```rust,compile_fail
+  fn push(&mut self, value: u8) -> ProvenIndex<'id> {
+      self.0.push(value);
+      ProvenIndex(self.0.len() - 1, InvariantLifetime::default())
+  }
+  ```
+
+- Ask: Can we make this not just about a byte array, but a general wrapper
+  around `Vec<T>`?
+
+  Trivial: Yes!
+
+  Maybe demonstrate: Generalising `Bytes<'id>` into `BrandedVec<'id, T>`
+
+- Ask: What other areas could we use something like this?
+
+- The resulting token API is **highly restrictive**, but the things that it
+  makes possible to prove as safe within the Rust type system are meaningful.
+
+## More to Explore
+
+- [GhostCell](https://plv.mpi-sws.org/rustbelt/ghostcell/paper.pdf), a structure
+  that allows for safe cyclic data structures in Rust (among other previously
+  difficult to represent data structures), uses this kind of token type to make
+  sure cells can't "escape" a context where we know that operations similar to
+  those shown in these examples are safe.
+
+  This "Branded Types" sequence of slides is based on the `BrandedVec`
+  implementation in the paper, which covers many of the implementation details
+  of this use case in more depth as a gentle introduction to how `GhostCell`
+  itself is implemented and used in practice.
+
+  GhostCell also uses formal checks outside of Rust's type system to prove that
+  the things it allows within this kind of context (lifetime branding) are safe.
+
+
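The `BrandedVec<'id, T>` generalisation suggested in the notes could look roughly like this. It is a sketch under assumptions, not the paper's implementation: the accessors are bounds-checked, the result type parameter is named `R` to avoid clashing with the element type `T`, and `demo` is a hypothetical driver function.

```rust
use std::marker::PhantomData;

#[derive(Default)]
struct InvariantLifetime<'id>(PhantomData<*mut &'id ()>);

struct ProvenIndex<'id>(usize, InvariantLifetime<'id>);

// Same shape as `Bytes<'id>`, but generic over the element type.
struct BrandedVec<'id, T>(Vec<T>, InvariantLifetime<'id>);

impl<'id, T> BrandedVec<'id, T> {
    // The closure must accept every brand `'a`, so each call gets a fresh one.
    fn new<R>(items: Vec<T>, f: impl for<'a> FnOnce(BrandedVec<'a, T>) -> R) -> R {
        f(BrandedVec(items, InvariantLifetime::default()))
    }

    fn get_index(&self, ix: usize) -> Option<ProvenIndex<'id>> {
        if ix < self.0.len() {
            Some(ProvenIndex(ix, InvariantLifetime::default()))
        } else {
            None
        }
    }

    // Pushing always yields a valid index: the element was just added.
    fn push(&mut self, value: T) -> ProvenIndex<'id> {
        self.0.push(value);
        ProvenIndex(self.0.len() - 1, InvariantLifetime::default())
    }

    fn get_proven(&self, ix: &ProvenIndex<'id>) -> &T {
        &self.0[ix.0]
    }
}

// Hypothetical driver: reads back one existing and one pushed element.
fn demo() -> (String, String) {
    BrandedVec::new(vec![String::from("a")], |mut names| {
        let pushed = names.push(String::from("b"));
        let first = names.get_index(0).unwrap();
        (names.get_proven(&first).clone(), names.get_proven(&pushed).clone())
    })
}
```

The indexes stay valid only because this API never removes elements. Adding something like `pop` or `truncate` would invalidate previously issued tokens and break the proof.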
diff --git a/src/idiomatic/leveraging-the-type-system/token-types/mutex-guard.md b/src/idiomatic/leveraging-the-type-system/token-types/mutex-guard.md
new file mode 100644
index 000000000000..fdd125def249
--- /dev/null
+++ b/src/idiomatic/leveraging-the-type-system/token-types/mutex-guard.md
@@ -0,0 +1,54 @@
+---
+minutes: 10
+---
+
+# Token Types with Data: Mutex Guards
+
+Sometimes, a token type needs additional data. A mutex guard is an example of a
+token that represents permission + data.
+
+```rust,editable
+use std::sync::{Arc, Mutex, MutexGuard};
+
+fn main() {
+    let mutex = Arc::new(Mutex::new(42));
+    let try_mutex_guard: Result<MutexGuard<i32>, _> = mutex.lock();
+    if let Ok(mut guarded) = try_mutex_guard {
+        // The acquired MutexGuard is proof of exclusive access.
+        *guarded = 451;
+    }
+}
+```
+
+
+ + + +- Mutexes enforce mutual exclusion of read/write access to a value. We've + covered Mutexes earlier in this course already (See: RAII/Mutex), but here + we're looking at `MutexGuard` specifically. + +- `MutexGuard` is a value generated by a `Mutex` that proves you have read/write + access at that point in time. + + `MutexGuard` also holds onto a reference to the `Mutex` that generated it, + with `Deref` and `DerefMut` implementations that give access to the data of + `Mutex` while the underlying `Mutex` keeps that data private from the user. + +- If `mutex.lock()` does not return a `MutexGuard`, you don't have permission to + change the value within the mutex. + + Not only do you have no permission, but you have no means to access the mutex + data unless you gain a `MutexGuard`. + + This contrasts with C++, where mutexes and lock guards do not control access + to the data itself, acting only as a flag that a user must remember to check + every time they read or manipulate data. + +- Demonstrate: make the `mutex` variable mutable then try to dereference it to + change its value. Show how there's no deref implementation for it, and no + other way to get to the data held by it other than getting a mutex guard. + +
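The claim that the guard is the proof of access can be observed with `try_lock`. A minimal sketch follows; the function name `guard_controls_access` is hypothetical, and `try_lock` is used instead of `lock` so that the second attempt reports failure rather than blocking.

```rust
use std::sync::Mutex;

// Returns: (was the mutex unavailable while a guard existed,
//           was it available after the guard was dropped,
//           the value written through the first guard).
fn guard_controls_access() -> (bool, bool, i32) {
    let mutex = Mutex::new(42);
    let locked_while_held;
    {
        let mut guard = mutex.lock().unwrap();
        *guard = 451;
        // A second attempt cannot obtain a guard while the first one is alive.
        locked_while_held = mutex.try_lock().is_err();
    } // `guard` is dropped here, which releases the lock.

    let second_attempt = mutex.try_lock();
    let available_after_drop = second_attempt.is_ok();
    let value = *second_attempt.unwrap();
    (locked_while_held, available_after_drop, value)
}
```

The permission lives exactly as long as the `MutexGuard` value: once the guard goes out of scope, the next caller can obtain a new guard and sees the updated data.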
diff --git a/src/idiomatic/leveraging-the-type-system/token-types/permission-tokens.md b/src/idiomatic/leveraging-the-type-system/token-types/permission-tokens.md
new file mode 100644
index 000000000000..55a7e296a970
--- /dev/null
+++ b/src/idiomatic/leveraging-the-type-system/token-types/permission-tokens.md
@@ -0,0 +1,51 @@
+---
+minutes: 5
+---
+
+# Permission Tokens
+
+Token types work well as a proof of checked permission.
+
+```rust,editable
+mod admin {
+    pub struct AdminToken(());
+
+    pub fn get_admin(password: &str) -> Option<AdminToken> {
+        if password == "Password123" { Some(AdminToken(())) } else { None }
+    }
+}
+
+// We don't have to check that we have permissions, because
+// the AdminToken argument is equivalent to such a check.
+pub fn add_moderator(_: &admin::AdminToken, user: &str) {}
+
+fn main() {
+    if let Some(token) = admin::get_admin("Password123") {
+        add_moderator(&token, "CoolUser");
+    } else {
+        eprintln!("Incorrect password! Could not prove privileges.")
+    }
+}
+```
+
+
+
+- This example models gaining administrator privileges for a chat client with a
+  password, then giving a user a moderator rank once those privileges are
+  gained. The `AdminToken` type acts as "proof of correct user privileges."
+
+  The user is asked for a password in code, and if the password is correct, we
+  get an `AdminToken` to perform administrator actions within a specific
+  environment (here, a chat client).
+
+  Once the permissions are gained, we can call the `add_moderator` function.
+
+  We can't call that function without the token type, so by being able to call
+  it at all we can assume we have permissions.
+
+- Demonstrate: Try to construct the `AdminToken` in `main` again to reiterate
+  that the foundation of useful tokens is preventing their arbitrary
+  construction.
+
+