"Token Types" chapter of Idiomatic Rust #2921
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
--- | ||
minutes: 15 | ||
--- | ||
|
||
# Token Types | ||
|
||
Types with private constructors can be used to act as proof of invariants. | ||
|
||
<!-- dprint-ignore-start --> | ||
```rust,editable | ||
pub mod token { | ||
// A public type with private fields behind a module boundary. | ||
pub struct Token { proof: () } | ||
|
||
pub fn get_token() -> Option<Token> { | ||
Some(Token { proof: () }) | ||
} | ||
} | ||
|
||
pub fn protected_work(token: token::Token) { | ||
println!("We have a token, so we can make assumptions.") | ||
} | ||
|
||
fn main() { | ||
if let Some(token) = token::get_token() { | ||
// We have a token, so we can do this work. | ||
protected_work(token); | ||
} else { | ||
// We could not get a token, so we can't call `protected_work`. | ||
} | ||
} | ||
``` | ||
<!-- dprint-ignore-end --> | ||
|
||
<details> | ||
|
||
- Motivation: We want to be able to restrict a user's access to functionality | ||
until they've performed a specific task. | ||
|
||
We can do this by defining a type the API consumer cannot construct on their | ||
own, through the privacy rules of structs and modules. | ||
|
||
[Newtypes](./newtype-pattern.md) use the privacy rules in a similar way, to | ||
restrict construction unless a value is guaranteed to hold up an invariant at | ||
runtime. | ||
|
||
- Ask: What is the purpose of the `proof: ()` field here? | ||
|
||
Without `proof: ()`, `Token` would have no private fields and users would be | ||
able to construct values of `Token` arbitrarily. | ||
|
||
Demonstrate: Try to construct the token manually in `main` and show the | ||
compilation error. | ||
|
||
Demonstrate: Remove the `proof` field from `Token` to show how users would be | ||
able to construct `Token` if it had no private fields. | ||
|
||
- By putting the `Token` type behind a module boundary (`token`), users outside | ||
that module can't construct the value on their own as they don't have | ||
permission to access the `proof` field. | ||
|
||
The API developer gets to define methods and functions that produce these | ||
tokens. The user does not. | ||
|
||
The token becomes a proof that one has met the API developer's conditions of | ||
access for those tokens. | ||
|
||
- Ask: How might an API developer accidentally introduce ways to circumvent | ||
this? | ||
|
||
Expect answers like "serialization implementations", other parser/"from | ||
string" implementations, or an implementation of `Default`. | ||
|
||
</details> |
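The last question in the notes (accidental circumvention) can be made concrete with a small sketch. It assumes a hypothetical password check inside `get_token` and a `#[derive(Default)]` added to `Token`; neither is part of the slide. The derive quietly gives every caller a public constructor, so the token no longer proves anything.

```rust
pub mod token {
    // Assumption for illustration: the API author derived `Default` for
    // convenience. This exposes a public way to build a `Token`.
    #[derive(Default)]
    #[allow(dead_code)]
    pub struct Token {
        proof: (),
    }

    // Hypothetical condition of access: only the right password yields a token.
    pub fn get_token(password: &str) -> Option<Token> {
        if password == "open sesame" {
            Some(Token { proof: () })
        } else {
            None
        }
    }
}

pub fn protected_work(_token: token::Token) -> &'static str {
    "protected work done"
}

fn main() {
    // The intended path refuses to hand out a token.
    assert!(token::get_token("wrong").is_none());

    // The derived `Default` bypasses the check entirely.
    let forged = token::Token::default();
    println!("{}", protected_work(forged));
}
```

Removing the derive restores the guarantee: `Token::default()` stops compiling and `get_token` is again the only way to obtain a value.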
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
--- | ||
minutes: 10 | ||
--- | ||
|
||
# Variable-Specific Tokens (Branding 1/4) | ||
|
||
What if we want to tie a token to a specific variable? | ||
|
||
```rust,editable | ||
struct Bytes { | ||
bytes: Vec<u8>, | ||
} | ||
struct ProvenIndex(usize); | ||
|
||
impl Bytes { | ||
fn get_index(&self, ix: usize) -> Option<ProvenIndex> { | ||
if ix < self.bytes.len() { Some(ProvenIndex(ix)) } else { None } | ||
} | ||
fn get_proven(&self, token: &ProvenIndex) -> u8 { | ||
self.bytes[token.0] | ||
} | ||
} | ||
|
||
fn main() { | ||
let data_1 = Bytes { bytes: vec![0, 1, 2] }; | ||
if let Some(token_1) = data_1.get_index(2) { | ||
data_1.get_proven(&token_1); // Works fine! | ||
|
||
// let data_2 = Bytes { bytes: vec![0, 1] }; | ||
// data_2.get_proven(&token_1); // Panics! How do we prevent this at compile time? | ||
} | ||
} | ||
``` | ||
|
||
<details> | ||
|
||
- What if we want to tie a token to a _specific variable_ in our code? Can we do | ||
this in Rust's type system? | ||
|
||
- Motivation: We want to have a Token Type that represents a known, valid index | ||
into a byte array. | ||
|
||
In this example there's nothing stopping the proven index of one array being | ||
used on a different array. | ||
|
||
- Demonstrate: Uncomment the `data_2.get_proven(&token_1);` line. | ||
|
||
The code here panics! We want to prevent this "crossover" of token types for | ||
indexes at compile time. | ||
|
||
- Ask: How might we try to do this? | ||
|
||
Expect students to not reach a good implementation from this, but be willing | ||
to experiment and follow through on suggestions. | ||
|
||
- Ask: What are the alternatives, and why are they not good enough? | ||
|
||
Expect runtime checking of index bounds, especially as `get_index` already | ||
uses runtime checking. | ||
|
||
Runtime bounds checking does not prevent the erroneous crossover in the | ||
first place; it only guarantees a panic. | ||
|
||
- The kind of token association we will be doing here is called Branding. | ||
This is an advanced technique that expands the applicability of token types to | ||
more API designs. | ||
|
||
- [`GhostCell`](https://plv.mpi-sws.org/rustbelt/ghostcell/paper.pdf) is a | ||
prominent user of this technique; later slides will touch on it. | ||
|
||
</details> | ||
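For comparison with the compile-time approach the notes build toward, here is a minimal sketch of the runtime alternative. It is an assumption made for illustration, not part of the slides: each `Bytes` gets a numeric id from a global counter (`NEXT_ID`, `id`, and `owner` are hypothetical names), and `get_proven` checks that the token belongs to the array it is used on.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical global counter handing out a distinct id per `Bytes` value.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

struct Bytes {
    id: usize,
    bytes: Vec<u8>,
}

// The token remembers which `Bytes` produced it.
struct ProvenIndex {
    owner: usize,
    ix: usize,
}

impl Bytes {
    fn new(bytes: Vec<u8>) -> Self {
        Bytes { id: NEXT_ID.fetch_add(1, Ordering::Relaxed), bytes }
    }

    fn get_index(&self, ix: usize) -> Option<ProvenIndex> {
        (ix < self.bytes.len()).then(|| ProvenIndex { owner: self.id, ix })
    }

    // Returns `None` when the token was issued by a different `Bytes`.
    fn get_proven(&self, token: &ProvenIndex) -> Option<u8> {
        if token.owner == self.id { Some(self.bytes[token.ix]) } else { None }
    }
}

fn crossover_demo() -> (Option<u8>, Option<u8>) {
    let data_1 = Bytes::new(vec![0, 1, 2]);
    let data_2 = Bytes::new(vec![0, 1]);
    let token_1 = data_1.get_index(2).expect("index 2 is in range for data_1");
    (data_1.get_proven(&token_1), data_2.get_proven(&token_1))
}

fn main() {
    let (own, crossed) = crossover_demo();
    println!("own: {own:?}, crossed: {crossed:?}");
}
```

This avoids the panic, but the crossover is still only detected while the program runs, and every lookup pays for an extra comparison. Branding aims to reject the crossover before the program is ever built.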
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,125 @@ | ||
--- | ||
minutes: 20 | ||
--- | ||
|
||
# `PhantomData` and Lifetime Subtyping (Branding 2/4) | ||
|
||
Idea: | ||
- Use a lifetime as a unique brand for each token. | ||
- Make lifetimes sufficiently distinct so that they don't implicitly convert into each other. | ||
|
||
<!-- dprint-ignore-start --> | ||
```rust,editable | ||
use std::marker::PhantomData; | ||
|
||
#[derive(Default)] | ||
struct InvariantLifetime<'id>(PhantomData<&'id ()>); // The main focus | ||
|
||
struct Wrapper<'a> { value: u8, invariant: InvariantLifetime<'a> } | ||
|
||
fn lifetime_separator<T>(value: u8, f: impl for<'a> FnOnce(Wrapper<'a>) -> T) -> T { | ||
f(Wrapper { value, invariant: InvariantLifetime::default() }) | ||
} | ||
|
||
fn compare_lifetimes<'a>(left: Wrapper<'a>, right: Wrapper<'a>) {} | ||
|
||
fn main() { | ||
lifetime_separator(1, |wrapped_1| { | ||
lifetime_separator(2, |wrapped_2| { | ||
// We want this to NOT compile | ||
compare_lifetimes(wrapped_1, wrapped_2); | ||
}); | ||
}); | ||
} | ||
``` | ||
<!-- dprint-ignore-end --> | ||
|
||
<details> | ||
|
||
<!-- TODO: Link back to PhantomData in the borrowck invariants chapter. | ||
- We saw `PhantomData` back in the Borrow Checker Invariants chapter. | ||
--> | ||
|
||
- **Goal**: We want two lifetimes for which the Rust compiler cannot determine | ||
whether one outlives the other. | ||
|
||
Review comment: Maybe step back a little and say that Rust defines a subtyping relationship between references, and references with longer lifetimes implicitly convert into shorter lifetimes. This also happens when a user-defined type has a lifetime parameter, Rust defines a subtyping relationship. I know you're talking more about subtyping later, but maybe a brief mention upfront would be good. I would like you to consider how you'd introduce the flow from the idea ("Use a lifetime as a unique brand for each token") to the suggested solution (disable subtyping) in the shortest way possible when first introducing the slide. ... thus our goal: eliminate that subtyping relationship, in order for each Token's lifetime to be unique and incompatible with every other Token. |
|
||
We are using `compare_lifetimes` as a compile-time check to see if the | ||
lifetimes are being subtyped. | ||
|
||
- Note: This slide compiles as written. By the end of this slide it should only | ||
compile when the `compare_lifetimes` call is commented out. | ||
|
||
- There are two important parts of this code: | ||
- The `impl for<'a>` bound on the closure passed to `lifetime_separator`. | ||
- The way lifetimes are used in the parameter for `PhantomData`. | ||
|
||
- `for<'a> [trait bound]` is a way of introducing a new lifetime variable to a | ||
trait bound and asking that the trait bound be true for all instances of that | ||
new lifetime variable. | ||
|
||
Review comment: I know I asked for a direct explanation, but I think we can customize it a bit here. I'd rather refer to "function types" rather than "trait bounds" here (even though that's technically true, but a bit too abstract). Once you start talking about the types of functions, you can customize the rest of the explanation. What's the type of sqrt? Maybe this is worth putting on a separate slide. |
|
||
This is analogous to a forall (∀) quantifier in mathematics, or the way we | ||
introduce `<T>` as type variables, but only for lifetimes in trait bounds. | ||
|
||
What it also does is remove some ability of the compiler to make assumptions | ||
about that specific lifetime, as this `for<'a>` trait bound asks that the | ||
bound hold true for all possible lifetimes. This makes comparing that bound | ||
lifetime to other lifetimes slightly more difficult. | ||
|
||
This is a | ||
[**Higher-ranked trait bound**](https://doc.rust-lang.org/reference/subtyping.html?search=Hiher#r-subtype.higher-ranked). | ||
|
||
- We already know `PhantomData`, which we can use to capture unused type or | ||
lifetime parameters to make them "used." | ||
|
||
- Ask: What can we do with `PhantomData`? | ||
|
||
Expect mentions of the Typestate pattern, tying together the lifetimes of | ||
owned values. | ||
|
||
- Ask: In other languages, what is subtyping? | ||
|
||
Expect mentions of inheritance, being able to use a value of type `B` when | ||
asked for a value of type `A` because `B` is a "subtype" of `A`. | ||
|
||
- Rust does have Subtyping! But only for lifetimes. | ||
|
||
Ask: If one lifetime is a subtype of another lifetime, what might that mean? | ||
|
||
A lifetime is a "subtype" of another lifetime when it _outlives_ that other | ||
lifetime. | ||
|
||
- The way that lifetimes captured by `PhantomData` behave depends not only on | ||
where the lifetime "comes from" but on how the reference is defined too. | ||
|
||
The reason this compiles is that the | ||
[**Variance**](https://doc.rust-lang.org/stable/reference/subtyping.html#r-subtyping.variance) | ||
of the lifetime captured by `InvariantLifetime` is too lenient. | ||
|
||
<!-- Note: We've been using "invariants" in this module in a specific way, but subtyping introduces _invariant_, _covariant_, and _contravariant_ as specific terms. --> | ||
|
||
- Ask: How can we make it more restrictive? | ||
|
||
Expect or demonstrate: Making it `&'id mut ()` instead. This will not be | ||
enough! | ||
|
||
We need to use a | ||
[**Variance**](https://doc.rust-lang.org/stable/reference/subtyping.html#r-subtyping.variance) | ||
on lifetimes where subtyping cannot be inferred except on _identical | ||
lifetimes_. That is, the only subtype of `'a` the compiler can know is `'a` | ||
itself. | ||
|
||
Review comment: This is a very complex topic. What's your intent? Mention it for those who are in the know, or ensure that everyone in the audience understands it? If it is the former (given that this slide is not introducing this concept, I think it must be the former), I think we need to mention this in the notes for the instructor that they should not try to make the whole class understand variance. But is it really an option, given what follows? WDYT? |
|
||
Demonstrate: Move from `&'id ()` (covariant in lifetime and type) to | ||
`&'id mut ()` (covariant in lifetime, invariant in type), then to | ||
`*mut &'id mut ()` (invariant in lifetime and type), and finally to | ||
`*mut &'id ()` (also invariant in lifetime, because `*mut T` is invariant in | ||
`T`). | ||
|
||
Those last two should not compile, which means we've finally found candidates | ||
for how to bind lifetimes to `PhantomData` so they can't be compared to one | ||
another in this context. | ||
|
||
- Wrap up: We've introduced ways to stop the compiler from deciding that | ||
lifetimes are "similar enough" by choosing a Variance for a lifetime captured | ||
in `PhantomData` that is restrictive enough to prevent this slide from | ||
compiling. | ||
Review comment: Maybe add a final summary sentence in even simpler English, something like "thus, we now can create token values with unique lifetimes, and one token does not implicitly convert to any other" |
||
|
||
</details> | ||
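The `for<'a>` bound described in the notes can also be seen in isolation. The sketch below is an illustration only (`with_local_text` is a hypothetical helper, not part of the slide): the caller's closure has to work for every lifetime `'a`, including the lifetime of a local variable that the caller can never name.

```rust
// `f` must accept a `&str` of *any* lifetime, so it can be applied to a
// string that only lives inside this function.
fn with_local_text<F>(f: F) -> usize
where
    F: for<'a> Fn(&'a str) -> &'a str,
{
    let local = String::from("  branded  ");
    f(&local).len()
}

fn main() {
    // The closure is checked against "for all 'a", not one specific lifetime.
    let trimmed_len = with_local_text(|s| s.trim());
    println!("{trimmed_len}");
}
```

Because the closure cannot assume anything about `'a`, it also cannot relate `'a` to any lifetime it already knows. That is the property `lifetime_separator` relies on.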
|
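The effect of variance can be sketched without closures. The names below (`Covariant`, `Invariant`, `shorten_covariant`, `same_brand`) are hypothetical and chosen for this illustration. The covariant marker lets the compiler silently shorten a lifetime, while the invariant marker refuses the same conversion.

```rust
use std::marker::PhantomData;

// Covariant in `'id`: a longer lifetime converts into a shorter one.
struct Covariant<'id>(PhantomData<&'id ()>);

// Invariant in `'id`: `*mut T` is invariant in `T`, so no conversion happens.
struct Invariant<'id>(PhantomData<*mut &'id ()>);

// Compiles: subtyping shortens `'long` to `'short` implicitly.
fn shorten_covariant<'short, 'long: 'short>(c: Covariant<'long>) -> Covariant<'short> {
    c
}

// Does not compile if uncommented, because `Invariant<'long>` is not a
// subtype of `Invariant<'short>`:
//
// fn shorten_invariant<'short, 'long: 'short>(i: Invariant<'long>) -> Invariant<'short> {
//     i
// }

// Only accepts two values carrying exactly the same lifetime.
fn same_brand<'id>(_left: &Invariant<'id>, _right: &Invariant<'id>) -> bool {
    true
}

fn demo() -> bool {
    let long_lived: Covariant<'static> = Covariant(PhantomData);
    let _shortened = shorten_covariant(long_lived);

    let token = Invariant(PhantomData);
    same_brand(&token, &token)
}

fn main() {
    assert!(demo());
}
```

A commonly seen alternative marker is `PhantomData<fn(&'id ()) -> &'id ()>`, which is also invariant in `'id`; whether it suits this course is a design choice left to the authors.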
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
--- | ||
minutes: 10 | ||
--- | ||
|
||
# Implementing Branded Types (Branding 3/4) | ||
|
||
Constructing branded types is different to how we construct non-branded types. | ||
|
||
```rust | ||
# use std::marker::PhantomData; | ||
# | ||
# #[derive(Default)] | ||
# struct InvariantLifetime<'id>(PhantomData<*mut &'id ()>); | ||
struct ProvenIndex<'id>(usize, InvariantLifetime<'id>); | ||
|
||
struct Bytes<'id>(Vec<u8>, InvariantLifetime<'id>); | ||
|
||
impl<'id> Bytes<'id> { | ||
fn new<T>( | ||
// The data we want to modify in this context. | ||
bytes: Vec<u8>, | ||
// The function that uniquely brands the lifetime of a `Bytes` | ||
f: impl for<'a> FnOnce(Bytes<'a>) -> T, | ||
) -> T { | ||
f(Bytes(bytes, InvariantLifetime::default())) | ||
} | ||
|
||
fn get_index(&self, ix: usize) -> Option<ProvenIndex<'id>> { | ||
if ix < self.0.len() { Some(ProvenIndex(ix, InvariantLifetime::default())) } | ||
else { None } | ||
} | ||
|
||
fn get_proven(&self, ix: &ProvenIndex<'id>) -> u8 { self.0[ix.0] } | ||
} | ||
``` | ||
|
||
<details> | ||
|
||
- Motivation: We want to have "proven indexes" for a type, and we don't want | ||
those indexes to be usable by different variables of the same type. We also | ||
don't want those indexes to escape a scope. | ||
|
||
Our Branded Type will be `Bytes`: a byte array. | ||
|
||
Our Branded Token will be `ProvenIndex`: an index known to be in range. | ||
|
||
- There are several notable parts to this implementation: | ||
- `new` does not return a `Bytes`, instead asking for "starting data" and a | ||
use-once Closure that is passed a `Bytes` when it is called. | ||
- That `new` function has a `for<'a>` on its trait bound. | ||
- We have both a getter for an index and a getter for a value at a proven | ||
index. | ||
|
||
- Ask: Why does `new` not return a `Bytes`? | ||
|
||
Answer: Because we need `Bytes` to have a unique lifetime. | ||
|
||
- Ask: Why do we need both a `get_index` and a `get_proven`? | ||
|
||
Expect "Because we can't know if an index is occupied at compile time" | ||
|
||
Ask: Then what's the point of the proven indexes? | ||
|
||
Answer: They prevent proven indexes from "crossing over" to other arrays of | ||
the same type, which would cause panics. | ||
|
||
Review comment: I'd rather motivate them by avoiding repeated bounds checks. Maybe this should be explained on the branded-01 slide? |
||
|
||
Note: The focus is not on avoiding overuse of bounds checks, but instead on | ||
preventing that "cross over" of indexes. | ||
|
||
</details> |
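To see the implementation above in use, here is a self-contained sketch that repeats the slide's types and adds a hypothetical caller, `read_both`, which is not part of the slide. Each `Bytes::new` call brands its closure argument with a fresh lifetime, so an index obtained from one `Bytes` is only accepted by that same `Bytes`.

```rust
use std::marker::PhantomData;

// Invariant lifetime marker, as defined on the slide.
#[derive(Default)]
struct InvariantLifetime<'id>(PhantomData<*mut &'id ()>);

struct ProvenIndex<'id>(usize, InvariantLifetime<'id>);
struct Bytes<'id>(Vec<u8>, InvariantLifetime<'id>);

impl<'id> Bytes<'id> {
    // The closure must work for every `'a`, so each call gets a unique brand.
    fn new<T>(bytes: Vec<u8>, f: impl for<'a> FnOnce(Bytes<'a>) -> T) -> T {
        f(Bytes(bytes, InvariantLifetime::default()))
    }

    fn get_index(&self, ix: usize) -> Option<ProvenIndex<'id>> {
        if ix < self.0.len() {
            Some(ProvenIndex(ix, InvariantLifetime::default()))
        } else {
            None
        }
    }

    fn get_proven(&self, ix: &ProvenIndex<'id>) -> u8 {
        self.0[ix.0]
    }
}

// Hypothetical caller: reads one byte from each of two branded arrays.
fn read_both() -> (u8, u8) {
    Bytes::new(vec![0, 1, 2], |data_1| {
        Bytes::new(vec![10, 11], |data_2| {
            let ix_1 = data_1.get_index(2).expect("2 is in range for data_1");
            let ix_2 = data_2.get_index(1).expect("1 is in range for data_2");

            // Uncommenting the next line is a compile error, because `ix_1`
            // carries the brand of `data_1`, not `data_2`:
            // data_2.get_proven(&ix_1);

            (data_1.get_proven(&ix_1), data_2.get_proven(&ix_2))
        })
    })
}

fn main() {
    println!("{:?}", read_both());
}
```

The closures return plain `u8` values. Returning a `ProvenIndex` out of its closure would not compile, because the return type `T` is chosen outside the `for<'a>` and cannot mention the brand.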
Review comment: I think this content already exists in the main branch? Please rebase to hide the spurious diff.