Member

@dhil dhil commented Jan 22, 2025

This patch populates the "Execution" section of the Explainer document with the reduction rules for stack switching.

Resolves #91.

This patch populates the "Execution" section of the Explainer document
with the reduction rules for stack switching.
@dhil dhil requested review from rossberg and tlively January 22, 2025 10:19
Member

@rossberg rossberg left a comment

Would also make sense to have Maxime take a look, since he's mechanising this right now.

Contributor

@mlegoupil mlegoupil left a comment

Most of these comments are either
(1) points already discussed on Zoom on the 6th of February, or
(2) sidenotes about small, insignificant differences between this and the Iris-WasmFX mechanisation. I do not think it is necessary to incorporate these into the explainer document, but I figured I would share the details.

The main exception is the comment on line 856 which might well require our attention.


* `(prompt{<hdl>*} <instr>* end)` represents an active handler
- `(prompt{hdl*}? instr* end) : [t1*] -> [t2*]`
- iff `instr* : [t1*] -> [t2*]`
Contributor

This explanation does not mention what typing context is used. Here, the body instr* of the prompt instruction should be typechecked under the empty context. This enforces that its body is closed, which is necessary since continuations live in the store and store objects should be closed.

The administrative structure `hdl` is defined as
```
hdl ::= (<tagaddr> $l) | (<tagaddr> switch)
```
Contributor

The resume instruction needs a list of tags, and prompt needs a list of (desugared) tag addresses. Hence we need either to define two separate notions, hdl and hdlnew, where hdl is as shown above and is used by prompt, and hdlnew has tags instead of tag addresses and is used by resume; or to keep a single hdl and allow it to take either tags or tag addresses as inputs. The former is the solution adopted by the Iris-WasmFX mechanisation.

Member

Even if we define this as a separate syntactic class, I'd suggest to mirror the syntax of the index-based notation, i.e., keep the on.


* `S; F; v^n (ref.cont ca) (resume $ct hdl*) --> S'; F; prompt{hdl*} E[v^n] end`
- iff `S.conts[ca] = (E : n)`
- and `S' = S with conts[ca] = epsilon`
Contributor

hdl must be desugared here: on the LHS it contains tags and on the RHS it should contain tag addresses. The field F.tags in the frame converts one to the other.

* `S; F; v^m (ref.cont ca) (resume_throw $ct $e hdl*) --> S'; F; prompt{hdl*} E[v^m (throw $e)] end`
- iff `S.conts[ca] = (E : n)`
- and `S.tags[F.tags[$e]].type ~~ [t1^m] -> [t2*]`
- and `S' = S with conts[ca] = epsilon`
Contributor

Same comment as for resume: the list hdl must be desugared using F.tags


* `S; F; (prompt{hdl1* (ea $l) hdl2*} H^ea[v^n (suspend $e)] end) --> S'; F; v^n (ref.cont |S.conts|) (br $l)`
- iff `ea notin tagaddr(hdl1*)`
- and `ea = F.tags[$e]`
Contributor

This is wrong: F is not always the right frame to use. Instead, the innermost frame in H should be used (in case H contains nested frame instructions). I can suggest two solutions.

The first is to write a function innermost_frame that explores H and returns the innermost frame F_i from the innermost frame instruction in H; if no frame instruction is found, the function should return F (the current top-level frame). I find this tedious.

The second solution is to have two instructions, suspend and suspend_desugared, the first taking a tag $e as an immediate argument, the second taking a tag address ea. Then the rule above should mention suspend_desugared ea instead of suspend $e, and we would need to add a reduction rule that reduces S; F; suspend $e --> S; F; suspend_desugared ea when F.tags[$e] = ea, conveniently using the closest frame F without needing to define a function that explores the context. For simplicity, we could also use a single instruction suspend that can take either a tag or a tag address as an immediate argument, instead of two separate instructions. The Iris-WasmFX mechanisation uses two instructions, as it is convenient to consider suspend a basic instruction and suspend_desugared an administrative instruction.
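
To make the frame issue concrete, here is a minimal Python sketch. It is not taken from the explainer or from any mechanisation: `Frame`, `desugar_with_outer_frame`, and `desugar_at_suspend` are hypothetical names, and the context H is reduced to the list of frames it contains, outermost first. It only illustrates that resolving `$e` through the frame outside the prompt can pick a different tag address than resolving it through the frame closest to the suspend.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    # Hypothetical stand-in for a frame: maps a tag index such as "$e"
    # to a tag address in the store.
    tags: Dict[str, int]


def desugar_with_outer_frame(outer: Frame, frames_in_h: List[Frame], tag_idx: str) -> int:
    # The questioned behaviour: always resolve through the frame outside the prompt.
    return outer.tags[tag_idx]


def desugar_at_suspend(outer: Frame, frames_in_h: List[Frame], tag_idx: str) -> int:
    # The suggested behaviour: resolve through the frame closest to the suspend,
    # i.e. the innermost frame in H, falling back to the outer frame if H has none.
    innermost = frames_in_h[-1] if frames_in_h else outer
    return innermost.tags[tag_idx]


# Assumption for illustration: caller and callee belong to different module
# instances that both name a tag "$e" but were given different addresses.
caller = Frame(tags={"$e": 0})
callee = Frame(tags={"$e": 7})

wrong = desugar_with_outer_frame(caller, [callee], "$e")
right = desugar_at_suspend(caller, [callee], "$e")
```

In this sketch `wrong` is the caller's address and `right` is the callee's, so a prompt installed for the callee's tag would only be matched by the second strategy.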

* `(ref.cont a)` represents a continuation value, where `a` is a *continuation address* indexing into the store's `conts` component
- `ref.cont a : [] -> [(ref $ct)]`
- iff `S.conts[a] = epsilon \/ S.conts[a] = (E : n)`
+ iff `E[val^n] : t2*`
Contributor

There are two ways of doing this. The first is the one displayed here, where type-checking the ref.cont instructions requires typechecking the body of the continuation here and now. The other one (which is the one used in the Iris-WasmFX mechanisation) is to merely read a type annotation here; and instead add a clause to the (unshown here) store_typing predicate that describes a well-formed state, mandating that all continuations in the store must have a body that type-checks. From a theoretical point of view, I prefer the second solution. Besides, the second approach is the one used when typechecking the invoke administrative instruction.

Member

Hm, I'd generally prefer the first option, since that is a more faithful reflection of the intended runtime representation that erases these types. We really want to know that this is sound, so ideally, even a mechanised soundness proof would model the store without introducing additional type information that may affect the result in subtle ways.

Invoke is different in that functions are already type-annotated in the source program, and these types are in fact kept around in real implementations (e.g., to perform link-time type-checks).

Contributor

Does typechecking the instructions inside the continuation require a type context C? If so, where does the context come from?

Contributor

In the WasmFX-Cert mechanisation in the Rocq proof assistant, we use the empty context. This is ultimately irrelevant since all continuations start off as a function call and the body of a function call is typechecked using a different typing context as per the typing rules of (plain) WebAssembly

- `S ::= {..., conts <cont>?*}`

* A continuation is a context annotated with its hole's arity
- `cont ::= (E : n)`
Contributor

Sidenote: the Iris-WasmFX mechanisation stores more than just the arity n together with the context E, it stores the actual expected type t1* -> t2*. Transforming the presentation from the mechanisation to this one is simple (n = length(t1*)).

Member

That is interesting. Is that merely for convenience (i.e., not having to guess the type non-deterministically in the proof), or would soundness actually break without fixing the types?

Contributor

This is so that the logical relation can later have more to go on. Type soundness could be proved in a mechanisation that decorates contexts with only the arity n, with minor changes to the proofs.

- and `$ct ~~ cont $ft`
- and `$ft ~~ [t1^n] -> [t2*]`

* `(prompt{<hdl>*} <instr>* end)` represents an active handler
Contributor

Sidenote: the Iris-WasmFX mechanisation adds one more immediate argument to the prompt instruction: the type t* expected for the body <instr>*. This is necessary to define the behaviour of the suspend instruction, since the mechanisation stores each continuation together with its expected type t1* -> t2*. If the type annotation were not present in the prompt instruction, it would be impossible to know the return type t2* of the captured continuation when reducing suspend. It is easy to transform the presentation from the mechanisation into this one (just forget the type annotation).

- and `$ft ~~ [t1^n] -> [t2*]`

* `(prompt{<hdl>*} <instr>* end)` represents an active handler
- `(prompt{hdl*}? instr* end) : [t1*] -> [t2*]`
Contributor

Can [t1*] be non-empty? In the Iris-WasmFX mechanisation, this list is always empty… There is a mistake either here or in the mechanisation.

Member

You're correct, it's always empty, just like for other administrative block instructions.

- and `$ct2 ~~ cont $ft2`
- and `$ft2 ~~ [t1'^m] -> [t2'*]`
- and `S' = S with conts[ca] = epsilon`
- and `S'' = S' with conts += (H^ea : m)`
Contributor

Sidenote: the Iris-WasmFX mechanisation does not yet have the switch instruction. I will add it shortly, but cannot at present comment on this reduction rule. However, at first glance, it appears that this reduction rule might suffer from the same issue as the suspend rule: the tag $e should be desugared not with frame F but with the innermost frame of H.

Contributor

Update: the Iris-WasmFX mechanisation now has the switch instruction and type soundness has been proven. The issue of using the correct frame when desugaring $e is indeed present.

Contributor

@mlegoupil mlegoupil left a comment

The Iris-WasmFX mechanisation now includes the switch instruction. I have added extra comments pertaining to this.

- and `([te2*] -> [t2*] <: $ft')*`

* `(a switch)` represents a tag-switch association
- `(a switch)` and `(S.tags[b].type ~~ [] -> [te2*])*`
Contributor

These two lines make no sense, perhaps it was meant "(a switch) : [t2*] iff S.tags[a].type ~~ [] -> [t2*]"? I'm not sure what the typing rule is meant to be…

Member Author

Yes. It is a typo/mistake. It should be the trivial typing rule as you noted.

Incorporated comments from before
The biggest thing I am unsure of is lines 814 and 826. This pertains to at what point the tag argument of suspend and switch is translated into a tag address. Doing it in the reduction rule for resume (as was the case in the previous version of Explainer.md) is wrong, for reasons explained in the comment I wrote at the time. I believe I was told that in the actual implementation the change happens during the validation phase, which is why I propose these lines 814 and 826; however, my phrasing can perhaps be improved.
@mlegoupil
Contributor

I have gone into the files to make all the modifications discussed in the comments previously. I believe the operational semantics is now fixed.

The only uncertainty I have is lines 814 and 826, where I think a better phrasing might better reflect the way the implementation works. The crux is that the suspend instruction takes a tag index as an immediate argument, but when reducing a suspension, the tags in the prompt instruction are tag addresses (into the store), not indices. When does this translation from indices to addresses happen? In the previous version of the explainer, it happened within the reduction rule for suspend itself, using the wasm frame F from S, F, prompt H[suspend]. But this is wrong, since the context H may (and in practice always will) contain a function call, and hence the frame to use is not F but rather the innermost wasm frame in H.

I can see two solutions, and if I correctly remember a conversation I had months ago with Sam and Andres, the second is the one used by the actual implementation. The first solution (the one implemented by the Rocq formalisation WasmFXCert) is to have two instructions, suspend (available to the programmer, uses tag indices) and suspend.desugared (only exists at runtime, uses tag addresses), and a desugaring reduction rule (which can now use the wasm frame F directly, since the rule is focused on the suspend instruction itself rather than on suspend inside some context H). The second solution is for the change from tag indices to tag addresses to happen during the validation (type-checking) phase. This is what I wrote on lines 814 and 826, but my phrasing might be too vague, probably because I am not certain myself of what exactly happens in practice.

Apart from that point, I believe all other changes I have made should be uncontroversial and the updated explainer.md is now fixed.

@tlively
Member

tlively commented Aug 26, 2025

Real implementations will not know the tag addresses at validation or compile time because they will avoid generating different code for different instantiations of the same compiled module. The compiled code will only have the tag index, and it will have to look up the corresponding tag address on the instance at runtime. The desugaring solution therefore makes more sense to me.
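
The point about instantiation can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any engine is implemented: `Store`, `Instance`, `instantiate`, and `compiled_suspend` are hypothetical names, and tag addresses are simply positions in a list. The same `compiled_suspend` code, which only knows the tag index, resolves to different addresses depending on the instance it runs against.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Store:
    # Hypothetical store: a tag address is just a position in this list.
    tags: List[str] = field(default_factory=list)

    def alloc_tag(self, name: str) -> int:
        self.tags.append(name)
        return len(self.tags) - 1


@dataclass
class Instance:
    # Per-instance table: tag index -> tag address.
    tags: List[int]


def instantiate(store: Store, name: str, n_tags: int) -> Instance:
    # Each instantiation allocates fresh tag addresses in the shared store.
    return Instance(tags=[store.alloc_tag(f"{name}.tag{i}") for i in range(n_tags)])


def compiled_suspend(instance: Instance, tag_idx: int) -> int:
    # Stand-in for shared compiled code: only the index is baked in,
    # the address is looked up on the instance when the code runs.
    return instance.tags[tag_idx]


store = Store()
first = instantiate(store, "first", 1)
second = instantiate(store, "second", 1)
addr_first = compiled_suspend(first, 0)
addr_second = compiled_suspend(second, 0)
```

Because `addr_first` and `addr_second` differ for the same index, the index-to-address step cannot be fixed once at validation or compile time in this model.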

Incorporating Thomas Lively's comment, I have rectified the spot where suspend and switch translate the tag index into a tag address, by creating new instructions suspend.addr and switch.addr that take addresses (rather than indices) as arguments. This allows for a rule focused on translating suspend to suspend.addr (and likewise for switch), enforcing that the correct wasm frame F is used.
@mlegoupil
Contributor

With this latest commit, I now believe the explainer to be fully correct.

Thank you Thomas for your comment! What you said makes a lot of sense and I have now changed to the other solution I had suggested. Sam and I took the time to double-check that this behaviour is the one exhibited by the reference implementation.

Member

@tlively tlively left a comment

Would things be slightly simpler if we didn't store the arity of the holes in the store, either? The arities should always be computable from instruction immediates.

#### Administrative instructions

* `(ref.cont a)` represents a continuation value, where `a` is a *continuation address* indexing into the store's `conts` component
- `ref.cont a : [] -> [(ref $ct)]`
Member

There may be many valid choices of $ct here, which violates our principal typing rules. Or are those not intended to apply to administrative instructions?

Member

@rossberg, can you answer this?

Comment on lines 936 to 938
- and `hdl'*` is obtained by translating the `<tagidx>` from `hdl*` into `<tagaddr>` using `F.tag`:
- if `on $a $l` is in `hdl*` and `F.tags[$e]=ea`, then `ea $l` is in `hdl'*`
- if `on $a switch` is in `hdl'*` and `F.tags[$e]=ea`, then `ea switch` is in `hdl'*`
Member

This formulation doesn't seem to preserve the order of handlers, but the order can be important if there are multiple handlers for the same tag.

Also, I believe F.tags[$e] should be F.module.tags[$e]
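
One way to read the ordering concern is that the translation should be a pointwise map over the handler list rather than a set-membership condition. A minimal Python sketch, with hypothetical names (`translate_handlers`, `module_tags`) and handlers modelled as `(tag, target)` pairs where the target is a label or the string `"switch"`:

```python
from typing import Dict, List, Tuple

Handler = Tuple[str, str]        # (tag index, label or "switch")
RuntimeHandler = Tuple[int, str]  # (tag address, label or "switch")


def translate_handlers(hdls: List[Handler], module_tags: Dict[str, int]) -> List[RuntimeHandler]:
    # Map each clause in place, so the i-th clause of the result comes
    # from the i-th clause of the input and the order is preserved.
    return [(module_tags[idx], target) for idx, target in hdls]


# Assumption for illustration: two tag indices that alias the same address,
# which is the case where clause order decides which handler runs.
module_tags = {"$a": 3, "$b": 3}
hdls = [("$a", "$l1"), ("$b", "$l2")]
translated = translate_handlers(hdls, module_tags)
```

Here `translated` keeps `$l1` ahead of `$l2` for address 3, which a formulation stated only in terms of "is in `hdl'*`" would not guarantee.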

Member

For types, the spec defines a notion of inst_m(t) that substitutes all occurrences of type indices in t with respective defined types from the moduleinst m. We could generalise this notion to tag indices, then it would just be inst_F.module(hdl)* after generalising tagidx to taguse in the AST.

Contributor

@Alan-Liang Alan-Liang Sep 27, 2025

Should F.tags[$e] be F.tags[$a]? $e comes from nowhere. (Also applies to resume_throw.)

Contributor

Agreed, the formulation can be improved to make it explicit that the order of the handlers should be preserved. F.tags should be F.module.tags, and $e should be $a

* `S; F; (prompt{hdl1* (ea switch) hdl2*} H^ea[v^n (ref.cont ca) (switch.addr $ct ea)] end) --> S''; F; prompt{hdl1* (ea switch) hdl2*} E[v^n (ref.cont |S.conts|)] end`
- iff `S.conts[ca] = (E : n')`
- and `n' = 1 + n`
- and `ea notin tagaddr(hdl1*)`
Member

This should be (ea switch) notin hdl1*, I think. It's fine if there is an (ea $l) in hdl1*. suspend needs a similar fix.
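
The distinction between the two side conditions can be illustrated with a small Python sketch. The names `find_suspend_handler` and `handles_switch` are hypothetical, and handler clauses are modelled as `(tag address, target)` pairs with the string `"switch"` marking a switch clause. The lookup for each instruction skips clauses of the other kind, so an earlier `(ea $l)` does not shadow a later `(ea switch)` and vice versa.

```python
from typing import List, Optional, Tuple

SWITCH = "switch"
Handler = Tuple[int, str]  # (tag address, label or SWITCH)


def find_suspend_handler(hdls: List[Handler], ea: int) -> Optional[str]:
    # First clause for `ea` that carries a label; switch clauses are ignored.
    for addr, target in hdls:
        if addr == ea and target != SWITCH:
            return target
    return None


def handles_switch(hdls: List[Handler], ea: int) -> bool:
    # True if some clause is `(ea switch)`; label clauses are ignored.
    return any(addr == ea and target == SWITCH for addr, target in hdls)


label_first = [(5, "$l"), (5, SWITCH)]
switch_first = [(5, SWITCH), (5, "$l")]
```

With a condition of the form `ea notin tagaddr(hdl1*)`, the second clause in either list could never be selected, since the first clause already mentions address 5.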

Member

@rossberg rossberg left a comment

Re the index vs address issue: for types, an analogous problem exists. The way I addressed this without introducing whole lotta duplicated syntax and clumsy rules is by generalising type indices to "type uses" in the AST, which can be either indices or concrete types. During reduction, the indices are then substituted.

For tags we should do the same, that is, introduce taguse ::= tagidx | tagaddr and use that in appropriate places of the AST. Then instantiation again can simply perform substitution. (As @tlively says, this substitution cannot happen at compile time, but only after the tags are actually allocated. It would happen in the same places during reduction where you currently introduce the new syntax forms like suspend.addr and hdl'.)
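
As a rough illustration of the `taguse ::= tagidx | tagaddr` idea, here is a Python sketch. It is not the spec's definition: `TagIdx`, `TagAddr`, `Suspend`, `inst_tag`, and `inst_instr` are hypothetical names, and only a single instruction form is modelled. Substitution replaces an index by the address recorded for it and leaves an address untouched, so no separate `suspend.addr` form is needed in this model.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class TagIdx:
    index: int


@dataclass(frozen=True)
class TagAddr:
    addr: int


# A tag use is either an index (source form) or an address (after substitution).
TagUse = Union[TagIdx, TagAddr]


@dataclass(frozen=True)
class Suspend:
    tag: TagUse


def inst_tag(module_tags: List[int], use: TagUse) -> TagUse:
    # Replace an index by the address the module instance holds for it;
    # an address is already resolved and is returned unchanged.
    if isinstance(use, TagIdx):
        return TagAddr(module_tags[use.index])
    return use


def inst_instr(module_tags: List[int], instr: Suspend) -> Suspend:
    return Suspend(inst_tag(module_tags, instr.tag))


module_tags = [4, 9]  # hypothetical: tag index i lives at address module_tags[i]
resolved = inst_instr(module_tags, Suspend(TagIdx(1)))
```

Applying `inst_instr` a second time to `resolved` returns it unchanged, which is what lets substitution run wherever reduction first meets the instruction.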

label_n{instr*} H^ea end
frame_n{F} H^ea end
catch{...} H^ea end
prompt{hdl*} H^ea end (iff ea notin tagaddr(hdl*))
Member

Not sure I understand this change, you need to first compute the set of free tag addresses.

@tlively
Member

tlively commented Sep 19, 2025

@mlegoupil, will you have time to push this over the finish line soon?

- `S ::= {..., conts <cont>?*}`

* A continuation is a context annotated with its hole's arity
- `cont ::= (E : n)`
Contributor

Maybe bikeshedding: should this be called continst instead of cont? Other components in the store are all named like fooinst.

Collaborator

Yes, continst would be more consistent.

mlegoupil and others added 5 commits October 2, 2025 14:11
Committing tlively's suggested change

Co-authored-by: Thomas Lively <[email protected]>
Committing tlively's suggested change

Co-authored-by: Thomas Lively <[email protected]>
Applied all suggested changes except taguse which is the last remaining issue
@mlegoupil
Contributor

I have now incorporated all the changes mentioned above. From where I stand the PR is ready to be merged.

Member

@tlively tlively left a comment

Looks good to me, but I still wonder if we can simplify the rules (and make them better match implementations) by computing arities from immediates where they are needed rather than by looking them up and storing them in the store. Happy to discuss or consider that as a follow-up.

@slindley slindley merged commit 389f6cc into WebAssembly:main Oct 3, 2025
@slindley
Collaborator

slindley commented Oct 3, 2025

> Looks good to me, but I still wonder if we can simplify the rules (and make them better match implementations) by computing arities from immediates where they are needed rather than by looking them up and storing them in the store. Happy to discuss or consider that as a follow-up.

We can still bikeshed details in further issues / PRs, but now that we've converged on something coherent I've merged it into main.

@dhil dhil deleted the reduction-semantics branch October 6, 2025 09:35
Successfully merging this pull request may close these issues.

Put the reduction rules in the explainer
7 participants