
Motivation for externref <: anyref #143

Closed
RossTate opened this issue Sep 26, 2020 · 35 comments

Comments

@RossTate
Contributor

#130 and #142 identify some disadvantages to having externref be a subtype of anyref. I am wondering what the advantages are. Note that, after upcasting externrefs to anyrefs, there is no reliable way to downcast the resulting anyrefs back to externrefs, making the subtyping a one-way street. Consequently, it seems to me that a module mixing externrefs with its own values (uniformly represented as anyrefs) would want to box the externrefs in some manner, both to provide a way to later unbox them and to prevent bugs that could be caused by unintended/unanticipated overlaps between externref values and the module's own values. This boxing pattern would not need externref to be a subtype of anyref. And if the boxing pattern is indeed common and we leave externref as a subtype of anyref, then foreign references would first get boxed to become externrefs that are compatible with anyrefs, which would then get boxed again by the application, with the reverse happening on unboxing.
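The double-boxing concern can be modeled in a few lines of Python. This is a sketch under stated assumptions, not Wasm code: it assumes a host whose values are not all references (so the engine must box some of them to produce an anyref-compatible externref) and an application that boxes externrefs again to keep them distinct from its own values. `EngineExternBox`, `AppForeignBox`, `host_to_app`, `app_to_host`, and `box_layers` are hypothetical names and do not reflect any engine's actual representation.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class EngineExternBox:
    """Hypothetical engine-level box: wraps a host value that is not already
    a reference (e.g. a small integer) so it fits an anyref-compatible externref."""
    host_value: Any


@dataclass(frozen=True)
class AppForeignBox:
    """Hypothetical application-level box: wraps an externref so the module
    can later recognize it as foreign and unbox it."""
    externref: EngineExternBox


def host_to_app(value: Any) -> AppForeignBox:
    # Boundary crossing: host value -> externref (engine box) -> app box.
    return AppForeignBox(EngineExternBox(value))


def app_to_host(boxed: AppForeignBox) -> Any:
    # The reverse path needs two unboxing steps.
    return boxed.externref.host_value


def box_layers(x: Any) -> int:
    """Count the wrappers sitting between the application and the host value."""
    layers = 0
    while isinstance(x, (AppForeignBox, EngineExternBox)):
        layers += 1
        x = x.externref if isinstance(x, AppForeignBox) else x.host_value
    return layers
```

In a host whose values are already references, the engine-level box would disappear and only the application's own box would remain, which is the case the OP argues does not need the subtyping either.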

@rossberg
Member

If extern references weren't interchangeable with GC references, then we'd be violating a basic design goal of Wasm, namely that imports can equally be implemented by another Wasm module or primitively by the host, and that imported modules can be either host-implemented or self-hosted, and the difference is transparent. Frameworks like WASI make use of this ability, which is just a basic abstraction principle.

@jakobkummerow
Contributor

I suppose an example would be a module that's built to run in a browser context, and tracks (opaque imported extern) references to HTMLElement and various subtypes thereof; and then you might want to run that module in a headless (server?) environment where DOM interaction is stubbed out by a "fake DOM" module, which would have to be able to provide its own structs/arrays as "fake externrefs".

It took me a while to see this reasoning, but the example above I find reasonably convincing as something we'd want to support.

(I suppose the question one could ask is: how important is it to have support for subtyping hierarchies among imported opaque types? For a human-written language, one would absolutely want that; for a compiler-generated wire format like Wasm, I'm not so sure: X-to-Wasm compilers could track such types internally in order to detect typing errors early; emitting them into the Wasm module doesn't seem essential for that functionality.)

@RossTate
Contributor Author

If extern references weren't interchangeable with GC references, then we'd be violating a basic design goal of Wasm, namely that imports can equally be implemented by another Wasm module or primitively by the host, and that imported modules can be either host-implemented or self-hosted, and the difference is transparent.

@rossberg This statement seems to be predicated on the assumption that all imported types need to be a subtype of anyref to be useful. Is that true? I am asking why that subtyping helps with externref.

@jakobkummerow This seems to be an argument for why we would want to support subtyping between imported types, which is something I agree with. But I do not see how this argues for the specific subtyping externref <: anyref.

@tlively
Member

tlively commented Sep 28, 2020

we'd be violating a basic design goal of Wasm, namely that imports can equally be implemented by another Wasm module or primitively by the host... Frameworks like WASI make use of this ability, which is just a basic abstraction principle.

Actually, this goal is already violated by all existing proposals as far as WASI is concerned because they need to be able to implement their imported capabilities in terms of number types, not GC types. Fixing that is a separate discussion, but I do want to push back on the assertion that WASI depends on externref <: anyref.

Edit: Responding to @tschneidereit's later question inline up here to avoid derailing discussion further.

the intent for WASI is to use reference types instead of numbers to represent capabilities

Yes, the intent for the interface presented to client modules is that capabilities are reference types (or more specifically, opaque type imports), but for the virtualization use case, it would be much better if the virtualizing modules implementing that interface could use normal types (i.e. i32) that could be manipulated natively in their source language. If both sides of the interface were required to use reference types to represent capabilities, it would be impossible to implement a WASI interface in Rust, C, or C++ (without extremely intrusive language extensions).

@jakobkummerow
Contributor

predicated on the assumption that all imported types need to be a subtype of anyref to be useful

I see an implication in the other direction: starting with viewing "externref" as an entirely opaque reference, where a module doesn't know or care what such a reference might be referring to, implies in particular that it might be a reference to another module's GC type.
At the very least, that causes engines to have to use the same representation internally (because when the first module is compiled, the engine can't know yet what imports it will be instantiated with).
As far as the type system is concerned: such a second module needs a way to cast its own GC types to the first module's externref. I suppose we could put special magic into the cross-module type interaction system to let it bridge an otherwise un-castable divide between distinct type hierarchies. But then arguably it is cleaner and simpler to just say that there is no such divide.

@RossTate
Contributor Author

@jakobkummerow I'm not entirely sure I understand your argument, so apologies if this response is misdirected. That said, here are a few thoughts.

  1. externref does not need to be a subtype of anyref for it to be possible to instantiate an imported type with externref and with a wasm reference type without recompilation. From a GC perspective, what matters is that they are consistent with which bits represent integers versus addresses (which they currently are in V8) and with how the GC description is represented (which they currently are in V8). So, without the subtyping, it's perfectly possible to use Smis for externref and i31ref for anyref and yet use either representation for an imported type.

  2. This requirement that we be able to instantiate an imported type differently without requiring recompilation has not been particularly motivated. For practical purposes, all that seems to be useful is for multiple instantiations with closely related types to be able to reuse the same compiled binary. Closely related does not necessarily mean subtypes; it could be something abstract like "reference types".

  3. To @tlively's point, it could be more useful for externref or imported types to be able to represent non-reference types.

From an application's perspective, I suspect applications will fall into one of two camps: the application is willing to rely on or trust that the imported type follows some convention, or it is not. In the latter case, even if an imported type (or externref) is a subtype of anyref, the application will box and unbox externref values manually in order to keep foreign values distinct from its own. In the former case, the application's needs are likely met by importing coercions between the imported type and the type representing the convention (whether it is anyref or some more specific type). The imported coercion from the imported type to the convention type could just be an upcast (i.e. a no-op), and the reverse direction could just be a downcast (i.e. an rtt cast). Either way, subtyping does not seem particularly necessary or helpful. Actually, there's a third camp: the application makes no use of the GC proposal, e.g. all current applications. Pragmatically speaking, though, this camp is akin to the boxing/unboxing camp.
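The "importing coercions" alternative can be sketched as a pair of functions the embedder supplies at link time instead of a subtype relation. This is a hedged Python model, not a proposed Wasm API: `AppValue`, `AppString`, and `import_coercions` are hypothetical names, and returning `None` on a failed downcast is an arbitrary choice (a trapping cast would fit the argument equally well).

```python
from typing import Callable, Optional, Tuple, Type


class AppValue:
    """Hypothetical root of the application's own value convention
    (e.g. every value carries a vtable or a type-id header)."""


class AppString(AppValue):
    def __init__(self, text: str) -> None:
        self.text = text


def import_coercions(
    convention: Type[AppValue],
) -> Tuple[Callable[[object], object], Callable[[object], Optional[AppValue]]]:
    """Hypothetical stand-in for coercions supplied by the embedder."""

    def to_convention(x: object) -> object:
        # Imported type -> convention type: a no-op (an upcast) when the
        # imported type was instantiated with a conforming type.
        return x

    def from_convention(x: object) -> Optional[AppValue]:
        # Convention type -> imported type: a checked downcast, analogous to
        # an rtt cast; yields None when the value does not conform.
        return x if isinstance(x, convention) else None

    return to_convention, from_convention
```

Because both directions are ordinary imports, a host whose values do not follow the convention could supply boxing coercions instead, without the module's code changing.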

@tschneidereit
Member

Actually, this goal is already violated by all existing proposals as far as WASI is concerned because they need to be able to implement their imported capabilities in terms of number types, not GC types.

Can you expand on this a bit? I'm probably missing something, since the intent for WASI is to use reference types instead of numbers to represent capabilities. Wherever numbers are needed I'd expect WASI to stick to Wasm core types.

@RossTate
Contributor Author

RossTate commented Sep 28, 2020

In order to avoid going into the specifics of WASI, I think the meta-point is that a good compositional/modular design would neither require us to consider WASI's implementation strategy nor impose constraints on it. Removing the externref <: anyref (or $import-type <: anyref) subtyping removes the dependency, which is why it would be nice to have a solid motivation for having that subtyping.

@rossberg
Member

rossberg commented Oct 1, 2020

@tlively, I am a bit astounded by what you're saying. The WASI (and WASI-like) use case was the main motivation for the CG to ask to split type imports from the GC proposal. So it is highly relevant to this question. And the ability to transparently virtualise/polyfill/self-host system modules (and vice versa) has always been a goal for Wasm and came up regularly in CG discussions. How do you suggest reconciling this goal without making extern refs and Wasm refs interchangeable?

@RossTate
Contributor Author

RossTate commented Oct 1, 2020

@rossberg The OP is very clearly questioning (with arguments) whether the subtyping in question is practically useful for this interchangeability. It is frustrating to have you equate removing the subtyping in question with removing interchangeability without addressing, let alone acknowledging, the points in the OP. A much more helpful way to advance the conversation would be to give a concrete example illustrating why you believe these to be effectively the same. If they're obviously effectively the same, then providing such an example should be easy. Ideally, this example should not work just as easily with importing coercions, since coercions do not have the downsides of #130 and #142.

@rossberg
Member

rossberg commented Oct 1, 2020

The need for interchangeable representations does not generally imply a need for downcasts, especially in the type import scenario, which is essentially parametric polymorphism. So my observation here is that we gain nothing by assuming a distinction in the subtype relation -- the representations already have to be compatible anyway, and casting is a separate concern.

Even when using subtyping and downcasts, the side for which the ability to downcast primarily matters is the one producing the values. And that side always has a choice to make that possible. If it is Wasm code, it can create values with appropriate RTTs. If it is the host, then it has full freedom to use extra-linguistic means and enrich its own object representation to allow recognising their types.

Furthermore, a host environment also has the choice of enabling custom object layouts which the Wasm-side can view as RTT-carrying. In general, it can be useful to enable Wasm to downcast host objects to concrete import types, by importing a respective host RTT. OTOH, it is hardly useful to downcast to an abstract type like externref itself. (FWIW, in discussions with @jakobkummerow and @tebbi we recently concluded that the RTT mechanism and the actual cast instruction only needs to work for concrete data types, and I am working on a change to simplify the proposal that way. More on this later.)

@RossTate
Contributor Author

RossTate commented Oct 1, 2020

It sounds like you're saying you have no application for externref <: anyref. Rather, your argument for the subtyping is "the representations already have to be compatible anyway". This argument was already addressed above (point 1 specifically), and the OP discussed the costs imposed by having an unnecessary subtyping (e.g. double boxing).

@tlively
Member

tlively commented Oct 1, 2020

@tlively, I am a bit astounded by what you're saying. The WASI (and WASI-like) use case was the main motivation for the CG to ask to split type imports from the GC proposal.

AFAICT, WASI needs non-reference type imports to realize their goal of virtualizing interfaces that deal in abstract types from C/C++/Rust, since it is unlikely that Rust will ever have first-class support for reference-typed values. I thought I had raised this issue before, but I guess not because now I can't find any discussion about it.

At any rate, I think the answer to the original question here is clear: given the currently proposed type-import mechanism, externref <: anyref is necessary to make externrefs and GC refs interchangeable. Any dissatisfaction with the premise of that answer should be taken up in separate issues on the type-imports repo.

@titzer
Contributor

titzer commented Oct 2, 2020

@RossTate

Note that, after upcasting externrefs to anyrefs, there is no reliable way to downcast the resulting anyrefs back to externrefs, making the subtyping a one-way street.

This isn't really different than any other subtyping relationship currently proposed for references: upcasts are free (subsumption, no representation change) and downcasts require an RTT witness value that has an RTT type which literally encodes a nominal subtyping chain. The GC proposal doesn't have RTT types for externref, but I don't understand why it couldn't have one, whose witness value could be gotten with rtt.canon externref. Failing that, one can always import a downcast operation (i.e. a function import) from the host environment.
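To make "an RTT type which literally encodes a nominal subtyping chain" concrete, here is a simplified Python model, assuming each RTT records its parent. `Rtt`, `Obj`, `upcast`, and `ref_cast` are hypothetical names, not instructions from the GC proposal. Storing the whole supertype chain so that a downcast is one indexed comparison is a common implementation technique, not something the proposal mandates.

```python
from typing import List, Optional


class Rtt:
    """Hypothetical runtime type witness; its parent chain encodes the
    declared (nominal) subtyping."""

    def __init__(self, name: str, parent: Optional["Rtt"] = None) -> None:
        self.name = name
        self.depth = 0 if parent is None else parent.depth + 1
        # Full supertype chain, root first, ending with this RTT itself.
        self.supers: List["Rtt"] = ([] if parent is None else parent.supers) + [self]


class Obj:
    """A heap object tagged with the RTT it was allocated with."""

    def __init__(self, rtt: Rtt) -> None:
        self.rtt = rtt


def upcast(o: Obj) -> Obj:
    # Subsumption: no check and no representation change.
    return o


def ref_cast(o: Obj, target: Rtt) -> Obj:
    """Downcast: succeeds iff target sits in o's chain at target's depth."""
    chain = o.rtt.supers
    if target.depth < len(chain) and chain[target.depth] is target:
        return o
    raise TypeError(f"cast to {target.name} failed")
```

The model only covers values allocated with a known RTT; whether host values reaching Wasm as externref carry such a witness is exactly what the following comments dispute.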

@titzer
Contributor

titzer commented Oct 2, 2020

@RossTate

Consequently, it seems to me that a module mixing externrefs with its own values (uniformly represented as anyrefs) would want to box the externrefs in some manner, both to provide a way to later unbox them and to prevent bugs that could be caused by unintended/unanticipated overlaps between externref values and the module's own values.

Sorry for splitting replies across multiple comments.

I don't find arguments of the form "modules will definitely want to do X because reasons Y" compelling in general. That is because we cannot realistically envision all possible use cases or why a module producer would do a particular thing. We shouldn't put producers in a box (pun intended). What that means is that we shouldn't be in the business of forcing modules to do specific things to get around Wasm's inadequacy, in particular, to box things (i.e. leaving them no option except boxing). Forcing modules to box values is a far worse sin than forcing dynamic casts (both here, and in other discussions such as how we deal with dispatch sequences). Boxing leads to a lot of garbage production that is a lot more expensive in total than dynamic casts. This is especially true because engines that inline aggressively can often eliminate casts through type analysis, whereas boxes escape into user data structures and cannot be optimized away so easily. Concretely, forcing modules to box externref into a user-created box in order to mix them with other references wastes space. Wasm data structures that hold references to JavaScript objects will be penalized and can be less efficient than data structures built in JavaScript. Forcing a box here is a hard fail.

The point of putting anyref at the top of a subtyping hierarchy of all references is to provide programs with a representational type for building data structures that can hold any value whose representation is a reference. That includes values from the host environment that happen to be references. Moreover, they happen to be references that the engine is able to distinguish from Wasm objects by their bits alone. It's right there in the name. externref is for external references. It's not for external values, just external references.

The only way forward for actually representing fully abstract external values is to allow type imports that have unknown representation, i.e. allow true parametric polymorphism. To support this, we need an embedding API mechanism for staging, because engines simply cannot compile machine code for imported types until the machine representation of imported types is known. If we can stage imports so that types' representations are known at compile time, engines can produce machine code without even having the full type import yet.

There are a lot of good arguments for fully abstract type imports, and there are discussions over on that repo, e.g. this one I raised: WebAssembly/proposal-type-imports#10.

In summary, I now no longer see any problems with externref's "arbitrary bits" getting mixed up with Wasm reference bit patterns. externrefs cannot be arbitrary bits.

@RossTate
Contributor Author

RossTate commented Oct 2, 2020

The GC proposal doesn't have RTT types for externref, but I don't understand why it couldn't have one, whose witness value could be gotten with rtt.canon externref. Failing that, one can always import a downcast operation (i.e. a function import) from the host environment.

Programs cannot assume externref values have the rtt.canon externref rtt. For example, in any JS API that is not boxing JS/wasm reference values at the JS/wasm boundary, all anyref values are also externref values. Note that this makes an imported downcaster fairly useless, at least if your program cares about correctness.

Realistically speaking, every GC language will box externref values in order to conform to the invariants of their own values, whether those invariants are to have a v-table or to have an int type-id header or to conform to a particular rtt convention. In all these cases, the externref <: anyref subtyping is not helpful. On the other hand, this subtyping has costs, as discussed above, and restricts the values that externref can represent. Ironically, the languages that are best positioned to reason about externref values directly (without boxing them) are non-GC languages, which also have no need for these values to be references.

As @tlively points out, the need for this subtyping stems from the limitation of the Type Imports proposal that requires all imported types to be subtypes of anyref. (@tlively, the discussion you were looking for is in WebAssembly/type-imports#6.) As @titzer points out, that limitation stems from limitations in the compilation model. At the in-person meeting, I pointed out that limitations in the compilation model also preclude common abstractions used for encapsulation and separate compilation like private fields and structure extension (which reappeared in #102). So I wonder if a real issue underlying various discussions is actually that the type imports and GC proposals have been designed for a compilation model that does not serve type imports or GC languages particularly well.

@titzer
Contributor

titzer commented Oct 2, 2020

Programs cannot assume externref values have the rtt.canon externref rtt. For example, in any JS API that is not boxing JS/wasm reference values at the JS/wasm boundary, all anyref values are also externref values. Note that this makes an imported downcaster fairly useless, at least if your program cares about correctness.

The RTT for externref is obviously a marker statically known to the Wasm engine and rtt.canon externref is just a way to get a hold of the witness. It just makes downcasting an externref symmetric with other casts. A downcast to externref can be statically determined by an engine and can be compiled entirely differently from other RTT-based casts.

I am not sure what you mean by correctness, there isn't any context.

Realistically speaking, every GC language will box externref values in order to conform to the invariants of their own values, whether those invariants are to have a v-table or to have an int type-id header or to conform to a particular rtt convention. In all these cases, the externref <: anyref subtyping is not helpful.

It seems like you did not read my comment so let me repeat it:

I don't find arguments of the form "modules will definitely want to do X because reasons Y" compelling in general.

I gave a concrete example of why subtyping is useful: it avoids boxing. Boxing leads to an inefficiency that actually puts Wasm at a disadvantage to the host language and is not an easily fixable cost.

On the other hand, this subtyping has costs, as discussed above, and restricts the values that externref can represent.

Again I addressed this in my comment, which I will now repeat:

It's right there in the name. externref is for external references. It's not for external values, just external references.

If you want to represent external values, we need fully abstract type imports.

@RossTate
Contributor Author

RossTate commented Oct 2, 2020

The RTT for externref is obviously a marker statically known to the Wasm engine and rtt.canon externref is just a way to get a hold of the witness. It just makes downcasting an externref symmetric with other casts. A downcast to externref can be statically determined by an engine and can be compiled entirely differently from other RTT-based casts.

In the current JS API, an externref value can be a funcref or an i31ref. There is no rtt in common across externref values.

I am not sure what you mean by correctness, there isn't any context.

It is difficult to write a correct and useful program that mixes foreign values with its own without a reliable way to distinguish foreign values from its own or to cast foreign values back to externref.


We can sort hosts into roughly two categories: 1) reference values conform to wasm conventions, and 2) reference values do not conform to wasm conventions. We can sort applications into roughly two categories: a) directly mixes externref values with own values, and b) boxes externref values when mixing with own values.

1a. Can utilize subtyping, but purposes are just as efficiently served by pre-importing coercions.
1b. Has no need for subtyping or coercions, but is not harmed by them either.
2a. Subtyping forces boxing to happen. Coercions would do the same.
2b. Subtyping forces boxing to happen but without any benefits.

In all four scenarios, subtyping is either matched or outperformed by optionally pre-imported coercions. Coercions are also more flexible as they can work for arbitrary foreign values (like smis), not just foreign references. The JS API falls between 1 and 2 (say 1.5), and most applications fall in b (for the reasons I gave above). For 1.5b, subtyping is slower, not faster. This suggests there is no case where subtyping helps performance, and for the common case subtyping hurts performance.

@titzer
Contributor

titzer commented Oct 6, 2020

In the current JS API, an externref value can be a funcref or an i31ref. There is no rtt in common across externref values.

There must be a weird hole here that we missed, because funcref as a wasm type is not assignable to a externref, but apparently you are saying that a funcref can escape to JS and then return as an externref. I haven't looked at the JS API since before reference types made it out of phase 4, but this seems like that's already a problem.

We can sort hosts into roughly two categories: 1) reference values conform to wasm conventions, and 2) reference values do not conform to wasm conventions. We can sort applications into roughly two categories: a) directly mixes externref values with own values, and b) boxes externref values when mixing with own values.

I am saying that category 2 is not served by externref at all. If we have this magic pre-import wand, then you might as well pre-import a (non-reference) host type that is completely abstract. There's no point in even having a wasm value for something which is clearly not a wasm value. Besides, importing an abstract type with unknown representation but binding the representation at compile time basically specializes the module's code to the representation of that type. To see why that is more powerful, just consider the possibility of importing a 256-bit vector SIMD type and operations on it; its representation is not the same as any Wasm value today. Host types are like that; they are completely outside of Wasm's representations.

I'd like to restate this yet again, externref is not the thing for representing arbitrary host values, it's for external references.

@rossberg
Member

rossberg commented Oct 6, 2020

There must be a weird hole here that we missed, because funcref as a wasm type is not assignable to a externref, but apparently you are saying that a funcref can escape to JS and then return as an externref. I haven't looked at the JS API since before reference types made it out of phase 4, but this seems like that's already a problem.

That's not a problem, because the Wasm/JS boundary is coercive, and the interpretation is controlled by the Wasm type. There is no problem with allowing the same JS value to be viewed as multiple unrelated Wasm types -- that's already the case, e.g., a JS number can become any Wasm number type, even involving different coercions. Now it can also become an externref.

We only have to be careful when we add subtyping to the picture. Then we have to make sure that the coercions defined at the boundary are coherent, i.e., remain the same across subtype relations. If you have anyref (and e.g. funcref <: anyref), that implies that an exported Wasm function must be coerced the same way as an externref vs a funcref, because their representations must be interchangeable. But it can still be treated as both. (Technically, that will require some tweaks to the current formulation of the coercion in the JS API, but that's fine as is for now because no relation between funcref and externref is observable within Wasm yet.)

(Side note: The reference types proposal is conservative in that it only allows exported Wasm functions to be used as funcref, not arbitrary JS functions. So an implementation could avoid boundary coercions on functions altogether if it chooses to represent Wasm functions the same internally and externally. That may or may not be a useful tradeoff for some implementations.)

I'd like to restate this yet again, externref is not the thing for representing arbitrary host values, it's for external references.

Exactly. And it is merely an artefact (or convenient "coincidence") of the JavaScript environment that we can happily treat all JS values as references, because JS depends on a uniform representation already, so practically speaking they all are heap references already (or have to be representable as such, local unboxing optimisations notwithstanding).

@RossTate
Contributor Author

RossTate commented Oct 6, 2020

As @rossberg suggests, we want a coherent coercion system for our embedders. You can prove that any well-behaved coercion system will preserve rtts of wasm references even as externref values. So it is impossible to have a single rtt for all externref values.

If we have this magic pre-import wand, then you might as well pre-import a (non-reference) host type that is completely abstract.

Right. Similarly, if we remove the externref <: anyref subtyping, then externref can denote arbitrary external values.


An easy way to show that this subtyping is useful is to provide an application that uses it in a way that is not served just as well (or better) by pre-importing coercions. As just established, the correctness of this application cannot assume there is a single rtt for all externref values.

@titzer
Contributor

titzer commented Oct 7, 2020

Right. Similarly, if we remove the externref <: anyref subtyping, then externref can denote arbitrary external values.

No, because externref has only a single representation, and arbitrary external values can have arbitrary (and different) representations. This is why I gave the example of a 256-bit vector type. Obviously, we are not going to make the representation of externref 256 bits in order to support wider vectors (or some other compound value type), or make it any other size than a reference.

So another way to say the above is that externref is for external references.

I feel if we keep disagreeing on the above point then the conversation isn't making progress.

@RossTate
Contributor Author

RossTate commented Oct 7, 2020

So another way to say the above is that externref is for external references.

@titzer I understand that this is what you intended for externref. But because of how the JS API for externref is already defined in the Reference Types proposal, because we want coercions to be coherent with respect to subtyping, and because externref is a subtype of anyref, it is the case that externref will in actuality represent arbitrary wasm/JS references. That is, there is no semantic difference between anyref and externref, at least in the JS API (or more generally in any embedder whose coercions are coherent and satisfy certain code-migration principles).

The only way to have a semantic difference between anyref and externref is to remove the subtyping between them. This in turn gives the engine complete freedom to determine how to implement/represent externref values (so long as there is a testable default value). Yes, engines could use this freedom to do weird things like make externref a 256-bit type, but more likely they'll use it to do sensible things, like representing JS values without having to box smis and unbox i31refs (or to make sure all JS references conform to the layout expected by rtt casting), or denoting whatever WASI chooses to represent capabilities and the like.

@rossberg
Member

rossberg commented Oct 7, 2020

As @rossberg suggests, we want a coherent coercion system for our embedders. You can prove that any well-behaved coercion system will preserve rtts of wasm references even as externref values. So it is impossible to have a single rtt for all externref values.

Correct, but nor is that needed.

it is the case that externref will in actuality represent arbitrary wasm/JS references.

The interpretation of externref is completely up to the embedder. That it includes all values from anyref is one possible interpretation in the JS API, but would not be the only one. Nor is it necessarily the case for other embeddings. For example, in the C API, this would not be the natural interpretation.

The fact that we have both types with a fuzzy distinction is one of the warts that resulted from deferring subtyping. Otherwise we wouldn't have needed to introduce externref in the first place.

Right. Similarly, if we remove the externref <: anyref subtyping, then externref can denote arbitrary external values.

No, for the reasons I already stated upthread (representation independence for imports). It would simply be an entirely pointless complication to the language.

Like @titzer has pointed out, externref denotes a reference, not an arbitrary value. Abstracting over arbitrary value types, and thus value representations, would be a separate feature, and only possible with preimports.

@RossTate
Contributor Author

RossTate commented Oct 7, 2020

That it includes all values from anyref is one possible interpretation in the JS API, but would not be the only one. Nor is it necessarily the case for other embeddings. For example, in the C API, this would not be the natural interpretation.

Currently WebAssembly/wasm-c-api#148, which you approved, has externref representing arbitrary wasm reference values (including funcref).

only possible with preimports

Many very useful things seem to be only possible with preimports. It seems like this might even be necessary to properly support one of Type Imports' intended use cases. So let's ship the feature rather than use limitations of the current design of a Phase 1 proposal to argue that a subtyping is necessary.


In the meanwhile, no application has been provided demonstrating the utility of this subtyping. All the suggested utility seems to come from having a way to coerce host values to anyref in the embedding API. But having an anyref-subtype externref that denotes exactly the same values in the most prominent embedding APIs seems to be useless. Meanwhile, removing the subtyping lets externref represent host values without coercing them into anyref values.

@titzer
Contributor

titzer commented Oct 8, 2020

In the meanwhile, no application has been provided demonstrating the utility of this subtyping.

This was actually mentioned before. The subtyping avoids boxing host references when mixing them with Wasm references. E.g. simply having an array of anyref allows one to mix both host references and Wasm references with no boxing overhead. As per my previous comment, I consider forcing user-level boxing to be a significant, avoidable cost that cannot be eliminated by the engine.
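To make the boxing cost in this argument concrete, here is a minimal Python model (not Wasm code). It assumes hypothetical classes `HostRef` (standing in for a host/JS reference), `WasmStruct` (a Wasm GC reference) and `ExternBox` (the user-level wrapper a module would need without externref <: anyref), and just counts the extra wrapper objects allocated when storing a mixed sequence into an "anyref array" under each design. It deliberately ignores whether downcasts back out of anyref are possible, which is one of the points disputed in this thread.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class HostRef:
    """Hypothetical stand-in for a host (e.g. JS) reference."""
    name: str


@dataclass
class WasmStruct:
    """Hypothetical stand-in for a Wasm GC struct reference."""
    value: int


@dataclass
class ExternBox:
    """Hypothetical user-level box needed if externref were NOT a subtype of anyref."""
    inner: HostRef


Input = Union[HostRef, WasmStruct]
AnyRef = Union[HostRef, WasmStruct, ExternBox]


def store_with_subtyping(items: List[Input]) -> Tuple[List[AnyRef], int]:
    # externref <: anyref: host references are stored in the anyref array as-is.
    return list(items), 0


def store_without_subtyping(items: List[Input]) -> Tuple[List[AnyRef], int]:
    # No subtyping: every host reference gets a fresh wrapper before it is stored.
    out: List[AnyRef] = []
    boxes = 0
    for item in items:
        if isinstance(item, HostRef):
            out.append(ExternBox(item))
            boxes += 1
        else:
            out.append(item)
    return out, boxes


def unbox(ref: AnyRef) -> Input:
    # The wrapper doubles as a tag for recovering the original host reference.
    return ref.inner if isinstance(ref, ExternBox) else ref


mixed = [HostRef("window"), WasmStruct(1), HostRef("document"), WasmStruct(2)]
direct, extra = store_with_subtyping(mixed)        # extra == 0
boxed, extra_boxed = store_without_subtyping(mixed)  # extra_boxed == 2
```

The trade-off the model surfaces is the one argued on both sides: the direct design saves one allocation per host reference, while the boxed design pays that cost but gains a reliable way to tell host references apart from the module's own values.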

@aardappel
Copy link

E.g. simply having an array of anyref allows one to mix both host references and Wasm references with no boxing overhead.

That's a pretty abstract reason though. What are "must have" use cases for such arrays? An array of things you can do absolutely nothing with (without downcasting) doesn't sound that useful to me. In some languages these kinds of arrays exist because you'd dynamically dispatch on the elements, but you can't do that here either. If such arrays would cause boxing, would people instead use 2 arrays to avoid it, or would they not care about the boxing since these use cases are rare and/or not performance sensitive? etc.

A better example may be a "userdata" or "parent" pointer of some very generic structure that must be able to hold any kind of ref, but again, I'm not sure that is frequent enough for boxing to be a deal breaker. Code that is so generic it must deal with arbitrary JS and Wasm GC objects is likely not the bottleneck of your code.

@titzer
Contributor

titzer commented Oct 8, 2020

Unfortunately it's really hard to predict what bottlenecks applications will have, so one of the common themes of design work in Wasm has been to avoid adding expensive things. To this end, a primary design requirement of the Wasm GC proposal (i.e. here) is to provide statically-typed data structures that do not need to unnecessarily tag or box fields, so that applications are space-efficient. Forcing applications to box externref in order to mix it with wasm references is counter to that.

@RossTate
Contributor Author

RossTate commented Oct 8, 2020

@titzer As @aardappel points out, that example is abstract. It does not illustrate how the subtyping is helpful; it does not demonstrate that anything useful can be done; and it seems to be addressed by a coercion to anyref. I understand the abstract goal you have in mind, but I have also illustrated why I believe the concrete subtyping externref <: anyref does not help that goal (and even hurts it). I, myself, found this counterintuitive, which is why I am skeptical of arguments that appeal to intuition. So a concrete example, rather than an abstract example that appeals to intuition, would be much more helpful in moving the conversation forward.

@aardappel

aardappel commented Oct 9, 2020

@titzer I am all for avoiding expensive things in Wasm, but seems to me here we have a case where making one thing cheaper makes another more expensive and vice versa. So we need to try and estimate expected cost.

@RossTate
Contributor Author

#150 makes it so that downcasts to externref are impossible, so that upcasting to anyref is purely a one-way street (without importing coercions). The discussion of #150 also clarifies that externref has no utility relative to anyref in at least the JS API. Using externref in the C API is likely to be similarly problematic given that it already defines externref to be any wasm reference and that C-host references will certainly need to be boxed before being treatable as wasm references.

On the other hand, removing the externref <: anyref subtyping enables externref to actually denote external (reference) values (of some particular bit-width chosen by the engine) in their native form. This would make the common pattern for using external (reference) values described in the OP more efficient as the engine no longer needs to convert/box external (reference) values to look like anyref values (which the OP pattern does not need anyways).

@rossberg
Member

rossberg commented Oct 22, 2020

#150 makes it so that downcasts to externref are impossible, so that upcasting to anyref is purely a one-way street (without importing coercions).

That is incorrect. You cannot cast to externref itself, but you can downcast to subtypes of it just fine when given a respective RTT. (Edit: in cases where externref would overlap with a dataref or funcref.)

But more to the point, we are going in circles. You are merely repeating your claims from above, which are based on the implicit assumption that subtyping is only useful if there are downcasts. That assumption is false, as I already pointed out upthread. The primary reason for externref-anyref representation compatibility is imports, and the ability to import an unknown reference type and supply an externref for it. As was also explained above, that has to work and does not generally require downcasts. And since the compatibility is required, there is zero benefit in not explaining/exposing it simply as subtyping.

@bvibber

bvibber commented Oct 22, 2020

I've become convinced by the above and the discussion on #150 that externref has no usefulness for the JS embedding, as anyref is more flexible and works as expected. Thus its subtyping relationship is irrelevant, as it's just a spec wart that won't get used.

It sounds like it might be useful on other specific embeddings? Are there example scenarios where native references would be larger than Wasm references, etc?

rossberg pushed a commit that referenced this issue Feb 24, 2021
Fixes #142. A mismatched `DataCount` is malformed, not a validation error.
@RossTate
Contributor Author

RossTate commented Sep 7, 2021

Following today's discussion, it seems odd that a language created for the web has no type that (in the JS embedder) can denote arbitrary JavaScript values in whatever representation engines deem most natural for them. From what I understand, externref could serve that purpose if not for its subtyping relation with anyref. And from the above discussion, I don't see a particularly compelling realistic use case for this subtyping. On the other hand, being able to import a type representing a JavaScript struct, along with (externref-typed) field accessors for that imported type, seems like a useful (Post-MVP) path to leave open for bidirectional interop. So I would recommend removing this subtyping.

@RossTate RossTate mentioned this issue Nov 9, 2021
@tlively
Member

tlively commented Feb 10, 2022

#271 makes this issue (as originally stated) obsolete, so I'll close it. See #254 (comment) for the latest information necessary to continue this discussion in a new issue.

@tlively tlively closed this as completed Feb 10, 2022
8 participants