Motivation for externref <: anyref #143
If extern references weren't interchangeable with GC references, then we'd be violating a basic design goal of Wasm, namely that imports can equally be implemented by another Wasm module or primitively by the host, and that imported modules can be either host-implemented or self-hosted, and the difference is transparent. Frameworks like WASI make use of this ability, which is just a basic abstraction principle. |
I suppose an example would be a module that's built to run in a browser context, and tracks (opaque imported extern) references to It took me a while to see this reasoning, but the example above I find reasonably convincing as something we'd want to support. (I suppose the question one could ask is: how important is it to have support for subtyping hierarchies among imported opaque types? For a human-written language, one would absolutely want that; for a compiler-generated wire format like Wasm, I'm not so sure: X-to-Wasm compilers could track such types internally in order to detect typing errors early; emitting them into the Wasm module doesn't seem essential for that functionality.) |
@rossberg This statement seems to be predicated on the assumption that all imported types need to be a subtype of @jakobkummerow This seems to be an argument for why we would want to support subtyping between imported types, which is something I agree with. But I do not see how this argues for the specific subtyping |
Actually, this goal is already violated by all existing proposals as far as WASI is concerned because they need to be able to implement their imported capabilities in terms of number types, not GC types. Fixing that is a separate discussion, but I do want to push back on the assertion that WASI depends on Edit: Responding to @tschneidereit's later question inline up here to avoid derailing discussion further.
Yes, the intent for the interface presented to client modules is that capabilities are reference types (or more specifically, opaque type imports), but for the virtualization use case, it would be much better if the virtualizing modules implementing that interface could use normal types (i.e. i32) that could be manipulated natively in their source language. If both sides of the interface were required to use reference types to represent capabilities, it would be impossible to implement a WASI interface in Rust, C, or C++ (without extremely intrusive language extensions). |
I see an implication in the other direction: starting with viewing "externref" as an entirely opaque reference, where a module doesn't know or care what such a reference might be referring to, implies in particular that it might be a reference to another module's GC type. |
@jakobkummerow I'm not entirely sure I understand your argument, so apologies if this response is misdirected. That said, here are a few thoughts.
From an application's perspective, I suspect applications will fall into one of two camps: the application is willing to rely on or trust that the imported type follows some convention, or it is not. In the latter case, even if an imported type (or |
Can you expand on this a bit? I'm probably missing something, since the intent for WASI is to use reference types instead of numbers to represent capabilities. Wherever numbers are needed I'd expect WASI to stick to Wasm core types. |
In order to avoid going into the specifics of WASI, I think the meta-point is that a good compositional/modular design would not require us to consider WASI's implementation strategy nor would impose constraints on WASI's implementation strategy. Removing the |
@tlively, I am a bit astounded by what you're saying. The WASI (and WASI-like) use case was the main motivation for the CG to ask to split type imports from the GC proposal. So it is highly relevant to this question. And the ability to transparently virtualise/poly-fill/self-host system modules (and vice versa) has always been a goal for Wasm and came up regularly in CG discussions. How do you suggest reconciling this goal without making extern refs and Wasm refs interchangeable? |
@rossberg The OP is very clearly questioning (with arguments) whether the subtyping in question is practically useful for this interchangeability. It is frustrating to have you equate removing the subtyping in question with removing interchangeability without addressing, let alone acknowledging, the points in the OP. A much more helpful way to advance the conversation would be to give a concrete example illustrating why you believe these to be effectively the same. If they're obviously effectively the same, then providing such an example should be easy. Ideally this example should not just as easily work with importing coercions, since coercions do not have the downsides of #130 and #142. |
The need for interchangeable representations does not generally imply a need for downcasts, especially in the type import scenario, which is essentially parametric polymorphism. So my observation here is that we gain nothing by assuming a distinction in the subtype relation -- the representations already have to be compatible anyway, and casting is a separate concern. Even when using subtyping and downcasts, the side for which the ability to downcast primarily matters is the one producing the values. And that side always has a choice to make that possible. If it is Wasm code, it can create values with appropriate RTTs. If it is the host, then it has full freedom to use extra-linguistic means and enrich its own object representation to allow recognising their types. Furthermore, a host environment also has the choice of enabling custom object layouts which the Wasm-side can view as RTT-carrying. In general, it can be useful to enable Wasm to downcast host objects to concrete import types, by importing a respective host RTT. OTOH, it is hardly useful to downcast to an abstract type like |
It sounds like you're saying you have no application for |
AFAICT, WASI needs non-reference type imports to realize their goal of virtualizing interfaces that deal in abstract types from C/C++/Rust, since it is unlikely that Rust will ever have first-class support for reference-typed values. I thought I had raised this issue before, but I guess not because now I can't find any discussion about it. At any rate, I think the answer to the original question here is clear: given the currently proposed type-import mechanism, |
This isn't really different than any other subtyping relationship currently proposed for references: upcasts are free (subsumption, no representation change) and downcasts require an RTT witness value that has an RTT type which literally encodes a nominal subtyping chain. The GC proposal doesn't have RTT types for |
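The cast model described in this comment (free upcasts, downcasts that need an RTT witness encoding a nominal subtyping chain) can be sketched roughly as follows. This is only an illustrative model in TypeScript, not the GC proposal's actual semantics or any engine's implementation: `Rtt`, `Ref`, `upcast`, and `downcast` are hypothetical names, and a real engine would likely use a constant-time check rather than walking the chain.

```typescript
// Hypothetical model of an RTT: a runtime descriptor with a link to its parent,
// so each RTT encodes its nominal supertype chain.
class Rtt {
  constructor(readonly name: string, readonly parent: Rtt | null = null) {}
}

// A reference value carries the RTT it was allocated with.
interface Ref {
  readonly rtt: Rtt;
}

// Upcast: pure subsumption, the value is returned unchanged.
function upcast(ref: Ref): Ref {
  return ref;
}

// Downcast: succeeds only if the witness appears in the value's RTT chain.
function downcast(ref: Ref, witness: Rtt): Ref | null {
  for (let r: Rtt | null = ref.rtt; r !== null; r = r.parent) {
    if (r === witness) return ref;
  }
  return null;
}

const shapeRtt = new Rtt("shape");
const circleRtt = new Rtt("circle", shapeRtt);
const squareRtt = new Rtt("square", shapeRtt);

const circle: Ref = { rtt: circleRtt };
const asShape = downcast(upcast(circle), shapeRtt);   // witness is in the chain
const asSquare = downcast(upcast(circle), squareRtt); // unrelated witness
```

In this model a value with no RTT at all could never be the target of a downcast, which is one way to read the "one-way street" concern raised in the original post.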
Sorry for splitting replies across multiple comments. I don't find arguments of the form "modules will definitely want to do X because reasons Y" compelling in general. That is because we cannot realistically envision all possible use cases or why a module producer would do a particular thing. We shouldn't put producers in a box (pun intended). What that means is that we shouldn't be in the business of forcing modules to do specific things to get around Wasm's inadequacy, in particular, box things (i.e. leaving them no options except boxing). Forcing modules to box values is a far worse sin than forcing dynamic casts (both here, and in other discussions such as how we deal with dispatch sequences). Boxing leads to a lot of garbage production that is a lot more expensive in total than dynamic casts. Especially because engines that inline a lot can do type analysis to eliminate casts in many cases but boxes escape into user data structures and cannot be optimized away so easily. Concretely, forcing modules to box The point of putting The only way forward for actually representing fully abstract external values is to allow type imports that have unknown representation, i.e. allow true parametric polymorphism. To support this, we need an embedding API mechanism for staging, because engines simply cannot compile machine code for imported types until the machine representation of imported types is known. If we can stage imports so that types' representations are known at compile time, engines can produce machine code without even having the full type import yet. There are a lot of good arguments for fully abstract type imports and there are discussions over on that repo, e.g. this one I raised: WebAssembly/proposal-type-imports#10. In summary, I now no longer see any problems with |
Programs cannot assume Realistically speaking, every GC language will box As @tlively points out, the need for this subtyping stems from the limitation of the Type Imports proposal that requires all imported types to be subtypes of |
The RTT for I am not sure what you mean by correctness, there isn't any context.
It seems like you did not read my comment so let me repeat it:
I gave a concrete example of why subtyping is useful: it avoids boxing. Boxing leads to an inefficiency that actually puts Wasm at a disadvantage to the host language and is not an easily fixable cost.
Again I addressed this in my comment, which I will now repeat:
If you want to represent external values, we need fully abstract type imports. |
In the current JS API, an
It is difficult to write a correct and useful program that mixes foreign values with its own without a reliable way to distinguish foreign values from its own or to cast foreign values back to We can sort hosts into roughly two categories: 1) reference values conform to wasm conventions, and 2) reference values do not conform to wasm conventions. We can sort applications into roughly two categories: a) directly mixes externref values with own values, and b) boxes externref values when mixing with own values. 1a. Can utilize subtyping, but purposes are just as efficiently served by pre-importing coercions. In all four scenarios, subtyping is either matched or outperformed by optionally pre-imported coercions. Coercions are also more flexible as they can work for arbitrary foreign values (like smis), not just foreign references. The JS API falls between 1 and 2 (say 1.5), and most applications fall in b (for the reasons I gave above). For 1.5b, subtyping is slower, not faster. This suggests there is no case where subtyping helps performance, and for the common case subtyping hurts performance. |
There must be a weird hole here that we missed, because
I am saying that category 2 is not served by I'd like to restate this yet again, |
That's not a problem, because the Wasm/JS boundary is coercive, and the interpretation is controlled by the Wasm type. There is no problem with allowing the same JS value to be viewed as multiple unrelated Wasm types -- that's already the case, e.g., a JS number can become any Wasm number type, even involving different coercions. Now it can also become an externref. We only have to be careful when we add subtyping to the picture. Then we have to make sure that the coercions defined at the boundary are coherent, i.e., remain the same across subtype relations. If you have anyref (and e.g. funcref <: anyref), that implies that an exported Wasm function must be coerced the same way as an externref vs a funcref, because their representations must be interchangeable. But it can still be treated as both. (Technically, that will require some tweaks to the current formulation of the coercion in the JS API, but that's fine as is for now because no relation between funcref and externref is observable within Wasm yet.) (Side note: The reference types proposal is conservative in that it only allows exported Wasm functions to be used as funcref, not arbitrary JS functions. So an implementation could avoid boundary coercions on functions altogether if it chooses to represent Wasm functions the same internally and externally. That may or may not be a useful tradeoff for some implementations.)
Exactly. And it is merely an artefact (or convenient "coincidence") of the JavaScript environment that we can happily treat all JS values as references, because JS depends on a uniform representation already, so practically speaking they all are heap references already (or have to be representable as such, local unboxing optimisations notwithstanding). |
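As a small illustration of the point that the same JS value can be viewed as several unrelated Wasm types depending on the declared type, the sketch below mimics the numeric boundary coercions in plain TypeScript. The helper names (`toI32`, `toF32`, `toF64`, `toExternref`) are hypothetical stand-ins, not JS API functions; they only approximate what the boundary does for each Wasm type.

```typescript
// Hypothetical stand-ins for the JS-to-Wasm boundary coercions.
// The names are illustrative only and are not part of any real API.

// i32: ToInt32-style coercion (truncate, then wrap modulo 2^32).
function toI32(v: number): number {
  return v | 0;
}

// f32: round to the nearest single-precision float.
function toF32(v: number): number {
  return Math.fround(v);
}

// f64: a JS number is already a double, so it is unchanged.
function toF64(v: number): number {
  return v;
}

// externref: the value is passed through opaquely, with no numeric coercion.
function toExternref<T>(v: T): T {
  return v;
}

// One JS value, four different Wasm-side views of it.
const jsValue = 3.7;
const views = {
  i32: toI32(jsValue),
  f32: toF32(jsValue),
  f64: toF64(jsValue),
  externref: toExternref(jsValue),
};
```

The interpretation is chosen by the Wasm-side type, not by the JS value, which is why no coherence question arises until subtyping relates two of these views.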
As @rossberg suggests, we want a coherent coercion system for our embedders. You can prove that any well-behaved coercion system will preserve rtts of wasm references even as
Right. Similarly, if we remove the An easy way to show that this subtyping is useful is to provide an application that uses it and in a way that is not just as well (or better) served by pre-importing coercions. As just established, the correctness of this application cannot assume there is a single |
No, because So another way to say the above is that I feel if we keep disagreeing on the above point then the conversation isn't making progress. |
@titzer I understand that this is what you intended for The only way to have a semantic difference between |
Correct, but nor is that needed.
The interpretation of externref is completely up to the embedder. That it includes all values from anyref is one possible interpretation in the JS API, but would not be the only one. Nor is it necessarily the case for other embeddings. For example, in the C API, this would not be the natural interpretation. The fact that we have both types with a fuzzy distinction is one of the warts that resulted from deferring subtyping. Otherwise we wouldn't have needed to introduce externref in the first place.
No, for the reasons I already stated upthread (representation independence for imports). It would simply be an entirely pointless complication to the language. Like @titzer has pointed out, externref denotes a reference, not an arbitrary value. Abstracting over arbitrary value types, and thus value representations, would be a separate feature, and only possible with preimports. |
Currently WebAssembly/wasm-c-api#148, which you approved, has
Many very useful things seem to be only possible with preimports. It seems like this might even be necessary to properly support one of Type Imports intended use cases. So let's ship the feature rather than use limitations of the current design of a Phase 1 proposal to argue that a subtyping is necessary. In the meanwhile, no application has been provided demonstrating the utility of this subtyping. All the suggested utility seems to come from having a way to coerce host values to |
This was actually mentioned before. The subtyping avoids boxing host references when mixing them with Wasm references. E.g. simply having an array of |
That's a pretty abstract reason though. What are "must have" use cases for such arrays? An array of things you can do absolutely nothing with (without downcasting) doesn't sound that useful to me. In some languages these kinds of arrays exist because you'd dynamically dispatch on the elements, but you can't do that here either. If such arrays would cause boxing, would people instead use 2 arrays to avoid it, or would they not care about the boxing since these use cases are rare and/or not performance sensitive? etc. A better example may be a "userdata" or "parent" pointer of some very generic structure that must be able to hold any kind of ref, but again, not sure if that is frequent enough that boxing will be a deal breaker. Code that is so generic it must deal with arbitrary JS and Wasm GC objects likely is not the bottleneck of your code. |
Unfortunately it's really hard to predict what bottlenecks applications will have, so one of the common themes of design work in Wasm has been to avoid adding expensive things. To this end, a primary design requirement of the Wasm GC proposal (i.e. here) is to provide statically-typed data structures that do not need to unnecessarily tag or box fields so that applications are space-efficient. Forcing applications to box |
@titzer As @aardappel points out, that example is abstract. It does not illustrate how the subtyping is helpful; it does not demonstrate that anything useful can be done; and it seems to be addressed by a coercion to |
@titzer I am all for avoiding expensive things in Wasm, but it seems to me that here we have a case where making one thing cheaper makes another more expensive and vice versa. So we need to try and estimate expected cost. |
#150 makes it so that downcasts to On the other hand, removing the |
That is incorrect. You cannot cast to externref itself, but you can downcast to subtypes of it just fine when given a respective RTT. (Edit: in cases where externref would overlap to a dataref or funcref.) But more to the point, we are going in circles. You are merely repeating your claims from above, which are based on the implicit assumption that subtyping is only useful if there are downcasts. That assumption is false, as I already pointed out upthread. The primary reason for externref-anyref representation compatibility is imports, and the ability to import an unknown reference type and supply an externref for it. As also was explained above, that has to work and does not generally require downcasts. And since the compatibility is required, there is zero benefit in not explaining/exposing it simply as subtyping. |
I've become convinced by the above and discussion on #150 that externref has no usefulness for the JS embedding, as anyref is more flexible and works as expected. Thus its subtyping relationship is irrelevant as it's just a spec wart that won't get used. It sounds like it might be useful on other specific embeddings? Are there example scenarios where native references would be larger than Wasm references, etc? |
Fixes #142. A mismatched `DataCount` is malformed, not a validation error.
Following today's discussion, it seems odd that a language created for the web has no type that (in the JS embedder) can denote arbitrary JavaScript values in whatever representation engines deem most natural for them. From what I understand, |
#271 makes this issue (as originally stated) obsolete, so I'll close it. See #254 (comment) for the latest information necessary to continue this discussion in a new issue. |
#130 and #142 identify some disadvantages to having `externref` be a subtype of `anyref`. I am wondering what the advantages are. Note that, after upcasting `externref`s to `anyref`s, there is no reliable way to downcast the resulting `anyref`s back to `externref`s, making the subtyping a one-way street. Consequently, it seems to me that a module mixing `externref`s with its own values (uniformly represented as `anyref`s) would want to box the `externref`s in some manner, both to provide a way to later unbox them and to prevent bugs that could be caused by unintended/unanticipated overlaps between `externref` values and the module's own values. This boxing pattern would not need `externref` to be a subtype of `anyref`. And if the boxing pattern is indeed common and we leave `externref` as a subtype of `anyref`, then that would mean that foreign references would first get boxed to become `externref`s that are compatible with `anyref`s, which would then get boxed again by the application, and then the reverse for unboxing.
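The boxing pattern described in this post might look roughly like the sketch below. It is a plain TypeScript model under stated assumptions, not Wasm GC code: `OwnValue`, `ExternBox`, `Uniform`, `box`, and `isForeign` are hypothetical names, with `Uniform` playing the role of the module's uniform representation.

```typescript
// Hypothetical model: OwnValue stands for the module's own GC values,
// ExternBox for a wrapper the module allocates around a foreign reference.
class OwnValue {
  constructor(readonly payload: number) {}
}

class ExternBox {
  constructor(readonly foreign: unknown) {}
}

// The module's uniform representation (playing the role of anyref here).
type Uniform = OwnValue | ExternBox;

// Boxing: every foreign value is wrapped before it is mixed with own values.
function box(foreign: unknown): Uniform {
  return new ExternBox(foreign);
}

// Reliable test for "this came from outside": only values the module
// boxed itself are ExternBox instances.
function isForeign(v: Uniform): v is ExternBox {
  return v instanceof ExternBox;
}

const hostObject = { kind: "dom-node" };

// The third element is a foreign value that happens to look like an own value;
// the box keeps it from being confused with the module's own data.
const mixed: Uniform[] = [new OwnValue(1), box(hostObject), box(new OwnValue(2))];

const foreignCount = mixed.filter(isForeign).length;
```

Nothing in this sketch relies on the foreign type being a subtype of the uniform type; the box itself supplies both the compatibility and the way back.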