Remove anyref #254
Comments
Overall I don't know of any good reasons not to do this. @rossberg, even Wob and WAML don't depend on @jakobkummerow, would we be interested in experimenting with wide function pointers for function references?
Perhaps this would be better split off into a separate issue or discussion, but I would be interested in specific examples of this. |
OO languages use a v-table for each class, with a field for every method of the given class. In a typical OO runtime, those fields are code pointers. In WebAssembly, those fields are typed function references. Because those typed function references can be implemented by functions from any module instance, and because different instances of the same module share code pointers in the instance-object model, they (with The same issue arises with closures in functional languages. Even for the ones that use an instance-object model, normally the closure object contains the code pointer as a field. If the closure happens to depend on the module instance, then the pointer to the module-instance object is included directly in the closure object. But with Thus |
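To make the v-table point concrete, here is a minimal sketch contrasting a "thin" function reference (a pointer to a heap object that bundles instance and code) with a "fat" one (instance and code stored inline in the v-table entry). It is a toy model only: `Instance`, `FuncObject`, `call_thin`, and `call_fat` are hypothetical names, and Python objects stand in for machine words, so it shows the shape of the dependent load rather than its cost.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Instance:
    """Hypothetical stand-in for per-instance state (globals, memory, tables)."""
    offset: int


# A "code pointer": shared by every instance of the same module,
# so it needs the instance passed in to do anything instance-specific.
Code = Callable[[Instance, int], int]


def add_offset(instance: Instance, x: int) -> int:
    return x + instance.offset


@dataclass(frozen=True)
class FuncObject:
    """Thin reference target: one heap object bundling instance and code."""
    instance: Instance
    code: Code


def call_thin(vtable: List[FuncObject], slot: int, arg: int) -> int:
    func = vtable[slot]   # load 1: the function reference held in the v-table
    code = func.code      # load 2: cannot start until load 1 has finished
    return code(func.instance, arg)


def call_fat(vtable: List[Tuple[Instance, Code]], slot: int, arg: int) -> int:
    # The tuple models a two-word entry stored inline in the v-table,
    # so instance and code come from the same entry with no extra hop.
    instance, code = vtable[slot]
    return code(instance, arg)


instance = Instance(offset=10)
thin_vtable = [FuncObject(instance, add_offset)]
fat_vtable = [(instance, add_offset)]
print(call_thin(thin_vtable, 0, 5), call_fat(fat_vtable, 0, 5))  # prints "15 15"
```

Both calls compute the same result; the difference under discussion is only how many dependent memory accesses sit between the v-table and the call.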
I weakly support dropping
I doubt it. V8 heavily relies on a uniform representation internally. Exceptions are conceivable, e.g. we could special-case "array of funcref" or something. For playing such games, I don't think we care much whether |
Can you clarify this? Why is it problematic for a wasm |
Sure, struct fields could potentially also be special-cased. |
We didn't discuss Waml yet during the meetings, but how is for example the identity function compiled to Wasm? For sure it needs to work on data as well as on functions, so I'd say you need |
Although we have not yet discussed Waml, @rossberg did confirm in the last meeting that Waml does not use |
Anyref is the common supertype of externref and Wasm-internal reference types(*). As such, it is the only type that can abstract from whether some abstract reference (or type) external to a module is implemented in Wasm or by the host. Since making such abstraction possible is a stated goal of Wasm, anyref is needed. As I already said in reply to this question during my talk, anyref would hence likely show up in something like Waml if you tried to support some form of FFI, which it currently doesn't have. (*) Requiring that every Wasm reference can be externalised in every possible host environment would be quite a significant requirement, so assuming anyref = externref as opposed to just anyref >= externref is probably not a good idea in general, even if it could work in JS engines specifically. |
Polymorphic typed functional languages compiling to wasm need a way to downcast from their uniform representation to the lowerings of their concrete types. But there is no way to downcast from But what Waml's FFI could do is use Object-oriented languages would also do the same. They want to be able to assume all (object) values have methods like |
You are missing the point of what I just said, namely achieving independence from how an outside reference is actually implemented (host vs Wasm). Whether that is then boxed up or not inside the language runtime is a different question. (FWIW, no downcast is needed for polymorphic functions if the lowered instantiation type is anyref.) |
Then please clarify your point, as what you're saying seems contradictory to me.
One of the problems with the
This would require the FFI to use This also forces Could you articulate why the boxing approach is unsatisfactory to you? |
I thought we had long settled this. Yes, you need a uniform representation for anyref and externref. And no, there is no way around it given the goal of implementation independence for imported modules. Not unless you completely change the Wasm compilation model. Even without anyref you'd have the exact same problem with abstract type imports. Again, boxing is a separate question and neither helps nor hinders in that regard. |
I do not know how you came to be under this impression. #143 is precisely on this topic, over a year old, and unresolved.
There are also unresolved issues regarding the type imports proposal related to the expectation that all imports should be subtypes of Even if we want to stick with the model that imported types should not need to be known before compilation, you can achieve this by allowing unbound imported types to be instantiated with any reference type or with |
I don't see how that can work efficiently. One might be garbage-collected, the other might not? Both might use different tagging schemes? In general, the compiler already needs to know which is which. The GC will need to know it. Size is not enough. If they don't statically know how to interpret the representation they'd have to compile in multiple alternative paths everywhere, and dynamically select based on some per-type mode flag, which is no better than a uniform representation. Can you provide any concrete example of a real life runtime that would not simply implement a uniform representation in such a scenario? |
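The "per-type mode flag" concern can be sketched as follows. This is an illustration under assumed tagging schemes, not a description of any real engine: `Repr`, `is_small_int_flagged`, and `is_small_int_uniform` are hypothetical names, and the two low-bit conventions are chosen only so that the two representations disagree.

```python
from enum import Enum


class Repr(Enum):
    """Hypothetical tagging schemes; real engines differ in the details."""
    LOW_BIT_SET_IS_INT = 1    # small integers carry a 1 in the low bit
    LOW_BIT_CLEAR_IS_INT = 2  # small integers carry a 0 in the low bit


def is_small_int_flagged(word: int, mode: Repr) -> bool:
    # Compiled before the import is bound: every check branches on the flag.
    if mode is Repr.LOW_BIT_SET_IS_INT:
        return word & 1 == 1
    return word & 1 == 0


def is_small_int_uniform(word: int) -> bool:
    # One agreed scheme: a single test, no per-type flag to consult.
    return word & 1 == 1
```

The flagged version is what code would have to look like if it were compiled without knowing which representation an imported type uses; deferring compilation until the import is bound, as suggested elsewhere in this thread, is the alternative that avoids the branch.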
Every browser's GC would have no problem with this. I hear the problems you are raising, but If you want external references to be implemented using a different GC and to be usable as an abstract import, then you'll need to defer compilation until after type-import instantiation. At least that's a choice the engine will have available to it if the subtyping is removed. |
Trying to prototype an FFI system taking advantage of |
I believe I already articulated those issues above. My impression is that the outstanding issue is that @rossberg is concerned that, without this subtyping, |
I value implementation feedback higher than long discussion threads like this. I was under the impression that this was a settled matter, too. I feel we risk falling back into old patterns and making no or negative progress. We are now closer than ever to an MVP that has a set of capabilities that are motivated by clear use cases and implementable in all known engines. If we want to make modifications then our discussions don't need to be hypothetical. We are deep in the implementation phase and work should be focused there. For this issue, it's clear to me from the above discussion that removing I would also strongly encourage us to keep the possibility of type imports of nonref types open. Otherwise we are baking in the same (foolish IMHO) assumptions that were baked into Java's erased generics. |
Items 1 and 2 of the OP both identify issues with full-blown subtyping that are not issues with size compatibility.
It's not. Comments above articulate why languages would not want to use
Before @rossberg raised his objections, we were discussing concrete implementation tasks that could be explored should this subtyping be removed. Specifically item 3 in the OP is about addressing performance issues with the MVP. I did my own experiments outside of wasm to estimate the overhead caused by the |
Right now the trade off is between hypothetical engine optimizations to the representation of If we still have no concrete experience with either side of this discussion when it comes time to cut the MVP for real, it will be prudent to make the decision based on our principles of incremental development. If the MVP does not include |
This seems like a reasonable plan.
What I can't tell is whether, if the engine optimizations were implemented and found to improve performance, @rossberg and @titzer would still object to removing So a way to make concrete progress on this issue is to have the group discuss whether they would support removing |
Holding the door open for closures implemented as fat pointers just increases uncertainty to untenable levels. I speak with authority when I say this optimization just will not work in Wasm engines as envisioned. It is by no means trivial to do. (And I don't say that lightly--that's how Virgil implements closures, I love it, it works great, I wish we could--but alas, after years and years of thinking how to do that in Wasm engines, particularly V8 and Wizard, this has about zero chance to happen, because of concerns like value representation in interpreters, interacting with a host environment, the way Wasm instances already work, the requirement of a ton of compiler and runtime work in web engines, etc). I am disappointed by that, but, I think we should all move on from the fat pointer dream. @RossTate I'll be entirely honest. I hate to be so blunt, but it's bothered me for some time, and it's a bad practice that now needs to be called out. I'm frustrated with your cavalier benchmarking. Big round numbers repeated often, becoming mythic, but really their genesis is microbenchmarks just in passing, written by hand but no source code shown, no link, no machine code, using completely different compilers, which usually isn't even mentioned what one, a totally different or no runtime system, no mention of allocators, etc, and absolutely no way to reproduce any numbers whatsoever. So throwing numbers like 30% around, from a benchmark nobody understands, attributing this to a "dependent load", when in all likelihood no one inspected the machine code by hand, is absolutely not how decisions are made. Bad numbers are worse than misleading, they are just tokens in advocacy of a design choice and delegitimize good numbers. Bad numbers from cavalier side-benchmarking deserve to just be ignored, frankly. Instead, please write some Wasm. I'll be convinced by Wasm. Or write machine code. I'll be convinced by machine code. Or compile something to Wasm. 
Or compile Wasm to something. Or tweak an engine. I've been trying to say this politely for months and years. Provide some measurements in the Wasm ecosystem that other people can reproduce. For this discussion, we gotta stop blocking on hypotheticals or using numbers from other systems. There are at least 5 high performance Wasm engines out there and there's no excuse to not do benchmarking and optimization work on one of them. If we are to work as a group together, then I will strongly advocate that our work be done there. To the point where in my head I will massively discount benchmarking work that could and should have been done in Wasm but wasn't. Short of fat pointers, which I am sad won't work, I predict zero performance benefit in V8 from removing No one addressed @rossberg's point above that My larger point is that we are just going around and around in circles having the same discussions all over again. Even when some of us believe that we have consensus. We need to keep moving forward and I honestly see at best, no strong motivation to change the status quo and at worst, a significant reduction in functionality. Our process is far too slow and inefficient to ship things half-done. At this rate it will take us another 4 years to add |
Like other people that have posted data from benchmarks, I have made further details available when requested. The (Java) source code was linked upon request here: #249 (comment). The peer-reviewed paper and artifact for the language and implementation is here, which provides the source code of the compiler on GitHub and a VM containing a compiled executable of that source code and scripts to reproduce the plots I presented in the earlier meeting. For the published experiments and this benchmark, we did review the code produced by the compiler. (Though, for the published experiments, since we generate hundreds of variations of the programs, we reviewed a few random samples as well as outliers and the fully untyped and fully typed extrema.) In addition, I described the overall methodology of the experiment, which you could easily adapt to other programs, other compilers and runtimes, and other languages. I also described the rationale behind the benchmark design, and I mentioned characteristics of the benchmark that should be taken into account when interpreting its measurement. Others reached out to discuss the experiments in more detail in order to make their own assessment of them. You could have done the same. Formative evaluation, such as my experiment, is a critical part of the design process. Yes, it should not be mistaken for the sort of summative evaluation that you would see in a science publication, with p-values and such, but proper summative evaluation requires substantial resources that we do not have, and requiring all formative evaluations to be conducted as such would be crippling. Others have presented similar formative evaluations of overheads with much less information on how those estimations were arrived at, and those numbers have been repeatedly used in discussions that you've been involved in, and you expressed no criticisms. 
You are selectively applying this standard to just my formative evaluations in an effort to dismiss observations you find inconvenient.
Rather than speaking with authority (and not offering any insights into why you arrived at this assessment, which could have helped us as a group contribute more effectively), you could have done what @tlively did and ask an actual active implementer of a current wasm engine for their thoughts. This follow-up response suggests that the specific representation optimization that would help with things like class/interface-method dispatch is viable.
It is absurd for you to say that I "could and should" have done this experiment in wasm and at the same time say that the necessary optimization for performing that experiment "will not work in Wasm engines".
We already had meetings discussing the kinds of compilation and abstraction we want to support. We decided we are doing whole-program compilation, possibly with module splitting, and with no attempt to abstract implementation details. So this concern seems to be out of scope for the MVP. Besides, as you mentioned, the problem is solved by unbound type imports, so there's a clear path towards addressing the issue without the subtyping. (There's also the larger issue of whether @rossberg's concern is even valid/reasonable. In the survey of how various GC VMs support FFI and abstraction thereof, no one had a type that's the union of GCed VM values and foreign values.)
We did discuss this, and the issue was left clearly unresolved (literally "Open"). If you believe there is consensus (or a group vote has been made), then close issues with the summary of that resolution. Assuming consensus without performing such a check is presumptuous. As for having the same discussions over again, we recently got more highly relevant information on this topic, and so now we are discussing the issue again to determine if we now have enough understanding to make a decision. To that end, this still seems like a good plan. Though, if you'd like to not discuss this again, we could consider taking these further and decide on whether to remove |
Frankly I find this comment massively disrespectful and no one else in this community is doing that. As a cofounder of Wasm and TL in V8 I designed its optimizing compiler, I derped its Wasm engine from nothing, I managed and led the team that maintains and evolves it and now I'm writing a new Wasm engine from scratch. Pretending my opinion doesn't count because I don't work on V8 anymore is just flat out an insult. I don't assert things on authority often, and not without good reason. I've learned that for some special reason this proposal is repeatedly derailed and being brief is the only rectification. Otherwise we spend so much energy on trivialities. I've repeatedly tried to remedy that by keeping us on point and making progress. So, I will, again, be brief. Closures as fat pointers won't work as envisioned. It's a wild goose chase that I wouldn't send people on. I could produce a load of noise, but I won't. If that doesn't satisfy you, then the burden of work lies on you, not others. At this point I really regret how toxic this thread has gotten and I regret interacting here. It takes a lot of energy to talk about basic things and it's dysfunctional. |
It's not that I do not recognize that you have some authority to speak with on this topic, nor did I mean to suggest that your opinion does not count. I see how that excerpt gives that impression, and I apologize for that unintended insult on my part. In past meetings you have shared insights from your experience that I have greatly appreciated. What I had intended to express in that excerpt was that I was upset that you claimed authority that can also be attributed to someone else who had already expressed a cursory assessment (which I referenced in the sentence following your excerpt) without acknowledging that their assessment conflicts with your own, as if their assessment did not exist or did not matter. I am not sure if acknowledging the harmful impact of my error will have any effect at this point, but I thought it was at least worth trying. |
I'd like to clarify that my assessment does not conflict with @titzer's statement. I expect no performance benefit from dropping anyref. We're not going to move to fat-pointer representations for funcrefs in general. Special-casing "unboxed" storage in a few select cases (with boxing as unified-representation object on demand) is a different strategy, and does not depend/block on the existence of In short, purely from a V8 implementation point of view, our perspective on Personally, I'm confused by the state of the overall discussion around funcref/externref/anyref; in particular it seems to me that we don't have consensus on what our goals/requirements are, and in consequence, what we consider valid arguments. Specifically, if we argue that "we need anyref in addition to externref because by using externref on its interface, a module can require objects that another module can't fake/polyfill, but that's a key requirement of Wasm, and only anyref provides that capability", then that seems like a reason not to have externref in its current form at all -- yet externref in its current form is something that we definitely decided to have. Was that, perhaps, a mistake that we should undo? Or am I just missing something? It's also not clear to me how to resolve the apparent contradiction between "anyref provides key required functionality" and "we currently have no known use case for this functionality". But again, maybe I'm just missing something; it's hard to wrap my head around all the hypotheticals that have been thrown around here. There's also the unresolved "wart" that the current design makes it observable whether an implementation uses i31ref-style tagging for some externref values (issue #76). Depending on how we resolve that, a performance cost could easily arise from that. 
(Of course it's not this thread's purpose to address my personal confusion, so feel free to ignore me; my reason for spelling this out is a faint hope that articulating the sources of my confusion might help the larger discussion gain clarity/focus.) |
Thanks for the thoughtful post, @jakobkummerow! I'm not sure I can address your confusion, but I can add my thoughts so that maybe we can all develop a mutual understanding of the goals and constraints.
I'm not sure how that special casing is compatible with that subtyping. With the subtyping, you can have a type Similar problems arise from abstract type imports, and it occurs to me that that might also pose another problem. I recall @titzer mentioning in a meeting that My understanding from when the reference types proposal was being shipped was that the pressing need for It would be very helpful to be offered some insight as to why fat representation of |
Yeah, such a scheme would have to work around these cases somehow, such as by not being applied when subtyping is required. Maybe that makes it too difficult to be viable. For vtables it might still work out, not sure.
Off the top of my head:
|
Thanks very much for the information, @jakobkummerow! Can I bug you for one more thing that would help me understand the picture? Is |
We only have externref because we removed anyref late in the game before and were left with a gap that we had to fill somehow. It's not something anybody desperately wanted. |
As a high-level reply to this: the discussion culture on this repo is way too toxic and personal agendas way too destructive to make closing issues feasible anymore. Trying that for even a single issue would immediately derail into so much more wasted time and energy that I have long given up on formally resolving issues on this repo. |
We currently use |
Thanks for the info, @skuzmich! Can you articulate a little more how you use it? Is this for your |
No. Currently our We use |
Ah, thank you for the information and links! Looking through the code, it looks like the one point where you use As a side note, your link also references a reminder that a good JS API would enable wasm references to extend JS classes. Similar issues came up with J2CL, and the ability to write good (custom) coercions between JS and wasm references seems like an important JS API consideration. But I don't think we should need to add a core wasm type and subtyping to support such coercions. Unfortunately, it's been unclear how to progress the JS API to address your needs without first knowing whether we can or cannot hook into a nominal type system to facilitate these things. |
Define "essentially" and "advanced", but probably the latter :-) So this could be seen as a variant of "special-cased fat-pointer-like representation". Memory locality obviously isn't great, but that's hard to change because the types are all fundamentally different: one array holds on-heap/GC'ed pointers (and is on the managed heap itself), one holds plain 32-bit integers, one holds 64-bit off-heap pointers; there's no good way to put all three into a single array. |
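The three-array layout described here can be sketched as a struct-of-arrays. The comment only states the element types; what each array means (an instance reference, a signature id, a code address) is an assumption made for illustration, and `DispatchTable` and its fields are hypothetical names.

```python
from array import array
from typing import Any, List, Tuple


class DispatchTable:
    """Three parallel arrays indexed by the same slot number."""

    def __init__(self) -> None:
        self.refs: List[Any] = []   # on-heap, GC-managed references (assumed: instances)
        self.sig_ids = array("I")   # plain unsigned ints, typically 32-bit (assumed: signature ids)
        self.targets = array("Q")   # 64-bit unsigned ints (assumed: off-heap code addresses)

    def append(self, ref: Any, sig_id: int, target: int) -> int:
        self.refs.append(ref)
        self.sig_ids.append(sig_id)
        self.targets.append(target)
        return len(self.refs) - 1   # slot index of the new entry

    def entry(self, slot: int) -> Tuple[Any, int, int]:
        # One logical entry costs three reads from three separate arrays,
        # which is the memory-locality drawback mentioned above.
        return self.refs[slot], self.sig_ids[slot], self.targets[slot]
```

Keeping the arrays separate lets the garbage collector scan only `refs`, while the integer arrays stay opaque to it; that is one plausible reason the three element types cannot simply share a single array.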
Thanks for the information, @jakobkummerow! It's definitely helping me get a clear picture of how things work and what the constraints are.
The type imports proposal, as I currently understand it, would require this. A module could export its |
@RossTate I agree, we (ab)use this subtyping only in a shallow way. Having some form of explicit fast coercion that retains identity when round-tripping is probably fine for Kotlin. |
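As a rough illustration of what "retains identity when round-tripping" asks for, the sketch below wraps an internal object in an external handle and guarantees that converting back and forth yields the same objects. The names `ExternHandle`, `to_extern`, and `from_extern` are hypothetical, and the weak cache is just one way to get the property in Python; an engine could make the coercion far cheaper.

```python
import weakref


class ExternHandle:
    """Hypothetical external view of an internal object."""

    def __init__(self, target: object) -> None:
        self.target = target


# Maps id(internal object) -> its handle. The handle keeps the object alive,
# so an id stays valid for as long as its cache entry exists.
_handles: "weakref.WeakValueDictionary[int, ExternHandle]" = weakref.WeakValueDictionary()


def to_extern(obj: object) -> ExternHandle:
    handle = _handles.get(id(obj))
    if handle is None:
        handle = ExternHandle(obj)
        _handles[id(obj)] = handle
    return handle


def from_extern(handle: ExternHandle) -> object:
    return handle.target
```

With this shape, `from_extern(to_extern(x)) is x` and `to_extern(from_extern(h)) is h` both hold, which is the round-trip guarantee without any subtyping between the two reference kinds.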
#271, which makes For the sake of that future discussion, if we have it, here are some of the salient points from this discussion:
|
The Overview describes the purpose of `anyref` to be to support the sort of uniform representation typically used by polymorphically or dynamically typed GC languages. However, we have seen that these languages are not using `anyref` for their uniform representation. In short, no one has first-class values corresponding to function references, and as such everyone is using `eqref` for its uniform representation at the least. So there appears to be no particularly good reason to have `anyref`.

On the other hand, there are good reasons to not have `anyref`:

1. `externref <: anyref` forces JS values to be coerced to and from `externref`. For example, V8 uses SMIs, which conflict with `i31ref`, and so the coercion to `externref` requires boxing SMIs and requires unboxing `i31ref`-values-as-JS-values, and the reverse coercion has to do the opposite. Removing `anyref` would enable us to eliminate this coercion.
2. `externref` field accessors/members. However, the coercions required by the `externref <: anyref` subtyping prevent this, since those fields would store JS values in their "native" JS representation rather than their wasm representation. Removing `anyref` would enable us to have direct field accessors/members into typed JS objects.
3. `funcref <: anyref` subtyping forces function references to be represented the same as data references, preventing this optimization. Removing `anyref` could help us overcome the performance issues the current proposal is facing.

So, because `anyref` is not needed for its intended purpose, and because its presence obstructs various features and optimizations, I propose we remove it.
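The SMI and `i31ref` conflict described in the proposal can be modelled roughly as below. This is a toy model, not V8's implementation: a bare Python `int` plays the role of an unboxed small integer on whichever side it appears, and `BoxedSmi`, `BoxedI31`, `js_to_externref`, and `externref_to_js` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class BoxedSmi:
    """A JS small integer, boxed so the wasm side does not read it as an i31."""
    value: int


@dataclass(frozen=True)
class BoxedI31:
    """A wasm i31 value, boxed so the JS side does not read it as a small integer."""
    value: int


def js_to_externref(value: Any) -> Any:
    if type(value) is int:            # unboxed on the JS side: a small integer
        return BoxedSmi(value)
    if isinstance(value, BoxedI31):   # an i31 that had been handed to JS earlier
        return value.value
    return value                      # ordinary heap objects pass through unchanged


def externref_to_js(value: Any) -> Any:
    if type(value) is int:            # unboxed on the wasm side: an i31
        return BoxedI31(value)
    if isinstance(value, BoxedSmi):
        return value.value
    return value
```

Every crossing of the boundary pays for a type check and possibly an allocation; if unboxed integers did not have to mean two different things on the two sides, both functions could be the identity.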