MASTER PLAN: Changes to druntime to incorporate new GC #8

Open
schveiguy opened this issue Nov 16, 2024 · 0 comments

Introduction

As detailed in my last DConf talk, we are working on a new GC for D, with the goal of improving the overall experience of using the language.

This new GC is implemented in SDC, which means for now, it's compiled separately and hooked from druntime via extern(C) functions.
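To make the hooking arrangement concrete, here is a minimal sketch of a runtime calling an allocator purely through extern(C) functions. The names toy_gc_alloc, toy_gc_free and runtimeAllocate are hypothetical (the real SDC entry points are not listed in this issue), and calloc stands in for the separately compiled GC so the example builds on its own.

```d
import core.stdc.stdlib : calloc, free;

// Hypothetical hook names, for illustration only. In the real setup druntime
// would only *declare* such functions (no body), and the separately compiled
// GC library would provide the definitions at link time.
extern (C) void* toy_gc_alloc(size_t size) nothrow @nogc
{
    return calloc(1, size); // stand-in for the external allocator
}

extern (C) void toy_gc_free(void* ptr) nothrow @nogc
{
    free(ptr);
}

// Runtime-side code sees only the C ABI, never the GC's internal types.
void* runtimeAllocate(size_t size) nothrow @nogc
{
    return toy_gc_alloc(size);
}

void main()
{
    auto p = cast(ubyte*) runtimeAllocate(32);
    assert(p !is null);
    assert(p[0] == 0 && p[31] == 0); // calloc zero-fills the block
    toy_gc_free(p);
}
```

Because only plain C types cross the boundary, anything the GC needs to know about an allocation has to be expressible as sizes, flags and raw pointers, which is why the TypeInfo dependency discussed below matters.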

There are several problems with doing this:

  1. The current druntime assumes too much about the GC implementation.
  2. The runtime asks the GC to store bits for allocations that are sometimes interpreted by the GC and sometimes treated as opaque bits. For example, the APPENDABLE bit is blindly set and cleared at the whim of the user. However, when scanning large blocks, the conservative GC sees this bit and will only scan data in the used space.
  3. The API for passing in information about the type being stored is spread over 3 utilities:
    1. TypeInfo is passed into certain functions. Only one GC cares about this -- the precise GC -- and only to get the bitmap for pointer scanning. That bitmap is copied out of the TypeInfo before being placed in the GC, so the TypeInfo itself isn't stored.
    2. The other information is all gathered from the flags passed into the allocation function. e.g. NO_SCAN, APPENDABLE.
    3. The runtime stores the array metadata and finalizer in the block, separately from the GC.
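The split described in the list above can be observed from user code through core.memory. The following is a small sketch against the existing druntime (before the changes proposed here); blockAttrs is a helper introduced only for this example.

```d
import core.memory : GC;

// Helper for this example: the attribute bits the GC reports for the block
// that contains `p`.
uint blockAttrs(void* p)
{
    return GC.query(p).attr;
}

void main()
{
    // Raw allocation: the caller chooses the flags.
    void* raw = GC.malloc(64, GC.BlkAttr.NO_SCAN);
    assert(blockAttrs(raw) & GC.BlkAttr.NO_SCAN);

    // Nothing stops user code from toggling APPENDABLE on any block, even
    // though no array metadata was ever written into this one.
    GC.setAttr(raw, GC.BlkAttr.APPENDABLE);
    GC.clrAttr(raw, GC.BlkAttr.APPENDABLE);
    assert(!(blockAttrs(raw) & GC.BlkAttr.APPENDABLE));

    // Array allocation: the runtime chooses the flags and also keeps its own
    // "used length" metadata in the block, which is what .capacity reflects.
    int[] arr = new int[3];
    assert(blockAttrs(arr.ptr) & GC.BlkAttr.NO_SCAN);
    assert(blockAttrs(arr.ptr) & GC.BlkAttr.APPENDABLE);
    assert(arr.capacity >= arr.length);
}
```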

You can see some big coupling problems here. The most glaring is that there are bits stored in the GC for no purpose except to carry information important to the runtime. The exception is when the GC wants to cheat, such as by "knowing" how the array used space is stored in order to scan large blocks.

Really, the feature of storing finalizers and the array metadata should be a function of the GC, not the runtime. Abstracting this into the GC interface allows the GC to innovate on these features. And in fact, the SDC GC does this.

The second big problem is TypeInfo. The SDC GC cannot use druntime, as it does not have access to it. So it has no means of accessing the information inside a TypeInfo object.

On top of that, TypeInfo is slow and cumbersome. Many of the new template hooks have been rewritten to simply use the actual type to build things like the attribute flags. All we need from TypeInfo is the pointer bitmap, which, ironically, is itself produced by a call to __traits(getPointerBitmap) in the compiler! So that information is available without TypeInfo too.
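For reference, this is roughly what __traits(getPointerBitmap) yields when applied to a type directly. Node and wordHoldsPointer are names made up for this sketch; the layout of the result (element 0 is the size of the type in bytes, the following elements hold one bit per machine word) follows the D language specification.

```d
struct Node
{
    size_t value;  // word 0: not a pointer
    Node* next;    // word 1: pointer
    void* payload; // word 2: pointer
}

// Element 0: size of the type in bytes. Elements 1..$: one bit per word,
// set when that word may hold a GC-managed pointer.
static immutable size_t[] nodeBitmap = __traits(getPointerBitmap, Node);

// Helper for this example: does word `index` of the type hold a pointer?
bool wordHoldsPointer(const(size_t)[] bitmap, size_t index)
{
    enum bitsPerWord = size_t.sizeof * 8;
    return ((bitmap[1 + index / bitsPerWord] >> (index % bitsPerWord)) & 1) != 0;
}

void main()
{
    assert(nodeBitmap[0] == Node.sizeof);
    assert(!wordHoldsPointer(nodeBitmap, 0));
    assert(wordHoldsPointer(nodeBitmap, 1));
    assert(wordHoldsPointer(nodeBitmap, 2));
}
```

A template hook that knows the static type can compute this at compile time and hand the bitmap to the allocator without going through TypeInfo at all.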

Finally, using TypeInfo as the API for finalization means we have to do stupid things like construct a "Fake" TypeInfo inside the AA runtime to properly deal with destruction of keys/values.

Proof of Concept

As a proof of concept, I have modified the array and GC API to abstract these features into the GC. This is in the branch https://github.com/schveiguy/dmd/tree/arrayrewrite, and has been built successfully to have the option to use the SDC GC.

This can be tested using releases done by Symmetry (Linux x64 only currently): https://github.com/symmetryinvestments/dmd-umbrella/tags

Overall plan

So what is the plan to get this incorporated? We have 3 phases to go through. I've tried to split them into reasonable chunks to make the migration understandable, and each piece should be valid on its own merits.

Phase 1: update druntime

First up, we need to migrate the array runtime into the GC. Currently the array runtime exists partly in rt.lifetime, and partly in core.internal.array. In addition, we have finalizer storage that needs to be handled by the GC. This is necessarily intertwined with array handling, since both "metadata" pieces need to live inside the allocated block for the current GC. The steps should roughly be as follows:

  1. Migrate the block cache into the GC package. PR: "Move block info caching into the gc package" (dlang/dmd#17067).
  2. Migrate functions that manage the block metadata into the conservative GC. PR: "Adding block metadata facilities into the GC package" (dlang/dmd#17073).
  3. Rework array API functions to abstract pieces that can be moved to the GC. PR: "rework common array operations into functions that can be migrated to the GC" (dlang/dmd#20586).
  4. Add functions to the gcinterface that will manage array runtime. PR: "Update GC interface to contain array management functions." (dlang/dmd#20608).
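As a reminder of what the "array runtime" covers from the user's side, the sketch below exercises the operations that depend on the per-block used-length metadata: capacity, reserve, in-place append and assumeSafeAppend. These are the behaviours any GC taking over array management has to preserve. The function name appendStaysInPlace is made up for this example.

```d
// Returns true when appending one element after reserving space does not
// reallocate the array.
bool appendStaysInPlace(size_t reserved)
{
    int[] a = [1, 2, 3];
    a.reserve(reserved);   // may move the array once, up front
    const before = a.ptr;
    a ~= 4;                // fits in the reserved space
    return a.ptr is before;
}

void main()
{
    assert(appendStaysInPlace(100));

    int[] a = [1, 2, 3, 4];
    a.reserve(100);
    assert(a.capacity >= 100);

    // A slice that does not end at the block's used length cannot be
    // appended to in place, so its capacity is reported as 0.
    int[] head = a[0 .. 2];
    assert(head.capacity == 0);

    // assumeSafeAppend rewrites the used length stored in the block, after
    // which appending to `head` may overwrite a[2] and a[3].
    head.assumeSafeAppend();
    assert(head.capacity >= 2);
}
```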

Second, we need to have the GC take ownership of managing both finalizers and array runtime. This means we instrument malloc and friends to actually set up the block metadata directly, instead of having the runtime do it.

  1. Use the flags in the attributes to determine how block metadata is initialized. Remove initialization code from runtime. PR: "Move all array and finalizer functionality into the GC" (dlang/dmd#20698).
  2. Remove any implicit knowledge of the block metadata from runtime, along with any calls to GC internal functions, so that runtime uses the public interface only. PR: "Put all metadata calls into GC" (dlang/dmd#20833).

Third, we need to remove the GC's direct dependency on TypeInfo. In the runtime, we still need TypeInfo to know the correct information about the finalizer and array element size. To this end, we will convert the TypeInfo parameter into 2 parameters: the pointer bitmap for the precise scanner, and a void* context pointer, which the runtime passes in for its own use during finalization. In practice, the context parameter will be the TypeInfo. In the future we may make a different kind of "limited" TypeInfo which only contains the things we need and doesn't require virtual functions. This also would aid in simplifying the AA finalizer situation.
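A toy model of that split might look like the following. Everything here (ToyGC, ToyBlock, FinalizeHook, runtimeFinalize, make) is hypothetical and is not the druntime GC interface; malloc stands in for real GC memory. The point is only that the GC side stores a bitmap and an opaque context, while the runtime side is the one that knows the context is a TypeInfo.

```d
import core.lifetime : emplace;
import core.stdc.stdlib : free, malloc;

struct Resource
{
    static int destroyed; // counts destructor runs
    int* data;
    ~this() { ++destroyed; }
}

// GC side: no TypeInfo anywhere, only a bitmap and an opaque context.
struct ToyBlock
{
    void* ptr;
    const(size_t)[] pointerBitmap;
    void* context;
}

alias FinalizeHook = void function(void* ptr, void* context);

struct ToyGC
{
    ToyBlock[] blocks;
    FinalizeHook finalize; // supplied by the runtime

    void* allocate(size_t size, const(size_t)[] bitmap, void* context)
    {
        void* p = malloc(size);
        blocks ~= ToyBlock(p, bitmap, context);
        return p;
    }

    // Stand-in for a collection: finalize and free every block.
    void releaseAll()
    {
        foreach (ref b; blocks)
        {
            if (b.context !is null)
                finalize(b.ptr, b.context);
            free(b.ptr);
        }
        blocks = null;
    }
}

// Runtime side: the only place that knows the context is a TypeInfo.
void runtimeFinalize(void* ptr, void* context)
{
    auto ti = cast(TypeInfo) context;
    ti.destroy(ptr);
}

T* make(T)(ref ToyGC gc)
{
    static immutable size_t[] bitmap = __traits(getPointerBitmap, T);
    void* p = gc.allocate(T.sizeof, bitmap, cast(void*) typeid(T));
    return emplace(cast(T*) p);
}

void main()
{
    auto gc = ToyGC(null, &runtimeFinalize);
    make!Resource(gc);
    make!Resource(gc);
    gc.releaseAll();
    assert(Resource.destroyed == 2);
}
```

Swapping the context for a "limited" TypeInfo later would only change runtimeFinalize and make, not the GC side.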

This poses a new problem, in that the public API for GC allocation receives the TypeInfo. We will have to keep these overloads. But the implementation class does not need to deal with TypeInfo.

  1. Migrate GC functions that accept TypeInfo to accept the appropriate parameters (pointer bitmap and context). Replace the TypeInfo-using functions with wrappers in core.memory that call the correct new functions. PR TBD
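The wrapper idea could be sketched as below. The names AllocRequest, newStyleMalloc and legacyMalloc are invented for this example, and the eventual signatures in core.memory may well differ since the PR is still TBD. The sketch assumes the bitmap is reachable through TypeInfo.rtInfo, which is where the existing precise scanner looks for it; the pointer is passed through without being interpreted.

```d
import core.memory : GC;

// Records what the new-style entry point received, so the example can check it.
struct AllocRequest
{
    size_t size;
    uint attrs;
    immutable(void)* pointerBitmap;
    void* context;
}

AllocRequest lastRequest;

// Hypothetical new-style entry point: no TypeInfo in the signature.
void* newStyleMalloc(size_t size, uint attrs,
                     immutable(void)* pointerBitmap, void* context)
{
    lastRequest = AllocRequest(size, attrs, pointerBitmap, context);
    return GC.malloc(size, attrs);
}

// Legacy-shaped overload kept for the public API: it only unpacks the
// TypeInfo and forwards.
void* legacyMalloc(size_t size, uint attrs, const TypeInfo ti)
{
    immutable(void)* bitmap = ti is null ? null : ti.rtInfo;
    return newStyleMalloc(size, attrs, bitmap, cast(void*) ti);
}

struct Pair
{
    int* a;
    int b;
}

void main()
{
    void* p = legacyMalloc(Pair.sizeof, 0, typeid(Pair));
    assert(p !is null);
    assert(lastRequest.size == Pair.sizeof);
    assert(lastRequest.context is cast(void*) typeid(Pair));
    assert(lastRequest.pointerBitmap is typeid(Pair).rtInfo);
}
```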

At this point, the GC interface will be migrated to the point at which custom array runtime and finalizer implementation systems can be added (such as SDC's GC). No special knowledge of the metadata layout in the block should be in runtime outside the GC implementation.

Phase 2: Add SDC hook project to dub so SDC GC can be used.

Details coming soon...

Phase 3: Port SDC GC to normal D

Details coming soon...
