rebis-dev: Reducing the number of page faults #2571
Comments
Do I understand correctly that this bit vector is now updated all the time? It only needs to be updated during GC. |
The problem is that So far it is only used to distinguish cached index pointers from strings and during copying, but in either case I can think of workarounds to make it unnecessary. The mark phase of the GC algorithm seems to require a bit vector eventually. But I agree, it would be preferable to only write to it when a string is written to the heap. My hope was that resizing it below its allocated capacity would amount to the no-op @triska described, since that memory should have already been zeroed at allocation time. I now know I was wrong to believe that. Ideally we could upgrade string tracking in this way to a sorted list of (lower, upper) pairs, which is not only much more compact but would also only have to be processed sequentially during GC. |
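As an illustration of the sorted (lower, upper) pair list mentioned above, here is a minimal sketch. The type `StringRanges` and its methods are hypothetical names, not rebis-dev code, and the sketch assumes that strings are only ever appended at increasing heap addresses, so the list stays sorted without explicit sorting. Ranges are half-open, and adjacent ranges are merged, which is only appropriate if the list is used purely to tell string data apart from ordinary cells.

```rust
/// Hypothetical tracker: sorted, non-overlapping half-open ranges
/// [lower, upper) of heap cells that hold string data.
pub struct StringRanges {
    ranges: Vec<(usize, usize)>,
}

impl StringRanges {
    pub fn new() -> Self {
        StringRanges { ranges: Vec::new() }
    }

    /// Record that cells lower..upper hold string data.
    /// Assumption: calls arrive in increasing address order.
    pub fn push(&mut self, lower: usize, upper: usize) {
        assert!(lower < upper);
        if let Some(last) = self.ranges.last_mut() {
            assert!(last.1 <= lower, "ranges must be appended in address order");
            if last.1 == lower {
                // Adjacent to the previous range: extend it instead of adding a pair.
                last.1 = upper;
                return;
            }
        }
        self.ranges.push((lower, upper));
    }

    /// Binary search: does `cell` fall inside a recorded string range?
    pub fn contains(&self, cell: usize) -> bool {
        // Index of the first range whose lower bound is greater than `cell`.
        let idx = self.ranges.partition_point(|&(lo, _)| lo <= cell);
        idx > 0 && cell < self.ranges[idx - 1].1
    }

    /// Sequential traversal, as a forward heap scan would use it.
    pub fn iter(&self) -> impl Iterator<Item = (usize, usize)> + '_ {
        self.ranges.iter().copied()
    }
}
```

A backward scan could walk the same list with `.rev()` on the underlying slice iterator; random lookups cost a binary search rather than a single bit test, which is the trade-off against a bit vector.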
@UWN's message indicates that this bit vector can be constructed on marking, as a result of scanning the heap starting from the root nodes, which is even better than what I stated in the initial post. |
Yes, but the root nodes only detect live partial string regions, and then only partially. The marking algorithm works by scanning the heap one cell at a time in either direction. Unless a record like a bit vector or pair list is kept, there's no way to distinguish partial strings that aren't accessible from the root nodes. |
What do you mean with "in either direction"? Strings are only stored in the forward direction. Marking of a string starts from a root (or other) node that points to the start of the string. We know from the tag of the node that a string starts, and all cells up to the next
Cells with inaccessible content (whether strings or not) can be reclaimed by GC, no? |
I was actually mistaken to attribute the full heap scan of the GC algorithm to the mark phase. It occurs in the compaction/sweep phase. I'm referring to the algorithm detailed in this paper as the one I plan to use when it comes time to implement GC (i.e. after the latest round of rebis-dev issues are settled). The algorithm uses Morris's compaction algorithm, which performs two full sweeps of the heap, the first in the backward direction (high addresses to low), the second in the forward direction. The problem addressed by Since rebis-dev departs from that assumption by embedding strings into the heap, we need some way to distinguish heap string data from heap cells while the compaction scan is done, regardless of whether the scanned data are reachable from root terms. Towards the end, the paper discusses building a tree to track and sweep only the reachable cells so the algorithm can limit its attention completely to them, but it does not go into detail on how to realize it. I'm glad to entertain alternatives people want to propose if they allow us to avoid this and other problems. It should be possible to keep the marking algorithm while changing the compaction algorithm to work around this by using more space. |
All I can recommend is to try to keep everything as simple and robust as possible. A single mark phase starting from root nodes should be enough to register the locations of all reachable strings in the heap and to store the location of these cells in a bitvector. Personally, I see no reason to try to avoid constructing such a bitvector at some point during GC, especially not at the cost of extreme complications that will only introduce further problems. |
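To make the suggestion above concrete, here is a toy sketch of a single mark pass that starts from root cells and records which heap words hold reachable string data. Everything about the layout is an assumption made for illustration and is not rebis-dev's actual cell format: a 2-bit tag in the low bits of each cell (`TAG_CONST`, `TAG_REF`, `TAG_PSTR`), string data stored as raw 8-byte words ending at the first word containing a zero byte, and a tail cell immediately after the string. `Vec<bool>` stands in for the bit vector.

```rust
// Hypothetical tags, for illustration only.
pub const TAG_CONST: u64 = 0;
pub const TAG_REF: u64 = 1; // points to an ordinary heap cell
pub const TAG_PSTR: u64 = 2; // points to the first word of string data

pub fn make(tag: u64, addr: usize) -> u64 {
    ((addr as u64) << 2) | tag
}
fn tag(w: u64) -> u64 {
    w & 3
}
fn addr(w: u64) -> usize {
    (w >> 2) as usize
}

/// Marks everything reachable from `roots`.
/// Returns (live, is_string), one flag per heap word.
pub fn mark(heap: &[u64], roots: &[u64]) -> (Vec<bool>, Vec<bool>) {
    let mut live = vec![false; heap.len()];
    let mut is_string = vec![false; heap.len()];
    let mut work: Vec<u64> = roots.to_vec();
    while let Some(w) = work.pop() {
        match tag(w) {
            TAG_REF => {
                let a = addr(w);
                if !live[a] {
                    live[a] = true;
                    work.push(heap[a]);
                }
            }
            TAG_PSTR => {
                let mut a = addr(w);
                if live[a] {
                    continue; // this part of the string was already marked
                }
                // Scan forward up to and including the word with the terminator.
                loop {
                    live[a] = true;
                    is_string[a] = true;
                    let done = heap[a].to_le_bytes().contains(&0);
                    a += 1;
                    if done {
                        break;
                    }
                }
                // The cell after the string is its tail; treat it as a normal cell.
                if !live[a] {
                    live[a] = true;
                    work.push(heap[a]);
                }
            }
            _ => {} // constants have no children
        }
    }
    (live, is_string)
}
```

Note that string data not reachable from any root is never flagged by such a pass, so a compaction scan that visits every heap word would still need some other way to recognize it, which is the concern raised earlier in the thread.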
Re: I find the current implementation of PStrs quite convoluted; it relies on:
Why not have something like this:

    struct PStrStart {
        length: u16,
        next: u40,
    }
    struct PStrLoc {
        addr: u40,
        offset: u16,
    }
|
We want to support string differences efficiently. Consider for example the following partial string (i.e., we don't yet know its length, since the tail is a variable): This is the key feature used by Line 86 in 5a869e8
In this representation, there is no need to keep track of or modify any lengths. |
#2569 states:
"The heap now uses a bit vector to track locations of partial strings in the heap. It's written to whenever the heap is, ..."
It seems the number of page faults can be significantly reduced by batching and reducing accesses to this bit vector: When a chunk of heap space becomes available (preferably allocated in larger blocks), this vector can be correspondingly extended by all zero bits in one go. Only when strings are written to the heap should it be necessary to change some of these bits to 1, again in one go for as many cells as the string occupies.
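The batching described above could look roughly like the following sketch. `StringBits`, `extend_zeroed`, and `mark_range` are hypothetical names, not the rebis-dev API. The sketch shows only the logical batching (one extension per heap chunk, one word-at-a-time range write per string); whether extending the vector actually avoids touching memory depends on the allocator and operating system, and is not something this code demonstrates.

```rust
/// Hypothetical bit vector with one bit per heap cell;
/// a set bit means the cell holds string data.
pub struct StringBits {
    words: Vec<u64>,
    len: usize, // number of heap cells currently tracked
}

impl StringBits {
    pub fn new() -> Self {
        StringBits { words: Vec::new(), len: 0 }
    }

    /// Extend the vector by `cells` zero bits in one step,
    /// e.g. when a new chunk of heap space becomes available.
    pub fn extend_zeroed(&mut self, cells: usize) {
        self.len += cells;
        let needed = (self.len + 63) / 64;
        if needed > self.words.len() {
            self.words.resize(needed, 0);
        }
    }

    /// Set the bits for cells start..end (half-open), one 64-bit word at a time.
    pub fn mark_range(&mut self, start: usize, end: usize) {
        assert!(start <= end && end <= self.len);
        let mut i = start;
        while i < end {
            let lo = i % 64;
            let hi = usize::min(64, lo + (end - i));
            let width = hi - lo;
            // Mask with bits lo..hi set; a full word is handled separately
            // because shifting a u64 by 64 would overflow.
            let mask = if width == 64 {
                u64::MAX
            } else {
                ((1u64 << width) - 1) << lo
            };
            self.words[i / 64] |= mask;
            i += width;
        }
    }

    pub fn is_string(&self, cell: usize) -> bool {
        cell < self.len && (self.words[cell / 64] >> (cell % 64)) & 1 == 1
    }
}
```

With this shape, writing an ordinary cell to the heap touches the bit vector not at all, and writing a string of n cells touches roughly n/64 words.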