Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rebis-dev: Reducing the number of page faults #2571

Open
triska opened this issue Sep 25, 2024 · 9 comments
Open

rebis-dev: Reducing the number of page faults #2571

triska opened this issue Sep 25, 2024 · 9 comments

Comments

@triska
Copy link
Contributor

triska commented Sep 25, 2024

#2569 states:

"The heap now uses a bit vector to track locations of partial strings in the heap. It's written to whenever the heap is, ..."

It seems the number of page faults can be significantly reduced by batching and reducing accesses to this bit vector: When a chunk of heap space becomes available (preferably allocated in larger blocks), this vector can be correspondingly extended by all zero bits in one go. Only when strings are written to the heap should it be necessary to change some of these bits to 1, again in one go for as many cells as the string occupies.

@UWN
Copy link

UWN commented Sep 25, 2024

Do I understand this correctly that now this bitvector is updated all the time? It only needs to do so during GC.

@mthom
Copy link
Owner

mthom commented Sep 25, 2024

The problem is that pstr_vec needs to be expanded separately in parallel to the heap since it's a separate data structure. The act of pushing a cell to the heap then requires a bit to be pushed to the vector so that their (cell-indexed) lengths always match.

So far it is only used to distinguish cached index pointers from strings and during copying but in either case I can think of workarounds to make it unnecessary. The mark phase of the GC algorithm seems to require a bit vector eventually. But I agree, it would be preferable to only write to it when a string is written to the heap. My hope was that resizing it below its allocated capacity would amount to the no-op @triska described since that memory should have already been zeroed at allocation time. I now know I was wrong to believe that.

Ideally we could upgrade string tracking in this way to a sorted list of (lower, upper) pairs.. which isn't only much more compact, but would only have to be processed sequentially during GC.

@triska
Copy link
Contributor Author

triska commented Sep 26, 2024

The mark phase of the GC algorithm seems to require a bit vector eventually.

@UWN's message indicates that this bit vector can be constructed on marking, as a result of scanning the heap starting from the root nodes, which is even better than what I stated in the initial post.

@mthom
Copy link
Owner

mthom commented Sep 27, 2024

Yes, but the root nodes only detect live partial string reasons, and then only partially. The marking algorithm works by scanning the heap one cell at a time in either direction. Unless a record like a bit vector or pair list is kept, there's no way to distinguish partial strings that aren't accessible from the root nodes.

@triska
Copy link
Contributor Author

triska commented Sep 27, 2024

The marking algorithm works by scanning the heap one cell at a time in either direction.

What do you mean with "in either direction"? Strings are only stored in the forward direction. Marking of a string starts from a root (or other) node that points to the start of the string. We know from the tag of the node that a string starts, and all cells up to the next 0 byte in the heap are registered as being occupied by a string in the bitvector.

there's no way to distinguish partial strings that aren't accessible from the root nodes.

Cells with unaccessible content (whether strings or not) can be reclaimed by GC no?

@mthom
Copy link
Owner

mthom commented Sep 27, 2024

I was actually mistaken to attribute the full heap scan of the GC algorithm to the mark phase. it occurs in the compaction/sweep phase. I'm referring to the algorithm detailed in this paper as the one I plan to use when it comes time to implement GC (i.e. after the latest round of rebis-dev issues are settled). The algorithm uses Morris's compaction algorithm which performs two full sweeps of the heap, the first in the backward direction (high addresses to low), the second in the forward.

The problem addressed by pstr_vec is that the algorithm assumes the heap is a homogeneous data structure containing only heap cells. The algorithm was designed to minimize space usage during collections but that's not nearly as much a concern today as when the paper was published in the 80s.

Since rebis-dev departs from that assumption by embedding strings into the heap, we need some way to distinguish heap string data from heap cells while the compaction scan is done regardless of whether the scanned data are reachable from root terms. Towards the end, the paper discusses building a tree to track and sweep only the reachable cells so the algorithm can limit its attention completely to them, but it does not go into detail on how to realize it.

I'm glad to entertain alternatives people want to propose if they allow us to avoid this and other problems. It should be possible to keep the marking algorithm while changing the compaction algorithm to work around this by using more space.

@triska
Copy link
Contributor Author

triska commented Sep 27, 2024

All I can recommend is to try to keep everything as simple and robust as possible. A single mark phase starting from root nodes should be enough to register the locations of all reachable strings in the heap and to store the location of these cells in a bitvector. Personally, I see no reason to try to avoid constructing such a bitvector at some point during GC, especially not at the cost of extreme complications that will only introduce further problems.

@adri326
Copy link
Contributor

adri326 commented Jan 27, 2025

Re: I find the current implementation of PStrs quite convoluted; it relies on:

  • sentinel values (which causes a lot of headache when the string contains one or more null byte)
  • a bit vector (which must be kept up to date)
  • the presence of an [|] atom or a PStrLoc following them
  • a PStrLoc heap cell pointing to it

Why not have something like this:

struct PStrStart {
    length: u16,
    next: u40,
}

struct PStrLoc {
    addr: u40,
    offset: u16,
}
... | PStrLoc   | ...
         |
         +--------------+
         |              |
         | (addr)       v (8*(addr+1) + offset)
         v              [==========
... | PStrStart | <string contents> | ...
       |    |                     ^
       |    |                     | (length)
       |    +---------------------+
       |
       | (next)
       v          ===============]
... | PStrStart | <rest of string> | ...
       |
       | (next)
       v
       0
  • PStrLoc contains the offset within the partial string and the address of the PStrStart
  • PStrStart is responsible for storing metadata about the partial string: its length (in bytes) and the address of the next segment, or zero (it's my understanding that the very first heap cell has special meaning and should therefore not contain a PStr)
  • no more sentinel values (which automatically solves rebis-dev: Strings starting with a null byte aren't stored on the heap #2757 & friends)
  • no more separate bitvector
  • no more marking the last byte of the PStr (which could be the first sentinel value), since we can now mark the dedicated PStrStart
  • no need for a dedicated [|] atom following the PStr: if pstr_start.next == 0 && pstr_loc.offset >= pstr_start.length, then the PStrLoc can be interpreted as [|]
  • no need to scan the PStr for the sentinel value whenever we want to read the PStr: accessing the first character can now be an O(1) operation

@triska
Copy link
Contributor Author

triska commented Jan 27, 2025

We want to support string differences efficiently. Consider for example the following partial string (i.e., we don't yet know its length, since the tail is a variable): [a,b,c|Ls]. It must be possible to efficiently let more become known about the string, by instantiating Ls.

This is the key feature used by library(pio): When more becomes known about a variable, any string it is part of automatically grows longer:

partial_string(Chars, Ls, Ls0),

In this representation, there is no need to keep track of or modify any lengths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants