From e364d5f0cddc03d164f0d7dfc8d6c4f1616364b7 Mon Sep 17 00:00:00 2001 From: piegames Date: Tue, 13 Jun 2023 20:27:23 +0200 Subject: [PATCH 1/5] Update the RFC process documentation (#150) * Update the RFC process documentation - Nudge authors towards using Semantic Line Breaks for writing the RFC text - Nudge authors towards posting their text as pre-RFC in the forum first - Tweak wording around who declares FCP slightly - Make clear that the shepherds have to announce FCP, and when exactly the period officially starts - Be more open about when to do the implementation work relative to the RFC, and reflect the current practice of starting implementation work early - Make more clear that RFCs should not be amended, and that important information should live e.g. in documentation instead. * RFC template: Use semantic line breaks Given the short text this may not make a huge difference, but given that every RFC starts with a copy of that template, the hope is that it will nudge people to continue writing in that style. * RFC template: Add "Prior art" section This is mostly inspired by Rust's RFC template. Also added a couple of words to the "Alternatives" section, similarly inspired by Rust's template. * fixup! Update the RFC process documentation * fixup! Update the RFC process documentation Co-authored-by: 7c6f434c <7c6f434c@mail.ru> * fixup! Update the RFC process documentation Co-authored-by: Valentin Gagarin --------- Co-authored-by: 7c6f434c <7c6f434c@mail.ru> Co-authored-by: Valentin Gagarin --- 0000-template.md | 36 +++++++++++-------- README.md | 91 ++++++++++++++++++++++++++---------------------- 2 files changed, 71 insertions(+), 56 deletions(-) diff --git a/0000-template.md b/0000-template.md index 43569cf57..a25d40566 100644 --- a/0000-template.md +++ b/0000-template.md @@ -16,35 +16,44 @@ One paragraph explanation of the feature. # Motivation [motivation]: #motivation -Why are we doing this? What use cases does it support? What is the expected -outcome? 
+Why are we doing this? What use cases does it support? What is the expected outcome? # Detailed design [design]: #detailed-design -This is the core, normative part of the RFC. Explain the design in enough -detail for somebody familiar with the ecosystem to understand, and implement. -This should get into specifics and corner-cases. Yet, this section should also -be terse, avoiding redundancy even at the cost of clarity. +This is the core, normative part of the RFC. +Explain the design in enough detail for somebody familiar with the ecosystem to understand, and implement. +This should get into specifics and corner-cases. +Yet, this section should also be terse, avoiding redundancy even at the cost of clarity. # Examples and Interactions [examples-and-interactions]: #examples-and-interactions -This section illustrates the detailed design. This section should clarify all -confusion the reader has from the previous sections. It is especially important -to counterbalance the desired terseness of the detailed design; if you feel -your detailed design is rudely short, consider making this section longer -instead. +This section illustrates the detailed design. +This section should clarify all confusion the reader has from the previous sections. +It is especially important to counterbalance the desired terseness of the detailed design; +if you feel your detailed design is rudely short, consider making this section longer instead. # Drawbacks [drawbacks]: #drawbacks -Why should we *not* do this? +What are the disadvantages of doing this? # Alternatives [alternatives]: #alternatives What other designs have been considered? What is the impact of not doing this? +For each design decision made, discuss possible alternatives and compare them to the chosen solution. +The reader should be convinced that this is indeed the best possible solution for the problem at hand. + +# Prior art +[prior-art]: #prior-art + +You are unlikely to be the first one to tackle this problem. 
+Try to dig up earlier discussions around the topic or prior attempts at improving things. +Summarize, discuss what was good or bad, and compare to the current proposal. +If applicable, have a look at what other projects and communities are doing. +You may also discuss related work here, although some of that might be better located in other sections. # Unresolved questions [unresolved]: #unresolved-questions @@ -54,5 +63,4 @@ What parts of the design are still TBD or unknowns? # Future work [future]: #future-work -What future work, if any, would be implied or impacted by this feature -without being directly part of the work? +What future work, if any, would be implied or impacted by this feature without being directly part of the work? diff --git a/README.md b/README.md index f32b30af2..37cb9099c 100644 --- a/README.md +++ b/README.md @@ -115,44 +115,45 @@ graph TD 0. Have a cool idea! -1. Fill in the RFC. Put care into the details: RFCs that do not present +0. Fill in the RFC. Put care into the details: RFCs that do not present convincing motivation, demonstrate understanding of the impact of the design, or are disingenuous about the drawbacks or alternatives tend to be - poorly-received. You might want to create a PR in your fork of the RFCs - repository to help you flesh it out with a few supporters or chat/video - conference with a few people involved in the topic of the RFC. -2. In case your RFC is a technical proposal, you might want to prepare a + poorly-received. Consider using [Semantic Line Breaks](https://sembr.org/) + in order to get better diffs on later amendments. +0. Consider publishing your RFC as pre-RFC [in the forum](https://discourse.nixos.org/c/dev/rfc-steering-committee/33) + to gather initial feedback and iron out the remaining typos. +0. In case your RFC is a technical proposal, you might want to prepare a prototype of your idea to firstly make yourself aware of potential pitfalls and also help reviewers understand the RFC. 
Code may be able to explain some issues in short. -3. Submit a pull request. As a pull request the RFC will receive design feedback +0. Submit a pull request. As a pull request the RFC will receive design feedback from the larger community, and the author should be prepared to revise it in response. -4. For the nomination process for potential members of the RFC Shepherd Team, +0. For the nomination process for potential members of the RFC Shepherd Team, that is specific to each RFC, anyone interested can either nominate another person or themselves to be a potential member of the RFC Shepherd Team. This can already be done when submitting the PR. -5. The RFC Steering Committee assigns a subset of the nominees to the RFC +0. The RFC Steering Committee assigns a subset of the nominees to the RFC Shepherd Team and designates a leader for it. This has to be done unanimously. -6. Build consensus and integrate feedback. RFCs that have broad support are much +0. Build consensus and integrate feedback. RFCs that have broad support are much more likely to make progress than those that don't receive any comments. Feel free to reach out to the RFC Shepherd Team leader in particular to get help identifying stakeholders and obstacles. -7. The RFC Shepherd Team will discuss the RFC pull request, as much as possible +0. The RFC Shepherd Team will discuss the RFC pull request, as much as possible in the comment thread of the pull request itself. Discussion outside of the pull request, either offline or in a video conference, that might be preferable to get to a solution for complex issues, will be summarized on the pull request comment thread. -8. RFCs rarely go through this process unchanged, especially as alternatives and +0. RFCs rarely go through this process unchanged, especially as alternatives and drawbacks are shown. 
You can make edits, big and small, to the RFC to clarify or change the design, but make changes as new commits to the pull request, and leave a comment on the pull request explaining your changes. Specifically, do not squash or rebase commits after they are visible on the pull request. -9. At some point, a member of the RFC Shepherd Team will propose a "motion for - final comment period" (FCP), along with a disposition for the RFC (merge or - close). +0. At some point, a member of the RFC Shepherd Team will propose to start the + "Final Comment Period" (FCP) on behalf of the team, along with a disposition + for the RFC (usually "merge" or "close"). * This step is taken when enough of the tradeoffs have been discussed that the RFC Shepherd Team is in a position to make a decision. That does not require consensus amongst all participants in the RFC thread (which is @@ -165,21 +166,25 @@ graph TD * For RFCs with lengthy discussion, the motion to FCP is usually preceded by a summary comment trying to lay out the current state of the discussion and major tradeoffs/points of disagreement. - * Before actually entering FCP, all members of the RFC Shepherd Team must - sign off the motion. -10. The FCP lasts ten calendar days, so that it is open for at least 5 business - days. It is also advertised widely, e.g. in NixOS Weekly and through - Discourse announcements. This way all stakeholders have a chance to lodge - any final objections before a decision is reached. -11. In most cases, the FCP period is quiet, and the RFC is either merged or - closed. However, sometimes substantial new arguments or ideas are raised, - the FCP is canceled, and the RFC goes back into development mode. -12. In case of acceptance, the RFC Steering Committee merges the PR. - Otherwise the RFC's pull request is closed. If no - consensus can be reached on the RFC but the idea in general is accepted, it - gets closed, too. 
A note is added that is should be proposed again, when the
-    circumstances, that are stopping the discussion to come to another decision,
-    change.
+   * In order to actually enter FCP, it must be made clear that all members of
+     the RFC Shepherd Team sign off the motion, e.g. through comments, reactions,
+     approving reviews or a meeting protocol.
+0. The FCP is advertised widely by the shepherds, most importantly in the relevant
+   [Discourse announcements category](https://discourse.nixos.org/c/announcements/rfc-announcements/22).
+   It lasts ten calendar days starting with the Discourse announcement, so that
+   it is open for at least 5 business days. This way all stakeholders have a
+   chance to lodge any final objections before a decision is reached.
+0. In most cases, the FCP is quiet, and the RFC is either merged or
+   closed. However, sometimes substantial new arguments or ideas are raised,
+   the FCP is canceled, and the RFC goes back into development mode.
+   The feedback during FCP may result in minor adjustments to the RFC; this is
+   not necessarily a reason to cancel FCP.
+0. In case of acceptance, the RFC Steering Committee merges the PR.
+   Otherwise the RFC's pull request is closed. If no
+   consensus can be reached on the RFC but the idea in general is accepted, it
+   gets closed, too. A note is added that it should be proposed again when the
+   circumstances that are stopping the discussion from coming to another decision
+   change.

 ### Unhappy Cases

@@ -215,8 +220,11 @@ This RFC is being closed due to lack interest. If enough shepherds are found thi

 ## The RFC life-cycle

-Once an RFC is accepted the authors may implement it and submit the feature as a
-pull request to the Nix or Nixpkgs repo. Being accepted is not a rubber stamp,
+Most RFCs describe changes that eventually need to be implemented, usually in
+the form of pull requests against one of the Nix\* repositories. Ideally,
+implementations are ready to be merged alongside the RFC when it gets accepted.
+Other times, implementation happens only after the RFC gets accepted.
+Being accepted is not a rubber stamp,
 and in particular still does not mean the feature will ultimately be merged; it
 does mean that in principle all the major stakeholders have agreed to the
 feature and are amenable to merging it. In general though this means that the
@@ -231,17 +239,16 @@ implementation, it is by far the most effective way to see an RFC through to
 completion: authors should not expect that other project developers will take
 on responsibility for implementing their accepted feature.

-Minor modifications to accepted RFCs can be done in follow-up pull requests. We
-strive to write each RFC in a manner that it will reflect the final design of
-the feature; but the nature of the process means that we cannot expect every
-merged RFC to actually reflect what the end result will be after implementation.
-
-In general, once accepted, RFCs should not be substantially changed. Only very
-minor changes should be submitted as amendments. More substantial changes should
-be new RFCs, with a note added to the original RFC. Exactly what counts as a
-"very minor change" is up to the RFC Shepherd Team of the RFC to be amended, to
-be decided in cooperation with the RFC Steering Committee.
-
+RFC documents are intended to be seen as the documentation of a decision and a
+snapshot of a moment in time, rather than a specification-like normative document.
+Think more of a Matrix Spec Proposal and less of an IETF RFC. Therefore,
+once accepted, RFCs should generally not be substantially changed. Only very
+minor changes should be submitted as amendments (via a follow-up pull request).
+It is the general expectation that any information intended to be normative and
+"outlive" the initial RFC process should live outside of the RFC document, mostly
+in documentation and code. These may be subject to change as usual, and of course
+any "substantial" changes will again require a new RFC.
Usually there is no need +to update the original RFC to keep it up with updates on the implementation. ## Members of the RFC Steering Committee From baecf1ec61291536ec47f15fdec0006c76ea84ef Mon Sep 17 00:00:00 2001 From: John Ericson Date: Thu, 25 May 2023 10:42:34 -0400 Subject: [PATCH 2/5] Copy Template --- rfcs/0000-local-overlay-store.md | 58 ++++++++++++++++++++++++++++++++ 1 file changed, 58 insertions(+) create mode 100644 rfcs/0000-local-overlay-store.md diff --git a/rfcs/0000-local-overlay-store.md b/rfcs/0000-local-overlay-store.md new file mode 100644 index 000000000..43569cf57 --- /dev/null +++ b/rfcs/0000-local-overlay-store.md @@ -0,0 +1,58 @@ +--- +feature: (fill me in with a unique ident, my_awesome_feature) +start-date: (fill me in with today's date, YYYY-MM-DD) +author: (name of the main author) +co-authors: (find a buddy later to help out with the RFC) +shepherd-team: (names, to be nominated and accepted by RFC steering committee) +shepherd-leader: (name to be appointed by RFC steering committee) +related-issues: (will contain links to implementation PRs) +--- + +# Summary +[summary]: #summary + +One paragraph explanation of the feature. + +# Motivation +[motivation]: #motivation + +Why are we doing this? What use cases does it support? What is the expected +outcome? + +# Detailed design +[design]: #detailed-design + +This is the core, normative part of the RFC. Explain the design in enough +detail for somebody familiar with the ecosystem to understand, and implement. +This should get into specifics and corner-cases. Yet, this section should also +be terse, avoiding redundancy even at the cost of clarity. + +# Examples and Interactions +[examples-and-interactions]: #examples-and-interactions + +This section illustrates the detailed design. This section should clarify all +confusion the reader has from the previous sections. 
It is especially important +to counterbalance the desired terseness of the detailed design; if you feel +your detailed design is rudely short, consider making this section longer +instead. + +# Drawbacks +[drawbacks]: #drawbacks + +Why should we *not* do this? + +# Alternatives +[alternatives]: #alternatives + +What other designs have been considered? What is the impact of not doing this? + +# Unresolved questions +[unresolved]: #unresolved-questions + +What parts of the design are still TBD or unknowns? + +# Future work +[future]: #future-work + +What future work, if any, would be implied or impacted by this feature +without being directly part of the work? From 5f3c97fe03edee88f79d5abe31644e8b5c241a04 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Thu, 25 May 2023 11:27:28 -0400 Subject: [PATCH 3/5] First version for opening PR Co-authored-by: Ryan Mulligan Co-authored-by: Connor Brewster Co-authored-by: Ben Radford Co-authored-by: Divam --- rfcs/0000-local-overlay-store.md | 383 +++++++++++++++++++++++++++++-- 1 file changed, 365 insertions(+), 18 deletions(-) diff --git a/rfcs/0000-local-overlay-store.md b/rfcs/0000-local-overlay-store.md index 43569cf57..e952c3c6f 100644 --- a/rfcs/0000-local-overlay-store.md +++ b/rfcs/0000-local-overlay-store.md @@ -1,5 +1,5 @@ --- -feature: (fill me in with a unique ident, my_awesome_feature) +feature: local-overlay-store start-date: (fill me in with today's date, YYYY-MM-DD) author: (name of the main author) co-authors: (find a buddy later to help out with the RFC) @@ -11,48 +11,395 @@ related-issues: (will contain links to implementation PRs) # Summary [summary]: #summary -One paragraph explanation of the feature. +Add a new `local-overlay` store implementation to Nix. +This will be a local store that is layered upon another local filesystem store (local store or daemon). +This allows locally extending a shared store that is periodically updated with additional store objects. 
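For orientation, the mechanism underneath can be sketched as a single OverlayFS mount over the store directory (all paths here are hypothetical placeholders; the actual configuration syntax is specified in the detailed design):

```
# Shared, read-only store objects come from the lower layer;
# locally added objects land in the writable upper layer.
mount -t overlay overlay \
    -o lowerdir=/mnt/shared/nix/store,upperdir=/mnt/scratch/nix/upper/store,workdir=/mnt/scratch/nix/work \
    /nix/store
```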
 # Motivation
 [motivation]: #motivation

-Why are we doing this? What use cases does it support? What is the expected
-outcome?
+## Technical motivation
+
+Many organizational users of Nix have a large collection of Nix store objects they wish to share with a number of consumers, be they human users, build farm workers etc.
+The existing ways of doing this are:
+
+- Share nothing: Copy store objects to each consumer.
+
+- Share everything: Use a single mutable NFS store, carefully synchronizing updates.
+
+- Overlay everything: Mount all of `/nix` (store and DB) with OverlayFS.
+
+Each has serious drawbacks:
+
+- "Share nothing" wastes tons of space as many duplicate store objects are stored separately.
+
+- "Share everything" incurs major overhead from synchronization, even if consumers are making store objects they don't intend any other consumer to use.
+  It also poses an inflexible security model where the actions of one consumer affect all of them.
+
+- "Overlay everything" cannot take advantage of new store objects added to the lower store, because its "fork" of the DB covers up the lower store's.
+  (Furthermore, separate files from the DB proper, like an out-of-date SQLite Write-Ahead-Logging (WAL) file, *could* leak through, causing chaos.)
+
+The new `local-overlay` store also uses OverlayFS, but just for the store directory.
+The database still starts as a regular fresh empty one, and instead Nix explicitly knows about the lower store, so it can manually fetch any DB information it needs from it.
+This avoids all three downsides:
+
+- Store objects are never duplicated by the overlay store.
+  OverlayFS ensures that one copy in either layer is enough, and we are careful never to wastefully include a store object in the upper layer if it is already in the lower layer.
+
+- No excess synchronization.
+  Local changes are just that: local, not shared with any other consumer.
+  The lower store is never written to (no modifications or even filesystem locks) so we should not encounter any slow write and sync paths in filesystem implementations like NFS.
+
+- No rigidity of the lower store.
+  Since Nix sees both the `local-overlay` store's DB and the lower store, it is possible for it to be aware of and utilize objects that were added to the lower store after the upper store was created.
+
+This gives us a "best of all three worlds" solution.
+
+## Marketing motivation
+
+It is quite common for organizations using Nix to first adopt it behind the scenes.
+That is to say, Nix is used to prepare some artifacts which are then presented to a consumer that need not be aware they were made with Nix.
+Later though, because of Nix's growing popularity, there may be a desire to reveal its usage so consumers can use Nix themselves.
+Rather than Nix being a controversial tool worth hiding, it can be a popular tool worth exposing.
+Nix-unaware usage can still work, but Nix-aware usage can do additional things.
+
+The `local-overlay` store can serve as a crucial tool to bridge these two modes of using Nix.
+The lower store can be as before
+--- however the artifacts were disseminated in the "hidden Nix" first phase of adoption
+--- perhaps with only a small tweak to expose the DB / daemon socket if it wasn't before.
+The `local-overlay` store is new, but purely local, separate for each user that wants to use Nix, and completely avoids impacting any user that doesn't.
+
+By providing the `local-overlay` store, we are essentially completing a reusable step-by-step guide for Nix users to "Nixify their workplace" in a very conscientious and non-disruptive manner.
+
+## Motivation in action
+
+See [Replit's own announcement](https://blog.replit.com/super-colliding-nix-stores) of this feature (in its current, non-upstreamed form) aimed at its users.
+This covers many of the same points above, but from the perspective of Replit users who would like to use Nix, rather than that of the Nix community.

 # Detailed design
 [design]: #detailed-design

-This is the core, normative part of the RFC. Explain the design in enough
-detail for somebody familiar with the ecosystem to understand, and implement.
-This should get into specifics and corner-cases. Yet, this section should also
-be terse, avoiding redundancy even at the cost of clarity.
+## Basic principle
+
+`local-overlay` is a store representing the extension of a lower store with a collection of additional store objects.
+(We don't refer to the upper layer as an "upper store" because it is not self-contained
+--- it doesn't abide by the closure property because objects in the upper layer can refer to objects that are only in the lower layer.)
+
+## Class hierarchy, configuration settings, and initialization
+
+```mermaid
+flowchart TD
+    subgraph LocalFSStore-instance
+        LS
+        LM
+    end
+
+    LS(store directory)
+    LM(Abstract metadata source)
+
+    subgraph LocalOverlayStore-instance
+        US
+        UM
+    end
+
+    US(store directory)
+    UM(SQLite DB)
+
+    UD(Directory for additional store objects)
+
+    LocalOverlayStore-instance -->|`lower-store` configuration option| LocalFSStore-instance
+    LocalOverlayStore-instance -->|`upper-layer` configuration option| UD
+
+    US -->|OverlayFS lower layer| LS
+    US -->|OverlayFS upper layer| UD
+```
+
+`LocalOverlayStore` is a subclass of `LocalStore` implementing the `local-overlay` store.
+It has additional configuration items for:
+
+ - `lower-store`: The lower store, which must be a `LocalFSStore`
+
+   This is specified with an escaped URL just like the `remote-store` setting of the two SSH store types.
+
+ - `upper-layer`: The directory used as the upper layer of the OverlayFS
+
+ - `check-mount`: Whether to check the filesystem mount configuration
+
+   Here is an example of how the nix.conf configuration might look:
+   ```
+   store = local-overlay?lower-store=/mnt/lower/%3Fread-only%3Dtrue&upper-layer=/mnt/scratch/nix/upper/store
+   ```
+
+With `check-mount` enabled, on initialization it checks that an OverlayFS mount exists matching these parameters:
+
+ - The lower layer must be the lower store's "real store directory"
+
+ - The upper layer must be the directory specified for this purpose
+
+The database for the `local-overlay` store is exactly like that for a regular local store:
+
+ - Same schema, including foreign key constraints
+
+ - Created empty on opening the store if it doesn't exist
+
+ - The existing one is opened otherwise
+
+## Data structure
+
+These are diagrams for the possible states of a single store object.
+Except for the closure property mandating that present store objects must have all their references also be present, store objects are independent.
+We can therefore to a large extent get away with considering only a single store object.
+
+### Graph Key
+
+All the graphs below follow these conventions:
+
+#### Nodes
+
+- Right angle corners: Absent store object
+- Rounded corners: Present store object
+
+#### Edges
+
+- Solid edge: "adding" or "deduplicating" direction
+- Dotted edge: "deleting" direction
+
+The graph, when reduced to just one type of edge, is always acyclic, and thus represents a partial order.
+
+### Lower Store
+
+While in use as a lower store by one or more overlay stores, a store must only grow "monotonically".
+That is to say, the only way it is allowed to change is by extending it with additional store objects.
+
+```mermaid
+graph TD
+    A -->|Add in store| B
+
+    A[Absent Store Object]
+    B("Present store object")
+```
+
+### Overlay Store logical view
+
+The overlay store, by contrast, allows regular arbitrary operations, almost.
+
+```mermaid
+graph TD
+    A -->|Add in store| B
+    B -.->|Delete from store| A
+
+    A[Absent Store Object]
+    B("Present store object")
+```
+
+The exception is objects that are part of the lower store.
+They cannot be logically deleted, but are always part of the overlay store.
+
+### Both stores simplified view
+
+We can take the [Cartesian product](https://en.wikipedia.org/wiki/Cartesian_product_of_graphs) of these two graphs,
+and additionally tweak it to cover the exception from above:
+
+```mermaid
+flowchart TD
+    A -->|Create in lower store| B
+    A -->|Create in overlay store| C
+
+    B -->|Create in overlay store| Both
+    C -->|Create in lower store| Both
+
+    C -.->|Delete in overlay store| A
+    Both -.->|Faux-delete in overlay store| B
+
+    A[Absent Store Object]
+    B("Lower Layer Present")
+    C("Upper Layer Present")
+    Both("Both layers present")
+```
+
+In particular, "Faux-delete" refers to the exception.
+While the upper layer can "forget" about the store object, the lower layer still contains it, and therefore so does the overlay store.
+In particular, any of the 3 "present" nodes means the store object is logically present.
+
+### Both stores accurate physical view
+
+The above graph doesn't cover duplication vs deduplication in the "both" state.
+It also doesn't distinguish between store directories and metadata.
+Let's now make a more complex graph which covers both of these things:
+
+```mermaid
+flowchart TD
+    A -->|Create in lower store| B
+    A -->|Create in overlay store| C
+
+    B -->|Create in upper store after created in lower store| D1
+    C -->|Create in lower store after created in upper store| D0
+
+    D0 -->|Manual deduplication| D1
+
+    subgraph Both
+        %%"Both layers metadata"
+        D0
+        D1
+    end
+
+    C -.->|Delete in overlay store| A
+    Both -.->|Faux-delete in overlay store| B
+
+    A[Absent Store Object]
+    B("Lower layer filesystem data<br>Lower store metadata")
+    C("Upper layer filesystem data<br>Overlay DB present")
+    D0("Both layers filesystem data<br>Lower store metadata<br>Overlay DB metadata")
+    D1("Lower layer filesystem data<br>Lower store metadata<br>Overlay DB metadata")
+```
+
+Whenever the filesystem part of a store object resides in a layer, the metadata must also reside in that layer.
+I.e. a store object whose filesystem data is in the lower layer must have lower store metadata (the exact mechanism is abstract and unspecified; it could be an SQLite DB or daemon), and a store object whose filesystem data is in the upper layer must have an overlay DB entry.
+However, when we just have the store object in the lower layer, we may also have metadata in the upper layer.
+That means there are two cases when the metadata is in both layers: the duplicated case (both layers have the filesystem data) and the deduplicated case (just the lower layer has it).
+
+## Operation
+
+As discussed in the motivation, store objects are never knowingly duplicated.
+The `local-overlay` store while in operation ensures that store objects are stored exactly once:
+either in the lower store or the upper layer directory.
+No file system data should ever be duplicated by `local-overlay` itself.
+
+Non-filesystem data, i.e. what goes in the DB (references, signatures, etc.), is duplicated.
+Any store object from the lower store that the `local-overlay` needs has that information copied into the `local-overlay` store's DB.
+This includes information for the closure of any such store object, because the normal closure property enforced by the DB's foreign key constraints is upheld.
+
+Store objects can still end up duplicated if the lower store later gains a store object the upper store already had.
+This is because when the `local-overlay` is remounted, it doesn't know how the lower store may have changed, and when the lower store is added to, any upper store directories / DBs are not in general visible either.
+We can have a "fsck" operation however that manually scans for missing / duplicated objects.
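Such a scan is not specified by this proposal; as a rough illustration of the idea (the function name and the flat directory layout are hypothetical), detecting wastefully duplicated objects could look like:

```python
import os

def find_duplicated_objects(lower_store_dir: str, upper_layer_dir: str) -> set[str]:
    """Return store object names present in both the lower store directory
    and the upper OverlayFS layer, i.e. objects wastefully stored twice."""
    lower = set(os.listdir(lower_store_dir))
    upper = set(os.listdir(upper_layer_dir))
    return lower & upper
```

A real implementation would additionally cross-check the overlay DB for entries whose filesystem data is missing from both layers.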
+
+## Read-only `local` Store
+
+In order to facilitate using `local-overlay` where the lower store is entirely read only (read-only SQLite files too, not just the store directory), it is useful to also implement a new "read-only" setting on the `local` store.
+The main thing this does is use SQLite's [immutable mode](https://www.sqlite.org/c3ref/open.html).
+
+This is a separate feature;
+it is perfectly possible to implement the `local-overlay` without this or vice-versa.
+But for maximum usability, we want to do both.

 # Examples and Interactions
 [examples-and-interactions]: #examples-and-interactions

-This section illustrates the detailed design. This section should clarify all
-confusion the reader has from the previous sections. It is especially important
-to counterbalance the desired terseness of the detailed design; if you feel
-your detailed design is rudely short, consider making this section longer
-instead.
+Because the `local-overlay` store is a completely separate store implementation, the interactions with the rest of Nix are fairly minimal and well-defined.
+In particular, users of other stores who do not use the `local-overlay` store will not be impacted at all.

 # Drawbacks
 [drawbacks]: #drawbacks

-Why should we *not* do this?
+## Read-only local store is delicate
+
+SQLite's immutable mode doesn't mean "we promise not to change the database".
+It relies on the database not just being *logically* immutable (the meaning doesn't change), but *physically* immutable (no bytes of the on-disk files change).
+This is because it forgoes synchronization altogether, relying on the fact that nothing can be rearranged, e.g. in the middle of a query (which would invalidate in-progress traversals of data structures, etc.).
+This means there is no hope of, say, an "append only" mode where "publishers" only add new store objects to a local store while read-only mode "subscribers" can still safely read from it.
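To make SQLite's immutable-mode constraint concrete, here is a small illustration using Python's sqlite3 bindings (a stand-in for Nix's actual C++ code; the single-column `ValidPaths` table is a simplification of Nix's real schema):

```python
import os
import sqlite3
import tempfile

# A toy stand-in for the lower store's database.
path = os.path.join(tempfile.mkdtemp(), "db.sqlite")
db = sqlite3.connect(path)
db.execute("CREATE TABLE ValidPaths (path TEXT PRIMARY KEY)")
db.execute("INSERT INTO ValidPaths VALUES ('/nix/store/aaaa-hello-1.0')")
db.commit()
db.close()

# immutable=1 makes SQLite skip all locking and change detection, which is
# only sound if the file's bytes never change while the connection is open.
ro = sqlite3.connect(f"file:{path}?immutable=1", uri=True)
paths = [row[0] for row in ro.execute("SELECT path FROM ValidPaths")]
```

Writes through such a connection fail, and a concurrent writer would silently corrupt in-progress reads, which is exactly why the lower store must be physically frozen.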
+
+This is an inconvenience for rolling out new versions of the lower store, but not a show stopper.
+One solution is "multi version concurrency control" where consumers get an immutable snapshot of the lower store for the duration of each login.
+New snapshots can only be gotten when consumers log in again, and old snapshots can only be retired once every consumer viewing them logs out.
+
+## `local-overlay` lacks normal form for the database
+
+A slight drawback with the architecture is a lack of a normal form.
+A store object in the lower store may or may not have a DB entry in the `local-overlay` store.
+These are the two "both" nodes in the last diagram.
+
+This introduces some flexibility in the system: the same "logical" layered store can be represented in multiple different "physical" configurations.
+This isn't a problem *per se*, but does mean there is a bit more complexity to consider during testing and system administration.
+
+## Deleting isn't intuitive
+
+For "deleting" lower store objects in the `local-overlay` store,
+we don't actually remove them but just remove the upper DB entry.
+This is somewhat surprising, but reflects the fact that the lower store is logically immutable (even when it isn't a `local` store opened in read-only mode).
+Deleting the upper DB entry does not remove the object from the `local-overlay` store, but it does reset the object to its initial, lower-layer-only state.

 # Alternatives
 [alternatives]: #alternatives

-What other designs have been considered? What is the impact of not doing this?
+## Stock Nix with OverlayFS ("overlay everything" from motivation)
+
+It is possible to use OverlayFS on the entire Nix store with stock Nix.
+Just as added store objects would appear in the upper layer, so would the SQLite DB after any modification.
+
+There are a few problems with this:
+
+1. Once the lower store is "forked" in this way, there is no "merge".
+   The modified DB in the upper layer will completely cover up the lower store's DB.
+   If any new store objects are added to the lower store, the OverlayFS-combined local store will never know.
+
+2. The database can be very large.
+   For example, Replit's 16 TB store has a 634 MB database.
+   OverlayFS doesn't know how to duplicate only part of a SQLite database, so we have to duplicate the whole thing per user/consumer.
+   This will waste lots of space on information the consumer may not care about.
+   And for a many-user product like Replit's, that will waste precious disk quota from the user's perspective too.
+
+## Use `nix-store --load-db` with the above
+
+As suggested [on discourse](https://discourse.nixos.org/t/super-colliding-nix-stores/28462/7), it is possible to augment the DB with additional paths after it has been created.
+Indeed, this is how the ISO installer for NixOS works.
+Running this periodically can allow the consumer to pick up any new paths added to the lower store since the last run.
+
+This does solve the first problem, but at a significant performance cost.
+We don't know which paths have changed, so we have to slurp up the entire DB.
+With the DB of Replit's 16 TB store, this took 10 minutes.
+
+We could try to track "revisions" out of band so that just the new paths are added, but then we would be adding a new source of complexity versus the "statelessness" of accessing the lower store on demand (and relying on monotonicity).
+
+## Bind mounts instead of OverlayFS for the `local-overlay` store
+
+Instead of mounting the entire lower store dir underneath ours via OverlayFS, we could bind mount individual store objects as we need them.
+The bind-mounting store would no longer be the union of the lower store with the additional store objects; instead, the lower store would act more as a special optimized substituter.
+
+This gives a normal form in that objects are bind-mounted if and only if they have a DB entry in the bind-mounting store;
+there is no "have store object, don't yet have DB entry" middle state to worry about.
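This invariant can be sketched with a toy model (plain Python dicts stand in for the real store directory, DB, and `mount --bind` calls, which are deliberately not performed here):

```python
class BindMountStore:
    """Toy model of the bind-mount alternative: the lower store acts as a
    substituter, and an object is visible exactly when it has a DB entry."""

    def __init__(self, lower):
        self.lower = lower   # {store_path: contents}, the read-only lower store
        self.db = set()      # registered paths in the bind-mounting store
        self.mounts = {}     # stands in for per-object bind mounts

    def ensure(self, path):
        # Create the bind mount and the DB entry together, so the two can
        # never disagree -- this is the normal form the text describes.
        if path not in self.db:
            self.mounts[path] = self.lower[path]
            self.db.add(path)
        return self.mounts[path]

lower = {"/nix/store/aaa-hello": "hello contents"}
store = BindMountStore(lower)
store.ensure("/nix/store/aaa-hello")
assert store.db == set(store.mounts)  # DB entry iff mounted, always
```

The real implementation would have to make the mount and the DB insert atomic with respect to crashes, which the toy model glosses over.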
+
+The downside of this is that Nix needs elevated permissions in order to create those bind mounts, and the impact of having arbitrarily many bind mounts is unknown.
+Even if this design works fine once set up, the imposition of an O(n) initialization step setting up each bind mount is prohibitive for many use-cases.
+
+## Store implementations using FUSE
+
+We could have a single FUSE mount that manually implements the "bind on demand" semantics described above without cluttering the mount namespace with an entry for each shared store object.
+FUSE, however, is quite primitive, in that every read needs to be shuffled through the FUSE server.
+There is nothing like a "control plane vs data plane" separation where Nix could tell the OS "this directory is that other directory" and the OS could do the rest without involving the FUSE server.
+That means the performance of FUSE is potentially worse than that of these in-kernel mounting solutions.
+
+Also, this would require a good deal more code than the other solutions.
+Perhaps this is a good thing; the status quo of Nix having to keep the OS and DB in sync is not as elegant as Nix squarely being the "source of truth", providing both filesystem and non-filesystem data about each store object directly.
+But it represents a significantly greater departure from the status quo.

# Unresolved questions
[unresolved]: #unresolved-questions

-What parts of the design are still TBD or unknowns?
+None at this time.

# Future work
[future]: #future-work

-What future work, if any, would be implied or impacted by this feature
-without being directly part of the work?
+## Alternative to read-only local store
+
+We are considering a hybrid between the `local` store and the `file://` store.
+This would *not* use NARs, but would use "NAR info" files instead of a SQLite database.
+This would side-step the concurrency issues of SQLite's read-only mode, and make "append only" usage of the store not require much synchronization.
+(This is because the internal structure of the filesystem, unlike the internal structure of SQLite, is not visible to clients.)
+
+It is true that this is much slower when used directly --- that is why Nix switched to using SQLite in the first place --- but in combination with a `local-overlay` store this doesn't matter.
+Since non-filesystem data is copied into the `local-overlay` store's DB, it will effectively act as a cache, speeding up future queries.
+Each NAR info file only needs to be read once.
+
+## "Pivoting" between lower stores
+
+Suppose a lower store wants to garbage collect some paths that overlay stores use.
+As written above, this is illegal.
+But if there were a "migration period" where the overlay stores could learn which store objects were going to go away, they could copy them to the upper layer.
+
+One way to implement this is to expose both the old and new versions of the lower store.
+(This is simple enough to do with file systems that expose snapshots; it is out of scope for Nix itself.)
+Nix would then look at both the old and new lower stores, compute the diff, and then copy things over.
+
+This goes well with the "fsck" operation, which also needs to check the entire upper layer for dangling references.
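The pivot described above could look roughly like this (a hypothetical sketch that models each store as a simple path-to-contents mapping rather than a real Nix store):

```python
def pivot(old_lower, new_lower, upper):
    """Copy any store object that the new lower store drops into the upper
    layer, so overlay stores keep working across the migration period."""
    for path, contents in old_lower.items():
        if path not in new_lower and path not in upper:
            upper[path] = contents  # promote the disappearing object
    return upper

old_lower = {"/nix/store/aaa-hello": "v1", "/nix/store/bbb-gone": "old"}
new_lower = {"/nix/store/aaa-hello": "v1"}  # bbb-gone is being collected
upper = pivot(old_lower, new_lower, upper={})
print(upper)
```

A real implementation would also have to rewrite the overlay store's DB entries and only then let the old snapshot be retired.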
From 04e4b56c85ba8c989a03bccbddef37e647a8626b Mon Sep 17 00:00:00 2001
From: John Ericson
Date: Wed, 14 Jun 2023 09:57:43 -0400
Subject: [PATCH 4/5] Mention that the read-only local store is already approved on an experimental basis

---
 rfcs/0000-local-overlay-store.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/rfcs/0000-local-overlay-store.md b/rfcs/0000-local-overlay-store.md
index e952c3c6f..245e0cdf8 100644
--- a/rfcs/0000-local-overlay-store.md
+++ b/rfcs/0000-local-overlay-store.md
@@ -276,6 +276,9 @@ We can have a "fsck" operation however that manually scans for missing / duplica
 
 ## Read-only `local` Store
 
+*This was [already approved](https://github.com/NixOS/nix/pull/8356#event-9483342493) by the Nix team on an experimental basis, as an experimental feature trivial enough to be approved without requiring an RFC.
+It is still included here just to provide context.*
+
 In order to facilitate using `local-overlay` where the lower store is entirely read only (read only SQLite files too, not just store directory), it is useful to also implement a new "read-only" setting on the `local` store.
 The main thing this does is use SQLite's [immutable mode](https://www.sqlite.org/c3ref/open.html).
From a0689f9a9de055dc357ad1080ac6703e259a2566 Mon Sep 17 00:00:00 2001 From: John Ericson Date: Wed, 14 Jun 2023 10:07:39 -0400 Subject: [PATCH 5/5] Fill in some RFC metadata --- rfcs/0000-local-overlay-store.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/rfcs/0000-local-overlay-store.md b/rfcs/0000-local-overlay-store.md index 245e0cdf8..de3a97853 100644 --- a/rfcs/0000-local-overlay-store.md +++ b/rfcs/0000-local-overlay-store.md @@ -1,11 +1,11 @@ --- feature: local-overlay-store -start-date: (fill me in with today's date, YYYY-MM-DD) -author: (name of the main author) -co-authors: (find a buddy later to help out with the RFC) +start-date: 2023-06-14 +author: John Ericson (@Ericson2314) +co-authors: Ben Radford (@benradf) shepherd-team: (names, to be nominated and accepted by RFC steering committee) shepherd-leader: (name to be appointed by RFC steering committee) -related-issues: (will contain links to implementation PRs) +related-issues: https://github.com/NixOS/nix/pull/8397 --- # Summary