Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addition of granular bcachefs formatting and multi-disk support #961

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ProjectInitiative
Copy link
Contributor

These changes provide advanced usage of bcachefs formatting options. Because bcachefs provides
a one-stop-shop for wiping, partitioning, formatting and mounting with unique features for
each step, it has been broken out into it's own type if users wish to go beyond a single
disk setup.

examples/bcachefs.nix has been updated to showcase this advanced usage and some of the
available options.

Because all three processes take place via the type object, requiring "disks" attribute set
is not longer a hard requirement if a bcachefs type is provided. A check has been added to
account for that use case.

I will note that I have not gone through the official testing process called out in the contributing guidelines quite yet. I will looking into this soon, just wanted to get this PR out there for visibility :)

I have tested provisioning a VM with a 3 disk pool with this branch using nixos-anywhere, and also running the disko tool manually on another, already provisioned VM.

@ProjectInitiative ProjectInitiative force-pushed the master branch 2 times, most recently from b2b7b6d to 9dabcb4 Compare February 8, 2025 16:14
@ProjectInitiative
Copy link
Contributor Author

I am planning on re-working this to follow the mdadm and zfs arch with mult-drive support, it is proving to be a good challenge.

@Mic92
Copy link
Member

Mic92 commented Feb 12, 2025

Looking forward to it.

@ProjectInitiative
Copy link
Contributor Author

@Lassulus You mentioned wanting to take a look at the current state. This is very rough, but this is what I currently have.

I added in some markers in the individual _create scripts (BCACHEFS POSITION and BCACHEFS_MEMBER POSITION) within bcachefs.nix and bcachefs_member.nix to track the order in which they get collated. When I load it into repl and look at the top level _create function, I see them out of order (member should come first). I am running into an order of ops issue (I think) where I am not properly tracking that disks tagged with bcachefs_member need to be processed first.

 nixosConfigurations.testmachine.config.disko.devices._create
"set -efux\n\ndisko_devices_dir=$(mktemp -d)\ntrap 'rm -rf \"$disko_devices_dir\"' EXIT\nmkdir -p \"$disko_devices_dir\"\n\n( # bcachefs pool1   /mnt/pool #\n  declare -a formatOptions=('--compression=zstd')\n  declare -a mountOptions=(verbose degraded)\n  mountpoint=/mnt/pool\n  name=pool1\n  type=bcachefs\n  echo \"Creating bcachefs device: $device\" >&2\n  \n  echo BCACHEFS POSITION\n
...
...
...

Curious your thoughts. Thanks

@ProjectInitiative
Copy link
Contributor Author

ProjectInitiative commented Feb 23, 2025

Linking the following for visibility:
koverstreet/bcachefs-tools#308
systemd/systemd#8234

@ProjectInitiative
Copy link
Contributor Author

Some additional pitfalls discovered while testing with real hardware:

  • Using nixos-anywhere fails because when installing on systems with kernels < 6.13, UUID installations fail. A "nofail" mount would need to be added to all installs which seems hacky and not suitable for people running older kernels.
  • Encryption won't work out of the box as I have not found (yet) a good way to implement the interactive passphrase required at setup even though mounting options allow for key-files and passphrase files. Not sure why initial provisioning does not support this.

Successful deployments are possible, but require additional "helpers" external to this project. I am not sure this is a feasible implementation at this time. Might be good as an experimental feature with a write up on what is needed to supplement within a user's nixos config.

This addition attempts to integrate with the current arch of supporting partitioning.

examples/bcachefs-multi-disk.nix has been added to showcase this advanced usage and some of the
available options.

Current limitations and disko changes to allow for compatibility:
* lib/types/gpt.nix was updated to generate a deterministic UUID based on the partition name if one is not provided. Throughout testing, partition labels were not creating correctly.
  * This should not have an effect on other, other than changing the default behavior to favor UUID over partition labels
* Testing mostly succeeds. Multi-disk root setups are not working due to some bcachefs and systemd issues that are currently open. These issues are linked under _config of lib/types/bcachefs.nix
  * Because of this, the fileSystems nix config is not working on reboots due to the FS UUID presenting as available even if not all disks are initialized yet. A systemd-mount alternative has been provided.
  * The final reboot test might need to be omitted.
* Bcachefs works the best with the latest (or kernel 6.13 or new) boot.kernelPackages = config.pkgs.linuxPackages_latest; should be added to the _config section for systems, but pkgs is not provided in this module.
udevadm settle
fi

${lib.optionalString (config.content != null) config.content._create}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this makes sense? I don't think we can have other filesystems on top of bcachefs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ${lib.optionalString (config.content != null) config.content._create} is a leftover artifact from the example I modeled this from, but you are correct, we cannot have any other filesystems on top of bcachefs. I'll remove this if this ever becomes viable.

Comment on lines +136 to +140
if echo "$output" | grep -iq "no such device"; then
echo "Notice: bcachefs mount failed with 'No such device'. This is expected on kernels < 6.13."
echo "Current kernel version: $(uname -r)"
echo "The mount will succeed when you boot into your final system with a newer kernel."
else
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be better to fail here instead of silently skipping? since we will end up with a wrong mounted filesystems afterwards

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is the downfall of the current implementation. I wrote a little more in the comments of this PR. It is essentially dead, because of some weird upstream bugs in bcachefs and systemd that prevent multi-disk UUIDs from mounting on boot, so does and will fail in almost every test I have completed using the disko testing framework. I worked around this by ignoring this error during provisioning and adding a custom systemd mount on my servers. This issue makes this PR DOA for the time being :(

Comment on lines +98 to +110
/**
Generate a random UUID using uuidgen

mkUUID :: AttrSet -> str
**/
mkUUID =
{ pkgs
, lib ? pkgs.lib
}:
lib.removeSuffix "\n" (builtins.readFile (pkgs.runCommand "gen-uuid" {} ''
${pkgs.util-linux}/bin/uuidgen -r > $out
''));

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this code is not needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not if I would be changing the default behavior to just hash the name. Good catch

description = ''
The UUID (also known as GUID) of the partition. Note that this is distinct from the UUID of the filesystem.
type = lib.types.nullOr (lib.types.strMatching "[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}");
default = let
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to change the default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency reasons. Bcachefs does not support mount by label, and managing a multi-disk string such as /dev/sda1:/dev/sdX... between steps seemed unmaintainable. Hashing the old determining factor such as the partition names into a deterministic, pre-defined UUID allows bcachefs to set the UUID before formatting. Otherwise if a user left it blank, there would be no easy way to determine which devices are maintained across reboots. Unlike zpool or mdadm which create /dev/devices predictably, bcachefs does provide that functionality

# form with disko
# https://github.com/koverstreet/bcachefs-tools/issues/308
# https://github.com/systemd/systemd/issues/8234#issuecomment-1868238750
##############################################################################

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NixOS has some special work around for this, though I am not sure what exactly it is. I do have a working multi-device bcachefs rootfs server that has been running for months, though currently I am struggling to get a working bootloader on an arch install for a friend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are able to point me to that work around please post it here. I also worked around this by having an external systemd mount for my non-root pools with a no-fail attribute and retry counter. However due to this weird boot-dependency issue this PR is blocked as it can't pass the VM tests which test root and reboots.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to figure out what it is, probably something with the initramfs, though I don't know much about thoses, but it probably will help with the Arch Install. My config is here: https://github.com/Silverdev2482/Router-Server bcachefs systems just works out of the box, I also can get you any other info about that server, though I am currently on my phone.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I got it working on the arch install with systemd-boot, I still get some mounting errors on boot, but they appear to be mostly ignorable, besides the delay waiting for it to pass. I think what did it was removing fsck from some config file for the initramfs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too good with VMs, but I could learn, what VM tests are failing? maybe if I try to get the VM's to boot I could understand why it doesn't work. Though I'm not familiar with disko, just manual formatting, but I would like to help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the problems is found here
When disko does it's provisioning script, it tries to mount the FS to verify it works. However the kernel used in the testing VM is not the most up-to-date (This would also fail for nixos-anywhere provisioning if it doesn't have the most up-to-date kernel either). For the bcachefs to be mounted by UUID, I believe kernel 6.13 or newer is required. Otherwise it just fails with "no device found". The other test that fails is when the VM does reboot, it isn't mounted via the systemd mount due to the issues linked at the top of this PR. Systemd is not handling multi-disk FS mounting very well at the moment.

In theory, we could skip these tests, but I am opposed because it would require users to explicitly provide their own post boot mounting strategy, and that is not maintainable, reproducible or easily debuggable. Furthermore, it means a multi-disk setup can't be used as a boot FS (at least at this present time)

See the testing guide for more on how the disko test VM works. It is derived from a Nixos testing framework

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to be a bit in over my head, but I haven't given up hope, Right now I'm just trying to spin up the VM, and then I will try to set up a multi device root VM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants