-
-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Addition of granular bcachefs formatting and multi-disk support #961
base: master
Are you sure you want to change the base?
Conversation
b2b7b6d
to
9dabcb4
Compare
I am planning on re-working this to follow the mdadm and zfs arch with mult-drive support, it is proving to be a good challenge. |
Looking forward to it. |
e2a5d83
to
32ac0ae
Compare
@Lassulus You mentioned wanting to take a look at the current state. This is very rough, but this is what I currently have. I added in some markers in the individual nixosConfigurations.testmachine.config.disko.devices._create
"set -efux\n\ndisko_devices_dir=$(mktemp -d)\ntrap 'rm -rf \"$disko_devices_dir\"' EXIT\nmkdir -p \"$disko_devices_dir\"\n\n( # bcachefs pool1 /mnt/pool #\n declare -a formatOptions=('--compression=zstd')\n declare -a mountOptions=(verbose degraded)\n mountpoint=/mnt/pool\n name=pool1\n type=bcachefs\n echo \"Creating bcachefs device: $device\" >&2\n \n echo BCACHEFS POSITION\n
...
...
... Curious your thoughts. Thanks |
26c7841
to
4d78db4
Compare
Linking the following for visibility: |
Some additional pitfalls discovered while testing with real hardware:
Successful deployments are possible, but require additional "helpers" external to this project. I am not sure this is a feasible implementation at this time. Might be good as an experimental feature with a write up on what is needed to supplement within a user's nixos config. |
4d78db4
to
5a9144a
Compare
5a9144a
to
1a351bf
Compare
This addition attempts to integrate with the current arch of supporting partitioning. examples/bcachefs-multi-disk.nix has been added to showcase this advanced usage and some of the available options. Current limitations and disko changes to allow for compatibility: * lib/types/gpt.nix was updated to generate a deterministic UUID based on the partition name if one is not provided. Throughout testing, partition labels were not creating correctly. * This should not have an effect on other, other than changing the default behavior to favor UUID over partition labels * Testing mostly succeeds. Multi-disk root setups are not working due to some bcachefs and systemd issues that are currently open. These issues are linked under _config of lib/types/bcachefs.nix * Because of this, the fileSystems nix config is not working on reboots due to the FS UUID presenting as available even if not all disks are initialized yet. A systemd-mount alternative has been provided. * The final reboot test might need to be omitted. * Bcachefs works the best with the latest (or kernel 6.13 or new) boot.kernelPackages = config.pkgs.linuxPackages_latest; should be added to the _config section for systems, but pkgs is not provided in this module.
1a351bf
to
418dbf9
Compare
udevadm settle | ||
fi | ||
|
||
${lib.optionalString (config.content != null) config.content._create} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this makes sense? I don't think we can have other filesystems on top of bcachefs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ${lib.optionalString (config.content != null) config.content._create}
is a leftover artifact from the example I modeled this from, but you are correct, we cannot have any other filesystems on top of bcachefs. I'll remove this if this ever becomes viable.
if echo "$output" | grep -iq "no such device"; then | ||
echo "Notice: bcachefs mount failed with 'No such device'. This is expected on kernels < 6.13." | ||
echo "Current kernel version: $(uname -r)" | ||
echo "The mount will succeed when you boot into your final system with a newer kernel." | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be better to fail here instead of silently skipping? since we will end up with a wrong mounted filesystems afterwards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is the downfall of the current implementation. I wrote a little more in the comments of this PR. It is essentially dead, because of some weird upstream bugs in bcachefs and systemd that prevent multi-disk UUIDs from mounting on boot, so does and will fail in almost every test I have completed using the disko testing framework. I worked around this by ignoring this error during provisioning and adding a custom systemd mount on my servers. This issue makes this PR DOA for the time being :(
/** | ||
Generate a random UUID using uuidgen | ||
|
||
mkUUID :: AttrSet -> str | ||
**/ | ||
mkUUID = | ||
{ pkgs | ||
, lib ? pkgs.lib | ||
}: | ||
lib.removeSuffix "\n" (builtins.readFile (pkgs.runCommand "gen-uuid" {} '' | ||
${pkgs.util-linux}/bin/uuidgen -r > $out | ||
'')); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this code is not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess not if I would be changing the default behavior to just hash the name. Good catch
description = '' | ||
The UUID (also known as GUID) of the partition. Note that this is distinct from the UUID of the filesystem. | ||
type = lib.types.nullOr (lib.types.strMatching "[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}"); | ||
default = let |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to change the default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consistency reasons. Bcachefs does not support mount by label, and managing a multi-disk string such as /dev/sda1:/dev/sdX...
between steps seemed unmaintainable. Hashing the old determining factor such as the partition names into a deterministic, pre-defined UUID allows bcachefs to set the UUID before formatting. Otherwise if a user left it blank, there would be no easy way to determine which devices are maintained across reboots. Unlike zpool or mdadm which create /dev/devices
predictably, bcachefs does provide that functionality
# form with disko | ||
# https://github.com/koverstreet/bcachefs-tools/issues/308 | ||
# https://github.com/systemd/systemd/issues/8234#issuecomment-1868238750 | ||
############################################################################## |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NixOS has some special work around for this, though I am not sure what exactly it is. I do have a working multi-device bcachefs rootfs server that has been running for months, though currently I am struggling to get a working bootloader on an arch install for a friend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are able to point me to that work around please post it here. I also worked around this by having an external systemd mount for my non-root pools with a no-fail attribute and retry counter. However due to this weird boot-dependency issue this PR is blocked as it can't pass the VM tests which test root and reboots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to figure out what it is, probably something with the initramfs, though I don't know much about thoses, but it probably will help with the Arch Install. My config is here: https://github.com/Silverdev2482/Router-Server bcachefs systems just works out of the box, I also can get you any other info about that server, though I am currently on my phone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I got it working on the arch install with systemd-boot, I still get some mounting errors on boot, but they appear to be mostly ignorable, besides the delay waiting for it to pass. I think what did it was removing fsck from some config file for the initramfs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too good with VMs, but I could learn, what VM tests are failing? maybe if I try to get the VM's to boot I could understand why it doesn't work. Though I'm not familiar with disko, just manual formatting, but I would like to help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the problems is found here
When disko does it's provisioning script, it tries to mount the FS to verify it works. However the kernel used in the testing VM is not the most up-to-date (This would also fail for nixos-anywhere provisioning if it doesn't have the most up-to-date kernel either). For the bcachefs to be mounted by UUID, I believe kernel 6.13 or newer is required. Otherwise it just fails with "no device found". The other test that fails is when the VM does reboot, it isn't mounted via the systemd mount due to the issues linked at the top of this PR. Systemd is not handling multi-disk FS mounting very well at the moment.
In theory, we could skip these tests, but I am opposed because it would require users to explicitly provide their own post boot mounting strategy, and that is not maintainable, reproducible or easily debuggable. Furthermore, it means a multi-disk setup can't be used as a boot FS (at least at this present time)
See the testing guide for more on how the disko test VM works. It is derived from a Nixos testing framework
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I seem to be a bit in over my head, but I haven't given up hope, Right now I'm just trying to spin up the VM, and then I will try to set up a multi device root VM.
I will note that I have not gone through the official testing process called out in the contributing guidelines quite yet. I will looking into this soon, just wanted to get this PR out there for visibility :)
I have tested provisioning a VM with a 3 disk pool with this branch using nixos-anywhere, and also running the disko tool manually on another, already provisioned VM.