Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clean up kernel cmdline and define design for any distro-provided changes from built-in defaults #6894

Open
ddstreetmicrosoft opened this issue Dec 4, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@ddstreetmicrosoft
Copy link
Contributor

ddstreetmicrosoft commented Dec 4, 2023

Describe the bug
For m2 our kernel cmdline is unnecessarily excessive, e.g.:
security=selinux selinux=1 rd.auto=1 root=PARTUUID=859e6294-b70a-450b-8e9a-61f05b11727c init=/lib/systemd/systemd ro no-vmw-sta crashkernel=256M lockdown=integrity sysctl.kernel.unprivileged_bpf_disabled=1 net.ifnames=0 plymouth.enable=0 systemd.legacy_systemd_cgroup_controller=no systemd.unified_cgroup_hierarchy=1 console=ttyS0 loglevel=7
this should be trimmed down, as most of those can be removed or adjusted in a better place.

This bug can also serve to discuss the design of how non-default system options are provided, e.g. 'hardening' options.

@ddstreetmicrosoft ddstreetmicrosoft added the bug Something isn't working label Dec 4, 2023
@ddstreetmicrosoft
Copy link
Contributor Author

@ddstreetmicrosoft
Copy link
Contributor Author

In azl3 using grub2-mkconfig (which I believe we're switching to, and removing the 'static' grub config completely), the built-in cmdline params are currently:

GRUB_CMDLINE_LINUX=" rd.auto=1 init=/lib/systemd/systemd net.ifnames=0 plymouth.enable=0 systemd.legacy_systemd_cgroup_controller=yes systemd.unified_cgroup_hierarchy=0 lockdown=integrity sysctl.kernel.unprivileged_bpf_disabled=1 loglevel=3 "

Breaking that down to each param:

rd.auto=1

this tells the dracut initrd to auto-assemble luks/raid/lvm/etc. I assume we want to keep this, but it's at least worth consideration for removal.

init=/lib/systemd/systemd

Not needed at all, the default is /sbin/init which is a symlink to /lib/systemd/systemd. This should be removed.

net.ifnames=0

Predictable interface names are a good thing; the world of network configuration management was a dark place before them and we don't want to continue living there. This absolutely should be removed.

This comes with one caveat: some applications aren't very smart and hardcode interface names, most commonly 'eth0'. This allows the applications to 'work' on any system with a single network interface. I believe this unfortunately includes k8s, e.g.:
https://github.com/kubernetes/kubernetes/blob/55f2bc10435160619d1ece8de49a1c0f8fcdf276/pkg/kubelet/stats/helper.go#L33

However, if k8s (or any other application) requires this, it should be added (i.e. predictable naming disabled) only for the specific image(s) intended for those use cases. We should not disable predictable naming by default for our base image(s).

plymouth.enable=0

plymouth isn't in our images by default; this is not needed and should be removed.

systemd.legacy_systemd_cgroup_controller=yes

we don't want to use the legacy cgroupv1 so we should remove this.

systemd.unified_cgroup_hierarchy=0

same as above, we should remove this.

lockdown=integrity

Presumably we want to keep this, although it may be useful to leave this off our 'base' images and add it into only our 'hardened' images. Also if we haven't already, it might be worth looking into compiling our kernel with the lockdown default set to integrity; that is definitely possible, but it also might prevent users from being able to disable lockdown, which we (probably?) don't want to do.

sysctl.kernel.unprivileged_bpf_disabled=1

The options for this param are:
0 (unprivileged bpf allowed, change to 1 or 2 possible)
1 (unprivileged bpf not allowed, and not possible to change this setting without reboot)
2 (unprivileged bpf not allowed, change to 0 or 1 possible)

Our kernel defaults this to 2, so unprivileged bpf does default to disabled without this parameter. The only difference this parameter makes is to prevent root from re-enabling unprivileged bpf.

I this we should remove this and leave it as the default compiled-in value (2).

loglevel=3

we absolutely should remove this. not only is loglevel 3 too low as a default, but we generally should not specify a default loglevel on the cmdline like this.

@christopherco
Copy link
Contributor

Breaking that down to each param:

rd.auto=1

this tells the dracut initrd to auto-assemble luks/raid/lvm/etc. I assume we want to keep this, but it's at least worth consideration for removal.

We should consider if we can remove it from the base configuration.

init=/lib/systemd/systemd

Not needed at all, the default is /sbin/init which is a symlink to /lib/systemd/systemd. This should be removed.

Agreed.

net.ifnames=0

Predictable interface names are a good thing; the world of network configuration management was a dark place before them and we don't want to continue living there. This absolutely should be removed.

This comes with one caveat: some applications aren't very smart and hardcode interface names, most commonly 'eth0'. This allows the applications to 'work' on any system with a single network interface. I believe this unfortunately includes k8s, e.g.: https://github.com/kubernetes/kubernetes/blob/55f2bc10435160619d1ece8de49a1c0f8fcdf276/pkg/kubelet/stats/helper.go#L33

However, if k8s (or any other application) requires this, it should be added (i.e. predictable naming disabled) only for the specific image(s) intended for those use cases. We should not disable predictable naming by default for our base image(s).

Agreed with removing net.ifnames=0. This was on our list to remove generally in azl3. Plus we have some experience with enabling predicable interface names with k8s ( @jiria )

plymouth.enable=0

plymouth isn't in our images by default; this is not needed and should be removed.

Agreed.

systemd.legacy_systemd_cgroup_controller=yes

we don't want to use the legacy cgroupv1 so we should remove this.

systemd.unified_cgroup_hierarchy=0

same as above, we should remove this.

Agreed. Azl3 will use cgroupsv2 by default.

lockdown=integrity

Presumably we want to keep this, although it may be useful to leave this off our 'base' images and add it into only our 'hardened' images. Also if we haven't already, it might be worth looking into compiling our kernel with the lockdown default set to integrity; that is definitely possible, but it also might prevent users from being able to disable lockdown, which we (probably?) don't want to do.

Alternative is to look into the kernel patch that many mainstream distros include, which tie kernel lockdown=integrity to EFI Secure Boot enabled (and lockdown=none if Secure Boot is disabled). After all, the real security value behind lockdown only matters when EFI Secure Boot is also enabled.
For example:
https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/patch-6.7-redhat.patch#_3773
https://src.fedoraproject.org/rpms/kernel/blob/rawhide/f/patch-6.7-redhat.patch#_331

sysctl.kernel.unprivileged_bpf_disabled=1

The options for this param are: 0 (unprivileged bpf allowed, change to 1 or 2 possible) 1 (unprivileged bpf not allowed, and not possible to change this setting without reboot) 2 (unprivileged bpf not allowed, change to 0 or 1 possible)

Our kernel defaults this to 2, so unprivileged bpf does default to disabled without this parameter. The only difference this parameter makes is to prevent root from re-enabling unprivileged bpf.

I this we should remove this and leave it as the default compiled-in value (2).

IMO we would actually select (=1) as a compiled-in default value but this is not possible today without adding our own kernel patch to enable this. I am pretty strongly against allowing any unprivileged bpf programs running on Azure Linux anywhere, and the built in default value of (2) still opens the door for this value to be changed at runtime.

Now, I do understand the argument that only root can set the value from (2) to (0) at runtime, but if setting the value (=1) early in boot makes things harder / causes detections because attacker needs to reboot the live node to make the change, I'd like to pursue that route (fully understanding that if someone managed to get root, that same session could load their own bpf programs).

Note: We actually did propose the equivalent of (=1) patch upstream and unfortunately the upstream maintainer rejected it on the basis that it was an "insane level of 'security' paranoia" 😅
https://lore.kernel.org/bpf/CAADnVQLqXKPpfe3M1fnrUj=Cq91KucX9R95eSF+ExavWo2Wv_Q@mail.gmail.com/
https://microsoft.visualstudio.com/OS/_workitems/edit/38553356

loglevel=3

we absolutely should remove this. not only is loglevel 3 too low as a default, but we generally should not specify a default loglevel on the cmdline like this.

Agreed and already being covered in this issue - #6870

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants