nixpkgs/nixos/modules/security/systemd-confinement.nix

{ config, pkgs, lib, utils, ... }:
nixos: Add 'chroot' options to systemd.services Currently, if you want to properly chroot a systemd service, you could do it using BindReadOnlyPaths=/nix/store (which is not what I'd call "properly", because the whole store is still accessible) or use a separate derivation that gathers the runtime closure of the service you want to chroot. The former is the easier method and there is also a method directly offered by systemd, called ProtectSystem, which still leaves the whole store accessible. The latter however is a bit more involved, because you need to bind-mount each store path of the runtime closure of the service you want to chroot. This can be achieved using pkgs.closureInfo and a small derivation that packs everything into a systemd unit, which later can be added to systemd.packages. That's also what I did several times[1][2] in the past. However, this process got a bit tedious, so I decided that it would be generally useful for NixOS, so this very implementation was born. Now if you want to chroot a systemd service, all you need to do is: { systemd.services.yourservice = { description = "My Shiny Service"; wantedBy = [ "multi-user.target" ]; chroot.enable = true; serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice"; }; } If more than the dependencies for the ExecStart* and ExecStop* (which btw. also includes "script" and {pre,post}Start) need to be in the chroot, it can be specified using the chroot.packages option. By default (which uses the "full-apivfs"[3] confinement mode), a user namespace is set up as well and /proc, /sys and /dev are mounted appropriately. In addition - and by default - a /bin/sh executable is provided as well, which is useful for most programs that use the system() C library call to execute commands via shell. 
The shell providing /bin/sh is dash instead of the default in NixOS (which is bash), because it's way more lightweight and after all we're chrooting because we want to lower the attack surface and it should be only used for "/bin/sh -c something". Prior to submitting this here, I did a first implementation of this outside[4] of nixpkgs, which duplicated the "pathSafeName" functionality from systemd-lib.nix, just because it's only a single line. However, I decided to just re-use the one from systemd here and subsequently made it available when importing systemd-lib.nix, so that the systemd-chroot implementation also benefits from fixes to that functionality (which is now a proper function). Unfortunately, we do have a few limitations as well. The first being that DynamicUser doesn't work in conjunction with tmpfs, because it already sets up a tmpfs in a different path and simply ignores the one we define. We could probably solve this by detecting it and try to bind-mount our paths to that different path whenever DynamicUser is enabled. The second limitation/issue is that RootDirectoryStartOnly doesn't work right now, because it only affects the RootDirectory option and not the individual bind mounts or our tmpfs. It would be helpful if systemd would have a way to disable specific bind mounts as well or at least have some way to ignore failures for the bind mounts/tmpfs setup. Another quirk we do have right now is that systemd tries to create a /usr directory within the chroot, which subsequently fails. Fortunately, this is just an ugly error and not a hard failure. [1]: https://github.com/headcounter/shabitica/blob/3bb01728a0237ad5e7/default.nix#L43-L62 [2]: https://github.com/aszlig/avonc/blob/dedf29e092481a33dc/nextcloud.nix#L103-L124 [3]: The reason this is called "full-apivfs" instead of just "full" is to make room for a *real* "full" confinement mode, which is more restrictive even. 
[4]: https://github.com/aszlig/avonc/blob/92a20bece4df54625e/systemd-chroot.nix Signed-off-by: aszlig <aszlig@nix.build>
2019-03-10 11:21:55 +00:00
let
  toplevelConfig = config;
  inherit (lib) types;
  inherit (utils.systemdUtils.lib) mkPathSafeName;
in {
  options.systemd.services = lib.mkOption {
    type = types.attrsOf (types.submodule ({ name, config, ... }: {
      options.confinement.enable = lib.mkOption {
        type = types.bool;
        default = false;
        description = ''
          If set, all the required runtime store paths for this service are
          bind-mounted into a `tmpfs`-based
          {manpage}`chroot(2)`.
        '';
      };
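
      # Usage sketch (illustrative only, not part of this module): a
      # hypothetical service `myservice`, built from a hypothetical package
      # `pkgs.myservice`, could opt into confinement from a NixOS
      # configuration roughly like this:
      #
      #   systemd.services.myservice = {
      #     wantedBy = [ "multi-user.target" ];
      #     confinement.enable = true;
      #     serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice";
      #   };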

      options.confinement.fullUnit = lib.mkOption {
        type = types.bool;
        default = false;
        description = ''
          Whether to include the full closure of the systemd unit file into the
          chroot, instead of just the dependencies for the executables.

          ::: {.warning}
          While it may be tempting to just enable this option to
          make things work quickly, please be aware that this might add paths
          to the closure of the chroot that you didn't anticipate. It's better
          to use {option}`confinement.packages` to **explicitly** add additional store paths to the
          chroot.
          :::
        '';
      };
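
      # Sketch of the explicit alternative recommended in the warning above
      # (illustrative only): assuming a hypothetical service `myservice` that
      # also needs `pkgs.coreutils` at runtime, the extra store path can be
      # listed directly instead of enabling `confinement.fullUnit`:
      #
      #   systemd.services.myservice.confinement = {
      #     enable = true;
      #     packages = [ pkgs.coreutils ];
      #   };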

      options.confinement.packages = lib.mkOption {
        type = types.listOf (types.either types.str types.package);
        default = [];
        description = let
          mkScOption = optName: "{option}`serviceConfig.${optName}`";
        in ''
Additional packages or strings with context to add to the closure of
the chroot. By default, this includes all the packages from the
${lib.concatMapStringsSep ", " mkScOption [
"ExecReload" "ExecStartPost" "ExecStartPre" "ExecStop"
"ExecStopPost"
]} and ${mkScOption "ExecStart"} options. If you want to have all the
dependencies of this systemd unit, you can use
{option}`confinement.fullUnit`.
::: {.note}
The store paths listed in {option}`path` are
**not** included in the closure, and neither are paths from
other options, except those listed above.
:::
'';
};
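# Usage sketch (illustrative only; "myservice" and pkgs.myservice are
# hypothetical names). Because store paths from `path` are not part of the
# closure, anything the service needs beyond what is picked up automatically
# would have to be listed via confinement.packages, for example:
#
#   systemd.services.myservice = {
#     confinement.enable = true;
#     confinement.packages = [ pkgs.coreutils ];
#     serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice";
#   };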
options.confinement.binSh = lib.mkOption {
type = types.nullOr types.path;
default = toplevelConfig.environment.binsh;
defaultText = lib.literalExpression "config.environment.binsh";
example = lib.literalExpression ''"''${pkgs.dash}/bin/dash"'';
description = ''
The program to make available as {file}`/bin/sh` inside
the chroot. If this is set to `null`, no
{file}`/bin/sh` is provided at all.
This is useful for applications that, for example, use the
{manpage}`system(3)` library function to execute commands.
'';
};
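# Usage sketch (hypothetical service name). The two lines are alternatives,
# not meant to be set together: either provide a smaller shell as /bin/sh,
# or provide none at all when the service never shells out:
#
#   systemd.services.myservice.confinement.binSh = "${pkgs.dash}/bin/dash";
#   systemd.services.myservice.confinement.binSh = null;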
options.confinement.mode = lib.mkOption {
type = types.enum [ "full-apivfs" "chroot-only" ];
default = "full-apivfs";
description = ''
The value `full-apivfs` (the default) sets up
private {file}`/dev`, {file}`/proc`,
{file}`/sys` and {file}`/tmp` file systems in a separate user
namespace.

If this is set to `chroot-only`, only the file
system namespace is set up along with the call to
{manpage}`chroot(2)`.

::: {.note}
This doesn't cover network namespaces and is solely for
file system level isolation.
:::
'';
};
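# Usage sketch (hypothetical service name): a service that needs no /dev,
# /proc or /sys could opt out of the API file systems like this:
#
#   systemd.services.myservice = {
#     confinement.enable = true;
#     confinement.mode = "chroot-only";
#   };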
config = let
inherit (config.confinement) binSh fullUnit;
wantsAPIVFS = lib.mkDefault (config.confinement.mode == "full-apivfs");
in lib.mkIf config.confinement.enable {
serviceConfig = {
RootDirectory = "/var/empty";
TemporaryFileSystem = "/";
PrivateMounts = lib.mkDefault true;
# https://github.com/NixOS/nixpkgs/issues/14645 is a future attempt
# to change some of these to default to true.
#
# If we run in chroot-only mode, having something like PrivateDevices
# set to true by default will mount /dev within the chroot, whereas
# with "chroot-only" it's expected that there are no /dev, /proc and
# /sys file systems available.
#
# However, if that default suddenly changes to true, the attack surface will
# increase, so let's explicitly set these options to true/false
# depending on the mode.
MountAPIVFS = wantsAPIVFS;
PrivateDevices = wantsAPIVFS;
PrivateTmp = wantsAPIVFS;
PrivateUsers = wantsAPIVFS;
ProtectControlGroups = wantsAPIVFS;
ProtectKernelModules = wantsAPIVFS;
ProtectKernelTunables = wantsAPIVFS;
        };
        confinement.packages = let
          execOpts = [
            "ExecReload" "ExecStart" "ExecStartPost" "ExecStartPre" "ExecStop"
            "ExecStopPost"
          ];
          execPkgs = lib.concatMap (opt: let
            isSet = config.serviceConfig ? ${opt};
          in lib.flatten (lib.optional isSet config.serviceConfig.${opt})) execOpts;
          unitAttrs = toplevelConfig.systemd.units."${name}.service";
          allPkgs = lib.singleton (builtins.toJSON unitAttrs);
          unitPkgs = if fullUnit then allPkgs else execPkgs;
        in unitPkgs ++ lib.optional (binSh != null) binSh;
      };
    }));
  };

  config.assertions = lib.concatLists (lib.mapAttrsToList (name: cfg: let
    whatOpt = optName: "The 'serviceConfig' option '${optName}' for"
                     + " service '${name}' is enabled in conjunction with"
                     + " 'confinement.enable'";
  in lib.optionals cfg.confinement.enable [
    { assertion = !cfg.serviceConfig.RootDirectoryStartOnly or false;
      message = "${whatOpt "RootDirectoryStartOnly"}, but right now systemd"
              + " doesn't support restricting bind-mounts to 'ExecStart'."
              + " Please either define a separate service or find a way to run"
              + " commands other than ExecStart within the chroot.";
    }
    { assertion = !cfg.serviceConfig.DynamicUser or false;
      message = "${whatOpt "DynamicUser"}. Please create a dedicated user via"
              + " the 'users.users' option instead as this combination is"
              + " currently not supported.";
    }
    { assertion = cfg.serviceConfig ? ProtectSystem -> cfg.serviceConfig.ProtectSystem == false;
      message = "${whatOpt "ProtectSystem"}. ProtectSystem is not compatible"
              + " with service confinement as it fails to remount /usr within"
              + " our chroot. Please disable the option.";
    }
  ]) config.systemd.services);

  config.systemd.packages = lib.concatLists (lib.mapAttrsToList (name: cfg: let
    rootPaths = let
      contents = lib.concatStringsSep "\n" cfg.confinement.packages;
    in pkgs.writeText "${mkPathSafeName name}-string-contexts.txt" contents;

    chrootPaths = pkgs.runCommand "${mkPathSafeName name}-chroot-paths" {
      closureInfo = pkgs.closureInfo { inherit rootPaths; };
      serviceName = "${name}.service";
      excludedPath = rootPaths;
    } ''
nixos/systemd-confinement: Allow shipped unit file In issue #157787 @martined wrote: Trying to use confinement on packages providing their systemd units with systemd.packages, for example mpd, fails with the following error: system-units> ln: failed to create symbolic link '/nix/store/...-system-units/mpd.service': File exists This is because systemd-confinement and mpd both provide a mpd.service file through systemd.packages. (mpd got updated that way recently to use upstream's service file) To address this, we now place the unit file containing the bind-mounted paths of the Nix closure into a drop-in directory instead of using the name of a unit file directly. This does come with the implication that the options set in the drop-in directory won't apply if the main unit file is missing. In practice however this should not happen for two reasons: * The systemd-confinement module already sets additional options via systemd.services and thus we should get a main unit file * In the unlikely event that we don't get a main unit file regardless of the previous point, the unit would be a no-op even if the options of the drop-in directory would apply Another thing to consider is the order in which those options are merged, since systemd loads the files from the drop-in directory in alphabetical order. So given that we have confinement.conf and overrides.conf, the confinement options are loaded before the NixOS overrides. Since we're only setting the BindReadOnlyPaths option, the order isn't that important since all those paths are merged anyway and we still don't lose the ability to reset the option since overrides.conf comes afterwards. Fixes: https://github.com/NixOS/nixpkgs/issues/157787 Signed-off-by: aszlig <aszlig@nix.build>
2022-02-02 12:12:49 +00:00
      mkdir -p "$out/lib/systemd/system/$serviceName.d"
      serviceFile="$out/lib/systemd/system/$serviceName.d/confinement.conf"
      echo '[Service]' > "$serviceFile"

      # /bin/sh is special here, because the option value could contain a
      # symlink and we need to properly resolve it.
      ${lib.optionalString (cfg.confinement.binSh != null) ''
        binsh=${lib.escapeShellArg cfg.confinement.binSh}
        realprog="$(readlink -e "$binsh")"
        echo "BindReadOnlyPaths=$realprog:/bin/sh" >> "$serviceFile"
      ''}
      while read storePath; do
        if [ -L "$storePath" ]; then
          # Currently, systemd can't cope with symlinks in Bind(ReadOnly)Paths,
          # so let's just bind-mount the target to that location.
          echo "BindReadOnlyPaths=$(readlink -e "$storePath"):$storePath"
        elif [ "$storePath" != "$excludedPath" ]; then
          echo "BindReadOnlyPaths=$storePath"
        fi
      done < "$closureInfo/store-paths" >> "$serviceFile"
    '';
  in lib.optional cfg.confinement.enable chrootPaths) config.systemd.services);
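
  # Usage sketch (not part of this module): a minimal service definition that
  # should satisfy the assertions above. 'pkgs.myservice' and the 'myservice'
  # user/group are hypothetical names chosen for illustration; the extra entry
  # in 'confinement.packages' is only needed if the service requires store
  # paths beyond what its Exec* options already reference.
  #
  #   { pkgs, ... }: {
  #     # A dedicated user instead of DynamicUser, which is rejected above.
  #     users.users.myservice = { isSystemUser = true; group = "myservice"; };
  #     users.groups.myservice = {};
  #
  #     systemd.services.myservice = {
  #       description = "My Shiny Service";
  #       wantedBy = [ "multi-user.target" ];
  #       confinement.enable = true;
  #       confinement.packages = [ pkgs.coreutils ];
  #       serviceConfig = {
  #         ExecStart = "${pkgs.myservice}/bin/myservice";
  #         User = "myservice";
  #         Group = "myservice";
  #         # Leave ProtectSystem, DynamicUser and RootDirectoryStartOnly unset.
  #       };
  #     };
  #   }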
}