cudaPackages: split outputs

This change creates multiple outputs for CUDA redistributable
packages.

We use a script to find out, ahead of time, the outputs each redist
package provides. From that, we are able to create multiple outputs for
supported redist packages, allowing users to specify exactly which
components they require.

Beyond the script which finds outputs ahead of time, there is some custom
code involved in making this happen. For example, the way Nixpkgs
typically handles multiple outputs involves making `dev` the default
output when available, and adding `out` to `dev`'s
`propagatedBuildInputs`.

Instead, we make each output independent of the others. If a user wants
only to include the headers found in a redist package, they can do so by
choosing the `dev` output. If they want to include dynamic libraries,
they can do so by specifying the `lib` output, or `static` for static
libraries.

To avoid breakages, we continue to provide the `out` output, which
becomes the union of all other outputs, effectively making the split
outputs opt-in.
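
For example, a downstream derivation can now pull in exactly the
components it needs. A minimal sketch (the package name is
hypothetical; the output attributes `dev`, `lib`, and `static` are the
ones introduced here, mirroring the magma change further down):

```nix
{ stdenv, cudaPackages }:

stdenv.mkDerivation {
  pname = "example-cuda-consumer"; # hypothetical package
  version = "0.1.0";
  src = ./.;
  buildInputs = [
    cudaPackages.cuda_cudart.dev    # headers only
    cudaPackages.cuda_cudart.lib    # dynamic libraries
    cudaPackages.cuda_cudart.static # static libraries, only if needed
  ];
}
```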
Connor Baker 2023-06-29 08:24:57 +00:00
parent ad1abff502
commit d5e5246e76
17 changed files with 14691 additions and 109 deletions


@@ -54,3 +54,65 @@ for your specific card(s).
Library maintainers should consult [NVCC Docs](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/)
and release notes for their software package.
## Adding a new CUDA release {#adding-a-new-cuda-release}
> **WARNING**
>
> This section of the docs is still very much in progress. Feedback is welcome in GitHub Issues tagging @NixOS/cuda-maintainers or on [Matrix](https://matrix.to/#/#cuda:nixos.org).
The CUDA Toolkit is a suite of CUDA libraries and software meant to provide a development environment for CUDA-accelerated applications. Until the release of CUDA 11.4, NVIDIA had only made the CUDA Toolkit available as a multi-gigabyte runfile installer, which we provide through the [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit) attribute. From CUDA 11.4 and onwards, NVIDIA has also provided CUDA redistributables (“CUDA-redist”): individually packaged CUDA Toolkit components meant to facilitate redistribution and inclusion in downstream projects. These packages are available in the [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) package set.
All new projects should use the CUDA redistributables available in [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) in place of [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit), as they are much easier to maintain and update.
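In practice, this means pulling in individual redistributables rather than the monolithic runfile-based toolkit. A minimal sketch of a development shell doing so (assuming `<nixpkgs>` with unfree packages allowed):

```nix
with import <nixpkgs> { config.allowUnfree = true; };

mkShell {
  packages = [
    cudaPackages.cuda_nvcc   # compiler driver
    cudaPackages.cuda_cudart # runtime headers and libraries
    cudaPackages.libcublas   # instead of the monolithic cudaPackages.cudatoolkit
  ];
}
```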
### Updating CUDA redistributables {#updating-cuda-redistributables}
1. Go to NVIDIA's index of CUDA redistributables: <https://developer.download.nvidia.com/compute/cuda/redist/>
2. Copy the `redistrib_*.json` corresponding to the release to `pkgs/development/compilers/cudatoolkit/redist/manifests`.
3. Generate the `redistrib_features_*.json` file by running:
```bash
nix run github:ConnorBaker/cuda-redist-find-features -- <path to manifest>
```
That command will generate the `redistrib_features_*.json` file in the same directory as the manifest.
4. Include the path to the new manifest in `pkgs/development/compilers/cudatoolkit/redist/extension.nix`, as sketched below.
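Step 4 amounts to extending the version-to-manifest mapping in `extension.nix`. A minimal sketch of how that mapping resolves to the two manifest file names, mirroring the `cudaVersionMap` and `fullCudaVersion` logic introduced in this commit:

```nix
# Evaluate with `nix eval -f <file>` to see the resolved manifest names.
let
  cudaVersion = "12.2";
  # Maps a CUDA version to the full version used in the manifest file names.
  cudaVersionMap = {
    "12.1" = "12.1.1";
    "12.2" = "12.2.0";
  };
  fullCudaVersion = cudaVersionMap.${cudaVersion};
in
{
  manifest = "redistrib_${fullCudaVersion}.json";
  features = "redistrib_features_${fullCudaVersion}.json";
}
```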
### Updating the CUDA Toolkit runfile installer {#updating-the-cuda-toolkit}
> **WARNING**
>
> While the CUDA Toolkit runfile installer is still available in Nixpkgs as the [`cudaPackages.cudatoolkit`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.cudatoolkit) attribute, its use is not recommended and it should be considered deprecated. Please migrate to the CUDA redistributables provided by the [`cudaPackages`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages) package set.
>
> To ensure packages relying on the CUDA Toolkit runfile installer continue to build, it will continue to be updated until a migration path is available.
1. Go to NVIDIA's CUDA Toolkit runfile installer download page: <https://developer.nvidia.com/cuda-downloads>
2. Select the appropriate OS, architecture, distribution, version, and installer type.
- For example: Linux, x86_64, Ubuntu, 22.04, runfile (local)
- NOTE: Typically, we use the Ubuntu runfile. It is unclear if the runfile for other distributions will work.
3. After selecting the installer type, copy the link provided by the installer instructions on the webpage and compute its hash by running:
```bash
nix store prefetch-file --hash-type sha256 <link>
```
4. Update `pkgs/development/compilers/cudatoolkit/versions.toml` to include the release; the hash from the previous step is consumed as sketched below.
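The hash from step 3 is what ultimately lets Nix verify the runfile. As a sketch, the fetch reduces to something like the following (URL and hash deliberately elided):

```nix
{ fetchurl }:

fetchurl {
  # The link from the download page (step 3).
  url = "https://developer.download.nvidia.com/compute/cuda/...";
  # The SRI hash printed by `nix store prefetch-file`.
  hash = "sha256-...";
}
```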
### Updating the CUDA package set {#updating-the-cuda-package-set}
1. Include a new `cudaPackages_<major>_<minor>` package set in `pkgs/top-level/all-packages.nix`.
- NOTE: Changing the default CUDA package set should occur in a separate PR, allowing time for additional testing.
2. Successfully build the closure of the new package set, updating `pkgs/development/compilers/cudatoolkit/redist/overrides.nix` as needed. Below are some common failures:
| Unable to ... | During ... | Reason | Solution | Note |
| --- | --- | --- | --- | --- |
| Find headers | `configurePhase` or `buildPhase` | Missing dependency on a `dev` output | Add the missing dependency | The `dev` output typically contains the headers |
| Find libraries | `configurePhase` | Missing dependency on a `dev` output | Add the missing dependency | The `dev` output typically contains CMake configuration files |
| Find libraries | `buildPhase` or `patchelf` | Missing dependency on a `lib` or `static` output | Add the missing dependency | The `lib` or `static` output typically contains the libraries |
If you are unable to run the resulting binary, the cause is arguably the hardest to diagnose, as it could be any combination of the previous reasons. This type of failure typically occurs when a library attempts to load or open a library it depends on but does not declare in its `DT_NEEDED` section. As a first step, ensure that dependencies are patched with [`cudaPackages.autoAddOpenGLRunpath`](https://search.nixos.org/packages?channel=unstable&type=packages&query=cudaPackages.autoAddOpenGLRunpath). Failing that, try running the application with [`nixGL`](https://github.com/guibou/nixGL) or a similar wrapper tool. If that works, the application is likely attempting to load a library that is not in the `RPATH` or `RUNPATH` of the binary.
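
Most of these fixes reduce to adding the missing split output in `overrides.nix`. A sketch mirroring the `libcusolver` change in this commit:

```nix
# In pkgs/development/compilers/cudatoolkit/redist/overrides.nix (sketch):
final: prev: {
  # libcusolver cannot find libcublas at patchelf time without the lib output.
  libcusolver = final.addBuildInputs prev.libcusolver [
    final.libcublas.lib
  ];
}
```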


@@ -2,4 +2,4 @@
# CMake's enable_language(CUDA) runs a compiler test and it doesn't account for
# CUDAToolkit_ROOT. We have to help it locate libcudart
export NVCC_APPEND_FLAGS+=" -L@cudartRoot@/lib -I@cudartRoot@/include"
export NVCC_APPEND_FLAGS+=" -L@cudartLib@/lib -L@cudartStatic@/lib -I@cudartInclude@/include"


@@ -56,7 +56,7 @@ setupCUDAToolkitCompilers() {
# CMake's enable_language(CUDA) runs a compiler test and it doesn't account for
# CUDAToolkit_ROOT. We have to help it locate libcudart
if [[ -z "${nvccDontPrependCudartFlags-}" ]] ; then
export NVCC_APPEND_FLAGS+=" -L@cudartRoot@/lib -I@cudartRoot@/include"
export NVCC_APPEND_FLAGS+=" -L@cudartLib@/lib -L@cudartStatic@/lib -I@cudartInclude@/include"
fi
}


@@ -1,3 +1,10 @@
# Type Aliases
#
# See ./extension.nix:
# - ReleaseAttrs
# - ReleaseFeaturesAttrs
#
# General callPackage-supplied arguments
{ lib
, stdenv
, backendStdenv
@@ -5,23 +12,58 @@
, autoPatchelfHook
, autoAddOpenGLRunpathHook
, markForCudatoolkitRootHook
, lndir
, symlinkJoin
}:
# Function arguments
{
# Short package name (e.g., "cuda_cccl")
# pname : String
pname
, # Long package name (e.g., "CXX Core Compute Libraries")
# description : String
description
, # platforms : List System
platforms
, # version : Version
version
, # releaseAttrs : ReleaseAttrs
releaseAttrs
, # releaseFeaturesAttrs : ReleaseFeaturesAttrs
releaseFeaturesAttrs
,
}:
pname:
attrs:
let
arch = "linux-x86_64";
# Useful imports
inherit (lib.lists) optionals;
inherit (lib.meta) getExe;
inherit (lib.strings) optionalString;
in
backendStdenv.mkDerivation {
inherit pname;
inherit (attrs) version;
# NOTE: Even though there's no actual buildPhase going on here, the derivations of the
# redistributables are sensitive to the compiler flags provided to stdenv. The patchelf package
# is sensitive to the compiler flags provided to stdenv, and we depend on it. As such, we are
# also sensitive to the compiler flags provided to stdenv.
inherit pname version;
strictDeps = true;
src = assert (lib.hasAttr arch attrs); fetchurl {
url = "https://developer.download.nvidia.com/compute/cuda/redist/${attrs.${arch}.relative_path}";
inherit (attrs.${arch}) sha256;
outputs = with releaseFeaturesAttrs;
[ "out" ]
++ optionals hasBin [ "bin" ]
++ optionals hasLib [ "lib" ]
++ optionals hasStatic [ "static" ]
++ optionals hasDev [ "dev" ]
++ optionals hasDoc [ "doc" ]
++ optionals hasSample [ "sample" ];
src = fetchurl {
url = "https://developer.download.nvidia.com/compute/cuda/redist/${releaseAttrs.relative_path}";
inherit (releaseAttrs) sha256;
};
# We do need some other phases, like configurePhase, so the multiple-output setup hook works.
dontBuild = true;
nativeBuildInputs = [
autoPatchelfHook
# This hook will make sure libcuda can be found
@@ -46,23 +88,87 @@ backendStdenv.mkDerivation {
"$ORIGIN"
];
dontBuild = true;
installPhase = with releaseFeaturesAttrs;
# Pre-install hook
''
runHook preInstall
''
# doc and dev have special output handling. Other outputs need to be moved to their own
# output.
# Note that moveToOutput operates on all outputs:
# https://github.com/NixOS/nixpkgs/blob/2920b6fc16a9ed5d51429e94238b28306ceda79e/pkgs/build-support/setup-hooks/multiple-outputs.sh#L105-L107
+ ''
mkdir -p "$out"
rm LICENSE
mv * "$out"
''
# Handle bin, which defaults to out
+ optionalString hasBin ''
moveToOutput "bin" "$bin"
''
# Handle lib, which defaults to out
+ optionalString hasLib ''
moveToOutput "lib" "$lib"
''
# Handle static libs, which aren't handled by the setup hook
+ optionalString hasStatic ''
moveToOutput "**/*.a" "$static"
''
# Handle samples, which aren't handled by the setup hook
+ optionalString hasSample ''
moveToOutput "samples" "$sample"
''
# Post-install hook
+ ''
runHook postInstall
'';
# TODO: choose whether to install static/dynamic libs
installPhase = ''
runHook preInstall
rm LICENSE
mkdir -p $out
mv * $out
runHook postInstall
# The out output leverages the same functionality which backs the `symlinkJoin` function in
# Nixpkgs:
# https://github.com/NixOS/nixpkgs/blob/d8b2a92df48f9b08d68b0132ce7adfbdbc1fbfac/pkgs/build-support/trivial-builders/default.nix#L510
#
# That should allow us to emulate "fat" default outputs without having to actually create them.
#
# It is important that this run after the autoPatchelfHook, otherwise the symlinks in out will reference libraries in lib, creating a circular dependency.
postPhases = [ "postPatchelf" ];
# For each output, create a symlink to it in the out output.
# NOTE: We must recreate the out output here, because the setup hook will have deleted it
# if it was empty.
# NOTE: Do not use optionalString based on whether `outputs` contains only `out` -- phases
# which are empty strings are skipped/unset and result in errors of the form "command not
# found: <customPhaseName>".
postPatchelf = ''
mkdir -p "$out"
for output in $outputs; do
if [ "$output" = "out" ]; then
continue
fi
${getExe lndir} "''${!output}" "$out"
done
'';
# Make the CUDA-patched stdenv available
passthru.stdenv = backendStdenv;
# Setting propagatedBuildOutputs to false prevents outputs known to the multiple-outputs
# setup hook from depending on `out` by default.
# https://github.com/NixOS/nixpkgs/blob/2920b6fc16a9ed5d51429e94238b28306ceda79e/pkgs/build-support/setup-hooks/multiple-outputs.sh#L196
# Indeed, we want to do the opposite -- fat "out" outputs that contain all the other outputs.
propagatedBuildOutputs = false;
# By default, if the dev output exists, Nix just uses that.
# However, because we disabled propagatedBuildOutputs, dev doesn't contain libraries or
# anything of the sort. To remedy this, we set outputSpecified to true, and use
# outputsToInstall, which tells Nix which outputs to use when the package name is used
# unqualified (that is, without an explicit output).
outputSpecified = true;
meta = {
description = attrs.name;
inherit description platforms;
license = lib.licenses.unfree;
maintainers = lib.teams.cuda.members;
platforms = lib.optionals (lib.hasAttr arch attrs) [ "x86_64-linux" ];
# Force the use of the default, fat output by default (even though `dev` exists, which
# causes Nix to prefer that output over the others if outputSpecified isn't set).
outputsToInstall = [ "out" ];
};
}
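
For intuition, the `lndir`-based `postPatchelf` above produces the same shape of store path that `symlinkJoin` would. A rough standalone equivalent for `cuda_cudart` (the joined name is hypothetical):

```nix
{ symlinkJoin, cudaPackages }:

# A tree of symlinks into each split output, approximating the fat out output.
symlinkJoin {
  name = "cuda_cudart-joined";
  paths = [
    cudaPackages.cuda_cudart.dev
    cudaPackages.cuda_cudart.lib
    cudaPackages.cuda_cudart.static
  ];
}
```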


@@ -1,33 +1,139 @@
final: prev: let
# Type Aliases
#
# ReleaseAttrs : {
# "relative_path" : String,
# "sha256" : String,
# "md5" : String,
# "size" : String,
# }
#
# NOTE: PackageAttrs must have at least one of the arches.
# PackageAttrs : {
# "name" : String,
# "license" : String,
# "version" : String,
# "license_path" : None | String,
# "linux-aarch64" : None | ReleaseAttrs,
# "linux-ppc64le" : None | ReleaseAttrs,
# "linux-sbsa" : None | ReleaseAttrs,
# "linux-x86_64" : None | ReleaseAttrs,
# "windows-x86_64" : None | ReleaseAttrs,
# }
#
# ReleaseFeaturesAttrs : {
# "hasBin" : Boolean,
# "hasDev" : Boolean,
# "hasDoc" : Boolean,
# "hasLib" : Boolean,
# "hasOut" : Boolean,
# "hasSample" : Boolean,
# "hasStatic" : Boolean,
# "rootDirs" : List String,
# }
#
# NOTE: PackageFeatureAttrs must have at least one of the arches.
# PackageFeatureAttrs : {
# "linux-aarch64" : None | ReleaseFeaturesAttrs,
# "linux-ppc64le" : None | ReleaseFeaturesAttrs,
# "linux-sbsa" : None | ReleaseFeaturesAttrs,
# "linux-x86_64" : None | ReleaseFeaturesAttrs,
# "windows-x86_64" : None | ReleaseFeaturesAttrs,
# }
#
final: prev:
let
# NOTE: We use hasAttr throughout instead of the (?) operator because hasAttr does not require
# us to interpolate our variables into strings (like ${attrName}).
inherit (builtins) attrNames concatMap hasAttr listToAttrs removeAttrs;
inherit (final) callPackage;
inherit (prev) cudaVersion lib;
inherit (prev) cudaVersion;
inherit (prev.lib.attrsets) nameValuePair optionalAttrs;
inherit (prev.lib.lists) optionals;
inherit (prev.lib.trivial) flip importJSON pipe;
### Cuda Toolkit Redist
# Manifest files for redist cudatoolkit. These can be found at
# Manifest files for CUDA redistributables (aka redist). These can be found at
# https://developer.download.nvidia.com/compute/cuda/redist/
cudaToolkitRedistManifests = {
"11.4" = ./manifests/redistrib_11.4.4.json;
"11.5" = ./manifests/redistrib_11.5.2.json;
"11.6" = ./manifests/redistrib_11.6.2.json;
"11.7" = ./manifests/redistrib_11.7.0.json;
"11.8" = ./manifests/redistrib_11.8.0.json;
"12.0" = ./manifests/redistrib_12.0.1.json;
"12.1" = ./manifests/redistrib_12.1.1.json;
"12.2" = ./manifests/redistrib_12.2.0.json;
# Maps a cuda version to the specific version of the manifest.
cudaVersionMap = {
"11.4" = "11.4.4";
"11.5" = "11.5.2";
"11.6" = "11.6.2";
"11.7" = "11.7.0";
"11.8" = "11.8.0";
"12.0" = "12.0.1";
"12.1" = "12.1.1";
"12.2" = "12.2.0";
};
# Function to build a single cudatoolkit redist package
buildCudaToolkitRedistPackage = callPackage ./build-cuda-redist-package.nix { };
# Check if the current CUDA version is supported.
cudaVersionMappingExists = hasAttr cudaVersion cudaVersionMap;
# Function that builds all cudatoolkit redist packages given a cuda version and manifest file
buildCudaToolkitRedistPackages = { version, manifest }: let
attrs = lib.filterAttrs (key: value: key != "release_date") (lib.importJSON manifest);
in lib.mapAttrs buildCudaToolkitRedistPackage attrs;
# Maps a cuda version to its manifest files.
# The manifest itself is from NVIDIA, but the features manifest is generated
# by us ahead of time and allows us to split packages into multiple outputs.
# Package names (e.g., "cuda_cccl") are mapped to their attributes or features.
# Since we map each attribute to a package name, we need to make sure to get rid of meta
# attributes included in the manifest. Currently, these are any of the following:
# - release_date
# - release_label
# - release_product
redistManifests =
let
# Remove meta attributes from the manifest
# removeMetaAttrs : AttrSet String b -> AttrSet String b
removeMetaAttrs = flip removeAttrs [ "release_date" "release_label" "release_product" ];
# processManifest : Path -> Attr Set (String PackageAttrs)
processManifest = flip pipe [ importJSON removeMetaAttrs ];
# fullCudaVersion : String
fullCudaVersion = cudaVersionMap.${cudaVersion};
in
{
# features : Attr Set (String PackageFeatureAttrs)
features = processManifest ./manifests/redistrib_features_${fullCudaVersion}.json;
# manifest : Attr Set (String PackageAttrs)
manifest = processManifest ./manifests/redistrib_${fullCudaVersion}.json;
};
# All cudatoolkit redist packages for the current cuda version
cudaToolkitRedistPackages = lib.optionalAttrs (lib.hasAttr cudaVersion cudaToolkitRedistManifests)
(buildCudaToolkitRedistPackages { version = cudaVersion; manifest = cudaToolkitRedistManifests.${cudaVersion}; });
# Function to build a single redist package
buildRedistPackage = callPackage ./build-cuda-redist-package.nix { };
in cudaToolkitRedistPackages
# Function that builds all redist packages given manifests
buildRedistPackages = { features, manifest }:
let
wrapper = pname:
let
# Get the redist architectures the package provides distributables for
packageAttrs = manifest.${pname};
# Check if supported
# TODO(@connorbaker): Currently hardcoding x86_64-linux as the only supported platform.
isSupported = packageAttrs ? linux-x86_64;
# Build the derivation
drv = buildRedistPackage {
inherit pname;
# TODO(@connorbaker): We currently discard the license attribute.
inherit (manifest.${pname}) version;
description = manifest.${pname}.name;
platforms = [ "x86_64-linux" ];
releaseAttrs = manifest.${pname}.linux-x86_64;
releaseFeaturesAttrs = features.${pname}.linux-x86_64;
};
# Wrap in an optional so we can filter out the empty lists created by unsupported
# packages with concatMap.
wrapped = optionals isSupported [ (nameValuePair pname drv) ];
in
wrapped;
# concatMap provides us an easy way to filter out packages for unsupported platforms.
# We wrap the buildRedistPackage call in a list to prevent errors when the package is not
# supported (by returning an empty list).
redistPackages = listToAttrs (concatMap wrapper (attrNames manifest));
in
redistPackages;
# All redistributable packages for the current CUDA version
redistPackages = optionalAttrs cudaVersionMappingExists (buildRedistPackages redistManifests);
in
redistPackages
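
To inspect what a manifest provides, the `processManifest` pipeline above can be run standalone. A sketch (assumes a checkout where the manifest file exists relative to the working directory):

```nix
# Lists the package names in the 12.2.0 manifest, e.g. [ "cuda_cccl" "cuda_cudart" ... ].
let
  inherit (import <nixpkgs> { }) lib;
  inherit (lib.trivial) flip importJSON pipe;
  # Strip the manifest's meta attributes, leaving only package entries.
  removeMetaAttrs = flip builtins.removeAttrs [ "release_date" "release_label" "release_product" ];
  processManifest = flip pipe [ importJSON removeMetaAttrs ];
in
builtins.attrNames (processManifest ./manifests/redistrib_12.2.0.json)
```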


@ -1,6 +1,8 @@
final: prev:
let
inherit (prev) lib pkgs;
cudaVersionOlder = lib.versionOlder final.cudaVersion;
cudaVersionAtLeast = lib.versionAtLeast final.cudaVersion;
in
(lib.filterAttrs (attr: _: (prev ? "${attr}")) {
### Overrides to fix the components of cudatoolkit-redist
@@ -10,51 +12,83 @@ in
libcufile = prev.libcufile.overrideAttrs (oldAttrs: {
buildInputs = oldAttrs.buildInputs ++ [
prev.libcublas
final.libcublas.lib
pkgs.numactl
pkgs.rdma-core
];
# libcuda needs to be resolved during runtime
autoPatchelfIgnoreMissingDeps = true;
autoPatchelfIgnoreMissingDeps =
["libcuda.so.1"]
# Before 12.0 libcufile depends on itself for some reason.
++ lib.optionals (cudaVersionOlder "12.0") [
"libcufile.so.0"
];
});
libcusolver = final.addBuildInputs prev.libcusolver [
prev.libcublas
];
libcusolver = final.addBuildInputs prev.libcusolver (
# Always depends on this
[final.libcublas.lib]
# Dependency from 12.0 and on
++ lib.optionals (cudaVersionAtLeast "12.0") [
final.libnvjitlink.lib
]
# Dependency from 12.1 and on
++ lib.optionals (cudaVersionAtLeast "12.1") [
final.libcusparse.lib
]
);
cuda_nvcc = prev.cuda_nvcc.overrideAttrs (oldAttrs:
let
inherit (prev.backendStdenv) cc;
in
{
# Required by cmake's enable_language(CUDA) to build a test program
# When implementing cross-compilation support: this is
# final.pkgs.targetPackages.cudaPackages.cuda_cudart
env.cudartRoot = "${prev.lib.getDev final.cuda_cudart}";
libcusparse = final.addBuildInputs prev.libcusparse (
lib.optionals (cudaVersionAtLeast "12.0") [
final.libnvjitlink.lib
]
);
# Point NVCC at a compatible compiler
cuda_gdb = final.addBuildInputs prev.cuda_gdb (
# x86_64 only needs gmp from 12.0 and on
lib.optionals (cudaVersionAtLeast "12.0") [
pkgs.gmp
]
);
# Desiderata: whenever a package (e.g. magma) adds cuda_nvcc to
# nativeBuildInputs (offsets `(-1, 0)`), magma should also source the
# setupCudaHook, i.e. we want the hook to be propagated into the
# same nativeBuildInputs.
#
# Logically, cuda_nvcc should include the hook in depsHostHostPropagated,
# so that the final offsets for the propagated hook would be `(-1, 0) +
# (0, 0) = (-1, 0)`.
#
# In practice, TargetTarget appears to work:
# https://gist.github.com/fd80ff142cd25e64603618a3700e7f82
depsTargetTargetPropagated = [
final.setupCudaHook
];
});
cuda_nvcc = prev.cuda_nvcc.overrideAttrs (_: {
# Required by cmake's enable_language(CUDA) to build a test program
# When implementing cross-compilation support: this is
# final.pkgs.targetPackages.cudaPackages.cuda_cudart
env = {
# Given the multiple outputs each CUDA redist package has, we can specify the exact components we
# need from the package. CMake requires:
# - the cuda_runtime.h header, which is in the dev output
# - the dynamic library, which is in the lib output
# - the static library, which is in the static output
cudartInclude = "${final.cuda_cudart.dev}";
cudartLib = "${final.cuda_cudart.lib}";
cudartStatic = "${final.cuda_cudart.static}";
};
# Point NVCC at a compatible compiler
# Desiderata: whenever a package (e.g. magma) adds cuda_nvcc to
# nativeBuildInputs (offsets `(-1, 0)`), magma should also source the
# setupCudaHook, i.e. we want the hook to be propagated into the
# same nativeBuildInputs.
#
# Logically, cuda_nvcc should include the hook in depsHostHostPropagated,
# so that the final offsets for the propagated hook would be `(-1, 0) +
# (0, 0) = (-1, 0)`.
#
# In practice, TargetTarget appears to work:
# https://gist.github.com/fd80ff142cd25e64603618a3700e7f82
depsTargetTargetPropagated = [
final.setupCudaHook
];
});
cuda_nvprof = prev.cuda_nvprof.overrideAttrs (oldAttrs: {
nativeBuildInputs = oldAttrs.nativeBuildInputs ++ [ pkgs.addOpenGLRunpath ];
buildInputs = oldAttrs.buildInputs ++ [ prev.cuda_cupti ];
buildInputs = oldAttrs.buildInputs ++ [ final.cuda_cupti.lib ];
# libcuda needs to be resolved during runtime
autoPatchelfIgnoreMissingDeps = true;
autoPatchelfIgnoreMissingDeps = ["libcuda.so.1"];
});
cuda_demo_suite = final.addBuildInputs prev.cuda_demo_suite [
@@ -62,8 +96,8 @@ in
pkgs.libGLU
pkgs.libglvnd
pkgs.mesa
prev.libcufft
prev.libcurand
final.libcufft.lib
final.libcurand.lib
];
nsight_compute = prev.nsight_compute.overrideAttrs (oldAttrs: {
@@ -100,7 +134,7 @@ in
nvidia_driver = prev.nvidia_driver.overrideAttrs (oldAttrs: {
# libcuda needs to be resolved during runtime
autoPatchelfIgnoreMissingDeps = true;
autoPatchelfIgnoreMissingDeps = ["libcuda.so.1"];
# No need to support this package as we have drivers already
# in linuxPackages.
meta.broken = true;


@@ -1,6 +1,7 @@
{ stdenv,
backendStdenv,
lib,
lndir,
zlib,
useCudatoolkitRunfile ? false,
cudaVersion,
@@ -10,14 +11,6 @@
autoPatchelfHook,
autoAddOpenGLRunpathHook,
fetchurl,
# The distributed version of CUDNN includes both dynamically linked .so files,
# as well as statically linked .a files. However, CUDNN is quite large
# (multiple gigabytes), so you can save some space in your nix store by
# removing the statically linked libraries if you are not using them.
#
# Setting this to true removes the statically linked .a files.
# Setting this to false keeps these statically linked .a files.
removeStatic ? false,
}: {
version,
url,
@@ -48,11 +41,16 @@ in
backendStdenv.mkDerivation {
pname = "cudatoolkit-${cudaMajorVersion}-cudnn";
version = versionTriple;
strictDeps = true;
outputs = ["out" "lib" "static" "dev"];
src = fetchurl {
inherit url hash;
};
# We do need some other phases, like configurePhase, so the multiple-output setup hook works.
dontBuild = true;
# Check and normalize Runpath against DT_NEEDED using autoPatchelf.
# Prepend /run/opengl-driver/lib using addOpenGLRunpath for dlopen("libcuda.so")
nativeBuildInputs = [
@@ -74,27 +72,49 @@ in
#
# Note also that version <=8.3.0 contained a subdirectory "lib64/" but in
# version 8.3.2 it seems to have been renamed to simply "lib/".
#
# doc and dev have special output handling. Other outputs need to be moved to their own
# output.
# Note that moveToOutput operates on all outputs:
# https://github.com/NixOS/nixpkgs/blob/2920b6fc16a9ed5d51429e94238b28306ceda79e/pkgs/build-support/setup-hooks/multiple-outputs.sh#L105-L107
installPhase =
''
runHook preInstall
mkdir -p $out
cp -a include $out/include
[ -d "lib/" ] && cp -a lib $out/lib
[ -d "lib64/" ] && cp -a lib64 $out/lib64
''
+ strings.optionalString removeStatic ''
rm -f $out/lib/*.a
rm -f $out/lib64/*.a
''
+ ''
mkdir -p "$out"
mv * "$out"
moveToOutput "lib64" "$lib"
moveToOutput "lib" "$lib"
moveToOutput "**/*.a" "$static"
runHook postInstall
'';
# Without --add-needed autoPatchelf forgets $ORIGIN on cuda>=8.0.5.
postFixup = strings.optionalString (strings.versionAtLeast versionTriple "8.0.5") ''
patchelf $out/lib/libcudnn.so --add-needed libcudnn_cnn_infer.so
patchelf $out/lib/libcudnn_ops_infer.so --add-needed libcublas.so --add-needed libcublasLt.so
patchelf $lib/lib/libcudnn.so --add-needed libcudnn_cnn_infer.so
patchelf $lib/lib/libcudnn_ops_infer.so --add-needed libcublas.so --add-needed libcublasLt.so
'';
# The out output leverages the same functionality which backs the `symlinkJoin` function in
# Nixpkgs:
# https://github.com/NixOS/nixpkgs/blob/d8b2a92df48f9b08d68b0132ce7adfbdbc1fbfac/pkgs/build-support/trivial-builders/default.nix#L510
#
# That should allow us to emulate "fat" default outputs without having to actually create them.
#
# It is important that this run after the autoPatchelfHook, otherwise the symlinks in out will reference libraries in lib, creating a circular dependency.
postPhases = ["postPatchelf"];
# For each output, create a symlink to it in the out output.
# NOTE: We must recreate the out output here, because the setup hook will have deleted it
# if it was empty.
# NOTE: Do not use optionalString based on whether `outputs` contains only `out` -- phases
# which are empty strings are skipped/unset and result in errors of the form "command not
# found: <customPhaseName>".
postPatchelf = ''
mkdir -p "$out"
${lib.meta.getExe lndir} "$lib" "$out"
${lib.meta.getExe lndir} "$static" "$out"
${lib.meta.getExe lndir} "$dev" "$out"
'';
passthru = {
@@ -111,6 +131,19 @@ in
majorVersion = versions.major versionTriple;
};
# Setting propagatedBuildOutputs to false prevents outputs known to the multiple-outputs
# setup hook from depending on `out` by default.
# https://github.com/NixOS/nixpkgs/blob/2920b6fc16a9ed5d51429e94238b28306ceda79e/pkgs/build-support/setup-hooks/multiple-outputs.sh#L196
# Indeed, we want to do the opposite -- fat "out" outputs that contain all the other outputs.
propagatedBuildOutputs = false;
# By default, if the dev output exists, Nix just uses that.
# However, because we disabled propagatedBuildOutputs, dev doesn't contain libraries or
# anything of the sort. To remedy this, we set outputSpecified to true, and use
# outputsToInstall, which tells Nix which outputs to use when the package name is used
# unqualified (that is, without an explicit output).
outputSpecified = true;
meta = with lib; {
# Check that the cudatoolkit version satisfies our min/max constraints (both
# inclusive). We mark the package as broken if it fails to satisfy the
@@ -127,5 +160,8 @@ in
license = licenses.unfree;
platforms = ["x86_64-linux"];
maintainers = with maintainers; [mdaiter samuela];
# Force the use of the default, fat output by default (even though `dev` exists, which
# causes Nix to prefer that output over the others if outputSpecified isn't set).
outputsToInstall = ["out"];
};
}


@@ -113,13 +113,17 @@ stdenv.mkDerivation {
lapack
blas
] ++ lists.optionals cudaSupport (with cudaPackages; [
cuda_cudart
libcublas # cublas_v2.h
libcusparse # cusparse.h
cuda_cudart.dev # cuda_runtime.h
cuda_cudart.lib # cudart
cuda_cudart.static # cudart_static
libcublas.dev # cublas_v2.h
libcublas.lib # cublas
libcusparse.dev # cusparse.h
libcusparse.lib # cusparse
] ++ lists.optionals (strings.versionOlder cudaVersion "11.8") [
cuda_nvprof # <cuda_profiler_api.h>
cuda_nvprof.dev # <cuda_profiler_api.h>
] ++ lists.optionals (strings.versionAtLeast cudaVersion "11.8") [
cuda_profiler_api # <cuda_profiler_api.h>
cuda_profiler_api.dev # <cuda_profiler_api.h>
]) ++ lists.optionals rocmSupport [
hip
hipblas


@@ -196,7 +196,8 @@ in buildPythonPackage rec {
export TORCH_CUDA_ARCH_LIST="${gpuTargetString}"
export CC=${cudatoolkit.cc}/bin/gcc CXX=${cudatoolkit.cc}/bin/g++
'' + lib.optionalString (cudaSupport && cudnn != null) ''
export CUDNN_INCLUDE_DIR=${cudnn}/include
export CUDNN_INCLUDE_DIR=${cudnn.dev}/include
export CUDNN_LIB_DIR=${cudnn.lib}/lib
'' + lib.optionalString rocmSupport ''
export ROCM_PATH=${rocmtoolkit_joined}
export ROCM_SOURCE_DIR=${rocmtoolkit_joined}
@@ -290,7 +291,7 @@ in buildPythonPackage rec {
buildInputs = [ blas blas.provider pybind11 ]
++ lib.optionals stdenv.isLinux [ linuxHeaders_5_19 ] # TMP: avoid "flexible array member" errors for now
++ lib.optionals cudaSupport [ cudnn nccl ]
++ lib.optionals cudaSupport [ cudnn.dev cudnn.lib nccl ]
++ lib.optionals rocmSupport [ openmp ]
++ lib.optionals (cudaSupport || rocmSupport) [ magma ]
++ lib.optionals stdenv.isLinux [ numactl ]