cudaPackages: add docs

2023-11-07 14:35:37 +00:00 · 2023-11-07 14:35:37 +00:00 · bfaefd0873
commit bfaefd0873
parent 8e800cedaf
3 changed files with 97 additions and 9 deletions
--- a/doc/languages-frameworks/cuda.section.md
+++ b/doc/languages-frameworks/cuda.section.md
@@ -68,16 +68,45 @@ All new projects should use the CUDA redistributables available in [`cudaPackage
 ### Updating CUDA redistributables {#updating-cuda-redistributables}

 1. Go to NVIDIA's index of CUDA redistributables: <https://developer.download.nvidia.com/compute/cuda/redist/>
-2. Copy the `redistrib_*.json` corresponding to the release to `pkgs/development/compilers/cudatoolkit/redist/manifests`.
-3. Generate the `redistrib_features_*.json` file by running:
+2. Make a note of the new version of CUDA available.
+3. Run

-    ```bash
-    nix run github:ConnorBaker/cuda-redist-find-features -- <path to manifest>
-    ```
+   ```bash
+   nix run github:connorbaker/cuda-redist-find-features -- \
+      download-manifests \
+      --log-level DEBUG \
+      --version <newest CUDA version> \
+      https://developer.download.nvidia.com/compute/cuda/redist \
+      ./pkgs/development/cuda-modules/cuda/manifests
+   ```

-    That command will generate the `redistrib_features_*.json` file in the same directory as the manifest.
+   This will download a copy of the manifest for the new version of CUDA.
+4. Run

-4. Include the path to the new manifest in `pkgs/development/compilers/cudatoolkit/redist/extension.nix`.
+   ```bash
+   nix run github:connorbaker/cuda-redist-find-features -- \
+      process-manifests \
+      --log-level DEBUG \
+      --version <newest CUDA version> \
+      https://developer.download.nvidia.com/compute/cuda/redist \
+      ./pkgs/development/cuda-modules/cuda/manifests
+   ```
+
+   This will generate a `redistrib_features_<newest CUDA version>.json` file in the same directory as the manifest.
+5. Update the `cudaVersionMap` attribute set in `pkgs/development/cuda-modules/cuda/extension.nix`.
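As a quick sanity check after the two commands above, the following sketch lists versions that have a redistributable manifest but no feature manifest. It assumes the `redistrib_<version>.json` and `redistrib_features_<version>.json` naming described in these steps; `missing_feature_manifests` is a hypothetical helper name and is not part of `cuda-redist-find-features`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each version that has a redistributable manifest
# but no feature manifest in the given directory.
# Assumed naming: redistrib_<version>.json and redistrib_features_<version>.json.
missing_feature_manifests() {
  local dir="$1" manifest version
  for manifest in "$dir"/redistrib_*.json; do
    # Skip the unexpanded pattern when the glob matches nothing.
    [ -e "$manifest" ] || continue
    # The glob also matches the feature manifests themselves; skip them.
    case "$(basename "$manifest")" in
      redistrib_features_*) continue ;;
    esac
    # Strip the directory, the .json suffix, and the redistrib_ prefix.
    version="$(basename "$manifest" .json)"
    version="${version#redistrib_}"
    if [ ! -e "$dir/redistrib_features_${version}.json" ]; then
      echo "$version"
    fi
  done
}
```

For example, `missing_feature_manifests ./pkgs/development/cuda-modules/cuda/manifests` prints nothing when every manifest has a matching feature manifest.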
+
+### Updating cuTENSOR {#updating-cutensor}
+
+1. Repeat the steps in [Updating CUDA redistributables](#updating-cuda-redistributables) with the following changes:
+   - Use the index of cuTENSOR redistributables: <https://developer.download.nvidia.com/compute/cutensor/redist>
+   - Use the newest available version of cuTENSOR instead of the newest version of CUDA.
+   - Use `pkgs/development/cuda-modules/cutensor/manifests` instead of `pkgs/development/cuda-modules/cuda/manifests`.
+   - Skip the step of updating `cudaVersionMap` in `pkgs/development/cuda-modules/cuda/extension.nix`.
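The substitutions listed above can be sketched as a small dry-run helper that prints the two adapted commands without running them. The flags, URL, and manifest directory are taken from the steps in this section; `print_cutensor_update_commands` is a hypothetical name, and the sketch assumes both subcommands accept the cuTENSOR index the same way they accept the CUDA one.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the download-manifests and
# process-manifests commands adapted for cuTENSOR. Nothing is executed.
# Usage: print_cutensor_update_commands <newest cuTENSOR version>
print_cutensor_update_commands() {
  local version="$1" subcommand
  for subcommand in download-manifests process-manifests; do
    # In an unquoted heredoc, "\\" yields a literal backslash (line continuation
    # in the printed command), while $subcommand and $version are expanded.
    cat <<EOF
nix run github:connorbaker/cuda-redist-find-features -- \\
   $subcommand \\
   --log-level DEBUG \\
   --version $version \\
   https://developer.download.nvidia.com/compute/cutensor/redist \\
   ./pkgs/development/cuda-modules/cutensor/manifests
EOF
  done
}
```

Review the printed commands, then paste them into a shell at the root of the Nixpkgs checkout to run them.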
+
+### Updating supported compilers and GPUs {#updating-supported-compilers-and-gpus}
+
+1. Update `nvcc-compatibilities.nix` in `pkgs/development/cuda-modules/` to include the newest release of NVCC, as well as any newly supported host compilers.
+2. Update `gpus.nix` in `pkgs/development/cuda-modules/` to include any new GPUs supported by the new release of CUDA.

 ### Updating the CUDA Toolkit runfile installer {#updating-the-cuda-toolkit}

@@ -99,7 +128,7 @@ All new projects should use the CUDA redistributables available in [`cudaPackage
   nix store prefetch-file --hash-type sha256 <link>
   ```

-4. Update `pkgs/development/compilers/cudatoolkit/versions.toml` to include the release.
+4. Update `pkgs/development/cuda-modules/cudatoolkit/releases.nix` to include the release.

 ### Updating the CUDA package set {#updating-the-cuda-package-set}

@@ -107,7 +136,7 @@ All new projects should use the CUDA redistributables available in [`cudaPackage

   - NOTE: Changing the default CUDA package set should occur in a separate PR, allowing time for additional testing.

-2. Successfully build the closure of the new package set, updating `pkgs/development/compilers/cudatoolkit/redist/overrides.nix` as needed. Below are some common failures:
+2. Successfully build the closure of the new package set, updating `pkgs/development/cuda-modules/cuda/overrides.nix` as needed. Below are some common failures:

 | Unable to ... | During ... | Reason | Solution | Note |
 | --- | --- | --- | --- | --- |
--- a/pkgs/development/cuda-modules/README.md
+++ b/pkgs/development/cuda-modules/README.md
@@ -0,0 +1,32 @@
+# cuda-modules
+
+> [!NOTE]
+> This document is meant to help CUDA maintainers understand the structure of the CUDA packages in Nixpkgs. It is not meant to be a user-facing document.
+> For a user-facing document, see [the CUDA section of the manual](../../../doc/languages-frameworks/cuda.section.md).
+
+The files in this directory are added, directly or indirectly, to the `cudaPackages` package set by [cuda-packages.nix](../../top-level/cuda-packages.nix).
+
+## Top-level files
+
+Top-level Nix files are included when the `cudaPackages` scope is first created. They are typically required to build the finalized `cudaPackages` scope:
+
+- `backend-stdenv.nix`: Standard environment for CUDA packages.
+- `flags.nix`: Flags set by, or consumed by, NVCC in order to build packages.
+- `gpus.nix`: A list of supported NVIDIA GPUs.
+- `nvcc-compatibilities.nix`: NVCC releases and the version range of GCC/Clang they support.
+
+## Top-level directories
+
+- `cuda`: CUDA redistributables. Provides an extension to the `cudaPackages` scope.
+- `cudatoolkit`: Monolithic CUDA Toolkit runfile installer. Provides an extension to the `cudaPackages` scope.
+- `cudnn`: NVIDIA cuDNN library.
+- `cutensor`: NVIDIA cuTENSOR library.
+- `generic-builders`:
+  - Contains a builder `manifest.nix` which operates on the `Manifest` type defined in `modules/generic/manifests`. Most packages are built using this builder.
+  - Contains a builder `multiplex.nix` which leverages the Manifest builder. In short, the Multiplex builder adds multiple versions of a single package to a single instance of the `cudaPackages` package set. It is used primarily for packages like `cudnn` and `cutensor`.
+- `modules`: Nixpkgs modules to check the shape and content of CUDA redistributable and feature manifests. These modules additionally use shims provided by some CUDA packages to allow them to re-use the `genericManifestBuilder`, even if they don't have manifest files of their own. `cudnn` and `tensorrt` are examples of packages which provide such shims. These modules are further described in the [Modules](./modules/README.md) documentation.
+- `nccl`: NVIDIA NCCL library.
+- `nccl-tests`: NVIDIA NCCL tests.
+- `saxpy`: Example CMake project that uses CUDA.
+- `setup-hooks`: Nixpkgs setup hooks for CUDA.
+- `tensorrt`: NVIDIA TensorRT library.
--- a/pkgs/development/cuda-modules/modules/README.md
+++ b/pkgs/development/cuda-modules/modules/README.md
@@ -0,0 +1,27 @@
+# Modules
+
+The modules in `modules` exist primarily to check the shape and content of CUDA redistributable and feature manifests. They are ultimately meant to reduce the repetition involved in repackaging CUDA redistributables.
+
+Building most redistributables follows the same pattern: a manifest indicates which packages are available at a location, along with their versions and hashes. To avoid writing a builder for every derivation, the modules let us use a single `genericManifestBuilder` to build all redistributables.
+
+## `generic`
+
+The modules in `generic` are reusable components meant to check the shape and content of NVIDIA's CUDA redistributable manifests, our feature manifests (which are derived from NVIDIA's manifests), or hand-crafted Nix expressions describing available packages. They are used by the `genericManifestBuilder` to build CUDA redistributables.
+
+Generally, each package which relies on manifests or Nix release expressions will create an alias to the relevant generic module. For example, the [module for cuDNN](./cudnn/default.nix) aliases the generic module for release expressions, while the [module for CUDA redistributables](./cuda/default.nix) aliases the generic module for manifests.
+
+Alternatively, additional fields or values may need to be configured to account for the particulars of a package. For example, while the release expressions for [cuDNN](./cudnn/releases.nix) and [TensorRT](./tensorrt/releases.nix) are very similar, they differ slightly in the fields they have. The [module for cuDNN](./cudnn/default.nix) can use the generic module for release expressions as-is, while the [module for TensorRT](./tensorrt/default.nix) must add fields to the generic module.
+
+### `manifests`
+
+The modules in `generic/manifests` define the structure of NVIDIA's CUDA redistributable manifests and our feature manifests.
+
+NVIDIA's redistributable manifests are retrieved from their web server, while the feature manifests are produced by [`cuda-redist-find-features`](https://github.com/connorbaker/cuda-redist-find-features).
+
+### `releases`
+
+The modules in `generic/releases` define the structure of our hand-crafted Nix expressions containing the information necessary to download and repackage CUDA redistributables. These expressions are created when NVIDIA-provided manifests are unavailable or otherwise unusable. For example, though cuDNN has manifests, a bug in NVIDIA's CI/CD causes manifests for different versions of CUDA to use the same name, so the manifests overwrite each other.
+
+### `types`
+
+The modules in `generic/types` define reusable types used in both `generic/manifests` and `generic/releases`.