doc: improve fetchers overview, deduplicate readme content, follow doc conventions (#297654)

* doc: improve fetchers overview, deduplicate readme content

* Improve caveat explanation and some fetchurl content

* move out consumer docs on source fetching

* move note on mirror URLs to the relevant section

this may be better suited for the `fetchurl` reference, but it's probably better to
just render that information into the manual. for now, because
- contributor documentation encourages mirrors
- we can expect contributors to dig into the source
- linking source files is trivial in in-code documentation
we leave it there.

* move instructions for updating hashes to the manual

* Add more clarity on text, reorganise source hash methods

---------

Co-authored-by: Valentin Gagarin <valentin.gagarin@tweag.io>
Co-authored-by: Dominic Mills-Howell <dominic.millz27@gmail.com>
Co-authored-by: lolbinarycat <dogedoge61+github@gmail.com>
This commit is contained in:
Daniel Sidhion 2024-04-02 23:18:11 -07:00 committed by GitHub
parent 625c7d5a45
commit 0decb324b3
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
3 changed files with 152 additions and 105 deletions

View File

@ -1,66 +1,161 @@
# Fetchers {#chap-pkgs-fetchers}
Building software with Nix often requires downloading source code and other files from the internet.
To this end, Nixpkgs provides *fetchers*: functions to obtain remote sources via various protocols and services.
To this end, we use functions that we call _fetchers_, which obtain remote sources via various protocols and services.
Nix provides built-in fetchers such as [`builtins.fetchTarball`](https://nixos.org/manual/nix/stable/language/builtins.html#builtins-fetchTarball).
Nixpkgs provides its own fetchers, which work differently:
Nixpkgs fetchers differ from built-in fetchers such as [`builtins.fetchTarball`](https://nixos.org/manual/nix/stable/language/builtins.html#builtins-fetchTarball):
- A built-in fetcher will download and cache files at evaluation time and produce a [store path](https://nixos.org/manual/nix/stable/glossary#gloss-store-path).
A Nixpkgs fetcher will create a ([fixed-output](https://nixos.org/manual/nix/stable/glossary#gloss-fixed-output-derivation)) [derivation](https://nixos.org/manual/nix/stable/language/derivations), and files are downloaded at build time.
A Nixpkgs fetcher will create a ([fixed-output](https://nixos.org/manual/nix/stable/glossary#gloss-fixed-output-derivation)) [derivation](https://nixos.org/manual/nix/stable/glossary#gloss-derivation), and files are downloaded at build time.
- Built-in fetchers will invalidate their cache after [`tarball-ttl`](https://nixos.org/manual/nix/stable/command-ref/conf-file#conf-tarball-ttl) expires, and will require network activity to check if the cache entry is up to date.
Nixpkgs fetchers only re-download if the specified hash changes or the store object is not otherwise available.
Nixpkgs fetchers only re-download if the specified hash changes or the store object is not available.
- Built-in fetchers do not use [substituters](https://nixos.org/manual/nix/stable/command-ref/conf-file#conf-substituters).
Derivations produced by Nixpkgs fetchers will use any configured binary cache transparently.
This significantly reduces the time needed to evaluate the entirety of Nixpkgs, and allows [Hydra](https://nixos.org/hydra) to retain and re-distribute sources used by Nixpkgs in the [public binary cache](https://cache.nixos.org).
For these reasons, built-in fetchers are not allowed in Nixpkgs source code.
This significantly reduces the time needed to evaluate Nixpkgs, and allows [Hydra](https://nixos.org/hydra) to retain and re-distribute sources used by Nixpkgs in the [public binary cache](https://cache.nixos.org).
For these reasons, Nix's built-in fetchers are not allowed in Nixpkgs.
The following table shows an overview of the differences:
The following table summarises the differences:
| Fetchers | Download | Output | Cache | Re-download when |
|-|-|-|-|-|
| `builtins.fetch*` | evaluation time | store path | `/nix/store`, `~/.cache/nix` | `tarball-ttl` expires, cache miss in `~/.cache/nix`, output store object not in local store |
| `pkgs.fetch*` | build time | derivation | `/nix/store`, substituters | output store object not available |
:::{.tip}
`pkgs.fetchFrom*` helpers retrieve _snapshots_ of version-controlled sources, as opposed to the entire version history, which is more efficient.
`pkgs.fetchgit` by default also has the same behaviour, but can be changed through specific attributes given to it.
:::
## Caveats {#chap-pkgs-fetchers-caveats}
The fact that the hash belongs to the Nix derivation output and not the file itself can lead to confusion.
For example, consider the following fetcher:
Because Nixpkgs fetchers are fixed-output derivations, an [output hash](https://nixos.org/manual/nix/stable/language/advanced-attributes#adv-attr-outputHash) has to be specified, usually indirectly through a `hash` attribute.
This hash refers to the derivation output, which can be different from the remote source itself!
```nix
fetchurl {
url = "http://www.example.org/hello-1.0.tar.gz";
hash = "sha256-lTeyxzJNQeMdu1IVdovNMtgn77jRIhSybLdMbTkf2Ww=";
}
```
This has the following implications that you should be aware of:
A common mistake is to update a fetchers URL, or a version parameter, without updating the hash.
- Use Nix (or Nix-aware) tooling to produce the output hash.
```nix
fetchurl {
url = "http://www.example.org/hello-1.1.tar.gz";
hash = "sha256-lTeyxzJNQeMdu1IVdovNMtgn77jRIhSybLdMbTkf2Ww=";
}
```
- When changing any fetcher parameters, always update the output hash.
Use one of the methods from [](#sec-pkgs-fetchers-updating-source-hashes).
Otherwise, existing store objects that match the output hash will be re-used rather than fetching new content.
**This will reuse the old contents**.
Remember to invalidate the hash argument, in this case by setting the `hash` attribute to an empty string.
:::{.note}
A similar problem arises while testing changes to a fetcher's implementation.
If the output of the derivation already exists in the Nix store, test failures can go undetected.
The [`invalidateFetcherByDrvHash`](#tester-invalidateFetcherByDrvHash) function helps prevent reusing cached derivations.
:::
```nix
fetchurl {
url = "http://www.example.org/hello-1.1.tar.gz";
hash = "";
}
```
## Updating source hashes {#sec-pkgs-fetchers-updating-source-hashes}
Use the resulting error message to determine the correct hash.
There are several ways to obtain the hash corresponding to a remote source.
Unless you understand how the fetcher you're using calculates the hash from the downloaded contents, you should use [the fake hash method](#sec-pkgs-fetchers-updating-source-hashes-fakehash-method).
```
error: hash mismatch in fixed-output derivation '/path/to/my.drv':
specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
got: sha256-lTeyxzJNQeMdu1IVdovNMtgn77jRIhSybLdMbTkf2Ww=
```
1. []{#sec-pkgs-fetchers-updating-source-hashes-fakehash-method} The fake hash method: In your package recipe, set the hash to one of
A similar problem arises while testing changes to a fetcher's implementation. If the output of the derivation already exists in the Nix store, test failures can go undetected. The [`invalidateFetcherByDrvHash`](#tester-invalidateFetcherByDrvHash) function helps prevent reusing cached derivations.
- `""`
- `lib.fakeHash`
- `lib.fakeSha256`
- `lib.fakeSha512`
Attempt to build, extract the calculated hashes from error messages, and put them into the recipe.
:::{.warning}
You must use one of these four fake hashes and not some arbitrarily-chosen hash.
See [](#sec-pkgs-fetchers-secure-hashes) for details.
:::
:::{.example #ex-fetchers-update-fod-hash}
# Update source hash with the fake hash method
Consider the following recipe that produces a plain file:
```nix
{ fetchurl }:
fetchurl {
url = "https://raw.githubusercontent.com/NixOS/nixpkgs/23.05/.version";
hash = "sha256-ZHl1emidXVojm83LCVrwULpwIzKE/mYwfztVkvpruOM=";
}
```
A common mistake is to update a fetcher parameter, such as `url`, without updating the hash:
```nix
{ fetchurl }:
fetchurl {
url = "https://raw.githubusercontent.com/NixOS/nixpkgs/23.11/.version";
hash = "sha256-ZHl1emidXVojm83LCVrwULpwIzKE/mYwfztVkvpruOM=";
}
```
**This will produce the same output as before!**
Set the hash to an empty string:
```nix
{ fetchurl }:
fetchurl {
url = "https://raw.githubusercontent.com/NixOS/nixpkgs/23.11/.version";
hash = "";
}
```
When building the package, use the error message to determine the correct hash:
```shell
$ nix-build
(some output removed for clarity)
error: hash mismatch in fixed-output derivation '/nix/store/7yynn53jpc93l76z9zdjj4xdxgynawcw-version.drv':
specified: sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
got: sha256-BZqI7r0MNP29yGH5+yW2tjU9OOpOCEvwWKrWCv5CQ0I=
error: build of '/nix/store/bqdjcw5ij5ymfbm41dq230chk9hdhqff-version.drv' failed
```
:::
2. Prefetch the source with [`nix-prefetch-<type> <URL>`](https://search.nixos.org/packages?buckets={%22package_attr_set%22%3A[%22No%20package%20set%22]%2C%22package_license_set%22%3A[]%2C%22package_maintainers_set%22%3A[]%2C%22package_platforms%22%3A[]}&query=nix-prefetch), where `<type>` is one of
- `url`
- `git`
- `hg`
- `cvs`
- `bzr`
- `svn`
The hash is printed to stdout.
3. Prefetch by package source (with `nix-prefetch-url '<nixpkgs>' -A <package>.src`, where `<package>` is package attribute name).
The hash is printed to stdout.
This works well when you've upgraded the existing package version and want to find out new hash, but is useless if the package can't be accessed by attribute or the package has multiple sources (`.srcs`, architecture-dependent sources, etc).
4. Upstream hash: use it when upstream provides `sha256` or `sha512`.
Don't use it when upstream provides `md5`, compute `sha256` instead.
A little nuance is that `nix-prefetch-*` tools produce hashes with the `nix32` encoding (a Nix-specific base32 adaptation), but upstream usually provides hexadecimal (`base16`) encoding.
Fetchers understand both formats.
Nixpkgs does not standardise on any one format.
You can convert between hash formats with [`nix-hash`](https://nixos.org/manual/nix/stable/command-ref/nix-hash).
5. Extract the hash from a local source archive with `sha256sum`.
Use `nix-prefetch-url file:///path/to/archive` if you want the custom Nix `base32` hash.
## Obtaining hashes securely {#sec-pkgs-fetchers-secure-hashes}
It's always a good idea to avoid Man-in-the-Middle (MITM) attacks when downloading source contents.
Otherwise, you could unknowingly download malware instead of the intended source, and instead of the actual source hash, you'll end up using the hash of malware.
Here are security considerations for this scenario:
- `http://` URLs are not secure to prefetch hashes.
- Upstream hashes should be obtained via a secure protocol.
- `https://` URLs give you more protections when using `nix-prefetch-*` or for upstream hashes.
- `https://` URLs are secure when using the [fake hash method](#sec-pkgs-fetchers-updating-source-hashes-fakehash-method) *only if* you use one of the listed fake hashes.
If you use any other hash, the download will be exposed to MITM attacks even if you use HTTPS URLs.
In more concrete terms, if you use any other hash, the [`--insecure` flag](https://curl.se/docs/manpage.html#-k) will be passed to the underlying call to `curl` when downloading content.
## `fetchurl` and `fetchzip` {#fetchurl}

View File

@ -320,5 +320,7 @@
"login.defs(5)": "https://man.archlinux.org/man/login.defs.5",
"unshare(1)": "https://man.archlinux.org/man/unshare.1.en",
"nix-shell(1)": "https://nixos.org/manual/nix/stable/command-ref/nix-shell.html",
"mksquashfs(1)": "https://man.archlinux.org/man/extra/squashfs-tools/mksquashfs.1.en"
"mksquashfs(1)": "https://man.archlinux.org/man/extra/squashfs-tools/mksquashfs.1.en",
"curl(1)": "https://curl.se/docs/manpage.html",
"netrc(5)": "https://man.cx/netrc"
}

View File

@ -94,11 +94,7 @@ Now that this is out of the way. To add a package to Nixpkgs:
- All other [`meta`](https://nixos.org/manual/nixpkgs/stable/#chap-meta) attributes are optional, but its still a good idea to provide at least the `description`, `homepage` and [`license`](https://nixos.org/manual/nixpkgs/stable/#sec-meta-license).
- You can use `nix-prefetch-url url` to get the SHA-256 hash of source distributions. There are similar commands as `nix-prefetch-git` and `nix-prefetch-hg` available in `nix-prefetch-scripts` package.
- A list of schemes for `mirror://` URLs can be found in [`pkgs/build-support/fetchurl/mirrors.nix`](build-support/fetchurl/mirrors.nix).
The exact syntax and semantics of the Nix expression language, including the built-in function, are [described in the Nix manual](https://nixos.org/manual/nix/stable/language/).
- The exact syntax and semantics of the Nix expression language, including the built-in functions, are [Nix language reference](https://nixos.org/manual/nix/stable/language/).
5. To test whether the package builds, run the following command from the root of the nixpkgs source tree:
@ -397,7 +393,7 @@ All versions of a package _must_ be included in `all-packages.nix` to make sure
See the Nixpkgs manual for more details on [standard meta-attributes](https://nixos.org/nixpkgs/manual/#sec-standard-meta-attributes).
### Import From Derivation
## Import From Derivation
[Import From Derivation](https://nixos.org/manual/nix/unstable/language/import-from-derivation) (IFD) is disallowed in Nixpkgs for performance reasons:
[Hydra](https://github.com/NixOS/hydra) evaluates the entire package set, and sequential builds during evaluation would increase evaluation times to become impractical.
@ -406,13 +402,16 @@ Import From Derivation can be worked around in some cases by committing generate
## Sources
### Fetching Sources
Always fetch source files using [Nixpkgs fetchers](https://nixos.org/manual/nixpkgs/unstable/#chap-pkgs-fetchers).
Use reproducible sources with a high degree of availability.
Prefer protocols that support proxies.
There are multiple ways to fetch a package source in nixpkgs. The general guideline is that you should package reproducible sources with a high degree of availability. Right now there is only one fetcher which has mirroring support and that is `fetchurl`. Note that you should also prefer protocols which have a corresponding proxy environment variable.
A list of schemes for `mirror://` URLs can be found in [`pkgs/build-support/fetchurl/mirrors.nix`](build-support/fetchurl/mirrors.nix), and is supported by [`fetchurl`](https://nixos.org/manual/nixpkgs/unstable/#fetchurl).
Other fetchers which end up relying on `fetchurl` may also support mirroring.
You can find many source fetch helpers in `pkgs/build-support/fetch*`.
The preferred source hash type is `sha256`.
In the file `pkgs/top-level/all-packages.nix` you can find fetch helpers, these have names on the form `fetchFrom*`. The intention of these are to provide snapshot fetches but using the same api as some of the version controlled fetchers from `pkgs/build-support/`. As an example going from bad to good:
Examples going from bad to best practices:
- Bad: Uses `git://` which won't be proxied.
@ -438,7 +437,7 @@ In the file `pkgs/top-level/all-packages.nix` you can find fetch helpers, these
}
```
- Best: Fetches a snapshot archive and you get the rev you want.
- Best: Fetches a snapshot archive for the given revision.
```nix
{
@ -451,63 +450,14 @@ In the file `pkgs/top-level/all-packages.nix` you can find fetch helpers, these
}
```
When fetching from GitHub, commits must always be referenced by their full commit hash. This is because GitHub shares commit hashes among all forks and returns `404 Not Found` when a short commit hash is ambiguous. It already happens for some short, 6-character commit hashes in `nixpkgs`.
It is a practical vector for a denial-of-service attack by pushing large amounts of auto generated commits into forks and was already [demonstrated against GitHub Actions Beta](https://blog.teddykatz.com/2019/11/12/github-actions-dos.html).
> [!Note]
> When fetching from GitHub, always reference revisions by their full commit hash.
> GitHub shares commit hashes among all forks and returns `404 Not Found` when a short commit hash is ambiguous.
> It already happened in Nixpkgs for short, 6-character commit hashes.
>
> Pushing large amounts of auto generated commits into forks is a practical vector for a denial-of-service attack, and was already [demonstrated against GitHub Actions Beta](https://blog.teddykatz.com/2019/11/12/github-actions-dos.html).
Find the value to put as `hash` by running `nix-shell -p nix-prefetch-github --run "nix-prefetch-github --rev 1f795f9f44607cc5bec70d1300150bfefcef2aae NixOS nix"`.
#### Obtaining source hash
Preferred source hash type is sha256. There are several ways to get it.
1. Prefetch URL (with `nix-prefetch-XXX URL`, where `XXX` is one of `url`, `git`, `hg`, `cvs`, `bzr`, `svn`). Hash is printed to stdout.
2. Prefetch by package source (with `nix-prefetch-url '<nixpkgs>' -A PACKAGE.src`, where `PACKAGE` is package attribute name). Hash is printed to stdout.
This works well when you've upgraded existing package version and want to find out new hash, but is useless if package can't be accessed by attribute or package has multiple sources (`.srcs`, architecture-dependent sources, etc).
3. Upstream provided hash: use it when upstream provides `sha256` or `sha512` (when upstream provides `md5`, don't use it, compute `sha256` instead).
A little nuance is that `nix-prefetch-*` tools produce hash encoded with `base32`, but upstream usually provides hexadecimal (`base16`) encoding. Fetchers understand both formats. Nixpkgs does not standardize on any one format.
You can convert between formats with nix-hash, for example:
```ShellSession
$ nix-hash --type sha256 --to-base32 HASH
```
4. Extracting hash from local source tarball can be done with `sha256sum`. Use `nix-prefetch-url file:///path/to/tarball` if you want base32 hash.
5. Fake hash: set the hash to one of
- `""`
- `lib.fakeHash`
- `lib.fakeSha256`
- `lib.fakeSha512`
in the package expression, attempt build and extract correct hash from error messages.
> [!Warning]
> You must use one of these four fake hashes and not some arbitrarily-chosen hash.
> See [here][secure-hashes]
This is last resort method when reconstructing source URL is non-trivial and `nix-prefetch-url -A` isnt applicable (for example, [one of `kodi` dependencies](https://github.com/NixOS/nixpkgs/blob/d2ab091dd308b99e4912b805a5eb088dd536adb9/pkgs/applications/video/kodi/default.nix#L73)). The easiest way then would be replace hash with a fake one and rebuild. Nix build will fail and error message will contain desired hash.
#### Obtaining hashes securely
[secure-hashes]: #obtaining-hashes-securely
Let's say Man-in-the-Middle (MITM) sits close to your network. Then instead of fetching source you can fetch malware, and instead of source hash you get hash of malware. Here are security considerations for this scenario:
- `http://` URLs are not secure to prefetch hash from;
- hashes from upstream (in method 3) should be obtained via secure protocol;
- `https://` URLs are secure in methods 1, 2, 3;
- `https://` URLs are secure in method 5 *only if* you use one of the listed fake hashes. If you use any other hash, `fetchurl` will pass `--insecure` to `curl` and may then degrade to HTTP in case of TLS certificate expiration.
### Patches
## Patches
Patches available online should be retrieved using `fetchpatch`.