Currently any error encountered in n_dhcp4_c_connection_dispatch_io()
causes a dispatch failure and interrupts the library state
machine. The recvmsg() on the socket can fail for different reasons;
one of these is for example that the UDP request previously sent got a
ICMP port-unreachable response. This can be reproduced in the
following way:
ip netns add ns1
ip link add veth0 type veth peer name veth1
ip link set veth1 netns ns1
ip link set veth0 up
cat > dhcpd.conf <<EOF
server-identifier 172.25.0.1;
max-lease-time 120;
default-lease-time 120;
subnet 172.25.0.0 netmask 255.255.255.0 {
range 172.25.0.100 172.25.0.200;
}
EOF
ip -n ns1 link set veth1 up
ip -n ns1 address add dev veth1 172.25.0.1/24
ip netns exec ns1 iptables -A INPUT -p udp --dport 67 -j REJECT
ip netns exec ns1 dhcpd -4 -cf dhcpd.conf -pf /tmp/dhcp-server.pid
If a client is started on veth0, it is able to obtain a lease despite
the firewall rule blocking DHCP, because dhcpd uses a packet
socket. Then it fails during the renewal because the recvmsg() fails:
dhcp4 (veth0): send REQUEST of 172.25.0.178 to 172.25.0.1
dhcp4 (veth0): error -111 dispatching events
dhcp4 (veth0): state changed bound -> fail
The client should consider such errors non fatal and keep running.
https://bugzilla.redhat.com/show_bug.cgi?id=1829178https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/486
(cherry picked from commit c5d1d4c498)
Currently if an error is encountered during a send() of a message, the
client fails and there is no possibility of recover, since no timers
are armed after a failed event dispatch. An easy way to reproduce a
failure is to add a firewall rule like:
iptables -A OUTPUT -p udp --dport 67 -j REJECT
which makes the send() fail with EPERM during the renew. In such case,
the client should continue (failing) until it reaches the rebind phase
at T2, when it will be able to renew the lease using the packet
socket.
In general, a failure to send a packet should not cause the failure of
the client.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/419https://bugzilla.redhat.com/show_bug.cgi?id=1806516
Properly initialize 'overload' when the space in the file section
ends.
shared/n-dhcp4/src/n-dhcp4-outgoing.c: In function ‘n_dhcp4_outgoing_append’:
shared/n-dhcp4/src/n-dhcp4-outgoing.c:198:17: error: ‘overload’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
Move back to INIT state after the lease expires, as per section 4.4.5
of RFC 2131. Previously the client just moved to EXPIRED, closed the
connection and cleared the probe, leaving to the caller of the library
the choice to create a new client instance and to start from
scratch. However, it seems more useful that the client, once
initialized, always tries to get a lease even after an expiration.
If the server sends a packet with multiple instances of the same
option, they are concatenated during n_dhcp4_incoming_linearize() and
evaluated as a single option as per section 7 of RFC 3396.
However, there are broken server implementations that send
self-contained options in multiple copies. They are reassembled to
form a single instance by the nettools client, which then fails to
parse them because they have a length greater than the expected one.
This problem can be reproduced by starting a server with:
dnsmasq --bind-interfaces --interface veth1 -d
--dhcp-range=172.25.1.100,172.25.1.200,1m
--dhcp-option=54,172.25.1.1
In this way dnsmasq sends a duplicate option 54 (server-id) when the
client requests it in the 'parameter request list' option, as
dhcp=systemd and dhcp=nettools currently do.
While this is a violation of the RFC by the server, both isc-dhcp and
systemd-networkd client implementations have mechanisms to deal with
this situation. dhclient simply takes the first bytes of the
aggregated option. systemd-networkd doesn't follow RFC 3396 and
doesn't aggregate multiple options; it considers only the last
occurrence of each option.
Change the parsing code to accept options that are longer than
necessary.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/324
The latter requires __auto_type which is not available in GCC versions
older than 4.9. Fix the following compile error on RHEL 7.8:
CC src/src_libNetworkManagerBase_la-NetworkManagerUtils.lo
shared/n-dhcp4/src/n-dhcp4-c-probe.c: In function 'n_dhcp4_client_probe_transition_nak':
shared/n-dhcp4/src/n-dhcp4-c-probe.c:1008:17: error: unknown type name '__auto_type'
probe->ns_nak_restart_delay = c_clamp(probe->ns_nak_restart_delay * 2,
^
shared/n-dhcp4/src/n-dhcp4-c-probe.c:1008:17: error: unknown type name '__auto_type'
shared/n-dhcp4/src/n-dhcp4-c-probe.c:1008:17: error: unknown type name '__auto_type'
Fixes: 218782a9a3 ('n-dhcp4: restart the transaction after a NAK')
It is not enough to set the INIT state after a NAK; a timeout
(ns_deferred) must be set so that it is added to the event fd. The
client retries immediately the first time, so that in the successful
case it gets an address quickly. To avoid flooding the network in case
of servers always replying with NAKs, next attempts are done with
intervals from 2 seconds to 5 minutes using exponential backoff. See
also systemd commit [1].
[1] 1d1a3e0afbhttps://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/325
When the client enters the INIT state, it calls listen() on the
connection connection to create the packet socket. However, if the
client is coming from the REBOOTING state after a NAK, the connection
is already in the listening state; do nothing in such case.
Instead of terminating the program when the dispatch function returns
an invalid return code, log an error message and convert the error
code to a valid, generic one.
https://bugs.archlinux.org/task/64880
After t1, the client tries to renew the lease by contacting via the
udp socket the server specified in the server-id option. If this
fails, after t2 it tries to contact any server using broadcast. For
this to work, the packet socket must be used.
Currently the client always starts from the INIT state (i.e. sending a
discover message). If a requested-ip was specified by the caller, it
is added as an option in the discover.
It was reported that some DHCP servers don't respond to discover
messages with the requested-ip option set [1][2].
The RFC allows to skip the discover by entering the INIT-REBOOT state
and starting directly with a broadcast request message containing the
requested IP address. Implement that.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1781856
[2] https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/310
RHEL7 supports clock_gettime(CLOCK_BOOTIME), but it does not support
timerfd_create(CLOCK_BOOTIME). Creating a timerfd will fail with EINVAL.
Fallback to CLOCK_MONOTONIC.
Compare this to n-acd which also has compatibility code to fallback to
CLOCK_MONOTONIC. However when n-acd falls back to CLOCK_MONOTONIC, it uses
monotonic clock also for clock_gettime().
For n-dhcp4, the timestamps are also exposed in the public API
(n_dhcp4_client_lease_get_lifetime()). Hence, for timestamps n-dhcp4
still uses and requires clock_gettime(CLOCK_BOOTIME). Only the internal
timeout handling with the timerfd falls back to CLOCK_MONOTONIC.
https://github.com/nettools/n-dhcp4/pull/13https://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/362
(cherry picked from commit a1771c738d)
Otherwise, other applications cannot bind to port 0.0.0.0:68 at the same time.
This is for example what dhclient wants to do. So even when running
dhclient on another, unrelated interface, it would fail to bind the UDP
socket and quit.
Note that also systemd-networkd's DHCPv4 client sets this socket option.
Presumably for the same reasons.
Signed-off-by: Thomas Haller <thaller@redhat.com>
https://github.com/nettools/n-dhcp4/pull/12
(cherry picked from commit 53b74bc614)
In some cases it is useful to have the library log what it is doing
for debugging purposes; add a simple API that allows setting a
syslog-style logging level and specifying a logging function.
https://github.com/nettools/n-dhcp4/pull/8
Currently in any of the BOUND, RENEWING and REBINDING states the probe
checks the expiration of T1, T2 and lifetime. This is not correct
because, for example, if the timer fires in the RENEWING state, the
probe must not transition to RENEWING again (i.e. check again that
now >= T1). Note that there is no guarantee that the timer triggers
exactly once for T1, T2 and lifetime expirations because the timer is
also used for the retransmission logic in NDhcp4CConnection.
Therefore, add some checks to ensure that only correct transitions are
allowed.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/merge_requests/341https://bugzilla.redhat.com/show_bug.cgi?id=1773456
The API already had n_dhcp4_client_lease_get_lifetime(), which is the CLOCK_BOOTTIME
when the lease expires (or ((uint64_t)-1)). But it might be interesting to
know the actual lease duration and when the lease was received (and the
time started to count).
Expose an API for that. With this, one can also calculate the original, exact lease
lifetime, by subtracting n_dhcp4_client_lease_get_basetime() from n_dhcp4_client_lease_get_lifetime(),
while taking care of ((uint64_t)-1).
First of all, from the naming of n_dhcp4_incoming_query_u32() it is
confusing to coerce 0xFFFFFFFF to zero. It should just return the
plain value.
Also note that n_dhcp4_incoming_query_u32() only has three callers:
n_dhcp4_incoming_query_lifetime(), n_dhcp4_incoming_query_t1() and
n_dhcp4_incoming_query_t2().
Looking further, those three functions only have one caller:
n_dhcp4_incoming_get_timeouts(). Note how the code there already tries
to handle UINT32_MAX and interprets it as infinity (UINT64_MAX).
But as it was, UINT32_MAX never actually was returned.
It seems that RFC [1] does not specially define the meanings of
0xFFFFFFFF and 0. It sounds reasonable to assume that 0 just means
0 lifetime, and 0xFFFFFFFF means infinity. On the other hand, compare
this to systemd's code [2], which coerces 0 to 1. This does not seem
right to me though. Note how systemd returns 0xFFFFFFFF as-is.
Drop the special handling of 0xFFFFFFFF from n_dhcp4_incoming_query_u32().
It now just returns the plain value and it's up to n_dhcp4_incoming_get_timeouts()
to make sense of that. This will fix behavior, so that 0xFFFFFFFF will be
reported as infinity, and not as zero.
[1] https://tools.ietf.org/html/rfc2132#section-9.2
[2] 68c2b5ddb1/src/libsystemd-network/sd-dhcp-lease.c (L553)
It is error prone when a function consumes an input only in certain
cases (and telling the caller via the return code). At least in these
cases, the message is never used afterwards, and we can always pass
it on.
When we build n-dhcp4 for NetworkManager we get a compiler warning.
This can also be reproduced by building n-dhcp4 alone:
$ CFLAGS='-Werror=declaration-after-statement' meson build && ninja -C build
...
[36/47] Compiling C object 'src/25a6634@@ndhcp4-private@sta/n-dhcp4-outgoing.c.o'.
FAILED: src/25a6634@@ndhcp4-private@sta/n-dhcp4-outgoing.c.o
ccache cc -Isrc/25a6634@@ndhcp4-private@sta -Isrc -I../src -Isubprojects/c-list/src -I../subprojects/c-list/src -Isubprojects/c-siphash/src -I../subprojects/c-siphash/src -Isubprojects/c-stdaux/src -I../subprojects/c-stdaux/src -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c11 -g -D_GNU_SOURCE -Werror=declaration-after-statement -fPIC -fvisibility=hidden -fno-common -MD -MQ 'src/25a6634@@ndhcp4-private@sta/n-dhcp4-outgoing.c.o' -MF 'src/25a6634@@ndhcp4-private@sta/n-dhcp4-outgoing.c.o.d' -o 'src/25a6634@@ndhcp4-private@sta/n-dhcp4-outgoing.c.o' -c ../src/n-dhcp4-outgoing.c
../src/n-dhcp4-outgoing.c: In function ‘n_dhcp4_outgoing_new’:
../src/n-dhcp4-outgoing.c:63:9: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
63 | static_assert(N_DHCP4_NETWORK_IP_MINIMUM_MAX_SIZE >= N_DHCP4_OUTGOING_MAX_PHDR +
| ^~~~~~~~~~~~~
(cherry picked from commit 9e7ca3e091)
In particular, avoid including linux/netdevice.h from headers. This is
not a problem on newer distros, but required for CentOS 7.6.
Signed-off-by: Tom Gundersen <teg@jklm.no>