When deciding whether to touch a device we sometimes look at whether
the active connection is external/assumed. In many cases however,
there is no active connection around (e.g. while moving the device
from state unmanaged to disconnected before assuming).
So in most cases we instead look at the device-state-reason to decide
whether to touch the interface (see nm_device_state_reason_check()).
Often it's desirable to have no state and passing data as function
arguments. However, the state reason has to be passed along several hops
(e.g. a queued state change). Or a change to a master/slave can affect
the slave/master, where we pass on the state reason. Or an intermediate
event might invalidate a previous state reason. Passing the state
whether to touch a device or not as a state-reason is cumbersome
and limited.
Instead, the device should be aware of whats going on. Add a
sys-iface-state with:
- SYS_IFACE_STATE_EXTERNAL: meaning, NM should not touch it
- SYS_IFACE_STATE_ASSUME: meaning, NM is gracefully taking over
- SYS_IFACE_STATE_MANAGED: meaning, the device is managed by NM
- SYS_IFACE_STATE_REMOVED: the device no longer exists
This replaces most checks of nm_device_state_reason_check() and
nm_active_connection_get_activation_type() by instead looking at
the sys-iface-state of the device.
This patch probably has still issues, but the previous behavior was
not very clear either. We will need to identify those issues in future
tests and tweak the behavior. At least, now there is one flag that
describes how to behave.
Now we only search for a candiate with matching UUID. No need to
first lookup all activatable connections, just find the candidate
by UUID and see if it is activatable.
- rename find_ac_for_connection() to
active_connection_find_first_by_connection().
This function has the unexpected(?) behavior to either
search by the @connection using pointer identity, or
by looking up the UUID (depending on whether @connection
is a NMSettingsConnection).
- Split out a active_connection_find_first() part.
The name "find-first" makes it clear that we walk the list until
a matching active-connection is found. That's why I added the
max_state argument. Otherwise, it would have to be called
"find-first-non-disconnected".
- drop nm_manager_get_connection_device(). It only had two callers.
It also used the ambiguity of the @connection argument, but
only one of the two callers cared about that. Meaning,
_internal_activate_device() now explicitly does a lookup by
the NMSettingsConnection.
There is only one caller of assume_connection().
The tasks there are not clearly separate and it is clearer just to
have one large recheck_assume_connection() function which proceeds
step by step, instead of breaking it into separate parts and move
them apart in the source code. The latter implies that -- unless
we forward-declare assume_connection() -- the order of definition
with assume_connection,recheck_assume_connection is contrary the
flow of the code.
Having a separate function in this case is not a simplification
so merge it.
An EXTERNAL activation type comes with a nm-generated, volatile,
in-memory connection. When the user actively modifies that connection, we
shall mark the active connection as fully managed.
Before, we would have the concept of assumed connections, which is used
for (1) externally configured device that NetworkManager should not
touch and (2) connections that NetworkManager should gracefully take
over after a restart (seamlessly, non-destructively).
The behavior was unclear and mixed. It wasn't clear whether the device
is in no-touch mode (1) or gracefully take-over (2).
Previous commits already introduce separate activation types EXTERNAL (1)
and ASSUME (2).
Also, previously, we would for both (1) and (2) try to find a matching
connection and use it. That doesn't make sense for either.
In the external case (1), we should not pretend that an existing connection
is active. Let's always create a new in-memory connection for these
cases. Note that this means, external devices now will always generate
a connection, instead of pretending an existing one is active.
For the assume case (2), we shall not use nm_utils_match_connection() to
guess which connection might be active. It can only the one that was
active on a previous run of NetworkManager. So, use the information from
the state file and try to activate it. If that fails, it is not an
assume activation type. Note, that this means we now most of the time
don't do ASSUME anymore. Most of the time we do EXTERNAL activation
That is because the state information is only available after restart
of NetworkManager.
We need a distinction between external activations and assuming
connections. The former shall have the meaning of devices that are
*not* managed by NetworkManager, the latter are configurations that
are gracefully taken over after restart (but fully managed).
Express that in the activation-type of the active connection.
Also, no longer use the settings NM_SETTINGS_CONNECTION_FLAGS_VOLATILE
flag to determine whether an assumed connection is "external". These
concepts are entirely orthogonal (although in pratice, external
activations are in-memory and flagged as volatile, but the inverse
is not necessarily true).
Also change match_connection_filter() to consider all connections.
Later, we only call nm_utils_match_connection() for the connection
we want to assume -- which will be a regular settings connection,
not a generated one.
nm_device_uses_assumed_connection() basically called
nm_active_connection_get_assumed() on the device.
Rename those functions to be closer to the activation-type
flags.
The concepts of "assume", "external", and "assume_or_external"
will make sense with the following commits.
The concept of assumed-connection will change. Currently we mark
connections that are generated and assumed as "nm-generated-assumed".
That has several consequences, one of them being that such a settings
connection gets deleted when the device disconnects.
That is, such a settings connection lingers around as long as it's active,
but once it deactivates it gets automatically deleted. As such, it's
a more volatile concept of an in-memory connection.
The concept of such automatically cleaned up connections is useful beyond
generated-assumed. See the related bug rh#1401515.
Currently, we determine NMD_CONNECTION_PROPS_EXTERNAL based
on the settings connection. That is not optimal, because whether
a connection is assumed or externally managed, should be really a
property of the active-connection. So, in the this will change soon
and we would need yet another argument to nm_dispatcher_call().
Instead, drop the settings-connection and applied-connection
arguments and fetch them from the device as needed (but allow
to pass a specific act-request argument to explicitly state
which active connection to use).
Also, rename nm_dispatcher_call() to nm_dispatcher_call_device(),
it this is not a generic dispatcher call, but it is particularly
related to device events. Likewise, rename nm_dispatcher_call_sync()
to nm_dispatcher_call_device_sync().
Remove the redundant action argument. It is clear, that
nm_dispatcher_call_connectivity() is called with action
NM_DISPATCHER_ACTION_CONNECTIVITY_CHANGE.
On the other hand, add the async callbacks. Altough they are
not used at the moment, it seems more correct that an async
API has a callback and a call-id to cancel the invocation.
If the connection candidate contains a static route
without route-metric, but overrides the default
route-metric, matching would have failed:
ipv4.route-metric: 200
ipv4.routes: { ip = 192.168.5.0/32 }
Then, we must not compare existing routes using the device's
metric, but must use the overwrite from the connection.
When hostname changes, resolv.conf should be rewritten to update the
"search" option with the new domain parameters. If no device is
active nor going to activate, skip triggering resolv.conf update.
We use output redirection in numerous places; leaving the half-built
artifacts in place would cause the subsequent builds to succeed when it
should not.
The default route manager logs for each entry relevant information,
in a compact but cryptic way:
default-route: entry[0/dev:0x5633d5528560:enp0s25:1:+sync]: record:add 0.0.0.0/0 via 192.168.0.1 dev 2 metric 100 mss 0 rt-src user (100)
The flag whether a route is configured or not, was only expressed
via 0|1. Change that to log instead:
default-route: entry[0/dev:0x5633d5528560:enp0s25:+has:+sync]: record:add 0.0.0.0/0 via 192.168.0.1 dev 2 metric 100 mss 0 rt-src user (100)
Whenever we call update for a non-assumed, synced route, we must
force a resync with the platform. Even if according to our internal
book-keeping the route is already configured, the route may have
been removed externally. So we cannot assume that everything is
still up-to-date.
https://bugzilla.redhat.com/show_bug.cgi?id=1431268
Scenario:
Have a connection with DHCPv4 and a default-route. When externally
removing the default route (`ip route delete 0.0.0.0/0`) and issuing
`nmcli device reapply $IF`, the default route was not restored.
That was because when externally removing the default route,
we would remove the gateway from priv->con_ip4_config (see
update_ip4_config()). Later, when reapplying the connection,
the IP method doesn't actually change. So we would not restart
DHCP and thus there is no gateway around to add the default route.
The default route would only be restored after receiving a DHCP lease
in the far future.
Fix that, by always restarting the IP method.
Based on this flag, we decide that:
- if we are not gonna commit the first time and the
connection is configured with never-default, we
would not remove a default route added externally.
During a reapply, we however want to get rid of an externally
configured default route.
Commit 4a6fd0e83e ("device: honor the connection.autoconnect-retries
for 802.1X") added a reset of the autoconnect retries when the device
changes state, because the retry logic for 802.1x is implemented in
NMDeviceEthernet. For other connections, we should not reset the
retries as NMPolicy handles them.
Fixes: 4a6fd0e83e
Instead of throwing an assertion, fail DHCPv6 when a IPv6 link-local
address is not configured on the device. There are different reasons
why the assertion may fail: for example the address was removed
externally; or the device is gone (and thus the platform already
received the notification of addresses removal) but the device is still
connecting because its disposal happens in an idle callback.
None of these deserves an assertion, which should only be for
programming errors.
https://bugzilla.redhat.com/show_bug.cgi?id=1432251
The devices_inited_cb() callback is really supposed to only run
when there is nothing else left in the mainloop to dispatch.
But as we already schedule the idle action with G_PRIORITY_LOW+10
priority, it is very unlikely that there is anything else ready
to run (unless scheduled with an even lower priority, and then it
wouldn't help either because devices_inited_cb() would win again).
This doesn't really matter, because NMSetting's startup-complete never
switches back to FALSE once reached. Still, cleanup our signal handlers
when no longer needed.
And disconnect the signals before emitting "notify::startup".
Possibly pending messages from the netlink socket were not processed
since the platform instance was created earlier. As nm_manager_start()
may take a long time to run, make sure that there are no pending
messages before querying the devices.