Since commit ed640f857a ("manager: ignore unmanaged devices when
looking for parent by UUID"), unmanaged devices are ignored when
looking for potential parent connection matches. Therefore, a software
device can fail autoactivation because the parent is not managed yet
and NM never tries to reactivate it. Ensure that we retry other
devices when a parent device becomes managed.
Fixes: ed640f857ahttps://bugzilla.redhat.com/show_bug.cgi?id=1553595
These fields have the same purpose for IPv4 and IPv6. Also, they have an alias
with name _x, that can be indexed by an IS_IPv4 1/0 value.
Rename the fields so that the distinguisher 4/6/x is at the end. The point
is to make the name more similar.
Functions like these are conceptually very similar. Commonly,
what we want to do for one address family we also want to do
for the other.
Merge the two functions. This moves the similar parts closer
to each other and stand beside it. This is only the first part
of the merge, which is pretty trivial without larger changes
(to keep the diff simple). More next.
Now that there is no difference between initial capturing of
the configuration, and a later update_ip_config() call during
a signal from platform, we only need to make sure that the
IP config instances are initialized at least once.
In case we are called multiple times, there is nothing to do.
This was called by via
...
- manager:recheck_assume_connection()
- manager:get_existing_connection()
- nm_device_capture_initial_config()
- update_ext_ip_config(initial=TRUE)
and would parse resolv.conf, and try to fill the device IP config
with nameservers and dns-options.
But why? It would only have effect if NM was started with
nm_dns_manager_get_resolv_conf_explicit(), but is that really sensible?
And it would only take effect on devices that have a default route.
And for what is this information even used?
Let's not do it this way. If we need this information for assuming or
external sys-iface mode, then it should be explicitly loaded at the
appropriate moment.
For now, drop it and see what breaks. Then we can fix it properly. If
it even matters.
Drop capture_lease_config(). It was added by commit
0321073b3c.
Note that it was only called by
...
- manager:recheck_assume_connection()
- manager:get_existing_connection()
- nm_device_capture_initial_config()
- update_ext_ip_config(addr_family=AF_INET, initial=TRUE)
- capture_lease_config()
It had only effect when the device had IPv4 permanent addresses.
But then, capture_lease_config() would go on and iterate over
all connections (sorted by last-connect timestamp). It would
consider connection candidates that are compatible with the device,
and try to read the lease information from disk
It's really unclear what this means. For assuming (graceful take over),
do we need the lease information in the device? I don't think so,
because we will match an existing connection. The lease information
shall be read while activating (if necessary).
For external connections, we don't even have a matching connection
and we always generate a new one. It doesn't seem right to consider
leases from disk, for a different connection.
Just drop this. It's really ugly. If this causes an issue, it must be
fixed differently. We want to behave determinstically and well defined.
I don't even comprehend all the implications of what this had.
Also note that update_ext_ip_config() no longer clears
priv->dev_ip4_config.
A failure to configure an address family does not mean that the connection
is going to fail. It depends, for example on ipvx.may-fail.
Always export the NMIPxConfig instance in that case.
linklocal6_complete() had only one caller. The caller would check
whether the conditions for linklocal6_complete() are satisfied, and
then call it. Note that linklocal6_complete() would again assert
that these conditions hold. Don't do this. Just move the check
inside linklocal6_complete(), and rename to linklocal6_check_complete().
Also, linklocal6_complete() was called by update_ip_config(),
which was called by nm_device_capture_initial_config() and
queued_ip6_config_change().
It doesn't make sense to call linklocal6_complete() during
nm_device_capture_initial_config(). Move the call to
queued_ip6_config_change().
Likewise, in ndisc_ra_timeout() we also want to include tentative
addresses. Looking into priv->ip6_config to determine whether
an other IP configuration is active, is anyway odd, and likely
a bug.
Instead have one particular nm_ip6_config_get_address_first_nontentative() function,
make it more extendable. Now, we pass a match-type argument, which can control which
element to search.
This patch has no change in behavior, but it already makes clear, that
nm_ip6_config_get_address_first_nontentative() was buggy, because it would
also return addresses that failed DAD.
Don't do "if (var == FALSE)" for boolean variables.
Also, make booleans in NMDevicePrivate structure bitfields
and reorder the fields beside other bitfields. This allows
a tighter packing of the structure.
addrconf6_start_with_link_ready() cannot fail. Hence, don' return a boolean
success value.
linklocal6_start() can only either POSTPONE or succeed right away. Don't return
a NMActStageReturn value, TRUE/FALSE is enough.
This simplifies the callers that don't have to check for values that never
come.
connections_cached_list stays only valid until we remove/add connections
to NMSettings. Using the list without cloning requires to be aware of that.
When clearing the list, invalidate all pointers, in the hope that a following
use-after-free will blow up with an assertion.
We only do this in elevated assertion mode. It's not to prevent any bugs,
it's to better notice it.
NMSettings exposes a cached list of all connection. We don't need
to clone it. Note that this is not save against concurrent modification,
meaning, add/remove of connections in NMSettings will invalidate the
list.
However, it wasn't save against that previously either, because
altough we cloned the container (GSList), we didn't take an additional
reference to the elements.
This is purely a performance optimization, we don't need to clone the
list. Also, since the original list is of type "NMConnection *const*",
use that type insistently, instead of dependent API requiring GSList.
IMO, GSList is anyway not a very nice API for many use cases because
it requires an additional slice allocation for each element. It's
slower, and often less convenient to use.
We create the all_connections list in impl_manager_add_and_activate_connection().
With this list we either call nm_utils_complete_generic() directly,
or pass it to nm_device_complete_connection(). The latter also ends
up only calling nm_utils_complete_generic() with this argument.
nm_utils_complete_generic() doesn't care about the order in which
the connections are returned. Which also becomes apparent, because
we first sorted the list, and then reverted the order with
g_slist_prepend().
Do not sort the list, hence we also don't need to clone it.
nm_modem_complete_connection() cannot just return FALSE in case
the modem doesn't overwrite complete_connection(). It must set
the error variable.
This leads to a crash when calling AddAndActivate for Ofono type
modem. It does not affect the ModemManager implementation
NMModemBroadband, because that one implements the method.
Previously, NM_PTRARRAY_LEN() would not work if the pointer type is
an opaque type, which is common. For example:
NMConnection *const*connections = ...;
Kernel recently got support for exposing TUN/TAP information on netlink
[1], [2], [3]. Add support for it to the platform cache.
The advantage of using netlink is that querying sysctl bypasses the
order of events of the netlink socket. It is out of sync and racy. For
example, platform cache might still think that a tun device exists, but
a subsequent lookup at sysfs might fail because the device was deleted
in the meantime. Another point is, that we don't get change
notifications via sysctl and that it requires various extra syscalls
to read the device information. If the tun information is present on
netlink, put it into the cache. This bypasses checking sysctl while
we keep looking at sysctl for backward compatibility until we require
support from kernel.
Notes:
- we had two link types NM_LINK_TYPE_TAP and NM_LINK_TYPE_TUN. This
deviates from the model of how kernel treats TUN/TAP devices, which
makes it more complicated. The link type of a NMPlatformLink instance
should match what kernel thinks about the device. Point in case,
when parsing RTM_NETLINK messages, we very early need to determine
the link type (_linktype_get_type()). However, to determine the
type of a TUN/TAP at that point, we need to look into nested
netlink attributes which in turn depend on the type (IFLA_INFO_KIND
and IFLA_INFO_DATA), or even worse, we would need to look into
sysctl for older kernel vesions. Now, the TUN/TAP type is a property
of the link type NM_LINK_TYPE_TUN, instead of determining two
different link types.
- various parts of the API (both kernel's sysctl vs. netlink) and
NMDeviceTun vs. NMSettingTun disagree whether the PI is positive
(NM_SETTING_TUN_PI, IFLA_TUN_PI, NMPlatformLnkTun.pi) or inverted
(NM_DEVICE_TUN_NO_PI, IFF_NO_PI). There is no consistent way,
but prefer the positive form for internal API at NMPlatformLnkTun.pi.
- previously NMDeviceTun.mode could not change after initializing
the object. Allow for that to happen, because forcing some properties
that are reported by kernel to not change is wrong, in case they
might change. Of course, in practice kernel doesn't allow the device
to ever change its type, but the type property of the NMDeviceTun
should not make that assumption, because, if it actually changes, what
would it mean?
- note that as of now, new netlink API is not yet merged to mainline Linus
tree. Shortcut _parse_lnk_tun() to not accidentally use unstable API
for now.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1277457
[2] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=1ec010e705934c8acbe7dbf31afc81e60e3d828b
[3] https://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git/commit/?id=118eda77d6602616bc523a17ee45171e879d1818https://bugzilla.redhat.com/show_bug.cgi?id=1547213https://github.com/NetworkManager/NetworkManager/pull/77
The wireless mode property can be unset (NULL), in which case it
assumed to be equivalent to "infrastructure". If we convert an unset
mode to infrastructure, the connection will change on write,
triggering errors like:
settings-connection[...]: write: successfully updated (ifcfg-rh: persist (null)), connection was modified in the process
device (wlp4s0): Activation: (wifi) access point 'test1' has security, but secrets are required.
device (wlp4s0): state change: config -> need-auth (reason 'none', sys-iface-state: 'managed')
device (wlp4s0): The connection was modified since activation
device (wlp4s0): state change: need-auth -> failed (reason 'no-secrets', sys-iface-state: 'managed')
To fix this, remove the MODE key when the mode is unset so that the
property is read back exactly as it was. Note that initscripts need
the MODE set, but in most cases there are other issues that make Wi-Fi
connection written by NM not compatible with initscripts.
https://bugzilla.redhat.com/show_bug.cgi?id=1549972
Add an alternative to g_clear_pointer(). The differences are:
- nm_clear_pointer() is more type safe as it does not cast neither the
pointer nor the destroy function. Commonly, the types should be compatible
and not requiring a cast. Casting in the macro eliminates some of the
compilers type checking. For example, while
g_clear_pointer (&priv->hash_table, g_ptr_array_unref);
compiles, nm_clear_pointer() would prevent such an invalid use.
- also, clear the destination pointer *before* invoking the destroy
function. Destroy might emit signals (like weak-pointer callbacks
of GArray clear functions). Clear the destination first, so that
we don't leave a dangling pointer there.
- return TRUE/FALSE depending on whether there was a pointer to clear.
I tested that redefining g_clear_pointer()/g_clear_object() with our
more typesafe nm_* variants still compiles and indicates no bugs. So
that is good. It's not really expected that turning on more static checks
would yield a large number of bugs, because generally our code is in a good
shape already. We have few such bugs, because we already turn all all warnings
and extra checks that make sense. That however is not an argument for
not introducing (and using) a more resticted implementation.
It's slightly more correct to first clear the pointer location
before invoking the destroy function. The destroy function might
emit other callbacks, and at a certain point the pointer becomes
dangling. Avoid this danling pointer, by first clearing the
memory, and then destroing the instance.
With g_clear_pointer(pptr, g_free), pptr is cast to a non-const pointer,
and hence there is no compiler warning about calling g_free() in a const
pointer. However, it still feels ugly to pass a const pointer to
g_clear_pointer(). We should either add an explicity cast, or just
make the pointer non-const.
I guess part of the problem is that C's "const char *" means that the
string itself is immutable, but also that it cannot be freed. We most
often want a different semantic: the string itself is immutable after
being initialized once, but the memory itself can and will be freed.
Such a notion of immutable C strings cannot be expressed.
For that, just remove the "const" from the declarations. Although we
don't want to modify the (content of the) string, it's still more a
mutable string.
Only in _vt_cmd_obj_copy_lnk_vlan(), add an explicity cast but keep the
type as const. The reason is, that we really want that NMPObject
instances are immutable (in the sense that they don't modify while
existing), but that doesn't mean the memory cannot be freed.
The g_clear_pointer() macro already contains a cast to GDestroyNotify. No
need to do it ourself. In fact, with the cast, this only works with the
particular g_clear_pointer() implementation, that first assigns the
destroy function to a local variable.
See-also: https://bugzilla.gnome.org/show_bug.cgi?id=674634#c52