Kernel rejects adding routes that have a gateway, if there is no direct
(onlink) route to that gateway. The exact conditions are non-trivial due
to the complexities of routing, but that's it basically.
Anyway. In NetworkManager we don't want to have such non-obvious
interdependencies. If the user configures a route with a gateway, but
"forgets" to configure a direct route to the gateway, we don't assume
that the user configured the wrong route. Instead, we assume the user
forgot to configure the additional route and add it automatically. That
is for convenience, but also because (as said) the rules for this are
non-trivial. Moreover, it's problematic to report an error in routing
during activation. Should we fail activation altogether? Should we just
log an error and otherwise silently proceed? Logging is not a sensible
behavior that the (possibly non-human) user can meaningfully handle. So
we instead try to make it work.
Previously, nm_platform_ip_route_sync() had the workaround of when we
failed to configure a route and it looked like it might be due to the
missing onlink route, we would add a suitable /32 / /128 route. The
problem is that we want that NML3Cfg is aware of what routes we want to
configure. The lower layer nm_platform_ip_route_sync() adding additional
routes makes that difficult (maybe nm_platform_ip_route_sync() could
return the additional routes that it added, but it doesn't).
The better solution seems to be that
nm_l3_config_data_add_dependent_onlink_routes() adds the required routes
in NML3Cfg during commit. This is done since commit 4073211595
('Revert "l3cfg: do not add dependent routes for non-default routes"').
Further, since commit ('platform: always reconfigure IP routes even if
removed externally') we also always try to re-add the routes we want,
regardless of whether they appear to be deleted by the user.
So a suitable onlink route really should be always there, and there is
no more need for this workaround.
NML3Cfg is stateful, that means it remembers which address/route it
configured earlier. That is important because the API users of NML3Cfg
only say what the want to configure now, and NML3Cfg needs to remove
addresses/routes that it configured earlier but are no longer to be
present. Also, NetworkManager wants to allow the user to add
addresses/routes externally with `ip addr|route add` and NetworkManager
not removing it. This is a common use case for dispatcher scripts, but
in general, we want to allow other components to add addresses/routes.
We try something similar with the removal of routes/addresses managed by
NetworkManager. When NetworkManager adds a route/address, which later
disappears, then we assume that the user intentionally removed the
address/route and take the hint to not re-add it.
However, it doesn't work. It is problematic for two reasons:
- kernel can automatically remove routes. For example, deleting an IPv4
address that is the prefsrc of a route, will cause kernel to delete
that route. Sure, we may be unable to re-configure the route at this
moment, but we shouldn't remember indefinitely that the route is
supposed to be absent. Rather, we should re-add it when possible.
- kernel is a pain with validating consistencies of routes. For example,
when a route has a nexthop gateway, then the gateway must be onlink
(directly reachable), or kernel refuses to add it with "Nexthop has
invalid gateway". Of course, when removing the onlink route kernel is
fine leaving the gateway route behind, which it would otherwise refuse
to add.
Anyway. Such interdependencies for when kernel rejects adding a route
with "Nexthop has invalid gateway" are non-trivial. We try to work
around that by always adding the necessary onlink routes. See
nm_l3_config_data_add_dependent_onlink_routes(). But if the user
externally removed the dependent onlink route, and NetworkManager
remembers to not re-adding it, then the efforts from
nm_l3_config_data_add_dependent_onlink_routes() are ignored. This
causes ripple effects and NetworkManager will also be unable to add the
nexthop route.
Trying to preserve absence of routes that NetworkManager would like to
configure is not tenable. Don't do it anymore. There was anyway no
guarantee that on the next update NetworkManager wouldn't try to re-add
the route in question. For example, if the route came from DHCP, and the
lease temporarily went away and came back, then NetworkManager probably
would have (correctly) forgotten that the user wished that the route be
absent. This did not work reliably and it just causes problems.
NMPrioq is taken from systemd's "prioq.c". It is a nice data structure,
that accepts and an index pointer, to directly access elements inside
the heap.
Previously, the API didn't require a consistent index, while the data is
not inside the heap. nm_prioq_{update,shuffle,remove}()) call find_item(),
which silently accepts wrong indexes and assumes the element is not in
the heap.
Keeping the index in sync with the data seems error prone. Accepting any
index without asserting may be convenient for the user (as the user is
not required to pre-initialize the index with NM_PRIOQ_IDX_NULL).
However, it also misses to catch potential bugs.
Now the index must be kept consistent, in particular also if the element
is not enqueued. This means, you must initialize them with
NM_PRIOQ_IDX_NULL.
NMPrioq is copied from systemd's "prioq.c". It uses the index pointer to
find elements in the heap directly. However, it's not very careful about
clearing or maintaining the index pointer if the element is not in the
queue. Like, find_item() accepts a bogus index and only finds the data
if the data and the index matches. That seems error prone.
In the first step, when we remove an item from the queue, ensure to
reset the index to NM_PRIOQ_IDX_NULL.
The nm_ip_addr_*() APIs are supposed to work with unaligned input,
so we could use them while parsing binary data (where the field
may not be properly aligned). For that reason, it used to first
copy the argument to a local (properly aligned) variable.
Rework that a bit, and use unaligned_read_ne32() for IPv4.
NMIPAddr is a union, so you always need to know the right addr_family.
Often we track the addr_family separately or know it implicitly. Then we
don't need to glue them together.
But also often we need to associate the address_family with the address
union. Add a struct that bundles the NMIPAddr with the addr_family,
making it a tagged union. The benefit is that we now also have one
implementation for equal, cmp and hash-update, instead of reimplementing
them.
For each connection that corresponds to a software device, we create a
"unrealized" device that then becomes realized just before the
connection starts activating. Currently, in certain conditions NM
creates two devices with the same name and type, one realized and one
not; this is not expected and can lead to other issues especially when
a software device is reactivated.
Avoid that by relaxing the check in system_create_virtual_device(): if
a device exists with the same name and type, we don't want to create
another even if the type-specific parameters differ.
By default, podman bind mounts a "/etc/resolv.conf" file. That prevents
NetworkManager (inside the container) to update the file, which leads to
warnings in the log and certain NM-ci tests won't pass due to that.
Disable handling of "/etc/resolv.conf" in podman. But also pre-deploy a
default resolv.conf, with the google name server 8.8.8.8. I don't
understand why, but even with "--dns=none", writing "/etc/resolv.conf"
while building the container doesn't take effect. Instead, write a
usable "/etc/resolv.conf" from "/etc/rc.d/rc.local".
The idea is to have useful and correct helper functions, that are
generic and reusable.
nmcs_utils_poll() was done with that intent, and it is a generally
useful function. As the implementation shows, it's not entirely trivial
to get all the parameters right, when it comes to glib-integration
(GMainContext and GTask) and polling.
Whether something like a generic poll helper is a useful thing at all,
may be a valid question. E.g. you need several hooks, and the usage is
still not trivial. Regardless of that, "nm-cloud-setup-utils.c" already
had such a helper. This is only about moving it to a place where it is
actually accessible to others.
And, if it turns out to be a good idea after all, then somebody else
could use it.
nmcs_utils_poll() calls nmcs_wait_for_objects_register(), which is
specific to nm-cloud-setup.
nmcs_utils_poll() should move to libnm-glib-aux, so it should not have a
direct dependency on nm-cloud-setup code. Add instead a hook that the
caller can use for registering the object.
nmcs_utils_poll() accepts two different user-data. One is passed to the
probe callbacks (and returned by nmcs_utils_poll_finish()). The other
one is passed to the callback.
Having separate user data might be useful. It's not useful for
nm_http_client_poll_req(), which both passes the same.
It's confusing to pass the same data as different user-data. Don't do
that. Use only one way.
A "minor" release can still be the latest release. It depends
on which minor release you do. The script isn't smart enough
to understand the difference, so make the hint a bit clearer.
In practice, this does not cause an issue, because NMEtherAddr quite
quite obviously only contains uint8 values and has alignment 1.
It's still ugly to case nm_ip_addr_zero to NMEtherAddr if they are
entirely unrelated types.
Fixes: 58e2ba0535 ('glib-aux: drop ethernet fields from NMIPAddr union')
It is slightly confusing to be required to be aware whether something is
a union or a struct. Hence, the union was wrapped in a struct.
However, we anyway almost always use the typedef NMIPAddr. The single
place where we forward declare the type, we can correctly use the union
specifier.
It's not really used anyway.
The idea was, that the ethernet MAC address would fit from the size, and
we might at a few places use that for convenience. But it's more
confusing. Also, because there is already NMEtherAddr and `struct
ether_addr`.