NMDevice holds a reference to NMManager, which holds a reference to NMPolicy.
It should not be possible to dispose NMPolicy while there are still devices
registered. That would be a bug that we would need to find and solve
differently. Add an assertion instead of trying to handle it.
Add an assertion to nm_policy_device_recheck_auto_activate_schedule()
that the device is currently registered in NMPolicy. Calling it for a
device that is not registered would be odd, and likely a bug.
But if we only schedule the auto-activate check while the device is
registered, we don't need to take an additional reference. We know that
the object must still be alive (also, we have assertions that it in fact is).
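Roughly, the scheduling function can then assert on registration instead
of taking a reference (a sketch; the exact names and the container that
tracks registered devices are illustrative):

  /* The device must already be registered with NMPolicy (device_added()
   * was called, device_removed() was not). Hence no additional reference
   * on @device is necessary. */
  nm_assert (NM_IS_DEVICE (device));
  nm_assert (g_hash_table_contains (priv->devices, device));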
Hook the information for tracking the activation of a device onto the
NMDevice itself. Sure, that couples NMPolicy slightly closer to
NMDevice, but the result is still simpler code, because we don't need a
separate ActivateData.
It also means we can immediately tell whether the auto-activate check
for an NMDevice is already scheduled, without searching through a
list.
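Roughly, the tracking can then look like this (a sketch; field and
callback names are illustrative):

  /* in NMDevicePrivate: */
  guint recheck_auto_activate_idle_id;

  /* when scheduling: if the idle check is already pending, there is
   * nothing to do and no list to search. */
  if (priv->recheck_auto_activate_idle_id != 0)
      return;
  priv->recheck_auto_activate_idle_id = g_idle_add (recheck_auto_activate_idle_cb, device);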
NMPolicy really should be merged into NMManager. It has no clear
responsibility of its own, so having two separate objects only makes things
confusing. Anyway. It is permissible to look up the NMPolicy instance of an
NMManager. Add an accessor.
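A sketch of the accessor (modulo the exact private-struct access):

  NMPolicy *
  nm_manager_get_policy (NMManager *self)
  {
      g_return_val_if_fail (NM_IS_MANAGER (self), NULL);

      return NM_MANAGER_GET_PRIVATE (self)->policy;
  }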
It's the better name. Especially since there is no longer a signal
involved, the term "emit" doesn't match.
Note also how the previous approach using a signal tried to abstract
what is happening. We were no longer rechecking-autoconnect; instead,
we were emitting-a-signal-to-recheck-autoconnect. Just be plain about
what the code is doing and don't go through a layer of signals.
GObject signals don't make the code easier to understand; on the
contrary. They have their purpose when objects truly must not (or
should not) be aware of each other and need to be composed very
loosely. That is not the case here.
There really is only one subscriber to the NM_DEVICE_RECHECK_AUTO_ACTIVATE
signal, and it only makes sense this way. Instead of going through a
signal invocation, just call the well-known method directly. It becomes
clearer who calls this code (and it has a lower overhead).
When using cscope/ctags it is also easier to follow the code, because
the tools understand function calls.
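In other words, roughly (a sketch; how the device reaches the manager
and policy instances is illustrative):

  /* before: emit a signal and let NMPolicy's handler do the work */
  g_signal_emit (self, signals[RECHECK_AUTO_ACTIVATE], 0);

  /* after: call the one consumer directly */
  nm_policy_device_recheck_auto_activate_schedule (nm_manager_get_policy (nm_manager_get ()),
                                                   self);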
The delete_on_deactivate_link_delete() handler may be called after the
device was already removed from NMManager. Don't allow that.
Check whether the device is still exported on D-Bus as an indication.
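A sketch of the check (the helper name may differ in the actual code):

  /* bail out if the device was already removed from NMManager, using the
   * D-Bus export state as the indication */
  if (!nm_dbus_object_is_exported (NM_DBUS_OBJECT (self)))
      return;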
The NM_reboot_openvswitch_vlan_configuration_var2 test exposes a race.
The test creates OVS profiles and repeatedly restarts NetworkManager,
checking that those profiles autoconnect and that the OVS configuration
gets created.
There is a race, where:
- the OVS interface exists, and an NMDeviceOvsInterface is created
- first, ovsdb cleans up old interfaces, sending a JSON request.
- OVS deletes the interface, and NetworkManager first picks up the
platform signal (there is a race here; usually the ovsdb request
completes first, which will clean up the NMDeviceOvsInterface in
a different way).
- when the device gets unrealized, we don't schedule an auto-activate
check, so the device stays down.
See https://bugzilla.redhat.com/show_bug.cgi?id=2152864#c5 for a log
file with more details.
What should happen instead is to autoactivate the OVS interface, which
then also fully configures the port and bridge interfaces.
Explicitly schedule an auto-activate check when unrealizing devices.
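That is, roughly (a sketch; the exact spot in the unrealize code path is
illustrative):

  /* when a device gets unrealized (e.g. the OVS interface was deleted
   * underneath us), explicitly ask NMPolicy to recheck auto-activation,
   * so that a profile can bring the interface back up */
  nm_policy_device_recheck_auto_activate_schedule (nm_manager_get_policy (manager), device);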
Note that there are now several cases where NetworkManager autoconnects
more eagerly. This even affects some CI tests and user-visible behavior.
But I think relying on "just don't call nm_device_emit_recheck_auto_activate()"
in the hope that autoconnect doesn't happen is wrong. It must always be
possible to trigger an autoconnect check, and the right thing must
happen. We only don't trigger autoconnect checks *all* the time because
that would be a waste of CPU resources, but whenever we slightly suspect
that an autoconnect may happen, we must be allowed to trigger a check.
If a device is in a condition where it previously did not autoconnect,
and it also *should* not autoconnect, then we need to fix the code that
evaluates whether an autoconnect may happen (not avoid triggering a
check).
https://bugzilla.redhat.com/show_bug.cgi?id=2152864
Fixes-test: @NM_reboot_openvswitch_vlan_configuration_var2
Currently, when we delete a device, autoconnect does not kick in right
away. But that is only because we happen not to schedule an
"autoactivate" recheck.
What should happen is that rechecking whether to autoconnect is always
allowed, and that we have the necessary state to know that autoconnect
currently should not happen.
Instead, block autoconnect of the involved profile. That makes sense,
because clearly we don't want to autoconnect again right after `nmcli
device delete $iface`.
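A sketch of what that means (the names of the blocking API are
illustrative):

  /* when handling `nmcli device delete $iface`, block autoconnect on the
   * profile that was active on the device, so that the (now always
   * permitted) autoconnect recheck does not bring it right back up */
  sett_conn = nm_device_get_settings_connection (device);
  if (sett_conn)
      nm_settings_connection_autoconnect_blocked_reason_set (sett_conn,
                                                             NM_SETTINGS_AUTO_CONNECT_BLOCKED_REASON_USER_REQUEST,
                                                             TRUE);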
The "connection.stable-id" supports placeholders like "${CONNECTION}" or
"${DEVICE}".
The stable-id can also be specified via global connection defaults in
NetworkManager.conf, by leaving it unset in the profile. Global
connection defaults always follow the pattern that they correspond to a
per-profile property, and only when the per-profile value indicates a
special default/unset value is the global connection default consulted.
Finally, if the global connection default is also not configured in
NetworkManager.conf, a built-in default is used (which may not be
constant either, for example ipv6.ip6-privacy's built-in default depends
on a sysctl value).
In any case, every possible configuration that can be achieved should be
expressible both per-profile and via global connection default. That was
not the case for the stable-id, because the built-in default generated
an ID in a way that could not be explicitly expressed otherwise.
So you could not:
- explicitly set the per-profile value to the built-in default, so that
the global connection default does not overwrite it.
- explicitly set the global connection default to the built-in default,
so that a lower priority [connection*] section does not overwrite the
stable-id again.
Fix that inconsistency to make it possible to explicitly set the
built-in default.
Change the behavior of the literal value "default${CONNECTION}" and make
it behave as the built-in default. Also document that the built-in
default has that value.
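For example (the profile name is made up), the built-in default can now
be pinned explicitly in both places:

  per-profile:
    nmcli connection modify "my-profile" connection.stable-id 'default${CONNECTION}'

  global connection default in NetworkManager.conf:
    [connection]
    connection.stable-id=default${CONNECTION}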
It's unlikely that this breaks an existing configuration, but of course,
if any user configured "connection.stable-id=default${CONNECTION}", then
the behavior changes for them.
The actual formatting depends on the version of clang-format. Print the
version that is used, which is particularly interesting when we get an
error in our gitlab-ci check (which uses the correct version).
Using the ppp code is rather ugly.
Historically, the pppd headers don't follow a good naming convention,
and define things that cause conflicts with our headers:
/usr/include/pppd/patchlevel.h:#define VERSION "2.4.9"
/usr/include/pppd/pppd.h:typedef unsigned char bool;
Hence we had to include the pppd headers in a certain order, and be
careful.
ppp 2.5 changes the API and cleans that up. But since we also need to
support old versions, it does not immediately simplify anything.
Only include "pppd" headers in "nm-pppd-compat.c" and expose a wrapper
API from "nm-pppd-compat.h". The purpose is that "nm-pppd-compat.h"
exposes clean names, while all the handling of ppp is in the source
file.
This change does the following:
* Add nm-pppd-compat.h to mask details regarding different
versions of pppd.
* Fix nm-pppd-plugin.c regarding API differences between
2.4.9 (current) and the latest pppd 2.5.0 in the master branch.
* Additional fixes to configure.ac to appropriately set defines used
for compilation.
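A rough sketch of the pattern (the wrapper name is illustrative):

  /* nm-pppd-compat.h: no pppd headers here, only clean declarations */
  int nm_pppd_compat_get_ifunit (void);

  /* nm-pppd-compat.c: the only place that includes pppd headers */
  #include <pppd/pppd.h>

  #include "nm-pppd-compat.h"

  int
  nm_pppd_compat_get_ifunit (void)
  {
      return ifunit;
  }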
- We need to fetch more entries per page. 100 is the maximum without
pagination, but that is enough for us.
- Previously, we checked all stages. Now, let's skip the "prep" and "tier3" stages.
This change should work both with old and new pipelines.
We want the tier2+ tests to run only manually. As those tests
depend on the respective prep step, there are 3 possibilities:
1) make prep manual and the tier test automatic. That is what we would
want, because then we can just manually trigger the prep step (one
click). However, in the past this didn't work.
2) make the prep automatic and the test manual. That works; the downside
is that we often run the prep step when it's not needed. This is what
we used to do to work around 1).
3) make prep and the test manual. Then there are no unnecessary test
runs, but triggering a manual test is cumbersome. First click to start
the prep step, then wait, then click again.
Revisit this. It seems 1) is working now. Yeay.
Also rename the prep stages, so that it's clear to which tier they
belong. I guess I could instead move them to prep1, prep2, prep3
stages, but then there would be a lot of columns on the web site.
The distro.name is not just a pretty name, it's the name under which we fetch
the container. It is thus a well-known name that we can rely on.
The "base_type" only depends on the distro name, and it makes no sense
to ever choose a different one. Tracking it in the "distributions"
array is thus redundant.
Move the mapping from distro.name to the base type to a separate place.
The tag we actually use already contains a hash of the input files and
is generated (by `ci-fairy generate-templates`). There is no need for
this fixed prefix, as also seen from the date in it, which is badly
maintained and meaningless.
Drop it.
The benefit is that instead of one long-running job for fedora:37 (the
current tier1 test), we have several smaller ones.
A minor downside is that if the build is broken, then usually the very
first test would already fail. Previously, that meant that the follow-up
tests were skipped. Now, they all run in parallel. However, test
failures should be the exception, so the wasted resources are probably
irrelevant. The upside is that we can see which tests fail, and we run
them much faster (in parallel).
This is only done for the tier1 test, because those tests are started
automatically. Other tiers need to be triggered manually, which already
means a lot of clicking. Making those matrix tests as well would result
in an insane amount of clicking. As those other tests are run much more
rarely, having them be huge is probably fine.
We have many test configurations (e.g. distros like fedora:37,
debian:9). Almost all of them are only triggered manually, because
running them every time would be wasteful.
Still, even though we trigger those tests only seldom, whenever we
trigger them all together, they still consume too many resources of the
freedesktop.org gitlab infrastructure.
One possibility would be to just drop old distros (e.g. fedora:30).
Which tests are set up in gitlab-ci is constantly refined and adjusted,
so dropping some distros is not necessarily wrong and bound to happen
eventually.
However, I also don't find it great to just disable tests that are still
passing. If we want to avoid consuming too many resources, we can just
choose not to run those tests. We don't need to enforce that by deleting
tests. Once deleted, such a configuration cannot be tested anymore as it
would be too cumbersome to recreate the setup manually.
Instead, introduce stages/tiers to mark more clearly which
configurations we should test even less frequently.
Note that the developer is still required to not trigger too many tests
at once, so as not to monopolize the CI resources. The stages should
make that clearer to see, but they don't solve it. Deleting tests might
solve it, but only if we deleted a significant number of those tests,
which does not seem desirable.
pip on Debian 12 semi-forces us to use a venv. That is hard enough on
its own, but even more so when we just want to run meson, which only
relies on the standard library anyway.
Since pip's flag to override this doesn't exist in earlier versions, try
both invocations and hope one succeeds.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1595
Older versions of iproute2 don't support the "enclimit" argument. Work
around that from the unit tests.
Fixes: 1505ca3626 ('platform/tests: ip6gre & ip6gretap test cases (ip6 tunnel flags)')