The delete_on_deactivate_link_delete() handler may be called after the
device was already removed from NMManager. Don't allow that.
Check whether the device is still exported on D-Bus as an indication.
The NM_reboot_openvswitch_vlan_configuration_var2 test exposes a race. The
test creates OVS profiles and repeatedly restarts NetworkManager, checking
that those profiles autoconnect and that the OVS configuration gets created.
There is a race where:
- the OVS interface exists, and an NMDeviceOvsInterface is created
- first, ovsdb cleans up old interfaces by sending a JSON request.
- OVS deletes the interface, and NetworkManager first picks up the
  platform signal (there is a race here; usually the ovsdb request
  completes first, which would clean up the NMDeviceOvsInterface in
  a different way).
- when the device gets unrealized, we don't schedule a
  check-autoactivate, so the device stays down.
See https://bugzilla.redhat.com/show_bug.cgi?id=2152864#c5 for a log
file with more details.
What should happen instead is that the OVS interface autoactivates, which
then also fully configures the port and bridge interfaces.
Explicitly schedule an autoactivate check when unrealizing devices.
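A rough sketch of the idea; the surrounding function here is made up, only
nm_device_emit_recheck_auto_activate() is the existing call mentioned below:

    /* Forward declarations, only to keep this fragment self-contained. */
    typedef struct _NMDevice NMDevice;
    void nm_device_emit_recheck_auto_activate(NMDevice *self);

    static void
    unrealize_and_recheck(NMDevice *self)
    {
        /* ... unrealize: the platform link is gone, drop realized state ... */

        /* Previously missing step: schedule an autoactivate check, so that an
         * unrealized OVS interface gets another chance to autoconnect, which
         * then also configures the port and bridge interfaces. */
        nm_device_emit_recheck_auto_activate(self);
    }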
Note that there are now several cases where NetworkManager autoconnects
more eagerly. This even affects some CI tests and user-visible behavior.
But I think relying on "just don't call nm_device_emit_recheck_auto_activate()"
and hoping that autoconnect doesn't happen is wrong. It must always be
possible to trigger an autoconnect check, and the right thing must
happen. The only reason we don't trigger autoconnect checks *all* the time
is that it would be a waste of CPU resources; whenever we even slightly
suspect that an autoconnect may happen, we must be allowed to trigger a
check.
If a device is in a condition where it previously did not autoconnect,
and it also *should* not autoconnect, then we need to fix the code that
evaluates whether an autoconnect may happen (not avoid triggering a
check).
https://bugzilla.redhat.com/show_bug.cgi?id=2152864
Fixes-test: @NM_reboot_openvswitch_vlan_configuration_var2
Currently, when we delete a device, autoconnect does not kick in right
away. But that is only because we happen not to schedule an
"autoactivate" recheck.
What should happen instead is that rechecking whether to autoconnect is
always allowed, and that we have the necessary state to know that
autoconnect currently must not happen.
Hence, block autoconnect of the involved profile. That makes sense,
because clearly we don't want to autoconnect again right after `nmcli
device delete $iface`.
The "connection.stable-id" supports placeholders like "${CONNECTION}" or
"${DEVICE}".
The stable-id can also be specified via global connection defaults in
NetworkManager.conf, by leaving it unset in the profile. Global
connection defaults always follow the same pattern: they correspond to a
per-profile property, and the global connection default is only consulted
when the per-profile value indicates a special default/unset value.
Finally, if the global connection default is also not configured in
NetworkManager.conf, a built-in default is used (which may not be
constant either; for example, ipv6.ip6-privacy's built-in default depends
on a sysctl value).
In any case, every configuration that can be achieved should be
configurable both per-profile and via global connection default. That
was not the case for the stable-id, because the built-in default generated
an ID in a way that could not be expressed explicitly otherwise.
So you could not:
- explicitly set the per-profile value to the built-in default, to
  prevent the global connection default from overriding it.
- explicitly set the global connection default to the built-in default,
  to prevent a lower-priority [connection*] section from overriding the
  stable-id again.
Fix that inconsistency by making it possible to explicitly set the
built-in default: treat the literal value "default${CONNECTION}" as the
built-in default, and document that the built-in default has that value.
It's unlikely that this breaks an existing configuration, but of course,
if any user configured "connection.stable-id=default${CONNECTION}", then
the behavior changes for them.
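For illustration, with this change both of the following become possible
(the profile name is just an example):

    # Pin one profile to the built-in default, so that no global connection
    # default can override its stable-id anymore:
    nmcli connection modify "my-profile" connection.stable-id 'default${CONNECTION}'

    # Or set the built-in default as global connection default in
    # /etc/NetworkManager/NetworkManager.conf:
    [connection]
    connection.stable-id=default${CONNECTION}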
The actual formatting depends on the version of clang-format. Print the
version that is used, which is particularly interesting when we get an
error in our gitlab-ci check (which uses the correct version).
Using the ppp code is rather ugly.
Historically, the pppd headers don't follow a good naming convention,
and define things that cause conflicts with our headers:
/usr/include/pppd/patchlevel.h:#define VERSION "2.4.9"
/usr/include/pppd/pppd.h:typedef unsigned char bool;
Hence we had to include the pppd headers in a certain order and be
careful.
ppp 2.5 changes the API and cleans that up. But since we also need to
support old versions, it does not immediately simplify anything.
Only include "pppd" headers in "nm-pppd-compat.c" and expose a wrapper
API from "nm-pppd-compat.h". The purpose is that "nm-pppd-compat.h"
exposes clean names, while all the handling of ppp is in the source
file.
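The shape of that split looks roughly like this (the wrapper functions are
hypothetical examples, not the actual nm-pppd-compat.h API):

    /* nm-pppd-compat.h: no pppd headers here, only clean declarations. */
    #ifndef __NM_PPPD_COMPAT_H__
    #define __NM_PPPD_COMPAT_H__

    int         nm_pppd_compat_get_ifunit(void);
    const char *nm_pppd_compat_get_ifname(void);

    #endif /* __NM_PPPD_COMPAT_H__ */

    /* nm-pppd-compat.c: the only file that includes pppd headers, so their
     * "bool" typedef and VERSION define cannot leak into other sources. */
    #include <stdio.h>     /* historically required before pppd.h */
    #include <sys/types.h>
    #include <pppd/pppd.h>

    #include "nm-pppd-compat.h"

    int
    nm_pppd_compat_get_ifunit(void)
    {
        return ifunit; /* pppd global (pre-2.5 API) */
    }

    const char *
    nm_pppd_compat_get_ifname(void)
    {
        return ifname; /* pppd global (pre-2.5 API) */
    }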
This change does the following:
* Add nm-pppd-compat.h to mask the details of the different pppd
  versions.
* Fix nm-pppd-plugin.c regarding the API differences between 2.4.9
  (current) and the latest pppd 2.5.0 on the master branch.
* Additional fixes to configure.ac to appropriately set the defines used
  for compilation.
- We need to fetch more entries per page. 100 is the maximum without
  pagination, and that is enough for us (see the example below).
- Previously, we checked all stages. Now, let's skip the "prep" and
  "tier3" stages.
This change should work both with old and new pipelines.
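For illustration only (the exact endpoint used by the script is not shown
here), this is the kind of query that is affected; the GitLab API serves at
most 100 entries per page:

    curl --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
        "https://gitlab.freedesktop.org/api/v4/projects/$PROJECT_ID/pipelines/$PIPELINE_ID/jobs?per_page=100"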
We want the tier2+ tests to run only manually. As those tests
depend on the respective prep step, there are 3 possibilities:
1) make prep manual and the tier test automatic. That is what we would
   want, because then we can just manually trigger the prep step (one
   click). However, in the past this didn't work.
2) make prep automatic and the test manual. That works; the downside
   is that we often run the prep step when it's not needed. This is what
   we used to do to work around 1).
3) make prep and the test manual. Then no unnecessary jobs run, but
   triggering a manual test is cumbersome: first click to start
   the prep step, then wait, then click again.
Revisit this. It seems 1) is working now. Yeay.
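In .gitlab-ci.yml terms, option 1) looks roughly like this (stage, job and
script names are made up):

    stages:
      - prep
      - tier2

    fedora:38:prep:
      stage: prep
      when: manual
      script:
        - .gitlab-ci/build-container.sh

    fedora:38:test:
      stage: tier2
      needs:
        - fedora:38:prep
      script:
        - .gitlab-ci/run-test.sh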
Also rename the prep stages so that it's clear to which tier they
belong. I guess I could instead move them to prep1, prep2, prep3
stages, but then there would be a lot of columns on the website.
The distro.name is not just a pretty name, it's the name under which we fetch
the container. It is thus a well-known name that we can rely on.
The "base_type" only depends on the distro name, and it makes no sense
to ever choose a different one. Tracking it in the "distributions"
array is thus redundant.
Move the mapping from distro.name to the base type to a separate place.
The tag we actually use already contains a hash of the input files and
is generated (by `ci-fairy generate-templates`). There is no need for this
fixed prefix, as also shown by the date in it, which is badly maintained
and meaningless.
Drop it.
The benefit is that instead of one long-running job for fedora:37 (the
current tier1 test), we have several smaller ones.
A minor downside is that if the build is broken, usually the very
first test would already fail. Previously, that meant that the follow-up
tests were skipped. Now they all run in parallel. However, test
failures should be the exception, so the wasted resources are probably
irrelevant. The upside is that we can see which tests fail, and we run
them much faster (in parallel).
This is only done for the tier1 tests, because those tests are started
automatically. Other tiers need to be triggered manually, which already
means a lot of clicking. Making those matrix tests as well would result in
an insane amount of clicking. As those other tests are run much more
seldom, having them huge is probably fine.
We have many test configurations (i.e. distros like fedora:37,
debian:9). Almost all of them are triggered manually, because running
them every time would be wasteful.
Still, even though we trigger those tests only seldom, whenever we
trigger them all together they still consume too many resources of the
freedesktop.org GitLab infrastructure.
One possibility would be to just drop old distros (e.g. fedora:30).
Which tests are set up in gitlab-ci is constantly refined and adjusted,
so dropping some distros is not necessarily wrong and bound to happen
eventually.
However, I also don't find it great to just disable tests that are still
passing. If we want to avoid consuming too many resources, we can simply
choose not to run those tests. We don't need to enforce that by deleting
tests. Once deleted, such a configuration cannot be tested anymore, as it
would be too cumbersome to recreate the setup manually.
Instead, introduce stages/tiers to mark more clearly which configurations
we should test even less frequently.
Note that the developer is still required not to trigger too many tests
at once, so as not to monopolize the CI resources. The stages should make
that clearer to see, but they don't solve it. Deleting tests might solve
it, but only if we deleted a significant number of those tests, which
seems undesirable.
pip on Debian 12 semi-forces us to use a venv. That's hard enough, but
even more so when we just want to run meson, which only relies on the
standard library anyway.
Since the flag that overrides this restriction doesn't exist in earlier
pip versions, try both invocations and hope that one of them succeeds.
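Presumably the flag in question is pip's "--break-system-packages"; the
"try both" approach then looks roughly like this:

    # Older pip does not know --break-system-packages, so the first invocation
    # is used there; on Debian 12 the first invocation is refused and the
    # second one applies.
    python3 -m pip install meson || \
        python3 -m pip install --break-system-packages meson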
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1595
Older versions of iproute2 don't support the "enclimit" argument. Work
around that in the unit tests.
Fixes: 1505ca3626 ('platform/tests: ip6gre & ip6gretap test cases (ip6 tunnel flags)')
Every branch (for example "nm-1-40") has exactly one next branch, from
which patches should be backported (in that example that branch is
"nm-1-42").
While "find-backports" searches all newer branches for patches, it does
not make it clear form where the patch should come from.
That means, if you run the script `contrib/scripts/find-backports origin/nm-1-40`
it will check nm-1-42 and main branch, and might suggest to backport
patches that are only on main, but not "nm-1-42". That would be wrong,
because patches need to first go into nm-1-42, and then backported (from
there) further to nm-1-40.
Print a warning to highlight that.
If the client was waiting for IPv6 DAD to complete and the lease was
updated or lost, `wait_ipv6_dad` needs to be cleared; otherwise, at
the next platform change the client will try to evaluate the DAD state
with a different or no lease. In particular, if there is no lease, the
client will try to decline it because there are no valid addresses,
leading to an assertion failure:
../src/core/dhcp/nm-dhcp-client.c:997:_dhcp_client_decline: assertion failed: (l3cd)
Backtrace:
__GI_raise ()
__GI_abort ()
g_assertion_message ()
g_assertion_message_expr ()
_dhcp_client_decline (self=0x1af13b0, l3cd=0x0, error_message=0x8e25e1 "DAD failed", error=0x7ffec2c45cb0) at ../src/core/dhcp/nm-dhcp-client.c:997
l3_cfg_notify_cb (l3cfg=0x1bc47f0, notify_data=0x7ffec2c46c60, self=0x1af13b0) at ../src/core/dhcp/nm-dhcp-client.c:1190
g_closure_invoke ()
g_signal_emit_valist ()
g_signal_emit ()
_nm_l3cfg_emit_signal_notify () at ../src/core/nm-l3cfg.c:629
_nm_l3cfg_notify_platform_change_on_idle () at ../src/core/nm-l3cfg.c:1390
_platform_signal_on_idle_cb () at ../src/core/nm-netns.c:411
g_idle_dispatch ()
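A minimal standalone model of the rule (these are not the real NMDhcpClient
types, just an illustration):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
        void *lease;          /* current lease (l3cd); NULL once it is lost  */
        bool  wait_ipv6_dad;  /* waiting for DAD of the lease's addresses    */
    } Client;

    static void
    client_set_lease(Client *c, void *new_lease)
    {
        /* The pending DAD wait refers to the old lease's addresses. Clear it,
         * so that a later platform change cannot end up declining a lease
         * that is different or already gone (l3cd == NULL). */
        c->wait_ipv6_dad = false;
        c->lease = new_lease;
    }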
Fixes: 393bc628ff ('dhcp: wait DAD completion for DHCPv6 addresses')
https://bugzilla.redhat.com/show_bug.cgi?id=2179890
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1594
<error> is mostly about "really should not happen" scenarios. It's
closer to an assertion failure and indicates something that should not
happen in NetworkManager.
Of course, things can go wrong, but <warn> is sufficient for that. When
ovsdb communicates something unexpected, it's just a warning. At least,
that's also what all the similar cases in "nm-ovsdb.c" already do.