Commit Graph

31938 Commits

Author SHA1 Message Date
Thomas Haller
700cd20d4d release: bump version to 1.43.0 (development) 2023-01-19 20:26:31 +01:00
Thomas Haller
093bbd0fcf release: bump version to 1.41.90 (1.42-rc1) 2023-01-19 20:15:14 +01:00
Thomas Haller
01730f5943 gitlab-ci: set OMP_NUM_THREADS=1 to avoid libgomp crash for msgmerge
It's not clear why this happens. But since recently in our gitlab-ci,
all the Fedora machines will fail. It happens in the step

  check_run_clean 6 && test $IS_FEDORA = 1 -o $IS_CENTOS = 1 && ./contrib/fedora/rpm/build_clean.sh -g -w crypto_gnutls -w debug -w iwd -w test -W meson

which explains why it only affects Fedora configurations.

It does not always fail, but the probability of failure is high.
The failure is:

  ...
  rm -f et.gmo && /usr/bin/msgmerge --for-msgfmt -o et.1po et.po NetworkManager.pot && /usr/bin/msgfmt -c --statistics --verbose -o et.gmo et.1po && rm -f et.1po
  libgomp: Thread creation failed: Resource temporarily unavailable
  make[3]: *** [Makefile:383: et.gmo] Error 1

Maybe some new resource restricting in gitlab. Let's add this workaround.
I don't really understand the cause, but this seems to avoid it, which is
good enough for me.
2023-01-19 19:31:28 +01:00
Thomas Haller
f2db9be627 platform/tests: temporarily disable failing check in "/route/test_cache_consistency_routes" (3)
This test is not (yet) stable. Disable for now.
2023-01-19 17:10:43 +01:00
Thomas Haller
20f71b392c platform/tests: temporarily disable failing check in "/route/test_cache_consistency_routes" (2)
This test is not (yet) stable. Disable for now.
2023-01-19 16:27:28 +01:00
Thomas Haller
5be942eac5 libnm/doc: document "weight" attribute for IPv4 routes 2023-01-19 16:11:19 +01:00
Fernando Fernandez Mancera
d02be1c123 NEWS: update 2023-01-19 16:07:57 +01:00
Thomas Haller
1e883ab6e6 gitlab-ci: avoid clean step in "run-test.sh" for manual invocation
When we run `NM_TEST_SELECT_RUN=x ./.gitlab-ci/run-test.sh` to run one
step only, we should not do the final clean, so that the build artifacts
are preserved.
2023-01-19 15:04:46 +01:00
Thomas Haller
8ec69afd8b platform/tests: temporarily disable failing check in "/route/test_cache_consistency_routes"
Hm, it seems to fail on some systems. Disable the test for now.
2023-01-19 13:25:52 +01:00
Fernando Fernandez Mancera
e266c420b2 l3cfg: schedule an update after every commit-type/config-data register/unregister
When we register/unregister a commit-type or when we add/remove a
config-data to NML3Cfg, that act only does the registration/addition.
Only on the next commit, are the changes actually done. The purpose
of that is to add/register multiple configurations and commit them later
when ready.

However, it would be wrong to not do the commit a short time after. The
configuration state is dirty with need to be committed, and that should
happen soon.

Worse, when a interface disappears, NMDevice will clear the ifindex and
the NML3Cfg instance, thereby unregistering all config data and commit
type. If we previously commited something, we need to do another follow-up
commit to cleanup that state.

That is for example important with ECMP routes, which are registered in
NMNetns. When NML3Cfg goes down, it always must unregister to properly
cleanup. Failure to do so, causes an assertion failure and crash. This
change fixes that.

Fix that by automatically schedule and idle commit on
register/unregister/add/remove of commit-type/config-data.
It should *always* be permissible to call a AUTO commit from
an idle handler, because various parties cannot use NML3Cfg
independently, and they cannot know when somebody else does a
commit.

Note that NML3Cfg remembers if it presiouvly did a commit
("commit_type_update_sticky"), so even if the last commit-type gets
unregistered, the next commit will still do a sticky update (one more
time).

The only remaining question is what happens during quitting. When
quitting, NetworkManager we may want to leave some interfaces up and
configured. If we were to properly cleanup the NML3Cfg we might need a
mechanism to handle that. However, currently we just leak everything
during quit, so that is not a concern now. It is something that needs
to be addressed in the future.

https://bugzilla.redhat.com/show_bug.cgi?id=2158394
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1505
2023-01-19 12:40:33 +01:00
Thomas Haller
adad1d4358 platform: merge branch 'th/platform-cache-consistency-routes'
https://bugzilla.redhat.com/show_bug.cgi?id=2060684

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1494
2023-01-19 11:54:10 +01:00
Thomas Haller
87522ad316 platform/tests: add test for cache consistency for IP routes
Routes can be added with `ip route add|change|replace|append|prepend`.
Add a test that randomly tries to add such routes, and checks that
the cache stays consistent.

https://bugzilla.redhat.com/show_bug.cgi?id=2060684
2023-01-19 11:53:08 +01:00
Thomas Haller
461279a9ce platform/tests: add nmtstp_assert_platform() helper 2023-01-19 09:15:55 +01:00
Thomas Haller
d5e33bf660 platform/tests: extend nmtstp_env1_add_test_func() to create more interfaces
nmtstp_env1_add_test_func() prepares a certain environment (with dummy
interface) that is used by some tests. Extend it, to allow creating more
than one interface (currently up to two).
2023-01-19 08:56:22 +01:00
Thomas Haller
5579fca916 platform: allow setting multi_idx instance for NMPlatform
The major point of NMDedupMultiIndex is that it can de-duplicate
the objects. It thus makes sense the everybody is using the same
instance. Make the multi-idx instance of NMPlatform configurable.

This is not used outside of unit tests, because the daemon currently
always creates one platform instance and everybody then re-uses the
instance of the platform.

While this is (currently) only used by tests, and that the performance
optimization of de-duplicating is irrelevant for tests, this is still
useful. The test can then check whether two separate NMPlatform objects
shared the same instance and whether it was de-duplicated.
2023-01-19 08:56:21 +01:00
Thomas Haller
2c22c96235 platform: add NMP_OBJECT_TYPE_NAME() macro 2023-01-19 08:56:21 +01:00
Thomas Haller
7752b2e059 platform: abort handling routes in _rtnl_handle_msg() when resync is required
There really is nothing left to do. Skip the rest and do a resync.
2023-01-19 08:56:21 +01:00
Thomas Haller
6fc0dc3fcb platform: resync route cache upon NLM_F_REPLACE flag
There really is no way around this. As we don't cache all the routes
(e.g. ignored based on rtm_protocol or rtm_type), we cannot know which
route was replaced, when we get a NLM_F_REPLACE message.

We need to request a new dump in that case, which can be expensive, if
there are a lot of routes or if replace happens frequently.

The only possible solutions would be:

1) NetworkManager caches all routes, but it also needs to make sure to
  get *everything* right. In particular, to understand every relevant
  route attribute (including those added in the future, which is
  impossible).

2) kernel provides a reasonable API (rhbz#1337855, rhbz#1337860) that
  allows to sufficiently understand what is going on based on the
  netlink notifications.
2023-01-19 08:56:21 +01:00
Thomas Haller
4ec2123aa2 platform: parse routes of any type to handle replace
When you issue

  ip route replace broadcast 1.2.3.4/32 dev eth0

then this route may well replace a (unicast) route that we have in
the cache.

Previously, we would right away ignore such messages in
_new_from_nl_route(), which means we miss the fact that a route gets
replaced.

Instead, we need to parse the message at least so far, that we can
detect and handle the replace.
2023-01-19 08:56:21 +01:00
Thomas Haller
854f2cc1fc platform: better handle ip route replace for ignored routes
We don't cache certain routes, for example based on the protocol. This is
a performance optimization to ignore routes that we usually don't care
about.

Still, if the user does `ip route replace` with such a route, then we
need to pass it to nmp_cache_update_netlink_route(), so that we can
properly remove the replaced route.

Knowing which route was replaces might be impossible, as our cache does
not contain all routes. Likely all that nmp_cache_update_netlink_route()
can to is to set "resync_required" for NLM_F_REPLACE. But for that it
should see the object first.

This also means, if we ever write a BPF filter to filter out messages
that contain NLM_F_REPLACE, because that would lead to cache inconsistencies.
2023-01-19 08:56:21 +01:00
Thomas Haller
c64053e6e6 platform: minor cleanup in nmp_cache_update_netlink_route()
It reads nicer. It will also work better with the change that follows.
2023-01-19 08:56:21 +01:00
Thomas Haller
a3cea7f6fb platform: fix nmp_lookup_init_route_by_weak_id() to honor the route-table
The route table is part of the weak-id. You can see that with:

  ip route replace unicast 1.2.3.4/32 dev eth0 table 57
  ip route replace unicast 1.2.3.4/32 dev eth0 table 58

afterwards, `ip route show table all` will list both routes. The replace
operation is only per-table. Note that NMP_CACHE_ID_TYPE_ROUTES_BY_WEAK_ID
already got this right.

Fixes: 10ac675299 ('platform: add support for routing tables to platform cache')
2023-01-19 08:56:21 +01:00
Thomas Haller
0d458dbf07 platform: avoid printing raw pointer values in log 2023-01-19 08:56:21 +01:00
Thomas Haller
c1b57a2c72 glib-aux/tests: enable TRACE logging level when debugging nmtst tests
By setting "NMTST_DEBUG" to any non-empty string, "is_debug" is enabled.
So "NMTST_DEBUG='debug,...'" is mostly redundant. Note however you can
still disable debug mode explicitly, like "NMTST_DEBUG=no-debug,...",
which can make sense if you want to set other flags but not enabling
debug mode (like "NMTST_DEBUG=no-debug,quick").

You can also explicitly set the log level ("NMTST_DEBUG='log-level=TRACE,...'"
or enable trace debugging with "NMTST_DEBUG='d,...'", where "d" (or "D")
is shorthand for "NMTST_DEBUG=log-level=TRACE,no-expect-message,...".

Anyway. If you explicitly set the log level with "log-level=" or "d",
debug logging is enabled with the "debug" flag. But it only logged at
level "<debug>". That seems not best. Instead, enable "<trace>" level by
default in debug mode.

That's useful, because there is not a clear distinction between
"<debug>" and "<trace>" level. When debugging, you really want all the
information you got, you can also filter out later (`grep` is a thing).
2023-01-19 08:56:20 +01:00
Thomas Haller
e435c1b07f glib-aux/tests: add nmtst_keeper_add() test helper 2023-01-19 08:56:20 +01:00
Thomas Haller
fe99d462ec glib-aux: rename and change nm_g_ptr_array_copy() to nm_g_ptr_array_new_clone()
There is g_ptr_array_copy() in glib, but only since 2.68 so we cannot use it.
We had a compat implementation nm_g_ptr_array_copy(), however that one always
requires an additional parameter, the free function of the new array.

g_ptr_array_copy() always does a deep clone, and uses the source array's
free function. We don't have access to the free function (seems quite a
limitation of GPtrArray API), so our nm_g_ptr_array_copy() cannot be
exactly the same.

Previously, nm_g_ptr_array_copy() aimed to be as similar as possible to
g_ptr_array_copy(), and it would require the caller that the free
function is the same as the array's. That seems an unnecessary
limitation, and our compat implementation still looks different and has
a different name. If we were able to fully re-implement it, we would
instead add it to "nm-glib.h".

Anyway. As our implementation already differs, there is no need for the
arbitrary limitation to only perform deep copies. Instead, also allow
shallow copies. Rename the function to nm_g_ptr_array_new_clone() to
make it clearly distinct from g_ptr_array_copy().
2023-01-19 08:56:20 +01:00
Thomas Haller
dabfea2fc2 curl: use CURLOPT_PROTOCOLS_STR instead of deprecated CURLOPT_PROTOCOLS
CURLOPT_PROTOCOLS [0] was deprecated in libcurl 7.85.0 with
CURLOPT_PROTOCOLS_STR [1] as a replacement.

Well, technically it was only deprecated in 7.87.0, and retroactively
marked as deprecated since 7.85.0 [2]. But CURLOPT_PROTOCOLS_STR exists
since 7.85.0, so that's what we want to use.

This causes compiler warnings and build errors:

  ../src/core/nm-connectivity.c: In function 'do_curl_request':
  ../src/core/nm-connectivity.c:770:5: error: 'CURLOPT_PROTOCOLS' is deprecated: since 7.85.0. Use CURLOPT_PROTOCOLS_STR [-Werror=deprecated-declarations]
    770 |     curl_easy_setopt(ehandle, CURLOPT_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
        |     ^~~~~~~~~~~~~~~~
  In file included from ../src/core/nm-connectivity.c:13:
  /usr/include/curl/curl.h:1749:3: note: declared here
   1749 |   CURLOPTDEPRECATED(CURLOPT_PROTOCOLS, CURLOPTTYPE_LONG, 181,
        |   ^~~~~~~~~~~~~~~~~

This patch is largely taken from systemd patch [2].

Based-on-patch-by: Frantisek Sumsal <frantisek@sumsal.cz>

[0] https://curl.se/libcurl/c/CURLOPT_PROTOCOLS.html
[1] https://curl.se/libcurl/c/CURLOPT_PROTOCOLS_STR.html
[2] 6967571bf2
[3] e61a4c0b7c

Fixes: 7a1734926a ('connectivity,cloud-setup: restrict curl protocols to HTTP and HTTPS')
2023-01-18 20:21:52 +01:00
Thomas Haller
a60476b27f core: fix enum argument in prototype of nm_utils_kill_process_sync(), etc.
This avoids a new compiler warning with gcc 13.0.0-0.9.fc38:

  ../src/core/nm-core-utils.c:482:1: error: conflicting types for 'nm_utils_kill_child_async' due to enum/integer mismatch; have 'void(pid_t,  int,  NMLogDomain,  const char *, guint32,  void (*)(pid_t,  gboolean,  int,  void *), void *)' {aka 'void(int,  int,  NMLogDomain,  const char *, unsigned int,  void (*)(int,  int,  int,  void *), void *)'} [-Werror=enum-int-mismatch]
    482 | nm_utils_kill_child_async(pid_t                   pid,
        | ^~~~~~~~~~~~~~~~~~~~~~~~~
  In file included from ../src/core/nm-core-utils.c:9:

Fixes: 067202b34e ('core: use explict NMLogDomain enum instead of int')
2023-01-18 19:38:54 +01:00
Thomas Haller
8ece80390d gitlab-ci: bump ci-templates tag to generate new container images 2023-01-18 19:38:49 +01:00
Lubomir Rintel
38d3834e2c merge: branch 'lr/nl-retry'
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1501
2023-01-17 19:25:51 +01:00
Thomas Haller
037fdcaf20 CONTRIBUTING: add hint for using cscope 2023-01-17 16:29:06 +01:00
Thomas Haller
8f13cb490b build: fix make cscope step for removed directories 2023-01-17 16:29:05 +01:00
Thomas Haller
a45029b7b0 tests: merge branch 'th/leak-test-data'
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1492
2023-01-17 16:27:41 +01:00
Thomas Haller
5d81b472dc glib-aux: use struct initialization in nm_dedup_multi_index_new() 2023-01-17 16:26:51 +01:00
Thomas Haller
f1874e6790 glib-aux/tests: avoid valgrind leak with nmtst_add_test_func()
When only running a subset of the tests (with "-p"), then valgrind
indicates a leak. Avoid that.

  $ ./tools/run-nm-test.sh -m src/core/platform/tests/test-route-linux -v
  # no leak

  $ ./tools/run-nm-test.sh -m src/core/platform/tests/test-route-linux -v -p /route/ip4
  # many leaks:
  ==1662102== 107 (96 direct, 11 indirect) bytes in 1 blocks are definitely lost in loss record 388 of 448
  ==1662102==    at 0x4848464: calloc (vg_replace_malloc.c:1340)
  ==1662102==    by 0x4F615F0: g_malloc0 (gmem.c:163)
  ==1662102==    by 0x1621A6: _nmtst_add_test_func_full (nm-test-utils.h:918)
  ==1662102==    by 0x1623EB: _nmtstp_setup_tests (test-route.c:2179)
  ==1662102==    by 0x16E53D: main (test-common.c:2693)
  ==1662102==
  {
     <insert_a_suppression_name_here>
     Memcheck:Leak
     match-leak-kinds: definite
     fun:calloc
     fun:g_malloc0
     fun:_nmtst_add_test_func_full
     fun:_nmtstp_setup_tests
     fun:main
  }
2023-01-17 16:26:51 +01:00
Thomas Haller
43860e2b74 glib-aux/tests: add mechanims to track and free test data
This allows to free resources (a pointer) at the end of the test.
The purpose is to avoid valgrind warnings about leaks. While a leak
in the test is not a severe issue by itself, it does interfere with
checking for actual leaks. Thus every leak must be avoided.
2023-01-17 16:26:50 +01:00
Thomas Haller
d0dff07687 glib-aux/tests: embed testpath in NmtstTestData struct
Only allocate one chunk of memory to contain all data of
NmtstTestData.

This isn't about performance (which doesn't matter for test code).
It's about packing all in one struct and being able to free all at
once with a simple g_free(). We no longer need _nmtst_test_data_free()
with this.

Note that NmtstTestData is never mutated, it just holds some data.
As such, the single place where such a structure gets initialized,
can become a bit more complicated, in exchange for having a trivial
free operation (and anyway there no functions that modify the data
or that would care about the data layout).
2023-01-17 16:26:15 +01:00
Thomas Haller
e4104a9f12 glib-aux/tests: use struct initialization in _nmtst_add_test_func_full()
It's nicer, and doesn't require g_malloc0() to ensure all
fields are initialized.
2023-01-17 16:25:15 +01:00
Thomas Haller
eda7d08a02 glib-aux/tests: use C99 flexible array members for pointers in NmtstTestData
It's just nicer code. The previous code was correct, in particular, the
alignment of the data was most likely correct. Still, this is nicer.
2023-01-17 16:25:15 +01:00
Thomas Haller
6850766679 glib-aux/tests: expose current test name that is running under nmtst_add_test_func_full*() 2023-01-17 16:25:15 +01:00
Thomas Haller
3cd02b6ed6 libnm,platform: fix range for "weight" property of next hops for routes
In kernel, the valid range for the weight is 1-256 (on netlink this is
expressed as u8 in rtnh_hops, ranging 0-255).

We need an additional value, to represent

- unset weight, for non-ECMP routes in kernel.

- in libnm API, to express routes that should not be merged as ECMP
  routes (the default).

Extend the type in NMPlatformIP4Route.weight to u16, and fix the code
for the special handling of the numeric range.

Also the libnm API needs to change. Modify the type of the attribute on
D-Bus from "b" to "u", to use a 32 bit integer. We use 32 bit, because
we already have common code to handle 32 bit unsigned integers, despite
only requiring 257 values. It seems better to stick to a few data types
(u32) instead of introducing more, only because the range is limited.

Co-Authored-By: Fernando Fernandez Mancera <ffmancera@riseup.net>

Fixes: 1bbdecf5e1 ('platform: manage ECMP routes')
2023-01-17 14:05:13 +01:00
Lubomir Rintel
ff472d8e59 rpm: don't BR dhcp-client
We don't need the builder to install dhcp-client to build with support for
it. Requiring it to be installed it not great because it has runtime
implications -- it installs a dispatcher script.

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1503
2023-01-17 13:56:57 +01:00
Thomas Haller
7af9562f28 device: fix available-connections for a device for user-request
There are two callers of available_connections_add(). One from
cp_connection_added_or_updated() (which is when a connection
gets added/modified) and one from nm_device_recheck_available_connections().

They both call first nm_device_check_connection_available() to see
whether the profile is available on the device. They certainly
need to pass the same check flags, otherwise a profile might
be available in some cases, and not in others.

I didn't actually test this, but I think this could result
in a profile wrongly not being listed as an available-connection.
Moreover, that might mean, that `nmcli connection up $PROFILE`
might work to find the device/profile, but `nmcli device up $DEVICE`
couldn't find the suitable profile (because the latter calls
nm_device_get_best_connection(), which iterates the
available-connections). I didn't test this, because regardless of
that, it seems obvious that the conditions for when we call
available_connections_add() must be the same from both places.
So the only question is what is the right condition, and it would
seem that _NM_DEVICE_CHECK_CON_AVAILABLE_FOR_USER_REQUEST is the right
flag.

Fixes: 02dbe670ca ('device: for available connections check whether they are available for user-request')

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1496
2023-01-17 09:34:28 +01:00
Beniamino Galvani
f930d55fea all: add support for ovs-dpdk n-rxq-desc and n-txq-desc
https://bugzilla.redhat.com/show_bug.cgi?id=2156385

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1500
2023-01-17 08:45:04 +01:00
Lubomir Rintel
b8738002ed platform: retry link change on RESULT_FAILED_RESYNC
Sometimes the buffer space of the netlink socket runs out and we lose
the response to our link change:

  <info>  [1670321010.2952] platform-linux: netlink[rtnl]: read: too many netlink events. Need to resynchronize platform cache
  <warn>  [1670321010.3467] platform-linux: do-change-link[2]: failure changing link: internal failure 3

With 3 above being WAIT_FOR_NL_RESPONSE_RESULT_FAILED_RESYNC.

Let's try harder.

https://bugzilla.redhat.com/show_bug.cgi?id=2154350
2023-01-16 12:52:40 +01:00
Thomas Haller
4721f83003 nmcli: avoid message about guessed wep-key-type when setting WEP password
$ nmcli --offline connection add type wifi con-name hotspot ssid hotspot-ssid wifi.mode ap wifi-sec.key-mgmt none wifi-sec.wep-key-type 1 wifi-sec.wep-key0 1234567890

would previously always print a message

  Info: WEP key is guessed to be of '1 (key)'

At least, when we explicitly set the key-type, this message is bogus.
Suppress it.

It's anyway questionable whether printing such warnings does anything good.
We would still get the warning with the arguments swapped, which seems wrong:

  $ nmcli --offline connection add type wifi con-name hotspot ssid hotspot-ssid wifi.mode ap wifi-sec.key-mgmt none wifi-sec.wep-key0 1234567890 wifi-sec.wep-key-type 1
  Info: WEP key is guessed to be of '1 (key)'

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1497
2023-01-16 10:28:29 +01:00
Thomas Haller
3b7e0ae083 firewall: merge branch 'th/iptables-wait'
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1182

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1495
2023-01-16 10:20:09 +01:00
Thomas Haller
84a71771d9 firewall: pass "--wait 2" to iptables to wait for concurrent invocations
iptables takes a file lock at /run/xtables.lock. By default, if
the file is locked, iptables will fail with error. When that happens,
the iptables rules won't be configured, and the shared mode
(for which we use iptables) will not be setup properly.

Instead, pass "--wait 2", to block. Yes, it's ugly that we use
blocking program invocations, but that's how it is. Also, iptables
should be fast to not be a problem in practice.
2023-01-16 10:19:39 +01:00
Thomas Haller
53422c8693 firewall: automatically add iptables path to _share_iptables_call() call
No need to redundantly specify the path. Also, next we will specify the
"--wait" option, so this will work better.
2023-01-16 10:19:34 +01:00
Lubomir Rintel
1e6fd1288d platform: log something nice about RESULT_FAILED_RESYNC
This is not nice:

  <warn>  [1670321010.3467] platform-linux: do-change-link[2]: failure changing link: internal failure 3

Let's explain what "internal failure 3" is.
2023-01-16 08:30:35 +01:00