If the 802.1X authentication fails and 802-1x.optional is set,
continue with activation. In this case, subscribe to the auth-state
supplicant property so that any dynamic IP method can be restarted
when the authentication succeeds. This is because upon authentication
the switch could have changed the VLAN we are connected to.
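Roughly, in a sketch (the callback, getter and state names are
illustrative, not the actual ones; only g_signal_connect() is real
GLib API):

    static void
    auth_state_changed (GObject *object, GParamSpec *pspec, gpointer user_data)
    {
        NMDevice *self = user_data;

        /* the switch may have moved us to a different VLAN: renew
         * DHCP / restart autoconf now that authentication succeeded */
        if (supplicant_iface_get_auth_state (object) == AUTH_STATE_SUCCESS)
            restart_dynamic_ip_config (self);
    }

    static void
    activation_continue_optional (NMDevice *self, GObject *sup_iface)
    {
        g_signal_connect (sup_iface, "notify::auth-state",
                          G_CALLBACK (auth_state_changed), self);
    }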
Refactor reading the phase2 authentication method for 802.1X.
Previously the reader only considered the first item of the
space-separated list; but since the 802.1x setting can hold distinct
phase2-auth and phase2-autheap properties - both mapped to the same
ifcfg-rh variable - we should parse the whole list. We only emit a
warning when multiple methods of the same type are found to avoid
breaking existing manually written ifcfg files.
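A sketch of the intended parsing (assuming the variable in question is
IEEE_8021X_INNER_AUTH_METHODS, and using the reader's PARSE_WARNING
macro and NM's gs_strfreev cleanup attribute):

    gs_strfreev char **tokens = g_strsplit (value, " ", -1);
    const char *auth = NULL;
    const char *autheap = NULL;
    guint i;

    for (i = 0; tokens[i]; i++) {
        if (!tokens[i][0])
            continue;
        if (g_str_has_prefix (tokens[i], "EAP-")) {
            /* tunneled EAP method: maps to phase2-autheap */
            if (autheap)
                PARSE_WARNING ("duplicate phase2 EAP method '%s'", tokens[i]);
            autheap = tokens[i];
        } else {
            /* non-EAP method: maps to phase2-auth */
            if (auth)
                PARSE_WARNING ("duplicate phase2 method '%s'", tokens[i]);
            auth = tokens[i];
        }
    }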
Moreover, the reader implemented different checks for each of the
outer tunneled methods (PEAP, TTLS and FAST); drop those checks and
accept whatever the 802.1X setting also considers valid. Note that some
combinations that are in principle valid, like PEAP + EAP-MD5, were
rejected before.
The pending action gets logged. We should not log plain pointer
values because they may be used to defeat ASLR.
Instead, construct the pending action using the "version_id". This
number is also unique and suits the purpose well. With debug logging
you can still grep the log for the corresponding active-connection (and
anyway it's obvious from the context).
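Sketched (the exact format string and the version-id getter are
illustrative; nm_device_add_pending_action() is the internal helper):

    char buf[64];

    g_snprintf (buf, sizeof (buf),
                "activation::%" G_GUINT64_FORMAT,
                active_connection_get_version_id (ac));
    nm_device_add_pending_action (device, buf, TRUE);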
These "pending-actions" only have one purpose: to mark the device
as busy and thereby delay "startup complete" to be reached. That
in turn delays "NetworkManager-wait-online" service.
Of course, "NetworkManager-wait-online" waits for some form of readiness
and is not extensively configurable (e.g. you cannot exclude devices from
being waited for). However, the intent is to wait until all devices have
"settled". That means, among other things, that the timeouts waiting for
carrier and Wi-Fi scan results have passed, and that devices either have
no connection profile to autoactivate, or they autoactivated a profile
and are in state "connected".
A major point here is that the device is considered ready once it
reaches the state "connected". Note that if you configure both IPv4 and
IPv6 addressing modes, then "ipv4.may-fail=yes" and "ipv6.may-fail=yes"
mean that the device is considered fully activated once one address
family completes. Again, this is not very configurable, but by setting
"ipv6.may-fail=no" you can require that the device indeed completes
IPv6 addressing.
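With libnm, that requirement can be set like this (assuming
"connection" is an existing NMConnection that has an IPv6 setting):

    NMSettingIPConfig *s_ip6;

    /* require IPv6 addressing to complete before the device is
     * considered fully activated */
    s_ip6 = nm_connection_get_setting_ip6_config (connection);
    g_object_set (s_ip6,
                  NM_SETTING_IP_CONFIG_MAY_FAIL, FALSE,
                  NULL);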
Now, the determining factor for declaring "startup complete" is whether the
device is in state "connected". That may or may not mean that DHCPv4,
autoconf or DHCPv6 completed, as it depends on the overall state of the
device. So it is wrong to have distinct pending actions for these operations.
Remove them.
This fixes cases where we would wrongly wait too long before declaring
startup complete. But it is also a change in behavior.
The default timeout is really only there to abort the test when it has
definitely failed, to avoid it hanging forever. It's not part
of any strict assertion and should be large.
Fixes: bb4b749595 ('clients/tests: don't wait for first job before scheduling parallel jobs')
Previously, the test would kick off 15 processes in parallel, but
the first job in the queue would block more processes from being
started.
That is, async_start() would only start 15 processes, but since none of
them were reaped before async_wait() was called, no more than 15 jobs
were running during the start phase. That is not a real issue, because
the start phase is non-blocking and queues all the jobs quickly; it's
not really expected that many processes would already have completed in
that time. Anyway, this was a bit ugly.
The bigger problem is that async_wait() would always block for the
first job to complete, before starting more processes. That means,
if the first job in the queue takes unusually long, then this blocks
other processes from getting reaped and new processes from being
started.
Instead, don't block on a single job; poll the running jobs in turn for
a short amount of time. Whichever process exits first gets completed
and more jobs are started.
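In a rough C sketch of the scheme (the tests themselves are Python;
names are illustrative):

    #include <sys/wait.h>
    #include <unistd.h>

    /* Poll each running child with WNOHANG instead of blocking on the
     * oldest one. Returns the index of whichever job exits first, so
     * its slot can be reused for the next queued job. */
    static int
    reap_any_job (pid_t *jobs, int n_jobs)
    {
        for (;;) {
            for (int i = 0; i < n_jobs; i++) {
                int status;

                if (jobs[i] > 0
                    && waitpid (jobs[i], &status, WNOHANG) == jobs[i]) {
                    jobs[i] = 0;
                    return i;
                }
            }
            usleep (20 * 1000); /* short pause between polling rounds */
        }
    }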
In fact, in the current setup it's hard to notice any difference,
because all nmcli invocations take about the same time and are
relatively fast. That this approach parallelizes better can be seen
when the runtime of jobs varies more strongly (and some invocations
take notably longer). As we later want to run nmcli under valgrind,
this will probably make a difference.
An alternative would be not to poll()/wait() for child processes,
but somehow get notified. For example, we could use a GMainContext
and watch child processes. But that's probably more complicated
to do, so let's keep the naive approach with polling.
We try to set the MTU from the connection only once, to avoid
interfering with manual user changes.
If at some point the parent interface temporarily changes its MTU to a
lower value (for example, because the connection was reactivated), the
kernel will also lower the MTU on the child interface and we will never
update it again.
Add a workaround for this: if we detect that the MTU we want to set
from the connection is higher than the allowed one, go into a state
where we follow the parent MTU until it is possible to set the desired
MTU again. This is a bit ugly, but I can't think of any nicer way to do it.
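Roughly (state names are illustrative):

    if (mtu_desired > parent_mtu_max) {
        /* can't apply the configured MTU now: follow the parent's MTU
         * and retry the configured value once it becomes settable */
        priv->mtu_source = MTU_SOURCE_PARENT;
        mtu_to_set = parent_mtu_max;
    } else {
        priv->mtu_source = MTU_SOURCE_CONNECTION;
        mtu_to_set = mtu_desired;
    }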
https://bugzilla.redhat.com/show_bug.cgi?id=1751079
A MACsec connection doesn't have an ordering dependency on its
parent connection, so it's possible that the parent gets activated
later and sets a greater MTU than the original one.
It is reasonable and useful to keep the MACsec MTU configured by
default as the maximum allowed by the parent interface, that is the
parent MTU minus the encapsulation overhead (32). The user can of
course override this by setting an explicit value in the
connection. We already do something similar for VLANs.
https://bugzilla.redhat.com/show_bug.cgi?id=1723690
Introduce a generic function to set an MTU based on the parent's. Also
define a device-specific @mtu_parent_delta value that specifies the
difference from the parent MTU that should be set by default. For VLANs
it is zero, but other interface types (for example MACsec) require a
positive value due to encapsulation overhead.
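A sketch of the helper (names are illustrative):

    /* The default MTU follows the parent, minus a per-device-type
     * delta: 0 for VLAN, 32 for MACsec's encapsulation overhead. */
    static guint32
    mtu_default_from_parent (guint32 parent_mtu, guint32 mtu_parent_delta)
    {
        if (parent_mtu <= mtu_parent_delta)
            return 0; /* parent too small, no usable default */
        return parent_mtu - mtu_parent_delta;
    }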
Since commit 159ff23268 ('dhcp/dhclient-utils: skip over
dhclient.conf blocks') we skip blocks enclosed between lines containing
'{' and '}', because NM should ignore 'lease', 'alias' and other
declarations. However, conditional statements seem useful and should
not be skipped.
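The adjusted check could look roughly like this (a sketch, not the
actual code):

    /* a '{' still opens a block to skip (e.g. "lease {", "alias {"),
     * unless the line opens a conditional, whose statements we keep */
    static gboolean
    line_opens_skipped_block (const char *line)
    {
        if (!strchr (line, '{'))
            return FALSE;
        if (g_str_has_prefix (line, "if ")
            || g_str_has_prefix (line, "else")
            || g_str_has_prefix (line, "elsif"))
            return FALSE;
        return TRUE;
    }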
https://bugzilla.redhat.com/show_bug.cgi?id=1758550
PMF can be used with SAE, so allow it. Actually, it is required
according to the WPA3 specification, but there are implementations that
don't require it (hostapd can be configured in such a way); so let's
not make it mandatory for WPA3.
Fixes: 6640fb4b36 ('supplicant: add support for SAE key management')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/257
The script is run for every dispatcher event. Most of the events are not
actually relevant, and we just need to determine that there is nothing
to do and quit.
Avoid calling "dirname" and "basename".
The supported ifcfg-file has a very specific form. We can just check
(and parse) it with a single regular expression in bash.
A shell script is executed line-by-line. Note that for most dispatcher
events, "10-ifcfg-rh-routes.sh" has nothing to do and will just quit.
Move those checks earlier, to avoid bash executing code that is not
needed most of the time.
Found by cppcheck:
[src/devices/nm-device.c:3032] -> [src/devices/nm-device.c:3025]: (warning) Either the condition '!handle' is redundant or there is possible null pointer dereference: handle.
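The pattern it complains about, reduced:

    handle->field = 1;   /* possible NULL dereference here ...      */
    if (!handle)         /* ... if this check can ever be true, and */
        return;          /* a redundant check otherwise             */

Either the dereference must move after the check, or the check can go.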
https://github.com/NetworkManager/NetworkManager/pull/352
The previous commit marks all synchronous libnm API as deprecated.
In practice, the macro _NM_DEPRECATED_SYNC_METHOD expands to
nothing, because there is no immediate urgency to force users
to migrate.
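That is, roughly:

    /* currently a no-op: the deprecation is declared but not yet
     * enforced; a later release could expand this to G_DEPRECATED to
     * emit compiler warnings */
    #define _NM_DEPRECATED_SYNC_METHOD /* nothing */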
However nm_client_check_connectivity() is especially bad: it
makes a synchronous call and then updates the content of the
cache artificially. Usually, NMClient's cache of D-Bus objects
is only updated by "PropertiesChanged" D-Bus signals.
nm_client_check_connectivity() instead will act on the response to
the "CheckConnectivity" D-Bus call -- a response that is picked
out of order from the ordered sequence of messages -- and will
update the cache instead of honoring the usual "PropertiesChanged"
signal.
I think such behavior is fundamentally broken. For a trivial property like
NM_CLIENT_CONNECTIVITY such behavior is odd at best. Note how applying
this approach to other functions (like nm_client_deactivate_connection(),
which would affect a much larger state) would not be feasible.
I also imagine it would be complicated to preserve this behavior when
reworking libnm, as I plan to do.
See also commit b799de281b ('libnm: update property in the manager
after connectivity check'), which introduced this behavior to "fix"
bgo#784629.
Note that D-Bus is fundamentally asynchronous. Doing blocking calls
on top of D-Bus is odd, especially for libnm's NMClient. That is because
NMClient essentially is a client-side cache of the objects from the D-Bus
interface. This cache should be filled exclusively by (asynchronous) D-Bus
events (PropertiesChanged). So, making a blocking D-Bus call means to wait
for a response and return it, while queuing all messages that are received
in the meantime.
Basically, there are three ways a synchronous API on NMClient could behave:
1) the call just calls g_dbus_connection_call_sync(). This means
that libnm sends a D-Bus request via GDBusConnection and blockingly
waits for the response. All D-Bus messages received in the
meantime are queued in the GMainContext that belongs to NMClient.
That means none of these D-Bus events are processed until we
iterate the GMainContext after the call returns. The effect is
that NMClient (and all the objects cached in it) is unaffected by
the D-Bus request (see the sketch after this list).
Most of the synchronous API calls in libnm are of this kind.
The problem is that the strict ordering of D-Bus events gets
violated.
For some API this is not an immediate problem. Take for example
nm_device_wifi_request_scan(). The call merely blockingly tells
NetworkManager to start scanning, but since NetworkManager's D-Bus
API does not directly expose any state that tells whether we are
currently scanning, this out-of-order processing of the D-Bus
request is only a small issue.
The problem is more obvious for nm_client_networking_set_enabled().
After calling it, NM_CLIENT_NETWORKING_ENABLED is still unaffected
and unchanged, because the PropertiesChanged signal from D-Bus
is not yet processed.
This means, while you make such a blocking call, NMClient's state
does not change. But usually you perform the synchronous call
to change some state. In this form, the blocking call is not useful,
because NMClient only changes the state after iterating the GMainContext,
and not after the blocking call returns.
2) like 1), but after making the blocking g_dbus_connection_call_sync(),
update the NMClient cache artificially. This is what
nm_client_check_connectivity() does, to "fix" bgo#784629.
This also has the problem of out-of-order events, but it kinda
solves the problem of not changing the state during the blocking
call. But it does so by hacking the state of the cache. I think
this is really wrong, because the state should only be updated from
the ordered stream of D-Bus messages (PropertiesChanged signals and
similar). When libnm decides to modify the state, there may already be
D-Bus messages queued that affect this very state.
3) instead of calling g_dbus_connection_call_sync(), use the
asynchronous g_dbus_connection_call(). If we used a separate
GMainContext for all D-Bus related calls, we could ensure that
while we block for the response, we iterate that internal main context.
This might be nice, because all events are processed in order, and
after the blocking call returns, the NMClient state is up to date.
There are problems, however: the current blocking API does not do this,
so it would be a significant change in behavior. Also, it might be
unexpected to the user that during the blocking call the entire
content of NMClient's cache might change and all pointers into the
cache might be invalidated. Also, of course, NMClient would invoke
signals for all the changes that happen.
Another problem is that this would be more effort to implement,
and it involves a small performance overhead for all D-Bus related
calls (because we have to serialize all events in an internal
GMainContext first and then invoke them on the caller's context).
Also, if users want this behavior, they can implement it themselves
by running libnm in their own GMainContext. Note that libnm might
have bugs that prevent this from working properly, but those should be
fixed instead of adding such synchronous API behavior.
Read also [1], for why blocking calls are wrong.
[1] https://smcv.pseudorandom.co.uk/2008/11/nonblocking/
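For concreteness, behavior 1) boils down to something like this
("CheckConnectivity" replies with a single uint32, the connectivity
state):

    GVariant *ret;
    GError *error = NULL;

    ret = g_dbus_connection_call_sync (dbus_connection,
                                       "org.freedesktop.NetworkManager",
                                       "/org/freedesktop/NetworkManager",
                                       "org.freedesktop.NetworkManager",
                                       "CheckConnectivity",
                                       NULL,
                                       G_VARIANT_TYPE ("(u)"),
                                       G_DBUS_CALL_FLAGS_NONE,
                                       -1,
                                       NULL,
                                       &error);
    /* any PropertiesChanged signals received while blocked are still
     * queued, unprocessed, in NMClient's GMainContext */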
So, all possible behaviors for synchronous API have severe behavioral
issues. Mark all this API as deprecated. This also serves the purpose
of identifying blocking D-Bus calls in libnm.
Note that "deprecated" here does not really mean that the API is going
to be removed. We don't break API. The user may:
- continue to use this API. It's deprecated, awkward and discouraged,
but if it works, by all means use it.
- use asynchronous API. That's the only sensible way to use D-Bus.
If libnm lacks a certain asynchronous counterpart, it should be
added.
- use GDBusConnection directly. There really isn't anything wrong
with D-Bus or GDBusConnection. This deprecated API is just a wrapper
around g_dbus_connection_call_sync(). You may call it directly
without feeling dirty.
---
The only other remaining API is the synchronous GInitable call for
NMClient. That is an entirely separate beast and not particularly
wrong (from an API point of view).
Note that the synchronous API in NMSecretAgentOld, NMVpnPluginOld and
NMVpnServicePlugin is not deprecated here. These types are not part
of the D-Bus cache, and while they have similar issues, those are less
severe because they have less state.
When opening a merge request from a fork of NetworkManager, the
pipeline runs with a checkout of the fork. That means checkpatch
would compare the branch against "master" (or the "nm-x-y" stable
branches) of the fork, instead of upstream.
That doesn't seem too useful. Instead, also add upstream NetworkManager
as a git remote, fetch its branches, and use the branches from there as
the base for checkpatch.
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/issues/255
The Bluetooth DUN device's NMModem would signal the reset of ifindex to zero
when it's disconnected and the NMDeviceBt would accordingly update the
bluetooth device's ip ifindex. This is not okay since commit ab4578302d
('device: refactor nm_device_set_ip_ifindex() and set_ip_iface()') which,
although claiming to be a refactoring, made such use of
nm_device_set_ip_ifindex() illegal. Resetting the ifindex is anyway not
necessary, since it's taken care of by _cleanup_generic_post().
Let's leave the ifindex alone once the device is activated, in a manner
analogous to what NMDeviceModem does.
Fixes: ab4578302d ('device: refactor nm_device_set_ip_ifindex() and set_ip_iface()')
Fixes: 78ca2a70c7 ('device: don't set invalid ip-iface')
Otherwise, the script tries to run
dbus-run-session -- exec ...
which fails (because `exec` is a shell command, not a program).
After the failure, the code falls through to run the test under
valgrind.
Fixes: 6a58c55ca4 ('run-nm-test: Just use exec instead of running and exiting')
If a new command was requested while a client was in the process of
being created, we would just request a new client. This was causing a
leak, so let's strongly ensure this is not the case.
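A sketch of the guard (field names are illustrative):

    /* if a client already exists or its creation is pending, don't
     * start a second creation; the new command will be served once
     * the pending creation completes */
    if (nmc->client || nmc->client_init_pending)
        return;
    nmc->client_init_pending = TRUE;
    nm_client_new_async (NULL, client_ready_cb, nmc);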