- "priv->nlh" to "priv->sk_rtnl": as we also have a genl socket,
"nlh" is not a good name. The point is that this is rtnetlink.
Also, "h" sounds like a handle, that is, a file descriptor.
Make this clearer with a "sk_" prefix.
- "priv->genl" to "priv->sk_genl_sync": This socket is only used for synchronous
operations, that is, it is passed to various independent components that use
it to send a request and wait for the response (while consuming all messages).
We will have a use for a second socket, hence the "_sync" part.
The "sk_" prefix is for consistency with "sk_rtnl".
- "priv->event_source" to "priv->rtnl_event_source". Just make it
clearer that this is for the rtnetlink socket. In any case,
this field is hardly used at all, so it can afford a longer name.
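A trimmed-down sketch of how the renamed fields might read. The types `struct nl_sock`, `struct event_source` and `PlatformPrivate` are hypothetical stand-ins so the snippet compiles on its own. The real private struct has more fields and different types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real socket and event-source types. */
struct nl_sock {
    int fd;
};
struct event_source {
    int id;
};

/* Sketch of the private data after the rename (not the real struct). */
typedef struct {
    /* rtnetlink (NETLINK_ROUTE) socket; formerly "nlh". */
    struct nl_sock *sk_rtnl;

    /* Generic netlink socket, only for synchronous request/response; formerly "genl". */
    struct nl_sock *sk_genl_sync;

    /* Event source watching sk_rtnl; formerly "event_source". */
    struct event_source *rtnl_event_source;
} PlatformPrivate;

/* The "sk_" prefix marks a socket object, not a raw file descriptor. */
static int
platform_rtnl_fd(const PlatformPrivate *priv)
{
    assert(priv && priv->sk_rtnl);
    return priv->sk_rtnl->fd;
}
```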
Sockets are really a fundamental thing we require to operate.
We cannot meaningfully operate if we fail to create them.
That is also why a too low file descriptor limit is fatal
and unsupported. This is similar to out-of-memory situations.
Just require that we are always able to create the generic
netlink socket.
There are only two callers of nl_socket_new(). One for NETLINK_GENERIC
and one for NETLINK_ROUTE.
We already were enabling ext-ack for the rtnetlink socket. Also enable
it for the genl socket.
Do that by moving this inside nl_socket_new(). I cannot imagine a
case where we don't want this.
Create and use new nl_socket_new().
nl_socket_alloc() really does nothing but allocate the struct and
initializing the fd to -1. In all cases, we want to call nl_connect()
right after.
Combine the two. Then we also cannot have a "struct nl_sock" without a
valid fd. This means several error checks can be dropped.
Note that the former nl_connect() did several things at once. Maybe, for
more flexibility, one would need to tweak what is done there.
For now that is not necessary. In any case, if we need more flexibility,
then we would control what nl_connect() (now nl_socket_new()) does, and not
the split between nl_socket_alloc() and nl_connect().
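A minimal sketch of a combined constructor that creates and binds the socket and enables extended ack in one step. This is not the actual NetworkManager code. The `protocol`-only signature and the choice to ignore a failing `NETLINK_EXT_ACK` setsockopt (older kernels) are assumptions. The point is the invariant: either a fully usable socket is returned, or NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

struct nl_sock {
    /* Always a valid, bound netlink socket: there is no "not connected" state. */
    int fd;
};

/* Hypothetical nl_socket_new(): returns a ready socket, or NULL with errno set. */
static struct nl_sock *
nl_socket_new(int protocol)
{
    struct sockaddr_nl local = { .nl_family = AF_NETLINK };
    struct nl_sock *sk;
    int errsv;
    int fd;

    assert(protocol == NETLINK_ROUTE || protocol == NETLINK_GENERIC);

    fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, protocol);
    if (fd < 0)
        return NULL;

#ifdef NETLINK_EXT_ACK
    {
        /* Enable extended ack for every socket; this sketch ignores failure. */
        const int one = 1;
        (void) setsockopt(fd, SOL_NETLINK, NETLINK_EXT_ACK, &one, sizeof(one));
    }
#endif

    /* nl_pid 0: let the kernel assign the port id. */
    if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0)
        goto fail;

    sk = malloc(sizeof(*sk));
    if (!sk) {
        errno = ENOMEM;
        goto fail;
    }
    sk->fd = fd;
    return sk;

fail:
    errsv = errno;
    close(fd);
    errno = errsv;
    return NULL;
}

static void
nl_socket_free(struct nl_sock *sk)
{
    if (sk) {
        close(sk->fd);
        free(sk);
    }
}
```

Because a `struct nl_sock` can only come out of `nl_socket_new()`, code holding one no longer needs to check `fd >= 0`.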
Comments on the same line as field names are not rendered well by clang-format.
Even if manually edited, it seems not a preferable way to comment on a field.
Move the comment to the line before.
The property wait-activation-delay will delay the activation of an
interface by the specified number of milliseconds. Note that the
activation could be delayed by a few more milliseconds due to other
events in NetworkManager.
This could be used in multiple scenarios where the user needs to define
an arbitrary delay, e.g. an LACP bond configuration where the LACP
negotiation takes a few seconds and traffic is not allowed, so users would
like to use nm-online and a setting configured with this new property to
wait some seconds. Then, when nm-online finishes, the LACP bond should be
ready to receive traffic.
The delay will happen right before the device is ready to be activated.
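A sketch of the "not before" check this implies, assuming monotonic millisecond timestamps and that 0 means "no delay". The function name and parameters are hypothetical. The check only guarantees a lower bound: activation continues once it returns true, which may be later if other events are processed first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical gate evaluated where the device would otherwise proceed to
 * activation. ready_since_msec: when the device became ready to activate. */
static bool
activation_delay_elapsed(int64_t ready_since_msec, int64_t now_msec, uint32_t wait_activation_delay_msec)
{
    assert(now_msec >= ready_since_msec);
    return (now_msec - ready_since_msec) >= (int64_t) wait_activation_delay_msec;
}
```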
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1248
https://bugzilla.redhat.com/show_bug.cgi?id=2008337
This is an interface to the Checkpoint/Restore functionality that has been
available for quite some time. It runs a command with a checkpoint taken
and rolls back unless success is confirmed before the checkpoint times
out:
$ nmcli dev checkpoint eth0 -- nmcli dev dis eth0
Device 'eth0' successfully disconnected.
Type "Yes" to commit the changes: No
Checkpoint was removed.
The details about how it's used are documented in nmcli(1) and
nmcli-examples(7).
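The commit-or-rollback decision described above, reduced to a sketch. It assumes an exact, case-sensitive "Yes" is required (as the prompt suggests) and that EOF or an expired checkpoint means rollback. The enum and function are hypothetical, not nmcli's code:

```c
#include <stdbool.h>
#include <string.h>

typedef enum {
    CHECKPOINT_ROLLBACK,
    CHECKPOINT_COMMIT,
} CheckpointOutcome;

/* answer: the line the user typed, or NULL on EOF.
 * timed_out: whether the checkpoint expired before the user answered. */
static CheckpointOutcome
checkpoint_decide(const char *answer, bool timed_out)
{
    if (timed_out || !answer)
        return CHECKPOINT_ROLLBACK;
    return strcmp(answer, "Yes") == 0 ? CHECKPOINT_COMMIT : CHECKPOINT_ROLLBACK;
}
```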
When the input ends, we indeed eventually want to shut down.
Nevertheless, it might be that we terminated the input *because* we're
already shutting down and want to do our cleanup. Let's not take the
shortcut to nmc_exit() in case the main loop is no longer running.
This doesn't affect existing uses of nmc_readline(), but will be useful
in a future patch.
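A sketch of the EOF branch. The globals stand in for the main loop state and for nmc_exit() and are hypothetical. The exit shortcut is only taken while the loop still runs; otherwise the caller just gets "no input" and shutdown proceeds on its own.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for "is the main loop running" and nmc_exit(). */
static bool main_loop_running = true;
static int exit_requests = 0;

static void
request_exit(void)
{
    exit_requests++;
}

/* EOF handling in a readline wrapper: if the loop already stopped, we are
 * shutting down, so let the cleanup run instead of exiting again. */
static char *
handle_readline_eof(void)
{
    if (main_loop_running)
        request_exit();
    return NULL;
}
```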
This makes get_device_list() return an array of NMDevices with a
reference taken and a destroy notifier that unhooks disconnect_state_cb,
so that it could replace the GSList of the same utility used by
disconnect/delete commands.
Suggested-by: Thomas Haller <thaller@redhat.com>
A pointer array is slightly more efficient here, since we don't really
need the ability to insert elements in the middle. In fact, we'd prefer
if we could just add to the end, so that we'd spare some callers from a
need to do a g_slist_reverse().
Even though that alone would be a good reason to use a GPtrArray instead of
a GSList, I'm doing this so that I can actually use the returned value
as-is in a call to nm_client_checkpoint_create() in a future patch.
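To illustrate why an append-only pointer array fits, here is a minimal stand-in for a GPtrArray with a free function (roughly what `g_ptr_array_new_with_free_func()` provides). `PtrArray` and its helpers are hypothetical. In nmcli the free function would drop the device reference and disconnect the signal handler.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*DestroyNotify)(void *data);

/* Append-only pointer array: insertion order is preserved, so callers need
 * no reversal step (unlike prepending to a singly linked list). */
typedef struct {
    void **pdata;
    size_t len;
    size_t alloc;
    DestroyNotify free_func;
} PtrArray;

static PtrArray *
ptr_array_new(DestroyNotify free_func)
{
    PtrArray *arr = calloc(1, sizeof(*arr));

    if (arr)
        arr->free_func = free_func;
    return arr;
}

static int
ptr_array_add(PtrArray *arr, void *data)
{
    assert(arr);
    if (arr->len == arr->alloc) {
        size_t n = arr->alloc ? arr->alloc * 2 : 8;
        void **p = realloc(arr->pdata, n * sizeof(*p));

        if (!p)
            return -1;
        arr->pdata = p;
        arr->alloc = n;
    }
    arr->pdata[arr->len++] = data;
    return 0;
}

/* Frees the array, calling free_func on every element. */
static void
ptr_array_free(PtrArray *arr)
{
    if (!arr)
        return;
    if (arr->free_func) {
        for (size_t i = 0; i < arr->len; i++)
            arr->free_func(arr->pdata[i]);
    }
    free(arr->pdata);
    free(arr);
}
```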
Don't consider "--" a device name. Instead, treat it as a signal to stop
reading the device list.
If a caller expects nothing beyond the device names, it now has to
check that no arguments remain.
Prior to this patch, get_device_list() would give the caller no clue
about how many options it consumed. That was okay -- it would always
process all arguments until the end, so no caller would really care.
In a further patch, I'd like to allow termination of the device name
list (with a "--" argument), so it will be possible to specify further
arguments.
Let's change the prototype of this routine to use pointers to argc/argv,
so that it will be possible to adjust them.
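A sketch of the argument handling the last two changes describe. It takes argc/argv by pointer, stops at (and consumes) "--", and leaves the rest for the caller. The function name and the fixed-size `names` output are hypothetical simplifications of get_device_list():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Collects device names until the arguments end or "--" is seen. On return,
 * *argc/*argv describe what is left, so the caller can check for (or use)
 * anything after the device list. Returns the number of names collected. */
static size_t
collect_device_names(int *argc, char ***argv, const char **names, size_t max_names)
{
    size_t n = 0;

    assert(argc && argv && *argc >= 0);

    while (*argc > 0) {
        const char *arg = (*argv)[0];

        if (strcmp(arg, "--") == 0) {
            /* Consume the terminator itself, then stop. */
            (*argc)--;
            (*argv)++;
            break;
        }
        if (n == max_names)
            break; /* Out of room: leave the rest for the caller. */
        names[n++] = arg;
        (*argc)--;
        (*argv)++;
    }
    return n;
}
```

For `eth0 eth1 -- nmcli dev dis eth0`, this yields two names and leaves `nmcli dev dis eth0` in argc/argv.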
When we're deactivating an externally created device that has a master
because we're activating a connection on it, actually remove the device
from the master. Otherwise unpleasant things happen:
active-connection[0x55ed7ba78400]: constructed (NMActRequest, version-id 4, type managed)
device[0a458361f9fed8f5] (dummy0): sys-iface-state: external -> managed
device[0a458361f9fed8f5] (dummy0): queue activation request waiting for currently active connection to disconnect
device (dummy0): disconnecting for new activation request.
device (dummy0): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (enslaved)(no-config)
Note the "no-config" above. We set priv->master = NULL, but didn't
communicate the change to the platform. I believe this is not good.
This patch changes that.
device (br0): bridge port dummy0 was detached
device (dummy0): released from master device br0
active-connection[0x55ed7ba782e0]: set state deactivating (was activated)
device (dummy0): ip4: set state none (was done, reason: ip-state-clear)
device (dummy0): ip6: set state none (was done, reason: ip-state-clear)
device (dummy0): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
platform: (dummy0) emit signal link-changed changed: 102: dummy0
<NOARP,UP,LOWER_UP;broadcast,noarp,up,running,lowerup> mtu 1500 master 101 arp 1 dummy* init
addrgenmode none addr EA:8D:DD:DF:1F:B7 brd FF:FF:FF:FF:FF:FF driver dummy rx:0,0 tx:39,4746
Now the platform sent us a new link, the "master" property is still set.
device[0a458361f9fed8f5] (dummy0): queued link change for ifindex 102
device[0a458361f9fed8f5] (dummy0): deactivating device (reason 'new-activation') [60]
device (dummy0): ip: set (combined) state none (was done, reason: ip-state-clear)
config: device-state: write #102 (/run/NetworkManager/devices/102); managed=managed, perm-hw-addr-fake=EA:8D:DD:DF:1F:B7, route-metric-default=0-0
active-connection[0x55ed7ba782e0]: set state deactivated (was deactivating)
active-connection[0x55ed7ba782e0]: check-master-ready: already signalled (state deactivated, master 0x55ed7ba781c0 is in state activated)
device (dummy0): Activation: starting connection 'dummy1' (ec6fca51-84e6-4a5b-a297-f602252c9f69)
device[0a458361f9fed8f5] (dummy0): activation-stage: schedule activate_stage1_device_prepare
l3cfg[ae290b5c1f585d6c,ifindex=102]: emit signal (platform-change-on-idle, obj-type-flags=0x2a)
device (br0): master: add one slave 0a458361f9fed8f5/dummy0
Amidst the new activation we're processing the netlink message we got.
We set priv->master back, effectively nullifying the release above. Sad.
device (dummy0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'in-state-change'
active-connection[0x55ed7ba78400]: set state activating (was unknown)
manager: NetworkManager state is now CONNECTING
active-connection[0x55ed7ba78400]: check-master-ready: not signalling (state activating, no master)
device[8fff58d61c7686ce] (br0): slave dummy0 state change 30 (disconnected) -> 40 (prepare)
device[0a458361f9fed8f5] (dummy0): remove_pending_action (1): 'in-state-change'
device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (not enslaved) (force-configure)
platform: (dummy0) link: releasing 102 from master 'br0' (101)
device (br0): detached bridge port dummy0
Now things go south. The stage1 cleans the device up, removing it from
the master, and the device itself decides it should deactivate itself
because it lost its master, regardless of the fact that it should not
have one; it is in fact an unwanted carryover from the previous activation.
I believe this is also wrong.
device[0a458361f9fed8f5] (dummy0): Activation: connection 'dummy1' master deactivated
device (dummy0): ip4: set state none (was pending, reason: ip-state-clear)
device (dummy0): ip6: set state none (was pending, reason: ip-state-clear)
device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'queued-state-change-deactivating'
device[0a458361f9fed8f5] (dummy0): queue-state[deactivating, reason:connection-assumed, id:298]: queue state change
device[0a458361f9fed8f5] (dummy0): activation-stage: synchronously invoke activate_stage2_device_config
device (dummy0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Now things are really weird. We synchronously go to config, effectively
overriding the queued deactivation. We've really messed up.
Sometimes weird things happen.
Let dummy0 be an externally created device that has a master. We decide
to activate a connection that has no master on it:
active-connection[0x55ed7ba78400]: constructed (NMActRequest, version-id 4, type managed)
device[0a458361f9fed8f5] (dummy0): sys-iface-state: external -> managed
device[0a458361f9fed8f5] (dummy0): queue activation request waiting for currently active connection to disconnect
device (dummy0): disconnecting for new activation request.
device (dummy0): state change: activated -> deactivating (reason 'new-activation', sys-iface-state: 'managed')
device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (enslaved)(no-config)
Note the "no-config" above. We set priv->master = NULL, but didn't
communicate the change to the platform. I believe this is not good.
device (br0): bridge port dummy0 was detached
device (dummy0): released from master device br0
active-connection[0x55ed7ba782e0]: set state deactivating (was activated)
device (dummy0): ip4: set state none (was done, reason: ip-state-clear)
device (dummy0): ip6: set state none (was done, reason: ip-state-clear)
device (dummy0): state change: deactivating -> disconnected (reason 'new-activation', sys-iface-state: 'managed')
platform: (dummy0) emit signal link-changed changed: 102: dummy0
<NOARP,UP,LOWER_UP;broadcast,noarp,up,running,lowerup> mtu 1500 master 101 arp 1 dummy* init
addrgenmode none addr EA:8D:DD:DF:1F:B7 brd FF:FF:FF:FF:FF:FF driver dummy rx:0,0 tx:39,4746
Now the platform sent us a new link, the "master" property is still set.
device[0a458361f9fed8f5] (dummy0): queued link change for ifindex 102
device[0a458361f9fed8f5] (dummy0): deactivating device (reason 'new-activation') [60]
device (dummy0): ip: set (combined) state none (was done, reason: ip-state-clear)
config: device-state: write #102 (/run/NetworkManager/devices/102); managed=managed, perm-hw-addr-fake=EA:8D:DD:DF:1F:B7, route-metric-default=0-0
active-connection[0x55ed7ba782e0]: set state deactivated (was deactivating)
active-connection[0x55ed7ba782e0]: check-master-ready: already signalled (state deactivated, master 0x55ed7ba781c0 is in state activated)
device (dummy0): Activation: starting connection 'dummy1' (ec6fca51-84e6-4a5b-a297-f602252c9f69)
device[0a458361f9fed8f5] (dummy0): activation-stage: schedule activate_stage1_device_prepare
l3cfg[ae290b5c1f585d6c,ifindex=102]: emit signal (platform-change-on-idle, obj-type-flags=0x2a)
device (br0): master: add one slave 0a458361f9fed8f5/dummy0
Amidst the new activation we're processing the netlink message we got.
We set priv->master back, effectively nullifying the release above.
device (dummy0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'in-state-change'
active-connection[0x55ed7ba78400]: set state activating (was unknown)
manager: NetworkManager state is now CONNECTING
active-connection[0x55ed7ba78400]: check-master-ready: not signalling (state activating, no master)
device[8fff58d61c7686ce] (br0): slave dummy0 state change 30 (disconnected) -> 40 (prepare)
device[0a458361f9fed8f5] (dummy0): remove_pending_action (1): 'in-state-change'
device (br0): master: release one slave 0a458361f9fed8f5/dummy0 (not enslaved) (force-configure)
platform: (dummy0) link: releasing 102 from master 'br0' (101)
device (br0): detached bridge port dummy0
Now stage1 cleans the device up, removing it from the master.
device[0a458361f9fed8f5] (dummy0): Activation: connection 'dummy1' master deactivated
device (dummy0): ip4: set state none (was pending, reason: ip-state-clear)
device (dummy0): ip6: set state none (was pending, reason: ip-state-clear)
device[0a458361f9fed8f5] (dummy0): add_pending_action (2): 'queued-state-change-deactivating'
We decide to deal with this by enqueuing a deactivation. That is not
great -- we shouldn't even have had this master!
This patch takes the deactivation path only if we were willingly
enslaved to the master in question.
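A reduced sketch of that rule. The `Device` struct and its `master_requested` flag are hypothetical: they stand for "we attached to this master because our own activation asked for it". A leftover master, such as the one inherited from the external configuration in the log above, is dropped without queuing a deactivation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Device Device;
struct Device {
    /* Master as currently tracked. */
    Device *master;
    /* True only if our activation requested this master. */
    bool master_requested;
};

/* Called when the device loses its master during activation.
 * Returns true if a deactivation should be queued. */
static bool
master_lost_should_deactivate(Device *self, const Device *lost_master)
{
    bool was_requested;

    assert(self && lost_master);
    if (self->master != lost_master)
        return false;
    was_requested = self->master_requested;
    self->master = NULL;
    self->master_requested = false;
    return was_requested;
}
```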
The @bond_mode_8023ad test has been seen failing, with a log like this:
<debug> [...3.0484] device[...] (eth1): Activation: connection 'bond0.0' master deactivated
<debug> [...3.0484] device[...] (eth1): add_pending_action (2): 'queued-state-change-deactivating'
<debug> [...3.0484] device[...] (eth1): queue-state[deactivating, reason:new-activation, id:709]: queue state change
What happened is that eth1 was activating. It was already enslaved
to a bond and was in an ip-config state when the bond was removed.
A change to the "deactivating" state was enqueued. But then this
happened:
<trace> [...3.0942] device[...] (eth1): ip4: check-state: state done => done, is_failed=0, is_pending=0,
is_started=0 temp_na=0, may-fail-4=1, may-fail-6=1; disabled4; manualip4=done; ignore6 manualip6=done
<trace> [...3.0942] device[...] (eth1): ip: check-state: (combined) state pending => done
<debug> [...3.0943] device[...] (eth1): ip: set (combined) state done (was pending, reason: check-ip-state)
<info> [...3.0943] device (eth1): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
<debug> [...3.0943] device[...] (eth1): add_pending_action (3): 'in-state-change'
<debug> [...3.0943] device[...] (eth1): queue-state[deactivating, reason:new-activation, id:709]: clear queued state change
The IP config succeeded and the queued "deactivating" change was
overridden by the IPv4 check result, prompting a change to "ip-check",
with the master still missing. Not good.
Let's terminate the attempts to check the IP state when we cancel the
activation, so that they don't override the enqueued state change.
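A toy model of the race, with hypothetical names. A direct state change clears any queued one (as in the "clear queued state change" log line), so the fix is to disarm the IP check when cancelling. Without the `ip_check_armed` guard, `ip_state_done()` would win and the device would end up in "ip-check".

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    STATE_IP_CONFIG,
    STATE_IP_CHECK,
    STATE_DEACTIVATING,
} DevState;

typedef struct {
    DevState state;
    /* A state change waiting to be applied later, if has_queued is set. */
    bool has_queued;
    DevState queued;
    /* Whether a completed IP configuration may still advance activation. */
    bool ip_check_armed;
} Dev;

/* Cancelling queues "deactivating" and disarms the pending IP check. */
static void
cancel_activation(Dev *d)
{
    assert(d);
    d->ip_check_armed = false;
    d->has_queued = true;
    d->queued = STATE_DEACTIVATING;
}

/* Called when the combined IP state becomes "done". */
static void
ip_state_done(Dev *d)
{
    assert(d);
    if (!d->ip_check_armed)
        return;
    d->has_queued = false; /* a direct change clears the queued one */
    d->state = STATE_IP_CHECK;
}

/* Applies the queued change, e.g. from an idle handler. */
static void
apply_queued_state(Dev *d)
{
    assert(d);
    if (d->has_queued) {
        d->state = d->queued;
        d->has_queued = false;
    }
}
```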
Fixes-test: @bond_mode_8023ad
https://bugzilla.redhat.com/show_bug.cgi?id=2080928
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1245
pppd also tries to configure addresses by itself through some
ioctls. If, between those calls, we remove an address that was added,
pppd fails and quits.
To avoid this race condition, don't remove addresses while IPCP and
IPV6CP are running. Once pppd sends an IP configuration, it has
finished configuring the interface and we can proceed normally.
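The rule as a predicate, under assumptions: `PppState` is hypothetical, and this sketch treats "IP configuration received from pppd" as the point after which removal is always allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical view of pppd's IP control protocol negotiation. */
typedef struct {
    bool ipcp_running;
    bool ipv6cp_running;
    bool ip_config_received;
} PppState;

/* While IPCP/IPV6CP run, pppd may be configuring addresses via ioctls,
 * so we must not remove addresses yet. */
static bool
ppp_may_remove_addresses(const PppState *st)
{
    assert(st);
    if (st->ip_config_received)
        return true;
    return !st->ipcp_running && !st->ipv6cp_running;
}
```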
https://bugzilla.redhat.com/show_bug.cgi?id=2085382
Currently we call nm_device_update_dynamic_ip_setup() in
carrier_changed() every time the carrier goes up again and the device
is activating, to kick a restart of DHCP.
Since we process link events in an idle handler, it can happen that the
handler is called only once for several events; in particular,
device_link_changed() might be called once for a link-down/link-up
sequence.
carrier_changed() is "level-triggered" - it cares only about the
current carrier state. nm_device_update_dynamic_ip_setup() should
instead be "edge-triggered" - invoked every time the link goes from
down to up. We have a mechanism for that in device_link_changed(), use
it.
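A self-contained illustration of the difference, not NetworkManager's actual mechanism. Events are coalesced into one idle callback, so "up, down, up" can look like "up -> up". Latching a "went down" flag in the per-event path makes the restart edge-triggered. `Link` and its fields are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    /* Carrier as last applied by the idle handler. */
    bool carrier;
    /* Latest carrier reported by an event, not yet processed. */
    bool pending_carrier;
    /* Latched by any link-down event before the idle handler runs. */
    bool carrier_went_down;
    bool idle_scheduled;
    int dhcp_restarts;
} Link;

/* Called for every link event; processing is deferred to idle. */
static void
link_event(Link *l, bool carrier)
{
    l->pending_carrier = carrier;
    if (!carrier)
        l->carrier_went_down = true;
    l->idle_scheduled = true;
}

/* Idle handler: restart on every down->up edge, even if coalesced. */
static void
link_changed_idle(Link *l)
{
    bool now = l->pending_carrier;

    assert(l->idle_scheduled);
    l->idle_scheduled = false;
    if (now && (!l->carrier || l->carrier_went_down))
        l->dhcp_restarts++;
    l->carrier_went_down = false;
    l->carrier = now;
}
```

A purely level-triggered check (`now && !l->carrier`) would miss the first, coalesced down/up pair.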
Fixes-test: @ipv4_spurious_leftover_route
https://bugzilla.redhat.com/show_bug.cgi?id=2079406
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1250
IPv6 DNS servers received on a ppp interface were being ignored because
their priority was not set.
Fix this by using default priority in impl_ppp_manager_set_ip6_config(),
as was done for ip4_config in b2e559fab2 ("core: initialize l3cd
dns-priority for ppp and wwan").
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/issues/1022
Yes, we log timestamps for every log message anyway, so one could
always calculate the offset. However, when you read a logfile, it can be
cumbersome to stop looking at where you currently are to find the
start/end of a call. For convenience, log the duration explicitly.
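A sketch of what "log the duration explicitly" amounts to. The message format and function are hypothetical, and the timestamps are assumed to come from a monotonic clock in milliseconds:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Formats the "call finished" line with the elapsed time spelled out, so
 * the reader need not look up the matching "call started" line. */
static int
format_call_done(char *buf, size_t len, const char *what, int64_t start_msec, int64_t end_msec)
{
    assert(buf && what && end_msec >= start_msec);
    return snprintf(buf, len, "%s: done (took %lld ms)", what, (long long) (end_msec - start_msec));
}
```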
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1251
- add code comments explaining some things.
- for the NM_CMP_FIELD*() variants, have a corresponding NM_CMP_DIRECT*()
macro and use it (aside from the "memcmp" variants, which don't translate
directly).
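The relationship between the two macro families, shown with simplified re-creations. `CMP_DIRECT`/`CMP_FIELD` and `Route` are illustrative, not the real NM macros, which differ in detail. The field variant is just the direct variant applied to `(a)->field` and `(b)->field`, and both return from the enclosing compare function at the first difference.

```c
#include <assert.h>

/* Compare two values; return -1/1 from the enclosing function on mismatch. */
#define CMP_DIRECT(a, b)                  \
    do {                                  \
        __typeof__(a) _a = (a);           \
        __typeof__(b) _b = (b);           \
                                          \
        if (_a != _b)                     \
            return (_a < _b) ? -1 : 1;    \
    } while (0)

/* The field variant expressed via the direct one. */
#define CMP_FIELD(a, b, field) CMP_DIRECT((a)->field, (b)->field)

typedef struct {
    int ifindex;
    unsigned metric;
} Route;

static int
route_cmp(const Route *a, const Route *b)
{
    assert(a && b);
    CMP_FIELD(a, b, ifindex);
    CMP_FIELD(a, b, metric);
    return 0;
}
```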
l3cd instances must be removed from the old l3cfg before calling
_cleanup_ip_pre(). Otherwise, _cleanup_ip_pre() unregisters them from
the device, and later _dev_l3_register_l3cds(self, l3cfg_old, FALSE,
FALSE) does nothing because the device doesn't have any l3cd.
Previously the l3cds would linger in the l3cfg, keeping a reference to
it and causing a memory leak; the leak was not detected by valgrind
because the l3cfg was still referenced by the NMNetns.
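A toy model of why the order matters, with a hypothetical reference count standing in for the l3cfg's lifetime. Each registered l3cd holds a reference. Unregistering has to happen while the device still knows its l3cds. In the reverse order, cleanup forgets them first, nothing is unregistered, and the references leak.

```c
#include <assert.h>

/* Hypothetical stand-ins: each registered l3cd holds one l3cfg reference. */
typedef struct {
    int refcount;
} L3Cfg;

typedef struct {
    /* Number of l3cds the device currently knows about. */
    int n_l3cds;
} Device;

static void
unregister_l3cds(Device *dev, L3Cfg *l3cfg)
{
    l3cfg->refcount -= dev->n_l3cds;
}

/* Stand-in for the cleanup step: the device forgets its l3cds. */
static void
cleanup_ip_pre(Device *dev)
{
    dev->n_l3cds = 0;
}

/* Correct order: unregister from the old l3cfg first, then clean up. */
static void
switch_away_from(Device *dev, L3Cfg *old)
{
    assert(dev && old);
    unregister_l3cds(dev, old);
    cleanup_ip_pre(dev);
}
```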
Fixes: 58287cbcc0 ('core: rework IP configuration in NetworkManager using layer 3 configuration')
Fixes-test: @stable_mem_consumption2
https://bugzilla.redhat.com/show_bug.cgi?id=2083453
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1252
7db7dc4bab53 probe: merge branch 'th/decline-fixes'
bb61737788dd probe: fix internal state after declining lease
c5d0f38ab7a9 probe: maintain the probe's lease list in "n-dhcp4-c-probe.c"
48bf2788336e probe: return error when calling accept/decline/select in unexpected state
git-subtree-dir: src/n-dhcp4
git-subtree-split: 7db7dc4bab5312218135464d8550a86845ca6fdd