dhcpcd v5.99 and later automatically enable IPv6 behavior unless
specifically disabled. This is undesirable for two reasons:
1) dhcpcd sends IPv6 Router Solicitations, which NetworkManager
handles itself, so there's no need to do it twice. NetworkManager
knows better than dhcpcd whether IPv6 is supposed to be used for
that interface or not.
2) Some devices don't react well to IPv6 when they aren't expecting
it. For example, older Qualcomm Gobi-based devices will listen
for Router Solicitations and attempt to set up IPv6, but if other
settings are not done correctly, or the firmware doesn't actually
support it, the firmware will then crash. So simply upgrading your
dhcpcd from 5.x to 6.x magically stops WWAN working for these
devices.
Previously, the device activation would stall in this case, because
the code wasn't expecting it to happen. In particular, this happens
when trying to assume a device that is up but has no IP config.
https://bugzilla.gnome.org/show_bug.cgi?id=715181
When a request for getting secrets is freed in request_free(), its
cancel_callback is get_cancel_cb(), which uses parent->current as the secret
agent object. However, this object may already have been freed, so getting
priv in nm_secret_agent_cancel_secrets() operates on freed memory:
g_return_if_fail (self != NULL);
priv = NM_SECRET_AGENT_GET_PRIVATE (self);
(gdb) p self
$66 = (NMSecretAgent *) 0x7fae9afd42e0
(gdb) p *self
$67 = {parent = {g_type_instance = {g_class = 0x0}, ref_count = 0, qdata = 0x0}}
#0 nm_secret_agent_cancel_secrets (self=0x7fae9afd42e0, call=0x1) at settings/nm-secret-agent.c:325
#1 0x00007fae9a774882 in request_free (req=0x7fae9afc48f0) at settings/nm-agent-manager.c:496
#2 0x00007fae967b251a in g_hash_table_remove_internal (hash_table=0x7fae9aefdf00, key=0x2, notify=1) at ghash.c:1276
#3 0x00007fae9a72b340 in dispose (object=0x7fae9af77200) at nm-activation-request.c:446
#4 0x00007fae96cbeee8 in g_object_unref (_object=0x7fae9af77200) at gobject.c:3160
#5 0x00007fae9a73d87c in _active_connection_cleanup (user_data=<optimized out>) at nm-manager.c:359
#6 0x00007fae967c32a6 in g_main_dispatch (context=0x7fae9aedb180) at gmain.c:3066
#7 g_main_context_dispatch (context=context@entry=0x7fae9aedb180) at gmain.c:3642
#8 0x00007fae967c3628 in g_main_context_iterate (context=0x7fae9aedb180, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3713
#9 0x00007fae967c3a3a in g_main_loop_run (loop=0x7fae9aedb860) at gmain.c:3907
So we need to ref() 'agent' when adding it to the pending list, so that the
object is not freed if the secret agent unregisters and is removed.
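The idea of holding a reference for as long as the request points at the agent can be sketched in plain C. This is a minimal illustration only: Agent, Request, agent_ref(), agent_unref() and request_set_current() are hypothetical stand-ins, not the NetworkManager or GObject API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted secret agent object. */
typedef struct {
    int ref_count;
    int cancelled;   /* set when a pending call on this agent is cancelled */
} Agent;

int agents_alive = 0;

Agent *agent_new(void)
{
    Agent *a = calloc(1, sizeof(*a));
    assert(a != NULL);
    a->ref_count = 1;
    agents_alive++;
    return a;
}

Agent *agent_ref(Agent *a)
{
    a->ref_count++;
    return a;
}

void agent_unref(Agent *a)
{
    if (--a->ref_count == 0) {
        agents_alive--;
        free(a);
    }
}

/* Hypothetical request that remembers the agent it is waiting on. */
typedef struct {
    Agent *current;
} Request;

void request_set_current(Request *req, Agent *agent)
{
    /* Take our own reference so 'current' stays valid until we drop it. */
    req->current = agent_ref(agent);
}

void request_free(Request *req)
{
    if (req->current != NULL) {
        req->current->cancelled = 1;   /* safe: we still hold a reference */
        agent_unref(req->current);
        req->current = NULL;
    }
}
```

With the extra reference, the agent can unregister (dropping the manager's reference) before request_free() runs, and the cancel path still touches a live object.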
Test case:
1. run NM and nm-applet
2. activate a Wi-Fi network
3. nm-applet will ask for a password; ignore the popup window and kill nm-applet
4. start nm-applet again
5. click the same Wi-Fi network in nm-applet
6. NM will experience problems in nm_secret_agent_cancel_secrets() or crash
(the procedure may not be 100% reliable, but reproduces most of the time)
https://bugzilla.redhat.com/show_bug.cgi?id=922855
GLib-CRITICAL **: g_hash_table_iter_next: assertion 'ri->version == ri->hash_table->version' failed
It is not allowed to modify a hash table while it is being iterated.
Unfortunately, request_remove_agent() may remove the request from the
'requests' hash table, which invalidates the iterator used by the loop.
We need to store the requests in a temporary list and call request_next_agent()
on them later (after the hash loop).
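The collect-then-process pattern described above can be sketched without GLib. This is an illustration under assumptions: RequestTable, its fixed-size slot array and request_complete_and_remove() are hypothetical, and a plain array stands in for the 'requests' hash table.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REQUESTS 8

/* Hypothetical request: 'agent_id' is the agent it is currently waiting on. */
typedef struct {
    int in_use;
    int agent_id;
} Request;

typedef struct {
    Request slots[MAX_REQUESTS];
} RequestTable;

/* Stand-in for moving on to the next agent: here no agents are left, so the
 * request completes with an error and is removed from the table. */
void request_complete_and_remove(RequestTable *t, Request *req)
{
    (void) t;
    req->in_use = 0;
}

/* Remove an agent: only collect the affected requests while iterating, then
 * act on them once the iteration is over. Returns the number processed. */
size_t remove_agent(RequestTable *t, int agent_id)
{
    Request *pending[MAX_REQUESTS];
    size_t n_pending = 0;

    assert(t != NULL);

    /* Pass 1: iterate without modifying the table. */
    for (size_t i = 0; i < MAX_REQUESTS; i++) {
        Request *req = &t->slots[i];
        if (req->in_use && req->agent_id == agent_id)
            pending[n_pending++] = req;
    }

    /* Pass 2: the iteration is finished, so removals are safe now. */
    for (size_t i = 0; i < n_pending; i++)
        request_complete_and_remove(t, pending[i]);

    return n_pending;
}
```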
Test case:
1. start NM and nm-applet
2. activate a Wi-Fi WPA connection
3. nm-applet displays a dialog asking for a password
4. kill nm-applet
5. NetworkManager removes the nm-applet's secret agent
and runs into removing the request from hash table in the
iterating loop (via get_complete_cb)
#0 get_complete_cb (parent=0x7f3f250f2970, secrets=0x0, agent_dbus_owner=0x0, agent_username=0x0, error=0x7f3f250f7830, user_data=0x7f3f25020e10)
at settings/nm-agent-manager.c:1111
#1 0x00007f3f23b46ea5 in req_complete_error (error=0x7f3f250f7830, req=0x7f3f250f2970) at settings/nm-agent-manager.c:509
#2 request_next_agent (req=0x7f3f250f2970) at settings/nm-agent-manager.c:615
#3 0x00007f3f23b48596 in request_remove_agent (agent=0x7f3f250f4a20, req=0x7f3f250f2970) at settings/nm-agent-manager.c:631
#4 remove_agent (self=<optimized out>, owner=0x7f3f250dbff0 ":1.275") at settings/nm-agent-manager.c:130
#5 0x00007f3f23b4868d in impl_agent_manager_unregister (self=0x7f3f25020e10, context=0x7f3f250f5480) at settings/nm-agent-manager.c:374
#0 0x00007f3f1fb9c4e9 in g_logv (log_domain=0x7f3f1fbfef4e "GLib", log_level=G_LOG_LEVEL_CRITICAL, format=<optimized out>, args=args@entry=0x7fff156b77c0) at gmessages.c:989
#1 0x00007f3f1fb9c63f in g_log (log_domain=log_domain@entry=0x7f3f1fbfef4e "GLib", log_level=log_level@entry=G_LOG_LEVEL_CRITICAL,
format=format@entry=0x7f3f1fc0889a "%s: assertion '%s' failed") at gmessages.c:1025
#2 0x00007f3f1fb9c679 in g_return_if_fail_warning (log_domain=log_domain@entry=0x7f3f1fbfef4e "GLib",
pretty_function=pretty_function@entry=0x7f3f1fc03c30 <__PRETTY_FUNCTION__.4571> "g_hash_table_iter_next",
expression=expression@entry=0x7f3f1fc038f0 "ri->version == ri->hash_table->version") at gmessages.c:1034
#3 0x00007f3f1fb849c0 in g_hash_table_iter_next (iter=<optimized out>, key=<optimized out>, value=<optimized out>) at ghash.c:733
#4 0x00007f3f23b484e5 in remove_agent (self=<optimized out>, owner=0x7f3f250dbff0 ":1.275") at settings/nm-agent-manager.c:129
#5 0x00007f3f23b4868d in impl_agent_manager_unregister (self=0x7f3f25020e10, context=0x7f3f250f5480) at settings/nm-agent-manager.c:374
There can be multiple default routes for an interface with different
metrics. Grab the gateway of the default route with the lowest
metric as the overall gateway of the IP config. Otherwise the rest
could get left in the config and applied at random times.
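A minimal sketch of the selection rule, assuming a simplified route record: the Route struct and pick_gateway() are hypothetical names for illustration, not the NMIP4Config API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified IPv4 route record. */
typedef struct {
    uint32_t network;   /* destination network */
    uint8_t  plen;      /* prefix length; 0 means a default route */
    uint32_t gateway;
    uint32_t metric;
} Route;

/* Return the gateway of the default route with the lowest metric,
 * or 0 if there is no default route at all. */
uint32_t pick_gateway(const Route *routes, size_t n)
{
    const Route *best = NULL;

    assert(routes != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (routes[i].plen != 0)
            continue;   /* not a default route */
        if (best == NULL || routes[i].metric < best->metric)
            best = &routes[i];
    }
    return best != NULL ? best->gateway : 0;
}
```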
If the interface whose IP configuration is being captured has the default
route, then read DNS servers from resolv.conf into the NMIP[4|6]Config.
This allows NetworkManager to repopulate resolv.conf if anything changes.
For example, if the system does not define a persistent hostname, then
when a device which has generated a connection activates, a hostname
lookup will be performed. The results of that lookup may change resolv.conf,
and thus NetworkManager must rewrite resolv.conf. Without capturing
DNS information at startup when generating connections, an empty
resolv.conf would be written.
When generating connections at startup for active interfaces, the
generation code may not always be able to read DNS information for
the connection. Thus, the device's IP4Config won't have any
nameservers and the device won't be considered for reverse-address
lookup. However, any device that gets this far is already
the "best" device and has the default route, and thus should be the
one used for reverse-address lookup.
Second, reorganize the code to better handle dual-stack in the
future by checking the IP configs directly, instead of the
devices. Since 'best4' and 'best6' may be different devices,
we want to operate on the IP configs, not devices, to handle
situations where the best IP4Config may not be suitable for
reverse lookup, but the best IP6Config is.
https://bugzilla.redhat.com/show_bug.cgi?id=1031763
When doing nm_auth_chain_unref(), the code iterated over the ->calls and
cancelled them. However, some of these calls might not have been passed on to
polkit_authority_check_authorization() yet, but were instead scheduled with
g_idle_add(). These calls have to be cancelled too, because the NMAuthChain
will already be destroyed when auth_call_complete() is called.
Now, we g_source_remove() these calls and free them immediately. Previously,
these calls leaked and led to a use-after-free crash.
Also fix a memory leak by always getting the results with
polkit_authority_check_authorization_finish(), even when being
cancelled.
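The need to cancel idle-scheduled calls on teardown can be illustrated with a toy idle queue. Everything here is an assumption made for the sketch: idle_add(), idle_remove(), idle_dispatch() and Chain are hypothetical and only imitate the roles of g_idle_add(), g_source_remove(), the main loop and NMAuthChain.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDLE 8

typedef void (*IdleFunc)(void *data);

typedef struct {
    unsigned id;      /* 0 means the slot is free */
    IdleFunc func;
    void    *data;
} IdleSource;

IdleSource idle_queue[MAX_IDLE];
unsigned   next_id = 1;

/* Schedule 'func' to run later; returns a nonzero id, or 0 if full. */
unsigned idle_add(IdleFunc func, void *data)
{
    for (size_t i = 0; i < MAX_IDLE; i++) {
        if (idle_queue[i].id == 0) {
            idle_queue[i].id = next_id++;
            idle_queue[i].func = func;
            idle_queue[i].data = data;
            return idle_queue[i].id;
        }
    }
    return 0;
}

/* Cancel a scheduled callback; returns 1 if it was still pending. */
int idle_remove(unsigned id)
{
    for (size_t i = 0; i < MAX_IDLE; i++) {
        if (id != 0 && idle_queue[i].id == id) {
            idle_queue[i].id = 0;
            return 1;
        }
    }
    return 0;
}

/* Run and clear every pending callback; returns how many ran. */
int idle_dispatch(void)
{
    int ran = 0;
    for (size_t i = 0; i < MAX_IDLE; i++) {
        if (idle_queue[i].id != 0) {
            IdleSource s = idle_queue[i];
            idle_queue[i].id = 0;
            s.func(s.data);
            ran++;
        }
    }
    return ran;
}

/* Hypothetical chain owning one idle-scheduled call. */
typedef struct {
    unsigned idle_id;
    int      completed;
} Chain;

void call_complete(void *data)
{
    Chain *chain = data;
    chain->idle_id = 0;
    chain->completed = 1;   /* would touch freed memory if 'chain' were gone */
}

void chain_start(Chain *chain)
{
    chain->completed = 0;
    chain->idle_id = idle_add(call_complete, chain);
    assert(chain->idle_id != 0);
}

/* Teardown: cancel the pending callback so it can never run on 'chain'. */
void chain_cancel(Chain *chain)
{
    if (chain->idle_id != 0) {
        idle_remove(chain->idle_id);
        chain->idle_id = 0;
    }
}
```

Once chain_cancel() has run, a later dispatch finds nothing to call, so the owner can release the chain without the callback firing afterwards.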
This is the backtrace of the crash:
#0 0x00007f166efda359 in g_slist_remove () from /lib64/libglib-2.0.so.0
#1 0x00007f167311bcc1 in auth_call_complete ()
#2 0x00007f166efbde06 in g_main_context_dispatch ()
from /lib64/libglib-2.0.so.0
#3 0x00007f166efbe158 in g_main_context_iterate.isra.22 ()
from /lib64/libglib-2.0.so.0
#4 0x00007f166efbe55a in g_main_loop_run () from /lib64/libglib-2.0.so.0
#5 0x00007f16730d3c0d in main ()
Co-Authored-By: Dan Williams <dcbw@redhat.com>
Signed-off-by: Thomas Haller <thaller@redhat.com>
The router has no idea what the local configuration or user preferences are,
so sending routes with a prefix length of 0 is at best misinformed and at
worst breaks things. The kernel also ignores plen=0 routes in its in-kernel
RA processing code in net/ipv6/ndisc.c.
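A minimal sketch of such a filter, assuming a simplified route record: RaRoute and drop_default_ra_routes() are hypothetical names, not the actual RA handling code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for a route received in a Router Advertisement. */
typedef struct {
    unsigned char plen;     /* prefix length advertised by the router */
    unsigned int  metric;
} RaRoute;

/* Drop routes with a prefix length of 0, compacting the array in place.
 * Returns the number of routes kept. */
size_t drop_default_ra_routes(RaRoute *routes, size_t n)
{
    size_t kept = 0;

    assert(routes != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (routes[i].plen == 0)
            continue;   /* ignore plen=0 routes from the router */
        routes[kept++] = routes[i];
    }
    return kept;
}
```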
https://bugzilla.redhat.com/show_bug.cgi?id=1029213
Otherwise loopback would be managed and could easily be disconnected, which
causes various issues with applications. So do not manage it for now, to be on
the safer side.
https://bugzilla.redhat.com/show_bug.cgi?id=1032594
When activating a team slave and 'teamd' binary is not installed, the
priv->state of master device will be NM_DEVICE_STATE_FAILED, which is greater
than NM_DEVICE_STATE_ACTIVATED.
<info> Activation (nm-team) Stage 1 of 5 (Device Prepare) started...
<warn> Activation (nm-team) to start teamd: not found
<info> (nm-team): device state change: prepare -> failed (reason 'none') [40 120 0]
...
<debug> master_state_cb(): (0x81d6968): master ActiveConnection [0x91d69d0] 'team0' failed
<info> (eth1): device state change: config -> failed (reason 'dependency-failed') [50 120 50]
...
<debug> slave_state_changed(): (nm-team): slave eth1 state change 50 (config) -> 120 (failed)
--- ASSERTION ---
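The pitfall of ordering comparisons on device states can be shown in isolation. In this sketch the values 40, 50 and 120 are taken from the log above, while 100 for the activated state and both helper functions are assumptions; the check actually used by the code is not shown in this message.

```c
#include <assert.h>

/* 40, 50 and 120 appear in the log above; 100 for "activated" is assumed. */
enum {
    DEV_STATE_PREPARE   = 40,
    DEV_STATE_CONFIG    = 50,
    DEV_STATE_ACTIVATED = 100,
    DEV_STATE_FAILED    = 120
};

/* Fragile: every state numerically past ACTIVATED passes, including FAILED. */
int state_at_least_activated(int state)
{
    assert(state >= 0);
    return state >= DEV_STATE_ACTIVATED;
}

/* Stricter: only the activated state itself passes. */
int state_is_activated(int state)
{
    assert(state >= 0);
    return state == DEV_STATE_ACTIVATED;
}
```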
Set accept_ra and use_tempaddr to "0" when managing a device (and
restore them to their original values after unmanaging it) to ensure
that calling nm_device_bring_up() on a managed device won't ever cause
kernel IPv6 autoconf to happen. Remove some other redundant accept_ra
setting.
Fix up the deconfigure case of dispose() to clear the device's IP6
config as well as its IP4 config.
https://bugzilla.gnome.org/show_bug.cgi?id=700414
update_accept_ra_path() and update_ip6_privacy_save() were freeing
their path variables if they failed to read the existing values, but
if this ever actually happened it would cause problems later since
other code assumed that the variables were always set. Use
"priv->ip6_accept_ra_save = -1", etc, instead to indicate that the
value couldn't be read (and so shouldn't be restored later).
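The -1 sentinel can be sketched with plain stdio. read_saved_value() and restore_saved_value() are hypothetical helpers written for this illustration; they are not the functions named above.

```c
#include <assert.h>
#include <stdio.h>

/* Read a non-negative integer from 'path'.
 * Returns -1 if the file cannot be read or parsed. */
int read_saved_value(const char *path)
{
    FILE *f;
    int value = -1;

    assert(path != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &value) != 1 || value < 0)
        value = -1;
    fclose(f);
    return value;
}

/* Restore a previously saved value. A saved value of -1 means "nothing was
 * read", so nothing is written. Returns 1 if written, 0 otherwise. */
int restore_saved_value(const char *path, int saved)
{
    FILE *f;

    assert(path != NULL);

    if (saved < 0)
        return 0;
    f = fopen(path, "w");
    if (f == NULL)
        return 0;
    fprintf(f, "%d\n", saved);
    fclose(f);
    return 1;
}
```

Keeping the path and using a sentinel for the value means later code can rely on the path always being set, while the restore step still skips values that were never read.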
Merge the accept_ra and use_tempaddr save/restore code together,
since they're always called together.
Fix the accept_ra-restoring code to correctly handle an original value
of "2".
Call update_ip6_properties_paths() from nm_device_set_ip_iface()
rather than act_stage3_ip6_config_start(), since set_ip_iface() is
when the paths actually change. Also, split the default-value-saving
code out into a separate function, since we only care about doing that
at construct time; if the IP6 property paths change later (because
iface != ip_iface), then we don't need to save and restore the values
on the ip_iface, since the interface will go away when we're done with
it.
https://bugzilla.gnome.org/show_bug.cgi?id=700414
Rename ip6_privacy_tempaddr_path and ip6_privacy_tempaddr_save to
ip6_use_tempaddr_*, to match the sysctls, for consistency with the
accept_ra variables.
We used to call nm_device_deactivate() when moving a device from
UNMANAGED to UNAVAILABLE (unless we were assuming the existing
connection), but this got lost when default-unmanaged was added. Fix
it to do this again, so the device will be in a known-clean state when
it is activated.
Previously, the default wired connection was removed on quit when the
device was cleaned up. This is inconsistent with other connections.
Leave the default wired connection up when quitting to fix this
inconsistency.
This allows default wired connections to be assumed when NM starts.
It is useful to bind the loopback connection to the loopback interface,
and it also allows activating it.
$ nmcli con up lo
Else "Error: no device found for connection 'lo'" is returned, because
connection_compatible() in libnm-glib/nm-device-generic.c wants the
connection to have an interface-name set.
The change to allow an NMConnection to only be active on a single
device accidentally broke the case of re-activating a connection on
the same device. Fix that.
Changing the default wired connection has always deleted the connection
(thus disconnecting the interface) and re-added it as a settings plugin
connection. That was always sub-optimal, but until the 'unsaved' connection
stuff landed this summer, we couldn't do anything about that. Clean
that all up, adding the connection as an unsaved connection right from
the start, which allows changes to the connection without having to
delete and recreate it, thus preventing disconnection of any interface
that is using the connection.
A new signal is added to NMSettingsConnection that is only emitted when
the connection is changed from D-Bus (thus indicating an explicit user-
requested change) since the connection may be modified internally by
NetworkManager. NM-triggered changes should not result in the connection
no longer being a default-wired connection.
https://bugzilla.gnome.org/show_bug.cgi?id=712188
https://bugzilla.redhat.com/show_bug.cgi?id=1029464
If the connection has never been saved to disk, it won't have a path yet,
but that doesn't mean we should crash. Next, when reloading connections,
only try to do connection matching on connections that have paths, otherwise
all in-memory-only connections would be removed at the end of
read_connections().
Doing so may cause NetworkManager to run into a very intensive loop in
svUnescape() in shvar.c.
This is 'top' output for a very long (invalid) team config, 9309865 bytes long:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
26855 root 20 0 305m 35m 6092 R 99.8 0.9 8:08.11 NetworkManager
and still not finished.
If an assumed bridge/bond/team/whatever happened to be in the process
of activating (perhaps it had no recognized slaves and was waiting for
them to continue with IP configuration) when NM quits, don't deactivate
the device and blow away the assumed configuration.
Don't set NMManager:state to CONNECTING when assuming a connection,
since it's not actually "connecting".
If there are active connections, but none has the default route, then
the global state should be CONNECTED_LOCAL, not CONNECTED_GLOBAL.
Also tweak the semantics of CONNECTING/DISCONNECTING slightly; we only
set state to CONNECTING when connecting a new connection if we are not
already CONNECTED_GLOBAL, and we only set it to DISCONNECTING if we
will be DISCONNECTED afterward.
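The rules above for the global state can be sketched as a small function. GlobalState, ActiveConn and compute_state() are hypothetical names, and the sketch covers only the CONNECTING side of the rules, not DISCONNECTING.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    STATE_DISCONNECTED,
    STATE_CONNECTING,
    STATE_CONNECTED_LOCAL,
    STATE_CONNECTED_GLOBAL
} GlobalState;

/* Hypothetical summary of one active connection. */
typedef struct {
    int activated;          /* fully activated */
    int activating;         /* still in progress */
    int assumed;            /* taken over, not newly connected */
    int has_default_route;
} ActiveConn;

GlobalState compute_state(const ActiveConn *conns, size_t n)
{
    int any_activated = 0;
    int any_connecting = 0;

    assert(conns != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (conns[i].activated) {
            /* An activated connection with the default route wins outright. */
            if (conns[i].has_default_route)
                return STATE_CONNECTED_GLOBAL;
            any_activated = 1;
        } else if (conns[i].activating && !conns[i].assumed) {
            /* Assuming a connection does not count as "connecting". */
            any_connecting = 1;
        }
    }

    /* Reached only when not CONNECTED_GLOBAL. */
    if (any_connecting)
        return STATE_CONNECTING;
    if (any_activated)
        return STATE_CONNECTED_LOCAL;
    return STATE_DISCONNECTED;
}
```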
Assumed connections shouldn't require touching the device, and the
device should already have been set IFF_UP during stage2 (which is
skipped for assumed connections). Instead, what the code was really
trying to do was to ensure that the IP interface the device was
going to use was up.
The only cases where the IP interface might *not* be up after stage2
is where the IP interface is different than the device's interface,
like for Bluetooth, ADSL, WWAN, and PPPoE. Move the call to
nm_platform_link_set_up() into nm_device_set_ip_iface() which all
those device types will call.
Thus, only the device types that really need to up their IP interface
will do so, but other devices (including when activating assumed
connections) that don't need to do this, won't do it.
If the device has no IP configuration, is not a slave, and is not
a master, there's no point in generating a connection for it and
assuming that connection.
Fixes a problem where tun devices created by vpnc would be activated
with an empty assumed connection before NetworkManager could assign
the VPN IP config to it, and since IPv6 link-local timed out, the tun
device would be deactivated and VPN would be useless.
Follow the IP configuration the device currently has. For IPv6, this means
not using LINK_LOCAL if the interface doesn't have a LINK_LOCAL address.
Otherwise, NM assumes too much and may begin to activate an interface that
has no IP configuration, then time out and deactivate that device.
Assumed slave connections need to be added to their master devices,
which previously didn't happen because the devices activating assumed
connections jumped directly to stage3, bypassing all the master/slave
handling stuff.
Instead, make all assumed connections go through all activation stages,
but make sure that things which touch the device don't get done for
assumed connections. This requires moving the master/slave code out
of the override-able class methods because we need to call the
master/slave code for assumed connections, but we don't want to call
the override-able class activation methods.
If an assumed connection should have a master (bridge port, bond slave,
etc) it needs to notify its master that it's a slave. Since slaves
are ordered after their masters at start, the master should already
have a generated connection which we can use as the master.