We potentially emit a lot of signals. Don't look up the
signal by name because that adds quite some additional
overhead, like peeking for a GQuark.
Instead pass the numeric signal-id directly.
This enum was unused and meaningless because the platform signals
are emitted as a consequence of netlink messages. It is not clear
whether a netlink message was received due to an external event
or an internal action.
Unslaving from a bridge causes a wrong RTM_DELLINK event for
the former slave.
# ip link add dummy0 type dummy
# ip link add bridge0 type bridge
# ip link set bridge0 up
# ip link set dummy0 master bridge0
# ip monitor link &
# ip link set dummy0 nomaster
18: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master bridge0 state DOWN group default
link/ether 76:44:5f:b9:38:02 brd ff:ff:ff:ff:ff:ff
18: dummy0: <BROADCAST,NOARP> mtu 1500 master bridge0 state DOWN
link/ether 76:44:5f:b9:38:02
Deleted 18: dummy0: <BROADCAST,NOARP> mtu 1500 master bridge0 state DOWN
link/ether 76:44:5f:b9:38:02
18: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 76:44:5f:b9:38:02 brd ff:ff:ff:ff:ff:ff
19: bridge0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
19: bridge0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
Previously, during do_request_link() we would remember the link that is
about to be requested (delayed_deletion) and delay processing a new
RTM_DELLINK message until the end of do_request_link() -- and possibly
forget about about the deletion, if RTM_DELLINK was followed by a
RTM_NEWLINK.
However, this hack does not catch the case where an external command
unslaves the link.
Instead just accept the wrong event and raise a "removed" signal right
away. This brings the cache in an externally visible, wrong state that
will be fixed by a following "added" signal.
Still do that because working around the kernel bug is complicated. Also,
we already might emit wrong "added" signals for devices that are already
removed. As a consequence, a user should not consider the platform signals
until all events are processed.
Listeners to that signal should accept that added/removed link changes
can be wrong and should preferably handle them idly, when the events
have settled.
It can even be worse, that a RTM_DELLINK is not fixed by a following
RTM_NEWLINK:
...
# ip link set dummy0 nomaster
36: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master bridge0 state DOWN
link/ether e2:f2:20:98:3a:be brd ff:ff:ff:ff:ff:ff
36: dummy0: <BROADCAST,NOARP> mtu 1500 master bridge0 state DOWN
link/ether e2:f2:20:98:3a:be
Deleted 36: dummy0: <BROADCAST,NOARP> mtu 1500 master bridge0 state DOWN
link/ether e2:f2:20:98:3a:be
37: bridge0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
37: bridge0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
So, when a slave is deleted, we have to refetch it too.
https://bugzilla.redhat.com/show_bug.cgi?id=1285719
On some kernels (at least RHEL-7.2) we receive a spurious RTM_NEWLINK
message after the RTM_DELLINK message for deleting a bond master.
On RHEL-7, the following commands give:
# ip link add dummy0 type dummy
# ip link add bond0 type bond
# ip link set bond0 up
# ip link set dummy0 master bond0
# ip monitor link &
# ip link del bond0
21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noqueue state DOWN
link/ether 1e:a6:6c:81:c1:8d brd ff:ff:ff:ff:ff:ff
Deleted 21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
link/ether 1e:a6:6c:81:c1:8d brd ff:ff:ff:ff:ff:ff
20: dummy0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
link/ether 1e:a6:6c:81:c1:8d brd ff:ff:ff:ff:ff:ff
21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
link/ether da:ee:58:70:6f:e5 brd ff:ff:ff:ff:ff:ff
^^^^^^^^^^^^^^^ RTM_NEWLINK after RTM_DELLINK (and there follows no
RTM_DELLINK afterwards)
21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
link/ether da:ee:58:70:6f:e5 brd ff:ff:ff:ff:ff:ff
20: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN
link/ether 1e:a6:6c:81:c1:8d brd ff:ff:ff:ff:ff:ff
20: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noqueue state DOWN
link/ether 1e:a6:6c:81:c1:8d brd ff:ff:ff:ff:ff:ff
Fix that by reverting clear_REFRESH_LINK(). This fix has two downsides:
- on kernels where this hack is not necessary, we unnecessarily refetch
a link
- the platform cache first removes the link, adds it again and removes
it. This is ugly, but should have no real consequences because all
listeners to the platform signals delay processing the signals to an
idle handler.
https://bugzilla.redhat.com/show_bug.cgi?id=1285719
This reverts commit f4f4e1cf09.
The related bug rh#1262908 in kernel causes missing netlink notifications
when moving a IFA_LINK interface to another netns.
Add a test for our workaround.
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1262908
The related bug rh#1285827 in kernel causes a missing IFLA_LINK/parent
attribute when creating a veth pair:
# ip monitor link &
[1] 6745
# ip link add dev vm1 type veth peer name vm2
30: vm2@NONE: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
link/ether be:e3:b7:0e:14:52 brd ff:ff:ff:ff:ff:ff
31: vm1@vm2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN
link/ether da:e6:a6:c5:42:54 brd ff:ff:ff:ff:ff:ff
Add a workaround and test.
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1285827
Due to kernel bugs [1], the first netlink event about a new link
sometimes lacks the IFLA_LINKINFO with the link-type lnk data.
In the case the data is missing, schedule a re-fetch the link
hoping that it gets send.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1284001
Previsously, _LOGT() could be disabled at compile time. Thus it
was different then the other macros _LOGD(), _LOGI(), etc.
OTOH, _LOGt() was the macro that always was compiled in.
Swap the name of the macros. Now the upper-case macros are always
enabled, while the lower-case macro _LOGt() is enabled depending
on compile configuration.
Previously, we would not set the ifi_change field, so that all
flags in ifi_flags were considered. That required us to lookup
the currently set flags from the cache.
Change that, to set only the flags in the netlink message that
we want to change. This saves us a cache-lookup, but more importantly,
the cache might be out of date.
Previously, we could only set the ingress-qos-mappings/egress-qos-mappings.
Now also cache the mappings and expose them from the platform cache.
Also, support changing the vlan flags not only when creating the vlan
interface.
Instead of using libnl-route-3 library to serialize netlink messages,
construct the netlink messages ourselves.
This has several advantages:
- Creating the netlink message ourself is actually more straight
forward then having an intermediate layer between NM and the kernel.
Now it is immediately clear, how a platform request translates to
a netlink/kernel request.
You can look at the kernel sources how a certain netlink attribute
behaves, and then it's immediately clear how to set that (and vice
versa).
- Older libnl versions might have bugs or missing features for which
we needed to workaround (often by offering a reduced/broken/untested
functionality). Now we can get rid or workaround like _nl_has_capability(),
check_support_libnl_extended_ifa_flags(), HAVE_LIBNL_INET6_TOKEN.
Another example is a libnl bug when setting vlan ingress map which
isn't even yet fixed in libnl upstream.
- We no longer need libnl-route-3 at all and can drop that runtime
requirement, saving some 400k.
Constructing the messages ourselves also gives better performance
because we don't have to create the intermediate libnl object.
- In the future we will add more link-type support which is easier
to support by basing directly on the plain kernel/netlink API,
instead of requiring also libnl3 to expose this functionality.
E.g. adding macvtap support: we already parsed macvtap properties
ourselves because of missing libnl support. To *add* macvtap
support, we also would have to do it ourself (or extend libnl).
Having a static string buffer for convenience is useful not only
for platform. Define the string buffer in NetworkManagerUtils.h,
so that all to-string functions can reuse *one* buffer.
Of course, this has the potential danger, that different
to-string method might reuse the same buffer. Hence, low-level
library functions are adviced to use their own buffer, because
an upper level might already use the global buffer for another
string.
The peer-address (IFA_ADDRESS) can also be all-zero (0.0.0.0).
That is distinct from an usual address without explicit peer-address,
which implicitly has the same peer and local address.
Previously, we treated an all-zero peer_address as having peer and
local address equal. This is especially grave, because the peer is part
of the primary key for an IPv4 address. So we not only get a property of
the address wrong, but we wrongly consider two different addresses as
one and the same.
To properly handle these addresses, we always must explicitly set the peer.
For recent kernels, the peer-ifindex of veths is reported as
parent (IFA_LINK). Prefer that over the ethtool lookup.
For one, this avoids the extra ethtool call which has the
downside of sidestepping the platform cache. Also, looking
up the peer-ifindex in ethtool does not report whether the
peer lifes in another netns (NM_PLATFORM_LINK_OTHER_NETNS).
Only use ethtool as fallback for older kernels.
Previously, while detecting the link type we would lookup the
@kind in case it was missing.
Now, go one step further, and also prefer the link-type from the
cache.
Constructing the libnl3 object only to parse the message
is wasteful. It involves several memory allocations, thread
synchronization and parsing fields that we don't care about.
But moreover, older libnl version might not support all the
fields we are interested in, hence we have workarounds like
_nl_link_parse_info_data(). Certain features might not fully
work unless libnl supports it too (although kernel would).
As we already parse the data ourselves sometimes, just go
all they way and don't construct the intermediate libnl object.
This will allow us to drop the _nl_link_parse_info_data() workarounds
in next commits. That is important, because _nl_link_parse_info_data()
sidesteps our platform cache and is not in sync with the cache (not to
mention the extra work to explicitly refetch the data on every lookup).
Also, it gets us 60% on the way to no longer needing 'libnl-route-3.so'
at all and eventually drop requiring the library.
This function only accesses sysctl function to retrieve the tun-properties.
sysctl is already defined in the base class and equally inherited by linux
and fake platform. Move the implementation there.
gcc complains with the following warning:
"comparison between 'enum <anonymous>' and 'enum vlan_flags'"
Cast one side of the comparison to avoid that.
Fixes: ee7414ebc8
Kernel allows to add the same IPv4 address that only differs by
peer-address (IFL_ADDRESS):
$ ip link add dummy type dummy
$ ip address add 1.1.1.1 peer 1.1.1.3/24 dev dummy
$ ip address add 1.1.1.1 peer 1.1.1.4/24 dev dummy
RTNETLINK answers: File exists
$ ip address add 1.1.1.1 peer 1.1.2.3/24 dev dummy
$ ip address show dev dummy
2: dummy@NONE: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 52:58:a7:1e:e8:93 brd ff:ff:ff:ff:ff:ff
inet 1.1.1.1 peer 1.1.1.3/24 scope global dummy
valid_lft forever preferred_lft forever
inet 1.1.1.1 peer 1.1.2.3/24 scope global dummy
valid_lft forever preferred_lft forever
We must also consider peer-address, otherwise platform will treat
two different addresses as one and the same.
https://bugzilla.gnome.org/show_bug.cgi?id=756356
The peer-address seems less important then the prefix-length.
Also, nm_platform_ip4_address_delete() has the peer-address
argument as last.
Soon ip4_address_get() also receives a peer-address argument,
so get the order right first.
We get a lot of these debugging message, although the event is entirely
internal to NMLinuxPlatform and only interesting when debugging a problem
in platform itself.
Downgrade to TRACE level.
When debug-logging for platform is enabled, every access to sysctl
is cached (to log the last values).
This cache can grow quite large if the system has a large number of
interfaces (e.g. docker creating veth pairs for each container).
We already used to clear the cache, when we were about to access
sysctl *and* logging was disabled in the meantime.
Now, when logging setup changes, immediately clear the cache.
Having "nm-logging.c" call into platform code is a bit of a hack
and a better design would be to have logging code emit a signal to
which platform would subscribe. But that seems to involve much
more code (especially, as no other users care about such a signal
and because nm-logging is not a GObject).
Also, log a warning when the cache grows large to inform the user
about the cache and what he can do to clear it. The extra effort to
clear the cache when changing logging setup is done so that we do
what we tell the user: changing the logging level, will clear the
cache -- right away, not some time later when the next message is
logged.
When we receive an update for a link, cancel a scheduled
REFRESH_LINK delayed-action for that ifindex. At the point when we
scheduled refrehing the link, we only cared about receiving a
notification that was newer then the current state.
We scheduled requesting this new notification to resync the cache.
It is not necessary to actually request a new update, any update we
receive *after* requesting a new update will suffice.
This potentially saves extra round-trips re-requesting the link.
When moving a link to another netns, it gets removed from
NMPlatform's view.
Currently kernel does not sent a notification to inform about
that change (see related bug rh#1262908).
Ensure that we reload all linked interfaces which now might
have an invisible parent.
Defect type: CHECKED_RETURN
3. NetworkManager-1.0.6/src/platform/nm-linux-platform.c:1145: check_return: Calling "clock_gettime" without checking return value (as is done elsewhere 6 out of 7 times).