The variable with this purpose is usually called "IS_IPv4".
It's upper case, because usually this is a const variable, and because
it reminds of the NM_IS_IPv4(addr_family) macro. That letter case
is unusual, but it makes sense to me for the special purpose that this
variable has.
Anyway. The naming of this variable is a different point. Let's
use the variable name that is consistent and widely used.
(cherry picked from commit 8085c0121f)
_nm_ip_route_attribute_validate_all() validates all attributes together.
As such, it calls to nm_ip_route_attribute_validate(), which in turn
validates one attribute at a time.
Such full validation needs to check that (potentially conflicting)
attributes are valid together. Hence, _nm_ip_route_attribute_validate_all()
needs again peek into the attributes.
Refactor the code, so that we can extract the pieces that we need and
not need to parse them twice.
(cherry picked from commit 0413b1bf8a)
First of all, all of NMVariantAttributeSpec is internal API. We only
expose the typedef itself as public API, but not its fields nor
their meaning. So we can change things.
Change "str_type" to "type_detail", so that it can work for any kind of
attribute, not only for strings. Usually, we want to avoid special
cases and treat all attributes the same, based on their GVariant type.
But sometimes, it is necessary to do something special with an
attribute. This is what the "type_detail" encodes, but it's not only
relevant for strings.
(cherry picked from commit 6f277d8fa6)
Usually the normalization (canonicalize) and validation of the IP
address string both requires to parse the string. As we always do
validation first, we can use the parsed address and don't need to parse
it a second time.
(cherry picked from commit 00e4f21629)
Order the fields by their size, to minimize the alignment gaps.
I guess, that doesn't matter because the alignment of the heap
allocation is larger than what we can safe here. Still, there is
on reason to do it any other way.
Also, it's not possible via API to set family/prefix to values outside
their range, so an 8bit integer is always sufficient. And we don't want
that invariant to change. We don't ever want to allow the caller to set
values that are clearly invalid, and will assert against that early (g_return()).
Point is, we can do this and there is no danger of future problems.
And even if we will support larger values, it's all an implementation
detail anyway.
(cherry picked from commit 6208a1bb84)
`git bisect run` is peculiar about the exit code:
error: bisect run failed: exit code 134 from '...' is < 0 or >= 128
If we just "exec" the test, it usually will fail on an assert. That results
in SIGABRT or exit code 134. So out of the box that is annoying with
git-bisect.
Work around that and let the test wrapper always coerce any test failure
to exit code 1.
(cherry picked from commit f65747f6e9)
We made the choice, that NMPlatformIPRoute does not contain the actual
route table, instead it contains a "remapped" number: table_coerced.
That remapping done, so that the default (which we want semantically to
be 254, RT_TABLE_MAIN) is numerical zero so that struct initialization
doesn't you require to explicitly set the default.
Hence, we must always distinguish whether we have the real table number
or the "table_coerced", and you must convert back and forth between the
two.
Now, the parameter of nm_l3_config_data_merge() are real table numbers
(as also indicated by their name not having the term "coerced"). So
usually they are set to actually 254.
When we set the field of NMPlatformIPRoute, we must coerce it. This was
wrong, and we would see wrong table numbers in the log:
l3cfg[17b98e59a477b0f4,ifindex=2]: obj-state: track: [2a32eca99405767e, ip4-route, type unicast table 0 0.0.0.0/0 via ...
Fixes: b4aa35e72d ('l3cfg: extend nm_l3cfg_add_config() to accept default route table and metric')
(cherry picked from commit e23ebe9183)
In systemd, it's common that a D-Bus activatable service references
`SystemdService=dbus-$BUSNAME.service` instead the real service name.
Together with an `[Install].Alias=dbus-$BUSNAME.service` directive,
this allowed to enable/disable D-Bus activation without uninstalling the
service altogether ([1]).
Currently, when we install the RPM then `nm-priv-helper.service` is not
enabled, consequently the alias is not created, and D-Bus activation
does not work. I guess, we should fix that by enabling the service in
the %post section or via a systemd preset? Dunno.
Anyway. It seems that nm-priv-helper.service is more of an
implementation detail of NetworkManager. It makes not sense for the user
to interact directly, or to enable/disable D-Bus activation (because
that is how it works).
So, drop the alias.
See-also: [1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Systemd/#activation_dbus
(cherry picked from commit d849807521)
"-" is not allowed as D-Bus path and interface name, and discouraged as
bus name. This cause nm-priv-helper to crash, because GDBus asserts the
the object path is valid.
Replace the '-' with '_'. This way, it's consistent with
"nm_dispatcher".
Fixes: d68ab6b8f0 ('nm-sudo: rename to nm-priv-helper')
(cherry picked from commit 16a45d07ed)
gcc-4.8.5-44.el7.x86_64 warns:
In file included from ./src/libnm-systemd-shared/src/basic/hashmap.h:10:0,
from ./src/libnm-systemd-shared/src/shared/dns-domain.h:10,
from src/libnm-systemd-shared/nm-sd-utils-shared.c:12:
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2u64':
./src/libnm-systemd-shared/src/basic/util.h:30:20: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2ULL(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2ULL(x), NONCONST_LOG2ULL(x))
^
./src/libnm-systemd-shared/src/basic/util.h:34:16: note: in expansion of macro 'LOG2ULL'
return LOG2ULL(x);
^
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2i':
./src/libnm-systemd-shared/src/basic/util.h:53:18: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2U(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2U(x), NONCONST_LOG2U(x))
^
./src/libnm-systemd-shared/src/basic/util.h:56:16: note: in expansion of macro 'LOG2U'
return LOG2U(x);
^
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2u':
./src/libnm-systemd-shared/src/basic/util.h:53:18: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2U(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2U(x), NONCONST_LOG2U(x))
^
./src/libnm-systemd-shared/src/basic/util.h:60:16: note: in expansion of macro 'LOG2U'
return LOG2U(x);
^
gcc-4.8.5-44.el7.x86_64 warns:
In file included from ./src/libnm-systemd-shared/src/basic/hashmap.h:10:0,
from ./src/libnm-systemd-shared/src/shared/dns-domain.h:10,
from src/libnm-systemd-shared/nm-sd-utils-shared.c:12:
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2u64':
./src/libnm-systemd-shared/src/basic/util.h:30:20: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2ULL(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2ULL(x), NONCONST_LOG2ULL(x))
^
./src/libnm-systemd-shared/src/basic/util.h:34:16: note: in expansion of macro 'LOG2ULL'
return LOG2ULL(x);
^
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2i':
./src/libnm-systemd-shared/src/basic/util.h:53:18: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2U(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2U(x), NONCONST_LOG2U(x))
^
./src/libnm-systemd-shared/src/basic/util.h:56:16: note: in expansion of macro 'LOG2U'
return LOG2U(x);
^
./src/libnm-systemd-shared/src/basic/util.h: In function 'log2u':
./src/libnm-systemd-shared/src/basic/util.h:53:18: error: first argument to '__builtin_choose_expr' not a constant
#define LOG2U(x) __builtin_choose_expr(__builtin_constant_p(x), CONST_LOG2U(x), NONCONST_LOG2U(x))
^
./src/libnm-systemd-shared/src/basic/util.h:60:16: note: in expansion of macro 'LOG2U'
return LOG2U(x);
^
The metered status can depend on the DHCP lease, as we accept the
ANDROID_METERED vendor option. That means, on a DHCP update we need
to re-evaluate the metered flag.
This fixes a potential race, where IPv6 might succeed first and
activation completes (with GUESS_NO metered flag). A subsequent
DHCPv4 update requires to re-evaluate that decision.
Fixes-test: @connection_metered_guess_yes
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/merge_requests/1080
We always call nl_recv() in a loop. If it would be necessary to clear
the variable, then it would need to happen inside the loop. But it's
not necessary.
Instead of allocating a receive buffer for each nl_recv() call, re-use a
pre-allocated buffer.
The buffer is part of NMPlatform and kept around. As we would not have more
than one NMPlatform instance per netns, we waste a limited amount of
memory.
The buffer gets initialized with 32k, which should be large enough for
any rtnetlink message that we might receive. As before, if the buffer
would be too small, then the first large message would be lost (as we don't
peek). But then we would allocate a larger buffer, resync the platform cache
and recover.
Add parameter to accept a pre-allocated buffer for nl_recv(). In
practice, rtnetlink messages are not larger than 32k, so we can always
pre-allocate it and avoid the need to heap allocate it.
In the past, nl_recv() was libnl3 code and thus used malloc()/realloc() to
allocate the buffer. That meant, we should free the buffer with libc's free()
function instead of glib's g_free(). That is what nm_auto_free is for.
Nowadays, nl_recv() is forked and glib-ified, and uses the glib wrappers to
allocate the buffer. Thus the buffer should also be freed with g_free()
(and gs_free).
In practice there is no difference, because glib's allocation directly
uses libc's malloc/free. This is purely a matter of style.
When the NM_UNMANAGED_PLATFORM_INIT flag is cleared last in
device_link_changed(), a recheck-assume is scheduled and then the
device goes immediately to UNAVAILABLE. During the state transition,
addresses and routes are removed from the interface. Then,
recheck-assume finds that the device can be assumed but it's too late
since the device was already deconfigured.
This is a problem as the whole point of assuming a device is to
activate a connection while leaving the device untouched.
In the NMCI "dracut_NM_vlan_over_bridge and dracut_NM_vlan_over_bond"
test, NM in real root tries to assume a vlan device that was activated
in initrd. When the interface gets deconfigured in UNAVAILABLE, the
connection to the NFS server breaks and the rootfs becomes
inaccessible.
The fix to this problem is to delay state transitions in
device_link_changed() to a idle handler, so that recheck-assume can
run before.
Fixes-test: @dracut_NM_vlan_over_bridge
Fixes-test: @dracut_NM_vlan_over_bond
https://bugzilla.redhat.com/show_bug.cgi?id=2047302
nm_device_set_unmanaged_by_user_settings() does nothing when the
device is unmanaged by platform-init. Remove the if branch to make
this more explicit.
The ACD state handling is unfortunately very complicated. That is, because
we obviously need to track state about how ACD is going (the acd_data, and
in particular NML3AcdAddrState). Then there are various things that can
happen, which are the AcdStateChangeMode enums. All these state-changes
come together in one function: _l3_acd_data_state_change(), which is
therefore complicated (I don't think that it would become simpler by
spreading this code out to different functions, on the contrary).
Anyway.
So, what happens when we need to reset the n-acd instance? For example,
because the MAC address of the link changed or some error. I guess, we
need to restart probing.
Previously, I think this was not handled properly. We already tried to
fix this several times, the last was commit b331606386 ('l3cfg: on
n-acd instance-reset clear also ready ACD state'). There is still an
issue ([1]).
The bug [1] is, that we are in state NM_L3_ACD_ADDR_STATE_READY, during
ACD_STATE_CHANGE_MODE_TIMEOUT event. That leads to an assertion
failure.
#5 0x00007f23be74698f in g_assertion_message_expr (domain=0x5629aca70359 "nm", file=0x5629aca62aab "src/core/nm-l3cfg.c", line=2395, func=0x5629acb26b30 <__func__.72.lto_priv.4> "_l3_acd_data_state_change", expr=<optimized out>) at ../glib/gtestutils.c:3091
#6 0x00005629ac937e46 in _l3_acd_data_state_change (self=0x5629add69790, acd_data=0x5629add8d520, state_change_mode=ACD_STATE_CHANGE_MODE_TIMEOUT, sender_addr=0x0, p_now_msec=0x7ffded506460) at src/core/nm-l3cfg.c:2395
#7 0x00005629ac939f4d in _l3_acd_data_timeout_cb (user_data=user_data@entry=0x5629add8d520) at src/core/nm-l3cfg.c:1933
#8 0x00007f23be71c5a1 in g_timeout_dispatch (source=0x5629addd7a80, callback=0x5629ac939ee0 <_l3_acd_data_timeout_cb>, user_data=0x5629add8d520) at ../glib/gmain.c:4889
#9 0x00007f23be71bd4f in g_main_dispatch (context=0x5629adc6da00) at ../glib/gmain.c:3337
#10 g_main_context_dispatch (context=0x5629adc6da00) at ../glib/gmain.c:4055
That can only happen, (I think) when we scheduled the timeout
during an earlier ACD_STATE_CHANGE_MODE_INSTANCE_RESET event. Meaning,
we need to handle instance-reset better.
Instead, during instance-reset, switch always back to state PROBING, and
let the timeout figure it out.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=2047788
The first point is that ACD timeout is strongly tied to the current state. That
is (somewhat) obvious, because _l3_acd_data_state_set_full() will clear any pending
timeout. So you can only schedule a timeout *after* setting the state,
and setting the next state, will clear the timeout.
Likewise, note that l3_acd_data_state_change() for the event
ACD_STATE_CHANGE_MODE_TIMEOUT asserts that it is only called in the few
states where that is expected. See rhbz#2047788 where that assertion
gets hit.
The first point means that we must only schedule a timer when we are
also in a state that supports that. Add an assertion for that at the
point when scheduling the timeout. The assert at this point is useful,
because it catches the moment when we do the wrong thing (instead of
getting the assertion later during the timeout, when we no longer know
where the error happened).
See-also: https://bugzilla.redhat.com/show_bug.cgi?id=2047788
Now that higher priorities numbers really mean more important,
it seems that the VPN configuration should be rather important.
Bump the number, also in relation to NMDevice's L3ConfigDataType.
It might not matter too much, because usually the VPN tunnel device does
not have NDevice to add other l3cds and those from VPN might be alone.
Except, maybe with routing VPN (libreswan) that is different. Dunno.
NM_L3CFG_CONFIG_PRIORITY_IPV6LL and L3_CONFIG_DATA_TYPE_LL_6 are
somewhat related. They probably should be numerically identical.
Now, L3_CONFIG_DATA_TYPE_LL_6 cannot be zero (because
L3_CONFIG_DATA_TYPE_LL_4 is zero and L3ConfigDataType values must be
distinct). Therefore, also bump NM_L3CFG_CONFIG_PRIORITY_IPV6LL to 1.
Of course, NM_L3CFG_CONFIG_PRIORITY_IPV6LL is not obviously more important
than NM_L3CFG_CONFIG_PRIORITY_IPV4LL. But to be consistent with
L3_CONFIG_DATA_TYPE_LL_6 is probably more important here.
There was a disagreement whether higher priority numbers mean more/less
important. NMDevice and callers of nm_l3cfg_add_config() assumed that
higher numbers are more important, while _l3_config_datas_get_sorted_cmp()
did the inverse.
There is no obvious right or wrong. We only need to agree. It seems
slightly more sensible to keep the caller's interpretation.
This commit does not seem correct. The enum was moved with the declared
intent to make manual IP configuration preferred. But the code comment
in L3ConfigDataType states that higher numbers are more important.
Also, all the other values are intentionally ordered so that more
important method have higher numbers. For example, LL_4 < DHCP_4 < SHARED_4 <
DEV_4 (in increasing priority). While it's often not clear whether to
prefer one method over the other, or what the actual effect of (LL_4 < DHCP_4)
is, the numbers were chosen intentionally.
Only moving MANUALIP first, counters the relative order of the other values.
If there is the problem, that the code comment is wrong (and lower
numbers mean more important), then we would have to reverse all
enum values.
The real problem is that NML3Cfg's _l3_config_datas_get_sorted_cmp()
has the wrong order/understanding. So the real fix is there and will
be done next. That is, unless we would agree that _l3_config_datas_get_sorted_cmp()
is in the right, and prefer lower numbers -- in which case all values had to be
reversed.
This reverts commit af1903fe3f.
Works for the internal DHCP client only as sd-dhcp does the option
parsing for us.
The option 56 is not understood by dhclient so we would need to parse it
ourselves. Let's not do it for now, as the RFC seems to written in a
somewhat poor taste.
https://bugzilla.redhat.com/show_bug.cgi?id=2047415#c2
We have global data NMDhcpOption that describes the DHCP meta data.
There is a consistency check with NM_MORE_ASSERTS.
Improve the error message when the meta data is inconsistent to
help finding the bug.
When NetworkManager shuts down, it is not done properly. We cannot
ensure that all pending async operations are cancelled and therefore
there are possible device leaks. This means that not all the devices
will be disposed and finalized because a different component is handling
a reference to it.
Since the l3cfg refactor, this is affecting to ovs ports. When shutting
down sometimes there are ovs-interface or ovs-port devices that are not
being cleared correctly. When starting NetworkManager back, these
devices are not going to be created again because they already exist and
the existing compatible connections will be instruct to use the device.
But currently, ovsdb is only unrealizing a device after removal if the
state is UNMANAGED. This is wrong, because it will left an inconsistent
state in NetworkManager and the ovs-port/ovs-interface connection won't
be activated.
The interfaces removed by ovsdb must be unrealized if they are on
UNAVAILABLE state.
https://bugzilla.redhat.com/show_bug.cgi?id=2029937