Discussion:
[netcf-devel] [PATCH] dutil_linux: enable netlink peeking to avoid trunc
Jan Gutter
2017-01-12 10:29:24 UTC
When enumerating devices on systems with a large number of
network virtual functions, the netlink receive buffer could
be too small, and the message gets truncated.

This patch enables peeking: libnl will first query the buffer
size, expand the receive buffer to the correct size, then
receive the full buffer.
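The peek-then-receive idea described above can be illustrated with plain Linux socket calls. This is a sketch for illustration only, not libnl's code: a UNIX datagram socketpair stands in for the netlink socket so it runs without netlink access, and `recv_whole_datagram()` and `demo_receive()` are hypothetical names. Passing `MSG_PEEK | MSG_TRUNC` to recv() reports the full length of the queued datagram without consuming it. Per recv(2), netlink sockets support this since Linux 2.6.22 and UNIX datagram sockets since Linux 3.4.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Peek-then-receive: learn the length of the next datagram without
 * consuming it, allocate a buffer that fits, then read it for real.
 * Returns a malloc'd buffer (caller frees) and sets *len, or NULL. */
static char *recv_whole_datagram(int fd, size_t *len)
{
    char probe;

    /* MSG_PEEK leaves the datagram queued; MSG_TRUNC makes recv()
     * return its real length even though only 1 byte is copied. */
    ssize_t need = recv(fd, &probe, sizeof probe, MSG_PEEK | MSG_TRUNC);
    if (need < 0)
        return NULL;

    size_t cap = need > 0 ? (size_t)need : 1;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Consuming receive into a buffer sized for the whole datagram. */
    ssize_t got = recv(fd, buf, cap, 0);
    if (got < 0) {
        free(buf);
        return NULL;
    }
    *len = (size_t)got;
    return buf;
}

/* Hypothetical demo helper: send one 'size'-byte datagram over a UNIX
 * datagram socketpair (standing in for a netlink socket) and return
 * how many bytes the receiver gets, or -1 on error. With use_peek == 0
 * it reads into a fixed 'bufsize' buffer, as a non-peeking reader would. */
static ssize_t demo_receive(size_t size, size_t bufsize, int use_peek)
{
    int sv[2];
    ssize_t result = -1;

    assert(size > 0 && bufsize > 0);
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    char *msg = malloc(size);
    if (msg != NULL) {
        memset(msg, 'x', size);
        if (send(sv[0], msg, size, 0) == (ssize_t)size) {
            if (use_peek) {
                size_t len = 0;
                char *buf = recv_whole_datagram(sv[1], &len);
                if (buf != NULL) {
                    result = (ssize_t)len;
                    free(buf);
                }
            } else {
                char *buf = malloc(bufsize);
                if (buf != NULL) {
                    /* Bytes beyond bufsize are silently discarded. */
                    result = recv(sv[1], buf, bufsize, 0);
                    free(buf);
                }
            }
        }
        free(msg);
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Suppose a 20000-byte datagram arrives and the reader has a 4096-byte buffer. The fixed-buffer read returns 4096 bytes, and the rest of the message is lost. The peeking path returns all 20000 bytes. In this sketch, the cost is one extra system call per message.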

For a similar issue in libvirt.git, look at commit ID:

8c70d04bab7278c96390a913fa949a17cd3124f9

Reviewed-by: Dinan Gunawardena <***@netronome.com>
Signed-off-by: Jan Gutter <***@netronome.com>
---
src/dutil_linux.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/src/dutil_linux.c b/src/dutil_linux.c
index f1bf8e0..742153a 100644
--- a/src/dutil_linux.c
+++ b/src/dutil_linux.c
@@ -687,6 +687,7 @@ int netlink_init(struct netcf *ncf) {
goto error;
if (nl_connect(ncf->driver->nl_sock, NETLINK_ROUTE) < 0)
goto error;
+ nl_socket_enable_msg_peek(ncf->driver->nl_sock);

ncf->driver->link_cache = __rtnl_link_alloc_cache(ncf->driver->nl_sock);
if (ncf->driver->link_cache == NULL)
--
2.11.0
_______________________________________________
netcf-devel mailing list -- netcf-***@lists.fedorahosted.org
To unsubscribe send
Laine Stump
2017-01-12 16:27:40 UTC
Post by Jan Gutter
When enumerating devices on systems with a large amount of
network virtual functions, the netlink receive buffer could
be too small and the message gets truncated.
This patch enables peeking: libnl will first query the buffer
size, expand the receive buffer to the correct size, then
receive the full buffer.
8c70d04bab7278c96390a913fa949a17cd3124f9
The difference between the two situations is that in libvirt's case,
libvirt itself is actually using a libnl socket to send/receive netlink
messages, so it can be expected that it needs to set the proper options
for buffer size, but netcf never explicitly sends/receives any netlink
messages - instead it sets up link and IP address caches (which are
filled in by libnl). This is a minor distinction, but still important -
if libnl is reading netlink messages for itself, it needs to be making
sure that it's properly setting up the netlink sockets it uses
internally to read the entire message. For that reason, upstream libnl
has recently been changed so that message peeking is turned on by
default (and a RHEL build with that change will be released soon).
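The point here is that whichever code actually reads the socket must make sure each message arrives whole. The sketch below shows one way a library could guarantee that internally. It is an assumption for illustration, not libnl's actual receive code. The loop peeks into a buffer and doubles the buffer until the kernel stops setting MSG_TRUNC in msg_flags. Only then does it consume the message. This variant needs no up-front length query. `recv_grow_and_retry()` and `demo_grow()` are hypothetical names, and a UNIX datagram socketpair stands in for a netlink socket.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receive one datagram without truncation: peek into a buffer that
 * doubles until the kernel stops setting MSG_TRUNC in msg_flags, then
 * consume the datagram with a normal receive. Returns a malloc'd
 * buffer (caller frees) and sets *len, or NULL on error. */
static char *recv_grow_and_retry(int fd, size_t initial, size_t *len)
{
    size_t cap = initial > 0 ? initial : 1;
    char *buf = malloc(cap);

    while (buf != NULL) {
        struct iovec iov = { .iov_base = buf, .iov_len = cap };
        struct msghdr msg;

        memset(&msg, 0, sizeof msg);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;

        /* MSG_PEEK: the datagram stays queued whether or not it fits. */
        if (recvmsg(fd, &msg, MSG_PEEK) < 0)
            break;

        if (!(msg.msg_flags & MSG_TRUNC)) {
            /* It fits, so the consuming read gets the whole message. */
            ssize_t got = recv(fd, buf, cap, 0);
            if (got < 0)
                break;
            *len = (size_t)got;
            return buf;
        }

        /* Truncated: double the buffer and peek again. */
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL)
            break;
        buf = bigger;
        cap *= 2;
    }
    free(buf);
    return NULL;
}

/* Hypothetical demo helper: send one 'size'-byte datagram over a UNIX
 * datagram socketpair and receive it with recv_grow_and_retry(),
 * starting from an 'initial'-byte buffer. Returns bytes received or -1. */
static ssize_t demo_grow(size_t size, size_t initial)
{
    int sv[2];
    ssize_t result = -1;

    assert(size > 0);
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    char *msg = malloc(size);
    if (msg != NULL) {
        memset(msg, 'y', size);
        if (send(sv[0], msg, size, 0) == (ssize_t)size) {
            size_t len = 0;
            char *buf = recv_grow_and_retry(sv[1], initial, &len);
            if (buf != NULL) {
                result = (ssize_t)len;
                free(buf);
            }
        }
        free(msg);
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Doubling keeps the number of peeks logarithmic in the message size. Starting from 4096 bytes, a 20000-byte message is read after four peeks (4096, 8192, 16384 and 32768 bytes), followed by one consuming receive.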

Here is the upstream libnl commit that causes the increase in message size:

https://github.com/thom311/libnl/commit/90c6ebec9bd7adbe6dc7aca114b4304c1ba02f6d

and here is the commit that turns on message peeking by default:

https://github.com/thom311/libnl/commit/55ea6e6b6cd805f441b410971c9dd7575e783ef4

After all that background, though, I don't have a problem with
explicitly enabling message peeking in netcf too - it will eliminate bug
reports for anyone who has a libnl build that is from between those two
commits (but also has a new enough netcf). If your aim is to see this
fixed in RHEL or CentOS though, you're going to see the problem fixed
sooner if you just wait for the libnl update.

ACK to the patch. I added the information about the libnl commits to the
commit log, and added your name/email to the AUTHORS file, then pushed
it. Thanks for the contribution!
Jan Gutter
2017-01-13 11:53:12 UTC
Post by Laine Stump
For that reason, upstream libnl has recently been changed so that message
peeking is turned on by default (and a RHEL build with that change will be
released soon).
That's Alexander's way of cutting the Gordian Knot. I'm pretty sure
that the number of applications that require high-performance netlink is
far smaller than the number of applications that could be hit by an
accidental truncation.
Post by Laine Stump
ACK to the patch. I added the information about the libnl commits to the
commit log, and added your name/email to the AUTHORS file, then pushed it.
Thanks for the contribution!
Thanks for the quick response!

Jan
