Message ID | 20230324104720.2498-1-fw@strlen.de (mailing list archive)
---|---
State | New, archived
Series | [1/1] nbd/server: push pending frames after sending reply

Am 24.03.2023 um 11:47 hat Florian Westphal geschrieben:
> qemu-nbd doesn't set TCP_NODELAY on the tcp socket.
>
> Kernel waits for more data and avoids transmission of small packets.
> Without TLS this is barely noticeable, but with TLS this really shows.
>
> Booting a VM via qemu-nbd on localhost (with tls) takes more than
> 2 minutes on my system. tcpdump shows frequent wait periods, where no
> packets get sent for a 40ms period.
>
> Add explicit (un)corking when processing (and responding to) requests.
> "TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data.
>
> VM Boot time:
> main:    no tls: 23s, with tls: 2m45s
> patched: no tls: 14s, with tls: 15s
>
> VM Boot time, qemu-nbd via network (same lan):
> main:    no tls: 18s, with tls: 1m50s
> patched: no tls: 17s, with tls: 18s
>
> Future optimization: if we could detect if there is another pending
> request we could defer the uncork operation because more data would be
> appended.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  nbd/server.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/nbd/server.c b/nbd/server.c
> index a4750e41880a..848836d41405 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2667,6 +2667,8 @@ static coroutine_fn void nbd_trip(void *opaque)
>          goto disconnect;
>      }
>
> +    qio_channel_set_cork(client->ioc, true);
> +
>      if (ret < 0) {
>          /* It wasn't -EIO, so, according to nbd_co_receive_request()
>           * semantics, we should return the error to the client. */
> @@ -2692,6 +2694,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>          goto disconnect;
>      }
>
> +    qio_channel_set_cork(client->ioc, false);
>  done:
>      nbd_request_put(req);
>      nbd_client_put(client);

In the error paths, we never call set_cork(false) again. I suppose the
reason that this is okay is because the next thing is actually that we
close the socket?

Kevin

Kevin Wolf <kwolf@redhat.com> wrote:
> Am 24.03.2023 um 11:47 hat Florian Westphal geschrieben:
> > +    qio_channel_set_cork(client->ioc, true);
> > +
> >      if (ret < 0) {
> >          /* It wasn't -EIO, so, according to nbd_co_receive_request()
> >           * semantics, we should return the error to the client. */
> > @@ -2692,6 +2694,7 @@ static coroutine_fn void nbd_trip(void *opaque)
> >          goto disconnect;
> >      }
> >
> > +    qio_channel_set_cork(client->ioc, false);
> >  done:
> >      nbd_request_put(req);
> >      nbd_client_put(client);
>
> In the error paths, we never call set_cork(false) again. I suppose the
> reason that this is okay is because the next thing is actually that we
> close the socket?

Yes, no need to uncork before close.

On Fri, Mar 24, 2023 at 11:47:20AM +0100, Florian Westphal wrote:
> qemu-nbd doesn't set TCP_NODELAY on the tcp socket.
>
> Kernel waits for more data and avoids transmission of small packets.
> Without TLS this is barely noticeable, but with TLS this really shows.
>
> Booting a VM via qemu-nbd on localhost (with tls) takes more than
> 2 minutes on my system. tcpdump shows frequent wait periods, where no
> packets get sent for a 40ms period.

Thank you for this analysis.

>
> Add explicit (un)corking when processing (and responding to) requests.
> "TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data.
>
> VM Boot time:
> main:    no tls: 23s, with tls: 2m45s
> patched: no tls: 14s, with tls: 15s
>
> VM Boot time, qemu-nbd via network (same lan):
> main:    no tls: 18s, with tls: 1m50s
> patched: no tls: 17s, with tls: 18s

And the timings bear proof that it matters.

>
> Future optimization: if we could detect if there is another pending
> request we could defer the uncork operation because more data would be
> appended.

nbdkit and libnbd do this with the MSG_MORE flag (plaintext) and TLS
corking (tls); when building up a message to the other side, a flag is
set any time we know we are likely to send more data very shortly.
nbdkit wraps it under a flag SEND_MORE, which applies to both plaintext:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/server/connections.c#L415
and to TLS connections:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/server/crypto.c#L396
while libnbd uses MSG_MORE a bit more directly for the same purpose for
plaintext, but isn't (yet) doing TLS corking:
https://gitlab.com/nbdkit/libnbd/-/blob/master/generator/states-issue-command.c#L53
https://gitlab.com/nbdkit/libnbd/-/blob/master/lib/internal.h#L57

I would love to see a future patch to qio_channel code to support
MSG_MORE in the same way as nbdkit is using its SEND_MORE flag, as the
caller often has more info on whether it is sending a short prefix or
is done with a conceptual message and ready to uncork, and where the
use of a flag can be more efficient than separate passes through
cork/uncork calls.  But even your initial work at properly corking is
a good step in the right direction.

And surprisingly, qemu IS using corking on the client side:
https://gitlab.com/qemu-project/qemu/-/blob/master/block/nbd.c#L525
just not on the server side, before your patch.

>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  nbd/server.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/nbd/server.c b/nbd/server.c
> index a4750e41880a..848836d41405 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2667,6 +2667,8 @@ static coroutine_fn void nbd_trip(void *opaque)
>          goto disconnect;
>      }
>
> +    qio_channel_set_cork(client->ioc, true);
> +
>      if (ret < 0) {
>          /* It wasn't -EIO, so, according to nbd_co_receive_request()
>           * semantics, we should return the error to the client. */
> @@ -2692,6 +2694,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>          goto disconnect;
>      }
>
> +    qio_channel_set_cork(client->ioc, false);

Reviewed-by: Eric Blake <eblake@redhat.com>

>  done:
>      nbd_request_put(req);
>      nbd_client_put(client);
> --
> 2.39.2
>

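For readers unfamiliar with the technique Eric describes: the idea is to tell the kernel, per send, whether more data is about to follow, so a reply header and its payload can share a TCP segment even with TCP_NODELAY enabled. Below is a minimal, hypothetical sketch of that pattern in plain C; it is not nbdkit's SEND_MORE code or QEMU's qio_channel API, and `send_hint` is an invented helper name.

```c
/* Hypothetical sketch of the MSG_MORE hint pattern (Linux-specific);
 * not nbdkit or QEMU code. */
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>

static ssize_t send_hint(int fd, const void *buf, size_t len, bool more)
{
    /* MSG_NOSIGNAL: don't raise SIGPIPE if the peer already closed.
     * MSG_MORE: ask the kernel to hold this data until the next send. */
    int flags = MSG_NOSIGNAL | (more ? MSG_MORE : 0);
    return send(fd, buf, len, flags);
}
```

A caller would send the reply header with `more = true` and the final payload chunk with `more = false`, so the last send flushes everything that was held back, without an explicit cork/uncork pair.
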
On Fri, Mar 24, 2023 at 07:20:17PM +0100, Florian Westphal wrote:
> Kevin Wolf <kwolf@redhat.com> wrote:
> > Am 24.03.2023 um 11:47 hat Florian Westphal geschrieben:
> > > +    qio_channel_set_cork(client->ioc, true);
> > > +
> > >      if (ret < 0) {
> > >          /* It wasn't -EIO, so, according to nbd_co_receive_request()
> > >           * semantics, we should return the error to the client. */
> > > @@ -2692,6 +2694,7 @@ static coroutine_fn void nbd_trip(void *opaque)
> > >          goto disconnect;
> > >      }
> > >
> > > +    qio_channel_set_cork(client->ioc, false);
> > >  done:
> > >      nbd_request_put(req);
> > >      nbd_client_put(client);
> >
> > In the error paths, we never call set_cork(false) again. I suppose the
> > reason that this is okay is because the next thing is actually that we
> > close the socket?
>
> Yes, no need to uncork before close.

Uncorking may let the close be cleaner (especially in TLS, it is nicer
to let the connection send the necessary handshaking that the
connection is intentionally going down, rather than being abruptly
ripped out by a man-in-the-middle attacker), but NBD already has to
gracefully handle abrupt connection teardown, so we aren't losing
anything if the cork delays the close or if corked data that would
have given a clean shutdown is lost.  If we teach qio_channel to honor
MSG_MORE as a way to do auto-corking without explicit cork/uncork
calls, that may take care of itself.  But that's a bigger project,
while this patch seems safe enough as-is.

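As a point of reference for the "necessary handshaking" mentioned above: a clean TLS shutdown means sending a close_notify alert before closing the socket. A rough GnuTLS-level sketch follows, for illustration only; QEMU's actual teardown goes through its qio_channel wrappers rather than direct GnuTLS calls, and `tls_close_clean` is a made-up name.

```c
/* Illustrative only: a "clean" TLS teardown at the GnuTLS level.
 * Skipping the alert is the "abrupt teardown" case discussed above,
 * which NBD peers must tolerate anyway. */
#include <gnutls/gnutls.h>
#include <unistd.h>

static void tls_close_clean(gnutls_session_t session, int fd)
{
    /* close_notify lets the peer distinguish an intentional shutdown
     * from a connection truncated by an attacker or a crash. */
    gnutls_bye(session, GNUTLS_SHUT_WR);
    close(fd);
}
```
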
On Fri, Mar 24, 2023 at 02:41:20PM -0500, Eric Blake wrote:
> On Fri, Mar 24, 2023 at 11:47:20AM +0100, Florian Westphal wrote:
> > qemu-nbd doesn't set TCP_NODELAY on the tcp socket.

Replying to myself, WHY aren't we setting TCP_NODELAY on the socket?

>
> And surprisingly, qemu IS using corking on the client side:
> https://gitlab.com/qemu-project/qemu/-/blob/master/block/nbd.c#L525
> just not on the server side, before your patch.

Corking matters more when TCP_NODELAY is enabled.  The entire reason
Nagle's algorithm exists (and is default on unless you enable
TCP_NODELAY) is that merging smaller piecemeal packets into larger
traffic is a lot easier to do in a common location for code that isn't
super-sensitive to latency and message boundaries.  But once you get
to the point where corking or MSG_MORE makes a difference, then it is
obvious you know your message boundaries, and will benefit from
TCP_NODELAY, at the expense of potentially more network traffic
overhead.

One more code search, and I find that we use TCP_NODELAY in all of:

qemu client:
https://gitlab.com/qemu-project/qemu/-/blob/master/nbd/client-connection.c#L143

nbdkit:
https://gitlab.com/nbdkit/nbdkit/-/blob/master/server/sockets.c#L430

libnbd:
https://gitlab.com/nbdkit/libnbd/-/blob/master/generator/states-connect.c#L41

so I think we _should_ be calling qio_channel_set_delay(false) for
qemu-nbd as well.  That doesn't negate your patch, but rather argues
that we can go for even better performance with TCP_NODELAY also
turned on.

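For illustration, disabling Nagle on a TCP socket is a one-line setsockopt; qio_channel_set_delay(ioc, false) is QEMU's portable wrapper for roughly this effect. The sketch below is not QEMU's implementation, and `disable_nagle` is a hypothetical helper.

```c
/* Sketch: turn off Nagle's algorithm so small writes are transmitted
 * immediately instead of waiting for more data to coalesce. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static int disable_nagle(int fd)
{
    int on = 1;
    /* Returns 0 on success, -1 with errno set on failure. */
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```
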
Eric Blake <eblake@redhat.com> wrote:
> On Fri, Mar 24, 2023 at 02:41:20PM -0500, Eric Blake wrote:
> > On Fri, Mar 24, 2023 at 11:47:20AM +0100, Florian Westphal wrote:
> > > qemu-nbd doesn't set TCP_NODELAY on the tcp socket.
>
> Replying to myself, WHY aren't we setting TCP_NODELAY on the socket?

If the application protocol is strictly or mostly request/response
then I agree that qemu-nbd should turn nagle off as well.

Am 24.03.2023 um 11:47 hat Florian Westphal geschrieben:
> qemu-nbd doesn't set TCP_NODELAY on the tcp socket.
>
> Kernel waits for more data and avoids transmission of small packets.
> Without TLS this is barely noticeable, but with TLS this really shows.
>
> Booting a VM via qemu-nbd on localhost (with tls) takes more than
> 2 minutes on my system. tcpdump shows frequent wait periods, where no
> packets get sent for a 40ms period.
>
> Add explicit (un)corking when processing (and responding to) requests.
> "TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data.
>
> VM Boot time:
> main:    no tls: 23s, with tls: 2m45s
> patched: no tls: 14s, with tls: 15s
>
> VM Boot time, qemu-nbd via network (same lan):
> main:    no tls: 18s, with tls: 1m50s
> patched: no tls: 17s, with tls: 18s
>
> Future optimization: if we could detect if there is another pending
> request we could defer the uncork operation because more data would be
> appended.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Thanks, applied to the block branch.

Kevin

diff --git a/nbd/server.c b/nbd/server.c
index a4750e41880a..848836d41405 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2667,6 +2667,8 @@ static coroutine_fn void nbd_trip(void *opaque)
         goto disconnect;
     }
 
+    qio_channel_set_cork(client->ioc, true);
+
     if (ret < 0) {
         /* It wasn't -EIO, so, according to nbd_co_receive_request()
          * semantics, we should return the error to the client. */
@@ -2692,6 +2694,7 @@ static coroutine_fn void nbd_trip(void *opaque)
         goto disconnect;
     }
 
+    qio_channel_set_cork(client->ioc, false);
 done:
     nbd_request_put(req);
     nbd_client_put(client);

qemu-nbd doesn't set TCP_NODELAY on the tcp socket.

Kernel waits for more data and avoids transmission of small packets.
Without TLS this is barely noticeable, but with TLS this really shows.

Booting a VM via qemu-nbd on localhost (with tls) takes more than
2 minutes on my system. tcpdump shows frequent wait periods, where no
packets get sent for a 40ms period.

Add explicit (un)corking when processing (and responding to) requests.
"TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data.

VM Boot time:
main:    no tls: 23s, with tls: 2m45s
patched: no tls: 14s, with tls: 15s

VM Boot time, qemu-nbd via network (same lan):
main:    no tls: 18s, with tls: 1m50s
patched: no tls: 17s, with tls: 18s

Future optimization: if we could detect if there is another pending
request we could defer the uncork operation because more data would be
appended.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 nbd/server.c | 3 +++
 1 file changed, 3 insertions(+)
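The commit message's `"TCP_CORK, &zero" after earlier "CORK, &one"` wording refers to the Linux socket-level pattern sketched below. The patch itself uses qio_channel_set_cork(), QEMU's portable wrapper; the raw setsockopt calls and the `tcp_set_cork` helper name here are for illustration only.

```c
/* Sketch of the Linux TCP_CORK pattern behind qio_channel_set_cork().
 * While corked, the kernel holds back partial frames; setting the
 * option back to 0 (or a roughly 200 ms timeout) flushes whatever is
 * pending. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static void tcp_set_cork(int fd, int corked)
{
    /* corked = 1 before assembling a reply, 0 once it is complete. */
    setsockopt(fd, IPPROTO_TCP, TCP_CORK, &corked, sizeof(corked));
}
```
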