[v6,11/11] net/9p: allocate appropriate reduced message buffers

Message ID 3f51590535dc96ed0a165b8218c57639cfa5c36c.1657920926.git.linux_oss@crudebyte.com
State Not Applicable
Delegated to: Netdev Maintainers
Series: remove msize limit in virtio transport

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Series has a cover letter
netdev/patch_count success
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4 this patch: 4
netdev/cc_maintainers warning 4 maintainers not CCed: pabeni@redhat.com kuba@kernel.org davem@davemloft.net edumazet@google.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning CHECK: Alignment should match open parenthesis
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Christian Schoenebeck July 15, 2022, 9:33 p.m. UTC
So far 'msize' was simply used for all 9p message types, which is far
too much and slowed down performance tremendously with large values
for the user-configurable 'msize' option.

Let's stop this waste by using the new p9_msg_buf_size() function for
allocating more appropriate, smaller buffers according to what is
actually sent over the wire.
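
To put the waste into perspective (the msize figure below is only an
illustrative assumption, not taken from the patch): a Tclunk message
occupies a fixed 11 bytes on the wire (size[4] type[1] tag[2] fid[4]),
yet so far its buffer was allocated at the full negotiated 'msize'.
Simplified from the p9_tag_alloc() hunk below:

	/* before: both buffers capped only by the negotiated 'msize' */
	int alloc_tsize = min(c->msize, t_size); /* e.g. 128 KiB for 11 bytes */
	int alloc_rsize = min(c->msize, r_size);

	/* after: callers passing 0 get a per-message upper bound derived
	 * from the format template instead */
	alloc_tsize = min_t(size_t, c->msize,
			    t_size ?: p9_msg_buf_size(c, type, fmt, apc));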

Only exception: the RDMA transport is currently excluded from this message
size optimization - for its response buffers, that is - as RDMA transport
would not cope with it, due to its response buffers being pulled from a
shared pool. [1]

Link: https://lore.kernel.org/all/Ys3jjg52EIyITPua@codewreck.org/ [1]
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---
 net/9p/client.c | 42 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)
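
The "only exception" above boils down to a single per-transport flag
check; this is the gist of the p9_client_rpc() hunk from the patch at
the bottom of this page (the pooled_rbuffers transport flag is
introduced earlier in this series):

	/* request buffers can always be right-sized; response buffers only
	 * on transports that do not serve replies from a shared pool */
	const uint tsize = 0;
	const uint rsize = c->trans_mod->pooled_rbuffers ? c->msize : 0;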

Comments

Jason Gunthorpe Oct. 17, 2022, 5:03 p.m. UTC | #1
On Fri, Jul 15, 2022 at 11:33:56PM +0200, Christian Schoenebeck wrote:
> So far 'msize' was simply used for all 9p message types, which is far
> too much and slowed down performance tremendously with large values
> for the user-configurable 'msize' option.
> 
> Let's stop this waste by using the new p9_msg_buf_size() function for
> allocating more appropriate, smaller buffers according to what is
> actually sent over the wire.
> 
> Only exception: the RDMA transport is currently excluded from this message
> size optimization - for its response buffers, that is - as RDMA transport
> would not cope with it, due to its response buffers being pulled from a
> shared pool. [1]
> 
> Link: https://lore.kernel.org/all/Ys3jjg52EIyITPua@codewreck.org/ [1]
> Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
> ---
>  net/9p/client.c | 42 +++++++++++++++++++++++++++++++++++-------
>  1 file changed, 35 insertions(+), 7 deletions(-)

It took me a while to sort out, but for any others - this patch is
incompatible with qemu 5.0. It starts working again after this qemu
patch:

commit cf45183b718f02b1369e18c795dc51bc1821245d
Author: Stefano Stabellini <stefano.stabellini@xilinx.com>
Date:   Thu May 21 12:26:25 2020 -0700

    Revert "9p: init_in_iov_from_pdu can truncate the size"
    
    This reverts commit 16724a173049ac29c7b5ade741da93a0f46edff7.
    It causes https://bugs.launchpad.net/bugs/1877688.
    
    Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
    Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
    Message-Id: <20200521192627.15259-1-sstabellini@kernel.org>
    Signed-off-by: Greg Kurz <groug@kaod.org>

It causes something like this:

# modprobe ib_cm
qemu-system-x86_64: VirtFS reply type 117 needs 17 bytes, buffer has 17, less than minimum

Jason
Christian Schoenebeck Oct. 17, 2022, 6:03 p.m. UTC | #2
On Monday, October 17, 2022 7:03:11 PM CEST Jason Gunthorpe wrote:
> On Fri, Jul 15, 2022 at 11:33:56PM +0200, Christian Schoenebeck wrote:
> > So far 'msize' was simply used for all 9p message types, which is far
> > too much and slowed down performance tremendously with large values
> > for the user-configurable 'msize' option.
> > 
> > Let's stop this waste by using the new p9_msg_buf_size() function for
> > allocating more appropriate, smaller buffers according to what is
> > actually sent over the wire.
> > 
> > Only exception: the RDMA transport is currently excluded from this message
> > size optimization - for its response buffers, that is - as RDMA transport
> > would not cope with it, due to its response buffers being pulled from a
> > shared pool. [1]
> > 
> > Link: https://lore.kernel.org/all/Ys3jjg52EIyITPua@codewreck.org/ [1]
> > Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
> > ---
> > 
> >  net/9p/client.c | 42 +++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 35 insertions(+), 7 deletions(-)
> 
> It took me a while to sort out, but for any others - this patch is
> incompatible with qemu 5.0. It starts working again after this qemu
> patch:
> 
> commit cf45183b718f02b1369e18c795dc51bc1821245d
> Author: Stefano Stabellini <stefano.stabellini@xilinx.com>
> Date:   Thu May 21 12:26:25 2020 -0700
> 
>     Revert "9p: init_in_iov_from_pdu can truncate the size"
> 
>     This reverts commit 16724a173049ac29c7b5ade741da93a0f46edff7.
>     It causes https://bugs.launchpad.net/bugs/1877688.
> 
>     Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
>     Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
>     Message-Id: <20200521192627.15259-1-sstabellini@kernel.org>
>     Signed-off-by: Greg Kurz <groug@kaod.org>
> 
> It causes something like this:
> 
> # modprobe ib_cm
> qemu-system-x86_64: VirtFS reply type 117 needs 17 bytes, buffer has 17,
> less than minimum

The 9p server in QEMU 5.0 was broken by the aforementioned (reverted) QEMU
patch, and it was already fixed in the stable release 5.0.1.

It is not that the recent kernel patch breaks behaviour; rather, it triggers
that (short-lived) QEMU bug more reliably, as the 9p client now uses smaller
messages more often. But even without this kernel patch you would still get a
QEMU hang with short I/O. So it is not a good idea to keep using that
particular, old QEMU version; please update to at least QEMU 5.0.1.

Best regards,
Christian Schoenebeck

Patch

diff --git a/net/9p/client.c b/net/9p/client.c
index 32a8f2f43479..f068f4b656b5 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -255,19 +255,35 @@  static struct kmem_cache *p9_req_cache;
  * p9_tag_alloc - Allocate a new request.
  * @c: Client session.
  * @type: Transaction type.
- * @t_size: Buffer size for holding this request.
- * @r_size: Buffer size for holding server's reply on this request.
+ * @t_size: Buffer size for holding this request
+ * (automatic calculation by format template if 0).
+ * @r_size: Buffer size for holding server's reply on this request
+ * (automatic calculation by format template if 0).
+ * @fmt: Format template for assembling 9p request message
+ * (see p9pdu_vwritef).
+ * @ap: Variable arguments to be fed to passed format template
+ * (see p9pdu_vwritef).
  *
  * Context: Process context.
  * Return: Pointer to new request.
  */
 static struct p9_req_t *
-p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size)
+p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size,
+	      const char *fmt, va_list ap)
 {
 	struct p9_req_t *req = kmem_cache_alloc(p9_req_cache, GFP_NOFS);
-	int alloc_tsize = min(c->msize, t_size);
-	int alloc_rsize = min(c->msize, r_size);
+	int alloc_tsize;
+	int alloc_rsize;
 	int tag;
+	va_list apc;
+
+	va_copy(apc, ap);
+	alloc_tsize = min_t(size_t, c->msize,
+			    t_size ?: p9_msg_buf_size(c, type, fmt, apc));
+	va_end(apc);
+
+	alloc_rsize = min_t(size_t, c->msize,
+			    r_size ?: p9_msg_buf_size(c, type + 1, fmt, ap));
 
 	if (!req)
 		return ERR_PTR(-ENOMEM);
@@ -685,6 +701,7 @@  static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 {
 	int err;
 	struct p9_req_t *req;
+	va_list apc;
 
 	p9_debug(P9_DEBUG_MUX, "client %p op %d\n", c, type);
 
@@ -696,7 +713,9 @@  static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 	if (c->status == BeginDisconnect && type != P9_TCLUNK)
 		return ERR_PTR(-EIO);
 
-	req = p9_tag_alloc(c, type, t_size, r_size);
+	va_copy(apc, ap);
+	req = p9_tag_alloc(c, type, t_size, r_size, fmt, apc);
+	va_end(apc);
 	if (IS_ERR(req))
 		return req;
 
@@ -731,9 +750,18 @@  p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	int sigpending, err;
 	unsigned long flags;
 	struct p9_req_t *req;
+	/* Passing zero for tsize/rsize to p9_client_prepare_req() tells it to
+	 * auto determine an appropriate (small) request/response size
+	 * according to actual message data being sent. Currently RDMA
+	 * transport is excluded from this response message size optimization,
+	 * as it would not cope with it, due to its pooled response buffers
+	 * (using an optimized request size for RDMA as well though).
+	 */
+	const uint tsize = 0;
+	const uint rsize = c->trans_mod->pooled_rbuffers ? c->msize : 0;
 
 	va_start(ap, fmt);
-	req = p9_client_prepare_req(c, type, c->msize, c->msize, fmt, ap);
+	req = p9_client_prepare_req(c, type, tsize, rsize, fmt, ap);
 	va_end(ap);
 	if (IS_ERR(req))
 		return req;
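
For a concrete picture of the effect, take an existing caller such as
p9_client_clunk() (the msize figure is only an example):

	/* unchanged call site in net/9p/client.c */
	req = p9_client_rpc(clnt, P9_TCLUNK, "d", fid->fid);

	/* before: p9_client_rpc() passed c->msize for both buffer sizes, so
	 * an 11-byte Tclunk / 7-byte Rclunk pair could pin e.g. 2 x 128 KiB.
	 * after: tsize/rsize of 0 let p9_tag_alloc() derive a small per-type
	 * upper bound from the "d" format template via p9_msg_buf_size(). */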