[v5,11/11] net/9p: allocate appropriate reduced message buffers

Message ID 5fb0bcc402e032cbc0779f428be5797cddfd291c.1657636554.git.linux_oss@crudebyte.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series: remove msize limit in virtio transport

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4 this patch: 4
netdev/cc_maintainers warning 4 maintainers not CCed: pabeni@redhat.com kuba@kernel.org davem@davemloft.net edumazet@google.com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning CHECK: Alignment should match open parenthesis WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Christian Schoenebeck July 12, 2022, 2:31 p.m. UTC
So far the full 'msize' was simply used as buffer size for all 9p message
types, which is far more than necessary and slowed down performance
tremendously for large values of the user-configurable 'msize' option.

Let's stop this waste by using the new p9_msg_buf_size() function for
allocating more appropriate, smaller buffers according to what is
actually sent over the wire.

Only exception: RDMA transport is currently excluded from this, as
it would not cope with it. [1]

Link: https://lore.kernel.org/all/YkmVI6pqTuMD8dVi@codewreck.org/ [1]
Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
---

Is the !strcmp(c->trans_mod->name, "rdma") check in this patch maybe a bit
too hack-ish? Should there rather be a transport API extension instead?

 net/9p/client.c | 39 ++++++++++++++++++++++++++++++++-------
 1 file changed, 32 insertions(+), 7 deletions(-)

Comments

Dominique Martinet July 12, 2022, 7:33 p.m. UTC | #1
Christian Schoenebeck wrote on Tue, Jul 12, 2022 at 04:31:36PM +0200:
> So far 'msize' was simply used for all 9p message types, which is far
> too much and slowed down performance tremendously with large values
> for user configurable 'msize' option.
> 
> Let's stop this waste by using the new p9_msg_buf_size() function for
> allocating more appropriate, smaller buffers according to what is
> actually sent over the wire.
> 
> Only exception: RDMA transport is currently excluded from this, as
> it would not cope with it. [1]
> 
> Link: https://lore.kernel.org/all/YkmVI6pqTuMD8dVi@codewreck.org/ [1]
> Signed-off-by: Christian Schoenebeck <linux_oss@crudebyte.com>
> ---
> 
> Is the !strcmp(c->trans_mod->name, "rdma") check in this patch maybe a bit
> too hack-ish? Should there rather be transport API extension instead?

hmm yeah that doesn't feel great, let's add a flag to struct
p9_trans_module

--
Dominique
Dominique Martinet July 12, 2022, 9:11 p.m. UTC | #2
Dominique Martinet wrote on Wed, Jul 13, 2022 at 04:33:35AM +0900:
> Christian Schoenebeck wrote on Tue, Jul 12, 2022 at 04:31:36PM +0200:
> > So far 'msize' was simply used for all 9p message types, which is far
> > too much and slowed down performance tremendously with large values
> > for user configurable 'msize' option.
> > 
> > Let's stop this waste by using the new p9_msg_buf_size() function for
> > allocating more appropriate, smaller buffers according to what is
> > actually sent over the wire.
> > 
> > Only exception: RDMA transport is currently excluded from this, as
> > it would not cope with it. [1]

Thinking back on RDMA:
- vs. one or two buffers as discussed in another thread, rdma will still
require two buffers, we post the receive buffer before sending as we
could otherwise be raced (reply from server during the time it'd take to
recycle the send buffer)
In practice the recv buffers should act like a fifo and we might be able
to post the buffer we're about to send for recv before sending it and it
shouldn't be overwritten until it's sent, but that doesn't look quite good.

- for this particular patch, we can still allocate smaller short buffers
for requests, so we should probably keep tsize to 0.
rsize there really isn't much we can do without a protocol change
though...

--
Dominique
Christian Schoenebeck July 13, 2022, 9:19 a.m. UTC | #3
On Dienstag, 12. Juli 2022 23:11:42 CEST Dominique Martinet wrote:
> Dominique Martinet wrote on Wed, Jul 13, 2022 at 04:33:35AM +0900:
> > Christian Schoenebeck wrote on Tue, Jul 12, 2022 at 04:31:36PM +0200:
> > > So far 'msize' was simply used for all 9p message types, which is far
> > > too much and slowed down performance tremendously with large values
> > > for user configurable 'msize' option.
> > > 
> > > Let's stop this waste by using the new p9_msg_buf_size() function for
> > > allocating more appropriate, smaller buffers according to what is
> > > actually sent over the wire.
> > > 
> > > Only exception: RDMA transport is currently excluded from this, as
> > > it would not cope with it. [1]
> 
> Thinking back on RDMA:
> - vs. one or two buffers as discussed in another thread, rdma will still
> require two buffers, we post the receive buffer before sending as we
> could otherwise be raced (reply from server during the time it'd take to
> recycle the send buffer)
> In practice the recv buffers should act like a fifo and we might be able
> to post the buffer we're about to send for recv before sending it and it
> shouldn't be overwritten until it's sent, but that doesn't look quite good.
> 
> - for this particular patch, we can still allocate smaller short buffers
> for requests, so we should probably keep tsize to 0.
> rsize there really isn't much we can do without a protocol change
> though...

Good to know! I don't have any RDMA setup here to test, so I rely on what you 
say and adjust this in v6 accordingly, along with the strcmp -> flag change of 
course.

As this flag is going to be very RDMA-transport specific, I'm still scratching 
my head for a good name though.

Best regards,
Christian Schoenebeck
Christian Schoenebeck July 13, 2022, 9:29 a.m. UTC | #4
On Mittwoch, 13. Juli 2022 11:19:48 CEST Christian Schoenebeck wrote:
> On Dienstag, 12. Juli 2022 23:11:42 CEST Dominique Martinet wrote:
> > Dominique Martinet wrote on Wed, Jul 13, 2022 at 04:33:35AM +0900:
> > > Christian Schoenebeck wrote on Tue, Jul 12, 2022 at 04:31:36PM +0200:
> > > > So far 'msize' was simply used for all 9p message types, which is far
> > > > too much and slowed down performance tremendously with large values
> > > > for user configurable 'msize' option.
> > > > 
> > > > Let's stop this waste by using the new p9_msg_buf_size() function for
> > > > allocating more appropriate, smaller buffers according to what is
> > > > actually sent over the wire.
> > > > 
> > > > Only exception: RDMA transport is currently excluded from this, as
> > > > it would not cope with it. [1]
> > 
> > Thinking back on RDMA:
> > - vs. one or two buffers as discussed in another thread, rdma will still
> > require two buffers, we post the receive buffer before sending as we
> > could otherwise be raced (reply from server during the time it'd take to
> > recycle the send buffer)
> > In practice the recv buffers should act like a fifo and we might be able
> > to post the buffer we're about to send for recv before sending it and it
> > shouldn't be overwritten until it's sent, but that doesn't look quite
> > good.
> > 
> > - for this particular patch, we can still allocate smaller short buffers
> > for requests, so we should probably keep tsize to 0.
> > rsize there really isn't much we can do without a protocol change
> > though...
> 
> Good to know! I don't have any RDMA setup here to test, so I rely on what
> you say and adjust this in v6 accordingly, along with the strcmp -> flag
> change of course.
> 
> As this flag is going to be very RDMA-transport specific, I'm still
> scratching my head for a good name though.

Or, instead of inventing some exotic flag name, maybe introducing an enum for 
the individual 9p transport types?
 
Best regards,
Christian Schoenebeck
Dominique Martinet July 13, 2022, 9:29 a.m. UTC | #5
Christian Schoenebeck wrote on Wed, Jul 13, 2022 at 11:19:48AM +0200:
> > - for this particular patch, we can still allocate smaller short buffers
> > for requests, so we should probably keep tsize to 0.
> > rsize there really isn't much we can do without a protocol change
> > though...
> 
> Good to know! I don't have any RDMA setup here to test, so I rely on what you 
> say and adjust this in v6 accordingly, along with the strcmp -> flag change of 
> course.

Yeah... I've got a connect-x 3 (mlx4, got a cheap old one) card laying
around, I need to find somewhere to plug it in and actually run some
validation again at some point.
Haven't used 9p/RDMA since I left my previous work in 2020...

I'll try to find time for that before the merge


> As this flag is going to be very RDMA-transport specific, I'm still scratching 
> my head for a good name though.

The actual limitation is that receive buffers are pooled, so something
like pooled_rcv_buffers or shared_rcv_buffers or anything along those
lines?

--
Dominique
Dominique Martinet July 13, 2022, 9:56 a.m. UTC | #6
Christian Schoenebeck wrote on Wed, Jul 13, 2022 at 11:29:13AM +0200:
> > As this flag is going to be very RDMA-transport specific, I'm still
> > scratching my head for a good name though.
> 
> Or, instead of inventing some exotic flag name, maybe introducing an enum for 
> the individual 9p transport types?

That works for me as well

--
Dominique
Christian Schoenebeck July 13, 2022, 10:22 a.m. UTC | #7
On Mittwoch, 13. Juli 2022 11:29:57 CEST Dominique Martinet wrote:
> Christian Schoenebeck wrote on Wed, Jul 13, 2022 at 11:19:48AM +0200:
> > > - for this particular patch, we can still allocate smaller short buffers
> > > for requests, so we should probably keep tsize to 0.
> > > rsize there really isn't much we can do without a protocol change
> > > though...
> > 
> > Good to know! I don't have any RDMA setup here to test, so I rely on what
> > you say and adjust this in v6 accordingly, along with the strcmp -> flag
> > change of course.
> 
> Yeah... I've got a connect-x 3 (mlx4, got a cheap old one) card laying
> around, I need to find somewhere to plug it in and actually run some
> validation again at some point.
> Haven't used 9p/RDMA since I left my previous work in 2020...
> 
> I'll try to find time for that before the merge
> 
> > As this flag is going to be very RDMA-transport specific, I'm still
> > scratching my head for a good name though.
> 
> The actual limitation is that receive buffers are pooled, so something
> to like pooled_rcv_buffers or shared_rcv_buffers or anything along that
> line?

OK, I'll go this way then, as it's the easiest to do, can easily be refactored 
in future if someone really cares, and it feels less like a hack than 
injecting "if transport == rdma" into client code directly.

Best regards,
Christian Schoenebeck

Patch

diff --git a/net/9p/client.c b/net/9p/client.c
index 56be1658870d..dc1a7b26fab4 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -255,19 +255,35 @@  static struct kmem_cache *p9_req_cache;
  * p9_tag_alloc - Allocate a new request.
  * @c: Client session.
  * @type: Transaction type.
- * @t_size: Buffer size for holding this request.
- * @r_size: Buffer size for holding server's reply on this request.
+ * @t_size: Buffer size for holding this request
+ * (automatic calculation by format template if 0).
+ * @r_size: Buffer size for holding server's reply on this request
+ * (automatic calculation by format template if 0).
+ * @fmt: Format template for assembling 9p request message
+ * (see p9pdu_vwritef).
+ * @ap: Variable arguments to be fed to passed format template
+ * (see p9pdu_vwritef).
  *
  * Context: Process context.
  * Return: Pointer to new request.
  */
 static struct p9_req_t *
-p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size)
+p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size,
+	      const char *fmt, va_list ap)
 {
 	struct p9_req_t *req = kmem_cache_alloc(p9_req_cache, GFP_NOFS);
-	int alloc_tsize = min(c->msize, t_size);
-	int alloc_rsize = min(c->msize, r_size);
+	int alloc_tsize;
+	int alloc_rsize;
 	int tag;
+	va_list apc;
+
+	va_copy(apc, ap);
+	alloc_tsize = min_t(size_t, c->msize,
+			    t_size ?: p9_msg_buf_size(c, type, fmt, apc));
+	va_end(apc);
+
+	alloc_rsize = min_t(size_t, c->msize,
+			    r_size ?: p9_msg_buf_size(c, type + 1, fmt, ap));
 
 	if (!req)
 		return ERR_PTR(-ENOMEM);
@@ -685,6 +701,7 @@  static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 {
 	int err;
 	struct p9_req_t *req;
+	va_list apc;
 
 	p9_debug(P9_DEBUG_MUX, "client %p op %d\n", c, type);
 
@@ -696,7 +713,9 @@  static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 	if (c->status == BeginDisconnect && type != P9_TCLUNK)
 		return ERR_PTR(-EIO);
 
-	req = p9_tag_alloc(c, type, t_size, r_size);
+	va_copy(apc, ap);
+	req = p9_tag_alloc(c, type, t_size, r_size, fmt, apc);
+	va_end(apc);
 	if (IS_ERR(req))
 		return req;
 
@@ -731,9 +750,15 @@  p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	int sigpending, err;
 	unsigned long flags;
 	struct p9_req_t *req;
+	/* Passing zero to p9_client_prepare_req() tells it to auto determine an
+	 * appropriate (small) request/response size according to actual message
+	 * data being sent. Currently RDMA transport is excluded from this message
+	 * size optimization, as it would not be able to cope with it.
+	 */
+	const uint max_size = !strcmp(c->trans_mod->name, "rdma") ? c->msize : 0;
 
 	va_start(ap, fmt);
-	req = p9_client_prepare_req(c, type, c->msize, c->msize, fmt, ap);
+	req = p9_client_prepare_req(c, type, max_size, max_size, fmt, ap);
 	va_end(ap);
 	if (IS_ERR(req))
 		return req;