Message ID | 62a4e09968a9a0f73780876dc6fb0f784bee5fae.1712923998.git.asml.silence@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | implement io_uring notification (ubuf_info) stacking | expand |
On 4/12/24 6:55 AM, Pavel Begunkov wrote: > We'll need to associate additional callbacks with ubuf_info, introduce > a structure holding ubuf_info callbacks. Apart from a more smarter > io_uring notification management introduced in next patches, it can be > used to generalise msg_zerocopy_put_abort() and also store > ->sg_from_iter, which is currently passed in struct msghdr. > > Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> > --- > drivers/net/tap.c | 2 +- > drivers/net/tun.c | 2 +- > drivers/vhost/net.c | 8 ++++++-- > include/linux/skbuff.h | 19 +++++++++++-------- > io_uring/notif.c | 8 ++++++-- > net/core/skbuff.c | 17 +++++++++++------ > 6 files changed, 36 insertions(+), 20 deletions(-) > Reviewed-by: David Ahern <dsahern@kernel.org>
Pavel Begunkov wrote: > We'll need to associate additional callbacks with ubuf_info, introduce > a structure holding ubuf_info callbacks. Apart from a more smarter > io_uring notification management introduced in next patches, it can be > used to generalise msg_zerocopy_put_abort() and also store > ->sg_from_iter, which is currently passed in struct msghdr. This adds an extra indirection for all other ubuf implementations. Can that be avoided?
On 4/14/24 18:07, Willem de Bruijn wrote: > Pavel Begunkov wrote: >> We'll need to associate additional callbacks with ubuf_info, introduce >> a structure holding ubuf_info callbacks. Apart from a more smarter >> io_uring notification management introduced in next patches, it can be >> used to generalise msg_zerocopy_put_abort() and also store >> ->sg_from_iter, which is currently passed in struct msghdr. > > This adds an extra indirection for all other ubuf implementations. > Can that be avoided? It could be fitted directly into ubuf_info, but that doesn't feel right. It should be hot, so does it even matter? On the bright side, with the patch I'll also ->sg_from_iter from msghdr into it, so it doesn't have to be in the generic path. I think it's the right approach, but if you have a strong opinion I can fit it as a new field in ubuf_info.
Pavel Begunkov wrote: > On 4/14/24 18:07, Willem de Bruijn wrote: > > Pavel Begunkov wrote: > >> We'll need to associate additional callbacks with ubuf_info, introduce > >> a structure holding ubuf_info callbacks. Apart from a more smarter > >> io_uring notification management introduced in next patches, it can be > >> used to generalise msg_zerocopy_put_abort() and also store > >> ->sg_from_iter, which is currently passed in struct msghdr. > > > > This adds an extra indirection for all other ubuf implementations. > > Can that be avoided? > > It could be fitted directly into ubuf_info, but that doesn't feel > right. It should be hot, so does it even matter? That depends on the workload (working set size)? > On the bright side, > with the patch I'll also ->sg_from_iter from msghdr into it, so it > doesn't have to be in the generic path. I don't follow this: is this suggested future work? > > I think it's the right approach, but if you have a strong opinion > I can fit it as a new field in ubuf_info. If there is a significant cost, I suppose we could use INDIRECT_CALL or go one step further and demultiplex based on the new ops if (uarg->ops == &msg_zerocopy_ubuf_ops) msg_zerocopy_callback(..);
On 4/15/24 16:06, Willem de Bruijn wrote: > Pavel Begunkov wrote: >> On 4/14/24 18:07, Willem de Bruijn wrote: >>> Pavel Begunkov wrote: >>>> We'll need to associate additional callbacks with ubuf_info, introduce >>>> a structure holding ubuf_info callbacks. Apart from a more smarter >>>> io_uring notification management introduced in next patches, it can be >>>> used to generalise msg_zerocopy_put_abort() and also store >>>> ->sg_from_iter, which is currently passed in struct msghdr. >>> >>> This adds an extra indirection for all other ubuf implementations. >>> Can that be avoided? >> >> It could be fitted directly into ubuf_info, but that doesn't feel >> right. It should be hot, so does it even matter? > > That depends on the workload (working set size)? >>> On the bright side, >> with the patch I'll also ->sg_from_iter from msghdr into it, so it >> doesn't have to be in the generic path. > > I don't follow this: is this suggested future work? Right, a small change I will add later. Without ops though having 3 callback fields in uargs would be out of hands. >> I think it's the right approach, but if you have a strong opinion >> I can fit it as a new field in ubuf_info. > > If there is a significant cost, I suppose we could use > INDIRECT_CALL or go one step further and demultiplex > based on the new ops > > if (uarg->ops == &msg_zerocopy_ubuf_ops) > msg_zerocopy_callback(..); Let me note that the patch doesn't change the number of indirect calls but only adds one extra deref to get the callback, i.e. uarg->ops->callback() instead of uarg->callback(). Your snippet goes an extra mile and removes the indirect call. Can I take it as that you're fine with the direction of the patch? Or do you want me to change anything?
Pavel Begunkov wrote: > On 4/15/24 16:06, Willem de Bruijn wrote: > > Pavel Begunkov wrote: > >> On 4/14/24 18:07, Willem de Bruijn wrote: > >>> Pavel Begunkov wrote: > >>>> We'll need to associate additional callbacks with ubuf_info, introduce > >>>> a structure holding ubuf_info callbacks. Apart from a more smarter > >>>> io_uring notification management introduced in next patches, it can be > >>>> used to generalise msg_zerocopy_put_abort() and also store > >>>> ->sg_from_iter, which is currently passed in struct msghdr. > >>> > >>> This adds an extra indirection for all other ubuf implementations. > >>> Can that be avoided? > >> > >> It could be fitted directly into ubuf_info, but that doesn't feel > >> right. It should be hot, so does it even matter? > > > > That depends on the workload (working set size)? > >>> On the bright side, > >> with the patch I'll also ->sg_from_iter from msghdr into it, so it > >> doesn't have to be in the generic path. > > > > I don't follow this: is this suggested future work? > > Right, a small change I will add later. Without ops though > having 3 callback fields in uargs would be out of hands. > > >> I think it's the right approach, but if you have a strong opinion > >> I can fit it as a new field in ubuf_info. > > > > If there is a significant cost, I suppose we could use > > INDIRECT_CALL or go one step further and demultiplex > > based on the new ops > > > > if (uarg->ops == &msg_zerocopy_ubuf_ops) > > msg_zerocopy_callback(..); > > Let me note that the patch doesn't change the number of indirect > calls but only adds one extra deref to get the callback, i.e. > uarg->ops->callback() instead of uarg->callback(). Of course. Didn't mean to imply otherwise. > Your snippet > goes an extra mile and removes the indirect call. > > Can I take it as that you're fine with the direction of the > patch? Or do you want me to change anything? It's fine. I want to avoid new paths slowing down existing code where possible. But if this extra deref would prove significant we have a workaround.
On 4/14/24 6:07 PM, Pavel Begunkov wrote: > On the bright side, > with the patch I'll also ->sg_from_iter from msghdr into it, so it > doesn't have to be in the generic path. So, what's old is new again? That's where it started: https://lore.kernel.org/netdev/20220628225204.GA27554@u2004-local/
On 4/16/24 15:50, David Ahern wrote: > On 4/14/24 6:07 PM, Pavel Begunkov wrote: >> On the bright side, >> with the patch I'll also ->sg_from_iter from msghdr into it, so it >> doesn't have to be in the generic path. > > So, what's old is new again? That's where it started: > > https://lore.kernel.org/netdev/20220628225204.GA27554@u2004-local/ Hah, indeed, your patch had it in uarg. I wonder why I didn't put them all in a table back then, if the argument was to keep struct ubuf_info leaner.
diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 9f0495e8df4d..bfdd3875fe86 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -754,7 +754,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control, skb_zcopy_init(skb, msg_control); } else if (msg_control) { struct ubuf_info *uarg = msg_control; - uarg->callback(NULL, uarg, false); + uarg->ops->complete(NULL, uarg, false); } dev_queue_xmit(skb); diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 0b3f21cba552..b7401d990680 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1906,7 +1906,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile, skb_zcopy_init(skb, msg_control); } else if (msg_control) { struct ubuf_info *uarg = msg_control; - uarg->callback(NULL, uarg, false); + uarg->ops->complete(NULL, uarg, false); } skb_reset_network_header(skb); diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index c64ded183f8d..f16279351db5 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -380,7 +380,7 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net, } } -static void vhost_zerocopy_callback(struct sk_buff *skb, +static void vhost_zerocopy_complete(struct sk_buff *skb, struct ubuf_info *ubuf_base, bool success) { struct ubuf_info_msgzc *ubuf = uarg_to_msgzc(ubuf_base); @@ -408,6 +408,10 @@ static void vhost_zerocopy_callback(struct sk_buff *skb, rcu_read_unlock_bh(); } +static const struct ubuf_info_ops vhost_ubuf_ops = { + .complete = vhost_zerocopy_complete, +}; + static inline unsigned long busy_clock(void) { return local_clock() >> 10; @@ -879,7 +883,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS; ubuf->ctx = nvq->ubufs; ubuf->desc = nvq->upend_idx; - ubuf->ubuf.callback = vhost_zerocopy_callback; + ubuf->ubuf.ops = &vhost_ubuf_ops; ubuf->ubuf.flags = SKBFL_ZEROCOPY_FRAG; refcount_set(&ubuf->ubuf.refcnt, 1); msg.msg_control = &ctl; diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9d24aec064e8..a110e97e074a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -527,6 +527,11 @@ enum { #define SKBFL_ALL_ZEROCOPY (SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \ SKBFL_DONT_ORPHAN | SKBFL_MANAGED_FRAG_REFS) +struct ubuf_info_ops { + void (*complete)(struct sk_buff *, struct ubuf_info *, + bool zerocopy_success); +}; + /* * The callback notifies userspace to release buffers when skb DMA is done in * lower device, the skb last reference should be 0 when calling this. @@ -536,8 +541,7 @@ enum { * The desc field is used to track userspace buffer index. */ struct ubuf_info { - void (*callback)(struct sk_buff *, struct ubuf_info *, - bool zerocopy_success); + const struct ubuf_info_ops *ops; refcount_t refcnt; u8 flags; }; @@ -1662,14 +1666,13 @@ static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset) } #endif +extern const struct ubuf_info_ops msg_zerocopy_ubuf_ops; + struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, struct ubuf_info *uarg); void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref); -void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, - bool success); - int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, struct sk_buff *skb, struct iov_iter *from, size_t length); @@ -1757,13 +1760,13 @@ static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb) static inline void net_zcopy_put(struct ubuf_info *uarg) { if (uarg) - uarg->callback(NULL, uarg, true); + uarg->ops->complete(NULL, uarg, true); } static inline void net_zcopy_put_abort(struct ubuf_info *uarg, bool have_uref) { if (uarg) { - if (uarg->callback == msg_zerocopy_callback) + if (uarg->ops == &msg_zerocopy_ubuf_ops) msg_zerocopy_put_abort(uarg, have_uref); else if (have_uref) net_zcopy_put(uarg); @@ -1777,7 +1780,7 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success) if (uarg) { if (!skb_zcopy_is_nouarg(skb)) - uarg->callback(skb, uarg, zerocopy_success); + uarg->ops->complete(skb, uarg, zerocopy_success); skb_shinfo(skb)->flags &= ~SKBFL_ALL_ZEROCOPY; } diff --git a/io_uring/notif.c b/io_uring/notif.c index b561bd763435..7caaebf94312 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -24,7 +24,7 @@ void io_notif_tw_complete(struct io_kiocb *notif, struct io_tw_state *ts) io_req_task_complete(notif, ts); } -static void io_tx_ubuf_callback(struct sk_buff *skb, struct ubuf_info *uarg, +static void io_tx_ubuf_complete(struct sk_buff *skb, struct ubuf_info *uarg, bool success) { struct io_notif_data *nd = container_of(uarg, struct io_notif_data, uarg); @@ -43,6 +43,10 @@ static void io_tx_ubuf_callback(struct sk_buff *skb, struct ubuf_info *uarg, } } +static const struct ubuf_info_ops io_ubuf_ops = { + .complete = io_tx_ubuf_complete, +}; + struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx) __must_hold(&ctx->uring_lock) { @@ -62,7 +66,7 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx) nd->zc_report = false; nd->account_pages = 0; nd->uarg.flags = IO_NOTIF_UBUF_FLAGS; - nd->uarg.callback = io_tx_ubuf_callback; + nd->uarg.ops = &io_ubuf_ops; refcount_set(&nd->uarg.refcnt, 1); return notif; } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index b99127712e67..749abab23a67 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1708,7 +1708,7 @@ static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size) return NULL; } - uarg->ubuf.callback = msg_zerocopy_callback; + uarg->ubuf.ops = &msg_zerocopy_ubuf_ops; uarg->id = ((u32)atomic_inc_return(&sk->sk_zckey)) - 1; uarg->len = 1; uarg->bytelen = size; @@ -1734,7 +1734,7 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, u32 bytelen, next; /* there might be non MSG_ZEROCOPY users */ - if (uarg->callback != msg_zerocopy_callback) + if (uarg->ops != &msg_zerocopy_ubuf_ops) return NULL; /* realloc only when socket is locked (TCP, UDP cork), @@ -1845,8 +1845,8 @@ static void __msg_zerocopy_callback(struct ubuf_info_msgzc *uarg) sock_put(sk); } -void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, - bool success) +static void msg_zerocopy_complete(struct sk_buff *skb, struct ubuf_info *uarg, + bool success) { struct ubuf_info_msgzc *uarg_zc = uarg_to_msgzc(uarg); @@ -1855,7 +1855,7 @@ void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, if (refcount_dec_and_test(&uarg->refcnt)) __msg_zerocopy_callback(uarg_zc); } -EXPORT_SYMBOL_GPL(msg_zerocopy_callback); + void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref) { @@ -1865,10 +1865,15 @@ void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref) uarg_to_msgzc(uarg)->len--; if (have_uref) - msg_zerocopy_callback(NULL, uarg, true); + msg_zerocopy_complete(NULL, uarg, true); } EXPORT_SYMBOL_GPL(msg_zerocopy_put_abort); +const struct ubuf_info_ops msg_zerocopy_ubuf_ops = { + .complete = msg_zerocopy_complete, +}; +EXPORT_SYMBOL_GPL(msg_zerocopy_ubuf_ops); + int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, struct msghdr *msg, int len, struct ubuf_info *uarg)
We'll need to associate additional callbacks with ubuf_info, introduce a structure holding ubuf_info callbacks. Apart from a more smarter io_uring notification management introduced in next patches, it can be used to generalise msg_zerocopy_put_abort() and also store ->sg_from_iter, which is currently passed in struct msghdr. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> --- drivers/net/tap.c | 2 +- drivers/net/tun.c | 2 +- drivers/vhost/net.c | 8 ++++++-- include/linux/skbuff.h | 19 +++++++++++-------- io_uring/notif.c | 8 ++++++-- net/core/skbuff.c | 17 +++++++++++------ 6 files changed, 36 insertions(+), 20 deletions(-)