
[net-next,v6,1/8] net: add get_netmem/put_netmem support

Message ID 20250227041209.2031104-2-almasrymina@google.com (mailing list archive)
State New
Series Device memory TCP TX

Commit Message

Mina Almasry Feb. 27, 2025, 4:12 a.m. UTC
Currently net_iovs support only pp ref counts, and do not support a
page ref equivalent.

This is fine for the RX path as net_iovs are used exclusively with the
pp and only pp refcounting is needed there. The TX path, however, does
not use pp ref counts, so support for a get_page/put_page equivalent is
needed for netmem.

Support get_netmem/put_netmem. Check the type of the netmem before
passing it to page or net_iov specific code to obtain a page ref
equivalent.

For dmabuf net_iovs, we obtain a ref on the underlying binding. This
ensures the entire binding doesn't disappear until all the net_iovs have
been put_netmem'ed. We do not need to track the refcount of individual
dmabuf net_iovs as we don't allocate/free them from a pool similar to
what the buddy allocator does for pages.

This code is written to be extensible by other net_iov implementers.
get_netmem/put_netmem will check the type of the netmem and route it to
the correct helper:

pages -> [get|put]_page()
dmabuf net_iovs -> net_devmem_[get|put]_net_iov()
new net_iovs ->	new helpers
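
As an illustration only (not part of the patch), a hedged sketch of how
a TX caller would pair these helpers; example_hold_frag() is
hypothetical, while skb_frag_netmem()/get_netmem()/put_netmem() are the
real helpers this patch touches:

	/* Hypothetical caller pairing one get_netmem() with one
	 * put_netmem(). For a page-backed frag this boils down to
	 * get_page()/put_page(); for a dmabuf net_iov it takes and
	 * drops a reference on the underlying binding.
	 */
	static void example_hold_frag(skb_frag_t *frag)
	{
		netmem_ref netmem = skb_frag_netmem(frag);

		get_netmem(netmem);
		/* ... frag stays valid while the skb is in flight ... */
		put_netmem(netmem);
	}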

Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>

---

v2:
- Add comment on top of refcount_t ref explaining the usage in the TX
  path.
- Fix missing definition of net_devmem_dmabuf_binding_put in this patch.

---
 include/linux/skbuff_ref.h |  4 ++--
 include/net/netmem.h       |  3 +++
 net/core/devmem.c          | 10 ++++++++++
 net/core/devmem.h          | 20 ++++++++++++++++++++
 net/core/skbuff.c          | 30 ++++++++++++++++++++++++++++++
 5 files changed, 65 insertions(+), 2 deletions(-)

Comments

Jakub Kicinski March 1, 2025, 12:38 a.m. UTC | #1
On Thu, 27 Feb 2025 04:12:02 +0000 Mina Almasry wrote:
>  static inline void __skb_frag_ref(skb_frag_t *frag)
>  {
> -	get_page(skb_frag_page(frag));
> +	get_netmem(skb_frag_netmem(frag));
>  }

Silently handling types of memory the caller may not be expecting
always worries me. Why do we need this? 

In general, I'm surprised by the lack of bug reports for devmem.
Can you think of any way we could expose this more to syzbot?
First thing that comes to mind is a simple hack in netdevsim,
to make it insert a netmem handle (allocated locally, not a real
memory provider), every N packets (controllable via debugfs).
Would that work?
Mina Almasry March 1, 2025, 1:29 a.m. UTC | #2
On Fri, Feb 28, 2025 at 4:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 27 Feb 2025 04:12:02 +0000 Mina Almasry wrote:
> >  static inline void __skb_frag_ref(skb_frag_t *frag)
> >  {
> > -     get_page(skb_frag_page(frag));
> > +     get_netmem(skb_frag_netmem(frag));
> >  }
>
> Silently handling types of memory the caller may not be expecting
> always worries me.

Sorry, I'm not following. What caller is not expecting netmem? Here
we're making sure __skb_frag_ref() handles netmem correctly, i.e. we
were not expecting netmem here before, and after this patch we'll
handle it correctly.

> Why do we need this?
>

The MSG_ZEROCOPY TX path takes a page reference on the passed memory
in zerocopy_fill_skb_from_iter() that kfree_skb() later drops when the
skb is sent. We need an equivalent for netmem, which only supports pp
refs today. This is my attempt at implementing a page_ref equivalent
for net_iov and generic netmem.

I think __skb_frag_[un]ref is used elsewhere in the TX path too;
tcp_mtu_probe, for example, ends up calling skb_frag_ref.
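
To make the pairing concrete, the flow I mean is roughly the following
(illustrative call chain, slightly simplified):

	/*
	 * TX reference lifetime with this patch applied:
	 *
	 *   sendmsg(..., MSG_ZEROCOPY)
	 *     zerocopy_fill_skb_from_iter()   // takes a ref per frag
	 *   ... skb transmitted ...
	 *   kfree_skb()
	 *     skb_release_data()
	 *       napi_frag_unref()
	 *         skb_page_unref()            // now ends in put_netmem()
	 */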

> In general, I'm surprised by the lack of bug reports for devmem.

I guess we did a good job making sure we don't regress the page paths.

The lack of support in any driver that qemu will run is an issue. I
wonder if the fact that devmem needs some setup is also an issue. We
need header split enabled, a udmabuf created, the netlink API bound,
and then a connection referring to the created binding, and we don't
support loopback. I think all of that may make it difficult for syzbot
to repro. I've had it on my todo list to investigate this more.

> Can you think of any way we could expose this more to syzbot?
> First thing that comes to mind is a simple hack in netdevsim,
> to make it insert a netmem handle (allocated locally, not a real
> memory provider), every N packets (controllable via debugfs).
> Would that work?

Yes, great idea. I don't see why it wouldn't work.

We don't expect mixing of net_iovs and pages in the same skb, but
netdevsim could create one net_iov skb every N skbs.

I guess I'm not totally sure what makes something discoverable to
syzbot. Is a netdevsim hack toggleable via debugfs sufficient for
syzbot? I'll investigate and ask.

--
Thanks,
Mina
Jakub Kicinski March 4, 2025, 12:20 a.m. UTC | #3
On Fri, 28 Feb 2025 17:29:13 -0800 Mina Almasry wrote:
> On Fri, Feb 28, 2025 at 4:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Thu, 27 Feb 2025 04:12:02 +0000 Mina Almasry wrote:  
> > >  static inline void __skb_frag_ref(skb_frag_t *frag)
> > >  {
> > > -     get_page(skb_frag_page(frag));
> > > +     get_netmem(skb_frag_netmem(frag));
> > >  }  
> >
> > Silently handling types of memory the caller may not be expecting
> > always worries me.  
> 
> Sorry, I'm not following. What caller is not expecting netmem?
> Here we're making sure __skb_frag_ref() handles netmem correctly,
> i.e. we were not expecting netmem here before, and after this patch
> we'll handle it correctly.
> 
> > Why do we need this?
> >  
> 
> The MSG_ZEROCOPY TX path takes a page reference on the passed memory
> in zerocopy_fill_skb_from_iter() that kfree_skb() later drops when the
> skb is sent. We need an equivalent for netmem, which only supports pp
> refs today. This is my attempt at implementing a page_ref equivalent
> for net_iov and generic netmem.
> 
> I think __skb_frag_[un]ref is used elsewhere in the TX path too;
> tcp_mtu_probe, for example, ends up calling skb_frag_ref.

Any such caller must be inspected to make sure it generates
/ anticipates skbs with appropriate pp_recycle and readable settings.
It's possible that adding a set of _netmem APIs would be too much
churn, but if it's not - it'd make it clear which parts of the kernel
we have inspected.
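
A sketch of what such a split could look like (hypothetical naming,
nothing this series adds):

	/* Explicitly netmem-aware variant, for callers that have been
	 * audited for unreadable / net_iov-backed frags.
	 */
	static inline void __skb_frag_ref_netmem(skb_frag_t *frag)
	{
		get_netmem(skb_frag_netmem(frag));
	}

	/* Page-only variant keeps the historical behavior for callers
	 * that have not been inspected yet.
	 */
	static inline void __skb_frag_ref(skb_frag_t *frag)
	{
		get_page(skb_frag_page(frag));
	}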

> > In general, I'm surprised by the lack of bug reports for devmem.  
> 
> I guess we did a good job making sure we don't regress the page paths.

:)

> The lack of support in any driver that qemu will run is an issue. I
> wonder if the fact that devmem needs some setup is also an issue. We
> need header split enabled, a udmabuf created, the netlink API bound,
> and then a connection referring to the created binding, and we don't
> support loopback. I think all of that may make it difficult for syzbot
> to repro. I've had it on my todo list to investigate this more.
> 
> > Can you think of any way we could expose this more to syzbot?
> > First thing that comes to mind is a simple hack in netdevsim,
> > to make it insert a netmem handle (allocated locally, not a real
> > memory provider), every N packets (controllable via debugfs).
> > Would that work?  
> 
> Yes, great idea. I don't see why it wouldn't work.
> 
> We don't expect mixing of net_iovs and pages in the same skb, but
> netdevsim could create one net_iov skb every N skbs.
> 
> I guess I'm not totally sure what makes something discoverable to
> syzbot. Is a netdevsim hack toggleable via debugfs sufficient for
> syzbot? I'll investigate and ask.

Yeah, my unreliable memory is that syzbot has a mixed record discovering
problems with debugfs. If you could ask Dmitry for advice that'd be
ideal.
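
For reference, a minimal sketch of the netdevsim knob discussed above.
Nothing like this exists in the tree; nsim_dev and its ddir debugfs
directory are real netdevsim state, but the netmem_inject_every_n field
and its wiring are invented here:

	#include <linux/debugfs.h>

	/* Hypothetical: let userspace ask netdevsim to back one skb in
	 * every N with a locally allocated net_iov instead of a page.
	 * A value of 0 disables injection.
	 */
	static void nsim_netmem_debugfs_init(struct nsim_dev *nsim_dev)
	{
		debugfs_create_u32("netmem_inject_every_n", 0600,
				   nsim_dev->ddir,
				   &nsim_dev->netmem_inject_every_n);
	}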

Patch

diff --git a/include/linux/skbuff_ref.h b/include/linux/skbuff_ref.h
index 0f3c58007488..9e49372ef1a0 100644
--- a/include/linux/skbuff_ref.h
+++ b/include/linux/skbuff_ref.h
@@ -17,7 +17,7 @@ 
  */
 static inline void __skb_frag_ref(skb_frag_t *frag)
 {
-	get_page(skb_frag_page(frag));
+	get_netmem(skb_frag_netmem(frag));
 }
 
 /**
@@ -40,7 +40,7 @@  static inline void skb_page_unref(netmem_ref netmem, bool recycle)
 	if (recycle && napi_pp_put_page(netmem))
 		return;
 #endif
-	put_page(netmem_to_page(netmem));
+	put_netmem(netmem);
 }
 
 /**
diff --git a/include/net/netmem.h b/include/net/netmem.h
index c61d5b21e7b4..a2148ffb203d 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -264,4 +264,7 @@  static inline unsigned long netmem_get_dma_addr(netmem_ref netmem)
 	return __netmem_clear_lsb(netmem)->dma_addr;
 }
 
+void get_netmem(netmem_ref netmem);
+void put_netmem(netmem_ref netmem);
+
 #endif /* _NET_NETMEM_H */
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 7c6e0b5b6acb..b1aafc66ebb7 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -325,6 +325,16 @@  net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 	return ERR_PTR(err);
 }
 
+void net_devmem_get_net_iov(struct net_iov *niov)
+{
+	net_devmem_dmabuf_binding_get(net_devmem_iov_binding(niov));
+}
+
+void net_devmem_put_net_iov(struct net_iov *niov)
+{
+	net_devmem_dmabuf_binding_put(net_devmem_iov_binding(niov));
+}
+
 /*** "Dmabuf devmem memory provider" ***/
 
 int mp_dmabuf_devmem_init(struct page_pool *pool)
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 7fc158d52729..946f2e015746 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -29,6 +29,10 @@  struct net_devmem_dmabuf_binding {
 	 * The binding undos itself and unmaps the underlying dmabuf once all
 	 * those refs are dropped and the binding is no longer desired or in
 	 * use.
+	 *
+	 * net_devmem_get_net_iov() on dmabuf net_iovs will increment this
+	 * reference, making sure that the binding remains alive until all the
+	 * net_iovs are no longer used.
 	 */
 	refcount_t ref;
 
@@ -111,6 +115,9 @@  net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding)
 	__net_devmem_dmabuf_binding_free(binding);
 }
 
+void net_devmem_get_net_iov(struct net_iov *niov);
+void net_devmem_put_net_iov(struct net_iov *niov);
+
 struct net_iov *
 net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding);
 void net_devmem_free_dmabuf(struct net_iov *ppiov);
@@ -120,6 +127,19 @@  bool net_is_devmem_iov(struct net_iov *niov);
 #else
 struct net_devmem_dmabuf_binding;
 
+static inline void
+net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding)
+{
+}
+
+static inline void net_devmem_get_net_iov(struct net_iov *niov)
+{
+}
+
+static inline void net_devmem_put_net_iov(struct net_iov *niov)
+{
+}
+
 static inline void
 __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b241c9e6f38..6e853d55a3e8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -89,6 +89,7 @@ 
 #include <linux/textsearch.h>
 
 #include "dev.h"
+#include "devmem.h"
 #include "netmem_priv.h"
 #include "sock_destructor.h"
 
@@ -7253,3 +7254,32 @@  bool csum_and_copy_from_iter_full(void *addr, size_t bytes,
 	return false;
 }
 EXPORT_SYMBOL(csum_and_copy_from_iter_full);
+
+void get_netmem(netmem_ref netmem)
+{
+	if (netmem_is_net_iov(netmem)) {
+		/* Assume any net_iov is devmem and route it to
+		 * net_devmem_get_net_iov. As new net_iov types are added they
+		 * need to be checked here.
+		 */
+		net_devmem_get_net_iov(netmem_to_net_iov(netmem));
+		return;
+	}
+	get_page(netmem_to_page(netmem));
+}
+EXPORT_SYMBOL(get_netmem);
+
+void put_netmem(netmem_ref netmem)
+{
+	if (netmem_is_net_iov(netmem)) {
+		/* Assume any net_iov is devmem and route it to
+		 * net_devmem_put_net_iov. As new net_iov types are added they
+		 * need to be checked here.
+		 */
+		net_devmem_put_net_iov(netmem_to_net_iov(netmem));
+		return;
+	}
+
+	put_page(netmem_to_page(netmem));
+}
+EXPORT_SYMBOL(put_netmem);