Message ID | 20250227041209.2031104-2-almasrymina@google.com (mailing list archive) |
---|---|
State | New |
Series | Device memory TCP TX |
On Thu, 27 Feb 2025 04:12:02 +0000 Mina Almasry wrote:
>  static inline void __skb_frag_ref(skb_frag_t *frag)
>  {
> -	get_page(skb_frag_page(frag));
> +	get_netmem(skb_frag_netmem(frag));
>  }

Silently handling types of memory the caller may not be expecting
always worries me. Why do we need this?

In general, I'm surprised by the lack of bug reports for devmem.
Can you think of any way we could expose this more to syzbot?
First thing that comes to mind is a simple hack in netdevsim,
to make it insert a netmem handle (allocated locally, not a real
memory provider), every N packets (controllable via debugfs).
Would that work?
On Fri, Feb 28, 2025 at 4:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 27 Feb 2025 04:12:02 +0000 Mina Almasry wrote:
> >  static inline void __skb_frag_ref(skb_frag_t *frag)
> >  {
> > -	get_page(skb_frag_page(frag));
> > +	get_netmem(skb_frag_netmem(frag));
> >  }
>
> Silently handling types of memory the caller may not be expecting
> always worries me.

Sorry, I'm not following. What caller is not expecting netmem?
Here we're making sure __skb_frag_ref() handles netmem correctly,
i.e. we were not expecting netmem here before, and after this patch
we'll handle it correctly.

> Why do we need this?

The MSG_ZEROCOPY TX path takes a page reference on the passed memory
in zerocopy_fill_skb_from_iter() that kfree_skb() later drops when the
skb is sent. We need an equivalent for netmem, which only supports pp
refs today. This is my attempt at implementing a page_ref equivalent
for net_iov and generic netmem.

I think __skb_frag_[un]ref is used elsewhere in the TX path too;
tcp_mtu_probe() for example calls skb_frag_ref() eventually.

> In general, I'm surprised by the lack of bug reports for devmem.

I guess we did a good job making sure we don't regress the page paths.
The lack of support in any driver that qemu will run is an issue. I
wonder if the fact that devmem needs some setup is also an issue: we
need header split enabled, a udmabuf created, the netlink API bound,
and then a connection referring to the created binding, and we don't
support loopback. I think all of that may make it difficult for syzbot
to repro. I've had it on my todo list to investigate this more.

> Can you think of any way we could expose this more to syzbot?
> First thing that comes to mind is a simple hack in netdevsim,
> to make it insert a netmem handle (allocated locally, not a real
> memory provider), every N packets (controllable via debugfs).
> Would that work?

Yes, great idea. I don't see why it wouldn't work.
We don't expect mixing of net_iovs and pages in the same skb, but
netdevsim could create one net_iov skb every N skbs.

I guess I'm not totally sure something like this is discoverable by
syzbot. Is a netdevsim hack toggleable via debugfs sufficient for
syzbot? I'll investigate and ask.

--
Thanks,
Mina
On Fri, 28 Feb 2025 17:29:13 -0800 Mina Almasry wrote:
> On Fri, Feb 28, 2025 at 4:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > On Thu, 27 Feb 2025 04:12:02 +0000 Mina Almasry wrote:
> > >  static inline void __skb_frag_ref(skb_frag_t *frag)
> > >  {
> > > -	get_page(skb_frag_page(frag));
> > > +	get_netmem(skb_frag_netmem(frag));
> > >  }
> >
> > Silently handling types of memory the caller may not be expecting
> > always worries me.
>
> Sorry, I'm not following. What caller is not expecting netmem?
> Here we're making sure __skb_frag_ref() handles netmem correctly,
> i.e. we were not expecting netmem here before, and after this patch
> we'll handle it correctly.
>
> > Why do we need this?
>
> The MSG_ZEROCOPY TX path takes a page reference on the passed memory
> in zerocopy_fill_skb_from_iter() that kfree_skb() later drops when the
> skb is sent. We need an equivalent for netmem, which only supports pp
> refs today. This is my attempt at implementing a page_ref equivalent
> for net_iov and generic netmem.
>
> I think __skb_frag_[un]ref is used elsewhere in the TX path too;
> tcp_mtu_probe() for example calls skb_frag_ref() eventually.

Any such caller must be inspected to make sure it generates /
anticipates skbs with appropriate pp_recycle and readable settings.
It's possible that adding a set of _netmem APIs would be too much
churn, but if it's not - it'd make it clear which parts of the kernel
we have inspected.

> > In general, I'm surprised by the lack of bug reports for devmem.
>
> I guess we did a good job making sure we don't regress the page paths.

:)

> The lack of support in any driver that qemu will run is an issue. I
> wonder if the fact that devmem needs some setup is also an issue: we
> need header split enabled, a udmabuf created, the netlink API bound,
> and then a connection referring to the created binding, and we don't
> support loopback. I think all of that may make it difficult for
> syzbot to repro. I've had it on my todo list to investigate this
> more.
>
> > Can you think of any way we could expose this more to syzbot?
> > First thing that comes to mind is a simple hack in netdevsim,
> > to make it insert a netmem handle (allocated locally, not a real
> > memory provider), every N packets (controllable via debugfs).
> > Would that work?
>
> Yes, great idea. I don't see why it wouldn't work.
>
> We don't expect mixing of net_iovs and pages in the same skb, but
> netdevsim could create one net_iov skb every N skbs.
>
> I guess I'm not totally sure something like this is discoverable by
> syzbot. Is a netdevsim hack toggleable via debugfs sufficient for
> syzbot? I'll investigate and ask.

Yeah, my unreliable memory is that syzbot has a mixed record
discovering problems with debugfs. If you could ask Dmitry for advice
that'd be ideal.
diff --git a/include/linux/skbuff_ref.h b/include/linux/skbuff_ref.h
index 0f3c58007488..9e49372ef1a0 100644
--- a/include/linux/skbuff_ref.h
+++ b/include/linux/skbuff_ref.h
@@ -17,7 +17,7 @@
  */
 static inline void __skb_frag_ref(skb_frag_t *frag)
 {
-	get_page(skb_frag_page(frag));
+	get_netmem(skb_frag_netmem(frag));
 }
 
 /**
@@ -40,7 +40,7 @@ static inline void skb_page_unref(netmem_ref netmem, bool recycle)
 	if (recycle && napi_pp_put_page(netmem))
 		return;
 #endif
-	put_page(netmem_to_page(netmem));
+	put_netmem(netmem);
 }
 
 /**
diff --git a/include/net/netmem.h b/include/net/netmem.h
index c61d5b21e7b4..a2148ffb203d 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -264,4 +264,7 @@ static inline unsigned long netmem_get_dma_addr(netmem_ref netmem)
 	return __netmem_clear_lsb(netmem)->dma_addr;
 }
 
+void get_netmem(netmem_ref netmem);
+void put_netmem(netmem_ref netmem);
+
 #endif /* _NET_NETMEM_H */
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 7c6e0b5b6acb..b1aafc66ebb7 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -325,6 +325,16 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
 	return ERR_PTR(err);
 }
 
+void net_devmem_get_net_iov(struct net_iov *niov)
+{
+	net_devmem_dmabuf_binding_get(net_devmem_iov_binding(niov));
+}
+
+void net_devmem_put_net_iov(struct net_iov *niov)
+{
+	net_devmem_dmabuf_binding_put(net_devmem_iov_binding(niov));
+}
+
 /*** "Dmabuf devmem memory provider" ***/
 
 int mp_dmabuf_devmem_init(struct page_pool *pool)
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 7fc158d52729..946f2e015746 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -29,6 +29,10 @@ struct net_devmem_dmabuf_binding {
 	 * The binding undos itself and unmaps the underlying dmabuf once all
 	 * those refs are dropped and the binding is no longer desired or in
 	 * use.
+	 *
+	 * net_devmem_get_net_iov() on dmabuf net_iovs will increment this
+	 * reference, making sure that the binding remains alive until all the
+	 * net_iovs are no longer used.
 	 */
 	refcount_t ref;
 
@@ -111,6 +115,9 @@ net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding)
 		__net_devmem_dmabuf_binding_free(binding);
 }
 
+void net_devmem_get_net_iov(struct net_iov *niov);
+void net_devmem_put_net_iov(struct net_iov *niov);
+
 struct net_iov *
 net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding);
 void net_devmem_free_dmabuf(struct net_iov *ppiov);
@@ -120,6 +127,19 @@ bool net_is_devmem_iov(struct net_iov *niov);
 #else
 struct net_devmem_dmabuf_binding;
 
+static inline void
+net_devmem_dmabuf_binding_put(struct net_devmem_dmabuf_binding *binding)
+{
+}
+
+static inline void net_devmem_get_net_iov(struct net_iov *niov)
+{
+}
+
+static inline void net_devmem_put_net_iov(struct net_iov *niov)
+{
+}
+
 static inline void
 __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b241c9e6f38..6e853d55a3e8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -89,6 +89,7 @@
 #include <linux/textsearch.h>
 
 #include "dev.h"
+#include "devmem.h"
 #include "netmem_priv.h"
 #include "sock_destructor.h"
 
@@ -7253,3 +7254,32 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes,
 	return false;
 }
 EXPORT_SYMBOL(csum_and_copy_from_iter_full);
+
+void get_netmem(netmem_ref netmem)
+{
+	if (netmem_is_net_iov(netmem)) {
+		/* Assume any net_iov is devmem and route it to
+		 * net_devmem_get_net_iov. As new net_iov types are added they
+		 * need to be checked here.
+		 */
+		net_devmem_get_net_iov(netmem_to_net_iov(netmem));
+		return;
+	}
+
+	get_page(netmem_to_page(netmem));
+}
+EXPORT_SYMBOL(get_netmem);
+
+void put_netmem(netmem_ref netmem)
+{
+	if (netmem_is_net_iov(netmem)) {
+		/* Assume any net_iov is devmem and route it to
+		 * net_devmem_put_net_iov. As new net_iov types are added they
+		 * need to be checked here.
+		 */
+		net_devmem_put_net_iov(netmem_to_net_iov(netmem));
+		return;
+	}
+
+	put_page(netmem_to_page(netmem));
+}
+EXPORT_SYMBOL(put_netmem);