Message ID | 20241029230521.2385749-7-dw@davidwei.uk (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Series | io_uring zero copy rx |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply, async |
On Tue, Oct 29, 2024 at 4:06 PM David Wei <dw@davidwei.uk> wrote:
>
> From: Pavel Begunkov <asml.silence@gmail.com>
>
> Add a helper that takes an array of pages and initialises passed in
> memory provider's area with them, where each net_iov takes one page.
> It's also responsible for setting up dma mappings.
>
> We keep it in page_pool.c not to leak netmem details to outside
> providers like io_uring, which don't have access to netmem_priv.h
> and other private helpers.
>

I honestly prefer leaking netmem_priv.h details into the io_uring
rather than having io_uring provider specific code in page_pool.c.

> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: David Wei <dw@davidwei.uk>
> ---
>  include/net/page_pool/memory_provider.h | 10 ++++
>  net/core/page_pool.c                    | 63 ++++++++++++++++++++++++-
>  2 files changed, 71 insertions(+), 2 deletions(-)
>  create mode 100644 include/net/page_pool/memory_provider.h
>
> diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
> new file mode 100644
> index 000000000000..83d7eec0058d
> --- /dev/null
> +++ b/include/net/page_pool/memory_provider.h
> @@ -0,0 +1,10 @@
> +#ifndef _NET_PAGE_POOL_MEMORY_PROVIDER_H
> +#define _NET_PAGE_POOL_MEMORY_PROVIDER_H
> +
> +int page_pool_mp_init_paged_area(struct page_pool *pool,
> +                                struct net_iov_area *area,
> +                                struct page **pages);
> +void page_pool_mp_release_area(struct page_pool *pool,
> +                               struct net_iov_area *area);
> +
> +#endif
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 9a675e16e6a4..8bd4a3c80726 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -13,6 +13,7 @@
>
>  #include <net/netdev_rx_queue.h>
>  #include <net/page_pool/helpers.h>
> +#include <net/page_pool/memory_provider.h>
>  #include <net/xdp.h>
>
>  #include <linux/dma-direction.h>
> @@ -459,7 +460,8 @@ page_pool_dma_sync_for_device(const struct page_pool *pool,
>         __page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
>  }
>
> -static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
> +static bool page_pool_dma_map_page(struct page_pool *pool, netmem_ref netmem,
> +                                  struct page *page)

I have to say this is confusing for me. Passing in both the netmem and
the page is weird. The page is the one being mapped and the
netmem->dma_addr is the one being filled with the mapping.

Netmem is meant to be an abstraction over page. Passing both makes
little sense to me. The reason you're doing this is because the
io_uring memory provider is in a bit of a weird design IMO where the
memory comes in pages but it doesn't want to use paged-backed-netmem.
Instead it uses net_iov-backed-netmem and there is an out of band page
to be managed.

I think it would make sense to use paged-backed-netmem for your use
case, or at least I don't see why it wouldn't work. Memory providers
were designed to handle the hugepage usecase where the memory
allocated by the provider is pages. Is there a reason that doesn't
work for you as well?

If you really need to use net_iov-backed-netmem, can we put this
weirdness in the provider? I don't know that we want a generic-looking
dma_map function which is a bit confusing to take a netmem and a page.

>  {
>         dma_addr_t dma;
>
> @@ -468,7 +470,7 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
>          * into page private data (i.e 32bit cpu with 64bit DMA caps)
>          * This mapping is kept for lifetime of page, until leaving pool.
>          */
> -       dma = dma_map_page_attrs(pool->p.dev, netmem_to_page(netmem), 0,
> +       dma = dma_map_page_attrs(pool->p.dev, page, 0,
>                                  (PAGE_SIZE << pool->p.order), pool->p.dma_dir,
>                                  DMA_ATTR_SKIP_CPU_SYNC |
>                                  DMA_ATTR_WEAK_ORDERING);
> @@ -490,6 +492,11 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
>         return false;
>  }
>
> +static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
> +{
> +       return page_pool_dma_map_page(pool, netmem, netmem_to_page(netmem));
> +}
> +
>  static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
>                                                  gfp_t gfp)
>  {
> @@ -1154,3 +1161,55 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
>         }
>  }
>  EXPORT_SYMBOL(page_pool_update_nid);
> +
> +static void page_pool_release_page_dma(struct page_pool *pool,
> +                                      netmem_ref netmem)
> +{
> +       __page_pool_release_page_dma(pool, netmem);
> +}
> +

Is this wrapper necessary? Do you wanna rename the original function
to remove the __ instead of adding a wrapper?

> +int page_pool_mp_init_paged_area(struct page_pool *pool,
> +                                struct net_iov_area *area,
> +                                struct page **pages)
> +{
> +       struct net_iov *niov;
> +       netmem_ref netmem;
> +       int i, ret = 0;
> +
> +       if (!pool->dma_map)
> +               return -EOPNOTSUPP;
> +
> +       for (i = 0; i < area->num_niovs; i++) {
> +               niov = &area->niovs[i];
> +               netmem = net_iov_to_netmem(niov);
> +
> +               page_pool_set_pp_info(pool, netmem);
> +               if (!page_pool_dma_map_page(pool, netmem, pages[i])) {
> +                       ret = -EINVAL;
> +                       goto err_unmap_dma;
> +               }
> +       }
> +       return 0;
> +
> +err_unmap_dma:
> +       while (i--) {
> +               netmem = net_iov_to_netmem(&area->niovs[i]);
> +               page_pool_release_page_dma(pool, netmem);
> +       }
> +       return ret;
> +}
> +
> +void page_pool_mp_release_area(struct page_pool *pool,
> +                              struct net_iov_area *area)
> +{
> +       int i;
> +
> +       if (!pool->dma_map)
> +               return;
> +
> +       for (i = 0; i < area->num_niovs; i++) {
> +               struct net_iov *niov = &area->niovs[i];
> +
> +               page_pool_release_page_dma(pool, net_iov_to_netmem(niov));
> +       }
> +}

Similarly, I think these 2 functions belong in the provider, to be honest.
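To make the API under discussion concrete before the thread continues: a
provider would drive the two helpers roughly as in the sketch below. This
is a hedged illustration, not code from the series; the my_mp_* names and
the assumption that the provider already holds one pinned page per net_iov
are invented.

/*
 * Usage sketch for the proposed helpers (hypothetical my_mp_* caller,
 * not part of the series).
 */
#include <net/page_pool/memory_provider.h>

static int my_mp_area_init(struct page_pool *pool, struct net_iov_area *area,
                           struct page **pages)
{
        int ret;

        /*
         * Stamps pp info into each net_iov and dma-maps pages[i] 1:1;
         * the helper unwinds its own partial mappings on failure.
         */
        ret = page_pool_mp_init_paged_area(pool, area, pages);
        if (ret)
                return ret;     /* -EOPNOTSUPP if the pool doesn't dma-map */
        return 0;
}

static void my_mp_area_free(struct page_pool *pool, struct net_iov_area *area)
{
        /* Drops every dma mapping created by the init helper. */
        page_pool_mp_release_area(pool, area);
}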
On 11/1/24 17:33, Mina Almasry wrote:
> On Tue, Oct 29, 2024 at 4:06 PM David Wei <dw@davidwei.uk> wrote:
>>
>> From: Pavel Begunkov <asml.silence@gmail.com>
>>
>> Add a helper that takes an array of pages and initialises passed in
>> memory provider's area with them, where each net_iov takes one page.
>> It's also responsible for setting up dma mappings.
>>
>> We keep it in page_pool.c not to leak netmem details to outside
>> providers like io_uring, which don't have access to netmem_priv.h
>> and other private helpers.
>>
>
> I honestly prefer leaking netmem_priv.h details into the io_uring
> rather than having io_uring provider specific code in page_pool.c.

Even though Jakub didn't comment on this patch, he definitely
wasn't fond of giving all those headers to non net/ users. I guess
I can't please everyone. One middle option is to make the page
pool helper more granular, i.e. taking care of one netmem at
a time, and moving the loop to io_uring, but I don't think it
changes anything.

...

>>  #include <linux/dma-direction.h>
>> @@ -459,7 +460,8 @@ page_pool_dma_sync_for_device(const struct page_pool *pool,
>>         __page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
>>  }
>>
>> -static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
>> +static bool page_pool_dma_map_page(struct page_pool *pool, netmem_ref netmem,
>> +                                  struct page *page)
>
> I have to say this is confusing for me. Passing in both the netmem and
> the page is weird. The page is the one being mapped and the
> netmem->dma_addr is the one being filled with the mapping.

The page argument provides a mapping, the netmem gives the object
where the mapping is set. netmem could be the same as the page
argument, but I don't think it's inherently wrong, and it's an
internal helper anyway. I can entirely copy paste the function, I
don't think it's anyhow an improvement.

> Netmem is meant to be an abstraction over page. Passing both makes
> little sense to me. The reason you're doing this is because the
> io_uring memory provider is in a bit of a weird design IMO where the
> memory comes in pages but it doesn't want to use paged-backed-netmem.

Mina, as explained before, I view it rather as an abstraction
that helps with finer grained control over memory, and extending
it this way, I don't think it's such a stretch, and it doesn't
change much for the networking stack overall. Not fitting into
the devmem TCP category doesn't make it weird.

> Instead it uses net_iov-backed-netmem and there is an out of band page
> to be managed.
>
> I think it would make sense to use paged-backed-netmem for your use
> case, or at least I don't see why it wouldn't work. Memory providers

It's a user page, we can't make assumptions about it, we can't
reuse space in struct page like for pp refcounting (unlike when
it's allocated by the kernel), we can't use the normal page
refcounting.

If that's the direction people prefer, we can roll back to v1 from
a couple years ago, fill skbs with user pages, attach ubuf_info to
every skb, and whack-a-mole all places where the page could be
put down or such, pretty similarly to what net_iov does. Honestly, I
thought that reusing common infra so that the net stack doesn't
need a different path per interface was a good idea.

> were designed to handle the hugepage usecase where the memory
> allocated by the provider is pages. Is there a reason that doesn't
> work for you as well?
>
> If you really need to use net_iov-backed-netmem, can we put this
> weirdness in the provider? I don't know that we want a generic-looking
> dma_map function which is a bit confusing to take a netmem and a page.

...

>> +
>> +static void page_pool_release_page_dma(struct page_pool *pool,
>> +                                      netmem_ref netmem)
>> +{
>> +       __page_pool_release_page_dma(pool, netmem);
>> +}
>> +
>
> Is this wrapper necessary? Do you wanna rename the original function
> to remove the __ instead of adding a wrapper?

I only added it here to cast away __always_inline since it's used in
a slow / setup path. It shouldn't change the binary, but I'm not a huge
fan of leaving the hint for the code where it's not needed.
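For illustration only, the "middle option" mentioned above — a per-netmem
page pool helper with the loop moved into io_uring — could look roughly
like the sketch below. The page_pool_mp_dma_map_netmem name and the
io_area_map caller are invented, and it assumes page_pool_release_page_dma
would be exported alongside the new helper.

/* Hypothetical granular variant; builds on helpers from the patch. */
int page_pool_mp_dma_map_netmem(struct page_pool *pool, netmem_ref netmem,
                                struct page *page)
{
        if (!pool->dma_map)
                return -EOPNOTSUPP;

        page_pool_set_pp_info(pool, netmem);
        if (!page_pool_dma_map_page(pool, netmem, page))
                return -EINVAL;
        return 0;
}

/* The loop and its unwind would then live on the io_uring side: */
static int io_area_map(struct page_pool *pool, struct net_iov_area *area,
                       struct page **pages)
{
        int i, ret;

        for (i = 0; i < area->num_niovs; i++) {
                ret = page_pool_mp_dma_map_netmem(pool,
                                net_iov_to_netmem(&area->niovs[i]), pages[i]);
                if (ret)
                        goto err_unwind;
        }
        return 0;

err_unwind:
        while (i--)
                page_pool_release_page_dma(pool,
                                net_iov_to_netmem(&area->niovs[i]));
        return ret;
}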
On Fri, Nov 1, 2024 at 11:16 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 11/1/24 17:33, Mina Almasry wrote:
> > On Tue, Oct 29, 2024 at 4:06 PM David Wei <dw@davidwei.uk> wrote:
> >>
> >> From: Pavel Begunkov <asml.silence@gmail.com>
> >>
> >> Add a helper that takes an array of pages and initialises passed in
> >> memory provider's area with them, where each net_iov takes one page.
> >> It's also responsible for setting up dma mappings.
> >>
> >> We keep it in page_pool.c not to leak netmem details to outside
> >> providers like io_uring, which don't have access to netmem_priv.h
> >> and other private helpers.
> >>
> >
> > I honestly prefer leaking netmem_priv.h details into the io_uring
> > rather than having io_uring provider specific code in page_pool.c.
>
> Even though Jakub didn't comment on this patch, he definitely
> wasn't fond of giving all those headers to non net/ users. I guess
> I can't please everyone. One middle option is to make the page
> pool helper more granular, i.e. taking care of one netmem at
> a time, and moving the loop to io_uring, but I don't think it
> changes anything.
>

My issue is that these:

+int page_pool_mp_init_paged_area(struct page_pool *pool,
+                                struct net_iov_area *area,
+                                struct page **pages);
+void page_pool_mp_release_area(struct page_pool *pool,

are masquerading as generic functions to be used by many mp but
they're really io_uring specific. dmabuf and the hugepage provider
would not use them AFAICT. I would have liked not to have code specific
to one mp in page_pool.c, and I was asked to move the dmabuf specific
functions to another file too IIRC.

These helpers depend on:

page_pool_set_pp_info: in netmem_priv.h
net_iov_to_netmem(niov): in netmem.h
page_pool_dma_map_page: can be put in page_pool/helpers.h?
page_pool_release_page_dma(pool, netmem): can be put in page_pool/helpers.h?

I would prefer moving page_pool_set_pp_info (and the stuff it calls
into) to netmem.h and removing io_uring mp specific code from
page_pool.c.

> ...
> >>  #include <linux/dma-direction.h>
> >> @@ -459,7 +460,8 @@ page_pool_dma_sync_for_device(const struct page_pool *pool,
> >>         __page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
> >>  }
> >>
> >> -static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
> >> +static bool page_pool_dma_map_page(struct page_pool *pool, netmem_ref netmem,
> >> +                                  struct page *page)
> >
> > I have to say this is confusing for me. Passing in both the netmem and
> > the page is weird. The page is the one being mapped and the
> > netmem->dma_addr is the one being filled with the mapping.
>
> The page argument provides a mapping, the netmem gives the object
> where the mapping is set. netmem could be the same as the page
> argument, but I don't think it's inherently wrong, and it's an
> internal helper anyway. I can entirely copy paste the function, I
> don't think it's anyhow an improvement.
>
> > Netmem is meant to be an abstraction over page. Passing both makes
> > little sense to me. The reason you're doing this is because the
> > io_uring memory provider is in a bit of a weird design IMO where the
> > memory comes in pages but it doesn't want to use paged-backed-netmem.
>
> Mina, as explained before, I view it rather as an abstraction
> that helps with finer grained control over memory, and extending
> it this way, I don't think it's such a stretch, and it doesn't
> change much for the networking stack overall. Not fitting into
> the devmem TCP category doesn't make it weird.
>
> > Instead it uses net_iov-backed-netmem and there is an out of band page
> > to be managed.
> >
> > I think it would make sense to use paged-backed-netmem for your use
> > case, or at least I don't see why it wouldn't work. Memory providers
>
> It's a user page, we can't make assumptions about it, we can't
> reuse space in struct page like for pp refcounting (unlike when
> it's allocated by the kernel), we can't use the normal page
> refcounting.
>

You actually can't reuse space in struct net_iov either for your own
refcounting, because niov->pp_ref_count is a mirror to
page->pp_ref_count and strictly follows the patterns of that. But
that's the issue to be discussed on the other patch...

> If that's the direction people prefer, we can roll back to v1 from
> a couple years ago, fill skbs with user pages, attach ubuf_info to
> every skb, and whack-a-mole all places where the page could be
> put down or such, pretty similarly to what net_iov does. Honestly, I
> thought that reusing common infra so that the net stack doesn't
> need a different path per interface was a good idea.
>

The common infra does support page-backed-netmem actually.

> > were designed to handle the hugepage usecase where the memory
> > allocated by the provider is pages. Is there a reason that doesn't
> > work for you as well?
> >
> > If you really need to use net_iov-backed-netmem, can we put this
> > weirdness in the provider? I don't know that we want a generic-looking
> > dma_map function which is a bit confusing to take a netmem and a page.
> ...
> >> +
> >> +static void page_pool_release_page_dma(struct page_pool *pool,
> >> +                                      netmem_ref netmem)
> >> +{
> >> +       __page_pool_release_page_dma(pool, netmem);
> >> +}
> >> +
> >
> > Is this wrapper necessary? Do you wanna rename the original function
> > to remove the __ instead of adding a wrapper?
>
> I only added it here to cast away __always_inline since it's used in
> a slow / setup path. It shouldn't change the binary, but I'm not a huge
> fan of leaving the hint for the code where it's not needed.
>

Right, it probably makes sense to make the modifications you want on
the original function rather than create a no-op wrapper to remove the
__always_inline.

--
Thanks,
Mina
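The "mirror" relationship referenced above can be pictured with a
simplified sketch. This is illustrative only and not the exact kernel
layout: the point is that the page pool fields of struct net_iov shadow
the corresponding struct page fields, so pool code can treat a netmem_ref
uniformly — and why niov->pp_ref_count is not free for provider-private
use.

/* Simplified illustration, not the exact layout: the pp fields line up. */
struct page_pp_view {                   /* subset of struct page */
        unsigned long pp_magic;         /* signature when pool-owned */
        struct page_pool *pp;           /* owning pool */
        unsigned long dma_addr;         /* filled by page_pool_dma_map() */
        atomic_long_t pp_ref_count;     /* pool refcount, mirrored below */
};

struct net_iov_pp_view {                /* subset of struct net_iov */
        unsigned long pp_magic;         /* mirrors page->pp_magic */
        struct page_pool *pp;           /* mirrors page->pp */
        unsigned long dma_addr;         /* mirrors page->dma_addr */
        atomic_long_t pp_ref_count;     /* mirrors page->pp_ref_count */
};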
On 11/5/24 23:34, Mina Almasry wrote:
> On Fri, Nov 1, 2024 at 11:16 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
...
>> Even though Jakub didn't comment on this patch, he definitely
>> wasn't fond of giving all those headers to non net/ users. I guess
>> I can't please everyone. One middle option is to make the page
>> pool helper more granular, i.e. taking care of one netmem at
>> a time, and moving the loop to io_uring, but I don't think it
                      ^^^
>> changes anything.
>>
>
> My issue is that these:
>
> +int page_pool_mp_init_paged_area(struct page_pool *pool,
> +                                struct net_iov_area *area,
> +                                struct page **pages);
> +void page_pool_mp_release_area(struct page_pool *pool,
>
> are masquerading as generic functions to be used by many mp but
> they're really io_uring specific. dmabuf and the hugepage provider
> would not use them AFAICT. I would have liked not to have code specific
> to one mp in page_pool.c, and I was asked to move the dmabuf specific
> functions to another file too IIRC.
>
> These helpers depend on:
>
> page_pool_set_pp_info: in netmem_priv.h
> net_iov_to_netmem(niov): in netmem.h
> page_pool_dma_map_page: can be put in page_pool/helpers.h?
> page_pool_release_page_dma(pool, netmem): can be put in page_pool/helpers.h?
>
> I would prefer moving page_pool_set_pp_info (and the stuff it calls
> into) to netmem.h and removing io_uring mp specific code from
> page_pool.c.

I just described this above, but in somewhat less detail. FWIW,
page_pool_dma_map_page() + page_pool_set_pp_info() have to be combined
into a single function, as separately they don't make a good public API.
I can change it this way, but ultimately there is not much difference:
it still needs to export functions that io_uring can use. Does it make
a better API? I doubt it, and it can be changed later anyway. Let's not
block the patchset over minor bikeshedding.

...

>>> Instead it uses net_iov-backed-netmem and there is an out of band page
>>> to be managed.
>>>
>>> I think it would make sense to use paged-backed-netmem for your use
>>> case, or at least I don't see why it wouldn't work. Memory providers
>>
>> It's a user page, we can't make assumptions about it, we can't
>> reuse space in struct page like for pp refcounting (unlike when
>> it's allocated by the kernel), we can't use the normal page
>> refcounting.
>>
>
> You actually can't reuse space in struct net_iov either for your own
> refcounting, because niov->pp_ref_count is a mirror to
> page->pp_ref_count and strictly follows the patterns of that. But
> that's the issue to be discussed on the other patch...

Right, which is why it's used for the usual buffer refcounting that the
page pool / net stack recognises. It's also not a problem to extend it
for performance reasons; those are internals, and there are ways to
change them.

>> If that's the direction people prefer, we can roll back to v1 from
>> a couple years ago, fill skbs with user pages, attach ubuf_info to
>> every skb, and whack-a-mole all places where the page could be
>> put down or such, pretty similarly to what net_iov does. Honestly, I
>> thought that reusing common infra so that the net stack doesn't
>> need a different path per interface was a good idea.
>>
>
> The common infra does support page-backed-netmem actually.

Please read again why user pages can't be passed directly: if you take
a look at struct page and where pp_ref_count and the other bits are,
you'll find a huge union.

>>> were designed to handle the hugepage usecase where the memory
>>> allocated by the provider is pages. Is there a reason that doesn't
>>> work for you as well?
>>>
>>> If you really need to use net_iov-backed-netmem, can we put this
>>> weirdness in the provider? I don't know that we want a generic-looking
>>> dma_map function which is a bit confusing to take a netmem and a page.
>> ...
>>>> +
>>>> +static void page_pool_release_page_dma(struct page_pool *pool,
>>>> +                                      netmem_ref netmem)
>>>> +{
>>>> +       __page_pool_release_page_dma(pool, netmem);
>>>> +}
>>>> +
>>>
>>> Is this wrapper necessary? Do you wanna rename the original function
>>> to remove the __ instead of adding a wrapper?
>>
>> I only added it here to cast away __always_inline since it's used in
>> a slow / setup path. It shouldn't change the binary, but I'm not a huge
>> fan of leaving the hint for the code where it's not needed.
>>
>
> Right, it probably makes sense to make the modifications you want on
> the original function rather than create a no-op wrapper to remove the
> __always_inline.

Removing __always_inline from a function deemed hot enough to need the
attribute is not a good idea, certainly not in an unrelated feature
patchset. FWIW, it's not a new practice either. If the maintainers
insist on removing it, I'll do so.
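As a sketch of the combination Pavel describes above — one setup call
instead of two half-steps — the merged helper might look like the
following. The page_pool_mp_map_netmem name is hypothetical, and the
error path simply mirrors the patch (the caller unwinds via
page_pool_release_page_dma), so this is an illustration, not the series'
actual code.

/*
 * Hypothetical combined helper: pp info + dma mapping in one call, so
 * neither half is exposed (or misusable) on its own.
 */
static int page_pool_mp_map_netmem(struct page_pool *pool, netmem_ref netmem,
                                   struct page *page)
{
        page_pool_set_pp_info(pool, netmem);
        if (!page_pool_dma_map_page(pool, netmem, page))
                return -EINVAL; /* caller unwinds previously mapped niovs */
        return 0;
}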
diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
new file mode 100644
index 000000000000..83d7eec0058d
--- /dev/null
+++ b/include/net/page_pool/memory_provider.h
@@ -0,0 +1,10 @@
+#ifndef _NET_PAGE_POOL_MEMORY_PROVIDER_H
+#define _NET_PAGE_POOL_MEMORY_PROVIDER_H
+
+int page_pool_mp_init_paged_area(struct page_pool *pool,
+                                struct net_iov_area *area,
+                                struct page **pages);
+void page_pool_mp_release_area(struct page_pool *pool,
+                               struct net_iov_area *area);
+
+#endif
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9a675e16e6a4..8bd4a3c80726 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -13,6 +13,7 @@
 
 #include <net/netdev_rx_queue.h>
 #include <net/page_pool/helpers.h>
+#include <net/page_pool/memory_provider.h>
 #include <net/xdp.h>
 
 #include <linux/dma-direction.h>
@@ -459,7 +460,8 @@ page_pool_dma_sync_for_device(const struct page_pool *pool,
 	__page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
 }
 
-static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
+static bool page_pool_dma_map_page(struct page_pool *pool, netmem_ref netmem,
+				   struct page *page)
 {
 	dma_addr_t dma;
 
@@ -468,7 +470,7 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
 	 * into page private data (i.e 32bit cpu with 64bit DMA caps)
 	 * This mapping is kept for lifetime of page, until leaving pool.
 	 */
-	dma = dma_map_page_attrs(pool->p.dev, netmem_to_page(netmem), 0,
+	dma = dma_map_page_attrs(pool->p.dev, page, 0,
 				 (PAGE_SIZE << pool->p.order), pool->p.dma_dir,
 				 DMA_ATTR_SKIP_CPU_SYNC |
 				 DMA_ATTR_WEAK_ORDERING);
@@ -490,6 +492,11 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
 	return false;
 }
 
+static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
+{
+	return page_pool_dma_map_page(pool, netmem, netmem_to_page(netmem));
+}
+
 static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
 						 gfp_t gfp)
 {
@@ -1154,3 +1161,55 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
 	}
 }
 EXPORT_SYMBOL(page_pool_update_nid);
+
+static void page_pool_release_page_dma(struct page_pool *pool,
+				       netmem_ref netmem)
+{
+	__page_pool_release_page_dma(pool, netmem);
+}
+
+int page_pool_mp_init_paged_area(struct page_pool *pool,
+				 struct net_iov_area *area,
+				 struct page **pages)
+{
+	struct net_iov *niov;
+	netmem_ref netmem;
+	int i, ret = 0;
+
+	if (!pool->dma_map)
+		return -EOPNOTSUPP;
+
+	for (i = 0; i < area->num_niovs; i++) {
+		niov = &area->niovs[i];
+		netmem = net_iov_to_netmem(niov);
+
+		page_pool_set_pp_info(pool, netmem);
+		if (!page_pool_dma_map_page(pool, netmem, pages[i])) {
+			ret = -EINVAL;
+			goto err_unmap_dma;
+		}
+	}
+	return 0;
+
+err_unmap_dma:
+	while (i--) {
+		netmem = net_iov_to_netmem(&area->niovs[i]);
+		page_pool_release_page_dma(pool, netmem);
+	}
+	return ret;
+}
+
+void page_pool_mp_release_area(struct page_pool *pool,
+			       struct net_iov_area *area)
+{
+	int i;
+
+	if (!pool->dma_map)
+		return;
+
+	for (i = 0; i < area->num_niovs; i++) {
+		struct net_iov *niov = &area->niovs[i];
+
+		page_pool_release_page_dma(pool, net_iov_to_netmem(niov));
+	}
+}