Message ID | 20240805212536.2172174-8-almasrymina@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Device Memory TCP | expand |
On Mon, 5 Aug 2024 21:25:20 +0000 Mina Almasry wrote: > + if (pool->p.queue) { > + /* We rely on rtnl_lock()ing to make sure netdev_rx_queue > + * configuration doesn't change while we're initializing the > + * page_pool. > + */ > + ASSERT_RTNL(); > + pool->mp_priv = pool->p.queue->mp_params.mp_priv; How do you know that the driver: - supports net_iov at all (let's not make implicit assumptions based on presence of queue API); - supports net_iov in current configuration (eg header-data split is enabled) - supports net_iov for _this_ pool (all drivers must have separate buffer pools for headers and data for this to work, some will use page pool for both) What comes to mind is adding an "I can gobble up net_iovs from this pool" flag in page pool params (the struct that comes from the driver), and then on the installation path we can check if after queue reset the refcount of the binding has increased. If it did - driver has created a pool as we expected, otherwise - fail, something must be off. Maybe that's a bit hacky?
On Tue, Aug 6, 2024 at 4:59 PM Jakub Kicinski <kuba@kernel.org> wrote: > ... > On Mon, 5 Aug 2024 21:25:20 +0000 Mina Almasry wrote: > > + if (pool->p.queue) { > > + /* We rely on rtnl_lock()ing to make sure netdev_rx_queue > > + * configuration doesn't change while we're initializing the > > + * page_pool. > > + */ > > + ASSERT_RTNL(); > > + pool->mp_priv = pool->p.queue->mp_params.mp_priv; > Hi Jakub, Sorry for the late reply, it took a bit of code reading to understand what you mean with the deactivation request on the other patch, but I think I got it down and have a patch on the way. > How do you know that the driver: > - supports net_iov at all (let's not make implicit assumptions based > on presence of queue API); > - supports net_iov in current configuration (eg header-data split is > enabled) > - supports net_iov for _this_ pool (all drivers must have separate > buffer pools for headers and data for this to work, some will use > page pool for both) > > What comes to mind is adding an "I can gobble up net_iovs from this > pool" flag in page pool params (the struct that comes from the driver), This already sorta exists in the current iteration, although maybe in an implicit way. As written, drivers need to set params.queue, otherwise core will not attempt to grab the mp information from params.queue. A driver can set params.queue for its data pages pool and not set it for the headers pool. AFAICT that deals with all 3 issues you present above. The awkward part is if params.queue starts getting used for other reasons rather than passing mp configuration, but as of today that's not the case so I didn't add the secondary flag. If you want a second flag to be added preemptively, I can do that, no problem. Can you confirm params.queue is not good enough? > and then on the installation path we can check if after queue reset > the refcount of the binding has increased. If it did - driver has > created a pool as we expected, otherwise - fail, something must be off. > Maybe that's a bit hacky? What's missing is for core to check at binding time that the driver supports net_iov. I had relied on the implicit presence of the queue-API. What you're proposing works, but AFAICT it's quite hacky, yes. I basically need to ASSERT_RTNL in net_devmem_binding_get() to ensure nothing can increment the refcount while the binding is happening so that the refcount check is valid. I think a less hacky approach is to add a function to the queue-API like ndo_queue_supported_features(), which lets the driver declare that it supports net_iov at a given rx queue. However I'm open to both approaches. What do you prefer?
On Thu, 8 Aug 2024 16:36:24 -0400 Mina Almasry wrote: > > How do you know that the driver: > > - supports net_iov at all (let's not make implicit assumptions based > > on presence of queue API); > > - supports net_iov in current configuration (eg header-data split is > > enabled) > > - supports net_iov for _this_ pool (all drivers must have separate > > buffer pools for headers and data for this to work, some will use > > page pool for both) > > > > What comes to mind is adding an "I can gobble up net_iovs from this > > pool" flag in page pool params (the struct that comes from the driver), > > This already sorta exists in the current iteration, although maybe in > an implicit way. As written, drivers need to set params.queue, > otherwise core will not attempt to grab the mp information from > params.queue. A driver can set params.queue for its data pages pool > and not set it for the headers pool. AFAICT that deals with all 3 > issues you present above. > > The awkward part is if params.queue starts getting used for other > reasons rather than passing mp configuration, but as of today that's > not the case so I didn't add the secondary flag. If you want a second > flag to be added preemptively, I can do that, no problem. Can you > confirm params.queue is not good enough? I'd prefer a flag. The setting queue in a param struct is not a good API for conveying that the page pool is for netmem payloads only. > > and then on the installation path we can check if after queue reset > > the refcount of the binding has increased. If it did - driver has > > created a pool as we expected, otherwise - fail, something must be off. > > Maybe that's a bit hacky? > > What's missing is for core to check at binding time that the driver > supports net_iov. I had relied on the implicit presence of the > queue-API. > > What you're proposing works, but AFAICT it's quite hacky, yes. I > basically need to ASSERT_RTNL in net_devmem_binding_get() to ensure > nothing can increment the refcount while the binding is happening so > that the refcount check is valid. True. Shooting from the hip, but we could walk the page pools of the netdev and find the one that has the right mp installed, and matches queue? The page pools are on a list hooked up to the netdev, trivial to walk. > I think a less hacky approach is to add a function to the queue-API > like ndo_queue_supported_features(), which lets the driver declare > that it supports net_iov at a given rx queue. However I'm open to both > approaches. What do you prefer? I kinda like trying to query the page pools more, because it's both fewer driver changes, and it actually validates that the driver did the right thing based on outcomes. Driver callback may have bugs. If you prefer strongly - fine, but hm.
On Thu, Aug 8, 2024 at 10:24 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 8 Aug 2024 16:36:24 -0400 Mina Almasry wrote: > > > How do you know that the driver: > > > - supports net_iov at all (let's not make implicit assumptions based > > > on presence of queue API); > > > - supports net_iov in current configuration (eg header-data split is > > > enabled) > > > - supports net_iov for _this_ pool (all drivers must have separate > > > buffer pools for headers and data for this to work, some will use > > > page pool for both) > > > > > > What comes to mind is adding an "I can gobble up net_iovs from this > > > pool" flag in page pool params (the struct that comes from the driver), > > > > This already sorta exists in the current iteration, although maybe in > > an implicit way. As written, drivers need to set params.queue, > > otherwise core will not attempt to grab the mp information from > > params.queue. A driver can set params.queue for its data pages pool > > and not set it for the headers pool. AFAICT that deals with all 3 > > issues you present above. > > > > The awkward part is if params.queue starts getting used for other > > reasons rather than passing mp configuration, but as of today that's > > not the case so I didn't add the secondary flag. If you want a second > > flag to be added preemptively, I can do that, no problem. Can you > > confirm params.queue is not good enough? > > I'd prefer a flag. The setting queue in a param struct is not a good > API for conveying that the page pool is for netmem payloads only. > > > > and then on the installation path we can check if after queue reset > > > the refcount of the binding has increased. If it did - driver has > > > created a pool as we expected, otherwise - fail, something must be off. > > > Maybe that's a bit hacky? > > > > What's missing is for core to check at binding time that the driver > > supports net_iov. I had relied on the implicit presence of the > > queue-API. > > > > What you're proposing works, but AFAICT it's quite hacky, yes. I > > basically need to ASSERT_RTNL in net_devmem_binding_get() to ensure > > nothing can increment the refcount while the binding is happening so > > that the refcount check is valid. > > True. Shooting from the hip, but we could walk the page pools of the > netdev and find the one that has the right mp installed, and matches > queue? The page pools are on a list hooked up to the netdev, trivial > to walk. > I think this is good, and it doesn't seem hacky to me, because we can check the page_pools of the netdev while we hold rtnl, so we can be sure nothing is messing with the pp configuration in the meantime. Like you say below it does validate the driver rather than rely on the driver saying it's doing the right thing. I'll look into putting this in the next version. -- Thanks, Mina
On 8/9/24 15:10, Mina Almasry wrote: > On Thu, Aug 8, 2024 at 10:24 PM Jakub Kicinski <kuba@kernel.org> wrote: >> >> On Thu, 8 Aug 2024 16:36:24 -0400 Mina Almasry wrote: >>>> How do you know that the driver: >>>> - supports net_iov at all (let's not make implicit assumptions based >>>> on presence of queue API); >>>> - supports net_iov in current configuration (eg header-data split is >>>> enabled) >>>> - supports net_iov for _this_ pool (all drivers must have separate >>>> buffer pools for headers and data for this to work, some will use >>>> page pool for both) >>>> >>>> What comes to mind is adding an "I can gobble up net_iovs from this >>>> pool" flag in page pool params (the struct that comes from the driver), >>> >>> This already sorta exists in the current iteration, although maybe in >>> an implicit way. As written, drivers need to set params.queue, >>> otherwise core will not attempt to grab the mp information from >>> params.queue. A driver can set params.queue for its data pages pool >>> and not set it for the headers pool. AFAICT that deals with all 3 >>> issues you present above. >>> >>> The awkward part is if params.queue starts getting used for other >>> reasons rather than passing mp configuration, but as of today that's >>> not the case so I didn't add the secondary flag. If you want a second >>> flag to be added preemptively, I can do that, no problem. Can you >>> confirm params.queue is not good enough? >> >> I'd prefer a flag. The setting queue in a param struct is not a good >> API for conveying that the page pool is for netmem payloads only. >> >>>> and then on the installation path we can check if after queue reset >>>> the refcount of the binding has increased. If it did - driver has >>>> created a pool as we expected, otherwise - fail, something must be off. >>>> Maybe that's a bit hacky? >>> >>> What's missing is for core to check at binding time that the driver >>> supports net_iov. I had relied on the implicit presence of the >>> queue-API. >>> >>> What you're proposing works, but AFAICT it's quite hacky, yes. I >>> basically need to ASSERT_RTNL in net_devmem_binding_get() to ensure >>> nothing can increment the refcount while the binding is happening so >>> that the refcount check is valid. >> >> True. Shooting from the hip, but we could walk the page pools of the >> netdev and find the one that has the right mp installed, and matches >> queue? The page pools are on a list hooked up to the netdev, trivial >> to walk. >> > > I think this is good, and it doesn't seem hacky to me, because we can > check the page_pools of the netdev while we hold rtnl, so we can be > sure nothing is messing with the pp configuration in the meantime. > Like you say below it does validate the driver rather than rely on the > driver saying it's doing the right thing. I'll look into putting this > in the next version. Why not have a flag set by the driver and advertising whether it supports providers or not, which should be checked for instance in netdev_rx_queue_restart()? If set, the driver should do the right thing. That's in addition to a new pp_params flag explicitly telling if pp should use providers. It's more explicit and feels a little less hacky.
On Fri, 9 Aug 2024 16:45:50 +0100 Pavel Begunkov wrote: > > I think this is good, and it doesn't seem hacky to me, because we can > > check the page_pools of the netdev while we hold rtnl, so we can be > > sure nothing is messing with the pp configuration in the meantime. > > Like you say below it does validate the driver rather than rely on the > > driver saying it's doing the right thing. I'll look into putting this > > in the next version. > > Why not have a flag set by the driver and advertising whether it > supports providers or not, which should be checked for instance in > netdev_rx_queue_restart()? If set, the driver should do the right > thing. That's in addition to a new pp_params flag explicitly telling > if pp should use providers. It's more explicit and feels a little > less hacky. You mean like I suggested in the previous two emails? :) Given how easy the check is to implement, I think it's worth adding as a sanity check. But the flag should be the main API, if the sanity check starts to be annoying we'll ditch it.
On Fri, Aug 9, 2024 at 11:52 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 9 Aug 2024 16:45:50 +0100 Pavel Begunkov wrote: > > > I think this is good, and it doesn't seem hacky to me, because we can > > > check the page_pools of the netdev while we hold rtnl, so we can be > > > sure nothing is messing with the pp configuration in the meantime. > > > Like you say below it does validate the driver rather than rely on the > > > driver saying it's doing the right thing. I'll look into putting this > > > in the next version. > > > > Why not have a flag set by the driver and advertising whether it > > supports providers or not, which should be checked for instance in > > netdev_rx_queue_restart()? If set, the driver should do the right > > thing. That's in addition to a new pp_params flag explicitly telling > > if pp should use providers. It's more explicit and feels a little > > less hacky. > > You mean like I suggested in the previous two emails? :) > > Given how easy the check is to implement, I think it's worth > adding as a sanity check. But the flag should be the main API, > if the sanity check starts to be annoying we'll ditch it. I think we're talking about 2 slightly different flags, AFAIU. Pavel and I are suggesting the driver reports "I support memory providers" directly to core (via the queue-api or what not), and we check that flag directly in netdev_rx_queue_restart(), and fail immediately if the support is not there. Jakub is suggesting a page_pool_params flag which lets the driver report "I support memory providers". If the driver doesn't support it but core is trying to configure that, then the page_pool_create will fail, which will cause the queue API operation (ndo_queue_alloc_mem_alloc) to fail, which causes netdev_rx_queue_restart() to fail. Both are fine, I don't see any extremely strong reason to pick one of the other. I prefer Jakub's suggestion, just because it's closer to the page_pool and may be more reusable in the future. I'll err on the side of that unless I hear strong preference to the contrary. I also think the additional check that Jakub is requesting is easy to implement and unobjectionable. It would let core validate that the driver did actually create the page_pool with the memory provider. I think one of the goals of the queue API was to allow core to do more validation on driver configuration anyway.
On 8/11/24 03:21, Mina Almasry wrote: > On Fri, Aug 9, 2024 at 11:52 PM Jakub Kicinski <kuba@kernel.org> wrote: >> >> On Fri, 9 Aug 2024 16:45:50 +0100 Pavel Begunkov wrote: >>>> I think this is good, and it doesn't seem hacky to me, because we can >>>> check the page_pools of the netdev while we hold rtnl, so we can be >>>> sure nothing is messing with the pp configuration in the meantime. >>>> Like you say below it does validate the driver rather than rely on the >>>> driver saying it's doing the right thing. I'll look into putting this >>>> in the next version. >>> >>> Why not have a flag set by the driver and advertising whether it >>> supports providers or not, which should be checked for instance in >>> netdev_rx_queue_restart()? If set, the driver should do the right >>> thing. That's in addition to a new pp_params flag explicitly telling >>> if pp should use providers. It's more explicit and feels a little >>> less hacky. >> >> You mean like I suggested in the previous two emails? :) >> >> Given how easy the check is to implement, I think it's worth >> adding as a sanity check. But the flag should be the main API, >> if the sanity check starts to be annoying we'll ditch it. > > I think we're talking about 2 slightly different flags, AFAIU.> > Pavel and I are suggesting the driver reports "I support memory > providers" directly to core (via the queue-api or what not), and we > check that flag directly in netdev_rx_queue_restart(), and fail > immediately if the support is not there. I might've misread Jakub, but yes, I believe it's different. It'd communicate about support for providers to upper layers, so we can fail even before attempting to allocate a new queue and init a page pool. > Jakub is suggesting a page_pool_params flag which lets the driver > report "I support memory providers". If the driver doesn't support it > but core is trying to configure that, then the page_pool_create will > fail, which will cause the queue API operation > (ndo_queue_alloc_mem_alloc) to fail, which causes > netdev_rx_queue_restart() to fail. And I'm not against this way either if we explicitly get an error back instead of trying to figure it out post-factum like by checking the references and possibly reverting the allocation. Maybe that's where I was confused, and that refcount thing was suggested as a WARN_ONCE? FWIW, I think it warrants two flags. The first saying that the driver supports providers at all: page_pool_init() { if (rxq->mp_params) if (!(flags & PP_PROVIDERS_SUPPORTED)) goto fail; } And the second telling whether the driver wants to install providers for this particular page pool, so if there is a separate pool for headers we can set it with plain old kernel pages. payload_pool = page_pool_create(rqx, PP_PROVIDERS_SUPPORTED); header_pool = page_pool_create(rqx, PP_PROVIDERS_SUPPORTED | PP_IGNORE_PROVIDERS); (or invert the flag). That's assuming page_pool_params::queue is a generic thing and we don't want to draw equivalence between it and memory providers.
On Sun, 11 Aug 2024 22:51:13 +0100 Pavel Begunkov wrote: > > I think we're talking about 2 slightly different flags, AFAIU.> > > Pavel and I are suggesting the driver reports "I support memory > > providers" directly to core (via the queue-api or what not), and we > > check that flag directly in netdev_rx_queue_restart(), and fail > > immediately if the support is not there. > > I might've misread Jakub, but yes, I believe it's different. It'd > communicate about support for providers to upper layers, so we can > fail even before attempting to allocate a new queue and init a > page pool. Got it. Since allocating memory happens before stopping traffic I think it's acceptable to stick to a single flag. > > Jakub is suggesting a page_pool_params flag which lets the driver > > report "I support memory providers". If the driver doesn't support it > > but core is trying to configure that, then the page_pool_create will > > fail, which will cause the queue API operation > > (ndo_queue_alloc_mem_alloc) to fail, which causes > > netdev_rx_queue_restart() to fail. > > And I'm not against this way either if we explicitly get an error > back instead of trying to figure it out post-factum like by > checking the references and possibly reverting the allocation. > Maybe that's where I was confused, and that refcount thing was > suggested as a WARN_ONCE? Yup, the refcount (now: check of the page pool list) was meant as a WARN_ONCE() to catch bad drivers. > FWIW, I think it warrants two flags. The first saying that the > driver supports providers at all: > > page_pool_init() { > if (rxq->mp_params) > if (!(flags & PP_PROVIDERS_SUPPORTED)) > goto fail; > } > > And the second telling whether the driver wants to install > providers for this particular page pool, so if there is a > separate pool for headers we can set it with plain old kernel > pages. The implementation of the queue API should be resilient against failures in alloc, and not being MP capable is just a form of alloc failure. I don't see the upside of double-flag. > payload_pool = page_pool_create(rqx, PP_PROVIDERS_SUPPORTED); > header_pool = page_pool_create(rqx, PP_PROVIDERS_SUPPORTED | > PP_IGNORE_PROVIDERS); Also don't see the upside of the explicit "non-capable" flag, but I haven't thought of that. Is there any use? One important note. The flag should not be tied to memory providers but rather to netmem, IOW unreadable memory. MP is an internal detail, the important fact from the driver-facing API perspective is that the driver doesn't need struct pages. > (or invert the flag). That's assuming page_pool_params::queue is > a generic thing and we don't want to draw equivalence between > it and memory providers.
On Mon, Aug 12, 2024 at 1:57 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Sun, 11 Aug 2024 22:51:13 +0100 Pavel Begunkov wrote: > > > I think we're talking about 2 slightly different flags, AFAIU.> > > > Pavel and I are suggesting the driver reports "I support memory > > > providers" directly to core (via the queue-api or what not), and we > > > check that flag directly in netdev_rx_queue_restart(), and fail > > > immediately if the support is not there. > > > > I might've misread Jakub, but yes, I believe it's different. It'd > > communicate about support for providers to upper layers, so we can > > fail even before attempting to allocate a new queue and init a > > page pool. > > Got it. Since allocating memory happens before stopping traffic > I think it's acceptable to stick to a single flag. > > > > Jakub is suggesting a page_pool_params flag which lets the driver > > > report "I support memory providers". If the driver doesn't support it > > > but core is trying to configure that, then the page_pool_create will > > > fail, which will cause the queue API operation > > > (ndo_queue_alloc_mem_alloc) to fail, which causes > > > netdev_rx_queue_restart() to fail. > > > > And I'm not against this way either if we explicitly get an error > > back instead of trying to figure it out post-factum like by > > checking the references and possibly reverting the allocation. > > Maybe that's where I was confused, and that refcount thing was > > suggested as a WARN_ONCE? > > Yup, the refcount (now: check of the page pool list) was meant > as a WARN_ONCE() to catch bad drivers. > > > FWIW, I think it warrants two flags. The first saying that the > > driver supports providers at all: > > > > page_pool_init() { > > if (rxq->mp_params) > > if (!(flags & PP_PROVIDERS_SUPPORTED)) > > goto fail; > > } > > > > And the second telling whether the driver wants to install > > providers for this particular page pool, so if there is a > > separate pool for headers we can set it with plain old kernel > > pages. > > The implementation of the queue API should be resilient against > failures in alloc, and not being MP capable is just a form of > alloc failure. I don't see the upside of double-flag. > > > payload_pool = page_pool_create(rqx, PP_PROVIDERS_SUPPORTED); > > header_pool = page_pool_create(rqx, PP_PROVIDERS_SUPPORTED | > > PP_IGNORE_PROVIDERS); > > Also don't see the upside of the explicit "non-capable" flag, > but I haven't thought of that. Is there any use? > There are 2 things we're trying to accomplish: 1. Drivers need to be able to say "I support unreadable netmem". Failure to report unreadable netmem support should cause the netlink API to fail when the user tries to bind dmabuf/io uring memory. 2. Drivers need to be able to say "I want a header pool (with readable netmem)" or "I want a data pool (potentially with unreadable netmem)". Pavel is suggesting implementing both of these in 2 different flags. Jakub is suggesting implementing both with 1 flag which says "I can support unreadable netmem for this pool" , and guarding against #1 with a refcount check to detect if a dmabuf pool should have been created but wasn't. I prefer Jakub's suggestion, but beware that if we go with Jakub's suggestion, we can't WARN_ON when the core-net check fails, because it's not a kernel warning. All that's happening is that the user asked for dmabuf binding but the driver didn't ask for it (because likely it doesn't support it). That's not cause for a warning to be printed in the kernel. The netlink API should just fail and return -EOPNOTSUPP or something. > One important note. The flag should not be tied to memory providers > but rather to netmem, IOW unreadable memory. MP is an internal detail, > the important fact from the driver-facing API perspective is that the > driver doesn't need struct pages. > Agreed.
On 8/12/24 18:57, Jakub Kicinski wrote: > On Sun, 11 Aug 2024 22:51:13 +0100 Pavel Begunkov wrote: >>> I think we're talking about 2 slightly different flags, AFAIU.> >>> Pavel and I are suggesting the driver reports "I support memory >>> providers" directly to core (via the queue-api or what not), and we >>> check that flag directly in netdev_rx_queue_restart(), and fail >>> immediately if the support is not there. >> >> I might've misread Jakub, but yes, I believe it's different. It'd >> communicate about support for providers to upper layers, so we can >> fail even before attempting to allocate a new queue and init a >> page pool. > > Got it. Since allocating memory happens before stopping traffic > I think it's acceptable to stick to a single flag. I agree, it's an failure case of init path, shouldn't be a problem. >>> Jakub is suggesting a page_pool_params flag which lets the driver >>> report "I support memory providers". If the driver doesn't support it >>> but core is trying to configure that, then the page_pool_create will >>> fail, which will cause the queue API operation >>> (ndo_queue_alloc_mem_alloc) to fail, which causes >>> netdev_rx_queue_restart() to fail. >> >> And I'm not against this way either if we explicitly get an error >> back instead of trying to figure it out post-factum like by >> checking the references and possibly reverting the allocation. >> Maybe that's where I was confused, and that refcount thing was >> suggested as a WARN_ONCE? > > Yup, the refcount (now: check of the page pool list) was meant > as a WARN_ONCE() to catch bad drivers. > >> FWIW, I think it warrants two flags. The first saying that the >> driver supports providers at all: >> >> page_pool_init() { >> if (rxq->mp_params) >> if (!(flags & PP_PROVIDERS_SUPPORTED)) >> goto fail; >> } >> >> And the second telling whether the driver wants to install >> providers for this particular page pool, so if there is a >> separate pool for headers we can set it with plain old kernel >> pages. > > The implementation of the queue API should be resilient against > failures in alloc, and not being MP capable is just a form of > alloc failure. I don't see the upside of double-flag. > >> payload_pool = page_pool_create(rqx, PP_PROVIDERS_SUPPORTED); >> header_pool = page_pool_create(rqx, PP_PROVIDERS_SUPPORTED | >> PP_IGNORE_PROVIDERS); > > Also don't see the upside of the explicit "non-capable" flag, > but I haven't thought of that. Is there any use? Considering that we pass a mp to page pool indirectly via a queue rxq->mp_param = devmemtcp; ... // later pp_params.queue = rxq; page_pool_create(pp_params); How can we create a separate header pool without mp and let the other data pool be initialized with mp? We can make sure pp_params.queue is NULL, but API isn't sound, I think the plans are to reuse the queue argument for some other purposes. param_tmp = rxq->mp_param; rxq->mp_param = NULL; page_pool_create(pp_params); rxq->mp_param = param_tmp; Temporarily erasing mp_param is another option, but I think by far the simplest is another flag. > One important note. The flag should not be tied to memory providers > but rather to netmem, IOW unreadable memory. MP is an internal detail, > the important fact from the driver-facing API perspective is that the > driver doesn't need struct pages. Sure, didn't want to go into describing how these are not the same thing. Naming should reflect it, and if we expect providers that don't produce net_iov, we might want to have a flag in mp_param or so.
On 8/12/24 19:57, Pavel Begunkov wrote: > On 8/12/24 18:57, Jakub Kicinski wrote: >> On Sun, 11 Aug 2024 22:51:13 +0100 Pavel Begunkov wrote: >>>> I think we're talking about 2 slightly different flags, AFAIU.> >>>> Pavel and I are suggesting the driver reports "I support memory >>>> providers" directly to core (via the queue-api or what not), and we >>>> check that flag directly in netdev_rx_queue_restart(), and fail >>>> immediately if the support is not there. >>> >>> I might've misread Jakub, but yes, I believe it's different. It'd >>> communicate about support for providers to upper layers, so we can >>> fail even before attempting to allocate a new queue and init a >>> page pool. >> >> Got it. Since allocating memory happens before stopping traffic >> I think it's acceptable to stick to a single flag. > > I agree, it's an failure case of init path, shouldn't be > a problem. > > >>>> Jakub is suggesting a page_pool_params flag which lets the driver >>>> report "I support memory providers". If the driver doesn't support it >>>> but core is trying to configure that, then the page_pool_create will >>>> fail, which will cause the queue API operation >>>> (ndo_queue_alloc_mem_alloc) to fail, which causes >>>> netdev_rx_queue_restart() to fail. >>> >>> And I'm not against this way either if we explicitly get an error >>> back instead of trying to figure it out post-factum like by >>> checking the references and possibly reverting the allocation. >>> Maybe that's where I was confused, and that refcount thing was >>> suggested as a WARN_ONCE? >> >> Yup, the refcount (now: check of the page pool list) was meant >> as a WARN_ONCE() to catch bad drivers. >> >>> FWIW, I think it warrants two flags. The first saying that the >>> driver supports providers at all: >>> >>> page_pool_init() { >>> if (rxq->mp_params) >>> if (!(flags & PP_PROVIDERS_SUPPORTED)) >>> goto fail; >>> } >>> >>> And the second telling whether the driver wants to install >>> providers for this particular page pool, so if there is a >>> separate pool for headers we can set it with plain old kernel >>> pages. >> >> The implementation of the queue API should be resilient against >> failures in alloc, and not being MP capable is just a form of >> alloc failure. I don't see the upside of double-flag. >> >>> payload_pool = page_pool_create(rqx, PP_PROVIDERS_SUPPORTED); >>> header_pool = page_pool_create(rqx, PP_PROVIDERS_SUPPORTED | >>> PP_IGNORE_PROVIDERS); >> >> Also don't see the upside of the explicit "non-capable" flag, >> but I haven't thought of that. Is there any use? Or maybe I don't get what you're asking, I explained why to have that "PP_IGNORE_PROVIDERS" on top of the flag saying that it's supported. Which "non-capable" flag you have in mind? A page pool create flag or one facing upper layers like devmem tcp? > Considering that we pass a mp to page pool indirectly via > a queue > > rxq->mp_param = devmemtcp; > ... // later > pp_params.queue = rxq; > page_pool_create(pp_params); > > How can we create a separate header pool without mp and let > the other data pool be initialized with mp? We can make > sure pp_params.queue is NULL, but API isn't sound, I think > the plans are to reuse the queue argument for some other > purposes. > > param_tmp = rxq->mp_param; > rxq->mp_param = NULL; > page_pool_create(pp_params); > rxq->mp_param = param_tmp; > > Temporarily erasing mp_param is another option, but I think > by far the simplest is another flag. > >> One important note. The flag should not be tied to memory providers >> but rather to netmem, IOW unreadable memory. MP is an internal detail, >> the important fact from the driver-facing API perspective is that the >> driver doesn't need struct pages. > > Sure, didn't want to go into describing how these are not > the same thing. Naming should reflect it, and if we expect > providers that don't produce net_iov, we might want to have > a flag in mp_param or so. >
On 8/12/24 19:55, Mina Almasry wrote: > On Mon, Aug 12, 2024 at 1:57 PM Jakub Kicinski <kuba@kernel.org> wrote: >> >> On Sun, 11 Aug 2024 22:51:13 +0100 Pavel Begunkov wrote: >>>> I think we're talking about 2 slightly different flags, AFAIU.> >>>> Pavel and I are suggesting the driver reports "I support memory >>>> providers" directly to core (via the queue-api or what not), and we >>>> check that flag directly in netdev_rx_queue_restart(), and fail >>>> immediately if the support is not there. >>> >>> I might've misread Jakub, but yes, I believe it's different. It'd >>> communicate about support for providers to upper layers, so we can >>> fail even before attempting to allocate a new queue and init a >>> page pool. >> >> Got it. Since allocating memory happens before stopping traffic >> I think it's acceptable to stick to a single flag. >> >>>> Jakub is suggesting a page_pool_params flag which lets the driver >>>> report "I support memory providers". If the driver doesn't support it >>>> but core is trying to configure that, then the page_pool_create will >>>> fail, which will cause the queue API operation >>>> (ndo_queue_alloc_mem_alloc) to fail, which causes >>>> netdev_rx_queue_restart() to fail. >>> >>> And I'm not against this way either if we explicitly get an error >>> back instead of trying to figure it out post-factum like by >>> checking the references and possibly reverting the allocation. >>> Maybe that's where I was confused, and that refcount thing was >>> suggested as a WARN_ONCE? >> >> Yup, the refcount (now: check of the page pool list) was meant >> as a WARN_ONCE() to catch bad drivers. >> >>> FWIW, I think it warrants two flags. The first saying that the >>> driver supports providers at all: >>> >>> page_pool_init() { >>> if (rxq->mp_params) >>> if (!(flags & PP_PROVIDERS_SUPPORTED)) >>> goto fail; >>> } >>> >>> And the second telling whether the driver wants to install >>> providers for this particular page pool, so if there is a >>> separate pool for headers we can set it with plain old kernel >>> pages. >> >> The implementation of the queue API should be resilient against >> failures in alloc, and not being MP capable is just a form of >> alloc failure. I don't see the upside of double-flag. >> >>> payload_pool = page_pool_create(rqx, PP_PROVIDERS_SUPPORTED); >>> header_pool = page_pool_create(rqx, PP_PROVIDERS_SUPPORTED | >>> PP_IGNORE_PROVIDERS); >> >> Also don't see the upside of the explicit "non-capable" flag, >> but I haven't thought of that. Is there any use? >> > > There are 2 things we're trying to accomplish: > > 1. Drivers need to be able to say "I support unreadable netmem". > Failure to report unreadable netmem support should cause the netlink > API to fail when the user tries to bind dmabuf/io uring memory. > > 2. Drivers need to be able to say "I want a header pool (with readable > netmem)" or "I want a data pool (potentially with unreadable netmem)". > > Pavel is suggesting implementing both of these in 2 different flags. > > Jakub is suggesting implementing both with 1 flag which says "I can > support unreadable netmem for this pool" , and guarding against #1 > with a refcount check to detect if a dmabuf pool should have been > created but wasn't. That would be iffy IIUC, but I think Jakub just explicitly said that the refcount trick was just for debugging purposes and not for gauging errors like "providers are not supported by the driver". "Yup, the refcount (now: check of the page pool list) was meant as a WARN_ONCE() to catch bad drivers." > I prefer Jakub's suggestion, but beware that if we go with Jakub's > suggestion, we can't WARN_ON when the core-net check fails, because > it's not a kernel warning. All that's happening is that the user asked > for dmabuf binding but the driver didn't ask for it (because likely it > doesn't support it). That's not cause for a warning to be printed in > the kernel. The netlink API should just fail and return -EOPNOTSUPP or > something.
On Mon, 12 Aug 2024 20:10:39 +0100 Pavel Begunkov wrote: > > 1. Drivers need to be able to say "I support unreadable netmem". > > Failure to report unreadable netmem support should cause the netlink > > API to fail when the user tries to bind dmabuf/io uring memory. > > > > 2. Drivers need to be able to say "I want a header pool (with readable > > netmem)" or "I want a data pool (potentially with unreadable netmem)". > > > > Pavel is suggesting implementing both of these in 2 different flags. > > > > Jakub is suggesting implementing both with 1 flag which says "I can > > support unreadable netmem for this pool" , and guarding against #1 > > with a refcount check to detect if a dmabuf pool should have been > > created but wasn't. > > That would be iffy IIUC, but I think Jakub just explicitly said > that the refcount trick was just for debugging purposes and not > for gauging errors like "providers are not supported by the driver". > > "Yup, the refcount (now: check of the page pool list) was meant > as a WARN_ONCE() to catch bad drivers." Sorry, insufficient caffeine level in the morning. We can't WARN_ONCE(), indeed.
On Mon, 12 Aug 2024 20:04:41 +0100 Pavel Begunkov wrote: > >> Also don't see the upside of the explicit "non-capable" flag, > >> but I haven't thought of that. Is there any use? > > Or maybe I don't get what you're asking, I explained > why to have that "PP_IGNORE_PROVIDERS" on top of the flag > saying that it's supported. > > Which "non-capable" flag you have in mind? A page pool create > flag or one facing upper layers like devmem tcp? Let me rephrase - what's the point of having both PP_PROVIDERS_SUPPORTED and PP_IGNORE_PROVIDERS at the page pool level? PP_CAP_NET(MEM|IOV), and it's either there or it's not. If you're thinking about advertising the support all the way to the user, I'm not sure if page pool is the right place to do so. It's more of a queue property. BTW, Mina, the core should probably also check that XDP isn't installed before / while the netmem is bound to a queue.
On 8/13/24 01:15, Jakub Kicinski wrote: > On Mon, 12 Aug 2024 20:04:41 +0100 Pavel Begunkov wrote: >>>> Also don't see the upside of the explicit "non-capable" flag, >>>> but I haven't thought of that. Is there any use? >> >> Or maybe I don't get what you're asking, I explained >> why to have that "PP_IGNORE_PROVIDERS" on top of the flag >> saying that it's supported. >> >> Which "non-capable" flag you have in mind? A page pool create >> flag or one facing upper layers like devmem tcp? > > Let me rephrase - what's the point of having both PP_PROVIDERS_SUPPORTED > and PP_IGNORE_PROVIDERS at the page pool level? PP_CAP_NET(MEM|IOV), > and it's either there or it's not. The second flag solves a problem with initializing page pools with headers, but let's forget about it for now, it's rather a small nuance, would probably reappear when someone would try to use pp_params->queue for purposes different from memory providers. > If you're thinking about advertising the support all the way to the > user, I'm not sure if page pool is the right place to do so. It's more > of a queue property. Nope. Only the first "SUPPORTED" flag serves that purpose in a way by failing setup like netlink devmem dmabuf binding and returning the error back to user. > BTW, Mina, the core should probably also check that XDP isn't installed > before / while the netmem is bound to a queue.
On 8/13/24 00:57, Jakub Kicinski wrote: > On Mon, 12 Aug 2024 20:10:39 +0100 Pavel Begunkov wrote: >>> 1. Drivers need to be able to say "I support unreadable netmem". >>> Failure to report unreadable netmem support should cause the netlink >>> API to fail when the user tries to bind dmabuf/io uring memory. >>> >>> 2. Drivers need to be able to say "I want a header pool (with readable >>> netmem)" or "I want a data pool (potentially with unreadable netmem)". >>> >>> Pavel is suggesting implementing both of these in 2 different flags. >>> >>> Jakub is suggesting implementing both with 1 flag which says "I can >>> support unreadable netmem for this pool" , and guarding against #1 >>> with a refcount check to detect if a dmabuf pool should have been >>> created but wasn't. >> >> That would be iffy IIUC, but I think Jakub just explicitly said >> that the refcount trick was just for debugging purposes and not >> for gauging errors like "providers are not supported by the driver". >> >> "Yup, the refcount (now: check of the page pool list) was meant >> as a WARN_ONCE() to catch bad drivers." > > Sorry, insufficient caffeine level in the morning. > We can't WARN_ONCE(), indeed. I'm getting lost, so repeating myself a bit. What I think would be a good approach is if we get an error back from the driver if it doesn't support netiov / providers. netdev_rx_queue_restart() { ... err = dev->queue_mgmt_ops->ndo_queue_mem_alloc(); if (err == -EOPNOTSUPP) // the driver doesn't support netiov return -EOPNOTSUPP; ... } That can be done if drivers opt in to support providers, e.g. via a page pool flag. What I think wouldn't be a great option is getting back a "success" from the driver even though it ignored netdev_rx_queue_restart() { ... err = dev->queue_mgmt_ops->ndo_queue_mem_alloc(); if (err) return err; // we get err==0 even if the driver doesn't support // providers, verify it is _actually_ installed if (rxq->mp_params) { // or walking pp list, same thing if (rxq->mp_params->refcount == 0) goto fail; } } And if we go with the first version, the refcount check can also be added but as a warning. Maybe it's easier to put it into code and discuss then.
On Mon, Aug 12, 2024 at 8:15 PM Jakub Kicinski <kuba@kernel.org> wrote: > BTW, Mina, the core should probably also check that XDP isn't installed > before / while the netmem is bound to a queue. Sorry if noob question, but what is the proper check for this? I tried adding this to net_devmem_bind_dmabuf_to_queue(): if (xdp_rxq_info_is_reg(&rxq->xdp_rxq)) return -EEXIST; But quickly found out that in netif_alloc_rx_queues() we initialize all the rxq->xdp_rxq to state REGISTERED regardless whether xdp is installed or not, so this check actually fails. Worthy of note is that GVE holds an instance of xdp_rxq_info in gve_rx_ring, and seems to use that for its xdp information, not the one that hangs off of netdev_rx_queue in core. Additionally, my understanding of XDP is limited, but why do we want to disable it? My understanding is that XDP is a kernel bypass that hands the data directly to userspace. In theory at least there should be no issue binding dmabuf to a queue, then getting the data in the queue via an XDP program instead of via TCP sockets or io uring. Is there some fundamental reason why dmabuf and XDP are incompatible?
On Tue, Aug 13, 2024 at 4:39 AM Mina Almasry <almasrymina@google.com> wrote: > > On Mon, Aug 12, 2024 at 8:15 PM Jakub Kicinski <kuba@kernel.org> wrote: > > BTW, Mina, the core should probably also check that XDP isn't installed > > before / while the netmem is bound to a queue. > > Sorry if noob question, but what is the proper check for this? I tried > adding this to net_devmem_bind_dmabuf_to_queue(): > > if (xdp_rxq_info_is_reg(&rxq->xdp_rxq)) > return -EEXIST; > > But quickly found out that in netif_alloc_rx_queues() we initialize > all the rxq->xdp_rxq to state REGISTERED regardless whether xdp is > installed or not, so this check actually fails. > > Worthy of note is that GVE holds an instance of xdp_rxq_info in > gve_rx_ring, and seems to use that for its xdp information, not the > one that hangs off of netdev_rx_queue in core. > To elaborate further, in order to disable binding dmabuf and XDP on the same rx queue for GVE, AFAIT the check would need to be inside of GVE. Inside of GVE I'd check if gve_priv->xdp_prog is installed, and check if the gve_rx_ring->xdp_info is registered. If so, then the rx queue is XDP enabled, and should not be bound to dmabuf. I think that would work. At the moment I can't think of a check inside of core that would be compatible with GVE, but above you clearly are specifically asking for a check in core. Any pointers to what you have in mind would be appreciated here, but I'll try to take a deeper look. > Additionally, my understanding of XDP is limited, but why do we want > to disable it? My understanding is that XDP is a kernel bypass that > hands the data directly to userspace. In theory at least there should > be no issue binding dmabuf to a queue, then getting the data in the > queue via an XDP program instead of via TCP sockets or io uring. Is > there some fundamental reason why dmabuf and XDP are incompatible? > > -- > Thanks, > Mina
On Tue, 13 Aug 2024 04:39:47 -0400 Mina Almasry wrote: > On Mon, Aug 12, 2024 at 8:15 PM Jakub Kicinski <kuba@kernel.org> wrote: > > BTW, Mina, the core should probably also check that XDP isn't installed > > before / while the netmem is bound to a queue. > > Sorry if noob question, but what is the proper check for this? if (dev_xdp_prog_count(netdev)) > I tried adding this to net_devmem_bind_dmabuf_to_queue(): > > if (xdp_rxq_info_is_reg(&rxq->xdp_rxq)) > return -EEXIST; > > But quickly found out that in netif_alloc_rx_queues() we initialize > all the rxq->xdp_rxq to state REGISTERED regardless whether xdp is > installed or not, so this check actually fails. > > Worthy of note is that GVE holds an instance of xdp_rxq_info in > gve_rx_ring, and seems to use that for its xdp information, not the > one that hangs off of netdev_rx_queue in core. > > Additionally, my understanding of XDP is limited, but why do we want > to disable it? My understanding is that XDP is a kernel bypass that > hands the data directly to userspace. In theory at least there should > be no issue binding dmabuf to a queue, then getting the data in the > queue via an XDP program instead of via TCP sockets or io uring. Is > there some fundamental reason why dmabuf and XDP are incompatible? You're conflating XDP with AF_XDP, two different things, slightly related but different.
On Tue, 13 Aug 2024 03:31:13 +0100 Pavel Begunkov wrote: > I'm getting lost, so repeating myself a bit. What I think > would be a good approach is if we get an error back from > the driver if it doesn't support netiov / providers. > > netdev_rx_queue_restart() { > ... > err = dev->queue_mgmt_ops->ndo_queue_mem_alloc(); > if (err == -EOPNOTSUPP) // the driver doesn't support netiov > return -EOPNOTSUPP; > ... > } > > That can be done if drivers opt in to support providers, > e.g. via a page pool flag. > > What I think wouldn't be a great option is getting back a > "success" from the driver even though it ignored page pool params are not the right place for a supported flag. Sooner or later we'll want to expose this flag to user space.
On 8/13/24 15:39, Jakub Kicinski wrote: > On Tue, 13 Aug 2024 03:31:13 +0100 Pavel Begunkov wrote: >> I'm getting lost, so repeating myself a bit. What I think >> would be a good approach is if we get an error back from >> the driver if it doesn't support netiov / providers. >> >> netdev_rx_queue_restart() { >> ... >> err = dev->queue_mgmt_ops->ndo_queue_mem_alloc(); >> if (err == -EOPNOTSUPP) // the driver doesn't support netiov >> return -EOPNOTSUPP; >> ... >> } >> >> That can be done if drivers opt in to support providers, >> e.g. via a page pool flag. >> >> What I think wouldn't be a great option is getting back a >> "success" from the driver even though it ignored > > page pool params are not the right place for a supported flag. > Sooner or later we'll want to expose this flag to user space. Fair enough, it appeared to me that's what you was suggesting "What comes to mind is adding an "I can gobble up net_iovs from this pool" flag in page pool params (the struct that comes from the driver), and then on the installation path we can check ..." We can also move it from pp flags to queue API callbacks, however if we want to expose it to the userspace, I'd imagine we need a queue flag set by the driver, which then can be queried by netlink or whichever interface is appropriate. And it can be used can be used to fail netdev_rx_queue_restart() for queues/drivers that don't support mp. netdev_rx_queue_restart() { if (rxq->mp_params && !rxq->netiov_supported) fail; } Perhaps, I don't understand what approach you contemplate, but maybe Mina has cracked it.
On Tue, 13 Aug 2024 16:11:15 +0100 Pavel Begunkov wrote: > On 8/13/24 15:39, Jakub Kicinski wrote: > > On Tue, 13 Aug 2024 03:31:13 +0100 Pavel Begunkov wrote: > >> I'm getting lost, so repeating myself a bit. What I think > >> would be a good approach is if we get an error back from > >> the driver if it doesn't support netiov / providers. > >> > >> netdev_rx_queue_restart() { > >> ... > >> err = dev->queue_mgmt_ops->ndo_queue_mem_alloc(); > >> if (err == -EOPNOTSUPP) // the driver doesn't support netiov > >> return -EOPNOTSUPP; > >> ... > >> } > >> > >> That can be done if drivers opt in to support providers, > >> e.g. via a page pool flag. > >> > >> What I think wouldn't be a great option is getting back a > >> "success" from the driver even though it ignored > > > > page pool params are not the right place for a supported flag. > > Sooner or later we'll want to expose this flag to user space. > > Fair enough, it appeared to me that's what you was suggesting > > "What comes to mind is adding an "I can gobble up net_iovs from this > pool" flag in page pool params (the struct that comes from the driver), > and then on the installation path we can check ..." Yes, we still need one flag in page pool params -- functioning like the inverse of PP_IGNORE_PROVIDERS, I'd call it something like PP_CAP_NET(MEM|IOV). To distinguish the header and data pools. > We can also move it from pp flags to queue API callbacks, however if we > want to expose it to the userspace, I'd imagine we need a queue flag set > by the driver, which then can be queried by netlink or whichever > interface is appropriate. And it can be used can be used to fail > netdev_rx_queue_restart() for queues/drivers that don't support mp. > > netdev_rx_queue_restart() { > if (rxq->mp_params && !rxq->netiov_supported) > fail; > } Yup!
diff --git a/include/net/mp_dmabuf_devmem.h b/include/net/mp_dmabuf_devmem.h new file mode 100644 index 0000000000000..300a2356eed00 --- /dev/null +++ b/include/net/mp_dmabuf_devmem.h @@ -0,0 +1,44 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Dmabuf device memory provider. + * + * Authors: Mina Almasry <almasrymina@google.com> + * + */ +#ifndef _NET_MP_DMABUF_DEVMEM_H +#define _NET_MP_DMABUF_DEVMEM_H + +#include <net/netmem.h> + +#if defined(CONFIG_DMA_SHARED_BUFFER) && defined(CONFIG_GENERIC_ALLOCATOR) +int mp_dmabuf_devmem_init(struct page_pool *pool); + +netmem_ref mp_dmabuf_devmem_alloc_netmems(struct page_pool *pool, gfp_t gfp); + +void mp_dmabuf_devmem_destroy(struct page_pool *pool); + +bool mp_dmabuf_devmem_release_page(struct page_pool *pool, netmem_ref netmem); +#else +static inline int mp_dmabuf_devmem_init(struct page_pool *pool) +{ + return -EOPNOTSUPP; +} + +static inline netmem_ref mp_dmabuf_devmem_alloc_netmems(struct page_pool *pool, + gfp_t gfp) +{ + return 0; +} + +static inline void mp_dmabuf_devmem_destroy(struct page_pool *pool) +{ +} + +static inline bool mp_dmabuf_devmem_release_page(struct page_pool *pool, + netmem_ref netmem) +{ + return false; +} +#endif + +#endif /* _NET_MP_DMABUF_DEVMEM_H */ diff --git a/include/net/netmem.h b/include/net/netmem.h index 99531780e53af..34dd146f9e940 100644 --- a/include/net/netmem.h +++ b/include/net/netmem.h @@ -100,6 +100,21 @@ static inline struct page *netmem_to_page(netmem_ref netmem) return (__force struct page *)netmem; } +static inline struct net_iov *netmem_to_net_iov(netmem_ref netmem) +{ + if (netmem_is_net_iov(netmem)) + return (struct net_iov *)((__force unsigned long)netmem & + ~NET_IOV); + + DEBUG_NET_WARN_ON_ONCE(true); + return NULL; +} + +static inline netmem_ref net_iov_to_netmem(struct net_iov *niov) +{ + return (__force netmem_ref)((unsigned long)niov | NET_IOV); +} + static inline netmem_ref page_to_netmem(struct page *page) { return (__force netmem_ref)page; diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h index 4afd6dd56351a..e01ccb42e582f 100644 --- a/include/net/page_pool/types.h +++ b/include/net/page_pool/types.h @@ -52,6 +52,7 @@ struct pp_alloc_cache { * @nid: NUMA node id to allocate from pages from * @dev: device, for DMA pre-mapping purposes * @napi: NAPI which is the sole consumer of pages, otherwise NULL + * @queue: struct netdev_rx_queue this page_pool is being created for. * @dma_dir: DMA mapping direction * @max_len: max DMA sync memory size for PP_FLAG_DMA_SYNC_DEV * @offset: DMA sync address offset for PP_FLAG_DMA_SYNC_DEV @@ -66,6 +67,7 @@ struct page_pool_params { int nid; struct device *dev; struct napi_struct *napi; + struct netdev_rx_queue *queue; enum dma_data_direction dma_dir; unsigned int max_len; unsigned int offset; diff --git a/net/core/devmem.c b/net/core/devmem.c index befff59a2ee64..f8e78907c6c68 100644 --- a/net/core/devmem.c +++ b/net/core/devmem.c @@ -17,8 +17,11 @@ #include <linux/genalloc.h> #include <linux/dma-buf.h> #include <net/devmem.h> +#include <net/mp_dmabuf_devmem.h> #include <net/netdev_queues.h> +#include "page_pool_priv.h" + /* Device memory support */ #if defined(CONFIG_DMA_SHARED_BUFFER) && defined(CONFIG_GENERIC_ALLOCATOR) @@ -297,4 +300,79 @@ void dev_dmabuf_uninstall(struct net_device *dev) xa_erase(&binding->bound_rxqs, xa_idx); } } + +/*** "Dmabuf devmem memory provider" ***/ + +int mp_dmabuf_devmem_init(struct page_pool *pool) +{ + struct net_devmem_dmabuf_binding *binding = pool->mp_priv; + + if (!binding) + return -EINVAL; + + if (!pool->dma_map) + return -EOPNOTSUPP; + + if (pool->dma_sync) + return -EOPNOTSUPP; + + if (pool->p.order != 0) + return -E2BIG; + + net_devmem_dmabuf_binding_get(binding); + return 0; +} + +netmem_ref mp_dmabuf_devmem_alloc_netmems(struct page_pool *pool, gfp_t gfp) +{ + struct net_devmem_dmabuf_binding *binding = pool->mp_priv; + netmem_ref netmem; + struct net_iov *niov; + dma_addr_t dma_addr; + + niov = net_devmem_alloc_dmabuf(binding); + if (!niov) + return 0; + + dma_addr = net_devmem_get_dma_addr(niov); + + netmem = net_iov_to_netmem(niov); + + page_pool_set_pp_info(pool, netmem); + + if (page_pool_set_dma_addr_netmem(netmem, dma_addr)) + goto err_free; + + pool->pages_state_hold_cnt++; + trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt); + return netmem; + +err_free: + net_devmem_free_dmabuf(niov); + return 0; +} + +void mp_dmabuf_devmem_destroy(struct page_pool *pool) +{ + struct net_devmem_dmabuf_binding *binding = pool->mp_priv; + + net_devmem_dmabuf_binding_put(binding); +} + +bool mp_dmabuf_devmem_release_page(struct page_pool *pool, netmem_ref netmem) +{ + if (WARN_ON_ONCE(!netmem_is_net_iov(netmem))) + return false; + + if (WARN_ON_ONCE(atomic_long_read(netmem_get_pp_ref_count_ref(netmem)) != + 1)) + return false; + + page_pool_clear_pp_info(netmem); + + net_devmem_free_dmabuf(netmem_to_net_iov(netmem)); + + /* We don't want the page pool put_page()ing our net_iovs. */ + return false; +} #endif diff --git a/net/core/page_pool.c b/net/core/page_pool.c index c5c303746d494..b70f779cb7856 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -13,6 +13,7 @@ #include <net/page_pool/helpers.h> #include <net/xdp.h> +#include <net/netdev_rx_queue.h> #include <linux/dma-direction.h> #include <linux/dma-mapping.h> @@ -21,6 +22,9 @@ #include <linux/poison.h> #include <linux/ethtool.h> #include <linux/netdevice.h> +#include <linux/genalloc.h> +#include <net/devmem.h> +#include <net/mp_dmabuf_devmem.h> #include <trace/events/page_pool.h> @@ -28,6 +32,7 @@ #include "netmem_priv.h" DEFINE_STATIC_KEY_FALSE(page_pool_mem_providers); +EXPORT_SYMBOL(page_pool_mem_providers); #define DEFER_TIME (msecs_to_jiffies(1000)) #define DEFER_WARN_INTERVAL (60 * HZ) @@ -190,6 +195,7 @@ static int page_pool_init(struct page_pool *pool, int cpuid) { unsigned int ring_qsize = 1024; /* Default */ + int err; page_pool_struct_check(); @@ -271,7 +277,35 @@ static int page_pool_init(struct page_pool *pool, if (pool->dma_map) get_device(pool->p.dev); + if (pool->p.queue) { + /* We rely on rtnl_lock()ing to make sure netdev_rx_queue + * configuration doesn't change while we're initializing the + * page_pool. + */ + ASSERT_RTNL(); + pool->mp_priv = pool->p.queue->mp_params.mp_priv; + } + + if (pool->mp_priv) { + err = mp_dmabuf_devmem_init(pool); + if (err) { + pr_warn("%s() mem-provider init failed %d\n", __func__, + err); + goto free_ptr_ring; + } + + static_branch_inc(&page_pool_mem_providers); + } + return 0; + +free_ptr_ring: + ptr_ring_cleanup(&pool->ring, NULL); +#ifdef CONFIG_PAGE_POOL_STATS + if (!pool->system) + free_percpu(pool->recycle_stats); +#endif + return err; } static void page_pool_uninit(struct page_pool *pool) @@ -455,28 +489,6 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem) return false; } -static void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem) -{ - netmem_set_pp(netmem, pool); - netmem_or_pp_magic(netmem, PP_SIGNATURE); - - /* Ensuring all pages have been split into one fragment initially: - * page_pool_set_pp_info() is only called once for every page when it - * is allocated from the page allocator and page_pool_fragment_page() - * is dirtying the same cache line as the page->pp_magic above, so - * the overhead is negligible. - */ - page_pool_fragment_netmem(netmem, 1); - if (pool->has_init_callback) - pool->slow.init_callback(netmem, pool->slow.init_arg); -} - -static void page_pool_clear_pp_info(netmem_ref netmem) -{ - netmem_clear_pp_magic(netmem); - netmem_set_pp(netmem, NULL); -} - static struct page *__page_pool_alloc_page_order(struct page_pool *pool, gfp_t gfp) { @@ -572,7 +584,10 @@ netmem_ref page_pool_alloc_netmem(struct page_pool *pool, gfp_t gfp) return netmem; /* Slow-path: cache empty, do real allocation */ - netmem = __page_pool_alloc_pages_slow(pool, gfp); + if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv) + netmem = mp_dmabuf_devmem_alloc_netmems(pool, gfp); + else + netmem = __page_pool_alloc_pages_slow(pool, gfp); return netmem; } EXPORT_SYMBOL(page_pool_alloc_netmem); @@ -608,6 +623,28 @@ s32 page_pool_inflight(const struct page_pool *pool, bool strict) return inflight; } +void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem) +{ + netmem_set_pp(netmem, pool); + netmem_or_pp_magic(netmem, PP_SIGNATURE); + + /* Ensuring all pages have been split into one fragment initially: + * page_pool_set_pp_info() is only called once for every page when it + * is allocated from the page allocator and page_pool_fragment_page() + * is dirtying the same cache line as the page->pp_magic above, so + * the overhead is negligible. + */ + page_pool_fragment_netmem(netmem, 1); + if (pool->has_init_callback) + pool->slow.init_callback(netmem, pool->slow.init_arg); +} + +void page_pool_clear_pp_info(netmem_ref netmem) +{ + netmem_clear_pp_magic(netmem); + netmem_set_pp(netmem, NULL); +} + static __always_inline void __page_pool_release_page_dma(struct page_pool *pool, netmem_ref netmem) { @@ -636,8 +673,13 @@ static __always_inline void __page_pool_release_page_dma(struct page_pool *pool, void page_pool_return_page(struct page_pool *pool, netmem_ref netmem) { int count; + bool put; - __page_pool_release_page_dma(pool, netmem); + put = true; + if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv) + put = mp_dmabuf_devmem_release_page(pool, netmem); + else + __page_pool_release_page_dma(pool, netmem); /* This may be the last page returned, releasing the pool, so * it is not safe to reference pool afterwards. @@ -645,8 +687,10 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem) count = atomic_inc_return_relaxed(&pool->pages_state_release_cnt); trace_page_pool_state_release(pool, netmem, count); - page_pool_clear_pp_info(netmem); - put_page(netmem_to_page(netmem)); + if (put) { + page_pool_clear_pp_info(netmem); + put_page(netmem_to_page(netmem)); + } /* An optimization would be to call __free_pages(page, pool->p.order) * knowing page is not part of page-cache (thus avoiding a * __page_cache_release() call). @@ -965,6 +1009,12 @@ static void __page_pool_destroy(struct page_pool *pool) page_pool_unlist(pool); page_pool_uninit(pool); + + if (pool->mp_priv) { + mp_dmabuf_devmem_destroy(pool); + static_branch_dec(&page_pool_mem_providers); + } + kfree(pool); } diff --git a/net/core/page_pool_priv.h b/net/core/page_pool_priv.h index 90665d40f1eb7..e5b019ab0b735 100644 --- a/net/core/page_pool_priv.h +++ b/net/core/page_pool_priv.h @@ -9,4 +9,7 @@ int page_pool_list(struct page_pool *pool); void page_pool_detached(struct page_pool *pool); void page_pool_unlist(struct page_pool *pool); +void page_pool_set_pp_info(struct page_pool *pool, netmem_ref netmem); +void page_pool_clear_pp_info(netmem_ref netmem); + #endif