Message ID | 168564134580.7284.16867711571036004706.stgit@anambiarhost.jf.intel.com |
---|---|
State | RFC |
Delegated to: | Netdev Maintainers |
Series | Introduce napi queues support |
On Thu, 01 Jun 2023 10:42:25 -0700 Amritha Nambiar wrote:
> Introduce new napi fields 'napi_rxq_list' and 'napi_txq_list'
> for rx and tx queue set associated with the napi and
> initialize them. Handle their removal as well.
>
> This enables a mapping of each napi instance with the
> queue/queue-set on the corresponding irq line.

Wouldn't it be easier to store the NAPI instance pointer in the queue?
That way we don't have to allocate memory.
On 6/2/2023 11:06 PM, Jakub Kicinski wrote:
> On Thu, 01 Jun 2023 10:42:25 -0700 Amritha Nambiar wrote:
>> Introduce new napi fields 'napi_rxq_list' and 'napi_txq_list'
>> for rx and tx queue set associated with the napi and
>> initialize them. Handle their removal as well.
>>
>> This enables a mapping of each napi instance with the
>> queue/queue-set on the corresponding irq line.
>
> Wouldn't it be easier to store the NAPI instance pointer in the queue?
> That way we don't have to allocate memory.
>

Could you please elaborate on this so I have more clarity? Are you
suggesting that there's a way to avoid maintaining the list of queues
in the napi struct?

The idea was for netdev-genl to extract information out of
netdev->napi_list->napi. For tracking queues, we build a linked list
of queues for the napi and store it in the napi_struct. This would
also enable updating the napi<->queue[s] association (later with the
'set' command), i.e. remove the queue[s] from the existing napi
instance that it is currently associated with and map with the new
napi instance, by simply deleting from one list and adding to the new
list.
On Wed, 12 Jul 2023 13:09:35 -0700 Nambiar, Amritha wrote:
> On 6/2/2023 11:06 PM, Jakub Kicinski wrote:
> > On Thu, 01 Jun 2023 10:42:25 -0700 Amritha Nambiar wrote:
> >> Introduce new napi fields 'napi_rxq_list' and 'napi_txq_list'
> >> for rx and tx queue set associated with the napi and
> >> initialize them. Handle their removal as well.
> >>
> >> This enables a mapping of each napi instance with the
> >> queue/queue-set on the corresponding irq line.
> >
> > Wouldn't it be easier to store the NAPI instance pointer in the queue?
> > That way we don't have to allocate memory.
> >
>
> Could you please elaborate on this so I have more clarity?

First off, let's acknowledge the fact you're asking me for
clarifications ~40 days after I sent the feedback.

Pause for self-reflection.

Okay.

> Are you suggesting that there's a way to avoid maintaining the list
> of queues in the napi struct?

Yes, why not add the napi pointer to struct netdev_queue and
netdev_rx_queue, specifically?

> The idea was for netdev-genl to extract information out of
> netdev->napi_list->napi. For tracking queues, we build a linked list
> of queues for the napi and store it in the napi_struct. This would
> also enable updating the napi<->queue[s] association (later with the
> 'set' command), i.e. remove the queue[s] from the existing napi
> instance that it is currently associated with and map with the new
> napi instance, by simply deleting from one list and adding to the new
> list.

Right, my point is that each queue can only be serviced by a single
NAPI at a time, so we have a 1:N relationship. It's easier to store
the state on the side that's the N, rather than 1.

You can add list_head to the queue structs, if you prefer to be able
to walk queues of a NAPI more efficiently (that said the head for
the list is in "control path only section" of napi_struct so...
I think you don't?)
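To make the suggested layout concrete, here is a minimal sketch of what
storing the NAPI pointer on the queue side could look like. The 'napi'
and 'napi_list' members are illustrative additions for discussion, not
the actual kernel definitions:

/* Sketch only: each queue records the single NAPI servicing it (the N
 * side of the 1:N relationship), plus an optional list_head so a
 * NAPI's queues can be walked without allocating per-queue nodes.
 */
struct netdev_queue {
	/* ... existing fields elided ... */
	struct napi_struct	*napi;		/* NAPI servicing this queue */
	struct list_head	napi_list;	/* optional: link into NAPI's queue list */
};

struct netdev_rx_queue {
	/* ... existing fields elided ... */
	struct napi_struct	*napi;
	struct list_head	napi_list;
};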
On 7/12/2023 2:14 PM, Jakub Kicinski wrote:
> On Wed, 12 Jul 2023 13:09:35 -0700 Nambiar, Amritha wrote:
>> On 6/2/2023 11:06 PM, Jakub Kicinski wrote:
>>> On Thu, 01 Jun 2023 10:42:25 -0700 Amritha Nambiar wrote:
>>>> Introduce new napi fields 'napi_rxq_list' and 'napi_txq_list'
>>>> for rx and tx queue set associated with the napi and
>>>> initialize them. Handle their removal as well.
>>>>
>>>> This enables a mapping of each napi instance with the
>>>> queue/queue-set on the corresponding irq line.
>>>
>>> Wouldn't it be easier to store the NAPI instance pointer in the queue?
>>> That way we don't have to allocate memory.
>>>
>>
>> Could you please elaborate on this so I have more clarity?
>
> First off, let's acknowledge the fact you're asking me for
> clarifications ~40 days after I sent the feedback.

Sorry about that, my vacation is to blame.

> Pause for self-reflection.
>
> Okay.
>
>> Are you suggesting that there's a way to avoid maintaining the list
>> of queues in the napi struct?
>
> Yes, why not add the napi pointer to struct netdev_queue and
> netdev_rx_queue, specifically?

Yes, this would achieve the queue<->napi binding for each queue. But
when there are multiple queues for a NAPI, I would need to walk a list
of queues for the NAPI.

>> The idea was for netdev-genl to extract information out of
>> netdev->napi_list->napi. For tracking queues, we build a linked list
>> of queues for the napi and store it in the napi_struct. This would
>> also enable updating the napi<->queue[s] association (later with the
>> 'set' command), i.e. remove the queue[s] from the existing napi
>> instance that it is currently associated with and map with the new
>> napi instance, by simply deleting from one list and adding to the new
>> list.
>
> Right, my point is that each queue can only be serviced by a single
> NAPI at a time, so we have a 1:N relationship. It's easier to store
> the state on the side that's the N, rather than 1.
>
> You can add list_head to the queue structs, if you prefer to be able
> to walk queues of a NAPI more efficiently (that said the head for
> the list is in "control path only section" of napi_struct so...
> I think you don't?)

The napi pointer in the queue structs would give the napi<->queue
mapping, I still need to walk the queues of a NAPI (when there are
multiple queues for the NAPI), example:
'napi-id': 600, 'rx-queues': [7,6,5], 'tx-queues': [7,6,5]

in which case I would have a list of netdev queue structs within the
napi_struct (instead of the list of queue indices that I currently have)
to avoid memory allocation.

Does this sound right?
On Wed, 12 Jul 2023 16:11:55 -0700 Nambiar, Amritha wrote:
> >> The idea was for netdev-genl to extract information out of
> >> netdev->napi_list->napi. For tracking queues, we build a linked list
> >> of queues for the napi and store it in the napi_struct. This would
> >> also enable updating the napi<->queue[s] association (later with the
> >> 'set' command), i.e. remove the queue[s] from the existing napi
> >> instance that it is currently associated with and map with the new
> >> napi instance, by simply deleting from one list and adding to the new
> >> list.
> >
> > Right, my point is that each queue can only be serviced by a single
> > NAPI at a time, so we have a 1:N relationship. It's easier to store
> > the state on the side that's the N, rather than 1.
> >
> > You can add list_head to the queue structs, if you prefer to be able
> > to walk queues of a NAPI more efficiently (that said the head for
> > the list is in "control path only section" of napi_struct so...
> > I think you don't?)
>
> The napi pointer in the queue structs would give the napi<->queue
> mapping, I still need to walk the queues of a NAPI (when there are
> multiple queues for the NAPI), example:
> 'napi-id': 600, 'rx-queues': [7,6,5], 'tx-queues': [7,6,5]
>
> in which case I would have a list of netdev queue structs within the
> napi_struct (instead of the list of queue indices that I currently have)
> to avoid memory allocation.
>
> Does this sound right?

yes, I think that's fine.

If we store the NAPI pointer in the queue struct, we can still generate
the same dump with the time complexity of #napis * (#max_rx + #max_tx).
Which I don't think is too bad. Up to you.
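For reference, the dump Jakub describes could be generated with a walk
like the sketch below, run once per NAPI on the netdev's napi_list. The
'->napi' member is the hypothetical addition under discussion, while
__netif_get_rx_queue() and netdev_get_tx_queue() are the kernel's
existing queue accessors:

/* Sketch only: list the queues serviced by a given NAPI using nothing
 * but a NAPI pointer in each queue struct. Doing this for every NAPI
 * gives the #napis * (#max_rx + #max_tx) cost mentioned above.
 */
static void napi_report_queues(struct net_device *dev,
			       struct napi_struct *napi)
{
	unsigned int i;

	for (i = 0; i < dev->real_num_rx_queues; i++)
		if (__netif_get_rx_queue(dev, i)->napi == napi)
			; /* emit i in the 'rx-queues' list */

	for (i = 0; i < dev->real_num_tx_queues; i++)
		if (netdev_get_tx_queue(dev, i)->napi == napi)
			; /* emit i in the 'tx-queues' list */
}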
On Wed, 12 Jul 2023 16:53:26 -0700 Jakub Kicinski wrote:
> > The napi pointer in the queue structs would give the napi<->queue
> > mapping, I still need to walk the queues of a NAPI (when there are
> > multiple queues for the NAPI), example:
> > 'napi-id': 600, 'rx-queues': [7,6,5], 'tx-queues': [7,6,5]
> >
> > in which case I would have a list of netdev queue structs within the
> > napi_struct (instead of the list of queue indices that I currently have)
> > to avoid memory allocation.
> >
> > Does this sound right?
>
> yes, I think that's fine.
>
> If we store the NAPI pointer in the queue struct, we can still generate
> the same dump with the time complexity of #napis * (#max_rx + #max_tx).
> Which I don't think is too bad. Up to you.

The more I think about it the more I feel like we should dump queues
and NAPIs separately. And the queue can list the NAPI id of the NAPI
instance which services it.

Are you actively working on this or should I take a stab?
On 7/28/2023 2:59 PM, Jakub Kicinski wrote:
> On Wed, 12 Jul 2023 16:53:26 -0700 Jakub Kicinski wrote:
>>> The napi pointer in the queue structs would give the napi<->queue
>>> mapping, I still need to walk the queues of a NAPI (when there are
>>> multiple queues for the NAPI), example:
>>> 'napi-id': 600, 'rx-queues': [7,6,5], 'tx-queues': [7,6,5]
>>>
>>> in which case I would have a list of netdev queue structs within the
>>> napi_struct (instead of the list of queue indices that I currently have)
>>> to avoid memory allocation.
>>>
>>> Does this sound right?
>>
>> yes, I think that's fine.
>>
>> If we store the NAPI pointer in the queue struct, we can still generate
>> the same dump with the time complexity of #napis * (#max_rx + #max_tx).
>> Which I don't think is too bad. Up to you.
>
> The more I think about it the more I feel like we should dump queues
> and NAPIs separately. And the queue can list the NAPI id of the NAPI
> instance which services it.
>
> Are you actively working on this or should I take a stab?

Hi Jakub, I have the next version of patches ready (I'll send that in
a bit). Could you take a look at it and let me know your thoughts,
and then we can proceed from there.

About dumping queues and NAPIs separately, are you thinking about
having both per-NAPI and per-queue instances, or do you think only
one will suffice? The plan was to follow this work with a 'set-napi'
series, something like:

set-napi <napi_id> queues <q_id1, q_id2, ...>

to configure the queue[s] that are to be serviced by the napi
instance.

In this case, dumping the NAPIs would be beneficial especially when
there are multiple queues on the NAPI.

WRT per-queue, is there a set of parameters that needs to be exposed
besides what's already handled by ethtool... Also, to configure a
queue on a NAPI, set-queue <qid> <napi_id>, the existing NAPIs would
have to be looked up from the queue parameters dumped.

-Amritha
On Fri, 28 Jul 2023 15:37:14 -0700 Nambiar, Amritha wrote:
> Hi Jakub, I have the next version of patches ready (I'll send that in
> a bit). Could you take a look at it and let me know your thoughts,
> and then we can proceed from there.

Great, looking forward.

> About dumping queues and NAPIs separately, are you thinking about
> having both per-NAPI and per-queue instances, or do you think only
> one will suffice? The plan was to follow this work with a 'set-napi'
> series, something like:
>
> set-napi <napi_id> queues <q_id1, q_id2, ...>
>
> to configure the queue[s] that are to be serviced by the napi
> instance.
>
> In this case, dumping the NAPIs would be beneficial especially when
> there are multiple queues on the NAPI.
>
> WRT per-queue, is there a set of parameters that needs to be exposed
> besides what's already handled by ethtool...

Not much at this point, maybe memory model. Maybe stats if we want to
put stats in the same command. But the fact that sysfs has a bunch of
per queue attributes makes me think that sooner or later we'll want
queue as a full object in netlink. And starting out that way makes
the whole API cleaner, at least in my opinion.

If we have another object which wants to refer to queues (e.g. page
pool) it's easier to express the topology when it's clear what is an
object and what's just an attribute.

> Also, to configure a queue on a NAPI, set-queue <qid> <napi_id>, the
> existing NAPIs would have to be looked up from the queue parameters
> dumped.

The look up should not be much of a problem.

And don't you think that:

set-queue queue 1 napi-id 101
set-queue queue 2 napi-id 101

is more natural than:

set-napi napi-id 101 queues [1, 2]

Especially in the presence of conflicts. If the user tries:

set-napi napi-id 101 queues [1, 2]
set-napi napi-id 102 queues [1, 2]

do both napis now serve those queues? May seem obvious to us, but
"philosophically" why does setting an attribute of object 102 change
attributes of object 101?

If we ever gain the ability to create queues it will be:

create-queue napi-id xyz

which also matches set-queue more nicely than the napi base API.
On 7/28/2023 4:09 PM, Jakub Kicinski wrote:
> On Fri, 28 Jul 2023 15:37:14 -0700 Nambiar, Amritha wrote:
>> Hi Jakub, I have the next version of patches ready (I'll send that in
>> a bit). Could you take a look at it and let me know your thoughts,
>> and then we can proceed from there.
>
> Great, looking forward.
>
>> About dumping queues and NAPIs separately, are you thinking about
>> having both per-NAPI and per-queue instances, or do you think only
>> one will suffice? The plan was to follow this work with a 'set-napi'
>> series, something like:
>>
>> set-napi <napi_id> queues <q_id1, q_id2, ...>
>>
>> to configure the queue[s] that are to be serviced by the napi
>> instance.
>>
>> In this case, dumping the NAPIs would be beneficial especially when
>> there are multiple queues on the NAPI.
>>
>> WRT per-queue, is there a set of parameters that needs to be exposed
>> besides what's already handled by ethtool...
>
> Not much at this point, maybe memory model. Maybe stats if we want to
> put stats in the same command. But the fact that sysfs has a bunch of
> per queue attributes makes me think that sooner or later we'll want
> queue as a full object in netlink. And starting out that way makes
> the whole API cleaner, at least in my opinion.
>
> If we have another object which wants to refer to queues (e.g. page
> pool) it's easier to express the topology when it's clear what is an
> object and what's just an attribute.
>
>> Also, to configure a queue on a NAPI, set-queue <qid> <napi_id>, the
>> existing NAPIs would have to be looked up from the queue parameters
>> dumped.
>
> The look up should not be much of a problem.
>
> And don't you think that:
>
> set-queue queue 1 napi-id 101
> set-queue queue 2 napi-id 101
>
> is more natural than:
>
> set-napi napi-id 101 queues [1, 2]
>
> Especially in the presence of conflicts. If the user tries:
>
> set-napi napi-id 101 queues [1, 2]
> set-napi napi-id 102 queues [1, 2]
>
> do both napis now serve those queues? May seem obvious to us, but
> "philosophically" why does setting an attribute of object 102 change
> attributes of object 101?
>

Right, I see the point. In the presence of conflicts, when the
napi<->queue[s] mappings are updated, set-napi will impact other
NAPI-IDs, while set-queue would limit the change to just the queue
that is requested.

In both cases, the underlying work remains the same:
1. Remove the queue from the existing napi instance it is associated
   with.
2. The driver updates the queue[s]<->vector mapping and associates the
   queue with the new napi instance.
3. Report the impacted napi/queue back to the stack. The 'napi-get'
   command would list all the napis and the updated queue[s] list.

Now, in usecases where a single poller is set to service multiple
queues (say 8), with set-napi this can be done with a single command,
while with set-queue this will result in 8 different requests to the
driver. This is the trade-off I see if we go with set-queue.

> If we ever gain the ability to create queues it will be:
>
> create-queue napi-id xyz
>
> which also matches set-queue more nicely than the napi base API.
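Whichever command shape wins, both set-napi and set-queue reduce to the
rebind steps Amritha lists above. A rough sketch under the
pointer-plus-list design discussed earlier; the helper name and the
'napi'/'napi_list' members are hypothetical, and locking and the
driver-side vector update are omitted:

/* Sketch only: move one tx queue from its current NAPI to a new one.
 * set-queue would call this once; set-napi would loop over all the
 * requested queues.
 */
static void netif_queue_set_napi(struct netdev_queue *txq,
				 struct napi_struct *napi)
{
	if (txq->napi)			/* 1. detach from the old NAPI */
		list_del(&txq->napi_list);
	txq->napi = napi;		/* 2. bind to the new NAPI */
	if (napi)
		list_add_tail(&txq->napi_list, &napi->napi_txq_list);
	/* 3. reporting the change back via netdev-genl would follow */
}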
On 7/28/23 5:09 PM, Jakub Kicinski wrote:
> On Fri, 28 Jul 2023 15:37:14 -0700 Nambiar, Amritha wrote:
>> Hi Jakub, I have the next version of patches ready (I'll send that in
>> a bit). Could you take a look at it and let me know your thoughts,
>> and then we can proceed from there.
>
> Great, looking forward.
>
>> About dumping queues and NAPIs separately, are you thinking about
>> having both per-NAPI and per-queue instances, or do you think only
>> one will suffice? The plan was to follow this work with a 'set-napi'
>> series, something like:
>>
>> set-napi <napi_id> queues <q_id1, q_id2, ...>
>>
>> to configure the queue[s] that are to be serviced by the napi
>> instance.
>>
>> In this case, dumping the NAPIs would be beneficial especially when
>> there are multiple queues on the NAPI.
>>
>> WRT per-queue, is there a set of parameters that needs to be exposed
>> besides what's already handled by ethtool...
>
> Not much at this point, maybe memory model. Maybe stats if we want to
> put stats in the same command. But the fact that sysfs has a bunch of
> per queue attributes makes me think that sooner or later we'll want
> queue as a full object in netlink. And starting out that way makes
> the whole API cleaner, at least in my opinion.
>
> If we have another object which wants to refer to queues (e.g. page
> pool) it's easier to express the topology when it's clear what is an
> object and what's just an attribute.
>
>> Also, to configure a queue on a NAPI, set-queue <qid> <napi_id>, the
>> existing NAPIs would have to be looked up from the queue parameters
>> dumped.
>
> The look up should not be much of a problem.
>
> And don't you think that:
>
> set-queue queue 1 napi-id 101
> set-queue queue 2 napi-id 101
>
> is more natural than:
>
> set-napi napi-id 101 queues [1, 2]
>
> Especially in the presence of conflicts. If the user tries:
>
> set-napi napi-id 101 queues [1, 2]
> set-napi napi-id 102 queues [1, 2]
>
> do both napis now serve those queues? May seem obvious to us, but
> "philosophically" why does setting an attribute of object 102 change
> attributes of object 101?
>
> If we ever gain the ability to create queues it will be:
>
> create-queue napi-id xyz
>
> which also matches set-queue more nicely than the napi base API.
>

I take it you have this path in mind as a means of creating
"specialized" queues (e.g., io_uring and Rx ZC). Any slides or notes
on the bigger picture?
On Tue, 1 Aug 2023 18:26:26 -0600 David Ahern wrote:
> I take it you have this path in mind as a means of creating
> "specialized" queues (e.g., io_uring and Rx ZC).

TBH I've been thinking more about the huge page stuff and RSS context
handling. Rx ZC should be a subset of what's needed for those cases.

> Any slides or notes on the bigger picture?

I don't have it well figured out, yet. The user space API is pretty
easy, but shaping it in a way that makes driver's life manageable is
more challenging.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 08fbd4622ccf..49f64401af7c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -342,6 +342,11 @@ struct gro_list {
  */
 #define GRO_HASH_BUCKETS	8
 
+struct napi_queue {
+	struct list_head	q_list;
+	u16			queue_index;
+};
+
 /*
  * Structure for NAPI scheduling similar to tasklet but with weighting
  */
@@ -376,6 +381,8 @@ struct napi_struct {
 	/* control-path-only fields follow */
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
+	struct list_head	napi_rxq_list;
+	struct list_head	napi_txq_list;
 };
 
 enum {
diff --git a/net/core/dev.c b/net/core/dev.c
index 3393c2f3dbe8..9ee8eb3ef223 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6401,6 +6401,9 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	 */
 	if (dev->threaded && napi_kthread_create(napi))
 		dev->threaded = 0;
+
+	INIT_LIST_HEAD(&napi->napi_rxq_list);
+	INIT_LIST_HEAD(&napi->napi_txq_list);
 }
 EXPORT_SYMBOL(netif_napi_add_weight);
 
@@ -6462,6 +6465,23 @@ static void flush_gro_hash(struct napi_struct *napi)
 	}
 }
 
+static void __napi_del_queue(struct napi_queue *napi_queue)
+{
+	list_del_rcu(&napi_queue->q_list);
+	kfree(napi_queue);
+}
+
+static void napi_del_queues(struct napi_struct *napi)
+{
+	struct napi_queue *napi_queue, *n;
+
+	list_for_each_entry_safe(napi_queue, n, &napi->napi_rxq_list, q_list)
+		__napi_del_queue(napi_queue);
+
+	list_for_each_entry_safe(napi_queue, n, &napi->napi_txq_list, q_list)
+		__napi_del_queue(napi_queue);
+}
+
 /* Must be called in process context */
 void __netif_napi_del(struct napi_struct *napi)
 {
@@ -6479,6 +6499,7 @@ void __netif_napi_del(struct napi_struct *napi)
 	if (napi->thread) {
 		kthread_stop(napi->thread);
 		napi->thread = NULL;
 	}
+	napi_del_queues(napi);
 }
 EXPORT_SYMBOL(__netif_napi_del);
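The patch itself only initializes and tears down the lists; drivers
would populate them. A sketch of what the driver-facing add path could
look like with the struct napi_queue above (napi_add_rx_queue() is a
hypothetical helper, not part of this patch):

/* Sketch only: record a queue index on a NAPI's rx list. This mirrors
 * the teardown in napi_del_queues() above (list_del_rcu() + kfree()).
 */
static int napi_add_rx_queue(struct napi_struct *napi, u16 queue_index)
{
	struct napi_queue *napi_queue;

	napi_queue = kzalloc(sizeof(*napi_queue), GFP_KERNEL);
	if (!napi_queue)
		return -ENOMEM;

	napi_queue->queue_index = queue_index;
	list_add_rcu(&napi_queue->q_list, &napi->napi_rxq_list);
	return 0;
}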
Introduce new napi fields 'napi_rxq_list' and 'napi_txq_list'
for rx and tx queue set associated with the napi and
initialize them. Handle their removal as well.

This enables a mapping of each napi instance with the
queue/queue-set on the corresponding irq line.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/linux/netdevice.h |    7 +++++++
 net/core/dev.c            |   21 +++++++++++++++++++++
 2 files changed, 28 insertions(+)