
[net-next/RFC,v1,1/4] net: Introduce new napi fields for rx/tx queues

Message ID 168564134580.7284.16867711571036004706.stgit@anambiarhost.jf.intel.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Series: Introduce napi queues support

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 4172 this patch: 4172
netdev/cc_maintainers warning 2 maintainers not CCed: pabeni@redhat.com edumazet@google.com
netdev/build_clang success Errors and warnings before: 921 this patch: 921
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4391 this patch: 4391
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 58 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Nambiar, Amritha June 1, 2023, 5:42 p.m. UTC
Introduce new napi fields 'napi_rxq_list' and 'napi_txq_list'
for the rx and tx queue sets associated with the napi, and
initialize them. Handle their removal as well.

This enables a mapping of each napi instance with the
queue/queue-set on the corresponding irq line.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/linux/netdevice.h |    7 +++++++
 net/core/dev.c            |   21 +++++++++++++++++++++
 2 files changed, 28 insertions(+)

Comments

Jakub Kicinski June 3, 2023, 6:06 a.m. UTC | #1
On Thu, 01 Jun 2023 10:42:25 -0700 Amritha Nambiar wrote:
> Introduce new napi fields 'napi_rxq_list' and 'napi_txq_list'
> for rx and tx queue set associated with the napi and
> initialize them. Handle their removal as well.
> 
> This enables a mapping of each napi instance with the
> queue/queue-set on the corresponding irq line.

Wouldn't it be easier to store the NAPI instance pointer in the queue?
That way we don't have to allocate memory.
Nambiar, Amritha July 12, 2023, 8:09 p.m. UTC | #2
On 6/2/2023 11:06 PM, Jakub Kicinski wrote:
> On Thu, 01 Jun 2023 10:42:25 -0700 Amritha Nambiar wrote:
>> Introduce new napi fields 'napi_rxq_list' and 'napi_txq_list'
>> for rx and tx queue set associated with the napi and
>> initialize them. Handle their removal as well.
>>
>> This enables a mapping of each napi instance with the
>> queue/queue-set on the corresponding irq line.
> 
> Wouldn't it be easier to store the NAPI instance pointer in the queue?
> That way we don't have to allocate memory.
> 

Could you please elaborate on this so I have more clarity? Are you
suggesting that there's a way to avoid maintaining the list of queues in 
the napi struct?

The idea was for netdev-genl to extract information out of 
netdev->napi_list->napi. For tracking queues, we build a linked list of 
queues for the napi and store it in the napi_struct. This would also 
enable updating the napi<->queue[s] association (later with the 'set' 
command), i.e. remove the queue[s] from the existing napi instance that 
it is currently associated with and map with the new napi instance, by 
simply deleting from one list and adding to the new list.
Jakub Kicinski July 12, 2023, 9:14 p.m. UTC | #3
On Wed, 12 Jul 2023 13:09:35 -0700 Nambiar, Amritha wrote:
> On 6/2/2023 11:06 PM, Jakub Kicinski wrote:
> > On Thu, 01 Jun 2023 10:42:25 -0700 Amritha Nambiar wrote:  
> >> Introduce new napi fields 'napi_rxq_list' and 'napi_txq_list'
> >> for rx and tx queue set associated with the napi and
> >> initialize them. Handle their removal as well.
> >>
> >> This enables a mapping of each napi instance with the
> >> queue/queue-set on the corresponding irq line.  
> > 
> > Wouldn't it be easier to store the NAPI instance pointer in the queue?
> > That way we don't have to allocate memory.
> >   
> 
> Could you please elaborate on this so I have more clarity ? 

First off, let's acknowledge the fact you're asking me for
clarifications ~40 days after I sent the feedback.

Pause for self-reflection.

Okay.

> Are you suggesting that there's a way to avoid maintaining the list
> of queues in the napi struct?

Yes, why not add the napi pointer to struct netdev_queue and
netdev_rx_queue, specifically?

> The idea was for netdev-genl to extract information out of 
> netdev->napi_list->napi. For tracking queues, we build a linked list
> of queues for the napi and store it in the napi_struct. This would
> also enable updating the napi<->queue[s] association (later with the
> 'set' command), i.e. remove the queue[s] from the existing napi
> instance that it is currently associated with and map with the new
> napi instance, by simply deleting from one list and adding to the new
> list.

Right, my point is that each queue can only be serviced by a single
NAPI at a time, so we have a 1:N relationship. It's easier to store
the state on the side that's the N, rather than 1.

You can add list_head to the queue structs, if you prefer to be able 
to walk queues of a NAPI more efficiently (that said the head for
the list is in "control path only section" of napi_struct so...
I think you don't?)
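
For reference, here is a minimal sketch of the alternative being
suggested, with a NAPI back-pointer in the existing queue structs so
that no per-queue memory needs to be allocated (field placement is
illustrative only):

  struct netdev_queue {
          /* existing fields elided */
          struct napi_struct *napi;  /* NAPI servicing this queue, or NULL */
  };

  struct netdev_rx_queue {
          /* existing fields elided */
          struct napi_struct *napi;  /* NAPI servicing this queue, or NULL */
  };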
Nambiar, Amritha July 12, 2023, 11:11 p.m. UTC | #4
On 7/12/2023 2:14 PM, Jakub Kicinski wrote:
> On Wed, 12 Jul 2023 13:09:35 -0700 Nambiar, Amritha wrote:
>> On 6/2/2023 11:06 PM, Jakub Kicinski wrote:
>>> On Thu, 01 Jun 2023 10:42:25 -0700 Amritha Nambiar wrote:
>>>> Introduce new napi fields 'napi_rxq_list' and 'napi_txq_list'
>>>> for rx and tx queue set associated with the napi and
>>>> initialize them. Handle their removal as well.
>>>>
>>>> This enables a mapping of each napi instance with the
>>>> queue/queue-set on the corresponding irq line.
>>>
>>> Wouldn't it be easier to store the NAPI instance pointer in the queue?
>>> That way we don't have to allocate memory.
>>>    
>>
>> Could you please elaborate on this so I have more clarity ?
> 
> First off, let's acknowledge the fact you're asking me for
> clarifications ~40 days after I sent the feedback.
> 
Sorry about that, my vacation is to blame.

> Pause for self-reflection.
> 
> Okay.
> 
>> Are you suggesting that there's a way to avoid maintaining the list
>> of queues in the napi struct?
> 
> Yes, why not add the napi pointer to struct netdev_queue and
> netdev_rx_queue, specifically?
> 

Yes, this would achieve the queue<->napi binding for each queue. But 
when there are multiple queues for a NAPI, I would need to walk a list 
of queues for the NAPI.

>> The idea was for netdev-genl to extract information out of
>> netdev->napi_list->napi. For tracking queues, we build a linked list
>> of queues for the napi and store it in the napi_struct. This would
>> also enable updating the napi<->queue[s] association (later with the
>> 'set' command), i.e. remove the queue[s] from the existing napi
>> instance that it is currently associated with and map with the new
>> napi instance, by simply deleting from one list and adding to the new
>> list.
> 
> Right, my point is that each queue can only be serviced by a single
> NAPI at a time, so we have a 1:N relationship. It's easier to store
> the state on the side that's the N, rather than 1.
> 
> You can add list_head to the queue structs, if you prefer to be able
> to walk queues of a NAPI more efficiently (that said the head for
> the list is in "control path only section" of napi_struct so...
> I think you don't?)

The napi pointer in the queue structs would give the napi<->queue
mapping, but I would still need to walk the queues of a NAPI (when
there are multiple queues for the NAPI), for example:
'napi-id': 600, 'rx-queues': [7,6,5], 'tx-queues': [7,6,5]

in which case I would have a list of netdev queue structs within the 
napi_struct (instead of the list of queue indices that I currently have) 
to avoid memory allocation.
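
For illustration, the variant described above, where the queue structs
themselves are linked into the NAPI's lists so that nothing extra is
allocated, might look roughly like this (field names are illustrative):

  struct netdev_queue {
          /* existing fields elided */
          struct napi_struct *napi;      /* back-pointer, as above */
          struct list_head   napi_list;  /* links into napi->napi_txq_list */
  };

  /* walking the tx queues of a NAPI would then be: */
  struct netdev_queue *txq;

  list_for_each_entry(txq, &napi->napi_txq_list, napi_list)
          ; /* report this queue against napi->napi_id */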

Does this sound right?
Jakub Kicinski July 12, 2023, 11:53 p.m. UTC | #5
On Wed, 12 Jul 2023 16:11:55 -0700 Nambiar, Amritha wrote:
> >> The idea was for netdev-genl to extract information out of
> >> netdev->napi_list->napi. For tracking queues, we build a linked list
> >> of queues for the napi and store it in the napi_struct. This would
> >> also enable updating the napi<->queue[s] association (later with the
> >> 'set' command), i.e. remove the queue[s] from the existing napi
> >> instance that it is currently associated with and map with the new
> >> napi instance, by simply deleting from one list and adding to the new
> >> list.  
> > 
> > Right, my point is that each queue can only be serviced by a single
> > NAPI at a time, so we have a 1:N relationship. It's easier to store
> > the state on the side that's the N, rather than 1.
> > 
> > You can add list_head to the queue structs, if you prefer to be able
> > to walk queues of a NAPI more efficiently (that said the head for
> > the list is in "control path only section" of napi_struct so...
> > I think you don't?)  
> 
> The napi pointer in the queue structs would give the napi<->queue 
> mapping, I still need to walk the queues of a NAPI (when there are 
> multiple queues for the NAPI), example:
> 'napi-id': 600, 'rx-queues': [7,6,5], 'tx-queues': [7,6,5]
> 
> in which case I would have a list of netdev queue structs within the 
> napi_struct (instead of the list of queue indices that I currently have) 
> to avoid memory allocation.
> 
> Does this sound right?

yes, I think that's fine.

If we store the NAPI pointer in the queue struct, we can still generate
the same dump with the time complexity of #napis * (#max_rx + #max_tx).
Which I don't think is too bad. Up to you.
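
For illustration, that dump could be built along these lines, assuming
the queue structs carry the NAPI back-pointer (the helper below is a
sketch, not an existing API):

  static void napi_report_queues(struct net_device *dev,
                                 struct napi_struct *napi)
  {
          unsigned int i;

          for (i = 0; i < dev->num_rx_queues; i++)
                  if (__netif_get_rx_queue(dev, i)->napi == napi)
                          ; /* report rx queue i against napi->napi_id */

          for (i = 0; i < dev->num_tx_queues; i++)
                  if (netdev_get_tx_queue(dev, i)->napi == napi)
                          ; /* report tx queue i against napi->napi_id */
  }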
Jakub Kicinski July 28, 2023, 9:59 p.m. UTC | #6
On Wed, 12 Jul 2023 16:53:26 -0700 Jakub Kicinski wrote:
> > The napi pointer in the queue structs would give the napi<->queue 
> > mapping, I still need to walk the queues of a NAPI (when there are 
> > multiple queues for the NAPI), example:
> > 'napi-id': 600, 'rx-queues': [7,6,5], 'tx-queues': [7,6,5]
> > 
> > in which case I would have a list of netdev queue structs within the 
> > napi_struct (instead of the list of queue indices that I currently have) 
> > to avoid memory allocation.
> > 
> > Does this sound right?  
> 
> yes, I think that's fine.
> 
> If we store the NAPI pointer in the queue struct, we can still generate
> the same dump with the time complexity of #napis * (#max_rx + #max_tx).
> Which I don't think is too bad. Up to you.

The more I think about it the more I feel like we should dump queues
and NAPIs separately. And the queue can list the NAPI id of the NAPI
instance which services it.
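
For illustration, a separate per-queue dump along those lines might
report entries like (attribute names are hypothetical):

  'queue-id': 7, 'queue-type': 'rx', 'ifindex': 3, 'napi-id': 600
  'queue-id': 7, 'queue-type': 'tx', 'ifindex': 3, 'napi-id': 600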

Are you actively working on this or should I take a stab?
Nambiar, Amritha July 28, 2023, 10:37 p.m. UTC | #7
On 7/28/2023 2:59 PM, Jakub Kicinski wrote:
> On Wed, 12 Jul 2023 16:53:26 -0700 Jakub Kicinski wrote:
>>> The napi pointer in the queue structs would give the napi<->queue
>>> mapping, I still need to walk the queues of a NAPI (when there are
>>> multiple queues for the NAPI), example:
>>> 'napi-id': 600, 'rx-queues': [7,6,5], 'tx-queues': [7,6,5]
>>>
>>> in which case I would have a list of netdev queue structs within the
>>> napi_struct (instead of the list of queue indices that I currently have)
>>> to avoid memory allocation.
>>>
>>> Does this sound right?
>>
>> yes, I think that's fine.
>>
>> If we store the NAPI pointer in the queue struct, we can still generate
>> the same dump with the time complexity of #napis * (#max_rx + #max_tx).
>> Which I don't think is too bad. Up to you.
> 
> The more I think about it the more I feel like we should dump queues
> and NAPIs separately. And the queue can list the NAPI id of the NAPI
> instance which services it.
> 
> Are you actively working on this or should I take a stab?

Hi Jakub, I have the next version of the patches ready (I'll send it in
a bit). If you could take a look at it and let me know your thoughts,
we can proceed from there.

About dumping queues and NAPIs separately, are you thinking about having
both per-NAPI and per-queue instances, or do you think only one will
suffice? The plan was to follow this work with a 'set-napi' series,
something like,
set-napi <napi_id> queues <q_id1, q_id2, ...>
to configure the queue[s] that are to be serviced by the napi instance.

In this case, dumping the NAPIs would be beneficial, especially when
there are multiple queues on the NAPI.

WRT per-queue, is there a set of parameters that needs to be exposed
besides what's already handled by ethtool... Also, to configure a queue
on a NAPI (set-queue <qid> <napi_id>), the existing NAPIs would have to
be looked up from the dumped queue parameters.

-Amritha
Jakub Kicinski July 28, 2023, 11:09 p.m. UTC | #8
On Fri, 28 Jul 2023 15:37:14 -0700 Nambiar, Amritha wrote:
> Hi Jakub, I have the next version of patches ready (I'll send that in a 
> bit). I suggest if you could take a look at it and let me know your 
> thoughts and then we can proceed from there.

Great, looking forward.

> About dumping queues and NAPIs separately, are you thinking about having 
> both per-NAPI and per-queue instances, or do you think only one will 
> suffice. The plan was to follow this work with a 'set-napi' series, 
> something like,
> set-napi <napi_id> queues <q_id1, q_id2, ...>
> to configure the queue[s] that are to be serviced by the napi instance.
> 
> In this case, dumping the NAPIs would be beneficial especially when 
> there are multiple queues on the NAPI.
> 
> WRT per-queue, are there a set of parameters that needs to exposed 
> besides what's already handled by ethtool...

Not much at this point, maybe memory model. Maybe stats if we want to
put stats in the same command. But the fact that sysfs has a bunch of
per queue attributes makes me think that sooner or later we'll want
queue as a full object in netlink. And starting out that way makes 
the whole API cleaner, at least in my opinion.

If we have another object which wants to refer to queues (e.g. page
pool) it's easier to express the topology when it's clear what is an
object and what's just an attribute.

> Also, to configure a queue 
> on a NAPI, set-queue <qid> <napi_id>, the existing NAPIs would have to 
> be looked up from the queue parameters dumped.

The look up should not be much of a problem.

And don't you think that:

  set-queue queue 1 napi-id 101
  set-queue queue 2 napi-id 101

is more natural than:

  set-napi napi-id 101 queues [1, 2]

Especially in presence of conflicts. If user tries:

  set-napi napi-id 101 queues [1, 2]
  set-napi napi-id 102 queues [1, 2]

Do both napis now serve those queues? May seem obvious to us, but
"philosophically" why does setting an attribute of object 102 change
attributes of object 101?

If we ever gain the ability to create queues it will be:

  create-queue napi-id xyz

which also matches set-queue more nicely than napi base API.
Nambiar, Amritha July 31, 2023, 11:48 p.m. UTC | #9
On 7/28/2023 4:09 PM, Jakub Kicinski wrote:
> On Fri, 28 Jul 2023 15:37:14 -0700 Nambiar, Amritha wrote:
>> Hi Jakub, I have the next version of patches ready (I'll send that in a
>> bit). I suggest if you could take a look at it and let me know your
>> thoughts and then we can proceed from there.
> 
> Great, looking forward.
> 
>> About dumping queues and NAPIs separately, are you thinking about having
>> both per-NAPI and per-queue instances, or do you think only one will
>> suffice. The plan was to follow this work with a 'set-napi' series,
>> something like,
>> set-napi <napi_id> queues <q_id1, q_id2, ...>
>> to configure the queue[s] that are to be serviced by the napi instance.
>>
>> In this case, dumping the NAPIs would be beneficial especially when
>> there are multiple queues on the NAPI.
>>
>> WRT per-queue, are there a set of parameters that needs to exposed
>> besides what's already handled by ethtool...
> 
> Not much at this point, maybe memory model. Maybe stats if we want to
> put stats in the same command. But the fact that sysfs has a bunch of
> per queue attributes makes me think that sooner or later we'll want
> queue as a full object in netlink. And starting out that way makes
> the whole API cleaner, at least in my opinion.
> 
> If we have another object which wants to refer to queues (e.g. page
> pool) it's easier to express the topology when it's clear what is an
> object and what's just an attribute.
> 
>> Also, to configure a queue
>> on a NAPI, set-queue <qid> <napi_id>, the existing NAPIs would have to
>> be looked up from the queue parameters dumped.
> 
> The look up should not be much of a problem.
> 
> And don't you think that:
> 
>    set-queue queue 1 napi-id 101
>    set-queue queue 2 napi-id 101
> 
> is more natural than:
> 
>    set-napi napi-id 101 queues [1, 2]
> 
> Especially in presence of conflicts. If user tries:
> 
>    set-napi napi-id 101 queues [1, 2]
>    set-napi napi-id 102 queues [1, 2]
> 
> Do both napis now serve those queues? May seem obvious to us, but
> "philosophically" why does setting an attribute of object 102 change
> attributes of object 101?
> 

Right, I see the point. In the presence of conflicts, when the
napi<->queue[s] mappings are updated, set-napi will impact other
NAPI-IDs, while set-queue would limit the change to just the queue that
is requested.

In both cases, the underlying work remains the same:
1. Remove the queue from the existing napi instance it is associated with.
2. The driver updates the queue[s]<->vector mapping and associates the
queue with the new napi instance.
3. Report the impacted napi/queue back to the stack.

The 'napi-get' command would list all the napis and the updated
queue[s] list.
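
As a sketch only, the driver hook this flow implies could be an ndo
callback along these lines (the name and signature here are
hypothetical; nothing like this is proposed in this series yet):

  /* invoked by the core for a set-napi or set-queue request */
  int (*ndo_queue_set_napi)(struct net_device *dev, u16 queue_id,
                            bool rx, struct napi_struct *napi);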

Now, in use cases where a single poller is set to service multiple
queues (say 8), set-napi can do this with a single command, while
set-queue will result in 8 different requests to the driver. This is
the trade-off I see if we go with set-queue.

> If we ever gain the ability to create queues it will be:
> 
>    create-queue napi-id xyz
> 
> which also matches set-queue more nicely than napi base API.
David Ahern Aug. 2, 2023, 12:26 a.m. UTC | #10
On 7/28/23 5:09 PM, Jakub Kicinski wrote:
> On Fri, 28 Jul 2023 15:37:14 -0700 Nambiar, Amritha wrote:
>> Hi Jakub, I have the next version of patches ready (I'll send that in a 
>> bit). I suggest if you could take a look at it and let me know your 
>> thoughts and then we can proceed from there.
> 
> Great, looking forward.
> 
>> About dumping queues and NAPIs separately, are you thinking about having 
>> both per-NAPI and per-queue instances, or do you think only one will 
>> suffice. The plan was to follow this work with a 'set-napi' series, 
>> something like,
>> set-napi <napi_id> queues <q_id1, q_id2, ...>
>> to configure the queue[s] that are to be serviced by the napi instance.
>>
>> In this case, dumping the NAPIs would be beneficial especially when 
>> there are multiple queues on the NAPI.
>>
>> WRT per-queue, are there a set of parameters that needs to exposed 
>> besides what's already handled by ethtool...
> 
> Not much at this point, maybe memory model. Maybe stats if we want to
> put stats in the same command. But the fact that sysfs has a bunch of
> per queue attributes makes me think that sooner or later we'll want
> queue as a full object in netlink. And starting out that way makes 
> the whole API cleaner, at least in my opinion.
> 
> If we have another object which wants to refer to queues (e.g. page
> pool) it's easier to express the topology when it's clear what is an
> object and what's just an attribute.
> 
>> Also, to configure a queue 
>> on a NAPI, set-queue <qid> <napi_id>, the existing NAPIs would have to 
>> be looked up from the queue parameters dumped.
> 
> The look up should not be much of a problem.
> 
> And don't you think that:
> 
>   set-queue queue 1 napi-id 101
>   set-queue queue 2 napi-id 101
> 
> is more natural than:
> 
>   set-napi napi-id 101 queues [1, 2]
> 
> Especially in presence of conflicts. If user tries:
> 
>   set-napi napi-id 101 queues [1, 2]
>   set-napi napi-id 102 queues [1, 2]
> 
> Do both napis now serve those queues? May seem obvious to us, but
> "philosophically" why does setting an attribute of object 102 change
> attributes of object 101?
> 
> If we ever gain the ability to create queues it will be:
> 
>   create-queue napi-id xyz
> 
> which also matches set-queue more nicely than napi base API.
> 

I take it you have this path in mind as a means of creating
"specialized" queues (e.g., io_uring and Rx ZC). Any slides or notes on
the bigger picture?
Jakub Kicinski Aug. 2, 2023, 12:50 a.m. UTC | #11
On Tue, 1 Aug 2023 18:26:26 -0600 David Ahern wrote:
> I take it you have this path in mind as a means of creating
> "specialized" queues (e.g., io_uring and Rx ZC).

TBH I've been thinking more about the huge page stuff and RSS context
handling. Rx ZC should be a subset of what's needed for those cases.

> Any slides or notes on the bigger picture?

I don't have it well figured out, yet. The user space API is pretty
easy, but shaping it in a way that makes driver's life manageable is
more challenging.

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 08fbd4622ccf..49f64401af7c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -342,6 +342,11 @@  struct gro_list {
  */
 #define GRO_HASH_BUCKETS	8
 
+struct napi_queue {
+	struct list_head	q_list;
+	u16			queue_index;
+};
+
 /*
  * Structure for NAPI scheduling similar to tasklet but with weighting
  */
@@ -376,6 +381,8 @@  struct napi_struct {
 	/* control-path-only fields follow */
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
+	struct list_head	napi_rxq_list;
+	struct list_head	napi_txq_list;
 };
 
 enum {
diff --git a/net/core/dev.c b/net/core/dev.c
index 3393c2f3dbe8..9ee8eb3ef223 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6401,6 +6401,9 @@  void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi,
 	 */
 	if (dev->threaded && napi_kthread_create(napi))
 		dev->threaded = 0;
+
+	INIT_LIST_HEAD(&napi->napi_rxq_list);
+	INIT_LIST_HEAD(&napi->napi_txq_list);
 }
 EXPORT_SYMBOL(netif_napi_add_weight);
 
@@ -6462,6 +6465,23 @@  static void flush_gro_hash(struct napi_struct *napi)
 	}
 }
 
+static void __napi_del_queue(struct napi_queue *napi_queue)
+{
+	list_del_rcu(&napi_queue->q_list);
+	kfree(napi_queue);
+}
+
+static void napi_del_queues(struct napi_struct *napi)
+{
+	struct napi_queue *napi_queue, *n;
+
+	list_for_each_entry_safe(napi_queue, n, &napi->napi_rxq_list, q_list)
+		__napi_del_queue(napi_queue);
+
+	list_for_each_entry_safe(napi_queue, n, &napi->napi_txq_list, q_list)
+		__napi_del_queue(napi_queue);
+}
+
 /* Must be called in process context */
 void __netif_napi_del(struct napi_struct *napi)
 {
@@ -6479,6 +6499,7 @@  void __netif_napi_del(struct napi_struct *napi)
 		kthread_stop(napi->thread);
 		napi->thread = NULL;
 	}
+	napi_del_queues(napi);
 }
 EXPORT_SYMBOL(__netif_napi_del);
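
For illustration, the add-side counterpart of the deletion helpers above
is not in this patch (later patches in the series wire it up from the
driver); a sketch of such a helper, with a hypothetical name, might be:

  static int napi_add_queue(struct napi_struct *napi, struct list_head *head,
                            u16 queue_index)
  {
          struct napi_queue *napi_queue;

          napi_queue = kzalloc(sizeof(*napi_queue), GFP_KERNEL);
          if (!napi_queue)
                  return -ENOMEM;

          napi_queue->queue_index = queue_index;
          list_add_rcu(&napi_queue->q_list, head);
          return 0;
  }

  /* e.g. from a driver, binding rx queue 0 to a vector's NAPI: */
  napi_add_queue(&q_vector->napi, &q_vector->napi.napi_rxq_list, 0);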