Message ID: 166260012413.81018.8010396115034847972.stgit@anambiarhost.jf.intel.com
Series: Extend action skbedit to RX queue mapping
On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar <amritha.nambiar@intel.com> wrote: > > Based on the discussion on > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/, > the following series extends skbedit tc action to RX queue mapping. > Currently, skbedit action in tc allows overriding of transmit queue. > Extending this ability of skedit action supports the selection of receive > queue for incoming packets. Offloading this action is added for receive > side. Enabled ice driver to offload this type of filter into the > hardware for accepting packets to the device's receive queue. > > v2: Added documentation in Documentation/networking > > --- > > Amritha Nambiar (4): > act_skbedit: Add support for action skbedit RX queue mapping > act_skbedit: Offload skbedit queue mapping for receive queue > ice: Enable RX queue selection using skbedit action > Documentation: networking: TC queue based filtering I don't think skbedit is the right thing to be updating for this. In the case of Tx we were using it because at the time we stored the sockets Tx queue in the skb, so it made sense to edit it there if we wanted to tweak things before it got to the qdisc layer. However it didn't have a direct impact on the hardware and only really affected the software routing in the device, which eventually resulted in which hardware queue and qdisc was selected. The problem with editing the receive queue is that the hardware offloaded case versus the software offloaded can have very different behaviors. I wonder if this wouldn't be better served by being an extension of the mirred ingress redirect action which is already used for multiple hardware offloads as I recall. In this case you would want to be redirecting packets received on a port to being received on a specific queue on that port. By using the redirect action it would take the packet out of the receive path and reinsert it, being able to account for anything such as the RPS configuration on the device so the behavior would be closer to what the hardware offloaded behavior would be.
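A sketch of the two rule shapes under discussion, for reference. The first assumes the series reuses skbedit's existing queue_mapping parameter on an ingress filter; the second is only a guess at how the mirred-based alternative described above might be spelled (act_mirred has no rxqueue argument today). Device names, matches and queue numbers are placeholders.

  # Series' approach: skbedit selects the RX queue on an ingress filter
  tc filter add dev ethX ingress protocol ip flower dst_ip 192.168.1.10 \
      action skbedit queue_mapping 5

  # Suggested alternative: mirred ingress redirect back to the same device,
  # hypothetically extended to take a destination RX queue
  tc filter add dev ethX ingress protocol ip flower dst_ip 192.168.1.10 \
      action mirred ingress redirect dev ethX rxqueue 5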
> -----Original Message----- > From: Alexander Duyck <alexander.duyck@gmail.com> > Sent: Thursday, September 8, 2022 8:28 AM > To: Nambiar, Amritha <amritha.nambiar@intel.com> > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com; > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius > <vinicius.gomes@intel.com>; Samudrala, Sridhar > <sridhar.samudrala@intel.com> > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue > mapping > > On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar > <amritha.nambiar@intel.com> wrote: > > > > Based on the discussion on > > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/, > > the following series extends skbedit tc action to RX queue mapping. > > Currently, skbedit action in tc allows overriding of transmit queue. > > Extending this ability of skedit action supports the selection of receive > > queue for incoming packets. Offloading this action is added for receive > > side. Enabled ice driver to offload this type of filter into the > > hardware for accepting packets to the device's receive queue. > > > > v2: Added documentation in Documentation/networking > > > > --- > > > > Amritha Nambiar (4): > > act_skbedit: Add support for action skbedit RX queue mapping > > act_skbedit: Offload skbedit queue mapping for receive queue > > ice: Enable RX queue selection using skbedit action > > Documentation: networking: TC queue based filtering > > I don't think skbedit is the right thing to be updating for this. In > the case of Tx we were using it because at the time we stored the > sockets Tx queue in the skb, so it made sense to edit it there if we > wanted to tweak things before it got to the qdisc layer. However it > didn't have a direct impact on the hardware and only really affected > the software routing in the device, which eventually resulted in which > hardware queue and qdisc was selected. > > The problem with editing the receive queue is that the hardware > offloaded case versus the software offloaded can have very different > behaviors. I wonder if this wouldn't be better served by being an Could you please explain how the hardware offload and software cases behave differently in the skbedit case. From Jakub's suggestion on https://lore.kernel.org/netdev/20220503084732.363b89cc@kernel.org/, it looked like the skbedit action fits better to align the hardware and software description of RX queue offload (considering the skb metadata remains same in offload vs no-offload case). > extension of the mirred ingress redirect action which is already used > for multiple hardware offloads as I recall. > > In this case you would want to be redirecting packets received on a > port to being received on a specific queue on that port. By using the > redirect action it would take the packet out of the receive path and > reinsert it, being able to account for anything such as the RPS > configuration on the device so the behavior would be closer to what > the hardware offloaded behavior would be. Wouldn't this be an overkill as we only want to accept packets into a predetermined queue? IIUC, the mirred redirect action typically moves packets from one interface to another, the filter is added on interface different from the destination interface. In our case, with the destination interface being the same, I am not understanding the need for a loopback. Also, WRT to RPS, not sure I understand the impact here. In hardware, once the offloaded filter executes to select the queue, RSS does not run. 
In software, if RPS executes before sch_handle_ingress(), wouldn't any tc actions (mirred redirect or skbedit overriding the queue) behave in a similar way?
On Fri, Sep 9, 2022 at 2:18 AM Nambiar, Amritha <amritha.nambiar@intel.com> wrote: > > > -----Original Message----- > > From: Alexander Duyck <alexander.duyck@gmail.com> > > Sent: Thursday, September 8, 2022 8:28 AM > > To: Nambiar, Amritha <amritha.nambiar@intel.com> > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com; > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius > > <vinicius.gomes@intel.com>; Samudrala, Sridhar > > <sridhar.samudrala@intel.com> > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue > > mapping > > > > On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar > > <amritha.nambiar@intel.com> wrote: > > > > > > Based on the discussion on > > > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/, > > > the following series extends skbedit tc action to RX queue mapping. > > > Currently, skbedit action in tc allows overriding of transmit queue. > > > Extending this ability of skedit action supports the selection of receive > > > queue for incoming packets. Offloading this action is added for receive > > > side. Enabled ice driver to offload this type of filter into the > > > hardware for accepting packets to the device's receive queue. > > > > > > v2: Added documentation in Documentation/networking > > > > > > --- > > > > > > Amritha Nambiar (4): > > > act_skbedit: Add support for action skbedit RX queue mapping > > > act_skbedit: Offload skbedit queue mapping for receive queue > > > ice: Enable RX queue selection using skbedit action > > > Documentation: networking: TC queue based filtering > > > > I don't think skbedit is the right thing to be updating for this. In > > the case of Tx we were using it because at the time we stored the > > sockets Tx queue in the skb, so it made sense to edit it there if we > > wanted to tweak things before it got to the qdisc layer. However it > > didn't have a direct impact on the hardware and only really affected > > the software routing in the device, which eventually resulted in which > > hardware queue and qdisc was selected. > > > > The problem with editing the receive queue is that the hardware > > offloaded case versus the software offloaded can have very different > > behaviors. I wonder if this wouldn't be better served by being an > > Could you please explain how the hardware offload and software cases > behave differently in the skbedit case. From Jakub's suggestion on > https://lore.kernel.org/netdev/20220503084732.363b89cc@kernel.org/, > it looked like the skbedit action fits better to align the hardware and > software description of RX queue offload (considering the skb metadata > remains same in offload vs no-offload case). So specifically my concern is RPS. The problem is RPS takes place before your TC rule would be applied in the software case, but after it has been applied in the hardware case. As a result the behavior will be different for one versus the other. With the redirect action it will pull the packet out of the Rx pipeline and reinsert it so that RPS will be applied to the packet and it would be received on the CPUs expected. > > extension of the mirred ingress redirect action which is already used > > for multiple hardware offloads as I recall. > > > > In this case you would want to be redirecting packets received on a > > port to being received on a specific queue on that port. 
By using the > > redirect action it would take the packet out of the receive path and > > reinsert it, being able to account for anything such as the RPS > > configuration on the device so the behavior would be closer to what > > the hardware offloaded behavior would be. > > Wouldn't this be an overkill as we only want to accept packets into a > predetermined queue? IIUC, the mirred redirect action typically moves > packets from one interface to another, the filter is added on interface > different from the destination interface. In our case, with the > destination interface being the same, I am not understanding the need > for a loopback. Also, WRT to RPS, not sure I understand the impact > here. In hardware, once the offloaded filter executes to select the queue, > RSS does not run. In software, if RPS executes before > sch_handle_ingress(), wouldn't any tc-actions (mirred redirect or skbedit > overriding the queue) behave in similar way ? The problem is that RPS != RSS. You can use the two together to spread work out over a greater set of queues. So for example in a NUMA system with multiple sockets/nodes you might use RSS to split the work up into a per-node queue(s), and then use RPS to split up the work across CPUs within that node. If you pick a packet up from one device and redirect it via the mirred action the RPS is applied as though the packet was received on the device so the RPS queue would be correct assuming you updated the queue. That is what I am looking for. What this patch is doing is creating a situation where the effect is very different between the hardware and software version of things which would likely break things for a use case such as this.
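For context, the RPS knob being referred to is the per-queue rps_cpus bitmap documented in Documentation/networking/scaling.rst. A minimal sketch of the NUMA-style split described above, with device name, queue indices and CPU masks purely illustrative:

  # RSS steers flows to one queue per node; RPS then fans each queue out
  # across that node's CPUs.
  # Queue 0 -> node 0, CPUs 0-3 (mask 0x0f)
  echo 0f > /sys/class/net/ethX/queues/rx-0/rps_cpus
  # Queue 1 -> node 1, CPUs 4-7 (mask 0xf0)
  echo f0 > /sys/class/net/ethX/queues/rx-1/rps_cpus

Because the CPU mask is chosen from the queue the packet is seen on, a software-only skbedit that rewrites the queue after RPS has already run would not move the packet onto the other node's CPUs, which is the inconsistency being described.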
> -----Original Message----- > From: Alexander Duyck <alexander.duyck@gmail.com> > Sent: Friday, September 9, 2022 11:35 AM > To: Nambiar, Amritha <amritha.nambiar@intel.com> > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com; > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius > <vinicius.gomes@intel.com>; Samudrala, Sridhar > <sridhar.samudrala@intel.com> > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue > mapping > > On Fri, Sep 9, 2022 at 2:18 AM Nambiar, Amritha > <amritha.nambiar@intel.com> wrote: > > > > > -----Original Message----- > > > From: Alexander Duyck <alexander.duyck@gmail.com> > > > Sent: Thursday, September 8, 2022 8:28 AM > > > To: Nambiar, Amritha <amritha.nambiar@intel.com> > > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com; > > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius > > > <vinicius.gomes@intel.com>; Samudrala, Sridhar > > > <sridhar.samudrala@intel.com> > > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue > > > mapping > > > > > > On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar > > > <amritha.nambiar@intel.com> wrote: > > > > > > > > Based on the discussion on > > > > > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/, > > > > the following series extends skbedit tc action to RX queue mapping. > > > > Currently, skbedit action in tc allows overriding of transmit queue. > > > > Extending this ability of skedit action supports the selection of receive > > > > queue for incoming packets. Offloading this action is added for receive > > > > side. Enabled ice driver to offload this type of filter into the > > > > hardware for accepting packets to the device's receive queue. > > > > > > > > v2: Added documentation in Documentation/networking > > > > > > > > --- > > > > > > > > Amritha Nambiar (4): > > > > act_skbedit: Add support for action skbedit RX queue mapping > > > > act_skbedit: Offload skbedit queue mapping for receive queue > > > > ice: Enable RX queue selection using skbedit action > > > > Documentation: networking: TC queue based filtering > > > > > > I don't think skbedit is the right thing to be updating for this. In > > > the case of Tx we were using it because at the time we stored the > > > sockets Tx queue in the skb, so it made sense to edit it there if we > > > wanted to tweak things before it got to the qdisc layer. However it > > > didn't have a direct impact on the hardware and only really affected > > > the software routing in the device, which eventually resulted in which > > > hardware queue and qdisc was selected. > > > > > > The problem with editing the receive queue is that the hardware > > > offloaded case versus the software offloaded can have very different > > > behaviors. I wonder if this wouldn't be better served by being an > > > > Could you please explain how the hardware offload and software cases > > behave differently in the skbedit case. From Jakub's suggestion on > > https://lore.kernel.org/netdev/20220503084732.363b89cc@kernel.org/, > > it looked like the skbedit action fits better to align the hardware and > > software description of RX queue offload (considering the skb metadata > > remains same in offload vs no-offload case). > > So specifically my concern is RPS. The problem is RPS takes place > before your TC rule would be applied in the software case, but after > it has been applied in the hardware case. As a result the behavior > will be different for one versus the other. 
With the redirect action > it will pull the packet out of the Rx pipeline and reinsert it so that > RPS will be applied to the packet and it would be received on the CPUs > expected. > Okay, so I understand that without HW offload, the SW behavior would not align for RPS, i.e., RPS CPU would be from a queue (already selected by HW, RSS etc.), and may not align with the queue selected from the SW TC rule. And I see your point, the solution to this would be reinserting the packet after updating the queue. But, as I look more into this, there are still some more concerns I have. IIUC, we may be looking at a potential TC rule as below: tc filter add dev ethX ingress ... \ action mirred ingress redirect dev ethX rxqueue <rx_qid> It looks to me that this configuration could possibly result in loops recursively calling act_mirred. Since the redirection is from ingress to ingress on the same device, when the packet is reinserted into the RX pipeline of the same device, RPS and tc classification happens again, the tc filter with act mirred executes redirecting and reinserting the packet again. act_mirred keeps a CPU counter of recursive calls for the action and drops the packet when the limit is reached. If this is a valid configuration, I cannot find any code that perhaps uses a combination of skb->redirect and skb->from_ingress to check and prevent recursive classification (further execution of TC mirred redirect action). Also, since reinserting the packet after updating the queue would fix the RPS inconsistency, can this be done from the skbedit action instead of mirred redirect ? So, if skbedit action is used for Rx queue selection, maybe this sequence helps: RPS on RX q1 -> TC action skbedit RX q2 -> always reinsert if action skbedit is on RX -> RPS on RX q2 -> stop further execution of TC action RX skbedit > > > extension of the mirred ingress redirect action which is already used > > > for multiple hardware offloads as I recall. > > > > > > In this case you would want to be redirecting packets received on a > > > port to being received on a specific queue on that port. By using the > > > redirect action it would take the packet out of the receive path and > > > reinsert it, being able to account for anything such as the RPS > > > configuration on the device so the behavior would be closer to what > > > the hardware offloaded behavior would be. > > > > Wouldn't this be an overkill as we only want to accept packets into a > > predetermined queue? IIUC, the mirred redirect action typically moves > > packets from one interface to another, the filter is added on interface > > different from the destination interface. In our case, with the > > destination interface being the same, I am not understanding the need > > for a loopback. Also, WRT to RPS, not sure I understand the impact > > here. In hardware, once the offloaded filter executes to select the queue, > > RSS does not run. In software, if RPS executes before > > sch_handle_ingress(), wouldn't any tc-actions (mirred redirect or skbedit > > overriding the queue) behave in similar way ? > > The problem is that RPS != RSS. You can use the two together to spread > work out over a greater set of queues. So for example in a NUMA system > with multiple sockets/nodes you might use RSS to split the work up > into a per-node queue(s), and then use RPS to split up the work across > CPUs within that node. 
If you pick a packet up from one device and > redirect it via the mirred action the RPS is applied as though the > packet was received on the device so the RPS queue would be correct > assuming you updated the queue. That is what I am looking for. What > this patch is doing is creating a situation where the effect is very > different between the hardware and software version of things which > would likely break things for a use case such as this.
On Thu, Sep 15, 2022 at 1:45 AM Nambiar, Amritha <amritha.nambiar@intel.com> wrote: > > > -----Original Message----- > > From: Alexander Duyck <alexander.duyck@gmail.com> > > Sent: Friday, September 9, 2022 11:35 AM > > To: Nambiar, Amritha <amritha.nambiar@intel.com> > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com; > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius > > <vinicius.gomes@intel.com>; Samudrala, Sridhar > > <sridhar.samudrala@intel.com> > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue > > mapping > > > > On Fri, Sep 9, 2022 at 2:18 AM Nambiar, Amritha > > <amritha.nambiar@intel.com> wrote: > > > > > > > -----Original Message----- > > > > From: Alexander Duyck <alexander.duyck@gmail.com> > > > > Sent: Thursday, September 8, 2022 8:28 AM > > > > To: Nambiar, Amritha <amritha.nambiar@intel.com> > > > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com; > > > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius > > > > <vinicius.gomes@intel.com>; Samudrala, Sridhar > > > > <sridhar.samudrala@intel.com> > > > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue > > > > mapping > > > > > > > > On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar > > > > <amritha.nambiar@intel.com> wrote: > > > > > > > > > > Based on the discussion on > > > > > > > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/, > > > > > the following series extends skbedit tc action to RX queue mapping. > > > > > Currently, skbedit action in tc allows overriding of transmit queue. > > > > > Extending this ability of skedit action supports the selection of receive > > > > > queue for incoming packets. Offloading this action is added for receive > > > > > side. Enabled ice driver to offload this type of filter into the > > > > > hardware for accepting packets to the device's receive queue. > > > > > > > > > > v2: Added documentation in Documentation/networking > > > > > > > > > > --- > > > > > > > > > > Amritha Nambiar (4): > > > > > act_skbedit: Add support for action skbedit RX queue mapping > > > > > act_skbedit: Offload skbedit queue mapping for receive queue > > > > > ice: Enable RX queue selection using skbedit action > > > > > Documentation: networking: TC queue based filtering > > > > > > > > I don't think skbedit is the right thing to be updating for this. In > > > > the case of Tx we were using it because at the time we stored the > > > > sockets Tx queue in the skb, so it made sense to edit it there if we > > > > wanted to tweak things before it got to the qdisc layer. However it > > > > didn't have a direct impact on the hardware and only really affected > > > > the software routing in the device, which eventually resulted in which > > > > hardware queue and qdisc was selected. > > > > > > > > The problem with editing the receive queue is that the hardware > > > > offloaded case versus the software offloaded can have very different > > > > behaviors. I wonder if this wouldn't be better served by being an > > > > > > Could you please explain how the hardware offload and software cases > > > behave differently in the skbedit case. From Jakub's suggestion on > > > https://lore.kernel.org/netdev/20220503084732.363b89cc@kernel.org/, > > > it looked like the skbedit action fits better to align the hardware and > > > software description of RX queue offload (considering the skb metadata > > > remains same in offload vs no-offload case). > > > > So specifically my concern is RPS. 
The problem is RPS takes place > > before your TC rule would be applied in the software case, but after > > it has been applied in the hardware case. As a result the behavior > > will be different for one versus the other. With the redirect action > > it will pull the packet out of the Rx pipeline and reinsert it so that > > RPS will be applied to the packet and it would be received on the CPUs > > expected. > > > > Okay, so I understand that without HW offload, the SW behavior would > not align for RPS, i.e., RPS CPU would be from a queue (already selected > by HW, RSS etc.), and may not align with the queue selected from > the SW TC rule. And I see your point, the solution to this would be > reinserting the packet after updating the queue. But, as I look more into > this, there are still some more concerns I have. > > IIUC, we may be looking at a potential TC rule as below: > tc filter add dev ethX ingress ... \ > action mirred ingress redirect dev ethX rxqueue <rx_qid> > > It looks to me that this configuration could possibly result in loops > recursively calling act_mirred. Since the redirection is from ingress > to ingress on the same device, when the packet is reinserted into the > RX pipeline of the same device, RPS and tc classification happens again, > the tc filter with act mirred executes redirecting and reinserting the > packet again. act_mirred keeps a CPU counter of recursive calls for the > action and drops the packet when the limit is reached. > If this is a valid configuration, I cannot find any code that perhaps uses > a combination of skb->redirect and skb->from_ingress to check and > prevent recursive classification (further execution of TC mirred redirect > action). The recursion problem is easily solved by simply not requeueing again if the packet is on the queue it is supposed to be. If you have rules that are bouncing the traffic between two queues it wouldn't be any different than the potential issue of bouncing traffic between two netdevs which is why the recursion counters were added. > Also, since reinserting the packet after updating the queue would fix > the RPS inconsistency, can this be done from the skbedit action instead > of mirred redirect ? So, if skbedit action is used for Rx queue selection, > maybe this sequence helps: > > RPS on RX q1 -> TC action skbedit RX q2 -> > always reinsert if action skbedit is on RX -> RPS on RX q2 -> > stop further execution of TC action RX skbedit That is changing the function of skbedit. In addition the skbedit action isn't meant to be offloaded. The skbedit action is only really supposed to edit skb medatada, it isn't supposed to take other actions on the frame. What we want to avoid is making skbedit into another mirred.
> -----Original Message----- > From: Alexander Duyck <alexander.duyck@gmail.com> > Sent: Thursday, September 15, 2022 8:21 AM > To: Nambiar, Amritha <amritha.nambiar@intel.com> > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com; > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius > <vinicius.gomes@intel.com>; Samudrala, Sridhar > <sridhar.samudrala@intel.com> > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue > mapping > > On Thu, Sep 15, 2022 at 1:45 AM Nambiar, Amritha > <amritha.nambiar@intel.com> wrote: > > > > > -----Original Message----- > > > From: Alexander Duyck <alexander.duyck@gmail.com> > > > Sent: Friday, September 9, 2022 11:35 AM > > > To: Nambiar, Amritha <amritha.nambiar@intel.com> > > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com; > > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius > > > <vinicius.gomes@intel.com>; Samudrala, Sridhar > > > <sridhar.samudrala@intel.com> > > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue > > > mapping > > > > > > On Fri, Sep 9, 2022 at 2:18 AM Nambiar, Amritha > > > <amritha.nambiar@intel.com> wrote: > > > > > > > > > -----Original Message----- > > > > > From: Alexander Duyck <alexander.duyck@gmail.com> > > > > > Sent: Thursday, September 8, 2022 8:28 AM > > > > > To: Nambiar, Amritha <amritha.nambiar@intel.com> > > > > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com; > > > > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius > > > > > <vinicius.gomes@intel.com>; Samudrala, Sridhar > > > > > <sridhar.samudrala@intel.com> > > > > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX > queue > > > > > mapping > > > > > > > > > > On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar > > > > > <amritha.nambiar@intel.com> wrote: > > > > > > > > > > > > Based on the discussion on > > > > > > > > > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/, > > > > > > the following series extends skbedit tc action to RX queue mapping. > > > > > > Currently, skbedit action in tc allows overriding of transmit queue. > > > > > > Extending this ability of skedit action supports the selection of > receive > > > > > > queue for incoming packets. Offloading this action is added for > receive > > > > > > side. Enabled ice driver to offload this type of filter into the > > > > > > hardware for accepting packets to the device's receive queue. > > > > > > > > > > > > v2: Added documentation in Documentation/networking > > > > > > > > > > > > --- > > > > > > > > > > > > Amritha Nambiar (4): > > > > > > act_skbedit: Add support for action skbedit RX queue mapping > > > > > > act_skbedit: Offload skbedit queue mapping for receive queue > > > > > > ice: Enable RX queue selection using skbedit action > > > > > > Documentation: networking: TC queue based filtering > > > > > > > > > > I don't think skbedit is the right thing to be updating for this. In > > > > > the case of Tx we were using it because at the time we stored the > > > > > sockets Tx queue in the skb, so it made sense to edit it there if we > > > > > wanted to tweak things before it got to the qdisc layer. However it > > > > > didn't have a direct impact on the hardware and only really affected > > > > > the software routing in the device, which eventually resulted in which > > > > > hardware queue and qdisc was selected. 
> > > > > > > > > > The problem with editing the receive queue is that the hardware > > > > > offloaded case versus the software offloaded can have very different > > > > > behaviors. I wonder if this wouldn't be better served by being an > > > > > > > > Could you please explain how the hardware offload and software cases > > > > behave differently in the skbedit case. From Jakub's suggestion on > > > > > https://lore.kernel.org/netdev/20220503084732.363b89cc@kernel.org/, > > > > it looked like the skbedit action fits better to align the hardware and > > > > software description of RX queue offload (considering the skb metadata > > > > remains same in offload vs no-offload case). > > > > > > So specifically my concern is RPS. The problem is RPS takes place > > > before your TC rule would be applied in the software case, but after > > > it has been applied in the hardware case. As a result the behavior > > > will be different for one versus the other. With the redirect action > > > it will pull the packet out of the Rx pipeline and reinsert it so that > > > RPS will be applied to the packet and it would be received on the CPUs > > > expected. > > > > > > > Okay, so I understand that without HW offload, the SW behavior would > > not align for RPS, i.e., RPS CPU would be from a queue (already selected > > by HW, RSS etc.), and may not align with the queue selected from > > the SW TC rule. And I see your point, the solution to this would be > > reinserting the packet after updating the queue. But, as I look more into > > this, there are still some more concerns I have. > > > > IIUC, we may be looking at a potential TC rule as below: > > tc filter add dev ethX ingress ... \ > > action mirred ingress redirect dev ethX rxqueue <rx_qid> > > > > It looks to me that this configuration could possibly result in loops > > recursively calling act_mirred. Since the redirection is from ingress > > to ingress on the same device, when the packet is reinserted into the > > RX pipeline of the same device, RPS and tc classification happens again, > > the tc filter with act mirred executes redirecting and reinserting the > > packet again. act_mirred keeps a CPU counter of recursive calls for the > > action and drops the packet when the limit is reached. > > If this is a valid configuration, I cannot find any code that perhaps uses > > a combination of skb->redirect and skb->from_ingress to check and > > prevent recursive classification (further execution of TC mirred redirect > > action). > > The recursion problem is easily solved by simply not requeueing again > if the packet is on the queue it is supposed to be. If you have rules > that are bouncing the traffic between two queues it wouldn't be any > different than the potential issue of bouncing traffic between two > netdevs which is why the recursion counters were added. > Okay, makes sense. So, redirecting (ingress to ingress) on the same device with the queue_mapping extension would make it possible to update the queue. Without the queue_mapping extension, ingress to ingress redirect on the same device would simply bounce the packets and eventually drop them once the recursion limit is reached. > > Also, since reinserting the packet after updating the queue would fix > > the RPS inconsistency, can this be done from the skbedit action instead > > of mirred redirect ? 
So, if skbedit action is used for Rx queue selection, > > maybe this sequence helps: > > > > RPS on RX q1 -> TC action skbedit RX q2 -> > > always reinsert if action skbedit is on RX -> RPS on RX q2 -> > > stop further execution of TC action RX skbedit > > That is changing the function of skbedit. In addition the skbedit > action isn't meant to be offloaded. The skbedit action is only really > supposed to edit skb medatada, it isn't supposed to take other actions > on the frame. What we want to avoid is making skbedit into another > mirred. Okay, so skbedit changing the skb metadata and doing the additional action here (reinserting the packet) is not right. But in terms of skbedit action not meant to be offloaded, I do find that skbedit action allows offload of certain parameters like mark, ptype and priority.
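For reference, a sketch of rules using the existing skbedit parameters mentioned here as already offloadable (matches, mark and priority values are placeholders; skip_sw requests hardware-only handling):

  tc filter add dev ethX ingress protocol ip flower skip_sw \
      dst_ip 192.168.1.10 action skbedit mark 0x2a

  tc filter add dev ethX ingress protocol ip flower skip_sw \
      src_ip 10.0.0.0/24 action skbedit priority 3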
On Thu, Sep 15, 2022 at 3:09 PM Nambiar, Amritha <amritha.nambiar@intel.com> wrote: > > > -----Original Message----- > > From: Alexander Duyck <alexander.duyck@gmail.com> > > Sent: Thursday, September 15, 2022 8:21 AM > > To: Nambiar, Amritha <amritha.nambiar@intel.com> > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com; > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius > > <vinicius.gomes@intel.com>; Samudrala, Sridhar > > <sridhar.samudrala@intel.com> > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue > > mapping > > > > On Thu, Sep 15, 2022 at 1:45 AM Nambiar, Amritha > > <amritha.nambiar@intel.com> wrote: > > > > > > > -----Original Message----- > > > > From: Alexander Duyck <alexander.duyck@gmail.com> > > > > Sent: Friday, September 9, 2022 11:35 AM > > > > To: Nambiar, Amritha <amritha.nambiar@intel.com> > > > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com; > > > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius > > > > <vinicius.gomes@intel.com>; Samudrala, Sridhar > > > > <sridhar.samudrala@intel.com> > > > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue > > > > mapping > > > > > > > > On Fri, Sep 9, 2022 at 2:18 AM Nambiar, Amritha > > > > <amritha.nambiar@intel.com> wrote: > > > > > > > > > > > -----Original Message----- > > > > > > From: Alexander Duyck <alexander.duyck@gmail.com> > > > > > > Sent: Thursday, September 8, 2022 8:28 AM > > > > > > To: Nambiar, Amritha <amritha.nambiar@intel.com> > > > > > > Cc: netdev@vger.kernel.org; kuba@kernel.org; jhs@mojatatu.com; > > > > > > jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, Vinicius > > > > > > <vinicius.gomes@intel.com>; Samudrala, Sridhar > > > > > > <sridhar.samudrala@intel.com> > > > > > > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX > > queue > > > > > > mapping > > > > > > > > > > > > On Wed, Sep 7, 2022 at 6:14 PM Amritha Nambiar > > > > > > <amritha.nambiar@intel.com> wrote: > > > > > > > > > > > > > > Based on the discussion on > > > > > > > > > > > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/, > > > > > > > the following series extends skbedit tc action to RX queue mapping. > > > > > > > Currently, skbedit action in tc allows overriding of transmit queue. > > > > > > > Extending this ability of skedit action supports the selection of > > receive > > > > > > > queue for incoming packets. Offloading this action is added for > > receive > > > > > > > side. Enabled ice driver to offload this type of filter into the > > > > > > > hardware for accepting packets to the device's receive queue. > > > > > > > > > > > > > > v2: Added documentation in Documentation/networking > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > Amritha Nambiar (4): > > > > > > > act_skbedit: Add support for action skbedit RX queue mapping > > > > > > > act_skbedit: Offload skbedit queue mapping for receive queue > > > > > > > ice: Enable RX queue selection using skbedit action > > > > > > > Documentation: networking: TC queue based filtering > > > > > > > > > > > > I don't think skbedit is the right thing to be updating for this. In > > > > > > the case of Tx we were using it because at the time we stored the > > > > > > sockets Tx queue in the skb, so it made sense to edit it there if we > > > > > > wanted to tweak things before it got to the qdisc layer. 
However it > > > > > > didn't have a direct impact on the hardware and only really affected > > > > > > the software routing in the device, which eventually resulted in which > > > > > > hardware queue and qdisc was selected. > > > > > > > > > > > > The problem with editing the receive queue is that the hardware > > > > > > offloaded case versus the software offloaded can have very different > > > > > > behaviors. I wonder if this wouldn't be better served by being an > > > > > > > > > > Could you please explain how the hardware offload and software cases > > > > > behave differently in the skbedit case. From Jakub's suggestion on > > > > > > > https://lore.kernel.org/netdev/20220503084732.363b89cc@kernel.org/, > > > > > it looked like the skbedit action fits better to align the hardware and > > > > > software description of RX queue offload (considering the skb metadata > > > > > remains same in offload vs no-offload case). > > > > > > > > So specifically my concern is RPS. The problem is RPS takes place > > > > before your TC rule would be applied in the software case, but after > > > > it has been applied in the hardware case. As a result the behavior > > > > will be different for one versus the other. With the redirect action > > > > it will pull the packet out of the Rx pipeline and reinsert it so that > > > > RPS will be applied to the packet and it would be received on the CPUs > > > > expected. > > > > > > > > > > Okay, so I understand that without HW offload, the SW behavior would > > > not align for RPS, i.e., RPS CPU would be from a queue (already selected > > > by HW, RSS etc.), and may not align with the queue selected from > > > the SW TC rule. And I see your point, the solution to this would be > > > reinserting the packet after updating the queue. But, as I look more into > > > this, there are still some more concerns I have. > > > > > > IIUC, we may be looking at a potential TC rule as below: > > > tc filter add dev ethX ingress ... \ > > > action mirred ingress redirect dev ethX rxqueue <rx_qid> > > > > > > It looks to me that this configuration could possibly result in loops > > > recursively calling act_mirred. Since the redirection is from ingress > > > to ingress on the same device, when the packet is reinserted into the > > > RX pipeline of the same device, RPS and tc classification happens again, > > > the tc filter with act mirred executes redirecting and reinserting the > > > packet again. act_mirred keeps a CPU counter of recursive calls for the > > > action and drops the packet when the limit is reached. > > > If this is a valid configuration, I cannot find any code that perhaps uses > > > a combination of skb->redirect and skb->from_ingress to check and > > > prevent recursive classification (further execution of TC mirred redirect > > > action). > > > > The recursion problem is easily solved by simply not requeueing again > > if the packet is on the queue it is supposed to be. If you have rules > > that are bouncing the traffic between two queues it wouldn't be any > > different than the potential issue of bouncing traffic between two > > netdevs which is why the recursion counters were added. > > > > Okay, makes sense. So, redirecting (ingress to ingress) on the same > device with the queue_mapping extension would make it possible to > update the queue. Without the queue_mapping extension, ingress to > ingress redirect on the same device would simply bounce the packets and > eventually drop them once the recursion limit is reached. Right. 
Basically this would be an extension for the redirect action that would move a queue up to a first class citizen of sorts. > > > Also, since reinserting the packet after updating the queue would fix > > > the RPS inconsistency, can this be done from the skbedit action instead > > > of mirred redirect ? So, if skbedit action is used for Rx queue selection, > > > maybe this sequence helps: > > > > > > RPS on RX q1 -> TC action skbedit RX q2 -> > > > always reinsert if action skbedit is on RX -> RPS on RX q2 -> > > > stop further execution of TC action RX skbedit > > > > That is changing the function of skbedit. In addition the skbedit > > action isn't meant to be offloaded. The skbedit action is only really > > supposed to edit skb medatada, it isn't supposed to take other actions > > on the frame. What we want to avoid is making skbedit into another > > mirred. > > Okay, so skbedit changing the skb metadata and doing the additional > action here (reinserting the packet) is not right. But in terms of skbedit > action not meant to be offloaded, I do find that skbedit action allows > offload of certain parameters like mark, ptype and priority. I had overlooked the offload use as it isn't exactly a direct use case. It is needed in the context of offloading a qdisc if I am not mistaken. So it isn't really the action that is being offloaded but the queueing discipline itself, the skbedit is more of a passenger along for the ride. The general idea is that the values would be consumed by the qdisc layer to take some action, and since the qdisc is being handled down in the hardware it is now down in the hardware with it. I'm not thrilled, but I can see the justification as we have had discussions about renaming the action before since it really just edits the metadata for the frame. The big difference is that it is the qdisc taking the action versus the action directly, and all we are tweaking is the metadata which is the main task of skbedit which is meant to only edit metadata.
On Wed, 07 Sep 2022 18:23:57 -0700 Amritha Nambiar wrote: > Based on the discussion on > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/, > the following series extends skbedit tc action to RX queue mapping. > Currently, skbedit action in tc allows overriding of transmit queue. > Extending this ability of skedit action supports the selection of receive > queue for incoming packets. Offloading this action is added for receive > side. Enabled ice driver to offload this type of filter into the > hardware for accepting packets to the device's receive queue. > > v2: Added documentation in Documentation/networking Alex and I had a quick chat about this, I think we can work around the difficulties with duplicating the behavior in SW by enforcing that the action can only be used with skip_sw. Either skbedit or Alex's suggested approach with act_mirred. Or new hw-only action. Alex pointed out that it'd be worth documenting the priorities of aRFS vs this thing, which one will be used if HW matches both.
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Wednesday, September 21, 2022 1:29 PM > To: Nambiar, Amritha <amritha.nambiar@intel.com> > Cc: netdev@vger.kernel.org; alexander.duyck@gmail.com; > jhs@mojatatu.com; jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, > Vinicius <vinicius.gomes@intel.com>; Samudrala, Sridhar > <sridhar.samudrala@intel.com> > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue > mapping > > On Wed, 07 Sep 2022 18:23:57 -0700 Amritha Nambiar wrote: > > Based on the discussion on > > https://lore.kernel.org/netdev/20220429171717.5b0b2a81@kernel.org/, > > the following series extends skbedit tc action to RX queue mapping. > > Currently, skbedit action in tc allows overriding of transmit queue. > > Extending this ability of skedit action supports the selection of receive > > queue for incoming packets. Offloading this action is added for receive > > side. Enabled ice driver to offload this type of filter into the > > hardware for accepting packets to the device's receive queue. > > > > v2: Added documentation in Documentation/networking > > Alex and I had a quick chat about this, I think we can work around > the difficulties with duplicating the behavior in SW by enforcing > that the action can only be used with skip_sw. Either skbedit or > Alex's suggested approach with act_mirred. Or new hw-only action. > Okay, I will go with skbedit and enforce skip_sw for the action on Rx side. So, skbedit for TX queue is SW only and skbedit for RX queue is HW only action. > Alex pointed out that it'd be worth documenting the priorities of > aRFS vs this thing, which one will be used if HW matches both. Sure, I will document the priorities of aRFS and TC filter selecting the Rx queue. On Intel E810 devices, the TC filter selecting Rx queue has higher priority over aRFS (Flow director filter) if both the filters are matched.
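A sketch of what the agreed-upon restriction might look like from the user's side, assuming skbedit keeps the queue_mapping spelling for the RX case (device, match and queue number are placeholders):

  # Accepted: hardware-only RX queue selection
  tc filter add dev ethX ingress protocol ip flower skip_sw \
      dst_ip 192.168.1.10 ip_proto tcp dst_port 5001 \
      action skbedit queue_mapping 5

  # The same rule without skip_sw would be rejected, since there is no
  # software datapath for RX queue selection to fall back to.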
On Thu, 22 Sep 2022 08:19:07 +0000 Nambiar, Amritha wrote: > > Alex pointed out that it'd be worth documenting the priorities of > > aRFS vs this thing, which one will be used if HW matches both. > > Sure, I will document the priorities of aRFS and TC filter selecting > the Rx queue. On Intel E810 devices, the TC filter selecting Rx queue > has higher priority over aRFS (Flow director filter) if both the filters > are matched. Is it configurable? :S If we think about RSS context selection we'd expect the context to be selected based on for example the IPv6 daddr of the container but we still want aRFS to select the exact queue... Is there a counterargument for giving the flow director higher prio?
> -----Original Message----- > From: Jakub Kicinski <kuba@kernel.org> > Sent: Thursday, September 22, 2022 5:29 AM > To: Nambiar, Amritha <amritha.nambiar@intel.com> > Cc: netdev@vger.kernel.org; alexander.duyck@gmail.com; > jhs@mojatatu.com; jiri@resnulli.us; xiyou.wangcong@gmail.com; Gomes, > Vinicius <vinicius.gomes@intel.com>; Samudrala, Sridhar > <sridhar.samudrala@intel.com> > Subject: Re: [net-next PATCH v2 0/4] Extend action skbedit to RX queue > mapping > > On Thu, 22 Sep 2022 08:19:07 +0000 Nambiar, Amritha wrote: > > > Alex pointed out that it'd be worth documenting the priorities of > > > aRFS vs this thing, which one will be used if HW matches both. > > > > Sure, I will document the priorities of aRFS and TC filter selecting > > the Rx queue. On Intel E810 devices, the TC filter selecting Rx queue > > has higher priority over aRFS (Flow director filter) if both the filters > > are matched. > > Is it configurable? :S If we think about RSS context selection we'd > expect the context to be selected based on for example the IPv6 daddr > of the container but we still want aRFS to select the exact queue... This behavior is similar on E810 currently, i.e., TC filter selects the set of queues (like the RSS context), and then the flow director filter can be used to select the exact queue within the queue-set. We want to have the ability to select the exact queue within the queue-set via TC (and not flow director filter). On E810, TC filters are added to hardware block called the "switch". This block supports two types of actions in hardware termed as "forward to queue" and "forward to queue-set". aRFS filters are added to a different HW block called the "flow director" (FD). The FD block comes after the switch block in the HW pipeline. The FD block has certain restrictions (can only have filters with the same packet type). The switch table does not have this restriction. When both the TC filter and FD filter competes for queue selection (i.e. both filters are matched and action is to set a queue), the pipeline resolves conflicts based on metadata priority. The priorities are not user configurable. The ICE driver programs these priorities based on metadata at each stage, action etc. Switch (TC) filters with forward-to-queue action have higher priority over FD filter assigning a queue. The hash filter has lowest priority. > Is there a counterargument for giving the flow director higher prio? Isn't the HW behavior on RX (WRT to setting the queue) consistent with what we have in SW for TX ? In SW, TX queue set via TC filter (skbedit action) has the highest precedence over XPS and jhash (lowest).
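As background for the aRFS interaction, aRFS is only active once ntuple filtering and the RPS flow tables are enabled, roughly as in Documentation/networking/scaling.rst (device name and table sizes are illustrative):

  ethtool -K ethX ntuple on
  echo 32768 > /proc/sys/net/core/rps_sock_flow_entries
  for q in /sys/class/net/ethX/queues/rx-*; do
      echo 2048 > "$q/rps_flow_cnt"
  done

With that enabled and a matching TC forward-to-queue filter also present, the E810 pipeline described above resolves the conflict in favor of the TC filter.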
On Fri, 23 Sep 2022 01:09:38 +0000 Nambiar, Amritha wrote: > > Is it configurable? :S If we think about RSS context selection we'd > > expect the context to be selected based on for example the IPv6 daddr > > of the container but we still want aRFS to select the exact queue... > > This behavior is similar on E810 currently, i.e., TC filter selects the set > of queues (like the RSS context), and then the flow director filter can > be used to select the exact queue within the queue-set. We want to have > the ability to select the exact queue within the queue-set via TC (and not > flow director filter). > > On E810, TC filters are added to hardware block called the "switch". This > block supports two types of actions in hardware termed as "forward to > queue" and "forward to queue-set". aRFS filters are added to a different > HW block called the "flow director" (FD). The FD block comes after the switch > block in the HW pipeline. The FD block has certain restrictions (can only > have filters with the same packet type). The switch table does not have > this restriction. > > When both the TC filter and FD filter competes for queue selection (i.e. both > filters are matched and action is to set a queue), the pipeline resolves > conflicts based on metadata priority. The priorities are not user configurable. > The ICE driver programs these priorities based on metadata at each stage, > action etc. Switch (TC) filters with forward-to-queue action have higher > priority over FD filter assigning a queue. The hash filter has lowest priority. > > > Is there a counterargument for giving the flow director higher prio? > > Isn't the HW behavior on RX (WRT to setting the queue) consistent > with what we have in SW for TX ? In SW, TX queue set via TC filter > (skbedit action) has the highest precedence over XPS and jhash (lowest). Alright, I guess that could work as well, thanks for the explanation. Initially I thought that it'd be strange if the queue-set selection was before aRFS and queue-id selection after, if they are both to be controlled by TC. But I can see how that makes most practical sense :S