Message ID | 1687823827-15850-1-git-send-email-longli@linuxonhyperv.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v3] net: mana: Batch ringing RX queue doorbell on receiving packets | expand |
On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote: > From: Long Li <longli@microsoft.com> > > It's inefficient to ring the doorbell page every time a WQE is posted to > the received queue. Excessive MMIO writes result in CPU spending more > time waiting on LOCK instructions (atomic operations), resulting in > poor scaling performance. > > Move the code for ringing doorbell page to where after we have posted all > WQEs to the receive queue during a callback from napi_poll(). > > With this change, tests showed an improvement from 120G/s to 160G/s on a > 200G physical link, with 16 or 32 hardware queues. > > Tests showed no regression in network latency benchmarks on single > connection. > > While we are making changes in this code path, change the code for > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The > hardware specification specifies that it should set to 0. Although > currently the hardware doesn't enforce the check, in the future releases > it may do. > > Cc: stable@vger.kernel.org > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Uhmmm... this looks like a performance improvement to me, more suitable for the net-next tree ?!? (Note that net-next is closed now). In any case you must avoid empty lines in the tag area. If you really intend targeting the -net tree, please repost fixing the above and explicitly specifying the target tree in the subj prefix. thanks! Paolo
On Thu, 29 Jun 2023 10:42:34 +0200 Paolo Abeni wrote: > > While we are making changes in this code path, change the code for > > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The > > hardware specification specifies that it should set to 0. Although > > currently the hardware doesn't enforce the check, in the future releases > > it may do. And please split this cleanup into a separate patch, it doesn't sound like it has to be done as part of the optimization.
On Thu, 29 Jun 2023 16:53:42 +0000 Haiyang Zhang wrote: > This web page shows the net-next is "open": > http://vger.kernel.org/~davem/net-next.html > > Is this still the right place to check net-next status? We're working on fixing it. Unfortunately it's a private page and most of the netdev maintainers don't have access to changing it :(
On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote: > > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell > > on receiving > > packets > > > > On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote: > > > From: Long Li <longli@microsoft.com> > > > > > > It's inefficient to ring the doorbell page every time a WQE is > > > posted > > > to the received queue. Excessive MMIO writes result in CPU > > > spending > > > more time waiting on LOCK instructions (atomic operations), > > > resulting > > > in poor scaling performance. > > > > > > Move the code for ringing doorbell page to where after we have > > > posted > > > all WQEs to the receive queue during a callback from napi_poll(). > > > > > > With this change, tests showed an improvement from 120G/s to > > > 160G/s on > > > a 200G physical link, with 16 or 32 hardware queues. > > > > > > Tests showed no regression in network latency benchmarks on > > > single > > > connection. > > > > > > While we are making changes in this code path, change the code > > > for > > > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The > > > hardware specification specifies that it should set to 0. > > > Although > > > currently the hardware doesn't enforce the check, in the future > > > releases it may do. > > > > > > Cc: stable@vger.kernel.org > > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure > > > Network Adapter (MANA)") > > > > Uhmmm... this looks like a performance improvement to me, more > > suitable for > > the net-next tree ?!? (Note that net-next is closed now). > > This issue is a blocker for usage on 200G physical link. I think it > can be categorized as a fix. Let me ask the question the other way around: is there any specific reason to have this fix into 6.5 and all the way back to 5.13? Especially the latest bit (CC-ing stable) looks at least debatable. Thanks, Paolo
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on > receiving packets > > On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote: > > > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell > > > on receiving packets > > > > > > On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote: > > > > From: Long Li <longli@microsoft.com> > > > > > > > > It's inefficient to ring the doorbell page every time a WQE is > > > > posted to the received queue. Excessive MMIO writes result in CPU > > > > spending more time waiting on LOCK instructions (atomic > > > > operations), resulting in poor scaling performance. > > > > > > > > Move the code for ringing doorbell page to where after we have > > > > posted all WQEs to the receive queue during a callback from > > > > napi_poll(). > > > > > > > > With this change, tests showed an improvement from 120G/s to > > > > 160G/s on a 200G physical link, with 16 or 32 hardware queues. > > > > > > > > Tests showed no regression in network latency benchmarks on single > > > > connection. > > > > > > > > While we are making changes in this code path, change the code for > > > > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The > > > > hardware specification specifies that it should set to 0. > > > > Although > > > > currently the hardware doesn't enforce the check, in the future > > > > releases it may do. > > > > > > > > Cc: stable@vger.kernel.org > > > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure > > > > Network Adapter (MANA)") > > > > > > Uhmmm... this looks like a performance improvement to me, more > > > suitable for the net-next tree ?!? (Note that net-next is closed > > > now). > > > > This issue is a blocker for usage on 200G physical link. I think it > > can be categorized as a fix. > > Let me ask the question the other way around: is there any specific reason to > have this fix into 6.5 and all the way back to 5.13? > Especially the latest bit (CC-ing stable) looks at least debatable. There are many deployed Linux distributions with MANA driver on kernel 5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve the performance target. Thanks, Long
On Fri, Jun 30, 2023 at 05:31:48PM +0000, Long Li wrote: > > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on > > receiving packets > > > > On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote: > > > > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell > > > > on receiving packets > > > > > > > > On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com wrote: > > > > > From: Long Li <longli@microsoft.com> > > > > > > > > > > It's inefficient to ring the doorbell page every time a WQE is > > > > > posted to the received queue. Excessive MMIO writes result in CPU > > > > > spending more time waiting on LOCK instructions (atomic > > > > > operations), resulting in poor scaling performance. > > > > > > > > > > Move the code for ringing doorbell page to where after we have > > > > > posted all WQEs to the receive queue during a callback from > > > > > napi_poll(). > > > > > > > > > > With this change, tests showed an improvement from 120G/s to > > > > > 160G/s on a 200G physical link, with 16 or 32 hardware queues. > > > > > > > > > > Tests showed no regression in network latency benchmarks on single > > > > > connection. > > > > > > > > > > While we are making changes in this code path, change the code for > > > > > ringing doorbell to set the WQE_COUNT to 0 for Receive Queue. The > > > > > hardware specification specifies that it should set to 0. > > > > > Although > > > > > currently the hardware doesn't enforce the check, in the future > > > > > releases it may do. > > > > > > > > > > Cc: stable@vger.kernel.org > > > > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure > > > > > Network Adapter (MANA)") > > > > > > > > Uhmmm... this looks like a performance improvement to me, more > > > > suitable for the net-next tree ?!? (Note that net-next is closed > > > > now). > > > > > > This issue is a blocker for usage on 200G physical link. I think it > > > can be categorized as a fix. > > > > Let me ask the question the other way around: is there any specific reason to > > have this fix into 6.5 and all the way back to 5.13? > > Especially the latest bit (CC-ing stable) looks at least debatable. > > There are many deployed Linux distributions with MANA driver on kernel 5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve the performance target. Why can't they be upgraded to get that performance target, and all the other goodness that those kernels have? We don't normally backport new features, right? thanks, greg k-h
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on > receiving packets > > On Fri, Jun 30, 2023 at 05:31:48PM +0000, Long Li wrote: > > > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell > > > on receiving packets > > > > > > On Thu, 2023-06-29 at 18:18 +0000, Long Li wrote: > > > > > Subject: Re: [Patch v3] net: mana: Batch ringing RX queue > > > > > doorbell on receiving packets > > > > > > > > > > On Mon, 2023-06-26 at 16:57 -0700, longli@linuxonhyperv.com > wrote: > > > > > > From: Long Li <longli@microsoft.com> > > > > > > > > > > > > It's inefficient to ring the doorbell page every time a WQE is > > > > > > posted to the received queue. Excessive MMIO writes result in > > > > > > CPU spending more time waiting on LOCK instructions (atomic > > > > > > operations), resulting in poor scaling performance. > > > > > > > > > > > > Move the code for ringing doorbell page to where after we have > > > > > > posted all WQEs to the receive queue during a callback from > > > > > > napi_poll(). > > > > > > > > > > > > With this change, tests showed an improvement from 120G/s to > > > > > > 160G/s on a 200G physical link, with 16 or 32 hardware queues. > > > > > > > > > > > > Tests showed no regression in network latency benchmarks on > > > > > > single connection. > > > > > > > > > > > > While we are making changes in this code path, change the code > > > > > > for ringing doorbell to set the WQE_COUNT to 0 for Receive > > > > > > Queue. The hardware specification specifies that it should set to 0. > > > > > > Although > > > > > > currently the hardware doesn't enforce the check, in the > > > > > > future releases it may do. > > > > > > > > > > > > Cc: stable@vger.kernel.org > > > > > > Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft > > > > > > Azure Network Adapter (MANA)") > > > > > > > > > > Uhmmm... this looks like a performance improvement to me, more > > > > > suitable for the net-next tree ?!? (Note that net-next is closed > > > > > now). > > > > > > > > This issue is a blocker for usage on 200G physical link. I think > > > > it can be categorized as a fix. > > > > > > Let me ask the question the other way around: is there any specific > > > reason to have this fix into 6.5 and all the way back to 5.13? > > > Especially the latest bit (CC-ing stable) looks at least debatable. > > > > There are many deployed Linux distributions with MANA driver on kernel > 5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve > the performance target. > > Why can't they be upgraded to get that performance target, and all the other > goodness that those kernels have? We don't normally backport new features, > right? I think this should be considered as a fix, not a new feature. MANA is designed to be 200GB full duplex at the start. Due to lack of hardware testing capability at early stage of the project, we could only test 100GB for the Linux driver. When hardware is fully capable of reaching designed spec, this bug in the Linux driver shows up. Thanks, Long > > thanks, > > greg k-h
On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote: > > > 5.15 and kernel 6.1. (those kernels are longterm) They need this fix to achieve > > > the performance target. > > > > Why can't they be upgraded to get that performance target, and all the other > > goodness that those kernels have? We don't normally backport new features, > > right? > > I think this should be considered as a fix, not a new feature. > > MANA is designed to be 200GB full duplex at the start. Due to lack of > hardware testing capability at early stage of the project, we could only test 100GB > for the Linux driver. When hardware is fully capable of reaching designed spec, > this bug in the Linux driver shows up. That part we understand. If I were you I'd try to convince Greg and Paolo that the change is small and significant for user experience. And answer Greg's question why upgrading the kernel past 6.1 is a challenge in your environment.
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving > packets > > On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote: > > > > 5.15 and kernel 6.1. (those kernels are longterm) They need this > > > > fix to achieve the performance target. > > > > > > Why can't they be upgraded to get that performance target, and all > > > the other goodness that those kernels have? We don't normally > > > backport new features, right? > > > > I think this should be considered as a fix, not a new feature. > > > > MANA is designed to be 200GB full duplex at the start. Due to lack of > > hardware testing capability at early stage of the project, we could > > only test 100GB for the Linux driver. When hardware is fully capable > > of reaching designed spec, this bug in the Linux driver shows up. > > That part we understand. > > If I were you I'd try to convince Greg and Paolo that the change is small and > significant for user experience. And answer Greg's question why upgrading the > kernel past 6.1 is a challenge in your environment. I was under the impression that this patch was considered to be a feature, not a bug fix. I was trying to justify that the "Fixes:" tag was needed. I apologize for misunderstanding this. Without this fix, it's not possible to run a typical workload designed for 200Gb physical link speed. We see a large number of customers and Linux distributions committed on 5.15 and 6.1 kernels. They planned the product cycles and certification processes around these longterm kernel versions. It's difficult for them to upgrade to newer kernel versions. Thanks, Long
On Sun, 2023-07-02 at 20:18 +0000, Long Li wrote: > > > > > > > > Subject: Re: [Patch v3] net: mana: Batch ringing RX > > > > > > > > queue > > > > > > > > doorbell > > > > > > > > on receiving > > > > > > > > packets > > > > > > > > > > > > > > > > On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote: > > > > > > > > > > > > > > > > > > > > 5.15 and kernel 6.1. (those > > > > > > > > > > > > > > > > > > > > kernels are longterm) > > > > > > > > > > > > > > > > > > > > They need > > > > > > > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > > > > > fix to achieve the performance > > > > > > > > > > > > > > > > > > > > target. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Why can't they be upgraded to get that > > > > > > > > > > > > > > > > performance > > > > > > > > > > > > > > > > target, and > > > > > > > > > > > > > > > > all > > > > > > > > > > > > > > > > the other goodness that those kernels > > > > > > > > > > > > > > > > have? We don't > > > > > > > > > > > > > > > > normally > > > > > > > > > > > > > > > > backport new features, right? > > > > > > > > > > > > > > > > > > > > > > > > I think this should be considered as a fix, not > > > > > > > > > > > > a new > > > > > > > > > > > > feature. > > > > > > > > > > > > > > > > > > > > > > > > MANA is designed to be 200GB full duplex at the > > > > > > > > > > > > start. Due > > > > > > > > > > > > to > > > > > > > > > > > > lack of > > > > > > > > > > > > hardware testing capability at early stage of > > > > > > > > > > > > the project, > > > > > > > > > > > > we > > > > > > > > > > > > could > > > > > > > > > > > > only test 100GB for the Linux driver. When > > > > > > > > > > > > hardware is > > > > > > > > > > > > fully > > > > > > > > > > > > capable > > > > > > > > > > > > of reaching designed spec, this bug in the > > > > > > > > > > > > Linux driver > > > > > > > > > > > > shows up. > > > > > > > > > > > > > > > > That part we understand. > > > > > > > > > > > > > > > > If I were you I'd try to convince Greg and Paolo that > > > > > > > > the > > > > > > > > change is > > > > > > > > small and > > > > > > > > significant for user experience. And answer Greg's > > > > > > > > question why > > > > > > > > upgrading the > > > > > > > > kernel past 6.1 is a challenge in your environment. > > > > > > > > I was under the impression that this patch was considered to be > > > > a > > > > feature, > > > > not a bug fix. I was trying to justify that the "Fixes:" tag > > > > was > > > > needed. > > > > > > > > I apologize for misunderstanding this. > > > > > > > > Without this fix, it's not possible to run a typical workload > > > > designed for 200Gb > > > > physical link speed. > > > > > > > > We see a large number of customers and Linux distributions > > > > committed > > > > on 5.15 > > > > and 6.1 kernels. They planned the product cycles and > > > > certification > > > > processes > > > > around these longterm kernel versions. It's difficult for them > > > > to > > > > upgrade to newer > > > > kernel versions. I think there are some misunderstanding WRT distros and stable kernels. (Commercial) distros will backport the patch as needed, regardless such patch landing in the 5.15 upstream tree or not. Individual users running their own vanilla 5.15 kernel can't expect performance improvement landing there. All in all I feel undecided. I would endorse this change going trough net-next (without the stable tag). I would feel less torn with this change targeting -net without the stable tag. Targeting -net with the stable tag sounds a bit too much to me. Cheers, Paolo
> Subject: Re: [Patch v3] net: mana: Batch ringing RX queue doorbell on receiving > packets > > On Sun, 2023-07-02 at 20:18 +0000, Long Li wrote: > > > > > > > > > Subject: Re: [Patch v3] net: mana: Batch ringing RX > > > > > > > > > queue doorbell on receiving packets > > > > > > > > > > > > > > > > > > On Fri, 30 Jun 2023 20:42:28 +0000 Long Li wrote: > > > > > > > > > > > > > > > > > > > > > 5.15 and kernel 6.1. (those > > > > > > > > > > > > > > > > > > > > > kernels are longterm) They need > > > > > > > > > > > > > > > > > > > > > this fix to achieve the > > > > > > > > > > > > > > > > > > > > > performance target. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Why can't they be upgraded to get that > > > > > > > > > > > > > > > > > performance target, and all the other > > > > > > > > > > > > > > > > > goodness that those kernels have? We > > > > > > > > > > > > > > > > > don't normally backport new features, > > > > > > > > > > > > > > > > > right? > > > > > > > > > > > > > > > > > > > > > > > > > > I think this should be considered as a fix, not > > > > > > > > > > > > > a new feature. > > > > > > > > > > > > > > > > > > > > > > > > > > MANA is designed to be 200GB full duplex at the > > > > > > > > > > > > > start. Due to lack of hardware testing > > > > > > > > > > > > > capability at early stage of the project, we > > > > > > > > > > > > > could only test 100GB for the Linux driver. When > > > > > > > > > > > > > hardware is fully capable of reaching designed > > > > > > > > > > > > > spec, this bug in the Linux driver shows up. > > > > > > > > > > > > > > > > > > That part we understand. > > > > > > > > > > > > > > > > > > If I were you I'd try to convince Greg and Paolo that > > > > > > > > > the change is small and significant for user experience. > > > > > > > > > And answer Greg's question why upgrading the kernel past > > > > > > > > > 6.1 is a challenge in your environment. > > > > > > > > > > I was under the impression that this patch was considered to be > > > > > a feature, not a bug fix. I was trying to justify that the > > > > > "Fixes:" tag was needed. > > > > > > > > > > I apologize for misunderstanding this. > > > > > > > > > > Without this fix, it's not possible to run a typical workload > > > > > designed for 200Gb physical link speed. > > > > > > > > > > We see a large number of customers and Linux distributions > > > > > committed on 5.15 and 6.1 kernels. They planned the product > > > > > cycles and certification processes around these longterm kernel > > > > > versions. It's difficult for them to upgrade to newer kernel > > > > > versions. > > I think there are some misunderstanding WRT distros and stable kernels. > (Commercial) distros will backport the patch as needed, regardless such patch > landing in the 5.15 upstream tree or not. Individual users running their own > vanilla 5.15 kernel can't expect performance improvement landing there. > > All in all I feel undecided. I would endorse this change going trough net-next > (without the stable tag). I would feel less torn with this change targeting -net > without the stable tag. Targeting -net with the stable tag sounds a bit too much to > me. > > Cheers, > Paolo I'm sending this patch to net-next without stable tag. Thanks, Long
diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c index 8f3f78b68592..3765d3389a9a 100644 --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c @@ -300,8 +300,11 @@ static void mana_gd_ring_doorbell(struct gdma_context *gc, u32 db_index, void mana_gd_wq_ring_doorbell(struct gdma_context *gc, struct gdma_queue *queue) { + /* Hardware Spec specifies that software client should set 0 for + * wqe_cnt for Receive Queues. This value is not used in Send Queues. + */ mana_gd_ring_doorbell(gc, queue->gdma_dev->doorbell, queue->type, - queue->id, queue->head * GDMA_WQE_BU_SIZE, 1); + queue->id, queue->head * GDMA_WQE_BU_SIZE, 0); } void mana_gd_ring_cq(struct gdma_queue *cq, u8 arm_bit) diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c index cd4d5ceb9f2d..1d8abe63fcb8 100644 --- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -1383,8 +1383,8 @@ static void mana_post_pkt_rxq(struct mana_rxq *rxq) recv_buf_oob = &rxq->rx_oobs[curr_index]; - err = mana_gd_post_and_ring(rxq->gdma_rq, &recv_buf_oob->wqe_req, - &recv_buf_oob->wqe_inf); + err = mana_gd_post_work_request(rxq->gdma_rq, &recv_buf_oob->wqe_req, + &recv_buf_oob->wqe_inf); if (WARN_ON_ONCE(err)) return; @@ -1654,6 +1654,12 @@ static void mana_poll_rx_cq(struct mana_cq *cq) mana_process_rx_cqe(rxq, cq, &comp[i]); } + if (comp_read > 0) { + struct gdma_context *gc = rxq->gdma_rq->gdma_dev->gdma_context; + + mana_gd_wq_ring_doorbell(gc, rxq->gdma_rq); + } + if (rxq->xdp_flush) xdp_do_flush(); }