Message ID | 1365684023-9967-1-git-send-email-sebastian.hesselbarth@gmail.com (mailing list archive)
---|---
State | New, archived
Hi,

On Thu, Apr 11, 2013 at 02:40:23PM +0200, Sebastian Hesselbarth wrote:
> This patch adds GRO support to mv643xx_eth by making it invoke
> napi_gro_receive instead of netif_receive_skb.
>
> Signed-off-by: Soeren Moch <smoch@web.de>
> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> ---
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Jason Cooper <jason@lakedaemon.net>
> Cc: Florian Fainelli <florian@openwrt.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Dale Farnsworth <dale@farnsworth.org>
> Cc: netdev@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-kernel@vger.kernel.org
> ---
> drivers/net/ethernet/marvell/mv643xx_eth.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
> index 305038f..c850d04 100644
> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
> @@ -604,7 +604,7 @@ static int rxq_process(struct rx_queue *rxq, int budget)
> lro_receive_skb(&rxq->lro_mgr, skb, (void *)cmd_sts);
> lro_flush_needed = 1;
> } else
> - netif_receive_skb(skb);
> + napi_gro_receive(&mp->napi, skb);
>
> continue;

I remember having experimented with this on 3.6 a few months ago with this
driver and finally switching back to something like this instead, which
showed better performance in my tests:

        if (skb->ip_summed == CHECKSUM_UNNECESSARY)
                napi_gro_receive(napi, skb);
        else
                netif_receive_skb(skb);

Unfortunately I don't have more details, as my commit message was rather
short due to this resulting from experimentation. Did you verify that you
did not lose any performance in various workloads? I was playing with
bridges at the time, so it's possible that I got better performance on
bridging with netif_receive_skb() than with napi_gro_receive().

Regards,
Willy
On Thu, Apr 11, 2013 at 3:13 PM, Willy Tarreau <w@1wt.eu> wrote:
> On Thu, Apr 11, 2013 at 02:40:23PM +0200, Sebastian Hesselbarth wrote:
>> This patch adds GRO support to mv643xx_eth by making it invoke
>> napi_gro_receive instead of netif_receive_skb.
>>
>> Signed-off-by: Soeren Moch <smoch@web.de>
>> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
>> ---
>> Cc: "David S. Miller" <davem@davemloft.net>
>> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
>> Cc: Andrew Lunn <andrew@lunn.ch>
>> Cc: Jason Cooper <jason@lakedaemon.net>
>> Cc: Florian Fainelli <florian@openwrt.org>
>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> Cc: Paul Mackerras <paulus@samba.org>
>> Cc: Dale Farnsworth <dale@farnsworth.org>
>> Cc: netdev@vger.kernel.org
>> Cc: linux-arm-kernel@lists.infradead.org
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>> drivers/net/ethernet/marvell/mv643xx_eth.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
>> index 305038f..c850d04 100644
>> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
>> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
>> @@ -604,7 +604,7 @@ static int rxq_process(struct rx_queue *rxq, int budget)
>> lro_receive_skb(&rxq->lro_mgr, skb, (void *)cmd_sts);
>> lro_flush_needed = 1;
>> } else
>> - netif_receive_skb(skb);
>> + napi_gro_receive(&mp->napi, skb);
>>
>> continue;
>
> I remember having experimented with this on 3.6 a few months ago with this
> driver and finally switching back to something like this instead, which
> showed better performance in my tests:
>
>         if (skb->ip_summed == CHECKSUM_UNNECESSARY)
>                 napi_gro_receive(napi, skb);
>         else
>                 netif_receive_skb(skb);
>
> Unfortunately I don't have more details, as my commit message was rather
> short due to this resulting from experimentation. Did you verify that you
> did not lose any performance in various workloads? I was playing with
> bridges at the time, so it's possible that I got better performance on
> bridging with netif_receive_skb() than with napi_gro_receive().

Hi Willy,

I did some simple tests on Dove/Cubox with 'netperf -cCD' and gso/gro/lro
options on mv643xx_eth. The tests may not be sufficient, as I am not that
into net performance testing.

I tried today's net-next on top of 3.9-rc6 without any gro patch, with the
initial patch (Soeren) and your proposed patch (Willy). The results show
that both patches allow a significant increase in throughput compared to
netif_receive_skb (!gro, !lro) alone. Having gro with lro disabled gives
some 2% more throughput compared to lro only.

Sebastian

Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.02      615.65    19.15    99.90    5.097   13.293  (3.9-rc6: gso)
 87380  16384  16384    10.02      615.82    19.05    99.90    5.067   13.289  (3.9-rc6: gso, gro)
 87380  16384  16384    10.03      747.44    23.17    99.80    5.079   10.938  (3.9-rc6: gso, lro)
 87380  16384  16384    10.02      745.28    22.57    99.80    4.963   10.970  (3.9-rc6: gso, gro, lro)
 87380  16384  16384    10.02      600.34    19.10    99.90    5.211   13.632  (3.9-rc6+soeren: gso)
 87380  16384  16384    10.02      764.23    23.42    99.80    5.021   10.698  (3.9-rc6+soeren: gso, gro)
 87380  16384  16384    10.02      749.81    23.13    99.80    5.055   10.904  (3.9-rc6+soeren: gso, lro)
 87380  16384  16384    10.02      745.84    22.34    99.80    4.907   10.962  (3.9-rc6+soeren: gso, gro, lro)
 87380  16384  16384    10.02      605.79    18.79   100.00    5.082   13.523  (3.9-rc6+willy: gso)
 87380  16384  16384    10.02      765.64    24.68    99.80    5.281   10.678  (3.9-rc6+willy: gso, gro)
 87380  16384  16384    10.02      750.30    26.02    99.80    5.682   10.897  (3.9-rc6+willy: gso, lro)
 87380  16384  16384    10.03      749.40    21.86    99.80    4.778   10.910  (3.9-rc6+willy: gso, gro, lro)
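A quick check against the table supports the "some 2%" figure: with Soeren's
patch, gso+gro reaches 764.23 Mbit/s versus 749.81 Mbit/s for gso+lro, i.e.
(764.23 - 749.81) / 749.81 ~= 1.9%, and about 2.2% when compared with the
unpatched gso+lro line at 747.44 Mbit/s.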
Hi Sebastian,

On Thu, Apr 11, 2013 at 04:47:49PM +0200, Sebastian Hesselbarth wrote:
> I did some simple tests on Dove/Cubox with 'netperf -cCD' and gso/gro/lro
> options on mv643xx_eth. The tests may not be sufficient, as I am not that
> into net performance testing.

In fact the difference only happens when the NIC has not verified the
checksum itself IIRC, which should be the case for non-IPv4 traffic. I
agree that it's not easy to test a bridge with a cubox which has a single
port :-) Maybe you'll see a difference in IPv6 traffic or with VLAN
traffic, as I seem to remember this chip does not do cksum offloading on
VLANs, but I could be wrong.

> I tried today's net-next on top of 3.9-rc6 without any gro patch, with the
> initial patch (Soeren) and your proposed patch (Willy). The results show
> that both patches allow a significant increase in throughput compared to
> netif_receive_skb (!gro, !lro) alone. Having gro with lro disabled gives
> some 2% more throughput compared to lro only.

Indeed this is consistent with my memories: since Eric improved the GRO
path, it became faster than LRO on this chip.

Regards,
Willy
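For readers unfamiliar with the driver, a rough sketch of why that matters,
reconstructed from memory of the mv643xx_eth.c RX path of that era (treat
the exact flag name and placement as assumptions, not a verbatim quote):
the hardware checksum verdict comes from the RX descriptor status, so
frames the MAC cannot parse (IPv6 here, possibly VLAN-tagged traffic) stay
at CHECKSUM_NONE, and a driver-level ip_summed test would keep them out of
GRO entirely.

        /* Sketch of the relevant part of rxq_process(): only frames whose
         * layer-4 checksum the hardware actually verified are marked
         * CHECKSUM_UNNECESSARY; everything else is left unverified.
         */
        if (cmd_sts & LAYER_4_CHECKSUM_OK)
                skb->ip_summed = CHECKSUM_UNNECESSARY;
        skb->protocol = eth_type_trans(skb, mp->dev);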
On Thu, Apr 11, 2013 at 5:03 PM, Willy Tarreau <w@1wt.eu> wrote:
> On Thu, Apr 11, 2013 at 04:47:49PM +0200, Sebastian Hesselbarth wrote:
>> I tried today's net-next on top of 3.9-rc6 without any gro patch, with the
>> initial patch (Soeren) and your proposed patch (Willy). The results show
>> that both patches allow a significant increase in throughput compared to
>> netif_receive_skb (!gro, !lro) alone. Having gro with lro disabled gives
>> some 2% more throughput compared to lro only.
>
> Indeed this is consistent with my memories: since Eric improved the
> GRO path, it became faster than LRO on this chip.

I don't have a strong opinion on whether Soeren's or your proposal should
be submitted. But I insist on having one of them in, as GRO significantly
improves the common use case, is enabled by default, and is not as
constrained as LRO.

Sebastian
On Thu, Apr 11, 2013 at 05:27:03PM +0200, Sebastian Hesselbarth wrote:
> I don't have a strong opinion on whether Soeren's or your proposal should
> be submitted. But I insist on having one of them in, as GRO significantly
> improves the common use case, is enabled by default, and is not as
> constrained as LRO.

I agree, use yours first, but we should keep an eye on this. Since you have
everything to run a test, please try to see if you can get netperf to run
over IPv6; I'm sure the NIC doesn't handle it.

Willy
On Thu, 2013-04-11 at 17:32 +0200, Willy Tarreau wrote:
> On Thu, Apr 11, 2013 at 05:27:03PM +0200, Sebastian Hesselbarth wrote:
> > I don't have a strong opinion on whether Soeren's or your proposal should
> > be submitted. But I insist on having one of them in, as GRO significantly
> > improves the common use case, is enabled by default, and is not as
> > constrained as LRO.
>
> I agree, use yours first, but we should keep an eye on this. Since you have
> everything to run a test, please try to see if you can get netperf to run
> over IPv6; I'm sure the NIC doesn't handle it.

Willy, testing the checksum in the NIC driver itself prevents the stack
from doing GRO even when the NIC could not checksum the packet, as with
GRE tunneling for example.

So Sebastian's patch is better IMHO: just call the napi GRO handler and
let the core stack handle the details ;)
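A small side-by-side sketch of the two variants, using the rxq_process()
context from the patch (placement assumed from the diff above), makes the
difference concrete:

        /* Variant A, driver-level test (Willy's suggestion): any frame the
         * NIC could not checksum, GRE-encapsulated traffic for instance,
         * bypasses GRO completely.
         */
        if (skb->ip_summed == CHECKSUM_UNNECESSARY)
                napi_gro_receive(&mp->napi, skb);
        else
                netif_receive_skb(skb);

        /* Variant B, unconditional hand-off (the patch as posted): the GRO
         * layer sees every frame and can validate the checksum in software
         * before deciding whether to aggregate.
         */
        napi_gro_receive(&mp->napi, skb);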
On Thu, 2013-04-11 at 14:40 +0200, Sebastian Hesselbarth wrote:
> This patch adds GRO support to mv643xx_eth by making it invoke
> napi_gro_receive instead of netif_receive_skb.
>
> Signed-off-by: Soeren Moch <smoch@web.de>
> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> ---
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Jason Cooper <jason@lakedaemon.net>
> Cc: Florian Fainelli <florian@openwrt.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Dale Farnsworth <dale@farnsworth.org>
> Cc: netdev@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-kernel@vger.kernel.org
> ---
> drivers/net/ethernet/marvell/mv643xx_eth.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
> index 305038f..c850d04 100644
> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
> @@ -604,7 +604,7 @@ static int rxq_process(struct rx_queue *rxq, int budget)
> lro_receive_skb(&rxq->lro_mgr, skb, (void *)cmd_sts);
> lro_flush_needed = 1;
> } else
> - netif_receive_skb(skb);
> + napi_gro_receive(&mp->napi, skb);
>
> continue;
>

Acked-by: Eric Dumazet <edumazet@google.com>
On Thu, Apr 11, 2013 at 08:54:35AM -0700, Eric Dumazet wrote:
> On Thu, 2013-04-11 at 17:32 +0200, Willy Tarreau wrote:
> > On Thu, Apr 11, 2013 at 05:27:03PM +0200, Sebastian Hesselbarth wrote:
> > > I don't have a strong opinion on whether Soeren's or your proposal should
> > > be submitted. But I insist on having one of them in, as GRO significantly
> > > improves the common use case, is enabled by default, and is not as
> > > constrained as LRO.
> >
> > I agree, use yours first, but we should keep an eye on this. Since you have
> > everything to run a test, please try to see if you can get netperf to run
> > over IPv6; I'm sure the NIC doesn't handle it.
>
> Willy, testing the checksum in the NIC driver itself prevents the stack
> from doing GRO even when the NIC could not checksum the packet, as with
> GRE tunneling for example.
>
> So Sebastian's patch is better IMHO: just call the napi GRO handler and
> let the core stack handle the details ;)

OK, that makes sense indeed, I didn't think about this case. All I remember
was that the old call achieved a higher packet rate than napi_gro_receive,
but it was on an older kernel and I can't be more specific after several
months :-/

Cheers,
Willy
On Thu, 2013-04-11 at 18:02 +0200, Willy Tarreau wrote:
> OK, that makes sense indeed, I didn't think about this case. All
> I remember was that the old call achieved a higher packet rate
> than napi_gro_receive, but it was on an older kernel and I can't
> be more specific after several months :-/

It's probably true that the GRO handler consumes more CPU for packets that
can't be aggregated in the end.

That's a trade-off, and maybe we could add in the core stack a device
feature to instruct the GRO handler to take a short cut for packets with
no checksum. Or, better, a sysctl so that a static_branch can be used.
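Purely as an illustration of that idea (the key name and helper below are
hypothetical, nothing like this exists in the tree): a sysctl handler could
flip a static_key via static_key_slow_inc()/static_key_slow_dec(), and the
GRO entry path could then cheaply fall back to the normal receive path for
frames without a verified checksum.

        #include <linux/jump_label.h>
        #include <linux/skbuff.h>

        /* Hypothetical knob, toggled by a sysctl handler calling
         * static_key_slow_inc()/static_key_slow_dec() on this key.
         */
        static struct static_key gro_nocsum_shortcut = STATIC_KEY_INIT_FALSE;

        /* Hypothetical helper the GRO entry path could call: when the short
         * cut is enabled and the NIC did not verify the checksum, skip GRO
         * and take the ordinary receive path instead.
         */
        static inline bool gro_nocsum_bypass(const struct sk_buff *skb)
        {
                return static_key_false(&gro_nocsum_shortcut) &&
                       skb->ip_summed != CHECKSUM_UNNECESSARY;
        }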
On Thu, Apr 11, 2013 at 5:32 PM, Willy Tarreau <w@1wt.eu> wrote:
> On Thu, Apr 11, 2013 at 05:27:03PM +0200, Sebastian Hesselbarth wrote:
>> I don't have a strong opinion on whether Soeren's or your proposal should
>> be submitted. But I insist on having one of them in, as GRO significantly
>> improves the common use case, is enabled by default, and is not as
>> constrained as LRO.
>
> I agree, use yours first, but we should keep an eye on this. Since you have
> everything to run a test, please try to see if you can get netperf to run
> over IPv6; I'm sure the NIC doesn't handle it.

Willy,

out of curiosity I replayed all tests using netperf/netserver with -6, which
enables IPv6. The overall results remain quite the same here: enabling
support for GRO gives a huge improvement in achievable throughput, and the
difference between Soeren's and your patch is negligible.

Sebastian
On Thu, Apr 11, 2013 at 06:59:11PM +0200, Sebastian Hesselbarth wrote:
> On Thu, Apr 11, 2013 at 5:32 PM, Willy Tarreau <w@1wt.eu> wrote:
> > On Thu, Apr 11, 2013 at 05:27:03PM +0200, Sebastian Hesselbarth wrote:
> >> I don't have a strong opinion on whether Soeren's or your proposal should
> >> be submitted. But I insist on having one of them in, as GRO significantly
> >> improves the common use case, is enabled by default, and is not as
> >> constrained as LRO.
> >
> > I agree, use yours first, but we should keep an eye on this. Since you have
> > everything to run a test, please try to see if you can get netperf to run
> > over IPv6; I'm sure the NIC doesn't handle it.
>
> Willy,
>
> out of curiosity I replayed all tests using netperf/netserver with -6, which
> enables IPv6. The overall results remain quite the same here: enabling
> support for GRO gives a huge improvement in achievable throughput, and the
> difference between Soeren's and your patch is negligible.

Perfect, thank you for testing this !

Best regards,
Willy
From: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Date: Thu, 11 Apr 2013 17:27:03 +0200

> On Thu, Apr 11, 2013 at 5:03 PM, Willy Tarreau <w@1wt.eu> wrote:
>> On Thu, Apr 11, 2013 at 04:47:49PM +0200, Sebastian Hesselbarth wrote:
>>> I tried today's net-next on top of 3.9-rc6 without any gro patch, with the
>>> initial patch (Soeren) and your proposed patch (Willy). The results show
>>> that both patches allow a significant increase in throughput compared to
>>> netif_receive_skb (!gro, !lro) alone. Having gro with lro disabled gives
>>> some 2% more throughput compared to lro only.
>>
>> Indeed this is consistent with my memories: since Eric improved the
>> GRO path, it became faster than LRO on this chip.
>
> I don't have a strong opinion on whether Soeren's or your proposal should
> be submitted. But I insist on having one of them in, as GRO significantly
> improves the common use case, is enabled by default, and is not as
> constrained as LRO.

I think, as per other drivers, LRO should be eliminated completely from
all drivers, including this one, and GRO used exclusively instead.
On Thu, 2013-04-11 at 13:31 -0400, David Miller wrote:
> I think, as per other drivers, LRO should be eliminated completely from
> all drivers, including this one, and GRO used exclusively instead.

This would be awesome. AFAIK current LRO users are:

  drivers/infiniband/hw/nes/nes_hw.c
  drivers/net/ethernet/marvell/mv643xx_eth.c
  drivers/net/ethernet/pasemi/pasemi_mac.c
On Thu, Apr 11, 2013 at 01:31:19PM -0400, David Miller wrote:
> From: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> Date: Thu, 11 Apr 2013 17:27:03 +0200
>
> > On Thu, Apr 11, 2013 at 5:03 PM, Willy Tarreau <w@1wt.eu> wrote:
> >> On Thu, Apr 11, 2013 at 04:47:49PM +0200, Sebastian Hesselbarth wrote:
> >>> I tried today's net-next on top of 3.9-rc6 without any gro patch, with
> >>> the initial patch (Soeren) and your proposed patch (Willy). The results
> >>> show that both patches allow a significant increase in throughput
> >>> compared to netif_receive_skb (!gro, !lro) alone. Having gro with lro
> >>> disabled gives some 2% more throughput compared to lro only.
> >>
> >> Indeed this is consistent with my memories: since Eric improved the
> >> GRO path, it became faster than LRO on this chip.
> >
> > I don't have a strong opinion on whether Soeren's or your proposal should
> > be submitted. But I insist on having one of them in, as GRO significantly
> > improves the common use case, is enabled by default, and is not as
> > constrained as LRO.
>
> I think, as per other drivers, LRO should be eliminated completely from
> all drivers, including this one, and GRO used exclusively instead.

Eric provided me with one such experimental patch in the past for this
driver. It worked for me but we never tried to clean it up to propose it
for inclusion.

If anyone is interested, I might still have it in experimental shape.

Willy
On Thu, 2013-04-11 at 14:40 +0200, Sebastian Hesselbarth wrote:
> This patch adds GRO support to mv643xx_eth by making it invoke
> napi_gro_receive instead of netif_receive_skb.

The inet_lro support should be removed at the same time; inet_lro is now
deprecated and there should be no need to keep both of them.

Ben.

> Signed-off-by: Soeren Moch <smoch@web.de>
> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
> ---
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Lennert Buytenhek <buytenh@wantstofly.org>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Jason Cooper <jason@lakedaemon.net>
> Cc: Florian Fainelli <florian@openwrt.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Dale Farnsworth <dale@farnsworth.org>
> Cc: netdev@vger.kernel.org
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-kernel@vger.kernel.org
> ---
> drivers/net/ethernet/marvell/mv643xx_eth.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
> index 305038f..c850d04 100644
> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
> @@ -604,7 +604,7 @@ static int rxq_process(struct rx_queue *rxq, int budget)
> lro_receive_skb(&rxq->lro_mgr, skb, (void *)cmd_sts);
> lro_flush_needed = 1;
> } else
> - netif_receive_skb(skb);
> + napi_gro_receive(&mp->napi, skb);
>
> continue;
>
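A rough sketch of what that removal would mean for the receive path
discussed in this thread; the real cleanup would also have to drop the
lro_mgr setup, the NETIF_F_LRO handling and the LRO statistics elsewhere in
the driver, so this is illustrative only:

        /* Illustrative only: with inet_lro gone, the lro_receive_skb() /
         * netif_receive_skb() branch in rxq_process() collapses into a
         * single unconditional hand-off to the GRO layer.
         */
        skb->protocol = eth_type_trans(skb, mp->dev);
        napi_gro_receive(&mp->napi, skb);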
On Thu, 2013-04-11 at 19:51 +0200, Willy Tarreau wrote:
> Eric provided me with one such experimental patch in the past for this
> driver. It worked for me but we never tried to clean it up to propose it
> for inclusion.
>
> If anyone is interested, I might still have it in experimental shape.

We are interested a lot ;)

Thanks !
On 04/11/2013 07:55 PM, Ben Hutchings wrote:
> On Thu, 2013-04-11 at 14:40 +0200, Sebastian Hesselbarth wrote:
>> This patch adds GRO support to mv643xx_eth by making it invoke
>> napi_gro_receive instead of netif_receive_skb.
>
> The inet_lro support should be removed at the same time; inet_lro is now
> deprecated and there should be no need to keep both of them.

Ben,

I agree on removing it asap, but I would also like to see GRO support go
in. Maybe we postpone lro removal just a little bit. I have just started
to look at mv643xx_eth and remember that Marvell Orion SoCs are not the
only platform using it. I haven't heard a single word from the ppc guys
yet.

Willy just posted an experimental patch to remove lro. I'll have a look at
it and test it out on Dove.

Sebastian
From: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Date: Thu, 11 Apr 2013 14:40:23 +0200

> This patch adds GRO support to mv643xx_eth by making it invoke
> napi_gro_receive instead of netif_receive_skb.
>
> Signed-off-by: Soeren Moch <smoch@web.de>
> Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>

Applied.
diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index 305038f..c850d04 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -604,7 +604,7 @@ static int rxq_process(struct rx_queue *rxq, int budget)
 			lro_receive_skb(&rxq->lro_mgr, skb, (void *)cmd_sts);
 			lro_flush_needed = 1;
 		} else
-			netif_receive_skb(skb);
+			napi_gro_receive(&mp->napi, skb);
 
 		continue;