diff mbox series

[net-next] net: lan966x: Improve the CPU TX bitrate.

Message ID 20220308165727.4088656-1-horatiu.vultur@microchip.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net: lan966x: Improve the CPU TX bitrate. | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 5 of 5 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 29 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Horatiu Vultur March 8, 2022, 4:57 p.m. UTC
Replace 'readx_poll_timeout_atomic' with a simple while loop + timeout
when checking if it is possible to write to the HW the next word of the
frame.
Doing this will improve the TX bitrate by 65%. The measurements were
done using iperf3.

Before:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.03  sec  55.2 MBytes  46.2 Mbits/sec    0 sender
[  5]   0.00-10.04  sec  53.8 MBytes  45.0 Mbits/sec      receiver

After:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.02  sec  91.4 MBytes  76.6 Mbits/sec    0 sender
[  5]   0.00-10.03  sec  90.2 MBytes  75.5 Mbits/sec      receiver

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
---
 .../ethernet/microchip/lan966x/lan966x_main.c | 20 ++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

Comments

Andrew Lunn March 8, 2022, 9:36 p.m. UTC | #1
>  static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
>  {
> -	u32 val;
> +	unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> +	int ret = 0;
>  
> -	return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> -					 QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> -					 READL_SLEEP_US, READL_TIMEOUT_US);
> +	while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> +		 QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> +		if (time_after(jiffies, time)) {
> +			ret = -ETIMEDOUT;
> +			break;
> +		}

Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
explicitly supports that.

    Andrew
Horatiu Vultur March 8, 2022, 10:30 p.m. UTC | #2
The 03/08/2022 22:36, Andrew Lunn wrote:
> 
> >  static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> >  {
> > -     u32 val;
> > +     unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > +     int ret = 0;
> >
> > -     return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > -                                      QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > -                                      READL_SLEEP_US, READL_TIMEOUT_US);
> > +     while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > +              QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > +             if (time_after(jiffies, time)) {
> > +                     ret = -ETIMEDOUT;
> > +                     break;
> > +             }
> 
> Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> explicitly supports that.

I have tried but it didn't improve. It was the same as before.

> 
>     Andrew
David Laight March 8, 2022, 10:46 p.m. UTC | #3
From: Horatiu Vultur
> Sent: 08 March 2022 22:30
> 
> The 03/08/2022 22:36, Andrew Lunn wrote:
> >
> > >  static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > >  {
> > > -     u32 val;
> > > +     unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > +     int ret = 0;
> > >
> > > -     return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > -                                      QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > -                                      READL_SLEEP_US, READL_TIMEOUT_US);
> > > +     while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > +              QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > +             if (time_after(jiffies, time)) {
> > > +                     ret = -ETIMEDOUT;
> > > +                     break;
> > > +             }
> >
> > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > explicitly supports that.
> 
> I have tried but it didn't improve. It was the same as before.

How many times round the loop is it going ?

It might be that by the time readx_poll_timeout_atomic()
gets around to reading the register the fifo is actually ready.

The there is the delay between detecting 'ready' and writing
the next data.
That delay might be cumulative and affect performance.

OTOH spinning waiting for fifo space is just plain horrid.
Reminds me of 3C509 drivers :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Jakub Kicinski March 9, 2022, 12:40 a.m. UTC | #4
On Tue, 8 Mar 2022 23:30:00 +0100 Horatiu Vultur wrote:
> > >  static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > >  {
> > > -     u32 val;
> > > +     unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > +     int ret = 0;
> > >
> > > -     return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > -                                      QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > -                                      READL_SLEEP_US, READL_TIMEOUT_US);
> > > +     while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > +              QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > +             if (time_after(jiffies, time)) {
> > > +                     ret = -ETIMEDOUT;
> > > +                     break;
> > > +             }  
> > 
> > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > explicitly supports that.  
> 
> I have tried but it didn't improve. It was the same as before.

Huh, is ktime_get() super expensive on that platform?
jiffies vs ktime seems to be the main difference?
Horatiu Vultur March 9, 2022, 9:11 a.m. UTC | #5
The 03/08/2022 22:46, David Laight wrote:
> 
> From: Horatiu Vultur
> > Sent: 08 March 2022 22:30
> >
> > The 03/08/2022 22:36, Andrew Lunn wrote:
> > >
> > > >  static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > >  {
> > > > -     u32 val;
> > > > +     unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > > +     int ret = 0;
> > > >
> > > > -     return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > > -                                      QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > > -                                      READL_SLEEP_US, READL_TIMEOUT_US);
> > > > +     while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > > +              QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > > +             if (time_after(jiffies, time)) {
> > > > +                     ret = -ETIMEDOUT;
> > > > +                     break;
> > > > +             }
> > >
> > > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > > explicitly supports that.
> >
> > I have tried but it didn't improve. It was the same as before.
> 
> How many times round the loop is it going ?

In the tests that I have done, I have never seen entering in the loop.

> 
> It might be that by the time readx_poll_timeout_atomic()
> gets around to reading the register the fifo is actually ready.
> 
> The there is the delay between detecting 'ready' and writing
> the next data.
> That delay might be cumulative and affect performance.

There is also a small delay between writting the word and checking if
more words can be written.

> 
> OTOH spinning waiting for fifo space is just plain horrid.

Yes, I agree with you.

> Reminds me of 3C509 drivers :-)

> 
>         David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
Horatiu Vultur March 9, 2022, 9:14 a.m. UTC | #6
The 03/08/2022 16:40, Jakub Kicinski wrote:
> 
> On Tue, 8 Mar 2022 23:30:00 +0100 Horatiu Vultur wrote:
> > > >  static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > >  {
> > > > -     u32 val;
> > > > +     unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > > +     int ret = 0;
> > > >
> > > > -     return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > > -                                      QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > > -                                      READL_SLEEP_US, READL_TIMEOUT_US);
> > > > +     while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > > +              QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > > +             if (time_after(jiffies, time)) {
> > > > +                     ret = -ETIMEDOUT;
> > > > +                     break;
> > > > +             }
> > >
> > > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > > explicitly supports that.
> >
> > I have tried but it didn't improve. It was the same as before.
> 
> Huh, is ktime_get() super expensive on that platform?

Hm.. it looks like. Just adding ktime_get() before the while loop, then
the performance drops like before.
I am using SOC_LAN966 which has an ARMv7 CPU. So I am not sure how
expensive is ktime_get().

> jiffies vs ktime seems to be the main difference?
David Laight March 9, 2022, 9:57 a.m. UTC | #7
From: 'Horatiu Vultur'
> Sent: 09 March 2022 09:11
> 
> The 03/08/2022 22:46, David Laight wrote:
> >
> > From: Horatiu Vultur
> > > Sent: 08 March 2022 22:30
> > >
> > > The 03/08/2022 22:36, Andrew Lunn wrote:
> > > >
> > > > >  static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > > >  {
> > > > > -     u32 val;
> > > > > +     unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > > > +     int ret = 0;
> > > > >
> > > > > -     return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > > > -                                      QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > > > -                                      READL_SLEEP_US, READL_TIMEOUT_US);
> > > > > +     while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > > > +              QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > > > +             if (time_after(jiffies, time)) {
> > > > > +                     ret = -ETIMEDOUT;
> > > > > +                     break;
> > > > > +             }
> > > >
> > > > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > > > explicitly supports that.
> > >
> > > I have tried but it didn't improve. It was the same as before.
> >
> > How many times round the loop is it going ?
> 
> In the tests that I have done, I have never seen entering in the loop.

In which case I'd do an initial status check before even
faffing with 'jiffies'.

It might even be that the status read is so slow that space
is always available by the time it is processed.
PCIe reads can be horribly slow.
Into our fgpa they end up being slower than old ISA bus cycles.
Probably several thousand cpu clocks.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Horatiu Vultur March 9, 2022, 12:16 p.m. UTC | #8
The 03/09/2022 09:57, David Laight wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> From: 'Horatiu Vultur'
> > Sent: 09 March 2022 09:11
> >
> > The 03/08/2022 22:46, David Laight wrote:
> > >
> > > From: Horatiu Vultur
> > > > Sent: 08 March 2022 22:30
> > > >
> > > > The 03/08/2022 22:36, Andrew Lunn wrote:
> > > > >
> > > > > >  static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > > > >  {
> > > > > > -     u32 val;
> > > > > > +     unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > > > > +     int ret = 0;
> > > > > >
> > > > > > -     return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > > > > -                                      QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > > > > -                                      READL_SLEEP_US, READL_TIMEOUT_US);
> > > > > > +     while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > > > > +              QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > > > > +             if (time_after(jiffies, time)) {
> > > > > > +                     ret = -ETIMEDOUT;
> > > > > > +                     break;
> > > > > > +             }
> > > > >
> > > > > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > > > > explicitly supports that.
> > > >
> > > > I have tried but it didn't improve. It was the same as before.
> > >
> > > How many times round the loop is it going ?
> >
> > In the tests that I have done, I have never seen entering in the loop.
> 
> In which case I'd do an initial status check before even
> faffing with 'jiffies'.

That is a good idea. I will do this change in the next version.

> 
> It might even be that the status read is so slow that space
> is always available by the time it is processed.
> PCIe reads can be horribly slow.
> Into our fgpa they end up being slower than old ISA bus cycles.
> Probably several thousand cpu clocks.
> 
>         David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
Andrew Lunn March 9, 2022, 1:11 p.m. UTC | #9
On Tue, Mar 08, 2022 at 11:30:00PM +0100, Horatiu Vultur wrote:
> The 03/08/2022 22:36, Andrew Lunn wrote:
> > 
> > >  static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > >  {
> > > -     u32 val;
> > > +     unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > +     int ret = 0;
> > >
> > > -     return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > -                                      QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > -                                      READL_SLEEP_US, READL_TIMEOUT_US);
> > > +     while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > +              QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > +             if (time_after(jiffies, time)) {
> > > +                     ret = -ETIMEDOUT;
> > > +                     break;
> > > +             }
> > 
> > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > explicitly supports that.
> 
> I have tried but it didn't improve. It was the same as before.

The reason i recommend ipoll.h is that your implementation has the
usual bug, which iopoll does not have. Since you are using _atomic()
it is less of an issue, but it still exists.

     while (!(lan_rd(lan966x, QS_INJ_STATUS) &
              QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {

Say you take an interrupt here

             if (time_after(jiffies, time)) {
                     ret = -ETIMEDOUT;
                     break;
             }


The interrupt takes a while, so that by the time you get to
time_after(), you have reached your timeout. So -ETIMEDOUT is
returned. But in fact, the hardware has done its thing, and if you
where to read the status the bit would be set, and you should actually
return O.K, not an error.

iopoll does another check of the status before deciding to return
-ETIMEDOUT or O.K.

If you decide to simply check the status directly after the write, i
suggest you then use readx_poll_timeout_atomic() if you do need to
poll.

	Andrew
Horatiu Vultur March 9, 2022, 10:05 p.m. UTC | #10
The 03/09/2022 14:11, Andrew Lunn wrote:
> 
> On Tue, Mar 08, 2022 at 11:30:00PM +0100, Horatiu Vultur wrote:
> > The 03/08/2022 22:36, Andrew Lunn wrote:
> > >
> > > >  static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > >  {
> > > > -     u32 val;
> > > > +     unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > > +     int ret = 0;
> > > >
> > > > -     return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > > -                                      QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > > -                                      READL_SLEEP_US, READL_TIMEOUT_US);
> > > > +     while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > > +              QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > > +             if (time_after(jiffies, time)) {
> > > > +                     ret = -ETIMEDOUT;
> > > > +                     break;
> > > > +             }
> > >
> > > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > > explicitly supports that.
> >
> > I have tried but it didn't improve. It was the same as before.
> 
> The reason i recommend ipoll.h is that your implementation has the
> usual bug, which iopoll does not have. Since you are using _atomic()
> it is less of an issue, but it still exists.
> 
>      while (!(lan_rd(lan966x, QS_INJ_STATUS) &
>               QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> 
> Say you take an interrupt here
> 
>              if (time_after(jiffies, time)) {
>                      ret = -ETIMEDOUT;
>                      break;
>              }
> 
> 
> The interrupt takes a while, so that by the time you get to
> time_after(), you have reached your timeout. So -ETIMEDOUT is
> returned. But in fact, the hardware has done its thing, and if you
> where to read the status the bit would be set, and you should actually
> return O.K, not an error.

That is a good catch and really nice explanation!
Then if I add also the check at the end, then it should be also OK.

> 
> iopoll does another check of the status before deciding to return
> -ETIMEDOUT or O.K.
> 
> If you decide to simply check the status directly after the write, i
> suggest you then use readx_poll_timeout_atomic() if you do need to
> poll.
> 
>         Andrew
Andrew Lunn March 10, 2022, 12:37 a.m. UTC | #11
On Wed, Mar 09, 2022 at 11:05:16PM +0100, Horatiu Vultur wrote:
> The 03/09/2022 14:11, Andrew Lunn wrote:
> > 
> > On Tue, Mar 08, 2022 at 11:30:00PM +0100, Horatiu Vultur wrote:
> > > The 03/08/2022 22:36, Andrew Lunn wrote:
> > > >
> > > > >  static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
> > > > >  {
> > > > > -     u32 val;
> > > > > +     unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
> > > > > +     int ret = 0;
> > > > >
> > > > > -     return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
> > > > > -                                      QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
> > > > > -                                      READL_SLEEP_US, READL_TIMEOUT_US);
> > > > > +     while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> > > > > +              QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > > > > +             if (time_after(jiffies, time)) {
> > > > > +                     ret = -ETIMEDOUT;
> > > > > +                     break;
> > > > > +             }
> > > >
> > > > Did you try setting READL_SLEEP_US to 0? readx_poll_timeout_atomic()
> > > > explicitly supports that.
> > >
> > > I have tried but it didn't improve. It was the same as before.
> > 
> > The reason i recommend ipoll.h is that your implementation has the
> > usual bug, which iopoll does not have. Since you are using _atomic()
> > it is less of an issue, but it still exists.
> > 
> >      while (!(lan_rd(lan966x, QS_INJ_STATUS) &
> >               QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
> > 
> > Say you take an interrupt here
> > 
> >              if (time_after(jiffies, time)) {
> >                      ret = -ETIMEDOUT;
> >                      break;
> >              }
> > 
> > 
> > The interrupt takes a while, so that by the time you get to
> > time_after(), you have reached your timeout. So -ETIMEDOUT is
> > returned. But in fact, the hardware has done its thing, and if you
> > where to read the status the bit would be set, and you should actually
> > return O.K, not an error.
> 
> That is a good catch and really nice explanation!
> Then if I add also the check at the end, then it should be also OK.

You are then just repeating code which is already in the core. That is
generally not liked. If you find reading the status once works 99% of
the time, then i suggest you call readx_poll_timeout_atomic() when the
status does indicate you need to poll.

       Andrew
diff mbox series

Patch

diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 81c01665d01e..f6cef29b9d36 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -176,18 +176,20 @@  static int lan966x_port_stop(struct net_device *dev)
 	return 0;
 }
 
-static int lan966x_port_inj_status(struct lan966x *lan966x)
-{
-	return lan_rd(lan966x, QS_INJ_STATUS);
-}
-
 static int lan966x_port_inj_ready(struct lan966x *lan966x, u8 grp)
 {
-	u32 val;
+	unsigned long time = jiffies + usecs_to_jiffies(READL_TIMEOUT_US);
+	int ret = 0;
 
-	return readx_poll_timeout_atomic(lan966x_port_inj_status, lan966x, val,
-					 QS_INJ_STATUS_FIFO_RDY_GET(val) & BIT(grp),
-					 READL_SLEEP_US, READL_TIMEOUT_US);
+	while (!(lan_rd(lan966x, QS_INJ_STATUS) &
+		 QS_INJ_STATUS_FIFO_RDY_SET(BIT(grp)))) {
+		if (time_after(jiffies, time)) {
+			ret = -ETIMEDOUT;
+			break;
+		}
+	}
+
+	return ret;
 }
 
 static int lan966x_port_ifh_xmit(struct sk_buff *skb,