mbox series

[0/3] net: microchip_t1s: additional phy support and collision detect handling

Message ID 20231127104045.96722-1-ramon.nordin.rodriguez@ferroamp.se (mailing list archive)
Headers show
Series net: microchip_t1s: additional phy support and collision detect handling | expand

Message

Ramón Nordin Rodriguez Nov. 27, 2023, 10:40 a.m. UTC
Hi,

This patch series adds support for LAN867x Rev.C and sets the collision
enable bit conditionally on plca enable for LAN865X and LAN867X phys.

The addition of Rev.C support is pretty straight forward (but weird), it 
follows the recommended magic and unexplained steps in Microchip
Application Note 1699.

The second change, conditionally setting collision detection has a
significant impact on link performance. Having both PLCA and collision
detection enabled led to a lot of dropped packets in the setup I've been
developing on.
The datasheets recommends disabling collision detect when PLCA is
enabled.
Took me a couple of weeks to find said footnote, hoping this series can
save the next dev some headache.

Following is a brief description of my test/eval setup:

LAN867x Rev.C
This was tested with Microchips usb eval board, the only testing I did
was that I got the driver attached and got a link.
No performance/stress testing has been performed.

Collision detection
This has been tested on a setup where one ARM system with a LAN8650
mac-phy is daisy-chained to 8 mcus using lan8670 phys. Without the patch we
were limited to really short cables, about 1m per node, but we were
still getting a lot of connection drops.
With the patch we could increase the total cable length to at least 40M.
Electrical properties play in here, both cable types and how the
termination is designed, so these results might not be reproducible in
another setup.
The only testing that has been performed has been a best effort of
estimating dropped packets on the linux node, with low frequency traffic.

Ramón Nordin Rodriguez (3):
  net: microchip_t1s: refactor reset functionality
  net: microchip_t1s: add support for LAN867x Rev.C1
  net: microchip_t1s: conditional collision detection

 drivers/net/phy/microchip_t1s.c | 135 +++++++++++++++++++++++++++++++-
 1 file changed, 132 insertions(+), 3 deletions(-)

Comments

Andrew Lunn Nov. 27, 2023, 1:58 p.m. UTC | #1
> Collision detection
> This has been tested on a setup where one ARM system with a LAN8650
> mac-phy is daisy-chained to 8 mcus using lan8670 phys. Without the patch we
> were limited to really short cables, about 1m per node, but we were
> still getting a lot of connection drops.
> With the patch we could increase the total cable length to at least 40M.

Did you do any testing of collision detection enabled, PLCA disabled?

You say you think this is noise related. But the noise should be the
same with or without PLCA. I'm just thinking maybe collision detection
is just plain broken and should always be disabled?

I've not read much about T1S, but if we assume it is doing good old
fashioned CSMA/CD, with short cables the CS bit works well and the CD
is less important. CD was needed when you have 1000m cable, and you
can fit 64 bytes on the 1000m cable. So always turning of CD might be
appropriate.

	Andrew
Ramón Nordin Rodriguez Nov. 27, 2023, 3:30 p.m. UTC | #2
On Mon, Nov 27, 2023 at 02:58:32PM +0100, Andrew Lunn wrote:
> > Collision detection
> > This has been tested on a setup where one ARM system with a LAN8650
> > mac-phy is daisy-chained to 8 mcus using lan8670 phys. Without the patch we
> > were limited to really short cables, about 1m per node, but we were
> > still getting a lot of connection drops.
> > With the patch we could increase the total cable length to at least 40M.
> 
> Did you do any testing of collision detection enabled, PLCA disabled?
> 

In our dev system we've only tested with PLCA enabled, bit too tricky
changing internals on the microcontrollers.
But I have a lot of usb eval dongles that I can test with.

> You say you think this is noise related. But the noise should be the
> same with or without PLCA. I'm just thinking maybe collision detection
> is just plain broken and should always be disabled?
> 

I don't have access to the equipment to measure noise or reflections,
I've looked at the link with an oscilloscope and it looked fine to me.
The reason I'm mentioning noise is just me parroting the datasheet, for
context I'll quote the footnote here

"No physical collisions will occur when all nodes in a mixing segment are properly
configured for PLCA operation. As a result, for improved performance in high noise
environments where false collisions may be detected leading to dropped packets, it is
recommended that the user write this bit to a ‘0’ to disable collision detection when PLCA
is enabled. When collision detection is disabled, the PLCA reconciliation sublayer will still
assert logical collisions to the MAC as part of normal operation."
LAN8650 datasheet 11.5.51

> I've not read much about T1S, but if we assume it is doing good old
> fashioned CSMA/CD, with short cables the CS bit works well and the CD
> is less important. CD was needed when you have 1000m cable, and you
> can fit 64 bytes on the 1000m cable. So always turning of CD might be
> appropriate.
> 
> 	Andrew

As you assume when PLCA is disabled the phy runs in CSMA/CD mode.

I'll do some tests with both PLCA and CD off/disabled. My thinking is that a
adequate test bench would look like

* 3-4 nodes (depending on how many usb ports and dongles I have)
* run iperf with long cables and CSMA/CD
* run iperf with long cables and CMSA/No CD

I'll report back the results. Anything you'd like to add/focus on with
evaluation?

Ramón
Andrew Lunn Nov. 27, 2023, 4:03 p.m. UTC | #3
> * 3-4 nodes (depending on how many usb ports and dongles I have)
> * run iperf with long cables and CSMA/CD
> * run iperf with long cables and CMSA/No CD
> 
> I'll report back the results. Anything you'd like to add/focus on with
> evaluation?

Humm, thinking about how CSMA/CD works...

Maybe look at what counters the MAC provides. Does it have collisions
and bad FCS? A collision should result in a bad FCS, if you are not
using CD. So if things are working correctly, the count for CD should
move to FCS if you turn CD off. If CD is falsely triggering, FCS as a
% should not really change, but you probably get more frames over the
link?

	Andrew
Ramón Nordin Rodriguez Nov. 28, 2023, 8:57 a.m. UTC | #4
On Mon, Nov 27, 2023 at 05:03:54PM +0100, Andrew Lunn wrote:
> > * 3-4 nodes (depending on how many usb ports and dongles I have)
> > * run iperf with long cables and CSMA/CD
> > * run iperf with long cables and CMSA/No CD
> > 
> > I'll report back the results. Anything you'd like to add/focus on with
> > evaluation?
> 
> Humm, thinking about how CSMA/CD works...
> 
> Maybe look at what counters the MAC provides. Does it have collisions
> and bad FCS? A collision should result in a bad FCS, if you are not
> using CD. So if things are working correctly, the count for CD should
> move to FCS if you turn CD off. If CD is falsely triggering, FCS as a
> % should not really change, but you probably get more frames over the
> link?
> 

That is some really cool input, I have to do some datasheet digging and
hacking. I'll try to set everything up today, tomorrow I can hang in
the lab after hours and test things out!

Partihban suggested that Microchips support might be able to help with
testing, might give them a ping soon as I a solid plan.

Really appreciate the insight! 
R
Ramón Nordin Rodriguez Dec. 10, 2023, 12:10 p.m. UTC | #5
On Mon, Nov 27, 2023 at 05:03:54PM +0100, Andrew Lunn wrote:
> > * 3-4 nodes (depending on how many usb ports and dongles I have)
> > * run iperf with long cables and CSMA/CD
> > * run iperf with long cables and CMSA/No CD
> > 
> > I'll report back the results. Anything you'd like to add/focus on with
> > evaluation?
> 
> Humm, thinking about how CSMA/CD works...
> 
> Maybe look at what counters the MAC provides. Does it have collisions
> and bad FCS? A collision should result in a bad FCS, if you are not
> using CD. So if things are working correctly, the count for CD should
> move to FCS if you turn CD off. If CD is falsely triggering, FCS as a
> % should not really change, but you probably get more frames over the
> link?
> 

# setup

Andrew suggested that I try to get statistics from the MAC, I did some
investigation here but could not figure it out.

Using iperf3
Client: Arm based system running lan865x macphy 
Server: PC running lan867x revB usb dongle


# test results

The results below should be considered fairly represenative but far from
perfect. There was some bounce up and down when rerunning, but these resutls
are an eye-ball average.

No meaningful difference was seen with short (2m) cables or long (12m).

## with collision detection enabled

iperf3 normal
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.54 MBytes  4.65 Mbits/sec    0             sender
[  5]   0.00-10.01  sec  5.40 MBytes  4.53 Mbits/sec                  receiver

iperf3 reverse
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   929 KBytes   761 Kbits/sec  293             sender
[  5]   0.00-10.00  sec   830 KBytes   680 Kbits/sec                  receiver


## with collision detection disabled

iperf3 normal
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  6.39 MBytes  5.36 Mbits/sec    0             sender
[  5]   0.00-10.04  sec  6.19 MBytes  5.17 Mbits/sec                  receiver

iperf3 reverse
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.27  sec  1.10 MBytes   897 Kbits/sec  268             sender
[  5]   0.00-10.00  sec  1.01 MBytes   843 Kbits/sec                  receiver

# Conclusions

The arm system running the lan865x macphy uses a not yet mainlined driver, see
https://lore.kernel.org/all/20231023154649.45931-1-Parthiban.Veerasooran@microchip.com/

The lan865x driver crashed out every once in a while on reverse mode, there
is definetly something biased in the driver for tx over rx.
Then again it's not accepted yet.

Disabling collision detection seemes to have an positive effect.
Slightly higher speeds and slightly fewer retransmissions.

I don't have a black and white result to present, but things seems to work
slightly better with CD disabled, so I'm leaning towards just unconditionally 
disabling it for the lan865x and lan867x phys for the v2 patch.

I'll wait with submitting v2 for a day so anyone interested gets a
chance to weigh in on this.

R
Andrew Lunn Dec. 11, 2023, 1:54 p.m. UTC | #6
> ## with collision detection enabled
> 
> iperf3 normal
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  5.54 MBytes  4.65 Mbits/sec    0             sender
> [  5]   0.00-10.01  sec  5.40 MBytes  4.53 Mbits/sec                  receiver
> 
> iperf3 reverse
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec   929 KBytes   761 Kbits/sec  293             sender
> [  5]   0.00-10.00  sec   830 KBytes   680 Kbits/sec                  receiver
> 
> 
> ## with collision detection disabled
> 
> iperf3 normal
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  6.39 MBytes  5.36 Mbits/sec    0             sender
> [  5]   0.00-10.04  sec  6.19 MBytes  5.17 Mbits/sec                  receiver
> 
> iperf3 reverse
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.27  sec  1.10 MBytes   897 Kbits/sec  268             sender
> [  5]   0.00-10.00  sec  1.01 MBytes   843 Kbits/sec                  receiver
> 
> # Conclusions
> 
> The arm system running the lan865x macphy uses a not yet mainlined driver, see
> https://lore.kernel.org/all/20231023154649.45931-1-Parthiban.Veerasooran@microchip.com/
> 
> The lan865x driver crashed out every once in a while on reverse mode, there
> is definetly something biased in the driver for tx over rx.
>
> Then again it's not accepted yet.
> 
> Disabling collision detection seemes to have an positive effect.
> Slightly higher speeds and slightly fewer retransmissions.

I would want to first understand why there is such a big difference
with the direction. Is it TCP backing off because of the packet loss?
Or is there some other problem.

Maybe try with UDP streams, say with a bandwidth of 5Mbps. Do you
loose 4Mbps in one direction? Or a much smaller number of packets.

Are there any usable statistics? FCS errors?

    Andrew