
[net-next] phy: micrel: ksz8041nl: do not use power down mode

Message ID 20211018094256.70096-1-francesco.dolcini@toradex.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [net-next] phy: micrel: ksz8041nl: do not use power down mode

Checks

Context                         Check    Description
netdev/cover_letter             success  Single patches do not need cover letters
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/patch_count              success  Link
netdev/tree_selection           success  Clearly marked for net-next
netdev/subject_prefix           success  Link
netdev/cc_maintainers           success  CCed 6 of 6 maintainers
netdev/source_inline            success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/module_param             success  Was 0 now: 0
netdev/build_32bit              fail     Errors and warnings before: 4 this patch: 4
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/verify_fixes             success  No Fixes tag
netdev/checkpatch               warning  WARNING: 'continous' may be misspelled - perhaps 'continuous'? WARNING: 'continously' may be misspelled - perhaps 'continuously'?
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/header_inline            success  No static functions without inline keyword in header files

Commit Message

Francesco Dolcini Oct. 18, 2021, 9:42 a.m. UTC
From: Stefan Agner <stefan@agner.ch>

Some Micrel KSZ8041NL PHY chips exhibit continuous RX errors after using
the power down mode bit (0.11). If the PHY is taken out of power down
mode in a certain temperature range, the PHY enters a weird state which
leads to continuously reporting RX errors. In that state, the MAC is not
able to receive or send any Ethernet frames and the activity LED is
constantly blinking. Since Linux is using the suspend callback when the
interface is taken down, ending up in that state can easily happen
during a normal startup.

Micrel confirmed the issue in errata DS80000700A [*], caused by abnormal
clock recovery when using power down mode. Even the latest revision (A4,
Revision ID 0x1513) seems to suffer from that problem, and according to the
errata it is not going to be fixed.

Remove the suspend/resume callback to avoid using the power down mode
completely.

[*] https://ww1.microchip.com/downloads/en/DeviceDoc/80000700A.pdf

Signed-off-by: Stefan Agner <stefan@agner.ch>
Acked-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>

---
There was a previous attempt to merge a similar patch, see
https://lore.kernel.org/all/2ee9441d-1b3b-de6d-691d-b615c04c69d0@gmail.com/.
---
 drivers/net/phy/micrel.c | 2 --
 1 file changed, 2 deletions(-)
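
For readers less familiar with the notation, "bit 0.11" in the commit message is bit 11 of MII register 0, the basic mode control register (BMCR). The following is a minimal userspace sketch, not kernel code, of what generic suspend/resume callbacks such as genphy_suspend() and genphy_resume() essentially do to that bit. The register number and bit value are the standard MII definitions; struct mock_phy and all mock_* helpers are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard MII definitions (same values as in the kernel's mii.h). */
#define MII_BMCR    0x00    /* Basic mode control register, register 0 */
#define BMCR_PDOWN  0x0800  /* Bit 11: power down ("0.11" in the commit log) */

/* Hypothetical stand-in for a PHY: a bank of 32 16-bit MII registers. */
struct mock_phy {
	uint16_t regs[32];
};

/* Read-modify-write helper, a simplified analogue of a phy_modify()-style call. */
static void mock_phy_modify(struct mock_phy *phy, unsigned int reg,
			    uint16_t mask, uint16_t set)
{
	assert(reg < 32);
	phy->regs[reg] = (uint16_t)((phy->regs[reg] & ~mask) | set);
}

/* What a generic suspend callback boils down to: set bit 0.11. */
static void mock_suspend(struct mock_phy *phy)
{
	mock_phy_modify(phy, MII_BMCR, BMCR_PDOWN, BMCR_PDOWN);
}

/* What a generic resume callback boils down to: clear bit 0.11. */
static void mock_resume(struct mock_phy *phy)
{
	mock_phy_modify(phy, MII_BMCR, BMCR_PDOWN, 0);
}

/* Report whether the mock PHY currently has the power down bit set. */
static bool mock_is_powered_down(const struct mock_phy *phy)
{
	return (phy->regs[MII_BMCR] & BMCR_PDOWN) != 0;
}
```

Calling mock_suspend() followed by mock_resume() on a mock PHY toggles only bit 11 and leaves the other BMCR bits untouched. According to the errata cited above, it is the transition out of this power down state that can leave the real chip reporting RX errors, which is why the patch drops both callbacks instead of changing how the bit is written.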

Comments

Christophe Leroy Oct. 18, 2021, 9:53 a.m. UTC | #1
Le 18/10/2021 à 11:42, Francesco Dolcini a écrit :
> From: Stefan Agner <stefan@agner.ch>
> 
> Some Micrel KSZ8041NL PHY chips exhibit continuous RX errors after using
> the power down mode bit (0.11). If the PHY is taken out of power down
> mode in a certain temperature range, the PHY enters a weird state which
> leads to continuously reporting RX errors. In that state, the MAC is not
> able to receive or send any Ethernet frames and the activity LED is
> constantly blinking. Since Linux is using the suspend callback when the
> interface is taken down, ending up in that state can easily happen
> during a normal startup.
> 
> Micrel confirmed the issue in errata DS80000700A [*], caused by abnormal
> clock recovery when using power down mode. Even the latest revision (A4,
> Revision ID 0x1513) seems to suffer that problem, and according to the
> errata is not going to be fixed.
> 
> Remove the suspend/resume callback to avoid using the power down mode
> completely.

As far as I can see in the errata, the KSZ8041RNLI also has the bug.
Shouldn't you also remove suspend/resume from that entry (the one that
follows in ksphy_driver[])?

Christophe

> 
> [*] https://ww1.microchip.com/downloads/en/DeviceDoc/80000700A.pdf
> 
> Signed-off-by: Stefan Agner <stefan@agner.ch>
> Acked-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
> 
> ---
> There was a previous attempt to merge a similar patch, see
> https://lore.kernel.org/all/2ee9441d-1b3b-de6d-691d-b615c04c69d0@gmail.com/.
> ---
>   drivers/net/phy/micrel.c | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
> index ff452669130a..1f28d5fae677 100644
> --- a/drivers/net/phy/micrel.c
> +++ b/drivers/net/phy/micrel.c
> @@ -1676,8 +1676,6 @@ static struct phy_driver ksphy_driver[] = {
>   	.get_sset_count = kszphy_get_sset_count,
>   	.get_strings	= kszphy_get_strings,
>   	.get_stats	= kszphy_get_stats,
> -	.suspend	= genphy_suspend,
> -	.resume		= genphy_resume,
>   }, {
>   	.phy_id		= PHY_ID_KSZ8041RNLI,
>   	.phy_id_mask	= MICREL_PHY_ID_MASK,
>
Francesco Dolcini Oct. 18, 2021, 10:18 a.m. UTC | #2
Hello Christophe,

On Mon, Oct 18, 2021 at 11:53:03AM +0200, Christophe Leroy wrote:
> 
> 
> Le 18/10/2021 à 11:42, Francesco Dolcini a écrit :
> > From: Stefan Agner <stefan@agner.ch>
> > 
> > Some Micrel KSZ8041NL PHY chips exhibit continuous RX errors after using
> > the power down mode bit (0.11). If the PHY is taken out of power down
> > mode in a certain temperature range, the PHY enters a weird state which
> > leads to continuously reporting RX errors. In that state, the MAC is not
> > able to receive or send any Ethernet frames and the activity LED is
> > constantly blinking. Since Linux is using the suspend callback when the
> > interface is taken down, ending up in that state can easily happen
> > during a normal startup.
> > 
> > Micrel confirmed the issue in errata DS80000700A [*], caused by abnormal
> > clock recovery when using power down mode. Even the latest revision (A4,
> > Revision ID 0x1513) seems to suffer that problem, and according to the
> > errata is not going to be fixed.
> > 
> > Remove the suspend/resume callback to avoid using the power down mode
> > completely.
> 
> As far as I can see in the ERRATA, KSZ8041 RNLI also has the bug.
> Shouldn't you also remove the suspend/resume on that one (which follows in
> ksphy_driver[])

Yes, I could. However, this patch comes out of a real issue we had with the
KSZ8041NL with this specific PHY ID (and we have carried such a patch in our
Linux branch for years).

On the other hand, the entry for the KSZ8041RNLI in the driver is somewhat odd,
since the PHY ID from the original commit does not even exist in the
datasheet. Would you be confident applying this erratum to that PHY ID
without having a way of testing it?

Francesco
Christophe Leroy Oct. 18, 2021, 10:46 a.m. UTC | #3
Le 18/10/2021 à 12:18, Francesco Dolcini a écrit :
> Hello Christophe,
> 
> On Mon, Oct 18, 2021 at 11:53:03AM +0200, Christophe Leroy wrote:
>>
>>
>> Le 18/10/2021 à 11:42, Francesco Dolcini a écrit :
>>> From: Stefan Agner <stefan@agner.ch>
>>>
>>> Some Micrel KSZ8041NL PHY chips exhibit continuous RX errors after using
>>> the power down mode bit (0.11). If the PHY is taken out of power down
>>> mode in a certain temperature range, the PHY enters a weird state which
>>> leads to continuously reporting RX errors. In that state, the MAC is not
>>> able to receive or send any Ethernet frames and the activity LED is
>>> constantly blinking. Since Linux is using the suspend callback when the
>>> interface is taken down, ending up in that state can easily happen
>>> during a normal startup.
>>>
>>> Micrel confirmed the issue in errata DS80000700A [*], caused by abnormal
>>> clock recovery when using power down mode. Even the latest revision (A4,
>>> Revision ID 0x1513) seems to suffer that problem, and according to the
>>> errata is not going to be fixed.
>>>
>>> Remove the suspend/resume callback to avoid using the power down mode
>>> completely.
>>
>> As far as I can see in the ERRATA, KSZ8041 RNLI also has the bug.
>> Shouldn't you also remove the suspend/resume on that one (which follows in
>> ksphy_driver[])
> 
> Yes, I could, however this patch is coming out of a real issue we had with
> KSZ8041NL with this specific phy id (and we have such a patch in our linux
> branch since years).
> 
> On the other hand the entry for KSZ8041RNLI in the driver is somehow weird,
> since the phy id according to the original commit does not even exists on
> the datasheet. Would you be confident applying such errata for that phyid
> without having a way of testing it?


If your patch were adding the suspend/resume capability I would agree
with you, but here we are talking about removing it, so what risk are we
taking?

In addition, commit 4bd7b5127bd0 ("micrel: add support for KSZ8041RNLI")
clearly states that the only thing it did was copy the KSZ8041NL entry, so
to me updating both entries would really make sense.

It looks odd to me that your commit log refers to an errata document that
says the bug also exists on the KSZ8041RNLI, yet you apply it only
partly.

Christophe
Francesco Dolcini Oct. 18, 2021, 11:27 a.m. UTC | #4
On Mon, Oct 18, 2021 at 12:46:14PM +0200, Christophe Leroy wrote:
> 
> 
> Le 18/10/2021 à 12:18, Francesco Dolcini a écrit :
> > Hello Christophe,
> > 
> > On Mon, Oct 18, 2021 at 11:53:03AM +0200, Christophe Leroy wrote:
> > > 
> > > 
> > > Le 18/10/2021 à 11:42, Francesco Dolcini a écrit :
> > > > From: Stefan Agner <stefan@agner.ch>
> > > > 
> > > > Some Micrel KSZ8041NL PHY chips exhibit continuous RX errors after using
> > > > the power down mode bit (0.11). If the PHY is taken out of power down
> > > > mode in a certain temperature range, the PHY enters a weird state which
> > > > leads to continuously reporting RX errors. In that state, the MAC is not
> > > > able to receive or send any Ethernet frames and the activity LED is
> > > > constantly blinking. Since Linux is using the suspend callback when the
> > > > interface is taken down, ending up in that state can easily happen
> > > > during a normal startup.
> > > > 
> > > > Micrel confirmed the issue in errata DS80000700A [*], caused by abnormal
> > > > clock recovery when using power down mode. Even the latest revision (A4,
> > > > Revision ID 0x1513) seems to suffer that problem, and according to the
> > > > errata is not going to be fixed.
> > > > 
> > > > Remove the suspend/resume callback to avoid using the power down mode
> > > > completely.
> > > 
> > > As far as I can see in the ERRATA, KSZ8041 RNLI also has the bug.
> > > Shouldn't you also remove the suspend/resume on that one (which follows in
> > > ksphy_driver[])
> > 
> > Yes, I could, however this patch is coming out of a real issue we had with
> > KSZ8041NL with this specific phy id (and we have such a patch in our linux
> > branch since years).
> > 
> > On the other hand the entry for KSZ8041RNLI in the driver is somehow weird,
> > since the phy id according to the original commit does not even exists on
> > the datasheet. Would you be confident applying such errata for that phyid
> > without having a way of testing it?
> 
> 
> If your patch was to add the suspend/resume capability I would agree with
> you, but here we are talking about removing it, so what risk are we taking ?
Yes, you are right.

> In addition, commit 4bd7b5127bd0 ("micrel: add support for KSZ8041RNLI")
> clearly tells that the only thing it did was to copy KSZ8041NL entry, so for
> me updating both entries would really make sense.
> 
> It looks odd to me that you refer in your commit log to an ERRATA that tells
> you that the bug also exists on the KSZ8041RNLI and you apply it only
> partly.

I think I was not clear enough: the entry I changed should already cover the
KSZ8041RNLI, since the PHY ID is supposed to be the same according to the
datasheet. This entry for the KSZ8041RNLI seems really special, with its
undocumented PHY ID.
But I'm just speculating; I do not have access to this hardware.

That said, if there are no concerns from anybody else, to be on the safe/cautious
side, I can also update this entry.

Francesco
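
As a rough illustration of the point being debated here: a phy_driver entry applies to a chip when the ID read from the hardware equals the entry's phy_id under phy_id_mask. The sketch below is plain userspace C. The three constants are quoted from memory of include/linux/micrel_phy.h and should be checked against the tree, and the full 32-bit ID 0x00221513 assumes that the "Revision ID 0x1513" from the commit message is the low half of the ID with 0x0022 above it. phy_id_matches() is a hypothetical helper, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed values, recalled from include/linux/micrel_phy.h.
 * Verify them against your kernel tree before relying on them.
 */
#define MICREL_PHY_ID_MASK   0x00fffff0u
#define PHY_ID_KSZ8041       0x00221510u
#define PHY_ID_KSZ8041RNLI   0x00221537u

/*
 * Hypothetical helper: mask-based ID comparison in the style used to
 * bind a PHY driver entry to a device.
 */
static bool phy_id_matches(uint32_t hw_id, uint32_t drv_id, uint32_t mask)
{
	return (hw_id & mask) == (drv_id & mask);
}
```

Under these assumptions, a chip reporting 0x00221513 is handled by the KSZ8041 entry and not by the KSZ8041RNLI entry, which only matches IDs of the form 0x0022153x. That is consistent with the observation that the second entry would only ever apply to parts reporting the ID that is not documented in the datasheet.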
Christophe Leroy Oct. 18, 2021, 11:57 a.m. UTC | #5
+Sergei Shtylyov

Adding Sergei Shtylyov to the discussion, as he submitted the patch adding
support for the KSZ8041RNLI.


Le 18/10/2021 à 13:27, Francesco Dolcini a écrit :
> On Mon, Oct 18, 2021 at 12:46:14PM +0200, Christophe Leroy wrote:
>>
>>
>> Le 18/10/2021 à 12:18, Francesco Dolcini a écrit :
>>> Hello Christophe,
>>>
>>> On Mon, Oct 18, 2021 at 11:53:03AM +0200, Christophe Leroy wrote:
>>>>
>>>>
>>>> Le 18/10/2021 à 11:42, Francesco Dolcini a écrit :
>>>>> From: Stefan Agner <stefan@agner.ch>
>>>>>
>>>>> Some Micrel KSZ8041NL PHY chips exhibit continuous RX errors after using
>>>>> the power down mode bit (0.11). If the PHY is taken out of power down
>>>>> mode in a certain temperature range, the PHY enters a weird state which
>>>>> leads to continuously reporting RX errors. In that state, the MAC is not
>>>>> able to receive or send any Ethernet frames and the activity LED is
>>>>> constantly blinking. Since Linux is using the suspend callback when the
>>>>> interface is taken down, ending up in that state can easily happen
>>>>> during a normal startup.
>>>>>
>>>>> Micrel confirmed the issue in errata DS80000700A [*], caused by abnormal
>>>>> clock recovery when using power down mode. Even the latest revision (A4,
>>>>> Revision ID 0x1513) seems to suffer that problem, and according to the
>>>>> errata is not going to be fixed.
>>>>>
>>>>> Remove the suspend/resume callback to avoid using the power down mode
>>>>> completely.
>>>>
>>>> As far as I can see in the ERRATA, KSZ8041 RNLI also has the bug.
>>>> Shouldn't you also remove the suspend/resume on that one (which follows in
>>>> ksphy_driver[])
>>>
>>> Yes, I could, however this patch is coming out of a real issue we had with
>>> KSZ8041NL with this specific phy id (and we have such a patch in our linux
>>> branch since years).
>>>
>>> On the other hand the entry for KSZ8041RNLI in the driver is somehow weird,
>>> since the phy id according to the original commit does not even exists on
>>> the datasheet. Would you be confident applying such errata for that phyid
>>> without having a way of testing it?
>>
>>
>> If your patch was to add the suspend/resume capability I would agree with
>> you, but here we are talking about removing it, so what risk are we taking ?
> yes, you are right.
> 
>> In addition, commit 4bd7b5127bd0 ("micrel: add support for KSZ8041RNLI")
>> clearly tells that the only thing it did was to copy KSZ8041NL entry, so for
>> me updating both entries would really make sense.
>>
>> It looks odd to me that you refer in your commit log to an ERRATA that tells
>> you that the bug also exists on the KSZ8041RNLI and you apply it only
>> partly.
> 
> I think I was not clear enough, the entry I changed should already cover
> KSZ8041RNLI, the phyid is supposed to be just the same according to the
> datasheet. This entry for KSZ8041RNLI seems really special with this
> un-documented phyid.
> But I'm just speculating, I do not have access to these hardware.
> 
> Said that if there are no concern from anybody else, to be on the safe/cautious
> side, I can just update also this entry.
> 
> Francesco
>
Jakub Kicinski Oct. 18, 2021, 4:52 p.m. UTC | #6
On Mon, 18 Oct 2021 11:42:58 +0200 Francesco Dolcini wrote:
> From: Stefan Agner <stefan@agner.ch>
> 
> Some Micrel KSZ8041NL PHY chips exhibit continuous RX errors after using
> the power down mode bit (0.11). If the PHY is taken out of power down
> mode in a certain temperature range, the PHY enters a weird state which
> leads to continuously reporting RX errors. In that state, the MAC is not
> able to receive or send any Ethernet frames and the activity LED is
> constantly blinking. Since Linux is using the suspend callback when the
> interface is taken down, ending up in that state can easily happen
> during a normal startup.
> 
> Micrel confirmed the issue in errata DS80000700A [*], caused by abnormal
> clock recovery when using power down mode. Even the latest revision (A4,
> Revision ID 0x1513) seems to suffer that problem, and according to the
> errata is not going to be fixed.
> 
> Remove the suspend/resume callback to avoid using the power down mode
> completely.
> 
> [*] https://ww1.microchip.com/downloads/en/DeviceDoc/80000700A.pdf
> 
> Signed-off-by: Stefan Agner <stefan@agner.ch>
> Acked-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>

Is this the correct Fixes tag?

Fixes: 1a5465f5d6a2 ("phy/micrel: Add suspend/resume support to Micrel PHYs")

Should we leave a comment in place of the callbacks referring 
to the errata?
Francesco Dolcini Oct. 18, 2021, 5:16 p.m. UTC | #7
Hello Jakub,

On Mon, Oct 18, 2021 at 09:52:49AM -0700, Jakub Kicinski wrote:
> On Mon, 18 Oct 2021 11:42:58 +0200 Francesco Dolcini wrote:
> > From: Stefan Agner <stefan@agner.ch>
> > 
> > Some Micrel KSZ8041NL PHY chips exhibit continuous RX errors after using
> > the power down mode bit (0.11). If the PHY is taken out of power down
> > mode in a certain temperature range, the PHY enters a weird state which
> > leads to continuously reporting RX errors. In that state, the MAC is not
> > able to receive or send any Ethernet frames and the activity LED is
> > constantly blinking. Since Linux is using the suspend callback when the
> > interface is taken down, ending up in that state can easily happen
> > during a normal startup.
> > 
> > Micrel confirmed the issue in errata DS80000700A [*], caused by abnormal
> > clock recovery when using power down mode. Even the latest revision (A4,
> > Revision ID 0x1513) seems to suffer that problem, and according to the
> > errata is not going to be fixed.
> > 
> > Remove the suspend/resume callback to avoid using the power down mode
> > completely.
> > 
> > [*] https://ww1.microchip.com/downloads/en/DeviceDoc/80000700A.pdf
> > 
> > Signed-off-by: Stefan Agner <stefan@agner.ch>
> > Acked-by: Marcel Ziswiler <marcel.ziswiler@toradex.com>
> > Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
> 
> Is this the correct fixes tag?
> 
> Fixes: 1a5465f5d6a2 ("phy/micrel: Add suspend/resume support to Micrel PHYs")
The errata is from 2016, while this commit is from 2013; isn't that weird?
Apart from that, I can add the Fixes tag. Should we also send this to stable?

> Should we leave a comment in place of the callbacks referring 
> to the errata?
I think it is a good idea, I'll add it.

Francesco
Jakub Kicinski Oct. 18, 2021, 5:27 p.m. UTC | #8
On Mon, 18 Oct 2021 19:16:21 +0200 Francesco Dolcini wrote:
> > Fixes: 1a5465f5d6a2 ("phy/micrel: Add suspend/resume support to Micrel PHYs")  
> The errata is from 2016, while this commit is from 2013, weird?
> Apart of that I can add the Fixes tag, should we send this also to stable?

I'd lean towards sending it to stable, yes.

> > Should we leave a comment in place of the callbacks referring 
> > to the errata?  
> I think is a good idea, I'll add it.
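
One possible shape for the comment suggested above, shown as a self-contained userspace sketch and not as the actual follow-up patch. struct mock_phy_driver is a hypothetical, heavily reduced stand-in for the kernel's struct phy_driver, the driver name is illustrative, and mock_core_suspend() only mimics the idea that a core which finds no suspend callback skips the power down step.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the kernel's struct phy_driver. */
struct mock_phy_driver {
	const char *name;
	int (*suspend)(void *phydev);
	int (*resume)(void *phydev);
};

static const struct mock_phy_driver mock_ksz8041 = {
	.name = "Micrel KSZ8041",
	/*
	 * No suspend/resume callbacks on purpose: errata DS80000700A
	 * describes RX errors after leaving power down mode (bit 0.11),
	 * so this PHY is never put into that mode.
	 */
};

/* A core that honours missing callbacks simply skips the power down step. */
static int mock_core_suspend(const struct mock_phy_driver *drv, void *phydev)
{
	if (!drv->suspend)
		return 0;	/* nothing to do, the PHY stays powered */
	return drv->suspend(phydev);
}
```

Leaving the two members unset keeps them NULL through the designated initializer, so the comment is the only trace of why the callbacks are missing. That is the information a later reader would otherwise have to recover from the commit log.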

Patch

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index ff452669130a..1f28d5fae677 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -1676,8 +1676,6 @@  static struct phy_driver ksphy_driver[] = {
 	.get_sset_count = kszphy_get_sset_count,
 	.get_strings	= kszphy_get_strings,
 	.get_stats	= kszphy_get_stats,
-	.suspend	= genphy_suspend,
-	.resume		= genphy_resume,
 }, {
 	.phy_id		= PHY_ID_KSZ8041RNLI,
 	.phy_id_mask	= MICREL_PHY_ID_MASK,