[v2,1/2] e1000e: let the sleep codes run every time

Message ID	20240503101824.32717-1-en-wei.wu@canonical.com (mailing list archive)
State	Awaiting Upstream
Delegated to:	Netdev Maintainers
Headers	show Received: from smtp-relay-internal-0.canonical.com (smtp-relay-internal-0.canonical.com [185.125.188.122]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 74DD714F9F8 for <netdev@vger.kernel.org>; Fri, 3 May 2024 10:18:40 +0000 (UTC) From: Ricky Wu <en-wei.wu@canonical.com> To: jesse.brandeburg@intel.com Cc: anthony.l.nguyen@intel.com, intel-wired-lan@lists.osuosl.org, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, rickywu0421@gmail.com, en-wei.wu@canonical.com Subject: [PATCH v2 1/2] e1000e: let the sleep codes run every time Date: Fri, 3 May 2024 18:18:24 +0800 Message-Id: <20240503101824.32717-1-en-wei.wu@canonical.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[v2,1/2] e1000e: let the sleep codes run every time \| expand [v2,1/2] e1000e: let the sleep codes run every time [v2,2/2] e1000e: fix link fluctuations problem

Context	Check	Description
netdev/series_format	warning	Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection	success	Guessed tree name to be net-next
netdev/ynl	success	Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present	success	Fixes tag not required for -next series
netdev/header_inline	success	No static functions without inline keyword in header files
netdev/build_32bit	success	Errors and warnings before: 926 this patch: 926
netdev/build_tools	success	No tools touched, skip
netdev/cc_maintainers	success	CCed 7 of 7 maintainers
netdev/build_clang	success	Errors and warnings before: 937 this patch: 937
netdev/verify_signedoff	success	Signed-off-by tag matches author and committer
netdev/deprecated_api	success	None detected
netdev/check_selftest	success	No net selftest shell script
netdev/verify_fixes	success	No Fixes tag
netdev/build_allmodconfig_warn	success	Errors and warnings before: 937 this patch: 937
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 21 lines checked
netdev/build_clang_rust	success	No Rust files in patch. Skipping build
netdev/kdoc	success	Errors and warnings before: 71 this patch: 71
netdev/source_inline	success	Was 0 now: 0

En-Wei Wu May 3, 2024, 10:18 a.m. UTC

Originally, the sleep codes being moved forward only
ran if we met some conditions (e.g. BMSR_LSTATUS bit
not set in phy_status). Moving these sleep codes forward
makes the usec_interval take effect every time.

Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
---

In v2:
* Split the sleep codes into this patch

 drivers/net/ethernet/intel/e1000e/phy.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Jacob Keller May 3, 2024, 6:17 p.m. UTC | #1

> -----Original Message-----
> From: Ricky Wu <en-wei.wu@canonical.com>
> Sent: Friday, May 3, 2024 3:18 AM
> To: Brandeburg, Jesse <jesse.brandeburg@intel.com>
> Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; intel-wired-
> lan@lists.osuosl.org; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; rickywu0421@gmail.com; en-wei.wu@canonical.com
> Subject: [PATCH v2 1/2] e1000e: let the sleep codes run every time
> 
> Originally, the sleep codes being moved forward only
> ran if we met some conditions (e.g. BMSR_LSTATUS bit
> not set in phy_status). Moving these sleep codes forward
> makes the usec_interval take effect every time.
> 
> Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>

What's the goal here? Do we need to sleep before initially checking the state?

Sasha Neftin May 6, 2024, 3:53 p.m. UTC | #2

On 03/05/2024 13:18, Ricky Wu wrote:
> Originally, the sleep codes being moved forward only
> ran if we met some conditions (e.g. BMSR_LSTATUS bit
> not set in phy_status). Moving these sleep codes forward
> makes the usec_interval take effect every time.
> 
> Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
> ---
> 
> In v2:
> * Split the sleep codes into this patch
> 
>   drivers/net/ethernet/intel/e1000e/phy.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
> index 93544f1cc2a5..4a58d56679c9 100644
> --- a/drivers/net/ethernet/intel/e1000e/phy.c
> +++ b/drivers/net/ethernet/intel/e1000e/phy.c
> @@ -1777,6 +1777,11 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations,
>   
>   	*success = false;
>   	for (i = 0; i < iterations; i++) {
> +		if (usec_interval >= 1000)
> +			msleep(usec_interval / 1000);
> +		else
> +			udelay(usec_interval);
> +

I do not understand this approach. Why wait before first 
reading/accessing the PHY IEEE register?

For further discussion, I would like to introduce Dima Ruinskiy (architect)

>   		/* Some PHYs require the MII_BMSR register to be read
>   		 * twice due to the link bit being sticky.  No harm doing
>   		 * it across the board.
> @@ -1799,10 +1804,6 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations,
>   			*success = true;
>   			break;
>   		}
> -		if (usec_interval >= 1000)
> -			msleep(usec_interval / 1000);
> -		else
> -			udelay(usec_interval);
>   	}
>   
>   	return ret_val;

En-Wei Wu May 6, 2024, 4:46 p.m. UTC | #3

Thank you for your time.

Originally, sleep codes would only be executed if the first read fails
or the link status that is read is down. Some circumstances like the
[v2,2/2] "e1000e: fix link fluctuations problem" would need a delay
before first reading/accessing the PHY IEEE register, so that it won't
read the instability of the link status bit in the PHY status
register.

I've realized that this approach isn't good enough since the purpose
is only to fix the problem in another patch and it also changes the
behavior.

Here is the modification of the patch [v2,2/2] "e1000e: fix link
fluctuations problem":
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1428,7 +1428,17 @@  static s32
e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
- ret_val = e1000e_phy_has_link_generic(hw, 1, 0, &link);
/* comments */
+ ret_val = e1000e_phy_has_link_generic(hw, COPPER_LINK_UP_LIMIT,
100000, &link);

Do you think we can just add a msleep/usleep_range in front of the
e1000e_phy_has_link_generic() instead of modifying the sleep codes in
e1000e_phy_has_link_generic()?

Thanks.

On Mon, 6 May 2024 at 23:53, Sasha Neftin <sasha.neftin@intel.com> wrote:
>
> On 03/05/2024 13:18, Ricky Wu wrote:
> > Originally, the sleep codes being moved forward only
> > ran if we met some conditions (e.g. BMSR_LSTATUS bit
> > not set in phy_status). Moving these sleep codes forward
> > makes the usec_interval take effect every time.
> >
> > Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
> > ---
> >
> > In v2:
> > * Split the sleep codes into this patch
> >
> >   drivers/net/ethernet/intel/e1000e/phy.c | 9 +++++----
> >   1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
> > index 93544f1cc2a5..4a58d56679c9 100644
> > --- a/drivers/net/ethernet/intel/e1000e/phy.c
> > +++ b/drivers/net/ethernet/intel/e1000e/phy.c
> > @@ -1777,6 +1777,11 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations,
> >
> >       *success = false;
> >       for (i = 0; i < iterations; i++) {
> > +             if (usec_interval >= 1000)
> > +                     msleep(usec_interval / 1000);
> > +             else
> > +                     udelay(usec_interval);
> > +
>
> I do not understand this approach. Why wait before first
> reading/accessing the PHY IEEE register?
>
> For further discussion, I would like to introduce Dima Ruinskiy (architect)
>
> >               /* Some PHYs require the MII_BMSR register to be read
> >                * twice due to the link bit being sticky.  No harm doing
> >                * it across the board.
> > @@ -1799,10 +1804,6 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations,
> >                       *success = true;
> >                       break;
> >               }
> > -             if (usec_interval >= 1000)
> > -                     msleep(usec_interval / 1000);
> > -             else
> > -                     udelay(usec_interval);
> >       }
> >
> >       return ret_val;
>

Ruinskiy, Dima May 7, 2024, 6:21 a.m. UTC | #4

On 06/05/2024 19:46, En-Wei WU wrote:
> Thank you for your time.
> 
> Originally, sleep codes would only be executed if the first read fails
> or the link status that is read is down. Some circumstances like the
> [v2,2/2] "e1000e: fix link fluctuations problem" would need a delay
> before first reading/accessing the PHY IEEE register, so that it won't
> read the instability of the link status bit in the PHY status
> register.
> 
> I've realized that this approach isn't good enough since the purpose
> is only to fix the problem in another patch and it also changes the
> behavior.
> 
> Here is the modification of the patch [v2,2/2] "e1000e: fix link
> fluctuations problem":
> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> @@ -1428,7 +1428,17 @@  static s32
> e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
> - ret_val = e1000e_phy_has_link_generic(hw, 1, 0, &link);
> /* comments */
> + ret_val = e1000e_phy_has_link_generic(hw, COPPER_LINK_UP_LIMIT,
> 100000, &link);
> 
> Do you think we can just add a msleep/usleep_range in front of the
> e1000e_phy_has_link_generic() instead of modifying the sleep codes in
> e1000e_phy_has_link_generic()?
> 
> Thanks.
> 
> On Mon, 6 May 2024 at 23:53, Sasha Neftin <sasha.neftin@intel.com> wrote:
>>
>> On 03/05/2024 13:18, Ricky Wu wrote:
>>> Originally, the sleep codes being moved forward only
>>> ran if we met some conditions (e.g. BMSR_LSTATUS bit
>>> not set in phy_status). Moving these sleep codes forward
>>> makes the usec_interval take effect every time.
>>>
>>> Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
>>> ---
>>>
>>> In v2:
>>> * Split the sleep codes into this patch
>>>
>>>    drivers/net/ethernet/intel/e1000e/phy.c | 9 +++++----
>>>    1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
>>> index 93544f1cc2a5..4a58d56679c9 100644
>>> --- a/drivers/net/ethernet/intel/e1000e/phy.c
>>> +++ b/drivers/net/ethernet/intel/e1000e/phy.c
>>> @@ -1777,6 +1777,11 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations,
>>>
>>>        *success = false;
>>>        for (i = 0; i < iterations; i++) {
>>> +             if (usec_interval >= 1000)
>>> +                     msleep(usec_interval / 1000);
>>> +             else
>>> +                     udelay(usec_interval);
>>> +
>>
>> I do not understand this approach. Why wait before first
>> reading/accessing the PHY IEEE register?
>>
>> For further discussion, I would like to introduce Dima Ruinskiy (architect)
>>
>>>                /* Some PHYs require the MII_BMSR register to be read
>>>                 * twice due to the link bit being sticky.  No harm doing
>>>                 * it across the board.
>>> @@ -1799,10 +1804,6 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations,
>>>                        *success = true;
>>>                        break;
>>>                }
>>> -             if (usec_interval >= 1000)
>>> -                     msleep(usec_interval / 1000);
>>> -             else
>>> -                     udelay(usec_interval);
>>>        }
>>>
>>>        return ret_val;
>>

Regarding the usage of sleep/sleep_range functions - they can only be 
used if this code will never be called from an atomic context. I do not 
know if such a guarantee exists.

In general I have quite a few questions and concerns regarding this 
patch series. The comment in patch 2/2 states that it is designed to 
work around a link flap issue with the average time between link up and 
down is 3-4ms, and yet the code waits a whole 100ms before reading the 
PHY bit the first time. Why so long?

Furthermore, if I am reading this right, it appears that, with the 
proposed change, e1000e_phy_has_link_generic will poll the PHY link up 
to 10 times, with 100ms delay between each iteration - until the link is 
up. Won't it lead to wasting all this time, if the link is actually down?

Looking at https://bugzilla.kernel.org/show_bug.cgi?id=218642, at the 
problem this commit series is trying to solve - I wonder:
(1) How serious this problem is. It is normal for link establishment to 
take a few seconds from plugging the cable (due to PHY 
auto-negotiation), and I can accept some link instability during that time.
(2) Assuming the problem is considered serious - have we root-caused it 
correctly.

En-Wei Wu May 7, 2024, 8:54 a.m. UTC | #5

> Why so long?

> Furthermore, if I am reading this right, it appears that, with the
> proposed change, e1000e_phy_has_link_generic will poll the PHY link up
> to 10 times, with 100ms delay between each iteration - until the link is
> up. Won't it lead to wasting all this time, if the link is actually down?
It seems like making the delay 10ms is more suitable. And if the link
is actually down, it takes up to 10 (times)* 10 (ms) = 100ms which is
not quite long in terms of the configuration path.

> (1) How serious this problem is. It is normal for link establishment to
> take a few seconds from plugging the cable (due to PHY
> auto-negotiation), and I can accept some link instability during that time.
Actually, the problem is not critical since the link will be up
permanently after the unstable up-down problem when hot-plugging. And
it has no functional impact on the system. But this problem can lead
to a failure in our script (for Canonical Certification), and it's not
tolerable.

> (2) Assuming the problem is considered serious - have we root-caused it
> correctly.
The problem seems to be a PHY manufacturer errata. The patch here is a
MAC workaround and I'm wondering if we can root-cause it.

On Tue, 7 May 2024 at 08:22, Ruinskiy, Dima <dima.ruinskiy@intel.com> wrote:
>
> On 06/05/2024 19:46, En-Wei WU wrote:
> > Thank you for your time.
> >
> > Originally, sleep codes would only be executed if the first read fails
> > or the link status that is read is down. Some circumstances like the
> > [v2,2/2] "e1000e: fix link fluctuations problem" would need a delay
> > before first reading/accessing the PHY IEEE register, so that it won't
> > read the instability of the link status bit in the PHY status
> > register.
> >
> > I've realized that this approach isn't good enough since the purpose
> > is only to fix the problem in another patch and it also changes the
> > behavior.
> >
> > Here is the modification of the patch [v2,2/2] "e1000e: fix link
> > fluctuations problem":
> > --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> > +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> > @@ -1428,7 +1428,17 @@  static s32
> > e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw)
> > - ret_val = e1000e_phy_has_link_generic(hw, 1, 0, &link);
> > /* comments */
> > + ret_val = e1000e_phy_has_link_generic(hw, COPPER_LINK_UP_LIMIT,
> > 100000, &link);
> >
> > Do you think we can just add a msleep/usleep_range in front of the
> > e1000e_phy_has_link_generic() instead of modifying the sleep codes in
> > e1000e_phy_has_link_generic()?
> >
> > Thanks.
> >
> > On Mon, 6 May 2024 at 23:53, Sasha Neftin <sasha.neftin@intel.com> wrote:
> >>
> >> On 03/05/2024 13:18, Ricky Wu wrote:
> >>> Originally, the sleep codes being moved forward only
> >>> ran if we met some conditions (e.g. BMSR_LSTATUS bit
> >>> not set in phy_status). Moving these sleep codes forward
> >>> makes the usec_interval take effect every time.
> >>>
> >>> Signed-off-by: Ricky Wu <en-wei.wu@canonical.com>
> >>> ---
> >>>
> >>> In v2:
> >>> * Split the sleep codes into this patch
> >>>
> >>>    drivers/net/ethernet/intel/e1000e/phy.c | 9 +++++----
> >>>    1 file changed, 5 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
> >>> index 93544f1cc2a5..4a58d56679c9 100644
> >>> --- a/drivers/net/ethernet/intel/e1000e/phy.c
> >>> +++ b/drivers/net/ethernet/intel/e1000e/phy.c
> >>> @@ -1777,6 +1777,11 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations,
> >>>
> >>>        *success = false;
> >>>        for (i = 0; i < iterations; i++) {
> >>> +             if (usec_interval >= 1000)
> >>> +                     msleep(usec_interval / 1000);
> >>> +             else
> >>> +                     udelay(usec_interval);
> >>> +
> >>
> >> I do not understand this approach. Why wait before first
> >> reading/accessing the PHY IEEE register?
> >>
> >> For further discussion, I would like to introduce Dima Ruinskiy (architect)
> >>
> >>>                /* Some PHYs require the MII_BMSR register to be read
> >>>                 * twice due to the link bit being sticky.  No harm doing
> >>>                 * it across the board.
> >>> @@ -1799,10 +1804,6 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations,
> >>>                        *success = true;
> >>>                        break;
> >>>                }
> >>> -             if (usec_interval >= 1000)
> >>> -                     msleep(usec_interval / 1000);
> >>> -             else
> >>> -                     udelay(usec_interval);
> >>>        }
> >>>
> >>>        return ret_val;
> >>
>
> Regarding the usage of sleep/sleep_range functions - they can only be
> used if this code will never be called from an atomic context. I do not
> know if such a guarantee exists.
>
> In general I have quite a few questions and concerns regarding this
> patch series. The comment in patch 2/2 states that it is designed to
> work around a link flap issue with the average time between link up and
> down is 3-4ms, and yet the code waits a whole 100ms before reading the
> PHY bit the first time. Why so long?
>
> Furthermore, if I am reading this right, it appears that, with the
> proposed change, e1000e_phy_has_link_generic will poll the PHY link up
> to 10 times, with 100ms delay between each iteration - until the link is
> up. Won't it lead to wasting all this time, if the link is actually down?
>
> Looking at https://bugzilla.kernel.org/show_bug.cgi?id=218642, at the
> problem this commit series is trying to solve - I wonder:
> (1) How serious this problem is. It is normal for link establishment to
> take a few seconds from plugging the cable (due to PHY
> auto-negotiation), and I can accept some link instability during that time.
> (2) Assuming the problem is considered serious - have we root-caused it
> correctly.
>

Andrew Lunn May 7, 2024, 12:29 p.m. UTC | #6

> > (1) How serious this problem is. It is normal for link establishment to
> > take a few seconds from plugging the cable (due to PHY
> > auto-negotiation), and I can accept some link instability during that time.
> Actually, the problem is not critical since the link will be up
> permanently after the unstable up-down problem when hot-plugging. And
> it has no functional impact on the system. But this problem can lead
> to a failure in our script (for Canonical Certification), and it's not
> tolerable.

Please could you describe your test. We should be sure you are fixing
the right thing. Maybe the test is broken, not the driver...

    Andrew

[v2,1/2] e1000e: let the sleep codes run every time

Checks

Commit Message

Comments

Patch