Message ID | 20240503101824.32717-1-en-wei.wu@canonical.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v2,1/2] e1000e: let the sleep codes run every time | expand |
> -----Original Message----- > From: Ricky Wu <en-wei.wu@canonical.com> > Sent: Friday, May 3, 2024 3:18 AM > To: Brandeburg, Jesse <jesse.brandeburg@intel.com> > Cc: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; intel-wired- > lan@lists.osuosl.org; davem@davemloft.net; edumazet@google.com; > kuba@kernel.org; pabeni@redhat.com; netdev@vger.kernel.org; linux- > kernel@vger.kernel.org; rickywu0421@gmail.com; en-wei.wu@canonical.com > Subject: [PATCH v2 1/2] e1000e: let the sleep codes run every time > > Originally, the sleep codes being moved forward only > ran if we met some conditions (e.g. BMSR_LSTATUS bit > not set in phy_status). Moving these sleep codes forward > makes the usec_interval take effect every time. > > Signed-off-by: Ricky Wu <en-wei.wu@canonical.com> What's the goal here? Do we need to sleep before initially checking the state?
On 03/05/2024 13:18, Ricky Wu wrote: > Originally, the sleep codes being moved forward only > ran if we met some conditions (e.g. BMSR_LSTATUS bit > not set in phy_status). Moving these sleep codes forward > makes the usec_interval take effect every time. > > Signed-off-by: Ricky Wu <en-wei.wu@canonical.com> > --- > > In v2: > * Split the sleep codes into this patch > > drivers/net/ethernet/intel/e1000e/phy.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c > index 93544f1cc2a5..4a58d56679c9 100644 > --- a/drivers/net/ethernet/intel/e1000e/phy.c > +++ b/drivers/net/ethernet/intel/e1000e/phy.c > @@ -1777,6 +1777,11 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, > > *success = false; > for (i = 0; i < iterations; i++) { > + if (usec_interval >= 1000) > + msleep(usec_interval / 1000); > + else > + udelay(usec_interval); > + I do not understand this approach. Why wait before first reading/accessing the PHY IEEE register? For further discussion, I would like to introduce Dima Ruinskiy (architect) > /* Some PHYs require the MII_BMSR register to be read > * twice due to the link bit being sticky. No harm doing > * it across the board. > @@ -1799,10 +1804,6 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, > *success = true; > break; > } > - if (usec_interval >= 1000) > - msleep(usec_interval / 1000); > - else > - udelay(usec_interval); > } > > return ret_val;
Thank you for your time. Originally, sleep codes would only be executed if the first read fails or the link status that is read is down. Some circumstances like the [v2,2/2] "e1000e: fix link fluctuations problem" would need a delay before first reading/accessing the PHY IEEE register, so that it won't read the instability of the link status bit in the PHY status register. I've realized that this approach isn't good enough since the purpose is only to fix the problem in another patch and it also changes the behavior. Here is the modification of the patch [v2,2/2] "e1000e: fix link fluctuations problem": --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c @@ -1428,7 +1428,17 @@ static s32 e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw) - ret_val = e1000e_phy_has_link_generic(hw, 1, 0, &link); /* comments */ + ret_val = e1000e_phy_has_link_generic(hw, COPPER_LINK_UP_LIMIT, 100000, &link); Do you think we can just add a msleep/usleep_range in front of the e1000e_phy_has_link_generic() instead of modifying the sleep codes in e1000e_phy_has_link_generic()? Thanks. On Mon, 6 May 2024 at 23:53, Sasha Neftin <sasha.neftin@intel.com> wrote: > > On 03/05/2024 13:18, Ricky Wu wrote: > > Originally, the sleep codes being moved forward only > > ran if we met some conditions (e.g. BMSR_LSTATUS bit > > not set in phy_status). Moving these sleep codes forward > > makes the usec_interval take effect every time. > > > > Signed-off-by: Ricky Wu <en-wei.wu@canonical.com> > > --- > > > > In v2: > > * Split the sleep codes into this patch > > > > drivers/net/ethernet/intel/e1000e/phy.c | 9 +++++---- > > 1 file changed, 5 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c > > index 93544f1cc2a5..4a58d56679c9 100644 > > --- a/drivers/net/ethernet/intel/e1000e/phy.c > > +++ b/drivers/net/ethernet/intel/e1000e/phy.c > > @@ -1777,6 +1777,11 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, > > > > *success = false; > > for (i = 0; i < iterations; i++) { > > + if (usec_interval >= 1000) > > + msleep(usec_interval / 1000); > > + else > > + udelay(usec_interval); > > + > > I do not understand this approach. Why wait before first > reading/accessing the PHY IEEE register? > > For further discussion, I would like to introduce Dima Ruinskiy (architect) > > > /* Some PHYs require the MII_BMSR register to be read > > * twice due to the link bit being sticky. No harm doing > > * it across the board. > > @@ -1799,10 +1804,6 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, > > *success = true; > > break; > > } > > - if (usec_interval >= 1000) > > - msleep(usec_interval / 1000); > > - else > > - udelay(usec_interval); > > } > > > > return ret_val; >
On 06/05/2024 19:46, En-Wei WU wrote: > Thank you for your time. > > Originally, sleep codes would only be executed if the first read fails > or the link status that is read is down. Some circumstances like the > [v2,2/2] "e1000e: fix link fluctuations problem" would need a delay > before first reading/accessing the PHY IEEE register, so that it won't > read the instability of the link status bit in the PHY status > register. > > I've realized that this approach isn't good enough since the purpose > is only to fix the problem in another patch and it also changes the > behavior. > > Here is the modification of the patch [v2,2/2] "e1000e: fix link > fluctuations problem": > --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c > +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c > @@ -1428,7 +1428,17 @@ static s32 > e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw) > - ret_val = e1000e_phy_has_link_generic(hw, 1, 0, &link); > /* comments */ > + ret_val = e1000e_phy_has_link_generic(hw, COPPER_LINK_UP_LIMIT, > 100000, &link); > > Do you think we can just add a msleep/usleep_range in front of the > e1000e_phy_has_link_generic() instead of modifying the sleep codes in > e1000e_phy_has_link_generic()? > > Thanks. > > On Mon, 6 May 2024 at 23:53, Sasha Neftin <sasha.neftin@intel.com> wrote: >> >> On 03/05/2024 13:18, Ricky Wu wrote: >>> Originally, the sleep codes being moved forward only >>> ran if we met some conditions (e.g. BMSR_LSTATUS bit >>> not set in phy_status). Moving these sleep codes forward >>> makes the usec_interval take effect every time. >>> >>> Signed-off-by: Ricky Wu <en-wei.wu@canonical.com> >>> --- >>> >>> In v2: >>> * Split the sleep codes into this patch >>> >>> drivers/net/ethernet/intel/e1000e/phy.c | 9 +++++---- >>> 1 file changed, 5 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c >>> index 93544f1cc2a5..4a58d56679c9 100644 >>> --- a/drivers/net/ethernet/intel/e1000e/phy.c >>> +++ b/drivers/net/ethernet/intel/e1000e/phy.c >>> @@ -1777,6 +1777,11 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, >>> >>> *success = false; >>> for (i = 0; i < iterations; i++) { >>> + if (usec_interval >= 1000) >>> + msleep(usec_interval / 1000); >>> + else >>> + udelay(usec_interval); >>> + >> >> I do not understand this approach. Why wait before first >> reading/accessing the PHY IEEE register? >> >> For further discussion, I would like to introduce Dima Ruinskiy (architect) >> >>> /* Some PHYs require the MII_BMSR register to be read >>> * twice due to the link bit being sticky. No harm doing >>> * it across the board. >>> @@ -1799,10 +1804,6 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, >>> *success = true; >>> break; >>> } >>> - if (usec_interval >= 1000) >>> - msleep(usec_interval / 1000); >>> - else >>> - udelay(usec_interval); >>> } >>> >>> return ret_val; >> Regarding the usage of sleep/sleep_range functions - they can only be used if this code will never be called from an atomic context. I do not know if such a guarantee exists. In general I have quite a few questions and concerns regarding this patch series. The comment in patch 2/2 states that it is designed to work around a link flap issue with the average time between link up and down is 3-4ms, and yet the code waits a whole 100ms before reading the PHY bit the first time. Why so long? Furthermore, if I am reading this right, it appears that, with the proposed change, e1000e_phy_has_link_generic will poll the PHY link up to 10 times, with 100ms delay between each iteration - until the link is up. Won't it lead to wasting all this time, if the link is actually down? Looking at https://bugzilla.kernel.org/show_bug.cgi?id=218642, at the problem this commit series is trying to solve - I wonder: (1) How serious this problem is. It is normal for link establishment to take a few seconds from plugging the cable (due to PHY auto-negotiation), and I can accept some link instability during that time. (2) Assuming the problem is considered serious - have we root-caused it correctly.
> Why so long? > Furthermore, if I am reading this right, it appears that, with the > proposed change, e1000e_phy_has_link_generic will poll the PHY link up > to 10 times, with 100ms delay between each iteration - until the link is > up. Won't it lead to wasting all this time, if the link is actually down? It seems like making the delay 10ms is more suitable. And if the link is actually down, it takes up to 10 (times)* 10 (ms) = 100ms which is not quite long in terms of the configuration path. > (1) How serious this problem is. It is normal for link establishment to > take a few seconds from plugging the cable (due to PHY > auto-negotiation), and I can accept some link instability during that time. Actually, the problem is not critical since the link will be up permanently after the unstable up-down problem when hot-plugging. And it has no functional impact on the system. But this problem can lead to a failure in our script (for Canonical Certification), and it's not tolerable. > (2) Assuming the problem is considered serious - have we root-caused it > correctly. The problem seems to be a PHY manufacturer errata. The patch here is a MAC workaround and I'm wondering if we can root-cause it. On Tue, 7 May 2024 at 08:22, Ruinskiy, Dima <dima.ruinskiy@intel.com> wrote: > > On 06/05/2024 19:46, En-Wei WU wrote: > > Thank you for your time. > > > > Originally, sleep codes would only be executed if the first read fails > > or the link status that is read is down. Some circumstances like the > > [v2,2/2] "e1000e: fix link fluctuations problem" would need a delay > > before first reading/accessing the PHY IEEE register, so that it won't > > read the instability of the link status bit in the PHY status > > register. > > > > I've realized that this approach isn't good enough since the purpose > > is only to fix the problem in another patch and it also changes the > > behavior. > > > > Here is the modification of the patch [v2,2/2] "e1000e: fix link > > fluctuations problem": > > --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c > > +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c > > @@ -1428,7 +1428,17 @@ static s32 > > e1000_check_for_copper_link_ich8lan(struct e1000_hw *hw) > > - ret_val = e1000e_phy_has_link_generic(hw, 1, 0, &link); > > /* comments */ > > + ret_val = e1000e_phy_has_link_generic(hw, COPPER_LINK_UP_LIMIT, > > 100000, &link); > > > > Do you think we can just add a msleep/usleep_range in front of the > > e1000e_phy_has_link_generic() instead of modifying the sleep codes in > > e1000e_phy_has_link_generic()? > > > > Thanks. > > > > On Mon, 6 May 2024 at 23:53, Sasha Neftin <sasha.neftin@intel.com> wrote: > >> > >> On 03/05/2024 13:18, Ricky Wu wrote: > >>> Originally, the sleep codes being moved forward only > >>> ran if we met some conditions (e.g. BMSR_LSTATUS bit > >>> not set in phy_status). Moving these sleep codes forward > >>> makes the usec_interval take effect every time. > >>> > >>> Signed-off-by: Ricky Wu <en-wei.wu@canonical.com> > >>> --- > >>> > >>> In v2: > >>> * Split the sleep codes into this patch > >>> > >>> drivers/net/ethernet/intel/e1000e/phy.c | 9 +++++---- > >>> 1 file changed, 5 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c > >>> index 93544f1cc2a5..4a58d56679c9 100644 > >>> --- a/drivers/net/ethernet/intel/e1000e/phy.c > >>> +++ b/drivers/net/ethernet/intel/e1000e/phy.c > >>> @@ -1777,6 +1777,11 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, > >>> > >>> *success = false; > >>> for (i = 0; i < iterations; i++) { > >>> + if (usec_interval >= 1000) > >>> + msleep(usec_interval / 1000); > >>> + else > >>> + udelay(usec_interval); > >>> + > >> > >> I do not understand this approach. Why wait before first > >> reading/accessing the PHY IEEE register? > >> > >> For further discussion, I would like to introduce Dima Ruinskiy (architect) > >> > >>> /* Some PHYs require the MII_BMSR register to be read > >>> * twice due to the link bit being sticky. No harm doing > >>> * it across the board. > >>> @@ -1799,10 +1804,6 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, > >>> *success = true; > >>> break; > >>> } > >>> - if (usec_interval >= 1000) > >>> - msleep(usec_interval / 1000); > >>> - else > >>> - udelay(usec_interval); > >>> } > >>> > >>> return ret_val; > >> > > Regarding the usage of sleep/sleep_range functions - they can only be > used if this code will never be called from an atomic context. I do not > know if such a guarantee exists. > > In general I have quite a few questions and concerns regarding this > patch series. The comment in patch 2/2 states that it is designed to > work around a link flap issue with the average time between link up and > down is 3-4ms, and yet the code waits a whole 100ms before reading the > PHY bit the first time. Why so long? > > Furthermore, if I am reading this right, it appears that, with the > proposed change, e1000e_phy_has_link_generic will poll the PHY link up > to 10 times, with 100ms delay between each iteration - until the link is > up. Won't it lead to wasting all this time, if the link is actually down? > > Looking at https://bugzilla.kernel.org/show_bug.cgi?id=218642, at the > problem this commit series is trying to solve - I wonder: > (1) How serious this problem is. It is normal for link establishment to > take a few seconds from plugging the cable (due to PHY > auto-negotiation), and I can accept some link instability during that time. > (2) Assuming the problem is considered serious - have we root-caused it > correctly. >
> > (1) How serious this problem is. It is normal for link establishment to > > take a few seconds from plugging the cable (due to PHY > > auto-negotiation), and I can accept some link instability during that time. > Actually, the problem is not critical since the link will be up > permanently after the unstable up-down problem when hot-plugging. And > it has no functional impact on the system. But this problem can lead > to a failure in our script (for Canonical Certification), and it's not > tolerable. Please could you describe your test. We should be sure you are fixing the right thing. Maybe the test is broken, not the driver... Andrew
diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c index 93544f1cc2a5..4a58d56679c9 100644 --- a/drivers/net/ethernet/intel/e1000e/phy.c +++ b/drivers/net/ethernet/intel/e1000e/phy.c @@ -1777,6 +1777,11 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, *success = false; for (i = 0; i < iterations; i++) { + if (usec_interval >= 1000) + msleep(usec_interval / 1000); + else + udelay(usec_interval); + /* Some PHYs require the MII_BMSR register to be read * twice due to the link bit being sticky. No harm doing * it across the board. @@ -1799,10 +1804,6 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, *success = true; break; } - if (usec_interval >= 1000) - msleep(usec_interval / 1000); - else - udelay(usec_interval); } return ret_val;
Originally, the sleep codes being moved forward only ran if we met some conditions (e.g. BMSR_LSTATUS bit not set in phy_status). Moving these sleep codes forward makes the usec_interval take effect every time. Signed-off-by: Ricky Wu <en-wei.wu@canonical.com> --- In v2: * Split the sleep codes into this patch drivers/net/ethernet/intel/e1000e/phy.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)