Message ID | 9b5b6f4c-4f54-4b90-b0b3-8d8023c2e780@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | eabb8a9be1e4a12f3bf37ceb7411083e3775672d |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] Revert "r8169: don't try to disable interrupts if NAPI is, scheduled already" | expand |
On Wed, May 15, 2024 at 8:18 AM Heiner Kallweit <hkallweit1@gmail.com> wrote: > > This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9. > > Ken reported that RTL8125b can lock up if gro_flush_timeout has the > default value of 20000 and napi_defer_hard_irqs is set to 0. > In this scenario device interrupts aren't disabled, what seems to > trigger some silicon bug under heavy load. I was able to reproduce this > behavior on RTL8168h. Fix this by reverting 7274c4147afb. > > Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already") > Cc: stable@vger.kernel.org > Reported-by: Ken Milmore <ken.milmore@gmail.com> > Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Thanks !
On 15/05/2024 07:18, Heiner Kallweit wrote: > This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9. > > Ken reported that RTL8125b can lock up if gro_flush_timeout has the > default value of 20000 and napi_defer_hard_irqs is set to 0. > In this scenario device interrupts aren't disabled, what seems to > trigger some silicon bug under heavy load. I was able to reproduce this > behavior on RTL8168h. Fix this by reverting 7274c4147afb. > > Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already") > Cc: stable@vger.kernel.org > Reported-by: Ken Milmore <ken.milmore@gmail.com> > Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> > --- > drivers/net/ethernet/realtek/r8169_main.c | 6 ++---- > 1 file changed, 2 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c > index 0fc5fe564ae5..69606c8081a3 100644 > --- a/drivers/net/ethernet/realtek/r8169_main.c > +++ b/drivers/net/ethernet/realtek/r8169_main.c > @@ -4655,10 +4655,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance) > rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING); > } > > - if (napi_schedule_prep(&tp->napi)) { > - rtl_irq_disable(tp); > - __napi_schedule(&tp->napi); > - } > + rtl_irq_disable(tp); > + napi_schedule(&tp->napi); > out: > rtl_ack_events(tp, status); > FYI, by now I am reasonably well convinced that the behaviour we've been seeing is not in fact a silicon bug, but rather a very specific behaviour regarding how these devices raise MSI interrupts. This is largely due to the investigations by David Dillow described exhaustively in the 2009 netdev thread linked below. I wish I had spotted this much sooner! This information has been corroborated by my own testing on the RTL8125b: https://lore.kernel.org/netdev/1242001754.4093.12.camel@obelisk.thedillows.org/T/ To summarise precisely what I think the behaviour is: ******** An interrupt is generated *only* when the device registers undergo a transition from (status & mask) == 0 to (status & mask) != 0. ******** If the above holds, then calling rtl_irq_disable() will immediately force the condition (status & mask) == 0, so we are ready to raise another interrupt when interrupts are subsequently enabled again. To try and verify this, I tried the code below, which locks up the network traffic immediately, regardless of the setting of napi_defer_hard_irqs: diff --git linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c index 2ce4bff..add5bdd 100644 --- linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c +++ linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c @@ -4607,10 +4607,13 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance) rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING); } - rtl_irq_disable(tp); napi_schedule(&tp->napi); out: + u32 status2 = rtl_get_events(tp); rtl_ack_events(tp, status); + if(status2 & ~status) + printk_ratelimited("rtl8169_interrupt: status=%x status2=%x\n", + status, status2); return IRQ_HANDLED; } Here's some typical dmesg output: [11315.581136] rtl8169_interrupt: status=1 status2=85 [11324.142176] r8169 0000:07:00.0 eth0: ASPM disabled on Tx timeout [11324.151765] rtl8169_interrupt: status=4 status2=84 We can see that when a new interrupt is flagged in the interval between reading the status register and writing to it, we may never achieve the condition (status & mask) == 0. So, if we read 0x01 (RxOK) from the status register, we will then write 0x01 back to acknowledge the interrupt. But in the meantime, 0x04 (TxOK) has been flagged, as well as 0x80 (TxDescUnavail), so the register now contains 0x85. We acknowledge by writing back 0x01, so the status register should now contain 0x84. If interrupts are unmasked throughout, then (status & mask) != 0 throughout, so no interrupt will be raised for the missed TxOK event, unless something else should occur to set one of the other status bits. To test this hypothesis, I tried the code below, which never disables interrupts but instead clears out the status register on every interrupt: index 2ce4bff..dbda9ef 100644 --- linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c +++ linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c @@ -4610,7 +4610,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance) rtl_irq_disable(tp); napi_schedule(&tp->napi); out: - rtl_ack_events(tp, status); + rtl_ack_events(tp, tp->irq_mask); return IRQ_HANDLED; } This passed my iperf3 test perfectly! It is likely to cause other problems though: Specifically it opens the possibility that we will miss a SYSErr, LinkChg or RxFIFOOver interrupt. Hence the rationale for achieving the required (status & mask) == 0 condition by clearing the mask register instead. I hope this information may prove useful in the future.
On 16/05/2024 22:10, Ken Milmore wrote: > To test this hypothesis, I tried the code below, which never disables > interrupts but instead clears out the status register on every interrupt: > > index 2ce4bff..dbda9ef 100644 > --- linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c > +++ linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c > @@ -4610,7 +4610,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance) > rtl_irq_disable(tp); > napi_schedule(&tp->napi); > out: > - rtl_ack_events(tp, status); > + rtl_ack_events(tp, tp->irq_mask); > > return IRQ_HANDLED; > } > > This passed my iperf3 test perfectly! It is likely to cause other problems > though: Specifically it opens the possibility that we will miss a SYSErr, > LinkChg or RxFIFOOver interrupt. Hence the rationale for achieving the required > (status & mask) == 0 condition by clearing the mask register instead. Sorry, that patch should have been: diff --git linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c index 2ce4bff..e9757ff 100644 --- linux-source-6.1~/drivers/net/ethernet/realtek/r8169_main.c +++ linux-source-6.1/drivers/net/ethernet/realtek/r8169_main.c @@ -4607,10 +4607,9 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance) rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING); } - rtl_irq_disable(tp); napi_schedule(&tp->napi); out: - rtl_ack_events(tp, status); + rtl_ack_events(tp, tp->irq_mask); return IRQ_HANDLED; }
On 15.05.2024 08:18, Heiner Kallweit wrote: > This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9. > > Ken reported that RTL8125b can lock up if gro_flush_timeout has the > default value of 20000 and napi_defer_hard_irqs is set to 0. > In this scenario device interrupts aren't disabled, what seems to > trigger some silicon bug under heavy load. I was able to reproduce this > behavior on RTL8168h. Fix this by reverting 7274c4147afb. > > Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already") > Cc: stable@vger.kernel.org > Reported-by: Ken Milmore <ken.milmore@gmail.com> > Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> > --- This patch was mistakenly set to "changes requested". The replies from Ken provide additional details on how interrupt mask and status register behave on these chips. However the patch itself is correct and should be applied.
I concur, Heiner's patch is correct. Sorry about the noise. On 20/05/2024 14:50, Heiner Kallweit wrote: > On 15.05.2024 08:18, Heiner Kallweit wrote: >> This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9. >> >> Ken reported that RTL8125b can lock up if gro_flush_timeout has the >> default value of 20000 and napi_defer_hard_irqs is set to 0. >> In this scenario device interrupts aren't disabled, what seems to >> trigger some silicon bug under heavy load. I was able to reproduce this >> behavior on RTL8168h. Fix this by reverting 7274c4147afb. >> >> Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already") >> Cc: stable@vger.kernel.org >> Reported-by: Ken Milmore <ken.milmore@gmail.com> >> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> >> --- > > This patch was mistakenly set to "changes requested". > The replies from Ken provide additional details on how interrupt mask > and status register behave on these chips. However the patch itself > is correct and should be applied. >
On Mon, May 20, 2024 at 03:50:11PM +0200, Heiner Kallweit wrote: > On 15.05.2024 08:18, Heiner Kallweit wrote: > > This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9. > > > > Ken reported that RTL8125b can lock up if gro_flush_timeout has the > > default value of 20000 and napi_defer_hard_irqs is set to 0. > > In this scenario device interrupts aren't disabled, what seems to > > trigger some silicon bug under heavy load. I was able to reproduce this > > behavior on RTL8168h. Fix this by reverting 7274c4147afb. > > > > Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already") > > Cc: stable@vger.kernel.org > > Reported-by: Ken Milmore <ken.milmore@gmail.com> > > Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> > > --- > > This patch was mistakenly set to "changes requested". > The replies from Ken provide additional details on how interrupt mask > and status register behave on these chips. However the patch itself > is correct and should be applied. I guess someone hit the wrong button by mistake. Let's see if this puts things back on track. pw-bot: under-review
Hello: This patch was applied to netdev/net.git (main) by Paolo Abeni <pabeni@redhat.com>: On Wed, 15 May 2024 08:18:01 +0200 you wrote: > This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9. > > Ken reported that RTL8125b can lock up if gro_flush_timeout has the > default value of 20000 and napi_defer_hard_irqs is set to 0. > In this scenario device interrupts aren't disabled, what seems to > trigger some silicon bug under heavy load. I was able to reproduce this > behavior on RTL8168h. Fix this by reverting 7274c4147afb. > > [...] Here is the summary with links: - [net] Revert "r8169: don't try to disable interrupts if NAPI is, scheduled already" https://git.kernel.org/netdev/net/c/eabb8a9be1e4 You are awesome, thank you!
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c index 0fc5fe564ae5..69606c8081a3 100644 --- a/drivers/net/ethernet/realtek/r8169_main.c +++ b/drivers/net/ethernet/realtek/r8169_main.c @@ -4655,10 +4655,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance) rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING); } - if (napi_schedule_prep(&tp->napi)) { - rtl_irq_disable(tp); - __napi_schedule(&tp->napi); - } + rtl_irq_disable(tp); + napi_schedule(&tp->napi); out: rtl_ack_events(tp, status);
This reverts commit 7274c4147afbf46f45b8501edbdad6da8cd013b9. Ken reported that RTL8125b can lock up if gro_flush_timeout has the default value of 20000 and napi_defer_hard_irqs is set to 0. In this scenario device interrupts aren't disabled, what seems to trigger some silicon bug under heavy load. I was able to reproduce this behavior on RTL8168h. Fix this by reverting 7274c4147afb. Fixes: 7274c4147afb ("r8169: don't try to disable interrupts if NAPI is scheduled already") Cc: stable@vger.kernel.org Reported-by: Ken Milmore <ken.milmore@gmail.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> --- drivers/net/ethernet/realtek/r8169_main.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)