Message ID | 20210511151955.28071-1-vigneshr@ti.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | serial: 8250: 8250_omap: Fix possible interrupt storm | expand |
On Tue, May 11, 2021 at 08:49:55PM +0530, Vignesh Raghavendra wrote: > It is possible that RX TIMEOUT is signalled after RX FIFO has been > drained, in which case a dummy read of RX FIFO is required to clear RX > TIMEOUT condition. Otherwise, RX TIMEOUT condition is not cleared > leading to an interrupt storm > > Cc: stable@vger.kernel.org How far back does this need to go? What commit id does this fix? What caused this to just show up now vs. previously? thanks, greg k-h
Hi Greg, Vignesh & Jan, * Greg Kroah-Hartman <gregkh@linuxfoundation.org> [210513 14:17]: > On Tue, May 11, 2021 at 08:49:55PM +0530, Vignesh Raghavendra wrote: > > It is possible that RX TIMEOUT is signalled after RX FIFO has been > > drained, in which case a dummy read of RX FIFO is required to clear RX > > TIMEOUT condition. Otherwise, RX TIMEOUT condition is not cleared > > leading to an interrupt storm > > > > Cc: stable@vger.kernel.org > > How far back does this need to go? What commit id does this fix? What > caused this to just show up now vs. previously? I just noticed this causes the following regression in Linux next when pressing a key on uart console after boot at least on omap3. This seems to happen on serial_port_in(port, UART_RX) in the quirk handling. Vignesh, it seems this quirk needs some soc specific flag added to it maybe? Or maybe UART_OMAP_RX_LVL register is not available for all the SoCs? I think it's best to drop this patch until the issues are resolved, also there are some open comments above that might be answered by limiting this quirk to a specific range of SoCs :) Regards, Tony 8< ------------------------ #regzb introduced: 31fae7c8b18c ("serial: 8250: 8250_omap: Fix possible interrupt storm") Internal error: : 1028 [#1] SMP ARM Modules linked in: CPU: 0 PID: 870 Comm: syslog Not tainted 5.13.0-rc3-next-20210527-00001-g9b545469a50f #34 Hardware name: Generic OMAP36xx (Flattened Device Tree) PC is at mem_serial_in+0xc/0x20 LR is at omap8250_irq+0x258/0x2d4 pc : [<c06762f0>] lr : [<c0681720>] psr: 60000193 sp : c2975c90 ip : 00000000 fp : c1836000 r10: c0ff7f20 r9 : c0ff7f40 r8 : 00000058 r7 : c2975ce0 r6 : 00000000 r5 : 00000001 r4 : c1031a24 r3 : fa06a000 r2 : 00000002 r1 : 00000000 r0 : c1031a24 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 829cc019 DAC: 00000051 Register r0 information: non-slab/vmalloc memory Register r1 information: NULL pointer Register r2 information: non-paged memory Register r3 information: 0-page vmalloc region starting at 0xfa000000 allocated at iotable_init+0x0/0xf4 Register r4 information: non-slab/vmalloc memory Register r5 information: non-paged memory Register r6 information: NULL pointer Register r7 information: non-slab/vmalloc memory Register r8 information: non-paged memory Register r9 information: non-slab/vmalloc memory Register r10 information: non-slab/vmalloc memory Register r11 information: slab kmalloc-256 start c1836000 pointer offset 0 size 256 Register r12 information: NULL pointer Process syslog (pid: 870, stack limit = 0x64988e4e) ... [<c06762f0>] (mem_serial_in) from [<c0681720>] (omap8250_irq+0x258/0x2d4) [<c0681720>] (omap8250_irq) from [<c01a00b8>] (__handle_irq_event_percpu+0x58/0x1f0) [<c01a00b8>] (__handle_irq_event_percpu) from [<c01a0334>] (handle_irq_event+0x68/0xcc) [<c01a0334>] (handle_irq_event) from [<c01a4b6c>] (handle_level_irq+0xc4/0x1c8) [<c01a4b6c>] (handle_level_irq) from [<c019f968>] (__handle_domain_irq+0x84/0xfc) [<c019f968>] (__handle_domain_irq) from [<c0100b6c>] (__irq_svc+0x6c/0x90) Exception stack(0xc2975d28 to 0xc2975d70) 5d20: c2532990 c25328a8 00000006 c289a015 c2532888 c2532990 5d40: 00000002 00000000 00000006 64407df7 c2532880 64407df7 00000006 c2975d78 5d60: c02ef31c c030097c 60000013 ffffffff [<c0100b6c>] (__irq_svc) from [<c030097c>] (__d_lookup_rcu+0xbc/0x1b8) [<c030097c>] (__d_lookup_rcu) from [<c02ef31c>] (lookup_fast+0x48/0x180) [<c02ef31c>] (lookup_fast) from [<c02f23b8>] (walk_component+0x3c/0x1c8) [<c02f23b8>] (walk_component) from [<c02f2780>] (link_path_walk.part.0+0x23c/0x364) [<c02f2780>] (link_path_walk.part.0) from [<c02f28dc>] (path_parentat+0x34/0x74) [<c02f28dc>] (path_parentat) from [<c02f48c8>] (filename_parentat+0x88/0x19c) [<c02f48c8>] (filename_parentat) from [<c02f4a20>] (filename_create+0x44/0x150) [<c02f4a20>] (filename_create) from [<c02f4c2c>] (do_mkdirat+0x58/0x11c) [<c02f4c2c>] (do_mkdirat) from [<c0100080>] (ret_fast_syscall+0x0/0x58)
Hi, On 5/28/21 11:09 AM, Tony Lindgren wrote: > Hi Greg, Vignesh & Jan, > > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> [210513 14:17]: >> On Tue, May 11, 2021 at 08:49:55PM +0530, Vignesh Raghavendra wrote: >>> It is possible that RX TIMEOUT is signalled after RX FIFO has been >>> drained, in which case a dummy read of RX FIFO is required to clear RX >>> TIMEOUT condition. Otherwise, RX TIMEOUT condition is not cleared >>> leading to an interrupt storm >>> >>> Cc: stable@vger.kernel.org >> >> How far back does this need to go? What commit id does this fix? What >> caused this to just show up now vs. previously? Sorry, I missed this reply. Issue was reported on AM65x SoC with custom test case from Jan Kiszka that stressed UART with rapid baudrate changes from 9600 to 4M along with data transfer. Based on the condition that led to interrupt storm, I inferred it to affect all SoCs with 8250 OMAP UARTs. But that seems thats not the best idea as seen from OMAP3 regression. Greg, Could you please drop the patch? Very sorry for the inconvenience.. > > I just noticed this causes the following regression in Linux next when > pressing a key on uart console after boot at least on omap3. This seems > to happen on serial_port_in(port, UART_RX) in the quirk handling. > > Vignesh, it seems this quirk needs some soc specific flag added to > it maybe? Or maybe UART_OMAP_RX_LVL register is not available for > all the SoCs? > Yes indeed :( > I think it's best to drop this patch until the issues are resolved, > also there are some open comments above that might be answered by > limiting this quirk to a specific range of SoCs :) > Oops, I did test patch AM33xx assuming its equivalent to OMAP3, but UART IP is quite different. I will respin the patch making sure, workaround applies only to AM65x and K3 SoCs. Regards Vignesh
On Fri, May 28, 2021 at 11:41:36AM +0530, Vignesh Raghavendra wrote: > Hi, > > On 5/28/21 11:09 AM, Tony Lindgren wrote: > > Hi Greg, Vignesh & Jan, > > > > * Greg Kroah-Hartman <gregkh@linuxfoundation.org> [210513 14:17]: > >> On Tue, May 11, 2021 at 08:49:55PM +0530, Vignesh Raghavendra wrote: > >>> It is possible that RX TIMEOUT is signalled after RX FIFO has been > >>> drained, in which case a dummy read of RX FIFO is required to clear RX > >>> TIMEOUT condition. Otherwise, RX TIMEOUT condition is not cleared > >>> leading to an interrupt storm > >>> > >>> Cc: stable@vger.kernel.org > >> > >> How far back does this need to go? What commit id does this fix? What > >> caused this to just show up now vs. previously? > > Sorry, I missed this reply. Issue was reported on AM65x SoC with custom > test case from Jan Kiszka that stressed UART with rapid baudrate changes > from 9600 to 4M along with data transfer. > > Based on the condition that led to interrupt storm, I inferred it to > affect all SoCs with 8250 OMAP UARTs. But that seems thats not the best > idea as seen from OMAP3 regression. > > Greg, > > Could you please drop the patch? Very sorry for the inconvenience.. Now reverted, thanks. greg k-h
On 28.05.21 08:11, Vignesh Raghavendra wrote: > Hi, > > On 5/28/21 11:09 AM, Tony Lindgren wrote: >> Hi Greg, Vignesh & Jan, >> >> * Greg Kroah-Hartman <gregkh@linuxfoundation.org> [210513 14:17]: >>> On Tue, May 11, 2021 at 08:49:55PM +0530, Vignesh Raghavendra wrote: >>>> It is possible that RX TIMEOUT is signalled after RX FIFO has been >>>> drained, in which case a dummy read of RX FIFO is required to clear RX >>>> TIMEOUT condition. Otherwise, RX TIMEOUT condition is not cleared >>>> leading to an interrupt storm >>>> >>>> Cc: stable@vger.kernel.org >>> >>> How far back does this need to go? What commit id does this fix? What >>> caused this to just show up now vs. previously? > > Sorry, I missed this reply. Issue was reported on AM65x SoC with custom > test case from Jan Kiszka that stressed UART with rapid baudrate changes > from 9600 to 4M along with data transfer. > > Based on the condition that led to interrupt storm, I inferred it to > affect all SoCs with 8250 OMAP UARTs. But that seems thats not the best > idea as seen from OMAP3 regression. > > Greg, > > Could you please drop the patch? Very sorry for the inconvenience.. > >> >> I just noticed this causes the following regression in Linux next when >> pressing a key on uart console after boot at least on omap3. This seems >> to happen on serial_port_in(port, UART_RX) in the quirk handling. >> >> Vignesh, it seems this quirk needs some soc specific flag added to >> it maybe? Or maybe UART_OMAP_RX_LVL register is not available for >> all the SoCs? >> > > Yes indeed :( > >> I think it's best to drop this patch until the issues are resolved, >> also there are some open comments above that might be answered by >> limiting this quirk to a specific range of SoCs :) >> > > Oops, I did test patch AM33xx assuming its equivalent to OMAP3, but UART > IP is quite different. I will respin the patch making sure, workaround > applies only to AM65x and K3 SoCs. > > Regards > Vignesh > What's the status here for AM65x? The issue remains present on that platform, and I was hoping to see a quick follow up that limit the fix to that target. Jan
On 6/22/21 11:45 AM, Jan Kiszka wrote: >>> Vignesh, it seems this quirk needs some soc specific flag added to >>> it maybe? Or maybe UART_OMAP_RX_LVL register is not available for >>> all the SoCs? >>> >> Yes indeed :( >> >>> I think it's best to drop this patch until the issues are resolved, >>> also there are some open comments above that might be answered by >>> limiting this quirk to a specific range of SoCs :) >>> >> Oops, I did test patch AM33xx assuming its equivalent to OMAP3, but UART >> IP is quite different. I will respin the patch making sure, workaround >> applies only to AM65x and K3 SoCs. >> >> Regards >> Vignesh >> > What's the status here for AM65x? The issue remains present on that > platform, and I was hoping to see a quick follow up that limit the fix > to that target. > Sorry for the delay, I am trying to find which other TI SoCs are affected by this issue. But that exercise will need a bit more time. Will send a fix to address K3 SoCs like AM65x today/tomo.
Tue, Jun 22, 2021 at 11:53:38AM +0530, Vignesh Raghavendra kirjoitti: > On 6/22/21 11:45 AM, Jan Kiszka wrote: > >>> Vignesh, it seems this quirk needs some soc specific flag added to > >>> it maybe? Or maybe UART_OMAP_RX_LVL register is not available for > >>> all the SoCs? > >>> > >> Yes indeed :( > >> > >>> I think it's best to drop this patch until the issues are resolved, > >>> also there are some open comments above that might be answered by > >>> limiting this quirk to a specific range of SoCs :) > >>> > >> Oops, I did test patch AM33xx assuming its equivalent to OMAP3, but UART > >> IP is quite different. I will respin the patch making sure, workaround > >> applies only to AM65x and K3 SoCs. > >> > >> Regards > >> Vignesh > >> > > What's the status here for AM65x? The issue remains present on that > > platform, and I was hoping to see a quick follow up that limit the fix > > to that target. > > Sorry for the delay, I am trying to find which other TI SoCs are > affected by this issue. But that exercise will need a bit more time. > Will send a fix to address K3 SoCs like AM65x today/tomo. This all reminds me the very similar issue one found on Intel integrated (Synopsys DesignWare based) UARTs: https://lore.kernel.org/linux-serial/20170206233000.3021-1-dianders@chromium.org/ https://lore.kernel.org/linux-serial/1440015124-28393-1-git-send-email-california.l.sullivan@intel.com/
On 7/13/21 1:57 AM, andy@surfacebook.localdomain wrote: > Tue, Jun 22, 2021 at 11:53:38AM +0530, Vignesh Raghavendra kirjoitti: >> On 6/22/21 11:45 AM, Jan Kiszka wrote: >>>>> Vignesh, it seems this quirk needs some soc specific flag added to >>>>> it maybe? Or maybe UART_OMAP_RX_LVL register is not available for >>>>> all the SoCs? >>>>> >>>> Yes indeed :( >>>> >>>>> I think it's best to drop this patch until the issues are resolved, >>>>> also there are some open comments above that might be answered by >>>>> limiting this quirk to a specific range of SoCs :) >>>>> >>>> Oops, I did test patch AM33xx assuming its equivalent to OMAP3, but UART >>>> IP is quite different. I will respin the patch making sure, workaround >>>> applies only to AM65x and K3 SoCs. >>>> >>>> Regards >>>> Vignesh >>>> >>> What's the status here for AM65x? The issue remains present on that >>> platform, and I was hoping to see a quick follow up that limit the fix >>> to that target. >> >> Sorry for the delay, I am trying to find which other TI SoCs are >> affected by this issue. But that exercise will need a bit more time. >> Will send a fix to address K3 SoCs like AM65x today/tomo. > > This all reminds me the very similar issue one found on Intel integrated > (Synopsys DesignWare based) UARTs: > Hmm, yes, seems like common problem with some 8250 UARTs although not all TI SoCs show this behavior even though they all claim 8250 compatible. > https://lore.kernel.org/linux-serial/20170206233000.3021-1-dianders@chromium.org/ I am not sure if reading UART_LSR is a good idea in the above patch. Some flags in LSR register are cleared on read (at least that's the case for UARTs on TI SoCs) and thus can result in loss of error/FIFO status information. > https://lore.kernel.org/linux-serial/1440015124-28393-1-git-send-email-california.l.sullivan@intel.com/ > Looks like this never made it. Given the quirks associated with 8250 UARTs, workarounds would need to be tied to specific variants, so I don't know if its possible to implement the fix in 8250 core IRQ handler. PS: v2 of $patch is already merged. Regards Vignesh
On Tue, Jul 13, 2021 at 11:54 AM Vignesh Raghavendra <vigneshr@ti.com> wrote: > On 7/13/21 1:57 AM, andy@surfacebook.localdomain wrote: > > Tue, Jun 22, 2021 at 11:53:38AM +0530, Vignesh Raghavendra kirjoitti: > >> On 6/22/21 11:45 AM, Jan Kiszka wrote: > >>>>> Vignesh, it seems this quirk needs some soc specific flag added to > >>>>> it maybe? Or maybe UART_OMAP_RX_LVL register is not available for > >>>>> all the SoCs? > >>>>> > >>>> Yes indeed :( > >>>> > >>>>> I think it's best to drop this patch until the issues are resolved, > >>>>> also there are some open comments above that might be answered by > >>>>> limiting this quirk to a specific range of SoCs :) > >>>>> > >>>> Oops, I did test patch AM33xx assuming its equivalent to OMAP3, but UART > >>>> IP is quite different. I will respin the patch making sure, workaround > >>>> applies only to AM65x and K3 SoCs. > >>>> > >>>> Regards > >>>> Vignesh > >>>> > >>> What's the status here for AM65x? The issue remains present on that > >>> platform, and I was hoping to see a quick follow up that limit the fix > >>> to that target. > >> > >> Sorry for the delay, I am trying to find which other TI SoCs are > >> affected by this issue. But that exercise will need a bit more time. > >> Will send a fix to address K3 SoCs like AM65x today/tomo. > > > > This all reminds me the very similar issue one found on Intel integrated > > (Synopsys DesignWare based) UARTs: > > > > Hmm, yes, seems like common problem with some 8250 UARTs although not > all TI SoCs show this behavior even though they all claim 8250 compatible. > > > https://lore.kernel.org/linux-serial/20170206233000.3021-1-dianders@chromium.org/ > > I am not sure if reading UART_LSR is a good idea in the above patch. > Some flags in LSR register are cleared on read (at least that's the case > for UARTs on TI SoCs) and thus can result in loss of error/FIFO status > information. > > > https://lore.kernel.org/linux-serial/1440015124-28393-1-git-send-email-california.l.sullivan@intel.com/ > > > > Looks like this never made it. > > Given the quirks associated with 8250 UARTs, workarounds would need to > be tied to specific variants, so I don't know if its possible to > implement the fix in 8250 core IRQ handler. I believe they are all are derivatives from Synopsys DesignWare one or another version of it. > PS: v2 of $patch is already merged. I noticed after sending that email.
On Tue, Jul 13, 2021 at 11:54 AM Vignesh Raghavendra <vigneshr@ti.com> wrote: > On 7/13/21 1:57 AM, andy@surfacebook.localdomain wrote: > > Tue, Jun 22, 2021 at 11:53:38AM +0530, Vignesh Raghavendra kirjoitti: ... > > https://lore.kernel.org/linux-serial/20170206233000.3021-1-dianders@chromium.org/ > > I am not sure if reading UART_LSR is a good idea in the above patch. > Some flags in LSR register are cleared on read (at least that's the case > for UARTs on TI SoCs) and thus can result in loss of error/FIFO status > information. > > > https://lore.kernel.org/linux-serial/1440015124-28393-1-git-send-email-california.l.sullivan@intel.com/ > > Looks like this never made it. Forgot to react to the above. Yes, they never made it because I believe due to the exact reason you mentioned above. Also California set up different experiments IIRC and it shows that the problem didn;t fully disappear with his approach. But maybe yours will work better (at least it's not the first time I have seen it on different hardware according to people's contributions).
* Andy Shevchenko <andy.shevchenko@gmail.com> [210713 09:14]: > On Tue, Jul 13, 2021 at 11:54 AM Vignesh Raghavendra <vigneshr@ti.com> wrote: > > On 7/13/21 1:57 AM, andy@surfacebook.localdomain wrote: > > > Tue, Jun 22, 2021 at 11:53:38AM +0530, Vignesh Raghavendra kirjoitti: > > ... > > > > https://lore.kernel.org/linux-serial/20170206233000.3021-1-dianders@chromium.org/ > > > > I am not sure if reading UART_LSR is a good idea in the above patch. > > Some flags in LSR register are cleared on read (at least that's the case > > for UARTs on TI SoCs) and thus can result in loss of error/FIFO status > > information. > > > > > https://lore.kernel.org/linux-serial/1440015124-28393-1-git-send-email-california.l.sullivan@intel.com/ > > > > Looks like this never made it. > > Forgot to react to the above. Yes, they never made it because I > believe due to the exact reason you mentioned above. Also California > set up different experiments IIRC and it shows that the problem didn;t > fully disappear with his approach. But maybe yours will work better > (at least it's not the first time I have seen it on different hardware > according to people's contributions). Not sure if this is the same issue with noisy lines, but see also the following in case it's related: [PATCH 2/2] serial: 8250_omap: Handle optional overrun-throttle-ms property Also available at [0] below. Regards, Tony [0] https://lore.kernel.org/linux-omap/20210727103533.51547-1-tony@atomide.com/T/#m5f9da26c32503f2937d3d5977310ca337fa0cb5a
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c index 8ac11eaeca51..c71bd766fa56 100644 --- a/drivers/tty/serial/8250/8250_omap.c +++ b/drivers/tty/serial/8250/8250_omap.c @@ -104,6 +104,9 @@ #define UART_OMAP_EFR2 0x23 #define UART_OMAP_EFR2_TIMEOUT_BEHAVE BIT(6) +/* RX FIFO occupancy indicator */ +#define UART_OMAP_RX_LVL 0x64 + struct omap8250_priv { int line; u8 habit; @@ -625,6 +628,15 @@ static irqreturn_t omap8250_irq(int irq, void *dev_id) serial8250_rpm_get(up); iir = serial_port_in(port, UART_IIR); ret = serial8250_handle_irq(port, iir); + /* + * It is possible that RX TIMEOUT is signalled after FIFO + * has been drained, in which case a dummy read of RX FIFO is + * required to clear RX TIMEOUT condition. + */ + if ((iir & UART_IIR_RX_TIMEOUT) == UART_IIR_RX_TIMEOUT) { + if (serial_port_in(port, UART_OMAP_RX_LVL) == 0) + serial_port_in(port, UART_RX); + } serial8250_rpm_put(up); return IRQ_RETVAL(ret);
It is possible that RX TIMEOUT is signalled after RX FIFO has been drained, in which case a dummy read of RX FIFO is required to clear RX TIMEOUT condition. Otherwise, RX TIMEOUT condition is not cleared leading to an interrupt storm Cc: stable@vger.kernel.org Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Vignesh Raghavendra <vigneshr@ti.com> --- drivers/tty/serial/8250/8250_omap.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)