mbox series

[v2,0/3] serial: qcom-geni: fix lockups

Message ID 20240704101805.30612-1-johan+linaro@kernel.org (mailing list archive)
Headers show
Series serial: qcom-geni: fix lockups | expand

Message

Johan Hovold July 4, 2024, 10:18 a.m. UTC
Since 6.10-rc1, Qualcomm machines with a serial port can easily lock up
hard, for example, when stopping a getty on reboot.

This was triggered by the kfifo conversion, which turned an existing bug
that caused the driver to print discarded characters after a buffer
flush into a hard lockup.

This series fixes the regression and a related soft lockup issue that
can be triggered on software flow control and on suspend.

Doug has posted an alternative series of fixes here that depends on
reworking the driver a fair bit here:

	https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/

This rework has a significant impact on performance on some platforms,
but fortunately it seems such a rework can be avoided.

There are further bugs in the console code (e.g. that can lead to lost
characters) that this series does not address, but those can be fixed
separately (and I've started working on that).

Johan


Changes in v2
 - restart tx by issuing a transfer command when there is stale data in
   the fifo
 - reorder and rename patches
 - rename cancel tx helper

Johan Hovold (3):
  serial: qcom-geni: fix soft lockup on sw flow control and suspend
  serial: qcom-geni: fix hard lockup on buffer flush
  serial: qcom-geni: do not kill the machine on fifo underrun

 drivers/tty/serial/qcom_geni_serial.c | 51 ++++++++++++++++++++-------
 1 file changed, 38 insertions(+), 13 deletions(-)

Comments

Greg KH July 4, 2024, 10:30 a.m. UTC | #1
On Thu, Jul 04, 2024 at 12:18:02PM +0200, Johan Hovold wrote:
> Since 6.10-rc1, Qualcomm machines with a serial port can easily lock up
> hard, for example, when stopping a getty on reboot.
> 
> This was triggered by the kfifo conversion, which turned an existing bug
> that caused the driver to print discarded characters after a buffer
> flush into a hard lockup.
> 
> This series fixes the regression and a related soft lockup issue that
> can be triggered on software flow control and on suspend.
> 
> Doug has posted an alternative series of fixes here that depends on
> reworking the driver a fair bit here:
> 
> 	https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
> 
> This rework has a significant impact on performance on some platforms,
> but fortunately it seems such a rework can be avoided.
> 
> There are further bugs in the console code (e.g. that can lead to lost
> characters) that this series does not address, but those can be fixed
> separately (and I've started working on that).

I'll take these now, thanks!

greg k-h
Doug Anderson July 8, 2024, 11:57 p.m. UTC | #2
Johan,

On Thu, Jul 4, 2024 at 3:31 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, Jul 04, 2024 at 12:18:02PM +0200, Johan Hovold wrote:
> > Since 6.10-rc1, Qualcomm machines with a serial port can easily lock up
> > hard, for example, when stopping a getty on reboot.
> >
> > This was triggered by the kfifo conversion, which turned an existing bug
> > that caused the driver to print discarded characters after a buffer
> > flush into a hard lockup.
> >
> > This series fixes the regression and a related soft lockup issue that
> > can be triggered on software flow control and on suspend.
> >
> > Doug has posted an alternative series of fixes here that depends on
> > reworking the driver a fair bit here:
> >
> >       https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
> >
> > This rework has a significant impact on performance on some platforms,
> > but fortunately it seems such a rework can be avoided.
> >
> > There are further bugs in the console code (e.g. that can lead to lost
> > characters) that this series does not address, but those can be fixed
> > separately (and I've started working on that).
>
> I'll take these now, thanks!

Are you going to continue to work on the driver? There are still some
pretty bad bugs including ones that are affecting Collabora's test
labs. Unless you want to try to tackle it some other way, I'm going to
keep pushing for something like my original series to land. I can
re-post them atop your patches since they've landed. This will regress
your performance but correctness trumps performance.

-Doug
Johan Hovold July 9, 2024, 9:32 a.m. UTC | #3
Hi Doug,

Hope you had a good holiday.

On Mon, Jul 08, 2024 at 04:57:55PM -0700, Doug Anderson wrote:
> On Thu, Jul 4, 2024 at 3:31 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Thu, Jul 04, 2024 at 12:18:02PM +0200, Johan Hovold wrote:
> > > Since 6.10-rc1, Qualcomm machines with a serial port can easily lock up
> > > hard, for example, when stopping a getty on reboot.
> > >
> > > This was triggered by the kfifo conversion, which turned an existing bug
> > > that caused the driver to print discarded characters after a buffer
> > > flush into a hard lockup.
> > >
> > > This series fixes the regression and a related soft lockup issue that
> > > can be triggered on software flow control and on suspend.
> > >
> > > Doug has posted an alternative series of fixes here that depends on
> > > reworking the driver a fair bit here:
> > >
> > >       https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
> > >
> > > This rework has a significant impact on performance on some platforms,
> > > but fortunately it seems such a rework can be avoided.
> > >
> > > There are further bugs in the console code (e.g. that can lead to lost
> > > characters) that this series does not address, but those can be fixed
> > > separately (and I've started working on that).
> >
> > I'll take these now, thanks!
> 
> Are you going to continue to work on the driver? There are still some
> pretty bad bugs including ones that are affecting Collabora's test
> labs. Unless you want to try to tackle it some other way, I'm going to
> keep pushing for something like my original series to land. I can
> re-post them atop your patches since they've landed. This will regress
> your performance but correctness trumps performance.

Yes, I have a working fix for the console issue that could lead to lost
characters. I need to spend some time on other things this week, but I
intend to follow with fixes for the remaining issues as well.

Johan