[0/3] serial: qcom-geni: fix lockups

Message ID 20240624133135.7445-1-johan+linaro@kernel.org

Message

Johan Hovold June 24, 2024, 1:31 p.m. UTC
Since 6.10-rc1, Qualcomm machines with a serial port can easily lock up
hard, for example, when stopping a getty on reboot.

The first patch in this series fixes this severe regression by restoring
the pre-6.10-rc1 behaviour of printing additional characters when
flushing the tx buffer.

The second patch fixes a long-standing issue in the GENI driver which
can lead to a soft lock up when using software flow control and on
suspend.

The third patch addresses the old issue with additional characters
being printed when flushing the buffer.

Note that timeouts used when clearing the tx fifo are a bit excessive
since I'm reusing the current qcom_geni_serial_poll_bit() helper for
now.

I think at least the first patch should be merged for rc6 while we
consider the best way forward to address the remaining issues.

Doug has posted an alternative series of fixes that depends on
reworking the driver a fair bit here:

	https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/

Johan


Johan Hovold (3):
  serial: qcom-geni: fix hard lockup on buffer flush
  serial: qcom-geni: fix soft lockup on sw flow control and suspend
  serial: qcom-geni: fix garbage output after buffer flush

 drivers/tty/serial/qcom_geni_serial.c | 36 +++++++++++++++++++--------
 1 file changed, 25 insertions(+), 11 deletions(-)

Comments

Greg Kroah-Hartman July 3, 2024, 2:09 p.m. UTC | #1
On Mon, Jun 24, 2024 at 03:31:32PM +0200, Johan Hovold wrote:
> Since 6.10-rc1, Qualcomm machines with a serial port can easily lock up
> hard, for example, when stopping a getty on reboot.
> 
> The first patch in this series fixes this severe regression by restoring
> the pre-6.10-rc1 behaviour of printing additional characters when
> flushing the tx buffer.
> 
> The second patch fixes a long-standing issue in the GENI driver which
> can lead to a soft lock up when using software flow control and on
> suspend.
> 
> The third patch addresses the old issue with additional characters
> being printed when flushing the buffer.
> 
> Note that timeouts used when clearing the tx fifo are a bit excessive
> since I'm reusing the current qcom_geni_serial_poll_bit() helper for
> now.
> 
> I think at least the first patch should be merged for rc6 while we
> consider the best way forward to address the remaining issues.
> 
> Doug has posted an alternative series of fixes that depends on
> reworking the driver a fair bit here:
> 
> 	https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/

I'm confused.  Should I take this series, or Doug's, or Doug's single
patch that they say resolves the immediate issue?  I can't tell what was
agreed on here at all, so I'm going to drop all of these patches and
wait for a resubmission that everyone agrees should be what is taken...

thanks,

greg k-h

Johan Hovold July 3, 2024, 2:13 p.m. UTC | #2
On Wed, Jul 03, 2024 at 04:09:22PM +0200, Greg Kroah-Hartman wrote:
> On Mon, Jun 24, 2024 at 03:31:32PM +0200, Johan Hovold wrote:
> > Since 6.10-rc1, Qualcomm machines with a serial port can easily lock up
> > hard, for example, when stopping a getty on reboot.
> > 
> > The first patch in this series fixes this severe regression by restoring
> > the pre-6.10-rc1 behaviour of printing additional characters when
> > flushing the tx buffer.
> > 
> > The second patch fixes a long-standing issue in the GENI driver which
> > can lead to a soft lock up when using software flow control and on
> > suspend.
> > 
> > The third patch addresses the old issue with additional characters
> > being printed when flushing the buffer.
> > 
> > Note that timeouts used when clearing the tx fifo are a bit excessive
> > since I'm reusing the current qcom_geni_serial_poll_bit() helper for
> > now.
> > 
> > I think at least the first patch should be merged for rc6 while we
> > consider the best way forward to address the remaining issues.
> > 
> > Doug has posted an alternative series of fixes that depends on
> > reworking the driver a fair bit here:
> > 
> > 	https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
> 
> I'm confused.  Should I take this series, or Doug's, or Doug's single
> patch that they say resolves the immediate issue?  I can't tell what was
> agreed on here at all, so I'm going to drop all of these patches and
> wait for a resubmission that everyone agrees should be what is taken...

Yes, sorry about the confusion. I'm preparing a v2 of this series which
fixes the regression without the downsides of Doug's first series or
minimal fix. I should be able to post it tomorrow.

Johan