diff mbox series

[v2,1/2] platform/surface: aggregator: Defer probing when serdev is not ready

Message ID 20240503030900.1334763-2-weifeng.liu.z@gmail.com (mailing list archive)
State Superseded, archived
Headers show
Series Defer probing of SAM if serdev device is not ready | expand

Commit Message

Weifeng Liu May 3, 2024, 3:08 a.m. UTC
This is an attempt to alleviate race conditions in the SAM driver where
essential resources like serial device and GPIO pins are not ready at
the time ssam_serial_hub_probe() is called.  Instead of giving up
probing, a better way would be to defer the probing by returning
-EPROBE_DEFER, allowing the kernel try again later.

However, there is no way of identifying all such cases from other real
errors in a few days.  So let's take a gradual approach identify and
address these cases as they arise.  This commit marks the initial step
in this process.

Suggested-by: Maximilian Luz <luzmaximilian@gmail.com>
Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Weifeng Liu <weifeng.liu.z@gmail.com>
---
 drivers/platform/surface/aggregator/core.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Andy Shevchenko May 3, 2024, 5:03 p.m. UTC | #1
On Fri, May 03, 2024 at 11:08:46AM +0800, Weifeng Liu wrote:
> This is an attempt to alleviate race conditions in the SAM driver where
> essential resources like serial device and GPIO pins are not ready at
> the time ssam_serial_hub_probe() is called.  Instead of giving up
> probing, a better way would be to defer the probing by returning
> -EPROBE_DEFER, allowing the kernel try again later.
> 
> However, there is no way of identifying all such cases from other real
> errors in a few days.  So let's take a gradual approach identify and
> address these cases as they arise.  This commit marks the initial step
> in this process.

...

> +	/*
> +	 * The following step can fail when it's called too early before the
> +	 * underlying UART device is ready (in this case -ENXIO is returned).
> +	 * Instead of simply giving up and losing everything, we can defer
> +	 * the probing by returning -EPROBE_DEFER so that the kernel would be
> +	 * able to retry later.
> +	 */

You can add the following to the
serial_core.c (at the top after the headers)

#undef ENXIO
#define ENXIO __LINE__

And I'm pretty much sure it will point out you to the uart_port_activate().
If it's the case you may elaborate this in the comment.
Otherwise you may add the same hack to other files and find the culprit.

Also it might be that we add some error code substitution inside serdev core.
At least there more data is available to make the (better) decision.
Weifeng Liu May 4, 2024, 4:17 a.m. UTC | #2
Hi Andy,

On Fri, 2024-05-03 at 20:03 +0300, Andy Shevchenko wrote:
> On Fri, May 03, 2024 at 11:08:46AM +0800, Weifeng Liu wrote:
> > This is an attempt to alleviate race conditions in the SAM driver where
> > essential resources like serial device and GPIO pins are not ready at
> > the time ssam_serial_hub_probe() is called.  Instead of giving up
> > probing, a better way would be to defer the probing by returning
> > -EPROBE_DEFER, allowing the kernel try again later.
> > 
> > However, there is no way of identifying all such cases from other real
> > errors in a few days.  So let's take a gradual approach identify and
> > address these cases as they arise.  This commit marks the initial step
> > in this process.
> 
> ...
> 
> > +	/*
> > +	 * The following step can fail when it's called too early before the
> > +	 * underlying UART device is ready (in this case -ENXIO is returned).
> > +	 * Instead of simply giving up and losing everything, we can defer
> > +	 * the probing by returning -EPROBE_DEFER so that the kernel would be
> > +	 * able to retry later.
> > +	 */
> 
> You can add the following to the
> serial_core.c (at the top after the headers)
> 
> #undef ENXIO
> #define ENXIO __LINE__
> 
> And I'm pretty much sure it will point out you to the uart_port_activate().
> If it's the case you may elaborate this in the comment.
> Otherwise you may add the same hack to other files and find the culprit.

Indeed, uart_port_activate() is the function where we gets errno -
ENXIO.  Please see the cover letter [1] of this series:


  ssam_serial_hub_probe()
  serdev_device_open()
  ctrl->ops->open()	/* this callback being ttyport_open() */
  tty->ops->open()	/* this callback being uart_open() */
  tty_port_open()
  port->ops->activate()	/* this callback being uart_port_activate() */
  Find bit UPF_DEAD is set in uport->flags and fail with errno -ENXIO.

What confuses me is that when ssam_serial_hub_probe() (probe callback
of a serdev driver) gets called and tries to open the provided serdev
device by serdev_device_open(), it simply fails to open it...  The
serdev device is not the kind of auxiliary devices but the main device
this driver is to manage.  And I don't find other serdev driver doing
this sort of checking and returning -EPROBE_DEFER thing when opening
serdev device.  Thus, from the perspective of a newcomer to this area,
I think this phenomenon is somewhat abnormal.  Maybe somehow the
initializing of the UART device and probing of serdev driver are
interleaved...

Andy, do you have any idea what's going wrong here?  Or do you think
this is an expected behavior?

> 
> Also it might be that we add some error code substitution inside serdev core.
> At least there more data is available to make the (better) decision.
> 

Agree.  If failing in serdev_device_open() is common in a serdev
driver, we'd better handle it properly inside serdev core and convey
explicit semantics to the caller.

Best regards,
Weifeng

[1]
https://lore.kernel.org/linux-serial/20240503030900.1334763-1-weifeng.liu.z@gmail.com/T/#m40d73c44bf92ad45e4fca5ed5f01f9c11e30adb1
diff mbox series

Patch

diff --git a/drivers/platform/surface/aggregator/core.c b/drivers/platform/surface/aggregator/core.c
index 9591a28bc38a..87dea91f91fe 100644
--- a/drivers/platform/surface/aggregator/core.c
+++ b/drivers/platform/surface/aggregator/core.c
@@ -645,9 +645,22 @@  static int ssam_serial_hub_probe(struct serdev_device *serdev)
 	/* Set up serdev device. */
 	serdev_device_set_drvdata(serdev, ctrl);
 	serdev_device_set_client_ops(serdev, &ssam_serdev_ops);
+
+	/*
+	 * The following step can fail when it's called too early before the
+	 * underlying UART device is ready (in this case -ENXIO is returned).
+	 * Instead of simply giving up and losing everything, we can defer
+	 * the probing by returning -EPROBE_DEFER so that the kernel would be
+	 * able to retry later.
+	 */
 	status = serdev_device_open(serdev);
-	if (status)
+	if (status == -ENXIO)
+		status = -EPROBE_DEFER;
+	if (status) {
+		dev_err_probe(&serdev->dev, status,
+			      "failed to open serdev device\n");
 		goto err_devopen;
+	}
 
 	astatus = ssam_serdev_setup_via_acpi(ssh->handle, serdev);
 	if (ACPI_FAILURE(astatus)) {