Message ID | 20191005105551.353273-1-hdegoede@redhat.com (mailing list archive) |
---|---|
State | Accepted |
Commit | bcf059578980526d6067a2b1b0cd8344b58bf2d5 |
Headers | show |
Series | [5.4,regression,fix] Input: soc_button_array - partial revert of support for newer surface devices | expand |
Hi, sorry for the inconvenience this change has caused. On 10/5/19 12:55 PM, Hans de Goede wrote: > Note ideally this seamingly unrelated change would have been made in a > separate commit, with a message explaining the what and why of this > change. Would I have known the impact, then yes. This change was added due to some reported instances where it seems that soc_button_array would occasionally load on MSHW0040 before the GPIO controller was ready, causing power and volume buttons to not work. > I guess this change may have been added to deal with -EPROBE_DEFER errors, Correct. After a comment mentioned that gpiod_get() returning -EPROBE_DEFER would be the proper way to detect this, I decided on this change. Might I suggest the following addition: Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> --- drivers/input/misc/soc_button_array.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/drivers/input/misc/soc_button_array.c b/drivers/input/misc/soc_button_array.c index 97e3639e99d0..a0f0c977b790 100644 --- a/drivers/input/misc/soc_button_array.c +++ b/drivers/input/misc/soc_button_array.c @@ -92,11 +92,18 @@ soc_button_device_create(struct platform_device *pdev, continue; gpio = soc_button_lookup_gpio(&pdev->dev, info->acpi_index); - if (gpio < 0 && gpio != -ENOENT) { - error = gpio; - goto err_free_mem; - } else if (!gpio_is_valid(gpio)) { - /* Skip GPIO if not present */ + if (!gpio_is_valid(gpio)) { + /* + * Skip GPIO if not present. Note we deliberately + * ignore -EPROBE_DEFER errors here. On some devices + * Intel is using so called virtual GPIOs which are not + * GPIOs at all but some way for AML code to check some + * random status bits without need a custom opregion. + * In some cases the resources table we parse points to + * such a virtual GPIO, since these are not real GPIOs + * we do not have a driver for these so they will never + * show up, therefor we ignore -EPROBE_DEFER. + */ continue; } @@ -429,6 +436,14 @@ static int soc_device_check_MSHW0040(struct device *dev) dev_dbg(dev, "OEM Platform Revision %llu\n", oem_platform_rev); + /* + * Explicitly check if GPIO controller is ready. This check is done here + * to avoid issues with virtual GPIOs on other chips, as elaborated above. + * We are at least expecting one GPIO pin for the power button (index 0). + */ + if (soc_button_lookup_gpio(dev, 0) == -EPROBE_DEFER) + return -EPROBE_DEFER; + return 0; }
On Sat, Oct 05, 2019 at 12:55:51PM +0200, Hans de Goede wrote: > Commit c394159310d0 ("Input: soc_button_array - add support for newer > surface devices") not only added support for the MSHW0040 ACPI HID, > but for some reason it also makes changes to the error handling of the > soc_button_lookup_gpio() call in soc_button_device_create(). Note ideally > this seamingly unrelated change would have been made in a separate commit, > with a message explaining the what and why of this change. > > I guess this change may have been added to deal with -EPROBE_DEFER errors, > but in case of the existing support for PNP0C40 devices, treating > -EPROBE_DEFER as any other error is deliberate, see the comment this > commit adds for why. > > The actual returning of -EPROBE_DEFER to the caller of soc_button_probe() > introduced by the new error checking causes a serious regression: > > On devices with so called virtual GPIOs soc_button_lookup_gpio() will > always return -EPROBE_DEFER for these fake GPIOs, when this happens > during the second call of soc_button_device_create() we already have > successfully registered our first child. This causes the kernel to think > we are making progress with probing things even though we unregister the > child before again before we return the -EPROBE_DEFER. Since we are making > progress the kernel will retry deferred-probes again immediately ending > up stuck in a loop with the following showing in dmesg: > > [ 124.022697] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6537 > [ 124.040764] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6538 > [ 124.056967] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6539 > [ 124.072143] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6540 > [ 124.092373] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6541 > [ 124.108065] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6542 > [ 124.128483] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6543 > [ 124.147141] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6544 > [ 124.165070] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6545 > [ 124.179775] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6546 > [ 124.202726] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6547 > <continues on and on and on> > > And 1 CPU core being stuck at 100% and udev hanging since it is waiting > for the modprobe of soc_button_array to return. > > This patch reverts the soc_button_lookup_gpio() error handling changes, > fixing this regression. I confirm this issue and I made similar hotfix while developing of other things, so it's actual and patch should work. I will test it at my hardware soon.
Hi, On 05-10-2019 14:17, Maximilian Luz wrote: > Hi, > > sorry for the inconvenience this change has caused. > > On 10/5/19 12:55 PM, Hans de Goede wrote: >> Note ideally this seamingly unrelated change would have been made in a >> separate commit, with a message explaining the what and why of this >> change. > > Would I have known the impact, then yes. This change was added due to > some reported instances where it seems that soc_button_array would > occasionally load on MSHW0040 before the GPIO controller was ready, > causing power and volume buttons to not work. > >> I guess this change may have been added to deal with -EPROBE_DEFER errors, > > Correct. After a comment mentioned that gpiod_get() returning > -EPROBE_DEFER would be the proper way to detect this, I decided on this > change. Ok, on x86 the GPIO drivers really should all be builtin because various ACPI methods including device D0 / D3 (power-on/off) methods may depend on them. So normally this should never happen. If this (-EPROBE_DEFER on surface devices) somehow still is happening please let me know and we will figure something out. > Might I suggest the following addition: > > Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> S-o-b is only for patches which pass through your hands, e.g. if you make changes to my patch and submit a v2 of it. I guess you mean / want one of: Acked-by: ... or Reviewed-by: ... ? Regards, Hans > --- > drivers/input/misc/soc_button_array.c | 25 ++++++++++++++++++++----- > 1 file changed, 20 insertions(+), 5 deletions(-) > > diff --git a/drivers/input/misc/soc_button_array.c b/drivers/input/misc/soc_button_array.c > index 97e3639e99d0..a0f0c977b790 100644 > --- a/drivers/input/misc/soc_button_array.c > +++ b/drivers/input/misc/soc_button_array.c > @@ -92,11 +92,18 @@ soc_button_device_create(struct platform_device *pdev, > continue; > > gpio = soc_button_lookup_gpio(&pdev->dev, info->acpi_index); > - if (gpio < 0 && gpio != -ENOENT) { > - error = gpio; > - goto err_free_mem; > - } else if (!gpio_is_valid(gpio)) { > - /* Skip GPIO if not present */ > + if (!gpio_is_valid(gpio)) { > + /* > + * Skip GPIO if not present. Note we deliberately > + * ignore -EPROBE_DEFER errors here. On some devices > + * Intel is using so called virtual GPIOs which are not > + * GPIOs at all but some way for AML code to check some > + * random status bits without need a custom opregion. > + * In some cases the resources table we parse points to > + * such a virtual GPIO, since these are not real GPIOs > + * we do not have a driver for these so they will never > + * show up, therefor we ignore -EPROBE_DEFER. > + */ > continue; > } > > @@ -429,6 +436,14 @@ static int soc_device_check_MSHW0040(struct device *dev) > > dev_dbg(dev, "OEM Platform Revision %llu\n", oem_platform_rev); > > + /* > + * Explicitly check if GPIO controller is ready. This check is done here > + * to avoid issues with virtual GPIOs on other chips, as elaborated above. > + * We are at least expecting one GPIO pin for the power button (index 0). > + */ > + if (soc_button_lookup_gpio(dev, 0) == -EPROBE_DEFER) > + return -EPROBE_DEFER; > + > return 0; > } >
Hi, On 10/5/19 3:20 PM, Hans de Goede wrote: > Hi, > > S-o-b is only for patches which pass through your hands, e.g. if > you make changes to my patch and submit a v2 of it. > > I guess you mean / want one of: > > Acked-by: ... > > or > > Reviewed-by: ... > > > ? I apologize if the s-o-b was misplaced here, it was meant for the modification proposed below. I can send a v2 if requested, but I felt this was a change that I wanted to bring up here first, to see if anyone has any comments on it. Regards, Maximilian
Hi, again On 10/5/19 3:20 PM, Hans de Goede wrote: > Ok, on x86 the GPIO drivers really should all be builtin because > various ACPI methods including device D0 / D3 (power-on/off) methods > may depend on them. So normally this should never happen. > > If this (-EPROBE_DEFER on surface devices) somehow still is happening > please let me know and we will figure something out. I have never personally experienced this, only received reports which indicated this and that the change (as well as manually reloading soc_button_array) fixed it. I will come back to you if I hear anything in regards to this again. I have now also tested your patch on the Surface Book 2. Does not cause any issues as far as I can tell. Tested-by: Maximilian Luz <luzmaximilian@gmail.com> And if that is needed/wanted Acked-by: Maximilian Luz <luzmaximilian@gmail.com> Regards, Maximilian
Hi, On 05-10-2019 17:01, Maximilian Luz wrote: > Hi, again > > On 10/5/19 3:20 PM, Hans de Goede wrote: >> Ok, on x86 the GPIO drivers really should all be builtin because >> various ACPI methods including device D0 / D3 (power-on/off) methods >> may depend on them. So normally this should never happen. >> >> If this (-EPROBE_DEFER on surface devices) somehow still is happening >> please let me know and we will figure something out. > > I have never personally experienced this, only received reports which > indicated this and that the change (as well as manually reloading > soc_button_array) fixed it. I will come back to you if I hear anything > in regards to this again. > > I have now also tested your patch on the Surface Book 2. Does not cause > any issues as far as I can tell. > > Tested-by: Maximilian Luz <luzmaximilian@gmail.com> > > And if that is needed/wanted > > Acked-by: Maximilian Luz <luzmaximilian@gmail.com> Great, thank you. Regards, Hans
On Sat, Oct 05, 2019 at 12:55:51PM +0200, Hans de Goede wrote: > Commit c394159310d0 ("Input: soc_button_array - add support for newer > surface devices") not only added support for the MSHW0040 ACPI HID, > but for some reason it also makes changes to the error handling of the > soc_button_lookup_gpio() call in soc_button_device_create(). Note ideally > this seamingly unrelated change would have been made in a separate commit, > with a message explaining the what and why of this change. > > I guess this change may have been added to deal with -EPROBE_DEFER errors, > but in case of the existing support for PNP0C40 devices, treating > -EPROBE_DEFER as any other error is deliberate, see the comment this > commit adds for why. > > The actual returning of -EPROBE_DEFER to the caller of soc_button_probe() > introduced by the new error checking causes a serious regression: > > On devices with so called virtual GPIOs soc_button_lookup_gpio() will > always return -EPROBE_DEFER for these fake GPIOs, when this happens > during the second call of soc_button_device_create() we already have > successfully registered our first child. This causes the kernel to think > we are making progress with probing things even though we unregister the > child before again before we return the -EPROBE_DEFER. Since we are making > progress the kernel will retry deferred-probes again immediately ending > up stuck in a loop with the following showing in dmesg: > > [ 124.022697] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6537 > [ 124.040764] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6538 > [ 124.056967] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6539 > [ 124.072143] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6540 > [ 124.092373] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6541 > [ 124.108065] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6542 > [ 124.128483] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6543 > [ 124.147141] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6544 > [ 124.165070] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6545 > [ 124.179775] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6546 > [ 124.202726] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6547 > <continues on and on and on> > > And 1 CPU core being stuck at 100% and udev hanging since it is waiting > for the modprobe of soc_button_array to return. > > This patch reverts the soc_button_lookup_gpio() error handling changes, > fixing this regression. > > Fixes: c394159310d0 ("Input: soc_button_array - add support for newer surface devices") > BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205031 > Cc: Maximilian Luz <luzmaximilian@gmail.com> > Signed-off-by: Hans de Goede <hdegoede@redhat.com> Applied, thank you. > --- > drivers/input/misc/soc_button_array.c | 17 ++++++++++++----- > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/drivers/input/misc/soc_button_array.c b/drivers/input/misc/soc_button_array.c > index 97e3639e99d0..97761421d6dd 100644 > --- a/drivers/input/misc/soc_button_array.c > +++ b/drivers/input/misc/soc_button_array.c > @@ -92,11 +92,18 @@ soc_button_device_create(struct platform_device *pdev, > continue; > > gpio = soc_button_lookup_gpio(&pdev->dev, info->acpi_index); > - if (gpio < 0 && gpio != -ENOENT) { > - error = gpio; > - goto err_free_mem; > - } else if (!gpio_is_valid(gpio)) { > - /* Skip GPIO if not present */ > + if (!gpio_is_valid(gpio)) { > + /* > + * Skip GPIO if not present. Note we deliberately > + * ignore -EPROBE_DEFER errors here. On some devices > + * Intel is using so called virtual GPIOs which are not > + * GPIOs at all but some way for AML code to check some > + * random status bits without need a custom opregion. > + * In some cases the resources table we parse points to > + * such a virtual GPIO, since these are not real GPIOs > + * we do not have a driver for these so they will never > + * show up, therefor we ignore -EPROBE_DEFER. > + */ > continue; > } > > -- > 2.23.0 >
diff --git a/drivers/input/misc/soc_button_array.c b/drivers/input/misc/soc_button_array.c index 97e3639e99d0..97761421d6dd 100644 --- a/drivers/input/misc/soc_button_array.c +++ b/drivers/input/misc/soc_button_array.c @@ -92,11 +92,18 @@ soc_button_device_create(struct platform_device *pdev, continue; gpio = soc_button_lookup_gpio(&pdev->dev, info->acpi_index); - if (gpio < 0 && gpio != -ENOENT) { - error = gpio; - goto err_free_mem; - } else if (!gpio_is_valid(gpio)) { - /* Skip GPIO if not present */ + if (!gpio_is_valid(gpio)) { + /* + * Skip GPIO if not present. Note we deliberately + * ignore -EPROBE_DEFER errors here. On some devices + * Intel is using so called virtual GPIOs which are not + * GPIOs at all but some way for AML code to check some + * random status bits without need a custom opregion. + * In some cases the resources table we parse points to + * such a virtual GPIO, since these are not real GPIOs + * we do not have a driver for these so they will never + * show up, therefor we ignore -EPROBE_DEFER. + */ continue; }
Commit c394159310d0 ("Input: soc_button_array - add support for newer surface devices") not only added support for the MSHW0040 ACPI HID, but for some reason it also makes changes to the error handling of the soc_button_lookup_gpio() call in soc_button_device_create(). Note ideally this seamingly unrelated change would have been made in a separate commit, with a message explaining the what and why of this change. I guess this change may have been added to deal with -EPROBE_DEFER errors, but in case of the existing support for PNP0C40 devices, treating -EPROBE_DEFER as any other error is deliberate, see the comment this commit adds for why. The actual returning of -EPROBE_DEFER to the caller of soc_button_probe() introduced by the new error checking causes a serious regression: On devices with so called virtual GPIOs soc_button_lookup_gpio() will always return -EPROBE_DEFER for these fake GPIOs, when this happens during the second call of soc_button_device_create() we already have successfully registered our first child. This causes the kernel to think we are making progress with probing things even though we unregister the child before again before we return the -EPROBE_DEFER. Since we are making progress the kernel will retry deferred-probes again immediately ending up stuck in a loop with the following showing in dmesg: [ 124.022697] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6537 [ 124.040764] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6538 [ 124.056967] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6539 [ 124.072143] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6540 [ 124.092373] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6541 [ 124.108065] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6542 [ 124.128483] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6543 [ 124.147141] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6544 [ 124.165070] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6545 [ 124.179775] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6546 [ 124.202726] input: gpio-keys as /devices/platform/INTCFD9:00/gpio-keys.0.auto/input/input6547 <continues on and on and on> And 1 CPU core being stuck at 100% and udev hanging since it is waiting for the modprobe of soc_button_array to return. This patch reverts the soc_button_lookup_gpio() error handling changes, fixing this regression. Fixes: c394159310d0 ("Input: soc_button_array - add support for newer surface devices") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205031 Cc: Maximilian Luz <luzmaximilian@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> --- drivers/input/misc/soc_button_array.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)