diff mbox series

[v2,03/13] firmware: google: Test spinlock on panic path to avoid lockups

Message ID 20220719195325.402745-4-gpiccoli@igalia.com (mailing list archive)
State Not Applicable
Headers show
Series The panic notifiers refactor strikes back - fixes/clean-ups | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch, async

Commit Message

Guilherme G. Piccoli July 19, 2022, 7:53 p.m. UTC
Currently the gsmi driver registers a panic notifier as well as
reboot and die notifiers. The callbacks registered are called in
atomic and very limited context - for instance, panic disables
preemption and local IRQs, also all secondary CPUs (not executing
the panic path) are shutdown.

With that said, taking a spinlock in this scenario is a dangerous
invitation for lockup scenarios. So, fix that by checking if the
spinlock is free to acquire in the panic notifier callback - if not,
bail-out and avoid a potential hang.

Fixes: 74c5b31c6618 ("driver: Google EFI SMI")
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: David Gow <davidgow@google.com>
Cc: Evan Green <evgreen@chromium.org>
Cc: Julius Werner <jwerner@chromium.org>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

---

V2:
- do not use spin_trylock anymore, to avoid messing with
non-panic paths; now we just check the spinlock state in
the panic notifier before taking it. Thanks Evan for the
review/idea!

 drivers/firmware/google/gsmi.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Guilherme G. Piccoli Aug. 7, 2022, 3:38 p.m. UTC | #1
On 19/07/2022 16:53, Guilherme G. Piccoli wrote:
> Currently the gsmi driver registers a panic notifier as well as
> reboot and die notifiers. The callbacks registered are called in
> atomic and very limited context - for instance, panic disables
> preemption and local IRQs, also all secondary CPUs (not executing
> the panic path) are shutdown.
> 
> With that said, taking a spinlock in this scenario is a dangerous
> invitation for lockup scenarios. So, fix that by checking if the
> spinlock is free to acquire in the panic notifier callback - if not,
> bail-out and avoid a potential hang.
> 
> Fixes: 74c5b31c6618 ("driver: Google EFI SMI")
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: David Gow <davidgow@google.com>
> Cc: Evan Green <evgreen@chromium.org>
> Cc: Julius Werner <jwerner@chromium.org>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> 
> ---
> 
> V2:
> - do not use spin_trylock anymore, to avoid messing with
> non-panic paths; now we just check the spinlock state in
> the panic notifier before taking it. Thanks Evan for the
> review/idea!
> 
>  drivers/firmware/google/gsmi.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> [...]

Hi Evan, do you think this one is good now, based on your previous review?

Appreciate any feedback!
Cheers,


Guilherme
Evan Green Aug. 8, 2022, 5:07 a.m. UTC | #2
On Tue, Jul 19, 2022 at 12:55 PM Guilherme G. Piccoli
<gpiccoli@igalia.com> wrote:
>
> Currently the gsmi driver registers a panic notifier as well as
> reboot and die notifiers. The callbacks registered are called in
> atomic and very limited context - for instance, panic disables
> preemption and local IRQs, also all secondary CPUs (not executing
> the panic path) are shutdown.
>
> With that said, taking a spinlock in this scenario is a dangerous
> invitation for lockup scenarios. So, fix that by checking if the
> spinlock is free to acquire in the panic notifier callback - if not,
> bail-out and avoid a potential hang.
>
> Fixes: 74c5b31c6618 ("driver: Google EFI SMI")
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: David Gow <davidgow@google.com>
> Cc: Evan Green <evgreen@chromium.org>
> Cc: Julius Werner <jwerner@chromium.org>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

Reviewed-by: Evan Green <evgreen@chromium.org>
Guilherme G. Piccoli Aug. 8, 2022, 3:14 p.m. UTC | #3
On 08/08/2022 02:07, Evan Green wrote:
> On Tue, Jul 19, 2022 at 12:55 PM Guilherme G. Piccoli
> <gpiccoli@igalia.com> wrote:
>>
>> Currently the gsmi driver registers a panic notifier as well as
>> reboot and die notifiers. The callbacks registered are called in
>> atomic and very limited context - for instance, panic disables
>> preemption and local IRQs, also all secondary CPUs (not executing
>> the panic path) are shutdown.
>>
>> With that said, taking a spinlock in this scenario is a dangerous
>> invitation for lockup scenarios. So, fix that by checking if the
>> spinlock is free to acquire in the panic notifier callback - if not,
>> bail-out and avoid a potential hang.
>>
>> Fixes: 74c5b31c6618 ("driver: Google EFI SMI")
>> Cc: Ard Biesheuvel <ardb@kernel.org>
>> Cc: David Gow <davidgow@google.com>
>> Cc: Evan Green <evgreen@chromium.org>
>> Cc: Julius Werner <jwerner@chromium.org>
>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> 
> Reviewed-by: Evan Green <evgreen@chromium.org>

Thanks a bunch Evan!

Ard / Greg, do you think you could get this patch through your -next (or
-fixes) trees? Not sure which tree is the most common for picking GSMI
stuff.

I'm trying to get these fixes merged individually in their trees to not
stall the whole series and increase the burden of re-submitting.
Thanks in advance,


Guilherme
Greg KH Aug. 8, 2022, 3:26 p.m. UTC | #4
On Mon, Aug 08, 2022 at 12:14:30PM -0300, Guilherme G. Piccoli wrote:
> On 08/08/2022 02:07, Evan Green wrote:
> > On Tue, Jul 19, 2022 at 12:55 PM Guilherme G. Piccoli
> > <gpiccoli@igalia.com> wrote:
> >>
> >> Currently the gsmi driver registers a panic notifier as well as
> >> reboot and die notifiers. The callbacks registered are called in
> >> atomic and very limited context - for instance, panic disables
> >> preemption and local IRQs, also all secondary CPUs (not executing
> >> the panic path) are shutdown.
> >>
> >> With that said, taking a spinlock in this scenario is a dangerous
> >> invitation for lockup scenarios. So, fix that by checking if the
> >> spinlock is free to acquire in the panic notifier callback - if not,
> >> bail-out and avoid a potential hang.
> >>
> >> Fixes: 74c5b31c6618 ("driver: Google EFI SMI")
> >> Cc: Ard Biesheuvel <ardb@kernel.org>
> >> Cc: David Gow <davidgow@google.com>
> >> Cc: Evan Green <evgreen@chromium.org>
> >> Cc: Julius Werner <jwerner@chromium.org>
> >> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> > 
> > Reviewed-by: Evan Green <evgreen@chromium.org>
> 
> Thanks a bunch Evan!
> 
> Ard / Greg, do you think you could get this patch through your -next (or
> -fixes) trees? Not sure which tree is the most common for picking GSMI
> stuff.

Picking out an individual patch from a series with as many responses and
threads like this one is quite difficult.

Just resend this as a stand-alone patch if you want it applied
stand-alone as our tools want to apply a whole patch series at once.

> I'm trying to get these fixes merged individually in their trees to not
> stall the whole series and increase the burden of re-submitting.

The burden is on the submitter, not the maintainer as we have more
submitters than reviewers/maintainers.

thanks,

greg k-h
Guilherme G. Piccoli Aug. 8, 2022, 3:37 p.m. UTC | #5
On 08/08/2022 12:26, Greg Kroah-Hartman wrote:
> [...]
>>
>> Ard / Greg, do you think you could get this patch through your -next (or
>> -fixes) trees? Not sure which tree is the most common for picking GSMI
>> stuff.
> 
> Picking out an individual patch from a series with as many responses and
> threads like this one is quite difficult.
> 
> Just resend this as a stand-alone patch if you want it applied
> stand-alone as our tools want to apply a whole patch series at once.
> 
>> I'm trying to get these fixes merged individually in their trees to not
>> stall the whole series and increase the burden of re-submitting.
> 
> The burden is on the submitter, not the maintainer as we have more
> submitters than reviewers/maintainers.
> 

I understand, thanks for letting me know!

Let me clarify / ask something: this series, for example, is composed as
a bunch of patches "centered" around the same idea, panic notifiers
improvements/fixes. But its patches belong to completely different
subsystems, like EFI/misc, architectures (alpha, parisc, arm), core
kernel code, etc.

What is the best way of getting this merged?
(a) Re-send individual patches with the respective Review/ACK tags to
the proper subsystem, or;

(b) Wait until the whole series is ACKed/Reviewed, and a single
maintainer (like you or Andrew, for example) would pick the whole series
and apply at once, even if it spans across multiple parts of the kernel?

Let me know what is the general preference of the kernel maintainers,
and I'll gladly follow that =)

Thanks,


Guilherme
Greg KH Aug. 10, 2022, 12:54 p.m. UTC | #6
On Mon, Aug 08, 2022 at 12:37:46PM -0300, Guilherme G. Piccoli wrote:
> Let me clarify / ask something: this series, for example, is composed as
> a bunch of patches "centered" around the same idea, panic notifiers
> improvements/fixes. But its patches belong to completely different
> subsystems, like EFI/misc, architectures (alpha, parisc, arm), core
> kernel code, etc.
> 
> What is the best way of getting this merged?
> (a) Re-send individual patches with the respective Review/ACK tags to
> the proper subsystem, or;

Yes.

> (b) Wait until the whole series is ACKed/Reviewed, and a single
> maintainer (like you or Andrew, for example) would pick the whole series
> and apply at once, even if it spans across multiple parts of the kernel?

No, only do this after a kernel release cycle happens and there are
straggler patches that did not get picked up by the relevant subsystem
maintainers.

thanks,

greg k-h
diff mbox series

Patch

diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
index adaa492c3d2d..3ef5f3c0b4e4 100644
--- a/drivers/firmware/google/gsmi.c
+++ b/drivers/firmware/google/gsmi.c
@@ -681,6 +681,14 @@  static struct notifier_block gsmi_die_notifier = {
 static int gsmi_panic_callback(struct notifier_block *nb,
 			       unsigned long reason, void *arg)
 {
+	/*
+	 * Perform the lock check before effectively trying
+	 * to acquire it on gsmi_shutdown_reason() to avoid
+	 * potential lockups in atomic context.
+	 */
+	if (spin_is_locked(&gsmi_dev.lock))
+		return NOTIFY_DONE;
+
 	gsmi_shutdown_reason(GSMI_SHUTDOWN_PANIC);
 	return NOTIFY_DONE;
 }