Message ID | 20221127-snd-freeze-v8-3-3bc02d09f2ce@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Series | ASoC: SOF: Fix deadlock when shutdown a frozen userspace |
On 01.12.22 12:08, Ricardo Ribalda wrote:
> If we are shutting down due to kexec and the userspace is frozen, the
> system will stall forever waiting for userspace to complete.
>
> Do not wait for the clients to complete in that case.

Hi,

I am afraid I have to state that this approach is bad in every case,
not just this corner case. It basically means that user space can stall
the kernel for an arbitrary amount of time. And we cannot have that.

Regards
Oliver

Hi Oliver

Thanks for your review

On Thu, 1 Dec 2022 at 13:29, Oliver Neukum <oneukum@suse.com> wrote:
>
> On 01.12.22 12:08, Ricardo Ribalda wrote:
> > If we are shutting down due to kexec and the userspace is frozen, the
> > system will stall forever waiting for userspace to complete.
> >
> > Do not wait for the clients to complete in that case.
>
> Hi,
>
> I am afraid I have to state that this approach is bad in every case,
> not just this corner case. It basically means that user space can stall
> the kernel for an arbitrary amount of time. And we cannot have that.
>
> Regards
> Oliver

This patchset does not modify this behaviour. It simply fixes the
stall for kexec().

The patch that introduced the stall:
83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers
in .shutdown")

was sent as a generalised version of:
https://github.com/thesofproject/linux/pull/3388

AFAIK, we would need a similar patch for every single board.... which
I am not sure it is doable in a reasonable timeframe.

In the meantime this seems like a decent compromise. Yes, a
misbehaving userspace can still stall during suspend, but that was
not introduced in this patch.

Regards!

On 01.12.22 14:03, Ricardo Ribalda wrote:

Hi,

> This patchset does not modify this behaviour. It simply fixes the
> stall for kexec().
>
> The patch that introduced the stall:
> 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers
> in .shutdown")

That patch is problematic. I would go as far as saying that
it needs to be reverted.

> was sent as a generalised version of:
> https://github.com/thesofproject/linux/pull/3388
>
> AFAIK, we would need a similar patch for every single board.... which
> I am not sure it is doable in a reasonable timeframe.
>
> In the meantime this seems like a decent compromise. Yes, a
> misbehaving userspace can still stall during suspend, but that was
> not introduced in this patch.

Well, I mean if you know what is wrong then I'd say at least return to
a sanely broken state.

The whole approach is wrong. You need to be able to deal with user
space talking to removed devices by returning an error and keeping
the resources associated with the open file allocated until
user space calls close().

Regards
Oliver

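The pattern Oliver describes (fail I/O on a removed device with an error, but keep the per-file resources alive until close()) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name below (`demo_dev`, `demo_open`, `demo_remove`, and so on) is hypothetical and not a kernel or ALSA API, and the model is single-threaded, so it omits the locking a real driver would need.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device model; none of these names are kernel APIs. */
struct demo_dev {
	int refcount;      /* one reference for the driver plus one per open file */
	bool disconnected; /* set once the device has been removed */
	int *freed_flag;   /* observation hook: set to 1 when memory is released */
};

struct demo_file {
	struct demo_dev *dev;
};

static struct demo_dev *demo_dev_create(int *freed_flag)
{
	struct demo_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1; /* the driver's own reference */
	dev->freed_flag = freed_flag;
	return dev;
}

/* Drop one reference; the last holder frees the device state. */
static void demo_dev_put(struct demo_dev *dev)
{
	if (--dev->refcount == 0) {
		if (dev->freed_flag)
			*dev->freed_flag = 1;
		free(dev);
	}
}

static int demo_open(struct demo_dev *dev, struct demo_file *file)
{
	if (dev->disconnected)
		return -ENODEV;
	dev->refcount++; /* the open file pins the device state */
	file->dev = dev;
	return 0;
}

static int demo_write(struct demo_file *file, const void *buf, size_t len)
{
	(void)buf;
	if (file->dev->disconnected)
		return -ENODEV; /* device is gone: fail, do not block */
	return (int)len;
}

static void demo_close(struct demo_file *file)
{
	demo_dev_put(file->dev);
	file->dev = NULL;
}

/* Removal marks the device dead and returns without waiting for user space. */
static void demo_remove(struct demo_dev *dev)
{
	dev->disconnected = true;
	demo_dev_put(dev);
}
```

In this model `demo_remove()` returns immediately even while a file is open; later writes on that file return `-ENODEV`, and the memory is released only when the last holder calls `demo_close()`.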
Hi Oliver

On Thu, 1 Dec 2022 at 14:22, 'Oliver Neukum' via Chromeos Kdump <chromeos-kdump@google.com> wrote:
>
> On 01.12.22 14:03, Ricardo Ribalda wrote:
>
> Hi,
>
> > This patchset does not modify this behaviour. It simply fixes the
> > stall for kexec().
> >
> > The patch that introduced the stall:
> > 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers
> > in .shutdown")
>
> That patch is problematic. I would go as far as saying that
> it needs to be reverted.
>

It fixes a real issue. We have not had any complaints until we tried
to kexec in the platform. I won't recommend reverting it until we have
an alternative implementation.

kexec is far less common than suspend/reboot.

> > was sent as a generalised version of:
> > https://github.com/thesofproject/linux/pull/3388
> >
> > AFAIK, we would need a similar patch for every single board.... which
> > I am not sure it is doable in a reasonable timeframe.
> >
> > In the meantime this seems like a decent compromise. Yes, a
> > misbehaving userspace can still stall during suspend, but that was
> > not introduced in this patch.
>
> Well, I mean if you know what is wrong then I'd say at least return to
> a sanely broken state.
>
> The whole approach is wrong. You need to be able to deal with user
> space talking to removed devices by returning an error and keeping
> the resources associated with the open file allocated until
> user space calls close().

In general, the whole shutdown is broken for all the subsystems ;). It
is a complicated issue. Users handling fds, devices with DMAs in the
middle of an operation, dma fences....

Unfortunately I am not that familiar with the sound subsystem to make
a proper patch for this.

>
> Regards
> Oliver

On Thu, 01 Dec 2022 14:22:12 +0100, Oliver Neukum wrote:
>
> On 01.12.22 14:03, Ricardo Ribalda wrote:
>
> Hi,
>
> > This patchset does not modify this behaviour. It simply fixes the
> > stall for kexec().
> >
> > The patch that introduced the stall:
> > 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers
> > in .shutdown")
>
> That patch is problematic. I would go as far as saying that
> it needs to be reverted.

... or fixed.

> > was sent as a generalised version of:
> > https://github.com/thesofproject/linux/pull/3388
> >
> > AFAIK, we would need a similar patch for every single board.... which
> > I am not sure it is doable in a reasonable timeframe.
> >
> > In the meantime this seems like a decent compromise. Yes, a
> > misbehaving userspace can still stall during suspend, but that was
> > not introduced in this patch.
>
> Well, I mean if you know what is wrong then I'd say at least return to
> a sanely broken state.
>
> The whole approach is wrong. You need to be able to deal with user
> space talking to removed devices by returning an error and keeping
> the resources associated with the open file allocated until
> user space calls close().

As I already mentioned in another thread, if the user-space action has
to be cut off, we just need to call snd_card_disconnect() instead
without sync.

A quick hack would be like below (totally untested and might be wrong,
though).

In any case, Ricardo, please stop spinning too frequently; v8 in a few
days is way too much, and now the recipient list became unmanageable.
Let's give people some time to review and consider a better solution
at first.

thanks,

Takashi

-- 8< --
--- a/sound/soc/sof/core.c
+++ b/sound/soc/sof/core.c
@@ -475,7 +475,7 @@ EXPORT_SYMBOL(snd_sof_device_remove);
 int snd_sof_device_shutdown(struct device *dev)
 {
 	struct snd_sof_dev *sdev = dev_get_drvdata(dev);
-	struct snd_sof_pdata *pdata = sdev->pdata;
+	struct snd_soc_component *component;
 
 	if (IS_ENABLED(CONFIG_SND_SOC_SOF_PROBE_WORK_QUEUE))
 		cancel_work_sync(&sdev->probe_work);
@@ -484,9 +484,9 @@ int snd_sof_device_shutdown(struct device *dev)
 	 * make sure clients and machine driver(s) are unregistered to force
 	 * all userspace devices to be closed prior to the DSP shutdown sequence
 	 */
-	sof_unregister_clients(sdev);
-
-	snd_sof_machine_unregister(sdev, pdata);
+	component = snd_soc_lookup_component(sdev->dev, NULL);
+	if (component && component->card && component->card->snd_card)
+		snd_card_disconnect(component->card->snd_card);
 
 	if (sdev->fw_state == SOF_FW_BOOT_COMPLETE)
 		return snd_sof_shutdown(sdev);

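The difference between disconnecting with and without sync can be sketched with a toy user-space model: the non-sync variant only flags the card as shut down, while the sync variant additionally waits for every open file to be closed, which cannot happen if the task holding the file is frozen. This is an illustration only, not the ALSA implementation. All `toy_*` names are hypothetical, and the `max_ticks` bound is an assumption added so the model terminates where a real wait would stall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not the ALSA implementation. */
struct toy_task {
	bool frozen;   /* frozen tasks never get scheduled */
	bool holds_fd; /* task has the sound device open */
};

struct toy_card {
	bool shutdown; /* set once the card is disconnected */
	struct toy_task *tasks;
	size_t ntasks;
};

static size_t toy_open_files(const struct toy_card *card)
{
	size_t n = 0;

	for (size_t i = 0; i < card->ntasks; i++)
		if (card->tasks[i].holds_fd)
			n++;
	return n;
}

/*
 * One scheduler tick: every runnable task that notices the shutdown
 * flag closes its file. Frozen tasks do not run and keep theirs open.
 */
static void toy_tick(struct toy_card *card)
{
	for (size_t i = 0; i < card->ntasks; i++) {
		struct toy_task *t = &card->tasks[i];

		if (!t->frozen && t->holds_fd && card->shutdown)
			t->holds_fd = false;
	}
}

/* Non-sync variant: flag the card and return immediately. */
static void toy_disconnect(struct toy_card *card)
{
	card->shutdown = true;
}

/*
 * Sync variant: flag the card, then wait for all files to be closed.
 * Returns true if the card drained within max_ticks, false if the
 * caller would still be waiting.
 */
static bool toy_disconnect_sync(struct toy_card *card, int max_ticks)
{
	toy_disconnect(card);
	for (int i = 0; i < max_ticks && toy_open_files(card) > 0; i++)
		toy_tick(card);
	return toy_open_files(card) == 0;
}
```

With a runnable task the sync variant drains after one tick; with a frozen task it never drains, whereas `toy_disconnect()` returns at once and leaves the file to be cleaned up whenever the task eventually closes it.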
diff --git a/sound/soc/sof/core.c b/sound/soc/sof/core.c
index 3e6141d03770..9587b6a85103 100644
--- a/sound/soc/sof/core.c
+++ b/sound/soc/sof/core.c
@@ -9,6 +9,8 @@
 //
 
 #include <linux/firmware.h>
+#include <linux/kexec.h>
+#include <linux/freezer.h>
 #include <linux/module.h>
 #include <sound/soc.h>
 #include <sound/sof.h>
@@ -484,9 +486,10 @@ int snd_sof_device_shutdown(struct device *dev)
 	 * make sure clients and machine driver(s) are unregistered to force
 	 * all userspace devices to be closed prior to the DSP shutdown sequence
 	 */
-	sof_unregister_clients(sdev);
-
-	snd_sof_machine_unregister(sdev, pdata);
+	if (!(kexec_in_progress() && pm_freezing())) {
+		sof_unregister_clients(sdev);
+		snd_sof_machine_unregister(sdev, pdata);
+	}
 
 	if (sdev->fw_state == SOF_FW_BOOT_COMPLETE)
 		return snd_sof_shutdown(sdev);

If we are shutting down due to kexec and the userspace is frozen, the
system will stall forever waiting for userspace to complete.

Do not wait for the clients to complete in that case.

This fixes:

[ 84.943749] Freezing user space processes ... (elapsed 0.111 seconds) done.
[ 246.784446] INFO: task kexec-lite:5123 blocked for more than 122 seconds.
[ 246.819035] Call Trace:
[ 246.821782] <TASK>
[ 246.824186] __schedule+0x5f9/0x1263
[ 246.828231] schedule+0x87/0xc5
[ 246.831779] snd_card_disconnect_sync+0xb5/0x127
...
[ 246.889249] snd_sof_device_shutdown+0xb4/0x150
[ 246.899317] pci_device_shutdown+0x37/0x61
[ 246.903990] device_shutdown+0x14c/0x1d6
[ 246.908391] kernel_kexec+0x45/0xb9

And:

[ 246.893222] INFO: task kexec-lite:4891 blocked for more than 122 seconds.
[ 246.927709] Call Trace:
[ 246.930461] <TASK>
[ 246.932819] __schedule+0x5f9/0x1263
[ 246.936855] ? fsnotify_grab_connector+0x5c/0x70
[ 246.942045] schedule+0x87/0xc5
[ 246.945567] schedule_timeout+0x49/0xf3
[ 246.949877] wait_for_completion+0x86/0xe8
[ 246.954463] snd_card_free+0x68/0x89
...
[ 247.001080] platform_device_unregister+0x12/0x35

Cc: stable@vger.kernel.org
Fixes: 83bfc7e793b5 ("ASoC: SOF: core: unregister clients and machine drivers in .shutdown")
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
---
 sound/soc/sof/core.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
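
The guard this patch adds can be restated as a pure function to make its truth table explicit: the unregister step is skipped only when both conditions hold at once. The function name `should_unregister_clients` and its boolean parameters are hypothetical stand-ins for the `kexec_in_progress()` and `pm_freezing()` checks used in the diff; this is a sketch of the decision only, not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical restatement of the condition in the patch:
 * unregister clients and machine drivers unless we are in a kexec
 * shutdown AND user space is frozen (and so could never close its
 * open devices).
 */
static bool should_unregister_clients(bool kexec_in_progress,
				      bool userspace_frozen)
{
	return !(kexec_in_progress && userspace_frozen);
}
```

Under this reading, a normal shutdown or reboot and a kexec with running user space all keep the behaviour introduced by 83bfc7e793b5, and only the kexec-with-frozen-userspace case changes.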