Message ID | 20240904203955.245085-1-maxtram95@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ALSA: hda: intel: Fix Optimus when GPU has no sound | expand |
On Wed, 04 Sep 2024 22:39:55 +0200, Maxim Mikityanskiy wrote: > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on > the discrete GPU. snd_hda_intel probes the device and schedules > azx_probe_continue(), which fails at azx_first_init(). The driver ends > up probed, but calls azx_free() and stops the chip. However, from the > runtime PM point of view, the device remains active, because the PCI > subsystem makes it active on probe, and it's still bound. It prevents > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs > power management for the video and audio devices). > > Fix it by forcing the device to the suspended state in azx_free(). > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> What if this device probe is skipped (e.g. adding your device entry to driver_denylist[] in hda_intel.c)? Is the device also in the runtime-suspended state? thanks, Takashi > --- > sound/pci/hda/hda_intel.c | 14 +++++++++++++- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c > index b79020adce63..65fcb92e11c7 100644 > --- a/sound/pci/hda/hda_intel.c > +++ b/sound/pci/hda/hda_intel.c > @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip) > if (use_vga_switcheroo(hda)) { > if (chip->disabled && hda->probe_continued) > snd_hda_unlock_devices(&chip->bus); > - if (hda->vga_switcheroo_registered) > + if (hda->vga_switcheroo_registered) { > vga_switcheroo_unregister_client(chip->pci); > + > + /* Some GPUs don't have sound, and azx_first_init fails, > + * leaving the device probed but non-functional. As long > + * as it's probed, the PCI subsystem keeps its runtime > + * PM status as active. Force it to suspended (as we > + * actually stop the chip) to allow GPU to suspend via > + * vga_switcheroo. > + */ > + pm_runtime_disable(&pci->dev); > + pm_runtime_set_suspended(&pci->dev); > + pm_runtime_enable(&pci->dev); > + } > } > > if (bus->chip_init) { > -- > 2.46.0 >
On Thu, 05 Sep 2024 at 16:24:26 +0200, Takashi Iwai wrote: > On Wed, 04 Sep 2024 22:39:55 +0200, > Maxim Mikityanskiy wrote: > > > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on Spotted a typo: s/520M/540M/ > > the discrete GPU. snd_hda_intel probes the device and schedules > > azx_probe_continue(), which fails at azx_first_init(). The driver ends > > up probed, but calls azx_free() and stops the chip. However, from the > > runtime PM point of view, the device remains active, because the PCI > > subsystem makes it active on probe, and it's still bound. It prevents > > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs > > power management for the video and audio devices). > > > > Fix it by forcing the device to the suspended state in azx_free(). > > > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") > > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> > > What if this device probe is skipped (e.g. adding your device entry to > driver_denylist[] in hda_intel.c)? Is the device also in the > runtime-suspended state? I added the following: { PCI_DEVICE_SUB(0x10de, 0x0bea, 0x0000, 0x0000) }, The probe was apparently skipped (the device is not attached to a driver), runtime_status=suspended, runtime_usage=0, the GPU goes to DynOff. However, I'm not sure whether it's a good idea to blacklist 540M entirely, as there might be other laptops with this GPU that have sound, and AFAIK there are variants of Lenovo Z570 with other NVIDIA GPUs. Another way to make vga_switcheroo work is to disable quirk_nvidia_hda, although I don't know whether it can be done without recompiling the kernel. In this case, 0000:01:00.1 doesn't even appear on the bus. (Note that I need to set nouveau.modeset=2 either way, otherwise it stays in DynPwr if the screen is on.) > > > thanks, > > Takashi > > > --- > > sound/pci/hda/hda_intel.c | 14 +++++++++++++- > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c > > index b79020adce63..65fcb92e11c7 100644 > > --- a/sound/pci/hda/hda_intel.c > > +++ b/sound/pci/hda/hda_intel.c > > @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip) > > if (use_vga_switcheroo(hda)) { > > if (chip->disabled && hda->probe_continued) > > snd_hda_unlock_devices(&chip->bus); > > - if (hda->vga_switcheroo_registered) > > + if (hda->vga_switcheroo_registered) { > > vga_switcheroo_unregister_client(chip->pci); > > + > > + /* Some GPUs don't have sound, and azx_first_init fails, > > + * leaving the device probed but non-functional. As long > > + * as it's probed, the PCI subsystem keeps its runtime > > + * PM status as active. Force it to suspended (as we > > + * actually stop the chip) to allow GPU to suspend via > > + * vga_switcheroo. > > + */ > > + pm_runtime_disable(&pci->dev); > > + pm_runtime_set_suspended(&pci->dev); > > + pm_runtime_enable(&pci->dev); > > + } > > } > > > > if (bus->chip_init) { > > -- > > 2.46.0 > >
On Fri, 06 Sep 2024 20:05:06 +0200, Maxim Mikityanskiy wrote: > > On Thu, 05 Sep 2024 at 16:24:26 +0200, Takashi Iwai wrote: > > On Wed, 04 Sep 2024 22:39:55 +0200, > > Maxim Mikityanskiy wrote: > > > > > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on > > Spotted a typo: s/520M/540M/ > > > > the discrete GPU. snd_hda_intel probes the device and schedules > > > azx_probe_continue(), which fails at azx_first_init(). The driver ends > > > up probed, but calls azx_free() and stops the chip. However, from the > > > runtime PM point of view, the device remains active, because the PCI > > > subsystem makes it active on probe, and it's still bound. It prevents > > > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs > > > power management for the video and audio devices). > > > > > > Fix it by forcing the device to the suspended state in azx_free(). > > > > > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") > > > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> > > > > What if this device probe is skipped (e.g. adding your device entry to > > driver_denylist[] in hda_intel.c)? Is the device also in the > > runtime-suspended state? > > I added the following: > > { PCI_DEVICE_SUB(0x10de, 0x0bea, 0x0000, 0x0000) }, > > The probe was apparently skipped (the device is not attached to a > driver), runtime_status=suspended, runtime_usage=0, the GPU goes to > DynOff. OK, that's good. > However, I'm not sure whether it's a good idea to blacklist 540M > entirely, as there might be other laptops with this GPU that have sound, > and AFAIK there are variants of Lenovo Z570 with other NVIDIA GPUs. You should specify the PCI SSID of your device instead of 0:0 in the above, so that it's picked up only for yours. Takashi > Another way to make vga_switcheroo work is to disable quirk_nvidia_hda, > although I don't know whether it can be done without recompiling the > kernel. In this case, 0000:01:00.1 doesn't even appear on the bus. > > (Note that I need to set nouveau.modeset=2 either way, otherwise it > stays in DynPwr if the screen is on.) > > > > > > > thanks, > > > > Takashi > > > > > --- > > > sound/pci/hda/hda_intel.c | 14 +++++++++++++- > > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c > > > index b79020adce63..65fcb92e11c7 100644 > > > --- a/sound/pci/hda/hda_intel.c > > > +++ b/sound/pci/hda/hda_intel.c > > > @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip) > > > if (use_vga_switcheroo(hda)) { > > > if (chip->disabled && hda->probe_continued) > > > snd_hda_unlock_devices(&chip->bus); > > > - if (hda->vga_switcheroo_registered) > > > + if (hda->vga_switcheroo_registered) { > > > vga_switcheroo_unregister_client(chip->pci); > > > + > > > + /* Some GPUs don't have sound, and azx_first_init fails, > > > + * leaving the device probed but non-functional. As long > > > + * as it's probed, the PCI subsystem keeps its runtime > > > + * PM status as active. Force it to suspended (as we > > > + * actually stop the chip) to allow GPU to suspend via > > > + * vga_switcheroo. > > > + */ > > > + pm_runtime_disable(&pci->dev); > > > + pm_runtime_set_suspended(&pci->dev); > > > + pm_runtime_enable(&pci->dev); > > > + } > > > } > > > > > > if (bus->chip_init) { > > > -- > > > 2.46.0 > > >
On Sat, 07 Sep 2024 at 12:06:25 +0200, Takashi Iwai wrote: > On Fri, 06 Sep 2024 20:05:06 +0200, > Maxim Mikityanskiy wrote: > > > > On Thu, 05 Sep 2024 at 16:24:26 +0200, Takashi Iwai wrote: > > > On Wed, 04 Sep 2024 22:39:55 +0200, > > > Maxim Mikityanskiy wrote: > > > > > > > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on > > > > Spotted a typo: s/520M/540M/ > > > > > > the discrete GPU. snd_hda_intel probes the device and schedules > > > > azx_probe_continue(), which fails at azx_first_init(). The driver ends > > > > up probed, but calls azx_free() and stops the chip. However, from the > > > > runtime PM point of view, the device remains active, because the PCI > > > > subsystem makes it active on probe, and it's still bound. It prevents > > > > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs > > > > power management for the video and audio devices). > > > > > > > > Fix it by forcing the device to the suspended state in azx_free(). > > > > > > > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") > > > > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> > > > > > > What if this device probe is skipped (e.g. adding your device entry to > > > driver_denylist[] in hda_intel.c)? Is the device also in the > > > runtime-suspended state? > > > > I added the following: > > > > { PCI_DEVICE_SUB(0x10de, 0x0bea, 0x0000, 0x0000) }, > > > > The probe was apparently skipped (the device is not attached to a > > driver), runtime_status=suspended, runtime_usage=0, the GPU goes to > > DynOff. > > OK, that's good. > > > However, I'm not sure whether it's a good idea to blacklist 540M > > entirely, as there might be other laptops with this GPU that have sound, > > and AFAIK there are variants of Lenovo Z570 with other NVIDIA GPUs. > > You should specify the PCI SSID of your device instead of 0:0 in the > above, so that it's picked up only for yours. Where can I get it? # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_vendor 0x0000 # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_device 0x0000 Is it not the right place? > > Takashi > > > Another way to make vga_switcheroo work is to disable quirk_nvidia_hda, > > although I don't know whether it can be done without recompiling the > > kernel. In this case, 0000:01:00.1 doesn't even appear on the bus. > > > > (Note that I need to set nouveau.modeset=2 either way, otherwise it > > stays in DynPwr if the screen is on.) > > > > > > > > > > > thanks, > > > > > > Takashi > > > > > > > --- > > > > sound/pci/hda/hda_intel.c | 14 +++++++++++++- > > > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c > > > > index b79020adce63..65fcb92e11c7 100644 > > > > --- a/sound/pci/hda/hda_intel.c > > > > +++ b/sound/pci/hda/hda_intel.c > > > > @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip) > > > > if (use_vga_switcheroo(hda)) { > > > > if (chip->disabled && hda->probe_continued) > > > > snd_hda_unlock_devices(&chip->bus); > > > > - if (hda->vga_switcheroo_registered) > > > > + if (hda->vga_switcheroo_registered) { > > > > vga_switcheroo_unregister_client(chip->pci); > > > > + > > > > + /* Some GPUs don't have sound, and azx_first_init fails, > > > > + * leaving the device probed but non-functional. As long > > > > + * as it's probed, the PCI subsystem keeps its runtime > > > > + * PM status as active. Force it to suspended (as we > > > > + * actually stop the chip) to allow GPU to suspend via > > > > + * vga_switcheroo. > > > > + */ > > > > + pm_runtime_disable(&pci->dev); > > > > + pm_runtime_set_suspended(&pci->dev); > > > > + pm_runtime_enable(&pci->dev); > > > > + } > > > > } > > > > > > > > if (bus->chip_init) { > > > > -- > > > > 2.46.0 > > > >
On Sat, 07 Sep 2024 15:45:41 +0200, Maxim Mikityanskiy wrote: > > On Sat, 07 Sep 2024 at 12:06:25 +0200, Takashi Iwai wrote: > > On Fri, 06 Sep 2024 20:05:06 +0200, > > Maxim Mikityanskiy wrote: > > > > > > On Thu, 05 Sep 2024 at 16:24:26 +0200, Takashi Iwai wrote: > > > > On Wed, 04 Sep 2024 22:39:55 +0200, > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on > > > > > > Spotted a typo: s/520M/540M/ > > > > > > > > the discrete GPU. snd_hda_intel probes the device and schedules > > > > > azx_probe_continue(), which fails at azx_first_init(). The driver ends > > > > > up probed, but calls azx_free() and stops the chip. However, from the > > > > > runtime PM point of view, the device remains active, because the PCI > > > > > subsystem makes it active on probe, and it's still bound. It prevents > > > > > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs > > > > > power management for the video and audio devices). > > > > > > > > > > Fix it by forcing the device to the suspended state in azx_free(). > > > > > > > > > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") > > > > > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> > > > > > > > > What if this device probe is skipped (e.g. adding your device entry to > > > > driver_denylist[] in hda_intel.c)? Is the device also in the > > > > runtime-suspended state? > > > > > > I added the following: > > > > > > { PCI_DEVICE_SUB(0x10de, 0x0bea, 0x0000, 0x0000) }, > > > > > > The probe was apparently skipped (the device is not attached to a > > > driver), runtime_status=suspended, runtime_usage=0, the GPU goes to > > > DynOff. > > > > OK, that's good. > > > > > However, I'm not sure whether it's a good idea to blacklist 540M > > > entirely, as there might be other laptops with this GPU that have sound, > > > and AFAIK there are variants of Lenovo Z570 with other NVIDIA GPUs. > > > > You should specify the PCI SSID of your device instead of 0:0 in the > > above, so that it's picked up only for yours. > > Where can I get it? > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_vendor > 0x0000 > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_device > 0x0000 Ouch, Lenovo didn't set it right. Alternatively we may introduce a deny list based on DMI. Hmm... Takashi > > Is it not the right place? > > > > > Takashi > > > > > Another way to make vga_switcheroo work is to disable quirk_nvidia_hda, > > > although I don't know whether it can be done without recompiling the > > > kernel. In this case, 0000:01:00.1 doesn't even appear on the bus. > > > > > > (Note that I need to set nouveau.modeset=2 either way, otherwise it > > > stays in DynPwr if the screen is on.) > > > > > > > > > > > > > > > thanks, > > > > > > > > Takashi > > > > > > > > > --- > > > > > sound/pci/hda/hda_intel.c | 14 +++++++++++++- > > > > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > > > > > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c > > > > > index b79020adce63..65fcb92e11c7 100644 > > > > > --- a/sound/pci/hda/hda_intel.c > > > > > +++ b/sound/pci/hda/hda_intel.c > > > > > @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip) > > > > > if (use_vga_switcheroo(hda)) { > > > > > if (chip->disabled && hda->probe_continued) > > > > > snd_hda_unlock_devices(&chip->bus); > > > > > - if (hda->vga_switcheroo_registered) > > > > > + if (hda->vga_switcheroo_registered) { > > > > > vga_switcheroo_unregister_client(chip->pci); > > > > > + > > > > > + /* Some GPUs don't have sound, and azx_first_init fails, > > > > > + * leaving the device probed but non-functional. As long > > > > > + * as it's probed, the PCI subsystem keeps its runtime > > > > > + * PM status as active. Force it to suspended (as we > > > > > + * actually stop the chip) to allow GPU to suspend via > > > > > + * vga_switcheroo. > > > > > + */ > > > > > + pm_runtime_disable(&pci->dev); > > > > > + pm_runtime_set_suspended(&pci->dev); > > > > > + pm_runtime_enable(&pci->dev); > > > > > + } > > > > > } > > > > > > > > > > if (bus->chip_init) { > > > > > -- > > > > > 2.46.0 > > > > >
On Sat, 07 Sep 2024 at 17:49:11 +0200, Takashi Iwai wrote: > On Sat, 07 Sep 2024 15:45:41 +0200, > Maxim Mikityanskiy wrote: > > > > On Sat, 07 Sep 2024 at 12:06:25 +0200, Takashi Iwai wrote: > > > On Fri, 06 Sep 2024 20:05:06 +0200, > > > Maxim Mikityanskiy wrote: > > > > > > > > On Thu, 05 Sep 2024 at 16:24:26 +0200, Takashi Iwai wrote: > > > > > On Wed, 04 Sep 2024 22:39:55 +0200, > > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on > > > > > > > > Spotted a typo: s/520M/540M/ > > > > > > > > > > the discrete GPU. snd_hda_intel probes the device and schedules > > > > > > azx_probe_continue(), which fails at azx_first_init(). The driver ends > > > > > > up probed, but calls azx_free() and stops the chip. However, from the > > > > > > runtime PM point of view, the device remains active, because the PCI > > > > > > subsystem makes it active on probe, and it's still bound. It prevents > > > > > > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs > > > > > > power management for the video and audio devices). > > > > > > > > > > > > Fix it by forcing the device to the suspended state in azx_free(). > > > > > > > > > > > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") > > > > > > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> > > > > > > > > > > What if this device probe is skipped (e.g. adding your device entry to > > > > > driver_denylist[] in hda_intel.c)? Is the device also in the > > > > > runtime-suspended state? > > > > > > > > I added the following: > > > > > > > > { PCI_DEVICE_SUB(0x10de, 0x0bea, 0x0000, 0x0000) }, > > > > > > > > The probe was apparently skipped (the device is not attached to a > > > > driver), runtime_status=suspended, runtime_usage=0, the GPU goes to > > > > DynOff. > > > > > > OK, that's good. > > > > > > > However, I'm not sure whether it's a good idea to blacklist 540M > > > > entirely, as there might be other laptops with this GPU that have sound, > > > > and AFAIK there are variants of Lenovo Z570 with other NVIDIA GPUs. > > > > > > You should specify the PCI SSID of your device instead of 0:0 in the > > > above, so that it's picked up only for yours. > > > > Where can I get it? > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_vendor > > 0x0000 > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_device > > 0x0000 > > Ouch, Lenovo didn't set it right. > Alternatively we may introduce a deny list based on DMI. Hmm... A DMI-based quirk sounds better than blocking any NVIDIA 540M, it would also allow handling alternative GPUs on this laptop model. But could you explain what's wrong with a generic approach that I suggest (probe_continue failed => mark as suspended)? It doesn't require any lists. Any drawbacks? > > > Takashi > > > > > > Is it not the right place? > > > > > > > > Takashi > > > > > > > Another way to make vga_switcheroo work is to disable quirk_nvidia_hda, > > > > although I don't know whether it can be done without recompiling the > > > > kernel. In this case, 0000:01:00.1 doesn't even appear on the bus. > > > > > > > > (Note that I need to set nouveau.modeset=2 either way, otherwise it > > > > stays in DynPwr if the screen is on.) > > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > Takashi > > > > > > > > > > > --- > > > > > > sound/pci/hda/hda_intel.c | 14 +++++++++++++- > > > > > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c > > > > > > index b79020adce63..65fcb92e11c7 100644 > > > > > > --- a/sound/pci/hda/hda_intel.c > > > > > > +++ b/sound/pci/hda/hda_intel.c > > > > > > @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip) > > > > > > if (use_vga_switcheroo(hda)) { > > > > > > if (chip->disabled && hda->probe_continued) > > > > > > snd_hda_unlock_devices(&chip->bus); > > > > > > - if (hda->vga_switcheroo_registered) > > > > > > + if (hda->vga_switcheroo_registered) { > > > > > > vga_switcheroo_unregister_client(chip->pci); > > > > > > + > > > > > > + /* Some GPUs don't have sound, and azx_first_init fails, > > > > > > + * leaving the device probed but non-functional. As long > > > > > > + * as it's probed, the PCI subsystem keeps its runtime > > > > > > + * PM status as active. Force it to suspended (as we > > > > > > + * actually stop the chip) to allow GPU to suspend via > > > > > > + * vga_switcheroo. > > > > > > + */ > > > > > > + pm_runtime_disable(&pci->dev); > > > > > > + pm_runtime_set_suspended(&pci->dev); > > > > > > + pm_runtime_enable(&pci->dev); > > > > > > + } > > > > > > } > > > > > > > > > > > > if (bus->chip_init) { > > > > > > -- > > > > > > 2.46.0 > > > > > >
On Sat, 07 Sep 2024 19:24:41 +0200, Maxim Mikityanskiy wrote: > > On Sat, 07 Sep 2024 at 17:49:11 +0200, Takashi Iwai wrote: > > On Sat, 07 Sep 2024 15:45:41 +0200, > > Maxim Mikityanskiy wrote: > > > > > > On Sat, 07 Sep 2024 at 12:06:25 +0200, Takashi Iwai wrote: > > > > On Fri, 06 Sep 2024 20:05:06 +0200, > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > On Thu, 05 Sep 2024 at 16:24:26 +0200, Takashi Iwai wrote: > > > > > > On Wed, 04 Sep 2024 22:39:55 +0200, > > > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > > > > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on > > > > > > > > > > Spotted a typo: s/520M/540M/ > > > > > > > > > > > > the discrete GPU. snd_hda_intel probes the device and schedules > > > > > > > azx_probe_continue(), which fails at azx_first_init(). The driver ends > > > > > > > up probed, but calls azx_free() and stops the chip. However, from the > > > > > > > runtime PM point of view, the device remains active, because the PCI > > > > > > > subsystem makes it active on probe, and it's still bound. It prevents > > > > > > > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs > > > > > > > power management for the video and audio devices). > > > > > > > > > > > > > > Fix it by forcing the device to the suspended state in azx_free(). > > > > > > > > > > > > > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") > > > > > > > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> > > > > > > > > > > > > What if this device probe is skipped (e.g. adding your device entry to > > > > > > driver_denylist[] in hda_intel.c)? Is the device also in the > > > > > > runtime-suspended state? > > > > > > > > > > I added the following: > > > > > > > > > > { PCI_DEVICE_SUB(0x10de, 0x0bea, 0x0000, 0x0000) }, > > > > > > > > > > The probe was apparently skipped (the device is not attached to a > > > > > driver), runtime_status=suspended, runtime_usage=0, the GPU goes to > > > > > DynOff. > > > > > > > > OK, that's good. > > > > > > > > > However, I'm not sure whether it's a good idea to blacklist 540M > > > > > entirely, as there might be other laptops with this GPU that have sound, > > > > > and AFAIK there are variants of Lenovo Z570 with other NVIDIA GPUs. > > > > > > > > You should specify the PCI SSID of your device instead of 0:0 in the > > > > above, so that it's picked up only for yours. > > > > > > Where can I get it? > > > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_vendor > > > 0x0000 > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_device > > > 0x0000 > > > > Ouch, Lenovo didn't set it right. > > Alternatively we may introduce a deny list based on DMI. Hmm... > > A DMI-based quirk sounds better than blocking any NVIDIA 540M, it would > also allow handling alternative GPUs on this laptop model. > > But could you explain what's wrong with a generic approach that I > suggest (probe_continue failed => mark as suspended)? It doesn't require > any lists. Any drawbacks? As you noticed, it will leave the device bound with driver, i.e. this looks as if it were operative. It means that the sound driver is still responsible for suspend/resume or whatever action even though it's totally useless. In that sense, it makes more sense to give it away at the early stage before actually binding it, and the deny list is exactly for that. Apart from that, the suggested change itself can be applied independently from the deny list update, too. It'd be good for other similar devices, too. Though, a slight concern is the sequence of runtime PM you applied in the patch. Is it a standard idiom to perform pm_runtime_disable(), set_suspended() and enable()? Also, azx_free() is the common destructor, hence it's called also at the regular driver unbinding. I'm not entirely sure whether it's OK to call there also at unbinding. thanks, Takashi > > > > > > > Takashi > > > > > > > > > > Is it not the right place? > > > > > > > > > > > Takashi > > > > > > > > > Another way to make vga_switcheroo work is to disable quirk_nvidia_hda, > > > > > although I don't know whether it can be done without recompiling the > > > > > kernel. In this case, 0000:01:00.1 doesn't even appear on the bus. > > > > > > > > > > (Note that I need to set nouveau.modeset=2 either way, otherwise it > > > > > stays in DynPwr if the screen is on.) > > > > > > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > > > Takashi > > > > > > > > > > > > > --- > > > > > > > sound/pci/hda/hda_intel.c | 14 +++++++++++++- > > > > > > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c > > > > > > > index b79020adce63..65fcb92e11c7 100644 > > > > > > > --- a/sound/pci/hda/hda_intel.c > > > > > > > +++ b/sound/pci/hda/hda_intel.c > > > > > > > @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip) > > > > > > > if (use_vga_switcheroo(hda)) { > > > > > > > if (chip->disabled && hda->probe_continued) > > > > > > > snd_hda_unlock_devices(&chip->bus); > > > > > > > - if (hda->vga_switcheroo_registered) > > > > > > > + if (hda->vga_switcheroo_registered) { > > > > > > > vga_switcheroo_unregister_client(chip->pci); > > > > > > > + > > > > > > > + /* Some GPUs don't have sound, and azx_first_init fails, > > > > > > > + * leaving the device probed but non-functional. As long > > > > > > > + * as it's probed, the PCI subsystem keeps its runtime > > > > > > > + * PM status as active. Force it to suspended (as we > > > > > > > + * actually stop the chip) to allow GPU to suspend via > > > > > > > + * vga_switcheroo. > > > > > > > + */ > > > > > > > + pm_runtime_disable(&pci->dev); > > > > > > > + pm_runtime_set_suspended(&pci->dev); > > > > > > > + pm_runtime_enable(&pci->dev); > > > > > > > + } > > > > > > > } > > > > > > > > > > > > > > if (bus->chip_init) { > > > > > > > -- > > > > > > > 2.46.0 > > > > > > >
On Sat, 07 Sep 2024 at 20:10:44 +0200, Takashi Iwai wrote: > On Sat, 07 Sep 2024 19:24:41 +0200, > Maxim Mikityanskiy wrote: > > > > On Sat, 07 Sep 2024 at 17:49:11 +0200, Takashi Iwai wrote: > > > On Sat, 07 Sep 2024 15:45:41 +0200, > > > Maxim Mikityanskiy wrote: > > > > > > > > On Sat, 07 Sep 2024 at 12:06:25 +0200, Takashi Iwai wrote: > > > > > On Fri, 06 Sep 2024 20:05:06 +0200, > > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > > > On Thu, 05 Sep 2024 at 16:24:26 +0200, Takashi Iwai wrote: > > > > > > > On Wed, 04 Sep 2024 22:39:55 +0200, > > > > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > > > > > > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on > > > > > > > > > > > > Spotted a typo: s/520M/540M/ > > > > > > > > > > > > > > the discrete GPU. snd_hda_intel probes the device and schedules > > > > > > > > azx_probe_continue(), which fails at azx_first_init(). The driver ends > > > > > > > > up probed, but calls azx_free() and stops the chip. However, from the > > > > > > > > runtime PM point of view, the device remains active, because the PCI > > > > > > > > subsystem makes it active on probe, and it's still bound. It prevents > > > > > > > > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs > > > > > > > > power management for the video and audio devices). > > > > > > > > > > > > > > > > Fix it by forcing the device to the suspended state in azx_free(). > > > > > > > > > > > > > > > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") > > > > > > > > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> > > > > > > > > > > > > > > What if this device probe is skipped (e.g. adding your device entry to > > > > > > > driver_denylist[] in hda_intel.c)? Is the device also in the > > > > > > > runtime-suspended state? > > > > > > > > > > > > I added the following: > > > > > > > > > > > > { PCI_DEVICE_SUB(0x10de, 0x0bea, 0x0000, 0x0000) }, > > > > > > > > > > > > The probe was apparently skipped (the device is not attached to a > > > > > > driver), runtime_status=suspended, runtime_usage=0, the GPU goes to > > > > > > DynOff. > > > > > > > > > > OK, that's good. > > > > > > > > > > > However, I'm not sure whether it's a good idea to blacklist 540M > > > > > > entirely, as there might be other laptops with this GPU that have sound, > > > > > > and AFAIK there are variants of Lenovo Z570 with other NVIDIA GPUs. > > > > > > > > > > You should specify the PCI SSID of your device instead of 0:0 in the > > > > > above, so that it's picked up only for yours. > > > > > > > > Where can I get it? > > > > > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_vendor > > > > 0x0000 > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_device > > > > 0x0000 > > > > > > Ouch, Lenovo didn't set it right. > > > Alternatively we may introduce a deny list based on DMI. Hmm... > > > > A DMI-based quirk sounds better than blocking any NVIDIA 540M, it would > > also allow handling alternative GPUs on this laptop model. > > > > But could you explain what's wrong with a generic approach that I > > suggest (probe_continue failed => mark as suspended)? It doesn't require > > any lists. Any drawbacks? > > As you noticed, it will leave the device bound with driver, i.e. this > looks as if it were operative. It means that the sound driver is > still responsible for suspend/resume or whatever action even though > it's totally useless. In that sense, it makes more sense to give it > away at the early stage before actually binding it, and the deny list > is exactly for that. Thanks for your feedback! Do you happen to know whether there is a way for a driver to unbind itself after probe() returned success? I've never seen anything like this, but it would be a perfect solution: no lists and no driver bound after the delayed initialization fails. > Apart from that, the suggested change itself can be applied > independently from the deny list update, too. It'd be good for other > similar devices, too. I was trying to avoid lists, because it's a maintainance nightmare (the lists are never complete, there might be false positives, etc.), and we'd need to add another type of a list for DMI quirks. Moreover, adding both my change and a list entry has a drawback that similarly broken devices will use different workaround code (i.e. less robust), and the ones that use the generic fallback will never get on the list, because no one will report it. As a maintainer, what's your preferred approach? 1. Unbind ourselves on failure in azx_probe_continue() (but I don't know whether it's possible). 2. Add a DMI deny list to hda_intel. 3. Add a DMI check to quirk_nvidia_hda (i.e. don't enable a non-existent HDA if it's not enabled by BIOS on this laptop model). 4. Driver becomes a stub no failure in azx_probe_continue() (this patch). 5. Some combo of 4 and 2/3? > Though, a slight concern is the sequence of runtime PM you applied in > the patch. Is it a standard idiom to perform pm_runtime_disable(), > set_suspended() and enable()? I'm not an expert in PM. Before I wrote this code, I was meditating on the docs for a few hours, and what I understood was: 1. As we are disabling the device (below in the same function), we need set_suspended() to just update the state. 2. set_suspended() must be called with PM disabled (see the comment above the function). 3. Disable and enable must come in pairs, because there is refcounting. I've seen set_suspended() after disable() on device remove (e.g., rt9120_remove()). For hda_intel, assuming that disable/enable are already paired, adding a new disable without a matching enable would break refcounting. Hence such a sequence. I'd love to hear an opinion from someone more experienced in PM. > Also, azx_free() is the common > destructor, hence it's called also at the regular driver unbinding. > I'm not entirely sure whether it's OK to call there also at > unbinding. I don't have hardware to test this flow, but theoretically it looks OK to me: we set the suspended state as we disable the device. > > thanks, > > Takashi > > > > > > > > > > > > Takashi > > > > > > > > > > > > > > Is it not the right place? > > > > > > > > > > > > > > Takashi > > > > > > > > > > > Another way to make vga_switcheroo work is to disable quirk_nvidia_hda, > > > > > > although I don't know whether it can be done without recompiling the > > > > > > kernel. In this case, 0000:01:00.1 doesn't even appear on the bus. > > > > > > > > > > > > (Note that I need to set nouveau.modeset=2 either way, otherwise it > > > > > > stays in DynPwr if the screen is on.) > > > > > > > > > > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > > > > > Takashi > > > > > > > > > > > > > > > --- > > > > > > > > sound/pci/hda/hda_intel.c | 14 +++++++++++++- > > > > > > > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c > > > > > > > > index b79020adce63..65fcb92e11c7 100644 > > > > > > > > --- a/sound/pci/hda/hda_intel.c > > > > > > > > +++ b/sound/pci/hda/hda_intel.c > > > > > > > > @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip) > > > > > > > > if (use_vga_switcheroo(hda)) { > > > > > > > > if (chip->disabled && hda->probe_continued) > > > > > > > > snd_hda_unlock_devices(&chip->bus); > > > > > > > > - if (hda->vga_switcheroo_registered) > > > > > > > > + if (hda->vga_switcheroo_registered) { > > > > > > > > vga_switcheroo_unregister_client(chip->pci); > > > > > > > > + > > > > > > > > + /* Some GPUs don't have sound, and azx_first_init fails, > > > > > > > > + * leaving the device probed but non-functional. As long > > > > > > > > + * as it's probed, the PCI subsystem keeps its runtime > > > > > > > > + * PM status as active. Force it to suspended (as we > > > > > > > > + * actually stop the chip) to allow GPU to suspend via > > > > > > > > + * vga_switcheroo. > > > > > > > > + */ > > > > > > > > + pm_runtime_disable(&pci->dev); > > > > > > > > + pm_runtime_set_suspended(&pci->dev); > > > > > > > > + pm_runtime_enable(&pci->dev); > > > > > > > > + } > > > > > > > > } > > > > > > > > > > > > > > > > if (bus->chip_init) { > > > > > > > > -- > > > > > > > > 2.46.0 > > > > > > > >
On Sat, 21 Sep 2024 10:10:37 +0200, Maxim Mikityanskiy wrote: > > On Sat, 07 Sep 2024 at 20:10:44 +0200, Takashi Iwai wrote: > > On Sat, 07 Sep 2024 19:24:41 +0200, > > Maxim Mikityanskiy wrote: > > > > > > On Sat, 07 Sep 2024 at 17:49:11 +0200, Takashi Iwai wrote: > > > > On Sat, 07 Sep 2024 15:45:41 +0200, > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > On Sat, 07 Sep 2024 at 12:06:25 +0200, Takashi Iwai wrote: > > > > > > On Fri, 06 Sep 2024 20:05:06 +0200, > > > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > > > > > On Thu, 05 Sep 2024 at 16:24:26 +0200, Takashi Iwai wrote: > > > > > > > > On Wed, 04 Sep 2024 22:39:55 +0200, > > > > > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > > > > > > > > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on > > > > > > > > > > > > > > Spotted a typo: s/520M/540M/ > > > > > > > > > > > > > > > > the discrete GPU. snd_hda_intel probes the device and schedules > > > > > > > > > azx_probe_continue(), which fails at azx_first_init(). The driver ends > > > > > > > > > up probed, but calls azx_free() and stops the chip. However, from the > > > > > > > > > runtime PM point of view, the device remains active, because the PCI > > > > > > > > > subsystem makes it active on probe, and it's still bound. It prevents > > > > > > > > > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs > > > > > > > > > power management for the video and audio devices). > > > > > > > > > > > > > > > > > > Fix it by forcing the device to the suspended state in azx_free(). > > > > > > > > > > > > > > > > > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") > > > > > > > > > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> > > > > > > > > > > > > > > > > What if this device probe is skipped (e.g. adding your device entry to > > > > > > > > driver_denylist[] in hda_intel.c)? Is the device also in the > > > > > > > > runtime-suspended state? > > > > > > > > > > > > > > I added the following: > > > > > > > > > > > > > > { PCI_DEVICE_SUB(0x10de, 0x0bea, 0x0000, 0x0000) }, > > > > > > > > > > > > > > The probe was apparently skipped (the device is not attached to a > > > > > > > driver), runtime_status=suspended, runtime_usage=0, the GPU goes to > > > > > > > DynOff. > > > > > > > > > > > > OK, that's good. > > > > > > > > > > > > > However, I'm not sure whether it's a good idea to blacklist 540M > > > > > > > entirely, as there might be other laptops with this GPU that have sound, > > > > > > > and AFAIK there are variants of Lenovo Z570 with other NVIDIA GPUs. > > > > > > > > > > > > You should specify the PCI SSID of your device instead of 0:0 in the > > > > > > above, so that it's picked up only for yours. > > > > > > > > > > Where can I get it? > > > > > > > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_vendor > > > > > 0x0000 > > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_device > > > > > 0x0000 > > > > > > > > Ouch, Lenovo didn't set it right. > > > > Alternatively we may introduce a deny list based on DMI. Hmm... > > > > > > A DMI-based quirk sounds better than blocking any NVIDIA 540M, it would > > > also allow handling alternative GPUs on this laptop model. > > > > > > But could you explain what's wrong with a generic approach that I > > > suggest (probe_continue failed => mark as suspended)? It doesn't require > > > any lists. Any drawbacks? > > > > As you noticed, it will leave the device bound with driver, i.e. this > > looks as if it were operative. It means that the sound driver is > > still responsible for suspend/resume or whatever action even though > > it's totally useless. In that sense, it makes more sense to give it > > away at the early stage before actually binding it, and the deny list > > is exactly for that. > > Thanks for your feedback! Sorry for the late reply, as I've been off in the last weeks. > Do you happen to know whether there is a way for a driver to unbind > itself after probe() returned success? I've never seen anything like > this, but it would be a perfect solution: no lists and no driver bound > after the delayed initialization fails. I don't think it's possible for now (at least in a clean way). It should be a call of device_driver_detach(), but it's no exported API. > > Apart from that, the suggested change itself can be applied > > independently from the deny list update, too. It'd be good for other > > similar devices, too. > > I was trying to avoid lists, because it's a maintainance nightmare (the > lists are never complete, there might be false positives, etc.), and > we'd need to add another type of a list for DMI quirks. Moreover, adding > both my change and a list entry has a drawback that similarly broken > devices will use different workaround code (i.e. less robust), and the > ones that use the generic fallback will never get on the list, because > no one will report it. > > As a maintainer, what's your preferred approach? > > 1. Unbind ourselves on failure in azx_probe_continue() (but I don't know > whether it's possible). So it's no easy way. > 2. Add a DMI deny list to hda_intel. > > 3. Add a DMI check to quirk_nvidia_hda (i.e. don't enable a non-existent > HDA if it's not enabled by BIOS on this laptop model). I'm afraid that it may break other systems; IIRC, the power up was needed explicitly even if it's disabled on BIOS. > 4. Driver becomes a stub no failure in azx_probe_continue() (this > patch). It's OK, but slightly risky. It's kept bound, hence any unexpected thing might happen if the code is changed without noticing it. > 5. Some combo of 4 and 2/3? Not safe. So, from the safety POV, I'd rather take the DMI deny list approach. thanks, Takashi > > Though, a slight concern is the sequence of runtime PM you applied in > > the patch. Is it a standard idiom to perform pm_runtime_disable(), > > set_suspended() and enable()? > > I'm not an expert in PM. Before I wrote this code, I was meditating on > the docs for a few hours, and what I understood was: > > 1. As we are disabling the device (below in the same function), we need > set_suspended() to just update the state. > > 2. set_suspended() must be called with PM disabled (see the comment > above the function). > > 3. Disable and enable must come in pairs, because there is refcounting. > > I've seen set_suspended() after disable() on device remove (e.g., > rt9120_remove()). For hda_intel, assuming that disable/enable are > already paired, adding a new disable without a matching enable would > break refcounting. Hence such a sequence. > > I'd love to hear an opinion from someone more experienced in PM. > > > Also, azx_free() is the common > > destructor, hence it's called also at the regular driver unbinding. > > I'm not entirely sure whether it's OK to call there also at > > unbinding. > > I don't have hardware to test this flow, but theoretically it looks OK > to me: we set the suspended state as we disable the device. > > > > > thanks, > > > > Takashi > > > > > > > > > > > > > > > > > Takashi > > > > > > > > > > > > > > > > > > Is it not the right place? > > > > > > > > > > > > > > > > > Takashi > > > > > > > > > > > > > Another way to make vga_switcheroo work is to disable quirk_nvidia_hda, > > > > > > > although I don't know whether it can be done without recompiling the > > > > > > > kernel. In this case, 0000:01:00.1 doesn't even appear on the bus. > > > > > > > > > > > > > > (Note that I need to set nouveau.modeset=2 either way, otherwise it > > > > > > > stays in DynPwr if the screen is on.) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > > > > > > > Takashi > > > > > > > > > > > > > > > > > --- > > > > > > > > > sound/pci/hda/hda_intel.c | 14 +++++++++++++- > > > > > > > > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c > > > > > > > > > index b79020adce63..65fcb92e11c7 100644 > > > > > > > > > --- a/sound/pci/hda/hda_intel.c > > > > > > > > > +++ b/sound/pci/hda/hda_intel.c > > > > > > > > > @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip) > > > > > > > > > if (use_vga_switcheroo(hda)) { > > > > > > > > > if (chip->disabled && hda->probe_continued) > > > > > > > > > snd_hda_unlock_devices(&chip->bus); > > > > > > > > > - if (hda->vga_switcheroo_registered) > > > > > > > > > + if (hda->vga_switcheroo_registered) { > > > > > > > > > vga_switcheroo_unregister_client(chip->pci); > > > > > > > > > + > > > > > > > > > + /* Some GPUs don't have sound, and azx_first_init fails, > > > > > > > > > + * leaving the device probed but non-functional. As long > > > > > > > > > + * as it's probed, the PCI subsystem keeps its runtime > > > > > > > > > + * PM status as active. Force it to suspended (as we > > > > > > > > > + * actually stop the chip) to allow GPU to suspend via > > > > > > > > > + * vga_switcheroo. > > > > > > > > > + */ > > > > > > > > > + pm_runtime_disable(&pci->dev); > > > > > > > > > + pm_runtime_set_suspended(&pci->dev); > > > > > > > > > + pm_runtime_enable(&pci->dev); > > > > > > > > > + } > > > > > > > > > } > > > > > > > > > > > > > > > > > > if (bus->chip_init) { > > > > > > > > > -- > > > > > > > > > 2.46.0 > > > > > > > > >
On Mon, 30 Sep 2024 at 08:43:46 +0200, Takashi Iwai wrote: > On Sat, 21 Sep 2024 10:10:37 +0200, > Maxim Mikityanskiy wrote: > > > > On Sat, 07 Sep 2024 at 20:10:44 +0200, Takashi Iwai wrote: > > > On Sat, 07 Sep 2024 19:24:41 +0200, > > > Maxim Mikityanskiy wrote: > > > > > > > > On Sat, 07 Sep 2024 at 17:49:11 +0200, Takashi Iwai wrote: > > > > > On Sat, 07 Sep 2024 15:45:41 +0200, > > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > > > On Sat, 07 Sep 2024 at 12:06:25 +0200, Takashi Iwai wrote: > > > > > > > On Fri, 06 Sep 2024 20:05:06 +0200, > > > > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > > > > > > > On Thu, 05 Sep 2024 at 16:24:26 +0200, Takashi Iwai wrote: > > > > > > > > > On Wed, 04 Sep 2024 22:39:55 +0200, > > > > > > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > > > > > > > > > > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on > > > > > > > > > > > > > > > > Spotted a typo: s/520M/540M/ > > > > > > > > > > > > > > > > > > the discrete GPU. snd_hda_intel probes the device and schedules > > > > > > > > > > azx_probe_continue(), which fails at azx_first_init(). The driver ends > > > > > > > > > > up probed, but calls azx_free() and stops the chip. However, from the > > > > > > > > > > runtime PM point of view, the device remains active, because the PCI > > > > > > > > > > subsystem makes it active on probe, and it's still bound. It prevents > > > > > > > > > > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs > > > > > > > > > > power management for the video and audio devices). > > > > > > > > > > > > > > > > > > > > Fix it by forcing the device to the suspended state in azx_free(). > > > > > > > > > > > > > > > > > > > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") > > > > > > > > > > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> > > > > > > > > > > > > > > > > > > What if this device probe is skipped (e.g. adding your device entry to > > > > > > > > > driver_denylist[] in hda_intel.c)? Is the device also in the > > > > > > > > > runtime-suspended state? > > > > > > > > > > > > > > > > I added the following: > > > > > > > > > > > > > > > > { PCI_DEVICE_SUB(0x10de, 0x0bea, 0x0000, 0x0000) }, > > > > > > > > > > > > > > > > The probe was apparently skipped (the device is not attached to a > > > > > > > > driver), runtime_status=suspended, runtime_usage=0, the GPU goes to > > > > > > > > DynOff. > > > > > > > > > > > > > > OK, that's good. > > > > > > > > > > > > > > > However, I'm not sure whether it's a good idea to blacklist 540M > > > > > > > > entirely, as there might be other laptops with this GPU that have sound, > > > > > > > > and AFAIK there are variants of Lenovo Z570 with other NVIDIA GPUs. > > > > > > > > > > > > > > You should specify the PCI SSID of your device instead of 0:0 in the > > > > > > > above, so that it's picked up only for yours. > > > > > > > > > > > > Where can I get it? > > > > > > > > > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_vendor > > > > > > 0x0000 > > > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_device > > > > > > 0x0000 > > > > > > > > > > Ouch, Lenovo didn't set it right. > > > > > Alternatively we may introduce a deny list based on DMI. Hmm... > > > > > > > > A DMI-based quirk sounds better than blocking any NVIDIA 540M, it would > > > > also allow handling alternative GPUs on this laptop model. > > > > > > > > But could you explain what's wrong with a generic approach that I > > > > suggest (probe_continue failed => mark as suspended)? It doesn't require > > > > any lists. Any drawbacks? > > > > > > As you noticed, it will leave the device bound with driver, i.e. this > > > looks as if it were operative. It means that the sound driver is > > > still responsible for suspend/resume or whatever action even though > > > it's totally useless. In that sense, it makes more sense to give it > > > away at the early stage before actually binding it, and the deny list > > > is exactly for that. > > > > Thanks for your feedback! > > Sorry for the late reply, as I've been off in the last weeks. > > > Do you happen to know whether there is a way for a driver to unbind > > itself after probe() returned success? I've never seen anything like > > this, but it would be a perfect solution: no lists and no driver bound > > after the delayed initialization fails. > > I don't think it's possible for now (at least in a clean way). > It should be a call of device_driver_detach(), but it's no exported > API. > > > > Apart from that, the suggested change itself can be applied > > > independently from the deny list update, too. It'd be good for other > > > similar devices, too. > > > > I was trying to avoid lists, because it's a maintainance nightmare (the > > lists are never complete, there might be false positives, etc.), and > > we'd need to add another type of a list for DMI quirks. Moreover, adding > > both my change and a list entry has a drawback that similarly broken > > devices will use different workaround code (i.e. less robust), and the > > ones that use the generic fallback will never get on the list, because > > no one will report it. > > > > As a maintainer, what's your preferred approach? > > > > 1. Unbind ourselves on failure in azx_probe_continue() (but I don't know > > whether it's possible). > > So it's no easy way. > > > 2. Add a DMI deny list to hda_intel. > > > > 3. Add a DMI check to quirk_nvidia_hda (i.e. don't enable a non-existent > > HDA if it's not enabled by BIOS on this laptop model). > > I'm afraid that it may break other systems; IIRC, the power up was > needed explicitly even if it's disabled on BIOS. I'm not suggesting full removal of quirk_nvidia_hda. If we add a DMI check to disable quirk_nvidia_hda on this particular laptop model, it can't break other systems, can it? Anyways, adding a DMI check to hda_intel or quirk_nvidia_hda seems the same level of efforts, so I can proceed with the former. It's the same power consumption either way, right? I'm also thinking of adding a module parameter to block probing of the DGPU audio. Back in the days, there were plenty of similar Lenovo laptops, which might also suffer from the same issue. > > 4. Driver becomes a stub no failure in azx_probe_continue() (this > > patch). > > It's OK, but slightly risky. It's kept bound, hence any unexpected > thing might happen if the code is changed without noticing it. > > > 5. Some combo of 4 and 2/3? > > Not safe. > > So, from the safety POV, I'd rather take the DMI deny list approach. > > > thanks, > > Takashi > > > > Though, a slight concern is the sequence of runtime PM you applied in > > > the patch. Is it a standard idiom to perform pm_runtime_disable(), > > > set_suspended() and enable()? > > > > I'm not an expert in PM. Before I wrote this code, I was meditating on > > the docs for a few hours, and what I understood was: > > > > 1. As we are disabling the device (below in the same function), we need > > set_suspended() to just update the state. > > > > 2. set_suspended() must be called with PM disabled (see the comment > > above the function). > > > > 3. Disable and enable must come in pairs, because there is refcounting. > > > > I've seen set_suspended() after disable() on device remove (e.g., > > rt9120_remove()). For hda_intel, assuming that disable/enable are > > already paired, adding a new disable without a matching enable would > > break refcounting. Hence such a sequence. > > > > I'd love to hear an opinion from someone more experienced in PM. > > > > > Also, azx_free() is the common > > > destructor, hence it's called also at the regular driver unbinding. > > > I'm not entirely sure whether it's OK to call there also at > > > unbinding. > > > > I don't have hardware to test this flow, but theoretically it looks OK > > to me: we set the suspended state as we disable the device. > > > > > > > > thanks, > > > > > > Takashi > > > > > > > > > > > > > > > > > > > > > > Takashi > > > > > > > > > > > > > > > > > > > > > > Is it not the right place? > > > > > > > > > > > > > > > > > > > > Takashi > > > > > > > > > > > > > > > Another way to make vga_switcheroo work is to disable quirk_nvidia_hda, > > > > > > > > although I don't know whether it can be done without recompiling the > > > > > > > > kernel. In this case, 0000:01:00.1 doesn't even appear on the bus. > > > > > > > > > > > > > > > > (Note that I need to set nouveau.modeset=2 either way, otherwise it > > > > > > > > stays in DynPwr if the screen is on.) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > > > > > > > > > Takashi > > > > > > > > > > > > > > > > > > > --- > > > > > > > > > > sound/pci/hda/hda_intel.c | 14 +++++++++++++- > > > > > > > > > > 1 file changed, 13 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > > > > > > > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c > > > > > > > > > > index b79020adce63..65fcb92e11c7 100644 > > > > > > > > > > --- a/sound/pci/hda/hda_intel.c > > > > > > > > > > +++ b/sound/pci/hda/hda_intel.c > > > > > > > > > > @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip) > > > > > > > > > > if (use_vga_switcheroo(hda)) { > > > > > > > > > > if (chip->disabled && hda->probe_continued) > > > > > > > > > > snd_hda_unlock_devices(&chip->bus); > > > > > > > > > > - if (hda->vga_switcheroo_registered) > > > > > > > > > > + if (hda->vga_switcheroo_registered) { > > > > > > > > > > vga_switcheroo_unregister_client(chip->pci); > > > > > > > > > > + > > > > > > > > > > + /* Some GPUs don't have sound, and azx_first_init fails, > > > > > > > > > > + * leaving the device probed but non-functional. As long > > > > > > > > > > + * as it's probed, the PCI subsystem keeps its runtime > > > > > > > > > > + * PM status as active. Force it to suspended (as we > > > > > > > > > > + * actually stop the chip) to allow GPU to suspend via > > > > > > > > > > + * vga_switcheroo. > > > > > > > > > > + */ > > > > > > > > > > + pm_runtime_disable(&pci->dev); > > > > > > > > > > + pm_runtime_set_suspended(&pci->dev); > > > > > > > > > > + pm_runtime_enable(&pci->dev); > > > > > > > > > > + } > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > if (bus->chip_init) { > > > > > > > > > > -- > > > > > > > > > > 2.46.0 > > > > > > > > > >
On Mon, Sep 30, 2024 at 10:29:08AM +0300, Maxim Mikityanskiy wrote: > Anyways, adding a DMI check to hda_intel or quirk_nvidia_hda seems the > same level of efforts, so I can proceed with the former. It's the same > power consumption either way, right? > > I'm also thinking of adding a module parameter to block probing of the > DGPU audio. Back in the days, there were plenty of similar Lenovo > laptops, which might also suffer from the same issue. Adding new command line parameters for hardware quirks is generally frowned upon these days and is usually greeted by Greg KH's trademark "this isn't the 1990s anymore" comment. Drivers are supposed to auto-detect hardware with quirks and handle them correctly without the user having to add a command line option. Users should not get accustomed to fiddling with the command line. They won't have the need to report broken hardware and the driver may never be amended to deal with affected models automatically. Thanks, Lukas
On Mon, 30 Sep 2024 09:29:08 +0200, Maxim Mikityanskiy wrote: > > On Mon, 30 Sep 2024 at 08:43:46 +0200, Takashi Iwai wrote: > > On Sat, 21 Sep 2024 10:10:37 +0200, > > Maxim Mikityanskiy wrote: > > > > > > On Sat, 07 Sep 2024 at 20:10:44 +0200, Takashi Iwai wrote: > > > > On Sat, 07 Sep 2024 19:24:41 +0200, > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > On Sat, 07 Sep 2024 at 17:49:11 +0200, Takashi Iwai wrote: > > > > > > On Sat, 07 Sep 2024 15:45:41 +0200, > > > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > > > > > On Sat, 07 Sep 2024 at 12:06:25 +0200, Takashi Iwai wrote: > > > > > > > > On Fri, 06 Sep 2024 20:05:06 +0200, > > > > > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > > > > > > > > > On Thu, 05 Sep 2024 at 16:24:26 +0200, Takashi Iwai wrote: > > > > > > > > > > On Wed, 04 Sep 2024 22:39:55 +0200, > > > > > > > > > > Maxim Mikityanskiy wrote: > > > > > > > > > > > > > > > > > > > > > > Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on > > > > > > > > > > > > > > > > > > Spotted a typo: s/520M/540M/ > > > > > > > > > > > > > > > > > > > > the discrete GPU. snd_hda_intel probes the device and schedules > > > > > > > > > > > azx_probe_continue(), which fails at azx_first_init(). The driver ends > > > > > > > > > > > up probed, but calls azx_free() and stops the chip. However, from the > > > > > > > > > > > runtime PM point of view, the device remains active, because the PCI > > > > > > > > > > > subsystem makes it active on probe, and it's still bound. It prevents > > > > > > > > > > > vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs > > > > > > > > > > > power management for the video and audio devices). > > > > > > > > > > > > > > > > > > > > > > Fix it by forcing the device to the suspended state in azx_free(). > > > > > > > > > > > > > > > > > > > > > > Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") > > > > > > > > > > > Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> > > > > > > > > > > > > > > > > > > > > What if this device probe is skipped (e.g. adding your device entry to > > > > > > > > > > driver_denylist[] in hda_intel.c)? Is the device also in the > > > > > > > > > > runtime-suspended state? > > > > > > > > > > > > > > > > > > I added the following: > > > > > > > > > > > > > > > > > > { PCI_DEVICE_SUB(0x10de, 0x0bea, 0x0000, 0x0000) }, > > > > > > > > > > > > > > > > > > The probe was apparently skipped (the device is not attached to a > > > > > > > > > driver), runtime_status=suspended, runtime_usage=0, the GPU goes to > > > > > > > > > DynOff. > > > > > > > > > > > > > > > > OK, that's good. > > > > > > > > > > > > > > > > > However, I'm not sure whether it's a good idea to blacklist 540M > > > > > > > > > entirely, as there might be other laptops with this GPU that have sound, > > > > > > > > > and AFAIK there are variants of Lenovo Z570 with other NVIDIA GPUs. > > > > > > > > > > > > > > > > You should specify the PCI SSID of your device instead of 0:0 in the > > > > > > > > above, so that it's picked up only for yours. > > > > > > > > > > > > > > Where can I get it? > > > > > > > > > > > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_vendor > > > > > > > 0x0000 > > > > > > > # cat /sys/bus/pci/devices/0000\:01\:00.1/subsystem_device > > > > > > > 0x0000 > > > > > > > > > > > > Ouch, Lenovo didn't set it right. > > > > > > Alternatively we may introduce a deny list based on DMI. Hmm... > > > > > > > > > > A DMI-based quirk sounds better than blocking any NVIDIA 540M, it would > > > > > also allow handling alternative GPUs on this laptop model. > > > > > > > > > > But could you explain what's wrong with a generic approach that I > > > > > suggest (probe_continue failed => mark as suspended)? It doesn't require > > > > > any lists. Any drawbacks? > > > > > > > > As you noticed, it will leave the device bound with driver, i.e. this > > > > looks as if it were operative. It means that the sound driver is > > > > still responsible for suspend/resume or whatever action even though > > > > it's totally useless. In that sense, it makes more sense to give it > > > > away at the early stage before actually binding it, and the deny list > > > > is exactly for that. > > > > > > Thanks for your feedback! > > > > Sorry for the late reply, as I've been off in the last weeks. > > > > > Do you happen to know whether there is a way for a driver to unbind > > > itself after probe() returned success? I've never seen anything like > > > this, but it would be a perfect solution: no lists and no driver bound > > > after the delayed initialization fails. > > > > I don't think it's possible for now (at least in a clean way). > > It should be a call of device_driver_detach(), but it's no exported > > API. > > > > > > Apart from that, the suggested change itself can be applied > > > > independently from the deny list update, too. It'd be good for other > > > > similar devices, too. > > > > > > I was trying to avoid lists, because it's a maintainance nightmare (the > > > lists are never complete, there might be false positives, etc.), and > > > we'd need to add another type of a list for DMI quirks. Moreover, adding > > > both my change and a list entry has a drawback that similarly broken > > > devices will use different workaround code (i.e. less robust), and the > > > ones that use the generic fallback will never get on the list, because > > > no one will report it. > > > > > > As a maintainer, what's your preferred approach? > > > > > > 1. Unbind ourselves on failure in azx_probe_continue() (but I don't know > > > whether it's possible). > > > > So it's no easy way. > > > > > 2. Add a DMI deny list to hda_intel. > > > > > > 3. Add a DMI check to quirk_nvidia_hda (i.e. don't enable a non-existent > > > HDA if it's not enabled by BIOS on this laptop model). > > > > I'm afraid that it may break other systems; IIRC, the power up was > > needed explicitly even if it's disabled on BIOS. > > I'm not suggesting full removal of quirk_nvidia_hda. If we add a DMI > check to disable quirk_nvidia_hda on this particular laptop model, it > can't break other systems, can it? Possibly, but if BIOS enables it, this quirk will just skip, no? My gut feeling is rather to leave it. Who knows any side effect if we tweak there. But, you can experiment it for judging whether it fits better, of course. > Anyways, adding a DMI check to hda_intel or quirk_nvidia_hda seems the > same level of efforts, so I can proceed with the former. It's the same > power consumption either way, right? I assume so, but you need to check in your side. > I'm also thinking of adding a module parameter to block probing of the > DGPU audio. Back in the days, there were plenty of similar Lenovo > laptops, which might also suffer from the same issue. That's an overkill. We can apply a runtime-PM suspended change as your patch in addition to the deny list. And this can give some info print to users to update the deny list, too. thanks, Takashi
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index b79020adce63..65fcb92e11c7 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -1361,8 +1361,20 @@ static void azx_free(struct azx *chip) if (use_vga_switcheroo(hda)) { if (chip->disabled && hda->probe_continued) snd_hda_unlock_devices(&chip->bus); - if (hda->vga_switcheroo_registered) + if (hda->vga_switcheroo_registered) { vga_switcheroo_unregister_client(chip->pci); + + /* Some GPUs don't have sound, and azx_first_init fails, + * leaving the device probed but non-functional. As long + * as it's probed, the PCI subsystem keeps its runtime + * PM status as active. Force it to suspended (as we + * actually stop the chip) to allow GPU to suspend via + * vga_switcheroo. + */ + pm_runtime_disable(&pci->dev); + pm_runtime_set_suspended(&pci->dev); + pm_runtime_enable(&pci->dev); + } } if (bus->chip_init) {
Lenovo IdeaPad Z570 with NVIDIA GeForce Ge 520M doesn't have sound on the discrete GPU. snd_hda_intel probes the device and schedules azx_probe_continue(), which fails at azx_first_init(). The driver ends up probed, but calls azx_free() and stops the chip. However, from the runtime PM point of view, the device remains active, because the PCI subsystem makes it active on probe, and it's still bound. It prevents vga_switcheroo from turning off the DGPU (pci_create_device_link() syncs power management for the video and audio devices). Fix it by forcing the device to the suspended state in azx_free(). Fixes: 07f4f97d7b4b ("vga_switcheroo: Use device link for HDA controller") Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com> --- sound/pci/hda/hda_intel.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)