BUG-REPORT: snd-hda: hacked-together EPROBE_DEFER support

Message ID s5ha84sl8ni.wl-tiwai@suse.de (mailing list archive)
State New, archived

Commit Message

Takashi Iwai June 28, 2017, 10:16 a.m. UTC
On Mon, 26 Jun 2017 19:54:49 +0200,
Daniel Vetter wrote:
> 
> On Mon, Jun 26, 2017 at 7:47 PM, Takashi Iwai <tiwai@suse.de> wrote:
> > On Mon, 26 Jun 2017 18:16:30 +0200,
> > Daniel Vetter wrote:
> >>
> >> On Wed, Jun 21, 2017 at 05:30:10PM +0200, Takashi Iwai wrote:
> >> > On Wed, 21 Jun 2017 17:23:57 +0200,
> >> > Chris Wilson wrote:
> >> > >
> >> > > Quoting Daniel Vetter (2017-06-21 16:08:54)
> >> > > > So back when the i915 power well support landed in
> >> > > >
> >> > > > commit 99a2008d0b32d72dfc2a54e7be1eb698dd2e3bd6
> >> > > > Author: Wang Xingchao <xingchao.wang@linux.intel.com>
> >> > > > Date:   Thu May 30 22:07:10 2013 +0800
> >> > > >
> >> > > >     ALSA: hda - Add power-welll support for haswell HDA
> >> > > >
> >> > > > the logic to handle the cross-module dependencies was hand-rolled using an
> >> > > > async work item, and that just doesn't work.
> >> > > >
> >> > > > The correct way to handle cross-module deps is either:
> >> > > > - request_module + failing when the other module isn't there
> >> > > >
> >> > > > OR
> >> > > >
> >> > > > - failing the module load with EPROBE_DEFER.
> >> > > >
> >> > > > You can't mix them, if you do then the entire load path just
> >> > > > busy-spins blowing through cpu cycles forever with no way to stop
> >> > > > this.
> >> > > >
> >> > > > snd-hda-intel does mix it, because the hda codec drivers are loaded
> >> > > > using request_module, but the i915 dependency is handled using
> >> > > > EPROBE_DEFER (or well, should be, but I haven't found any code at all).
> >> > > > This is a major pain when trying to debug i915 load failures.
> >> > > >
> >> > > > This patch here is a horrible hackish attempt at somewhat correctly
> >> > > > wiring EPROBE_DEFER through. Stuff that's missing:
> >> > > > - Check all the other places where load errors are conveniently
> >> > > >   dropped on the floor.
> >> > > > - Also fix up the firmware_cb path.
> >> > > > - Drop the debug noise I've left in to make it clear this isn't
> >> > > >   anything for merging.
> >> > >
> >> > > This tames "hdaudio hdaudioC0D0: Unable to bind the codec" which was
> >> > > continuously spewing previously, and now the system is usable again.
> >> >
> >> > Could you give a failing scenario?  I'm not opposed to the suggested
> >> > solution, we need to fix the mess anyway, but I would just like to
> >> > know how to trigger the problem easily.
> >>
> >> Disable i915 loading e.g. with i915.modeset=0. Watch how the snd-hda*
> >> modules collectively blow through 100% of the cpu time spewing into
> >> dmesg (and make the system completely unusable for kernel work because
> >> you can't find your own debug printk anymore).
> >
> > Ah, that's the case we discussed in the past.  We know that it's
> > problematic for component binding, but we're ignoring this scenario
> > because it's not supposed to be a real use case, only something for
> > temporary workarounds.
> >
> > We had a bigger-hammer patchset, but further development wasn't
> > justified given the reasoning above.
> >
> >> This is on a snb, where we don't even need the cross-module stuff ... But
> >> I think it goes sideways in other cases too, if you simply build but don't
> >> load i915. So every time i915 breaks module load, things become really
> >> painful.
> >
> > Even on SNB, we still need i915 for the HDMI/DP ELD notification.  The
> > hardware inquiry over HD-audio verbs was so unstable that we'd rather
> > query the gfx driver directly.
> 
> Ah right, forgot about that.
> 
> >> Unfortunately the patch is a bit too big for our fixup branch in drm-tip,
> >> so plan B would be to stop building snd-hda (which will make the intel
> >> audio team unhappy, but mea culpa if they don't fix this mess).
> >
> > OK, let me think and take a look at the older patchset, too.
> 
> Yeah would be great if we can somehow address this, preferably using
> EPROBE_DEFER or something else that's standard. At least the component
> stuff really doesn't work without wiring EPROBE_DEFER through.

Now I took a closer look, and this appears rather like a brown paper
bag bug, not about the deferred probe or module dependency.
The fix patch is below.  Could you check whether it works?


thanks,

Takashi

-- 8< --
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] ALSA: hda - Fix endless loop of codec configure

azx_codec_configure() loops over the codecs found on the given
controller via a linked list.  The code used to work in the past, but
in the current version, this may lead to an endless loop when a codec
binding returns an error.

The culprit is that snd_hda_codec_configure() unregisters the
device upon error, and this eventually deletes the given codec object
from the bus.  Since the list entry is re-initialized via
list_del_init(), its next pointer points back to the same entry.
This behavior change was introduced when the HD-audio code was split,
and this loop was never adapted to it.

To fix this bug, just use the *_safe() version of list iteration.

Fixes: d068ebc25e6e ("ALSA: hda - Move some codes up to hdac_bus struct")
Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/pci/hda/hda_codec.h      | 2 ++
 sound/pci/hda/hda_controller.c | 8 ++++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

Comments

Daniel Vetter June 29, 2017, 10:25 a.m. UTC | #1
On Wed, Jun 28, 2017 at 12:16:17PM +0200, Takashi Iwai wrote:
> Now I took a closer look, and this appears rather like a brown paper
> bag bug, not about the deferred probe or module dependency.
> The fix patch is below.  Could you check whether it works?

Yay, this works!

Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Daniel Vetter July 4, 2017, 3:14 p.m. UTC | #2
On Thu, Jun 29, 2017 at 12:25 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> Now I took a closer look, and this appears rather like a brown paper
>> bag bug, not about the deferred probe or module dependency.
>> The fix patch is below.  Could you check whether it works?
>
> Yay, this works!
>
> Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Next one: i915 module reloading is broken because something is holding
onto a module reference and doesn't drop it. Didn't check which sets
of patches introduced this, but iirc this worked last week. Disabling
hda-intel in Kconfig gets rid of the problem, so I assume
something in the sound driver is leaking that reference ...

It's also causing lots and lots of red in our CI :( If we can't fix
this we need to disable snd-hda-intel there too.
-Daniel
Takashi Iwai July 4, 2017, 3:28 p.m. UTC | #3
On Tue, 04 Jul 2017 17:14:39 +0200,
Daniel Vetter wrote:
> 
> On Thu, Jun 29, 2017 at 12:25 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> >> Now I took a closer look, and this appears rather like a brown paper
> >> bag bug, not about the deferred probe or module dependency.
> >> The fix patch is below.  Could you check whether it works?
> >
> > Yay, this works!
> >
> > Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Next one: i915 module reloading is broken because something is holding
> onto a module reference and doesn't drop it. Didn't check which sets
> of patches introduced this, but iirc this worked last week. Disabling
> hda-intel in Kconfig gets rid of the problem, so I assume
> something in the sound driver is leaking that reference ...
> 
> It's also causing lots and lots of red in our CI :( If we can't fix
> this we need to disable snd-hda-intel there too.

I spotted a typo in my previous patch that leads to the module
reference imbalance.  The fix is already in the sound.git tree today:
  https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?h=for-linus&id=fc18282cdcba984ab89c74d7e844c10114ae0795

The bug was introduced after 4.12.


thanks,

--
Takashi Iwai <tiwai@suse.de>
Daniel Vetter July 4, 2017, 6:12 p.m. UTC | #4
On Tue, Jul 4, 2017 at 5:28 PM, Takashi Iwai <tiwai@suse.de> wrote:
> On Tue, 04 Jul 2017 17:14:39 +0200,
> Daniel Vetter wrote:
>> On Thu, Jun 29, 2017 at 12:25 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> >> Now I took a closer look, and this appears rather like a brown paper
>> >> bag bug, not about the deferred probe or module dependency.
>> >> The fix patch is below.  Could you check whether it works?
>> >
>> > Yay, this works!
>> >
>> > Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>
>> Next one: i915 module reloading is broken because something is holding
>> onto a module reference and doesn't drop it. Didn't check which sets
>> of patches introduced this, but iirc this worked last week. Disabling
>> hda-intel in Kconfig gets rid of the problem, so I assume
>> something in the sound driver is leaking that reference ...
>>
>> It's also causing lots and lots of red in our CI :( If we can't fix
>> this we need to disable snd-hda-intel there too.
>
> I spotted a typo in my previous patch that leads to the module
> reference imbalance.  The fix is already in the sound.git tree today:
>   https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?h=for-linus&id=fc18282cdcba984ab89c74d7e844c10114ae0795
>
> The bug was introduced after 4.12.

Ok, CI over here confirmed that it's all good again.

Thanks, Daniel
Patch

diff --git a/sound/pci/hda/hda_codec.h b/sound/pci/hda/hda_codec.h
index d6fb2d5d01a7..60ce1cfc300f 100644
--- a/sound/pci/hda/hda_codec.h
+++ b/sound/pci/hda/hda_codec.h
@@ -295,6 +295,8 @@  struct hda_codec {
 
 #define list_for_each_codec(c, bus) \
 	list_for_each_entry(c, &(bus)->core.codec_list, core.list)
+#define list_for_each_codec_safe(c, n, bus)				\
+	list_for_each_entry_safe(c, n, &(bus)->core.codec_list, core.list)
 
 /* snd_hda_codec_read/write optional flags */
 #define HDA_RW_NO_RESPONSE_FALLBACK	(1 << 0)
diff --git a/sound/pci/hda/hda_controller.c b/sound/pci/hda/hda_controller.c
index 3715a5725613..1c60beb5b70a 100644
--- a/sound/pci/hda/hda_controller.c
+++ b/sound/pci/hda/hda_controller.c
@@ -1337,8 +1337,12 @@  EXPORT_SYMBOL_GPL(azx_probe_codecs);
 /* configure each codec instance */
 int azx_codec_configure(struct azx *chip)
 {
-	struct hda_codec *codec;
-	list_for_each_codec(codec, &chip->bus) {
+	struct hda_codec *codec, *next;
+
+	/* use _safe version here since snd_hda_codec_configure() deregisters
+	 * the device upon error and deletes itself from the bus list.
+	 */
+	list_for_each_codec_safe(codec, next, &chip->bus) {
 		snd_hda_codec_configure(codec);
 	}
 	return 0;
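
The failure mode described in the commit message can be reproduced in
userspace.  What follows is only a minimal sketch, assuming a hand-written
stand-in for the kernel's struct list_head; the names demo_codec,
demo_configure, demo_walk_plain, demo_walk_safe and demo_run are
hypothetical and exist purely for illustration, and none of this is the
driver's actual code.  It shows why a plain walk never advances once the
loop body calls list_del_init() on the current entry, and why saving the
next pointer first avoids that.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the entry, then re-initialize it so that it points at itself. */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
	assert(e->next == e && e->prev == e);
}

/* Hypothetical stand-in for struct hda_codec.  The list node is the
 * first member, so casting a node pointer to demo_codec is valid.
 */
struct demo_codec {
	struct list_head list;
	int fails;
};

/* Hypothetical stand-in for snd_hda_codec_configure(): on error the
 * codec removes itself from the list.
 */
static void demo_configure(struct demo_codec *c)
{
	if (c->fails)
		list_del_init(&c->list);
}

/* Plain walk: pos->next is read only after the body has run, so a
 * self-linked entry is revisited until max_steps stops the loop.
 */
int demo_walk_plain(struct list_head *head, int max_steps)
{
	struct list_head *pos;
	int steps = 0;

	for (pos = head->next; pos != head && steps < max_steps;
	     pos = pos->next) {
		demo_configure((struct demo_codec *)pos);
		steps++;
	}
	return steps;
}

/* Safe walk: the next pointer is saved before the body runs. */
int demo_walk_safe(struct list_head *head)
{
	struct list_head *pos, *next;
	int steps = 0;

	for (pos = head->next, next = pos->next; pos != head;
	     pos = next, next = pos->next) {
		demo_configure((struct demo_codec *)pos);
		steps++;
	}
	return steps;
}

/* Build a two-codec list whose first codec fails, then walk it and
 * return the number of loop iterations.
 */
int demo_run(int use_safe, int max_steps)
{
	struct list_head head;
	struct demo_codec a = { .fails = 1 }, b = { .fails = 0 };

	init_list_head(&head);
	list_add_tail(&a.list, &head);
	list_add_tail(&b.list, &head);
	return use_safe ? demo_walk_safe(&head)
			: demo_walk_plain(&head, max_steps);
}
```

Called from a main(), demo_run(0, 1000) runs into the 1000-step cap
because the failing entry keeps pointing at itself, while
demo_run(1, 1000) visits each of the two codecs exactly once.  The step
cap exists only so the sketch terminates; the kernel loop had no such
limit.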