Message ID | s5ha84sl8ni.wl-tiwai@suse.de (mailing list archive) |
---|---|
State | New, archived |
On Wed, Jun 28, 2017 at 12:16:17PM +0200, Takashi Iwai wrote:
> On Mon, 26 Jun 2017 19:54:49 +0200,
> Daniel Vetter wrote:
> >
> > On Mon, Jun 26, 2017 at 7:47 PM, Takashi Iwai <tiwai@suse.de> wrote:
> > > On Mon, 26 Jun 2017 18:16:30 +0200,
> > > Daniel Vetter wrote:
> > >>
> > >> On Wed, Jun 21, 2017 at 05:30:10PM +0200, Takashi Iwai wrote:
> > >> > On Wed, 21 Jun 2017 17:23:57 +0200,
> > >> > Chris Wilson wrote:
> > >> > >
> > >> > > Quoting Daniel Vetter (2017-06-21 16:08:54)
> > >> > > > So back when the i915 power well support landed in
> > >> > > >
> > >> > > > commit 99a2008d0b32d72dfc2a54e7be1eb698dd2e3bd6
> > >> > > > Author: Wang Xingchao <xingchao.wang@linux.intel.com>
> > >> > > > Date: Thu May 30 22:07:10 2013 +0800
> > >> > > >
> > >> > > >     ALSA: hda - Add power-welll support for haswell HDA
> > >> > > >
> > >> > > > the logic to handle the cross-module dependencies was hand-rolled
> > >> > > > using an async work item, and that just doesn't work.
> > >> > > >
> > >> > > > The correct way to handle cross-module deps is either:
> > >> > > > - request_module + failing when the other module isn't there
> > >> > > >
> > >> > > > OR
> > >> > > >
> > >> > > > - failing the module load with EPROBE_DEFER.
> > >> > > >
> > >> > > > You can't mix them, if you do then the entire load path just
> > >> > > > busy-spins blowing through cpu cycles forever with no way to stop
> > >> > > > this.
> > >> > > >
> > >> > > > snd-hda-intel does mix it, because the hda codec drivers are loaded
> > >> > > > using request_module, but the i915 dependency is handled using
> > >> > > > EPROBE_DEFER (or well, should be, but I haven't found any code at all).
> > >> > > > This is a major pain when trying to debug i915 load failures.
> > >> > > >
> > >> > > > This patch here is a horrible hackish attempt at somewhat correctly
> > >> > > > wiring EPROBE_DEFER through.
> > >> > > > Stuff that's missing:
> > >> > > > - Check all the other places where load errors are conveniently
> > >> > > >   dropped on the floor.
> > >> > > > - Also fix up the firmware_cb path.
> > >> > > > - Drop the debug noise I've left in to make it clear this isn't
> > >> > > >   anything for merging.
> > >> > > >
> > >> > > This tames "hdaudio hdaudioC0D0: Unable to bind the codec" which was
> > >> > > continuously spewing previously, and now the system is usable again.
> > >> > >
> > >> > Could you give a failing scenario? I'm not opposed to the suggested
> > >> > solution, we need to fix the mess anyway, but I would just like to
> > >> > know how to trigger the problem easily.
> > >>
> > >> Disable i915 loading e.g. with i915.modeset=0. Watch how snd-hda*
> > >> collectively blow through 100% of the cpu time spewing into dmesg (and
> > >> make the system completely unusable for kernel work because you can't
> > >> find your own debug printk anymore).
> > >
> > > Ah, that's the case we discussed in the past. We know that it's
> > > problematic for component binding, but we're ignoring this scenario
> > > because it's supposed to be no real use-case but only for some
> > > temporary workarounds.
> > >
> > > We had some bigger-hammer patchset, but its further development
> > > wasn't justified, for the reasoning above.
> > >
> > >> This is on a snb, where we don't even need the cross-module stuff ... But
> > >> I think it goes sideways in other cases too, if you simply build but don't
> > >> load i915. So every time an i915 breaks module load things become real
> > >> painful.
> > >
> > > Even on SNB, we still need i915 for the HDMI/DP ELD notification. The
> > > hardware inquiry over HD-audio verbs was so unstable that we rather
> > > take a path that queries the gfx driver directly.
> >
> > Ah right, forgot about that.
> >
> > >> Unfortunately the patch is a bit too big for our fixup branch in drm-tip,
> > >> so plan B would be to stop building snd-hda (which will make the intel
> > >> audio team unhappy, but mea culpa if they don't fix this mess).
> > >
> > > OK, let me think and take a look at the older patchset, too.
> >
> > Yeah would be great if we can somehow address this, preferably using
> > EPROBE_DEFER or something else that's standard. At least the component
> > stuff really doesn't work without wiring EPROBE_DEFER through.
>
> Now I took a closer look, and this appears rather like a brown paper
> bag bug, not about the deferred probe or module dependency.
> The fix patch is below. Could you check whether it works?

Yay, this works!

Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>

>
>
> thanks,
>
> Takashi
>
> -- 8< --
> From: Takashi Iwai <tiwai@suse.de>
> Subject: [PATCH] ALSA: hda - Fix endless loop of codec configure
>
> azx_codec_configure() loops over the codecs found on the given
> controller via a linked list. The code used to work in the past, but
> in the current version, this may lead to an endless loop when a codec
> binding returns an error.
>
> The culprit is that snd_hda_codec_configure() unregisters the
> device upon error, and this eventually deletes the given codec object
> from the bus. Since the list entry is reinitialized via list_del_init(),
> its next pointer points back to the same entry itself. This behavior
> change was introduced when the HD-audio code was split, and adapting
> this loop was forgotten.
>
> For fixing this bug, just use a *_safe() version of list iteration.
>
> Fixes: d068ebc25e6e ("ALSA: hda - Move some codes up to hdac_bus struct")
> Reported-by: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> ---
>  sound/pci/hda/hda_codec.h      | 2 ++
>  sound/pci/hda/hda_controller.c | 8 ++++++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/sound/pci/hda/hda_codec.h b/sound/pci/hda/hda_codec.h
> index d6fb2d5d01a7..60ce1cfc300f 100644
> --- a/sound/pci/hda/hda_codec.h
> +++ b/sound/pci/hda/hda_codec.h
> @@ -295,6 +295,8 @@ struct hda_codec {
>
>  #define list_for_each_codec(c, bus) \
>  	list_for_each_entry(c, &(bus)->core.codec_list, core.list)
> +#define list_for_each_codec_safe(c, n, bus) \
> +	list_for_each_entry_safe(c, n, &(bus)->core.codec_list, core.list)
>
>  /* snd_hda_codec_read/write optional flags */
>  #define HDA_RW_NO_RESPONSE_FALLBACK (1 << 0)
> diff --git a/sound/pci/hda/hda_controller.c b/sound/pci/hda/hda_controller.c
> index 3715a5725613..1c60beb5b70a 100644
> --- a/sound/pci/hda/hda_controller.c
> +++ b/sound/pci/hda/hda_controller.c
> @@ -1337,8 +1337,12 @@ EXPORT_SYMBOL_GPL(azx_probe_codecs);
>  /* configure each codec instance */
>  int azx_codec_configure(struct azx *chip)
>  {
> -	struct hda_codec *codec;
> -	list_for_each_codec(codec, &chip->bus) {
> +	struct hda_codec *codec, *next;
> +
> +	/* use _safe version here since snd_hda_codec_configure() deregisters
> +	 * the device upon error and deletes itself from the bus list.
> +	 */
> +	list_for_each_codec_safe(codec, next, &chip->bus) {
>  		snd_hda_codec_configure(codec);
>  	}
>  	return 0;
> --
> 2.13.2
>
On Thu, Jun 29, 2017 at 12:25 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> Now I took a closer look, and this appears rather like a brown paper
>> bag bug, not about the deferred probe or module dependency.
>> The fix patch is below. Could you check whether it works?
>
> Yay, this works!
>
> Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Next one: i915 module reloading is broken because something is holding
onto a module reference and doesn't drop it. Didn't check which set
of patches introduced this, but iirc this worked last week. Disabling
hda-intel in Kconfig gets rid of the problem, so I assume
something in the sound driver is leaking that reference ...

It's also causing lots and lots of red in our CI :( If we can't fix
this we need to disable snd-hda-intel there too.
-Daniel
On Tue, 04 Jul 2017 17:14:39 +0200,
Daniel Vetter wrote:
>
> On Thu, Jun 29, 2017 at 12:25 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> >> Now I took a closer look, and this appears rather like a brown paper
> >> bag bug, not about the deferred probe or module dependency.
> >> The fix patch is below. Could you check whether it works?
> >
> > Yay, this works!
> >
> > Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Next one: i915 module reloading is broken because something is holding
> onto a module reference and doesn't drop it. Didn't check which set
> of patches introduced this, but iirc this worked last week. Disabling
> hda-intel in Kconfig gets rid of the problem, so I assume
> something in the sound driver is leaking that reference ...
>
> It's also causing lots and lots of red in our CI :( If we can't fix
> this we need to disable snd-hda-intel there too.

I spotted a typo in my previous patch that leads to the module
reference imbalance. The fix is already in the sound.git tree today:
https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?h=for-linus&id=fc18282cdcba984ab89c74d7e844c10114ae0795

The bug was introduced after 4.12.

thanks,
--
Takashi Iwai <tiwai@suse.de>
On Tue, Jul 4, 2017 at 5:28 PM, Takashi Iwai <tiwai@suse.de> wrote:
> On Tue, 04 Jul 2017 17:14:39 +0200,
> Daniel Vetter wrote:
>> On Thu, Jun 29, 2017 at 12:25 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> >> Now I took a closer look, and this appears rather like a brown paper
>> >> bag bug, not about the deferred probe or module dependency.
>> >> The fix patch is below. Could you check whether it works?
>> >
>> > Yay, this works!
>> >
>> > Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>>
>> Next one: i915 module reloading is broken because something is holding
>> onto a module reference and doesn't drop it. Didn't check which set
>> of patches introduced this, but iirc this worked last week. Disabling
>> hda-intel in Kconfig gets rid of the problem, so I assume
>> something in the sound driver is leaking that reference ...
>>
>> It's also causing lots and lots of red in our CI :( If we can't fix
>> this we need to disable snd-hda-intel there too.
>
> I spotted a typo in my previous patch that leads to the module
> reference imbalance. The fix is already in the sound.git tree today:
> https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git/commit/?h=for-linus&id=fc18282cdcba984ab89c74d7e844c10114ae0795
>
> The bug was introduced after 4.12.

Ok, CI over here confirmed that it's all good again.

Thanks, Daniel
diff --git a/sound/pci/hda/hda_codec.h b/sound/pci/hda/hda_codec.h
index d6fb2d5d01a7..60ce1cfc300f 100644
--- a/sound/pci/hda/hda_codec.h
+++ b/sound/pci/hda/hda_codec.h
@@ -295,6 +295,8 @@ struct hda_codec {
 
 #define list_for_each_codec(c, bus) \
 	list_for_each_entry(c, &(bus)->core.codec_list, core.list)
+#define list_for_each_codec_safe(c, n, bus) \
+	list_for_each_entry_safe(c, n, &(bus)->core.codec_list, core.list)
 
 /* snd_hda_codec_read/write optional flags */
 #define HDA_RW_NO_RESPONSE_FALLBACK (1 << 0)
diff --git a/sound/pci/hda/hda_controller.c b/sound/pci/hda/hda_controller.c
index 3715a5725613..1c60beb5b70a 100644
--- a/sound/pci/hda/hda_controller.c
+++ b/sound/pci/hda/hda_controller.c
@@ -1337,8 +1337,12 @@ EXPORT_SYMBOL_GPL(azx_probe_codecs);
 /* configure each codec instance */
 int azx_codec_configure(struct azx *chip)
 {
-	struct hda_codec *codec;
-	list_for_each_codec(codec, &chip->bus) {
+	struct hda_codec *codec, *next;
+
+	/* use _safe version here since snd_hda_codec_configure() deregisters
+	 * the device upon error and deletes itself from the bus list.
+	 */
+	list_for_each_codec_safe(codec, next, &chip->bus) {
 		snd_hda_codec_configure(codec);
 	}
 	return 0;
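
For readers who want to see the failure mode outside the kernel, here is a rough userspace sketch of what the commit message describes. It is not the kernel code: the list helpers are simplified stand-ins for <linux/list.h>, struct codec and configure() are hypothetical placeholders for struct hda_codec and snd_hda_codec_configure(), and the max_steps cap exists only so that the broken walk terminates in a demo.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the entry and make it point at itself, like list_del_init(). */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	list_init(e);
}

/* Hypothetical placeholder for struct hda_codec. */
struct codec { int fails; struct list_head list; };

#define codec_of(p) ((struct codec *)((char *)(p) - offsetof(struct codec, list)))

/* Hypothetical placeholder for snd_hda_codec_configure(): on error the
 * codec removes itself from the bus list. */
static int configure(struct codec *c)
{
	if (c->fails) {
		list_del_init(&c->list);
		return -1;
	}
	return 0;
}

/* Plain walk: reads p->next only after configure() may have unlinked p. */
static int walk_plain(struct list_head *head, int max_steps)
{
	struct list_head *p;
	int steps = 0;

	for (p = head->next; p != head && steps < max_steps; p = p->next) {
		configure(codec_of(p));
		steps++;
	}
	return steps;
}

/* Safe walk: remembers the next entry before configure() runs. */
static int walk_safe(struct list_head *head, int max_steps)
{
	struct list_head *p, *n;
	int steps = 0;

	for (p = head->next, n = p->next; p != head && steps < max_steps;
	     p = n, n = p->next) {
		configure(codec_of(p));
		steps++;
	}
	return steps;
}

/* Two codecs on one bus; the first one fails to bind. Returns the number
 * of loop iterations performed (capped at max_steps). */
int demo(int use_safe, int max_steps)
{
	struct list_head bus;
	struct codec c0 = { .fails = 1 }, c1 = { .fails = 0 };

	assert(max_steps > 0);
	list_init(&bus);
	list_add_tail(&c0.list, &bus);
	list_add_tail(&c1.list, &bus);
	return use_safe ? walk_safe(&bus, max_steps) : walk_plain(&bus, max_steps);
}
```

With a cap of 1000, demo(0, 1000) returns 1000 because the plain walk keeps revisiting the self-pointing failed entry and never reaches the list head, while demo(1, 1000) returns 2, one visit per codec.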