Message ID | 1547194442-1487-1-git-send-email-rohitkr@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ASoC: soc-core: Fix null pointer dereference in soc_find_component | expand |
On 1/11/19 2:14 AM, Rohit kumar wrote: > From: Ajit Pandey <ajitp@codeaurora.org> > > soc_find_component() may lead to null pointer exception if both > arguments i.e of_node and name is NULL. Add NULL check before > calling soc_find_component(). Also fix some typos. Thanks for the overnight fix. This update fixes the issue on my Skylake XPS13 test device (blind testing since I don't understand what the code does). Tested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> > > Fixes: 8780cf1142a5 ("ASoC: soc-core: defer card probe until all component is added to list") > Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> > Signed-off-by: Ajit Pandey <ajitp@codeaurora.org> > Signed-off-by: Rohit kumar <rohitkr@codeaurora.org> > --- > sound/soc/soc-core.c | 15 +++++++++------ > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c > index 0934b36..df05fb8 100644 > --- a/sound/soc/soc-core.c > +++ b/sound/soc/soc-core.c > @@ -1131,11 +1131,13 @@ static int soc_init_dai_link(struct snd_soc_card *card, > } > > /* > - * Defer card registartion if platform dai component is not added to > + * Defer card registration if platform dai component is not added to > * component list. > */ > - if (!soc_find_component(link->platform->of_node, link->platform->name)) > - return -EPROBE_DEFER; > + if (link->platform->of_node || link->platform->name) > + if (!soc_find_component(link->platform->of_node, > + link->platform->name)) > + return -EPROBE_DEFER; > > /* > * CPU device may be specified by either name or OF node, but > @@ -1150,11 +1152,12 @@ static int soc_init_dai_link(struct snd_soc_card *card, > } > > /* > - * Defer card registartion if cpu dai component is not added to > + * Defer card registration if cpu dai component is not added to > * component list. > */ > - if (!soc_find_component(link->cpu_of_node, link->cpu_name)) > - return -EPROBE_DEFER; > + if (link->cpu_of_node || link->cpu_name) > + if (!soc_find_component(link->cpu_of_node, link->cpu_name)) > + return -EPROBE_DEFER; > > /* > * At least one of CPU DAI name or CPU device name/node must be
> Thanks for the overnight fix. This update fixes the issue on my > Skylake XPS13 test device (blind testing since I don't understand what > the code does). > > Tested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> I need to take this back, this set of changes (initial+fix) causes an error with our HDMI support [ 17.437684] sof-audio sof-audio: created machine bxt-pcm512x [ 17.585279] bxt-pcm512x bxt-pcm512x: ASoC: failed to init link iDisp1 [ 17.585639] bxt-pcm512x bxt-pcm512x: snd_soc_register_card failed -517 Removing your changes restores the functionality Adding some traces I can see that the the platform name we use doesn't seem compatible with your logic. All the Intel boards used a constant platform name matching the PCI ID, see e.g. [1], which IIRC is used to bind components. Liam, do you recall in more details if this is really required? [1] https://elixir.bootlin.com/linux/latest/source/sound/soc/intel/boards/bxt_da7219_max98357a.c#L475 [ 18.205812] plb: platform name sof-audio [ 18.206059] plb: cpu_name (null) [ 18.206234] plb: platform name 0000:00:0e.0 [ 18.206459] plb: returning -EPROBE_DEFER 1 [ 18.206686] bxt-pcm512x bxt-pcm512x: ASoC: failed to init link iDisp1 [ 18.207054] bxt-pcm512x bxt-pcm512x: snd_soc_register_card failed -517 diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index cbafbdd02483..ae731212f82b 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -1133,11 +1133,15 @@ static int soc_init_dai_link(struct snd_soc_card *card, * Defer card registration if platform dai component is not added to * component list. */ + pr_err("plb: platform name %s\n", link->platform->name); if (link->platform->of_node || link->platform->name) if (!soc_find_component(link->platform->of_node, link->platform->name)) - return -EPROBE_DEFER; + { + pr_err("plb: returning -EPROBE_DEFER 1\n"); + return -EPROBE_DEFER; + } /* * CPU device may be specified by either name or OF node, but * can be left unspecified, and will be matched based on DAI @@ -1154,9 +1158,14 @@ static int soc_init_dai_link(struct snd_soc_card *card, * Defer card registration if cpu dai component is not added to * component list. */ + pr_err("plb: cpu_name %s\n", link->cpu_name); if (link->cpu_of_node || link->cpu_name) if (!soc_find_component(link->cpu_of_node, link->cpu_name)) - return -EPROBE_DEFER; + { + pr_err("plb: returning -EPROBE_DEFER 2\n"); + return -EPROBE_DEFER; + + } /* * At least one of CPU DAI name or CPU device name/node must be
On 1/12/2019 3:19 AM, Pierre-Louis Bossart wrote: > >> Thanks for the overnight fix. This update fixes the issue on my >> Skylake XPS13 test device (blind testing since I don't understand >> what the code does). >> >> Tested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> > > I need to take this back, this set of changes (initial+fix) causes an > error with our HDMI support > > [ 17.437684] sof-audio sof-audio: created machine bxt-pcm512x > [ 17.585279] bxt-pcm512x bxt-pcm512x: ASoC: failed to init link iDisp1 > [ 17.585639] bxt-pcm512x bxt-pcm512x: snd_soc_register_card failed -517 > > Removing your changes restores the functionality > Looks like we should revert generic implementation for defering probe and move to call from machine driver as done in v1. https://lore.kernel.org/patchwork/patch/1027560/ https://lore.kernel.org/patchwork/patch/1027561/ @Mark, Do you have suggestion to refine current patch? > Adding some traces I can see that the the platform name we use doesn't > seem compatible with your logic. All the Intel boards used a constant > platform name matching the PCI ID, see e.g. [1], which IIRC is used to > bind components. Liam, do you recall in more details if this is really > required? > > [1] > https://elixir.bootlin.com/linux/latest/source/sound/soc/intel/boards/bxt_da7219_max98357a.c#L475 I think if it does not set platform name as 0000:00:0e.0, it should take snd-soc-dummy as platform name and that might fix the issue. snd-soc-dummy does have component associated with it. https://elixir.bootlin.com/linux/latest/source/sound/soc/soc-utils.c#L281 > > [ 18.205812] plb: platform name sof-audio > [ 18.206059] plb: cpu_name (null) > [ 18.206234] plb: platform name 0000:00:0e.0 > [ 18.206459] plb: returning -EPROBE_DEFER 1 > [ 18.206686] bxt-pcm512x bxt-pcm512x: ASoC: failed to init link iDisp1 > [ 18.207054] bxt-pcm512x bxt-pcm512x: snd_soc_register_card failed -517 > > diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c > index cbafbdd02483..ae731212f82b 100644 > --- a/sound/soc/soc-core.c > +++ b/sound/soc/soc-core.c > @@ -1133,11 +1133,15 @@ static int soc_init_dai_link(struct > snd_soc_card *card, > * Defer card registration if platform dai component is not > added to > * component list. > */ > + pr_err("plb: platform name %s\n", link->platform->name); > if (link->platform->of_node || link->platform->name) > if (!soc_find_component(link->platform->of_node, > link->platform->name)) > - return -EPROBE_DEFER; > + { > + pr_err("plb: returning -EPROBE_DEFER 1\n"); > + return -EPROBE_DEFER; > > + } > /* > * CPU device may be specified by either name or OF node, but > * can be left unspecified, and will be matched based on DAI > @@ -1154,9 +1158,14 @@ static int soc_init_dai_link(struct > snd_soc_card *card, > * Defer card registration if cpu dai component is not added to > * component list. > */ > + pr_err("plb: cpu_name %s\n", link->cpu_name); > if (link->cpu_of_node || link->cpu_name) > if (!soc_find_component(link->cpu_of_node, > link->cpu_name)) > - return -EPROBE_DEFER; > + { > + pr_err("plb: returning -EPROBE_DEFER 2\n"); > + return -EPROBE_DEFER; > + > + } > > /* > * At least one of CPU DAI name or CPU device name/node must be > Thanks, Rohit
On Sat, 2019-01-12 at 11:37 +0530, Rohit kumar wrote: > On 1/12/2019 3:19 AM, Pierre-Louis Bossart wrote: > > > Thanks for the overnight fix. This update fixes the issue on my > > > Skylake XPS13 test device (blind testing since I don't understand > > > what the code does). > > > > > > Tested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> > > > > I need to take this back, this set of changes (initial+fix) causes an > > error with our HDMI support > > > > [ 17.437684] sof-audio sof-audio: created machine bxt-pcm512x > > [ 17.585279] bxt-pcm512x bxt-pcm512x: ASoC: failed to init link iDisp1 > > [ 17.585639] bxt-pcm512x bxt-pcm512x: snd_soc_register_card failed -517 > > > > Removing your changes restores the functionality > > > Looks like we should revert generic implementation for defering probe > and move to call from machine driver as done in v1. > https://lore.kernel.org/patchwork/patch/1027560/ > https://lore.kernel.org/patchwork/patch/1027561/ > > @Mark, Do you have suggestion to refine current patch? > > Adding some traces I can see that the the platform name we use doesn't > > seem compatible with your logic. All the Intel boards used a constant > > platform name matching the PCI ID, see e.g. [1], which IIRC is used to > > bind components. Liam, do you recall in more details if this is really > > required? Sorry, I cant quite remember why the PCI ID was used for the platform name, I think it started with the SKL generation as previous generations used "baytrail- pcm" and "haswell-pcm" as platform names IIRC. Perhaps Vinod will know. The platform name is only used by SOF when over writing the "legacy" platform name e.g. "baytrail-pcm" would become "sof-audio" and this is only used for binding DAI links (so that all legacy machine drivers can be reused without modification). Liam
On Fri, Jan 11, 2019 at 01:44:02PM +0530, Rohit kumar wrote: > - if (!soc_find_component(link->platform->of_node, link->platform->name)) > - return -EPROBE_DEFER; > + if (link->platform->of_node || link->platform->name) > + if (!soc_find_component(link->platform->of_node, > + link->platform->name)) > + return -EPROBE_DEFER; If we need to do this for every user (which we do pretty much it seems) we should just be doing it inside find_component().
On Fri, Jan 11, 2019 at 03:49:08PM -0600, Pierre-Louis Bossart wrote: > Adding some traces I can see that the the platform name we use doesn't seem > compatible with your logic. All the Intel boards used a constant platform > name matching the PCI ID, see e.g. [1], which IIRC is used to bind > components. Liam, do you recall in more details if this is really required? That's telling me that either snd_soc_find_components() isn't finding components in the same way that we do when we bind things which isn't good or we're binding links without having fully matched everything on the link which also isn't good. Without a system that shows the issue I can't 100% confirm but I think it's the former - I'm fairly sure that those machines are relying on the component name being initialized to fmt_single_name() in snd_soc_component_initialize(). That is supposed to wind up using dev_name() (which would be the PCI address for a PCI device) as the basis of the name. What I can't currently see is how exactly that gets bound (or how any of the other links avoid trouble for that matter). We could revert and push this into cards but I would rather be confident that we understand what's going on, I'm not comfortable that we aren't just pushing the breakage around rather than fixing it. Can someone with an x86 system take a look and confirm exactly what's going on with binding these cards please?
On 1/14/19 6:06 PM, Mark Brown wrote: > On Fri, Jan 11, 2019 at 03:49:08PM -0600, Pierre-Louis Bossart wrote: > >> Adding some traces I can see that the the platform name we use doesn't seem >> compatible with your logic. All the Intel boards used a constant platform >> name matching the PCI ID, see e.g. [1], which IIRC is used to bind >> components. Liam, do you recall in more details if this is really required? > That's telling me that either snd_soc_find_components() isn't finding > components in the same way that we do when we bind things which isn't > good or we're binding links without having fully matched everything on > the link which also isn't good. > > Without a system that shows the issue I can't 100% confirm but I think > it's the former - I'm fairly sure that those machines are relying on the > component name being initialized to fmt_single_name() in > snd_soc_component_initialize(). That is supposed to wind up using > dev_name() (which would be the PCI address for a PCI device) as the > basis of the name. What I can't currently see is how exactly that gets > bound (or how any of the other links avoid trouble for that matter). We > could revert and push this into cards but I would rather be confident > that we understand what's going on, I'm not comfortable that we aren't > just pushing the breakage around rather than fixing it. Can someone > with an x86 system take a look and confirm exactly what's going on with > binding these cards please? I am actually not sure at all why we need the platform_name to be set in Intel machine drivers. I ran a 5mn experiment with SOF and we completely ignore what is set by machine drivers, it's overridden by the component name: dev_info(card->dev, "info: override FE DAI link %s\n", card->dai_link[i].name); /* override platform component */ if (snd_soc_init_platform(card, dai_link) < 0) { dev_err(card->dev, "init platform error"); continue; } pr_err("plb: platform_name was %s, now %s\n", dai_link->platform->name, component->name); dai_link->platform->name = component->name; existing machine driver (info is incorrect btw, should be BE DAI link) [ 36.628466] bxt-pcm512x bxt-pcm512x: info: override FE DAI link SSP5-Codec [ 36.628469] plb: platform_name was sof-audio, now sof-audio [ 36.628772] bxt-pcm512x bxt-pcm512x: info: override FE DAI link iDisp1 [ 36.628773] plb: platform_name was 0000:00:0e.0, now sof-audio [ 36.629111] bxt-pcm512x bxt-pcm512x: info: override FE DAI link iDisp2 [ 36.629112] plb: platform_name was 0000:00:0e.0, now sof-audio [ 36.629422] bxt-pcm512x bxt-pcm512x: info: override FE DAI link iDisp3 [ 36.629423] plb: platform_name was 0000:00:0e.0, now sof-audio machine driver with all platform_name commented out, no regression at all. [ 15.839652] sof-audio sof-audio: created machine bxt-pcm512x [ 15.846990] bxt-pcm512x bxt-pcm512x: info: override FE DAI link SSP5-Codec [ 15.846995] plb: platform_name was snd-soc-dummy, now sof-audio [ 15.847325] bxt-pcm512x bxt-pcm512x: info: override FE DAI link iDisp1 [ 15.847326] plb: platform_name was snd-soc-dummy, now sof-audio [ 15.847657] bxt-pcm512x bxt-pcm512x: info: override FE DAI link iDisp2 [ 15.847658] plb: platform_name was snd-soc-dummy, now sof-audio [ 15.847974] bxt-pcm512x bxt-pcm512x: info: override FE DAI link iDisp3 [ 15.847974] plb: platform_name was snd-soc-dummy, now sof-audio I checked for a bit longer with the Skylake driver, I also couldn't see a difference with or without the platform_name set. diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index 0462b3ec977a..0fdf99ec17cd 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -918,7 +918,7 @@ static int soc_bind_dai_link(struct snd_soc_card *card, if (!snd_soc_is_matching_component(dai_link->platform, component)) continue; - + pr_err("plb: binding with component_name %s \n", component->name); snd_soc_rtdcom_add(rtd, component); } @@ -1041,6 +1041,8 @@ static int snd_soc_init_platform(struct snd_soc_card *card, if (!platform) return -ENOMEM; + pr_err("plb: %s: plaform->name set to %s\n", __func__, + dai_link->platform_name); dai_link->platform = platform; platform->name = dai_link->platform_name; platform->of_node = dai_link->platform_of_node; [ 1.345143] plb: snd_soc_init_platform: plaform->name set to 0000:00:1f.3 [ 1.345148] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding CNL Audio [ 1.345151] plb: binding with component_name 0000:00:1f.3 [ 1.345153] plb: snd_soc_init_platform: plaform->name set to 0000:00:1f.3 [ 1.345154] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding CNL HDMI1 [ 1.345155] plb: binding with component_name 0000:00:1f.3 [ 1.345157] plb: snd_soc_init_platform: plaform->name set to 0000:00:1f.3 [ 1.345158] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding CNL HDMI2 [ 1.345159] plb: binding with component_name 0000:00:1f.3 [ 1.345161] plb: snd_soc_init_platform: plaform->name set to 0000:00:1f.3 [ 1.345162] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding CNL HDMI3 [ 1.345163] plb: binding with component_name 0000:00:1f.3 [ 1.349607] plb: snd_soc_init_platform: plaform->name set to (null) [ 1.349613] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding CNL Audio [ 1.349617] plb: binding with component_name snd-soc-dummy [ 1.349619] plb: binding with component_name snd-soc-dummy [ 1.349621] plb: snd_soc_init_platform: plaform->name set to (null) [ 1.349623] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding CNL HDMI1 [ 1.349625] plb: binding with component_name snd-soc-dummy [ 1.349626] plb: binding with component_name snd-soc-dummy [ 1.349628] plb: snd_soc_init_platform: plaform->name set to (null) [ 1.349631] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding CNL HDMI2 [ 1.349632] plb: binding with component_name snd-soc-dummy [ 1.349633] plb: binding with component_name snd-soc-dummy [ 1.349635] plb: snd_soc_init_platform: plaform->name set to (null) [ 1.349638] skl_hda_dsp_generic skl_hda_dsp_generic: ASoC: binding CNL HDMI3 [ 1.349639] plb: binding with component_name snd-soc-dummy [ 1.349641] plb: binding with component_name snd-soc-dummy Audio playback works in both cases. The funny thing is that the list of components is right in /sys/kernel/debug/asoc. My guess is that the required PCI component is already added when the platform_name is used in soc_bind_dai_link() and snd_soc_rtdcom_add() does nothing for the back-end, so the dailink platform_name has actually zero influence on the outcome. Another way to look at it is that we already provide the dai_link->cpu_dai_name which is enough for soc_bind_dai_link() to find the relevant component and as a result the dailink->platform_name seems redundant/unnecessary. I am using the conditional here since this part of the code (Intel driver included) seems to work by accident rather than by design, and we'd need a set of additional tests before removing these hard-coded initializations. Does this help?
On 1/14/19 6:06 PM, Mark Brown wrote: > On Fri, Jan 11, 2019 at 03:49:08PM -0600, Pierre-Louis Bossart wrote: > >> Adding some traces I can see that the the platform name we use doesn't seem >> compatible with your logic. All the Intel boards used a constant platform >> name matching the PCI ID, see e.g. [1], which IIRC is used to bind >> components. Liam, do you recall in more details if this is really required? > That's telling me that either snd_soc_find_components() isn't finding > components in the same way that we do when we bind things which isn't > good or we're binding links without having fully matched everything on > the link which also isn't good. > > Without a system that shows the issue I can't 100% confirm but I think > it's the former - I'm fairly sure that those machines are relying on the > component name being initialized to fmt_single_name() in > snd_soc_component_initialize(). That is supposed to wind up using > dev_name() (which would be the PCI address for a PCI device) as the > basis of the name. What I can't currently see is how exactly that gets > bound (or how any of the other links avoid trouble for that matter). We > could revert and push this into cards but I would rather be confident > that we understand what's going on, I'm not comfortable that we aren't > just pushing the breakage around rather than fixing it. Can someone > with an x86 system take a look and confirm exactly what's going on with > binding these cards please? Beyond the fact that the platform_name seems to be totally useless, additional tests show that the patch ('ASoC: soc-core: defer card probe until all component is added to list') adds a new restriction which contradicts existing error checks. None of the Intel machine drivers set the dailink "cpu_name" field but use the "cpu_dai_name" field instead. This was perfectly legit as documented by the code at the end of soc_init_dai_link() /* * At least one of CPU DAI name or CPU device name/node must be * specified */ if (!link->cpu_dai_name && !(link->cpu_name || link->cpu_of_node)) { dev_err(card->dev, "ASoC: Neither cpu_dai_name nor cpu_name/of_node are set for %s\n", link->name); return -EINVAL; } The code contributed by Qualcomm only checks for cpu_name, which prevents the init from completing. So if we want to be consistent, the new code should be something like: diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index b680c673c553..2791da9417f8 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -1154,7 +1154,7 @@ static int soc_init_dai_link(struct snd_soc_card *card, * Defer card registartion if cpu dai component is not added to * component list. */ - if (!soc_find_component(link->cpu_of_node, link->cpu_name)) + if (!link->cpu_dai_name && !soc_find_component(link->cpu_of_node, link->cpu_name)) return -EPROBE_DEFER; /* or try to call soc_find_component with both cpu_name or cpu_dai_name, if this makes sense?
On Tue, Jan 15, 2019 at 01:35:07PM -0600, Pierre-Louis Bossart wrote: > On 1/14/19 6:06 PM, Mark Brown wrote: > > just pushing the breakage around rather than fixing it. Can someone > > with an x86 system take a look and confirm exactly what's going on with > > binding these cards please? > Beyond the fact that the platform_name seems to be totally useless, > additional tests show that the patch ('ASoC: soc-core: defer card probe > until all component is added to list') adds a new restriction which > contradicts existing error checks. Yes... I'd been coming to the conclusion that it was a huge red herring here. > So if we want to be consistent, the new code should be something like: > diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c > index b680c673c553..2791da9417f8 100644 > --- a/sound/soc/soc-core.c > +++ b/sound/soc/soc-core.c > @@ -1154,7 +1154,7 @@ static int soc_init_dai_link(struct snd_soc_card > *card, > * Defer card registartion if cpu dai component is not added to > * component list. > */ > - if (!soc_find_component(link->cpu_of_node, link->cpu_name)) > + if (!link->cpu_dai_name && !soc_find_component(link->cpu_of_node, > link->cpu_name)) > return -EPROBE_DEFER; > > /* > or try to call soc_find_component with both cpu_name or cpu_dai_name, if > this makes sense? I think calling _find_component() makes more sense here as it will do the check it's actually there thing no matter how the link is identified. Assuming that does resolve the issue do you want to make a patch given that you got there first?
On Tue, Jan 15, 2019 at 01:35:07PM -0600, Pierre-Louis Bossart wrote: > > On 1/14/19 6:06 PM, Mark Brown wrote: > > On Fri, Jan 11, 2019 at 03:49:08PM -0600, Pierre-Louis Bossart wrote: > > > > > Adding some traces I can see that the the platform name we use doesn't seem > > > compatible with your logic. All the Intel boards used a constant platform > > > name matching the PCI ID, see e.g. [1], which IIRC is used to bind > > > components. Liam, do you recall in more details if this is really required? > > That's telling me that either snd_soc_find_components() isn't finding > > components in the same way that we do when we bind things which isn't > > good or we're binding links without having fully matched everything on > > the link which also isn't good. > > > > Without a system that shows the issue I can't 100% confirm but I think > > it's the former - I'm fairly sure that those machines are relying on the > > component name being initialized to fmt_single_name() in > > snd_soc_component_initialize(). That is supposed to wind up using > > dev_name() (which would be the PCI address for a PCI device) as the > > basis of the name. What I can't currently see is how exactly that gets > > bound (or how any of the other links avoid trouble for that matter). We > > could revert and push this into cards but I would rather be confident > > that we understand what's going on, I'm not comfortable that we aren't > > just pushing the breakage around rather than fixing it. Can someone > > with an x86 system take a look and confirm exactly what's going on with > > binding these cards please? > > Beyond the fact that the platform_name seems to be totally useless, > additional tests show that the patch ('ASoC: soc-core: defer card probe > until all component is added to list') adds a new restriction which > contradicts existing error checks. > > None of the Intel machine drivers set the dailink "cpu_name" field but use > the "cpu_dai_name" field instead. This was perfectly legit as documented by > the code at the end of soc_init_dai_link() This should be fixed by the patch "ASoC: core: Don't defer probe on optional, NULL components" which Mark already applied to his tree. See http://mailman.alsa-project.org/pipermail/alsa-devel/2019-January/144323.html Maybe the defer card probe logic needs to be extended to also check if dai_link_name had already been registered (either cpu or cpu_dai_name needs to be set), not 100% sure which problem the defer card probe patch was trying to solve. so long, Hias > > /* > * At least one of CPU DAI name or CPU device name/node must be > * specified > */ > if (!link->cpu_dai_name && > !(link->cpu_name || link->cpu_of_node)) { > dev_err(card->dev, > "ASoC: Neither cpu_dai_name nor cpu_name/of_node are set for > %s\n", > link->name); > return -EINVAL; > } > > The code contributed by Qualcomm only checks for cpu_name, which prevents > the init from completing. > > So if we want to be consistent, the new code should be something like: > > diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c > index b680c673c553..2791da9417f8 100644 > --- a/sound/soc/soc-core.c > +++ b/sound/soc/soc-core.c > @@ -1154,7 +1154,7 @@ static int soc_init_dai_link(struct snd_soc_card > *card, > * Defer card registartion if cpu dai component is not added to > * component list. > */ > - if (!soc_find_component(link->cpu_of_node, link->cpu_name)) > + if (!link->cpu_dai_name && !soc_find_component(link->cpu_of_node, > link->cpu_name)) > return -EPROBE_DEFER; > > /* > > or try to call soc_find_component with both cpu_name or cpu_dai_name, if > this makes sense? > _______________________________________________ > Alsa-devel mailing list > Alsa-devel@alsa-project.org > http://mailman.alsa-project.org/mailman/listinfo/alsa-devel
>> Beyond the fact that the platform_name seems to be totally useless, >> additional tests show that the patch ('ASoC: soc-core: defer card probe >> until all component is added to list') adds a new restriction which >> contradicts existing error checks. >> >> None of the Intel machine drivers set the dailink "cpu_name" field but use >> the "cpu_dai_name" field instead. This was perfectly legit as documented by >> the code at the end of soc_init_dai_link() > This should be fixed by the patch > "ASoC: core: Don't defer probe on optional, NULL components" which Mark > already applied to his tree. See > http://mailman.alsa-project.org/pipermail/alsa-devel/2019-January/144323.html Ah yes, I missed this patch while I was debugging. Indeed this fixes the problem and my devices work again with Mark's for-next branch. Thanks Matthias! > > Maybe the defer card probe logic needs to be extended to also check if > dai_link_name had already been registered (either cpu or cpu_dai_name > needs to be set), not 100% sure which problem the defer card probe patch > was trying to solve. same here, I don't get why the deferred probe stuff only deals with one of the two options.
On Tue, Jan 15, 2019 at 03:16:57PM -0600, Pierre-Louis Bossart wrote: > > Maybe the defer card probe logic needs to be extended to also check if > > dai_link_name had already been registered (either cpu or cpu_dai_name > > needs to be set), not 100% sure which problem the defer card probe patch > > was trying to solve. We were getting cards probing without the platforms being registered (which in turn meant we were just skipping their init) and had patches proposed to implement the deferral in the cards. The deferral stuff is supposed to making sure that everything is registered when we instantiate. > same here, I don't get why the deferred probe stuff only deals with one of > the two options. I think it's just an oversight - I think the change you were proposing to check the cpu_dai_name is a good idea anyway as it makes things more consistent and work more obviously by intention. And more generally if we can simplify the code by removing legacy options that'd be good but that's a bigger bit of work...
On Tue, Jan 15, 2019 at 09:41:38PM +0000, Mark Brown wrote: > On Tue, Jan 15, 2019 at 03:16:57PM -0600, Pierre-Louis Bossart wrote: > > > > Maybe the defer card probe logic needs to be extended to also check if > > > dai_link_name had already been registered (either cpu or cpu_dai_name > > > needs to be set), not 100% sure which problem the defer card probe patch > > > was trying to solve. > > We were getting cards probing without the platforms being registered > (which in turn meant we were just skipping their init) and had patches > proposed to implement the deferral in the cards. The deferral stuff is > supposed to making sure that everything is registered when we > instantiate. Ah, that makes sense. Thanks a lot for the info! so long, Hias > > > same here, I don't get why the deferred probe stuff only deals with one of > > the two options. > > I think it's just an oversight - I think the change you were proposing > to check the cpu_dai_name is a good idea anyway as it makes things more > consistent and work more obviously by intention. And more generally if > we can simplify the code by removing legacy options that'd be good but > that's a bigger bit of work...
On 1/15/19 3:16 PM, Pierre-Louis Bossart wrote: > >>> Beyond the fact that the platform_name seems to be totally useless, >>> additional tests show that the patch ('ASoC: soc-core: defer card probe >>> until all component is added to list') adds a new restriction which >>> contradicts existing error checks. >>> >>> None of the Intel machine drivers set the dailink "cpu_name" field >>> but use >>> the "cpu_dai_name" field instead. This was perfectly legit as >>> documented by >>> the code at the end of soc_init_dai_link() >> This should be fixed by the patch >> "ASoC: core: Don't defer probe on optional, NULL components" which Mark >> already applied to his tree. See >> http://mailman.alsa-project.org/pipermail/alsa-devel/2019-January/144323.html >> > > Ah yes, I missed this patch while I was debugging. Indeed this fixes > the problem and my devices work again with Mark's for-next branch. > Thanks Matthias! This PROBE_DEFER support actually breaks the topology override that we've been relying on for SOF (and which has been in Mark's branch for some time now). This override helps us reuse machine drivers between legacy and SOF-based solutions. With the current code, the tests in soc_register_card() complain that the platform_name can't be tied to a component and stop the card registration, but that's mainly because the tests are done before the topology overrides are done in soc_check_tplg_fes(). Moving soc_check_tplg_fes() from soc_instantiate_card() to an earlier time in soc_register_card() works-around the problem but looks quite invasive (mutex lock, etc). There is also a second problem where we seem to have a memory management issue root caused to the change in snd_soc_init_platform() added by 09ac6a817bd6 ('ASoC: soc-core: fix init platform memory handling') The code does this static int snd_soc_init_platform(struct snd_soc_card *card, struct snd_soc_dai_link *dai_link) { struct snd_soc_dai_link_component *platform = dai_link->platform; /* convert Legacy platform link */ if (!platform || dai_link->legacy_platform) { platform = devm_kzalloc(card->dev, sizeof(struct snd_soc_dai_link_component), GFP_KERNEL); if (!platform) return -ENOMEM; dai_link->platform = platform; dai_link->legacy_platform = 1; This last assignment guarantees that memory will be allocated every time this function is called, and whatever overrides are done later will themselves be overridden by the new allocation. I am not sure what the intent was here, Curtis can you please double-check? Details, test code and logs are available here: https://github.com/thesofproject/linux/issues/565 Have a nice week-end everyone, that's it for me until Tuesday. -Pierre
On Fri, Jan 18, 2019 at 3:02 PM Pierre-Louis Bossart < pierre-louis.bossart@linux.intel.com> wrote: > > On 1/15/19 3:16 PM, Pierre-Louis Bossart wrote: > > > >>> Beyond the fact that the platform_name seems to be totally useless, > >>> additional tests show that the patch ('ASoC: soc-core: defer card probe > >>> until all component is added to list') adds a new restriction which > >>> contradicts existing error checks. > >>> > >>> None of the Intel machine drivers set the dailink "cpu_name" field > >>> but use > >>> the "cpu_dai_name" field instead. This was perfectly legit as > >>> documented by > >>> the code at the end of soc_init_dai_link() > >> This should be fixed by the patch > >> "ASoC: core: Don't defer probe on optional, NULL components" which Mark > >> already applied to his tree. See > >> > http://mailman.alsa-project.org/pipermail/alsa-devel/2019-January/144323.html > >> > > > > Ah yes, I missed this patch while I was debugging. Indeed this fixes > > the problem and my devices work again with Mark's for-next branch. > > Thanks Matthias! > > This PROBE_DEFER support actually breaks the topology override that > we've been relying on for SOF (and which has been in Mark's branch for > some time now). This override helps us reuse machine drivers between > legacy and SOF-based solutions. > > With the current code, the tests in soc_register_card() complain that > the platform_name can't be tied to a component and stop the card > registration, but that's mainly because the tests are done before the > topology overrides are done in soc_check_tplg_fes(). Moving > soc_check_tplg_fes() from soc_instantiate_card() to an earlier time in > soc_register_card() works-around the problem but looks quite invasive > (mutex lock, etc). > > There is also a second problem where we seem to have a memory management > issue root caused to the change in snd_soc_init_platform() added by > 09ac6a817bd6 ('ASoC: soc-core: fix init platform memory handling') > > The code does this > > static int snd_soc_init_platform(struct snd_soc_card *card, > struct snd_soc_dai_link *dai_link) > { > struct snd_soc_dai_link_component *platform = dai_link->platform; > > > /* convert Legacy platform link */ > if (!platform || dai_link->legacy_platform) { > platform = devm_kzalloc(card->dev, > sizeof(struct snd_soc_dai_link_component), > GFP_KERNEL); > if (!platform) > return -ENOMEM; > > dai_link->platform = platform; > dai_link->legacy_platform = 1; > > This last assignment guarantees that memory will be allocated every time > this function is called, and whatever overrides are done later will > themselves be overridden by the new allocation. I am not sure what the > intent was here, Curtis can you please double-check? > > The issue was that we were seeing a memory corruption bug on an AMD chromebooks with that function already (not observed on Intel). I was testing some SOF integrations and was seeing this in the kernel logs. I had Dylan verify my logic before I sent the patch because it took so long to identify the bug and it was traced to the patch that introduce soc_init_platform. [ 10.922112] cz-da7219-max98357a AMD7219:00: ASoC: CPU DAI designware-i2s.1.auto not registered [ 10.922122] cz-da7219-max98357a AMD7219:00: devm_snd_soc_register_card(acpd7219m98357) failed: -517 [ 11.001411] cz-da7219-max98357a AMD7219:00: ASoC: Both platform name/of_node are set for amd-max98357-play [ 11.001423] cz-da7219-max98357a AMD7219:00: ASoC: failed to init link amd-max98357-play [ 11.001431] cz-da7219-max98357a AMD7219:00: devm_snd_soc_register_card(acpd7219m98357) failed: -22 [ 11.001577] cz-da7219-max98357a: probe of AMD7219:00 failed with error -22 of_node was never getting set but the pointer was becoming populated (outside of the probe call) which traced to soc_init_platform function which was not reallocating memory on a EPROBE_DEFER even though it was getting freed by devm. I am not very familiar with devm but my local maintainers say that it should be freeing the memory even on a PROBE_DEFER. The patch should mirror the memory behaviour in snd_soc_init_multicodec which also reallocates its memory on every probe. I'm not sure how the patch is causing you to defer, is your component list corrupt? Details, test code and logs are available here: > https://github.com/thesofproject/linux/issues/565 > > Have a nice week-end everyone, that's it for me until Tuesday. > > -Pierre > > > >
On Fri, Jan 18, 2019 at 5:12 PM Curtis Malainey <cujomalainey@google.com> wrote: > > > > On Fri, Jan 18, 2019 at 3:02 PM Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> wrote: >> >> >> On 1/15/19 3:16 PM, Pierre-Louis Bossart wrote: >> > >> >>> Beyond the fact that the platform_name seems to be totally useless, >> >>> additional tests show that the patch ('ASoC: soc-core: defer card probe >> >>> until all component is added to list') adds a new restriction which >> >>> contradicts existing error checks. >> >>> >> >>> None of the Intel machine drivers set the dailink "cpu_name" field >> >>> but use >> >>> the "cpu_dai_name" field instead. This was perfectly legit as >> >>> documented by >> >>> the code at the end of soc_init_dai_link() >> >> This should be fixed by the patch >> >> "ASoC: core: Don't defer probe on optional, NULL components" which Mark >> >> already applied to his tree. See >> >> http://mailman.alsa-project.org/pipermail/alsa-devel/2019-January/144323.html >> >> >> > >> > Ah yes, I missed this patch while I was debugging. Indeed this fixes >> > the problem and my devices work again with Mark's for-next branch. >> > Thanks Matthias! >> >> This PROBE_DEFER support actually breaks the topology override that >> we've been relying on for SOF (and which has been in Mark's branch for >> some time now). This override helps us reuse machine drivers between >> legacy and SOF-based solutions. >> >> With the current code, the tests in soc_register_card() complain that >> the platform_name can't be tied to a component and stop the card >> registration, but that's mainly because the tests are done before the >> topology overrides are done in soc_check_tplg_fes(). Moving >> soc_check_tplg_fes() from soc_instantiate_card() to an earlier time in >> soc_register_card() works-around the problem but looks quite invasive >> (mutex lock, etc). >> >> There is also a second problem where we seem to have a memory management >> issue root caused to the change in snd_soc_init_platform() added by >> 09ac6a817bd6 ('ASoC: soc-core: fix init platform memory handling') >> >> The code does this >> >> static int snd_soc_init_platform(struct snd_soc_card *card, >> struct snd_soc_dai_link *dai_link) >> { >> struct snd_soc_dai_link_component *platform = dai_link->platform; >> >> >> /* convert Legacy platform link */ >> if (!platform || dai_link->legacy_platform) { >> platform = devm_kzalloc(card->dev, >> sizeof(struct snd_soc_dai_link_component), >> GFP_KERNEL); >> if (!platform) >> return -ENOMEM; >> >> dai_link->platform = platform; >> dai_link->legacy_platform = 1; >> >> This last assignment guarantees that memory will be allocated every time >> this function is called, and whatever overrides are done later will >> themselves be overridden by the new allocation. I am not sure what the >> intent was here, Curtis can you please double-check? >> The issue was that we were seeing a memory corruption bug on an AMD chromebooks with that function already (not observed on Intel). I was testing some SOF integrations and was seeing this in the kernel logs. I had Dylan verify my logic before I sent the patch because it took so long to identify the bug and it was traced to the patch that introduce soc_init_platform. [ 10.922112] cz-da7219-max98357a AMD7219:00: ASoC: CPU DAI designware-i2s.1.auto not registered [ 10.922122] cz-da7219-max98357a AMD7219:00: devm_snd_soc_register_card(acpd7219m98357) failed: -517 [ 11.001411] cz-da7219-max98357a AMD7219:00: ASoC: Both platform name/of_node are set for amd-max98357-play [ 11.001423] cz-da7219-max98357a AMD7219:00: ASoC: failed to init link amd-max98357-play [ 11.001431] cz-da7219-max98357a AMD7219:00: devm_snd_soc_register_card(acpd7219m98357) failed: -22 [ 11.001577] cz-da7219-max98357a: probe of AMD7219:00 failed with error -22 of_node was never getting set but the pointer was becoming populated (outside of the probe call) which traced to soc_init_platform function which was not reallocating memory on a EPROBE_DEFER even though it was getting freed by devm. I am not very familiar with devm but my local maintainers say that it should be freeing the memory even on a PROBE_DEFER. The patch should mirror the memory behaviour in snd_soc_init_multicodec which also reallocates its memory on every probe. I'm not sure how the patch is causing you to defer, is your component list corrupt? Sorry for the duplicate spam, forgot to send via plain text mode, re-sending for the mailing list so it gets accepted. > >> Details, test code and logs are available here: >> https://github.com/thesofproject/linux/issues/565 >> >> Have a nice week-end everyone, that's it for me until Tuesday. >> >> -Pierre >> >> >>
On Fri, Jan 18, 2019 at 05:15:32PM -0800, Curtis Malainey wrote: > of_node was never getting set but the pointer was becoming populated > (outside of the probe call) which traced to soc_init_platform function > which was not reallocating memory on a EPROBE_DEFER even though it was > getting freed by devm. I am not very familiar with devm but my local > maintainers say that it should be freeing the memory even on a > PROBE_DEFER. Probe deferral is just like any other error from probe, any managed resources will be unwound.
On Fri, Jan 18, 2019 at 05:02:08PM -0600, Pierre-Louis Bossart wrote: > This PROBE_DEFER support actually breaks the topology override that we've > been relying on for SOF (and which has been in Mark's branch for some time > now). This override helps us reuse machine drivers between legacy and > SOF-based solutions. > With the current code, the tests in soc_register_card() complain that the > platform_name can't be tied to a component and stop the card registration, > but that's mainly because the tests are done before the topology overrides > are done in soc_check_tplg_fes(). Moving soc_check_tplg_fes() from > soc_instantiate_card() to an earlier time in soc_register_card() > works-around the problem but looks quite invasive (mutex lock, etc). Right, I see. I don't think there's going to be any non-invasive solution here, either we need to stop overriding bits of the machine driver relevant to binding in the topology files like this or we need to look at the topology files in the probe deferral stuff so it's either going to be very invasive for the machine driver or the core. I'm tempted to say that the machine driver is a better option here. > This last assignment guarantees that memory will be allocated every time > this function is called, and whatever overrides are done later will > themselves be overridden by the new allocation. I am not sure what the > intent was here, Curtis can you please double-check? The intent is to avoid modifying statically allocated data so we can tell the difference between pointers set up statically as init data and those that are filled in at runtime.
> The issue was that we were seeing a memory corruption bug on an AMD > chromebooks with that function already (not observed on Intel). I was > testing some SOF integrations and was seeing this in the kernel logs. > I had Dylan verify my logic before I sent the patch because it took so > long to identify the bug and it was traced to the patch that introduce > soc_init_platform. > > [ 10.922112] cz-da7219-max98357a AMD7219:00: ASoC: CPU DAI > designware-i2s.1.auto not registered > [ 10.922122] cz-da7219-max98357a AMD7219:00: > devm_snd_soc_register_card(acpd7219m98357) failed: -517 > [ 11.001411] cz-da7219-max98357a AMD7219:00: ASoC: Both platform > name/of_node are set for amd-max98357-play > [ 11.001423] cz-da7219-max98357a AMD7219:00: ASoC: failed to init > link amd-max98357-play > [ 11.001431] cz-da7219-max98357a AMD7219:00: > devm_snd_soc_register_card(acpd7219m98357) failed: -22 > [ 11.001577] cz-da7219-max98357a: probe of AMD7219:00 failed with error -22 > > of_node was never getting set but the pointer was becoming populated > (outside of the probe call) which traced to soc_init_platform function > which was not reallocating memory on a EPROBE_DEFER even though it was > getting freed by devm. I am not very familiar with devm but my local > maintainers say that it should be freeing the memory even on a > PROBE_DEFER. > The patch should mirror the memory behaviour in > snd_soc_init_multicodec which also reallocates its memory on every > probe. I'm not sure how the patch is causing you to defer, is your > component list corrupt? > > Sorry for the duplicate spam, forgot to send via plain text mode, > re-sending for the mailing list so it gets accepted. There is no defer issue with the intel stuff, but we call this routine multiple times snd_soc_register_card --soc_init_dai_link ----snd_soc_init_platform -- soc_soc_bind_card ----snd_soc_instantiate_card ------ soc_check_tplg_fes -------- snd_soc_init_platform << ALLOC1 --------soc_init_dai_link ----------snd_soc_init_platform << ALLOC2 Initially dai_link->legacy_platform is 0, so gets set after the first first devm_kzalloc (ALLOC1) and after that we always allocate new memory (ALLOC2). The end result is that whatever we set in soc_check_tplg_fes is lost with the new/unnecessary alloc. I would guess your solution is also a work-around, if devm_ effectively freed the memory then the pointer would become NULL. Or may that's the issue is that no one actually resets it.
Curtis Malainey | Software Engineer | cujomalainey@google.com | 650-898-3849 On Wed, Jan 23, 2019 at 4:11 AM Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> wrote: > > > > The issue was that we were seeing a memory corruption bug on an AMD > > chromebooks with that function already (not observed on Intel). I was > > testing some SOF integrations and was seeing this in the kernel logs. > > I had Dylan verify my logic before I sent the patch because it took so > > long to identify the bug and it was traced to the patch that introduce > > soc_init_platform. > > > > [ 10.922112] cz-da7219-max98357a AMD7219:00: ASoC: CPU DAI > > designware-i2s.1.auto not registered > > [ 10.922122] cz-da7219-max98357a AMD7219:00: > > devm_snd_soc_register_card(acpd7219m98357) failed: -517 > > [ 11.001411] cz-da7219-max98357a AMD7219:00: ASoC: Both platform > > name/of_node are set for amd-max98357-play > > [ 11.001423] cz-da7219-max98357a AMD7219:00: ASoC: failed to init > > link amd-max98357-play > > [ 11.001431] cz-da7219-max98357a AMD7219:00: > > devm_snd_soc_register_card(acpd7219m98357) failed: -22 > > [ 11.001577] cz-da7219-max98357a: probe of AMD7219:00 failed with error -22 > > > > of_node was never getting set but the pointer was becoming populated > > (outside of the probe call) which traced to soc_init_platform function > > which was not reallocating memory on a EPROBE_DEFER even though it was > > getting freed by devm. I am not very familiar with devm but my local > > maintainers say that it should be freeing the memory even on a > > PROBE_DEFER. > > The patch should mirror the memory behaviour in > > snd_soc_init_multicodec which also reallocates its memory on every > > probe. I'm not sure how the patch is causing you to defer, is your > > component list corrupt? > > > > Sorry for the duplicate spam, forgot to send via plain text mode, > > re-sending for the mailing list so it gets accepted. > > There is no defer issue with the intel stuff, but we call this routine > multiple times > > snd_soc_register_card > > --soc_init_dai_link > > ----snd_soc_init_platform > > -- soc_soc_bind_card > > ----snd_soc_instantiate_card > > ------ soc_check_tplg_fes > > -------- snd_soc_init_platform << ALLOC1 > > --------soc_init_dai_link > > ----------snd_soc_init_platform << ALLOC2 > Ah that explains it, in my testing I didn't have the patch that brought in the call from within tplg_fes > > Initially dai_link->legacy_platform is 0, so gets set after the first > first devm_kzalloc (ALLOC1) and after that we always allocate new memory > (ALLOC2). The end result is that whatever we set in soc_check_tplg_fes > is lost with the new/unnecessary alloc. > > I would guess your solution is also a work-around, if devm_ effectively > freed the memory then the pointer would become NULL. Or may that's the > issue is that no one actually resets it. > > Yes, its a work around to fix the memory issue. If you set the platform in the machine driver the code will ignore it and not reset it. That being said that is not a full proof workaround and a better solution is definitely needed. We could go and clean up the pointers in soc_instantiate_card based on the flag being set. That way we only relocate on a NULL pointer like we used to but still don't affect statically allocated memory. I will draft a patch, test it on the AMD device, reply to this thread later with it, Pierre can you test it as well? I am curious why soc_check_tplg_fes is calling snd_soc_init_platform. It should have already been called earlier, in soc_init_dai_link at the beginning of snd_soc_register_card so the memory should already be initialized. Unless I am missing somewhere where links are getting added between the calls.
On 1/22/19 7:36 PM, Curtis Malainey wrote: > Curtis Malainey | Software Engineer | cujomalainey@google.com | 650-898-3849 > > > On Wed, Jan 23, 2019 at 4:11 AM Pierre-Louis Bossart > <pierre-louis.bossart@linux.intel.com> wrote: >> >>> The issue was that we were seeing a memory corruption bug on an AMD >>> chromebooks with that function already (not observed on Intel). I was >>> testing some SOF integrations and was seeing this in the kernel logs. >>> I had Dylan verify my logic before I sent the patch because it took so >>> long to identify the bug and it was traced to the patch that introduce >>> soc_init_platform. >>> >>> [ 10.922112] cz-da7219-max98357a AMD7219:00: ASoC: CPU DAI >>> designware-i2s.1.auto not registered >>> [ 10.922122] cz-da7219-max98357a AMD7219:00: >>> devm_snd_soc_register_card(acpd7219m98357) failed: -517 >>> [ 11.001411] cz-da7219-max98357a AMD7219:00: ASoC: Both platform >>> name/of_node are set for amd-max98357-play >>> [ 11.001423] cz-da7219-max98357a AMD7219:00: ASoC: failed to init >>> link amd-max98357-play >>> [ 11.001431] cz-da7219-max98357a AMD7219:00: >>> devm_snd_soc_register_card(acpd7219m98357) failed: -22 >>> [ 11.001577] cz-da7219-max98357a: probe of AMD7219:00 failed with error -22 >>> >>> of_node was never getting set but the pointer was becoming populated >>> (outside of the probe call) which traced to soc_init_platform function >>> which was not reallocating memory on a EPROBE_DEFER even though it was >>> getting freed by devm. I am not very familiar with devm but my local >>> maintainers say that it should be freeing the memory even on a >>> PROBE_DEFER. >>> The patch should mirror the memory behaviour in >>> snd_soc_init_multicodec which also reallocates its memory on every >>> probe. I'm not sure how the patch is causing you to defer, is your >>> component list corrupt? >>> >>> Sorry for the duplicate spam, forgot to send via plain text mode, >>> re-sending for the mailing list so it gets accepted. >> There is no defer issue with the intel stuff, but we call this routine >> multiple times >> >> snd_soc_register_card >> >> --soc_init_dai_link >> >> ----snd_soc_init_platform >> >> -- soc_soc_bind_card >> >> ----snd_soc_instantiate_card >> >> ------ soc_check_tplg_fes >> >> -------- snd_soc_init_platform << ALLOC1 >> >> --------soc_init_dai_link >> >> ----------snd_soc_init_platform << ALLOC2 >> > Ah that explains it, in my testing I didn't have the patch that > brought in the call from within tplg_fes >> Initially dai_link->legacy_platform is 0, so gets set after the first >> first devm_kzalloc (ALLOC1) and after that we always allocate new memory >> (ALLOC2). The end result is that whatever we set in soc_check_tplg_fes >> is lost with the new/unnecessary alloc. >> >> I would guess your solution is also a work-around, if devm_ effectively >> freed the memory then the pointer would become NULL. Or may that's the >> issue is that no one actually resets it. >> >> > Yes, its a work around to fix the memory issue. If you set the > platform in the machine driver the code will ignore it and not reset > it. That being said that is not a full proof workaround and a better > solution is definitely needed. We could go and clean up the pointers > in soc_instantiate_card based on the flag being set. That way we only > relocate on a NULL pointer like we used to but still don't affect > statically allocated memory. I will draft a patch, test it on the AMD > device, reply to this thread later with it, Pierre can you test it as > well? > > I am curious why soc_check_tplg_fes is calling snd_soc_init_platform. > It should have already been called earlier, in soc_init_dai_link at > the beginning of snd_soc_register_card so the memory should already be > initialized. Unless I am missing somewhere where links are getting > added between the calls. This is actually a second order problem, the main issue i have is that the very first call to init_dai_link fails with the new DEFER_PROBE handling. I don't quite understand what Linaro/AMD folks are doing but I trust their changes are legitimate. To move forward, maybe it's not worth spending too much time on a grand unification of string theory, there are simpler solutions: the Intel machine drivers already do get the platform driver name as an platform_data argument, so we could modify the dailinks platform names before even registering the card. I tested with the attached proof-of-concept patch, it adds 2 lines of code per machine driver if we use a common helper (after the transition to the "modern" dailink representation that's needed anyways) so maybe it's better in the end? the override we care about is really the automatic handling of all the hard-coded front-ends, the platform-name override isn't really a battle i want to pick or spend time on. From 5680c64b09964b134e20bf96142d1ce5dcf0f77f Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Date: Tue, 22 Jan 2019 18:53:43 -0600 Subject: [PATCH] ASoC: add helper to change platform name for all dailinks To reuse the same machine drivers with Atom/SST, Skylake and SOF, we need to change the default platform_name (or platforms->name in the "modern" representation). So far, this override was done with an automatic override, which was broken by a set of changes for DT platforms related to deferred probe handling. This automatic override is actually not really needed, the machine driver can already receive the platform name as a platform_data parameter. This is used e.g. for HDaudio support where we have different PCI aliases used for different platforms. We can reuse the same mechanism and modify the machine drivers to override the dailinks prior to registrating the card. This will require additional work for SOF, but with this helper it'll be just two lines of additional code per machine driver which is reused, not the end of the world. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> --- include/sound/soc.h | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/include/sound/soc.h b/include/sound/soc.h index b71b05019c68..59873084f3bd 100644 --- a/include/sound/soc.h +++ b/include/sound/soc.h @@ -1583,6 +1583,29 @@ struct snd_soc_dai *snd_soc_card_get_codec_dai(struct snd_soc_card *card, return NULL; } +static inline +int snd_soc_fixup_dai_links_platform_name(struct snd_soc_card *card, + const char *platform_name) +{ + struct snd_soc_dai_link *dai_link; + int i; + + if (!platform_name) /* nothing to do */ + return 0; + + /* set platform name for each dailink */ + for_each_card_prelinks(card, i, dai_link) { + /* only single platform is supported for now */ + dai_link->platforms->name = devm_kstrdup(card->dev, + platform_name, + GFP_KERNEL); + if (!dai_link->platforms->name) + return -ENOMEM; + } + + return 0; +} + #ifdef CONFIG_DEBUG_FS extern struct dentry *snd_soc_debugfs_root; #endif
On Tue, Jan 22, 2019 at 08:01:15PM -0600, Pierre-Louis Bossart wrote: > changes are legitimate. To move forward, maybe it's not worth spending too > much time on a grand unification of string theory, there are simpler > solutions: the Intel machine drivers already do get the platform driver name > as an platform_data argument, so we could modify the dailinks platform names > before even registering the card. I tested with the attached Yes, that would be much better - it's vastly more idiomatic. The general idea is that a machine driver should know what it's expecting to find before it starts probing.
>> changes are legitimate. To move forward, maybe it's not worth spending too >> much time on a grand unification of string theory, there are simpler >> solutions: the Intel machine drivers already do get the platform driver name >> as an platform_data argument, so we could modify the dailinks platform names >> before even registering the card. I tested with the attached > Yes, that would be much better - it's vastly more idiomatic. The > general idea is that a machine driver should know what it's expecting to > find before it starts probing. Thanks for the feedback, will send a formal patch with the helper and machine driver changes after I test more with the legacy drivers. Do you have a preference for one patch that deals with multiple machines drivers in one shot, or individual patches? The latter are nicer for backports (e.g. for Chrome), the former nicer for maintainers... The goal of reusing machine drivers as is isn't really achievable anyways, it looks like we are going to have additional changes, e.g. if we want to avoid the calls to snd_pcm_suspend as suggested by Takashi, we'll have to add a reference to snd_soc_pm_ops that's only used for SOF, the Atom/SST driver does things in different ways mostly due to historical reasons. -Pierre
On Thu, Jan 24, 2019 at 01:07:17PM -0600, Pierre-Louis Bossart wrote: > Thanks for the feedback, will send a formal patch with the helper and > machine driver changes after I test more with the legacy drivers. Do you > have a preference for one patch that deals with multiple machines drivers in > one shot, or individual patches? The latter are nicer for backports (e.g. > for Chrome), the former nicer for maintainers... More patches is good, it doesn't make a huge difference if I get one big patch or a series of repetitive patches - big serieses are more of an issue if they're all different patches needing individual review.
I have a patch to fix the memory leak but I haven't been able to test it yet because I am remote right now and I accidentally bootlooped the AMD device I am working on. I will have this tested early next week. Here is the patch for anyone interested. Curtis Malainey | Software Engineer | cujomalainey@google.com | 650-898-3849 On Fri, Jan 25, 2019 at 3:26 AM Mark Brown <broonie@kernel.org> wrote: > > On Thu, Jan 24, 2019 at 01:07:17PM -0600, Pierre-Louis Bossart wrote: > > > Thanks for the feedback, will send a formal patch with the helper and > > machine driver changes after I test more with the legacy drivers. Do you > > have a preference for one patch that deals with multiple machines drivers in > > one shot, or individual patches? The latter are nicer for backports (e.g. > > for Chrome), the former nicer for maintainers... > > More patches is good, it doesn't make a huge difference if I get one big > patch or a series of repetitive patches - big serieses are more of an > issue if they're all different patches needing individual review.
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index 0934b36..df05fb8 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -1131,11 +1131,13 @@ static int soc_init_dai_link(struct snd_soc_card *card, } /* - * Defer card registartion if platform dai component is not added to + * Defer card registration if platform dai component is not added to * component list. */ - if (!soc_find_component(link->platform->of_node, link->platform->name)) - return -EPROBE_DEFER; + if (link->platform->of_node || link->platform->name) + if (!soc_find_component(link->platform->of_node, + link->platform->name)) + return -EPROBE_DEFER; /* * CPU device may be specified by either name or OF node, but @@ -1150,11 +1152,12 @@ static int soc_init_dai_link(struct snd_soc_card *card, } /* - * Defer card registartion if cpu dai component is not added to + * Defer card registration if cpu dai component is not added to * component list. */ - if (!soc_find_component(link->cpu_of_node, link->cpu_name)) - return -EPROBE_DEFER; + if (link->cpu_of_node || link->cpu_name) + if (!soc_find_component(link->cpu_of_node, link->cpu_name)) + return -EPROBE_DEFER; /* * At least one of CPU DAI name or CPU device name/node must be