Message ID | 1533301025-19980-1-git-send-email-jonathanh@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] ASoC: core: Optimise suspend/resume of DAPM widgets | expand |
On Fri, Aug 03, 2018 at 01:57:05PM +0100, Jon Hunter wrote: > For soundcards that have several DAI links and many DAPM widgets the > time taken for snd_soc_suspend to execute has been observed to be > several milliseconds. The time is largely spent executing > dapm_power_widgets() for each for the DAI links that need to be > suspended. Given that dapm_power_widgets() is called again after > suspending/resuming the DAI links, one way to optimise the > suspend/resume time is to avoid calling dapm_power_widgets() for > each DAI link and reduces the suspend time significantly. It's a bit alarming that dapm_power_widgets() is taking substantial enough time to be worth worring about - it's *supposed* to be relatively well optimized so it's effectively free. It'll never be quite free, but close enough. The goal is that if nothing changes we end up testing a few nodes at most before we figure out that nothing changed state and stop. Do you have any idea what it's spending its time on, we do call it a lot so if there's any optimization opportunties there we can probably get a lot of benefit out of them. One thing that it occurs to me might help is to start the suspend process by powering down all the input and output widgets that aren't ignoring suspend in one operation, that should hopefully have the effect of ensuring that most of the system that's going to get powered down does so on the first pass through. > Please note that this has been observed on the Tegra210 Jetson TX1 > platform which is not currently supported in the mainline for audio > but has been tested with out-of-tree patches to enable I2S audio. If someone could get the platform booting reliably in -next that'd be good but that's a separate issue... > In the resume path, it is not clear if there could be any issues from > not sync'ing the DAPM power state until after unmuting and resuming > the CPU DAI drivers, because this will happen later with this change. This is a definite problem, we want to have the audio path powered up before we start playing audio otherwise we loose the start of the audio which may be important.
On 03/08/18 17:36, Mark Brown wrote: > On Fri, Aug 03, 2018 at 01:57:05PM +0100, Jon Hunter wrote: > >> For soundcards that have several DAI links and many DAPM widgets the >> time taken for snd_soc_suspend to execute has been observed to be >> several milliseconds. The time is largely spent executing >> dapm_power_widgets() for each for the DAI links that need to be >> suspended. Given that dapm_power_widgets() is called again after >> suspending/resuming the DAI links, one way to optimise the >> suspend/resume time is to avoid calling dapm_power_widgets() for >> each DAI link and reduces the suspend time significantly. > > It's a bit alarming that dapm_power_widgets() is taking substantial > enough time to be worth worring about - it's *supposed* to be relatively > well optimized so it's effectively free. It'll never be quite free, but > close enough. The goal is that if nothing changes we end up testing a > few nodes at most before we figure out that nothing changed state and > stop. Do you have any idea what it's spending its time on, we do call > it a lot so if there's any optimization opportunties there we can > probably get a lot of benefit out of them. Sorry for the delay, I was out last week. I had taken some ftrace graphs but there was not one thing that really stood out. Looking again it seems that each call to async_schedule_domain() can take tens of uS and so it there are a lot of DAPM widgets (100+) this can take some time. I will dig into it a bit further. > One thing that it occurs to me might help is to start the suspend > process by powering down all the input and output widgets that aren't > ignoring suspend in one operation, that should hopefully have the effect > of ensuring that most of the system that's going to get powered down > does so on the first pass through. If the async_schedule_domain() calls are the problem, then it may not help as we call that for all dapms apart from the card->dapm. >> Please note that this has been observed on the Tegra210 Jetson TX1 >> platform which is not currently supported in the mainline for audio >> but has been tested with out-of-tree patches to enable I2S audio. > > If someone could get the platform booting reliably in -next that'd be > good but that's a separate issue... Yes I plan to work on that in the next few months. >> In the resume path, it is not clear if there could be any issues from >> not sync'ing the DAPM power state until after unmuting and resuming >> the CPU DAI drivers, because this will happen later with this change. > > This is a definite problem, we want to have the audio path powered up > before we start playing audio otherwise we loose the start of the audio > which may be important. I was thinking I could add another call to snd_soc_dapm_sync() after resuming the streams to address that. However, maybe I need to dig into this a bit more to understand exactly why dapm_power_widgets() takes so long. Cheers Jon
On Mon, Aug 13, 2018 at 07:19:16PM +0100, Jon Hunter wrote: > > I had taken some ftrace graphs but there was not one thing that really > stood out. Looking again it seems that each call to > async_schedule_domain() can take tens of uS and so it there are a lot of > DAPM widgets (100+) this can take some time. I will dig into it a bit > further. We don't call that per widget, we call that per context. We might have a lot of contexts but it's definitely not number of widgets level. Perhaps what we need there is something which remembers if we actually changed anything in each context and/or there's relevant operations and then doesn't do this if not, we're working on the basis that these operations are fairly cheap. > > One thing that it occurs to me might help is to start the suspend > > process by powering down all the input and output widgets that aren't > > ignoring suspend in one operation, that should hopefully have the effect > > of ensuring that most of the system that's going to get powered down > > does so on the first pass through. > If the async_schedule_domain() calls are the problem, then it may not > help as we call that for all dapms apart from the card->dapm. No, we actually need to run the operations on all contexts - they can all do things if they want to. We just want them all to run in parallel so if we are doing things like waiting for capacitors to charge we don't do it in series.
On 14/08/18 15:40, Mark Brown wrote: > On Mon, Aug 13, 2018 at 07:19:16PM +0100, Jon Hunter wrote: >> >> I had taken some ftrace graphs but there was not one thing that really >> stood out. Looking again it seems that each call to >> async_schedule_domain() can take tens of uS and so it there are a lot of >> DAPM widgets (100+) this can take some time. I will dig into it a bit >> further. > > We don't call that per widget, we call that per context. We might have > a lot of contexts but it's definitely not number of widgets level. > Perhaps what we need there is something which remembers if we actually > changed anything in each context and/or there's relevant operations and > then doesn't do this if not, we're working on the basis that these > operations are fairly cheap. Yes I see that now. >>> One thing that it occurs to me might help is to start the suspend >>> process by powering down all the input and output widgets that aren't >>> ignoring suspend in one operation, that should hopefully have the effect >>> of ensuring that most of the system that's going to get powered down >>> does so on the first pass through. > >> If the async_schedule_domain() calls are the problem, then it may not >> help as we call that for all dapms apart from the card->dapm. > > No, we actually need to run the operations on all contexts - they can > all do things if they want to. We just want them all to run in parallel > so if we are doing things like waiting for capacitors to charge we don't > do it in series. So I did a bit more digging and this time on Tegra124 Jetson-TK1 (which is supported in the mainline). On Jetson-TK1 I see that dapm_power_widgets is called 4 times on entering suspend and each call then calls the dapm_pre/post_sequence_async() functions 3 times. Interestingly, at least for Jetson-TK1, it appears that the calls to dapm_pre/post_sequence_async() do not actually change the bias state because the dapm contexts are already in the desired state. Please note that at this time audio is not active and so that is probably why. Anyway, I was interested in the overhead added of calling async_schedule_domain() to schedule the dapm_pre/post_sequence_async() function calls if there is nothing to be done. This is what I see ... Times for function dapm_power_widgets() are (us): Min 23, Ave 190, Max 434, Count 39 Where 'Count' is the number of times that dapm_power_widgets() has been called (both during boot and through a few suspend iterations). So then I made the following change to avoid scheduling the function calls unnecessarily (which I think should be fine) ... diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c index 5ed90ea69a24..ff66a6834531 100644 --- a/sound/soc/soc-dapm.c +++ b/sound/soc/soc-dapm.c @@ -1955,7 +1955,7 @@ static int dapm_power_widgets(struct snd_soc_card *card, int event) dapm_pre_sequence_async(&card->dapm, 0); /* Run other bias changes in parallel */ list_for_each_entry(d, &card->dapm_list, list) { - if (d != &card->dapm) + if (d != &card->dapm && d->bias_level != d->target_bias_level) async_schedule_domain(dapm_pre_sequence_async, d, &async_domain); } @@ -1979,7 +1979,7 @@ static int dapm_power_widgets(struct snd_soc_card *card, int event) /* Run all the bias changes in parallel */ list_for_each_entry(d, &card->dapm_list, list) { - if (d != &card->dapm) + if (d != &card->dapm && d->bias_level != d->target_bias_level) async_schedule_domain(dapm_post_sequence_async, d, &async_domain); } And with the above change I now see ... Times for function dapm_power_widgets() are (us): Min 4, Ave 16, Max 82, Count 39 Please note that I am just using ktime_get() to log the time on entry and exit from dapm_power_widgets() and so yes the time may not be purely the time take to execute this function if we are preempted, but I am measuring in the same way in both cases and so it does seem to show some improvement. Let me know your thoughts. Cheers Jon
On Wed, Aug 15, 2018 at 07:48:47PM +0100, Jon Hunter wrote: > So then I made the following change to avoid scheduling the function > calls unnecessarily (which I think should be fine) ... > Please note that I am just using ktime_get() to log the time on entry > and exit from dapm_power_widgets() and so yes the time may not be > purely the time take to execute this function if we are preempted, > but I am measuring in the same way in both cases and so it does seem > to show some improvement. That seems like a simple and useful optimization, will you send a patch?
On 16/08/18 11:51, Mark Brown wrote: > On Wed, Aug 15, 2018 at 07:48:47PM +0100, Jon Hunter wrote: > >> So then I made the following change to avoid scheduling the function >> calls unnecessarily (which I think should be fine) ... > >> Please note that I am just using ktime_get() to log the time on entry >> and exit from dapm_power_widgets() and so yes the time may not be >> purely the time take to execute this function if we are preempted, >> but I am measuring in the same way in both cases and so it does seem >> to show some improvement. > > That seems like a simple and useful optimization, will you send a patch? Yes I will send a patch for this. Cheers Jon
diff --git a/include/sound/soc-dapm.h b/include/sound/soc-dapm.h index af9ef16cc34d..5dbd601154c4 100644 --- a/include/sound/soc-dapm.h +++ b/include/sound/soc-dapm.h @@ -427,6 +427,8 @@ void snd_soc_dapm_reset_cache(struct snd_soc_dapm_context *dapm); /* dapm events */ void snd_soc_dapm_stream_event(struct snd_soc_pcm_runtime *rtd, int stream, int event); +void snd_soc_dapm_stream_suspend(struct snd_soc_pcm_runtime *rtd, int stream); +void snd_soc_dapm_stream_resume(struct snd_soc_pcm_runtime *rtd, int stream); void snd_soc_dapm_shutdown(struct snd_soc_card *card); /* external DAPM widget events */ diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c index 81b27923303d..03c5fa95afb8 100644 --- a/sound/soc/soc-core.c +++ b/sound/soc/soc-core.c @@ -519,13 +519,8 @@ int snd_soc_suspend(struct device *dev) if (rtd->dai_link->ignore_suspend) continue; - snd_soc_dapm_stream_event(rtd, - SNDRV_PCM_STREAM_PLAYBACK, - SND_SOC_DAPM_STREAM_SUSPEND); - - snd_soc_dapm_stream_event(rtd, - SNDRV_PCM_STREAM_CAPTURE, - SND_SOC_DAPM_STREAM_SUSPEND); + snd_soc_dapm_stream_suspend(rtd, SNDRV_PCM_STREAM_PLAYBACK); + snd_soc_dapm_stream_suspend(rtd, SNDRV_PCM_STREAM_CAPTURE); } /* Recheck all endpoints too, their state is affected by suspend */ @@ -637,13 +632,8 @@ static void soc_resume_deferred(struct work_struct *work) if (rtd->dai_link->ignore_suspend) continue; - snd_soc_dapm_stream_event(rtd, - SNDRV_PCM_STREAM_PLAYBACK, - SND_SOC_DAPM_STREAM_RESUME); - - snd_soc_dapm_stream_event(rtd, - SNDRV_PCM_STREAM_CAPTURE, - SND_SOC_DAPM_STREAM_RESUME); + snd_soc_dapm_stream_resume(rtd, SNDRV_PCM_STREAM_PLAYBACK); + snd_soc_dapm_stream_resume(rtd, SNDRV_PCM_STREAM_CAPTURE); } /* unmute any active DACs */ diff --git a/sound/soc/soc-dapm.c b/sound/soc/soc-dapm.c index 7e96793050c9..7d4f75d29d55 100644 --- a/sound/soc/soc-dapm.c +++ b/sound/soc/soc-dapm.c @@ -4201,7 +4201,7 @@ void snd_soc_dapm_connect_dai_link_widgets(struct snd_soc_card *card) } static void soc_dapm_stream_event(struct snd_soc_pcm_runtime *rtd, int stream, - int event) + int event, bool power_widgets) { int i; @@ -4209,7 +4209,8 @@ static void soc_dapm_stream_event(struct snd_soc_pcm_runtime *rtd, int stream, for (i = 0; i < rtd->num_codecs; i++) soc_dapm_dai_stream_event(rtd->codec_dais[i], stream, event); - dapm_power_widgets(rtd->card, event); + if (power_widgets) + dapm_power_widgets(rtd->card, event); } /** @@ -4229,7 +4230,43 @@ void snd_soc_dapm_stream_event(struct snd_soc_pcm_runtime *rtd, int stream, struct snd_soc_card *card = rtd->card; mutex_lock_nested(&card->dapm_mutex, SND_SOC_DAPM_CLASS_RUNTIME); - soc_dapm_stream_event(rtd, stream, event); + soc_dapm_stream_event(rtd, stream, event, true); + mutex_unlock(&card->dapm_mutex); +} + +/** + * snd_soc_dapm_stream_suspend - send a suspend stream event to the dapm core + * @rtd: PCM runtime data + * @stream: stream name + * + * Sends a suspend stream event to the dapm core. + * + * Returns 0 for success else error. + */ +void snd_soc_dapm_stream_suspend(struct snd_soc_pcm_runtime *rtd, int stream) +{ + struct snd_soc_card *card = rtd->card; + + mutex_lock_nested(&card->dapm_mutex, SND_SOC_DAPM_CLASS_RUNTIME); + soc_dapm_stream_event(rtd, stream, SND_SOC_DAPM_STREAM_SUSPEND, false); + mutex_unlock(&card->dapm_mutex); +} + +/** + * snd_soc_dapm_stream_resume - send a resume stream event to the dapm core + * @rtd: PCM runtime data + * @stream: stream name + * + * Sends a resume stream event to the dapm core. + * + * Returns 0 for success else error. + */ +void snd_soc_dapm_stream_resume(struct snd_soc_pcm_runtime *rtd, int stream) +{ + struct snd_soc_card *card = rtd->card; + + mutex_lock_nested(&card->dapm_mutex, SND_SOC_DAPM_CLASS_RUNTIME); + soc_dapm_stream_event(rtd, stream, SND_SOC_DAPM_STREAM_RESUME, false); mutex_unlock(&card->dapm_mutex); }
For soundcards that have several DAI links and many DAPM widgets the time taken for snd_soc_suspend to execute has been observed to be several milliseconds. The time is largely spent executing dapm_power_widgets() for each for the DAI links that need to be suspended. Given that dapm_power_widgets() is called again after suspending/resuming the DAI links, one way to optimise the suspend/resume time is to avoid calling dapm_power_widgets() for each DAI link and reduces the suspend time significantly. Please note that this has been observed on the Tegra210 Jetson TX1 platform which is not currently supported in the mainline for audio but has been tested with out-of-tree patches to enable I2S audio. For Tegra210 the audio drivers (for I2S, etc) are implemented as component drivers to offer greater flexibility with dynamic audio routing within the Tegra Audio Processing Engine (APE) from userspace. The downside of this flexibility is that now there are a lof more DAI links and so the time to suspend the soundcard can take several milliseconds. In the resume path, it is not clear if there could be any issues from not sync'ing the DAPM power state until after unmuting and resuming the CPU DAI drivers, because this will happen later with this change. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> --- It is not 100% clear to me if there are any downsides to this approach but I am looking to find away to avoid calling dapm_power_widgets() so many times in the suspend path when there are many DAI links. Unfortunately, I don't think that I can simply mark some of the DAI links with 'ignore_suspend' because it could be possible that they are active. include/sound/soc-dapm.h | 2 ++ sound/soc/soc-core.c | 18 ++++-------------- sound/soc/soc-dapm.c | 43 ++++++++++++++++++++++++++++++++++++++++--- 3 files changed, 46 insertions(+), 17 deletions(-)