Message ID | 20230317195128.3911155-1-john@metanate.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | ALSA: usb-audio: Fix recursive locking on XRUN | expand |
Hi, On Fri, Mar 17, 2023 at 07:51:27PM +0000, John Keeping wrote: > snd_usb_queue_pending_output_urbs() may be called from > snd_pcm_ops::ack() which means the PCM stream is locked. > > For the normal case where the call back into the PCM core is via > prepare_output_urb() the "_under_stream_lock" variant of > snd_pcm_period_elapsed() is called, but when an error occurs and the > stream is stopped as XRUN then snd_pcm_xrun() tries to recursively lock > the stream which results in deadlock. > > Follow the example of snd_pcm_period_elapsed() by adding > snd_pcm_xrun_under_stream_lock() and use this when the PCM substream > lock is already held. > > Signed-off-by: John Keeping <john@metanate.com> > --- > include/sound/pcm.h | 1 + > sound/core/pcm_native.c | 28 ++++++++++++++++++++++++---- > sound/usb/endpoint.c | 18 +++++++++++------- > 3 files changed, 36 insertions(+), 11 deletions(-) The name of added kernel API implies me that you refer to existent 'snd_pcm_period_elapsed_under_stream_lock()' which I added to Linux v5.14. In my opinion, unlike the version of period elapsed API, the version of XRUN API seems not to be necessarily required to ALSA PCM core, since PCM device drivers can implement .pointer callback in the part of PCM operation. When the callback returns SNDRV_PCM_POS_XRUN, ALSA PCM application get occurence of XRUN as a result of any operation relevant to hwptr movement (e.g. SNDRV_PCM_IOCTL_HWSYNC). Therefore I think it possible to fix the issue without the proposed kernel API. I can assume some scenario: 1. Failure at tasklet for URB completion It is softIRQ context. The stream lock is not acquired. It doesn't matter to call current XRUN API. 2. Failure at PCM operation called by ALSA PCM application It is process context. The stream lock is acquired before calling driver code. When detecting any type of failure, driver code stores the state. Then .pointer callback should return SNDRV_PCM_IOCTL_HWSYNC refering to the state. Of course, I'm not a developer for USB audio devices. I'm just a developer for the other type of packet-oriented drivers (IEC 61883-1/6 packet streaming engine for audio and music unit in IEEE 1394 bus). So I do not get every part of USB driver. However, from my experience for the packet-oriented drivers, I have the above concern about adding the new XRUN API. I apologize if miss-hitting the point for your issue. Regards Takashi Sakamoto
Hi John,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on tiwai-sound/for-next]
[also build test ERROR on tiwai-sound/for-linus v6.3-rc2 next-20230317]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/John-Keeping/ALSA-usb-audio-Fix-recursive-locking-on-XRUN/20230318-035430
base: https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-next
patch link: https://lore.kernel.org/r/20230317195128.3911155-1-john%40metanate.com
patch subject: [PATCH] ALSA: usb-audio: Fix recursive locking on XRUN
config: m68k-allmodconfig (https://download.01.org/0day-ci/archive/20230318/202303181006.qGZbdrAN-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/21bbf1266d22cbc0e1ec7c8d535738f66bbc9801
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review John-Keeping/ALSA-usb-audio-Fix-recursive-locking-on-XRUN/20230318-035430
git checkout 21bbf1266d22cbc0e1ec7c8d535738f66bbc9801
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303181006.qGZbdrAN-lkp@intel.com/
All errors (new ones prefixed by >>, old ones prefixed by <<):
>> ERROR: modpost: "snd_pcm_stop_xrun_under_stream_lock" [sound/usb/snd-usb-audio.ko] undefined!
Hi John,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on tiwai-sound/for-next]
[also build test ERROR on tiwai-sound/for-linus linus/master v6.3-rc2 next-20230317]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/John-Keeping/ALSA-usb-audio-Fix-recursive-locking-on-XRUN/20230318-035430
base: https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-next
patch link: https://lore.kernel.org/r/20230317195128.3911155-1-john%40metanate.com
patch subject: [PATCH] ALSA: usb-audio: Fix recursive locking on XRUN
config: x86_64-randconfig-a013-20230313 (https://download.01.org/0day-ci/archive/20230318/202303181348.8bLUUc1G-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/21bbf1266d22cbc0e1ec7c8d535738f66bbc9801
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review John-Keeping/ALSA-usb-audio-Fix-recursive-locking-on-XRUN/20230318-035430
git checkout 21bbf1266d22cbc0e1ec7c8d535738f66bbc9801
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303181348.8bLUUc1G-lkp@intel.com/
All errors (new ones prefixed by >>, old ones prefixed by <<):
>> ERROR: modpost: "snd_pcm_stop_xrun_under_stream_lock" [sound/usb/snd-usb-audio.ko] undefined!
On Sat, Mar 18, 2023 at 09:20:05AM +0900, Takashi Sakamoto wrote: > 1. Failure at tasklet for URB completion > > It is softIRQ context. The stream lock is not acquired. It doesn't > matter to call current XRUN API. > > 2. Failure at PCM operation called by ALSA PCM application > > It is process context. The stream lock is acquired before calling driver > code. When detecting any type of failure, driver code stores the state. > Then .pointer callback should return SNDRV_PCM_IOCTL_HWSYNC refering to > the state. Oops. I did copy-and-paste mistake here... It should be SNDRV_PCM_POS_XRUN instead of SNDRV_PCM_IOCTL_HWSYNC... Regards Takashi Sakamoto
Hi, On Sat, Mar 18, 2023 at 09:20:05AM +0900, Takashi Sakamoto wrote: > On Fri, Mar 17, 2023 at 07:51:27PM +0000, John Keeping wrote: > > snd_usb_queue_pending_output_urbs() may be called from > > snd_pcm_ops::ack() which means the PCM stream is locked. > > > > For the normal case where the call back into the PCM core is via > > prepare_output_urb() the "_under_stream_lock" variant of > > snd_pcm_period_elapsed() is called, but when an error occurs and the > > stream is stopped as XRUN then snd_pcm_xrun() tries to recursively lock > > the stream which results in deadlock. > > > > Follow the example of snd_pcm_period_elapsed() by adding > > snd_pcm_xrun_under_stream_lock() and use this when the PCM substream > > lock is already held. > > > > Signed-off-by: John Keeping <john@metanate.com> > > --- > > include/sound/pcm.h | 1 + > > sound/core/pcm_native.c | 28 ++++++++++++++++++++++++---- > > sound/usb/endpoint.c | 18 +++++++++++------- > > 3 files changed, 36 insertions(+), 11 deletions(-) > > The name of added kernel API implies me that you refer to existent > 'snd_pcm_period_elapsed_under_stream_lock()' which I added to Linux > v5.14. > > In my opinion, unlike the version of period elapsed API, the version of > XRUN API seems not to be necessarily required to ALSA PCM core, since PCM > device drivers can implement .pointer callback in the part of PCM operation. > When the callback returns SNDRV_PCM_POS_XRUN, ALSA PCM application get > occurence of XRUN as a result of any operation relevant to hwptr movement > (e.g. SNDRV_PCM_IOCTL_HWSYNC). > > Therefore I think it possible to fix the issue without the proposed > kernel API. I can assume some scenario: > > 1. Failure at tasklet for URB completion > > It is softIRQ context. The stream lock is not acquired. It doesn't > matter to call current XRUN API. > > 2. Failure at PCM operation called by ALSA PCM application > > It is process context. The stream lock is acquired before calling driver > code. When detecting any type of failure, driver code stores the state. > Then .pointer callback should return SNDRV_PCM_POS_XRUNrefering to > the state. Although being inexperienced to hack driver for USB audio device class, I attempt to post the patch to fix the issue of recursive stream lock. I apologies in advance since the patch is not tested yet... The 'in_xrun' member is newly added to 'struct snd_usb_substream'. When detecting any failure, false is assigned to the member. The assignment is expected to be done in both softIRQ context, and process context with stream lock, thus no need to take care of cocurrent access (e.g. by usage of WRITE_ONCE/READ_ONCE). Typical ALSA PCM application periodically calls PCM operation which calls .pointer in driver code. As I described, returning SNDRV_PCM_POS_XRUN takes ALSA PCM core to handle XRUN state of PCM substream in the timing. The negative point of the patch is the delay of XRUN notification to user space application. In the point, I think the new kernel API introduced by your patch has advantage. The in_xrun member can be replaced with a kind of EP_STATE_ enumerations; i.e. EP_STATE_XRUN. In the case, we need some care so that the state should be referred from pcm.c. For your information. ``` --- sound/usb/card.h | 1 + sound/usb/endpoint.c | 18 +++++++++++------- sound/usb/pcm.c | 3 ++- 3 files changed, 14 insertions(+), 8 deletions(-) diff --git a/sound/usb/card.h b/sound/usb/card.h index 6ec95b2edf86..cb07d5eb09ad 100644 --- a/sound/usb/card.h +++ b/sound/usb/card.h @@ -172,6 +172,7 @@ struct snd_usb_substream { unsigned int hwptr_done; /* processed byte position in the buffer */ unsigned int transfer_done; /* processed frames since last period update */ unsigned int frame_limit; /* limits number of packets in URB */ + bool in_xrun; /* data and sync endpoints for this stream */ unsigned int ep_num; /* the endpoint number */ diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c index 1e0af1179ca8..41266c169404 100644 --- a/sound/usb/endpoint.c +++ b/sound/usb/endpoint.c @@ -400,13 +400,17 @@ static int prepare_inbound_urb(struct snd_usb_endpoint *ep, } /* notify an error as XRUN to the assigned PCM data substream */ -static void notify_xrun(struct snd_usb_endpoint *ep) +static void notify_xrun(struct snd_usb_endpoint *ep, bool in_stream_lock) { struct snd_usb_substream *data_subs; data_subs = READ_ONCE(ep->data_subs); - if (data_subs && data_subs->pcm_substream) - snd_pcm_stop_xrun(data_subs->pcm_substream); + if (data_subs && data_subs->pcm_substream && !data_subs->in_xrun) { + if (in_stream_lock) + data_subs->in_xrun = true; + else + snd_pcm_stop_xrun(data_subs->pcm_substream); + } } static struct snd_usb_packet_info * @@ -498,7 +502,7 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, if (err == -EAGAIN) push_back_to_ready_list(ep, ctx); else - notify_xrun(ep); + notify_xrun(ep, in_stream_lock); return; } @@ -507,7 +511,7 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, usb_audio_err(ep->chip, "Unable to submit urb #%d: %d at %s\n", ctx->index, err, __func__); - notify_xrun(ep); + notify_xrun(ep, in_stream_lock); return; } @@ -574,7 +578,7 @@ static void snd_complete_urb(struct urb *urb) return; usb_audio_err(ep->chip, "cannot submit urb (err = %d)\n", err); - notify_xrun(ep); + notify_xrun(ep, false); exit_clear: clear_bit(ctx->index, &ep->active_mask); @@ -1762,7 +1766,7 @@ static void snd_usb_handle_sync_urb(struct snd_usb_endpoint *ep, usb_audio_err(ep->chip, "next package FIFO overflow EP 0x%x\n", ep->ep_num); - notify_xrun(ep); + notify_xrun(ep, false); return; } diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c index d959da7a1afb..8889c81297db 100644 --- a/sound/usb/pcm.c +++ b/sound/usb/pcm.c @@ -75,7 +75,7 @@ static snd_pcm_uframes_t snd_usb_pcm_pointer(struct snd_pcm_substream *substream struct snd_usb_substream *subs = runtime->private_data; unsigned int hwptr_done; - if (atomic_read(&subs->stream->chip->shutdown)) + if (atomic_read(&subs->stream->chip->shutdown) || subs->in_xrun) return SNDRV_PCM_POS_XRUN; spin_lock(&subs->lock); hwptr_done = subs->hwptr_done; @@ -671,6 +671,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream) subs->transfer_done = 0; subs->last_frame_number = 0; subs->period_elapsed_pending = 0; + subs->in_xrun = false; runtime->delay = 0; subs->lowlatency_playback = lowlatency_playback_available(runtime, subs); ``` Takashi Sakamoto
On Sun, 19 Mar 2023 04:28:53 +0100, Takashi Sakamoto wrote: > > Hi, > > On Sat, Mar 18, 2023 at 09:20:05AM +0900, Takashi Sakamoto wrote: > > On Fri, Mar 17, 2023 at 07:51:27PM +0000, John Keeping wrote: > > > snd_usb_queue_pending_output_urbs() may be called from > > > snd_pcm_ops::ack() which means the PCM stream is locked. > > > > > > For the normal case where the call back into the PCM core is via > > > prepare_output_urb() the "_under_stream_lock" variant of > > > snd_pcm_period_elapsed() is called, but when an error occurs and the > > > stream is stopped as XRUN then snd_pcm_xrun() tries to recursively lock > > > the stream which results in deadlock. > > > > > > Follow the example of snd_pcm_period_elapsed() by adding > > > snd_pcm_xrun_under_stream_lock() and use this when the PCM substream > > > lock is already held. > > > > > > Signed-off-by: John Keeping <john@metanate.com> > > > --- > > > include/sound/pcm.h | 1 + > > > sound/core/pcm_native.c | 28 ++++++++++++++++++++++++---- > > > sound/usb/endpoint.c | 18 +++++++++++------- > > > 3 files changed, 36 insertions(+), 11 deletions(-) > > > > The name of added kernel API implies me that you refer to existent > > 'snd_pcm_period_elapsed_under_stream_lock()' which I added to Linux > > v5.14. > > > > In my opinion, unlike the version of period elapsed API, the version of > > XRUN API seems not to be necessarily required to ALSA PCM core, since PCM > > device drivers can implement .pointer callback in the part of PCM operation. > > When the callback returns SNDRV_PCM_POS_XRUN, ALSA PCM application get > > occurence of XRUN as a result of any operation relevant to hwptr movement > > (e.g. SNDRV_PCM_IOCTL_HWSYNC). > > > > Therefore I think it possible to fix the issue without the proposed > > kernel API. I can assume some scenario: > > > > 1. Failure at tasklet for URB completion > > > > It is softIRQ context. The stream lock is not acquired. It doesn't > > matter to call current XRUN API. > > > > 2. Failure at PCM operation called by ALSA PCM application > > > > It is process context. The stream lock is acquired before calling driver > > code. When detecting any type of failure, driver code stores the state. > > Then .pointer callback should return SNDRV_PCM_POS_XRUNrefering to > > the state. > > Although being inexperienced to hack driver for USB audio device class, > I attempt to post the patch to fix the issue of recursive stream lock. > I apologies in advance since the patch is not tested yet... > > The 'in_xrun' member is newly added to 'struct snd_usb_substream'. When > detecting any failure, false is assigned to the member. The assignment > is expected to be done in both softIRQ context, and process context with > stream lock, thus no need to take care of cocurrent access (e.g. by usage > of WRITE_ONCE/READ_ONCE). > > Typical ALSA PCM application periodically calls PCM operation which calls > .pointer in driver code. As I described, returning SNDRV_PCM_POS_XRUN > takes ALSA PCM core to handle XRUN state of PCM substream in the timing. > > The negative point of the patch is the delay of XRUN notification to user > space application. In the point, I think the new kernel API introduced by > your patch has advantage. > > The in_xrun member can be replaced with a kind of EP_STATE_ > enumerations; i.e. EP_STATE_XRUN. In the case, we need some care so that > the state should be referred from pcm.c. Thanks for the patch. That would work, but the shortcoming side of this implementation is that it misses stopping / reporting the error immediately but waiting for the next pointer update. It might be simpler if we perform the xrun handling in the caller side, i.e. a change like below: --- a/sound/core/pcm_lib.c +++ b/sound/core/pcm_lib.c @@ -2155,6 +2155,8 @@ int pcm_lib_apply_appl_ptr(struct snd_pcm_substream *substream, ret = substream->ops->ack(substream); if (ret < 0) { runtime->control->appl_ptr = old_appl_ptr; + if (ret == -EPIPE) + __snd_pcm_xrun(substream); return ret; } } ... and let the caller returning -EPIPE for XRUN: --- a/sound/usb/endpoint.c +++ b/sound/usb/endpoint.c @@ -455,8 +455,8 @@ static void push_back_to_ready_list(struct snd_usb_endpoint *ep, * This function is used both for implicit feedback endpoints and in low- * latency playback mode. */ -void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, - bool in_stream_lock) +int snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, + bool in_stream_lock) { bool implicit_fb = snd_usb_endpoint_implicit_feedback_sink(ep); @@ -480,7 +480,7 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, spin_unlock_irqrestore(&ep->lock, flags); if (ctx == NULL) - return; + return 0; /* copy over the length information */ if (implicit_fb) { @@ -495,11 +495,11 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, break; if (err < 0) { /* push back to ready list again for -EAGAIN */ - if (err == -EAGAIN) + if (err == -EAGAIN) { push_back_to_ready_list(ep, ctx); - else - notify_xrun(ep); - return; + return 0; + } + return -EPIPE; } err = usb_submit_urb(ctx->urb, GFP_ATOMIC); @@ -507,8 +507,7 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, usb_audio_err(ep->chip, "Unable to submit urb #%d: %d at %s\n", ctx->index, err, __func__); - notify_xrun(ep); - return; + return -EPIPE; } set_bit(ctx->index, &ep->active_mask); --- a/sound/usb/endpoint.h +++ b/sound/usb/endpoint.h @@ -52,7 +52,7 @@ int snd_usb_endpoint_implicit_feedback_sink(struct snd_usb_endpoint *ep); int snd_usb_endpoint_next_packet_size(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx, int idx, unsigned int avail); -void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, - bool in_stream_lock); +int snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, + bool in_stream_lock); #endif /* __USBAUDIO_ENDPOINT_H */ --- a/sound/usb/pcm.c +++ b/sound/usb/pcm.c @@ -1639,7 +1639,7 @@ static int snd_usb_pcm_playback_ack(struct snd_pcm_substream *substream) * outputs here */ if (!ep->active_mask) - snd_usb_queue_pending_output_urbs(ep, true); + return snd_usb_queue_pending_output_urbs(ep, true); return 0; } thanks, Takashi
On Sun, 19 Mar 2023 08:57:03 +0100, Takashi Iwai wrote: > > On Sun, 19 Mar 2023 04:28:53 +0100, > Takashi Sakamoto wrote: > > > > Hi, > > > > On Sat, Mar 18, 2023 at 09:20:05AM +0900, Takashi Sakamoto wrote: > > > On Fri, Mar 17, 2023 at 07:51:27PM +0000, John Keeping wrote: > > > > snd_usb_queue_pending_output_urbs() may be called from > > > > snd_pcm_ops::ack() which means the PCM stream is locked. > > > > > > > > For the normal case where the call back into the PCM core is via > > > > prepare_output_urb() the "_under_stream_lock" variant of > > > > snd_pcm_period_elapsed() is called, but when an error occurs and the > > > > stream is stopped as XRUN then snd_pcm_xrun() tries to recursively lock > > > > the stream which results in deadlock. > > > > > > > > Follow the example of snd_pcm_period_elapsed() by adding > > > > snd_pcm_xrun_under_stream_lock() and use this when the PCM substream > > > > lock is already held. > > > > > > > > Signed-off-by: John Keeping <john@metanate.com> > > > > --- > > > > include/sound/pcm.h | 1 + > > > > sound/core/pcm_native.c | 28 ++++++++++++++++++++++++---- > > > > sound/usb/endpoint.c | 18 +++++++++++------- > > > > 3 files changed, 36 insertions(+), 11 deletions(-) > > > > > > The name of added kernel API implies me that you refer to existent > > > 'snd_pcm_period_elapsed_under_stream_lock()' which I added to Linux > > > v5.14. > > > > > > In my opinion, unlike the version of period elapsed API, the version of > > > XRUN API seems not to be necessarily required to ALSA PCM core, since PCM > > > device drivers can implement .pointer callback in the part of PCM operation. > > > When the callback returns SNDRV_PCM_POS_XRUN, ALSA PCM application get > > > occurence of XRUN as a result of any operation relevant to hwptr movement > > > (e.g. SNDRV_PCM_IOCTL_HWSYNC). > > > > > > Therefore I think it possible to fix the issue without the proposed > > > kernel API. I can assume some scenario: > > > > > > 1. Failure at tasklet for URB completion > > > > > > It is softIRQ context. The stream lock is not acquired. It doesn't > > > matter to call current XRUN API. > > > > > > 2. Failure at PCM operation called by ALSA PCM application > > > > > > It is process context. The stream lock is acquired before calling driver > > > code. When detecting any type of failure, driver code stores the state. > > > Then .pointer callback should return SNDRV_PCM_POS_XRUNrefering to > > > the state. > > > > Although being inexperienced to hack driver for USB audio device class, > > I attempt to post the patch to fix the issue of recursive stream lock. > > I apologies in advance since the patch is not tested yet... > > > > The 'in_xrun' member is newly added to 'struct snd_usb_substream'. When > > detecting any failure, false is assigned to the member. The assignment > > is expected to be done in both softIRQ context, and process context with > > stream lock, thus no need to take care of cocurrent access (e.g. by usage > > of WRITE_ONCE/READ_ONCE). > > > > Typical ALSA PCM application periodically calls PCM operation which calls > > .pointer in driver code. As I described, returning SNDRV_PCM_POS_XRUN > > takes ALSA PCM core to handle XRUN state of PCM substream in the timing. > > > > The negative point of the patch is the delay of XRUN notification to user > > space application. In the point, I think the new kernel API introduced by > > your patch has advantage. > > > > The in_xrun member can be replaced with a kind of EP_STATE_ > > enumerations; i.e. EP_STATE_XRUN. In the case, we need some care so that > > the state should be referred from pcm.c. > > Thanks for the patch. That would work, but the shortcoming side of > this implementation is that it misses stopping / reporting the error > immediately but waiting for the next pointer update. > > It might be simpler if we perform the xrun handling in the caller > side, i.e. a change like below: > > --- a/sound/core/pcm_lib.c > +++ b/sound/core/pcm_lib.c > @@ -2155,6 +2155,8 @@ int pcm_lib_apply_appl_ptr(struct snd_pcm_substream *substream, > ret = substream->ops->ack(substream); > if (ret < 0) { > runtime->control->appl_ptr = old_appl_ptr; > + if (ret == -EPIPE) > + __snd_pcm_xrun(substream); > return ret; > } > } > > ... and let the caller returning -EPIPE for XRUN: and that misses the XRUN in the case of non-stream-lock. A revised version is below. Takashi -- 8< -- --- a/sound/core/pcm_lib.c +++ b/sound/core/pcm_lib.c @@ -2155,6 +2155,8 @@ int pcm_lib_apply_appl_ptr(struct snd_pcm_substream *substream, ret = substream->ops->ack(substream); if (ret < 0) { runtime->control->appl_ptr = old_appl_ptr; + if (ret == -EPIPE) + __snd_pcm_xrun(substream); return ret; } } --- a/sound/usb/endpoint.c +++ b/sound/usb/endpoint.c @@ -455,8 +455,8 @@ static void push_back_to_ready_list(struct snd_usb_endpoint *ep, * This function is used both for implicit feedback endpoints and in low- * latency playback mode. */ -void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, - bool in_stream_lock) +int snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, + bool in_stream_lock) { bool implicit_fb = snd_usb_endpoint_implicit_feedback_sink(ep); @@ -480,7 +480,7 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, spin_unlock_irqrestore(&ep->lock, flags); if (ctx == NULL) - return; + break; /* copy over the length information */ if (implicit_fb) { @@ -495,11 +495,14 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, break; if (err < 0) { /* push back to ready list again for -EAGAIN */ - if (err == -EAGAIN) + if (err == -EAGAIN) { push_back_to_ready_list(ep, ctx); - else + break; + } + + if (!in_stream_lock) notify_xrun(ep); - return; + return -EPIPE; } err = usb_submit_urb(ctx->urb, GFP_ATOMIC); @@ -507,13 +510,16 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, usb_audio_err(ep->chip, "Unable to submit urb #%d: %d at %s\n", ctx->index, err, __func__); - notify_xrun(ep); - return; + if (!in_stream_lock) + notify_xrun(ep); + return -EPIPE; } set_bit(ctx->index, &ep->active_mask); atomic_inc(&ep->submitted_urbs); } + + return 0; } /* --- a/sound/usb/endpoint.h +++ b/sound/usb/endpoint.h @@ -52,7 +52,7 @@ int snd_usb_endpoint_implicit_feedback_sink(struct snd_usb_endpoint *ep); int snd_usb_endpoint_next_packet_size(struct snd_usb_endpoint *ep, struct snd_urb_ctx *ctx, int idx, unsigned int avail); -void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, - bool in_stream_lock); +int snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, + bool in_stream_lock); #endif /* __USBAUDIO_ENDPOINT_H */ --- a/sound/usb/pcm.c +++ b/sound/usb/pcm.c @@ -1639,7 +1639,7 @@ static int snd_usb_pcm_playback_ack(struct snd_pcm_substream *substream) * outputs here */ if (!ep->active_mask) - snd_usb_queue_pending_output_urbs(ep, true); + return snd_usb_queue_pending_output_urbs(ep, true); return 0; }
On Sun, Mar 19, 2023 at 10:15:55AM +0100, Takashi Iwai wrote: > On Sun, 19 Mar 2023 08:57:03 +0100, > Takashi Iwai wrote: > > > > On Sun, 19 Mar 2023 04:28:53 +0100, > > Takashi Sakamoto wrote: > > > > > > Hi, > > > > > > On Sat, Mar 18, 2023 at 09:20:05AM +0900, Takashi Sakamoto wrote: > > > > On Fri, Mar 17, 2023 at 07:51:27PM +0000, John Keeping wrote: > > > > > snd_usb_queue_pending_output_urbs() may be called from > > > > > snd_pcm_ops::ack() which means the PCM stream is locked. > > > > > > > > > > For the normal case where the call back into the PCM core is via > > > > > prepare_output_urb() the "_under_stream_lock" variant of > > > > > snd_pcm_period_elapsed() is called, but when an error occurs and the > > > > > stream is stopped as XRUN then snd_pcm_xrun() tries to recursively lock > > > > > the stream which results in deadlock. > > > > > > > > > > Follow the example of snd_pcm_period_elapsed() by adding > > > > > snd_pcm_xrun_under_stream_lock() and use this when the PCM substream > > > > > lock is already held. > > > > > > > > > > Signed-off-by: John Keeping <john@metanate.com> > > > > > --- > > > > > include/sound/pcm.h | 1 + > > > > > sound/core/pcm_native.c | 28 ++++++++++++++++++++++++---- > > > > > sound/usb/endpoint.c | 18 +++++++++++------- > > > > > 3 files changed, 36 insertions(+), 11 deletions(-) > > > > > > > > The name of added kernel API implies me that you refer to existent > > > > 'snd_pcm_period_elapsed_under_stream_lock()' which I added to Linux > > > > v5.14. > > > > > > > > In my opinion, unlike the version of period elapsed API, the version of > > > > XRUN API seems not to be necessarily required to ALSA PCM core, since PCM > > > > device drivers can implement .pointer callback in the part of PCM operation. > > > > When the callback returns SNDRV_PCM_POS_XRUN, ALSA PCM application get > > > > occurence of XRUN as a result of any operation relevant to hwptr movement > > > > (e.g. SNDRV_PCM_IOCTL_HWSYNC). > > > > > > > > Therefore I think it possible to fix the issue without the proposed > > > > kernel API. I can assume some scenario: > > > > > > > > 1. Failure at tasklet for URB completion > > > > > > > > It is softIRQ context. The stream lock is not acquired. It doesn't > > > > matter to call current XRUN API. > > > > > > > > 2. Failure at PCM operation called by ALSA PCM application > > > > > > > > It is process context. The stream lock is acquired before calling driver > > > > code. When detecting any type of failure, driver code stores the state. > > > > Then .pointer callback should return SNDRV_PCM_POS_XRUNrefering to > > > > the state. > > > > > > Although being inexperienced to hack driver for USB audio device class, > > > I attempt to post the patch to fix the issue of recursive stream lock. > > > I apologies in advance since the patch is not tested yet... > > > > > > The 'in_xrun' member is newly added to 'struct snd_usb_substream'. When > > > detecting any failure, false is assigned to the member. The assignment > > > is expected to be done in both softIRQ context, and process context with > > > stream lock, thus no need to take care of cocurrent access (e.g. by usage > > > of WRITE_ONCE/READ_ONCE). > > > > > > Typical ALSA PCM application periodically calls PCM operation which calls > > > .pointer in driver code. As I described, returning SNDRV_PCM_POS_XRUN > > > takes ALSA PCM core to handle XRUN state of PCM substream in the timing. > > > > > > The negative point of the patch is the delay of XRUN notification to user > > > space application. In the point, I think the new kernel API introduced by > > > your patch has advantage. > > > > > > The in_xrun member can be replaced with a kind of EP_STATE_ > > > enumerations; i.e. EP_STATE_XRUN. In the case, we need some care so that > > > the state should be referred from pcm.c. > > > > Thanks for the patch. That would work, but the shortcoming side of > > this implementation is that it misses stopping / reporting the error > > immediately but waiting for the next pointer update. > > > > It might be simpler if we perform the xrun handling in the caller > > side, i.e. a change like below: > > > > --- a/sound/core/pcm_lib.c > > +++ b/sound/core/pcm_lib.c > > @@ -2155,6 +2155,8 @@ int pcm_lib_apply_appl_ptr(struct snd_pcm_substream *substream, > > ret = substream->ops->ack(substream); > > if (ret < 0) { > > runtime->control->appl_ptr = old_appl_ptr; > > + if (ret == -EPIPE) > > + __snd_pcm_xrun(substream); > > return ret; > > } > > } > > > > ... and let the caller returning -EPIPE for XRUN: > > and that misses the XRUN in the case of non-stream-lock. > A revised version is below. Yes, it looks like this also solves the problem. If you roll this into a proper patch feel free to add: Tested-by: John Keeping <john@metanate.com> > > -- 8< -- > --- a/sound/core/pcm_lib.c > +++ b/sound/core/pcm_lib.c > @@ -2155,6 +2155,8 @@ int pcm_lib_apply_appl_ptr(struct snd_pcm_substream *substream, > ret = substream->ops->ack(substream); > if (ret < 0) { > runtime->control->appl_ptr = old_appl_ptr; > + if (ret == -EPIPE) > + __snd_pcm_xrun(substream); > return ret; > } > } > --- a/sound/usb/endpoint.c > +++ b/sound/usb/endpoint.c > @@ -455,8 +455,8 @@ static void push_back_to_ready_list(struct snd_usb_endpoint *ep, > * This function is used both for implicit feedback endpoints and in low- > * latency playback mode. > */ > -void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, > - bool in_stream_lock) > +int snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, > + bool in_stream_lock) > { > bool implicit_fb = snd_usb_endpoint_implicit_feedback_sink(ep); > > @@ -480,7 +480,7 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, > spin_unlock_irqrestore(&ep->lock, flags); > > if (ctx == NULL) > - return; > + break; > > /* copy over the length information */ > if (implicit_fb) { > @@ -495,11 +495,14 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, > break; > if (err < 0) { > /* push back to ready list again for -EAGAIN */ > - if (err == -EAGAIN) > + if (err == -EAGAIN) { > push_back_to_ready_list(ep, ctx); > - else > + break; > + } > + > + if (!in_stream_lock) > notify_xrun(ep); > - return; > + return -EPIPE; > } > > err = usb_submit_urb(ctx->urb, GFP_ATOMIC); > @@ -507,13 +510,16 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, > usb_audio_err(ep->chip, > "Unable to submit urb #%d: %d at %s\n", > ctx->index, err, __func__); > - notify_xrun(ep); > - return; > + if (!in_stream_lock) > + notify_xrun(ep); > + return -EPIPE; > } > > set_bit(ctx->index, &ep->active_mask); > atomic_inc(&ep->submitted_urbs); > } > + > + return 0; > } > > /* > --- a/sound/usb/endpoint.h > +++ b/sound/usb/endpoint.h > @@ -52,7 +52,7 @@ int snd_usb_endpoint_implicit_feedback_sink(struct snd_usb_endpoint *ep); > int snd_usb_endpoint_next_packet_size(struct snd_usb_endpoint *ep, > struct snd_urb_ctx *ctx, int idx, > unsigned int avail); > -void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, > - bool in_stream_lock); > +int snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, > + bool in_stream_lock); > > #endif /* __USBAUDIO_ENDPOINT_H */ > --- a/sound/usb/pcm.c > +++ b/sound/usb/pcm.c > @@ -1639,7 +1639,7 @@ static int snd_usb_pcm_playback_ack(struct snd_pcm_substream *substream) > * outputs here > */ > if (!ep->active_mask) > - snd_usb_queue_pending_output_urbs(ep, true); > + return snd_usb_queue_pending_output_urbs(ep, true); > return 0; > } >
On Mon, 20 Mar 2023 13:04:22 +0100, John Keeping wrote: > > On Sun, Mar 19, 2023 at 10:15:55AM +0100, Takashi Iwai wrote: > > On Sun, 19 Mar 2023 08:57:03 +0100, > > Takashi Iwai wrote: > > > > > > On Sun, 19 Mar 2023 04:28:53 +0100, > > > Takashi Sakamoto wrote: > > > > > > > > Hi, > > > > > > > > On Sat, Mar 18, 2023 at 09:20:05AM +0900, Takashi Sakamoto wrote: > > > > > On Fri, Mar 17, 2023 at 07:51:27PM +0000, John Keeping wrote: > > > > > > snd_usb_queue_pending_output_urbs() may be called from > > > > > > snd_pcm_ops::ack() which means the PCM stream is locked. > > > > > > > > > > > > For the normal case where the call back into the PCM core is via > > > > > > prepare_output_urb() the "_under_stream_lock" variant of > > > > > > snd_pcm_period_elapsed() is called, but when an error occurs and the > > > > > > stream is stopped as XRUN then snd_pcm_xrun() tries to recursively lock > > > > > > the stream which results in deadlock. > > > > > > > > > > > > Follow the example of snd_pcm_period_elapsed() by adding > > > > > > snd_pcm_xrun_under_stream_lock() and use this when the PCM substream > > > > > > lock is already held. > > > > > > > > > > > > Signed-off-by: John Keeping <john@metanate.com> > > > > > > --- > > > > > > include/sound/pcm.h | 1 + > > > > > > sound/core/pcm_native.c | 28 ++++++++++++++++++++++++---- > > > > > > sound/usb/endpoint.c | 18 +++++++++++------- > > > > > > 3 files changed, 36 insertions(+), 11 deletions(-) > > > > > > > > > > The name of added kernel API implies me that you refer to existent > > > > > 'snd_pcm_period_elapsed_under_stream_lock()' which I added to Linux > > > > > v5.14. > > > > > > > > > > In my opinion, unlike the version of period elapsed API, the version of > > > > > XRUN API seems not to be necessarily required to ALSA PCM core, since PCM > > > > > device drivers can implement .pointer callback in the part of PCM operation. > > > > > When the callback returns SNDRV_PCM_POS_XRUN, ALSA PCM application get > > > > > occurence of XRUN as a result of any operation relevant to hwptr movement > > > > > (e.g. SNDRV_PCM_IOCTL_HWSYNC). > > > > > > > > > > Therefore I think it possible to fix the issue without the proposed > > > > > kernel API. I can assume some scenario: > > > > > > > > > > 1. Failure at tasklet for URB completion > > > > > > > > > > It is softIRQ context. The stream lock is not acquired. It doesn't > > > > > matter to call current XRUN API. > > > > > > > > > > 2. Failure at PCM operation called by ALSA PCM application > > > > > > > > > > It is process context. The stream lock is acquired before calling driver > > > > > code. When detecting any type of failure, driver code stores the state. > > > > > Then .pointer callback should return SNDRV_PCM_POS_XRUNrefering to > > > > > the state. > > > > > > > > Although being inexperienced to hack driver for USB audio device class, > > > > I attempt to post the patch to fix the issue of recursive stream lock. > > > > I apologies in advance since the patch is not tested yet... > > > > > > > > The 'in_xrun' member is newly added to 'struct snd_usb_substream'. When > > > > detecting any failure, false is assigned to the member. The assignment > > > > is expected to be done in both softIRQ context, and process context with > > > > stream lock, thus no need to take care of cocurrent access (e.g. by usage > > > > of WRITE_ONCE/READ_ONCE). > > > > > > > > Typical ALSA PCM application periodically calls PCM operation which calls > > > > .pointer in driver code. As I described, returning SNDRV_PCM_POS_XRUN > > > > takes ALSA PCM core to handle XRUN state of PCM substream in the timing. > > > > > > > > The negative point of the patch is the delay of XRUN notification to user > > > > space application. In the point, I think the new kernel API introduced by > > > > your patch has advantage. > > > > > > > > The in_xrun member can be replaced with a kind of EP_STATE_ > > > > enumerations; i.e. EP_STATE_XRUN. In the case, we need some care so that > > > > the state should be referred from pcm.c. > > > > > > Thanks for the patch. That would work, but the shortcoming side of > > > this implementation is that it misses stopping / reporting the error > > > immediately but waiting for the next pointer update. > > > > > > It might be simpler if we perform the xrun handling in the caller > > > side, i.e. a change like below: > > > > > > --- a/sound/core/pcm_lib.c > > > +++ b/sound/core/pcm_lib.c > > > @@ -2155,6 +2155,8 @@ int pcm_lib_apply_appl_ptr(struct snd_pcm_substream *substream, > > > ret = substream->ops->ack(substream); > > > if (ret < 0) { > > > runtime->control->appl_ptr = old_appl_ptr; > > > + if (ret == -EPIPE) > > > + __snd_pcm_xrun(substream); > > > return ret; > > > } > > > } > > > > > > ... and let the caller returning -EPIPE for XRUN: > > > > and that misses the XRUN in the case of non-stream-lock. > > A revised version is below. > > Yes, it looks like this also solves the problem. If you roll this into > a proper patch feel free to add: > > Tested-by: John Keeping <john@metanate.com> Thanks, then I'll submit a proper patch. Takashi
diff --git a/include/sound/pcm.h b/include/sound/pcm.h index 27040b472a4f..98551907453a 100644 --- a/include/sound/pcm.h +++ b/include/sound/pcm.h @@ -571,6 +571,7 @@ int snd_pcm_status64(struct snd_pcm_substream *substream, int snd_pcm_start(struct snd_pcm_substream *substream); int snd_pcm_stop(struct snd_pcm_substream *substream, snd_pcm_state_t status); int snd_pcm_drain_done(struct snd_pcm_substream *substream); +int snd_pcm_stop_xrun_under_stream_lock(struct snd_pcm_substream *substream); int snd_pcm_stop_xrun(struct snd_pcm_substream *substream); #ifdef CONFIG_PM int snd_pcm_suspend_all(struct snd_pcm *pcm); diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c index 331380c2438b..617f5dc74df0 100644 --- a/sound/core/pcm_native.c +++ b/sound/core/pcm_native.c @@ -1559,24 +1559,44 @@ int snd_pcm_drain_done(struct snd_pcm_substream *substream) SNDRV_PCM_STATE_SETUP); } +/** + * snd_pcm_stop_xrun_under_stream_lock - stop the running stream as XRUN under the lock of + * the PCM substream. + * @substream: the PCM substream instance + * + * This stops the given running substream (and all linked substreams) as XRUN. + * This function assumes that the substream lock is already held. + * + * Return: Zero if successful, or a negative error core. + */ +int snd_pcm_stop_xrun_under_stream_lock(struct snd_pcm_substream *substream) +{ + if (substream->runtime && snd_pcm_running(substream)) + __snd_pcm_xrun(substream); + + return 0; +} + /** * snd_pcm_stop_xrun - stop the running streams as XRUN * @substream: the PCM substream instance * + * This function is similar to ``snd_pcm_stop_xrun_under_stream_lock()`` except that it + * acquires the substream lock itself. + * * This stops the given running substream (and all linked substreams) as XRUN. - * Unlike snd_pcm_stop(), this function takes the substream lock by itself. * * Return: Zero if successful, or a negative error code. */ int snd_pcm_stop_xrun(struct snd_pcm_substream *substream) { unsigned long flags; + int ret; snd_pcm_stream_lock_irqsave(substream, flags); - if (substream->runtime && snd_pcm_running(substream)) - __snd_pcm_xrun(substream); + ret = snd_pcm_stop_xrun_under_stream_lock(substream); snd_pcm_stream_unlock_irqrestore(substream, flags); - return 0; + return ret; } EXPORT_SYMBOL_GPL(snd_pcm_stop_xrun); diff --git a/sound/usb/endpoint.c b/sound/usb/endpoint.c index 1e0af1179ca8..83a6b6d41374 100644 --- a/sound/usb/endpoint.c +++ b/sound/usb/endpoint.c @@ -400,13 +400,17 @@ static int prepare_inbound_urb(struct snd_usb_endpoint *ep, } /* notify an error as XRUN to the assigned PCM data substream */ -static void notify_xrun(struct snd_usb_endpoint *ep) +static void notify_xrun(struct snd_usb_endpoint *ep, bool in_stream_lock) { struct snd_usb_substream *data_subs; data_subs = READ_ONCE(ep->data_subs); - if (data_subs && data_subs->pcm_substream) - snd_pcm_stop_xrun(data_subs->pcm_substream); + if (data_subs && data_subs->pcm_substream) { + if (in_stream_lock) + snd_pcm_stop_xrun_under_stream_lock(data_subs->pcm_substream); + else + snd_pcm_stop_xrun(data_subs->pcm_substream); + } } static struct snd_usb_packet_info * @@ -498,7 +502,7 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, if (err == -EAGAIN) push_back_to_ready_list(ep, ctx); else - notify_xrun(ep); + notify_xrun(ep, in_stream_lock); return; } @@ -507,7 +511,7 @@ void snd_usb_queue_pending_output_urbs(struct snd_usb_endpoint *ep, usb_audio_err(ep->chip, "Unable to submit urb #%d: %d at %s\n", ctx->index, err, __func__); - notify_xrun(ep); + notify_xrun(ep, in_stream_lock); return; } @@ -574,7 +578,7 @@ static void snd_complete_urb(struct urb *urb) return; usb_audio_err(ep->chip, "cannot submit urb (err = %d)\n", err); - notify_xrun(ep); + notify_xrun(ep, false); exit_clear: clear_bit(ctx->index, &ep->active_mask); @@ -1762,7 +1766,7 @@ static void snd_usb_handle_sync_urb(struct snd_usb_endpoint *ep, usb_audio_err(ep->chip, "next package FIFO overflow EP 0x%x\n", ep->ep_num); - notify_xrun(ep); + notify_xrun(ep, false); return; }
snd_usb_queue_pending_output_urbs() may be called from snd_pcm_ops::ack() which means the PCM stream is locked. For the normal case where the call back into the PCM core is via prepare_output_urb() the "_under_stream_lock" variant of snd_pcm_period_elapsed() is called, but when an error occurs and the stream is stopped as XRUN then snd_pcm_xrun() tries to recursively lock the stream which results in deadlock. Follow the example of snd_pcm_period_elapsed() by adding snd_pcm_xrun_under_stream_lock() and use this when the PCM substream lock is already held. Signed-off-by: John Keeping <john@metanate.com> --- include/sound/pcm.h | 1 + sound/core/pcm_native.c | 28 ++++++++++++++++++++++++---- sound/usb/endpoint.c | 18 +++++++++++------- 3 files changed, 36 insertions(+), 11 deletions(-)