Message ID | 20190922032853.6123-1-thematrixeatsyou@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Make pcm_ioplug check lock status before locking (fixes pcm_jack lockups) | expand |
On Sun, 22 Sep 2019 05:28:50 +0200, Ben Russell wrote: > > This is my first time contributing a patch to a mailing list. If I've made a mess, please let me know so I can learn how to avoid it in future. > > The purpose of this patchset is to fix a specific common lockup in the pcm_jack plugin (from alsa-plugins). I'm not sure if this is the right approach to take, but at the very least it should make the pcm_ioplug code a bit more resilient. > > The problem is this: When using the pcm_jack plugin, if a program attempts to play audio using the SND_PCM_FORMAT_FLOAT format, said program locks up. > > This should be enough to reproduce the bug: > > pcm.rawjack { > type jack > playback_ports { > 0 system:playback_1 > 1 system:playback_2 > } > capture_ports { > 0 system:capture_1 > 1 system:capture_2 > } > } > > pcm.!default { > type plug > slave.pcm "rawjack" > } > > What's happening is that several snd_pcm_ioplug_* functions assume that the pcm mutex is locked already. It then proceeds to unlock the mutex, call a function, and then relock the mutex. When the mutex isn't locked already, the initial unlock results in a silently ignored pthread error, and the lock results in the program eventually deadlocking as it doesn't expect the mutex to be held at that point. > > Patch 2 modifies pcm_ioplug to check if the mutex is held before doing the unlock-act-lock sequence, and if the mutex is not held then it skips the unlock and lock stages. This depends on Patch 1, which adds a snd_pcm_is_locked function to give the state of the mutex. > > Patch 3 is completely optional. It adds assertions which make sure that all uses of snd_pcm_lock/snd_pcm_unlock are correct. On one hand this will likely result in crashes in some of the less refined parts of the code. On the other hand, when that happens, you'll know which parts need a bit more love. I know it was useful for finding this issue in the first place. > > These patches fix the problems I am having, but if you have a more suitable approach to fixing this problem then please let me know. Thanks for the analysis and patches. It's indeed a serious problem we need to address. The current semantics of locking in alsa-lib code assumes that the lock/unlock never fails. So the "right-and-quick" fix would be just to take the patch 3 to assert() upon pthread_mutex_lock/unlock errors. Then we need to paper over the actual invalid locks. I do wonder, though, exactly which code path triggers the pthread mutex error? You should be able to catch it via gdb after the patch. thanks, Takashi
On Sun, 22 Sep 2019 22:25:48 +0200 Takashi Iwai <tiwai@suse.de> wrote: > On Sun, 22 Sep 2019 05:28:50 +0200, > Ben Russell wrote: > > > > This is my first time contributing a patch to a mailing list. If I've made a mess, please let me know so I can learn how to avoid it in future. > > > > The purpose of this patchset is to fix a specific common lockup in the pcm_jack plugin (from alsa-plugins). I'm not sure if this is the right approach to take, but at the very least it should make the pcm_ioplug code a bit more resilient. > > > > The problem is this: When using the pcm_jack plugin, if a program attempts to play audio using the SND_PCM_FORMAT_FLOAT format, said program locks up. > > > > This should be enough to reproduce the bug: > > > > pcm.rawjack { > > type jack > > playback_ports { > > 0 system:playback_1 > > 1 system:playback_2 > > } > > capture_ports { > > 0 system:capture_1 > > 1 system:capture_2 > > } > > } > > > > pcm.!default { > > type plug > > slave.pcm "rawjack" > > } > > > > What's happening is that several snd_pcm_ioplug_* functions assume that the pcm mutex is locked already. It then proceeds to unlock the mutex, call a function, and then relock the mutex. When the mutex isn't locked already, the initial unlock results in a silently ignored pthread error, and the lock results in the program eventually deadlocking as it doesn't expect the mutex to be held at that point. > > > > Patch 2 modifies pcm_ioplug to check if the mutex is held before doing the unlock-act-lock sequence, and if the mutex is not held then it skips the unlock and lock stages. This depends on Patch 1, which adds a snd_pcm_is_locked function to give the state of the mutex. > > > > Patch 3 is completely optional. It adds assertions which make sure that all uses of snd_pcm_lock/snd_pcm_unlock are correct. On one hand this will likely result in crashes in some of the less refined parts of the code. On the other hand, when that happens, you'll know which parts need a bit more love. I know it was useful for finding this issue in the first place. > > > > These patches fix the problems I am having, but if you have a more suitable approach to fixing this problem then please let me know. > > Thanks for the analysis and patches. It's indeed a serious problem we > need to address. > > The current semantics of locking in alsa-lib code assumes that the > lock/unlock never fails. So the "right-and-quick" fix would be just > to take the patch 3 to assert() upon pthread_mutex_lock/unlock > errors. Then we need to paper over the actual invalid locks. > > I do wonder, though, exactly which code path triggers the pthread > mutex error? You should be able to catch it via gdb after the patch. > > > thanks, > > Takashi Sure thing. With patch #3 against commit 1d7a131f, this trips up on an assert: $ gdb --args mpv --ao=alsa --audio-format=float RESPECOGNIZE.mp3 The relevant backtrace: #4 0x00007ffff4b06cf3 in snd_pcm_unlock (pcm=0x5555558625c0) at pcm_local.h:1187 #5 0x00007ffff4b075b3 in snd_pcm_unlock (pcm=0x5555558625c0) at pcm_local.h:1187 if (pcm->lock_enabled && pcm->need_lock) { unlock_err = pthread_mutex_unlock(&pcm->lock); assert(unlock_err == 0); <-- this line } #6 snd_pcm_ioplug_prepare (pcm=0x5555558625c0) at pcm_ioplug.c:167 snd_pcm_ioplug_reset(pcm); if (io->data->callback->prepare) { snd_pcm_unlock(pcm); /* to avoid deadlock */ <-- this line err = io->data->callback->prepare(io->data); snd_pcm_lock(pcm); } #7 0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x5555558627c0) at pcm.c:1214 snd_pcm_lock(pcm); if (pcm->fast_ops->prepare) err = pcm->fast_ops->prepare(pcm->fast_op_arg); <-- this line else err = -ENOSYS; #8 0x00005555555a2efa in ?? () Using --audio-format=s16 does not deadlock, as it goes via pcm_plugin before it can go via pcm_ioplug (it's most likely for sample format conversion): #0 0x00007ffff4b07522 in snd_pcm_unlock (pcm=<optimized out>) at pcm_ioplug.c:166 #1 snd_pcm_ioplug_prepare (pcm=0x555555862640) at pcm_ioplug.c:167 #2 0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x555555862640) at pcm.c:1214 #3 0x00007ffff4ad2ed7 in snd_pcm_plugin_prepare (pcm=0x555555862e50) at pcm_plugin.c:159 snd_pcm_plugin_t *plugin = pcm->private_data; int err; err = snd_pcm_prepare(plugin->gen.slave); <-- this line if (err < 0) return err; #4 0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x555555862840) at pcm.c:1214 #5 0x00005555555a2efa in ?? () Regards, Ben R
On Tue, 24 Sep 2019 09:45:25 +0200, Ben Russell wrote: > > On Sun, 22 Sep 2019 22:25:48 +0200 > Takashi Iwai <tiwai@suse.de> wrote: > > > On Sun, 22 Sep 2019 05:28:50 +0200, > > Ben Russell wrote: > > > > > > This is my first time contributing a patch to a mailing list. If I've made a mess, please let me know so I can learn how to avoid it in future. > > > > > > The purpose of this patchset is to fix a specific common lockup in the pcm_jack plugin (from alsa-plugins). I'm not sure if this is the right approach to take, but at the very least it should make the pcm_ioplug code a bit more resilient. > > > > > > The problem is this: When using the pcm_jack plugin, if a program attempts to play audio using the SND_PCM_FORMAT_FLOAT format, said program locks up. > > > > > > This should be enough to reproduce the bug: > > > > > > pcm.rawjack { > > > type jack > > > playback_ports { > > > 0 system:playback_1 > > > 1 system:playback_2 > > > } > > > capture_ports { > > > 0 system:capture_1 > > > 1 system:capture_2 > > > } > > > } > > > > > > pcm.!default { > > > type plug > > > slave.pcm "rawjack" > > > } > > > > > > What's happening is that several snd_pcm_ioplug_* functions assume that the pcm mutex is locked already. It then proceeds to unlock the mutex, call a function, and then relock the mutex. When the mutex isn't locked already, the initial unlock results in a silently ignored pthread error, and the lock results in the program eventually deadlocking as it doesn't expect the mutex to be held at that point. > > > > > > Patch 2 modifies pcm_ioplug to check if the mutex is held before doing the unlock-act-lock sequence, and if the mutex is not held then it skips the unlock and lock stages. This depends on Patch 1, which adds a snd_pcm_is_locked function to give the state of the mutex. > > > > > > Patch 3 is completely optional. It adds assertions which make sure that all uses of snd_pcm_lock/snd_pcm_unlock are correct. On one hand this will likely result in crashes in some of the less refined parts of the code. On the other hand, when that happens, you'll know which parts need a bit more love. I know it was useful for finding this issue in the first place. > > > > > > These patches fix the problems I am having, but if you have a more suitable approach to fixing this problem then please let me know. > > > > Thanks for the analysis and patches. It's indeed a serious problem we > > need to address. > > > > The current semantics of locking in alsa-lib code assumes that the > > lock/unlock never fails. So the "right-and-quick" fix would be just > > to take the patch 3 to assert() upon pthread_mutex_lock/unlock > > errors. Then we need to paper over the actual invalid locks. > > > > I do wonder, though, exactly which code path triggers the pthread > > mutex error? You should be able to catch it via gdb after the patch. > > > > > > thanks, > > > > Takashi > > Sure thing. With patch #3 against commit 1d7a131f, this trips up on an assert: > > $ gdb --args mpv --ao=alsa --audio-format=float RESPECOGNIZE.mp3 > > The relevant backtrace: > > #4 0x00007ffff4b06cf3 in snd_pcm_unlock (pcm=0x5555558625c0) at pcm_local.h:1187 > #5 0x00007ffff4b075b3 in snd_pcm_unlock (pcm=0x5555558625c0) at pcm_local.h:1187 > if (pcm->lock_enabled && pcm->need_lock) { > unlock_err = pthread_mutex_unlock(&pcm->lock); > assert(unlock_err == 0); <-- this line > } > #6 snd_pcm_ioplug_prepare (pcm=0x5555558625c0) at pcm_ioplug.c:167 > snd_pcm_ioplug_reset(pcm); > if (io->data->callback->prepare) { > snd_pcm_unlock(pcm); /* to avoid deadlock */ <-- this line > err = io->data->callback->prepare(io->data); > snd_pcm_lock(pcm); > } > #7 0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x5555558627c0) at pcm.c:1214 > snd_pcm_lock(pcm); > if (pcm->fast_ops->prepare) > err = pcm->fast_ops->prepare(pcm->fast_op_arg); <-- this line > else > err = -ENOSYS; > #8 0x00005555555a2efa in ?? () > > Using --audio-format=s16 does not deadlock, as it goes via pcm_plugin before it can go via pcm_ioplug (it's most likely for sample format conversion): > > #0 0x00007ffff4b07522 in snd_pcm_unlock (pcm=<optimized out>) at pcm_ioplug.c:166 > #1 snd_pcm_ioplug_prepare (pcm=0x555555862640) at pcm_ioplug.c:167 > #2 0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x555555862640) at pcm.c:1214 > #3 0x00007ffff4ad2ed7 in snd_pcm_plugin_prepare (pcm=0x555555862e50) at pcm_plugin.c:159 > snd_pcm_plugin_t *plugin = pcm->private_data; > int err; > err = snd_pcm_prepare(plugin->gen.slave); <-- this line > if (err < 0) > return err; > #4 0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x555555862840) at pcm.c:1214 > #5 0x00005555555a2efa in ?? () Thanks, after reading the traces carefully, finally I figured out what went wrong. In some cases, pcm->fast_op_arg isn't pcm itself but has a different value. Then we call snd_pcm_lock() with pcm, but call fast_op with fast_op_arg as the PCM object. This leads to the unexpected behavior for the plugin that tries to unlock / relock from the given PCM object. So, we'd need to convert snd_pcm_lock()/unlock() with fast_op_arg or op_arg in most places. A test fix patch is below. Could you test it quickly? I'm going to traveling from tomorrow, so I'd like to have confirmation before merging into git repo. thanks, Takashi -- 8< -- From: Takashi Iwai <tiwai@suse.de> Subject: [PATCH] pcm: Fix the wrong PCM object passed for locking/unlocking Most of PCM API functions have snd_pcm_lock()/unlock() wraps for the actual calls of ops, and some plugins try to unlock/relock internally for the given PCM object. This, unfortunately, causes a problem in some configurations and leads to the unexpected behavior or deadlock. The main problem is that we call snd_pcm_lock() with the given PCM object, while calling the ops with pcm->op_arg or pcm->fast_op_arg as the slave PCM object. Meanwhile the plugin code assumes that the passed PCM object is already locked, and calls snd_pcm_unlock(). This bug doesn't hit always because in most cases pcm->op_arg and fast_op_arg are identical with pcm itself. But in some configurations they have different values, so the problem surfaces. This patch is an attempt to resolve these inconsistencies. It replaces most of snd_pcm_lock()/unlock() calls with either pcm->op_arg or pcm->fast_op_arg, depending on the call pattern, so that the plugin code can safely run snd_pcm_unlock() to the given PCM object. Reported-by: Ben Russell <thematrixeatsyou@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> --- src/pcm/pcm.c | 114 +++++++++++++++++++++++++++++----------------------------- 1 file changed, 57 insertions(+), 57 deletions(-) diff --git a/src/pcm/pcm.c b/src/pcm/pcm.c index 178d43875f00..e7519f04686b 100644 --- a/src/pcm/pcm.c +++ b/src/pcm/pcm.c @@ -789,7 +789,7 @@ int snd_pcm_nonblock(snd_pcm_t *pcm, int nonblock) /* FIXME: __snd_pcm_lock() call below is commented out because of the * the possible deadlock in signal handler calling snd_pcm_abort() */ - /* __snd_pcm_lock(pcm); */ /* forced lock due to pcm field change */ + /* __snd_pcm_lock(pcm->op_arg); */ /* forced lock due to pcm field change */ if (pcm->ops->nonblock) err = pcm->ops->nonblock(pcm->op_arg, nonblock); else @@ -809,7 +809,7 @@ int snd_pcm_nonblock(snd_pcm_t *pcm, int nonblock) pcm->mode &= ~SND_PCM_NONBLOCK; } unlock: - /* __snd_pcm_unlock(pcm); */ /* FIXME: see above */ + /* __snd_pcm_unlock(pcm->op_arg); */ /* FIXME: see above */ return err; } @@ -989,13 +989,13 @@ int snd_pcm_sw_params(snd_pcm_t *pcm, snd_pcm_sw_params_t *params) return -EINVAL; } #endif - __snd_pcm_lock(pcm); /* forced lock due to pcm field change */ + __snd_pcm_lock(pcm->op_arg); /* forced lock due to pcm field change */ if (pcm->ops->sw_params) err = pcm->ops->sw_params(pcm->op_arg, params); else err = -ENOSYS; if (err < 0) { - __snd_pcm_unlock(pcm); + __snd_pcm_unlock(pcm->op_arg); return err; } pcm->tstamp_mode = params->tstamp_mode; @@ -1008,7 +1008,7 @@ int snd_pcm_sw_params(snd_pcm_t *pcm, snd_pcm_sw_params_t *params) pcm->silence_threshold = params->silence_threshold; pcm->silence_size = params->silence_size; pcm->boundary = params->boundary; - __snd_pcm_unlock(pcm); + __snd_pcm_unlock(pcm->op_arg); return 0; } @@ -1025,12 +1025,12 @@ int snd_pcm_status(snd_pcm_t *pcm, snd_pcm_status_t *status) int err; assert(pcm && status); - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); if (pcm->fast_ops->status) err = pcm->fast_ops->status(pcm->fast_op_arg, status); else err = -ENOSYS; - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -1050,9 +1050,9 @@ snd_pcm_state_t snd_pcm_state(snd_pcm_t *pcm) snd_pcm_state_t state; assert(pcm); - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); state = __snd_pcm_state(pcm); - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return state; } @@ -1078,9 +1078,9 @@ int snd_pcm_hwsync(snd_pcm_t *pcm) SNDMSG("PCM not set up"); return -EIO; } - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); err = __snd_pcm_hwsync(pcm); - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -1123,9 +1123,9 @@ int snd_pcm_delay(snd_pcm_t *pcm, snd_pcm_sframes_t *delayp) SNDMSG("PCM not set up"); return -EIO; } - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); err = __snd_pcm_delay(pcm, delayp); - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -1181,12 +1181,12 @@ int snd_pcm_htimestamp(snd_pcm_t *pcm, snd_pcm_uframes_t *avail, snd_htimestamp_ SNDMSG("PCM not set up"); return -EIO; } - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); if (pcm->fast_ops->htimestamp) err = pcm->fast_ops->htimestamp(pcm->fast_op_arg, avail, tstamp); else err = -ENOSYS; - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -1209,12 +1209,12 @@ int snd_pcm_prepare(snd_pcm_t *pcm) err = bad_pcm_state(pcm, ~P_STATE(DISCONNECTED)); if (err < 0) return err; - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); if (pcm->fast_ops->prepare) err = pcm->fast_ops->prepare(pcm->fast_op_arg); else err = -ENOSYS; - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -1236,12 +1236,12 @@ int snd_pcm_reset(snd_pcm_t *pcm) SNDMSG("PCM not set up"); return -EIO; } - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); if (pcm->fast_ops->reset) err = pcm->fast_ops->reset(pcm->fast_op_arg); else err = -ENOSYS; - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -1264,9 +1264,9 @@ int snd_pcm_start(snd_pcm_t *pcm) err = bad_pcm_state(pcm, P_STATE(PREPARED)); if (err < 0) return err; - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); err = __snd_pcm_start(pcm); - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -1296,12 +1296,12 @@ int snd_pcm_drop(snd_pcm_t *pcm) P_STATE(SUSPENDED)); if (err < 0) return err; - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); if (pcm->fast_ops->drop) err = pcm->fast_ops->drop(pcm->fast_op_arg); else err = -ENOSYS; - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -1364,12 +1364,12 @@ int snd_pcm_pause(snd_pcm_t *pcm, int enable) err = bad_pcm_state(pcm, P_STATE_RUNNABLE); if (err < 0) return err; - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); if (pcm->fast_ops->pause) err = pcm->fast_ops->pause(pcm->fast_op_arg, enable); else err = -ENOSYS; - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -1397,12 +1397,12 @@ snd_pcm_sframes_t snd_pcm_rewindable(snd_pcm_t *pcm) err = bad_pcm_state(pcm, P_STATE_RUNNABLE); if (err < 0) return err; - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); if (pcm->fast_ops->rewindable) result = pcm->fast_ops->rewindable(pcm->fast_op_arg); else result = -ENOSYS; - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return result; } @@ -1430,12 +1430,12 @@ snd_pcm_sframes_t snd_pcm_rewind(snd_pcm_t *pcm, snd_pcm_uframes_t frames) err = bad_pcm_state(pcm, P_STATE_RUNNABLE); if (err < 0) return err; - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); if (pcm->fast_ops->rewind) result = pcm->fast_ops->rewind(pcm->fast_op_arg, frames); else result = -ENOSYS; - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return result; } @@ -1463,12 +1463,12 @@ snd_pcm_sframes_t snd_pcm_forwardable(snd_pcm_t *pcm) err = bad_pcm_state(pcm, P_STATE_RUNNABLE); if (err < 0) return err; - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); if (pcm->fast_ops->forwardable) result = pcm->fast_ops->forwardable(pcm->fast_op_arg); else result = -ENOSYS; - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return result; } @@ -1500,12 +1500,12 @@ snd_pcm_sframes_t snd_pcm_forward(snd_pcm_t *pcm, snd_pcm_uframes_t frames) err = bad_pcm_state(pcm, P_STATE_RUNNABLE); if (err < 0) return err; - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); if (pcm->fast_ops->forward) result = pcm->fast_ops->forward(pcm->fast_op_arg, frames); else result = -ENOSYS; - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return result; } use_default_symbol_version(__snd_pcm_forward, snd_pcm_forward, ALSA_0.9.0rc8); @@ -1724,9 +1724,9 @@ int snd_pcm_poll_descriptors_count(snd_pcm_t *pcm) int count; assert(pcm); - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); count = __snd_pcm_poll_descriptors_count(pcm); - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return count; } @@ -1782,9 +1782,9 @@ int snd_pcm_poll_descriptors(snd_pcm_t *pcm, struct pollfd *pfds, unsigned int s int err; assert(pcm && pfds); - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); err = __snd_pcm_poll_descriptors(pcm, pfds, space); - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -1817,9 +1817,9 @@ int snd_pcm_poll_descriptors_revents(snd_pcm_t *pcm, struct pollfd *pfds, unsign int err; assert(pcm && pfds && revents); - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); err = __snd_pcm_poll_revents(pcm, pfds, nfds, revents); - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -2816,9 +2816,9 @@ int snd_pcm_wait(snd_pcm_t *pcm, int timeout) { int err; - __snd_pcm_lock(pcm); /* forced lock */ + __snd_pcm_lock(pcm->fast_op_arg); /* forced lock */ err = __snd_pcm_wait_in_lock(pcm, timeout); - __snd_pcm_unlock(pcm); + __snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -2865,9 +2865,9 @@ int snd_pcm_wait_nocheck(snd_pcm_t *pcm, int timeout) return -EIO; } do { - __snd_pcm_unlock(pcm); + __snd_pcm_unlock(pcm->fast_op_arg); err_poll = poll(pfd, npfds, timeout); - __snd_pcm_lock(pcm); + __snd_pcm_lock(pcm->fast_op_arg); if (err_poll < 0) { if (errno == EINTR && !PCMINABORT(pcm)) continue; @@ -2926,9 +2926,9 @@ snd_pcm_sframes_t snd_pcm_avail_update(snd_pcm_t *pcm) { snd_pcm_sframes_t result; - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); result = __snd_pcm_avail_update(pcm); - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return result; } @@ -2956,13 +2956,13 @@ snd_pcm_sframes_t snd_pcm_avail(snd_pcm_t *pcm) SNDMSG("PCM not set up"); return -EIO; } - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); err = __snd_pcm_hwsync(pcm); if (err < 0) result = err; else result = __snd_pcm_avail_update(pcm); - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return result; } @@ -2989,7 +2989,7 @@ int snd_pcm_avail_delay(snd_pcm_t *pcm, SNDMSG("PCM not set up"); return -EIO; } - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); err = __snd_pcm_hwsync(pcm); if (err < 0) goto unlock; @@ -3004,7 +3004,7 @@ int snd_pcm_avail_delay(snd_pcm_t *pcm, *availp = sf; err = 0; unlock: - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -7192,9 +7192,9 @@ int snd_pcm_mmap_begin(snd_pcm_t *pcm, err = bad_pcm_state(pcm, P_STATE_RUNNABLE); if (err < 0) return err; - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); err = __snd_pcm_mmap_begin(pcm, areas, offset, frames); - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return err; } @@ -7297,9 +7297,9 @@ snd_pcm_sframes_t snd_pcm_mmap_commit(snd_pcm_t *pcm, err = bad_pcm_state(pcm, P_STATE_RUNNABLE); if (err < 0) return err; - snd_pcm_lock(pcm); + snd_pcm_lock(pcm->fast_op_arg); result = __snd_pcm_mmap_commit(pcm, offset, frames); - snd_pcm_unlock(pcm); + snd_pcm_unlock(pcm->fast_op_arg); return result; } @@ -7375,7 +7375,7 @@ snd_pcm_sframes_t snd_pcm_read_areas(snd_pcm_t *pcm, const snd_pcm_channel_area_ if (size == 0) return 0; - __snd_pcm_lock(pcm); /* forced lock */ + __snd_pcm_lock(pcm->fast_op_arg); /* forced lock */ while (size > 0) { snd_pcm_uframes_t frames; snd_pcm_sframes_t avail; @@ -7434,7 +7434,7 @@ snd_pcm_sframes_t snd_pcm_read_areas(snd_pcm_t *pcm, const snd_pcm_channel_area_ xfer += frames; } _end: - __snd_pcm_unlock(pcm); + __snd_pcm_unlock(pcm->fast_op_arg); return xfer > 0 ? (snd_pcm_sframes_t) xfer : snd_pcm_check_error(pcm, err); } @@ -7449,7 +7449,7 @@ snd_pcm_sframes_t snd_pcm_write_areas(snd_pcm_t *pcm, const snd_pcm_channel_area if (size == 0) return 0; - __snd_pcm_lock(pcm); /* forced lock */ + __snd_pcm_lock(pcm->fast_op_arg); /* forced lock */ while (size > 0) { snd_pcm_uframes_t frames; snd_pcm_sframes_t avail; @@ -7523,7 +7523,7 @@ snd_pcm_sframes_t snd_pcm_write_areas(snd_pcm_t *pcm, const snd_pcm_channel_area xfer += frames; } _end: - __snd_pcm_unlock(pcm); + __snd_pcm_unlock(pcm->fast_op_arg); return xfer > 0 ? (snd_pcm_sframes_t) xfer : snd_pcm_check_error(pcm, err); }
On Tue, 24 Sep 2019 12:18:24 +0200 Takashi Iwai <tiwai@suse.de> wrote: > On Tue, 24 Sep 2019 09:45:25 +0200, > Ben Russell wrote: > > > > On Sun, 22 Sep 2019 22:25:48 +0200 > > Takashi Iwai <tiwai@suse.de> wrote: > > > > > On Sun, 22 Sep 2019 05:28:50 +0200, > > > Ben Russell wrote: > > > > > > > > This is my first time contributing a patch to a mailing list. If I've made a mess, please let me know so I can learn how to avoid it in future. > > > > > > > > The purpose of this patchset is to fix a specific common lockup in the pcm_jack plugin (from alsa-plugins). I'm not sure if this is the right approach to take, but at the very least it should make the pcm_ioplug code a bit more resilient. > > > > > > > > The problem is this: When using the pcm_jack plugin, if a program attempts to play audio using the SND_PCM_FORMAT_FLOAT format, said program locks up. > > > > > > > > This should be enough to reproduce the bug: > > > > > > > > pcm.rawjack { > > > > type jack > > > > playback_ports { > > > > 0 system:playback_1 > > > > 1 system:playback_2 > > > > } > > > > capture_ports { > > > > 0 system:capture_1 > > > > 1 system:capture_2 > > > > } > > > > } > > > > > > > > pcm.!default { > > > > type plug > > > > slave.pcm "rawjack" > > > > } > > > > > > > > What's happening is that several snd_pcm_ioplug_* functions assume that the pcm mutex is locked already. It then proceeds to unlock the mutex, call a function, and then relock the mutex. When the mutex isn't locked already, the initial unlock results in a silently ignored pthread error, and the lock results in the program eventually deadlocking as it doesn't expect the mutex to be held at that point. > > > > > > > > Patch 2 modifies pcm_ioplug to check if the mutex is held before doing the unlock-act-lock sequence, and if the mutex is not held then it skips the unlock and lock stages. This depends on Patch 1, which adds a snd_pcm_is_locked function to give the state of the mutex. > > > > > > > > Patch 3 is completely optional. It adds assertions which make sure that all uses of snd_pcm_lock/snd_pcm_unlock are correct. On one hand this will likely result in crashes in some of the less refined parts of the code. On the other hand, when that happens, you'll know which parts need a bit more love. I know it was useful for finding this issue in the first place. > > > > > > > > These patches fix the problems I am having, but if you have a more suitable approach to fixing this problem then please let me know. > > > > > > Thanks for the analysis and patches. It's indeed a serious problem we > > > need to address. > > > > > > The current semantics of locking in alsa-lib code assumes that the > > > lock/unlock never fails. So the "right-and-quick" fix would be just > > > to take the patch 3 to assert() upon pthread_mutex_lock/unlock > > > errors. Then we need to paper over the actual invalid locks. > > > > > > I do wonder, though, exactly which code path triggers the pthread > > > mutex error? You should be able to catch it via gdb after the patch. > > > > > > > > > thanks, > > > > > > Takashi > > > > Sure thing. With patch #3 against commit 1d7a131f, this trips up on an assert: > > > > $ gdb --args mpv --ao=alsa --audio-format=float RESPECOGNIZE.mp3 > > > > The relevant backtrace: > > > > #4 0x00007ffff4b06cf3 in snd_pcm_unlock (pcm=0x5555558625c0) at pcm_local.h:1187 > > #5 0x00007ffff4b075b3 in snd_pcm_unlock (pcm=0x5555558625c0) at pcm_local.h:1187 > > if (pcm->lock_enabled && pcm->need_lock) { > > unlock_err = pthread_mutex_unlock(&pcm->lock); > > assert(unlock_err == 0); <-- this line > > } > > #6 snd_pcm_ioplug_prepare (pcm=0x5555558625c0) at pcm_ioplug.c:167 > > snd_pcm_ioplug_reset(pcm); > > if (io->data->callback->prepare) { > > snd_pcm_unlock(pcm); /* to avoid deadlock */ <-- this line > > err = io->data->callback->prepare(io->data); > > snd_pcm_lock(pcm); > > } > > #7 0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x5555558627c0) at pcm.c:1214 > > snd_pcm_lock(pcm); > > if (pcm->fast_ops->prepare) > > err = pcm->fast_ops->prepare(pcm->fast_op_arg); <-- this line > > else > > err = -ENOSYS; > > #8 0x00005555555a2efa in ?? () > > > > Using --audio-format=s16 does not deadlock, as it goes via pcm_plugin before it can go via pcm_ioplug (it's most likely for sample format conversion): > > > > #0 0x00007ffff4b07522 in snd_pcm_unlock (pcm=<optimized out>) at pcm_ioplug.c:166 > > #1 snd_pcm_ioplug_prepare (pcm=0x555555862640) at pcm_ioplug.c:167 > > #2 0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x555555862640) at pcm.c:1214 > > #3 0x00007ffff4ad2ed7 in snd_pcm_plugin_prepare (pcm=0x555555862e50) at pcm_plugin.c:159 > > snd_pcm_plugin_t *plugin = pcm->private_data; > > int err; > > err = snd_pcm_prepare(plugin->gen.slave); <-- this line > > if (err < 0) > > return err; > > #4 0x00007ffff4abcba0 in snd_pcm_prepare (pcm=0x555555862840) at pcm.c:1214 > > #5 0x00005555555a2efa in ?? () > > Thanks, after reading the traces carefully, finally I figured out what > went wrong. In some cases, pcm->fast_op_arg isn't pcm itself but has > a different value. Then we call snd_pcm_lock() with pcm, but call > fast_op with fast_op_arg as the PCM object. This leads to the > unexpected behavior for the plugin that tries to unlock / relock from > the given PCM object. > > So, we'd need to convert snd_pcm_lock()/unlock() with fast_op_arg or > op_arg in most places. > > A test fix patch is below. Could you test it quickly? > I'm going to traveling from tomorrow, so I'd like to have confirmation > before merging into git repo. > > > thanks, > > Takashi > > -- 8< -- The patch works fine. mpv works for float and s16, Wine works, Mumble works for capture and playback. Thanks for tracking this down so quickly. Regards, Ben R