Message ID | 1433346797-1908-1-git-send-email-oder_chiou@realtek.com (mailing list archive) |
---|---|
State | New, archived |
On Wed, Jun 03, 2015 at 11:53:17PM +0800, Oder Chiou wrote:

> The patch is to avoid recursive locking in the DEADLOCK detector.
> In the driver, it encountered the warning message of the recursive locking
> in the function "regmap_lock_mutex".

Could you be more specific about the deadlock you're seeing? This isn't really enough to understand either what the problem is or why this fixes it. Converting all the per-regmap mutexes into a single global mutex isn't an immediately obvious step.
Hi Mark,

On Thu, Jun 4, 2015 at 12:44 AM, Mark Brown <broonie@kernel.org> wrote:
>
> On Wed, Jun 03, 2015 at 11:53:17PM +0800, Oder Chiou wrote:
>
> > The patch is to avoid recursive locking in the DEADLOCK detector.
> > In the driver, it encountered the warning message of the recursive locking
> > in the function "regmap_lock_mutex".
>
> Could you be more specific about the deadlock you're seeing? This isn't
> really enough to understand either what the problem is or why this fixes
> it. Converting all the per-regmap mutexes into a single global mutex
> isn't an immediately obvious step.

We originally reported the issue to Realtek:

[ 2.569449] =============================================
[ 2.569451] [ INFO: possible recursive locking detected ]
[ 2.569454] 3.18.0 #311 Tainted: G S
[ 2.569456] ---------------------------------------------
[ 2.569458] swapper/0/1 is trying to acquire lock:
[ 2.569469] (&map->mutex){+.+...}, at: [<ffffffc00037dba0>] regmap_lock_mutex+0x10/0x18
[ 2.569470]
[ 2.569470] but task is already holding lock:
[ 2.569476] (&map->mutex){+.+...}, at: [<ffffffc00037dba0>] regmap_lock_mutex+0x10/0x18
[ 2.569478]
[ 2.569478] other info that might help us debug this:
[ 2.569479] Possible unsafe locking scenario:
[ 2.569479]
[ 2.569480] CPU0
[ 2.569481] ----
[ 2.569484] lock(&map->mutex);
[ 2.569486] lock(&map->mutex);
[ 2.569487]
[ 2.569487] *** DEADLOCK ***
[ 2.569487]
[ 2.569489] May be due to missing lock nesting notation
[ 2.569489]
[ 2.569491] 3 locks held by swapper/0/1:
[ 2.569499] #0: (&dev->mutex){......}, at: [<ffffffc000369e80>] __driver_attach+0x38/0x98
[ 2.569505] #1: (&dev->mutex){......}, at: [<ffffffc000369ea0>] __driver_attach+0x58/0x98
[ 2.569512] #2: (&map->mutex){+.+...}, at: [<ffffffc00037dba0>] regmap_lock_mutex+0x10/0x18
[ 2.569513]
[ 2.569513] stack backtrace:
[ 2.569517] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G S 3.18.0 #311
[ 2.569521] Call trace:
[ 2.569527] [<ffffffc000089198>] dump_backtrace+0x0/0x108
[ 2.569530] [<ffffffc0000892b0>] show_stack+0x10/0x1c
[ 2.569535] [<ffffffc0005e09b4>] dump_stack+0x74/0x94
[ 2.569540] [<ffffffc0000d5d1c>] __lock_acquire+0x73c/0x1910
[ 2.569544] [<ffffffc0000d7674>] lock_acquire+0xec/0x128
[ 2.569548] [<ffffffc0005e3870>] mutex_lock_nested+0x58/0x354
[ 2.569551] [<ffffffc00037db9c>] regmap_lock_mutex+0xc/0x18
[ 2.569554] [<ffffffc00037ef50>] regmap_read+0x34/0x68
[ 2.569559] [<ffffffc0004ddc84>] rt5677_read+0x9c/0xb4
[ 2.569562] [<ffffffc00037ee8c>] _regmap_read+0xa8/0x138
[ 2.569565] [<ffffffc00037ef60>] regmap_read+0x44/0x68
[ 2.569569] [<ffffffc0004dd6a4>] rt5677_i2c_probe+0x25c/0x4a4
[ 2.569574] [<ffffffc00042984c>] i2c_device_probe+0xfc/0x130
[ 2.569577] [<ffffffc000369c58>] driver_probe_device+0xd4/0x23c
[ 2.569580] [<ffffffc000369eb0>] __driver_attach+0x68/0x98
[ 2.569584] [<ffffffc000368dc8>] bus_for_each_dev+0x70/0x90
[ 2.569588] [<ffffffc0003697e4>] driver_attach+0x1c/0x28
[ 2.569591] [<ffffffc000369400>] bus_add_driver+0xd8/0x1e0
[ 2.569594] [<ffffffc00036a7f0>] driver_register+0xbc/0x10c
[ 2.569598] [<ffffffc00042a4bc>] i2c_register_driver+0x48/0xac
[ 2.569601] [<ffffffc0008d523c>] rt5677_i2c_driver_init+0x14/0x20
[ 2.569605] [<ffffffc0000828dc>] do_one_initcall+0xf4/0x18c
[ 2.569609] [<ffffffc0008a4ae8>] kernel_init_freeable+0x144/0x1e4
[ 2.569613] [<ffffffc0005de3a4>] kernel_init+0x10/0xd4

It's actually a false alarm: the warning is triggered because two locks are acquired from the same location:
- rt5677_i2c_probe calls regmap_read(rt5677->regmap, RT5677_VENDOR_ID2, &val);
- This locks rt5677->regmap->mutex (1st lock), then calls rt5677_read.
- Since rt5677->is_dsp_mode is false, regmap_read(rt5677->regmap_physical, reg, val); is called.
- This locks rt5677->regmap_physical->mutex (2nd lock).
- The value is read, then both locks are released, in reverse order.

AFAIK there is no code that would acquire the locks in the reverse order, so I don't think this can deadlock.

I suggested reworking the register read/write calls in rt5677.c to direct them to the correct regmap earlier on (rt5677->regmap or rt5677->regmap_physical), before locks are acquired. But the patch above also fixes the issue (that is, it removes the warning).

Thanks,

Nicolas
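To make the two call paths described above concrete, here is a minimal userspace sketch, assuming pthread mutexes as stand-ins for the kernel mutexes. Every `toy_*` name and `TOY_DEVICE_ID` is hypothetical and is not part of the driver or the regmap API, and the `is_dsp_mode` check is left out. The nested path takes the outer map's lock and then the inner map's lock, as in the trace; the routed path picks the target map first and takes only one lock, which is roughly what the suggested rework would do.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_DEVICE_ID 0x1234u /* hypothetical value, not the real RT5677 ID */

/* Hypothetical stand-in for a regmap instance: one mutex per instance. */
struct toy_regmap {
	pthread_mutex_t mutex;
	unsigned int reg_value;      /* value the pretend hardware returns */
	struct toy_regmap *physical; /* inner map used by the read callback, or NULL */
};

static int toy_held;     /* locks held right now */
static int toy_max_held; /* deepest nesting seen so far */

static void toy_lock(struct toy_regmap *map)
{
	int err = pthread_mutex_lock(&map->mutex);

	assert(err == 0);
	(void)err;
	if (++toy_held > toy_max_held)
		toy_max_held = toy_held;
}

static void toy_unlock(struct toy_regmap *map)
{
	toy_held--;
	pthread_mutex_unlock(&map->mutex);
}

/* Nested path: the outer map is locked while its callback reads the inner map. */
static unsigned int toy_read_nested(struct toy_regmap *map)
{
	unsigned int val;

	toy_lock(map);
	val = map->physical ? toy_read_nested(map->physical) : map->reg_value;
	toy_unlock(map);
	return val;
}

/* Routed path: choose the map that will do the read before taking any lock. */
static unsigned int toy_read_routed(struct toy_regmap *map)
{
	struct toy_regmap *target = map->physical ? map->physical : map;
	unsigned int val;

	toy_lock(target);
	val = target->reg_value;
	toy_unlock(target);
	return val;
}

/* Builds both maps, performs one read, and returns the deepest lock nesting. */
static int toy_max_depth(bool routed)
{
	struct toy_regmap physical = { .reg_value = TOY_DEVICE_ID, .physical = NULL };
	struct toy_regmap outer = { .reg_value = 0, .physical = &physical };
	unsigned int val;

	pthread_mutex_init(&physical.mutex, NULL);
	pthread_mutex_init(&outer.mutex, NULL);
	toy_held = 0;
	toy_max_held = 0;

	val = routed ? toy_read_routed(&outer) : toy_read_nested(&outer);
	assert(val == TOY_DEVICE_ID);
	(void)val;

	pthread_mutex_destroy(&outer.mutex);
	pthread_mutex_destroy(&physical.mutex);
	return toy_max_held;
}
```

In this model `toy_max_depth(false)` holds two distinct mutexes at once, always in the same order, while `toy_max_depth(true)` never holds more than one. Neither path can deadlock on its own; the difference only matters to a checker that cannot tell the two mutexes apart.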
On Fri, Jun 05, 2015 at 08:53:54AM +0800, Nicolas Boichat wrote:

> > Could you be more specific about the deadlock you're seeing? This isn't
> > really enough to understand either what the problem is or why this fixes
> > it. Converting all the per-regmap mutexes into a single global mutex
> > isn't an immediately obvious step.

> We originally reported the issue to Realtek:

Any analysis needs to be in the changelog for the commit and...

> I suggested reworking the register read/write calls in rt5677.c to
> direct them to the correct regmap earlier on (rt5677->regmap or
> rt5677->regmap_physical), before locks are acquired. But the patch
> above also fixes the issue (that is, it removes the warning).

...the above sounds like there's a bug in the locking anyway which this is just a bodge for?
On 06/05/2015 12:31 PM, Mark Brown wrote:
> On Fri, Jun 05, 2015 at 08:53:54AM +0800, Nicolas Boichat wrote:
>
>>> Could you be more specific about the deadlock you're seeing? This isn't
>>> really enough to understand either what the problem is or why this fixes
>>> it. Converting all the per-regmap mutexes into a single global mutex
>>> isn't an immediately obvious step.
>
>> We originally reported the issue to Realtek:
>
> Any analysis needs to be in the changelog for the commit and...
>
>> I suggested reworking the register read/write calls in rt5677.c to
>> direct them to the correct regmap earlier on (rt5677->regmap or
>> rt5677->regmap_physical), before locks are acquired. But the patch
>> above also fixes the issue (that is, it removes the warning).
>
> ...the above sounds like there's a bug in the locking anyway which this
> is just a bodge for?

It's the same issue Antti reported a while ago. The issue is that lockdep, for performance reasons, does not look at each lock separately, but only at lock classes. By default all locks initialized by the same mutex_init() call end up in the same lock class. When one lock of a lock class is locked while already holding a lock from the same lock class, lockdep complains about a potential deadlock because to lockdep those look like the same lock. For most locks this is OK since they do not recursively lock locks from the same lock class.

Now the issue here is that we have nested regmap instances, meaning one regmap instance uses another instance in its read/write implementation. This will lead to nested locking of the mutex of the regmap struct. Since both mutexes are in the same lock class, lockdep generates a warning.

Rather than silencing the warning with some per-driver hacks, the correct way to fix this is to allow the locks to be properly annotated as being different locks. I think Antti submitted some patches to attempt to fix this, but there were still issues with the patches and they never got merged.

- Lars
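The class-based check described above can be illustrated with a toy model. This is only a sketch of the idea, assuming a checker that compares class keys rather than lock addresses; it is not how lockdep is implemented, and every `toy_*` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

/* Hypothetical class key: its identity is its address. */
struct toy_class_key {
	char unused;
};

/* A lock only remembers which class it was initialized with. */
struct toy_lock {
	const struct toy_class_key *key;
};

/* Per-task stack of the classes currently held. */
struct toy_task {
	const struct toy_class_key *held[TOY_MAX_HELD];
	size_t depth;
};

/* Returns true if the checker would report possible recursive locking. */
static bool toy_acquire(struct toy_task *task, const struct toy_lock *lock)
{
	bool warn = false;
	size_t i;

	/* Compare classes only; two different locks of one class look identical. */
	for (i = 0; i < task->depth; i++)
		if (task->held[i] == lock->key)
			warn = true;

	assert(task->depth < TOY_MAX_HELD);
	task->held[task->depth++] = lock->key;
	return warn;
}

static void toy_release(struct toy_task *task)
{
	assert(task->depth > 0);
	task->depth--;
}

/* Example keys: one shared by both maps, one reserved for the inner map. */
static struct toy_class_key toy_shared_key;
static struct toy_class_key toy_physical_key;

/* Takes two distinct locks, nested, and reports whether the checker warned. */
static bool toy_nested_warns(const struct toy_class_key *outer_key,
			     const struct toy_class_key *inner_key)
{
	struct toy_task task = { .depth = 0 };
	struct toy_lock outer = { .key = outer_key };
	struct toy_lock inner = { .key = inner_key };
	bool warned;

	warned = toy_acquire(&task, &outer);
	warned = toy_acquire(&task, &inner) || warned;
	toy_release(&task);
	toy_release(&task);
	return warned;
}
```

With both locks in one class, `toy_nested_warns(&toy_shared_key, &toy_shared_key)` reports a warning even though `outer` and `inner` are different objects; giving the inner lock its own key, as in `toy_nested_warns(&toy_shared_key, &toy_physical_key)`, does not. Under this model the posted patch would silence the warning for the same reason: its mutex is initialized by a separate mutex_init() call in rt5677.c, so it would land in a different class from the mutex regmap initializes internally.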
On Fri, Jun 05, 2015 at 12:53:27PM +0200, Lars-Peter Clausen wrote:

> Now the issue here is that we have nested regmap instances, meaning one
> regmap instance uses another instance in its read/write implementation.

Ah, that sounds more familiar than unlocking in different orders.

> I think Antti submitted some patches to attempt to fix this, but there were
> still issues with the patches and they never got merged.

IIRC his patch was adding the ability for drivers to override the lock class, which wasn't exactly fixing the problem but rather providing a different way to try to work around it in drivers.
diff --git a/sound/soc/codecs/rt5677.c b/sound/soc/codecs/rt5677.c
index 31d969a..ee5b570 100644
--- a/sound/soc/codecs/rt5677.c
+++ b/sound/soc/codecs/rt5677.c
@@ -4998,6 +4998,22 @@ static const struct regmap_config rt5677_regmap_physical = {
 	.num_ranges = ARRAY_SIZE(rt5677_ranges),
 };
 
+static void rt5677_regmap_lock_mutex(void *__lock)
+{
+	struct mutex *lock = __lock;
+
+	mutex_lock(lock);
+}
+
+static void rt5677_regmap_unlock_mutex(void *__lock)
+{
+	struct mutex *lock = __lock;
+
+	mutex_unlock(lock);
+}
+
+static struct mutex rt5677_regmap_lock;
+
 static const struct regmap_config rt5677_regmap = {
 	.reg_bits = 8,
 	.val_bits = 16,
@@ -5009,6 +5025,9 @@ static const struct regmap_config rt5677_regmap = {
 	.readable_reg = rt5677_readable_register,
 	.reg_read = rt5677_read,
 	.reg_write = rt5677_write,
+	.lock = rt5677_regmap_lock_mutex,
+	.unlock = rt5677_regmap_unlock_mutex,
+	.lock_arg = &rt5677_regmap_lock,
 
 	.cache_type = REGCACHE_RBTREE,
 	.reg_defaults = rt5677_reg,
@@ -5206,6 +5225,8 @@ static int rt5677_i2c_probe(struct i2c_client *i2c,
 		return ret;
 	}
 
+	mutex_init(&rt5677_regmap_lock);
+
 	regmap_read(rt5677->regmap, RT5677_VENDOR_ID2, &val);
 	if (val != RT5677_DEVICE_ID) {
 		dev_err(&i2c->dev,
The patch is to avoid recursive locking in the DEADLOCK detector.
In the driver, it encountered the warning message of the recursive locking
in the function "regmap_lock_mutex".

Signed-off-by: Oder Chiou <oder_chiou@realtek.com>
---
 sound/soc/codecs/rt5677.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)