Message ID: 53A22A01.7080505@hp.com (mailing list archive)
State: New, archived
On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long <waiman.long@hp.com> wrote:
> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>> On Wed, Jun 18, 2014 at 7:53 PM, Chris Mason <clm@fb.com> wrote:
>>> On 06/18/2014 07:30 PM, Waiman Long wrote:
>>>> On 06/18/2014 07:27 PM, Chris Mason wrote:
>>>>> On 06/18/2014 07:19 PM, Waiman Long wrote:
>>>>>> On 06/18/2014 07:10 PM, Josef Bacik wrote:
>>>>>>> On 06/18/2014 03:47 PM, Waiman Long wrote:
>>>>>>>> On 06/18/2014 06:27 PM, Josef Bacik wrote:
>>>>>>>>> On 06/18/2014 03:17 PM, Waiman Long wrote:
>>>>>>>>>> On 06/18/2014 04:57 PM, Marc Dionne wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I've been seeing very reproducible soft lockups with 3.16-rc1
>>>>>>>>>>> similar to what is reported here:
>>>>>>>>>>>
>>>>>>>>>>> http://marc.info/?l=linux-btrfs&m=140290088532203&w=2
>>>>>>>>>>>
>>>>>>>>>>> along with the occasional hard lockup, making it impossible to
>>>>>>>>>>> complete a parallel build on a btrfs filesystem for the package
>>>>>>>>>>> I work on. This was working fine just a few days before rc1.
>>>>>>>>>>>
>>>>>>>>>>> Bisecting brought me to the following commit:
>>>>>>>>>>>
>>>>>>>>>>> commit bd01ec1a13f9a327950c8e3080096446c7804753
>>>>>>>>>>> Author: Waiman Long <Waiman.Long@hp.com>
>>>>>>>>>>> Date:   Mon Feb 3 13:18:57 2014 +0100
>>>>>>>>>>>
>>>>>>>>>>>     x86, locking/rwlocks: Enable qrwlocks on x86
>>>>>>>>>>>
>>>>>>>>>>> And sure enough, if I revert that commit on top of current
>>>>>>>>>>> mainline, I'm unable to reproduce the soft lockups and hangs.
>>>>>>>>>>>
>>>>>>>>>>> Marc
>>>>>>>>>>
>>>>>>>>>> The queue rwlock is fair. As a result, a recursive read_lock is
>>>>>>>>>> not allowed unless the task is in an interrupt context. Doing a
>>>>>>>>>> recursive read_lock will hang the process when a write_lock
>>>>>>>>>> happens somewhere in between. Are recursive read_locks being
>>>>>>>>>> done in the btrfs code?
>>>>>>>>>
>>>>>>>>> We walk down a tree and read lock each node as we walk down, is
>>>>>>>>> that what you mean? Or do you mean read_lock multiple times on
>>>>>>>>> the same lock in the same process, cause we definitely don't do
>>>>>>>>> that. Thanks,
>>>>>>>>>
>>>>>>>>> Josef
>>>>>>>>
>>>>>>>> I meant recursively read_locking the same lock in a process.
>>>>>>>
>>>>>>> I take it back, we do actually do this in some cases. Thanks,
>>>>>>>
>>>>>>> Josef
>>>>>>
>>>>>> This is what I thought when I looked at the locking code in btrfs.
>>>>>> The unlock code doesn't clear the lock_owner pid; this may cause
>>>>>> lock_nested to be set incorrectly.
>>>>>>
>>>>>> Anyway, are you going to do something about it?
>>>>>
>>>>> Thanks for reporting this, we shouldn't actually be taking the lock
>>>>> recursively. Could you please try with lockdep enabled? If the
>>>>> problem goes away with lockdep on, I think I know what's causing it.
>>>>> Otherwise, lockdep should clue us in.
>>>>>
>>>>> -chris
>>>>
>>>> I am not sure if lockdep will report recursive read_lock, as this was
>>>> possible in the past. If not, we certainly need to add that capability
>>>> to it.
>>>>
>>>> One more thing: I saw a comment in the btrfs tree locking code about
>>>> taking a read lock after taking a write (partial?) lock. That is not
>>>> possible even with the old rwlock code.
>>>
>>> With lockdep on, the clear_path_blocking function you're hitting
>>> softlockups in is different. Fujitsu hit a similar problem during
>>> quota rescans, and it goes away with lockdep on. I'm trying to nail
>>> down where we went wrong, but please try lockdep on.
>>>
>>> -chris
>>
>> With lockdep on I'm unable to reproduce the lockups, and there are no
>> lockdep warnings.
>>
>> Marc
>
> Enabling lockdep may change the lock timing, making it hard to reproduce
> the problem. Anyway, could you try applying the following patch to see
> if it shows any warning?
>
> -Longman
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index d24e433..b6c9f2e 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1766,12 +1766,22 @@ check_deadlock(struct task_struct *curr, struct held_loc
>  		if (hlock_class(prev) != hlock_class(next))
>  			continue;
>
> +#ifdef CONFIG_QUEUE_RWLOCK
> +		/*
> +		 * Queue rwlock only allows read-after-read recursion of the
> +		 * same lock class when the latter read is in an interrupt
> +		 * context.
> +		 */
> +		if ((read == 2) && prev->read && in_interrupt())
> +			return 2;
> +#else
>  		/*
>  		 * Allow read-after-read recursion of the same
>  		 * lock class (i.e. read_lock(lock)+read_lock(lock)):
>  		 */
>  		if ((read == 2) && prev->read)
>  			return 2;
> +#endif
>
>  		/*
>  		 * We're holding the nest_lock, which serializes this lock's
> @@ -1852,8 +1862,10 @@ check_prev_add(struct task_struct *curr, struct held_lock
>  	 * write-lock never takes any other locks, then the reads are
>  	 * equivalent to a NOP.
>  	 */
> +#ifndef CONFIG_QUEUE_RWLOCK
>  	if (next->read == 2 || prev->read == 2)
>  		return 1;
> +#endif
>  	/*
>  	 * Is the <prev> -> <next> dependency already present?
>  	 *

I still don't see any warnings with this patch added. Also tried along
with removing a couple of ifdefs on CONFIG_DEBUG_LOCK_ALLOC in
btrfs/ctree.c - still unable to generate any warnings or lockups.

Marc
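To make the fairness point above concrete: the following is a
stand-alone userspace sketch, not kernel code, that models the fair
queued rwlock with a writer-preferring POSIX rwlock. All names in it
are made up for illustration. Run as written (gcc -pthread), it
deadlocks on the second rdlock exactly as described in the thread.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_rwlock_t lock;

    static void *writer(void *arg)
    {
        (void)arg;
        sleep(1);                       /* let main take its first read lock */
        pthread_rwlock_wrlock(&lock);   /* queues behind the held read lock */
        pthread_rwlock_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_rwlockattr_t attr;
        pthread_t t;

        /* A writer-preferring rwlock stands in for the fair qrwlock. */
        pthread_rwlockattr_init(&attr);
        pthread_rwlockattr_setkind_np(&attr,
                PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
        pthread_rwlock_init(&lock, &attr);

        pthread_create(&t, NULL, writer, NULL);

        pthread_rwlock_rdlock(&lock);   /* first read lock: granted */
        sleep(2);                       /* writer is now queued and waiting */
        pthread_rwlock_rdlock(&lock);   /* recursive read lock: blocks behind
                                           the queued writer -> deadlock */
        printf("never reached\n");
        return 0;
    }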
On Wed, Jun 18, 2014 at 8:41 PM, Marc Dionne <marc.c.dionne@gmail.com> wrote:
> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long <waiman.long@hp.com> wrote:
>> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>> [...]
>> Enabling lockdep may change the lock timing, making it hard to
>> reproduce the problem. Anyway, could you try applying the following
>> patch to see if it shows any warning?
>>
>> -Longman
>> [patch snipped]
>
> I still don't see any warnings with this patch added. Also tried along
> with removing a couple of ifdefs on CONFIG_DEBUG_LOCK_ALLOC in
> btrfs/ctree.c - still unable to generate any warnings or lockups.
>
> Marc

And for an additional data point, just removing those
CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
the symptoms when lockdep is not enabled.

Marc
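The ifdefs being removed here appear to sit in fs/btrfs/ctree.c around
btrfs_clear_path_blocking(), the function showing up in the softlockup
traces. Below is a simplified sketch of their shape - not the verbatim
3.16 source - to show why compiling them out changes the locking order
seen at runtime:

    /*
     * Simplified sketch (not verbatim source) of the ifdefs in question.
     * With lockdep configured in, 'held' is switched to its blocking form
     * before the path's locks are flipped back to spinning; with lockdep
     * off, both steps compile away and a different lock order results.
     */
    static noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
                                                   struct extent_buffer *held,
                                                   int held_rw)
    {
        int i;

    #ifdef CONFIG_DEBUG_LOCK_ALLOC
        /* lockdep wants these spinlocks taken in a fixed order */
        if (held)
                btrfs_set_lock_blocking_rw(held, held_rw);
        btrfs_set_path_blocking(p);
    #endif

        for (i = BTRFS_MAX_LEVEL - 1; i >= 0; i--) {
                if (p->nodes[i] && p->locks[i])
                        btrfs_clear_lock_blocking_rw(p->nodes[i], p->locks[i]);
        }

    #ifdef CONFIG_DEBUG_LOCK_ALLOC
        if (held)
                btrfs_clear_lock_blocking_rw(held, held_rw);
    #endif
    }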
On 06/18/2014 10:03 PM, Marc Dionne wrote:
> On Wed, Jun 18, 2014 at 8:41 PM, Marc Dionne <marc.c.dionne@gmail.com> wrote:
>> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long <waiman.long@hp.com> wrote:
>>> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>
> And for an additional data point, just removing those
> CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
> the symptoms when lockdep is not enabled.

Ok, somehow we've added a lock inversion here that wasn't here before.
Thanks for confirming, I'll nail it down.

-chris
On 06/18/2014 10:11 PM, Chris Mason wrote:
> On 06/18/2014 10:03 PM, Marc Dionne wrote:
>> On Wed, Jun 18, 2014 at 8:41 PM, Marc Dionne <marc.c.dionne@gmail.com> wrote:
>>> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long <waiman.long@hp.com> wrote:
>>>> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>> And for an additional data point, just removing those
>> CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
>> the symptoms when lockdep is not enabled.
> Ok, somehow we've added a lock inversion here that wasn't here before.
> Thanks for confirming, I'll nail it down.
>
> -chris

I am pretty sure that the hangup is caused by the following kind of code
fragment in the locking.c file:

	if (eb->lock_nested) {
		read_lock(&eb->lock);
		if (eb->lock_nested && current->pid == eb->lock_owner) {

Is it possible to do the check without taking the read_lock?

-Longman
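One way to answer that question, sketched here as a hypothetical
rearrangement rather than the fix that was eventually merged: the pid
test can run before any locking, because only the task that owns the
write lock can ever see its own pid in eb->lock_owner.

    /*
     * Hypothetical sketch only -- not the merged fix.  The recursion
     * check is done without read_lock(&eb->lock): taking that lock is
     * what can queue this task behind a waiting writer and deadlock.
     * Reading lock_owner unlocked is safe for this purpose, since only
     * the current task could have stored current->pid there; every
     * other task simply sees a pid that is not its own.
     */
    static int eb_is_nested_by_me(const struct extent_buffer *eb)
    {
            return eb->lock_nested && current->pid == eb->lock_owner;
    }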
On 06/18/2014 11:21 PM, Waiman Long wrote:
> On 06/18/2014 10:11 PM, Chris Mason wrote:
>> On 06/18/2014 10:03 PM, Marc Dionne wrote:
>>> [...]
>>> And for an additional data point, just removing those
>>> CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
>>> the symptoms when lockdep is not enabled.
>> Ok, somehow we've added a lock inversion here that wasn't here before.
>> Thanks for confirming, I'll nail it down.
>>
>> -chris
>
> I am pretty sure that the hangup is caused by the following kind of code
> fragment in the locking.c file:
>
> 	if (eb->lock_nested) {
> 		read_lock(&eb->lock);
> 		if (eb->lock_nested && current->pid == eb->lock_owner) {
>
> Is it possible to do the check without taking the read_lock?

I think you're right, we haven't added any new recursive takers of the
lock. The path where we are deadlocking has an extent buffer, not yet
in the path, that is locked. I think we're taking the read lock while
that one is write locked.

Reworking the nesting a bit here.

-chris
On 06/19/2014 12:51 PM, Chris Mason wrote:
> On 06/18/2014 11:21 PM, Waiman Long wrote:
>> [...]
>> I am pretty sure that the hangup is caused by the following kind of
>> code fragment in the locking.c file:
>>
>> 	if (eb->lock_nested) {
>> 		read_lock(&eb->lock);
>> 		if (eb->lock_nested && current->pid == eb->lock_owner) {
>>
>> Is it possible to do the check without taking the read_lock?
> I think you're right, we haven't added any new recursive takers of the
> lock. The path where we are deadlocking has an extent buffer, not yet
> in the path, that is locked. I think we're taking the read lock while
> that one is write locked.
>
> Reworking the nesting a bit here.
>
> -chris

I would like to take back my comments. I took out the read_lock, but the
process still hangs while doing file activity on a btrfs filesystem. So
the problem is trickier than I thought. Below are the stack backtraces
of some of the relevant processes.

-Longman

INFO: rcu_sched self-detected stall on CPU { 0} (t=21000 jiffies g=4633 c=4632 q=8579)
INFO: rcu_sched self-detected stall on CPU { 10} (t=21000 jiffies g=4633 c=4632 q=8579)
INFO: rcu_sched self-detected stall on CPU { 20} (t=21000 jiffies g=4633 c=4632 q=8579)
sending NMI to all CPUs:

NMI backtrace for cpu 0
CPU: 0 PID: 559 Comm: kworker/u65:8 Tainted: G E 3.16.0-rc1 #3
Hardware name: HP ProLiant DL380 G6, BIOS P62 01/30/2011
Workqueue: btrfs-endio-write normal_work_helper
task: ffff88040e986510 ti: ffff88040e98c000 task.ti: ffff88040e98c000
RIP: 0010:[<ffffffff810a687d>]  [<ffffffff810a687d>] do_raw_spin_unlock+0x3d/0xa0
RSP: 0018:ffff88041fc03d08  EFLAGS: 00000046
RAX: 0000000000000000 RBX: ffff88041fc13680 RCX: ffff88040e98c010
RDX: 0000000000000acd RSI: 0000000000000001 RDI: ffff88041fc13680
RBP: ffff88041fc03d18 R08: 0000000000000000 R09: 0000000000000006
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88041fc13680
R13: 0000000000000082 R14: 0000000000000000 R15: ffff88041fc0d1a0
FS:  0000000000000000(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f40cbd78000 CR3: 0000000001c10000 CR4: 00000000000007f0
Stack:
 0000000000000000 0000000000000082 ffff88041fc03d38 ffffffff8166e7e6
 ffff88041fc13680 0000000000013680 ffff88041fc03d68 ffffffff810834a0
 ffffffff81c4f940 0000000000000086 ffffffff81d57d60 0000000000000000
Call Trace:
 <IRQ>
 [<ffffffff8166e7e6>] _raw_spin_unlock_irqrestore+0x36/0x50
 [<ffffffff810834a0>] resched_cpu+0x80/0x90
 [<ffffffff810be01f>] print_cpu_stall+0x12f/0x140
 [<ffffffff810a0700>] ? cpuacct_css_alloc+0xb0/0xb0
 [<ffffffff810be46f>] __rcu_pending+0x1ff/0x210
 [<ffffffff810bf3cd>] rcu_check_callbacks+0xed/0x1a0
 [<ffffffff8105f938>] update_process_times+0x48/0x80
 [<ffffffff810caa77>] tick_sched_handle+0x37/0x80
 [<ffffffff810cb2f4>] tick_sched_timer+0x54/0x90
 [<ffffffff8107aba1>] __run_hrtimer+0x81/0x1c0
 [<ffffffff810cb2a0>] ? tick_nohz_handler+0xc0/0xc0
 [<ffffffff8107afc6>] hrtimer_interrupt+0x116/0x2a0
 [<ffffffff8103643b>] local_apic_timer_interrupt+0x3b/0x60
 [<ffffffff81671685>] smp_apic_timer_interrupt+0x45/0x60
 [<ffffffff8166fc8a>] apic_timer_interrupt+0x6a/0x70
 <EOI>
 [<ffffffff810a7180>] ? queue_write_lock_slowpath+0x60/0x90
 [<ffffffff810a692d>] do_raw_write_lock+0x4d/0xa0
 [<ffffffff8166e3b9>] _raw_write_lock+0x39/0x40
 [<ffffffff812948ef>] ? btrfs_try_tree_write_lock+0x4f/0xc0
 [<ffffffff812948ef>] btrfs_try_tree_write_lock+0x4f/0xc0
 [<ffffffff81236e62>] btrfs_search_slot+0x422/0x870
 [<ffffffff81237c7e>] btrfs_insert_empty_items+0x7e/0xe0
 [<ffffffff8125f83c>] insert_reserved_file_extent.clone.0+0x13c/0x2f0
 [<ffffffff81262745>] btrfs_finish_ordered_io+0x495/0x560
 [<ffffffff81262825>] finish_ordered_fn+0x15/0x20
 [<ffffffff8128a8ed>] normal_work_helper+0x8d/0x1b0
 [<ffffffff8107052b>] process_one_work+0x1db/0x510
 [<ffffffff810704b6>] ? process_one_work+0x166/0x510
 [<ffffffff810717ef>] worker_thread+0x11f/0x3c0
 [<ffffffff810716d0>] ? maybe_create_worker+0x190/0x190
 [<ffffffff8107791e>] kthread+0xde/0x100
 [<ffffffff81077840>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8166ed6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81077840>] ? __init_kthread_worker+0x70/0x70
Code: 4e ad de 48 89 fb 75 42 0f b7 13 0f b7 43 02 66 39 c2 74 66 65 48 8b 04 25 c0 b9 00 00 48 39 43 10 75 46 65 8b 04 25 30 b0 00 00 <39> 43 08 75 28 48 c7 43 10 ff ff ff ff c7 43 08 ff ff ff ff 66

--------------------------------------------------------

NMI backtrace for cpu 10
CPU: 10 PID: 23844 Comm: kworker/u65:14 Tainted: G E 3.16.0-rc1 #3
Hardware name: HP ProLiant DL380 G6, BIOS P62 01/30/2011
Workqueue: btrfs-endio-write normal_work_helper
task: ffff880407c26190 ti: ffff88040799c000 task.ti: ffff88040799c000
RIP: 0010:[<ffffffff8103779c>]  [<ffffffff8103779c>] default_send_IPI_mask_sequence_phys+0x6c/0xf0
RSP: 0018:ffff88041fca3cf8  EFLAGS: 00000046
RAX: ffff88081faa0000 RBX: 0000000000000002 RCX: 0000000000000032
RDX: 000000000000000b RSI: 0000000000000020 RDI: 0000000000000020
RBP: ffff88041fca3d38 R08: ffffffff81d58160 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 000000000000b034
R13: ffffffff81d58160 R14: 0000000000000400 R15: 0000000000000092
FS:  0000000000000000(0000) GS:ffff88041fca0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f40ca200000 CR3: 0000000001c10000 CR4: 00000000000007e0
Stack:
 0000000000000012 000000000000000a ffff88041fca3d18 0000000000000000
 0000000000002183 ffffffff81d57d60 0000000000000000 ffff88041fcad1a0
 ffff88041fca3d48 ffffffff8103acb7 ffff88041fca3d68 ffffffff81037941
Call Trace:
 <IRQ>
 [<ffffffff8103acb7>] physflat_send_IPI_all+0x17/0x20
 [<ffffffff81037941>] arch_trigger_all_cpu_backtrace+0x61/0xa0
 [<ffffffff810bdfcc>] print_cpu_stall+0xdc/0x140
 [<ffffffff810a0700>] ? cpuacct_css_alloc+0xb0/0xb0
 [<ffffffff810be46f>] __rcu_pending+0x1ff/0x210
 [<ffffffff810bf3cd>] rcu_check_callbacks+0xed/0x1a0
 [<ffffffff8105f938>] update_process_times+0x48/0x80
 [<ffffffff810caa77>] tick_sched_handle+0x37/0x80
 [<ffffffff810cb2f4>] tick_sched_timer+0x54/0x90
 [<ffffffff8107aba1>] __run_hrtimer+0x81/0x1c0
 [<ffffffff810cb2a0>] ? tick_nohz_handler+0xc0/0xc0
 [<ffffffff8107afc6>] hrtimer_interrupt+0x116/0x2a0
 [<ffffffff8103643b>] local_apic_timer_interrupt+0x3b/0x60
 [<ffffffff81671685>] smp_apic_timer_interrupt+0x45/0x60
 [<ffffffff8166fc8a>] apic_timer_interrupt+0x6a/0x70
 <EOI>
 [<ffffffff810a7223>] ? queue_read_lock_slowpath+0x73/0x90
 [<ffffffff810a69c4>] do_raw_read_lock+0x44/0x50
 [<ffffffff8166e6bc>] _raw_read_lock+0x3c/0x50
 [<ffffffff81294d31>] ? btrfs_clear_lock_blocking_rw+0x71/0x1d0
 [<ffffffff81294d31>] btrfs_clear_lock_blocking_rw+0x71/0x1d0
 [<ffffffff8122d92a>] btrfs_clear_path_blocking+0x3a/0x80
 [<ffffffff81236f3d>] btrfs_search_slot+0x4fd/0x870
 [<ffffffff8123873e>] btrfs_next_old_leaf+0x24e/0x480
 [<ffffffff81238980>] btrfs_next_leaf+0x10/0x20
 [<ffffffff8126e7f8>] __btrfs_drop_extents+0x2a8/0xe80
 [<ffffffff810a3b11>] ? __lock_acquire+0x1b1/0x470
 [<ffffffff811937e5>] ? kmem_cache_alloc+0x1a5/0x1d0
 [<ffffffff8125f7a7>] insert_reserved_file_extent.clone.0+0xa7/0x2f0
 [<ffffffff81262745>] btrfs_finish_ordered_io+0x495/0x560
 [<ffffffff81262825>] finish_ordered_fn+0x15/0x20
 [<ffffffff8128a8ed>] normal_work_helper+0x8d/0x1b0
 [<ffffffff8107052b>] process_one_work+0x1db/0x510
 [<ffffffff810704b6>] ? process_one_work+0x166/0x510
 [<ffffffff810717ef>] worker_thread+0x11f/0x3c0
 [<ffffffff810716d0>] ? maybe_create_worker+0x190/0x190
 [<ffffffff8107791e>] kthread+0xde/0x100
 [<ffffffff81077840>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8166ed6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81077840>] ? __init_kthread_worker+0x70/0x70
Code: c2 01 4c 89 ef 48 63 d2 e8 22 8a 32 00 48 63 35 cf 26 d2 00 89 c2 48 39 f2 73 55 48 8b 04 d5 00 57 d4 81 83 fb 02 41 0f b7 0c 04 <74> 55 8b 04 25 00 c3 5f ff f6 c4 10 74 1b 66 0f 1f 44 00 00 f3

--------------------------------------------------------

NMI backtrace for cpu 14
CPU: 14 PID: 23832 Comm: tar Tainted: G E 3.16.0-rc1 #3
Hardware name: HP ProLiant DL380 G6, BIOS P62 01/30/2011
task: ffff8808028f4950 ti: ffff8808028ac000 task.ti: ffff8808028ac000
RIP: 0010:[<ffffffff8135a5e7>]  [<ffffffff8135a5e7>] delay_tsc+0x37/0x60
RSP: 0018:ffff88041fce3ad8  EFLAGS: 00000046
RAX: 000000002ae3de42 RBX: ffffffff82b54958 RCX: 000000000000000e
RDX: 00000000000000ff RSI: 000000002ae3d696 RDI: 0000000000000b75
RBP: ffff88041fce3ad8 R08: 000000000000000e R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: 000000000000270b
R13: 0000000000000020 R14: 0000000000000020 R15: ffffffff8143d620
FS:  00007f84603367a0(0000) GS:ffff88041fce0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffea97afd8 CR3: 000000080ae98000 CR4: 00000000000007e0
Stack:
 ffff88041fce3ae8 ffffffff8135a5a8 ffff88041fce3b18 ffffffff8143d5b0
 ffffffff8143309c ffffffff82b54958 0000000000000074 000000000000002a
 ffff88041fce3b38 ffffffff8143d646 ffffffff829cde5f ffffffff82b54958
Call Trace:
 <IRQ>
 [<ffffffff8135a5a8>] __const_udelay+0x28/0x30
 [<ffffffff8143d5b0>] wait_for_xmitr+0x30/0xa0
 [<ffffffff8143309c>] ? vt_console_print+0x2cc/0x3b0
 [<ffffffff8143d646>] serial8250_console_putchar+0x26/0x40
 [<ffffffff814387ae>] uart_console_write+0x3e/0x70
 [<ffffffff8143fcf6>] serial8250_console_write+0xb6/0x180
 [<ffffffff810b2635>] call_console_drivers.clone.2+0xa5/0x100
 [<ffffffff810b2773>] console_cont_flush.clone.0+0xe3/0x190
 [<ffffffff810b2858>] console_unlock+0x38/0x310
 [<ffffffff810b1d57>] ? __down_trylock_console_sem+0x47/0x50
 [<ffffffff810b303d>] ? vprintk_emit+0x2bd/0x5d0
 [<ffffffff810b305e>] vprintk_emit+0x2de/0x5d0
 [<ffffffff810a3b11>] ? __lock_acquire+0x1b1/0x470
 [<ffffffff81669486>] printk+0x4d/0x4f
 [<ffffffff810bdf1d>] print_cpu_stall+0x2d/0x140
 [<ffffffff810a0700>] ? cpuacct_css_alloc+0xb0/0xb0
 [<ffffffff810be46f>] __rcu_pending+0x1ff/0x210
 [<ffffffff810bf3cd>] rcu_check_callbacks+0xed/0x1a0
 [<ffffffff8105f938>] update_process_times+0x48/0x80
 [<ffffffff810caa77>] tick_sched_handle+0x37/0x80
 [<ffffffff810cb2f4>] tick_sched_timer+0x54/0x90
 [<ffffffff8107aba1>] __run_hrtimer+0x81/0x1c0
 [<ffffffff810cb2a0>] ? tick_nohz_handler+0xc0/0xc0
 [<ffffffff8107afc6>] hrtimer_interrupt+0x116/0x2a0
 [<ffffffff8103643b>] local_apic_timer_interrupt+0x3b/0x60
 [<ffffffff81671685>] smp_apic_timer_interrupt+0x45/0x60
 [<ffffffff8166fc8a>] apic_timer_interrupt+0x6a/0x70
 <EOI>
 [<ffffffff810a7223>] ? queue_read_lock_slowpath+0x73/0x90
 [<ffffffff810a69c4>] do_raw_read_lock+0x44/0x50
 [<ffffffff8166e6bc>] _raw_read_lock+0x3c/0x50
 [<ffffffff81294be8>] ? btrfs_tree_read_lock+0x58/0x130
 [<ffffffff81294be8>] btrfs_tree_read_lock+0x58/0x130
 [<ffffffff8122d7ab>] btrfs_read_lock_root_node+0x3b/0x50
 [<ffffffff8123709c>] btrfs_search_slot+0x65c/0x870
 [<ffffffff8125cef8>] ? btrfs_set_bit_hook+0xd8/0x150
 [<ffffffff8124b76d>] btrfs_lookup_dir_item+0x7d/0xd0
 [<ffffffff8126a895>] btrfs_lookup_dentry+0xb5/0x390
 [<ffffffff8166e82b>] ? _raw_spin_unlock+0x2b/0x40
 [<ffffffff8126ab86>] btrfs_lookup+0x16/0x40
 [<ffffffff811ac3ad>] lookup_real+0x1d/0x60
 [<ffffffff811aeac4>] lookup_open+0xc4/0x1c0
 [<ffffffff811b0428>] ? do_last+0x338/0x8c0
 [<ffffffff811b0442>] do_last+0x352/0x8c0
 [<ffffffff811acc00>] ? __inode_permission+0x90/0xd0
 [<ffffffff811b3224>] path_openat+0xc4/0x480
 [<ffffffff810a3b11>] ? __lock_acquire+0x1b1/0x470
 [<ffffffff810a082b>] ? cpuacct_charge+0x6b/0x90
 [<ffffffff811c0b86>] ? __alloc_fd+0x36/0x150
 [<ffffffff811b371a>] do_filp_open+0x4a/0xa0
 [<ffffffff811c0bfc>] ? __alloc_fd+0xac/0x150
 [<ffffffff810f4254>] ? __audit_syscall_entry+0x94/0x100
 [<ffffffff811a131a>] do_sys_open+0x11a/0x230
 [<ffffffff811a146e>] SyS_open+0x1e/0x20
 [<ffffffff8166ee12>] system_call_fastpath+0x16/0x1b
Code: 04 25 30 b0 00 00 66 66 90 0f ae e8 0f 31 89 c6 eb 11 66 90 f3 90 65 8b 0c 25 30 b0 00 00 44 39 c1 75 12 66 66 90 0f ae e8 0f 31 <89> c2 29 f2 39 fa 72 e1 c9 c3 29 c6 01 f7 66 66 90 0f ae e8 0f

--------------------------------------------------------

NMI backtrace for cpu 20
CPU: 20 PID: 154 Comm: kworker/u65:1 Tainted: G E 3.16.0-rc1 #3
Hardware name: HP ProLiant DL380 G6, BIOS P62 01/30/2011
Workqueue: btrfs-endio-write normal_work_helper
task: ffff88040d83e190 ti: ffff88040d840000 task.ti: ffff88040d840000
RIP: 0010:[<ffffffff810a7184>]  [<ffffffff810a7184>] queue_write_lock_slowpath+0x64/0x90
RSP: 0018:ffff88040d843838  EFLAGS: 00000206
RAX: 0000000000000101 RBX: ffff8807f7b51bc8 RCX: 0000000000000101
RDX: 00000000000000ff RSI: ffff88040d83edc8 RDI: ffff8807f7b51bc8
RBP: ffff88040d843838 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff8807f7b51bb4
R13: ffff8807f7b51c00 R14: ffff8807f7b51bb8 R15: ffff8807f7b51c48
FS:  0000000000000000(0000) GS:ffff88041fd40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fc824402008 CR3: 0000000001c10000 CR4: 00000000000007e0
Stack:
 ffff88040d843858 ffffffff810a692d ffff8807f7b51bb8 ffff8807f7b51bc8
 ffff88040d843888 ffffffff8166e3b9 ffffffff81294769 ffff8807f7b51bb4
 ffff8807f7b51c00 ffff88040d8438b8 ffff88040d843918 ffffffff81294769
Call Trace:
 [<ffffffff810a692d>] do_raw_write_lock+0x4d/0xa0
 [<ffffffff8166e3b9>] _raw_write_lock+0x39/0x40
 [<ffffffff81294769>] ? btrfs_tree_lock+0xf9/0x230
 [<ffffffff81294769>] btrfs_tree_lock+0xf9/0x230
 [<ffffffff8122d70e>] ? btrfs_root_node+0x5e/0xc0
 [<ffffffff8109a850>] ? bit_waitqueue+0xe0/0xe0
 [<ffffffff8122d8db>] btrfs_lock_root_node+0x3b/0x50
 [<ffffffff812370ef>] btrfs_search_slot+0x6af/0x870
 [<ffffffff81294600>] ? btrfs_tree_unlock+0x70/0xe0
 [<ffffffff8129462e>] ? btrfs_tree_unlock+0x9e/0xe0
 [<ffffffff810a42c4>] ? __lock_release+0x84/0x180
 [<ffffffff8124bf9d>] btrfs_lookup_file_extent+0x3d/0x40
 [<ffffffff8126e6a2>] __btrfs_drop_extents+0x152/0xe80
 [<ffffffff810a3b11>] ? __lock_acquire+0x1b1/0x470
 [<ffffffff810bb412>] ? rcu_resched+0x22/0x30
 [<ffffffff8166a316>] ? _cond_resched+0x36/0x60
 [<ffffffff811937e5>] ? kmem_cache_alloc+0x1a5/0x1d0
 [<ffffffff8125f7a7>] insert_reserved_file_extent.clone.0+0xa7/0x2f0
 [<ffffffff81262745>] btrfs_finish_ordered_io+0x495/0x560
 [<ffffffff81262825>] finish_ordered_fn+0x15/0x20
 [<ffffffff8128a8ed>] normal_work_helper+0x8d/0x1b0
 [<ffffffff8107052b>] process_one_work+0x1db/0x510
 [<ffffffff810704b6>] ? process_one_work+0x166/0x510
 [<ffffffff810717ef>] worker_thread+0x11f/0x3c0
 [<ffffffff810716d0>] ? maybe_create_worker+0x190/0x190
 [<ffffffff8107791e>] kthread+0xde/0x100
 [<ffffffff81077840>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8166ed6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81077840>] ? __init_kthread_worker+0x70/0x70
Code: 83 47 04 01 c9 c3 90 f3 90 8b 17 84 d2 75 f8 89 d1 89 d0 83 c9 01 f0 0f b1 0f 39 d0 75 e9 ba ff 00 00 00 eb 04 66 90 f3 90 8b 07 <83> f8 01 75 f7 f0 0f b1 17 83 f8 01 75 ee eb c4 f3 90 0f b7 01
On 06/19/2014 01:52 PM, Waiman Long wrote:
> On 06/19/2014 12:51 PM, Chris Mason wrote:
>> [...]
>> I think you're right, we haven't added any new recursive takers of the
>> lock. The path where we are deadlocking has an extent buffer, not yet
>> in the path, that is locked. I think we're taking the read lock while
>> that one is write locked.
>>
>> Reworking the nesting a bit here.
>>
>> -chris
>
> I would like to take back my comments. I took out the read_lock, but
> the process still hangs while doing file activity on a btrfs
> filesystem. So the problem is trickier than I thought. Below are the
> stack backtraces of some of the relevant processes.

You weren't wrong, but it was also the tree trylock code. Our trylocks
only back off if the blocking lock is held. btrfs_next_leaf needs it to
be a true trylock.

The confusing part is that this hasn't really changed, but one of the
callers must be a spinner where we used to have a blocker.

-chris
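What a "true trylock" means here, as a hedged sketch of
btrfs_try_tree_write_lock() based on the 3.16-era shape of the function
(the actual merged fix may differ): replace the unconditional
write_lock() with write_trylock(), so the caller also backs off when the
current holder is a spinner rather than queueing behind it.

    /*
     * Hedged sketch -- not necessarily the patch that was merged.
     * The original function returned 0 only when a *blocking* holder
     * was present and otherwise called write_lock(), which spins; with
     * a fair qrwlock that can wedge behind other queued waiters.  A
     * true trylock never waits at all:
     */
    int btrfs_try_tree_write_lock(struct extent_buffer *eb)
    {
            if (atomic_read(&eb->blocking_writers) ||
                atomic_read(&eb->blocking_readers))
                    return 0;       /* blocking holders: back off */

            if (!write_trylock(&eb->lock))
                    return 0;       /* spinning holder: back off too,
                                       instead of queueing behind it */

            if (atomic_read(&eb->blocking_writers) ||
                atomic_read(&eb->blocking_readers)) {
                    write_unlock(&eb->lock);
                    return 0;
            }
            atomic_inc(&eb->write_locks);
            atomic_inc(&eb->spinning_writers);
            eb->lock_owner = current->pid;
            return 1;
    }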
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index d24e433..b6c9f2e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1766,12 +1766,22 @@ check_deadlock(struct task_struct *curr, struct held_loc
 		if (hlock_class(prev) != hlock_class(next))
 			continue;
 
+#ifdef CONFIG_QUEUE_RWLOCK
+		/*
+		 * Queue rwlock only allows read-after-read recursion of the
+		 * same lock class when the latter read is in an interrupt
+		 * context.
+		 */
+		if ((read == 2) && prev->read && in_interrupt())
+			return 2;
+#else
 		/*
 		 * Allow read-after-read recursion of the same
 		 * lock class (i.e. read_lock(lock)+read_lock(lock)):
 		 */
 		if ((read == 2) && prev->read)
 			return 2;
+#endif
 
 		/*
 		 * We're holding the nest_lock, which serializes this lock's
@@ -1852,8 +1862,10 @@ check_prev_add(struct task_struct *curr, struct held_lock
 	 * write-lock never takes any other locks, then the reads are
 	 * equivalent to a NOP.
 	 */
+#ifndef CONFIG_QUEUE_RWLOCK
 	if (next->read == 2 || prev->read == 2)
 		return 1;
+#endif
 	/*
 	 * Is the <prev> -> <next> dependency already present?
 	 *
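For reference, the one recursion this patch still allows under
CONFIG_QUEUE_RWLOCK is read-after-read where the second acquisition
happens in interrupt context, matching the qrwlock's unfair interrupt
path. A small illustration, with made-up names not taken from the patch:

    /* Illustration only: the lock and functions here are hypothetical. */
    static DEFINE_RWLOCK(demo_lock);

    void process_context_path(void)
    {
            read_lock(&demo_lock);
            /* an interrupt may fire here while the read lock is held */
            read_unlock(&demo_lock);
    }

    void irq_handler_path(void)     /* runs with in_interrupt() true */
    {
            /*
             * Allowed recursion: even with a writer queued, a qrwlock
             * grants the lock to readers in interrupt context (the
             * unfair path), so read-after-read cannot deadlock here and
             * the patched check_deadlock() above still returns 2 for it.
             */
            read_lock(&demo_lock);
            read_unlock(&demo_lock);
    }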