diff mbox

Lockups with btrfs on 3.16-rc1 - bisected

Message ID 53A35B31.4060203@fb.com (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Mason June 19, 2014, 9:50 p.m. UTC
On 06/19/2014 04:10 PM, Chris Mason wrote:
> On 06/19/2014 01:52 PM, Waiman Long wrote:
>> On 06/19/2014 12:51 PM, Chris Mason wrote:
>>> On 06/18/2014 11:21 PM, Waiman Long wrote:
>>>> On 06/18/2014 10:11 PM, Chris Mason wrote:
>>>>> On 06/18/2014 10:03 PM, Marc Dionne wrote:
>>>>>> On Wed, Jun 18, 2014 at 8:41 PM, Marc
>>>>>> Dionne<marc.c.dionne@gmail.com>   wrote:
>>>>>>> On Wed, Jun 18, 2014 at 8:08 PM, Waiman Long<waiman.long@hp.com>
>>>>>>> wrote:
>>>>>>>> On 06/18/2014 08:03 PM, Marc Dionne wrote:
>>>>>> And for an additional data point, just removing those
>>>>>> CONFIG_DEBUG_LOCK_ALLOC ifdefs looks like it's sufficient to prevent
>>>>>> the symptoms when lockdep is not enabled.
>>>>> Ok, somehow we've added a lock inversion here that wasn't here before.
>>>>> Thanks for confirming, I'll nail it down.
>>>>>
>>>>> -chris
>>>>>
>>>> I am pretty sure that the hangup is caused by the following kind of code
>>>> fragment in the locking.c file:
>>>>
>>>>   if (eb->lock_nested) {
>>>>                  read_lock(&eb->lock);
>>>>                  if (eb->lock_nested&&  current->pid ==
>>>> eb->lock_owner) {
>>>>
>>>> Is it possible to do the check without taking the read_lock?
>>> I think you're right, we haven't added any new recursive takers of the
>>> lock.  The path where we are deadlocking has an extent buffer that isn't
>>> in the path yet locked.  I think we're taking the read lock while that
>>> one is write locked.
>>>
>>> Reworking the nesting a big here.
>>>
>>> -chris
>>
>> I would like to take back my comments. I took out the read_lock, but the
>> process still hang while doing file activities on btrfs filesystem. So
>> the problem is trickier than I thought. Below are the stack backtraces
>> of some of the relevant processes.
>>
> 
> You weren't wrong, but it was also the tree trylock code.  Our trylocks
> only back off if the blocking lock is held.  btrfs_next_leaf needs it to
> be a true trylock.  The confusing part is this hasn't really changed,
> but one of the callers must be a spinner where we used to have a blocker.

This is what I have queued up, it's working here.

-chris

commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
Author: Chris Mason <clm@fb.com>
Date:   Thu Jun 19 14:16:52 2014 -0700

    Btrfs: fix deadlocks with trylock on tree nodes
    
    The Btrfs tree trylock function is poorly named.  It always takes
    the spinlock and backs off if the blocking lock is held.  This
    can lead to surprising lockups because people expect it to really be a
    trylock.
    
    This commit makes it a pure trylock, both for the spinlock and the
    blocking lock.  It also reworks the nested lock handling slightly to
    avoid taking the read lock while a spinning write lock might be held.
    
    Signed-off-by: Chris Mason <clm@fb.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Waiman Long June 19, 2014, 11:21 p.m. UTC | #1
On 06/19/2014 05:50 PM, Chris Mason wrote:
>>>
>>> I would like to take back my comments. I took out the read_lock, but the
>>> process still hang while doing file activities on btrfs filesystem. So
>>> the problem is trickier than I thought. Below are the stack backtraces
>>> of some of the relevant processes.
>>>
>> You weren't wrong, but it was also the tree trylock code.  Our trylocks
>> only back off if the blocking lock is held.  btrfs_next_leaf needs it to
>> be a true trylock.  The confusing part is this hasn't really changed,
>> but one of the callers must be a spinner where we used to have a blocker.
> This is what I have queued up, it's working here.
>
> -chris
>
> commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
> Author: Chris Mason<clm@fb.com>
> Date:   Thu Jun 19 14:16:52 2014 -0700
>
>      Btrfs: fix deadlocks with trylock on tree nodes
>
>      The Btrfs tree trylock function is poorly named.  It always takes
>      the spinlock and backs off if the blocking lock is held.  This
>      can lead to surprising lockups because people expect it to really be a
>      trylock.
>
>      This commit makes it a pure trylock, both for the spinlock and the
>      blocking lock.  It also reworks the nested lock handling slightly to
>      avoid taking the read lock while a spinning write lock might be held.
>
>      Signed-off-by: Chris Mason<clm@fb.com>

I didn't realize that those non-blocking lock functions are really 
trylocks. Yes, the patch did seem to fix the hanging problem that I saw 
when I just untar the kernel source files into a btrfs filesystem. 
However, when I tried did a kernel build on a 24-thread (-j 24) system, 
the build process hanged after a while. The following kind of stack 
trace messages were printed:

INFO: task btrfs-transacti:16576 blocked for more than 120 seconds.
       Tainted: G            E 3.16.0-rc1 #5
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D 000000000000000f     0 16576      2 0x00000000
  ffff88080eabbbf8 0000000000000046 ffff880803b98350 ffff88080eab8010
  0000000000012b80 0000000000012b80 ffff880805ed8f10 ffff88080d162310
  ffff88080eabbce8 ffff8807be170880 ffff8807be170888 7fffffffffffffff
Call Trace:
  [<ffffffff81592de9>] schedule+0x29/0x70
  [<ffffffff815920bd>] schedule_timeout+0x13d/0x1d0
  [<ffffffff8106b474>] ? wake_up_worker+0x24/0x30
  [<ffffffff8106d595>] ? insert_work+0x65/0xb0
  [<ffffffff81593cc6>] wait_for_completion+0xc6/0x100
  [<ffffffff810868d0>] ? try_to_wake_up+0x220/0x220
  [<ffffffffa06bb9ba>] btrfs_wait_and_free_delalloc_work+0x1a/0x30 [btrfs]
  [<ffffffffa06d458d>] btrfs_run_ordered_operations+0x1dd/0x2c0 [btrfs]
  [<ffffffffa06b7fd5>] btrfs_flush_all_pending_stuffs+0x35/0x40 [btrfs]
  [<ffffffffa06ba099>] btrfs_commit_transaction+0x229/0xa30 [btrfs]
  [<ffffffff8105ef30>] ? lock_timer_base+0x70/0x70
  [<ffffffffa06b51db>] transaction_kthread+0x1eb/0x270 [btrfs]
  [<ffffffffa06b4ff0>] ? close_ctree+0x2d0/0x2d0 [btrfs]
  [<ffffffff8107544e>] kthread+0xce/0xf0
  [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70
  [<ffffffff8159636c>] ret_from_fork+0x7c/0xb0
  [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70

It looks like some more work may still be needed. Or it could be a 
problem in my system configuration.

-Longman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tsutomu Itoh June 20, 2014, 3:20 a.m. UTC | #2
On 2014/06/20 8:21, Waiman Long wrote:
> On 06/19/2014 05:50 PM, Chris Mason wrote:
>>>>
>>>> I would like to take back my comments. I took out the read_lock, but the
>>>> process still hang while doing file activities on btrfs filesystem. So
>>>> the problem is trickier than I thought. Below are the stack backtraces
>>>> of some of the relevant processes.
>>>>
>>> You weren't wrong, but it was also the tree trylock code.  Our trylocks
>>> only back off if the blocking lock is held.  btrfs_next_leaf needs it to
>>> be a true trylock.  The confusing part is this hasn't really changed,
>>> but one of the callers must be a spinner where we used to have a blocker.
>> This is what I have queued up, it's working here.
>>
>> -chris
>>
>> commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>> Author: Chris Mason<clm@fb.com>
>> Date:   Thu Jun 19 14:16:52 2014 -0700
>>
>>      Btrfs: fix deadlocks with trylock on tree nodes
>>
>>      The Btrfs tree trylock function is poorly named.  It always takes
>>      the spinlock and backs off if the blocking lock is held.  This
>>      can lead to surprising lockups because people expect it to really be a
>>      trylock.
>>
>>      This commit makes it a pure trylock, both for the spinlock and the
>>      blocking lock.  It also reworks the nested lock handling slightly to
>>      avoid taking the read lock while a spinning write lock might be held.
>>
>>      Signed-off-by: Chris Mason<clm@fb.com>
>
> I didn't realize that those non-blocking lock functions are really trylocks. Yes, the patch did seem to fix the hanging problem that I saw when I just untar the kernel source files into a btrfs filesystem. However, when I tried did a kernel build on a 24-thread (-j 24) system, the build process hanged after a while. The following kind of stack trace messages were printed:
>
> INFO: task btrfs-transacti:16576 blocked for more than 120 seconds.
>        Tainted: G            E 3.16.0-rc1 #5
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> btrfs-transacti D 000000000000000f     0 16576      2 0x00000000
>   ffff88080eabbbf8 0000000000000046 ffff880803b98350 ffff88080eab8010
>   0000000000012b80 0000000000012b80 ffff880805ed8f10 ffff88080d162310
>   ffff88080eabbce8 ffff8807be170880 ffff8807be170888 7fffffffffffffff
> Call Trace:
>   [<ffffffff81592de9>] schedule+0x29/0x70
>   [<ffffffff815920bd>] schedule_timeout+0x13d/0x1d0
>   [<ffffffff8106b474>] ? wake_up_worker+0x24/0x30
>   [<ffffffff8106d595>] ? insert_work+0x65/0xb0
>   [<ffffffff81593cc6>] wait_for_completion+0xc6/0x100
>   [<ffffffff810868d0>] ? try_to_wake_up+0x220/0x220
>   [<ffffffffa06bb9ba>] btrfs_wait_and_free_delalloc_work+0x1a/0x30 [btrfs]
>   [<ffffffffa06d458d>] btrfs_run_ordered_operations+0x1dd/0x2c0 [btrfs]
>   [<ffffffffa06b7fd5>] btrfs_flush_all_pending_stuffs+0x35/0x40 [btrfs]
>   [<ffffffffa06ba099>] btrfs_commit_transaction+0x229/0xa30 [btrfs]
>   [<ffffffff8105ef30>] ? lock_timer_base+0x70/0x70
>   [<ffffffffa06b51db>] transaction_kthread+0x1eb/0x270 [btrfs]
>   [<ffffffffa06b4ff0>] ? close_ctree+0x2d0/0x2d0 [btrfs]
>   [<ffffffff8107544e>] kthread+0xce/0xf0
>   [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70
>   [<ffffffff8159636c>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70
>
> It looks like some more work may still be needed. Or it could be a problem in my system configuration.
>

Umm, after applying Chris's patch to my environment, xfstests ran
completely and the above messages were not output. (Are above messages
another bug?)

Thanks,
Tsutomu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Waiman Long June 21, 2014, 1:09 a.m. UTC | #3
Hi,

It may be caused by an corrupted btrfs filesystem due to the repeating hanging during the test. As long as your guys don't any problem, I am happy with the patch.

Thank,
Longman


-----Original Message-----
From: Tsutomu Itoh [mailto:t-itoh@jp.fujitsu.com] 
Sent: Thursday, June 19, 2014 11:21 PM
To: Chris Mason
Cc: Long, Wai Man; Marc Dionne; Josef Bacik; linux-btrfs@vger.kernel.org
Subject: Re: Lockups with btrfs on 3.16-rc1 - bisected

On 2014/06/20 8:21, Waiman Long wrote:
> On 06/19/2014 05:50 PM, Chris Mason wrote:
>>>>
>>>> I would like to take back my comments. I took out the read_lock, 
>>>> but the process still hang while doing file activities on btrfs 
>>>> filesystem. So the problem is trickier than I thought. Below are 
>>>> the stack backtraces of some of the relevant processes.
>>>>
>>> You weren't wrong, but it was also the tree trylock code.  Our 
>>> trylocks only back off if the blocking lock is held.  
>>> btrfs_next_leaf needs it to be a true trylock.  The confusing part 
>>> is this hasn't really changed, but one of the callers must be a spinner where we used to have a blocker.
>> This is what I have queued up, it's working here.
>>
>> -chris
>>
>> commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e
>> Author: Chris Mason<clm@fb.com>
>> Date:   Thu Jun 19 14:16:52 2014 -0700
>>
>>      Btrfs: fix deadlocks with trylock on tree nodes
>>
>>      The Btrfs tree trylock function is poorly named.  It always takes
>>      the spinlock and backs off if the blocking lock is held.  This
>>      can lead to surprising lockups because people expect it to really be a
>>      trylock.
>>
>>      This commit makes it a pure trylock, both for the spinlock and the
>>      blocking lock.  It also reworks the nested lock handling slightly to
>>      avoid taking the read lock while a spinning write lock might be held.
>>
>>      Signed-off-by: Chris Mason<clm@fb.com>
>
> I didn't realize that those non-blocking lock functions are really trylocks. Yes, the patch did seem to fix the hanging problem that I saw when I just untar the kernel source files into a btrfs filesystem. However, when I tried did a kernel build on a 24-thread (-j 24) system, the build process hanged after a while. The following kind of stack trace messages were printed:
>
> INFO: task btrfs-transacti:16576 blocked for more than 120 seconds.
>        Tainted: G            E 3.16.0-rc1 #5
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> btrfs-transacti D 000000000000000f     0 16576      2 0x00000000
>   ffff88080eabbbf8 0000000000000046 ffff880803b98350 ffff88080eab8010
>   0000000000012b80 0000000000012b80 ffff880805ed8f10 ffff88080d162310
>   ffff88080eabbce8 ffff8807be170880 ffff8807be170888 7fffffffffffffff 
> Call Trace:
>   [<ffffffff81592de9>] schedule+0x29/0x70
>   [<ffffffff815920bd>] schedule_timeout+0x13d/0x1d0
>   [<ffffffff8106b474>] ? wake_up_worker+0x24/0x30
>   [<ffffffff8106d595>] ? insert_work+0x65/0xb0
>   [<ffffffff81593cc6>] wait_for_completion+0xc6/0x100
>   [<ffffffff810868d0>] ? try_to_wake_up+0x220/0x220
>   [<ffffffffa06bb9ba>] btrfs_wait_and_free_delalloc_work+0x1a/0x30 [btrfs]
>   [<ffffffffa06d458d>] btrfs_run_ordered_operations+0x1dd/0x2c0 [btrfs]
>   [<ffffffffa06b7fd5>] btrfs_flush_all_pending_stuffs+0x35/0x40 [btrfs]
>   [<ffffffffa06ba099>] btrfs_commit_transaction+0x229/0xa30 [btrfs]
>   [<ffffffff8105ef30>] ? lock_timer_base+0x70/0x70
>   [<ffffffffa06b51db>] transaction_kthread+0x1eb/0x270 [btrfs]
>   [<ffffffffa06b4ff0>] ? close_ctree+0x2d0/0x2d0 [btrfs]
>   [<ffffffff8107544e>] kthread+0xce/0xf0
>   [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70
>   [<ffffffff8159636c>] ret_from_fork+0x7c/0xb0
>   [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70
>
> It looks like some more work may still be needed. Or it could be a problem in my system configuration.
>

Umm, after applying Chris's patch to my environment, xfstests ran completely and the above messages were not output. (Are above messages another bug?)

Thanks,
Tsutomu


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c
index 01277b8..5665d21 100644
--- a/fs/btrfs/locking.c
+++ b/fs/btrfs/locking.c
@@ -33,14 +33,14 @@  static void btrfs_assert_tree_read_locked(struct extent_buffer *eb);
  */
 void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw)
 {
-	if (eb->lock_nested) {
-		read_lock(&eb->lock);
-		if (eb->lock_nested && current->pid == eb->lock_owner) {
-			read_unlock(&eb->lock);
-			return;
-		}
-		read_unlock(&eb->lock);
-	}
+	/*
+	 * no lock is required.  The lock owner may change if
+	 * we have a read lock, but it won't change to or away
+	 * from us.  If we have the write lock, we are the owner
+	 * and it'll never change.
+	 */
+	if (eb->lock_nested && current->pid == eb->lock_owner)
+		return;
 	if (rw == BTRFS_WRITE_LOCK) {
 		if (atomic_read(&eb->blocking_writers) == 0) {
 			WARN_ON(atomic_read(&eb->spinning_writers) != 1);
@@ -65,14 +65,15 @@  void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw)
  */
 void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
 {
-	if (eb->lock_nested) {
-		read_lock(&eb->lock);
-		if (eb->lock_nested && current->pid == eb->lock_owner) {
-			read_unlock(&eb->lock);
-			return;
-		}
-		read_unlock(&eb->lock);
-	}
+	/*
+	 * no lock is required.  The lock owner may change if
+	 * we have a read lock, but it won't change to or away
+	 * from us.  If we have the write lock, we are the owner
+	 * and it'll never change.
+	 */
+	if (eb->lock_nested && current->pid == eb->lock_owner)
+		return;
+
 	if (rw == BTRFS_WRITE_LOCK_BLOCKING) {
 		BUG_ON(atomic_read(&eb->blocking_writers) != 1);
 		write_lock(&eb->lock);
@@ -99,6 +100,9 @@  void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw)
 void btrfs_tree_read_lock(struct extent_buffer *eb)
 {
 again:
+	BUG_ON(!atomic_read(&eb->blocking_writers) &&
+	       current->pid == eb->lock_owner);
+
 	read_lock(&eb->lock);
 	if (atomic_read(&eb->blocking_writers) &&
 	    current->pid == eb->lock_owner) {
@@ -132,7 +136,9 @@  int btrfs_try_tree_read_lock(struct extent_buffer *eb)
 	if (atomic_read(&eb->blocking_writers))
 		return 0;
 
-	read_lock(&eb->lock);
+	if (!read_trylock(&eb->lock))
+		return 0;
+
 	if (atomic_read(&eb->blocking_writers)) {
 		read_unlock(&eb->lock);
 		return 0;
@@ -151,7 +157,10 @@  int btrfs_try_tree_write_lock(struct extent_buffer *eb)
 	if (atomic_read(&eb->blocking_writers) ||
 	    atomic_read(&eb->blocking_readers))
 		return 0;
-	write_lock(&eb->lock);
+
+	if (!write_trylock(&eb->lock))
+		return 0;
+
 	if (atomic_read(&eb->blocking_writers) ||
 	    atomic_read(&eb->blocking_readers)) {
 		write_unlock(&eb->lock);
@@ -168,14 +177,15 @@  int btrfs_try_tree_write_lock(struct extent_buffer *eb)
  */
 void btrfs_tree_read_unlock(struct extent_buffer *eb)
 {
-	if (eb->lock_nested) {
-		read_lock(&eb->lock);
-		if (eb->lock_nested && current->pid == eb->lock_owner) {
-			eb->lock_nested = 0;
-			read_unlock(&eb->lock);
-			return;
-		}
-		read_unlock(&eb->lock);
+	/*
+	 * if we're nested, we have the write lock.  No new locking
+	 * is needed as long as we are the lock owner.
+	 * The write unlock will do a barrier for us, and the lock_nested
+	 * field only matters to the lock owner.
+	 */
+	if (eb->lock_nested && current->pid == eb->lock_owner) {
+		eb->lock_nested = 0;
+		return;
 	}
 	btrfs_assert_tree_read_locked(eb);
 	WARN_ON(atomic_read(&eb->spinning_readers) == 0);
@@ -189,14 +199,15 @@  void btrfs_tree_read_unlock(struct extent_buffer *eb)
  */
 void btrfs_tree_read_unlock_blocking(struct extent_buffer *eb)
 {
-	if (eb->lock_nested) {
-		read_lock(&eb->lock);
-		if (eb->lock_nested && current->pid == eb->lock_owner) {
-			eb->lock_nested = 0;
-			read_unlock(&eb->lock);
-			return;
-		}
-		read_unlock(&eb->lock);
+	/*
+	 * if we're nested, we have the write lock.  No new locking
+	 * is needed as long as we are the lock owner.
+	 * The write unlock will do a barrier for us, and the lock_nested
+	 * field only matters to the lock owner.
+	 */
+	if (eb->lock_nested && current->pid == eb->lock_owner) {
+		eb->lock_nested = 0;
+		return;
 	}
 	btrfs_assert_tree_read_locked(eb);
 	WARN_ON(atomic_read(&eb->blocking_readers) == 0);
@@ -244,6 +255,7 @@  void btrfs_tree_unlock(struct extent_buffer *eb)
 	BUG_ON(blockers > 1);
 
 	btrfs_assert_tree_locked(eb);
+	eb->lock_owner = 0;
 	atomic_dec(&eb->write_locks);
 
 	if (blockers) {