Message ID | 53A35B31.4060203@fb.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 06/19/2014 05:50 PM, Chris Mason wrote: >>> >>> I would like to take back my comments. I took out the read_lock, but the >>> process still hang while doing file activities on btrfs filesystem. So >>> the problem is trickier than I thought. Below are the stack backtraces >>> of some of the relevant processes. >>> >> You weren't wrong, but it was also the tree trylock code. Our trylocks >> only back off if the blocking lock is held. btrfs_next_leaf needs it to >> be a true trylock. The confusing part is this hasn't really changed, >> but one of the callers must be a spinner where we used to have a blocker. > This is what I have queued up, it's working here. > > -chris > > commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e > Author: Chris Mason<clm@fb.com> > Date: Thu Jun 19 14:16:52 2014 -0700 > > Btrfs: fix deadlocks with trylock on tree nodes > > The Btrfs tree trylock function is poorly named. It always takes > the spinlock and backs off if the blocking lock is held. This > can lead to surprising lockups because people expect it to really be a > trylock. > > This commit makes it a pure trylock, both for the spinlock and the > blocking lock. It also reworks the nested lock handling slightly to > avoid taking the read lock while a spinning write lock might be held. > > Signed-off-by: Chris Mason<clm@fb.com> I didn't realize that those non-blocking lock functions are really trylocks. Yes, the patch did seem to fix the hanging problem that I saw when I just untar the kernel source files into a btrfs filesystem. However, when I tried did a kernel build on a 24-thread (-j 24) system, the build process hanged after a while. The following kind of stack trace messages were printed: INFO: task btrfs-transacti:16576 blocked for more than 120 seconds. Tainted: G E 3.16.0-rc1 #5 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. btrfs-transacti D 000000000000000f 0 16576 2 0x00000000 ffff88080eabbbf8 0000000000000046 ffff880803b98350 ffff88080eab8010 0000000000012b80 0000000000012b80 ffff880805ed8f10 ffff88080d162310 ffff88080eabbce8 ffff8807be170880 ffff8807be170888 7fffffffffffffff Call Trace: [<ffffffff81592de9>] schedule+0x29/0x70 [<ffffffff815920bd>] schedule_timeout+0x13d/0x1d0 [<ffffffff8106b474>] ? wake_up_worker+0x24/0x30 [<ffffffff8106d595>] ? insert_work+0x65/0xb0 [<ffffffff81593cc6>] wait_for_completion+0xc6/0x100 [<ffffffff810868d0>] ? try_to_wake_up+0x220/0x220 [<ffffffffa06bb9ba>] btrfs_wait_and_free_delalloc_work+0x1a/0x30 [btrfs] [<ffffffffa06d458d>] btrfs_run_ordered_operations+0x1dd/0x2c0 [btrfs] [<ffffffffa06b7fd5>] btrfs_flush_all_pending_stuffs+0x35/0x40 [btrfs] [<ffffffffa06ba099>] btrfs_commit_transaction+0x229/0xa30 [btrfs] [<ffffffff8105ef30>] ? lock_timer_base+0x70/0x70 [<ffffffffa06b51db>] transaction_kthread+0x1eb/0x270 [btrfs] [<ffffffffa06b4ff0>] ? close_ctree+0x2d0/0x2d0 [btrfs] [<ffffffff8107544e>] kthread+0xce/0xf0 [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff8159636c>] ret_from_fork+0x7c/0xb0 [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70 It looks like some more work may still be needed. Or it could be a problem in my system configuration. -Longman -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2014/06/20 8:21, Waiman Long wrote: > On 06/19/2014 05:50 PM, Chris Mason wrote: >>>> >>>> I would like to take back my comments. I took out the read_lock, but the >>>> process still hang while doing file activities on btrfs filesystem. So >>>> the problem is trickier than I thought. Below are the stack backtraces >>>> of some of the relevant processes. >>>> >>> You weren't wrong, but it was also the tree trylock code. Our trylocks >>> only back off if the blocking lock is held. btrfs_next_leaf needs it to >>> be a true trylock. The confusing part is this hasn't really changed, >>> but one of the callers must be a spinner where we used to have a blocker. >> This is what I have queued up, it's working here. >> >> -chris >> >> commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e >> Author: Chris Mason<clm@fb.com> >> Date: Thu Jun 19 14:16:52 2014 -0700 >> >> Btrfs: fix deadlocks with trylock on tree nodes >> >> The Btrfs tree trylock function is poorly named. It always takes >> the spinlock and backs off if the blocking lock is held. This >> can lead to surprising lockups because people expect it to really be a >> trylock. >> >> This commit makes it a pure trylock, both for the spinlock and the >> blocking lock. It also reworks the nested lock handling slightly to >> avoid taking the read lock while a spinning write lock might be held. >> >> Signed-off-by: Chris Mason<clm@fb.com> > > I didn't realize that those non-blocking lock functions are really trylocks. Yes, the patch did seem to fix the hanging problem that I saw when I just untar the kernel source files into a btrfs filesystem. However, when I tried did a kernel build on a 24-thread (-j 24) system, the build process hanged after a while. The following kind of stack trace messages were printed: > > INFO: task btrfs-transacti:16576 blocked for more than 120 seconds. > Tainted: G E 3.16.0-rc1 #5 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > btrfs-transacti D 000000000000000f 0 16576 2 0x00000000 > ffff88080eabbbf8 0000000000000046 ffff880803b98350 ffff88080eab8010 > 0000000000012b80 0000000000012b80 ffff880805ed8f10 ffff88080d162310 > ffff88080eabbce8 ffff8807be170880 ffff8807be170888 7fffffffffffffff > Call Trace: > [<ffffffff81592de9>] schedule+0x29/0x70 > [<ffffffff815920bd>] schedule_timeout+0x13d/0x1d0 > [<ffffffff8106b474>] ? wake_up_worker+0x24/0x30 > [<ffffffff8106d595>] ? insert_work+0x65/0xb0 > [<ffffffff81593cc6>] wait_for_completion+0xc6/0x100 > [<ffffffff810868d0>] ? try_to_wake_up+0x220/0x220 > [<ffffffffa06bb9ba>] btrfs_wait_and_free_delalloc_work+0x1a/0x30 [btrfs] > [<ffffffffa06d458d>] btrfs_run_ordered_operations+0x1dd/0x2c0 [btrfs] > [<ffffffffa06b7fd5>] btrfs_flush_all_pending_stuffs+0x35/0x40 [btrfs] > [<ffffffffa06ba099>] btrfs_commit_transaction+0x229/0xa30 [btrfs] > [<ffffffff8105ef30>] ? lock_timer_base+0x70/0x70 > [<ffffffffa06b51db>] transaction_kthread+0x1eb/0x270 [btrfs] > [<ffffffffa06b4ff0>] ? close_ctree+0x2d0/0x2d0 [btrfs] > [<ffffffff8107544e>] kthread+0xce/0xf0 > [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70 > [<ffffffff8159636c>] ret_from_fork+0x7c/0xb0 > [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70 > > It looks like some more work may still be needed. Or it could be a problem in my system configuration. > Umm, after applying Chris's patch to my environment, xfstests ran completely and the above messages were not output. (Are above messages another bug?) Thanks, Tsutomu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, It may be caused by an corrupted btrfs filesystem due to the repeating hanging during the test. As long as your guys don't any problem, I am happy with the patch. Thank, Longman -----Original Message----- From: Tsutomu Itoh [mailto:t-itoh@jp.fujitsu.com] Sent: Thursday, June 19, 2014 11:21 PM To: Chris Mason Cc: Long, Wai Man; Marc Dionne; Josef Bacik; linux-btrfs@vger.kernel.org Subject: Re: Lockups with btrfs on 3.16-rc1 - bisected On 2014/06/20 8:21, Waiman Long wrote: > On 06/19/2014 05:50 PM, Chris Mason wrote: >>>> >>>> I would like to take back my comments. I took out the read_lock, >>>> but the process still hang while doing file activities on btrfs >>>> filesystem. So the problem is trickier than I thought. Below are >>>> the stack backtraces of some of the relevant processes. >>>> >>> You weren't wrong, but it was also the tree trylock code. Our >>> trylocks only back off if the blocking lock is held. >>> btrfs_next_leaf needs it to be a true trylock. The confusing part >>> is this hasn't really changed, but one of the callers must be a spinner where we used to have a blocker. >> This is what I have queued up, it's working here. >> >> -chris >> >> commit ea4ebde02e08558b020c4b61bb9a4c0fcf63028e >> Author: Chris Mason<clm@fb.com> >> Date: Thu Jun 19 14:16:52 2014 -0700 >> >> Btrfs: fix deadlocks with trylock on tree nodes >> >> The Btrfs tree trylock function is poorly named. It always takes >> the spinlock and backs off if the blocking lock is held. This >> can lead to surprising lockups because people expect it to really be a >> trylock. >> >> This commit makes it a pure trylock, both for the spinlock and the >> blocking lock. It also reworks the nested lock handling slightly to >> avoid taking the read lock while a spinning write lock might be held. >> >> Signed-off-by: Chris Mason<clm@fb.com> > > I didn't realize that those non-blocking lock functions are really trylocks. Yes, the patch did seem to fix the hanging problem that I saw when I just untar the kernel source files into a btrfs filesystem. However, when I tried did a kernel build on a 24-thread (-j 24) system, the build process hanged after a while. The following kind of stack trace messages were printed: > > INFO: task btrfs-transacti:16576 blocked for more than 120 seconds. > Tainted: G E 3.16.0-rc1 #5 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > btrfs-transacti D 000000000000000f 0 16576 2 0x00000000 > ffff88080eabbbf8 0000000000000046 ffff880803b98350 ffff88080eab8010 > 0000000000012b80 0000000000012b80 ffff880805ed8f10 ffff88080d162310 > ffff88080eabbce8 ffff8807be170880 ffff8807be170888 7fffffffffffffff > Call Trace: > [<ffffffff81592de9>] schedule+0x29/0x70 > [<ffffffff815920bd>] schedule_timeout+0x13d/0x1d0 > [<ffffffff8106b474>] ? wake_up_worker+0x24/0x30 > [<ffffffff8106d595>] ? insert_work+0x65/0xb0 > [<ffffffff81593cc6>] wait_for_completion+0xc6/0x100 > [<ffffffff810868d0>] ? try_to_wake_up+0x220/0x220 > [<ffffffffa06bb9ba>] btrfs_wait_and_free_delalloc_work+0x1a/0x30 [btrfs] > [<ffffffffa06d458d>] btrfs_run_ordered_operations+0x1dd/0x2c0 [btrfs] > [<ffffffffa06b7fd5>] btrfs_flush_all_pending_stuffs+0x35/0x40 [btrfs] > [<ffffffffa06ba099>] btrfs_commit_transaction+0x229/0xa30 [btrfs] > [<ffffffff8105ef30>] ? lock_timer_base+0x70/0x70 > [<ffffffffa06b51db>] transaction_kthread+0x1eb/0x270 [btrfs] > [<ffffffffa06b4ff0>] ? close_ctree+0x2d0/0x2d0 [btrfs] > [<ffffffff8107544e>] kthread+0xce/0xf0 > [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70 > [<ffffffff8159636c>] ret_from_fork+0x7c/0xb0 > [<ffffffff81075380>] ? kthread_freezable_should_stop+0x70/0x70 > > It looks like some more work may still be needed. Or it could be a problem in my system configuration. > Umm, after applying Chris's patch to my environment, xfstests ran completely and the above messages were not output. (Are above messages another bug?) Thanks, Tsutomu -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/btrfs/locking.c b/fs/btrfs/locking.c index 01277b8..5665d21 100644 --- a/fs/btrfs/locking.c +++ b/fs/btrfs/locking.c @@ -33,14 +33,14 @@ static void btrfs_assert_tree_read_locked(struct extent_buffer *eb); */ void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw) { - if (eb->lock_nested) { - read_lock(&eb->lock); - if (eb->lock_nested && current->pid == eb->lock_owner) { - read_unlock(&eb->lock); - return; - } - read_unlock(&eb->lock); - } + /* + * no lock is required. The lock owner may change if + * we have a read lock, but it won't change to or away + * from us. If we have the write lock, we are the owner + * and it'll never change. + */ + if (eb->lock_nested && current->pid == eb->lock_owner) + return; if (rw == BTRFS_WRITE_LOCK) { if (atomic_read(&eb->blocking_writers) == 0) { WARN_ON(atomic_read(&eb->spinning_writers) != 1); @@ -65,14 +65,15 @@ void btrfs_set_lock_blocking_rw(struct extent_buffer *eb, int rw) */ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw) { - if (eb->lock_nested) { - read_lock(&eb->lock); - if (eb->lock_nested && current->pid == eb->lock_owner) { - read_unlock(&eb->lock); - return; - } - read_unlock(&eb->lock); - } + /* + * no lock is required. The lock owner may change if + * we have a read lock, but it won't change to or away + * from us. If we have the write lock, we are the owner + * and it'll never change. + */ + if (eb->lock_nested && current->pid == eb->lock_owner) + return; + if (rw == BTRFS_WRITE_LOCK_BLOCKING) { BUG_ON(atomic_read(&eb->blocking_writers) != 1); write_lock(&eb->lock); @@ -99,6 +100,9 @@ void btrfs_clear_lock_blocking_rw(struct extent_buffer *eb, int rw) void btrfs_tree_read_lock(struct extent_buffer *eb) { again: + BUG_ON(!atomic_read(&eb->blocking_writers) && + current->pid == eb->lock_owner); + read_lock(&eb->lock); if (atomic_read(&eb->blocking_writers) && current->pid == eb->lock_owner) { @@ -132,7 +136,9 @@ int btrfs_try_tree_read_lock(struct extent_buffer *eb) if (atomic_read(&eb->blocking_writers)) return 0; - read_lock(&eb->lock); + if (!read_trylock(&eb->lock)) + return 0; + if (atomic_read(&eb->blocking_writers)) { read_unlock(&eb->lock); return 0; @@ -151,7 +157,10 @@ int btrfs_try_tree_write_lock(struct extent_buffer *eb) if (atomic_read(&eb->blocking_writers) || atomic_read(&eb->blocking_readers)) return 0; - write_lock(&eb->lock); + + if (!write_trylock(&eb->lock)) + return 0; + if (atomic_read(&eb->blocking_writers) || atomic_read(&eb->blocking_readers)) { write_unlock(&eb->lock); @@ -168,14 +177,15 @@ int btrfs_try_tree_write_lock(struct extent_buffer *eb) */ void btrfs_tree_read_unlock(struct extent_buffer *eb) { - if (eb->lock_nested) { - read_lock(&eb->lock); - if (eb->lock_nested && current->pid == eb->lock_owner) { - eb->lock_nested = 0; - read_unlock(&eb->lock); - return; - } - read_unlock(&eb->lock); + /* + * if we're nested, we have the write lock. No new locking + * is needed as long as we are the lock owner. + * The write unlock will do a barrier for us, and the lock_nested + * field only matters to the lock owner. + */ + if (eb->lock_nested && current->pid == eb->lock_owner) { + eb->lock_nested = 0; + return; } btrfs_assert_tree_read_locked(eb); WARN_ON(atomic_read(&eb->spinning_readers) == 0); @@ -189,14 +199,15 @@ void btrfs_tree_read_unlock(struct extent_buffer *eb) */ void btrfs_tree_read_unlock_blocking(struct extent_buffer *eb) { - if (eb->lock_nested) { - read_lock(&eb->lock); - if (eb->lock_nested && current->pid == eb->lock_owner) { - eb->lock_nested = 0; - read_unlock(&eb->lock); - return; - } - read_unlock(&eb->lock); + /* + * if we're nested, we have the write lock. No new locking + * is needed as long as we are the lock owner. + * The write unlock will do a barrier for us, and the lock_nested + * field only matters to the lock owner. + */ + if (eb->lock_nested && current->pid == eb->lock_owner) { + eb->lock_nested = 0; + return; } btrfs_assert_tree_read_locked(eb); WARN_ON(atomic_read(&eb->blocking_readers) == 0); @@ -244,6 +255,7 @@ void btrfs_tree_unlock(struct extent_buffer *eb) BUG_ON(blockers > 1); btrfs_assert_tree_locked(eb); + eb->lock_owner = 0; atomic_dec(&eb->write_locks); if (blockers) {