[1/2] btrfs: reada: limit max works count

Message ID 9d3a1b584cec0081382f832ab0a7f9b31b1d9798.1452584763.git.zhaolei@cn.fujitsu.com (mailing list archive)
State Superseded

Commit Message

Zhaolei Jan. 12, 2016, 7:46 a.m. UTC
reada creates 2 works for each level of the tree, recursively.

For a tree with many levels, the number of works created is
2^level_of_tree.
We don't actually need that many works in parallel, so this patch limits
the maximum number of works to BTRFS_MAX_MIRRORS * 2.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
---
 fs/btrfs/reada.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
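
As a rough illustration of the growth described in the commit message, the
sketch below models the number of works queued at a given recursion depth
when every work queues two more, and the effect of clamping that number.
The names works_at_level(), capped_works() and MODEL_MAX_WORKS are
hypothetical and do not appear in fs/btrfs/reada.c; the value 6 is only a
placeholder for BTRFS_MAX_MIRRORS * 2, not a statement of what that
expression evaluates to in any kernel version.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder cap standing in for BTRFS_MAX_MIRRORS * 2 (assumed value). */
#define MODEL_MAX_WORKS 6

/* Works queued at recursion depth `level` when each work queues two more. */
static uint64_t works_at_level(unsigned int level)
{
	assert(level < 64);	/* keep the shift well defined */
	return (uint64_t)1 << level;
}

/* The same count, clamped to `cap` as the patch intends. */
static uint64_t capped_works(unsigned int level, uint64_t cap)
{
	uint64_t n = works_at_level(level);

	return n < cap ? n : cap;
}
```

In this model an unbounded depth of 10 gives 1024 queued works, while
capped_works(10, MODEL_MAX_WORKS) stays at the cap.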

Comments

Chris Mason Jan. 20, 2016, 3:16 p.m. UTC | #1
On Tue, Jan 12, 2016 at 03:46:26PM +0800, Zhao Lei wrote:
> reada create 2 works for each level of tree in recursion.
> 
> In case of a tree having many levels, the number of created works
> is 2^level_of_tree.
> Actually we don't need so many works in parallel, this patch limit
> max works to BTRFS_MAX_MIRRORS * 2.

Hi,

I don't think you end up calling atomic_dec() every time that
reada_start_machine() is called.  Also, I'd rather not have a global
static variable to limit the parallel workers; when we have more than
one FS mounted it'll end up limiting things too much.
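
A minimal sketch of one way to address both points, written as a userspace
C11 model rather than kernel code: the counter lives in per-filesystem
state instead of a file-scope static, and a slot is reserved before
queueing and released on every exit path so that increments and decrements
stay paired. struct fs_model and the reada_work_*() helpers are
hypothetical names; nothing here is taken from the actual btrfs sources.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-filesystem state; not a real btrfs structure. */
struct fs_model {
	atomic_int reada_works;	/* works currently queued or running */
	int max_works;		/* per-filesystem cap */
};

static void fs_model_init(struct fs_model *fs, int max_works)
{
	atomic_init(&fs->reada_works, 0);
	fs->max_works = max_works;
}

/* Reserve a slot before queueing; returns false once the cap is reached. */
static bool reada_work_reserve(struct fs_model *fs)
{
	int old = atomic_fetch_add(&fs->reada_works, 1);

	if (old >= fs->max_works) {
		/* Over the cap: undo the reservation and refuse. */
		atomic_fetch_sub(&fs->reada_works, 1);
		return false;
	}
	return true;
}

/* Release the slot when the worker finishes or queueing fails. */
static void reada_work_release(struct fs_model *fs)
{
	int old = atomic_fetch_sub(&fs->reada_works, 1);

	assert(old > 0);	/* catches an unpaired release */
}
```

Because each struct fs_model carries its own counter, reaching the cap on
one filesystem does not throttle another mounted filesystem.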

With this patch applied, I'm seeing deadlocks during btrfs/066.  You
have to run the scrub tests as well; basically we're just getting
fsstress run alongside scrub.

I'll run a few more times with it reverted to make sure, but I think
it's the root cause.

-----
stack summary

6 hits: 
[<ffffffff813ec92a>] wait_current_trans+0xca/0x140
[<ffffffff813ee248>] start_transaction+0x278/0x5b0
[<ffffffff813eea97>] btrfs_attach_transaction_barrier+0x27/0x60
[<ffffffff813b4835>] btrfs_sync_fs+0x85/0x1d0
[<ffffffff8122bcf0>] sync_fs_one_sb+0x20/0x30
[<ffffffff811f579f>] iterate_supers+0xaf/0xe0
[<ffffffff8122c1e5>] sys_sync+0x55/0x90
[<ffffffff819c00c7>] tracesys_phase2+0x84/0x89
[<ffffffffffffffff>] 0xffffffffffffffff

-----
1 hit:
[<ffffffff813ec92a>] wait_current_trans+0xca/0x140
[<ffffffff813ee248>] start_transaction+0x278/0x5b0
[<ffffffff813ee597>] btrfs_attach_transaction+0x17/0x20
[<ffffffff813e6b27>] transaction_kthread+0x1b7/0x290
[<ffffffff81082e09>] kthread+0xe9/0x110
[<ffffffff819c02ff>] ret_from_fork+0x3f/0x70
[<ffffffffffffffff>] 0xffffffffffffffff

-----
[<ffffffff814506cf>] btrfs_scrub_pause+0xdf/0x150
[<ffffffff813ed2f4>] btrfs_commit_transaction+0x3b4/0xc70
[<ffffffff81424724>] create_subvol+0x504/0x8d0
[<ffffffff81424c63>] btrfs_mksubvol+0x173/0x510
[<ffffffff8142511e>] btrfs_ioctl_snap_create_transid+0x11e/0x1a0
[<ffffffff814251fe>] btrfs_ioctl_snap_create+0x5e/0x80
[<ffffffff8142dbbb>] btrfs_ioctl+0xc6b/0x1190
[<ffffffff8120624a>] do_vfs_ioctl+0x8a/0x560
[<ffffffff812067b2>] SyS_ioctl+0x92/0xa0
[<ffffffff819c00c7>] tracesys_phase2+0x84/0x89
[<ffffffffffffffff>] 0xffffffffffffffff

-----
[<ffffffff81458d36>] btrfs_reada_wait+0x86/0xf0
[<ffffffff81456dc4>] scrub_stripe+0x274/0x1180
[<ffffffff81457de9>] scrub_chunk+0x119/0x160
[<ffffffff814581b7>] scrub_enumerate_chunks+0x387/0x730
[<ffffffff81458740>] btrfs_scrub_dev+0x1e0/0x620
[<ffffffff8142b7d1>] btrfs_ioctl_scrub+0xb1/0x120
[<ffffffff8142d970>] btrfs_ioctl+0xa20/0x1190
[<ffffffff8120624a>] do_vfs_ioctl+0x8a/0x560
[<ffffffff812067b2>] SyS_ioctl+0x92/0xa0
[<ffffffff819c00c7>] tracesys_phase2+0x84/0x89
[<ffffffffffffffff>] 0xffffffffffffffff

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chris Mason Jan. 20, 2016, 5:48 p.m. UTC | #2
On Wed, Jan 20, 2016 at 10:16:27AM -0500, Chris Mason wrote:
> On Tue, Jan 12, 2016 at 03:46:26PM +0800, Zhao Lei wrote:
> > reada create 2 works for each level of tree in recursion.
> > 
> > In case of a tree having many levels, the number of created works
> > is 2^level_of_tree.
> > Actually we don't need so many works in parallel, this patch limit
> > max works to BTRFS_MAX_MIRRORS * 2.
> 
> Hi,
> 
> I don't think you end up calling atomic_dec() for every time that
> reada_start_machine() is called.  Also, I'd rather not have a global
> static variable to limit the parallel workers, when we have more than
> one FS mounted it'll end up limiting things too much.
> 
> With this patch applied, I'm seeing deadlocks during btrfs/066.    You
> have to run the scrub tests as well, basically we're just getting
> fsstress run alongside scrub.
> 
> I'll run a few more times with it reverted to make sure, but I think
> it's the root cause.

I spoke too soon; it ended up deadlocking a few tests later.  Sorry, for
now I'm pulling all the reada patches.  We'll sort out bug fixes vs
cleanups in later rcs.

With all of the reada patches removed, the deadlocks are gone.

-chris
Zhaolei Jan. 21, 2016, 3:36 a.m. UTC | #3
> -----Original Message-----
> From: Chris Mason [mailto:clm@fb.com]
> Sent: Thursday, January 21, 2016 1:48 AM
> To: Zhao Lei <zhaolei@cn.fujitsu.com>; linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH 1/2] btrfs: reada: limit max works count
> 
> On Wed, Jan 20, 2016 at 10:16:27AM -0500, Chris Mason wrote:
> > On Tue, Jan 12, 2016 at 03:46:26PM +0800, Zhao Lei wrote:
> > > reada create 2 works for each level of tree in recursion.
> > >
> > > In case of a tree having many levels, the number of created works is
> > > 2^level_of_tree.
> > > Actually we don't need so many works in parallel, this patch limit
> > > max works to BTRFS_MAX_MIRRORS * 2.
> >
> > Hi,
> >
> > I don't think you end up calling atomic_dec() for every time that
> > reada_start_machine() is called.  Also, I'd rather not have a global
> > static variable to limit the parallel workers, when we have more than
> > one FS mounted it'll end up limiting things too much.
> >
> > With this patch applied, I'm seeing deadlocks during btrfs/066.    You
> > have to run the scrub tests as well, basically we're just getting
> > fsstress run alongside scrub.
> >
> > I'll run a few more times with it reverted to make sure, but I think
> > it's the root cause.
> 
> I spoke too soon, it ended up deadlocking a few tests later.  Sorry for now I'm
> pulling all the reada patches.  We'll sort out bug fixes vs cleanups in later rcs.
> 
> With all of the reada patches removed, the deadlocks are gone.
> 
Sorry to hear that.

I actually ran xfstests with all the patches applied, and saw no regression in my environment:

FSTYP         -- btrfs
PLATFORM      -- Linux/x86_64 lenovo 4.4.0-rc6_HEAD_8e16378041f7f3531c256fd3e17a36a4fca92d29_+
MKFS_OPTIONS  -- /dev/sdb6
MOUNT_OPTIONS -- /dev/sdb6 /var/ltf/tester/scratch_mnt

btrfs/066 151s ... 164s
Ran: btrfs/066
Passed all 1 tests

I'll investigate the root cause.

Thanks
Zhaolei

> -chris




Zhaolei Jan. 21, 2016, 10:06 a.m. UTC | #4
Hi, Chris Mason

> -----Original Message-----
> From: Chris Mason [mailto:clm@fb.com]
> Sent: Thursday, January 21, 2016 1:48 AM
> To: Zhao Lei <zhaolei@cn.fujitsu.com>; linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH 1/2] btrfs: reada: limit max works count
> 
> On Wed, Jan 20, 2016 at 10:16:27AM -0500, Chris Mason wrote:
> > On Tue, Jan 12, 2016 at 03:46:26PM +0800, Zhao Lei wrote:
> > > reada create 2 works for each level of tree in recursion.
> > >
> > > In case of a tree having many levels, the number of created works is
> > > 2^level_of_tree.
> > > Actually we don't need so many works in parallel, this patch limit
> > > max works to BTRFS_MAX_MIRRORS * 2.
> >
> > Hi,
> >
> > I don't think you end up calling atomic_dec() for every time that
> > reada_start_machine() is called.  Also, I'd rather not have a global
> > static variable to limit the parallel workers, when we have more than
> > one FS mounted it'll end up limiting things too much.
> >
> > With this patch applied, I'm seeing deadlocks during btrfs/066.    You
> > have to run the scrub tests as well, basically we're just getting
> > fsstress run alongside scrub.
> >
> > I'll run a few more times with it reverted to make sure, but I think
> > it's the root cause.
> 
> I spoke too soon, it ended up deadlocking a few tests later.
>
Logically, even if the atomic_dec() accounting in this patch has a bug,
in the worst case reada will run in single-threaded mode, and that should
not introduce a deadlock.

Looking at the backtrace in this mail, it may be caused by
reada_control->elems somewhere in this patchset.

I rechecked xfstests/066 many times today on both a VM and a physical machine,
on top of my pull-request git tree with btrfs-progs 4.4, but could not trigger the bug.

Could you tell me your test environment (TEST_DEV size, mount options)
and how often btrfs/066 fails?

Thanks
Zhaolei

> Sorry for now I'm pulling all the reada patches.  We'll sort out bug fixes vs cleanups in later rcs.
> 
> With all of the reada patches removed, the deadlocks are gone.
> 
> -chris




Chris Mason Jan. 21, 2016, 2:14 p.m. UTC | #5
On Thu, Jan 21, 2016 at 06:06:21PM +0800, Zhao Lei wrote:
> Hi, Chris Mason
> 
> > -----Original Message-----
> > From: Chris Mason [mailto:clm@fb.com]
> > Sent: Thursday, January 21, 2016 1:48 AM
> > To: Zhao Lei <zhaolei@cn.fujitsu.com>; linux-btrfs@vger.kernel.org
> > Subject: Re: [PATCH 1/2] btrfs: reada: limit max works count
> > 
> > On Wed, Jan 20, 2016 at 10:16:27AM -0500, Chris Mason wrote:
> > > On Tue, Jan 12, 2016 at 03:46:26PM +0800, Zhao Lei wrote:
> > > > reada create 2 works for each level of tree in recursion.
> > > >
> > > > In case of a tree having many levels, the number of created works is
> > > > 2^level_of_tree.
> > > > Actually we don't need so many works in parallel, this patch limit
> > > > max works to BTRFS_MAX_MIRRORS * 2.
> > >
> > > Hi,
> > >
> > > I don't think you end up calling atomic_dec() for every time that
> > > reada_start_machine() is called.  Also, I'd rather not have a global
> > > static variable to limit the parallel workers, when we have more than
> > > one FS mounted it'll end up limiting things too much.
> > >
> > > With this patch applied, I'm seeing deadlocks during btrfs/066.    You
> > > have to run the scrub tests as well, basically we're just getting
> > > fsstress run alongside scrub.
> > >
> > > I'll run a few more times with it reverted to make sure, but I think
> > > it's the root cause.
> > 
> > I spoke too soon, it ended up deadlocking a few tests later.
> >
> In logic, even if the calculation of atomic_dec() in this patch having bug,
> in worst condition, reada will works in single-thread mode, and will not
> introduce deadlock.
> 
> And by looking the backtrace in this mail, maybe it is caused by
> reada_control->elems in someplace of this patchset.
> 
> I recheck xfstests/066 in both vm and physical machine, on top of my pull-request
> git today, with btrfs-progs 4.4 for many times, but had not triggered the bug.

Just running 066 alone doesn't trigger it for me.  I have to run
everything from 00->066.

My setup is 5 drives.  I use a script to carve them up into logical
volumes, 5 for the test device and 5 for the scratch pool.  I think it
should reproduce with a single drive, if you still can't trigger I'll
confirm that.

> 
> Could you tell me your test environment(TEST_DEV size, mount option),
> and odds of fails in btrfs/066?

100% odds of failing, although one time it made it up to btrfs/072.  I
think more important than the drive setup is that I have all the
debugging on: CONFIG_DEBUG_PAGEALLOC, spinlock debugging, mutex debugging
and lockdep enabled.

-chris
Zhaolei Jan. 22, 2016, 12:25 p.m. UTC | #6
Hi, Chris Mason

> -----Original Message-----
> From: Chris Mason [mailto:clm@fb.com]
> Sent: Thursday, January 21, 2016 10:15 PM
> To: Zhao Lei <zhaolei@cn.fujitsu.com>
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH 1/2] btrfs: reada: limit max works count
> 
> On Thu, Jan 21, 2016 at 06:06:21PM +0800, Zhao Lei wrote:
> > Hi, Chris Mason
> >
> > > -----Original Message-----
> > > From: Chris Mason [mailto:clm@fb.com]
> > > Sent: Thursday, January 21, 2016 1:48 AM
> > > To: Zhao Lei <zhaolei@cn.fujitsu.com>; linux-btrfs@vger.kernel.org
> > > Subject: Re: [PATCH 1/2] btrfs: reada: limit max works count
> > >
> > > On Wed, Jan 20, 2016 at 10:16:27AM -0500, Chris Mason wrote:
> > > > On Tue, Jan 12, 2016 at 03:46:26PM +0800, Zhao Lei wrote:
> > > > > reada create 2 works for each level of tree in recursion.
> > > > >
> > > > > In case of a tree having many levels, the number of created
> > > > > works is 2^level_of_tree.
> > > > > Actually we don't need so many works in parallel, this patch
> > > > > limit max works to BTRFS_MAX_MIRRORS * 2.
> > > >
> > > > Hi,
> > > >
> > > > I don't think you end up calling atomic_dec() for every time that
> > > > reada_start_machine() is called.  Also, I'd rather not have a
> > > > global static variable to limit the parallel workers, when we have
> > > > more than one FS mounted it'll end up limiting things too much.
> > > >
> > > > With this patch applied, I'm seeing deadlocks during btrfs/066.    You
> > > > have to run the scrub tests as well, basically we're just getting
> > > > fsstress run alongside scrub.
> > > >
> > > > I'll run a few more times with it reverted to make sure, but I
> > > > think it's the root cause.
> > >
> > > I spoke too soon, it ended up deadlocking a few tests later.
> > >
> > In logic, even if the calculation of atomic_dec() in this patch having
> > bug, in worst condition, reada will works in single-thread mode, and
> > will not introduce deadlock.
> >
> > And by looking the backtrace in this mail, maybe it is caused by
> > reada_control->elems in someplace of this patchset.
> >
> > I recheck xfstests/066 in both vm and physical machine, on top of my
> > pull-request git today, with btrfs-progs 4.4 for many times, but had not
> > triggered the bug.
> 
> Just running 066 alone doesn't trigger it for me.  I have to run everything from
> 00->066.
> 
> My setup is 5 drives.  I use a script to carve them up into logical volumes, 5 for
> the test device and 5 for the scratch pool.  I think it should reproduce with a
> single drive, if you still can't trigger I'll confirm that.
> 
> >
> > Could you tell me your test environment(TEST_DEV size, mount option),
> > and odds of fails in btrfs/066?
> 
> 100% odds of failing, one time it made it up to btrfs/072.  I think more
> important than the drive setup is that I have all the debugging on.
> CONFIG_DEBUG_PAGEALLOC, spinlock debugging, mutex debugging and lock
> dep enabled.
> 
Thanks for your answer.

Unfortunately I have not been able to reproduce the deadlock that way today.
I have queued a loop of the reproduction steps above on more nodes, and hope
it triggers over the weekend.

While reviewing the code, I found a problem that could logically produce a
similar failure, and made a patch for it:
[PATCH] [RFC] btrfs: reada: avoid undone reada extents in btrfs_reada_wait

Because the problem is only a logical one and rarely happens, I have only
confirmed that nothing breaks with the patch applied.

Sorry to add to your workload, but could you apply this patch and test whether
it works?

Thanks
Zhaolei

> -chris




Chris Mason Jan. 22, 2016, 2:19 p.m. UTC | #7
On Fri, Jan 22, 2016 at 08:25:56PM +0800, Zhao Lei wrote:
> Hi, Chris Mason
> 
> > -----Original Message-----
> > From: Chris Mason [mailto:clm@fb.com]
> > Sent: Thursday, January 21, 2016 10:15 PM
> > To: Zhao Lei <zhaolei@cn.fujitsu.com>
> > Cc: linux-btrfs@vger.kernel.org
> > Subject: Re: [PATCH 1/2] btrfs: reada: limit max works count
> > 
> > On Thu, Jan 21, 2016 at 06:06:21PM +0800, Zhao Lei wrote:
> > > Hi, Chris Mason
> > >
> > > > -----Original Message-----
> > > > From: Chris Mason [mailto:clm@fb.com]
> > > > Sent: Thursday, January 21, 2016 1:48 AM
> > > > To: Zhao Lei <zhaolei@cn.fujitsu.com>; linux-btrfs@vger.kernel.org
> > > > Subject: Re: [PATCH 1/2] btrfs: reada: limit max works count
> > > >
> > > > On Wed, Jan 20, 2016 at 10:16:27AM -0500, Chris Mason wrote:
> > > > > On Tue, Jan 12, 2016 at 03:46:26PM +0800, Zhao Lei wrote:
> > > > > > reada create 2 works for each level of tree in recursion.
> > > > > >
> > > > > > In case of a tree having many levels, the number of created
> > > > > > works is 2^level_of_tree.
> > > > > > Actually we don't need so many works in parallel, this patch
> > > > > > limit max works to BTRFS_MAX_MIRRORS * 2.
> > > > >
> > > > > Hi,
> > > > >
> > > > > I don't think you end up calling atomic_dec() for every time that
> > > > > reada_start_machine() is called.  Also, I'd rather not have a
> > > > > global static variable to limit the parallel workers, when we have
> > > > > more than one FS mounted it'll end up limiting things too much.
> > > > >
> > > > > With this patch applied, I'm seeing deadlocks during btrfs/066.    You
> > > > > have to run the scrub tests as well, basically we're just getting
> > > > > fsstress run alongside scrub.
> > > > >
> > > > > I'll run a few more times with it reverted to make sure, but I
> > > > > think it's the root cause.
> > > >
> > > > I spoke too soon, it ended up deadlocking a few tests later.
> > > >
> > > In logic, even if the calculation of atomic_dec() in this patch having
> > > bug, in worst condition, reada will works in single-thread mode, and
> > > will not introduce deadlock.
> > >
> > > And by looking the backtrace in this mail, maybe it is caused by
> > > reada_control->elems in someplace of this patchset.
> > >
> > > > I recheck xfstests/066 in both vm and physical machine, on top of my
> > > > pull-request git today, with btrfs-progs 4.4 for many times, but had not
> > > > triggered the bug.
> > 
> > Just running 066 alone doesn't trigger it for me.  I have to run everything from
> > 00->066.
> > 
> > My setup is 5 drives.  I use a script to carve them up into logical volumes, 5 for
> > the test device and 5 for the scratch pool.  I think it should reproduce with a
> > single drive, if you still can't trigger I'll confirm that.
> > 
> > >
> > > Could you tell me your test environment(TEST_DEV size, mount option),
> > > and odds of fails in btrfs/066?
> > 
> > 100% odds of failing, one time it made it up to btrfs/072.  I think more
> > important than the drive setup is that I have all the debugging on.
> > CONFIG_DEBUG_PAGEALLOC, spinlock debugging, mutex debugging and lock
> > dep enabled.
> > 
> Thanks for your answer.
> 
> But unfortunately I hadn't reproduce the dead_lock in above way today...
> Now I queued loop of above reproduce script in more nodes, and hopes
> it can happen in this weekend.
> 
> And by reviewing code, I found a problem which can introduce similar bad result
> in logic, and made a patch for it.
> [PATCH] [RFC] btrfs: reada: avoid undone reada extents in btrfs_reada_wait
> 
> Because it is only a problem in logic, but rarely happened, I only confirmed
> no-problem after patch applied.
> 
> Sorry for increased your works, could you apply this patch and test is it
> works?

No problem, I'll try the patch and see if I can get a more reliable way
to reproduce if it doesn't fix things.  Thanks!

-chris

Zhaolei Jan. 26, 2016, 9:08 a.m. UTC | #8
Hi, Chris Mason

> -----Original Message-----
> From: Chris Mason [mailto:clm@fb.com]
> Sent: Friday, January 22, 2016 10:19 PM
> To: Zhao Lei <zhaolei@cn.fujitsu.com>
> Cc: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH 1/2] btrfs: reada: limit max works count
> 
> On Fri, Jan 22, 2016 at 08:25:56PM +0800, Zhao Lei wrote:
> > Hi, Chris Mason
> >
> > > -----Original Message-----
> > > From: Chris Mason [mailto:clm@fb.com]
> > > Sent: Thursday, January 21, 2016 10:15 PM
> > > To: Zhao Lei <zhaolei@cn.fujitsu.com>
> > > Cc: linux-btrfs@vger.kernel.org
> > > Subject: Re: [PATCH 1/2] btrfs: reada: limit max works count
> > >
> > > On Thu, Jan 21, 2016 at 06:06:21PM +0800, Zhao Lei wrote:
> > > > Hi, Chris Mason
> > > >
> > > > > -----Original Message-----
> > > > > From: Chris Mason [mailto:clm@fb.com]
> > > > > Sent: Thursday, January 21, 2016 1:48 AM
> > > > > To: Zhao Lei <zhaolei@cn.fujitsu.com>;
> > > > > linux-btrfs@vger.kernel.org
> > > > > Subject: Re: [PATCH 1/2] btrfs: reada: limit max works count
> > > > >
> > > > > On Wed, Jan 20, 2016 at 10:16:27AM -0500, Chris Mason wrote:
> > > > > > On Tue, Jan 12, 2016 at 03:46:26PM +0800, Zhao Lei wrote:
> > > > > > > reada create 2 works for each level of tree in recursion.
> > > > > > >
> > > > > > > In case of a tree having many levels, the number of created
> > > > > > > works is 2^level_of_tree.
> > > > > > > Actually we don't need so many works in parallel, this patch
> > > > > > > limit max works to BTRFS_MAX_MIRRORS * 2.
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > I don't think you end up calling atomic_dec() for every time
> > > > > > > that reada_start_machine() is called.  Also, I'd rather not have a
> > > > > > > global static variable to limit the parallel workers, when we
> > > > > > > have more than one FS mounted it'll end up limiting things too much.
> > > > > > >
> > > > > > > With this patch applied, I'm seeing deadlocks during btrfs/066.
> > > > > > > You have to run the scrub tests as well, basically we're just
> > > > > > > getting fsstress run alongside scrub.
> > > > > >
> > > > > > I'll run a few more times with it reverted to make sure, but I
> > > > > > think it's the root cause.
> > > > >
> > > > > I spoke too soon, it ended up deadlocking a few tests later.
> > > > >
> > > > In logic, even if the calculation of atomic_dec() in this patch
> > > > having bug, in worst condition, reada will works in single-thread
> > > > mode, and will not introduce deadlock.
> > > >
> > > > And by looking the backtrace in this mail, maybe it is caused by
> > > > reada_control->elems in someplace of this patchset.
> > > >
> > > > I recheck xfstests/066 in both vm and physical machine, on top of
> > > > my pull-request git today, with btrfs-progs 4.4 for many times,
> > > > > but had not triggered the bug.
> > > >
> > > > Just running 066 alone doesn't trigger it for me.  I have to run
> > > > everything from 00->066.
> > >
> > > My setup is 5 drives.  I use a script to carve them up into logical
> > > volumes, 5 for the test device and 5 for the scratch pool.  I think
> > > > it should reproduce with a single drive, if you still can't trigger I'll
> > > > confirm that.
> > >
> > > >
> > > > Could you tell me your test environment(TEST_DEV size, mount
> > > > option), and odds of fails in btrfs/066?
> > >
> > > 100% odds of failing, one time it made it up to btrfs/072.  I think
> > > more important than the drive setup is that I have all the debugging on.
> > > > CONFIG_DEBUG_PAGEALLOC, spinlock debugging, mutex debugging and
> > > > lock dep enabled.
> > >
> > Thanks for your answer.
> >
> > But unfortunately I hadn't reproduce the dead_lock in above way today...
> > Now I queued loop of above reproduce script in more nodes, and hopes
> > it can happen in this weekend.
> >
> > And by reviewing code, I found a problem which can introduce similar
> > bad result in logic, and made a patch for it.
> > [PATCH] [RFC] btrfs: reada: avoid undone reada extents in
> > btrfs_reada_wait
> >
> > Because it is only a problem in logic, but rarely happened, I only
> > confirmed no-problem after patch applied.
> >
> > Sorry for increased your works, could you apply this patch and test is
> > it works?
> 
> No problem, I'll try the patch and see if I can get a more reliable way to
> reproduce if it doesn't fix things.  Thanks!
> 
Thanks for your helpful pointers.

I reproduced the bug on one of my nodes,
and I found the cause.

1: The background read thread in reada is not designed to complete
  all the work, as described above in this mail, plus the additional
  case below.

2: For DUP, the current code creates 2 zones, and one of them
  is a "dummy" (we only read the first stripe for DUP).
  When the "dummy" zone is selected, the current code skips the
  read and just does a cleanup.
  To keep the logic simple, it returns without re-selecting a zone in
  this case, which makes the reada thread more likely to exit.
  So, in the DUP case, more background threads exit before all the work
  is done. That is why btrfs/066 always hangs with the DUP profile.

3: This problem exists in the old code too, but rarely happens. My patchset
  triggers it because:
  a. It limits the number of background threads
    PATCH: btrfs: reada: limit max works count
    In the old code there are more background threads, and if one of them
    exits, the remaining threads continue with the remaining extents.
  b. It reduces thread lifetime
    PATCH: btrfs: reada: Avoid many times of empty loop
    Thread lifetime is reduced, which widens the window in which no thread
    is running.

Fix:
  We have the following options for this problem:
  a. Do not add the dummy zone above for DUP
    This reduces the odds of hitting the problem, but because of the
    "device workload limit (MAX_IN_FLIGHT)" and the "total reads limit"
    in the code, the problem would still occur in rare cases.
  b. Let the reada background thread do all the work before exiting.
    This conflicts with the limits described in [a].
  c. Check that we have at least one thread in btrfs_reada_wait()
    This fixes the problem completely.
  So I will fix the problem with option "c", based on:
  [RFC] btrfs: reada: avoid undone reada extents in btrfs_reada_wait
  with some enhancements.

I'll make the fix and test it.
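
The idea behind option "c" can be sketched with a small single-threaded C
model. It assumes a worker that processes a bounded number of extents and
then exits even if extents remain, and a waiter that starts a new worker
whenever work is pending and no worker is running. struct reada_model,
worker_run() and reada_wait_model() are hypothetical names for
illustration only; this is not the code of btrfs_reada_wait() or of the
RFC patch.

```c
#include <assert.h>

/* Hypothetical model state; field names are not taken from reada.c. */
struct reada_model {
	int elems;	/* extents still waiting to be read */
	int workers;	/* background workers currently running */
	int restarts;	/* times the waiter had to start a worker */
};

/* A worker reads at most `budget` extents, then exits with work left. */
static void worker_run(struct reada_model *m, int budget)
{
	m->workers++;
	while (budget-- > 0 && m->elems > 0)
		m->elems--;
	m->workers--;
}

/*
 * Waiter: as long as extents remain, make sure at least one worker is
 * running. Without the restart, the loop would never finish once the
 * last worker had exited early.
 */
static void reada_wait_model(struct reada_model *m, int budget)
{
	assert(budget > 0);	/* a zero budget could never make progress */
	while (m->elems > 0) {
		if (m->workers == 0) {
			m->restarts++;
			worker_run(m, budget);
		}
	}
}
```

In the model, 10 pending extents with a budget of 3 per worker need four
restarts before the waiter returns; a real implementation would sleep
between checks instead of spinning.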

Thanks
Zhaolei

> -chris
> 




Zhaolei Jan. 28, 2016, 7:49 a.m. UTC | #9
Hi, Chris Mason

I rebased the following branch:
https://github.com/zhaoleidd/btrfs.git integration-4.5

with an updated patch to fix the btrfs/066 bug.
The cause of the bug is described in the changelog of:
btrfs: reada: avoid undone reada extents in btrfs_reada_wait

Test:
1: On the node which can reproduce the btrfs/066 bug,
  confirmed HAVING_BUG before the patch, and NO_BUG after the patch.
2: Ran xfstests' btrfs group, confirmed no regression.

Most patches in this branch are for reada, except this one for a NO_SPACE bug:
btrfs: Continue write in case of can_not_nocow

Could you consider merging it at a suitable time?

Thanks
Zhaolei

> -chris
> 




Chris Mason Jan. 28, 2016, 1:30 p.m. UTC | #10
On Thu, Jan 28, 2016 at 03:49:54PM +0800, Zhao Lei wrote:
> I rebased the following branch:
> https://github.com/zhaoleidd/btrfs.git integration-4.5
> 
> with an updated patch to fix the btrfs/066 bug.
> The cause of the bug is described in the changelog of:
> btrfs: reada: avoid undone reada extents in btrfs_reada_wait
> 
> Test:
> 1: On the node which can reproduce the btrfs/066 bug,
>   confirmed HAVING_BUG before the patch, and NO_BUG after the patch.
> 2: Ran xfstests' btrfs group, confirmed no regression.
> 
> Most patches in this branch are for reada, except this one for a NO_SPACE bug:
> btrfs: Continue write in case of can_not_nocow
> 
> Could you consider merging it at a suitable time?

Thanks for tracking all of this down, I'll take a look.

-chris
Patch

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 53ee7b1..7b150b2 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -103,6 +103,9 @@  static void __reada_start_machine(struct btrfs_fs_info *fs_info);
 static int reada_add_block(struct reada_control *rc, u64 logical,
 			   struct btrfs_key *top, u64 generation);
 
+/* To limit max reada works */
+static atomic_t works_cnt = ATOMIC_INIT(0);
+
 /* recurses */
 /* in case of err, eb might be NULL */
 static void __readahead_hook(struct btrfs_fs_info *fs_info,
@@ -759,6 +762,8 @@  static void reada_start_machine_worker(struct btrfs_work *work)
 	set_task_ioprio(current, BTRFS_IOPRIO_READA);
 	__reada_start_machine(fs_info);
 	set_task_ioprio(current, old_ioprio);
+
+	atomic_dec(&works_cnt);
 }
 
 static void __reada_start_machine(struct btrfs_fs_info *fs_info)
@@ -790,8 +795,11 @@  static void __reada_start_machine(struct btrfs_fs_info *fs_info)
 	 * enqueue to workers to finish it. This will distribute the load to
 	 * the cores.
 	 */
-	for (i = 0; i < 2; ++i)
+	for (i = 0; i < 2; ++i) {
 		reada_start_machine(fs_info);
+		if (atomic_read(&works_cnt) > BTRFS_MAX_MIRRORS * 2)
+			break;
+	}
 }
 
 static void reada_start_machine(struct btrfs_fs_info *fs_info)
@@ -808,6 +816,7 @@  static void reada_start_machine(struct btrfs_fs_info *fs_info)
 	rmw->fs_info = fs_info;
 
 	btrfs_queue_work(fs_info->readahead_workers, &rmw->work);
+	atomic_inc(&works_cnt);
 }
 
 #ifdef DEBUG