
Btrfs: fix a deadlock on chunk mutex

Message ID 20130131153317.GM3660@localhost.localdomain (mailing list archive)
State New, archived

Commit Message

Josef Bacik Jan. 31, 2013, 3:33 p.m. UTC
On Wed, Jan 30, 2013 at 02:37:40PM -0700, Jim Schutt wrote:
> On 01/30/2013 09:38 AM, Josef Bacik wrote:
> > On Tue, Jan 29, 2013 at 04:05:17PM -0700, Jim Schutt wrote:
> > > On 01/29/2013 01:04 PM, Josef Bacik wrote:
> > > > On Tue, Jan 29, 2013 at 11:41:10AM -0700, Jim Schutt wrote:
> > > > > On 01/28/2013 02:23 PM, Josef Bacik wrote:
> > > > > > On Thu, Jan 03, 2013 at 11:44:46AM -0700, Jim Schutt wrote:
> > > > > > > Hi Josef,
> > > > > > >
> > > > > > > Thanks for the patch - sorry for the long delay in testing...
> > > > > > >
> > > > > >
> > > > > > Jim,
> > > > > >
> > > > > > I've been trying to reason out how this happens.  Could you do a btrfs fi df on
> > > > > > the filesystem that's giving you trouble, so I can see if what I think is
> > > > > > happening is what's actually happening.  Thanks,
> > > > >
> > > > > Here's an example, using a slightly different kernel than
> > > > > my previous report.  It's your btrfs-next master branch
> > > > > (commit 8f139e59d5 "Btrfs: use bit operation for ->fs_state")
> > > > > with ceph 3.8 for-linus (commit 0fa6ebc600 from linus' tree).
> > > > >
> > > > >
> > > > > Here I'm finding the file system in question:
> > > > >
> > > > > # ls -l /dev/mapper | grep dm-93
> > > > > lrwxrwxrwx 1 root root       8 Jan 29 11:13 cs53s19p2 -> ../dm-93
> > > > >
> > > > > # df -h | grep -A 1 cs53s19p2
> > > > > /dev/mapper/cs53s19p2
> > > > >                       896G  1.1G  896G   1% /ram/mnt/ceph/data.osd.522
> > > > >
> > > > >
> > > > > Here's the info you asked for:
> > > > >
> > > > > # btrfs fi df /ram/mnt/ceph/data.osd.522
> > > > > Data: total=2.01GB, used=1.00GB
> > > > > System: total=4.00MB, used=64.00KB
> > > > > Metadata: total=8.00MB, used=7.56MB
> > > > >
> > > > How big is the disk you are using, and what mount options?  I have a patch to
> > > > keep the panic from happening, and hopefully the abort; could you try this?  I
> > > > still want to keep the underlying error from happening, because it shouldn't be,
> > > > but there's no reason I can't fix the error case while you can easily reproduce it :).
> > > > Thanks,
> > > >
> > > > Josef
> > > >
> > > > From c50b725c74c7d39064e553ef85ac9753efbd8aec Mon Sep 17 00:00:00 2001
> > > > From: Josef Bacik <jbacik@fusionio.com>
> > > > Date: Tue, 29 Jan 2013 15:03:37 -0500
> > > > Subject: [PATCH] Btrfs: fix chunk allocation error handling
> > > >
> > > > If we error out allocating a dev extent, we will have already created the
> > > > block group and such, which will cause problems since the allocator may have
> > > > tried to allocate out of the block group that no longer exists.  This will
> > > > cause BUG_ON()'s in the bio submission path.  This also makes a failure to
> > > > allocate a dev extent a non-abort error; we will just clean up the dev
> > > > extents we did allocate and exit.  Now if we fail to delete the dev extents
> > > > we will abort, since we can't have half of the dev extents hanging around,
> > > > but this will make us much less likely to abort.  Thanks,
> > > >
> > > > Signed-off-by: Josef Bacik <jbacik@fusionio.com>
> > > > ---
> > >
> > > Interesting - with your patch applied I triggered the following, just
> > > bringing up a fresh Ceph filesystem - I didn't even get a chance to
> > > mount it on my Ceph clients:
> > >
> > Ok, can you give this patch a whirl as well?  It seems to fix the problem for me.
>
> With this patch on top of your previous patch, after several trials of
> my test I am also unable to reproduce the issue.  Since I had been
> having trouble first time, every time, I think it also seems to fix
> the problem for me.
>

Hey Jim,

Could you test this patch instead?  I think it's a little less hamfisted and
should give us a nice balance between not crashing and being good for
performance.  Thanks,

Josef

commit 43510c0e5faad8e5e4d8ba13baa1dd5dfb3d39ce
Author: Josef Bacik <jbacik@fusionio.com>
Date:   Wed Jan 30 17:02:51 2013 -0500

    Btrfs: do not allow overcommit to happen if we are over 80% in use
    
    Because of how little we allocate chunks now we can get really tight on
    metadata space before we will allocate a new chunk.  This resulted in being
    unable to add device extents when allocating a new metadata chunk as we did
    not have enough space.  This is because we were allowed to overcommit too
    much metadata without actually making sure we had enough space to make
    allocations.  The idea behind overcommit is that we are allowed to say "sure
    you can have that reservation" when most of the free space is occupied by
    reservations, not actual allocations.  But in this case where a majority of
    the total space is in use by actual allocations we can screw ourselves by
    not being able to make real allocations when it matters.  So put this cap in
    place for now to keep us from overcommitting so much that we run out of
    space.  Thanks,
    
    Reported-and-tested-by: Jim Schutt <jaschut@sandia.gov>
    Signed-off-by: Josef Bacik <jbacik@fusionio.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jim Schutt Jan. 31, 2013, 4:52 p.m. UTC | #1
On 01/31/2013 08:33 AM, Josef Bacik wrote:
> [snip]
> 
> Hey Jim,
> 
> Could you test this patch instead?  I think it's a little less hamfisted and
> should give us a nice balance between not crashing and being good for
> performance.  Thanks,

Hi Josef,

Running with this patch in place of your previous version, I
was again unable to reproduce the issue.

I might be seeing a couple percent increase in performance, or
it might just be noise, but I'm willing to say that I think
performance is same-or-better than the previous version of
the patch.

Thanks again!

-- Jim

> [snip]


Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dca5679..156341e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3672,13 +3672,30 @@ static int can_overcommit(struct btrfs_root *root,
 			  struct btrfs_space_info *space_info, u64 bytes,
 			  enum btrfs_reserve_flush_enum flush)
 {
+	struct btrfs_block_rsv *global_rsv = &root->fs_info->global_block_rsv;
 	u64 profile = btrfs_get_alloc_profile(root, 0);
+	u64 rsv_size = 0;
 	u64 avail;
 	u64 used;
 
 	used = space_info->bytes_used + space_info->bytes_reserved +
-		space_info->bytes_pinned + space_info->bytes_readonly +
-		space_info->bytes_may_use;
+		space_info->bytes_pinned + space_info->bytes_readonly;
+
+	spin_lock(&global_rsv->lock);
+	rsv_size = global_rsv->size;
+	spin_unlock(&global_rsv->lock);
+
+	/*
+	 * We only want to allow over committing if we have lots of actual space
+	 * free, but if we don't have enough space to handle the global reserve
+	 * space then we could end up having a real enospc problem when trying
+	 * to allocate a chunk or some other such important allocation.
+	 */
+	rsv_size <<= 1;
+	if (used + rsv_size >= space_info->total_bytes)
+		return 0;
+
+	used += space_info->bytes_may_use;
 
 	spin_lock(&root->fs_info->free_chunk_lock);
 	avail = root->fs_info->free_chunk_space;