Message ID | 1394150467-5990-1-git-send-email-jbacik@fb.com (mailing list archive) |
---|---|
State | Accepted |
On Thu, Mar 06, 2014 at 07:01:07PM -0500, Josef Bacik wrote:
> Zach found this deadlock that would happen like this
>
> btrfs_end_transaction <- reduce trans->use_count to 0
>   btrfs_run_delayed_refs
>     btrfs_cow_block
>       find_free_extent
>         btrfs_start_transaction <- increase trans->use_count to 1
>         allocate chunk
>         btrfs_end_transaction <- decrease trans->use_count to 0
>           btrfs_run_delayed_refs
>             lock tree block we are cowing above ^^

Indeed, I stumbled across this while trying to reproduce reported
problems with iozone.  This deadlock would consistently hit during
random 1k reads in a 2gig file.

> We need to only decrease trans->use_count if it is above 1, otherwise leave it
> alone. This will make nested trans be the only ones who decrease their added
> ref, and will let us get rid of the trans->use_count++ hack if we have to commit
> the transaction. Thanks,

And this fixes it.  It's run through a few times successfully.

> cc: stable@vger.kernel.org
> Reported-by: Zach Brown <zab@redhat.com>
> Signed-off-by: Josef Bacik <jbacik@fb.com>

Tested-by: Zach Brown <zab@redhat.com>

- z
On Thu, Mar 6, 2014 at 7:25 PM, Zach Brown <zab@redhat.com> wrote:
> On Thu, Mar 06, 2014 at 07:01:07PM -0500, Josef Bacik wrote:
>> Zach found this deadlock that would happen like this
>>
>
> And this fixes it.  It's run through a few times successfully.

I'm not sure if my issue is related to this or not - happy to start a
new thread if not.  I applied this patch as I was running into locks,
but I am still having them.

See: http://picpaste.com/IMG_20140312_072458-KPH35pQ6.jpg

After a number of reboots the system became stable, presumably whatever
race condition btrfs was hitting followed a favorable path.

I do have a 2GB btrfs-image pre-dating my application of this patch
that was causing the issue last week.

Rich
On 03/12/2014 08:56 AM, Rich Freeman wrote:
> On Thu, Mar 6, 2014 at 7:25 PM, Zach Brown <zab@redhat.com> wrote:
>> On Thu, Mar 06, 2014 at 07:01:07PM -0500, Josef Bacik wrote:
>>> Zach found this deadlock that would happen like this
>>>
>>
>> And this fixes it.  It's run through a few times successfully.
>
> I'm not sure if my issue is related to this or not - happy to start
> a new thread if not.  I applied this patch as I was running into
> locks, but I am still having them.
>
> See: http://picpaste.com/IMG_20140312_072458-KPH35pQ6.jpg
>
> After a number of reboots the system became stable, presumably
> whatever race condition btrfs was hitting followed a favorable
> path.
>
> I do have a 2GB btrfs-image pre-dating my application of this
> patch that was causing the issue last week.
>

Uhm wow that's pretty epic.  I will talk to chris and figure out how
we want to deal with that and send you a patch shortly.  Thanks,

Josef
On Wed, Mar 12, 2014 at 11:24 AM, Josef Bacik <jbacik@fb.com> wrote:
> On 03/12/2014 08:56 AM, Rich Freeman wrote:
>>
>> After a number of reboots the system became stable, presumably
>> whatever race condition btrfs was hitting followed a favorable
>> path.
>>
>> I do have a 2GB btrfs-image pre-dating my application of this
>> patch that was causing the issue last week.
>>
>
> Uhm wow that's pretty epic.  I will talk to chris and figure out how
> we want to deal with that and send you a patch shortly.  Thanks,

If you need any info from me at all beyond the capture let me know.

A tiny bit more background.  The system would boot normally, but panic
after about 30-90 seconds (usually long enough to log into KDE, perhaps
even fire up a browser/etc).  In single-user mode I could mount the
filesystem read-only without issue.  If I mounted it read-write (in
recovery mode or normally) I'd get the panic after about 30-60 seconds.
On one occasion it seemed stable, but panicked when I unmounted it.

I have to say that I'm impressed that it recovers at all.  I'd rather
have the file system not write anything if it isn't sure it can write
it correctly, and that seems to be the effect here.  Just about all the
issues I've run into with btrfs have tended to be lockup/etc type
issues, and not silent corruption.

Rich
On Wed, Mar 12, 2014 at 12:34 PM, Rich Freeman
<r-btrfs@thefreemanclan.net> wrote:
> On Wed, Mar 12, 2014 at 11:24 AM, Josef Bacik <jbacik@fb.com> wrote:
>> On 03/12/2014 08:56 AM, Rich Freeman wrote:
>>>
>>> After a number of reboots the system became stable, presumably
>>> whatever race condition btrfs was hitting followed a favorable
>>> path.
>>>
>>> I do have a 2GB btrfs-image pre-dating my application of this
>>> patch that was causing the issue last week.
>>>
>>
>> Uhm wow that's pretty epic.  I will talk to chris and figure out how
>> we want to deal with that and send you a patch shortly.  Thanks,
>
> A tiny bit more background.

And some more background.  I had more reboots over the next two days at
the same time each day, just after my crontab successfully completed.
One of the last things it does is run the snapper cleanups, which
delete a bunch of snapshots.  During a reboot I checked and there were
a bunch of deleted snapshots, which disappeared over the next 30-60
seconds before the panic, and then they would re-appear on the next
reboot.

I disabled the snapper cron job and this morning had no issues at all.
One day isn't much to establish a trend, but I suspect that this is the
cause.  Obviously getting rid of snapshots would be desirable at some
point, but I can wait for a patch.  Snapper would be deleting about 48
snapshots at the same time, since I create them hourly and the cleanup
occurs daily on two different subvolumes on the same filesystem.

Rich
Rich Freeman posted on Fri, 14 Mar 2014 18:40:25 -0400 as excerpted:

> And some more background.  I had more reboots over the next two days at
> the same time each day, just after my crontab successfully completed.
> One of the last things it does is run the snapper cleanups, which delete
> a bunch of snapshots.  During a reboot I checked and there were a bunch
> of deleted snapshots, which disappeared over the next 30-60 seconds
> before the panic, and then they would re-appear on the next reboot.
>
> I disabled the snapper cron job and this morning had no issues at all.
> One day isn't much to establish a trend, but I suspect that this is
> the cause.  Obviously getting rid of snapshots would be desirable at
> some point, but I can wait for a patch.  Snapper would be deleting about
> 48 snapshots at the same time, since I create them hourly and the
> cleanup occurs daily on two different subvolumes on the same filesystem.

Hi, Rich.  Imagine seeing you here! =:^)

(Note to others, I run gentoo and he's a gentoo dev, so we normally see
each other on the gentoo lists.  But btrfs comes up occasionally there
too, so we knew we were both running it, I'd just not noticed any of his
posts here, previously.)

Three things:

1) Does running the snapper cleanup command from that cron job manually
trigger the problem as well?  Presumably if you run it manually, you'll
do so at a different time of day, thus eliminating the possibility that
it's a combination of that and something else occurring at that specific
time, as well as confirming that it is indeed the snapper cleanup itself.

2) What about modifying the cron job to run hourly, or perhaps every six
hours, so it's deleting only 2 or 12 instead of 48 at a time?  Does that
help?

If so then it's a thundering herd problem.  While definitely still a bug,
you'll at least have a workaround until it's fixed.

3) I'd be wary of letting too many snapshots build up.  A couple hundred
shouldn't be a huge issue, but particularly when the snapshot-aware-
defrag was still enabled, people were reporting problems with thousands
of snapshots, so I'd recommend trying to keep it under 500 or so, at
least of the same subvol (so under 1000 total since you're snapshotting
two different subvols).

So an hourly cron job deleting or at least thinning down snapshots over
say 2 days old, possibly in the same cron job that creates the new snaps,
might be a good idea.  That'd only do two at a time, the same rate
they're created, but with a 48 hour set of snaps before deletion.
On 03/14/2014 06:40 PM, Rich Freeman wrote:
> On Wed, Mar 12, 2014 at 12:34 PM, Rich Freeman
> <r-btrfs@thefreemanclan.net> wrote:
>> On Wed, Mar 12, 2014 at 11:24 AM, Josef Bacik <jbacik@fb.com> wrote:
>>> On 03/12/2014 08:56 AM, Rich Freeman wrote:
>>>>
>>>> After a number of reboots the system became stable, presumably
>>>> whatever race condition btrfs was hitting followed a favorable
>>>> path.
>>>>
>>>> I do have a 2GB btrfs-image pre-dating my application of this
>>>> patch that was causing the issue last week.
>>>>
>>>
>>> Uhm wow that's pretty epic.  I will talk to chris and figure out how
>>> we want to deal with that and send you a patch shortly.  Thanks,
>>
>> A tiny bit more background.
>
> And some more background.  I had more reboots over the next two days
> at the same time each day, just after my crontab successfully
> completed.  One of the last things it does is run the snapper cleanups,
> which delete a bunch of snapshots.  During a reboot I checked and
> there were a bunch of deleted snapshots, which disappeared over the
> next 30-60 seconds before the panic, and then they would re-appear on
> the next reboot.
>
> I disabled the snapper cron job and this morning had no issues at all.
> One day isn't much to establish a trend, but I suspect that this is
> the cause.  Obviously getting rid of snapshots would be desirable at
> some point, but I can wait for a patch.  Snapper would be deleting
> about 48 snapshots at the same time, since I create them hourly and
> the cleanup occurs daily on two different subvolumes on the same
> filesystem.

Ok that's helpful, I'm no longer positive I know what's causing this,
I'll try to reproduce once I've nailed down these backref problems and
balance corruption.  Thanks,

Josef
On Sat, Mar 15, 2014 at 7:51 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> 1) Does running the snapper cleanup command from that cron job manually
> trigger the problem as well?

As you can imagine I'm not too keen to trigger this often.  But yes, I
just gave it a shot on my SSD and cleaning a few days of timelines
triggered a panic.

> 2) What about modifying the cron job to run hourly, or perhaps every six
> hours, so it's deleting only 2 or 12 instead of 48 at a time?  Does that
> help?
>
> If so then it's a thundering herd problem.  While definitely still a bug,
> you'll at least have a workaround until it's fixed.

Definitely looks like a thundering herd problem.  I stopped the cron
jobs (including the creation of snapshots based on your later warning).
However, I am deleting my snapshots one at a time at a rate of one every
5-30 minutes, and while that is creating surprisingly high disk loads on
my ssd and hard drives, I don't get any panics.  I figured that having
only one deletion pending per checkpoint would eliminate locking risk.

I did get some blocked task messages in dmesg, like:

[105538.121239] INFO: task mysqld:3006 blocked for more than 120 seconds.
[105538.121251]       Not tainted 3.13.6-gentoo #1
[105538.121256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[105538.121262] mysqld          D ffff880395f63e80  3432  3006      1 0x00000000
[105538.121273]  ffff88028b623d38 0000000000000086 ffff88028b623dc8 ffffffff81c10440
[105538.121283]  0000000000000200 ffff88028b623fd8 ffff880395f63b80 0000000000012c40
[105538.121291]  0000000000012c40 ffff880395f63b80 00000000532b7877 ffff880410e7e578
[105538.121299] Call Trace:
[105538.121316]  [<ffffffff81623d73>] schedule+0x6a/0x6c
[105538.121327]  [<ffffffff81623f52>] schedule_preempt_disabled+0x9/0xb
[105538.121337]  [<ffffffff816251af>] __mutex_lock_slowpath+0x155/0x1af
[105538.121347]  [<ffffffff812b9db0>] ? radix_tree_tag_set+0x71/0xd4
[105538.121356]  [<ffffffff81625225>] mutex_lock+0x1c/0x2e
[105538.121365]  [<ffffffff8123c168>] btrfs_log_inode_parent+0x161/0x308
[105538.121373]  [<ffffffff8162466d>] ? mutex_unlock+0x11/0x13
[105538.121382]  [<ffffffff8123cd37>] btrfs_log_dentry_safe+0x39/0x52
[105538.121390]  [<ffffffff8121a0c9>] btrfs_sync_file+0x1bc/0x280
[105538.121401]  [<ffffffff811339a3>] vfs_fsync_range+0x13/0x1d
[105538.121409]  [<ffffffff811339c4>] vfs_fsync+0x17/0x19
[105538.121416]  [<ffffffff81133c3c>] do_fsync+0x30/0x55
[105538.121423]  [<ffffffff81133e40>] SyS_fsync+0xb/0xf
[105538.121432]  [<ffffffff8162c2e2>] system_call_fastpath+0x16/0x1b

I suspect that this may not be terribly helpful - it probably reflects
tasks waiting for a lock rather than whatever is holding it.  It was
more of a problem when I was trying to delete a snapshot per minute on
my ssd, or one every 5 min on hdd.

Rich
Rich Freeman posted on Thu, 20 Mar 2014 22:13:51 -0400 as excerpted:

> However, I am deleting my snapshots one at a time at a rate of one every
> 5-30 minutes, and while that is creating surprisingly high disk loads on
> my ssd and hard drives, I don't get any panics.  I figured that having
> only one deletion pending per checkpoint would eliminate locking risk.
>
> I did get some blocked task messages in dmesg, like:
> [105538.121239] INFO: task mysqld:3006 blocked for more than 120
> seconds.

These... are a continuing issue.  The devs are working on it, but...

The people that seem to have it the worst are doing both scripted
snapshotting and large (gig+) constantly internal-rewritten files such
as VM images (the most commonly reported case) or databases.  Properly
setting NOCOW on the files[1] helps, but...

* The key thing to realize about snapshotting continually rewritten
NOCOW files is that the first change to a block after a snapshot by
definition MUST be COWed anyway, since the file content has changed from
that of the snapshot.  Further writes to the same block (until the next
snapshot) will be rewritten in-place (the existing NOCOW attribute is
maintained thru that mandatory COW), but next snapshot and following
write, BAM! gotta COW again!

So while NOCOW helps, in scenarios such as hourly snapshotting of active
VM-image data loads its ability to control actual fragmentation is
unfortunately rather limited.  And it's precisely this fragmentation
that appears to be the problem! =:^(

It's almost certainly that fragmentation that's triggering your blocked
for X seconds issues.  But the interesting thing here is the reports
even from people with fast SSDs where seek-time and even IOPS shouldn't
be a huge issue.  In at least some cases, the problem has been CPU time,
not physical media access.  Which is one reason the snapshot-aware-
defrag was disabled again recently, because it simply wasn't scaling.
(To answer the question, yes, defrag still works; it's only the
snapshot-awareness that was disabled.  Defrag is back to dumbly ignoring
other snapshots and simply defragging the working file-extent-mapping
the defrag is being run on, with other snapshots staying untouched.)

They're reworking the whole feature now in order to scale better.  But
while that considerably reduces the pain point (people were seeing
little or no defrag/balance/restripe progress in /hours/ if they had
enough snapshots, and that problem has been bypassed for the moment),
we're still left with these nasty N-second stalls at times, especially
when doing anything else involving those snapshots and the corresponding
fragmentation they cover, including deleting them.

Hopefully tweaking the algorithms and eventually optimizing can do away
with much of this problem eventually, but I've a feeling it'll be around
to some degree for some years.  Meanwhile, for data that fits that known
problematic profile, the current recommendation is, preferably, to
isolate it to a subvolume that has only very limited or no snapshotting
done.
The other alternative, of course, since NOCOW already turns off many of
the features a lot of people are using btrfs for in the first place
(checksumming and compression are disabled with NOCOW as well, tho it
turns out they're not so well suited to VM images in the first place),
is that given the subvolume isolation already, just stick it on an
entirely different filesystem, either btrfs with the nocow mount option,
or arguably something a bit more traditional and mature such as ext4 or
xfs, where xfs of course is actually targeted at large to huge file
use-cases so multi-gig VMs should be an ideal fit.

Of course you lose the benefits of btrfs doing that, but given its COW
nature, btrfs arguably isn't the ideal solution for such huge
internal-write files in the first place, and even when fully mature will
likely only have /acceptable/ performance with them as suitable for use
as a general purpose filesystem, with xfs or similar still likely being
a better dedicated filesystem for such use-cases.

Meanwhile, I think everyone agrees that getting that locking down to
avoid the deadlocks, etc, really must be priority one, at least now that
the huge scaling blocker of snapshot-aware-defrag is (hopefully
temporarily) disabled.  Blocking for a couple minutes at a time
certainly isn't ideal, but since the triggering jobs such as snapshot
deletion, etc, can be rescheduled to otherwise idle time, that's
certainly less critical than crashes if people accidentally or in
ignorance queue up too many snapshot deletions at a time!

---
[1] NOCOW: chattr +C .  With btrfs, this should be done while the file
is zero-size, before it has content.  The easiest way to do that is to
create a dedicated directory for these files and set the attribute on
the directory, such that the files inherit it at file creation.
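For readers curious what chattr +C actually does at the ioctl level: it
sets the FS_NOCOW_FL inode flag via FS_IOC_SETFLAGS.  Below is a minimal
userspace sketch (an assumption-laden illustration, not part of the
thread) of doing the same thing on a directory, so that files created
inside it inherit NOCOW, as the footnote recommends.  It assumes
FS_NOCOW_FL is exposed by your <linux/fs.h>, and error handling is kept
deliberately terse.

```c
/*
 * Sketch of "chattr +C <dir>": set the NOCOW attribute on a directory
 * so new files created inside it inherit it.  Illustration only;
 * assumes FS_NOCOW_FL is available in <linux/fs.h>.
 */
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd, flags;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <directory>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY | O_DIRECTORY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
		perror("FS_IOC_GETFLAGS");
		return 1;
	}

	flags |= FS_NOCOW_FL;	/* the same bit chattr +C sets */

	if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0) {
		perror("FS_IOC_SETFLAGS");
		return 1;
	}

	close(fd);
	return 0;
}
```

In practice plain chattr +C on the directory is all that is needed; the
point of the sketch is only that NOCOW is an ordinary inheritable inode
flag, which is why setting it before any data is written matters.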
Hi Josef,
this problem could not happen when find_free_extent() was receiving a
transaction handle (which was changed in "Btrfs: avoid starting a
transaction in the write path"), correct?  Because it would have used
the passed transaction handle to do the chunk allocation, and thus would
not need to do join_transaction/end_transaction leading to recursive
run_delayed_refs call.

Alex.

On Fri, Mar 7, 2014 at 3:01 AM, Josef Bacik <jbacik@fb.com> wrote:
> Zach found this deadlock that would happen like this
>
> btrfs_end_transaction <- reduce trans->use_count to 0
>   btrfs_run_delayed_refs
>     btrfs_cow_block
>       find_free_extent
>         btrfs_start_transaction <- increase trans->use_count to 1
>         allocate chunk
>         btrfs_end_transaction <- decrease trans->use_count to 0
>           btrfs_run_delayed_refs
>             lock tree block we are cowing above ^^
>
> We need to only decrease trans->use_count if it is above 1, otherwise leave it
> alone. This will make nested trans be the only ones who decrease their added
> ref, and will let us get rid of the trans->use_count++ hack if we have to commit
> the transaction. Thanks,
>
> cc: stable@vger.kernel.org
> Reported-by: Zach Brown <zab@redhat.com>
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/transaction.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 34cd831..b05bf58 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -683,7 +683,8 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
>         int lock = (trans->type != TRANS_JOIN_NOLOCK);
>         int err = 0;
>
> -       if (--trans->use_count) {
> +       if (trans->use_count > 1) {
> +               trans->use_count--;
>                 trans->block_rsv = trans->orig_rsv;
>                 return 0;
>         }
> @@ -731,17 +732,10 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
>         }
>
>         if (lock && ACCESS_ONCE(cur_trans->state) == TRANS_STATE_BLOCKED) {
> -               if (throttle) {
> -                       /*
> -                        * We may race with somebody else here so end up having
> -                        * to call end_transaction on ourselves again, so inc
> -                        * our use_count.
> -                        */
> -                       trans->use_count++;
> +               if (throttle)
>                         return btrfs_commit_transaction(trans, root);
> -               } else {
> +               else
>                         wake_up_process(info->transaction_kthread);
> -               }
>         }
>
>         if (trans->type & __TRANS_FREEZABLE)
> --
> 1.8.3.1
>
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 34cd831..b05bf58 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -683,7 +683,8 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	int lock = (trans->type != TRANS_JOIN_NOLOCK);
 	int err = 0;
 
-	if (--trans->use_count) {
+	if (trans->use_count > 1) {
+		trans->use_count--;
 		trans->block_rsv = trans->orig_rsv;
 		return 0;
 	}
@@ -731,17 +732,10 @@ static int __btrfs_end_transaction(struct btrfs_trans_handle *trans,
 	}
 
 	if (lock && ACCESS_ONCE(cur_trans->state) == TRANS_STATE_BLOCKED) {
-		if (throttle) {
-			/*
-			 * We may race with somebody else here so end up having
-			 * to call end_transaction on ourselves again, so inc
-			 * our use_count.
-			 */
-			trans->use_count++;
+		if (throttle)
 			return btrfs_commit_transaction(trans, root);
-		} else {
+		else
 			wake_up_process(info->transaction_kthread);
-		}
 	}
 
 	if (trans->type & __TRANS_FREEZABLE)
Zach found this deadlock that would happen like this

btrfs_end_transaction <- reduce trans->use_count to 0
  btrfs_run_delayed_refs
    btrfs_cow_block
      find_free_extent
        btrfs_start_transaction <- increase trans->use_count to 1
        allocate chunk
        btrfs_end_transaction <- decrease trans->use_count to 0
          btrfs_run_delayed_refs
            lock tree block we are cowing above ^^

We need to only decrease trans->use_count if it is above 1, otherwise
leave it alone.  This will make nested trans be the only ones who
decrease their added ref, and will let us get rid of the
trans->use_count++ hack if we have to commit the transaction.  Thanks,

cc: stable@vger.kernel.org
Reported-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/transaction.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)
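To make the refcount reasoning easier to follow, here is a toy userspace
model of the guarded decrement.  This is not the kernel code: struct
toy_trans and the helper names are invented for illustration, and the
heavyweight "finishing work" stands in for running delayed refs hitting
find_free_extent and allocating a chunk, as in the trace above.

```c
/*
 * Toy model of the trans->use_count fix -- NOT kernel code.
 * It shows why guarding the decrement with "use_count > 1" lets the
 * final end of a transaction handle internally join/end the same
 * handle again without the inner end also taking the "last
 * reference" path (which, in the real code, re-runs delayed refs
 * and deadlocks on a tree block the outer caller already holds).
 */
#include <stdio.h>

struct toy_trans {
	int use_count;
	int finished;
};

/* Nested user: reuse the running transaction by bumping its refcount,
 * roughly what joining an existing transaction does. */
static void toy_join(struct toy_trans *t)
{
	t->use_count++;
}

static void toy_end(struct toy_trans *t)
{
	/* The fix: a nested handle only drops its own reference. */
	if (t->use_count > 1) {
		t->use_count--;
		return;
	}

	/*
	 * Last reference: do the final work, which itself joins and
	 * ends the transaction.  With the guard above, the inner end
	 * just decrements.  With the old "--use_count" test, the outer
	 * call would already have dropped the count to 0, so the inner
	 * end would also land here and redo the finishing work.
	 */
	toy_join(t);
	toy_end(t);	/* inner end: sees use_count == 2, just drops it */
	t->finished = 1;
}

int main(void)
{
	struct toy_trans t = { .use_count = 1, .finished = 0 };

	toy_end(&t);
	printf("use_count=%d finished=%d\n", t.use_count, t.finished);
	return 0;
}
```

Running the sketch prints "use_count=1 finished=1": the nested join/end
pair is absorbed by the guard, and the final work runs exactly once.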