diff mbox

[13/19] MM: set PF_FSTRANS while allocating per-cpu memory to avoid deadlock.

Message ID 20140416040336.10604.67456.stgit@notabene.brown (mailing list archive)
State New, archived
Headers show

Commit Message

NeilBrown April 16, 2014, 4:03 a.m. UTC
lockdep reports a locking chain

  sk_lock-AF_INET --> rtnl_mutex --> pcpu_alloc_mutex

As sk_lock may be needed to reclaim memory, allowing that
reclaim while pcu_alloc_mutex is held can lead to deadlock.
So set PF_FSTRANS while it is help to avoid the FS reclaim.

pcpu_alloc_mutex can be taken when rtnl_mutex is held:

    [<ffffffff8117f979>] pcpu_alloc+0x49/0x960
    [<ffffffff8118029b>] __alloc_percpu+0xb/0x10
    [<ffffffff8193b9f7>] loopback_dev_init+0x17/0x60
    [<ffffffff81aaf30c>] register_netdevice+0xec/0x550
    [<ffffffff81aaf785>] register_netdev+0x15/0x30

Signed-off-by: NeilBrown <neilb@suse.de>
---
 mm/percpu.c |    4 ++++
 1 file changed, 4 insertions(+)



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Dave Chinner April 16, 2014, 5:49 a.m. UTC | #1
On Wed, Apr 16, 2014 at 02:03:36PM +1000, NeilBrown wrote:
> lockdep reports a locking chain
> 
>   sk_lock-AF_INET --> rtnl_mutex --> pcpu_alloc_mutex
> 
> As sk_lock may be needed to reclaim memory, allowing that
> reclaim while pcu_alloc_mutex is held can lead to deadlock.
> So set PF_FSTRANS while it is help to avoid the FS reclaim.
> 
> pcpu_alloc_mutex can be taken when rtnl_mutex is held:
> 
>     [<ffffffff8117f979>] pcpu_alloc+0x49/0x960
>     [<ffffffff8118029b>] __alloc_percpu+0xb/0x10
>     [<ffffffff8193b9f7>] loopback_dev_init+0x17/0x60
>     [<ffffffff81aaf30c>] register_netdevice+0xec/0x550
>     [<ffffffff81aaf785>] register_netdev+0x15/0x30
> 
> Signed-off-by: NeilBrown <neilb@suse.de>

This looks like a workaround to avoid passing a gfp mask around to
describe the context in which the allocation is taking place.
Whether or not that's the right solution, I can't say, but spreading
this "we can turn off all reclaim of filesystem objects" mechanism
all around the kernel doesn't sit well with me...

And, again, PF_FSTRANS looks plainly wrong in this code - it sure
isn't a fs transaction context we are worried about here...
NeilBrown April 16, 2014, 6:22 a.m. UTC | #2
On Wed, 16 Apr 2014 15:49:42 +1000 Dave Chinner <david@fromorbit.com> wrote:

> On Wed, Apr 16, 2014 at 02:03:36PM +1000, NeilBrown wrote:
> > lockdep reports a locking chain
> > 
> >   sk_lock-AF_INET --> rtnl_mutex --> pcpu_alloc_mutex
> > 
> > As sk_lock may be needed to reclaim memory, allowing that
> > reclaim while pcu_alloc_mutex is held can lead to deadlock.
> > So set PF_FSTRANS while it is help to avoid the FS reclaim.
> > 
> > pcpu_alloc_mutex can be taken when rtnl_mutex is held:
> > 
> >     [<ffffffff8117f979>] pcpu_alloc+0x49/0x960
> >     [<ffffffff8118029b>] __alloc_percpu+0xb/0x10
> >     [<ffffffff8193b9f7>] loopback_dev_init+0x17/0x60
> >     [<ffffffff81aaf30c>] register_netdevice+0xec/0x550
> >     [<ffffffff81aaf785>] register_netdev+0x15/0x30
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> 
> This looks like a workaround to avoid passing a gfp mask around to
> describe the context in which the allocation is taking place.
> Whether or not that's the right solution, I can't say, but spreading
> this "we can turn off all reclaim of filesystem objects" mechanism
> all around the kernel doesn't sit well with me...

We are (effectively) passing a gfp mask around, except that it lives in
'current' rather than lots of other places.
I actually like the idea of discarding PF_MEMALLOC, PF_FSTRANS and
PF_MEMALLOC_NOIO, and just having current->gfp_allowed_mask (to match the
global variable of the same name).

> 
> And, again, PF_FSTRANS looks plainly wrong in this code - it sure
> isn't a fs transaction context we are worried about here...

So would PF_MEMALLOC_NOFS work for you?

NeilBrown
Dave Chinner April 16, 2014, 6:30 a.m. UTC | #3
On Wed, Apr 16, 2014 at 04:22:01PM +1000, NeilBrown wrote:
> On Wed, 16 Apr 2014 15:49:42 +1000 Dave Chinner <david@fromorbit.com> wrote:
> 
> > On Wed, Apr 16, 2014 at 02:03:36PM +1000, NeilBrown wrote:
> > > lockdep reports a locking chain
> > > 
> > >   sk_lock-AF_INET --> rtnl_mutex --> pcpu_alloc_mutex
> > > 
> > > As sk_lock may be needed to reclaim memory, allowing that
> > > reclaim while pcu_alloc_mutex is held can lead to deadlock.
> > > So set PF_FSTRANS while it is help to avoid the FS reclaim.
> > > 
> > > pcpu_alloc_mutex can be taken when rtnl_mutex is held:
> > > 
> > >     [<ffffffff8117f979>] pcpu_alloc+0x49/0x960
> > >     [<ffffffff8118029b>] __alloc_percpu+0xb/0x10
> > >     [<ffffffff8193b9f7>] loopback_dev_init+0x17/0x60
> > >     [<ffffffff81aaf30c>] register_netdevice+0xec/0x550
> > >     [<ffffffff81aaf785>] register_netdev+0x15/0x30
> > > 
> > > Signed-off-by: NeilBrown <neilb@suse.de>
> > 
> > This looks like a workaround to avoid passing a gfp mask around to
> > describe the context in which the allocation is taking place.
> > Whether or not that's the right solution, I can't say, but spreading
> > this "we can turn off all reclaim of filesystem objects" mechanism
> > all around the kernel doesn't sit well with me...
> 
> We are (effectively) passing a gfp mask around, except that it lives in
> 'current' rather than lots of other places.
> I actually like the idea of discarding PF_MEMALLOC, PF_FSTRANS and
> PF_MEMALLOC_NOIO, and just having current->gfp_allowed_mask (to match the
> global variable of the same name).

Given that we've had problems getting gfp flags propagated into the
VM code (vmalloc, I'm looking at you!) making the current task
carry the valid memory allocation and reclaim context mask woul dbe
a good idea. That's effectively the problem PF_MEMALLOC_NOIO is
working around, and we've recently added it to XFS to silence all
the lockdep warnings using vm_map_ram in GFP_NOFS contexts have been
causing us....

> > And, again, PF_FSTRANS looks plainly wrong in this code - it sure
> > isn't a fs transaction context we are worried about here...
> 
> So would PF_MEMALLOC_NOFS work for you?

Better than PF_FSTRANS, that's for sure ;)

Cheers,

Dave.
diff mbox

Patch

diff --git a/mm/percpu.c b/mm/percpu.c
index 036cfe07050f..77dd24032f41 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -712,6 +712,7 @@  static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved)
 	int slot, off, new_alloc;
 	unsigned long flags;
 	void __percpu *ptr;
+	unsigned int pflags;
 
 	if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE)) {
 		WARN(true, "illegal size (%zu) or align (%zu) for "
@@ -720,6 +721,7 @@  static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved)
 	}
 
 	mutex_lock(&pcpu_alloc_mutex);
+	current_set_flags_nested(&pflags, PF_FSTRANS);
 	spin_lock_irqsave(&pcpu_lock, flags);
 
 	/* serve reserved allocations from the reserved chunk if available */
@@ -801,6 +803,7 @@  area_found:
 		goto fail_unlock;
 	}
 
+	current_restore_flags_nested(&pflags, PF_FSTRANS);
 	mutex_unlock(&pcpu_alloc_mutex);
 
 	/* return address relative to base address */
@@ -811,6 +814,7 @@  area_found:
 fail_unlock:
 	spin_unlock_irqrestore(&pcpu_lock, flags);
 fail_unlock_mutex:
+	current_restore_flags_nested(&pflags, PF_FSTRANS);
 	mutex_unlock(&pcpu_alloc_mutex);
 	if (warn_limit) {
 		pr_warning("PERCPU: allocation failed, size=%zu align=%zu, "