diff mbox series

xfs: eliminate lockdep false positives in xfs_attr_shortform_list

Message ID 20240622082631.2661148-1-leo.lilong@huawei.com (mailing list archive)
State Queued
Headers show
Series xfs: eliminate lockdep false positives in xfs_attr_shortform_list | expand

Commit Message

Long Li June 22, 2024, 8:26 a.m. UTC
xfs_attr_shortform_list() only called from a non-transactional context, it
hold ilock before alloc memory and maybe trapped in memory reclaim. Since
commit 204fae32d5f7("xfs: clean up remaining GFP_NOFS users") removed
GFP_NOFS flag, lockdep warning will be report as [1]. Eliminate lockdep
false positives by use __GFP_NOLOCKDEP to alloc memory
in xfs_attr_shortform_list().

[1] https://lore.kernel.org/linux-xfs/000000000000e33add0616358204@google.com/
Reported-by: syzbot+4248e91deb3db78358a2@syzkaller.appspotmail.com
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
 fs/xfs/xfs_attr_list.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Darrick J. Wong June 24, 2024, 4:03 p.m. UTC | #1
On Sat, Jun 22, 2024 at 04:26:31PM +0800, Long Li wrote:
> xfs_attr_shortform_list() only called from a non-transactional context, it
> hold ilock before alloc memory and maybe trapped in memory reclaim. Since
> commit 204fae32d5f7("xfs: clean up remaining GFP_NOFS users") removed
> GFP_NOFS flag, lockdep warning will be report as [1]. Eliminate lockdep
> false positives by use __GFP_NOLOCKDEP to alloc memory
> in xfs_attr_shortform_list().
> 
> [1] https://lore.kernel.org/linux-xfs/000000000000e33add0616358204@google.com/
> Reported-by: syzbot+4248e91deb3db78358a2@syzkaller.appspotmail.com
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
>  fs/xfs/xfs_attr_list.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> index 5c947e5ce8b8..8cd6088e6190 100644
> --- a/fs/xfs/xfs_attr_list.c
> +++ b/fs/xfs/xfs_attr_list.c
> @@ -114,7 +114,8 @@ xfs_attr_shortform_list(
>  	 * It didn't all fit, so we have to sort everything on hashval.
>  	 */
>  	sbsize = sf->count * sizeof(*sbuf);
> -	sbp = sbuf = kmalloc(sbsize, GFP_KERNEL | __GFP_NOFAIL);
> +	sbp = sbuf = kmalloc(sbsize,
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);

Why wouldn't we memalloc_nofs_save any time we take an ILOCK when we're
not in transaction context?  Surely you'd want to NOFS /any/ allocation
when the ILOCK is held, right?

--D

>  
>  	/*
>  	 * Scan the attribute list for the rest of the entries, storing
> -- 
> 2.39.2
> 
>
Long Li June 25, 2024, 2:10 p.m. UTC | #2
On Mon, Jun 24, 2024 at 09:03:42AM -0700, Darrick J. Wong wrote:
> On Sat, Jun 22, 2024 at 04:26:31PM +0800, Long Li wrote:
> > xfs_attr_shortform_list() only called from a non-transactional context, it
> > hold ilock before alloc memory and maybe trapped in memory reclaim. Since
> > commit 204fae32d5f7("xfs: clean up remaining GFP_NOFS users") removed
> > GFP_NOFS flag, lockdep warning will be report as [1]. Eliminate lockdep
> > false positives by use __GFP_NOLOCKDEP to alloc memory
> > in xfs_attr_shortform_list().
> > 
> > [1] https://lore.kernel.org/linux-xfs/000000000000e33add0616358204@google.com/
> > Reported-by: syzbot+4248e91deb3db78358a2@syzkaller.appspotmail.com
> > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > ---
> >  fs/xfs/xfs_attr_list.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> > index 5c947e5ce8b8..8cd6088e6190 100644
> > --- a/fs/xfs/xfs_attr_list.c
> > +++ b/fs/xfs/xfs_attr_list.c
> > @@ -114,7 +114,8 @@ xfs_attr_shortform_list(
> >  	 * It didn't all fit, so we have to sort everything on hashval.
> >  	 */
> >  	sbsize = sf->count * sizeof(*sbuf);
> > -	sbp = sbuf = kmalloc(sbsize, GFP_KERNEL | __GFP_NOFAIL);
> > +	sbp = sbuf = kmalloc(sbsize,
> > +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
> 
> Why wouldn't we memalloc_nofs_save any time we take an ILOCK when we're
> not in transaction context?  Surely you'd want to NOFS /any/ allocation
> when the ILOCK is held, right?
> 
> --D
> 
> 

I believe using memalloc_nofs_save could solve the problem, sometimes it may be 
more effective than using the __GFP_NOLOCKDEP flag. However, looking at similar
functions, for example xfs_btree_alloc_cursor, it uses __GFP_NOLOCKDEP to prevent
ABBA deadlock false positive warnings.

xfs_attr_list_ilocked
  xfs_iread_extents
    xfs_bmbt_init_cursor
      xfs_btree_alloc_cursor
        kmem_cache_zalloc(cache, GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL)

After thinking a little more, I found out that just using __GFP_NOLOCKDEP may
not be enough, AA deadlock false positive warnings [1] still exist in the
mainline kernel if my understanding is correct.

[1] https://lore.kernel.org/linux-xfs/20240622094411.GA830005@ceph-admin/T/#m6f7ab8438bf82f0dc44c6d42d183ae08c07dcd5f

thanks,
Long Li
Eric Sandeen July 8, 2024, 3:40 p.m. UTC | #3
On 6/24/24 11:03 AM, Darrick J. Wong wrote:
> On Sat, Jun 22, 2024 at 04:26:31PM +0800, Long Li wrote:
>> xfs_attr_shortform_list() only called from a non-transactional context, it
>> hold ilock before alloc memory and maybe trapped in memory reclaim. Since
>> commit 204fae32d5f7("xfs: clean up remaining GFP_NOFS users") removed
>> GFP_NOFS flag, lockdep warning will be report as [1]. Eliminate lockdep
>> false positives by use __GFP_NOLOCKDEP to alloc memory
>> in xfs_attr_shortform_list().
>>
>> [1] https://lore.kernel.org/linux-xfs/000000000000e33add0616358204@google.com/
>> Reported-by: syzbot+4248e91deb3db78358a2@syzkaller.appspotmail.com
>> Signed-off-by: Long Li <leo.lilong@huawei.com>
>> ---
>>  fs/xfs/xfs_attr_list.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
>> index 5c947e5ce8b8..8cd6088e6190 100644
>> --- a/fs/xfs/xfs_attr_list.c
>> +++ b/fs/xfs/xfs_attr_list.c
>> @@ -114,7 +114,8 @@ xfs_attr_shortform_list(
>>  	 * It didn't all fit, so we have to sort everything on hashval.
>>  	 */
>>  	sbsize = sf->count * sizeof(*sbuf);
>> -	sbp = sbuf = kmalloc(sbsize, GFP_KERNEL | __GFP_NOFAIL);
>> +	sbp = sbuf = kmalloc(sbsize,
>> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
> 
> Why wouldn't we memalloc_nofs_save any time we take an ILOCK when we're
> not in transaction context?  Surely you'd want to NOFS /any/ allocation
> when the ILOCK is held, right?

I'm not sure I understand this. AFAICT, this is indeed a false positive, and can
be fixed by applying exactly the same pattern used elsewhere in
94a69db2367e ("xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS")

Using memalloc_nofs_save implies that this really /would/ deadlock without
GFP_NOFS, right? Is that the case?

I was under the impression that this was simply a missed callsite in 94a69db2367e
and as Long Li points out, other allocations under xfs_attr_list_ilocked()
use the exact same (GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL) pattern
proposed in this change.

Thanks,
-Eric
Dave Chinner July 8, 2024, 10:38 p.m. UTC | #4
On Mon, Jul 08, 2024 at 12:00:05PM -0700, Darrick J. Wong wrote:
> On Mon, Jul 08, 2024 at 10:40:37AM -0500, Eric Sandeen wrote:
> > On 6/24/24 11:03 AM, Darrick J. Wong wrote:
> > > On Sat, Jun 22, 2024 at 04:26:31PM +0800, Long Li wrote:
> > >> xfs_attr_shortform_list() only called from a non-transactional context, it
> > >> hold ilock before alloc memory and maybe trapped in memory reclaim. Since
> > >> commit 204fae32d5f7("xfs: clean up remaining GFP_NOFS users") removed
> > >> GFP_NOFS flag, lockdep warning will be report as [1]. Eliminate lockdep
> > >> false positives by use __GFP_NOLOCKDEP to alloc memory
> > >> in xfs_attr_shortform_list().
> > >>
> > >> [1] https://lore.kernel.org/linux-xfs/000000000000e33add0616358204@google.com/
> > >> Reported-by: syzbot+4248e91deb3db78358a2@syzkaller.appspotmail.com
> > >> Signed-off-by: Long Li <leo.lilong@huawei.com>
> > >> ---
> > >>  fs/xfs/xfs_attr_list.c | 3 ++-
> > >>  1 file changed, 2 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> > >> index 5c947e5ce8b8..8cd6088e6190 100644
> > >> --- a/fs/xfs/xfs_attr_list.c
> > >> +++ b/fs/xfs/xfs_attr_list.c
> > >> @@ -114,7 +114,8 @@ xfs_attr_shortform_list(
> > >>  	 * It didn't all fit, so we have to sort everything on hashval.
> > >>  	 */
> > >>  	sbsize = sf->count * sizeof(*sbuf);
> > >> -	sbp = sbuf = kmalloc(sbsize, GFP_KERNEL | __GFP_NOFAIL);
> > >> +	sbp = sbuf = kmalloc(sbsize,
> > >> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
> > > 
> > > Why wouldn't we memalloc_nofs_save any time we take an ILOCK when we're
> > > not in transaction context?  Surely you'd want to NOFS /any/ allocation
> > > when the ILOCK is held, right?
> > 
> > I'm not sure I understand this. AFAICT, this is indeed a false positive, and can
> > be fixed by applying exactly the same pattern used elsewhere in
> > 94a69db2367e ("xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS")
> > 
> > Using memalloc_nofs_save implies that this really /would/ deadlock without
> > GFP_NOFS, right? Is that the case?
> > 
> > I was under the impression that this was simply a missed callsite in 94a69db2367e
> > and as Long Li points out, other allocations under xfs_attr_list_ilocked()
> > use the exact same (GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL) pattern
> > proposed in this change.
> 
> Oh, now I see that the alleged deadlock is between the ILOCK of a
> directory that we're accessing, and a different inode that we're trying
> to reclaim.  Lockdep doesn't know that these two contexts are mutually
> exclusive since reclaim cannot target an inode with an active ref.  NOFS
> is a big hammer, which is why the proposal is to turn off lockdep for
> the allocation?  Why not fix lockdep's tracking?
> 
> <sees another thread>
> https://lore.kernel.org/linux-xfs/Zou8FCgPKqqWXKyS@dread.disaster.area/
> 
> We can't use an ILOCK subclass for the reclaim code because we've run
> out of lockdep subclasses.  I guess you could abuse lockdep_set_class to
> change the lockdep class of an ILOCK when the inode enters reclaim (and
> change it back if the inode gets recycled) but that's a bit gross.
> 
> What if we got rid of XFS_ILOCK_RT{BITMAP,SUMMARY} to free up subclass
> bits?
> 
> https://lore.kernel.org/linux-xfs/?q=xfs%3A+remove+XFS_ILOCK_RT

Yes, that would probably work - all we need is a single subclass for
the ilock to say reclaim locking is a different context. There
should only be one lock site that we need that annotation for
(the final xfs_ilock() in xfs_reclaim_inode() after the inode has
been removed from the radix tree), and we don't need nesting because
we are only locking a single inode at a time.

-Dave.
Dave Chinner Nov. 21, 2024, 4 a.m. UTC | #5
[cc Carlos]

This still needs to be fixed.

On Sat, Jun 22, 2024 at 04:26:31PM +0800, Long Li wrote:
> xfs_attr_shortform_list() only called from a non-transactional context, it
> hold ilock before alloc memory and maybe trapped in memory reclaim. Since
> commit 204fae32d5f7("xfs: clean up remaining GFP_NOFS users") removed
> GFP_NOFS flag, lockdep warning will be report as [1]. Eliminate lockdep
> false positives by use __GFP_NOLOCKDEP to alloc memory
> in xfs_attr_shortform_list().
> 
> [1] https://lore.kernel.org/linux-xfs/000000000000e33add0616358204@google.com/
> Reported-by: syzbot+4248e91deb3db78358a2@syzkaller.appspotmail.com
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
>  fs/xfs/xfs_attr_list.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> index 5c947e5ce8b8..8cd6088e6190 100644
> --- a/fs/xfs/xfs_attr_list.c
> +++ b/fs/xfs/xfs_attr_list.c
> @@ -114,7 +114,8 @@ xfs_attr_shortform_list(
>  	 * It didn't all fit, so we have to sort everything on hashval.
>  	 */
>  	sbsize = sf->count * sizeof(*sbuf);
> -	sbp = sbuf = kmalloc(sbsize, GFP_KERNEL | __GFP_NOFAIL);
> +	sbp = sbuf = kmalloc(sbsize,
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  
>  	/*
>  	 * Scan the attribute list for the rest of the entries, storing

Reviewed-by: Dave Chinner <dchinner@redhat.com>

Carlos, can you please pick this patch up? We're still getting new
lockdep false positives being reported from this issue, and this is
the correct fix to make right now.

-Dave.
diff mbox series

Patch

diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 5c947e5ce8b8..8cd6088e6190 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -114,7 +114,8 @@  xfs_attr_shortform_list(
 	 * It didn't all fit, so we have to sort everything on hashval.
 	 */
 	sbsize = sf->count * sizeof(*sbuf);
-	sbp = sbuf = kmalloc(sbsize, GFP_KERNEL | __GFP_NOFAIL);
+	sbp = sbuf = kmalloc(sbsize,
+			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
 
 	/*
 	 * Scan the attribute list for the rest of the entries, storing