diff mbox series

kernfs: also call kernfs_set_rev() for positive dentry

Message ID 20210928140750.1274441-1-houtao1@huawei.com (mailing list archive)
State New, archived
Headers show
Series kernfs: also call kernfs_set_rev() for positive dentry | expand

Commit Message

Hou Tao Sept. 28, 2021, 2:07 p.m. UTC
A KMSAN warning is reported by Alexander Potapenko:

BUG: KMSAN: uninit-value in kernfs_dop_revalidate+0x61f/0x840
fs/kernfs/dir.c:1053
 kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053
 d_revalidate fs/namei.c:854
 lookup_dcache fs/namei.c:1522
 __lookup_hash+0x3a6/0x590 fs/namei.c:1543
 filename_create+0x312/0x7c0 fs/namei.c:3657
 do_mkdirat+0x103/0x930 fs/namei.c:3900
 __do_sys_mkdir fs/namei.c:3931
 __se_sys_mkdir fs/namei.c:3929
 __x64_sys_mkdir+0xda/0x120 fs/namei.c:3929
 do_syscall_x64 arch/x86/entry/common.c:51

It seems a positive dentry in kernfs becomes a negative dentry directly
through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized
when accessing it in kernfs_dop_revalidate(), because it is only
initialized when created as negative dentry in kernfs_iop_lookup().

The problem can be reproduced by the following command:

  cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat hi

A simple fixes seems to be initializing d->d_time for positive dentry
in kernfs_iop_lookup() as well. The downside is the negative dentry
will be revalidated again after it becomes negative in d_delete(),
because the revison of its parent must have been increased due to
its removal.

Alternative solution is implement .d_iput for kernfs, and assign d_time
for the newly-generated negative dentry in it. But we may need to
take kernfs_rwsem to protect again the concurrent kernfs_link_sibling()
on the parent directory, it is a little over-killing. Now the simple
fix is chosen.

Link: https://marc.info/?l=linux-fsdevel&m=163249838610499
Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching")
Reported-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/kernfs/dir.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Ian Kent Sept. 29, 2021, 10:09 a.m. UTC | #1
On Tue, 2021-09-28 at 22:07 +0800, Hou Tao wrote:
> A KMSAN warning is reported by Alexander Potapenko:

I haven't looked at this thoroughly yet I admit but ...

I'm very sorry but again I don't quite agree with what's done
in the patch.

> 
> BUG: KMSAN: uninit-value in kernfs_dop_revalidate+0x61f/0x840
> fs/kernfs/dir.c:1053
>  kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053
>  d_revalidate fs/namei.c:854
>  lookup_dcache fs/namei.c:1522
>  __lookup_hash+0x3a6/0x590 fs/namei.c:1543
>  filename_create+0x312/0x7c0 fs/namei.c:3657
>  do_mkdirat+0x103/0x930 fs/namei.c:3900
>  __do_sys_mkdir fs/namei.c:3931
>  __se_sys_mkdir fs/namei.c:3929
>  __x64_sys_mkdir+0xda/0x120 fs/namei.c:3929
>  do_syscall_x64 arch/x86/entry/common.c:51
> 
> It seems a positive dentry in kernfs becomes a negative dentry directly
> through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized
> when accessing it in kernfs_dop_revalidate(), because it is only
> initialized when created as negative dentry in kernfs_iop_lookup().

It looks more like the final dput() that's sending the dentry to
the LRU list than the call to d_delete().

But I could be mistaken and it's just a detail.

I think the real question is do we want kernfs ->rmdir() to utilize
the VFS negative dentry facility.

Clearly, based in the kernfs ->rmdir() op function definition kernfs
doesn't want file systems that use it to be able to decide that.

Unless there is a case for allowing dentries to be re-used I think
they should be dropped in kernfs_iop_rmdir().

> 
> The problem can be reproduced by the following command:
> 
>   cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat hi

And immediately recreating the directory is probably not a common
use case so it doesn't provide a case for kernfs ->rmdir() not
dropping the dentry to prevent the dentry making it to the LRU
negative dentry list.

The more likely case is an unnecessary accumulation of negative
dentries from file system activity with occasional negative dentry
re-use which might be undesirable for long running systems.

> 
> A simple fixes seems to be initializing d->d_time for positive dentry
> in kernfs_iop_lookup() as well. The downside is the negative dentry
> will be revalidated again after it becomes negative in d_delete(),
> because the revison of its parent must have been increased due to
> its removal.

I did that for a long time is the series before deciding to do
it only for negatives ...

The usual case of directory removal is the dentry is dropped by
the ->rmdir() function either directly or via d_invalidate() to
prevent possible incorrect initialization on re-create.

But this is probably unlikely for kernfs attributes of the same
name so I'm not certain dropping the dentry is in fact what really
should be done here.

I admit I did miss this case.

> 
> Alternative solution is implement .d_iput for kernfs, and assign d_time
> for the newly-generated negative dentry in it. But we may need to
> take kernfs_rwsem to protect again the concurrent kernfs_link_sibling()
> on the parent directory, it is a little over-killing. Now the simple
> fix is chosen.

Please don't even consider this, it's misleading because here we
are concerned with the dentry not the inode so I don't think adding
->d_iput() is the right thing to do.

> 
> Link: https://marc.info/?l=linux-fsdevel&m=163249838610499
> Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching")
> Reported-by: Alexander Potapenko <glider@google.com>
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  fs/kernfs/dir.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 0c7f1558f489..f7618ba5b3b2 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -1140,8 +1140,13 @@ static struct dentry *kernfs_iop_lookup(struct
> inode *dir,
>                 if (!inode)
>                         inode = ERR_PTR(-ENOMEM);
>         }
> -       /* Needed only for negative dentry validation */
> -       if (!inode)
> +       /*
> +        * Needed for negative dentry validation.
> +        * The negative dentry can be created in kernfs_iop_lookup()
> +        * or transforms from positive dentry in dentry_unlink_inode()
> +        * called from vfs_rmdir().
> +        */
> +       if (!IS_ERR(inode))
>                 kernfs_set_rev(parent, dentry);
>         up_read(&kernfs_rwsem);
>  

This is not a bad idea to do and is what I did for quite a while too
but maybe it would be better to do this:

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index a957c944cf3a..3fd9d8fa4b92 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1165,6 +1165,8 @@ static int kernfs_iop_rmdir(struct inode *dir, struct dentry *dentry)
                return -ENODEV;
 
        ret = scops->rmdir(kn);
+       if (!ret)
+               d_invalidate(dentry);
 
        kernfs_put_active(kn);
        return ret;
Ian Kent Oct. 4, 2021, 6:40 a.m. UTC | #2
On Wed, 2021-09-29 at 18:09 +0800, Ian Kent wrote:
> On Tue, 2021-09-28 at 22:07 +0800, Hou Tao wrote:
> > A KMSAN warning is reported by Alexander Potapenko:
> 
> I haven't looked at this thoroughly yet I admit but ...
> 
> I'm very sorry but again I don't quite agree with what's done
> in the patch.
> 
> > 
> > BUG: KMSAN: uninit-value in kernfs_dop_revalidate+0x61f/0x840
> > fs/kernfs/dir.c:1053
> >  kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053
> >  d_revalidate fs/namei.c:854
> >  lookup_dcache fs/namei.c:1522
> >  __lookup_hash+0x3a6/0x590 fs/namei.c:1543
> >  filename_create+0x312/0x7c0 fs/namei.c:3657
> >  do_mkdirat+0x103/0x930 fs/namei.c:3900
> >  __do_sys_mkdir fs/namei.c:3931
> >  __se_sys_mkdir fs/namei.c:3929
> >  __x64_sys_mkdir+0xda/0x120 fs/namei.c:3929
> >  do_syscall_x64 arch/x86/entry/common.c:51
> > 
> > It seems a positive dentry in kernfs becomes a negative dentry
> > directly
> > through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized
> > when accessing it in kernfs_dop_revalidate(), because it is only
> > initialized when created as negative dentry in kernfs_iop_lookup().
> 
> It looks more like the final dput() that's sending the dentry to
> the LRU list than the call to d_delete().
> 
> But I could be mistaken and it's just a detail.
> 
> I think the real question is do we want kernfs ->rmdir() to utilize
> the VFS negative dentry facility.
> 
> Clearly, based in the kernfs ->rmdir() op function definition kernfs
> doesn't want file systems that use it to be able to decide that.
> 
> Unless there is a case for allowing dentries to be re-used I think
> they should be dropped in kernfs_iop_rmdir().
> 
> > 
> > The problem can be reproduced by the following command:
> > 
> >   cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat
> > hi
> 
> And immediately recreating the directory is probably not a common
> use case so it doesn't provide a case for kernfs ->rmdir() not
> dropping the dentry to prevent the dentry making it to the LRU
> negative dentry list.
> 
> The more likely case is an unnecessary accumulation of negative
> dentries from file system activity with occasional negative dentry
> re-use which might be undesirable for long running systems.
> 
> > 
> > A simple fixes seems to be initializing d->d_time for positive
> > dentry
> > in kernfs_iop_lookup() as well. The downside is the negative dentry
> > will be revalidated again after it becomes negative in d_delete(),
> > because the revison of its parent must have been increased due to
> > its removal.
> 
> I did that for a long time is the series before deciding to do
> it only for negatives ...
> 
> The usual case of directory removal is the dentry is dropped by
> the ->rmdir() function either directly or via d_invalidate() to
> prevent possible incorrect initialization on re-create.
> 
> But this is probably unlikely for kernfs attributes of the same
> name so I'm not certain dropping the dentry is in fact what really
> should be done here.
> 
> I admit I did miss this case.
> 
> > 
> > Alternative solution is implement .d_iput for kernfs, and assign
> > d_time
> > for the newly-generated negative dentry in it. But we may need to
> > take kernfs_rwsem to protect again the concurrent
> > kernfs_link_sibling()
> > on the parent directory, it is a little over-killing. Now the
> > simple
> > fix is chosen.
> 
> Please don't even consider this, it's misleading because here we
> are concerned with the dentry not the inode so I don't think adding
> ->d_iput() is the right thing to do.
> 
> > 
> > Link: https://marc.info/?l=linux-fsdevel&m=163249838610499
> > Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching")
> > Reported-by: Alexander Potapenko <glider@google.com>
> > Signed-off-by: Hou Tao <houtao1@huawei.com>
> > ---
> >  fs/kernfs/dir.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> > index 0c7f1558f489..f7618ba5b3b2 100644
> > --- a/fs/kernfs/dir.c
> > +++ b/fs/kernfs/dir.c
> > @@ -1140,8 +1140,13 @@ static struct dentry
> > *kernfs_iop_lookup(struct
> > inode *dir,
> >                 if (!inode)
> >                         inode = ERR_PTR(-ENOMEM);
> >         }
> > -       /* Needed only for negative dentry validation */
> > -       if (!inode)
> > +       /*
> > +        * Needed for negative dentry validation.
> > +        * The negative dentry can be created in
> > kernfs_iop_lookup()
> > +        * or transforms from positive dentry in
> > dentry_unlink_inode()
> > +        * called from vfs_rmdir().
> > +        */
> > +       if (!IS_ERR(inode))
> >                 kernfs_set_rev(parent, dentry);
> >         up_read(&kernfs_rwsem);
> >  
> 
> This is not a bad idea to do and is what I did for quite a while too
> but maybe it would be better to do this:

I see Greg has pushed the above patch to Linus.

After thinking about it this is a good call, after all the reported
bug was about a lack of initialization and that's specifically what
the patch addresses.

Ian
> 
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index a957c944cf3a..3fd9d8fa4b92 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -1165,6 +1165,8 @@ static int kernfs_iop_rmdir(struct inode *dir,
> struct dentry *dentry)
>                 return -ENODEV;
>  
>         ret = scops->rmdir(kn);
> +       if (!ret)
> +               d_invalidate(dentry);
>  
>         kernfs_put_active(kn);
>         return ret;
>
diff mbox series

Patch

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index 0c7f1558f489..f7618ba5b3b2 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -1140,8 +1140,13 @@  static struct dentry *kernfs_iop_lookup(struct inode *dir,
 		if (!inode)
 			inode = ERR_PTR(-ENOMEM);
 	}
-	/* Needed only for negative dentry validation */
-	if (!inode)
+	/*
+	 * Needed for negative dentry validation.
+	 * The negative dentry can be created in kernfs_iop_lookup()
+	 * or transforms from positive dentry in dentry_unlink_inode()
+	 * called from vfs_rmdir().
+	 */
+	if (!IS_ERR(inode))
 		kernfs_set_rev(parent, dentry);
 	up_read(&kernfs_rwsem);