Message ID | 20210928140750.1274441-1-houtao1@huawei.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | kernfs: also call kernfs_set_rev() for positive dentry | expand |
On Tue, 2021-09-28 at 22:07 +0800, Hou Tao wrote: > A KMSAN warning is reported by Alexander Potapenko: I haven't looked at this thoroughly yet I admit but ... I'm very sorry but again I don't quite agree with what's done in the patch. > > BUG: KMSAN: uninit-value in kernfs_dop_revalidate+0x61f/0x840 > fs/kernfs/dir.c:1053 > kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053 > d_revalidate fs/namei.c:854 > lookup_dcache fs/namei.c:1522 > __lookup_hash+0x3a6/0x590 fs/namei.c:1543 > filename_create+0x312/0x7c0 fs/namei.c:3657 > do_mkdirat+0x103/0x930 fs/namei.c:3900 > __do_sys_mkdir fs/namei.c:3931 > __se_sys_mkdir fs/namei.c:3929 > __x64_sys_mkdir+0xda/0x120 fs/namei.c:3929 > do_syscall_x64 arch/x86/entry/common.c:51 > > It seems a positive dentry in kernfs becomes a negative dentry directly > through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized > when accessing it in kernfs_dop_revalidate(), because it is only > initialized when created as negative dentry in kernfs_iop_lookup(). It looks more like the final dput() that's sending the dentry to the LRU list than the call to d_delete(). But I could be mistaken and it's just a detail. I think the real question is do we want kernfs ->rmdir() to utilize the VFS negative dentry facility. Clearly, based in the kernfs ->rmdir() op function definition kernfs doesn't want file systems that use it to be able to decide that. Unless there is a case for allowing dentries to be re-used I think they should be dropped in kernfs_iop_rmdir(). > > The problem can be reproduced by the following command: > > cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat hi And immediately recreating the directory is probably not a common use case so it doesn't provide a case for kernfs ->rmdir() not dropping the dentry to prevent the dentry making it to the LRU negative dentry list. The more likely case is an unnecessary accumulation of negative dentries from file system activity with occasional negative dentry re-use which might be undesirable for long running systems. > > A simple fixes seems to be initializing d->d_time for positive dentry > in kernfs_iop_lookup() as well. The downside is the negative dentry > will be revalidated again after it becomes negative in d_delete(), > because the revison of its parent must have been increased due to > its removal. I did that for a long time is the series before deciding to do it only for negatives ... The usual case of directory removal is the dentry is dropped by the ->rmdir() function either directly or via d_invalidate() to prevent possible incorrect initialization on re-create. But this is probably unlikely for kernfs attributes of the same name so I'm not certain dropping the dentry is in fact what really should be done here. I admit I did miss this case. > > Alternative solution is implement .d_iput for kernfs, and assign d_time > for the newly-generated negative dentry in it. But we may need to > take kernfs_rwsem to protect again the concurrent kernfs_link_sibling() > on the parent directory, it is a little over-killing. Now the simple > fix is chosen. Please don't even consider this, it's misleading because here we are concerned with the dentry not the inode so I don't think adding ->d_iput() is the right thing to do. > > Link: https://marc.info/?l=linux-fsdevel&m=163249838610499 > Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching") > Reported-by: Alexander Potapenko <glider@google.com> > Signed-off-by: Hou Tao <houtao1@huawei.com> > --- > fs/kernfs/dir.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c > index 0c7f1558f489..f7618ba5b3b2 100644 > --- a/fs/kernfs/dir.c > +++ b/fs/kernfs/dir.c > @@ -1140,8 +1140,13 @@ static struct dentry *kernfs_iop_lookup(struct > inode *dir, > if (!inode) > inode = ERR_PTR(-ENOMEM); > } > - /* Needed only for negative dentry validation */ > - if (!inode) > + /* > + * Needed for negative dentry validation. > + * The negative dentry can be created in kernfs_iop_lookup() > + * or transforms from positive dentry in dentry_unlink_inode() > + * called from vfs_rmdir(). > + */ > + if (!IS_ERR(inode)) > kernfs_set_rev(parent, dentry); > up_read(&kernfs_rwsem); > This is not a bad idea to do and is what I did for quite a while too but maybe it would be better to do this: diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index a957c944cf3a..3fd9d8fa4b92 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -1165,6 +1165,8 @@ static int kernfs_iop_rmdir(struct inode *dir, struct dentry *dentry) return -ENODEV; ret = scops->rmdir(kn); + if (!ret) + d_invalidate(dentry); kernfs_put_active(kn); return ret;
On Wed, 2021-09-29 at 18:09 +0800, Ian Kent wrote: > On Tue, 2021-09-28 at 22:07 +0800, Hou Tao wrote: > > A KMSAN warning is reported by Alexander Potapenko: > > I haven't looked at this thoroughly yet I admit but ... > > I'm very sorry but again I don't quite agree with what's done > in the patch. > > > > > BUG: KMSAN: uninit-value in kernfs_dop_revalidate+0x61f/0x840 > > fs/kernfs/dir.c:1053 > > kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053 > > d_revalidate fs/namei.c:854 > > lookup_dcache fs/namei.c:1522 > > __lookup_hash+0x3a6/0x590 fs/namei.c:1543 > > filename_create+0x312/0x7c0 fs/namei.c:3657 > > do_mkdirat+0x103/0x930 fs/namei.c:3900 > > __do_sys_mkdir fs/namei.c:3931 > > __se_sys_mkdir fs/namei.c:3929 > > __x64_sys_mkdir+0xda/0x120 fs/namei.c:3929 > > do_syscall_x64 arch/x86/entry/common.c:51 > > > > It seems a positive dentry in kernfs becomes a negative dentry > > directly > > through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized > > when accessing it in kernfs_dop_revalidate(), because it is only > > initialized when created as negative dentry in kernfs_iop_lookup(). > > It looks more like the final dput() that's sending the dentry to > the LRU list than the call to d_delete(). > > But I could be mistaken and it's just a detail. > > I think the real question is do we want kernfs ->rmdir() to utilize > the VFS negative dentry facility. > > Clearly, based in the kernfs ->rmdir() op function definition kernfs > doesn't want file systems that use it to be able to decide that. > > Unless there is a case for allowing dentries to be re-used I think > they should be dropped in kernfs_iop_rmdir(). > > > > > The problem can be reproduced by the following command: > > > > cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat > > hi > > And immediately recreating the directory is probably not a common > use case so it doesn't provide a case for kernfs ->rmdir() not > dropping the dentry to prevent the dentry making it to the LRU > negative dentry list. > > The more likely case is an unnecessary accumulation of negative > dentries from file system activity with occasional negative dentry > re-use which might be undesirable for long running systems. > > > > > A simple fixes seems to be initializing d->d_time for positive > > dentry > > in kernfs_iop_lookup() as well. The downside is the negative dentry > > will be revalidated again after it becomes negative in d_delete(), > > because the revison of its parent must have been increased due to > > its removal. > > I did that for a long time is the series before deciding to do > it only for negatives ... > > The usual case of directory removal is the dentry is dropped by > the ->rmdir() function either directly or via d_invalidate() to > prevent possible incorrect initialization on re-create. > > But this is probably unlikely for kernfs attributes of the same > name so I'm not certain dropping the dentry is in fact what really > should be done here. > > I admit I did miss this case. > > > > > Alternative solution is implement .d_iput for kernfs, and assign > > d_time > > for the newly-generated negative dentry in it. But we may need to > > take kernfs_rwsem to protect again the concurrent > > kernfs_link_sibling() > > on the parent directory, it is a little over-killing. Now the > > simple > > fix is chosen. > > Please don't even consider this, it's misleading because here we > are concerned with the dentry not the inode so I don't think adding > ->d_iput() is the right thing to do. > > > > > Link: https://marc.info/?l=linux-fsdevel&m=163249838610499 > > Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching") > > Reported-by: Alexander Potapenko <glider@google.com> > > Signed-off-by: Hou Tao <houtao1@huawei.com> > > --- > > fs/kernfs/dir.c | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c > > index 0c7f1558f489..f7618ba5b3b2 100644 > > --- a/fs/kernfs/dir.c > > +++ b/fs/kernfs/dir.c > > @@ -1140,8 +1140,13 @@ static struct dentry > > *kernfs_iop_lookup(struct > > inode *dir, > > if (!inode) > > inode = ERR_PTR(-ENOMEM); > > } > > - /* Needed only for negative dentry validation */ > > - if (!inode) > > + /* > > + * Needed for negative dentry validation. > > + * The negative dentry can be created in > > kernfs_iop_lookup() > > + * or transforms from positive dentry in > > dentry_unlink_inode() > > + * called from vfs_rmdir(). > > + */ > > + if (!IS_ERR(inode)) > > kernfs_set_rev(parent, dentry); > > up_read(&kernfs_rwsem); > > > > This is not a bad idea to do and is what I did for quite a while too > but maybe it would be better to do this: I see Greg has pushed the above patch to Linus. After thinking about it this is a good call, after all the reported bug was about a lack of initialization and that's specifically what the patch addresses. Ian > > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c > index a957c944cf3a..3fd9d8fa4b92 100644 > --- a/fs/kernfs/dir.c > +++ b/fs/kernfs/dir.c > @@ -1165,6 +1165,8 @@ static int kernfs_iop_rmdir(struct inode *dir, > struct dentry *dentry) > return -ENODEV; > > ret = scops->rmdir(kn); > + if (!ret) > + d_invalidate(dentry); > > kernfs_put_active(kn); > return ret; >
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 0c7f1558f489..f7618ba5b3b2 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -1140,8 +1140,13 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir, if (!inode) inode = ERR_PTR(-ENOMEM); } - /* Needed only for negative dentry validation */ - if (!inode) + /* + * Needed for negative dentry validation. + * The negative dentry can be created in kernfs_iop_lookup() + * or transforms from positive dentry in dentry_unlink_inode() + * called from vfs_rmdir(). + */ + if (!IS_ERR(inode)) kernfs_set_rev(parent, dentry); up_read(&kernfs_rwsem);
A KMSAN warning is reported by Alexander Potapenko: BUG: KMSAN: uninit-value in kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053 kernfs_dop_revalidate+0x61f/0x840 fs/kernfs/dir.c:1053 d_revalidate fs/namei.c:854 lookup_dcache fs/namei.c:1522 __lookup_hash+0x3a6/0x590 fs/namei.c:1543 filename_create+0x312/0x7c0 fs/namei.c:3657 do_mkdirat+0x103/0x930 fs/namei.c:3900 __do_sys_mkdir fs/namei.c:3931 __se_sys_mkdir fs/namei.c:3929 __x64_sys_mkdir+0xda/0x120 fs/namei.c:3929 do_syscall_x64 arch/x86/entry/common.c:51 It seems a positive dentry in kernfs becomes a negative dentry directly through d_delete() in vfs_rmdir(). dentry->d_time is uninitialized when accessing it in kernfs_dop_revalidate(), because it is only initialized when created as negative dentry in kernfs_iop_lookup(). The problem can be reproduced by the following command: cd /sys/fs/cgroup/pids && mkdir hi && stat hi && rmdir hi && stat hi A simple fixes seems to be initializing d->d_time for positive dentry in kernfs_iop_lookup() as well. The downside is the negative dentry will be revalidated again after it becomes negative in d_delete(), because the revison of its parent must have been increased due to its removal. Alternative solution is implement .d_iput for kernfs, and assign d_time for the newly-generated negative dentry in it. But we may need to take kernfs_rwsem to protect again the concurrent kernfs_link_sibling() on the parent directory, it is a little over-killing. Now the simple fix is chosen. Link: https://marc.info/?l=linux-fsdevel&m=163249838610499 Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching") Reported-by: Alexander Potapenko <glider@google.com> Signed-off-by: Hou Tao <houtao1@huawei.com> --- fs/kernfs/dir.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)