Message ID | 20231228201510.985235-1-trondmy@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | knfsd: fix the fallback implementation of the get_name export operation | expand |
[CC: fsdevel, viro] On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > The fallback implementation for the get_name export operation uses > readdir() to try to match the inode number to a filename. That filename > is then used together with lookup_one() to produce a dentry. > A problem arises when we match the '.' or '..' entries, since that > causes lookup_one() to fail. This has sometimes been seen to occur for > filesystems that violate POSIX requirements around uniqueness of inode > numbers, something that is common for snapshot directories. Ouch. Nasty. Looks to me like the root cause is "filesystems that violate POSIX requirements around uniqueness of inode numbers". This violation can cause any of the parent's children to wrongly match get_name() not only '.' and '..' and fail the d_inode sanity check after lookup_one(). I understand why this would be common with parent of snapshot dir, but the only fs that support snapshots that I know of (btrfs, bcachefs) do implement ->get_name(), so which filesystem did you encounter this behavior with? can it be fixed by implementing a snapshot aware ->get_name()? > > This patch just ensures that we skip '.' and '..' rather than allowing a > match. I agree that skipping '.' and '..' makes sense, but... > > Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..") ...This Fixes is a bit odd to me. Does the problem go away if the Fixes patch is reverted? I don't think so, I think you would just hit the d_inode sanity check after lookup_one() succeeds. Maybe I did not understand the problem then. > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > --- > fs/exportfs/expfs.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c > index 3ae0154c5680..84af58eaf2ca 100644 > --- a/fs/exportfs/expfs.c > +++ b/fs/exportfs/expfs.c > @@ -255,7 +255,9 @@ static bool filldir_one(struct dir_context *ctx, const char *name, int len, > container_of(ctx, struct getdents_callback, ctx); > > buf->sequence++; > - if (buf->ino == ino && len <= NAME_MAX) { > + /* Ignore the '.' and '..' entries */ > + if ((len > 2 || name[0] != '.' || (len == 2 && name[1] != '.')) && I wish I did not have to review that this condition is correct. I wish there was a common helper is_dot_or_dotdot() that would be used here as !is_dot_dotdot(name, len). I found 3 copies of is_dot_dotdot(). I didn't even try to find how many places have open coded this. Thanks, Amir.
On Thu, 2023-12-28 at 15:15 -0500, trondmy@kernel.org wrote: > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > The fallback implementation for the get_name export operation uses > readdir() to try to match the inode number to a filename. That filename > is then used together with lookup_one() to produce a dentry. > A problem arises when we match the '.' or '..' entries, since that > causes lookup_one() to fail. This has sometimes been seen to occur for > filesystems that violate POSIX requirements around uniqueness of inode > numbers, something that is common for snapshot directories. > > This patch just ensures that we skip '.' and '..' rather than allowing a > match. > > Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..") > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > --- > fs/exportfs/expfs.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c > index 3ae0154c5680..84af58eaf2ca 100644 > --- a/fs/exportfs/expfs.c > +++ b/fs/exportfs/expfs.c > @@ -255,7 +255,9 @@ static bool filldir_one(struct dir_context *ctx, const char *name, int len, > container_of(ctx, struct getdents_callback, ctx); > > buf->sequence++; > - if (buf->ino == ino && len <= NAME_MAX) { > + /* Ignore the '.' and '..' entries */ > + if ((len > 2 || name[0] != '.' || (len == 2 && name[1] != '.')) && > + buf->ino == ino && len <= NAME_MAX) { > memcpy(buf->name, name, len); > buf->name[len] = '\0'; > buf->found = 1; Reviewed-by: Jeff Layton <jlayton@kernel.org>
On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote: > [CC: fsdevel, viro] Thanks for picking this up, Amir, and for copying viro/fsdevel. I was planning to repost this next week when more folks are back, but this works too. Trond, if you'd like, I can handle review changes if you don't have time to follow up. > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > The fallback implementation for the get_name export operation uses > > readdir() to try to match the inode number to a filename. That filename > > is then used together with lookup_one() to produce a dentry. > > A problem arises when we match the '.' or '..' entries, since that > > causes lookup_one() to fail. This has sometimes been seen to occur for > > filesystems that violate POSIX requirements around uniqueness of inode > > numbers, something that is common for snapshot directories. > > Ouch. Nasty. > > Looks to me like the root cause is "filesystems that violate POSIX > requirements around uniqueness of inode numbers". > This violation can cause any of the parent's children to wrongly match > get_name() not only '.' and '..' and fail the d_inode sanity check after > lookup_one(). > > I understand why this would be common with parent of snapshot dir, > but the only fs that support snapshots that I know of (btrfs, bcachefs) > do implement ->get_name(), so which filesystem did you encounter > this behavior with? can it be fixed by implementing a snapshot > aware ->get_name()? > > > This patch just ensures that we skip '.' and '..' rather than allowing a > > match. > > I agree that skipping '.' and '..' makes sense, but... Does skipping '.' and '..' make sense for file systems that do indeed guarantee inode number uniqueness? Given your explanation here, I'm wondering whether the generic get_name() function is the right place to address the issue. > > Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..") > > ...This Fixes is a bit odd to me. Me too, but I didn't see a more obvious choice. Maybe drop the specific Fixes: tag in favor of just Cc: stable. > Does the problem go away if the Fixes patch is reverted? > I don't think so, I think you would just hit the d_inode sanity check > after lookup_one() succeeds. > Maybe I did not understand the problem then. > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > --- > > fs/exportfs/expfs.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c > > index 3ae0154c5680..84af58eaf2ca 100644 > > --- a/fs/exportfs/expfs.c > > +++ b/fs/exportfs/expfs.c > > @@ -255,7 +255,9 @@ static bool filldir_one(struct dir_context *ctx, const char *name, int len, > > container_of(ctx, struct getdents_callback, ctx); > > > > buf->sequence++; > > - if (buf->ino == ino && len <= NAME_MAX) { > > + /* Ignore the '.' and '..' entries */ > > + if ((len > 2 || name[0] != '.' || (len == 2 && name[1] != '.')) && > > I wish I did not have to review that this condition is correct. > I wish there was a common helper is_dot_or_dotdot() that would be > used here as !is_dot_dotdot(name, len). > I found 3 copies of is_dot_dotdot(). > I didn't even try to find how many places have open coded this.
On Fri, 2023-12-29 at 07:46 +0200, Amir Goldstein wrote: > [CC: fsdevel, viro] > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > The fallback implementation for the get_name export operation uses > > readdir() to try to match the inode number to a filename. That > > filename > > is then used together with lookup_one() to produce a dentry. > > A problem arises when we match the '.' or '..' entries, since that > > causes lookup_one() to fail. This has sometimes been seen to occur > > for > > filesystems that violate POSIX requirements around uniqueness of > > inode > > numbers, something that is common for snapshot directories. > > Ouch. Nasty. > > Looks to me like the root cause is "filesystems that violate POSIX > requirements around uniqueness of inode numbers". > This violation can cause any of the parent's children to wrongly > match > get_name() not only '.' and '..' and fail the d_inode sanity check > after > lookup_one(). > > I understand why this would be common with parent of snapshot dir, > but the only fs that support snapshots that I know of (btrfs, > bcachefs) > do implement ->get_name(), so which filesystem did you encounter > this behavior with? can it be fixed by implementing a snapshot > aware ->get_name()? NFS (i.e. re-exporting NFS). Why do you not want a fix in the generic code? > > > > > This patch just ensures that we skip '.' and '..' rather than > > allowing a > > match. > > I agree that skipping '.' and '..' makes sense, but... > > > > > Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..") > > ...This Fixes is a bit odd to me. > Does the problem go away if the Fixes patch is reverted? > I don't think so, I think you would just hit the d_inode sanity check > after lookup_one() succeeds. > Maybe I did not understand the problem then. > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > --- > > fs/exportfs/expfs.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c > > index 3ae0154c5680..84af58eaf2ca 100644 > > --- a/fs/exportfs/expfs.c > > +++ b/fs/exportfs/expfs.c > > @@ -255,7 +255,9 @@ static bool filldir_one(struct dir_context > > *ctx, const char *name, int len, > > container_of(ctx, struct getdents_callback, ctx); > > > > buf->sequence++; > > - if (buf->ino == ino && len <= NAME_MAX) { > > + /* Ignore the '.' and '..' entries */ > > + if ((len > 2 || name[0] != '.' || (len == 2 && name[1] != > > '.')) && > > I wish I did not have to review that this condition is correct. > I wish there was a common helper is_dot_or_dotdot() that would be > used here as !is_dot_dotdot(name, len). > I found 3 copies of is_dot_dotdot(). > I didn't even try to find how many places have open coded this. > > Thanks, > Amir. >
On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever <chuck.lever@oracle.com> wrote: > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote: > > [CC: fsdevel, viro] > > Thanks for picking this up, Amir, and for copying viro/fsdevel. I > was planning to repost this next week when more folks are back, but > this works too. > > Trond, if you'd like, I can handle review changes if you don't have > time to follow up. > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > The fallback implementation for the get_name export operation uses > > > readdir() to try to match the inode number to a filename. That filename > > > is then used together with lookup_one() to produce a dentry. > > > A problem arises when we match the '.' or '..' entries, since that > > > causes lookup_one() to fail. This has sometimes been seen to occur for > > > filesystems that violate POSIX requirements around uniqueness of inode > > > numbers, something that is common for snapshot directories. > > > > Ouch. Nasty. > > > > Looks to me like the root cause is "filesystems that violate POSIX > > requirements around uniqueness of inode numbers". > > This violation can cause any of the parent's children to wrongly match > > get_name() not only '.' and '..' and fail the d_inode sanity check after > > lookup_one(). > > > > I understand why this would be common with parent of snapshot dir, > > but the only fs that support snapshots that I know of (btrfs, bcachefs) > > do implement ->get_name(), so which filesystem did you encounter > > this behavior with? can it be fixed by implementing a snapshot > > aware ->get_name()? > > > > > This patch just ensures that we skip '.' and '..' rather than allowing a > > > match. > > > > I agree that skipping '.' and '..' makes sense, but... > > Does skipping '.' and '..' make sense for file systems that do It makes sense because if the child's name in its parent would have been "." or ".." it would have been its own parent or its own grandparent (ELOOP situation). IOW, we can safely skip "." and "..", regardless of anything else. > indeed guarantee inode number uniqueness? Given your explanation > here, I'm wondering whether the generic get_name() function is the > right place to address the issue. > > > > > Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..") > > > > ...This Fixes is a bit odd to me. > > Me too, but I didn't see a more obvious choice. Maybe drop the > specific Fixes: tag in favor of just Cc: stable. > Yeh, that'd be fine. Thanks, Amir.
On Fri, Dec 29, 2023 at 5:22 PM Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Fri, 2023-12-29 at 07:46 +0200, Amir Goldstein wrote: > > [CC: fsdevel, viro] > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > The fallback implementation for the get_name export operation uses > > > readdir() to try to match the inode number to a filename. That > > > filename > > > is then used together with lookup_one() to produce a dentry. > > > A problem arises when we match the '.' or '..' entries, since that > > > causes lookup_one() to fail. This has sometimes been seen to occur > > > for > > > filesystems that violate POSIX requirements around uniqueness of > > > inode > > > numbers, something that is common for snapshot directories. > > > > Ouch. Nasty. > > > > Looks to me like the root cause is "filesystems that violate POSIX > > requirements around uniqueness of inode numbers". > > This violation can cause any of the parent's children to wrongly > > match > > get_name() not only '.' and '..' and fail the d_inode sanity check > > after > > lookup_one(). > > > > I understand why this would be common with parent of snapshot dir, > > but the only fs that support snapshots that I know of (btrfs, > > bcachefs) > > do implement ->get_name(), so which filesystem did you encounter > > this behavior with? can it be fixed by implementing a snapshot > > aware ->get_name()? > > NFS (i.e. re-exporting NFS). > Ah. > Why do you not want a fix in the generic code? > I do not object to your fix at all. I only objected to the Fixes tag. I was just pointing out that this is not a complete solution. A decode of an NFS (re-exported) file handle could fail if get_name() iterates the parent of a snapshot root dir and finds a false match (which is not "." nor "..") before it finds the snapshot subdir name. It may be solved by nfs_get_name() which does not stop when if finds an ino match but checks further, but I don't know nfs re-export to know what else could be checked. Anyway, for this patch, without the Fixes tag, feel free to add: Acked-by: Amir Goldstein <amir73il@gmail.com> I'd prefer the use of is_dot_dotdot(), but I do not insist. Thanks and a happy new year! Amir.
On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote: > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever <chuck.lever@oracle.com> wrote: > > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote: > > > [CC: fsdevel, viro] > > > > Thanks for picking this up, Amir, and for copying viro/fsdevel. I > > was planning to repost this next week when more folks are back, but > > this works too. > > > > Trond, if you'd like, I can handle review changes if you don't have > > time to follow up. > > > > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > > > The fallback implementation for the get_name export operation uses > > > > readdir() to try to match the inode number to a filename. That filename > > > > is then used together with lookup_one() to produce a dentry. > > > > A problem arises when we match the '.' or '..' entries, since that > > > > causes lookup_one() to fail. This has sometimes been seen to occur for > > > > filesystems that violate POSIX requirements around uniqueness of inode > > > > numbers, something that is common for snapshot directories. > > > > > > Ouch. Nasty. > > > > > > Looks to me like the root cause is "filesystems that violate POSIX > > > requirements around uniqueness of inode numbers". > > > This violation can cause any of the parent's children to wrongly match > > > get_name() not only '.' and '..' and fail the d_inode sanity check after > > > lookup_one(). > > > > > > I understand why this would be common with parent of snapshot dir, > > > but the only fs that support snapshots that I know of (btrfs, bcachefs) > > > do implement ->get_name(), so which filesystem did you encounter > > > this behavior with? can it be fixed by implementing a snapshot > > > aware ->get_name()? > > > > > > > This patch just ensures that we skip '.' and '..' rather than allowing a > > > > match. > > > > > > I agree that skipping '.' and '..' makes sense, but... > > > > Does skipping '.' and '..' make sense for file systems that do > > It makes sense because if the child's name in its parent would > have been "." or ".." it would have been its own parent or its own > grandparent (ELOOP situation). > IOW, we can safely skip "." and "..", regardless of anything else. This new comment: + /* Ignore the '.' and '..' entries */ then seems inadequate to explain why dot and dot-dot are now never matched. Perhaps the function's documenting comment could expand on this a little. I'll give it some thought. > > indeed guarantee inode number uniqueness? Given your explanation > > here, I'm wondering whether the generic get_name() function is the > > right place to address the issue.
On Fri, 2023-12-29 at 18:29 -0500, Chuck Lever wrote: > On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote: > > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever > > <chuck.lever@oracle.com> wrote: > > > > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote: > > > > [CC: fsdevel, viro] > > > > > > Thanks for picking this up, Amir, and for copying viro/fsdevel. I > > > was planning to repost this next week when more folks are back, > > > but > > > this works too. > > > > > > Trond, if you'd like, I can handle review changes if you don't > > > have > > > time to follow up. > > > > > > > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > > > > > The fallback implementation for the get_name export operation > > > > > uses > > > > > readdir() to try to match the inode number to a filename. > > > > > That filename > > > > > is then used together with lookup_one() to produce a dentry. > > > > > A problem arises when we match the '.' or '..' entries, since > > > > > that > > > > > causes lookup_one() to fail. This has sometimes been seen to > > > > > occur for > > > > > filesystems that violate POSIX requirements around uniqueness > > > > > of inode > > > > > numbers, something that is common for snapshot directories. > > > > > > > > Ouch. Nasty. > > > > > > > > Looks to me like the root cause is "filesystems that violate > > > > POSIX > > > > requirements around uniqueness of inode numbers". > > > > This violation can cause any of the parent's children to > > > > wrongly match > > > > get_name() not only '.' and '..' and fail the d_inode sanity > > > > check after > > > > lookup_one(). > > > > > > > > I understand why this would be common with parent of snapshot > > > > dir, > > > > but the only fs that support snapshots that I know of (btrfs, > > > > bcachefs) > > > > do implement ->get_name(), so which filesystem did you > > > > encounter > > > > this behavior with? can it be fixed by implementing a snapshot > > > > aware ->get_name()? > > > > > > > > > This patch just ensures that we skip '.' and '..' rather than > > > > > allowing a > > > > > match. > > > > > > > > I agree that skipping '.' and '..' makes sense, but... > > > > > > Does skipping '.' and '..' make sense for file systems that do > > > > It makes sense because if the child's name in its parent would > > have been "." or ".." it would have been its own parent or its own > > grandparent (ELOOP situation). > > IOW, we can safely skip "." and "..", regardless of anything else. > > This new comment: > > + /* Ignore the '.' and '..' entries */ > > then seems inadequate to explain why dot and dot-dot are now never > matched. Perhaps the function's documenting comment could expand on > this a little. I'll give it some thought. The point of this code is to attempt to create a valid path that connects the inode found by the filehandle to the export point. The readdir() must determine a valid name for a dentry that is a component of that path, which is why '.' and '..' can never be acceptable. This is why I think we should keep the 'Fixes:' line. The commit it points to explains quite concisely why this patch is needed. > > > > > indeed guarantee inode number uniqueness? Given your explanation > > > here, I'm wondering whether the generic get_name() function is > > > the > > > right place to address the issue. >
On Sat, Dec 30, 2023 at 1:50 AM Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Fri, 2023-12-29 at 18:29 -0500, Chuck Lever wrote: > > On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote: > > > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever > > > <chuck.lever@oracle.com> wrote: > > > > > > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote: > > > > > [CC: fsdevel, viro] > > > > > > > > Thanks for picking this up, Amir, and for copying viro/fsdevel. I > > > > was planning to repost this next week when more folks are back, > > > > but > > > > this works too. > > > > > > > > Trond, if you'd like, I can handle review changes if you don't > > > > have > > > > time to follow up. > > > > > > > > > > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote: > > > > > > > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > > > > > > > The fallback implementation for the get_name export operation > > > > > > uses > > > > > > readdir() to try to match the inode number to a filename. > > > > > > That filename > > > > > > is then used together with lookup_one() to produce a dentry. > > > > > > A problem arises when we match the '.' or '..' entries, since > > > > > > that > > > > > > causes lookup_one() to fail. This has sometimes been seen to > > > > > > occur for > > > > > > filesystems that violate POSIX requirements around uniqueness > > > > > > of inode > > > > > > numbers, something that is common for snapshot directories. > > > > > > > > > > Ouch. Nasty. > > > > > > > > > > Looks to me like the root cause is "filesystems that violate > > > > > POSIX > > > > > requirements around uniqueness of inode numbers". > > > > > This violation can cause any of the parent's children to > > > > > wrongly match > > > > > get_name() not only '.' and '..' and fail the d_inode sanity > > > > > check after > > > > > lookup_one(). > > > > > > > > > > I understand why this would be common with parent of snapshot > > > > > dir, > > > > > but the only fs that support snapshots that I know of (btrfs, > > > > > bcachefs) > > > > > do implement ->get_name(), so which filesystem did you > > > > > encounter > > > > > this behavior with? can it be fixed by implementing a snapshot > > > > > aware ->get_name()? > > > > > > > > > > > This patch just ensures that we skip '.' and '..' rather than > > > > > > allowing a > > > > > > match. > > > > > > > > > > I agree that skipping '.' and '..' makes sense, but... > > > > > > > > Does skipping '.' and '..' make sense for file systems that do > > > > > > It makes sense because if the child's name in its parent would > > > have been "." or ".." it would have been its own parent or its own > > > grandparent (ELOOP situation). > > > IOW, we can safely skip "." and "..", regardless of anything else. > > > > This new comment: > > > > + /* Ignore the '.' and '..' entries */ > > > > then seems inadequate to explain why dot and dot-dot are now never > > matched. Perhaps the function's documenting comment could expand on > > this a little. I'll give it some thought. > > The point of this code is to attempt to create a valid path that > connects the inode found by the filehandle to the export point. The > readdir() must determine a valid name for a dentry that is a component > of that path, which is why '.' and '..' can never be acceptable. > > This is why I think we should keep the 'Fixes:' line. The commit it > points to explains quite concisely why this patch is needed. > By all means, mention this commit, just not with a fixed tag please. IIUC, commit 21d8a15ac333 did not introduce a regression that this patch fixes. Right? So why insist on abusing Fixes: tag instead of a mention? Thanks, Amir.
On Sat, 2023-12-30 at 08:23 +0200, Amir Goldstein wrote: > On Sat, Dec 30, 2023 at 1:50 AM Trond Myklebust > <trondmy@hammerspace.com> wrote: > > > > On Fri, 2023-12-29 at 18:29 -0500, Chuck Lever wrote: > > > On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote: > > > > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever > > > > <chuck.lever@oracle.com> wrote: > > > > > > > > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein > > > > > wrote: > > > > > > [CC: fsdevel, viro] > > > > > > > > > > Thanks for picking this up, Amir, and for copying > > > > > viro/fsdevel. I > > > > > was planning to repost this next week when more folks are > > > > > back, > > > > > but > > > > > this works too. > > > > > > > > > > Trond, if you'd like, I can handle review changes if you > > > > > don't > > > > > have > > > > > time to follow up. > > > > > > > > > > > > > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> > > > > > > wrote: > > > > > > > > > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > > > > > > > > > The fallback implementation for the get_name export > > > > > > > operation > > > > > > > uses > > > > > > > readdir() to try to match the inode number to a filename. > > > > > > > That filename > > > > > > > is then used together with lookup_one() to produce a > > > > > > > dentry. > > > > > > > A problem arises when we match the '.' or '..' entries, > > > > > > > since > > > > > > > that > > > > > > > causes lookup_one() to fail. This has sometimes been seen > > > > > > > to > > > > > > > occur for > > > > > > > filesystems that violate POSIX requirements around > > > > > > > uniqueness > > > > > > > of inode > > > > > > > numbers, something that is common for snapshot > > > > > > > directories. > > > > > > > > > > > > Ouch. Nasty. > > > > > > > > > > > > Looks to me like the root cause is "filesystems that > > > > > > violate > > > > > > POSIX > > > > > > requirements around uniqueness of inode numbers". > > > > > > This violation can cause any of the parent's children to > > > > > > wrongly match > > > > > > get_name() not only '.' and '..' and fail the d_inode > > > > > > sanity > > > > > > check after > > > > > > lookup_one(). > > > > > > > > > > > > I understand why this would be common with parent of > > > > > > snapshot > > > > > > dir, > > > > > > but the only fs that support snapshots that I know of > > > > > > (btrfs, > > > > > > bcachefs) > > > > > > do implement ->get_name(), so which filesystem did you > > > > > > encounter > > > > > > this behavior with? can it be fixed by implementing a > > > > > > snapshot > > > > > > aware ->get_name()? > > > > > > > > > > > > > This patch just ensures that we skip '.' and '..' rather > > > > > > > than > > > > > > > allowing a > > > > > > > match. > > > > > > > > > > > > I agree that skipping '.' and '..' makes sense, but... > > > > > > > > > > Does skipping '.' and '..' make sense for file systems that > > > > > do > > > > > > > > It makes sense because if the child's name in its parent would > > > > have been "." or ".." it would have been its own parent or its > > > > own > > > > grandparent (ELOOP situation). > > > > IOW, we can safely skip "." and "..", regardless of anything > > > > else. > > > > > > This new comment: > > > > > > + /* Ignore the '.' and '..' entries */ > > > > > > then seems inadequate to explain why dot and dot-dot are now > > > never > > > matched. Perhaps the function's documenting comment could expand > > > on > > > this a little. I'll give it some thought. > > > > The point of this code is to attempt to create a valid path that > > connects the inode found by the filehandle to the export point. The > > readdir() must determine a valid name for a dentry that is a > > component > > of that path, which is why '.' and '..' can never be acceptable. > > > > This is why I think we should keep the 'Fixes:' line. The commit it > > points to explains quite concisely why this patch is needed. > > > > By all means, mention this commit, just not with a fixed tag please. > IIUC, commit 21d8a15ac333 did not introduce a regression that this > patch fixes. Right? > So why insist on abusing Fixes: tag instead of a mention? I don't see it as being that straightforward. Prior to commit 21d8a15ac333, the call to lookup_one_len() could return a dentry (albeit one with an invalid name) depending on whether or not the filesystem lookup succeeds. Note that knfsd does support a lookup of "." and "..", as do several other NFS servers. With commit 21d8a15ac333 applied, however, lookup_one_len() automatically returns an EACCES error. So while I agree that there are good reasons for introducing commit 21d8a15ac333, it does change the behaviour in this code path.
On Sat, Dec 30, 2023 at 9:36 PM Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Sat, 2023-12-30 at 08:23 +0200, Amir Goldstein wrote: > > On Sat, Dec 30, 2023 at 1:50 AM Trond Myklebust > > <trondmy@hammerspace.com> wrote: > > > > > > On Fri, 2023-12-29 at 18:29 -0500, Chuck Lever wrote: > > > > On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote: > > > > > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever > > > > > <chuck.lever@oracle.com> wrote: > > > > > > > > > > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein > > > > > > wrote: > > > > > > > [CC: fsdevel, viro] > > > > > > > > > > > > Thanks for picking this up, Amir, and for copying > > > > > > viro/fsdevel. I > > > > > > was planning to repost this next week when more folks are > > > > > > back, > > > > > > but > > > > > > this works too. > > > > > > > > > > > > Trond, if you'd like, I can handle review changes if you > > > > > > don't > > > > > > have > > > > > > time to follow up. > > > > > > > > > > > > > > > > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> > > > > > > > wrote: > > > > > > > > > > > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > > > > > > > > > > > > > The fallback implementation for the get_name export > > > > > > > > operation > > > > > > > > uses > > > > > > > > readdir() to try to match the inode number to a filename. > > > > > > > > That filename > > > > > > > > is then used together with lookup_one() to produce a > > > > > > > > dentry. > > > > > > > > A problem arises when we match the '.' or '..' entries, > > > > > > > > since > > > > > > > > that > > > > > > > > causes lookup_one() to fail. This has sometimes been seen > > > > > > > > to > > > > > > > > occur for > > > > > > > > filesystems that violate POSIX requirements around > > > > > > > > uniqueness > > > > > > > > of inode > > > > > > > > numbers, something that is common for snapshot > > > > > > > > directories. > > > > > > > > > > > > > > Ouch. Nasty. > > > > > > > > > > > > > > Looks to me like the root cause is "filesystems that > > > > > > > violate > > > > > > > POSIX > > > > > > > requirements around uniqueness of inode numbers". > > > > > > > This violation can cause any of the parent's children to > > > > > > > wrongly match > > > > > > > get_name() not only '.' and '..' and fail the d_inode > > > > > > > sanity > > > > > > > check after > > > > > > > lookup_one(). > > > > > > > > > > > > > > I understand why this would be common with parent of > > > > > > > snapshot > > > > > > > dir, > > > > > > > but the only fs that support snapshots that I know of > > > > > > > (btrfs, > > > > > > > bcachefs) > > > > > > > do implement ->get_name(), so which filesystem did you > > > > > > > encounter > > > > > > > this behavior with? can it be fixed by implementing a > > > > > > > snapshot > > > > > > > aware ->get_name()? > > > > > > > > > > > > > > > This patch just ensures that we skip '.' and '..' rather > > > > > > > > than > > > > > > > > allowing a > > > > > > > > match. > > > > > > > > > > > > > > I agree that skipping '.' and '..' makes sense, but... > > > > > > > > > > > > Does skipping '.' and '..' make sense for file systems that > > > > > > do > > > > > > > > > > It makes sense because if the child's name in its parent would > > > > > have been "." or ".." it would have been its own parent or its > > > > > own > > > > > grandparent (ELOOP situation). > > > > > IOW, we can safely skip "." and "..", regardless of anything > > > > > else. > > > > > > > > This new comment: > > > > > > > > + /* Ignore the '.' and '..' entries */ > > > > > > > > then seems inadequate to explain why dot and dot-dot are now > > > > never > > > > matched. Perhaps the function's documenting comment could expand > > > > on > > > > this a little. I'll give it some thought. > > > > > > The point of this code is to attempt to create a valid path that > > > connects the inode found by the filehandle to the export point. The > > > readdir() must determine a valid name for a dentry that is a > > > component > > > of that path, which is why '.' and '..' can never be acceptable. > > > > > > This is why I think we should keep the 'Fixes:' line. The commit it > > > points to explains quite concisely why this patch is needed. > > > > > > > By all means, mention this commit, just not with a fixed tag please. > > IIUC, commit 21d8a15ac333 did not introduce a regression that this > > patch fixes. Right? > > So why insist on abusing Fixes: tag instead of a mention? > > I don't see it as being that straightforward. > > Prior to commit 21d8a15ac333, the call to lookup_one_len() could return > a dentry (albeit one with an invalid name) depending on whether or not > the filesystem lookup succeeds. Note that knfsd does support a lookup > of "." and "..", as do several other NFS servers. > > With commit 21d8a15ac333 applied, however, lookup_one_len() > automatically returns an EACCES error. > > So while I agree that there are good reasons for introducing commit > 21d8a15ac333, it does change the behaviour in this code path. > I feel that we are miscommunicating. Let me explain how I understand the code and please tell me where I am wrong. The way I see it, before 21d8a15ac333, exportfs_decode_fh_raw() would call lookup_one() and may get a dentry (with invalid name), but then the sanity check following lookup_one() would surely fail, because no fs should allow a directory to be its own parent/grandparent: if (unlikely(nresult->d_inode != result->d_inode)) { dput(nresult); nresult = ERR_PTR(-ESTALE); } The way I see it, the only thing that commit 21d8a15ac333 changed in this code is the return value of exportfs_decode_fh_raw() from -ESTALE to -EACCES. exportfs_decode_fh() converts both these errors to -ESTALE and so does nfsd_set_fh_dentry(). Bottom line, if I am reading the code correctly, commit 21d8a15ac333 did not change the behaviour for knfsd nor any user visible behavior for open_by_handle_at() for userspace nfsd. Your fix is good because: 1. It saves an unneeded call to lookup_one() 2. skipping "." and ".." increases the chance of finding the correct child name in the case of non-unique ino So I have no objection to your fix in generic code, but I do not see it being a regression fix. Where are we miscommunicating? What am I missing? Thanks, Amir.
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c index 3ae0154c5680..84af58eaf2ca 100644 --- a/fs/exportfs/expfs.c +++ b/fs/exportfs/expfs.c @@ -255,7 +255,9 @@ static bool filldir_one(struct dir_context *ctx, const char *name, int len, container_of(ctx, struct getdents_callback, ctx); buf->sequence++; - if (buf->ino == ino && len <= NAME_MAX) { + /* Ignore the '.' and '..' entries */ + if ((len > 2 || name[0] != '.' || (len == 2 && name[1] != '.')) && + buf->ino == ino && len <= NAME_MAX) { memcpy(buf->name, name, len); buf->name[len] = '\0'; buf->found = 1;