Message ID | 20221213180826.216690-1-jlayton@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | nfsd: fix handling of readdir in v4root vs. mount upcall timeout |
> On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
>
> If v4 READDIR operation hits a mountpoint and gets back an error,
> then it will include that entry in the reply and set RDATTR_ERROR for it
> to the error.
>
> That's fine for "normal" exported filesystems, but on the v4root, we
> need to be more careful to only expose the existence of dentries that
> lead to exports.
>
> If the mountd upcall times out while checking to see whether a
> mountpoint on the v4root is exported, then we have no recourse other
> than to fail the whole operation.

Thank you for chasing this down!

Failing the whole READDIR when mountd times out might be a bad idea.
If the mountd upcall times out every time, the client can't make
any progress and will continue to emit the failing READDIR request.

Would it be better to skip the unresolvable entry instead and let
the READDIR succeed without that entry?

> Cc: Steve Dickson <steved@redhat.com>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777
> Reported-by: JianHong Yin <yin-jianhong@163.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/nfs4xdr.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 2b4ae858c89b..984528ce8d68 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -3588,6 +3588,7 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
>  	struct readdir_cd *ccd = ccdv;
>  	struct nfsd4_readdir *cd = container_of(ccd, struct nfsd4_readdir, common);
>  	struct xdr_stream *xdr = cd->xdr;
> +	struct svc_export *exp = cd->rd_fhp->fh_export;
>  	int start_offset = xdr->buf->len;
>  	int cookie_offset;
>  	u32 name_and_cookie;
> @@ -3629,6 +3630,17 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
>  	case nfserr_noent:
>  		xdr_truncate_encode(xdr, start_offset);
>  		goto skip_entry;
> +	case nfserr_jukebox:
> +		/*
> +		 * The pseudoroot should only display dentries that lead to
> +		 * exports. If we get EJUKEBOX here, then we can't tell whether
> +		 * this entry should be included. Just fail the whole READDIR
> +		 * with NFS4ERR_DELAY in that case, and hope that the situation
> +		 * will resolve itself by the client's next attempt.
> +		 */
> +		if (exp->ex_flags & NFSEXP_V4ROOT)
> +			goto fail;
> +		fallthrough;
>  	default:
>  		/*
>  		 * If the client requested the RDATTR_ERROR attribute,
> --
> 2.38.1

--
Chuck Lever
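For readers following the diff, the per-entry decision it introduces can be modelled in user space roughly as below. This is only a sketch, not the kernel code: `entry_status`, `entry_action`, `dirent_ctx` and `classify_entry()` are hypothetical names invented for illustration, and the behaviour when the client did not request RDATTR_ERROR is an assumption, since the quoted hunk cuts off inside that comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the nfserr_* results seen in the diff. */
enum entry_status { ENTRY_OK, ENTRY_NOENT, ENTRY_JUKEBOX, ENTRY_OTHER_ERROR };

/* What the encoder does with one directory entry. */
enum entry_action {
	ACT_ENCODE,           /* encode the entry normally */
	ACT_SKIP_ENTRY,       /* drop this entry, keep going */
	ACT_FAIL_READDIR,     /* fail the whole READDIR */
	ACT_SET_RDATTR_ERROR, /* keep the entry, report the error per entry */
};

struct dirent_ctx {
	bool on_v4root;          /* models exp->ex_flags & NFSEXP_V4ROOT */
	bool wants_rdattr_error; /* client asked for the RDATTR_ERROR attribute */
};

enum entry_action classify_entry(const struct dirent_ctx *ctx,
				 enum entry_status st)
{
	assert(ctx != NULL);

	switch (st) {
	case ENTRY_OK:
		return ACT_ENCODE;
	case ENTRY_NOENT:
		return ACT_SKIP_ENTRY;
	case ENTRY_JUKEBOX:
		/* New in the patch: on the pseudoroot we cannot tell whether
		 * the entry leads to an export, so give up on the READDIR. */
		if (ctx->on_v4root)
			return ACT_FAIL_READDIR;
		break;
	default:
		break;
	}

	/* Assumption: without RDATTR_ERROR requested, the READDIR fails. */
	return ctx->wants_rdattr_error ? ACT_SET_RDATTR_ERROR
				       : ACT_FAIL_READDIR;
}
```

In this model, the alternative raised in the reply above would amount to returning `ACT_SKIP_ENTRY` instead of `ACT_FAIL_READDIR` in the `ENTRY_JUKEBOX` case when `on_v4root` is set.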
On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
> Thank you for chasing this down!
>
> Failing the whole READDIR when mountd times out might be a bad idea.
> If the mountd upcall times out every time, the client can't make
> any progress and will continue to emit the failing READDIR request.
>
> Would it be better to skip the unresolvable entry instead and let
> the READDIR succeed without that entry?
>

Mounting doesn't usually require working READDIR. In that situation, a
readdir() might hang (until the client kills), but a lookup of other
dentries that aren't perpetually stalled should be ok in this situation.

If mountd is that hosed then I think it's unlikely that any progress
will be possible anyway.
On 14/12/22 04:02, Jeff Layton wrote:
> Mounting doesn't usually require working READDIR. In that situation, a
> readdir() might hang (until the client kills), but a lookup of other
> dentries that aren't perpetually stalled should be ok in this situation.
>
> If mountd is that hosed then I think it's unlikely that any progress
> will be possible anyway.

The READDIR shouldn't trigger a mount yes, but if it's a valid automount
point (basically a valid dentry in this case I think) it should be listed.
It certainly shouldn't hold up the READDIR, passing into it is when a
mount should occur.

That's usually the behavior we want for automounts, we don't want mount
storms on directories full of automount points.

Ian
On Wed, 2022-12-14 at 07:14 +0800, Ian Kent wrote:
> The READDIR shouldn't trigger a mount yes, but if it's a valid automount
> point (basically a valid dentry in this case I think) it should be listed.
> It certainly shouldn't hold up the READDIR, passing into it is when a
> mount should occur.
>
> That's usually the behavior we want for automounts, we don't want mount
> storms on directories full of automount points.
>

We only want to display it if it's a valid _exported_ mountpoint.

The idea here is to only reveal the parts of the namespace that are
exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
only exported mountpoints and ancestor directories of those mountpoints.

We don't want mountd triggering automounts, in general. If the
underlying filesystem was exported, then it should also already be
mounted, since nfsd doesn't currently trigger automounts in
follow_down().

There is also a separate patchset by Richard Weinberger to allow nfsd to
trigger automounts if the parent filesystem is exported with -o
crossmnt. That should be ok with this patch, since the automount will be
triggered before the upcall to mountd. That should ensure that it's
already mounted by the time we get to upcalling for its export.
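The visibility rule described above (show only exported mountpoints and their ancestor directories) can be illustrated with a small path-based sketch. This is not how nfsd implements it: the real decision goes through the export cache and mountd. The function names are hypothetical, and the sketch assumes normalized absolute paths without trailing slashes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `dir` is `path` itself or an ancestor directory of `path`.
 * Assumes both are normalized absolute paths ("/a/b", never "/a/b/"). */
static bool is_self_or_ancestor(const char *dir, const char *path)
{
	size_t n = strlen(dir);

	if (strncmp(dir, path, n) != 0)
		return false;
	if (n == 1 && dir[0] == '/')	/* "/" is an ancestor of everything */
		return true;
	/* The match must end on a component boundary: "/srv/n" is not an
	 * ancestor of "/srv/nfs". */
	return path[n] == '\0' || path[n] == '/';
}

/* Hypothetical model of the pseudoroot rule: a dentry is shown only if it
 * is an exported mountpoint or leads to one. */
bool pseudoroot_visible(const char *dentry_path,
			const char *const *exports, size_t nexports)
{
	assert(dentry_path != NULL);

	for (size_t i = 0; i < nexports; i++)
		if (is_self_or_ancestor(dentry_path, exports[i]))
			return true;
	return false;
}
```

With exports `/srv/nfs/data` and `/export/home`, this model shows `/srv`, `/srv/nfs` and `/srv/nfs/data`, while a sibling such as `/home` stays hidden. The mountd timeout discussed in this thread corresponds to the case where the export list cannot be consulted at all, so neither answer is known.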
On 14/12/22 08:39, Jeff Layton wrote:
> We only want to display it if it's a valid _exported_ mountpoint.
>
> The idea here is to only reveal the parts of the namespace that are
> exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
> only exported mountpoints and ancestor directories of those mountpoints.
>
> We don't want mountd triggering automounts, in general. If the
> underlying filesystem was exported, then it should also already be
> mounted, since nfsd doesn't currently trigger automounts in
> follow_down().

Umm ... must they already be mounted?

Can't it be a valid mount point either not yet mounted or timed out
and umounted? In that case shouldn't it be listed? I know that's
not that good an outcome because its stat info will change when
it gets walked into but it's usually the only sane choice.

> There is also a separate patchset by Richard Weinberger to allow nfsd to
> trigger automounts if the parent filesystem is exported with -o
> crossmnt. That should be ok with this patch, since the automount will be
> triggered before the upcall to mountd. That should ensure that it's
> already mounted by the time we get to upcalling for its export.

Yep, saw that, ;)

Ian
> On Dec 14, 2022, at 12:37 AM, Ian Kent <raven@themaw.net> wrote:
>
> On 14/12/22 08:39, Jeff Layton wrote:
>> There is also a separate patchset by Richard Weinberger to allow nfsd to
>> trigger automounts if the parent filesystem is exported with -o
>> crossmnt. That should be ok with this patch, since the automount will be
>> triggered before the upcall to mountd. That should ensure that it's
>> already mounted by the time we get to upcalling for its export.
>
> Yep, saw that, ;)

I'm not sure if there is consensus on this patch.

It's been pushed to nfsd's for-rc branch for wider testing, but if
there's a strong objection I can pull it out before the next -rc PR.

--
Chuck Lever
> On Jan 1, 2023, at 1:09 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
>
> I'm not sure if there is consensus on this patch.
>
> It's been pushed to nfsd's for-rc branch for wider testing, but if
> there's a strong objection I can pull it out before the next -rc PR.

Also, do we agree that it should get a "Cc: stable" tag?

--
Chuck Lever
On Wed, 2022-12-14 at 13:37 +0800, Ian Kent wrote:
> On 14/12/22 08:39, Jeff Layton wrote:
> > We don't want mountd triggering automounts, in general. If the
> > underlying filesystem was exported, then it should also already be
> > mounted, since nfsd doesn't currently trigger automounts in
> > follow_down().
>
> Umm ... must they already be mounted?
>
> Can't it be a valid mount point either not yet mounted or timed out
> and umounted. In that case shouldn't it be listed, I know that's
> not the that good an outcome because its stat info will change when
> it gets walked into but it's usually the only sane choice.
>

Yes, it does need to already be mounted.

The proposed kernel patches from Richard only trigger an automount if
the parent mount is exported with -o crossmnt. I think this is necessary
to avoid nfs client activity triggering automounts of filesystems that
are not exported.
On 2/1/23 02:09, Chuck Lever III wrote:
> I'm not sure if there is consensus on this patch.
>
> It's been pushed to nfsd's for-rc branch for wider testing, but if
> there's a strong objection I can pull it out before the next -rc PR.

I don't have any objections, my original comment about it breaking
existing behavior has been addressed.

The only reason I've commented further is because of my time with
automounting but, as Jeff kind of points out, nfsd is not quite the
same as what I'm used to, specifically the way exports are implemented
in nfsd.

Still you never know, my comments may trigger a thought in someone
along the way, ;)

Ian
On 2/1/23 05:16, Jeff Layton wrote:
> Yes, it does need to already be mounted.
>
> The proposed kernel patches from Richard only trigger an automount if
> the parent mount is exported with -o crossmnt. I think this is necessary
> to avoid nfs client activity triggering automounts of filesystems that
> are not exported.

I'll be interested to see how this goes.

Over the years I've had a lot of difficulty with automount unwanted
mounting ...

Still nfsd exports are a bit like invisible dentry trees to the local
system aren't they ... so this situation is very different to what
I've worked on ...

Ian
On 2/1/23 14:34, Ian Kent wrote: > > On 2/1/23 02:09, Chuck Lever III wrote: >> >>> On Dec 14, 2022, at 12:37 AM, Ian Kent <raven@themaw.net> wrote: >>> >>> On 14/12/22 08:39, Jeff Layton wrote: >>>> On Wed, 2022-12-14 at 07:14 +0800, Ian Kent wrote: >>>>> On 14/12/22 04:02, Jeff Layton wrote: >>>>>> On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote: >>>>>>>> On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> >>>>>>>> wrote: >>>>>>>> >>>>>>>> If v4 READDIR operation hits a mountpoint and gets back an error, >>>>>>>> then it will include that entry in the reply and set >>>>>>>> RDATTR_ERROR for it >>>>>>>> to the error. >>>>>>>> >>>>>>>> That's fine for "normal" exported filesystems, but on the >>>>>>>> v4root, we >>>>>>>> need to be more careful to only expose the existence of >>>>>>>> dentries that >>>>>>>> lead to exports. >>>>>>>> >>>>>>>> If the mountd upcall times out while checking to see whether a >>>>>>>> mountpoint on the v4root is exported, then we have no recourse >>>>>>>> other >>>>>>>> than to fail the whole operation. >>>>>>> Thank you for chasing this down! >>>>>>> >>>>>>> Failing the whole READDIR when mountd times out might be a bad >>>>>>> idea. >>>>>>> If the mountd upcall times out every time, the client can't make >>>>>>> any progress and will continue to emit the failing READDIR request. >>>>>>> >>>>>>> Would it be better to skip the unresolvable entry instead and let >>>>>>> the READDIR succeed without that entry? >>>>>>> >>>>>> Mounting doesn't usually require working READDIR. In that >>>>>> situation, a >>>>>> readdir() might hang (until the client kills), but a lookup of other >>>>>> dentries that aren't perpetually stalled should be ok in this >>>>>> situation. >>>>>> >>>>>> If mountd is that hosed then I think it's unlikely that any progress >>>>>> will be possible anyway. 
>>>>> The READDIR shouldn't trigger a mount yes, but if it's a valid automount
>>>>>
>>>>> point (basically a valid dentry in this case I think) it should be listed.
>>>>>
>>>>> It certainly shouldn't hold up the READDIR, passing into it is when a
>>>>>
>>>>> mount should occur.
>>>>>
>>>>>
>>>>> That's usually the behavior we want for automounts, we don't want mount
>>>>>
>>>>> storms on directories full of automount points.
>>>>>
>>>> We only want to display it if it's a valid _exported_ mountpoint.
>>>>
>>>> The idea here is to only reveal the parts of the namespace that are
>>>> exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
>>>> only exported mountpoints and ancestor directories of those mountpoints.
>>>>
>>>> We don't want mountd triggering automounts, in general. If the
>>>> underlying filesystem was exported, then it should also already be
>>>> mounted, since nfsd doesn't currently trigger automounts in
>>>> follow_down().
>>> Umm ... must they already be mounted?
>>>
>>>
>>> Can't it be a valid mount point either not yet mounted or timed out
>>>
>>> and umounted. In that case shouldn't it be listed, I know that's
>>>
>>> not the that good an outcome because its stat info will change when
>>>
>>> it gets walked into but it's usually the only sane choice.
>>>
>>>
>>>> There is also a separate patchset by Richard Weinberger to allow nfsd to
>>>> trigger automounts if the parent filesystem is exported with -o
>>>> crossmnt. That should be ok with this patch, since the automount will be
>>>> triggered before the upcall to mountd. That should ensure that it's
>>>> already mounted by the time we get to upcalling for its export.
>>> Yep, saw that, ;)
>> I'm not sure if there is consensus on this patch.
>>
>> It's been pushed to nfsd's for-rc branch for wider testing, but if
>> there's a strong objection I can pull it out before the next -rc PR.
>
>
> I don't have any objections, my original comment about it breaking
>
> existing behavior has been addressed.

Actually, I'm confusing this with the other patch series Jeff mentioned.

I still don't have any objections, ;)

I was a little curious about the error handling, but that's because my
memories of the jukebox error handling on the client side are different
from what's being done here. This is the server, though, so it makes
sense to assume the client will do the work and retry or whatever.

Ian
On Sun, 2023-01-01 at 18:18 +0000, Chuck Lever III wrote:
>
> > On Jan 1, 2023, at 1:09 PM, Chuck Lever III <chuck.lever@oracle.com> wrote:
> >
> >
> >
> > > On Dec 14, 2022, at 12:37 AM, Ian Kent <raven@themaw.net> wrote:
> > >
> > > On 14/12/22 08:39, Jeff Layton wrote:
> > > > On Wed, 2022-12-14 at 07:14 +0800, Ian Kent wrote:
> > > > > On 14/12/22 04:02, Jeff Layton wrote:
> > > > > > On Tue, 2022-12-13 at 19:00 +0000, Chuck Lever III wrote:
> > > > > > > > On Dec 13, 2022, at 1:08 PM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > > >
> > > > > > > > If v4 READDIR operation hits a mountpoint and gets back an error,
> > > > > > > > then it will include that entry in the reply and set RDATTR_ERROR for it
> > > > > > > > to the error.
> > > > > > > >
> > > > > > > > That's fine for "normal" exported filesystems, but on the v4root, we
> > > > > > > > need to be more careful to only expose the existence of dentries that
> > > > > > > > lead to exports.
> > > > > > > >
> > > > > > > > If the mountd upcall times out while checking to see whether a
> > > > > > > > mountpoint on the v4root is exported, then we have no recourse other
> > > > > > > > than to fail the whole operation.
> > > > > > > Thank you for chasing this down!
> > > > > > >
> > > > > > > Failing the whole READDIR when mountd times out might be a bad idea.
> > > > > > > If the mountd upcall times out every time, the client can't make
> > > > > > > any progress and will continue to emit the failing READDIR request.
> > > > > > >
> > > > > > > Would it be better to skip the unresolvable entry instead and let
> > > > > > > the READDIR succeed without that entry?
> > > > > > >
> > > > > > Mounting doesn't usually require working READDIR. In that situation, a
> > > > > > readdir() might hang (until the client kills), but a lookup of other
> > > > > > dentries that aren't perpetually stalled should be ok in this situation.
> > > > > >
> > > > > > If mountd is that hosed then I think it's unlikely that any progress
> > > > > > will be possible anyway.
> > > > > The READDIR shouldn't trigger a mount yes, but if it's a valid automount
> > > > >
> > > > > point (basically a valid dentry in this case I think) it should be listed.
> > > > >
> > > > > It certainly shouldn't hold up the READDIR, passing into it is when a
> > > > >
> > > > > mount should occur.
> > > > >
> > > > >
> > > > > That's usually the behavior we want for automounts, we don't want mount
> > > > >
> > > > > storms on directories full of automount points.
> > > > >
> > > >
> > > > We only want to display it if it's a valid _exported_ mountpoint.
> > > >
> > > > The idea here is to only reveal the parts of the namespace that are
> > > > exported in the nfsv4 pseudoroot. The "normal" contents are not shown --
> > > > only exported mountpoints and ancestor directories of those mountpoints.
> > > >
> > > > We don't want mountd triggering automounts, in general. If the
> > > > underlying filesystem was exported, then it should also already be
> > > > mounted, since nfsd doesn't currently trigger automounts in
> > > > follow_down().
> > > >
> > > Umm ... must they already be mounted?
> > >
> > >
> > > Can't it be a valid mount point either not yet mounted or timed out
> > >
> > > and umounted. In that case shouldn't it be listed, I know that's
> > >
> > > not the that good an outcome because its stat info will change when
> > >
> > > it gets walked into but it's usually the only sane choice.
> > >
> > >
> > > >
> > > > There is also a separate patchset by Richard Weinberger to allow nfsd to
> > > > trigger automounts if the parent filesystem is exported with -o
> > > > crossmnt. That should be ok with this patch, since the automount will be
> > > > triggered before the upcall to mountd. That should ensure that it's
> > > > already mounted by the time we get to upcalling for its export.
> > >
> > > Yep, saw that, ;)
> >
> > I'm not sure if there is consensus on this patch.
> >
> > It's been pushed to nfsd's for-rc branch for wider testing, but if
> > there's a strong objection I can pull it out before the next -rc PR.
>
> Also, do we agree that it should get a "Cc: stable" tag?
>

Yes, I think so. This potentially exposes some info to clients that they
really shouldn't have.
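The visibility rule discussed in this thread (the pseudoroot reveals only exported mountpoints and the ancestor directories leading to them) can be illustrated with a small userspace sketch. This is not nfsd code: `leads_to_export()` and `pseudoroot_visible()` are hypothetical names, exports are modeled as plain path strings, and paths are assumed to be absolute and normalized with no trailing slash. The real check goes through the export cache and the mountd upcall, which is exactly the step that can time out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if 'path' is the export itself or one of
 * its ancestor directories. Assumes normalized absolute paths.
 */
bool leads_to_export(const char *path, const char *export_path)
{
	size_t len = strlen(path);

	if (strcmp(path, "/") == 0)
		return true;	/* the root is an ancestor of every export */
	if (strncmp(path, export_path, len) != 0)
		return false;
	/* The match must end on a path component boundary. */
	return export_path[len] == '\0' || export_path[len] == '/';
}

/* Hypothetical policy: show 'path' in the pseudoroot only if it leads to an export. */
bool pseudoroot_visible(const char *path, const char *const *exports, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (leads_to_export(path, exports[i]))
			return true;
	return false;
}
```

With exports `/srv/nfs/data` and `/home`, this sketch would show `/srv` and `/srv/nfs` but hide `/srv/other`. When the export status of a mountpoint cannot be determined at all, neither answer is safe, which is the case the patch handles.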
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 2b4ae858c89b..984528ce8d68 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3588,6 +3588,7 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 	struct readdir_cd *ccd = ccdv;
 	struct nfsd4_readdir *cd = container_of(ccd, struct nfsd4_readdir, common);
 	struct xdr_stream *xdr = cd->xdr;
+	struct svc_export *exp = cd->rd_fhp->fh_export;
 	int start_offset = xdr->buf->len;
 	int cookie_offset;
 	u32 name_and_cookie;
@@ -3629,6 +3630,17 @@ nfsd4_encode_dirent(void *ccdv, const char *name, int namlen,
 	case nfserr_noent:
 		xdr_truncate_encode(xdr, start_offset);
 		goto skip_entry;
+	case nfserr_jukebox:
+		/*
+		 * The pseudoroot should only display dentries that lead to
+		 * exports. If we get EJUKEBOX here, then we can't tell whether
+		 * this entry should be included. Just fail the whole READDIR
+		 * with NFS4ERR_DELAY in that case, and hope that the situation
+		 * will resolve itself by the client's next attempt.
+		 */
+		if (exp->ex_flags & NFSEXP_V4ROOT)
+			goto fail;
+		fallthrough;
 	default:
 		/*
 		 * If the client requested the RDATTR_ERROR attribute,
If v4 READDIR operation hits a mountpoint and gets back an error,
then it will include that entry in the reply and set RDATTR_ERROR for it
to the error.

That's fine for "normal" exported filesystems, but on the v4root, we
need to be more careful to only expose the existence of dentries that
lead to exports.

If the mountd upcall times out while checking to see whether a
mountpoint on the v4root is exported, then we have no recourse other
than to fail the whole operation.

Cc: Steve Dickson <steved@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777
Reported-by: JianHong Yin <yin-jianhong@163.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4xdr.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
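
For readers who want the per-entry decision in one place, here is a minimal standalone sketch of the policy the patch describes. It is a userspace model, not the kernel function: the enums and `dirent_policy()` are hypothetical names, and the sketch assumes the client requested the RDATTR_ERROR attribute, so the default case keeps the entry and reports the error through it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the nfsd status codes named in the patch. */
enum lookup_status { ST_OK, ST_NOENT, ST_JUKEBOX, ST_OTHER_ERROR };

/* What the encoder does with a single directory entry. */
enum dirent_action {
	ACT_ENCODE,			/* encode the entry normally */
	ACT_SKIP_ENTRY,			/* drop the entry, continue the READDIR */
	ACT_FAIL_READDIR,		/* abort; the client sees NFS4ERR_DELAY */
	ACT_ENCODE_RDATTR_ERROR		/* keep the entry, report the error in RDATTR_ERROR */
};

/*
 * Sketch of the decision only. Assumes the client asked for
 * RDATTR_ERROR; 'is_v4root' models the NFSEXP_V4ROOT export flag.
 */
enum dirent_action dirent_policy(enum lookup_status st, bool is_v4root)
{
	switch (st) {
	case ST_OK:
		return ACT_ENCODE;
	case ST_NOENT:
		return ACT_SKIP_ENTRY;
	case ST_JUKEBOX:
		/* On the pseudoroot we cannot tell whether the entry may be shown. */
		if (is_v4root)
			return ACT_FAIL_READDIR;
		/* fall through */
	default:
		return ACT_ENCODE_RDATTR_ERROR;
	}
}
```

The only behavior change in this model is the `ST_JUKEBOX` case with `is_v4root` set; on an ordinary export the same timeout still falls through to the RDATTR_ERROR path, which matches the `fallthrough` in the diff.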