Message ID: 167279203612.13974.15063003557908413815@noble.neil.brown.name (mailing list archive)
State: New, archived
Series: NFS: Handle missing attributes in OPEN reply
On Wed, 2023-01-04 at 11:27 +1100, NeilBrown wrote:
>
> If a NFSv4 OPEN reply reports that the file was successfully opened but
> the subsequent GETATTR fails, Linux-NFS will attempt a stand-alone
> GETATTR request.  If that also fails, handling of the reply is aborted
> with error -EAGAIN and the open is attempted again from the start.
>
> This leaves the server with an active state (because the OPEN operation
> succeeded) which the client doesn't know about.  If the open-owner
> (local user) did not have the file already open, this has minimal
> consequences for the client and only causes the server to spend
> resources on an open state that will never be used or explicitly
> closed.
>
> If the open-owner DID already have the file open, then it will hold a
> reference to the open-state for that file, but the seq-id in the
> state-id will now be out-of-sync with the server.  The server will have
> incremented the seq-id, but the client will not have noticed.  So when
> the client next attempts to access the file using that state (READ,
> WRITE, SETATTR), the attempt will fail with NFS4ERR_OLD_STATEID.
>
> The Linux-client assumes this error is due to a race and simply retries
> on the basis that the local state-id information should have been
> updated by another thread.  This basis is invalid in this case and the
> result is an infinite loop attempting IO and getting OLD_STATEID.
>
> This has been observed with a NetApp Filer as the server (ONTAP 9.8 p5,
> using NFSv4.0).  The client is creating, writing, and unlinking a
> particular file from multiple clients (.bash_history).  If a new OPEN
> from one client races with a REMOVE from another client while the first
> client already has the file open, the Filer can report success for the
> OPEN op, but NFS4ERR_STALE for the ACCESS or GETATTR ops in the OPEN
> request.  This gets the seq-id out-of-sync and a subsequent write to
> the other open on the first client causes the infinite loop to occur.
>
> The reason that the client returns -EAGAIN is that it needs to find the
> inode so it can find the associated state to update the seq-id, but the
> inode lookup requires the file-id which is provided in the GETATTR
> reply.  Without the file-id normal inode lookup cannot be used.
>
> This patch changes the lookup so that when the file-id is not available
> the list of states owned by the open-owner is examined to find the
> state with the correct state-id (ignoring the seq-id part of the
> state-id).  If this is found it is used just as when a normal inode
> lookup finds an inode.  If it isn't found, -EAGAIN is returned as
> before.
>
> This bug can be demonstrated by modifying the Linux NFS server as
> follows:
>
> 1/ The second time a file is opened, unlink it.  This simulates a race
>    with another client, without needing to have a race:
>
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -594,6 +594,12 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	if (reclaim && !status)
>  		nn->somebody_reclaimed = true;
>  out:
> +	if (!status && open->op_stateid.si_generation > 1) {
> +		printk("Opening gen %d\n", (int)open->op_stateid.si_generation);
> +		vfs_unlink(mnt_user_ns(resfh->fh_export->ex_path.mnt),
> +			   resfh->fh_dentry->d_parent->d_inode,
> +			   resfh->fh_dentry, NULL);
> +	}
>  	if (open->op_filp) {
>  		fput(open->op_filp);
>  		open->op_filp = NULL;
>
> 2/ When a GETATTR op is attempted on an unlinked file, return ESTALE:
>
> @@ -852,6 +858,11 @@ nfsd4_getattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  	if (status)
>  		return status;
>
> +	if (cstate->current_fh.fh_dentry->d_inode->i_nlink == 0) {
> +		printk("Return Estale for unlinked file\n");
> +		return nfserr_stale;
> +	}
> +
>  	if (getattr->ga_bmval[1] & NFSD_WRITEONLY_ATTRS_WORD1)
>  		return nfserr_inval;
>
> Then mount the filesystem and
>
>   Thread 1                    Thread 2
>   open a file
>                               open the same file (will fail)
>   write to that file
>
> I use this shell fragment, using 'sleep' for synchronisation.
> The use of /bin/echo ensures the write is flushed when /bin/echo
> closes the fd on exit.
>
>   (
>     /bin/echo hello
>     sleep 3
>     /bin/echo there
>   ) > /import/A/afile &
>   sleep 3
>   cat /import/A/afile
>
> Probably when the OPEN succeeds, the GETATTR fails, and we don't
> already have the state open, we should explicitly close the state.
> Leaving it open could cause problems if, for example, the server
> revoked it and signalled the client that there was a revoked state.
> The client would not be able to find the state that needed to be
> relinquished.  I haven't attempted to implement this.

If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
we care about stateid values? Just mark the inode as stale and drop it
on the floor.

If the server tries to declare the state as revoked, then it is clearly
borken, so let NetApp fix their own bug.
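A minimal, self-contained C sketch of the comparison the patch description relies on: match a stateid by its "other" field while ignoring the seq-id, plus the serial-number test for a "newer" seq-id. The struct layout and function names here are illustrative, not the kernel's actual definitions:

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define STATEID_OTHER_SIZE 12  /* a stateid4 is a 4-byte seqid + 12-byte "other" field */

    struct stateid {
            uint32_t seqid;
            unsigned char other[STATEID_OTHER_SIZE];
    };

    /* Same state?  Compare only the "other" field; the seqid may legitimately
     * differ between client and server, which is exactly the bug scenario. */
    static bool stateid_match_other(const struct stateid *a, const struct stateid *b)
    {
            return memcmp(a->other, b->other, STATEID_OTHER_SIZE) == 0;
    }

    /* Serial-number comparison: true if @candidate carries a more recent
     * seqid than @current, allowing for 32-bit wraparound. */
    static bool stateid_seqid_is_newer(const struct stateid *candidate,
                                       const struct stateid *current)
    {
            return (int32_t)(candidate->seqid - current->seqid) > 0;
    }

With a helper like stateid_match_other(), the open-owner's state list can be searched even though the client's stored seq-id is out of date, which is the lookup the patch adds when the file-id is missing.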
On Wed, 04 Jan 2023, Trond Myklebust wrote:
>
> If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
> we care about stateid values? Just mark the inode as stale and drop it
> on the floor.

We have a valid state from the server - we really shouldn't just ignore
it.

Maybe it would be OK to mark the inode stale.  I don't know if various
retry loops abort properly when the inode is stale.

>
> If the server tries to declare the state as revoked, then it is clearly
> borken, so let NetApp fix their own bug.

That's probably fair.

Thanks,
NeilBrown

> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:
> On Wed, 04 Jan 2023, Trond Myklebust wrote:
> >
> > If the server starts to reply NFS4ERR_STALE to GETATTR requests,
> > why do we care about stateid values? Just mark the inode as stale
> > and drop it on the floor.
>
> We have a valid state from the server - we really shouldn't just
> ignore it.
>
> Maybe it would be OK to mark the inode stale.  I don't know if
> various retry loops abort properly when the inode is stale.

Yes, they are all supposed to do that. Otherwise we would end up
looping forever in close(), for instance, since the PUTFH, GETATTR and
CLOSE can all return NFS4ERR_STALE as well.
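The retry-loop behaviour Trond describes can be pictured with a small hedged sketch; the names below are illustrative stand-ins, not the real client interfaces. The point is that once an inode is marked stale, every exception-handling loop must stop retrying, otherwise close() and friends would spin forever on NFS4ERR_STALE:

    #include <errno.h>

    struct inode;                                       /* opaque here */
    extern int send_nfs4_op(struct inode *inode);       /* hypothetical RPC call */
    extern void mark_inode_stale(struct inode *inode);  /* hypothetical */

    static int do_op_with_retry(struct inode *inode)
    {
            int err;

            do {
                    err = send_nfs4_op(inode);
                    if (err == -ESTALE) {
                            /* The file is gone on the server: record that
                             * and bail out rather than retrying forever. */
                            mark_inode_stale(inode);
                            break;
                    }
            } while (err == -EAGAIN);   /* transient errors are retried */

            return err;
    }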
On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust <trondmy@kernel.org> wrote:
>
> On Wed, 2023-01-04 at 11:27 +1100, NeilBrown wrote:
> >
> > If a NFSv4 OPEN reply reports that the file was successfully opened
> > but the subsequent GETATTR fails, Linux-NFS will attempt a
> > stand-alone GETATTR request.  If that also fails, handling of the
> > reply is aborted with error -EAGAIN and the open is attempted again
> > from the start.
> >
> > [...]
>
> If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
> we care about stateid values?

It is acceptable for the server to return ESTALE to the GETATTR after
processing the open (due to a REMOVE that comes in), and that open
generates a valid stateid which the client should care about when there
are pre-existing opens. The server will keep the state of any existing
opens valid even if the file is removed. Which is what's happening:
the previous open is being used for IO, but the stateid is updated on
the server and not on the client.

> Just mark the inode as stale and drop it
> on the floor.

Why would that be correct? Any pre-existing opens should continue
operating, thus the inode can't be marked stale. We don't do it now
(silly rename allows pre-existing IO to continue).

> If the server tries to declare the state as revoked, then it is clearly
> borken, so let NetApp fix their own bug.

The server does not declare the state revoked. The open succeeded and
its state's seqid was updated, but the client is using an old stateid.
The server isn't at fault here.

> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
On Wed, 04 Jan 2023, Trond Myklebust wrote:
> On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:
> > On Wed, 04 Jan 2023, Trond Myklebust wrote:
> > >
> > > If the server starts to reply NFS4ERR_STALE to GETATTR requests,
> > > why do we care about stateid values? Just mark the inode as stale
> > > and drop it on the floor.
> >
> > We have a valid state from the server - we really shouldn't just
> > ignore it.
> >
> > Maybe it would be OK to mark the inode stale.  I don't know if
> > various retry loops abort properly when the inode is stale.
>
> Yes, they are all supposed to do that. Otherwise we would end up
> looping forever in close(), for instance, since the PUTFH, GETATTR and
> CLOSE can all return NFS4ERR_STALE as well.

To mark the inode as STALE we still need to find the inode, and that is
the key bit missing in the current code.  Once we find the inode, we
could mark it stale, but maybe some other error resulted in the missing
GETATTR...

It might make sense to put the new code in _nfs4_proc_open() after the
explicit nfs4_proc_getattr() fails.  We would need to find the inode
given only the filehandle.  Is there any easy way to do that?

Thanks,
NeilBrown
On Tue, 2023-01-03 at 21:02 -0500, Olga Kornievskaia wrote:
> On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust <trondmy@kernel.org>
> wrote:
> >
> > On Wed, 2023-01-04 at 11:27 +1100, NeilBrown wrote:
> > >
> > > If a NFSv4 OPEN reply reports that the file was successfully
> > > opened but the subsequent GETATTR fails, Linux-NFS will attempt a
> > > stand-alone GETATTR request.  If that also fails, handling of the
> > > reply is aborted with error -EAGAIN and the open is attempted
> > > again from the start.
> > >
> > > [...]
> >
> > If the server starts to reply NFS4ERR_STALE to GETATTR requests,
> > why do we care about stateid values?
>
> It is acceptable for the server to return ESTALE to the GETATTR after
> processing the open (due to a REMOVE that comes in), and that open
> generates a valid stateid which the client should care about when
> there are pre-existing opens. The server will keep the state of any
> existing opens valid even if the file is removed. Which is what's
> happening: the previous open is being used for IO, but the stateid is
> updated on the server and not on the client.
>
> > Just mark the inode as stale and drop it
> > on the floor.
>
> Why would that be correct? Any pre-existing opens should continue
> operating, thus the inode can't be marked stale. We don't do it now
> (silly rename allows pre-existing IO to continue).
>
> > If the server tries to declare the state as revoked, then it is
> > clearly borken, so let NetApp fix their own bug.
>
> The server does not declare the state revoked. The open succeeded and
> its state's seqid was updated, but the client is using an old
> stateid. The server isn't at fault here.

If the client can't send a GETATTR, then it can't revalidate
attributes, do I/O, and it can't even close the file. So we're not
going to waste time and effort trying to support this, whether or not
NetApp thinks it is a good idea.
On Wed, 04 Jan 2023, Olga Kornievskaia wrote:
> On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust <trondmy@kernel.org> wrote:
> >
> > If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
> > we care about stateid values?
>
> It is acceptable for the server to return ESTALE to the GETATTR after
> processing the open (due to a REMOVE that comes in), and that open
> generates a valid stateid which the client should care about when
> there are pre-existing opens. The server will keep the state of any
> existing opens valid even if the file is removed. Which is what's
> happening: the previous open is being used for IO, but the stateid is
> updated on the server and not on the client.

I agree that it is acceptable to return ESTALE to the GETATTR, but
having done that I don't think it is acceptable for a PUTFH of the same
filehandle to succeed.  Certainly any attempt to again use the
filehandle after the PUTFH should fail with NFS4ERR_STALE.

RFC 7530 says:

   13.1.2.7.  NFS4ERR_STALE (Error Code 70)

      The current or saved filehandle value designating an argument to
      the current operation is invalid.  The file system object
      referred to by that filehandle no longer exists, or access to it
      has been revoked.

So the file doesn't exist or access has been revoked.  So any writes
should fail.  Failing with OLD_STATEID is weird - and having writes
succeed if we use the correct stateid is also odd.  Failing with STALE
would be perfectly sensible and I suspect the Linux client would handle
that just fine.

Thanks,
NeilBrown
On Wed, 04 Jan 2023, NeilBrown wrote:
> On Wed, 04 Jan 2023, Olga Kornievskaia wrote:
> > [...]
>
> I agree that it is acceptable to return ESTALE to the GETATTR, but
> having done that I don't think it is acceptable for a PUTFH of the same
> filehandle to succeed.  Certainly any attempt to again use the
> filehandle after the PUTFH should fail with NFS4ERR_STALE.
>
> [...]
>
> So the file doesn't exist or access has been revoked.  So any writes
> should fail.  Failing with OLD_STATEID is weird - and having writes
> succeed if we use the correct stateid is also odd.  Failing with STALE
> would be perfectly sensible and I suspect the Linux client would handle
> that just fine.

I checked a recent tcpdump (with a patched SLE kernel talking to Netapp)
and I see that the writes don't succeed after the first NFS4ERR_STALE.

If the "correct" stateid is given to WRITE, it returns NFS4ERR_STALE.
If the older stateid is given to WRITE, it returns NFS4ERR_OLD_STATEID.

So it seems that it just has these two checks in the wrong order.

NeilBrown
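The "wrong order" observation can be expressed as a small illustrative sketch (not NetApp's or knfsd's actual code): if the seq-id is validated before filehandle staleness, a deleted file reports NFS4ERR_OLD_STATEID whenever the client's seq-id lags, which is precisely the error the client retries forever. Checking staleness first yields NFS4ERR_STALE in both cases:

    #include <stdbool.h>

    enum nfs4_status { NFS4_OK, NFS4ERR_OLD_STATEID, NFS4ERR_STALE };

    struct server_file_state {
            bool object_exists;         /* false once the file is removed */
            unsigned int current_seqid; /* seqid after the racing OPEN */
    };

    static enum nfs4_status check_stateful_op(const struct server_file_state *fs,
                                              unsigned int arg_seqid)
    {
            /* Staleness must be tested first: a filehandle for a removed
             * object should fail with NFS4ERR_STALE regardless of seqid. */
            if (!fs->object_exists)
                    return NFS4ERR_STALE;
            if (arg_seqid != fs->current_seqid)
                    return NFS4ERR_OLD_STATEID;
            return NFS4_OK;
    }

Swapping the two if-statements reproduces the observed behaviour: the stale file keeps answering NFS4ERR_OLD_STATEID to the lagging client.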
On Tue, Jan 3, 2023 at 9:34 PM NeilBrown <neilb@suse.de> wrote:
>
> On Wed, 04 Jan 2023, NeilBrown wrote:
> > [...]
> >
> > So the file doesn't exist or access has been revoked.  So any writes
> > should fail.  Failing with OLD_STATEID is weird - and having writes
> > succeed if we use the correct stateid is also odd.  Failing with STALE
> > would be perfectly sensible and I suspect the Linux client would handle
> > that just fine.
>
> I checked a recent tcpdump (with a patched SLE kernel talking to Netapp)
> and I see that the writes don't succeed after the first NFS4ERR_STALE.
>
> If the "correct" stateid is given to WRITE, it returns NFS4ERR_STALE.
> If the older stateid is given to WRITE, it returns NFS4ERR_OLD_STATEID.
>
> So it seems that it just has these two checks in the wrong order.

Does Netapp return ERR_STALE on the PUTFH if the stateid is correct?
If it's the WRITE operation that returns an error, and if the server
has multiple errors (i.e. bad stateid and bad filehandle), then I don't
think the spec says which error is more important. In this case, I
think the server should fail the PUTFH with ERR_STALE.

> NeilBrown
On Tue, Jan 3, 2023 at 9:17 PM Trond Myklebust <trondmy@kernel.org> wrote:
>
> On Tue, 2023-01-03 at 21:02 -0500, Olga Kornievskaia wrote:
> > On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust <trondmy@kernel.org>
> > wrote:
> > >
> > > On Wed, 2023-01-04 at 11:27 +1100, NeilBrown wrote:
> > > > [...]
> > >
> > > If the server starts to reply NFS4ERR_STALE to GETATTR requests,
> > > why do we care about stateid values?
> >
> > [...]
> >
> > The server does not declare the state revoked. The open succeeded
> > and its state's seqid was updated, but the client is using an old
> > stateid. The server isn't at fault here.
>
> If the client can't send a GETATTR, then it can't revalidate
> attributes, do I/O, and it can't even close the file. So we're not
> going to waste time and effort trying to support this, whether or not
> NetApp thinks it is a good idea.

That makes sense, but I think the server should fail the PUTFHs in the
compounds after the REMOVE was processed, not the other (GETATTR,
WRITE) operations, right?

> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
On Wed, 2023-01-04 at 10:09 -0500, Olga Kornievskaia wrote:
> On Tue, Jan 3, 2023 at 9:17 PM Trond Myklebust <trondmy@kernel.org>
> wrote:
> >
> > [...]
> >
> > If the client can't send a GETATTR, then it can't revalidate
> > attributes, do I/O, and it can't even close the file. So we're not
> > going to waste time and effort trying to support this, whether or
> > not NetApp thinks it is a good idea.
>
> That makes sense, but I think the server should fail the PUTFHs in
> the compounds after the REMOVE was processed, not the other (GETATTR,
> WRITE) operations, right?

In theory, the server is free to reply NFS4_OK to PUTFH, and then
NFS4ERR_STALE to the GETATTR, because a COMPOUND is not atomic w.r.t.
execution of the individual operations. i.e. if the file is deleted by
some other RPC call between the execution of the PUTFH and the GETATTR
in the COMPOUND, then this situation can happen.

However, my point is that once the server starts handing out
NFS4ERR_STALE errors, then the client can assume that the filehandle
being used is bad, because under RFC 5661, Section 4.2.2, the server
MUST return NFS4ERR_STALE if it is presented with that filehandle
again.

IOW: the client should just toss the file associated with that
filehandle out of its inode caches, and forget any state associated
with that filehandle, because all future attempts to use it MUST
result in NFS4ERR_STALE.
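Trond's conclusion suggests client handling along the following lines. This is a hedged sketch with illustrative names only, resting on the RFC 5661, Section 4.2.2 guarantee that a filehandle which has drawn NFS4ERR_STALE once must draw it on every subsequent use:

    struct inode;
    struct nfs4_state;

    /* All of the following helpers are hypothetical stand-ins. */
    extern void mark_inode_stale(struct inode *inode);
    extern struct nfs4_state *first_open_state(struct inode *inode);
    extern struct nfs4_state *next_open_state(struct nfs4_state *state);
    extern void forget_open_state(struct nfs4_state *state);

    static void on_stale_filehandle(struct inode *inode)
    {
            struct nfs4_state *state;

            /* Invalidate the cached file ... */
            mark_inode_stale(inode);

            /* ... and drop all state tied to the filehandle.  No CLOSE is
             * sent: the server must answer NFS4ERR_STALE to any operation
             * using this filehandle from now on, CLOSE included. */
            for (state = first_open_state(inode); state;
                 state = next_open_state(state))
                    forget_open_state(state);
    }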
On Thu, 05 Jan 2023, Olga Kornievskaia wrote:
> On Tue, Jan 3, 2023 at 9:34 PM NeilBrown <neilb@suse.de> wrote:
> >
> > [...]
> >
> > I checked a recent tcpdump (with a patched SLE kernel talking to Netapp)
> > and I see that the writes don't succeed after the first NFS4ERR_STALE.
> >
> > If the "correct" stateid is given to WRITE, it returns NFS4ERR_STALE.
> > If the older stateid is given to WRITE, it returns NFS4ERR_OLD_STATEID.
> >
> > So it seems that it just has these two checks in the wrong order.
>
> Does Netapp return ERR_STALE on the PUTFH if the stateid is correct?

In the traces I have, Netapp never returns ERR_STALE on PUTFH.  Of
course the PUTFH operation doesn't have a stateid, so the "if the
stateid is correct" is not meaningful.

ACCESS, READ, WRITE, SETATTR, and GETATTR are the only ops that I have
seen to result in ERR_STALE.  ACCESS and GETATTR don't have a stateid.
READ, WRITE, and SETATTR do.  These get OLD_STATEID if the stateid is
old, and only get STALE if the stateid is current.

> If it's the WRITE operation that returns an error, and if the server
> has multiple errors (i.e. bad stateid and bad filehandle), then I
> don't think the spec says which error is more important. In this
> case, I think the server should fail the PUTFH with ERR_STALE.

I agree.  If the PUTFH returned STALE there would not be a problem.
Even if a race resulted in the PUTFH succeeding and the WRITE getting
OLD_STATEID, Linux-NFS would retry and the new PUTFH could then fail
correctly.

NeilBrown
On Wed, 04 Jan 2023, NeilBrown wrote:
> On Wed, 04 Jan 2023, Trond Myklebust wrote:
> > On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:
> > > [...]
> > >
> > > Maybe it would be OK to mark the inode stale.  I don't know if
> > > various retry loops abort properly when the inode is stale.
> >
> > Yes, they are all supposed to do that. Otherwise we would end up
> > looping forever in close(), for instance, since the PUTFH, GETATTR and
> > CLOSE can all return NFS4ERR_STALE as well.
>
> To mark the inode as STALE we still need to find the inode, and that is
> the key bit missing in the current code.  Once we find the inode, we
> could mark it stale, but maybe some other error resulted in the missing
> GETATTR...
>
> It might make sense to put the new code in _nfs4_proc_open() after the
> explicit nfs4_proc_getattr() fails.  We would need to find the inode
> given only the filehandle.  Is there any easy way to do that?
>
> Thanks,
> NeilBrown

I couldn't see a consistent pattern to follow for when to mark an inode
as stale.  Does this, on top of the previous patch, seem reasonable?

Thanks,
NeilBrown

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index b441b1d14a50..04497cb42154 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(struct nfs_server *server,
 		case -ESTALE:
 			if (inode != NULL && S_ISREG(inode->i_mode))
 				pnfs_destroy_layout(NFS_I(inode));
+			if (inode)
+				nfs_set_inode_stale(inode);
 			break;
 		case -NFS4ERR_DELEG_REVOKED:
 		case -NFS4ERR_ADMIN_REVOKED:
@@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(struct nfs4_opendata *data,
 		return status;
 	}
 	if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
+		struct inode *inode = nfs4_get_inode_by_stateid(
+			&data->o_res.stateid,
+			data->owner);
 		nfs4_sequence_free_slot(&o_res->seq_res);
-		nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr, NULL);
+		nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr, inode);
+		iput(inode);
 	}
 	return 0;
 }
@@ -4335,6 +4341,7 @@ int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
 {
 	struct nfs4_exception exception = {
 		.interruptible = true,
+		.inode = inode,
 	};
 	int err;
 	do {
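The diff above calls nfs4_get_inode_by_stateid(), whose implementation is not shown in this thread (it belongs to the earlier patch). Based on the patch description, it would walk the open-owner's state list and match on the stateid's "other" field, ignoring the seq-id. A guess at its shape, with hypothetical helper names standing in for the real list and refcount code:

    struct inode;
    struct nfs4_state;
    struct nfs4_state_owner;
    typedef struct {
            unsigned int seqid;
            unsigned char other[12];
    } nfs4_stateid_t;

    /* Hypothetical accessors, not the kernel's actual interfaces. */
    extern struct nfs4_state *owner_first_state(struct nfs4_state_owner *sp);
    extern struct nfs4_state *owner_next_state(struct nfs4_state *state);
    extern const nfs4_stateid_t *state_stateid(const struct nfs4_state *state);
    extern struct inode *state_grab_inode(struct nfs4_state *state); /* like igrab() */
    extern int stateid_other_matches(const nfs4_stateid_t *a, const nfs4_stateid_t *b);

    static struct inode *
    get_inode_by_stateid(const nfs4_stateid_t *stateid, struct nfs4_state_owner *sp)
    {
            struct nfs4_state *state;

            for (state = owner_first_state(sp); state;
                 state = owner_next_state(state)) {
                    /* Ignore the seqid: the client's copy may be out of date. */
                    if (stateid_other_matches(state_stateid(state), stateid))
                            return state_grab_inode(state); /* caller does iput() */
            }
            return NULL;    /* not found: caller falls back to -EAGAIN */
    }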
On Fri, 2023-01-06 at 09:56 +1100, NeilBrown wrote:
> On Wed, 04 Jan 2023, NeilBrown wrote:
> > [...]
> >
> > It might make sense to put the new code in _nfs4_proc_open() after
> > the explicit nfs4_proc_getattr() fails.  We would need to find the
> > inode given only the filehandle.  Is there any easy way to do that?
>
> I couldn't see a consistent pattern to follow for when to mark an
> inode as stale.  Does this, on top of the previous patch, seem
> reasonable?
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index b441b1d14a50..04497cb42154 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(struct nfs_server *server,
>  		case -ESTALE:
>  			if (inode != NULL && S_ISREG(inode->i_mode))
>  				pnfs_destroy_layout(NFS_I(inode));
> +			if (inode)
> +				nfs_set_inode_stale(inode);

This is normally dealt with in the generic code inside
nfs_revalidate_inode(). There should be no need to duplicate it here.

>  			break;
>  		case -NFS4ERR_DELEG_REVOKED:
>  		case -NFS4ERR_ADMIN_REVOKED:
> @@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(struct nfs4_opendata *data,
>  		return status;
>  	}
>  	if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
> +		struct inode *inode = nfs4_get_inode_by_stateid(
> +			&data->o_res.stateid,
> +			data->owner);

There shouldn't be a need to go looking for open descriptors here,
because they will hit ESTALE at some point later anyway.

If we're going to change anything, I'd rather see us return -EACCES and
-ESTALE from the decode_access() and decode_getfattr() calls in
nfs4_xdr_dec_open() (and only those errors from those two calls!) so
that we can skip the unnecessary getattr call here.

In fact, the only case that this extra getattr should be trying to
address is the one where the server returns NFS4ERR_DELAY to either the
decode_access() or the decode_getfattr() calls specifically, and where
we therefore don't want to replay the entire open call.

>  		nfs4_sequence_free_slot(&o_res->seq_res);
> -		nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr, NULL);
> +		nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr, inode);
> +		iput(inode);
>  	}
>  	return 0;
>  }
> @@ -4335,6 +4341,7 @@ int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
>  {
>  	struct nfs4_exception exception = {
>  		.interruptible = true,
> +		.inode = inode,
>  	};
>  	int err;
>  	do {
On Fri, 06 Jan 2023, Trond Myklebust wrote:
> On Fri, 2023-01-06 at 09:56 +1100, NeilBrown wrote:
> > [...]
> >
> >  	if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
> > +		struct inode *inode = nfs4_get_inode_by_stateid(
> > +			&data->o_res.stateid,
> > +			data->owner);
>
> There shouldn't be a need to go looking for open descriptors here,
> because they will hit ESTALE at some point later anyway.

The problem is that they don't hit ESTALE later.  Unless we update our
stored stateid with the result of the OPEN, we can use the old stateid,
and get the corresponding error.

The only way to avoid the infinite loop is to either mark the inode as
stale, or update the state-id.  For either of those we need to find the
inode.  We don't have the fileid so we cannot use iget.  We do have the
filehandle and state-id.

Maybe you are saying this is a server bug that the client cannot be
expected to cope with at all, and that an infinite loop is a valid
client response to this particular server behaviour.  In that case, I
guess no patch is needed.

NeilBrown

> If we're going to change anything, I'd rather see us return -EACCES and
> -ESTALE from the decode_access() and decode_getfattr() calls in
> nfs4_xdr_dec_open() (and only those errors from those two calls!) so
> that we can skip the unnecessary getattr call here.
>
> In fact, the only case that this extra getattr should be trying to
> address is the one where the server returns NFS4ERR_DELAY to either the
> decode_access() or the decode_getfattr() calls specifically, and where
> we therefore don't want to replay the entire open call.
>
> [...]
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
On Thu, Jan 5, 2023 at 10:48 PM NeilBrown <neilb@suse.de> wrote:
>
> On Fri, 06 Jan 2023, Trond Myklebust wrote:
> > [...]
> >
> > There shouldn't be a need to go looking for open descriptors here,
> > because they will hit ESTALE at some point later anyway.
>
> The problem is that they don't hit ESTALE later.  Unless we update
> our stored stateid with the result of the OPEN, we keep using the old
> stateid and getting the corresponding error.
>
> The only way to avoid the infinite loop is to either mark the inode
> as stale or update the state-id.  For either of those we need to find
> the inode.  We don't have the fileid, so we cannot use iget.  We do
> have the file handle and the state-id.
>
> Maybe you are saying this is a server bug that the client cannot be
> expected to cope with at all, and that an infinite loop is a valid
> client response to this particular server behaviour.  In that case, I
> guess no patch is needed.

I'm not arguing that the server should do something else, but I would
like to talk about it from the spec perspective.  When PUTFH+WRITE is
sent to the server with an incorrect stateid and the server is
processing the WRITE compound (the spec doesn't require the server to
validate a filehandle on PUTFH; it says PUTFH is to "set" the correct
filehandle), the server is dealing with two errors that it could
possibly return to the client: ERR_STALE and ERR_OLD_STATEID.  There
is nothing in the spec that speaks to the order in which these errors
are to be returned.  Of course I'd like to say that the server should
prioritize ERR_STALE over ERR_OLD_STATEID (simply because it's a MUST
in the spec, while the ERR_OLD_STATEID checks are written as "rules").
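[To make the ordering question concrete, here is a toy sketch of the
two server-side checks and the prioritization argued for above: the
filehandle check (a MUST) before the stateid check (written as
"rules").  All types and helpers are hypothetical; only the error
numbers are the RFC-assigned values.]

#include <stdbool.h>
#include <stdint.h>

/* RFC-assigned NFSv4 error numbers. */
enum nfs4_err {
	NFS4_OK			= 0,
	NFS4ERR_STALE		= 70,
	NFS4ERR_OLD_STATEID	= 10024,
};

/* Hypothetical stand-ins for server-side state. */
struct example_fh      { bool file_deleted; };
struct example_stateid { uint32_t seqid, current_seqid; };

/* Report NFS4ERR_STALE for a deleted file even when the stateid's
 * seq-id is also old, i.e. check the filehandle before the stateid.
 */
static enum nfs4_err example_write_check(const struct example_fh *fh,
					 const struct example_stateid *sid)
{
	if (fh->file_deleted)
		return NFS4ERR_STALE;
	if (sid->seqid != sid->current_seqid)
		return NFS4ERR_OLD_STATEID;
	return NFS4_OK;
}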
> On Jan 9, 2023, at 10:33, Olga Kornievskaia <aglo@umich.edu> wrote:
>
> [...]
>
> There is nothing in the spec that speaks to the order in which these
> errors are to be returned.  Of course I'd like to say that the server
> should prioritize ERR_STALE over ERR_OLD_STATEID (simply because it's
> a MUST in the spec, while the ERR_OLD_STATEID checks are written as
> "rules").

I disagree, for the reason already pointed to in the spec.  There is
nothing in the spec that appears to allow PUTFH to return anything
other than NFS4ERR_STALE after the file has been deleted (and yes,
RFC5661, Section 15.2 does list NFS4ERR_STALE as an error for PUTFH).
PUTFH is definitely required to validate the file handle, since it is
the ONLY operation that can return NFS4ERR_BADHANDLE.
On Mon, Jan 9, 2023 at 10:47 AM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> [...]
>
> I disagree, for the reason already pointed to in the spec.  There is
> nothing in the spec that appears to allow PUTFH to return anything
> other than NFS4ERR_STALE after the file has been deleted (and yes,
> RFC5661, Section 15.2 does list NFS4ERR_STALE as an error for PUTFH).
> PUTFH is definitely required to validate the file handle, since it is
> the ONLY operation that can return NFS4ERR_BADHANDLE.

We are talking about 4.0 and not 4.1.  In 4.0, all operations that use
PUTFH can return ERR_BADHANDLE.
> On Jan 9, 2023, at 11:07, Olga Kornievskaia <aglo@umich.edu> wrote:
>
> [...]
>
> We are talking about 4.0 and not 4.1.  In 4.0, all operations that
> use PUTFH can return ERR_BADHANDLE.

Same difference: RFC7530 Section 4.2.2 uses the exact same language as
in RFC5661 to describe how file handles work for deleted files, and
RFC7530 Section 13.2 lists NFS4ERR_STALE as being a valid error for
PUTFH.
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 5edd1704f735..5f2f5371a620 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -493,6 +493,7 @@ extern void nfs4_put_state_owner(struct nfs4_state_owner *);
 extern void nfs4_purge_state_owners(struct nfs_server *, struct list_head *);
 extern void nfs4_free_state_owners(struct list_head *head);
 extern struct nfs4_state * nfs4_get_open_state(struct inode *, struct nfs4_state_owner *);
+extern struct inode *nfs4_get_inode_by_stateid(nfs4_stateid *stateid, struct nfs4_state_owner *owner);
 extern void nfs4_put_open_state(struct nfs4_state *);
 extern void nfs4_close_state(struct nfs4_state *, fmode_t);
 extern void nfs4_close_sync(struct nfs4_state *, fmode_t);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 40d749f29ed3..b441b1d14a50 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2008,10 +2008,20 @@ nfs4_opendata_get_inode(struct nfs4_opendata *data)
 	case NFS4_OPEN_CLAIM_NULL:
 	case NFS4_OPEN_CLAIM_DELEGATE_CUR:
 	case NFS4_OPEN_CLAIM_DELEGATE_PREV:
-		if (!(data->f_attr.valid & NFS_ATTR_FATTR))
-			return ERR_PTR(-EAGAIN);
-		inode = nfs_fhget(data->dir->d_sb, &data->o_res.fh,
-				  &data->f_attr);
+		if (data->f_attr.valid & NFS_ATTR_FATTR) {
+			inode = nfs_fhget(data->dir->d_sb, &data->o_res.fh,
+					  &data->f_attr);
+		} else {
+			/* We don't have the fileid and so cannot do inode
+			 * lookup.  If we already have this state open we MUST
+			 * update the seqid to match the server, so we need to
+			 * find it if possible.
+			 */
+			inode = nfs4_get_inode_by_stateid(&data->o_res.stateid,
+							  data->owner);
+			if (!inode)
+				inode = ERR_PTR(-EAGAIN);
+		}
 		break;
 	default:
 		inode = d_inode(data->dentry);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 2a0ca5c7f082..80c36c61ae20 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -752,6 +752,23 @@ nfs4_get_open_state(struct inode *inode, struct nfs4_state_owner *owner)
 	return state;
 }
 
+struct inode *
+nfs4_get_inode_by_stateid(nfs4_stateid *stateid, struct nfs4_state_owner *owner)
+{
+	struct nfs4_state *state;
+	struct inode *inode = NULL;
+
+	spin_lock(&owner->so_lock);
+	list_for_each_entry(state, &owner->so_states, open_states)
+		if (nfs4_stateid_match_other(stateid, &state->open_stateid)) {
+			inode = state->inode;
+			ihold(inode);
+			break;
+		}
+	spin_unlock(&owner->so_lock);
+	return inode;
+}
+
 void nfs4_put_open_state(struct nfs4_state *state)
 {
 	struct inode *inode = state->inode;
For stable backports prior to 4.14 the nfs_fhget() that needs an
alternate is in _nfs4_opendata_to_nfs4_state().

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/nfs4_fs.h   |  1 +
 fs/nfs/nfs4proc.c  | 18 ++++++++++++++----
 fs/nfs/nfs4state.c | 17 +++++++++++++++++
 3 files changed, 32 insertions(+), 4 deletions(-)
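[A note on the new helper's reference counting, since it is easy to
miss in review: nfs4_get_inode_by_stateid() returns the inode with an
extra reference taken via ihold(), so a caller that does not hand the
inode on (as nfs4_opendata_get_inode() does) must drop it with iput(),
as the follow-up diff quoted earlier in the thread does.  A sketch of
that calling pattern, with a hypothetical caller:]

/* Hypothetical caller illustrating the calling convention:
 * nfs4_get_inode_by_stateid() ihold()s the inode it returns, so the
 * caller owns a reference and must iput() it when finished (unless
 * it transfers ownership, as nfs4_opendata_get_inode() does).
 */
static void example_refresh_state(nfs4_stateid *stateid,
				  struct nfs4_state_owner *owner)
{
	struct inode *inode = nfs4_get_inode_by_stateid(stateid, owner);

	if (!inode)
		return;		/* open-owner has no matching state */

	/* ... mark the inode stale or update the stored seq-id ... */

	iput(inode);		/* balance the ihold() in the helper */
}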