Message ID | 20220624154737.1387850-1-anna@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | NFSD: Don't continue encoding if READ_PLUS gets confused | expand |
Hi Anna! > On Jun 24, 2022, at 11:47 AM, Anna Schumaker <anna@kernel.org> wrote: > > From: Anna Schumaker <Anna.Schumaker@Netapp.com> > > If we were in a HOLE segement, but vfs_llseek() claimed we were encoding > DATA then we would switch over to the DATA encoding function. This > conflicts with Chuck's latest xdr cleanup patches and can result in a > crash or silent hang. Let's just return nfserr_io if we find we are in > this situation, which will cause the encoder to return to the client > with the number of segments already encoded. The client can then try the > READ_PLUS call again. > > Fxes: 6c254bf3b637 (SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()) checkpatch complains about this tag. Also, Bruce said he wasn't able to reproduce the issue on 6c254bf3b637, only on the whole series. Were you able to hit it with just this commit applied? > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> I find this somewhat problematic. I can't apply this patch in good conscience: * The usual guideline for applying a workaround upstream is that there's been a demonstration that it will be impossible to find and fix the real problem. I don't see that here. * We still don't understand the failure. The XDR code itself might be broken? Therefore we don't understand if this workaround is 100% effective * Usually with a workaround, there's a hint of a long-term fix... is this the long-term fix? In other words, I might give this patch to a customer who needed to get back on her feet quickly. I'm hesitant to take it as an upstream change without further justification. IMHO a fix in an XDR consumer (like READ_PLUS) needs to demonstrate that the current XDR code is working as designed. Otherwise, the XDR code itself is what needs to be fixed. > --- > fs/nfsd/nfs4xdr.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > index 61b2aae81abb..dc97d7c7e595 100644 > --- a/fs/nfsd/nfs4xdr.c > +++ b/fs/nfsd/nfs4xdr.c > @@ -4792,7 +4792,7 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, > if (data_pos == -ENXIO) > data_pos = f_size; > else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE)) > - return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size); > + return nfserr_io; > count = data_pos - read->rd_offset; > > /* Content type, offset, byte count */ > -- > 2.36.1 > -- Chuck Lever
On Fri, Jun 24, 2022 at 1:40 PM Chuck Lever III <chuck.lever@oracle.com> wrote: > > Hi Anna! > > > On Jun 24, 2022, at 11:47 AM, Anna Schumaker <anna@kernel.org> wrote: > > > > From: Anna Schumaker <Anna.Schumaker@Netapp.com> > > > > If we were in a HOLE segement, but vfs_llseek() claimed we were encoding > > DATA then we would switch over to the DATA encoding function. This > > conflicts with Chuck's latest xdr cleanup patches and can result in a > > crash or silent hang. Let's just return nfserr_io if we find we are in > > this situation, which will cause the encoder to return to the client > > with the number of segments already encoded. The client can then try the > > READ_PLUS call again. > > > > Fxes: 6c254bf3b637 (SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()) > > checkpatch complains about this tag. Okay. I'll look into that. > > Also, Bruce said he wasn't able to reproduce the issue on > 6c254bf3b637, only on the whole series. Were you able to hit > it with just this commit applied? I've been having trouble hitting it in general. I think I only got an oops once, but most of the time it just silently hangs. I'll try a bit longer to see if I can hit it again. > > > > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> > > I find this somewhat problematic. I can't apply this patch in > good conscience: > > * The usual guideline for applying a workaround upstream is > that there's been a demonstration that it will be impossible > to find and fix the real problem. I don't see that here. > > * We still don't understand the failure. The XDR code itself > might be broken? Therefore we don't understand if this > workaround is 100% effective > > * Usually with a workaround, there's a hint of a long-term > fix... is this the long-term fix? No, the long term fix is my new READ_PLUS code. I'm hoping to have a v2 ready soon, but I tested v1 and didn't have any problem (no oops or silent hang) > > In other words, I might give this patch to a customer who > needed to get back on her feet quickly. I'm hesitant to take > it as an upstream change without further justification. > > IMHO a fix in an XDR consumer (like READ_PLUS) needs to > demonstrate that the current XDR code is working as designed. > Otherwise, the XDR code itself is what needs to be fixed. > > > > --- > > fs/nfsd/nfs4xdr.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > > index 61b2aae81abb..dc97d7c7e595 100644 > > --- a/fs/nfsd/nfs4xdr.c > > +++ b/fs/nfsd/nfs4xdr.c > > @@ -4792,7 +4792,7 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, > > if (data_pos == -ENXIO) > > data_pos = f_size; > > else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE)) > > - return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size); > > + return nfserr_io; > > count = data_pos - read->rd_offset; > > > > /* Content type, offset, byte count */ > > -- > > 2.36.1 > > > > -- > Chuck Lever > > >
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c index 61b2aae81abb..dc97d7c7e595 100644 --- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -4792,7 +4792,7 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp, if (data_pos == -ENXIO) data_pos = f_size; else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE)) - return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size); + return nfserr_io; count = data_pos - read->rd_offset; /* Content type, offset, byte count */