diff mbox series

NFSD: Don't continue encoding if READ_PLUS gets confused

Message ID 20220624154737.1387850-1-anna@kernel.org (mailing list archive)
State New, archived
Headers show
Series NFSD: Don't continue encoding if READ_PLUS gets confused | expand

Commit Message

Anna Schumaker June 24, 2022, 3:47 p.m. UTC
From: Anna Schumaker <Anna.Schumaker@Netapp.com>

If we were in a HOLE segement, but vfs_llseek() claimed we were encoding
DATA then we would switch over to the DATA encoding function. This
conflicts with Chuck's latest xdr cleanup patches and can result in a
crash or silent hang. Let's just return nfserr_io if we find we are in
this situation, which will cause the encoder to return to the client
with the number of segments already encoded. The client can then try the
READ_PLUS call again.

Fxes: 6c254bf3b637 (SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer())
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4xdr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Chuck Lever III June 24, 2022, 5:40 p.m. UTC | #1
Hi Anna!

> On Jun 24, 2022, at 11:47 AM, Anna Schumaker <anna@kernel.org> wrote:
> 
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> If we were in a HOLE segement, but vfs_llseek() claimed we were encoding
> DATA then we would switch over to the DATA encoding function. This
> conflicts with Chuck's latest xdr cleanup patches and can result in a
> crash or silent hang. Let's just return nfserr_io if we find we are in
> this situation, which will cause the encoder to return to the client
> with the number of segments already encoded. The client can then try the
> READ_PLUS call again.
> 
> Fxes: 6c254bf3b637 (SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer())

checkpatch complains about this tag.

Also, Bruce said he wasn't able to reproduce the issue on
6c254bf3b637, only on the whole series. Were you able to hit
it with just this commit applied?


> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

I find this somewhat problematic. I can't apply this patch in
good conscience:

* The usual guideline for applying a workaround upstream is
  that there's been a demonstration that it will be impossible
  to find and fix the real problem. I don't see that here.

* We still don't understand the failure. The XDR code itself
  might be broken? Therefore we don't understand if this
  workaround is 100% effective

* Usually with a workaround, there's a hint of a long-term
  fix... is this the long-term fix?

In other words, I might give this patch to a customer who
needed to get back on her feet quickly. I'm hesitant to take
it as an upstream change without further justification.

IMHO a fix in an XDR consumer (like READ_PLUS) needs to
demonstrate that the current XDR code is working as designed.
Otherwise, the XDR code itself is what needs to be fixed.


> ---
> fs/nfsd/nfs4xdr.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 61b2aae81abb..dc97d7c7e595 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -4792,7 +4792,7 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> 	if (data_pos == -ENXIO)
> 		data_pos = f_size;
> 	else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
> -		return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
> +		return nfserr_io;
> 	count = data_pos - read->rd_offset;
> 
> 	/* Content type, offset, byte count */
> -- 
> 2.36.1
> 

--
Chuck Lever
Anna Schumaker June 24, 2022, 6:12 p.m. UTC | #2
On Fri, Jun 24, 2022 at 1:40 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
> Hi Anna!
>
> > On Jun 24, 2022, at 11:47 AM, Anna Schumaker <anna@kernel.org> wrote:
> >
> > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >
> > If we were in a HOLE segement, but vfs_llseek() claimed we were encoding
> > DATA then we would switch over to the DATA encoding function. This
> > conflicts with Chuck's latest xdr cleanup patches and can result in a
> > crash or silent hang. Let's just return nfserr_io if we find we are in
> > this situation, which will cause the encoder to return to the client
> > with the number of segments already encoded. The client can then try the
> > READ_PLUS call again.
> >
> > Fxes: 6c254bf3b637 (SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer())
>
> checkpatch complains about this tag.

Okay. I'll look into that.

>
> Also, Bruce said he wasn't able to reproduce the issue on
> 6c254bf3b637, only on the whole series. Were you able to hit
> it with just this commit applied?

I've been having trouble hitting it in general. I think I only got an
oops once, but most of the time it just silently hangs. I'll try a bit
longer to see if I can hit it again.

>
>
> > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>
> I find this somewhat problematic. I can't apply this patch in
> good conscience:
>
> * The usual guideline for applying a workaround upstream is
>   that there's been a demonstration that it will be impossible
>   to find and fix the real problem. I don't see that here.
>
> * We still don't understand the failure. The XDR code itself
>   might be broken? Therefore we don't understand if this
>   workaround is 100% effective
>
> * Usually with a workaround, there's a hint of a long-term
>   fix... is this the long-term fix?

No, the long term fix is my new READ_PLUS code. I'm hoping to have a
v2 ready soon, but I tested v1 and didn't have any problem (no oops or
silent hang)

>
> In other words, I might give this patch to a customer who
> needed to get back on her feet quickly. I'm hesitant to take
> it as an upstream change without further justification.
>
> IMHO a fix in an XDR consumer (like READ_PLUS) needs to
> demonstrate that the current XDR code is working as designed.
> Otherwise, the XDR code itself is what needs to be fixed.
>
>
> > ---
> > fs/nfsd/nfs4xdr.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 61b2aae81abb..dc97d7c7e595 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -4792,7 +4792,7 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> >       if (data_pos == -ENXIO)
> >               data_pos = f_size;
> >       else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
> > -             return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
> > +             return nfserr_io;
> >       count = data_pos - read->rd_offset;
> >
> >       /* Content type, offset, byte count */
> > --
> > 2.36.1
> >
>
> --
> Chuck Lever
>
>
>
diff mbox series

Patch

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 61b2aae81abb..dc97d7c7e595 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4792,7 +4792,7 @@  nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
 	if (data_pos == -ENXIO)
 		data_pos = f_size;
 	else if (data_pos <= read->rd_offset || (data_pos < f_size && data_pos % PAGE_SIZE))
-		return nfsd4_encode_read_plus_data(resp, read, maxcount, eof, &f_size);
+		return nfserr_io;
 	count = data_pos - read->rd_offset;
 
 	/* Content type, offset, byte count */