
NFSv3/v2: Fix data corruption with NFS short reads.

Message ID 20130411112121.74d996c5@notabene.brown (mailing list archive)
State New, archived

Commit Message

NeilBrown April 11, 2013, 1:21 a.m. UTC
On Fri, 29 Mar 2013 19:57:06 -0000 malahal naineni <malahal@us.ibm.com> wrote:

> This bug seems to be present in v2.6.37 or lower versions. The code was
> re-organized in v2.6.38 that eliminated the bug. Current upstream code
> doesn't have this bug. This may be applicable to some longterm releases!
> 
> Here are the bug details:
> 
> 1. In nfs_read_rpcsetup(), args.count and res.count are both set to the
>    actual number of bytes to be read. Let us assume that the request is
>    for 16K, so args.count = res.count = 16K.
> 2. nfs3_xdr_readres() conditionally sets res.count to the actual
>    number of bytes read. This condition is true for the first response
>    as res.count was set to args.count before the first request. Let us
>    say the server returned only 4K bytes. res.count=4K
> 3. Another read request is sent for the remaining data. Note that
>    res.count is NOT updated. It is still set to the actual amount of
>    bytes we got in the first response.  The client will send a READ
>    request for the remaining 12K.

This looks like a real bug, but I think the "NOT" above is the best thing
to fix.
i.e. when another read request is sent, res.count *SHOULD*BE* updated.  That
makes it consistent with the original send, and consistency is good!

So this patch.



This would only affect clients with servers which sometimes return
partial reads, and I don't think the Linux NFS server does that.  What server
have you seen this against?

NeilBrown



> 4. Assume that the server gave all 12K bytes. We still think we got only
>    4K bytes due to the conditional update in nfs3_xdr_readres(). The client
>    sends further requests, if not EOF! If this response includes EOF, it
>    truncates pages beyond 4K+4K causing data corruption beyond 8K. The
>    corrupted data is filled with zeros!
> 
> Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
> 
> ---
> fs/nfs/nfs2xdr.c | 3 +--
>  fs/nfs/nfs3xdr.c | 3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
> index 5914a19..710ca56 100644
> --- a/fs/nfs/nfs2xdr.c
> +++ b/fs/nfs/nfs2xdr.c
> @@ -287,8 +287,7 @@ nfs_xdr_readres(struct rpc_rqst *req, __be32 *p, struct nfs_readres *res)
>  	}
>  
>  	dprintk("RPC:      readres OK count %u\n", count);
> -	if (count < res->count)
> -		res->count = count;
> +	res->count = count;
>  
>  	return count;
>  }
> diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
> index f6cc60f..152a5e4 100644
> --- a/fs/nfs/nfs3xdr.c
> +++ b/fs/nfs/nfs3xdr.c
> @@ -914,8 +914,7 @@ nfs3_xdr_readres(struct rpc_rqst *req, __be32 *p, struct nfs_readres *res)
>  		res->eof = 0;
>  	}
>  
> -	if (count < res->count)
> -		res->count = count;
> +	res->count = count;
>  
>  	return count;
>  }
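
A minimal user-space sketch of the bookkeeping described in steps 1-4
above, using the example numbers from the report (16K request, 4K short
read). The struct and function names are simplified stand-ins for
illustration, not the actual kernel code; the 'fixed' flag adds the single
line proposed in the read.c hunk further down.

/*
 * Sketch of the pre-2.6.38 read accounting, illustration only.
 */
#include <stdio.h>

struct read_args { unsigned int offset, pgbase, count; };
struct read_res  { unsigned int count; int eof; };

/* pre-2.6.38 nfs_xdr_readres()/nfs3_xdr_readres(): only ever shrinks res->count */
static void xdr_readres_old(struct read_res *res, unsigned int replied)
{
	if (replied < res->count)
		res->count = replied;
}

/* nfs_readpage_retry() bookkeeping; 'fixed' adds the proposed read.c line */
static void readpage_retry(struct read_args *arg, struct read_res *res, int fixed)
{
	arg->offset += res->count;
	arg->pgbase += res->count;
	arg->count  -= res->count;
	if (fixed)
		res->count = arg->count;
}

static void run(int fixed)
{
	struct read_args arg = { .offset = 0, .pgbase = 0, .count = 16384 };
	struct read_res  res = { .count = 16384, .eof = 0 };

	xdr_readres_old(&res, 4096);        /* server returns only 4K       */
	readpage_retry(&arg, &res, fixed);  /* resend for the remaining 12K */
	xdr_readres_old(&res, 12288);       /* all 12K arrive, with EOF     */
	res.eof = 1;

	/* on EOF the client truncates everything past offset + res.count */
	printf("%s: client accounts for %u of 16384 bytes\n",
	       fixed ? "fixed" : "buggy", arg.offset + res.count);
}

int main(void)
{
	run(0);    /* buggy: 8192  -> pages past 8K are zero-filled   */
	run(1);    /* fixed: 16384 -> the whole read is accounted for */
	return 0;
}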

Comments

malahal naineni April 11, 2013, 3:55 a.m. UTC | #1
NeilBrown [neilb@suse.de] wrote:
> On Fri, 29 Mar 2013 19:57:06 -0000 malahal naineni <malahal@us.ibm.com> wrote:
> 
> > This bug seems to be present in v2.6.37 or lower versions. The code was
> > re-organized in v2.6.38 that eliminated the bug. Current upstream code
> > doesn't have this bug. This may be applicable to some longterm releases!
> > 
> > Here are the bug details:
> > 
> > 1. In nfs_read_rpcsetup(), args.count and res.count are both set to the
> >    actual number of bytes to be read. Let us assume that the request is
> >    for 16K, so args.count = res.count = 16K.
> > 2. nfs3_xdr_readres() conditionally sets res.count to the actual
> >    number of bytes read. This condition is true for the first response
> >    as res.count was set to args.count before the first request. Let us
> >    say the server returned only 4K bytes. res.count=4K
> > 3. Another read request is sent for the remaining data. Note that
> >    res.count is NOT updated. It is still set to the actual amount of
> >    bytes we got in the first response.  The client will send a READ
> >    request for the remaining 12K.
> 
> This looks like a real bug, but I think the "NOT" above is the best thing
> to fix.

No doubt, a real bug! Easily reproduced with an instrumented nfsd.

> i.e. when another read request is sent, res.count *SHOULD*BE* updated.  That
> makes it consistent with the original send, and consistency is good!

I thought about it, but the resp->count is NOT really used as far as I
know. And the current upstream code unconditionally sets the resp->count
in the xdr function. So I chose the upstream method! I agree, your patch is
more consistent with the existing code.

> Index: linux-2.6.32-SLE11-SP1-LTSS/fs/nfs/read.c
> ===================================================================
> --- linux-2.6.32-SLE11-SP1-LTSS.orig/fs/nfs/read.c	2013-03-20 16:24:31.426605189 +1100
> +++ linux-2.6.32-SLE11-SP1-LTSS/fs/nfs/read.c	2013-04-11 11:19:57.670724540 +1000
> @@ -368,6 +368,7 @@ static void nfs_readpage_retry(struct rp
>  	argp->offset += resp->count;
>  	argp->pgbase += resp->count;
>  	argp->count -= resp->count;
> +	resp->count = argp->count;
>  	nfs4_restart_rpc(task, NFS_SERVER(data->inode)->nfs_client);
>  	return;
>  out:

This patch should fix the bug as well.
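
Worked through with the example numbers above: after the 4K short read,
resp->count is reset to the remaining 12K before the resend, so when the
12K reply arrives the "if (count < res->count)" test (12288 < 12288) is
false, res->count stays 12288, and offset (4K) plus count (12K) again
covers the full 16K -- the EOF path no longer truncates at 8K.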

> This would only affect clients with servers which sometimes
> return partial reads, and I don't think the Linux NFS server does that.
> What server have you seen this against?

We came across this during Ganesha development, and I was told the same
thing, that the Linux NFS server doesn't do this. I didn't bother to post
it then, but now we have seen the same thing with the Linux NFS server,
so I decided to post it. Although the Linux NFS server doesn't create
short reads itself, it just calls the underlying back-end file system and
passes the result on to NFS clients without a short read "check". In
other words, the short read behaviour actually depends on the back-end
file system rather than the Linux NFS server. We saw this bug with our
GPFS file system in combination with HSM -- a request for reading data on
tape would fail until it is brought back to disk.

Regards, Malahal.


Patch

Index: linux-2.6.32-SLE11-SP1-LTSS/fs/nfs/read.c
===================================================================
--- linux-2.6.32-SLE11-SP1-LTSS.orig/fs/nfs/read.c	2013-03-20 16:24:31.426605189 +1100
+++ linux-2.6.32-SLE11-SP1-LTSS/fs/nfs/read.c	2013-04-11 11:19:57.670724540 +1000
@@ -368,6 +368,7 @@  static void nfs_readpage_retry(struct rp
 	argp->offset += resp->count;
 	argp->pgbase += resp->count;
 	argp->count -= resp->count;
+	resp->count = argp->count;
 	nfs4_restart_rpc(task, NFS_SERVER(data->inode)->nfs_client);
 	return;
 out: