diff mbox

nfsd: supports read buffer from multiples pages

Message ID 56AE0302.6050101@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Kinglong Mee Jan. 31, 2016, 12:50 p.m. UTC
ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
larger than two pages. nfsd returns NFSERR_BAD_XDR right now.

This patch lets nfsd supports read buffer from multiples pages.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
---
 fs/nfsd/nfs4xdr.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

Comments

J. Bruce Fields Feb. 1, 2016, 6:38 p.m. UTC | #1
On Sun, Jan 31, 2016 at 08:50:10PM +0800, Kinglong Mee wrote:
> ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
> larger than two pages. nfsd returns NFSERR_BAD_XDR right now.

This is with the xfs block layout?

Christoph, do we know anything about average or worst-case sizes for
that layout update field?

> This patch lets nfsd supports read buffer from multiples pages.

Hm.  We'll end up kmalloc()ing the passed-in field length:

		p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL);

We still do still have that (avail + argp->pagelen) limit, so we're not
going to pass arbitrarily large nbytes straight from the network to
kmalloc.  But we do try to avoid depending on higher-order allocations.

--b.

> 
> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
> ---
>  fs/nfsd/nfs4xdr.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index d6ef095..fcf399f 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -143,11 +143,11 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
>  	 * Maybe we need a new page, maybe we have just run out
>  	 */
>  	unsigned int avail = (char *)argp->end - (char *)argp->p;
> +	unsigned int copied = 0;
>  	__be32 *p;
> +
>  	if (avail + argp->pagelen < nbytes)
>  		return NULL;
> -	if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */
> -		return NULL;
>  	/* ok, we can do it with the current plus the next page */
>  	if (nbytes <= sizeof(argp->tmp))
>  		p = argp->tmp;
> @@ -164,9 +164,19 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
>  	 * guarantee p points to at least nbytes bytes.
>  	 */
>  	memcpy(p, argp->p, avail);
> +	copied += avail;
> +	nbytes -= avail;
> +
> +	while (nbytes > PAGE_SIZE) {
> +		next_decode_page(argp);
> +		memcpy(((char*)p) + copied, argp->p, PAGE_SIZE);
> +		copied += PAGE_SIZE;
> +		nbytes -= PAGE_SIZE;
> +	}
> +
>  	next_decode_page(argp);
> -	memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
> -	argp->p += XDR_QUADLEN(nbytes - avail);
> +	memcpy(((char*)p) + copied, argp->p, nbytes);
> +	argp->p += XDR_QUADLEN(nbytes);
>  	return p;
>  }
>  
> -- 
> 2.5.0
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields Feb. 1, 2016, 6:48 p.m. UTC | #2
On Mon, Feb 01, 2016 at 01:38:05PM -0500, bfields wrote:
> On Sun, Jan 31, 2016 at 08:50:10PM +0800, Kinglong Mee wrote:
> > ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
> > larger than two pages. nfsd returns NFSERR_BAD_XDR right now.
> 
> This is with the xfs block layout?
> 
> Christoph, do we know anything about average or worst-case sizes for
> that layout update field?
> 
> > This patch lets nfsd supports read buffer from multiples pages.
> 
> Hm.  We'll end up kmalloc()ing the passed-in field length:
> 
> 		p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL);
> 
> We still do still have that (avail + argp->pagelen) limit, so we're not
> going to pass arbitrarily large nbytes straight from the network to
> kmalloc.  But we do try to avoid depending on higher-order allocations.

(Which it looks like we were allowing before, possibly by accident.  But
I doubt they were actually happening in practice, so that's not evidence
that we don't need to worry about allocations greater than a page.)

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kinglong Mee Feb. 2, 2016, 12:54 a.m. UTC | #3
On 2/2/2016 02:38, J. Bruce Fields wrote:
> On Sun, Jan 31, 2016 at 08:50:10PM +0800, Kinglong Mee wrote:
>> ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
>> larger than two pages. nfsd returns NFSERR_BAD_XDR right now.
> 
> This is with the xfs block layout?

Yes, xfs block layout.
Tested by ltp's fsync02.

thanks,
Kinglong Mee
> 
> Christoph, do we know anything about average or worst-case sizes for
> that layout update field?
> 
>> This patch lets nfsd supports read buffer from multiples pages.
> 
> Hm.  We'll end up kmalloc()ing the passed-in field length:
> 
> 		p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL);
> 
> We still do still have that (avail + argp->pagelen) limit, so we're not
> going to pass arbitrarily large nbytes straight from the network to
> kmalloc.  But we do try to avoid depending on higher-order allocations.
> 
> --b.
> 
>>
>> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
>> ---
>>  fs/nfsd/nfs4xdr.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index d6ef095..fcf399f 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -143,11 +143,11 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
>>  	 * Maybe we need a new page, maybe we have just run out
>>  	 */
>>  	unsigned int avail = (char *)argp->end - (char *)argp->p;
>> +	unsigned int copied = 0;
>>  	__be32 *p;
>> +
>>  	if (avail + argp->pagelen < nbytes)
>>  		return NULL;
>> -	if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */
>> -		return NULL;
>>  	/* ok, we can do it with the current plus the next page */
>>  	if (nbytes <= sizeof(argp->tmp))
>>  		p = argp->tmp;
>> @@ -164,9 +164,19 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
>>  	 * guarantee p points to at least nbytes bytes.
>>  	 */
>>  	memcpy(p, argp->p, avail);
>> +	copied += avail;
>> +	nbytes -= avail;
>> +
>> +	while (nbytes > PAGE_SIZE) {
>> +		next_decode_page(argp);
>> +		memcpy(((char*)p) + copied, argp->p, PAGE_SIZE);
>> +		copied += PAGE_SIZE;
>> +		nbytes -= PAGE_SIZE;
>> +	}
>> +
>>  	next_decode_page(argp);
>> -	memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
>> -	argp->p += XDR_QUADLEN(nbytes - avail);
>> +	memcpy(((char*)p) + copied, argp->p, nbytes);
>> +	argp->p += XDR_QUADLEN(nbytes);
>>  	return p;
>>  }
>>  
>> -- 
>> 2.5.0
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Feb. 2, 2016, 9:20 a.m. UTC | #4
On Mon, Feb 01, 2016 at 01:38:05PM -0500, J. Bruce Fields wrote:
> This is with the xfs block layout?
> 
> Christoph, do we know anything about average or worst-case sizes for
> that layout update field?

The average is rather small and fits into a single page, the worst
case is basically unlimited:

	(file size / block size) * sizeof(pnfs_block_extent)

by the protocol, and about half that for a non-stupid client as
it would merge consecutive blocks and only trigger something close
to the worst case for a "block allocated, block hole, block allocated, ..."
pattern.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
J. Bruce Fields Feb. 2, 2016, 2:29 p.m. UTC | #5
On Tue, Feb 02, 2016 at 10:20:54AM +0100, Christoph Hellwig wrote:
> On Mon, Feb 01, 2016 at 01:38:05PM -0500, J. Bruce Fields wrote:
> > This is with the xfs block layout?
> > 
> > Christoph, do we know anything about average or worst-case sizes for
> > that layout update field?
> 
> The average is rather small and fits into a single page, the worst
> case is basically unlimited:
> 
> 	(file size / block size) * sizeof(pnfs_block_extent)
> 
> by the protocol, and about half that for a non-stupid client as
> it would merge consecutive blocks and only trigger something close
> to the worst case for a "block allocated, block hole, block allocated, ..."
> pattern.

OK.  And what's the failure mode if the layoutcommit fails?  I guess the
client returns the layout and redoes IO through the MDS.  For a rare
failure maybe that's not so terrible.

So I guess the right thing to do is take Kinglong's patch for now.

After that, it wouldn't be that hard for nfsd4_block_proc_layoutcommit
to decode from an array of pages.  But does the iomaps array end up
being just as big?

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Feb. 29, 2016, 9:49 a.m. UTC | #6
On Tue, Feb 02, 2016 at 09:29:48AM -0500, J. Bruce Fields wrote:
> OK.  And what's the failure mode if the layoutcommit fails?  I guess the
> client returns the layout and redoes IO through the MDS.  For a rare
> failure maybe that's not so terrible.
> 
> So I guess the right thing to do is take Kinglong's patch for now.

Yes, it would be great to take it.

> After that, it wouldn't be that hard for nfsd4_block_proc_layoutcommit
> to decode from an array of pages.  But does the iomaps array end up
> being just as big?

The block layout extents are rather bloated, so it will be significantly
smaller.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d6ef095..fcf399f 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -143,11 +143,11 @@  static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
 	 * Maybe we need a new page, maybe we have just run out
 	 */
 	unsigned int avail = (char *)argp->end - (char *)argp->p;
+	unsigned int copied = 0;
 	__be32 *p;
+
 	if (avail + argp->pagelen < nbytes)
 		return NULL;
-	if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */
-		return NULL;
 	/* ok, we can do it with the current plus the next page */
 	if (nbytes <= sizeof(argp->tmp))
 		p = argp->tmp;
@@ -164,9 +164,19 @@  static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
 	 * guarantee p points to at least nbytes bytes.
 	 */
 	memcpy(p, argp->p, avail);
+	copied += avail;
+	nbytes -= avail;
+
+	while (nbytes > PAGE_SIZE) {
+		next_decode_page(argp);
+		memcpy(((char*)p) + copied, argp->p, PAGE_SIZE);
+		copied += PAGE_SIZE;
+		nbytes -= PAGE_SIZE;
+	}
+
 	next_decode_page(argp);
-	memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
-	argp->p += XDR_QUADLEN(nbytes - avail);
+	memcpy(((char*)p) + copied, argp->p, nbytes);
+	argp->p += XDR_QUADLEN(nbytes);
 	return p;
 }