NFSD: trim reads past NFS_OFFSET_MAX

Message ID	20220121185023.260128-1-dan.aloni@vastdata.com (mailing list archive)
State	New, archived
Headers	Return-Path: <linux-nfs-owner@kernel.org>
From: Dan Aloni <dan.aloni@vastdata.com>
To: trondmy@kernel.org
Cc: Anna Schumaker <Anna.Schumaker@netapp.com>, linux-nfs@vger.kernel.org
Subject: [PATCH] NFSD: trim reads past NFS_OFFSET_MAX
Date: Fri, 21 Jan 2022 20:50:23 +0200
Message-Id: <20220121185023.260128-1-dan.aloni@vastdata.com>
In-Reply-To: <fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com>
References: <fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
Series	NFSD: trim reads past NFS_OFFSET_MAX

Dan Aloni Jan. 21, 2022, 6:50 p.m. UTC

Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
RPC read layers"), a read of 0xfff is aligned up to server rsize of
0x1000.

As a result, in a test where the server has a file of size
0x7fffffffffffffff, and the client tries to read from the offset
0x7ffffffffffff000, the read causes loff_t overflow in the server and it
returns an NFS code of EINVAL to the client. The client as a result
indefinitely retries the request.

This fixes the issue at server side by trimming reads past NFS_OFFSET_MAX.

Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers")
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
---
 fs/nfsd/vfs.c | 4 ++++
 1 file changed, 4 insertions(+)

Chuck Lever III Jan. 21, 2022, 10:32 p.m. UTC | #1

Hi Dan!

NFS server patches should be sent to me these days.

$ scripts/get_maintainer.pl fs/nfsd
Chuck Lever <chuck.lever@oracle.com> (supporter:KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
linux-nfs@vger.kernel.org (open list:KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
linux-kernel@vger.kernel.org (open list)


> On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> wrote:
> 
> Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
> RPC read layers"), a read of 0xfff is aligned up to server rsize of
> 0x1000.
> 
> As a result, in a test where the server has a file of size
> 0x7fffffffffffffff, and the client tries to read from the offset
> 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
> returns an NFS code of EINVAL to the client. The client as a result
> indefinitely retries the request.

An infinite loop in this case is a client bug.

Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
5661 permits the NFSv4 READ operation to return
NFS4ERR_INVAL.

Was the client side fix for this issue rejected?


> This fixes the issue at server side by trimming reads past NFS_OFFSET_MAX.

It's OK for the server to return a short READ in this case,
so I will indeed consider a change to make that happen. But
see below for comments specific to this patch.


> Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers")
> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
> ---
> fs/nfsd/vfs.c | 4 ++++
> 1 file changed, 4 insertions(+)
> 
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 738d564ca4ce..754f4e9ff4a2 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	__be32 err;
> 
> 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
> +
> +	if (unlikely(offset + *count > NFS_OFFSET_MAX))
> +		*count = NFS_OFFSET_MAX - offset;

Can @offset ever be larger than NFS_OFFSET_MAX?

Does this check have any effect on NFSv4 READ operations?


> +
> 	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
> 	if (err)
> 		return err;
> -- 
> 2.23.0
> 

--
Chuck Lever

Dan Aloni Jan. 22, 2022, 12:47 p.m. UTC | #2

On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
> NFS server patches should be sent to me these days.

Thanks, will remember this next time.

> > On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> wrote:
> > 
> > Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
> > RPC read layers"), a read of 0xfff is aligned up to server rsize of
> > 0x1000.
> > 
> > As a result, in a test where the server has a file of size
> > 0x7fffffffffffffff, and the client tries to read from the offset
> > 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
> > returns an NFS code of EINVAL to the client. The client as a result
> > indefinitely retries the request.
> 
> An infinite loop in this case is a client bug.
> 
> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
> 5661 permits the NFSv4 READ operation to return
> NFS4ERR_INVAL.
> 
> Was the client side fix for this issue rejected?
 
Yeah, see Trond's response in

   https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/

So are there bugs on both the client and the server?

> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 738d564ca4ce..754f4e9ff4a2 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > 	__be32 err;
> > 
> > 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
> > +
> > +	if (unlikely(offset + *count > NFS_OFFSET_MAX))
> > +		*count = NFS_OFFSET_MAX - offset;
> 
> Can @offset ever be larger than NFS_OFFSET_MAX?

We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`.
(should it have been `>` rather?).

It seems to be missing from NFSv3; I should add it.

> Does this check have any effect on NFSv4 READ operations?

Indeed it doesn't; my expanded testing shows that it fixes the issue only for NFSv3.
Will send an updated patch.

Chuck Lever III Jan. 22, 2022, 5:05 p.m. UTC | #3

> On Jan 22, 2022, at 7:47 AM, Dan Aloni <dan.aloni@vastdata.com> wrote:
> 
> On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
>>> On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> wrote:
>>> 
>>> Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
>>> RPC read layers"), a read of 0xfff is aligned up to server rsize of
>>> 0x1000.
>>> 
>>> As a result, in a test where the server has a file of size
>>> 0x7fffffffffffffff, and the client tries to read from the offset
>>> 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
>>> returns an NFS code of EINVAL to the client. The client as a result
>>> indefinitely retries the request.
>> 
>> An infinite loop in this case is a client bug.
>> 
>> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
>> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
>> 5661 permits the NFSv4 READ operation to return
>> NFS4ERR_INVAL.
>> 
>> Was the client side fix for this issue rejected?
> 
> Yeah, see Trond's response in
> 
>   https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/
> 
> So are there bugs on both the client and the server?

Splitting hairs, but yes there are issues on both sides
IMO. Bad behavior due to bugs on both sides is actually
not uncommon.

Trond is correct that the server is not dealing totally
correctly with the range of values in a READ request.

However, as I pointed out, the specification permits NFS
servers to return NFS[34]ERR_INVAL on READ. And in fact,
there is already code in the NFSv4 READ path that returns
INVAL, for example:

        if (read->rd_offset >= OFFSET_MAX)
                return nfserr_inval;

I'm not sure the specifications describe precisely when
the server /must/ return INVAL, but the client needs to
be prepared to handle it reasonably. If INVAL results in
an infinite loop, then that's a client bug.

IMO changing the alignment for that case is a band-aid.
The underlying looping behavior is what is the root
problem. (So... I agree with Trond's NACK, but for
different reasons).

>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>> index 738d564ca4ce..754f4e9ff4a2 100644
>>> --- a/fs/nfsd/vfs.c
>>> +++ b/fs/nfsd/vfs.c
>>> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>> 	__be32 err;
>>> 
>>> 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
>>> +
>>> +	if (unlikely(offset + *count > NFS_OFFSET_MAX))
>>> +		*count = NFS_OFFSET_MAX - offset;
>> 
>> Can @offset ever be larger than NFS_OFFSET_MAX?
> 
> We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`.
> (should it have been `>` rather?).

Don't think so, a zero-byte READ should be valid.

However it's rather interesting that it does not use
NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses
NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX?

--
Chuck Lever

Trond Myklebust Jan. 22, 2022, 6:27 p.m. UTC | #4

On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote:
> 
> > On Jan 22, 2022, at 7:47 AM, Dan Aloni <dan.aloni@vastdata.com> wrote:
> > 
> > On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
> > > > On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> wrote:
> > > > 
> > > > Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
> > > > RPC read layers"), a read of 0xfff is aligned up to server rsize of
> > > > 0x1000.
> > > > 
> > > > As a result, in a test where the server has a file of size
> > > > 0x7fffffffffffffff, and the client tries to read from the offset
> > > > 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
> > > > returns an NFS code of EINVAL to the client. The client as a result
> > > > indefinitely retries the request.
> > > 
> > > An infinite loop in this case is a client bug.
> > > 
> > > Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
> > > to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
> > > 5661 permits the NFSv4 READ operation to return
> > > NFS4ERR_INVAL.
> > > 
> > > Was the client side fix for this issue rejected?
> > 
> > Yeah, see Trond's response in
> > 
> >   https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/
> > 
> > So are there bugs on both the client and the server?
> 
> Splitting hairs, but yes there are issues on both sides
> IMO. Bad behavior due to bugs on both sides is actually
> not uncommon.
> 
> Trond is correct that the server is not dealing totally
> correctly with the range of values in a READ request.
> 
> However, as I pointed out, the specification permits NFS
> servers to return NFS[34]ERR_INVAL on READ. And in fact,
> there is already code in the NFSv4 READ path that returns
> INVAL, for example:
> 
>         if (read->rd_offset >= OFFSET_MAX)
>                 return nfserr_inval;
> 
> I'm not sure the specifications describe precisely when
> the server /must/ return INVAL, but the client needs to
> be prepared to handle it reasonably. If INVAL results in
> an infinite loop, then that's a client bug.
> 
> IMO changing the alignment for that case is a band-aid.
> The underlying looping behavior is what is the root
> problem. (So... I agree with Trond's NACK, but for
> different reasons).

If I'm reading Dan's test case correctly, the client is trying to read
a full page of 0x1000 bytes starting at offset 0x7ffffffffffff000.
That means the end offset for that read is
0x7ffffffffffff000 + 0x1000 - 1 = 0x7fffffffffffffff.

IOW: as far as the server is concerned, there is no loff_t overflow on
either the start or end offset and so there is no reason for it to
return NFS4ERR_INVAL.

Dan Aloni Jan. 22, 2022, 7:01 p.m. UTC | #5

On Sat, Jan 22, 2022 at 05:05:49PM +0000, Chuck Lever III wrote:
> >>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> >>> index 738d564ca4ce..754f4e9ff4a2 100644
> >>> --- a/fs/nfsd/vfs.c
> >>> +++ b/fs/nfsd/vfs.c
> >>> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >>> 	__be32 err;
> >>> 
> >>> 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
> >>> +
> >>> +	if (unlikely(offset + *count > NFS_OFFSET_MAX))
> >>> +		*count = NFS_OFFSET_MAX - offset;
> >> 
> >> Can @offset ever be larger than NFS_OFFSET_MAX?
> > 
> > We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`.
> > (should it have been `>` rather?).
> 
> Don't think so, a zero-byte READ should be valid.

Makes sense. BTW, we have a `(argp->offset > NFS_OFFSET_MAX)` check
resulting in EINVAL under `nfsd3_proc_commit`. Does it apply to writes
as well?

> However it's rather interesting that it does not use
> NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses
> NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX?

NFS_OFFSET_MAX was introduced in v2.3.31, before `OFFSET_MAX` was
moved to a header file. That explains the comment above it, which has
been outdated for quite a while:

    /*
     * This is really a general kernel constant, but since nothing like
     * this is defined in the kernel headers, I have to do it here.
     */
    #define NFS_OFFSET_MAX		((__s64)((~(__u64)0) >> 1))

`OFFSET_MAX` in linux/fs.h was introduced in v2.3.99pre4. It seems
`OFFSET_MAX` always corresponds to a 64-bit loff_t, so the two look
interchangeable to me.

Chuck Lever III Jan. 22, 2022, 8:15 p.m. UTC | #6

> On Jan 22, 2022, at 1:27 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote:
>> 
>>> On Jan 22, 2022, at 7:47 AM, Dan Aloni <dan.aloni@vastdata.com> wrote:
>>> 
>>> On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
>>>>> On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> wrote:
>>>>> 
>>>>> Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
>>>>> RPC read layers"), a read of 0xfff is aligned up to server rsize of
>>>>> 0x1000.
>>>>> 
>>>>> As a result, in a test where the server has a file of size
>>>>> 0x7fffffffffffffff, and the client tries to read from the offset
>>>>> 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
>>>>> returns an NFS code of EINVAL to the client. The client as a result
>>>>> indefinitely retries the request.
>>>> 
>>>> An infinite loop in this case is a client bug.
>>>> 
>>>> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
>>>> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
>>>> 5661 permits the NFSv4 READ operation to return
>>>> NFS4ERR_INVAL.
>>>> 
>>>> Was the client side fix for this issue rejected?
>>> 
>>> Yeah, see Trond's response in
>>> 
>>>   https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/
>>> 
>>> So are there bugs on both the client and the server?
>> 
>> Splitting hairs, but yes there are issues on both sides
>> IMO. Bad behavior due to bugs on both sides is actually
>> not uncommon.
>> 
>> Trond is correct that the server is not dealing totally
>> correctly with the range of values in a READ request.
>> 
>> However, as I pointed out, the specification permits NFS
>> servers to return NFS[34]ERR_INVAL on READ. And in fact,
>> there is already code in the NFSv4 READ path that returns
>> INVAL, for example:
>> 
>>         if (read->rd_offset >= OFFSET_MAX)
>>                 return nfserr_inval;
>> 
>> I'm not sure the specifications describe precisely when
>> the server /must/ return INVAL, but the client needs to
>> be prepared to handle it reasonably. If INVAL results in
>> an infinite loop, then that's a client bug.
>> 
>> IMO changing the alignment for that case is a band-aid.
>> The underlying looping behavior is what is the root
>> problem. (So... I agree with Trond's NACK, but for
>> different reasons).
> 
> If I'm reading Dan's test case correctly, the client is trying to read
> a full page of 0x1000 bytes starting at offset 0x7ffffffffffff000.
> That means the end offset for that read is
> 0x7ffffffffffff000 + 0x1000 - 1 = 0x7fffffffffffffff.
> 
> IOW: as far as the server is concerned, there is no loff_t overflow on
> either the start or end offset and so there is no reason for it to
> return NFS4ERR_INVAL.

Yep, I agree there's server misbehavior, and I think Dan's
server fix is on point.

I would like to know why the client is looping, though. INVAL
is a valid response the Linux server already uses in other
cases and by itself should not trigger a READ retry.

After checking the relevant XDR definitions, an NFS READ error
response doesn't include the EOF flag, so I'm a little mystified
why the client would need to retry after receiving INVAL.


--
Chuck Lever

Trond Myklebust Jan. 22, 2022, 8:30 p.m. UTC | #7

On Sat, 2022-01-22 at 20:15 +0000, Chuck Lever III wrote:
> 
> 
> > On Jan 22, 2022, at 1:27 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> > 
> > On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote:
> > > 
> > > > On Jan 22, 2022, at 7:47 AM, Dan Aloni <dan.aloni@vastdata.com> wrote:
> > > > 
> > > > On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
> > > > > > On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> wrote:
> > > > > > 
> > > > > > Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
> > > > > > RPC read layers"), a read of 0xfff is aligned up to server rsize of
> > > > > > 0x1000.
> > > > > > 
> > > > > > As a result, in a test where the server has a file of size
> > > > > > 0x7fffffffffffffff, and the client tries to read from the offset
> > > > > > 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
> > > > > > returns an NFS code of EINVAL to the client. The client as a result
> > > > > > indefinitely retries the request.
> > > > > 
> > > > > An infinite loop in this case is a client bug.
> > > > > 
> > > > > Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
> > > > > to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
> > > > > 5661 permits the NFSv4 READ operation to return
> > > > > NFS4ERR_INVAL.
> > > > > 
> > > > > Was the client side fix for this issue rejected?
> > > > 
> > > > Yeah, see Trond's response in
> > > > 
> > > >   https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/
> > > > 
> > > > So are there bugs on both the client and the server?
> > > 
> > > Splitting hairs, but yes there are issues on both sides
> > > IMO. Bad behavior due to bugs on both sides is actually
> > > not uncommon.
> > > 
> > > Trond is correct that the server is not dealing totally
> > > correctly with the range of values in a READ request.
> > > 
> > > However, as I pointed out, the specification permits NFS
> > > servers to return NFS[34]ERR_INVAL on READ. And in fact,
> > > there is already code in the NFSv4 READ path that returns
> > > INVAL, for example:
> > > 
> > >         if (read->rd_offset >= OFFSET_MAX)
> > >                 return nfserr_inval;
> > > 
> > > I'm not sure the specifications describe precisely when
> > > the server /must/ return INVAL, but the client needs to
> > > be prepared to handle it reasonably. If INVAL results in
> > > an infinite loop, then that's a client bug.
> > > 
> > > IMO changing the alignment for that case is a band-aid.
> > > The underlying looping behavior is what is the root
> > > problem. (So... I agree with Trond's NACK, but for
> > > different reasons).
> > 
> > If I'm reading Dan's test case correctly, the client is trying to read
> > a full page of 0x1000 bytes starting at offset 0x7ffffffffffff000.
> > That means the end offset for that read is
> > 0x7ffffffffffff000 + 0x1000 - 1 = 0x7fffffffffffffff.
> > 
> > IOW: as far as the server is concerned, there is no loff_t overflow on
> > either the start or end offset and so there is no reason for it to
> > return NFS4ERR_INVAL.
> 
> Yep, I agree there's server misbehavior, and I think Dan's
> server fix is on point.
> 
> I would like to know why the client is looping, though. INVAL
> is a valid response the Linux server already uses in other
> cases and by itself should not trigger a READ retry.
> 
> After checking the relevant XDR definitions, an NFS READ error
> response doesn't include the EOF flag, so I'm a little mystified
> why the client would need to retry after receiving INVAL.

While we could certainly add that error to nfs_error_is_fatal(), the
question is why the client should need to handle NFS4ERR_INVAL if it is
sending valid arguments?

15.1.1.4.  NFS4ERR_INVAL (Error Code 22)

   The arguments for this operation are not valid for some reason, even
   though they do match those specified in the XDR definition for the
   request.


Sure... What does that mean, and what do I do?

Chuck Lever III Jan. 22, 2022, 8:33 p.m. UTC | #8

> On Jan 22, 2022, at 2:01 PM, Dan Aloni <dan.aloni@vastdata.com> wrote:
> 
> On Sat, Jan 22, 2022 at 05:05:49PM +0000, Chuck Lever III wrote:
>>>>> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
>>>>> index 738d564ca4ce..754f4e9ff4a2 100644
>>>>> --- a/fs/nfsd/vfs.c
>>>>> +++ b/fs/nfsd/vfs.c
>>>>> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>>>>> 	__be32 err;
>>>>> 
>>>>> 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
>>>>> +
>>>>> +	if (unlikely(offset + *count > NFS_OFFSET_MAX))
>>>>> +		*count = NFS_OFFSET_MAX - offset;
>>>> 
>>>> Can @offset ever be larger than NFS_OFFSET_MAX?
>>> 
>>> We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`.
>>> (should it have been `>` rather?).
>> 
>> Don't think so, a zero-byte READ should be valid.
> 
> Makes sense. BTW, we have a `(argp->offset > NFS_OFFSET_MAX)` check
> resulting in EINVAL under `nfsd3_proc_commit`. Does it apply to writes
> as well?

Geez, that's a whole 'nother can of worms.

RFC 1813 section 3.3.21 does not list NFS3ERR_INVAL, and does
not discuss what to do if the commit argument values are
outside the range which the server or local filesystem
supports.

RFC 8881 section 15.2 (Table 6) does not list NFS4ERR_INVAL
as a valid status code for the COMMIT operation, and likewise
section 18.3 does not discuss how the server should respond
when the commit argument values are invalid.

Aside from nfsd3_proc_commit, nfsd_commit() is used by NFSv3
and NFSv4, and it has:

        __be32                  err = nfserr_inval;

        if (offset < 0)
                goto out;
        if (count != 0) {
                end = offset + (loff_t)count - 1;
                if (end < offset)
                        goto out;
        }

which I think is going to be problematic. But no-one has
complained, so it's safe to defer changes here to another
patch, IMO.


>> However it's rather interesting that it does not use
>> NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses
>> NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX?
> 
> NFS_OFFSET_MAX was introduced in v2.3.31, before `OFFSET_MAX` was
> moved to a header file. That explains the comment above it, which has
> been outdated for quite a while:
> 
>    /*
>     * This is really a general kernel constant, but since nothing like
>     * this is defined in the kernel headers, I have to do it here.
>     */
>    #define NFS_OFFSET_MAX		((__s64)((~(__u64)0) >> 1))
> 
> `OFFSET_MAX` in linux/fs.h was introduced in v2.3.99pre4. It seems
> `OFFSET_MAX` always corresponds to a 64-bit loff_t, so the two look
> interchangeable to me.

For now, use OFFSET_MAX in the NFSv4 paths and NFS_OFFSET_MAX in
the NFSv3 paths. At some point someone can propose a clean-up that
replaces NFS_OFFSET_MAX with OFFSET_MAX.


--
Chuck Lever

Chuck Lever III Jan. 23, 2022, 5:35 p.m. UTC | #9

> On Jan 22, 2022, at 3:30 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Sat, 2022-01-22 at 20:15 +0000, Chuck Lever III wrote:
>> 
>>> On Jan 22, 2022, at 1:27 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>>> 
>>> On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote:
>>>> 
>>>>> On Jan 22, 2022, at 7:47 AM, Dan Aloni <dan.aloni@vastdata.com> wrote:
>>>>> 
>>>>> On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
>>>>>>> On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> wrote:
>>>>>>> 
>>>>>>> Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
>>>>>>> RPC read layers"), a read of 0xfff is aligned up to server rsize of
>>>>>>> 0x1000.
>>>>>>> 
>>>>>>> As a result, in a test where the server has a file of size
>>>>>>> 0x7fffffffffffffff, and the client tries to read from the offset
>>>>>>> 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
>>>>>>> returns an NFS code of EINVAL to the client. The client as a result
>>>>>>> indefinitely retries the request.
>>>>>> 
>>>>>> An infinite loop in this case is a client bug.
>>>>>> 
>>>>>> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
>>>>>> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
>>>>>> 5661 permits the NFSv4 READ operation to return
>>>>>> NFS4ERR_INVAL.
>>>>>> 
>>>>>> Was the client side fix for this issue rejected?
>>>>> 
>>>>> Yeah, see Trond's response in
>>>>> 
>>>>>   https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/
>>>>> 
>>>>> So are there bugs on both the client and the server?
>>>> 
>>>> Splitting hairs, but yes there are issues on both sides
>>>> IMO. Bad behavior due to bugs on both sides is actually
>>>> not uncommon.
>>>> 
>>>> Trond is correct that the server is not dealing totally
>>>> correctly with the range of values in a READ request.
>>>> 
>>>> However, as I pointed out, the specification permits NFS
>>>> servers to return NFS[34]ERR_INVAL on READ. And in fact,
>>>> there is already code in the NFSv4 READ path that returns
>>>> INVAL, for example:
>>>> 
>>>>         if (read->rd_offset >= OFFSET_MAX)
>>>>                 return nfserr_inval;
>>>> 
>>>> I'm not sure the specifications describe precisely when
>>>> the server /must/ return INVAL, but the client needs to
>>>> be prepared to handle it reasonably. If INVAL results in
>>>> an infinite loop, then that's a client bug.
>>>> 
>>>> IMO changing the alignment for that case is a band-aid.
>>>> The underlying looping behavior is what is the root
>>>> problem. (So... I agree with Trond's NACK, but for
>>>> different reasons).
>>> 
>>> If I'm reading Dan's test case correctly, the client is trying to read
>>> a full page of 0x1000 bytes starting at offset 0x7ffffffffffff000.
>>> That means the end offset for that read is
>>> 0x7ffffffffffff000 + 0x1000 - 1 = 0x7fffffffffffffff.
>>> 
>>> IOW: as far as the server is concerned, there is no loff_t overflow on
>>> either the start or end offset and so there is no reason for it to
>>> return NFS4ERR_INVAL.
>> 
>> Yep, I agree there's server misbehavior, and I think Dan's
>> server fix is on point.
>> 
>> I would like to know why the client is looping, though. INVAL
>> is a valid response the Linux server already uses in other
>> cases and by itself should not trigger a READ retry.
>> 
>> After checking the relevant XDR definitions, an NFS READ error
>> response doesn't include the EOF flag, so I'm a little mystified
>> why the client would need to retry after receiving INVAL.
> 
> While we could certainly add that error to nfs_error_is_fatal(), the
> question is why the client should need to handle NFS4ERR_INVAL if it is
> sending valid arguments?

As I said:

I agree that Dan's test case is sending values in a range
that NFSD should handle without error. That does need to
be fixed.

However, there are other instances where NFSD returns INVAL
to a READ (and it has done so for a long while). Those cases
really mustn't trigger an unterminating loop, especially
since a ^C is not likely to unblock the application. That's
why I'm still concerned about behavior when a server returns
INVAL on a READ.


> 15.1.1.4.  NFS4ERR_INVAL (Error Code 22)
> 
>   The arguments for this operation are not valid for some reason, even
>   though they do match those specified in the XDR definition for the
>   request.
> 
> 
> Sure... What does that mean, and what do I do?

Let me try to paraphrase:

A. RFC 1813 and 8881 permit servers to return INVAL on READ,
   but do not specify under which conditions to use it. This
   ambiguity might be reason for a server implementation to
   avoid that status code with READ. Have you considered
   filing an errata?

B. Though the RFCs permit servers to return INVAL on READ,
   the Linux NFS client does not support it. The client is
   not spec-compliant in this regard, but that's because of
   the ambiguity described in A.

C. Therefore the Linux NFS client treats INVAL on READ as
   unexpected input.

I claim that when confronted with unexpected input (of any
form) a "good quality" client implementation should avoid
pathological behavior like unterminating loops. That
behavior is both an attack surface and potentially a
problem if the client has to be rebooted to fully recover.

The specific behavior of returning INVAL on READ is being
addressed in the Linux server, but not root-causing and
addressing the client's response to this behavior leaves a
large set of potential issues in this same class.


--
Chuck Lever
