Message ID | 20220121185023.260128-1-dan.aloni@vastdata.com (mailing list archive) |
---|---|
State | New, archived |
Series | NFSD: trim reads past NFS_OFFSET_MAX |
Hi Dan!

NFS server patches should be sent to me these days.

$ scripts/get_maintainer.pl fs/nfsd
Chuck Lever <chuck.lever@oracle.com> (supporter:KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
linux-nfs@vger.kernel.org (open list:KERNEL NFSD, SUNRPC, AND LOCKD SERVERS)
linux-kernel@vger.kernel.org (open list)

> On Jan 21, 2022, at 1:50 PM, Dan Aloni <dan.aloni@vastdata.com> wrote:
>
> Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
> RPC read layers"), a read of 0xfff is aligned up to server rsize of
> 0x1000.
>
> As a result, in a test where the server has a file of size
> 0x7fffffffffffffff, and the client tries to read from the offset
> 0x7ffffffffffff000, the read causes loff_t overflow in the server and it
> returns an NFS code of EINVAL to the client. The client as a result
> indefinitely retries the request.

An infinite loop in this case is a client bug.

Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
5661 permits the NFSv4 READ operation to return
NFS4ERR_INVAL.

Was the client side fix for this issue rejected?

> This fixes the issue at server side by trimming reads past NFS_OFFSET_MAX.

It's OK for the server to return a short READ in this case,
so I will indeed consider a change to make that happen. But
see below for comments specific to this patch.

> Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers")
> Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
> ---
> fs/nfsd/vfs.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 738d564ca4ce..754f4e9ff4a2 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> 	__be32 err;
>
> 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
> +
> +	if (unlikely(offset + *count > NFS_OFFSET_MAX))
> +		*count = NFS_OFFSET_MAX - offset;

Can @offset ever be larger than NFS_OFFSET_MAX?

Does this check have any effect on NFSv4 READ operations?

> +
> 	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
> 	if (err)
> 		return err;
> --
> 2.23.0
>

--
Chuck Lever
On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
> NFS server patches should be sent to me these days.

Thanks, will remember this next time.

> An infinite loop in this case is a client bug.
>
> Section 3.3.6 of RFC 1813 permits the NFSv3 READ procedure
> to return NFS3ERR_INVAL. The READ entry in Table 6 of RFC
> 5661 permits the NFSv4 READ operation to return
> NFS4ERR_INVAL.
>
> Was the client side fix for this issue rejected?

Yeah, see Trond's response in

https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/

So it is both a client and a server bug?

> > +	if (unlikely(offset + *count > NFS_OFFSET_MAX))
> > +		*count = NFS_OFFSET_MAX - offset;
>
> Can @offset ever be larger than NFS_OFFSET_MAX?

We have this check in `nfsd4_read`: `(read->rd_offset >= OFFSET_MAX)`
(should it have been `>` instead?). It seems to be missing from NFSv3,
so I should add it there.

> Does this check have any effect on NFSv4 READ operations?

Indeed it doesn't - my expanded testing shows the patch only fixes the
issue for NFSv3. Will send an updated patch.
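For readers following the two review questions above, here is a minimal user-space sketch of a trim that also covers an out-of-range @offset. It is not the kernel code and not the updated patch: `sketch_clamp_read` and `SKETCH_OFFSET_MAX` are hypothetical names, and the sketch assumes that an on-the-wire offset above the maximum shows up as a negative signed 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for NFS_OFFSET_MAX / OFFSET_MAX (assumed to be the s64 maximum). */
#define SKETCH_OFFSET_MAX INT64_MAX

/*
 * Hypothetical helper, not kernel code: trim *count so that the read
 * starting at offset does not extend past SKETCH_OFFSET_MAX.
 * Returns 0 on success, or -1 if offset itself is out of range.
 */
static int sketch_clamp_read(int64_t offset, uint64_t *count)
{
	uint64_t room;

	if (offset < 0)
		return -1;

	/* Distance from offset to the maximum offset; this cannot overflow. */
	room = (uint64_t)SKETCH_OFFSET_MAX - (uint64_t)offset;
	if (*count > room)
		*count = room;

	/* Postcondition: the trimmed read stays within range. */
	assert((uint64_t)offset + *count <= (uint64_t)SKETCH_OFFSET_MAX);
	return 0;
}
```

With the values from the test case (offset 0x7ffffffffffff000, count 0x1000) the sketch trims the count to 0xfff, the same result as the `NFS_OFFSET_MAX - offset` expression in the patch.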
> On Jan 22, 2022, at 7:47 AM, Dan Aloni <dan.aloni@vastdata.com> wrote:
>
> On Fri, Jan 21, 2022 at 10:32:28PM +0000, Chuck Lever III wrote:
>> Was the client side fix for this issue rejected?
>
> Yeah, see Trond's response in
>
> https://lore.kernel.org/linux-nfs/fa9974724216c43f9bdb3fd39555d398fde11e59.camel@hammerspace.com/
>
> So it is both a client and a server bug?

Splitting hairs, but yes there are issues on both sides
IMO. Bad behavior due to bugs on both sides is actually
not uncommon.

Trond is correct that the server is not dealing totally
correctly with the range of values in a READ request.

However, as I pointed out, the specification permits NFS
servers to return NFS[34]ERR_INVAL on READ. And in fact,
there is already code in the NFSv4 READ path that returns
INVAL, for example:

	if (read->rd_offset >= OFFSET_MAX)
		return nfserr_inval;

I'm not sure the specifications describe precisely when
the server /must/ return INVAL, but the client needs to
be prepared to handle it reasonably. If INVAL results in
an infinite loop, then that's a client bug.

IMO changing the alignment for that case is a band-aid.
The underlying looping behavior is the root problem.
(So... I agree with Trond's NACK, but for different
reasons).

>>> +	if (unlikely(offset + *count > NFS_OFFSET_MAX))
>>> +		*count = NFS_OFFSET_MAX - offset;
>>
>> Can @offset ever be larger than NFS_OFFSET_MAX?
>
> We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`.
> (should it have been `>` rather?).

Don't think so, a zero-byte READ should be valid.

However it's rather interesting that it does not use
NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses
NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX?

--
Chuck Lever
On Sat, 2022-01-22 at 17:05 +0000, Chuck Lever III wrote:
> Trond is correct that the server is not dealing totally
> correctly with the range of values in a READ request.
>
> However, as I pointed out, the specification permits NFS
> servers to return NFS[34]ERR_INVAL on READ. And in fact,
> there is already code in the NFSv4 READ path that returns
> INVAL, for example:
>
> 	if (read->rd_offset >= OFFSET_MAX)
> 		return nfserr_inval;
>
> I'm not sure the specifications describe precisely when
> the server /must/ return INVAL, but the client needs to
> be prepared to handle it reasonably. If INVAL results in
> an infinite loop, then that's a client bug.
>
> IMO changing the alignment for that case is a band-aid.
> The underlying looping behavior is what is the root
> problem. (So... I agree with Trond's NACK, but for
> different reasons).

If I'm reading Dan's test case correctly, the client is trying to read
a full page of 0x1000 bytes starting at offset 0x7ffffffffffff000.
That means the end offset for that read is 0x7ffffffffffff000 + 0x1000
- 1 = 0x7fffffffffffffff.

IOW: as far as the server is concerned, there is no loff_t overflow on
either the start or end offset and so there is no reason for it to
return NFS4ERR_INVAL.
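The arithmetic in this message can be checked with a small user-space sketch. The helper names are hypothetical, and the second helper reflects only one plausible reading of where a server might trip: the last byte of the read fits in a signed 64-bit offset, while the exclusive end (offset + count) would be 2^63 and does not.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Inclusive end offset of a read: offset + count - 1. Subtracting first
 * keeps the computation from wrapping when the last byte is still in range.
 */
static uint64_t sketch_last_byte(uint64_t offset, uint64_t count)
{
	assert(count > 0);
	return offset + (count - 1);
}

/*
 * True if the exclusive end (offset + count) would not fit in a signed
 * 64-bit offset. Assumes offset is already within the signed range.
 */
static int sketch_exclusive_end_overflows(uint64_t offset, uint64_t count)
{
	assert(offset <= (uint64_t)INT64_MAX);
	return count > (uint64_t)INT64_MAX - offset;
}
```

For the test case, `sketch_last_byte(0x7ffffffffffff000, 0x1000)` gives 0x7fffffffffffffff, while `sketch_exclusive_end_overflows()` reports true for a count of 0x1000 and false for 0xfff.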
On Sat, Jan 22, 2022 at 05:05:49PM +0000, Chuck Lever III wrote:
> >> Can @offset ever be larger than NFS_OFFSET_MAX?
> >
> > We have this check in `nfsd4_read`, `(read->rd_offset >= OFFSET_MAX)`.
> > (should it have been `>` rather?).
>
> Don't think so, a zero-byte READ should be valid.

Makes sense. BTW, we have a `(argp->offset > NFS_OFFSET_MAX)` check
resulting in EINVAL under `nfsd3_proc_commit`. Does it apply to writes
as well?

> However it's rather interesting that it does not use
> NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses
> NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX?

NFS_OFFSET_MAX was introduced in v2.3.31, before `OFFSET_MAX` was
moved to a header file. That explains the comment above it, which has
been outdated for quite a while:

/*
 * This is really a general kernel constant, but since nothing like
 * this is defined in the kernel headers, I have to do it here.
 */
#define NFS_OFFSET_MAX		((__s64)((~(__u64)0) >> 1))

`OFFSET_MAX` in linux/fs.h was introduced in v2.3.99pre4. `OFFSET_MAX`
seems to always correspond to a 64-bit loff_t, so the two look
interchangeable to me.
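The claim that the two constants are interchangeable can be sanity-checked in user space. This is only a sketch: it copies the macro expression quoted above, substitutes the `<stdint.h>` types for `__s64` and `__u64`, and compares against `INT64_MAX`, which is what a maximum for a signed 64-bit `loff_t` would be under the assumption stated in the message. `SKETCH_NFS_OFFSET_MAX` is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as the quoted macro, with int64_t/uint64_t standing in
 * for the kernel's __s64/__u64. */
#define SKETCH_NFS_OFFSET_MAX ((int64_t)((~(uint64_t)0) >> 1))

/* Returns nonzero if the macro evaluates to the signed 64-bit maximum. */
static int sketch_matches_int64_max(void)
{
	return SKETCH_NFS_OFFSET_MAX == INT64_MAX;
}
```

All bits set, shifted right by one, leaves the top bit clear and the other 63 bits set, which is 0x7fffffffffffffff.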
> On Jan 22, 2022, at 1:27 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> If I'm reading Dan's test case correctly, the client is trying to read
> a full page of 0x1000 bytes starting at offset 0x7ffffffffffff000.
> That means the end offset for that read is 0x7ffffffffffff000 + 0x1000
> - 1 = 0x7fffffffffffffff.
>
> IOW: as far as the server is concerned, there is no loff_t overflow on
> either the start or end offset and so there is no reason for it to
> return NFS4ERR_INVAL.

Yep, I agree there's server misbehavior, and I think Dan's
server fix is on point.

I would like to know why the client is looping, though. INVAL
is a valid response the Linux server already uses in other
cases and by itself should not trigger a READ retry.

After checking the relevant XDR definitions, an NFS READ error
response doesn't include the EOF flag, so I'm a little mystified
why the client would need to retry after receiving INVAL.

--
Chuck Lever
On Sat, 2022-01-22 at 20:15 +0000, Chuck Lever III wrote:
> Yep, I agree there's server misbehavior, and I think Dan's
> server fix is on point.
>
> I would like to know why the client is looping, though. INVAL
> is a valid response the Linux server already uses in other
> cases and by itself should not trigger a READ retry.
>
> After checking the relevant XDR definitions, an NFS READ error
> response doesn't include the EOF flag, so I'm a little mystified
> why the client would need to retry after receiving INVAL.

While we could certainly add that error to nfs_error_is_fatal(), the
question is why the client should need to handle NFS4ERR_INVAL if it is
sending valid arguments?

15.1.1.4.  NFS4ERR_INVAL (Error Code 22)

   The arguments for this operation are not valid for some reason, even
   though they do match those specified in the XDR definition for the
   request.

Sure... What does that mean, and what do I do?
> On Jan 22, 2022, at 2:01 PM, Dan Aloni <dan.aloni@vastdata.com> wrote:
>
> On Sat, Jan 22, 2022 at 05:05:49PM +0000, Chuck Lever III wrote:
>> Don't think so, a zero-byte READ should be valid.
>
> Make sense. BTW, we have a `(argp->offset > NFS_OFFSET_MAX)` check
> resulting in EINVAL under `nfsd3_proc_commit`. Does it apply to writes
> as well?

Geez, that's a whole 'nother can of worms.

RFC 1813 section 3.3.21 does not list NFS3ERR_INVAL, and
does not discuss what to do if the commit argument values
are outside the range which the server or local filesystem
supports.

RFC 8881 section 15.2 (Table 6) does not list NFS4ERR_INVAL
as a valid status code for the COMMIT operation, and likewise
section 18.3 does not discuss how the server should respond
when the commit argument values are invalid.

Aside from nfsd3_proc_commit, nfsd_commit() is used by
NFSv3 and NFSv4, and it has:

	__be32			err = nfserr_inval;

	if (offset < 0)
		goto out;
	if (count != 0) {
		end = offset + (loff_t)count - 1;
		if (end < offset)
			goto out;
	}

which I think is going to be problematic. But no-one has
complained, so it's safe to defer changes here to another
patch, IMO.

>> However it's rather interesting that it does not use
>> NFS_OFFSET_MAX here. Does anyone know why NFSv3 uses
>> NFS_OFFSET_MAX but NFSv4 and NLM use OFFSET_MAX?
>
> NFS_OFFSET_MAX introduced in v2.3.31, which is before `OFFSET_MAX` was
> moved to a header file, which explains the comment on top of it,
> outdated for quite awhile:
>
> /*
>  * This is really a general kernel constant, but since nothing like
>  * this is defined in the kernel headers, I have to do it here.
>  */
> #define NFS_OFFSET_MAX		((__s64)((~(__u64)0) >> 1))
>
> And `OFFSET_MAX` in linux/fs.h was introduced in v2.3.99pre4. Seems
> `OFFSET_MAX` always corresponds to 64-bit loff_t, so they seem
> inter-changeable to me.

For now, add OFFSET_MAX in the NFSv4 paths, and use
NFS_OFFSET_MAX in the NFSv3 paths, and at some point
someone can propose a clean up to replace NFS_OFFSET_MAX
with OFFSET_MAX.

--
Chuck Lever
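To make the nfsd_commit() range check quoted in this message easier to reason about, here is a hypothetical user-space analogue. It is a sketch, not the kernel function: `sketch_commit_range` is a made-up name, the 32-bit `count` is an assumption based on the on-the-wire COMMIT count, and the test is done before the addition because signed overflow is undefined in portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical analogue of the quoted range check. Returns 0 and stores the
 * inclusive end offset in *end when the range is representable, or -1 where
 * the quoted code would leave err set to nfserr_inval. As in the quoted
 * code, *end is only written when count is nonzero.
 */
static int sketch_commit_range(int64_t offset, uint32_t count, int64_t *end)
{
	assert(end != NULL);

	if (offset < 0)
		return -1;
	if (count != 0) {
		/* Check first: offset + (count - 1) must not exceed INT64_MAX. */
		if ((int64_t)count - 1 > INT64_MAX - offset)
			return -1;
		*end = offset + ((int64_t)count - 1);
	}
	return 0;
}
```

A commit of 0x1000 bytes at offset 0x7ffffffffffff000 is accepted with an end of 0x7fffffffffffffff, while any range whose last byte would lie beyond that is rejected.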
> On Jan 22, 2022, at 3:30 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> While we could certainly add that error to nfs_error_is_fatal(), the
> question is why the client should need to handle NFS4ERR_INVAL if it is
> sending valid arguments?

As I said: I agree that Dan's test case is sending values in
a range that NFSD should handle without error. That does need
to be fixed.

However, there are other instances where NFSD returns INVAL to
a READ (and it has done so for a long while). Those cases
really mustn't trigger a non-terminating loop, especially since
a ^C is not likely to unblock the application. That's why I'm
still concerned about behavior when a server returns INVAL on
a READ.

> 15.1.1.4.  NFS4ERR_INVAL (Error Code 22)
>
>    The arguments for this operation are not valid for some reason, even
>    though they do match those specified in the XDR definition for the
>    request.
>
> Sure... What does that mean, and what do I do?

Let me try to paraphrase:

A. RFC 1813 and 8881 permit servers to return INVAL on READ,
   but do not specify under which conditions to use it. This
   ambiguity might be reason for a server implementation to
   avoid that status code with READ. Have you considered filing
   an erratum?

B. Though the RFCs permit servers to return INVAL on READ, the
   Linux NFS client does not support it. The client is not
   spec-compliant in this regard, but that's because of the
   ambiguity described in A.

C. Therefore the Linux NFS client treats INVAL on READ as
   unexpected input.

I claim that when confronted with unexpected input (of any
form) a "good quality" client implementation should avoid
pathological behavior like non-terminating loops.... That
behavior is both an attack surface and potentially a problem
if the client has to be rebooted to fully recover.

The specific behavior of returning INVAL on READ is being
addressed in the Linux server, but not root-causing and
addressing the client's response to this behavior leaves a
large set of potential issues in this same class.

--
Chuck Lever
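The point about unexpected input can be illustrated with a generic retry loop. This is not how the Linux NFS client is structured; every name below is hypothetical, and the sketch only shows the property being asked for: statuses known to be transient are retried a bounded number of times, and anything else, including an unexpected INVAL, ends the loop immediately.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for one READ attempt. */
enum sketch_status { SKETCH_OK, SKETCH_RETRYABLE, SKETCH_INVAL };

/* Hypothetical callback that performs a single READ attempt. */
typedef enum sketch_status (*sketch_read_fn)(void *ctx);

/* Retry only transient failures, and never more than max_attempts times. */
static enum sketch_status sketch_read_with_retry(sketch_read_fn do_read,
						 void *ctx,
						 unsigned int max_attempts)
{
	enum sketch_status st = SKETCH_RETRYABLE;
	unsigned int i;

	assert(do_read != NULL);
	for (i = 0; i < max_attempts; i++) {
		st = do_read(ctx);
		if (st != SKETCH_RETRYABLE)
			break;	/* success or a non-transient error: stop */
	}
	return st;
}

/* Test double: always answers INVAL and counts how often it was called. */
static enum sketch_status sketch_always_inval(void *ctx)
{
	unsigned int *calls = ctx;

	(*calls)++;
	return SKETCH_INVAL;
}
```

With `sketch_always_inval` as the callback, the loop makes exactly one attempt and hands INVAL back to the caller instead of spinning.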
Due to change 8cfb9015280d ("NFS: Always provide aligned buffers to the
RPC read layers"), a read of 0xfff is aligned up to server rsize of
0x1000.

As a result, in a test where the server has a file of size
0x7fffffffffffffff, and the client tries to read from the offset
0x7ffffffffffff000, the read causes loff_t overflow in the server and it
returns an NFS code of EINVAL to the client. The client as a result
indefinitely retries the request.

This fixes the issue at server side by trimming reads past NFS_OFFSET_MAX.

Fixes: 8cfb9015280d ("NFS: Always provide aligned buffers to the RPC read layers")
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
---
 fs/nfsd/vfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 738d564ca4ce..754f4e9ff4a2 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1046,6 +1046,10 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	__be32 err;
 
 	trace_nfsd_read_start(rqstp, fhp, offset, *count);
+
+	if (unlikely(offset + *count > NFS_OFFSET_MAX))
+		*count = NFS_OFFSET_MAX - offset;
+
 	err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
 	if (err)
 		return err;