Message ID | 952eea7e97246870f83e7a4592e9338007dfe625.1700083991.git.bcodding@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] pNFS: Fix the pnfs block driver's calculation of layoutget size | expand |
Hi Ben, On Wed, Nov 15, 2023 at 4:34 PM Benjamin Coddington <bcodding@redhat.com> wrote: > > Now that we're calculating how large a remaining IO should be based > on the current request's offset, we no longer need to track bytes_left on > each struct nfs_direct_req. Drop the field, and clean up the direct > request tracepoints. I've been having problems with xfstests generic/465 on all NFS versions after applying this patch. Looking at wireshark, the client appears to be resending the same reads over and over again. Have you seen anything like this in your testing? Thanks, Anna > > Signed-off-by: Benjamin Coddington <bcodding@redhat.com> > --- > fs/nfs/direct.c | 4 ---- > fs/nfs/internal.h | 1 - > fs/nfs/nfstrace.h | 6 ++---- > 3 files changed, 2 insertions(+), 9 deletions(-) > > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c > index 5918c67dae0d..7167f588b1fc 100644 > --- a/fs/nfs/direct.c > +++ b/fs/nfs/direct.c > @@ -369,7 +369,6 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, > bytes -= req_len; > requested_bytes += req_len; > pos += req_len; > - dreq->bytes_left -= req_len; > } > nfs_direct_release_pages(pagevec, npages); > kvfree(pagevec); > @@ -441,7 +440,6 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter, > goto out; > > dreq->inode = inode; > - dreq->bytes_left = dreq->max_count = count; > dreq->io_start = iocb->ki_pos; > dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp)); > l_ctx = nfs_get_lock_context(dreq->ctx); > @@ -874,7 +872,6 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, > bytes -= req_len; > requested_bytes += req_len; > pos += req_len; > - dreq->bytes_left -= req_len; > > if (defer) { > nfs_mark_request_commit(req, NULL, &cinfo, 0); > @@ -981,7 +978,6 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, > goto out; > > dreq->inode = inode; > - dreq->bytes_left = dreq->max_count = count; > dreq->io_start = pos; > dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp)); > l_ctx = nfs_get_lock_context(dreq->ctx); > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h > index b1fa81c9dff6..e3722ce6722e 100644 > --- a/fs/nfs/internal.h > +++ b/fs/nfs/internal.h > @@ -936,7 +936,6 @@ struct nfs_direct_req { > loff_t io_start; /* Start offset for I/O */ > ssize_t count, /* bytes actually processed */ > max_count, /* max expected count */ > - bytes_left, /* bytes left to be sent */ > error; /* any reported error */ > struct completion completion; /* wait for i/o completion */ > > diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h > index 4e90ca531176..03cbc3893cef 100644 > --- a/fs/nfs/nfstrace.h > +++ b/fs/nfs/nfstrace.h > @@ -1539,7 +1539,6 @@ DECLARE_EVENT_CLASS(nfs_direct_req_class, > __field(u32, fhandle) > __field(loff_t, offset) > __field(ssize_t, count) > - __field(ssize_t, bytes_left) > __field(ssize_t, error) > __field(int, flags) > ), > @@ -1554,19 +1553,18 @@ DECLARE_EVENT_CLASS(nfs_direct_req_class, > __entry->fhandle = nfs_fhandle_hash(fh); > __entry->offset = dreq->io_start; > __entry->count = dreq->count; > - __entry->bytes_left = dreq->bytes_left; > __entry->error = dreq->error; > __entry->flags = dreq->flags; > ), > > TP_printk( > "error=%zd fileid=%02x:%02x:%llu fhandle=0x%08x " > - "offset=%lld count=%zd bytes_left=%zd flags=%s", > + "offset=%lld count=%zd flags=%s", > __entry->error, MAJOR(__entry->dev), > MINOR(__entry->dev), > (unsigned long long)__entry->fileid, > __entry->fhandle, __entry->offset, > - __entry->count, __entry->bytes_left, > + __entry->count, > nfs_show_direct_req_flags(__entry->flags) > ) > ); > -- > 2.41.0 >
On 16 Nov 2023, at 16:44, Anna Schumaker wrote: > Hi Ben, > > On Wed, Nov 15, 2023 at 4:34 PM Benjamin Coddington <bcodding@redhat.com> wrote: >> >> Now that we're calculating how large a remaining IO should be based >> on the current request's offset, we no longer need to track bytes_left on >> each struct nfs_direct_req. Drop the field, and clean up the direct >> request tracepoints. > > I've been having problems with xfstests generic/465 on all NFS > versions after applying this patch. Looking at wireshark, the client > appears to be resending the same reads over and over again. Have you > seen anything like this in your testing? I have generic/465 failing before and after these two patches on pNFS SCSI.. but at least it completes. If I run it without pNFS I can see the same thing.. it just sends the same reads over and over. I'll figure out why. Ben
On Thu, Nov 16, 2023 at 5:03 PM Benjamin Coddington <bcodding@redhat.com> wrote: > > On 16 Nov 2023, at 16:44, Anna Schumaker wrote: > > > Hi Ben, > > > > On Wed, Nov 15, 2023 at 4:34 PM Benjamin Coddington <bcodding@redhat.com> wrote: > >> > >> Now that we're calculating how large a remaining IO should be based > >> on the current request's offset, we no longer need to track bytes_left on > >> each struct nfs_direct_req. Drop the field, and clean up the direct > >> request tracepoints. > > > > I've been having problems with xfstests generic/465 on all NFS > > versions after applying this patch. Looking at wireshark, the client > > appears to be resending the same reads over and over again. Have you > > seen anything like this in your testing? > > I have generic/465 failing before and after these two patches on pNFS SCSI.. > but at least it completes. If I run it without pNFS I can see the same > thing.. it just sends the same reads over and over. I'll figure out why. Thanks! I have it failing normally as well, so that's expected. It's the hanging forever that's not :) Anna > > Ben >
On 16 Nov 2023, at 17:11, Anna Schumaker wrote: > On Thu, Nov 16, 2023 at 5:03 PM Benjamin Coddington <bcodding@redhat.com> wrote: >> >> On 16 Nov 2023, at 16:44, Anna Schumaker wrote: >> >>> Hi Ben, >>> >>> On Wed, Nov 15, 2023 at 4:34 PM Benjamin Coddington <bcodding@redhat.com> wrote: >>>> >>>> Now that we're calculating how large a remaining IO should be based >>>> on the current request's offset, we no longer need to track bytes_left on >>>> each struct nfs_direct_req. Drop the field, and clean up the direct >>>> request tracepoints. >>> >>> I've been having problems with xfstests generic/465 on all NFS >>> versions after applying this patch. Looking at wireshark, the client >>> appears to be resending the same reads over and over again. Have you >>> seen anything like this in your testing? >> >> I have generic/465 failing before and after these two patches on pNFS SCSI.. >> but at least it completes. If I run it without pNFS I can see the same >> thing.. it just sends the same reads over and over. I'll figure out why. > > Thanks! I have it failing normally as well, so that's expected. It's > the hanging forever that's not :) The direct read is returning 0 when there's data on the device. Oh, the problem is probably that patch drops the update of dreq->max_count, which I overlooked because of the double assignment. Shame on me. Ben
On 17 Nov 2023, at 6:21, Benjamin Coddington wrote: > On 16 Nov 2023, at 17:11, Anna Schumaker wrote: > >> On Thu, Nov 16, 2023 at 5:03 PM Benjamin Coddington <bcodding@redhat.com> wrote: >>> >>> On 16 Nov 2023, at 16:44, Anna Schumaker wrote: >>> >>>> Hi Ben, >>>> >>>> On Wed, Nov 15, 2023 at 4:34 PM Benjamin Coddington <bcodding@redhat.com> wrote: >>>>> >>>>> Now that we're calculating how large a remaining IO should be based >>>>> on the current request's offset, we no longer need to track bytes_left on >>>>> each struct nfs_direct_req. Drop the field, and clean up the direct >>>>> request tracepoints. >>>> >>>> I've been having problems with xfstests generic/465 on all NFS >>>> versions after applying this patch. Looking at wireshark, the client >>>> appears to be resending the same reads over and over again. Have you >>>> seen anything like this in your testing? >>> >>> I have generic/465 failing before and after these two patches on pNFS SCSI.. >>> but at least it completes. If I run it without pNFS I can see the same >>> thing.. it just sends the same reads over and over. I'll figure out why. >> >> Thanks! I have it failing normally as well, so that's expected. It's >> the hanging forever that's not :) > > The direct read is returning 0 when there's data on the device. > > Oh, the problem is probably that patch drops the update of dreq->max_count, > which I overlooked because of the double assignment. Shame on me. BTW - I think generic/465 makes bad assumptions about what read() should return for O_DIRECT, and that's why it fails on NFS. Basically it does a bunch of WRITEs and then READs and expects the same data coming back in the READs, but doesn't use O_SYNC. On the wire, the client is interleaving the READs and WRITEs. Ben
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 5918c67dae0d..7167f588b1fc 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -369,7 +369,6 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, bytes -= req_len; requested_bytes += req_len; pos += req_len; - dreq->bytes_left -= req_len; } nfs_direct_release_pages(pagevec, npages); kvfree(pagevec); @@ -441,7 +440,6 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter, goto out; dreq->inode = inode; - dreq->bytes_left = dreq->max_count = count; dreq->io_start = iocb->ki_pos; dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp)); l_ctx = nfs_get_lock_context(dreq->ctx); @@ -874,7 +872,6 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, bytes -= req_len; requested_bytes += req_len; pos += req_len; - dreq->bytes_left -= req_len; if (defer) { nfs_mark_request_commit(req, NULL, &cinfo, 0); @@ -981,7 +978,6 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, goto out; dreq->inode = inode; - dreq->bytes_left = dreq->max_count = count; dreq->io_start = pos; dreq->ctx = get_nfs_open_context(nfs_file_open_context(iocb->ki_filp)); l_ctx = nfs_get_lock_context(dreq->ctx); diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index b1fa81c9dff6..e3722ce6722e 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -936,7 +936,6 @@ struct nfs_direct_req { loff_t io_start; /* Start offset for I/O */ ssize_t count, /* bytes actually processed */ max_count, /* max expected count */ - bytes_left, /* bytes left to be sent */ error; /* any reported error */ struct completion completion; /* wait for i/o completion */ diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h index 4e90ca531176..03cbc3893cef 100644 --- a/fs/nfs/nfstrace.h +++ b/fs/nfs/nfstrace.h @@ -1539,7 +1539,6 @@ DECLARE_EVENT_CLASS(nfs_direct_req_class, __field(u32, fhandle) __field(loff_t, offset) __field(ssize_t, count) - __field(ssize_t, bytes_left) __field(ssize_t, error) __field(int, flags) ), @@ -1554,19 +1553,18 @@ DECLARE_EVENT_CLASS(nfs_direct_req_class, __entry->fhandle = nfs_fhandle_hash(fh); __entry->offset = dreq->io_start; __entry->count = dreq->count; - __entry->bytes_left = dreq->bytes_left; __entry->error = dreq->error; __entry->flags = dreq->flags; ), TP_printk( "error=%zd fileid=%02x:%02x:%llu fhandle=0x%08x " - "offset=%lld count=%zd bytes_left=%zd flags=%s", + "offset=%lld count=%zd flags=%s", __entry->error, MAJOR(__entry->dev), MINOR(__entry->dev), (unsigned long long)__entry->fileid, __entry->fhandle, __entry->offset, - __entry->count, __entry->bytes_left, + __entry->count, nfs_show_direct_req_flags(__entry->flags) ) );
Now that we're calculating how large a remaining IO should be based on the current request's offset, we no longer need to track bytes_left on each struct nfs_direct_req. Drop the field, and clean up the direct request tracepoints. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> --- fs/nfs/direct.c | 4 ---- fs/nfs/internal.h | 1 - fs/nfs/nfstrace.h | 6 ++---- 3 files changed, 2 insertions(+), 9 deletions(-)