Message ID | 3173328.1738024385@warthog.procyon.org.uk (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | netfs: Fix a number of read-retry hangs | expand |
On Mon, Jan 27, 2025 at 8:33 PM David Howells <dhowells@redhat.com> wrote: > > Hi Ihor, Marc, Steve, > > I think this patch should better fix the hangs that have been seen than Ihor's > previously tested fix (which I think isn't actually correct). > > David > --- > From: David Howells <dhowells@redhat.com> > > netfs: Fix a number of read-retry hangs > > Fix a number of hangs in the netfslib read-retry code, including: > > (1) netfs_reissue_read() doubles up the getting of references on > subrequests, thereby leaking the subrequest and causing inode eviction > to wait indefinitely. This can lead to the kernel reporting a hang in > the filesystem's evict_inode(). > > Fix this by removing the get from netfs_reissue_read() and adding one > to netfs_retry_read_subrequests() to deal with the one place that > didn't double up. > > (2) The loop in netfs_retry_read_subrequests() that retries a sequence of > failed subrequests doesn't record whether or not it retried the one > that the "subreq" pointer points to when it leaves the loop. It may > not if renegotiation/repreparation of the subrequests means that fewer > subrequests are needed to span the cumulative range of the sequence. > > Because it doesn't record this, the piece of code that discards > now-superfluous subrequests doesn't know whether it should discard the > one "subreq" points to - and so it doesn't. > > Fix this by noting whether the last subreq it examines is superfluous > and if it is, then getting rid of it and all subsequent subrequests. > > If that one one wasn't superfluous, then we would have tried to go > round the previous loop again and so there can be no further unretried > subrequests in the sequence. > > (3) netfs_retry_read_subrequests() gets yet an extra ref on any additional > subrequests it has to get because it ran out of ones it could reuse to > to renegotiation/repreparation shrinking the subrequests. > > Fix this by removing that extra ref. > > (4) In netfs_retry_reads(), it was using wait_on_bit() to wait for > NETFS_SREQ_IN_PROGRESS to be cleared on all subrequests in the > sequence - but netfs_read_subreq_terminated() is now using a wait > queue on the request instead and so this wait will never finish. > > Fix this by waiting on the wait queue instead. To make this work, a > new flag, NETFS_RREQ_RETRYING, is now set around the wait loop to tell > the wake-up code to wake up the wait queue rather than requeuing the > request's work item. > > Note that this flag replaces the NETFS_RREQ_NEED_RETRY flag which is > no longer used. > > (5) Whilst not strictly anything to do with the hang, > netfs_retry_read_subrequests() was also doubly incrementing the > subreq_counter and re-setting the debug index, leaving a gap in the > trace. This is also fixed. > > One of these hangs was observed with 9p and with cifs. Others were forced > by manual code injection into fs/afs/file.c. Firstly, afs_prepare_read() > was created to provide an changing pattern of maximum subrequest sizes: > > static int afs_prepare_read(struct netfs_io_subrequest *subreq) > { > struct netfs_io_request *rreq = subreq->rreq; > if (!S_ISREG(subreq->rreq->inode->i_mode)) > return 0; > if (subreq->retry_count < 20) > rreq->io_streams[0].sreq_max_len = > umax(200, 2222 - subreq->retry_count * 40); > else > rreq->io_streams[0].sreq_max_len = 3333; > return 0; > } > > and pointed to by afs_req_ops. Then the following: > > struct netfs_io_subrequest *subreq = op->fetch.subreq; > if (subreq->error == 0 && > S_ISREG(subreq->rreq->inode->i_mode) && > subreq->retry_count < 20) { > subreq->transferred = subreq->already_done; > __clear_bit(NETFS_SREQ_HIT_EOF, &subreq->flags); > __set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags); > afs_fetch_data_notify(op); > return; > } > > was inserted into afs_fetch_data_success() at the beginning and struct > netfs_io_subrequest given an extra field, "already_done" that was set to > the value in "subreq->transferred" by netfs_reissue_read(). > > When reading a 4K file, the subrequests would get gradually smaller, a new > subrequest would be allocated around the 3rd retry and then eventually be > rendered superfluous when the 20th retry was hit and the limit on the first > subrequest was eased. > > Reported-by: Ihor Solodrai <ihor.solodrai@pm.me> > Closes: https://lore.kernel.org/r/a7x33d4dnMdGTtRivptq6S1i8btK70SNBP2XyX_xwDAhLvgQoPox6FVBOkifq4eBinfFfbZlIkMZBe3QarlWTxoEtHZwJCZbNKtaqrR7PvI=@pm.me/ > Signed-off-by: David Howells <dhowells@redhat.com> > cc: Marc Dionne <marc.dionne@auristor.com> > cc: Steve French <stfrench@microsoft.com> > cc: Eric Van Hensbergen <ericvh@kernel.org> > cc: Latchesar Ionkov <lucho@ionkov.net> > cc: Dominique Martinet <asmadeus@codewreck.org> > cc: Christian Schoenebeck <linux_oss@crudebyte.com> > cc: Paulo Alcantara <pc@manguebit.com> > cc: Jeff Layton <jlayton@kernel.org> > cc: v9fs@lists.linux.dev > cc: linux-cifs@vger.kernel.org > cc: netfs@lists.linux.dev > cc: linux-fsdevel@vger.kernel.org > --- > fs/netfs/read_collect.c | 6 ++++-- > fs/netfs/read_retry.c | 40 ++++++++++++++++++++++++++++++---------- > include/linux/netfs.h | 2 +- > include/trace/events/netfs.h | 4 +++- > 4 files changed, 38 insertions(+), 14 deletions(-) > > diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c > index f65affa5a9e4..636cc5a98ef5 100644 > --- a/fs/netfs/read_collect.c > +++ b/fs/netfs/read_collect.c > @@ -470,7 +470,8 @@ void netfs_read_collection_worker(struct work_struct *work) > */ > void netfs_wake_read_collector(struct netfs_io_request *rreq) > { > - if (test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags)) { > + if (test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags) && > + !test_bit(NETFS_RREQ_RETRYING, &rreq->flags)) { > if (!work_pending(&rreq->work)) { > netfs_get_request(rreq, netfs_rreq_trace_get_work); > if (!queue_work(system_unbound_wq, &rreq->work)) > @@ -586,7 +587,8 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq) > smp_mb__after_atomic(); /* Clear IN_PROGRESS before task state */ > > /* If we are at the head of the queue, wake up the collector. */ > - if (list_is_first(&subreq->rreq_link, &stream->subrequests)) > + if (list_is_first(&subreq->rreq_link, &stream->subrequests) || > + test_bit(NETFS_RREQ_RETRYING, &rreq->flags)) > netfs_wake_read_collector(rreq); > > netfs_put_subrequest(subreq, true, netfs_sreq_trace_put_terminated); > diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c > index 2290af0d51ac..8316c4533a51 100644 > --- a/fs/netfs/read_retry.c > +++ b/fs/netfs/read_retry.c > @@ -14,7 +14,6 @@ static void netfs_reissue_read(struct netfs_io_request *rreq, > { > __clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags); > __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags); > - netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); > subreq->rreq->netfs_ops->issue_read(subreq); > } > > @@ -48,6 +47,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) > __clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags); > subreq->retry_count++; > netfs_reset_iter(subreq); > + netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); > netfs_reissue_read(rreq, subreq); > } > } > @@ -75,7 +75,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) > struct iov_iter source; > unsigned long long start, len; > size_t part; > - bool boundary = false; > + bool boundary = false, subreq_superfluous = false; > > /* Go through the subreqs and find the next span of contiguous > * buffer that we then rejig (cifs, for example, needs the > @@ -116,8 +116,10 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) > /* Work through the sublist. */ > subreq = from; > list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) { > - if (!len) > + if (!len) { > + subreq_superfluous = true; > break; > + } > subreq->source = NETFS_DOWNLOAD_FROM_SERVER; > subreq->start = start - subreq->transferred; > subreq->len = len + subreq->transferred; > @@ -154,19 +156,21 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) > > netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); > netfs_reissue_read(rreq, subreq); > - if (subreq == to) > + if (subreq == to) { > + subreq_superfluous = false; > break; > + } > } > > /* If we managed to use fewer subreqs, we can discard the > * excess; if we used the same number, then we're done. > */ > if (!len) { > - if (subreq == to) > + if (!subreq_superfluous) > continue; > list_for_each_entry_safe_from(subreq, tmp, > &stream->subrequests, rreq_link) { > - trace_netfs_sreq(subreq, netfs_sreq_trace_discard); > + trace_netfs_sreq(subreq, netfs_sreq_trace_superfluous); > list_del(&subreq->rreq_link); > netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_done); > if (subreq == to) > @@ -187,14 +191,12 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) > subreq->source = NETFS_DOWNLOAD_FROM_SERVER; > subreq->start = start; > subreq->len = len; > - subreq->debug_index = atomic_inc_return(&rreq->subreq_counter); > subreq->stream_nr = stream->stream_nr; > subreq->retry_count = 1; > > trace_netfs_sreq_ref(rreq->debug_id, subreq->debug_index, > refcount_read(&subreq->ref), > netfs_sreq_trace_new); > - netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); > > list_add(&subreq->rreq_link, &to->rreq_link); > to = list_next_entry(to, rreq_link); > @@ -256,14 +258,32 @@ void netfs_retry_reads(struct netfs_io_request *rreq) > { > struct netfs_io_subrequest *subreq; > struct netfs_io_stream *stream = &rreq->io_streams[0]; > + DEFINE_WAIT(myself); > + > + set_bit(NETFS_RREQ_RETRYING, &rreq->flags); > > /* Wait for all outstanding I/O to quiesce before performing retries as > * we may need to renegotiate the I/O sizes. > */ > list_for_each_entry(subreq, &stream->subrequests, rreq_link) { > - wait_on_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS, > - TASK_UNINTERRUPTIBLE); > + if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags)) > + continue; > + > + trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue); > + for (;;) { > + prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE); > + > + if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags)) > + break; > + > + trace_netfs_sreq(subreq, netfs_sreq_trace_wait_for); > + schedule(); > + trace_netfs_rreq(rreq, netfs_rreq_trace_woke_queue); > + } > + > + finish_wait(&rreq->waitq, &myself); > } > + clear_bit(NETFS_RREQ_RETRYING, &rreq->flags); > > trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit); > netfs_retry_read_subrequests(rreq); > diff --git a/include/linux/netfs.h b/include/linux/netfs.h > index 071d05d81d38..c86a11cfc4a3 100644 > --- a/include/linux/netfs.h > +++ b/include/linux/netfs.h > @@ -278,7 +278,7 @@ struct netfs_io_request { > #define NETFS_RREQ_PAUSE 11 /* Pause subrequest generation */ > #define NETFS_RREQ_USE_IO_ITER 12 /* Use ->io_iter rather than ->i_pages */ > #define NETFS_RREQ_ALL_QUEUED 13 /* All subreqs are now queued */ > -#define NETFS_RREQ_NEED_RETRY 14 /* Need to try retrying */ > +#define NETFS_RREQ_RETRYING 14 /* Set if we're in the retry path */ > #define NETFS_RREQ_USE_PGPRIV2 31 /* [DEPRECATED] Use PG_private_2 to mark > * write to cache on read */ > const struct netfs_request_ops *netfs_ops; > diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h > index 6e699cadcb29..f880835f7695 100644 > --- a/include/trace/events/netfs.h > +++ b/include/trace/events/netfs.h > @@ -99,7 +99,7 @@ > EM(netfs_sreq_trace_limited, "LIMIT") \ > EM(netfs_sreq_trace_need_clear, "N-CLR") \ > EM(netfs_sreq_trace_partial_read, "PARTR") \ > - EM(netfs_sreq_trace_need_retry, "NRTRY") \ > + EM(netfs_sreq_trace_need_retry, "ND-RT") \ > EM(netfs_sreq_trace_prepare, "PREP ") \ > EM(netfs_sreq_trace_prep_failed, "PRPFL") \ > EM(netfs_sreq_trace_progress, "PRGRS") \ > @@ -108,7 +108,9 @@ > EM(netfs_sreq_trace_short, "SHORT") \ > EM(netfs_sreq_trace_split, "SPLIT") \ > EM(netfs_sreq_trace_submit, "SUBMT") \ > + EM(netfs_sreq_trace_superfluous, "SPRFL") \ > EM(netfs_sreq_trace_terminated, "TERM ") \ > + EM(netfs_sreq_trace_wait_for, "_WAIT") \ > EM(netfs_sreq_trace_write, "WRITE") \ > EM(netfs_sreq_trace_write_skip, "SKIP ") \ > E_(netfs_sreq_trace_write_term, "WTERM") Looks good in testing, with afs. Took quite a few iterations to see evidence of a retry occurring, where the new stat was helpful. Tested-by: Marc Dionne <marc.dionne@auristor.com> Marc
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c index f65affa5a9e4..636cc5a98ef5 100644 --- a/fs/netfs/read_collect.c +++ b/fs/netfs/read_collect.c @@ -470,7 +470,8 @@ void netfs_read_collection_worker(struct work_struct *work) */ void netfs_wake_read_collector(struct netfs_io_request *rreq) { - if (test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags)) { + if (test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags) && + !test_bit(NETFS_RREQ_RETRYING, &rreq->flags)) { if (!work_pending(&rreq->work)) { netfs_get_request(rreq, netfs_rreq_trace_get_work); if (!queue_work(system_unbound_wq, &rreq->work)) @@ -586,7 +587,8 @@ void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq) smp_mb__after_atomic(); /* Clear IN_PROGRESS before task state */ /* If we are at the head of the queue, wake up the collector. */ - if (list_is_first(&subreq->rreq_link, &stream->subrequests)) + if (list_is_first(&subreq->rreq_link, &stream->subrequests) || + test_bit(NETFS_RREQ_RETRYING, &rreq->flags)) netfs_wake_read_collector(rreq); netfs_put_subrequest(subreq, true, netfs_sreq_trace_put_terminated); diff --git a/fs/netfs/read_retry.c b/fs/netfs/read_retry.c index 2290af0d51ac..8316c4533a51 100644 --- a/fs/netfs/read_retry.c +++ b/fs/netfs/read_retry.c @@ -14,7 +14,6 @@ static void netfs_reissue_read(struct netfs_io_request *rreq, { __clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags); __set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags); - netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); subreq->rreq->netfs_ops->issue_read(subreq); } @@ -48,6 +47,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) __clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags); subreq->retry_count++; netfs_reset_iter(subreq); + netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); netfs_reissue_read(rreq, subreq); } } @@ -75,7 +75,7 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) struct iov_iter source; unsigned long long start, len; size_t part; - bool boundary = false; + bool boundary = false, subreq_superfluous = false; /* Go through the subreqs and find the next span of contiguous * buffer that we then rejig (cifs, for example, needs the @@ -116,8 +116,10 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) /* Work through the sublist. */ subreq = from; list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) { - if (!len) + if (!len) { + subreq_superfluous = true; break; + } subreq->source = NETFS_DOWNLOAD_FROM_SERVER; subreq->start = start - subreq->transferred; subreq->len = len + subreq->transferred; @@ -154,19 +156,21 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); netfs_reissue_read(rreq, subreq); - if (subreq == to) + if (subreq == to) { + subreq_superfluous = false; break; + } } /* If we managed to use fewer subreqs, we can discard the * excess; if we used the same number, then we're done. */ if (!len) { - if (subreq == to) + if (!subreq_superfluous) continue; list_for_each_entry_safe_from(subreq, tmp, &stream->subrequests, rreq_link) { - trace_netfs_sreq(subreq, netfs_sreq_trace_discard); + trace_netfs_sreq(subreq, netfs_sreq_trace_superfluous); list_del(&subreq->rreq_link); netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_done); if (subreq == to) @@ -187,14 +191,12 @@ static void netfs_retry_read_subrequests(struct netfs_io_request *rreq) subreq->source = NETFS_DOWNLOAD_FROM_SERVER; subreq->start = start; subreq->len = len; - subreq->debug_index = atomic_inc_return(&rreq->subreq_counter); subreq->stream_nr = stream->stream_nr; subreq->retry_count = 1; trace_netfs_sreq_ref(rreq->debug_id, subreq->debug_index, refcount_read(&subreq->ref), netfs_sreq_trace_new); - netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); list_add(&subreq->rreq_link, &to->rreq_link); to = list_next_entry(to, rreq_link); @@ -256,14 +258,32 @@ void netfs_retry_reads(struct netfs_io_request *rreq) { struct netfs_io_subrequest *subreq; struct netfs_io_stream *stream = &rreq->io_streams[0]; + DEFINE_WAIT(myself); + + set_bit(NETFS_RREQ_RETRYING, &rreq->flags); /* Wait for all outstanding I/O to quiesce before performing retries as * we may need to renegotiate the I/O sizes. */ list_for_each_entry(subreq, &stream->subrequests, rreq_link) { - wait_on_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS, - TASK_UNINTERRUPTIBLE); + if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags)) + continue; + + trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue); + for (;;) { + prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE); + + if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags)) + break; + + trace_netfs_sreq(subreq, netfs_sreq_trace_wait_for); + schedule(); + trace_netfs_rreq(rreq, netfs_rreq_trace_woke_queue); + } + + finish_wait(&rreq->waitq, &myself); } + clear_bit(NETFS_RREQ_RETRYING, &rreq->flags); trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit); netfs_retry_read_subrequests(rreq); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 071d05d81d38..c86a11cfc4a3 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -278,7 +278,7 @@ struct netfs_io_request { #define NETFS_RREQ_PAUSE 11 /* Pause subrequest generation */ #define NETFS_RREQ_USE_IO_ITER 12 /* Use ->io_iter rather than ->i_pages */ #define NETFS_RREQ_ALL_QUEUED 13 /* All subreqs are now queued */ -#define NETFS_RREQ_NEED_RETRY 14 /* Need to try retrying */ +#define NETFS_RREQ_RETRYING 14 /* Set if we're in the retry path */ #define NETFS_RREQ_USE_PGPRIV2 31 /* [DEPRECATED] Use PG_private_2 to mark * write to cache on read */ const struct netfs_request_ops *netfs_ops; diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index 6e699cadcb29..f880835f7695 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -99,7 +99,7 @@ EM(netfs_sreq_trace_limited, "LIMIT") \ EM(netfs_sreq_trace_need_clear, "N-CLR") \ EM(netfs_sreq_trace_partial_read, "PARTR") \ - EM(netfs_sreq_trace_need_retry, "NRTRY") \ + EM(netfs_sreq_trace_need_retry, "ND-RT") \ EM(netfs_sreq_trace_prepare, "PREP ") \ EM(netfs_sreq_trace_prep_failed, "PRPFL") \ EM(netfs_sreq_trace_progress, "PRGRS") \ @@ -108,7 +108,9 @@ EM(netfs_sreq_trace_short, "SHORT") \ EM(netfs_sreq_trace_split, "SPLIT") \ EM(netfs_sreq_trace_submit, "SUBMT") \ + EM(netfs_sreq_trace_superfluous, "SPRFL") \ EM(netfs_sreq_trace_terminated, "TERM ") \ + EM(netfs_sreq_trace_wait_for, "_WAIT") \ EM(netfs_sreq_trace_write, "WRITE") \ EM(netfs_sreq_trace_write_skip, "SKIP ") \ E_(netfs_sreq_trace_write_term, "WTERM")