diff mbox series

netfs: Fix collection of results during pause when collection offloaded

Message ID 3431541.1740763345@warthog.procyon.org.uk (mailing list archive)
State New
Headers show
Series netfs: Fix collection of results during pause when collection offloaded | expand

Commit Message

David Howells Feb. 28, 2025, 5:22 p.m. UTC
A netfs read request can run in one of two modes: for synchronous reads
writes, the app thread does the collection of results and for asynchronous
reads, this is offloaded to a worker thread.  This is controlled by the
NETFS_RREQ_OFFLOAD_COLLECTION flag.

Now, if a subrequest incurs an error, the NETFS_RREQ_PAUSE flag is set to
stop the issuing loop temporarily from issuing more subrequests until a
retry is successful or the request is abandoned.

When the issuing loop sees NETFS_RREQ_PAUSE, it jumps to
netfs_wait_for_pause() which will wait for the PAUSE flag to be cleared -
and whilst it is waiting, it will call out to the collector as more results
acrue...  But this is the wrong thing to do if OFFLOAD_COLLECTION is set as
we can then end up with both the app thread and the work item collecting
results simultaneously.

This manifests itself occasionally when running the generic/323 xfstest
against multichannel cifs as an oops that's a bit random but frequently
involving io_submit() (the test does lots of simultaneous async DIO reads).

Fix this by only doing the collection in netfs_wait_for_pause() if the
NETFS_RREQ_OFFLOAD_COLLECTION is not set.

Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Reported-by: Steve French <stfrench@microsoft.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
---
 fs/netfs/read_collect.c |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

Comments

Steve French March 1, 2025, 12:54 a.m. UTC | #1
Am testing this fix now (especially with multichannel scenarios which
were failing before)

On Fri, Feb 28, 2025 at 11:22 AM David Howells <dhowells@redhat.com> wrote:
>
>
> A netfs read request can run in one of two modes: for synchronous reads
> writes, the app thread does the collection of results and for asynchronous
> reads, this is offloaded to a worker thread.  This is controlled by the
> NETFS_RREQ_OFFLOAD_COLLECTION flag.
>
> Now, if a subrequest incurs an error, the NETFS_RREQ_PAUSE flag is set to
> stop the issuing loop temporarily from issuing more subrequests until a
> retry is successful or the request is abandoned.
>
> When the issuing loop sees NETFS_RREQ_PAUSE, it jumps to
> netfs_wait_for_pause() which will wait for the PAUSE flag to be cleared -
> and whilst it is waiting, it will call out to the collector as more results
> acrue...  But this is the wrong thing to do if OFFLOAD_COLLECTION is set as
> we can then end up with both the app thread and the work item collecting
> results simultaneously.
>
> This manifests itself occasionally when running the generic/323 xfstest
> against multichannel cifs as an oops that's a bit random but frequently
> involving io_submit() (the test does lots of simultaneous async DIO reads).
>
> Fix this by only doing the collection in netfs_wait_for_pause() if the
> NETFS_RREQ_OFFLOAD_COLLECTION is not set.
>
> Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
> Reported-by: Steve French <stfrench@microsoft.com>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Paulo Alcantara <pc@manguebit.com>
> cc: Jeff Layton <jlayton@kernel.org>
> cc: linux-cifs@vger.kernel.org
> cc: netfs@lists.linux.dev
> cc: linux-fsdevel@vger.kernel.org
> ---
>  fs/netfs/read_collect.c |   18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
> index 636cc5a98ef5..23c75755ad4e 100644
> --- a/fs/netfs/read_collect.c
> +++ b/fs/netfs/read_collect.c
> @@ -682,14 +682,16 @@ void netfs_wait_for_pause(struct netfs_io_request *rreq)
>                 trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue);
>                 prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE);
>
> -               subreq = list_first_entry_or_null(&stream->subrequests,
> -                                                 struct netfs_io_subrequest, rreq_link);
> -               if (subreq &&
> -                   (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags) ||
> -                    test_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags))) {
> -                       __set_current_state(TASK_RUNNING);
> -                       netfs_read_collection(rreq);
> -                       continue;
> +               if (!test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags)) {
> +                       subreq = list_first_entry_or_null(&stream->subrequests,
> +                                                         struct netfs_io_subrequest, rreq_link);
> +                       if (subreq &&
> +                           (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags) ||
> +                            test_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags))) {
> +                               __set_current_state(TASK_RUNNING);
> +                               netfs_read_collection(rreq);
> +                               continue;
> +                       }
>                 }
>
>                 if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags) ||
>
>
diff mbox series

Patch

diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index 636cc5a98ef5..23c75755ad4e 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -682,14 +682,16 @@  void netfs_wait_for_pause(struct netfs_io_request *rreq)
 		trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue);
 		prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE);
 
-		subreq = list_first_entry_or_null(&stream->subrequests,
-						  struct netfs_io_subrequest, rreq_link);
-		if (subreq &&
-		    (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags) ||
-		     test_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags))) {
-			__set_current_state(TASK_RUNNING);
-			netfs_read_collection(rreq);
-			continue;
+		if (!test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags)) {
+			subreq = list_first_entry_or_null(&stream->subrequests,
+							  struct netfs_io_subrequest, rreq_link);
+			if (subreq &&
+			    (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags) ||
+			     test_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags))) {
+				__set_current_state(TASK_RUNNING);
+				netfs_read_collection(rreq);
+				continue;
+			}
 		}
 
 		if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags) ||