[11/20] lustre: ptlrpc: handle reply and resend reorder

Message ID 1633974049-26490-12-git-send-email-jsimmons@infradead.org (mailing list archive)
State New, archived
Series lustre: sync to OpenSFS Oct 11, 2021

Commit Message

James Simmons Oct. 11, 2021, 5:40 p.m. UTC
From: Alexander Boyko <alexander.boyko@hpe.com>

ptlrpc can't detect a bulk transfer timeout when the RPC reply and
the bulk are reordered on a router. Fail the bulk in situations where
it has not completed (after the bulk timeout, LNET_EVENT_UNLINK is set).
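The failure handling this patch moves can be sketched in isolation. The
structs and function names below (`bulk_desc`, `request`, `client_bulk_callback`,
`check_bulk`) are simplified stand-ins for the real Lustre types, keeping only
the fields the fix touches; this is a sketch of the idea, not the kernel code:

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-ins for the Lustre bulk descriptor and request. */
struct bulk_desc {
	int bd_failure;		/* set when the bulk event reported an error */
	int bd_nob;		/* bytes expected */
	int bd_nob_transferred;	/* bytes actually moved */
};

struct request {
	int rq_net_err;
	int rq_status;
	struct bulk_desc *rq_bulk;
};

/* On any LNet event error (including the unlink event fired after a
 * bulk timeout), mark both the request and the bulk descriptor as
 * failed, so a reply reordered past the bulk cannot make the RPC
 * look successful. */
static void client_bulk_callback(struct request *req, int ev_status)
{
	if (ev_status != 0) {
		req->rq_net_err = 1;
		req->rq_bulk->bd_failure = 1;
	}
}

/* Later, when the request set is checked: a failed or incomplete bulk
 * turns into -EIO even if a (reordered) reply claimed success. */
static int check_bulk(struct request *req)
{
	struct bulk_desc *d = req->rq_bulk;

	if (d->bd_failure || d->bd_nob_transferred != d->bd_nob) {
		req->rq_status = -EIO;
		return -EIO;
	}
	return req->rq_status;
}
```

The point of setting `bd_failure` inside the callback (rather than only on a
nonzero event status seen later) is that the failure sticks to the descriptor
itself, so the reorder between reply and bulk no longer matters.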

HPE-bug-id: LUS-7445, LUS-7569
WC-bug-id: https://jira.whamcloud.com/browse/LU-12567
Lustre-commit: f7f31f8f969f410cc ("LU-12567 ptlrpc: handle reply and resend reorder")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/35571
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ptlrpc/client.c | 5 ++++-
 fs/lustre/ptlrpc/events.c | 3 +--
 2 files changed, 5 insertions(+), 3 deletions(-)
Patch

diff --git a/fs/lustre/ptlrpc/client.c b/fs/lustre/ptlrpc/client.c
index 83d269c..e800000 100644
--- a/fs/lustre/ptlrpc/client.c
+++ b/fs/lustre/ptlrpc/client.c
@@ -2075,7 +2075,10 @@  int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
 			 * was good after getting the REPLY for her GET or
 			 * the ACK for her PUT.
 			 */
-			DEBUG_REQ(D_ERROR, req, "bulk transfer failed");
+			DEBUG_REQ(D_ERROR, req, "bulk transfer failed %d/%d/%d",
+				  req->rq_status,
+				  req->rq_bulk->bd_nob,
+				  req->rq_bulk->bd_nob_transferred);
 			req->rq_status = -EIO;
 		}
 
diff --git a/fs/lustre/ptlrpc/events.c b/fs/lustre/ptlrpc/events.c
index c81181d..559d811 100644
--- a/fs/lustre/ptlrpc/events.c
+++ b/fs/lustre/ptlrpc/events.c
@@ -219,10 +219,9 @@  void client_bulk_callback(struct lnet_event *ev)
 		spin_lock(&req->rq_lock);
 		req->rq_net_err = 1;
 		spin_unlock(&req->rq_lock);
+		desc->bd_failure = 1;
 	}
 
-	if (ev->status != 0)
-		desc->bd_failure = 1;
 
 	/* NB don't unlock till after wakeup; desc can disappear under us
 	 * otherwise