diff mbox series

[13/24] lustre: ptlrpc: two replay lock threads

Message ID 1632277201-6920-14-git-send-email-jsimmons@infradead.org (mailing list archive)
State New, archived
Series lustre: Update to OpenSFS Sept 21, 2021

Commit Message

James Simmons Sept. 22, 2021, 2:19 a.m. UTC
From: Vitaly Fertman <c17818@cray.com>

Two replay lock threads can conflict with each other, which leads to:
        ASSERTION( atomic_read(&imp->imp_replay_inflight) == 1 )

replay_lock_interpret() calls ptlrpc_connect_import() on error, so one
thread enters recovery starting from the connect reply interpret callback.

replay_lock_interpret() also wakes up ldlm_lock_replay_thread(), which
calls ptlrpc_import_recovery_state_machine().

Both threads may then reach ldlm_replay_locks() on the next round at
the same time. Each increments imp_replay_inflight, and the second one
hits the assertion.

The problem appeared in LU-13600 which added ldlm_lock_replay_thread()
with the ptlrpc_import_recovery_state_machine() call.
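
The guard added to ldlm_replay_locks() can be sketched in userspace as
follows. This is only an illustration, not the kernel code: struct
import_model, replay_try_start() and replay_finish() are hypothetical
names, and C11 atomic_fetch_add() + 1 stands in for the kernel's
atomic_inc_return().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct obd_import; only the counter is modeled. */
struct import_model {
	atomic_int replay_inflight;
};

/*
 * Returns true if the caller won the right to start lock replay.
 * atomic_fetch_add() returns the previous value, so "previous + 1" is the
 * userspace equivalent of atomic_inc_return().
 */
static bool replay_try_start(struct import_model *imp)
{
	if (atomic_fetch_add(&imp->replay_inflight, 1) + 1 > 1) {
		/* A replay is already in flight: undo the increment, back off. */
		atomic_fetch_sub(&imp->replay_inflight, 1);
		return false;
	}
	return true;
}

/* Drops the reference taken by a successful replay_try_start(). */
static void replay_finish(struct import_model *imp)
{
	int prev = atomic_fetch_sub(&imp->replay_inflight, 1);

	assert(prev >= 1);
	(void)prev;
}
```

Note that a losing caller still raises the counter to 2 for a short
window before decrementing it. That is presumably why the patch also
replaces the LASSERT in __ldlm_replay_locks() with a wait loop rather
than asserting on an exact value of 1.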

HPE-bug-id: LUS-10147
WC-bug-id: https://jira.whamcloud.com/browse/LU-14847
Lustre-commit: d7d7eb50c8f5fd3fc ("LU-14847 ptlrpc: two replay lock threads")
Fixes: 8cc7f22847 ("lustre: ptlrpc: limit rate of lock replays")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/158931
Reviewed-on: https://review.whamcloud.com/44294
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 fs/lustre/ldlm/ldlm_request.c   | 10 +++++++---
 fs/lustre/obdclass/obd_config.c |  4 ++--
 2 files changed, 9 insertions(+), 5 deletions(-)

Patch

diff --git a/fs/lustre/ldlm/ldlm_request.c b/fs/lustre/ldlm/ldlm_request.c
index 7718e07..746c45b 100644
--- a/fs/lustre/ldlm/ldlm_request.c
+++ b/fs/lustre/ldlm/ldlm_request.c
@@ -2253,7 +2253,8 @@  int __ldlm_replay_locks(struct obd_import *imp, bool rate_limit)
 	struct ldlm_lock *lock;
 	int rc = 0;
 
-	LASSERT(atomic_read(&imp->imp_replay_inflight) == 1);
+	while (atomic_read(&imp->imp_replay_inflight) != 1)
+		cond_resched();
 
 	/* don't replay locks if import failed recovery */
 	if (imp->imp_vbr_failed)
@@ -2311,9 +2312,12 @@  int ldlm_replay_locks(struct obd_import *imp)
 	struct task_struct *task;
 	int rc = 0;
 
-	class_import_get(imp);
 	/* ensure this doesn't fall to 0 before all have been queued */
-	atomic_inc(&imp->imp_replay_inflight);
+	if (atomic_inc_return(&imp->imp_replay_inflight) > 1) {
+		atomic_dec(&imp->imp_replay_inflight);
+		return 0;
+	}
+	class_import_get(imp);
 
 	task = kthread_run(ldlm_lock_replay_thread, imp, "ldlm_lock_replay");
 	if (IS_ERR(task)) {
diff --git a/fs/lustre/obdclass/obd_config.c b/fs/lustre/obdclass/obd_config.c
index 3a0dbd5..cb70ed5 100644
--- a/fs/lustre/obdclass/obd_config.c
+++ b/fs/lustre/obdclass/obd_config.c
@@ -519,8 +519,8 @@  struct obd_device *class_incref(struct obd_device *obd,
 {
 	lu_ref_add_atomic(&obd->obd_reference, scope, source);
 	atomic_inc(&obd->obd_refcount);
-	CDEBUG(D_INFO, "incref %s (%p) now %d\n", obd->obd_name, obd,
-	       atomic_read(&obd->obd_refcount));
+	CDEBUG(D_INFO, "incref %s (%p) now %d - %s\n", obd->obd_name, obd,
+	       atomic_read(&obd->obd_refcount), scope);
 
 	return obd;
 }