From patchwork Tue Jun 27 18:31:49 2023
X-Patchwork-Submitter: Benjamin Coddington
X-Patchwork-Id: 13294916
From: Benjamin Coddington
To: trond.myklebust@hammerspace.com, anna@kernel.org
Cc: Olga.Kornievskaia@netapp.com, linux-nfs@vger.kernel.org
Subject: [PATCH 1/2] Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return"
Date: Tue, 27 Jun 2023 14:31:49 -0400
Message-Id: <5577791deaa898578c8e8f86336eaca053d9efdd.1687890438.git.bcodding@redhat.com>

Olga Kornievskaia reports that this patch breaks NFSv4.0 state recovery.
It also introduces additional complexity in the error paths for cases not
related to the original problem. Let's revert it for now, and address the
original problem in another manner.

This reverts commit f5ea16137a3fa2858620dc9084466491c128535f.
Fixes: f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation return")
Reported-by: Kornievskaia, Olga
Signed-off-by: Benjamin Coddington
---
 fs/nfs/nfs4proc.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index d3665390c4cb..6bb14f6cfbc0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7159,7 +7159,6 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
 {
 	struct nfs4_lockdata *data = calldata;
 	struct nfs4_lock_state *lsp = data->lsp;
-	struct nfs_server *server = NFS_SERVER(d_inode(data->ctx->dentry));
 
 	if (!nfs4_sequence_done(task, &data->res.seq_res))
 		return;
@@ -7167,7 +7166,8 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
 	data->rpc_status = task->tk_status;
 	switch (task->tk_status) {
 	case 0:
-		renew_lease(server, data->timestamp);
+		renew_lease(NFS_SERVER(d_inode(data->ctx->dentry)),
+				data->timestamp);
 		if (data->arg.new_lock && !data->cancelled) {
 			data->fl.fl_flags &= ~(FL_SLEEP | FL_ACCESS);
 			if (locks_lock_inode_wait(lsp->ls_state->inode, &data->fl) < 0)
@@ -7188,8 +7188,6 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
 			if (!nfs4_stateid_match(&data->arg.open_stateid,
 						&lsp->ls_state->open_stateid))
 				goto out_restart;
-			else if (nfs4_async_handle_error(task, server, lsp->ls_state, NULL) == -EAGAIN)
-				goto out_restart;
 		} else if (!nfs4_stateid_match(&data->arg.lock_stateid,
 						&lsp->ls_stateid))
 				goto out_restart;

From patchwork Tue Jun 27 18:31:50 2023
X-Patchwork-Submitter: Benjamin Coddington
X-Patchwork-Id: 13294915
From: Benjamin Coddington
To: trond.myklebust@hammerspace.com, anna@kernel.org
Cc: Olga.Kornievskaia@netapp.com, linux-nfs@vger.kernel.org
Subject: [PATCH 2/2] NFSv4: Fix dropped lock for racing OPEN and delegation return
Date: Tue, 27 Jun 2023 14:31:50 -0400
Message-Id: <01047e4baa85ca541a5a43f88f588b15163292dc.1687890438.git.bcodding@redhat.com>
In-Reply-To: <5577791deaa898578c8e8f86336eaca053d9efdd.1687890438.git.bcodding@redhat.com>
References: <5577791deaa898578c8e8f86336eaca053d9efdd.1687890438.git.bcodding@redhat.com>

Commit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation
return") attempted to solve this problem by using nfs4's generic async
error handling, but introduced a regression where v4.0 lock recovery would
hang. The additional complexity introduced by overloading that error
handling is not necessary for this case.

The problem as originally explained in the above commit is:

    There's a small window where a LOCK sent during a delegation return can
    race with another OPEN on client, but the open stateid has not yet been
    updated. In this case, the client doesn't handle the OLD_STATEID error
    from the server and will lose this lock, emitting:
    "NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".

We want a fix that is much more narrowly focused on the original problem.
Fix this issue by returning -EAGAIN from nfs4_handle_delegation_recall_error()
on OLD_STATEID, and use that as a signal for the delegation return code to
retry the LOCK operation. On the retry, we should be able to send along the
updated stateid.
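The race described above can be modelled in a few lines of user-space C. This is a sketch under simplifying assumptions, not kernel code: a stateid is reduced to its seqid (wraparound ignored), the server rejects a LOCK whose open stateid seqid is older than its current one with NFS4ERR_OLD_STATEID (10024, matching the -10024 in the quoted log message), and the names `model_server`, `model_client`, `model_server_open()`, `model_open_reply()` and `model_send_lock()` are hypothetical.

```c
#include <stdint.h>

/* Protocol value of NFS4ERR_OLD_STATEID; the quoted log reports it as -10024. */
#define NFS4ERR_OLD_STATEID 10024

/* Hypothetical model: each side tracks only the seqid of the open stateid. */
struct model_server {
	uint32_t open_seqid;	/* current open stateid seqid on the server */
};

struct model_client {
	uint32_t open_seqid;	/* open stateid seqid the client will send */
};

/* The racing OPEN is processed by the server, which bumps the seqid. */
static void model_server_open(struct model_server *srv)
{
	srv->open_seqid++;
}

/* The OPEN reply reaches the client and updates its cached open stateid. */
static void model_open_reply(struct model_client *clnt,
			     const struct model_server *srv)
{
	clnt->open_seqid = srv->open_seqid;
}

/* A LOCK carrying the client's cached open stateid; older seqids are rejected. */
static int model_send_lock(const struct model_server *srv,
			   const struct model_client *clnt)
{
	if (clnt->open_seqid < srv->open_seqid)
		return -NFS4ERR_OLD_STATEID;
	return 0;
}
```

In these terms, the first failure is transient: a LOCK sent between `model_server_open()` and `model_open_reply()` gets OLD_STATEID, but resending the same LOCK after the OPEN reply has been processed succeeds.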
Signed-off-by: Benjamin Coddington
---
 fs/nfs/delegation.c | 4 +++-
 fs/nfs/nfs4proc.c   | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index cf7365581031..23aeb02319a5 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -160,7 +160,9 @@ static int nfs_delegation_claim_locks(struct nfs4_state *state, const nfs4_state
 		if (nfs_file_open_context(fl->fl_file)->state != state)
 			continue;
 		spin_unlock(&flctx->flc_lock);
-		status = nfs4_lock_delegation_recall(fl, state, stateid);
+		do {
+			status = nfs4_lock_delegation_recall(fl, state, stateid);
+		} while (status == -EAGAIN);
 		if (status < 0)
 			goto out;
 		spin_lock(&flctx->flc_lock);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6bb14f6cfbc0..399db73a57f4 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2262,6 +2262,7 @@ static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct
 	case -NFS4ERR_BAD_HIGH_SLOT:
 	case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION:
 	case -NFS4ERR_DEADSESSION:
+	case -NFS4ERR_OLD_STATEID:
 		return -EAGAIN;
 	case -NFS4ERR_STALE_CLIENTID:
 	case -NFS4ERR_STALE_STATEID:
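A user-space sketch of the control flow this diff sets up: OLD_STATEID joins the group of errors mapped to `-EAGAIN`, and the caller resends the LOCK while `-EAGAIN` comes back. `map_recall_error()` and `claim_lock()` are hypothetical stand-ins for `nfs4_handle_delegation_recall_error()` and the loop in `nfs_delegation_claim_locks()`; only the OLD_STATEID case from the hunk is modelled and every other error passes through unchanged (the real function handles more cases). The LOCK attempt is a hypothetical callback so that its results can be scripted.

```c
#include <errno.h>

#define NFS4ERR_OLD_STATEID 10024	/* protocol value; logged as -10024 */

/* Hypothetical reduction of nfs4_handle_delegation_recall_error() after the
 * patch: OLD_STATEID means "try again". Other errors pass through here. */
static int map_recall_error(int err)
{
	switch (err) {
	case -NFS4ERR_OLD_STATEID:
		return -EAGAIN;
	default:
		return err;
	}
}

/* One LOCK attempt: returns 0 or a negative NFSv4/errno value. */
typedef int (*lock_attempt_fn)(void *ctx);

/* Hypothetical reduction of the patched loop: resend while the mapped status
 * is -EAGAIN. Like the loop in the diff, there is no retry cap.
 * *attempts reports how many LOCKs were sent. */
static int claim_lock(lock_attempt_fn attempt, void *ctx, int *attempts)
{
	int status;

	*attempts = 0;
	do {
		status = map_recall_error(attempt(ctx));
		(*attempts)++;
	} while (status == -EAGAIN);
	return status;
}

/* Scripted attempt for demonstration: returns queued results in order. */
struct script {
	const int *results;
	int next;
};

static int scripted_attempt(void *ctx)
{
	struct script *s = ctx;

	return s->results[s->next++];
}
```

Scripting `{ -NFS4ERR_OLD_STATEID, 0 }` models the race: the first LOCK hits the window, the second carries the updated stateid, and `claim_lock()` returns 0 after two attempts. An error outside the `-EAGAIN` group, such as `-EIO`, ends the loop after one attempt and is returned to the caller.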