From patchwork Sun Mar 6 23:41:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12770995 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F601C433EF for ; Sun, 6 Mar 2022 23:42:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234454AbiCFXnW (ORCPT ); Sun, 6 Mar 2022 18:43:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41152 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234453AbiCFXnV (ORCPT ); Sun, 6 Mar 2022 18:43:21 -0500 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 778EB2727 for ; Sun, 6 Mar 2022 15:42:28 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 31EB1210FC; Sun, 6 Mar 2022 23:42:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1646610147; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fm7lHAN5pwLLux0LIl0j4ave89T8JqzXvub0c5afKiY=; b=g8xJfhwFNPfWpKROwxyaXSbghXEHULxF6aL34Ffu4EQt6tbaBlhTyBvWKszSJcKLXxf6Xt QLhCbf0u/cd/1a10tPtyIRY11xm6iFOwjJjfOrrW8HrqrFAXuR7KDTW4kWC3EZ8LCq1yf0 WBUDWwemyT3i7QQy61F9m4jewsBGdQE= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1646610147; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fm7lHAN5pwLLux0LIl0j4ave89T8JqzXvub0c5afKiY=; b=Bv7iGK0SYaZG3UKn5GMiJIJd2kLeQc1QIn3YEM8gIg094L3JG8NOD5FxdMNKCH3Rs9koQS paigHbcrs3ClpRAw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id CDEFA134CD; Sun, 6 Mar 2022 23:42:25 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id rFDVIeFGJWIIWAAAMHmgww (envelope-from ); Sun, 06 Mar 2022 23:42:25 +0000 Subject: [PATCH 01/11] NFS: remove IS_SWAPFILE hack From: NeilBrown To: Trond Myklebust , Anna Schumaker Cc: linux-nfs@vger.kernel.org Date: Mon, 07 Mar 2022 10:41:44 +1100 Message-ID: <164661010493.31054.5415277267173006275.stgit@noble.brown> In-Reply-To: <164660993703.31054.17075972410545449555.stgit@noble.brown> References: <164660993703.31054.17075972410545449555.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org This code is pointless as IS_SWAPFILE is always defined. So remove it. Suggested-by: Mark Hemment Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown --- fs/nfs/file.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 76d76acbc594..901f316f084d 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -44,11 +44,6 @@ static const struct vm_operations_struct nfs_file_vm_ops; -/* Hack for future NFS swap support */ -#ifndef IS_SWAPFILE -# define IS_SWAPFILE(inode) (0) -#endif - int nfs_check_flags(int flags) { if ((flags & (O_APPEND | O_DIRECT)) == (O_APPEND | O_DIRECT)) From patchwork Sun Mar 6 23:41:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12770996 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 60666C433F5 for ; Sun, 6 Mar 2022 23:42:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234447AbiCFXna (ORCPT ); Sun, 6 Mar 2022 18:43:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41426 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234453AbiCFXn2 (ORCPT ); Sun, 6 Mar 2022 18:43:28 -0500 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B5A586264 for ; Sun, 6 Mar 2022 15:42:34 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 081C2210FC; Sun, 6 Mar 2022 23:42:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1646610153; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ViWGm/depDGf235ykZcrvlwHPhzYZ1IOzNywgeuJlCs=; b=ULEa2lInaiA8E6FHFh/s+puLhZZMrIN0tJERmyPKYTRET5YsY1xbbYrlmTMpfDFct0KKsq 5Zehag3CjmDpbXiFpTU+5YpusfDqLy9x0Lwewx6EKT/uMuOtAeoo3eG+2/TdXTn8vkucvf c9xtpcNEfUAH6Kb4Id169AIhWQhdF/s= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1646610153; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ViWGm/depDGf235ykZcrvlwHPhzYZ1IOzNywgeuJlCs=; b=Lj/BMr3rs42wXMyn+3IzbHX6+uax6bjtoTq2FaCHoG2/1ER0JwI+cDNMKFK0U4pRSUtrVq /Cl6QP63jlWbuSCQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id ECD9E134CD; Sun, 6 Mar 2022 23:42:31 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id DgN0KudGJWIQWAAAMHmgww (envelope-from ); Sun, 06 Mar 2022 23:42:31 +0000 Subject: [PATCH 02/11] SUNRPC/call_alloc: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker Cc: linux-nfs@vger.kernel.org Date: Mon, 07 Mar 2022 10:41:44 +1100 Message-ID: <164661010494.31054.3655053845587236983.stgit@noble.brown> In-Reply-To: <164660993703.31054.17075972410545449555.stgit@noble.brown> References: <164660993703.31054.17075972410545449555.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. rpc_malloc() can block, and this might cause deadlocks. So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER() to determine if blocking is acceptable. Signed-off-by: NeilBrown --- net/sunrpc/sched.c | 4 +++- net/sunrpc/xprtrdma/transport.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index e2c835482791..d5b6e897f5a5 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -1023,8 +1023,10 @@ int rpc_malloc(struct rpc_task *task) struct rpc_buffer *buf; gfp_t gfp = GFP_NOFS; + if (RPC_IS_ASYNC(task)) + gfp = GFP_NOWAIT | __GFP_NOWARN; if (RPC_IS_SWAPPER(task)) - gfp = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN; + gfp |= __GFP_MEMALLOC; size += sizeof(struct rpc_buffer); if (size <= RPC_BUFFER_MAXSIZE) diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 42e375dbdadb..5714bf880e95 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -570,8 +570,10 @@ xprt_rdma_allocate(struct rpc_task *task) gfp_t flags; flags = RPCRDMA_DEF_GFP; + if (RPC_IS_ASYNC(task)) + flags = GFP_NOWAIT | __GFP_NOWARN; if (RPC_IS_SWAPPER(task)) - flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN; + flags |= __GFP_MEMALLOC; if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize, flags)) From patchwork Sun Mar 6 23:41:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12770997 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F655C433EF for ; Sun, 6 Mar 2022 23:42:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234453AbiCFXne (ORCPT ); Sun, 6 Mar 2022 18:43:34 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41642 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234455AbiCFXnd (ORCPT ); Sun, 6 Mar 2022 18:43:33 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5E652286DA for ; Sun, 6 Mar 2022 15:42:38 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 8AF801F38E; Sun, 6 Mar 2022 23:42:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1646610157; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Cfq4C5n0j+9ECeq3Bau2+AONhc0gqMkxIyut8YkPtic=; b=yTFwVs/KkS0Fsh8CBkYJRgMHAgUzTy7YrXIAbJdVXclA+XNOzg+rkTcA4mURoXtNX6hHn6 xTBdfexDU3dZn+R4WMriIRW0YBC7D9XhFHzcRCYShkByUQg89itj2eLMp/P1IV4d99rwz/ GydzKyED199WlsArRqffmBLgTZ3Knek= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1646610157; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Cfq4C5n0j+9ECeq3Bau2+AONhc0gqMkxIyut8YkPtic=; b=QsFdsEUxy8+TZVpBelK3g50cwRqLf//W6eOX3vevB5USPH4MbUhfx/qUkUE8KlDRRlgHoU ldeazv6Q1XUUH1AA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 6B80A134CD; Sun, 6 Mar 2022 23:42:36 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id H8DZCexGJWIXWAAAMHmgww (envelope-from ); Sun, 06 Mar 2022 23:42:36 +0000 Subject: [PATCH 03/11] SUNRPC/auth: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker Cc: linux-nfs@vger.kernel.org Date: Mon, 07 Mar 2022 10:41:44 +1100 Message-ID: <164661010495.31054.9419024462478476513.stgit@noble.brown> In-Reply-To: <164660993703.31054.17075972410545449555.stgit@noble.brown> References: <164660993703.31054.17075972410545449555.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. lookup_cred() can block on a mempool or kmalloc - and this can cause deadlocks. So add a new RPCAUTH_LOOKUP flag for async lookups and don't block on memory. If the -ENOMEM gets back to call_refreshresult(), wait a short while and try again. HZ>>4 is chosen as it is used elsewhere for -ENOMEM retries. Signed-off-by: NeilBrown --- include/linux/sunrpc/auth.h | 1 + net/sunrpc/auth.c | 6 +++++- net/sunrpc/auth_gss/auth_gss.c | 6 +++++- net/sunrpc/auth_unix.c | 10 ++++++++-- net/sunrpc/clnt.c | 3 +++ 5 files changed, 22 insertions(+), 4 deletions(-) diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h index 98da816b5fc2..3e6ce288a7fc 100644 --- a/include/linux/sunrpc/auth.h +++ b/include/linux/sunrpc/auth.h @@ -99,6 +99,7 @@ struct rpc_auth_create_args { /* Flags for rpcauth_lookupcred() */ #define RPCAUTH_LOOKUP_NEW 0x01 /* Accept an uninitialised cred */ +#define RPCAUTH_LOOKUP_ASYNC 0x02 /* Don't block waiting for memory */ /* * Client authentication ops diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index a9f0d17fdb0d..6bfa19f9fa6a 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -615,6 +615,8 @@ rpcauth_bind_root_cred(struct rpc_task *task, int lookupflags) }; struct rpc_cred *ret; + if (RPC_IS_ASYNC(task)) + lookupflags |= RPCAUTH_LOOKUP_ASYNC; ret = auth->au_ops->lookup_cred(auth, &acred, lookupflags); put_cred(acred.cred); return ret; @@ -631,6 +633,8 @@ rpcauth_bind_machine_cred(struct rpc_task *task, int lookupflags) if (!acred.principal) return NULL; + if (RPC_IS_ASYNC(task)) + lookupflags |= RPCAUTH_LOOKUP_ASYNC; return auth->au_ops->lookup_cred(auth, &acred, lookupflags); } @@ -654,7 +658,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags) }; if (flags & RPC_TASK_ASYNC) - lookupflags |= RPCAUTH_LOOKUP_NEW; + lookupflags |= RPCAUTH_LOOKUP_NEW | RPCAUTH_LOOKUP_ASYNC; if (task->tk_op_cred) /* Task must use exactly this rpc_cred */ new = get_rpccred(task->tk_op_cred); diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c index 5f42aa5fc612..df72d6301f78 100644 --- a/net/sunrpc/auth_gss/auth_gss.c +++ b/net/sunrpc/auth_gss/auth_gss.c @@ -1341,7 +1341,11 @@ gss_hash_cred(struct auth_cred *acred, unsigned int hashbits) static struct rpc_cred * gss_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) { - return rpcauth_lookup_credcache(auth, acred, flags, GFP_NOFS); + gfp_t gfp = GFP_NOFS; + + if (flags & RPCAUTH_LOOKUP_ASYNC) + gfp = GFP_NOWAIT | __GFP_NOWARN; + return rpcauth_lookup_credcache(auth, acred, flags, gfp); } static struct rpc_cred * diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c index e7df1f782b2e..e5819265dd1b 100644 --- a/net/sunrpc/auth_unix.c +++ b/net/sunrpc/auth_unix.c @@ -43,8 +43,14 @@ unx_destroy(struct rpc_auth *auth) static struct rpc_cred * unx_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags) { - struct rpc_cred *ret = mempool_alloc(unix_pool, GFP_NOFS); - + gfp_t gfp = GFP_NOFS; + struct rpc_cred *ret; + + if (flags & RPCAUTH_LOOKUP_ASYNC) + gfp = GFP_NOWAIT | __GFP_NOWARN; + ret = mempool_alloc(unix_pool, gfp); + if (!ret) + return ERR_PTR(-ENOMEM); rpcauth_init_cred(ret, acred, auth, &unix_credops); ret->cr_flags = 1UL << RPCAUTH_CRED_UPTODATE; return ret; diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index c83fe618767c..d1fb7c0c7685 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1745,6 +1745,9 @@ call_refreshresult(struct rpc_task *task) task->tk_cred_retry--; trace_rpc_retry_refresh_status(task); return; + case -ENOMEM: + rpc_delay(task, HZ >> 4); + return; } trace_rpc_refresh_status(task); rpc_call_rpcerror(task, status); From patchwork Sun Mar 6 23:41:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12770998 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D567CC433F5 for ; Sun, 6 Mar 2022 23:42:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234456AbiCFXno (ORCPT ); Sun, 6 Mar 2022 18:43:44 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42226 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234455AbiCFXnn (ORCPT ); Sun, 6 Mar 2022 18:43:43 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 243E53C484 for ; Sun, 6 Mar 2022 15:42:50 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id D7B761F38E; Sun, 6 Mar 2022 23:42:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1646610168; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WEfcbq1/ee9eOf+W9MRAhDMLLa4E0bAnyxCt3gnhbfg=; b=lnQ1/++XaR21/kGuzD76VNliRPQaTxjFYf/75Kx7IDdVlXAUEgC4a68wvnLxNyhZXdn+o+ mpyfLDU4ew0VMTKJoy2X6uZTsuopTZVyjI4B6ucHFoS2R+U4JhBWgeW4xy3SNoS+ZZmMut sJ1DsVnOp3FK5um1LuVb/oShsK1hJb4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1646610168; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WEfcbq1/ee9eOf+W9MRAhDMLLa4E0bAnyxCt3gnhbfg=; b=eZJCcONlhADOk+mgjSR2ycqQQ72QHL3h/muDiFqD+V2XiS0XM4O5ODXGmvCmwwURNuA5bO Exb1X2wFzRtQKpBA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id C8527134CD; Sun, 6 Mar 2022 23:42:47 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id gKCGIfdGJWIjWAAAMHmgww (envelope-from ); Sun, 06 Mar 2022 23:42:47 +0000 Subject: [PATCH 04/11] SUNRPC/xprt: async tasks mustn't block waiting for memory From: NeilBrown To: Trond Myklebust , Anna Schumaker Cc: linux-nfs@vger.kernel.org Date: Mon, 07 Mar 2022 10:41:44 +1100 Message-ID: <164661010495.31054.15493859799937151207.stgit@noble.brown> In-Reply-To: <164660993703.31054.17075972410545449555.stgit@noble.brown> References: <164660993703.31054.17075972410545449555.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. xprt_dynamic_alloc_slot can block indefinitely. This can tie up all workqueue threads and NFS can deadlock. So when called from a workqueue, set __GFP_NORETRY. The rdma alloc_slot already does not block. However it sets the error to -EAGAIN suggesting this will trigger a sleep. It does not. As we can see in call_reserveresult(), only -ENOMEM causes a sleep. -EAGAIN causes immediate retry. Signed-off-by: NeilBrown --- net/sunrpc/xprt.c | 5 ++++- net/sunrpc/xprtrdma/transport.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index a02de2bddb28..dbffb5147f16 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1687,12 +1687,15 @@ static bool xprt_throttle_congested(struct rpc_xprt *xprt, struct rpc_task *task static struct rpc_rqst *xprt_dynamic_alloc_slot(struct rpc_xprt *xprt) { struct rpc_rqst *req = ERR_PTR(-EAGAIN); + gfp_t gfp_mask = GFP_NOFS; if (xprt->num_reqs >= xprt->max_reqs) goto out; ++xprt->num_reqs; spin_unlock(&xprt->reserve_lock); - req = kzalloc(sizeof(struct rpc_rqst), GFP_NOFS); + if (current->flags & PF_WQ_WORKER) + gfp_mask |= __GFP_NORETRY | __GFP_NOWARN; + req = kzalloc(sizeof(*req), gfp_mask); spin_lock(&xprt->reserve_lock); if (req != NULL) goto out; diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 5714bf880e95..923e4b512ee9 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -517,7 +517,7 @@ xprt_rdma_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task) return; out_sleep: - task->tk_status = -EAGAIN; + task->tk_status = -ENOMEM; xprt_add_backlog(xprt, task); } From patchwork Sun Mar 6 23:41:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12770999 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 848BFC433F5 for ; Sun, 6 Mar 2022 23:42:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234457AbiCFXns (ORCPT ); Sun, 6 Mar 2022 18:43:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42560 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234455AbiCFXns (ORCPT ); Sun, 6 Mar 2022 18:43:48 -0500 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5190D3CFFC for ; Sun, 6 Mar 2022 15:42:55 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 11D49210FC; Sun, 6 Mar 2022 23:42:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1646610174; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=i04aAdU5/k6G66b0dL8F9Od5zxwdOZ/UHounogEVi8c=; b=Ow915jaTUIFVGccMC7Y/0YsVydtNGcfrbwfjX7mIAG/aZn6DfUqIBgB6fBe56Qig50du/H Y9R43vLa5TOKBdxIKEVEadan1qp8V2+arolrqj4LXIys8aA3T25Ye0Vlb91aLLK9ZjCPug An/Iq+EG8Fy0q07IZLmwv4uM+A28Z4g= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1646610174; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=i04aAdU5/k6G66b0dL8F9Od5zxwdOZ/UHounogEVi8c=; b=IuuXTVosi4r+bCwqS2MGldzaT8UuodfL+GTVYQGc1hh3zTc/LMO3C+gYW/IOfuEHoHG2r3 dmTrzmA+u7LuAFAg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id EBABE134CD; Sun, 6 Mar 2022 23:42:52 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 7tkiKvxGJWIpWAAAMHmgww (envelope-from ); Sun, 06 Mar 2022 23:42:52 +0000 Subject: [PATCH 05/11] SUNRPC: remove scheduling boost for "SWAPPER" tasks. From: NeilBrown To: Trond Myklebust , Anna Schumaker Cc: linux-nfs@vger.kernel.org Date: Mon, 07 Mar 2022 10:41:44 +1100 Message-ID: <164661010496.31054.6926046073292617693.stgit@noble.brown> In-Reply-To: <164660993703.31054.17075972410545449555.stgit@noble.brown> References: <164660993703.31054.17075972410545449555.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org Currently, tasks marked as "swapper" tasks get put to the front of non-priority rpc_queues, and are sorted earlier than non-swapper tasks on the transport's ->xmit_queue. This is pointless as currently *all* tasks for a mount that has swap enabled on *any* file are marked as "swapper" tasks. So the net result is that the non-priority rpc_queues are reverse-ordered (LIFO). This scheduling boost is not necessary to avoid deadlocks, and hurts fairness, so remove it. If there were a need to expedite some requests, the tk_priority mechanism is a more appropriate tool. Signed-off-by: NeilBrown --- net/sunrpc/sched.c | 7 ------- net/sunrpc/xprt.c | 11 ----------- 2 files changed, 18 deletions(-) diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index d5b6e897f5a5..256302bf6557 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -186,11 +186,6 @@ static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue, /* * Add new request to wait queue. - * - * Swapper tasks always get inserted at the head of the queue. - * This should avoid many nasty memory deadlocks and hopefully - * improve overall performance. - * Everyone else gets appended to the queue to ensure proper FIFO behavior. */ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, struct rpc_task *task, @@ -199,8 +194,6 @@ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue, INIT_LIST_HEAD(&task->u.tk_wait.timer_list); if (RPC_IS_PRIORITY(queue)) __rpc_add_wait_queue_priority(queue, task, queue_priority); - else if (RPC_IS_SWAPPER(task)) - list_add(&task->u.tk_wait.list, &queue->tasks[0]); else list_add_tail(&task->u.tk_wait.list, &queue->tasks[0]); task->tk_waitqueue = queue; diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index dbffb5147f16..671299542cab 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1354,17 +1354,6 @@ xprt_request_enqueue_transmit(struct rpc_task *task) INIT_LIST_HEAD(&req->rq_xmit2); goto out; } - } else if (RPC_IS_SWAPPER(task)) { - list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { - if (pos->rq_cong || pos->rq_bytes_sent) - continue; - if (RPC_IS_SWAPPER(pos->rq_task)) - continue; - /* Note: req is added _before_ pos */ - list_add_tail(&req->rq_xmit, &pos->rq_xmit); - INIT_LIST_HEAD(&req->rq_xmit2); - goto out; - } } else if (!req->rq_seqno) { list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) { if (pos->rq_task->tk_owner != task->tk_owner) From patchwork Sun Mar 6 23:41:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12771000 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 50C93C433F5 for ; Sun, 6 Mar 2022 23:43:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234462AbiCFXnz (ORCPT ); Sun, 6 Mar 2022 18:43:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42880 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234455AbiCFXny (ORCPT ); Sun, 6 Mar 2022 18:43:54 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 197173DA64 for ; Sun, 6 Mar 2022 15:43:01 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id CCF7E1F38E; Sun, 6 Mar 2022 23:42:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1646610179; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kwNXwknotKqfVX/1yCoJLtq/tjIcG1KH1vrhe4Is+lg=; b=RFduSWjugCbfTHSe2fd6Xd/5shZITs07+Fii0a+kNkmOxIqrFNM8a/iJca0FPPbB7Raaer BtKKI0kfrxq73Dtc6AsUMpDwTbWtSx4LTJjrUFDUjEa/uhUAViVGFkr7rB4UgBtaupdNmv 4GjqE/1CkqUPAW/3jU41XFbXdwJe7WU= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1646610179; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=kwNXwknotKqfVX/1yCoJLtq/tjIcG1KH1vrhe4Is+lg=; b=weOo9T6ON3PF/UxDdQdB/+oGeIzyTxkTKyE5/X2w/O+0omCu/nJdWlTR+8DIsvZyzhnkax uS4FQY8DpxRcloAw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 298D8134CD; Sun, 6 Mar 2022 23:42:57 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id BjZeNQFHJWIwWAAAMHmgww (envelope-from ); Sun, 06 Mar 2022 23:42:57 +0000 Subject: [PATCH 06/11] NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS From: NeilBrown To: Trond Myklebust , Anna Schumaker Cc: linux-nfs@vger.kernel.org Date: Mon, 07 Mar 2022 10:41:44 +1100 Message-ID: <164661010496.31054.12673026520424612494.stgit@noble.brown> In-Reply-To: <164660993703.31054.17075972410545449555.stgit@noble.brown> References: <164660993703.31054.17075972410545449555.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org NFS_RPC_SWAPFLAGS is only used for READ requests. It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to requests. This is not needed for swap READ - though it is for writes where it is set via a different mechanism. RPC_TASK_ROOTCREDS causes the 'machine' credential to be used. This is not needed as the root credential is saved when the swap file is opened, and this is used for all IO. So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of RPC_TASK_ROOTCREDS, that isn't needed either. Remove both. Signed-off-by: NeilBrown --- fs/nfs/read.c | 4 ---- include/linux/nfs_fs.h | 5 ----- include/linux/sunrpc/sched.h | 1 - include/trace/events/sunrpc.h | 1 - net/sunrpc/auth.c | 2 +- 5 files changed, 1 insertion(+), 12 deletions(-) diff --git a/fs/nfs/read.c b/fs/nfs/read.c index eb00229c1a50..cd797ce3a67c 100644 --- a/fs/nfs/read.c +++ b/fs/nfs/read.c @@ -194,10 +194,6 @@ static void nfs_initiate_read(struct nfs_pgio_header *hdr, const struct nfs_rpc_ops *rpc_ops, struct rpc_task_setup *task_setup_data, int how) { - struct inode *inode = hdr->inode; - int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0; - - task_setup_data->flags |= swap_flags; rpc_ops->read_setup(hdr, msg); trace_nfs_initiate_read(hdr); } diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 68f81d8d36de..582f2eedbf55 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -45,11 +45,6 @@ */ #define NFS_MAX_TRANSPORTS 16 -/* - * These are the default flags for swap requests - */ -#define NFS_RPC_SWAPFLAGS (RPC_TASK_SWAPPER|RPC_TASK_ROOTCREDS) - /* * Size of the NFS directory verifier */ diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index db964bb63912..56710f8056d3 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -124,7 +124,6 @@ struct rpc_task_setup { #define RPC_TASK_MOVEABLE 0x0004 /* nfs4.1+ rpc tasks */ #define RPC_TASK_NULLCREDS 0x0010 /* Use AUTH_NULL credential */ #define RPC_CALL_MAJORSEEN 0x0020 /* major timeout seen */ -#define RPC_TASK_ROOTCREDS 0x0040 /* force root creds */ #define RPC_TASK_DYNAMIC 0x0080 /* task was kmalloc'ed */ #define RPC_TASK_NO_ROUND_ROBIN 0x0100 /* send requests on "main" xprt */ #define RPC_TASK_SOFT 0x0200 /* Use soft timeouts */ diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h index 29982d60b68a..ac33892da411 100644 --- a/include/trace/events/sunrpc.h +++ b/include/trace/events/sunrpc.h @@ -311,7 +311,6 @@ TRACE_EVENT(rpc_request, { RPC_TASK_MOVEABLE, "MOVEABLE" }, \ { RPC_TASK_NULLCREDS, "NULLCREDS" }, \ { RPC_CALL_MAJORSEEN, "MAJORSEEN" }, \ - { RPC_TASK_ROOTCREDS, "ROOTCREDS" }, \ { RPC_TASK_DYNAMIC, "DYNAMIC" }, \ { RPC_TASK_NO_ROUND_ROBIN, "NO_ROUND_ROBIN" }, \ { RPC_TASK_SOFT, "SOFT" }, \ diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index 6bfa19f9fa6a..682fcd24bf43 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -670,7 +670,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags) /* If machine cred couldn't be bound, try a root cred */ if (new) ; - else if (cred == &machine_cred || (flags & RPC_TASK_ROOTCREDS)) + else if (cred == &machine_cred) new = rpcauth_bind_root_cred(task, lookupflags); else if (flags & RPC_TASK_NULLCREDS) new = authnull_ops.lookup_cred(NULL, NULL, 0); From patchwork Sun Mar 6 23:41:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12771001 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 033FEC433EF for ; Sun, 6 Mar 2022 23:43:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234458AbiCFXoA (ORCPT ); Sun, 6 Mar 2022 18:44:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43188 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234459AbiCFXoA (ORCPT ); Sun, 6 Mar 2022 18:44:00 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 063E83E0F3 for ; Sun, 6 Mar 2022 15:43:06 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id BA51B1F38E; Sun, 6 Mar 2022 23:43:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1646610184; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GGOQLgpIw2gxA5cK9Vw3SvY11RrqZK893a5MBJhzMIc=; b=ffUbLE3NzOpm9WIiG5Vnyb6nv4hw37JBP8ziGs/ueegV6jgt+ol/oug29S+8uxjCF9lqKq iEg/jjwElKZvUnb7NoTR61QfTAZ8yGS3inU4QCLZF4bG8A/eKg0Rga6uwgie5G+VGI/rAL 4k696+SAUGd6g3UvkAhTKX2dBMZ+V5o= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1646610184; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=GGOQLgpIw2gxA5cK9Vw3SvY11RrqZK893a5MBJhzMIc=; b=FAZmaNxUQfjEGI59knT8ofdlwuH2C+wGFYky0HwOFQ7TjCOYonk4aryEj8dSSyVh9uC/ce pTbG4VqVGDv+lFDg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 9BC97134CD; Sun, 6 Mar 2022 23:43:03 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id Bli0FQdHJWI3WAAAMHmgww (envelope-from ); Sun, 06 Mar 2022 23:43:03 +0000 Subject: [PATCH 07/11] SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC From: NeilBrown To: Trond Myklebust , Anna Schumaker Cc: linux-nfs@vger.kernel.org Date: Mon, 07 Mar 2022 10:41:44 +1100 Message-ID: <164661010497.31054.10842657745902318293.stgit@noble.brown> In-Reply-To: <164660993703.31054.17075972410545449555.stgit@noble.brown> References: <164660993703.31054.17075972410545449555.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org rpc tasks can be marked as RPC_TASK_SWAPPER. This causes GFP_MEMALLOC to be used for some allocations. This is needed in some cases, but not in all where it is currently provided, and in some where it isn't provided. Currently *all* tasks associated with a rpc_client on which swap is enabled get the flag and hence some GFP_MEMALLOC support. GFP_MEMALLOC is provided for ->buf_alloc() but only swap-writes need it. However xdr_alloc_bvec does not get GFP_MEMALLOC - though it often does need it. xdr_alloc_bvec is called while the XPRT_LOCK is held. If this blocks, then it blocks all other queued tasks. So this allocation needs GFP_MEMALLOC for *all* requests, not just writes, when the xprt is used for any swap writes. Similarly, if the transport is not connected, that will block all requests including swap writes, so memory allocations should get GFP_MEMALLOC if swap writes are possible. So with this patch: 1/ we ONLY set RPC_TASK_SWAPPER for swap writes. 2/ __rpc_execute() sets PF_MEMALLOC while handling any task with RPC_TASK_SWAPPER set, or when handling any task that holds the XPRT_LOCKED lock on an xprt used for swap. This removes the need for the RPC_IS_SWAPPER() test in ->buf_alloc handlers. 3/ xprt_prepare_transmit() sets PF_MEMALLOC after locking any task to a swapper xprt. __rpc_execute() will clear it. 3/ PF_MEMALLOC is set for all the connect workers. Reviewed-by: Chuck Lever (for xprtrdma parts) Signed-off-by: NeilBrown --- fs/nfs/write.c | 2 ++ net/sunrpc/clnt.c | 2 -- net/sunrpc/sched.c | 20 +++++++++++++++++--- net/sunrpc/xprt.c | 3 +++ net/sunrpc/xprtrdma/transport.c | 6 ++++-- net/sunrpc/xprtsock.c | 8 ++++++++ 6 files changed, 34 insertions(+), 7 deletions(-) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 987a187bd39a..9f7176745fef 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1409,6 +1409,8 @@ static void nfs_initiate_write(struct nfs_pgio_header *hdr, { int priority = flush_task_priority(how); + if (IS_SWAPFILE(hdr->inode)) + task_setup_data->flags |= RPC_TASK_SWAPPER; task_setup_data->priority = priority; rpc_ops->write_setup(hdr, msg, &task_setup_data->rpc_client); trace_nfs_initiate_write(hdr); diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index d1fb7c0c7685..842366a2fc57 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1085,8 +1085,6 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt) task->tk_flags |= RPC_TASK_TIMEOUT; if (clnt->cl_noretranstimeo) task->tk_flags |= RPC_TASK_NO_RETRANS_TIMEOUT; - if (atomic_read(&clnt->cl_swapper)) - task->tk_flags |= RPC_TASK_SWAPPER; /* Add to the client's list of all tasks */ spin_lock(&clnt->cl_lock); list_add_tail(&task->tk_task, &clnt->cl_tasks); diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 256302bf6557..9020cedb7c95 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -869,6 +869,15 @@ void rpc_release_calldata(const struct rpc_call_ops *ops, void *calldata) ops->rpc_release(calldata); } +static bool xprt_needs_memalloc(struct rpc_xprt *xprt, struct rpc_task *tk) +{ + if (!xprt) + return false; + if (!atomic_read(&xprt->swapper)) + return false; + return test_bit(XPRT_LOCKED, &xprt->state) && xprt->snd_task == tk; +} + /* * This is the RPC `scheduler' (or rather, the finite state machine). */ @@ -877,6 +886,7 @@ static void __rpc_execute(struct rpc_task *task) struct rpc_wait_queue *queue; int task_is_async = RPC_IS_ASYNC(task); int status = 0; + unsigned long pflags = current->flags; WARN_ON_ONCE(RPC_IS_QUEUED(task)); if (RPC_IS_QUEUED(task)) @@ -899,6 +909,10 @@ static void __rpc_execute(struct rpc_task *task) } if (!do_action) break; + if (RPC_IS_SWAPPER(task) || + xprt_needs_memalloc(task->tk_xprt, task)) + current->flags |= PF_MEMALLOC; + trace_rpc_task_run_action(task, do_action); do_action(task); @@ -936,7 +950,7 @@ static void __rpc_execute(struct rpc_task *task) rpc_clear_running(task); spin_unlock(&queue->lock); if (task_is_async) - return; + goto out; /* sync task: sleep here */ trace_rpc_task_sync_sleep(task, task->tk_action); @@ -960,6 +974,8 @@ static void __rpc_execute(struct rpc_task *task) /* Release all resources associated with the task */ rpc_release_task(task); +out: + current_restore_flags(pflags, PF_MEMALLOC); } /* @@ -1018,8 +1034,6 @@ int rpc_malloc(struct rpc_task *task) if (RPC_IS_ASYNC(task)) gfp = GFP_NOWAIT | __GFP_NOWARN; - if (RPC_IS_SWAPPER(task)) - gfp |= __GFP_MEMALLOC; size += sizeof(struct rpc_buffer); if (size <= RPC_BUFFER_MAXSIZE) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 671299542cab..e31b09dc1811 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1492,6 +1492,9 @@ bool xprt_prepare_transmit(struct rpc_task *task) return false; } + if (atomic_read(&xprt->swapper)) + /* This will be clear in __rpc_execute */ + current->flags |= PF_MEMALLOC; return true; } diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index 923e4b512ee9..6b7e10e5a141 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -235,8 +235,11 @@ xprt_rdma_connect_worker(struct work_struct *work) struct rpcrdma_xprt *r_xprt = container_of(work, struct rpcrdma_xprt, rx_connect_worker.work); struct rpc_xprt *xprt = &r_xprt->rx_xprt; + unsigned int pflags = current->flags; int rc; + if (atomic_read(&xprt->swapper)) + current->flags |= PF_MEMALLOC; rc = rpcrdma_xprt_connect(r_xprt); xprt_clear_connecting(xprt); if (!rc) { @@ -250,6 +253,7 @@ xprt_rdma_connect_worker(struct work_struct *work) rpcrdma_xprt_disconnect(r_xprt); xprt_unlock_connect(xprt, r_xprt); xprt_wake_pending_tasks(xprt, rc); + current_restore_flags(pflags, PF_MEMALLOC); } /** @@ -572,8 +576,6 @@ xprt_rdma_allocate(struct rpc_task *task) flags = RPCRDMA_DEF_GFP; if (RPC_IS_ASYNC(task)) flags = GFP_NOWAIT | __GFP_NOWARN; - if (RPC_IS_SWAPPER(task)) - flags |= __GFP_MEMALLOC; if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize, flags)) diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 0f39e08ee580..61d3293f1d68 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -2052,7 +2052,10 @@ static void xs_udp_setup_socket(struct work_struct *work) struct rpc_xprt *xprt = &transport->xprt; struct socket *sock; int status = -EIO; + unsigned int pflags = current->flags; + if (atomic_read(&xprt->swapper)) + current->flags |= PF_MEMALLOC; sock = xs_create_sock(xprt, transport, xs_addr(xprt)->sa_family, SOCK_DGRAM, IPPROTO_UDP, false); @@ -2072,6 +2075,7 @@ static void xs_udp_setup_socket(struct work_struct *work) xprt_clear_connecting(xprt); xprt_unlock_connect(xprt, transport); xprt_wake_pending_tasks(xprt, status); + current_restore_flags(pflags, PF_MEMALLOC); } /** @@ -2231,7 +2235,10 @@ static void xs_tcp_setup_socket(struct work_struct *work) struct socket *sock = transport->sock; struct rpc_xprt *xprt = &transport->xprt; int status; + unsigned int pflags = current->flags; + if (atomic_read(&xprt->swapper)) + current->flags |= PF_MEMALLOC; if (!sock) { sock = xs_create_sock(xprt, transport, xs_addr(xprt)->sa_family, SOCK_STREAM, @@ -2296,6 +2303,7 @@ static void xs_tcp_setup_socket(struct work_struct *work) xprt_clear_connecting(xprt); out_unlock: xprt_unlock_connect(xprt, transport); + current_restore_flags(pflags, PF_MEMALLOC); } /** From patchwork Sun Mar 6 23:41:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12771002 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9084BC433F5 for ; Sun, 6 Mar 2022 23:43:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234459AbiCFXoE (ORCPT ); Sun, 6 Mar 2022 18:44:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43362 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234455AbiCFXoD (ORCPT ); Sun, 6 Mar 2022 18:44:03 -0500 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 825E93EA9E for ; Sun, 6 Mar 2022 15:43:10 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 3C4A0210FF; Sun, 6 Mar 2022 23:43:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1646610189; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yZmSIr8ZLVyhZTWR/059cLHJTosbTOBWKNNzA+rztwo=; b=uSVF0aURja2cE16TbI51SF5+4dMLesvJmvWFtO6HlebfGnv1CJta1f6nYWuq40DpvWgEi/ ifAaxh/Y00bxH+mIIDDbCNApajNKUcIcOtUhh3KxmITNezKsKUKiagVbMnJKynEFrFvEKS WhQbMIMU0g6TyM/1FYr4ryL+GoK4EF4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1646610189; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yZmSIr8ZLVyhZTWR/059cLHJTosbTOBWKNNzA+rztwo=; b=uxXs1kn1oNXShqgVDojgPBuG3XapoZYGSNfZH97D/3QMwcH30OxKN8qEAWFttXQKbdPdOv tTBFh0ChN17c1GAg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 2CCC6134CD; Sun, 6 Mar 2022 23:43:07 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id Z6gpNwtHJWI9WAAAMHmgww (envelope-from ); Sun, 06 Mar 2022 23:43:07 +0000 Subject: [PATCH 08/11] NFSv4: keep state manager thread active if swap is enabled From: NeilBrown To: Trond Myklebust , Anna Schumaker Cc: linux-nfs@vger.kernel.org Date: Mon, 07 Mar 2022 10:41:44 +1100 Message-ID: <164661010497.31054.2736520975524167649.stgit@noble.brown> In-Reply-To: <164660993703.31054.17075972410545449555.stgit@noble.brown> References: <164660993703.31054.17075972410545449555.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org If we are swapping over NFSv4, we may not be able to allocate memory to start the state-manager thread at the time when we need it. So keep it always running when swap is enabled, and just signal it to start. This requires updating and testing the cl_swapper count on the root rpc_clnt after following all ->cl_parent links. Signed-off-by: NeilBrown --- fs/nfs/file.c | 15 ++++++++++++--- fs/nfs/nfs4_fs.h | 1 + fs/nfs/nfs4proc.c | 20 ++++++++++++++++++++ fs/nfs/nfs4state.c | 39 +++++++++++++++++++++++++++++++++------ include/linux/nfs_xdr.h | 2 ++ net/sunrpc/clnt.c | 2 ++ 6 files changed, 70 insertions(+), 9 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 901f316f084d..e82f78e256e5 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -483,8 +483,9 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, { unsigned long blocks; long long isize; - struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); - struct inode *inode = file->f_mapping->host; + struct inode *inode = file_inode(file); + struct rpc_clnt *clnt = NFS_CLIENT(inode); + struct nfs_client *cl = NFS_SERVER(inode)->nfs_client; spin_lock(&inode->i_lock); blocks = inode->i_blocks; @@ -497,14 +498,22 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file, *span = sis->pages; + + if (cl->rpc_ops->enable_swap) + cl->rpc_ops->enable_swap(inode); + return rpc_clnt_swap_activate(clnt); } static void nfs_swap_deactivate(struct file *file) { - struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host); + struct inode *inode = file_inode(file); + struct rpc_clnt *clnt = NFS_CLIENT(inode); + struct nfs_client *cl = NFS_SERVER(inode)->nfs_client; rpc_clnt_swap_deactivate(clnt); + if (cl->rpc_ops->disable_swap) + cl->rpc_ops->disable_swap(file_inode(file)); } const struct address_space_operations nfs_file_aops = { diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h index 84f39b6f1b1e..79df6e83881b 100644 --- a/fs/nfs/nfs4_fs.h +++ b/fs/nfs/nfs4_fs.h @@ -42,6 +42,7 @@ enum nfs4_client_state { NFS4CLNT_LEASE_MOVED, NFS4CLNT_DELEGATION_EXPIRED, NFS4CLNT_RUN_MANAGER, + NFS4CLNT_MANAGER_AVAILABLE, NFS4CLNT_RECALL_RUNNING, NFS4CLNT_RECALL_ANY_LAYOUT_READ, NFS4CLNT_RECALL_ANY_LAYOUT_RW, diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 0e0db6c27619..32c8e9951ce1 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -10461,6 +10461,24 @@ static ssize_t nfs4_listxattr(struct dentry *dentry, char *list, size_t size) return error + error2 + error3; } +static void nfs4_enable_swap(struct inode *inode) +{ + /* The state manager thread must always be running. + * It will notice the client is a swapper, and stay put. + */ + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client; + + nfs4_schedule_state_manager(clp); +} + +static void nfs4_disable_swap(struct inode *inode) +{ + /* The state manager thread will now exit once it is + * woken. + */ + wake_up_var(&NFS_SERVER(inode)->nfs_client->cl_state); +} + static const struct inode_operations nfs4_dir_inode_operations = { .create = nfs_create, .lookup = nfs_lookup, @@ -10538,6 +10556,8 @@ const struct nfs_rpc_ops nfs_v4_clientops = { .create_server = nfs4_create_server, .clone_server = nfs_clone_server, .discover_trunking = nfs4_discover_trunking, + .enable_swap = nfs4_enable_swap, + .disable_swap = nfs4_disable_swap, }; static const struct xattr_handler nfs4_xattr_nfs4_acl_handler = { diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index f5a62c0d999b..5dc52eefaffb 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -1205,10 +1205,17 @@ void nfs4_schedule_state_manager(struct nfs_client *clp) { struct task_struct *task; char buf[INET6_ADDRSTRLEN + sizeof("-manager") + 1]; + struct rpc_clnt *cl = clp->cl_rpcclient; + + while (cl != cl->cl_parent) + cl = cl->cl_parent; set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state); - if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0) + if (test_and_set_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state) != 0) { + wake_up_var(&clp->cl_state); return; + } + set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state); __module_get(THIS_MODULE); refcount_inc(&clp->cl_count); @@ -1224,6 +1231,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp) printk(KERN_ERR "%s: kthread_run: %ld\n", __func__, PTR_ERR(task)); nfs4_clear_state_manager_bit(clp); + clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state); nfs_put_client(clp); module_put(THIS_MODULE); } @@ -2669,11 +2677,8 @@ static void nfs4_state_manager(struct nfs_client *clp) clear_bit(NFS4CLNT_RECALL_RUNNING, &clp->cl_state); } - /* Did we race with an attempt to give us more work? */ - if (!test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)) - return; - if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0) - return; + return; + } while (refcount_read(&clp->cl_count) > 1 && !signalled()); goto out_drain; @@ -2693,9 +2698,31 @@ static void nfs4_state_manager(struct nfs_client *clp) static int nfs4_run_state_manager(void *ptr) { struct nfs_client *clp = ptr; + struct rpc_clnt *cl = clp->cl_rpcclient; + + while (cl != cl->cl_parent) + cl = cl->cl_parent; allow_signal(SIGKILL); +again: + set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state); nfs4_state_manager(clp); + if (atomic_read(&cl->cl_swapper)) { + wait_var_event_interruptible(&clp->cl_state, + test_bit(NFS4CLNT_RUN_MANAGER, + &clp->cl_state)); + if (atomic_read(&cl->cl_swapper) && + test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)) + goto again; + /* Either no longer a swapper, or were signalled */ + } + clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state); + + if (refcount_read(&clp->cl_count) > 1 && !signalled() && + test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) && + !test_and_set_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state)) + goto again; + nfs_put_client(clp); module_put_and_kthread_exit(0); return 0; diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 728cb0c1f0b6..6861ac8af808 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1798,6 +1798,8 @@ struct nfs_rpc_ops { struct nfs_server *(*clone_server)(struct nfs_server *, struct nfs_fh *, struct nfs_fattr *, rpc_authflavor_t); int (*discover_trunking)(struct nfs_server *, struct nfs_fh *); + void (*enable_swap)(struct inode *inode); + void (*disable_swap)(struct inode *inode); }; /* diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 842366a2fc57..04ccf6a06ca7 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -3069,6 +3069,8 @@ rpc_clnt_swap_activate_callback(struct rpc_clnt *clnt, int rpc_clnt_swap_activate(struct rpc_clnt *clnt) { + while (clnt != clnt->cl_parent) + clnt = clnt->cl_parent; if (atomic_inc_return(&clnt->cl_swapper) == 1) return rpc_clnt_iterate_for_each_xprt(clnt, rpc_clnt_swap_activate_callback, NULL); From patchwork Sun Mar 6 23:41:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12771003 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69CFBC433EF for ; Sun, 6 Mar 2022 23:43:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234466AbiCFXoM (ORCPT ); Sun, 6 Mar 2022 18:44:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43784 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234455AbiCFXoK (ORCPT ); Sun, 6 Mar 2022 18:44:10 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6200F3FBCD for ; Sun, 6 Mar 2022 15:43:17 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 22AFF1F37D; Sun, 6 Mar 2022 23:43:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1646610196; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CSKA1IHXo4VvQyQ5fH9dw1FECLVlW8bwiYH7bPlSBvI=; b=yH+RBocUV7uauCqG+iXhdj/WWHeSAeRMXuX/LuRFj/O9u+lSsU5SqsWSSEh4IiBZUd1Iu+ iigUk6pbBkYsfom8YLnSpEJnHeSlegWH9Xc2BPDXL5AFf0xqLzVY2vwZaQrNdG2TlDRHfG /fTJrE/XMalLqrH17ee9/3c107Pxe44= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1646610196; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CSKA1IHXo4VvQyQ5fH9dw1FECLVlW8bwiYH7bPlSBvI=; b=atBWnF8qKAxnVRncDkW/qCfU8IACYl9ibeg2uTQcgYcHgvrUxQvKPlnM+Yd9C5lVZKiV1l CsDAO/0TNf7OV+DA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 144BC134CD; Sun, 6 Mar 2022 23:43:14 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id UgkUMRJHJWJFWAAAMHmgww (envelope-from ); Sun, 06 Mar 2022 23:43:14 +0000 Subject: [PATCH 09/11] NFS: swap IO handling is slightly different for O_DIRECT IO From: NeilBrown To: Trond Myklebust , Anna Schumaker Cc: linux-nfs@vger.kernel.org Date: Mon, 07 Mar 2022 10:41:44 +1100 Message-ID: <164661010498.31054.17793708550077989003.stgit@noble.brown> In-Reply-To: <164660993703.31054.17075972410545449555.stgit@noble.brown> References: <164660993703.31054.17075972410545449555.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org 1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted. We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw() 2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw(). Signed-off-by: NeilBrown --- fs/nfs/direct.c | 42 ++++++++++++++++++++++++++++-------------- fs/nfs/file.c | 4 ++-- include/linux/nfs_fs.h | 8 ++++---- 3 files changed, 34 insertions(+), 20 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index eabfdab543c8..04aaf39a05cb 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -173,8 +173,8 @@ ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE); if (iov_iter_rw(iter) == READ) - return nfs_file_direct_read(iocb, iter); - return nfs_file_direct_write(iocb, iter); + return nfs_file_direct_read(iocb, iter, true); + return nfs_file_direct_write(iocb, iter, true); } static void nfs_direct_release_pages(struct page **pages, unsigned int npages) @@ -425,6 +425,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, * nfs_file_direct_read - file direct read operation for NFS files * @iocb: target I/O control block * @iter: vector of user buffers into which to read data + * @swap: flag indicating this is swap IO, not O_DIRECT IO * * We use this function for direct reads instead of calling * generic_file_aio_read() in order to avoid gfar's check to see if @@ -440,7 +441,8 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq, * client must read the updated atime from the server back into its * cache. */ -ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) +ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter, + bool swap) { struct file *file = iocb->ki_filp; struct address_space *mapping = file->f_mapping; @@ -482,12 +484,14 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter) if (iter_is_iovec(iter)) dreq->flags = NFS_ODIRECT_SHOULD_DIRTY; - nfs_start_io_direct(inode); + if (!swap) + nfs_start_io_direct(inode); NFS_I(inode)->read_io += count; requested = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos); - nfs_end_io_direct(inode); + if (!swap) + nfs_end_io_direct(inode); if (requested > 0) { result = nfs_direct_wait(dreq); @@ -876,6 +880,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, * nfs_file_direct_write - file direct write operation for NFS files * @iocb: target I/O control block * @iter: vector of user buffers from which to write data + * @swap: flag indicating this is swap IO, not O_DIRECT IO * * We use this function for direct writes instead of calling * generic_file_aio_write() in order to avoid taking the inode @@ -892,7 +897,8 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, * Note that O_APPEND is not supported for NFS direct writes, as there * is no atomic O_APPEND write facility in the NFS protocol. */ -ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) +ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, + bool swap) { ssize_t result, requested; size_t count; @@ -906,7 +912,11 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n", file, iov_iter_count(iter), (long long) iocb->ki_pos); - result = generic_write_checks(iocb, iter); + if (swap) + /* bypass generic checks */ + result = iov_iter_count(iter); + else + result = generic_write_checks(iocb, iter); if (result <= 0) return result; count = result; @@ -937,16 +947,20 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter) dreq->iocb = iocb; pnfs_init_ds_commit_info_ops(&dreq->ds_cinfo, inode); - nfs_start_io_direct(inode); + if (swap) { + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + } else { + nfs_start_io_direct(inode); - requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); - if (mapping->nrpages) { - invalidate_inode_pages2_range(mapping, - pos >> PAGE_SHIFT, end); - } + if (mapping->nrpages) { + invalidate_inode_pages2_range(mapping, + pos >> PAGE_SHIFT, end); + } - nfs_end_io_direct(inode); + nfs_end_io_direct(inode); + } if (requested > 0) { result = nfs_direct_wait(dreq); diff --git a/fs/nfs/file.c b/fs/nfs/file.c index e82f78e256e5..f2a32ef90e22 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -157,7 +157,7 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to) ssize_t result; if (iocb->ki_flags & IOCB_DIRECT) - return nfs_file_direct_read(iocb, to); + return nfs_file_direct_read(iocb, to, false); dprintk("NFS: read(%pD2, %zu@%lu)\n", iocb->ki_filp, @@ -623,7 +623,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) return result; if (iocb->ki_flags & IOCB_DIRECT) - return nfs_file_direct_write(iocb, from); + return nfs_file_direct_write(iocb, from, false); dprintk("NFS: write(%pD2, %zu@%Ld)\n", file, iov_iter_count(from), (long long) iocb->ki_pos); diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 582f2eedbf55..8215fca12e1a 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -508,10 +508,10 @@ static inline const struct cred *nfs_file_cred(struct file *file) * linux/fs/nfs/direct.c */ extern ssize_t nfs_direct_IO(struct kiocb *, struct iov_iter *); -extern ssize_t nfs_file_direct_read(struct kiocb *iocb, - struct iov_iter *iter); -extern ssize_t nfs_file_direct_write(struct kiocb *iocb, - struct iov_iter *iter); +ssize_t nfs_file_direct_read(struct kiocb *iocb, + struct iov_iter *iter, bool swap); +ssize_t nfs_file_direct_write(struct kiocb *iocb, + struct iov_iter *iter, bool swap); /* * linux/fs/nfs/dir.c From patchwork Sun Mar 6 23:41:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12771004 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 41AB1C433FE for ; Sun, 6 Mar 2022 23:43:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234465AbiCFXoR (ORCPT ); Sun, 6 Mar 2022 18:44:17 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44168 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234472AbiCFXoQ (ORCPT ); Sun, 6 Mar 2022 18:44:16 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DCA2D403D1 for ; Sun, 6 Mar 2022 15:43:22 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 9E2101F37D; Sun, 6 Mar 2022 23:43:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1646610201; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=P8Ex1C2tLGBY3G4iAkuioFe1tiAu7Sw2IQj0bqqMkqA=; b=I4NADb68BZnf7lUhlzMYISm227LyvRz/3ItYXuq1oatCTkc33NLqGZJ60uEB8loTchguuZ 6basXbVyL1PjbzOdIIY7FZ4Pj9xnuRhBCH+7s624uaJpYEi4byX+tDDartE+OV554VQLQe gJD1hnQQD6wHF09fO2YftsccHMK0ODM= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1646610201; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=P8Ex1C2tLGBY3G4iAkuioFe1tiAu7Sw2IQj0bqqMkqA=; b=2+1XboxeUYaPpfliJDdsioP5qzOfv7gyRcPt6JCBhvDRVIDhmWqIPp2H52nMj6DDtUFLiu GiCnSD90fbfyScCA== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 91F3E134CD; Sun, 6 Mar 2022 23:43:20 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id gMI7FBhHJWJJWAAAMHmgww (envelope-from ); Sun, 06 Mar 2022 23:43:20 +0000 Subject: [PATCH 10/11] NFS: swap-out must always use STABLE writes. From: NeilBrown To: Trond Myklebust , Anna Schumaker Cc: linux-nfs@vger.kernel.org Date: Mon, 07 Mar 2022 10:41:44 +1100 Message-ID: <164661010499.31054.4815913983715248940.stgit@noble.brown> In-Reply-To: <164660993703.31054.17075972410545449555.stgit@noble.brown> References: <164660993703.31054.17075972410545449555.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org The commit handling code is not safe against memory-pressure deadlocks when writing to swap. In particular, nfs_commitdata_alloc() blocks indefinitely waiting for memory, and this can consume all available workqueue threads. swap-out most likely uses STABLE writes anyway as COND_STABLE indicates that a stable write should be used if the write fits in a single request, and it normally does. However if we ever swap with a small wsize, or gather unusually large numbers of pages for a single write, this might change. For safety, make it explicit in the code that direct writes used for swap must always use FLUSH_STABLE. Signed-off-by: NeilBrown --- fs/nfs/direct.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index 04aaf39a05cb..11c566d8769f 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -794,7 +794,7 @@ static const struct nfs_pgio_completion_ops nfs_direct_write_completion_ops = { */ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, struct iov_iter *iter, - loff_t pos) + loff_t pos, int ioflags) { struct nfs_pageio_descriptor desc; struct inode *inode = dreq->inode; @@ -802,7 +802,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq, size_t requested_bytes = 0; size_t wsize = max_t(size_t, NFS_SERVER(inode)->wsize, PAGE_SIZE); - nfs_pageio_init_write(&desc, inode, FLUSH_COND_STABLE, false, + nfs_pageio_init_write(&desc, inode, ioflags, false, &nfs_direct_write_completion_ops); desc.pg_dreq = dreq; get_dreq(dreq); @@ -948,11 +948,13 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter, pnfs_init_ds_commit_info_ops(&dreq->ds_cinfo, inode); if (swap) { - requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, + FLUSH_STABLE); } else { nfs_start_io_direct(inode); - requested = nfs_direct_write_schedule_iovec(dreq, iter, pos); + requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, + FLUSH_COND_STABLE); if (mapping->nrpages) { invalidate_inode_pages2_range(mapping, From patchwork Sun Mar 6 23:41:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: NeilBrown X-Patchwork-Id: 12771005 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 56089C43217 for ; Sun, 6 Mar 2022 23:43:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234471AbiCFXo2 (ORCPT ); Sun, 6 Mar 2022 18:44:28 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44892 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234481AbiCFXo1 (ORCPT ); Sun, 6 Mar 2022 18:44:27 -0500 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ED2404198F for ; Sun, 6 Mar 2022 15:43:29 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 7547121111; Sun, 6 Mar 2022 23:43:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1646610208; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hRVIPm23nHnrI7WtoIgjrZUWa+2YaFb1uWafdgoA9iU=; b=P1khN2D5Si+pdXiOTkY7AffOEEGHB4eCjmyc+4gPbxnhr2n6b236zSF+2fljRAZi/svIB2 113c6SzCXxgsJIPpIlj/D2/m4jdP3EglfUsJ7inBjkO2r2wBjineTR0yN90bVzLepJLefg a05OkmS9KoM5uzoaRXjcfM23PhJNCZU= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1646610208; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hRVIPm23nHnrI7WtoIgjrZUWa+2YaFb1uWafdgoA9iU=; b=RyXhAuTtzDSOo5toNnzJomClbjAPpUYc/hsBYm4iI8zItfW9AZM1H5rVEj/0+Jwnjv4vKq UHRM+e8yE98MQtCQ== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 55E69134CD; Sun, 6 Mar 2022 23:43:27 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id sTKUBB9HJWJRWAAAMHmgww (envelope-from ); Sun, 06 Mar 2022 23:43:27 +0000 Subject: [PATCH 11/11] SUNRPC: change locking for xs_swap_enable/disable From: NeilBrown To: Trond Myklebust , Anna Schumaker Cc: linux-nfs@vger.kernel.org Date: Mon, 07 Mar 2022 10:41:45 +1100 Message-ID: <164661010499.31054.12748930938408125464.stgit@noble.brown> In-Reply-To: <164660993703.31054.17075972410545449555.stgit@noble.brown> References: <164660993703.31054.17075972410545449555.stgit@noble.brown> User-Agent: StGit/0.23 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org It is not in general safe to wait for XPRT_LOCKED to clear. A wakeup is only sent when - connection completes - sock close completes so during normal operations, this can wait indefinitely. The event we need to protect against is ->inet being set to NULL, and that happens under the recv_mutex lock. So drop the handlign of XPRT_LOCKED and use recv_mutex instead. Signed-off-by: NeilBrown --- net/sunrpc/xprtsock.c | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 61d3293f1d68..7e39f87cde2d 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -1936,9 +1936,9 @@ static void xs_local_connect(struct rpc_xprt *xprt, struct rpc_task *task) #if IS_ENABLED(CONFIG_SUNRPC_SWAP) /* - * Note that this should be called with XPRT_LOCKED held (or when we otherwise - * know that we have exclusive access to the socket), to guard against - * races with xs_reset_transport. + * Note that this should be called with XPRT_LOCKED held, or recv_mutex + * held, or when we otherwise know that we have exclusive access to the + * socket, to guard against races with xs_reset_transport. */ static void xs_set_memalloc(struct rpc_xprt *xprt) { @@ -1967,13 +1967,11 @@ xs_enable_swap(struct rpc_xprt *xprt) { struct sock_xprt *xs = container_of(xprt, struct sock_xprt, xprt); - if (atomic_inc_return(&xprt->swapper) != 1) - return 0; - if (wait_on_bit_lock(&xprt->state, XPRT_LOCKED, TASK_KILLABLE)) - return -ERESTARTSYS; - if (xs->inet) + mutex_lock(&xs->recv_mutex); + if (atomic_inc_return(&xprt->swapper) == 1 && + xs->inet) sk_set_memalloc(xs->inet); - xprt_release_xprt(xprt, NULL); + mutex_unlock(&xs->recv_mutex); return 0; } @@ -1989,13 +1987,11 @@ xs_disable_swap(struct rpc_xprt *xprt) { struct sock_xprt *xs = container_of(xprt, struct sock_xprt, xprt); - if (!atomic_dec_and_test(&xprt->swapper)) - return; - if (wait_on_bit_lock(&xprt->state, XPRT_LOCKED, TASK_KILLABLE)) - return; - if (xs->inet) + mutex_lock(&xs->recv_mutex); + if (atomic_dec_and_test(&xprt->swapper) && + xs->inet) sk_clear_memalloc(xs->inet); - xprt_release_xprt(xprt, NULL); + mutex_unlock(&xs->recv_mutex); } #else static void xs_set_memalloc(struct rpc_xprt *xprt)