From patchwork Mon Nov 30 22:03:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Trond Myklebust X-Patchwork-Id: 11941411 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9638EC63777 for ; Mon, 30 Nov 2020 22:04:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3FF402076C for ; Mon, 30 Nov 2020 22:04:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="KvuN/GqF" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727368AbgK3WED (ORCPT ); Mon, 30 Nov 2020 17:04:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33800 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725893AbgK3WED (ORCPT ); Mon, 30 Nov 2020 17:04:03 -0500 Received: from mail-qt1-x841.google.com (mail-qt1-x841.google.com [IPv6:2607:f8b0:4864:20::841]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4A564C0613D2 for ; Mon, 30 Nov 2020 14:03:23 -0800 (PST) Received: by mail-qt1-x841.google.com with SMTP id k4so4770761qtj.10 for ; Mon, 30 Nov 2020 14:03:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; 
bh=UWU7p+8Cw/6dq0HabsJ29Wcb6rXVv/mFnTwRIn/q/+8=; b=KvuN/GqFtomJ8cP0lqjfCRM1v6u2EB30ziTumvuIyuaRqghVkXy3I2GsXfdrnyAvjT 0BPCedZMzHcUSHHIfKT+MIOn55NeHu64+e3GoDaQ4+ff3/6qB+digTMRlo2GMeuliA5D nsqDhdtm8+F8RDzN9oxNlLqfLBnd49E3PN0GbpG9SpbD9zOs7rc/Glo2UoAvkuKq3n58 /eLdLeuQkaQfYaImPmu3VmgWR9usNaQIh9wUPOSzIicd3AgLumsE5b2ffVGwvraD6wQb wjM3kaGZmaUPREekKTPlGRBIS1T+keEcB4tj0mi6EI1VDWC9UK6PxfC/NtfT7vQMlaT9 XhSw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=UWU7p+8Cw/6dq0HabsJ29Wcb6rXVv/mFnTwRIn/q/+8=; b=S4hEKCYoGYCU3jHpM9oZs/SllFW+RCjkSZZZsBRDIUgcJPfhHODk28Xzbh2t2OAooo sbxJxB03+/FZTc5puj4m5Y+dlsrpTkAgJEssq7sATb5ZsiTlkBTjy0dVlGzCwzAEcHnI A1iW8FrhyE7YOX/o1RQc09vvSzJMPoiXVoM5HfdNy/6jXH0Vkq9CN6wP2G03iOg612LH o5fTzCmyOXT4p+htUQHtb0dmG1Fbj8LxcH3Lq7UTGei523QmOtS2duW5qmDqQz0TLbWi jJYMGvDoplIPg50uILZnuOVtmeXhc12YO64GVoSgUvwrB97uq7ldTVt8YBhPQgEf6fAp YRBg== X-Gm-Message-State: AOAM533+jYVBU5jZxrgj9U+OWcWjfE1CKBTig/Na8w57GrbHE/p52pA5 gHAhxPovMsxwHycHu2f2gg== X-Google-Smtp-Source: ABdhPJxQvIGGabKAnoPtMzZxw3uoEsVNzP3mBZ+5GzFFoXyUZG5cxOH76M5hO6xcwFDjaNahy30r1Q== X-Received: by 2002:ac8:3712:: with SMTP id o18mr24327066qtb.357.1606773802408; Mon, 30 Nov 2020 14:03:22 -0800 (PST) Received: from leira.hammer.space (c-68-36-133-222.hsd1.mi.comcast.net. [68.36.133.222]) by smtp.gmail.com with ESMTPSA id q62sm17642886qkf.86.2020.11.30.14.03.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Nov 2020 14:03:21 -0800 (PST) From: trondmy@gmail.com X-Google-Original-From: trond.myklebust@hammerspace.com To: "J. 
Bruce Fields", Chuck Lever
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH v2 1/6] nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
Date: Mon, 30 Nov 2020 17:03:14 -0500
Message-Id: <20201130220319.501064-2-trond.myklebust@hammerspace.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20201130220319.501064-1-trond.myklebust@hammerspace.com>
References: <20201130220319.501064-1-trond.myklebust@hammerspace.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-nfs@vger.kernel.org

From: Jeff Layton

With NFSv3, nfsd will always attempt to send along WCC data to the
client. This generally involves saving off the in-core inode information
prior to doing the operation on the given filehandle, and then issuing a
vfs_getattr to it after the op.

Some filesystems (particularly clustered or networked ones) have an
expensive ->getattr inode operation. Atomicity is also often difficult
or impossible to guarantee on such filesystems. For those, we're best
off not trying to provide WCC information to the client at all, and to
simply allow it to poll for that information as needed with a GETATTR
RPC.

This patch adds a new flags field to struct export_operations, and
defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
that nfsd should not attempt to provide WCC info in NFSv3 replies. It
also adds a blurb about the new flags field and flag to the exporting
documentation.

The server will now also skip collecting this information for NFSv2,
since that info is never used there anyway.

Note that this patch does not add this flag to any filesystem
export_operations structures. This was originally developed to allow
reexporting nfs via nfsd. That code is not (and may never be) suitable
for merging into mainline.

Other filesystems may want to consider enabling this flag too.
It's hard to tell however which ones have export operations to enable
export via knfsd and which ones mostly rely on them for
open-by-filehandle support, so I'm leaving that up to the individual
maintainers to decide. I am cc'ing the relevant lists for those
filesystems that I think may want to consider adding this though.

Cc: HPDD-discuss@lists.01.org
Cc: ceph-devel@vger.kernel.org
Cc: cluster-devel@redhat.com
Cc: fuse-devel@lists.sourceforge.net
Cc: ocfs2-devel@oss.oracle.com
Signed-off-by: Jeff Layton
Signed-off-by: Lance Shelton
Signed-off-by: Trond Myklebust
---
 Documentation/filesystems/nfs/exporting.rst | 27 +++++++++++++++++++++
 fs/nfs/export.c                             |  1 +
 fs/nfsd/nfs3xdr.c                           |  7 ++++--
 fs/nfsd/nfsfh.c                             | 14 +++++++++++
 fs/nfsd/nfsfh.h                             |  2 +-
 include/linux/exportfs.h                    |  2 ++
 6 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst
index 33d588a01ace..cbe542ad5233 100644
--- a/Documentation/filesystems/nfs/exporting.rst
+++ b/Documentation/filesystems/nfs/exporting.rst
@@ -154,6 +154,11 @@ struct which has the following members:
     to find potential names, and matches inode numbers to find the correct
     match.
 
+  flags
+    Some filesystems may need to be handled differently than others. The
+    export_operations struct also includes a flags field that allows the
+    filesystem to communicate such information to nfsd. See the Export
+    Operations Flags section below for more explanation.
 
 A filehandle fragment consists of an array of 1 or more 4byte words,
 together with a one byte "type".
@@ -163,3 +168,25 @@ generated by encode_fh, in which case it will have been padded with
 nuls.  Rather, the encode_fh routine should choose a "type" which
 indicates the decode_fh how much of the filehandle is valid, and how
 it should be interpreted.
+
+Export Operations Flags
+-----------------------
+In addition to the operation vector pointers, struct export_operations also
+contains a "flags" field that allows the filesystem to communicate to nfsd
+that it may want to do things differently when dealing with it. The
+following flags are defined:
+
+  EXPORT_OP_NOWCC - disable NFSv3 WCC attributes on this filesystem
+    RFC 1813 recommends that servers always send weak cache consistency
+    (WCC) data to the client after each operation. The server should
+    atomically collect attributes about the inode, do an operation on it,
+    and then collect the attributes afterward. This allows the client to
+    skip issuing GETATTRs in some situations but means that the server
+    is calling vfs_getattr for almost all RPCs. On some filesystems
+    (particularly those that are clustered or networked) this is expensive
+    and atomicity is difficult to guarantee. This flag indicates to nfsd
+    that it should skip providing WCC attributes to the client in NFSv3
+    replies when doing operations on this filesystem. Consider enabling
+    this on filesystems that have an expensive ->getattr inode operation,
+    or when atomicity between pre and post operation attribute collection
+    is impossible to guarantee.
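
For illustration only (not part of the patch): the per-request decision that this patch adds to nfsd_set_fh_dentry() can be modeled in userspace as below. This is a minimal sketch under stated assumptions. The helper name fh_skip_wcc() is hypothetical, and only the EXPORT_OP_NOWCC value is taken from the patch itself.

```c
#include <stdbool.h>

/* Value taken from the patch's include/linux/exportfs.h hunk. */
#define EXPORT_OP_NOWCC	(0x1)

/*
 * Hypothetical userspace model of the switch added to
 * nfsd_set_fh_dentry(): returns the value fh_no_wcc would end up with
 * for a given RPC version and set of export_operations flags.
 */
static bool fh_skip_wcc(unsigned int rq_vers, unsigned long export_flags)
{
	switch (rq_vers) {
	case 3:
		/* NFSv3: skip WCC only if the filesystem opted out. */
		return (export_flags & EXPORT_OP_NOWCC) != 0;
	case 2:
		/* NFSv2 replies never use WCC data, so always skip. */
		return true;
	default:
		/* Any other version is left untouched by this switch. */
		return false;
	}
}
```

The kernel code expresses the same thing with a fallthrough from case 3 into case 2. The sketch spells the cases out separately only to make the three outcomes easier to read.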
diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index 3430d6891e89..8f4c528865c5 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -171,4 +171,5 @@ const struct export_operations nfs_export_ops = {
 	.encode_fh = nfs_encode_fh,
 	.fh_to_dentry = nfs_fh_to_dentry,
 	.get_parent = nfs_get_parent,
+	.flags = EXPORT_OP_NOWCC,
 };
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index b0a53c857706..821db21ba072 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -206,7 +206,7 @@ static __be32 *
 encode_post_op_attr(struct svc_rqst *rqstp, __be32 *p, struct svc_fh *fhp)
 {
 	struct dentry *dentry = fhp->fh_dentry;
-	if (dentry && d_really_is_positive(dentry)) {
+	if (!fhp->fh_no_wcc && dentry && d_really_is_positive(dentry)) {
 		__be32 err;
 		struct kstat stat;
 
@@ -262,7 +262,7 @@ void fill_pre_wcc(struct svc_fh *fhp)
 	bool v4 = (fhp->fh_maxsize == NFS4_FHSIZE);
 	__be32 err;
 
-	if (fhp->fh_pre_saved)
+	if (fhp->fh_no_wcc || fhp->fh_pre_saved)
 		return;
 	inode = d_inode(fhp->fh_dentry);
 	err = fh_getattr(fhp, &stat);
@@ -290,6 +290,9 @@ void fill_post_wcc(struct svc_fh *fhp)
 	struct inode *inode = d_inode(fhp->fh_dentry);
 	__be32 err;
 
+	if (fhp->fh_no_wcc)
+		return;
+
 	if (fhp->fh_post_saved)
 		printk("nfsd: inode locked twice during operation.\n");
 
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index c81dbbad8792..0c2ee65e46f3 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -291,6 +291,16 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 
 	fhp->fh_dentry = dentry;
 	fhp->fh_export = exp;
+
+	switch (rqstp->rq_vers) {
+	case 3:
+		if (!(dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC))
+			break;
+		/* Fallthrough */
+	case 2:
+		fhp->fh_no_wcc = true;
+	}
+
 	return 0;
 out:
 	exp_put(exp);
@@ -559,6 +569,9 @@ fh_compose(struct svc_fh *fhp, struct svc_export *exp, struct dentry *dentry,
 	 */
 	set_version_and_fsid_type(fhp, exp, ref_fh);
 
+	/* If we have a ref_fh, then copy the fh_no_wcc setting from it. */
+	fhp->fh_no_wcc = ref_fh ? ref_fh->fh_no_wcc : false;
+
 	if (ref_fh == fhp)
 		fh_put(ref_fh);
 
@@ -662,6 +675,7 @@ fh_put(struct svc_fh *fhp)
 		exp_put(exp);
 		fhp->fh_export = NULL;
 	}
+	fhp->fh_no_wcc = false;
 	return;
 }
 
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 45bd776290d5..347d10aa6265 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -35,6 +35,7 @@ typedef struct svc_fh {
 
 	bool			fh_locked;	/* inode locked by us */
 	bool			fh_want_write;	/* remount protection taken */
+	bool			fh_no_wcc;	/* no wcc data needed */
 	int			fh_flags;	/* FH flags */
 #ifdef CONFIG_NFSD_V3
 	bool			fh_post_saved;	/* post-op attrs saved */
@@ -54,7 +55,6 @@ typedef struct svc_fh {
 	struct kstat		fh_post_attr;	/* full attrs after operation */
 	u64			fh_post_change; /* nfsv4 change; see above */
 #endif /* CONFIG_NFSD_V3 */
-
 } svc_fh;
 #define NFSD4_FH_FOREIGN (1<<0)
 #define SET_FH_FLAG(c, f) ((c)->fh_flags |= (f))
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 3ceb72b67a7a..e7de0103a32e 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -213,6 +213,8 @@ struct export_operations {
 			     bool write, u32 *device_generation);
 	int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
 			     int nr_iomaps, struct iattr *iattr);
+#define	EXPORT_OP_NOWCC		(0x1)	/* Don't collect wcc data for NFSv3 replies */
+	unsigned long	flags;
 };
 
 extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,

From patchwork Mon Nov 30 22:03:15 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Trond Myklebust
X-Patchwork-Id: 11941413
From: trondmy@gmail.com
X-Google-Original-From: trond.myklebust@hammerspace.com
To: "J. Bruce Fields", Chuck Lever
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH v2 2/6] nfsd: allow filesystems to opt out of subtree checking
Date: Mon, 30 Nov 2020 17:03:15 -0500
Message-Id: <20201130220319.501064-3-trond.myklebust@hammerspace.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20201130220319.501064-2-trond.myklebust@hammerspace.com>
References: <20201130220319.501064-1-trond.myklebust@hammerspace.com>
 <20201130220319.501064-2-trond.myklebust@hammerspace.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-nfs@vger.kernel.org

From: Jeff Layton

When we start allowing NFS to be reexported, then we have some problems
when it comes to subtree checking. In principle, we could allow it, but
it would mean encoding parent info in the filehandles and there may not
be enough space for that in a NFSv3 filehandle.

To enforce this at export upcall time, we add a new export_ops flag
that declares the filesystem ineligible for subtree checking.

Signed-off-by: Jeff Layton
Signed-off-by: Lance Shelton
Signed-off-by: Trond Myklebust
---
 Documentation/filesystems/nfs/exporting.rst | 12 ++++++++++++
 fs/nfs/export.c                             |  2 +-
 fs/nfsd/export.c                            |  6 ++++++
 include/linux/exportfs.h                    |  1 +
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst
index cbe542ad5233..960be64446cb 100644
--- a/Documentation/filesystems/nfs/exporting.rst
+++ b/Documentation/filesystems/nfs/exporting.rst
@@ -190,3 +190,15 @@ following flags are defined:
     this on filesystems that have an expensive ->getattr inode operation,
     or when atomicity between pre and post operation attribute collection
     is impossible to guarantee.
+
+  EXPORT_OP_NOSUBTREECHK - disallow subtree checking on this fs
+    Many NFS operations deal with filehandles, which the server must then
+    vet to ensure that they live inside of an exported tree. When the
+    export consists of an entire filesystem, this is trivial. nfsd can just
+    ensure that the filehandle lives on the filesystem. When only part of a
+    filesystem is exported however, then nfsd must walk the ancestors of the
+    inode to ensure that it's within an exported subtree. This is an
+    expensive operation and not all filesystems can support it properly.
+    This flag exempts the filesystem from subtree checking and causes
+    exportfs to get back an error if it tries to enable subtree checking
+    on it.
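
For illustration only (not part of the patch): the test that this patch adds to check_export() reduces to a two-flag predicate, modeled in userspace below. This is a minimal sketch under stated assumptions. The helper name check_subtree_flags() is hypothetical, EXPORT_OP_NOSUBTREECHK uses the value from the patch, and the NFSEXP_NOSUBTREECHECK value is an assumption made for the sketch rather than something this patch defines.

```c
#include <errno.h>

#define EXPORT_OP_NOSUBTREECHK	(0x2)	/* value from the patch */
#define NFSEXP_NOSUBTREECHECK	0x0400	/* assumed value, for this sketch only */

/*
 * Hypothetical userspace model of the check added to check_export():
 * reject the export when the filesystem cannot do subtree checking but
 * the export options did not disable it.
 */
static int check_subtree_flags(unsigned long export_op_flags, int nfsexp_flags)
{
	if ((export_op_flags & EXPORT_OP_NOSUBTREECHK) &&
	    !(nfsexp_flags & NFSEXP_NOSUBTREECHECK))
		return -EINVAL;	/* subtree checking requested but unsupported */
	return 0;
}
```

In practice this would mean that a filesystem setting the flag has to be exported with the no_subtree_check option, otherwise the export attempt fails.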
diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index 8f4c528865c5..b9ba306bf912 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -171,5 +171,5 @@ const struct export_operations nfs_export_ops = {
 	.encode_fh = nfs_encode_fh,
 	.fh_to_dentry = nfs_fh_to_dentry,
 	.get_parent = nfs_get_parent,
-	.flags = EXPORT_OP_NOWCC,
+	.flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK,
 };
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index 21e404e7cb68..81e7bb12aca6 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -408,6 +408,12 @@ static int check_export(struct inode *inode, int *flags, unsigned char *uuid)
 		return -EINVAL;
 	}
 
+	if (inode->i_sb->s_export_op->flags & EXPORT_OP_NOSUBTREECHK &&
+	    !(*flags & NFSEXP_NOSUBTREECHECK)) {
+		dprintk("%s: %s does not support subtree checking!\n",
+			__func__, inode->i_sb->s_type->name);
+		return -EINVAL;
+	}
 	return 0;
 
 }
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index e7de0103a32e..2fcbab0f6b61 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -214,6 +214,7 @@ struct export_operations {
 	int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
 			     int nr_iomaps, struct iattr *iattr);
 #define	EXPORT_OP_NOWCC		(0x1)	/* Don't collect wcc data for NFSv3 replies */
+#define	EXPORT_OP_NOSUBTREECHK	(0x2) /* Subtree checking is not supported! */
 	unsigned long	flags;
 };
 

From patchwork Mon Nov 30 22:03:16 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Trond Myklebust
X-Patchwork-Id: 11941415
From: trondmy@gmail.com
X-Google-Original-From: trond.myklebust@hammerspace.com
To: "J.
Bruce Fields", Chuck Lever
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH v2 3/6] nfsd: close cached files prior to a REMOVE or RENAME that would replace target
Date: Mon, 30 Nov 2020 17:03:16 -0500
Message-Id: <20201130220319.501064-4-trond.myklebust@hammerspace.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20201130220319.501064-3-trond.myklebust@hammerspace.com>
References: <20201130220319.501064-1-trond.myklebust@hammerspace.com>
 <20201130220319.501064-2-trond.myklebust@hammerspace.com>
 <20201130220319.501064-3-trond.myklebust@hammerspace.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-nfs@vger.kernel.org

From: Jeff Layton

It's not uncommon for some workloads to do a bunch of I/O to a file and
delete it just afterward. If knfsd has a cached open file however, then
the file may still be open when the dentry is unlinked. If the
underlying filesystem is nfs, then that could trigger it to do a
sillyrename.

On a REMOVE or RENAME, scan the nfsd_file cache for open files that
correspond to the inode, and proactively unhash and put their
references. This should prevent any delete-on-last-close activity from
occurring solely due to knfsd's open file cache.

This must be done synchronously though, so we use the variants that call
flush_delayed_fput. There are deadlock possibilities if you call
flush_delayed_fput while holding locks, however. In the case of
nfsd_rename, we don't even do the lookups of the dentries to be renamed
until we've locked for rename.

Once we've figured out what the target dentry is for a rename, check to
see whether there are cached open files associated with it. If there
are, then unwind all of the locking, close them all, and then reattempt
the rename.

None of this is really necessary for "typical" filesystems though. It's
mostly of use for NFS, so declare a new export op flag and use that to
determine whether to close the files beforehand.

Signed-off-by: Jeff Layton
Signed-off-by: Lance Shelton
Signed-off-by: Trond Myklebust
---
 Documentation/filesystems/nfs/exporting.rst | 13 +++++++++++++
 fs/nfs/export.c                             |  2 +-
 fs/nfsd/vfs.c                               | 16 +++++++++-------
 include/linux/exportfs.h                    |  5 +++--
 4 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/Documentation/filesystems/nfs/exporting.rst b/Documentation/filesystems/nfs/exporting.rst
index 960be64446cb..0e98edd353b5 100644
--- a/Documentation/filesystems/nfs/exporting.rst
+++ b/Documentation/filesystems/nfs/exporting.rst
@@ -202,3 +202,16 @@ following flags are defined:
     This flag exempts the filesystem from subtree checking and causes
     exportfs to get back an error if it tries to enable subtree checking
     on it.
+
+  EXPORT_OP_CLOSE_BEFORE_UNLINK - always close cached files before unlinking
+    On some exportable filesystems (such as NFS) unlinking a file that
+    is still open can cause a fair bit of extra work. For instance,
+    the NFS client will do a "sillyrename" to ensure that the file
+    sticks around while it's still open. When reexporting, that open
+    file is held by nfsd so we usually end up doing a sillyrename, and
+    then immediately deleting the sillyrenamed file just afterward when
+    the link count actually goes to zero. Sometimes this delete can race
+    with other operations (for instance an rmdir of the parent directory).
+    This flag causes nfsd to close any open files for this inode _before_
+    calling into the vfs to do an unlink or a rename that would replace
+    an existing file.
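
For illustration only (not part of the patch): the "unwind the locks, close the cached files, retry" control flow that nfsd_rename() follows can be modeled in userspace as below. This is a minimal sketch under stated assumptions. struct fake_target and fake_rename() are hypothetical stand-ins, the counters replace real locking and the nfsd file cache, and only the EXPORT_OP_CLOSE_BEFORE_UNLINK value comes from the patch.

```c
#include <stdbool.h>

#define EXPORT_OP_CLOSE_BEFORE_UNLINK	(0x4)	/* value from the patch */

/* Hypothetical stand-in for the state nfsd_rename() works with. */
struct fake_target {
	unsigned long export_flags;	/* models s_export_op->flags */
	int cached_opens;		/* files held open by the file cache */
	int lock_depth;			/* > 0 while the rename locks are held */
	int attempts;			/* how many times the locks were taken */
};

/* Returns 0 once the pretend rename has run, -1 on a locking bug. */
static int fake_rename(struct fake_target *t)
{
	bool close_cached = false;

retry:
	t->lock_depth++;		/* locking and lookups happen here */
	t->attempts++;

	if ((t->export_flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
	    t->cached_opens > 0)
		close_cached = true;	/* must not flush while locked */
	/* else: the real code would call vfs_rename() at this point */

	t->lock_depth--;		/* all locks are dropped again */

	if (close_cached) {
		if (t->lock_depth != 0)
			return -1;	/* flushing under a lock could deadlock */
		close_cached = false;
		t->cached_opens = 0;	/* models nfsd_close_cached_files() */
		goto retry;
	}
	return 0;
}
```

A target with the flag set and cached opens takes the locks twice, while a target without the flag goes through once and keeps its cached files, which matches the behaviour described in the commit message.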
diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index b9ba306bf912..5428713af5fe 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -171,5 +171,5 @@ const struct export_operations nfs_export_ops = {
 	.encode_fh = nfs_encode_fh,
 	.fh_to_dentry = nfs_fh_to_dentry,
 	.get_parent = nfs_get_parent,
-	.flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK,
+	.flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|EXPORT_OP_CLOSE_BEFORE_UNLINK,
 };
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 1ecaceebee13..79cba942087e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1724,7 +1724,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	struct inode	*fdir, *tdir;
 	__be32		err;
 	int		host_err;
-	bool		has_cached = false;
+	bool		close_cached = false;
 
 	err = fh_verify(rqstp, ffhp, S_IFDIR, NFSD_MAY_REMOVE);
 	if (err)
@@ -1783,8 +1783,9 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	if (ffhp->fh_export->ex_path.dentry != tfhp->fh_export->ex_path.dentry)
 		goto out_dput_new;
 
-	if (nfsd_has_cached_files(ndentry)) {
-		has_cached = true;
+	if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
+	    nfsd_has_cached_files(ndentry)) {
+		close_cached = true;
 		goto out_dput_old;
 	} else {
 		host_err = vfs_rename(fdir, odentry, tdir, ndentry, NULL, 0);
@@ -1805,7 +1806,7 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	 * as that would do the wrong thing if the two directories
 	 * were the same, so again we do it by hand.
 	 */
-	if (!has_cached) {
+	if (!close_cached) {
 		fill_post_wcc(ffhp);
 		fill_post_wcc(tfhp);
 	}
@@ -1819,8 +1820,8 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
 	 * shouldn't be done with locks held however, so we delay it until this
 	 * point and then reattempt the whole shebang.
 	 */
-	if (has_cached) {
-		has_cached = false;
+	if (close_cached) {
+		close_cached = false;
 		nfsd_close_cached_files(ndentry);
 		dput(ndentry);
 		goto retry;
@@ -1872,7 +1873,8 @@ nfsd_unlink(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
 		type = d_inode(rdentry)->i_mode & S_IFMT;
 
 	if (type != S_IFDIR) {
-		nfsd_close_cached_files(rdentry);
+		if (rdentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK)
+			nfsd_close_cached_files(rdentry);
 		host_err = vfs_unlink(dirp, rdentry, NULL);
 	} else {
 		host_err = vfs_rmdir(dirp, rdentry);
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 2fcbab0f6b61..d829403ffd3b 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -213,8 +213,9 @@ struct export_operations {
 			     bool write, u32 *device_generation);
 	int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
 			     int nr_iomaps, struct iattr *iattr);
-#define	EXPORT_OP_NOWCC		(0x1)	/* Don't collect wcc data for NFSv3 replies */
-#define	EXPORT_OP_NOSUBTREECHK	(0x2) /* Subtree checking is not supported! */
+#define	EXPORT_OP_NOWCC			(0x1) /* don't collect v3 wcc data */
+#define	EXPORT_OP_NOSUBTREECHK		(0x2) /* no subtree checking */
+#define	EXPORT_OP_CLOSE_BEFORE_UNLINK	(0x4) /* close files before unlink */
 	unsigned long	flags;
 };
 

From patchwork Mon Nov 30 22:03:17 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Trond Myklebust
X-Patchwork-Id: 11941417
From: trondmy@gmail.com
X-Google-Original-From: trond.myklebust@hammerspace.com
To: "J.
Bruce Fields" , Chuck Lever Cc: linux-nfs@vger.kernel.org Subject: [PATCH v2 4/6] exportfs: Add a function to return the raw output from fh_to_dentry() Date: Mon, 30 Nov 2020 17:03:17 -0500 Message-Id: <20201130220319.501064-5-trond.myklebust@hammerspace.com> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20201130220319.501064-4-trond.myklebust@hammerspace.com> References: <20201130220319.501064-1-trond.myklebust@hammerspace.com> <20201130220319.501064-2-trond.myklebust@hammerspace.com> <20201130220319.501064-3-trond.myklebust@hammerspace.com> <20201130220319.501064-4-trond.myklebust@hammerspace.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org From: Trond Myklebust In order to allow nfsd to accept return values that are not acceptable to overlayfs and others, add a new function. Signed-off-by: Trond Myklebust --- fs/exportfs/expfs.c | 32 ++++++++++++++++++++++++-------- include/linux/exportfs.h | 5 +++++ 2 files changed, 29 insertions(+), 8 deletions(-) diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c index 2dd55b172d57..0106eba46d5a 100644 --- a/fs/exportfs/expfs.c +++ b/fs/exportfs/expfs.c @@ -417,9 +417,11 @@ int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len, } EXPORT_SYMBOL_GPL(exportfs_encode_fh); -struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, - int fh_len, int fileid_type, - int (*acceptable)(void *, struct dentry *), void *context) +struct dentry * +exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len, + int fileid_type, + int (*acceptable)(void *, struct dentry *), + void *context) { const struct export_operations *nop = mnt->mnt_sb->s_export_op; struct dentry *result, *alias; @@ -432,10 +434,8 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, if (!nop || !nop->fh_to_dentry) return ERR_PTR(-ESTALE); result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type); - if (PTR_ERR(result) == -ENOMEM) - return 
ERR_CAST(result); if (IS_ERR_OR_NULL(result)) - return ERR_PTR(-ESTALE); + return result; /* * If no acceptance criteria was specified by caller, a disconnected @@ -561,10 +561,26 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, err_result: dput(result); - if (err != -ENOMEM) - err = -ESTALE; return ERR_PTR(err); } +EXPORT_SYMBOL_GPL(exportfs_decode_fh_raw); + +struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, + int fh_len, int fileid_type, + int (*acceptable)(void *, struct dentry *), + void *context) +{ + struct dentry *ret; + + ret = exportfs_decode_fh_raw(mnt, fid, fh_len, fileid_type, + acceptable, context); + if (IS_ERR_OR_NULL(ret)) { + if (ret == ERR_PTR(-ENOMEM)) + return ret; + return ERR_PTR(-ESTALE); + } + return ret; +} EXPORT_SYMBOL_GPL(exportfs_decode_fh); MODULE_LICENSE("GPL"); diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h index d829403ffd3b..846df3c96730 100644 --- a/include/linux/exportfs.h +++ b/include/linux/exportfs.h @@ -223,6 +223,11 @@ extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid, int *max_len, struct inode *parent); extern int exportfs_encode_fh(struct dentry *dentry, struct fid *fid, int *max_len, int connectable); +extern struct dentry *exportfs_decode_fh_raw(struct vfsmount *mnt, + struct fid *fid, int fh_len, + int fileid_type, + int (*acceptable)(void *, struct dentry *), + void *context); extern struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid, int fh_len, int fileid_type, int (*acceptable)(void *, struct dentry *), void *context); From patchwork Mon Nov 30 22:03:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Trond Myklebust X-Patchwork-Id: 11941419 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 
From: trondmy@gmail.com
X-Google-Original-From: trond.myklebust@hammerspace.com
To: "J. Bruce Fields", Chuck Lever
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH v2 5/6] nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE
Date: Mon, 30 Nov 2020 17:03:18 -0500
Message-Id: <20201130220319.501064-6-trond.myklebust@hammerspace.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20201130220319.501064-5-trond.myklebust@hammerspace.com>
References: <20201130220319.501064-1-trond.myklebust@hammerspace.com>
 <20201130220319.501064-2-trond.myklebust@hammerspace.com>
 <20201130220319.501064-3-trond.myklebust@hammerspace.com>
 <20201130220319.501064-4-trond.myklebust@hammerspace.com>
 <20201130220319.501064-5-trond.myklebust@hammerspace.com>
Precedence: bulk
X-Mailing-List: linux-nfs@vger.kernel.org

From: Trond Myklebust

If the underlying filesystem times out, then we want knfsd to return
NFSERR_JUKEBOX/DELAY rather than NFSERR_STALE.

Signed-off-by: Trond Myklebust
---
 fs/nfsd/nfsfh.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 0c2ee65e46f3..46c86f7bc429 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -268,12 +268,20 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	if (fileid_type == FILEID_ROOT)
 		dentry = dget(exp->ex_path.dentry);
 	else {
-		dentry = exportfs_decode_fh(exp->ex_path.mnt, fid,
-				data_left, fileid_type,
-				nfsd_acceptable, exp);
-		if (IS_ERR_OR_NULL(dentry))
+		dentry = exportfs_decode_fh_raw(exp->ex_path.mnt, fid,
+						data_left, fileid_type,
+						nfsd_acceptable, exp);
+		if (IS_ERR_OR_NULL(dentry)) {
 			trace_nfsd_set_fh_dentry_badhandle(rqstp, fhp,
 					dentry ? PTR_ERR(dentry) : -ESTALE);
+			switch (PTR_ERR(dentry)) {
+			case -ENOMEM:
+			case -ETIMEDOUT:
+				break;
+			default:
+				dentry = ERR_PTR(-ESTALE);
+			}
+		}
 	}
 	if (dentry == NULL)
 		goto out;

From patchwork Mon Nov 30 22:03:19 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Trond Myklebust
X-Patchwork-Id: 11941421
From: trondmy@gmail.com
X-Google-Original-From: trond.myklebust@hammerspace.com
To: "J. Bruce Fields", Chuck Lever
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH v2 6/6] nfsd: Set PF_LOCAL_THROTTLE on local filesystems only
Date: Mon, 30 Nov 2020 17:03:19 -0500
Message-Id: <20201130220319.501064-7-trond.myklebust@hammerspace.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20201130220319.501064-6-trond.myklebust@hammerspace.com>
References: <20201130220319.501064-1-trond.myklebust@hammerspace.com>
 <20201130220319.501064-2-trond.myklebust@hammerspace.com>
 <20201130220319.501064-3-trond.myklebust@hammerspace.com>
 <20201130220319.501064-4-trond.myklebust@hammerspace.com>
 <20201130220319.501064-5-trond.myklebust@hammerspace.com>
 <20201130220319.501064-6-trond.myklebust@hammerspace.com>
Precedence: bulk
X-Mailing-List: linux-nfs@vger.kernel.org

From: Trond Myklebust

Don't set PF_LOCAL_THROTTLE on remote filesystems like NFS, since they
aren't expected to ever be subject to double buffering.

Signed-off-by: Trond Myklebust
---
 fs/nfs/export.c          |  3 ++-
 fs/nfsd/vfs.c            | 13 +++++++++++--
 include/linux/exportfs.h |  1 +
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index 5428713af5fe..48b879cfe6e3 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -171,5 +171,6 @@ const struct export_operations nfs_export_ops = {
 	.encode_fh = nfs_encode_fh,
 	.fh_to_dentry = nfs_fh_to_dentry,
 	.get_parent = nfs_get_parent,
-	.flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|EXPORT_OP_CLOSE_BEFORE_UNLINK,
+	.flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|
+		EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS,
 };
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 79cba942087e..04937e51de56 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -978,18 +978,25 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
 				__be32 *verf)
 {
 	struct file *file = nf->nf_file;
+	struct super_block *sb = file_inode(file)->i_sb;
 	struct svc_export *exp;
 	struct iov_iter iter;
 	__be32 nfserr;
 	int host_err;
 	int use_wgather;
 	loff_t pos = offset;
+	unsigned long exp_op_flags = 0;
 	unsigned int pflags = current->flags;
 	rwf_t flags = 0;
+	bool restore_flags = false;
 
 	trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
 
-	if (test_bit(RQ_LOCAL, &rqstp->rq_flags))
+	if (sb->s_export_op)
+		exp_op_flags = sb->s_export_op->flags;
+
+	if (test_bit(RQ_LOCAL, &rqstp->rq_flags) &&
+	    !(exp_op_flags & EXPORT_OP_REMOTE_FS)) {
 		/*
 		 * We want throttling in balance_dirty_pages()
 		 * and shrink_inactive_list() to only consider
@@ -998,6 +1005,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
 		 * the client's dirty pages or its congested queue.
 		 */
 		current->flags |= PF_LOCAL_THROTTLE;
+		restore_flags = true;
+	}
 
 	exp = fhp->fh_export;
 	use_wgather = (rqstp->rq_vers == 2) && EX_WGATHER(exp);
@@ -1049,7 +1058,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
 		trace_nfsd_write_err(rqstp, fhp, offset, host_err);
 		nfserr = nfserrno(host_err);
 	}
-	if (test_bit(RQ_LOCAL, &rqstp->rq_flags))
+	if (restore_flags)
 		current_restore_flags(pflags, PF_LOCAL_THROTTLE);
 	return nfserr;
 }
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 846df3c96730..d93e8a6737bb 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -216,6 +216,7 @@ struct export_operations {
 #define EXPORT_OP_NOWCC (0x1) /* don't collect v3 wcc data */
 #define EXPORT_OP_NOSUBTREECHK (0x2) /* no subtree checking */
 #define EXPORT_OP_CLOSE_BEFORE_UNLINK (0x4) /* close files before unlink */
+#define EXPORT_OP_REMOTE_FS (0x8) /* Filesystem is remote */
 	unsigned long flags;
 };
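
For readers who want to try the throttling decision from patch 6/6 outside the kernel, the following is a minimal userspace sketch. It is not part of the series: `struct mock_export_ops` and `should_local_throttle()` are hypothetical names introduced only for illustration, and the `rq_local` argument stands in for `test_bit(RQ_LOCAL, &rqstp->rq_flags)`. Only the flag values are taken from the exportfs.h hunk above.

```c
#include <stdbool.h>

/* Flag values as they appear in the include/linux/exportfs.h hunk. */
#define EXPORT_OP_NOWCC                 (0x1) /* don't collect v3 wcc data */
#define EXPORT_OP_NOSUBTREECHK          (0x2) /* no subtree checking */
#define EXPORT_OP_CLOSE_BEFORE_UNLINK   (0x4) /* close files before unlink */
#define EXPORT_OP_REMOTE_FS             (0x8) /* Filesystem is remote */

/*
 * Hypothetical stand-in for struct export_operations; only the
 * flags member matters for this sketch.
 */
struct mock_export_ops {
	unsigned long flags;
};

/*
 * Hypothetical helper mirroring the condition in nfsd_vfs_write():
 * throttle locally only when the request is local AND the exporting
 * filesystem has not declared itself remote. A missing ops table is
 * treated as "no flags set", as in the patch.
 */
static inline bool should_local_throttle(bool rq_local,
					 const struct mock_export_ops *ops)
{
	unsigned long exp_op_flags = 0;

	if (ops)
		exp_op_flags = ops->flags;

	return rq_local && !(exp_op_flags & EXPORT_OP_REMOTE_FS);
}
```

In the patch itself the same result is recorded in `restore_flags`, so that `current_restore_flags()` runs only if `PF_LOCAL_THROTTLE` was actually set by this call.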