From patchwork Thu May 20 15:46:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg Kurz X-Patchwork-Id: 12270977 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B07B1C433B4 for ; Thu, 20 May 2021 15:47:15 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 978676108D for ; Thu, 20 May 2021 15:47:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238025AbhETPsf convert rfc822-to-8bit (ORCPT ); Thu, 20 May 2021 11:48:35 -0400 Received: from us-smtp-delivery-44.mimecast.com ([207.211.30.44]:30627 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236883AbhETPse (ORCPT ); Thu, 20 May 2021 11:48:34 -0400 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-525-1lUyBVChNhe7Z27CTXJhgg-1; Thu, 20 May 2021 11:47:09 -0400 X-MC-Unique: 1lUyBVChNhe7Z27CTXJhgg-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7E956107ACCA; Thu, 20 May 2021 15:47:07 +0000 (UTC) Received: from bahia.redhat.com (ovpn-112-99.ams2.redhat.com [10.36.112.99]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9FAE610016F4; Thu, 20 May 2021 15:47:04 +0000 (UTC) From: Greg Kurz To: Miklos Szeredi Cc: virtualization@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Max Reitz , Vivek Goyal , Greg Kurz Subject: [PATCH v4 1/5] fuse: Fix leak in fuse_dentry_automount() error path Date: Thu, 20 May 2021 17:46:50 +0200 Message-Id: <20210520154654.1791183-2-groug@kaod.org> In-Reply-To: <20210520154654.1791183-1-groug@kaod.org> References: <20210520154654.1791183-1-groug@kaod.org> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=groug@kaod.org X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: kaod.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Some rollback was forgotten during the addition of crossmounts. Fixes: bf109c64040f ("fuse: implement crossmounts") Cc: mreitz@redhat.com Signed-off-by: Greg Kurz --- fs/fuse/dir.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index 1b6c001a7dd1..fb2af70596c3 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -339,8 +339,11 @@ static struct vfsmount *fuse_dentry_automount(struct path *path) /* Initialize superblock, making @mp_fi its root */ err = fuse_fill_super_submount(sb, mp_fi); - if (err) + if (err) { + fuse_conn_put(fc); + kfree(fm); goto out_put_sb; + } sb->s_flags |= SB_ACTIVE; fsc->root = dget(sb->s_root); From patchwork Thu May 20 15:46:51 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg Kurz X-Patchwork-Id: 12270979 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 589D3C43462 for ; Thu, 20 May 2021 15:47:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3A7A2610A2 for ; Thu, 20 May 2021 15:47:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238189AbhETPsk convert rfc822-to-8bit (ORCPT ); Thu, 20 May 2021 11:48:40 -0400 Received: from us-smtp-delivery-44.mimecast.com ([207.211.30.44]:30205 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232575AbhETPsh (ORCPT ); Thu, 20 May 2021 11:48:37 -0400 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-533-3MzFbQv4MyK5fa_1GMvdAw-1; Thu, 20 May 2021 11:47:11 -0400 X-MC-Unique: 3MzFbQv4MyK5fa_1GMvdAw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 91082107ACE4; Thu, 20 May 2021 15:47:09 +0000 (UTC) Received: from bahia.redhat.com (ovpn-112-99.ams2.redhat.com [10.36.112.99]) by smtp.corp.redhat.com (Postfix) with ESMTP id C710810013C1; Thu, 20 May 2021 15:47:07 +0000 (UTC) From: Greg Kurz To: Miklos Szeredi Cc: virtualization@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Max Reitz , Vivek Goyal , Greg Kurz Subject: [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts Date: Thu, 20 May 2021 17:46:51 +0200 Message-Id: <20210520154654.1791183-3-groug@kaod.org> In-Reply-To: <20210520154654.1791183-1-groug@kaod.org> References: <20210520154654.1791183-1-groug@kaod.org> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: kaod.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org We don't set the SB_BORN flag on submounts superblocks. This is wrong as these superblocks are then considered as partially constructed or dying in the rest of the code and can break some assumptions. One such case is when you have a virtiofs filesystem and you try to mount it again : virtio_fs_get_tree() tries to obtain a superblock with sget_fc(). The matching criteria in virtio_fs_test_super() is the pointer of the underlying virtiofs device, which is shared by the root mount and its submounts. This means that any submount can be picked up instead of the root mount. This is itself a bug : submounts should be ignored in this case. But, most importantly, it then triggers an infinite loop in sget_fc() because it fails to grab the superblock (very easy to reproduce). The only viable solution is to set SB_BORN at some point. This must be done with vfs_get_tree() because setting SB_BORN requires special care, i.e. a memory barrier for super_cache_count() which can check SB_BORN without taking any lock. This requires to split out some code from fuse_dentry_automount() to a new dedicated fuse_get_tree_submount(). The fs_private field of the filesystem context isn't used with submounts : hijack it to pass the FUSE inode of the mount point down to fuse_get_tree_submount(). Finally, adapt virtiofs to use this. Signed-off-by: Greg Kurz Reported-by: kernel test robot Reported-by: kernel test robot --- fs/fuse/dir.c | 48 +++++++++++---------------------------------- fs/fuse/fuse_i.h | 6 ++++++ fs/fuse/inode.c | 43 ++++++++++++++++++++++++++++++++++++++++ fs/fuse/virtio_fs.c | 3 +++ 4 files changed, 63 insertions(+), 37 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index fb2af70596c3..4c8dafe4f69e 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -309,12 +309,9 @@ static int fuse_dentry_delete(const struct dentry *dentry) static struct vfsmount *fuse_dentry_automount(struct path *path) { struct fs_context *fsc; - struct fuse_mount *parent_fm = get_fuse_mount_super(path->mnt->mnt_sb); - struct fuse_conn *fc = parent_fm->fc; struct fuse_mount *fm; struct vfsmount *mnt; struct fuse_inode *mp_fi = get_fuse_inode(d_inode(path->dentry)); - struct super_block *sb; int err; fsc = fs_context_for_submount(path->mnt->mnt_sb->s_type, path->dentry); @@ -323,36 +320,19 @@ static struct vfsmount *fuse_dentry_automount(struct path *path) goto out; } - err = -ENOMEM; - fm = kzalloc(sizeof(struct fuse_mount), GFP_KERNEL); - if (!fm) + /* + * Hijack fsc->fs_private to pass the mount point inode to + * fuse_get_tree_submount(). It *must* be NULLified afterwards + * to avoid the inode pointer to be passed to kfree() when + * the context gets freed. + */ + fsc->fs_private = mp_fi; + err = vfs_get_tree(fsc); + fsc->fs_private = NULL; + if (err) goto out_put_fsc; - fsc->s_fs_info = fm; - sb = sget_fc(fsc, NULL, set_anon_super_fc); - if (IS_ERR(sb)) { - err = PTR_ERR(sb); - kfree(fm); - goto out_put_fsc; - } - fm->fc = fuse_conn_get(fc); - - /* Initialize superblock, making @mp_fi its root */ - err = fuse_fill_super_submount(sb, mp_fi); - if (err) { - fuse_conn_put(fc); - kfree(fm); - goto out_put_sb; - } - - sb->s_flags |= SB_ACTIVE; - fsc->root = dget(sb->s_root); - /* We are done configuring the superblock, so unlock it */ - up_write(&sb->s_umount); - - down_write(&fc->killsb); - list_add_tail(&fm->fc_entry, &fc->mounts); - up_write(&fc->killsb); + fm = get_fuse_mount_super(fsc->root->d_sb); /* Create the submount */ mnt = vfs_create_mount(fsc); @@ -364,12 +344,6 @@ static struct vfsmount *fuse_dentry_automount(struct path *path) put_fs_context(fsc); return mnt; -out_put_sb: - /* - * Only jump here when fsc->root is NULL and sb is still locked - * (otherwise put_fs_context() will put the superblock) - */ - deactivate_locked_super(sb); out_put_fsc: put_fs_context(fsc); out: diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 7e463e220053..d7fcf59a6a0e 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1090,6 +1090,12 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx); int fuse_fill_super_submount(struct super_block *sb, struct fuse_inode *parent_fi); +/* + * Get the mountable root for the submount + * @fsc: superblock configuration context + */ +int fuse_get_tree_submount(struct fs_context *fsc); + /* * Remove the mount from the connection * diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 393e36b74dc4..74e5205f203c 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1313,6 +1313,49 @@ int fuse_fill_super_submount(struct super_block *sb, return 0; } +/* Filesystem context private data holds the FUSE inode of the mount point */ +int fuse_get_tree_submount(struct fs_context *fsc) +{ + struct fuse_mount *fm; + struct fuse_inode *mp_fi = fsc->fs_private; + struct fuse_conn *fc = get_fuse_conn(&mp_fi->inode); + struct super_block *sb; + int err; + + fm = kzalloc(sizeof(struct fuse_mount), GFP_KERNEL); + if (!fm) + return -ENOMEM; + + fsc->s_fs_info = fm; + sb = sget_fc(fsc, NULL, set_anon_super_fc); + if (IS_ERR(sb)) { + kfree(fm); + return PTR_ERR(sb); + } + fm->fc = fuse_conn_get(fc); + + /* Initialize superblock, making @mp_fi its root */ + err = fuse_fill_super_submount(sb, mp_fi); + if (err) { + fuse_conn_put(fc); + deactivate_locked_super(sb); + kfree(fm); + return err; + } + + sb->s_flags |= SB_ACTIVE; + fsc->root = dget(sb->s_root); + /* We are done configuring the superblock, so unlock it */ + up_write(&sb->s_umount); + + down_write(&fc->killsb); + list_add_tail(&fm->fc_entry, &fc->mounts); + up_write(&fc->killsb); + + return 0; +} +EXPORT_SYMBOL_GPL(fuse_get_tree_submount); + int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx) { struct fuse_dev *fud = NULL; diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index bcb8a02e2d8b..e12e5190352c 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -1420,6 +1420,9 @@ static int virtio_fs_get_tree(struct fs_context *fsc) unsigned int virtqueue_size; int err = -EIO; + if (fsc->purpose == FS_CONTEXT_FOR_SUBMOUNT) + return fuse_get_tree_submount(fsc); + /* This gets a reference on virtio_fs object. This ptr gets installed * in fc->iq->priv. Once fuse_conn is going away, it calls ->put() * to drop the reference to this object. From patchwork Thu May 20 15:46:52 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg Kurz X-Patchwork-Id: 12270981 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 16296C433ED for ; Thu, 20 May 2021 15:47:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EF4BE6108D for ; Thu, 20 May 2021 15:47:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238572AbhETPsm convert rfc822-to-8bit (ORCPT ); Thu, 20 May 2021 11:48:42 -0400 Received: from us-smtp-delivery-44.mimecast.com ([205.139.111.44]:43630 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232553AbhETPsi (ORCPT ); Thu, 20 May 2021 11:48:38 -0400 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-240-OUwm_plVM2qSqXcyk3-WTg-1; Thu, 20 May 2021 11:47:12 -0400 X-MC-Unique: OUwm_plVM2qSqXcyk3-WTg-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id A77D3805F03; Thu, 20 May 2021 15:47:11 +0000 (UTC) Received: from bahia.redhat.com (ovpn-112-99.ams2.redhat.com [10.36.112.99]) by smtp.corp.redhat.com (Postfix) with ESMTP id D969A10013C1; Thu, 20 May 2021 15:47:09 +0000 (UTC) From: Greg Kurz To: Miklos Szeredi Cc: virtualization@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Max Reitz , Vivek Goyal , Greg Kurz Subject: [PATCH v4 3/5] fuse: Make fuse_fill_super_submount() static Date: Thu, 20 May 2021 17:46:52 +0200 Message-Id: <20210520154654.1791183-4-groug@kaod.org> In-Reply-To: <20210520154654.1791183-1-groug@kaod.org> References: <20210520154654.1791183-1-groug@kaod.org> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=groug@kaod.org X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: kaod.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org This function used to be called from fuse_dentry_automount(). This code was moved to fuse_get_tree_submount() in the same file since then. It is unlikely there will ever be another user. No need to be extern in this case. Signed-off-by: Greg Kurz --- fs/fuse/fuse_i.h | 9 --------- fs/fuse/inode.c | 4 ++-- 2 files changed, 2 insertions(+), 11 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index d7fcf59a6a0e..e2f5c8617e0d 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1081,15 +1081,6 @@ void fuse_send_init(struct fuse_mount *fm); */ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx); -/* - * Fill in superblock for submounts - * @sb: partially-initialized superblock to fill in - * @parent_fi: The fuse_inode of the parent filesystem where this submount is - * mounted - */ -int fuse_fill_super_submount(struct super_block *sb, - struct fuse_inode *parent_fi); - /* * Get the mountable root for the submount * @fsc: superblock configuration context diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 74e5205f203c..123b53d1c3c6 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1275,8 +1275,8 @@ static void fuse_sb_defaults(struct super_block *sb) sb->s_xattr = fuse_no_acl_xattr_handlers; } -int fuse_fill_super_submount(struct super_block *sb, - struct fuse_inode *parent_fi) +static int fuse_fill_super_submount(struct super_block *sb, + struct fuse_inode *parent_fi) { struct fuse_mount *fm = get_fuse_mount_super(sb); struct super_block *parent_sb = parent_fi->inode.i_sb; From patchwork Thu May 20 15:46:53 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg Kurz X-Patchwork-Id: 12270983 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A96CFC433ED for ; Thu, 20 May 2021 15:47:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8E76C6108D for ; Thu, 20 May 2021 15:47:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239066AbhETPsr convert rfc822-to-8bit (ORCPT ); Thu, 20 May 2021 11:48:47 -0400 Received: from us-smtp-delivery-44.mimecast.com ([207.211.30.44]:21655 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233137AbhETPsl (ORCPT ); Thu, 20 May 2021 11:48:41 -0400 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-331-J5b6BXnVM5CEOh3kl5xPKw-1; Thu, 20 May 2021 11:47:15 -0400 X-MC-Unique: J5b6BXnVM5CEOh3kl5xPKw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B97B21854E21; Thu, 20 May 2021 15:47:13 +0000 (UTC) Received: from bahia.redhat.com (ovpn-112-99.ams2.redhat.com [10.36.112.99]) by smtp.corp.redhat.com (Postfix) with ESMTP id EEE8110013C1; Thu, 20 May 2021 15:47:11 +0000 (UTC) From: Greg Kurz To: Miklos Szeredi Cc: virtualization@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Max Reitz , Vivek Goyal , Greg Kurz Subject: [PATCH v4 4/5] virtiofs: Skip submounts in sget_fc() Date: Thu, 20 May 2021 17:46:53 +0200 Message-Id: <20210520154654.1791183-5-groug@kaod.org> In-Reply-To: <20210520154654.1791183-1-groug@kaod.org> References: <20210520154654.1791183-1-groug@kaod.org> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=groug@kaod.org X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: kaod.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org All submounts share the same virtio-fs device instance as the root mount. If the same virtiofs filesystem is mounted again, sget_fc() is likely to pick up any of these submounts and reuse it instead of the root mount. On the server side: # mkdir ${some_dir} # mkdir ${some_dir}/mnt1 # mount -t tmpfs none ${some_dir}/mnt1 # touch ${some_dir}/mnt1/THIS_IS_MNT1 # mkdir ${some_dir}/mnt2 # mount -t tmpfs none ${some_dir}/mnt2 # touch ${some_dir}/mnt2/THIS_IS_MNT2 On the client side: # mkdir /mnt/virtiofs1 # mount -t virtiofs myfs /mnt/virtiofs1 # ls /mnt/virtiofs1 mnt1 mnt2 # grep virtiofs /proc/mounts myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0 none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel) none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel) And now remount it again: # mount -t virtiofs myfs /mnt/virtiofs2 # grep virtiofs /proc/mounts myfs /mnt/virtiofs1 virtiofs rw,seclabel,relatime 0 0 none on /mnt/mnt1 type virtiofs (rw,relatime,seclabel) none on /mnt/mnt2 type virtiofs (rw,relatime,seclabel) myfs /mnt/virtiofs2 virtiofs rw,seclabel,relatime 0 0 # ls /mnt/virtiofs2 THIS_IS_MNT2 Submount mnt2 was picked-up instead of the root mount. Just skip submounts in virtio_fs_test_super(). Signed-off-by: Greg Kurz --- fs/fuse/virtio_fs.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index e12e5190352c..8962cd033016 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -1408,6 +1408,11 @@ static int virtio_fs_test_super(struct super_block *sb, struct fuse_mount *fsc_fm = fsc->s_fs_info; struct fuse_mount *sb_fm = get_fuse_mount_super(sb); + + /* Skip submounts */ + if (!list_is_first(&sb_fm->fc_entry, &sb_fm->fc->mounts)) + return 0; + return fsc_fm->fc->iq.priv == sb_fm->fc->iq.priv; } From patchwork Thu May 20 15:46:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Greg Kurz X-Patchwork-Id: 12270985 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DA8FAC43460 for ; Thu, 20 May 2021 15:47:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C24BE610A2 for ; Thu, 20 May 2021 15:47:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239049AbhETPsz convert rfc822-to-8bit (ORCPT ); Thu, 20 May 2021 11:48:55 -0400 Received: from us-smtp-delivery-44.mimecast.com ([205.139.111.44]:50146 "EHLO us-smtp-delivery-44.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239128AbhETPss (ORCPT ); Thu, 20 May 2021 11:48:48 -0400 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-270-7Q-FYw8yOc2bvCh4Mw5zbw-1; Thu, 20 May 2021 11:47:20 -0400 X-MC-Unique: 7Q-FYw8yOc2bvCh4Mw5zbw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 736CD802690; Thu, 20 May 2021 15:47:19 +0000 (UTC) Received: from bahia.redhat.com (ovpn-112-99.ams2.redhat.com [10.36.112.99]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0E92E10013C1; Thu, 20 May 2021 15:47:13 +0000 (UTC) From: Greg Kurz To: Miklos Szeredi Cc: virtualization@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, virtio-fs@redhat.com, Stefan Hajnoczi , Max Reitz , Vivek Goyal , Greg Kurz , Robert Krawitz Subject: [PATCH v4 5/5] virtiofs: propagate sync() to file server Date: Thu, 20 May 2021 17:46:54 +0200 Message-Id: <20210520154654.1791183-6-groug@kaod.org> In-Reply-To: <20210520154654.1791183-1-groug@kaod.org> References: <20210520154654.1791183-1-groug@kaod.org> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=groug@kaod.org X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: kaod.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Even if POSIX doesn't mandate it, linux users legitimately expect sync() to flush all data and metadata to physical storage when it is located on the same system. This isn't happening with virtiofs though : sync() inside the guest returns right away even though data still needs to be flushed from the host page cache. This is easily demonstrated by doing the following in the guest: $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync 5120+0 records in 5120+0 records out 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s sync() = 0 <0.024068> +++ exited with 0 +++ and start the following in the host when the 'dd' command completes in the guest: $ strace -T -e fsync /usr/bin/sync virtiofs/foo fsync(3) = 0 <10.371640> +++ exited with 0 +++ There are no good reasons not to honor the expected behavior of sync() actually : it gives an unrealistic impression that virtiofs is super fast and that data has safely landed on HW, which isn't the case obviously. Implement a ->sync_fs() superblock operation that sends a new FUSE_SYNCFS request type for this purpose. Provision a 64-bit placeholder for possible future extensions. Since the file server cannot handle the wait == 0 case, we skip it to avoid a gratuitous roundtrip. Note that this is per-superblock : a FUSE_SYNCFS is send for the root mount and for each submount. Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for FUSE_SYNCFS in the file server is treated as permanent success. This ensures compatibility with older file servers : the client will get the current behavior of sync() not being propagated to the file server. Note that such an operation allows the file server to DoS sync(). Since a typical FUSE file server is an untrusted piece of software running in userspace, this is disabled by default. Only enable it with virtiofs for now since virtiofsd is supposedly trusted by the guest kernel. Reported-by: Robert Krawitz Signed-off-by: Greg Kurz --- fs/fuse/fuse_i.h | 3 +++ fs/fuse/inode.c | 40 +++++++++++++++++++++++++++++++++++++++ fs/fuse/virtio_fs.c | 1 + include/uapi/linux/fuse.h | 10 +++++++++- 4 files changed, 53 insertions(+), 1 deletion(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index e2f5c8617e0d..01d9283261af 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -761,6 +761,9 @@ struct fuse_conn { /* Auto-mount submounts announced by the server */ unsigned int auto_submounts:1; + /* Propagate syncfs() to server */ + unsigned int sync_fs:1; + /** The number of requests waiting for completion */ atomic_t num_waiting; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 123b53d1c3c6..96b00253f766 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -506,6 +506,45 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf) return err; } +static int fuse_sync_fs(struct super_block *sb, int wait) +{ + struct fuse_mount *fm = get_fuse_mount_super(sb); + struct fuse_conn *fc = fm->fc; + struct fuse_syncfs_in inarg; + FUSE_ARGS(args); + int err; + + /* + * Userspace cannot handle the wait == 0 case. Avoid a + * gratuitous roundtrip. + */ + if (!wait) + return 0; + + /* The filesystem is being unmounted. Nothing to do. */ + if (!sb->s_root) + return 0; + + if (!fc->sync_fs) + return 0; + + memset(&inarg, 0, sizeof(inarg)); + args.in_numargs = 1; + args.in_args[0].size = sizeof(inarg); + args.in_args[0].value = &inarg; + args.opcode = FUSE_SYNCFS; + args.nodeid = get_node_id(sb->s_root->d_inode); + args.out_numargs = 0; + + err = fuse_simple_request(fm, &args); + if (err == -ENOSYS) { + fc->sync_fs = 0; + err = 0; + } + + return err; +} + enum { OPT_SOURCE, OPT_SUBTYPE, @@ -909,6 +948,7 @@ static const struct super_operations fuse_super_operations = { .put_super = fuse_put_super, .umount_begin = fuse_umount_begin, .statfs = fuse_statfs, + .sync_fs = fuse_sync_fs, .show_options = fuse_show_options, }; diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 8962cd033016..f649a47efb68 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -1455,6 +1455,7 @@ static int virtio_fs_get_tree(struct fs_context *fsc) fc->release = fuse_free_conn; fc->delete_stale = true; fc->auto_submounts = true; + fc->sync_fs = true; /* Tell FUSE to split requests that exceed the virtqueue's size */ fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit, diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 271ae90a9bb7..36ed092227fa 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -181,6 +181,9 @@ * - add FUSE_OPEN_KILL_SUIDGID * - extend fuse_setxattr_in, add FUSE_SETXATTR_EXT * - add FUSE_SETXATTR_ACL_KILL_SGID + * + * 7.34 + * - add FUSE_SYNCFS */ #ifndef _LINUX_FUSE_H @@ -216,7 +219,7 @@ #define FUSE_KERNEL_VERSION 7 /** Minor version number of this interface */ -#define FUSE_KERNEL_MINOR_VERSION 33 +#define FUSE_KERNEL_MINOR_VERSION 34 /** The node ID of the root inode */ #define FUSE_ROOT_ID 1 @@ -509,6 +512,7 @@ enum fuse_opcode { FUSE_COPY_FILE_RANGE = 47, FUSE_SETUPMAPPING = 48, FUSE_REMOVEMAPPING = 49, + FUSE_SYNCFS = 50, /* CUSE specific operations */ CUSE_INIT = 4096, @@ -971,4 +975,8 @@ struct fuse_removemapping_one { #define FUSE_REMOVEMAPPING_MAX_ENTRY \ (PAGE_SIZE / sizeof(struct fuse_removemapping_one)) +struct fuse_syncfs_in { + uint64_t padding; +}; + #endif /* _LINUX_FUSE_H */