From patchwork Wed May 15 19:26:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945263 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6684392A for ; Wed, 15 May 2019 19:31:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5638128541 for ; Wed, 15 May 2019 19:31:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4A64A288DB; Wed, 15 May 2019 19:31:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EBABF28609 for ; Wed, 15 May 2019 19:31:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728027AbfEOTbh (ORCPT ); Wed, 15 May 2019 15:31:37 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34970 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726406AbfEOT1c (ORCPT ); Wed, 15 May 2019 15:27:32 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8022EC0703C6; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id C6AF419C71; Wed, 15 May 2019 19:27:29 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 5C4D322063D; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 01/30] fuse: delete dentry if timeout is zero Date: Wed, 15 May 2019 15:26:46 -0400 Message-Id: <20190515192715.18000-2-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Wed, 15 May 2019 19:27:32 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Miklos Szeredi Don't hold onto dentry in lru list if need to re-lookup it anyway at next access. More advanced version of this patch would periodically flush out dentries from the lru which have gone stale. Signed-off-by: Miklos Szeredi --- fs/fuse/dir.c | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index dd0f64f7bc06..fd8636e67ae9 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -29,12 +29,26 @@ union fuse_dentry { struct rcu_head rcu; }; -static inline void fuse_dentry_settime(struct dentry *entry, u64 time) +static void fuse_dentry_settime(struct dentry *dentry, u64 time) { - ((union fuse_dentry *) entry->d_fsdata)->time = time; + /* + * Mess with DCACHE_OP_DELETE because dput() will be faster without it. + * Don't care about races, either way it's just an optimization + */ + if ((time && (dentry->d_flags & DCACHE_OP_DELETE)) || + (!time && !(dentry->d_flags & DCACHE_OP_DELETE))) { + spin_lock(&dentry->d_lock); + if (time) + dentry->d_flags &= ~DCACHE_OP_DELETE; + else + dentry->d_flags |= DCACHE_OP_DELETE; + spin_unlock(&dentry->d_lock); + } + + ((union fuse_dentry *) dentry->d_fsdata)->time = time; } -static inline u64 fuse_dentry_time(struct dentry *entry) +static inline u64 fuse_dentry_time(const struct dentry *entry) { return ((union fuse_dentry *) entry->d_fsdata)->time; } @@ -255,8 +269,14 @@ static void fuse_dentry_release(struct dentry *dentry) kfree_rcu(fd, rcu); } +static int fuse_dentry_delete(const struct dentry *dentry) +{ + return time_before64(fuse_dentry_time(dentry), get_jiffies_64()); +} + const struct dentry_operations fuse_dentry_operations = { .d_revalidate = fuse_dentry_revalidate, + .d_delete = fuse_dentry_delete, .d_init = fuse_dentry_init, .d_release = fuse_dentry_release, }; From patchwork Wed May 15 19:26:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945201 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8D9B313AD for ; Wed, 15 May 2019 19:29:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7D74B2842D for ; Wed, 15 May 2019 19:29:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7124728541; Wed, 15 May 2019 19:29:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1C75F28516 for ; Wed, 15 May 2019 19:29:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727829AbfEOT3j (ORCPT ); Wed, 15 May 2019 15:29:39 -0400 Received: from mx1.redhat.com ([209.132.183.28]:51172 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727679AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A70A3C05000D; Wed, 15 May 2019 19:27:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id C986060BF7; Wed, 15 May 2019 19:27:29 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 619CF2237E8; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 02/30] fuse: Clear setuid bit even in cache=never path Date: Wed, 15 May 2019 15:26:47 -0400 Message-Id: <20190515192715.18000-3-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Wed, 15 May 2019 19:27:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If fuse daemon is started with cache=never, fuse falls back to direct IO. In that write path we don't call file_remove_privs() and that means setuid bit is not cleared if unpriviliged user writes to a file with setuid bit set. pjdfstest chmod test 12.t tests this and fails. Fix this by calling fuse_remove_privs() even for direct I/O path. I tested this as follows. - Run fuse example pasthrough fs. $ passthrough_ll /mnt/pasthrough-mnt -o default_permissions,allow_other,cache=never $ mkdir /mnt/pasthrough-mnt/testdir $ cd /mnt/pasthrough-mnt/testdir $ prove -rv pjdfstests/tests/chmod/12.t Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 06096b60f1df..5baf07fd2876 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -1456,14 +1456,18 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from) /* Don't allow parallel writes to the same file */ inode_lock(inode); res = generic_write_checks(iocb, from); - if (res > 0) { - if (!is_sync_kiocb(iocb) && iocb->ki_flags & IOCB_DIRECT) { - res = fuse_direct_IO(iocb, from); - } else { - res = fuse_direct_io(&io, from, &iocb->ki_pos, - FUSE_DIO_WRITE); - } + if (res <= 0) + goto out; + + res = file_remove_privs(iocb->ki_filp); + if (res) + goto out; + if (!is_sync_kiocb(iocb) && iocb->ki_flags & IOCB_DIRECT) { + res = fuse_direct_IO(iocb, from); + } else { + res = fuse_direct_io(&io, from, &iocb->ki_pos, FUSE_DIO_WRITE); } +out: fuse_invalidate_attr(inode); if (res > 0) fuse_write_update_size(inode, iocb->ki_pos); From patchwork Wed May 15 19:26:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945267 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 94EBC933 for ; Wed, 15 May 2019 19:31:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 850BF28641 for ; Wed, 15 May 2019 19:31:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 78C6828922; Wed, 15 May 2019 19:31:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 27F8F288F4 for ; Wed, 15 May 2019 19:31:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726621AbfEOT1c (ORCPT ); Wed, 15 May 2019 15:27:32 -0400 Received: from mx1.redhat.com ([209.132.183.28]:55550 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726124AbfEOT1c (ORCPT ); Wed, 15 May 2019 15:27:32 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 248B9307CB5E; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id CE1B95D721; Wed, 15 May 2019 19:27:29 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 692B4225477; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 03/30] fuse: Use default_file_splice_read for direct IO Date: Wed, 15 May 2019 15:26:48 -0400 Message-Id: <20190515192715.18000-4-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.42]); Wed, 15 May 2019 19:27:32 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Miklos Szeredi --- fs/fuse/file.c | 15 ++++++++++++++- fs/splice.c | 3 ++- include/linux/fs.h | 2 ++ 3 files changed, 18 insertions(+), 2 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 5baf07fd2876..e9a7aa97c539 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2167,6 +2167,19 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma) return 0; } +static ssize_t fuse_file_splice_read(struct file *in, loff_t *ppos, + struct pipe_inode_info *pipe, size_t len, + unsigned int flags) +{ + struct fuse_file *ff = in->private_data; + + if (ff->open_flags & FOPEN_DIRECT_IO) + return default_file_splice_read(in, ppos, pipe, len, flags); + else + return generic_file_splice_read(in, ppos, pipe, len, flags); + +} + static int convert_fuse_file_lock(struct fuse_conn *fc, const struct fuse_file_lock *ffl, struct file_lock *fl) @@ -3174,7 +3187,7 @@ static const struct file_operations fuse_file_operations = { .fsync = fuse_fsync, .lock = fuse_file_lock, .flock = fuse_file_flock, - .splice_read = generic_file_splice_read, + .splice_read = fuse_file_splice_read, .splice_write = iter_file_splice_write, .unlocked_ioctl = fuse_file_ioctl, .compat_ioctl = fuse_file_compat_ioctl, diff --git a/fs/splice.c b/fs/splice.c index 25212dcca2df..e2e881e34935 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -361,7 +361,7 @@ static ssize_t kernel_readv(struct file *file, const struct kvec *vec, return res; } -static ssize_t default_file_splice_read(struct file *in, loff_t *ppos, +ssize_t default_file_splice_read(struct file *in, loff_t *ppos, struct pipe_inode_info *pipe, size_t len, unsigned int flags) { @@ -425,6 +425,7 @@ static ssize_t default_file_splice_read(struct file *in, loff_t *ppos, iov_iter_advance(&to, copied); /* truncates and discards */ return res; } +EXPORT_SYMBOL(default_file_splice_read); /* * Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos' diff --git a/include/linux/fs.h b/include/linux/fs.h index dd28e7679089..6804aecf7e30 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3055,6 +3055,8 @@ extern void block_sync_page(struct page *page); /* fs/splice.c */ extern ssize_t generic_file_splice_read(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int); +extern ssize_t default_file_splice_read(struct file *, loff_t *, + struct pipe_inode_info *, size_t, unsigned int); extern ssize_t iter_file_splice_write(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int); extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, From patchwork Wed May 15 19:26:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945259 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0233A933 for ; Wed, 15 May 2019 19:31:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E6EEA28671 for ; Wed, 15 May 2019 19:31:33 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DB453288DA; Wed, 15 May 2019 19:31:33 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7587E288F4 for ; Wed, 15 May 2019 19:31:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727874AbfEOTb1 (ORCPT ); Wed, 15 May 2019 15:31:27 -0400 Received: from mx1.redhat.com ([209.132.183.28]:49824 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726302AbfEOT1c (ORCPT ); Wed, 15 May 2019 15:27:32 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 62695300174D; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id D50411001DE1; Wed, 15 May 2019 19:27:29 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 6FE5E225478; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 04/30] fuse: export fuse_end_request() Date: Wed, 15 May 2019 15:26:49 -0400 Message-Id: <20190515192715.18000-5-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.46]); Wed, 15 May 2019 19:27:32 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi virtio-fs will need to complete requests from outside fs/fuse/dev.c. Make the symbol visible. Signed-off-by: Stefan Hajnoczi --- fs/fuse/dev.c | 19 ++++++++++--------- fs/fuse/fuse_i.h | 5 +++++ 2 files changed, 15 insertions(+), 9 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 9971a35cf1ef..46d1aecd7506 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -427,7 +427,7 @@ static void flush_bg_queue(struct fuse_conn *fc) * the 'end' callback is called if given, else the reference to the * request is released */ -static void request_end(struct fuse_conn *fc, struct fuse_req *req) +void fuse_request_end(struct fuse_conn *fc, struct fuse_req *req) { struct fuse_iqueue *fiq = &fc->iq; @@ -480,6 +480,7 @@ static void request_end(struct fuse_conn *fc, struct fuse_req *req) put_request: fuse_put_request(fc, req); } +EXPORT_SYMBOL_GPL(fuse_request_end); static int queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req) { @@ -567,12 +568,12 @@ static void __fuse_request_send(struct fuse_conn *fc, struct fuse_req *req) req->in.h.unique = fuse_get_unique(fiq); queue_request(fiq, req); /* acquire extra reference, since request is still needed - after request_end() */ + after fuse_request_end() */ __fuse_get_request(req); spin_unlock(&fiq->waitq.lock); request_wait_answer(fc, req); - /* Pairs with smp_wmb() in request_end() */ + /* Pairs with smp_wmb() in fuse_request_end() */ smp_rmb(); } } @@ -1302,7 +1303,7 @@ __releases(fiq->waitq.lock) * the pending list and copies request data to userspace buffer. If * no reply is needed (FORGET) or request has been aborted or there * was an error during the copying then it's finished by calling - * request_end(). Otherwise add it to the processing list, and set + * fuse_request_end(). Otherwise add it to the processing list, and set * the 'sent' flag. */ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, @@ -1362,7 +1363,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, /* SETXATTR is special, since it may contain too large data */ if (in->h.opcode == FUSE_SETXATTR) req->out.h.error = -E2BIG; - request_end(fc, req); + fuse_request_end(fc, req); goto restart; } spin_lock(&fpq->lock); @@ -1405,7 +1406,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, if (!test_bit(FR_PRIVATE, &req->flags)) list_del_init(&req->list); spin_unlock(&fpq->lock); - request_end(fc, req); + fuse_request_end(fc, req); return err; err_unlock: @@ -1913,7 +1914,7 @@ static int copy_out_args(struct fuse_copy_state *cs, struct fuse_out *out, * the write buffer. The request is then searched on the processing * list by the unique ID found in the header. If found, then remove * it from the list and copy the rest of the buffer to the request. - * The request is finished by calling request_end() + * The request is finished by calling fuse_request_end(). */ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, struct fuse_copy_state *cs, size_t nbytes) @@ -2000,7 +2001,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, list_del_init(&req->list); spin_unlock(&fpq->lock); - request_end(fc, req); + fuse_request_end(fc, req); out: return err ? err : nbytes; @@ -2140,7 +2141,7 @@ static void end_requests(struct fuse_conn *fc, struct list_head *head) req->out.h.error = -ECONNABORTED; clear_bit(FR_SENT, &req->flags); list_del_init(&req->list); - request_end(fc, req); + fuse_request_end(fc, req); } } diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 0920c0c032a0..c4584c873b87 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -949,6 +949,11 @@ ssize_t fuse_simple_request(struct fuse_conn *fc, struct fuse_args *args); void fuse_request_send_background(struct fuse_conn *fc, struct fuse_req *req); bool fuse_request_queue_background(struct fuse_conn *fc, struct fuse_req *req); +/** + * End a finished request + */ +void fuse_request_end(struct fuse_conn *fc, struct fuse_req *req); + /* Abort all requests */ void fuse_abort_conn(struct fuse_conn *fc); void fuse_wait_aborted(struct fuse_conn *fc); From patchwork Wed May 15 19:26:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945257 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9BB781515 for ; Wed, 15 May 2019 19:31:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 89EA728541 for ; Wed, 15 May 2019 19:31:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7E63D28947; Wed, 15 May 2019 19:31:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3126B28922 for ; Wed, 15 May 2019 19:31:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727173AbfEOT1d (ORCPT ); Wed, 15 May 2019 15:27:33 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53864 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726677AbfEOT1d (ORCPT ); Wed, 15 May 2019 15:27:33 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B04FF3004405; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8ACF31001DE1; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 769C3225479; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 05/30] fuse: export fuse_len_args() Date: Wed, 15 May 2019 15:26:50 -0400 Message-Id: <20190515192715.18000-6-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.44]); Wed, 15 May 2019 19:27:32 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi virtio-fs will need to query the length of fuse_arg lists. Make the symbol visible. Signed-off-by: Stefan Hajnoczi --- fs/fuse/dev.c | 7 ++++--- fs/fuse/fuse_i.h | 5 +++++ 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 46d1aecd7506..d8054b1a45f5 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -350,7 +350,7 @@ void fuse_put_request(struct fuse_conn *fc, struct fuse_req *req) } EXPORT_SYMBOL_GPL(fuse_put_request); -static unsigned len_args(unsigned numargs, struct fuse_arg *args) +unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args) { unsigned nbytes = 0; unsigned i; @@ -360,6 +360,7 @@ static unsigned len_args(unsigned numargs, struct fuse_arg *args) return nbytes; } +EXPORT_SYMBOL_GPL(fuse_len_args); static u64 fuse_get_unique(struct fuse_iqueue *fiq) { @@ -375,7 +376,7 @@ static unsigned int fuse_req_hash(u64 unique) static void queue_request(struct fuse_iqueue *fiq, struct fuse_req *req) { req->in.h.len = sizeof(struct fuse_in_header) + - len_args(req->in.numargs, (struct fuse_arg *) req->in.args); + fuse_len_args(req->in.numargs, (struct fuse_arg *) req->in.args); list_add_tail(&req->list, &fiq->pending); wake_up_locked(&fiq->waitq); kill_fasync(&fiq->fasync, SIGIO, POLL_IN); @@ -1894,7 +1895,7 @@ static int copy_out_args(struct fuse_copy_state *cs, struct fuse_out *out, if (out->h.error) return nbytes != reqsize ? -EINVAL : 0; - reqsize += len_args(out->numargs, out->args); + reqsize += fuse_len_args(out->numargs, out->args); if (reqsize < nbytes || (reqsize > nbytes && !out->argvar)) return -EINVAL; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index c4584c873b87..3a235386d667 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1091,4 +1091,9 @@ int fuse_set_acl(struct inode *inode, struct posix_acl *acl, int type); /* readdir.c */ int fuse_readdir(struct file *file, struct dir_context *ctx); +/** + * Return the number of bytes in an arguments list + */ +unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args); + #endif /* _FS_FUSE_I_H */ From patchwork Wed May 15 19:26:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945087 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C51CB13AD for ; Wed, 15 May 2019 19:27:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B5E242842D for ; Wed, 15 May 2019 19:27:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A9B1228538; Wed, 15 May 2019 19:27:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5B2312842D for ; Wed, 15 May 2019 19:27:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727615AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from mx1.redhat.com ([209.132.183.28]:56470 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726891AbfEOT1d (ORCPT ); Wed, 15 May 2019 15:27:33 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 29021821C1; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8D0635D9CC; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 7AF4222547A; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 06/30] fuse: Export fuse_send_init_request() Date: Wed, 15 May 2019 15:26:51 -0400 Message-Id: <20190515192715.18000-7-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This will be used by virtio-fs to send init request to fuse server after initialization of virt queues. Signed-off-by: Vivek Goyal --- fs/fuse/dev.c | 1 + fs/fuse/fuse_i.h | 1 + fs/fuse/inode.c | 3 ++- 3 files changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index d8054b1a45f5..40eb827caa10 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -139,6 +139,7 @@ void fuse_request_free(struct fuse_req *req) fuse_req_pages_free(req); kmem_cache_free(fuse_req_cachep, req); } +EXPORT_SYMBOL_GPL(fuse_request_free); void __fuse_get_request(struct fuse_req *req) { diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 3a235386d667..16f238d7f624 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -987,6 +987,7 @@ void fuse_conn_put(struct fuse_conn *fc); struct fuse_dev *fuse_dev_alloc(struct fuse_conn *fc); void fuse_dev_free(struct fuse_dev *fud); +void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req); /** * Add connection to control filesystem diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index ec5d9953dfb6..f02291469518 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -958,7 +958,7 @@ static void process_init_reply(struct fuse_conn *fc, struct fuse_req *req) wake_up_all(&fc->blocked_waitq); } -static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) +void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) { struct fuse_init_in *arg = &req->misc.init_in; @@ -988,6 +988,7 @@ static void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req) req->end = process_init_reply; fuse_request_send_background(fc, req); } +EXPORT_SYMBOL_GPL(fuse_send_init); static void fuse_free_conn(struct fuse_conn *fc) { From patchwork Wed May 15 19:26:52 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945165 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 48065933 for ; Wed, 15 May 2019 19:28:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 36EF62842D for ; Wed, 15 May 2019 19:28:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2AD4728541; Wed, 15 May 2019 19:28:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C83C32842D for ; Wed, 15 May 2019 19:28:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727966AbfEOT2Y (ORCPT ); Wed, 15 May 2019 15:28:24 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44056 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727716AbfEOT1f (ORCPT ); Wed, 15 May 2019 15:27:35 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 7771931760ED; Wed, 15 May 2019 19:27:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id 98F13608AB; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 823A022547B; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 07/30] fuse: export fuse_get_unique() Date: Wed, 15 May 2019 15:26:52 -0400 Message-Id: <20190515192715.18000-8-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.41]); Wed, 15 May 2019 19:27:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi virtio-fs will need unique IDs for FORGET requests from outside fs/fuse/dev.c. Make the symbol visible. Signed-off-by: Stefan Hajnoczi --- fs/fuse/dev.c | 3 ++- fs/fuse/fuse_i.h | 5 +++++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 40eb827caa10..42fd3b576686 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -363,11 +363,12 @@ unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args) } EXPORT_SYMBOL_GPL(fuse_len_args); -static u64 fuse_get_unique(struct fuse_iqueue *fiq) +u64 fuse_get_unique(struct fuse_iqueue *fiq) { fiq->reqctr += FUSE_REQ_ID_STEP; return fiq->reqctr; } +EXPORT_SYMBOL_GPL(fuse_get_unique); static unsigned int fuse_req_hash(u64 unique) { diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 16f238d7f624..38a572ba650d 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1097,4 +1097,9 @@ int fuse_readdir(struct file *file, struct dir_context *ctx); */ unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args); +/** + * Get the next unique ID for a request + */ +u64 fuse_get_unique(struct fuse_iqueue *fiq); + #endif /* _FS_FUSE_I_H */ From patchwork Wed May 15 19:26:53 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945143 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 475261515 for ; Wed, 15 May 2019 19:28:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3867B28516 for ; Wed, 15 May 2019 19:28:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2BE992863E; Wed, 15 May 2019 19:28:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 60BF028516 for ; Wed, 15 May 2019 19:28:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727753AbfEOT1x (ORCPT ); Wed, 15 May 2019 15:27:53 -0400 Received: from mx1.redhat.com ([209.132.183.28]:36224 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727714AbfEOT1g (ORCPT ); Wed, 15 May 2019 15:27:36 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 69D4A30024AD; Wed, 15 May 2019 19:27:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9F79C62667; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 8D5BF22547C; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 08/30] fuse: extract fuse_fill_super_common() Date: Wed, 15 May 2019 15:26:53 -0400 Message-Id: <20190515192715.18000-9-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Wed, 15 May 2019 19:27:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi fuse_fill_super() includes code to process the fd= option and link the struct fuse_dev to the fd's struct file. In virtio-fs there is no file descriptor because /dev/fuse is not used. This patch extracts fuse_fill_super_common() so that both classic fuse and virtio-fs can share the code to initialize a mount. parse_fuse_opt() is also extracted so that the fuse_fill_super_common() caller has access to the mount options. This allows classic fuse to handle the fd= option outside fuse_fill_super_common(). Signed-off-by: Stefan Hajnoczi Signed-off-by: Miklos Szeredi --- fs/fuse/fuse_i.h | 33 ++++++++++++ fs/fuse/inode.c | 137 ++++++++++++++++++++++++----------------------- 2 files changed, 103 insertions(+), 67 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 38a572ba650d..84f094e4ac36 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -56,6 +56,25 @@ extern struct mutex fuse_mutex; extern unsigned max_user_bgreq; extern unsigned max_user_congthresh; +/** Mount options */ +struct fuse_mount_data { + int fd; + unsigned rootmode; + kuid_t user_id; + kgid_t group_id; + unsigned fd_present:1; + unsigned rootmode_present:1; + unsigned user_id_present:1; + unsigned group_id_present:1; + unsigned default_permissions:1; + unsigned allow_other:1; + unsigned max_read; + unsigned blksize; + + /* fuse_dev pointer to fill in, should contain NULL on entry */ + void **fudptr; +}; + /* One forget request */ struct fuse_forget_link { struct fuse_forget_one forget_one; @@ -989,6 +1008,20 @@ struct fuse_dev *fuse_dev_alloc(struct fuse_conn *fc); void fuse_dev_free(struct fuse_dev *fud); void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req); +/** + * Parse a mount options string + */ +int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, + struct user_namespace *user_ns); + +/** + * Fill in superblock and initialize fuse connection + * @sb: partially-initialized superblock to fill in + * @mount_data: mount parameters + */ +int fuse_fill_super_common(struct super_block *sb, + struct fuse_mount_data *mount_data); + /** * Add connection to control filesystem */ diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index f02291469518..baf2966a753a 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -59,21 +59,6 @@ MODULE_PARM_DESC(max_user_congthresh, /** Congestion starts at 75% of maximum */ #define FUSE_DEFAULT_CONGESTION_THRESHOLD (FUSE_DEFAULT_MAX_BACKGROUND * 3 / 4) -struct fuse_mount_data { - int fd; - unsigned rootmode; - kuid_t user_id; - kgid_t group_id; - unsigned fd_present:1; - unsigned rootmode_present:1; - unsigned user_id_present:1; - unsigned group_id_present:1; - unsigned default_permissions:1; - unsigned allow_other:1; - unsigned max_read; - unsigned blksize; -}; - struct fuse_forget_link *fuse_alloc_forget(void) { return kzalloc(sizeof(struct fuse_forget_link), GFP_KERNEL); @@ -482,7 +467,7 @@ static int fuse_match_uint(substring_t *s, unsigned int *res) return err; } -static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, +int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, struct user_namespace *user_ns) { char *p; @@ -559,12 +544,13 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, } } - if (!d->fd_present || !d->rootmode_present || - !d->user_id_present || !d->group_id_present) + if (!d->rootmode_present || !d->user_id_present || + !d->group_id_present) return 0; return 1; } +EXPORT_SYMBOL_GPL(parse_fuse_opt); static int fuse_show_options(struct seq_file *m, struct dentry *root) { @@ -1079,15 +1065,13 @@ void fuse_dev_free(struct fuse_dev *fud) } EXPORT_SYMBOL_GPL(fuse_dev_free); -static int fuse_fill_super(struct super_block *sb, void *data, int silent) +int fuse_fill_super_common(struct super_block *sb, + struct fuse_mount_data *mount_data) { struct fuse_dev *fud; struct fuse_conn *fc; struct inode *root; - struct fuse_mount_data d; - struct file *file; struct dentry *root_dentry; - struct fuse_req *init_req; int err; int is_bdev = sb->s_bdev != NULL; @@ -1097,13 +1081,10 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) sb->s_flags &= ~(SB_NOSEC | SB_I_VERSION); - if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns)) - goto err; - if (is_bdev) { #ifdef CONFIG_BLOCK err = -EINVAL; - if (!sb_set_blocksize(sb, d.blksize)) + if (!sb_set_blocksize(sb, mount_data->blksize)) goto err; #endif } else { @@ -1120,19 +1101,6 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) if (sb->s_user_ns != &init_user_ns) sb->s_iflags |= SB_I_UNTRUSTED_MOUNTER; - file = fget(d.fd); - err = -EINVAL; - if (!file) - goto err; - - /* - * Require mount to happen from the same user namespace which - * opened /dev/fuse to prevent potential attacks. - */ - if (file->f_op != &fuse_dev_operations || - file->f_cred->user_ns != sb->s_user_ns) - goto err_fput; - /* * If we are not in the initial user namespace posix * acls must be translated. @@ -1143,7 +1111,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) fc = kmalloc(sizeof(*fc), GFP_KERNEL); err = -ENOMEM; if (!fc) - goto err_fput; + goto err; fuse_conn_init(fc, sb->s_user_ns); fc->release = fuse_free_conn; @@ -1163,17 +1131,17 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) fc->dont_mask = 1; sb->s_flags |= SB_POSIXACL; - fc->default_permissions = d.default_permissions; - fc->allow_other = d.allow_other; - fc->user_id = d.user_id; - fc->group_id = d.group_id; - fc->max_read = max_t(unsigned, 4096, d.max_read); + fc->default_permissions = mount_data->default_permissions; + fc->allow_other = mount_data->allow_other; + fc->user_id = mount_data->user_id; + fc->group_id = mount_data->group_id; + fc->max_read = max_t(unsigned, 4096, mount_data->max_read); /* Used by get_root_inode() */ sb->s_fs_info = fc; err = -ENOMEM; - root = fuse_get_root_inode(sb, d.rootmode); + root = fuse_get_root_inode(sb, mount_data->rootmode); sb->s_d_op = &fuse_root_dentry_operations; root_dentry = d_make_root(root); if (!root_dentry) @@ -1181,20 +1149,15 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) /* Root dentry doesn't have .d_revalidate */ sb->s_d_op = &fuse_dentry_operations; - init_req = fuse_request_alloc(0); - if (!init_req) - goto err_put_root; - __set_bit(FR_BACKGROUND, &init_req->flags); - if (is_bdev) { fc->destroy_req = fuse_request_alloc(0); if (!fc->destroy_req) - goto err_free_init_req; + goto err_put_root; } mutex_lock(&fuse_mutex); err = -EINVAL; - if (file->private_data) + if (*mount_data->fudptr) goto err_unlock; err = fuse_ctl_add_conn(fc); @@ -1203,23 +1166,12 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) list_add_tail(&fc->entry, &fuse_conn_list); sb->s_root = root_dentry; - file->private_data = fud; + *mount_data->fudptr = fud; mutex_unlock(&fuse_mutex); - /* - * atomic_dec_and_test() in fput() provides the necessary - * memory barrier for file->private_data to be visible on all - * CPUs after this - */ - fput(file); - - fuse_send_init(fc, init_req); - return 0; err_unlock: mutex_unlock(&fuse_mutex); - err_free_init_req: - fuse_request_free(init_req); err_put_root: dput(root_dentry); err_dev_free: @@ -1227,11 +1179,62 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) err_put_conn: fuse_conn_put(fc); sb->s_fs_info = NULL; - err_fput: - fput(file); err: return err; } +EXPORT_SYMBOL_GPL(fuse_fill_super_common); + +static int fuse_fill_super(struct super_block *sb, void *data, int silent) +{ + struct fuse_mount_data d; + struct file *file; + int is_bdev = sb->s_bdev != NULL; + int err; + struct fuse_req *init_req; + + err = -EINVAL; + if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns)) + goto err; + if (!d.fd_present) + goto err; + + file = fget(d.fd); + if (!file) + goto err; + + /* + * Require mount to happen from the same user namespace which + * opened /dev/fuse to prevent potential attacks. + */ + if ((file->f_op != &fuse_dev_operations) || + (file->f_cred->user_ns != sb->s_user_ns)) + goto err_fput; + + init_req = fuse_request_alloc(0); + if (!init_req) + goto err_fput; + __set_bit(FR_BACKGROUND, &init_req->flags); + + d.fudptr = &file->private_data; + err = fuse_fill_super_common(sb, &d); + if (err < 0) + goto err_free_init_req; + /* + * atomic_dec_and_test() in fput() provides the necessary + * memory barrier for file->private_data to be visible on all + * CPUs after this + */ + fput(file); + fuse_send_init(get_fuse_conn_super(sb), init_req); + return 0; + +err_free_init_req: + fuse_request_free(init_req); +err_fput: + fput(file); +err: + return err; +} static struct dentry *fuse_mount(struct file_system_type *fs_type, int flags, const char *dev_name, From patchwork Wed May 15 19:26:54 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945133 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 79E081515 for ; Wed, 15 May 2019 19:28:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6791C2842D for ; Wed, 15 May 2019 19:28:04 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 5B08328541; Wed, 15 May 2019 19:28:04 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 845412842D for ; Wed, 15 May 2019 19:28:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727832AbfEOT1y (ORCPT ); Wed, 15 May 2019 15:27:54 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46622 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727717AbfEOT1g (ORCPT ); Wed, 15 May 2019 15:27:36 -0400 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8414A12B43; Wed, 15 May 2019 19:27:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id C59EA600C4; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 9648622547D; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 09/30] fuse: add fuse_iqueue_ops callbacks Date: Wed, 15 May 2019 15:26:54 -0400 Message-Id: <20190515192715.18000-10-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Wed, 15 May 2019 19:27:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi The /dev/fuse device uses fiq->waitq and fasync to signal that requests are available. These mechanisms do not apply to virtio-fs. This patch introduces callbacks so alternative behavior can be used. Note that queue_interrupt() changes along these lines: spin_lock(&fiq->waitq.lock); wake_up_locked(&fiq->waitq); + kill_fasync(&fiq->fasync, SIGIO, POLL_IN); spin_unlock(&fiq->waitq.lock); - kill_fasync(&fiq->fasync, SIGIO, POLL_IN); Since queue_request() and queue_forget() also call kill_fasync() inside the spinlock this should be safe. Signed-off-by: Stefan Hajnoczi Signed-off-by: Miklos Szeredi --- fs/fuse/cuse.c | 2 +- fs/fuse/dev.c | 50 ++++++++++++++++++++++++++++++++---------------- fs/fuse/fuse_i.h | 48 +++++++++++++++++++++++++++++++++++++++++++++- fs/fuse/inode.c | 16 ++++++++++++---- 4 files changed, 94 insertions(+), 22 deletions(-) diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c index 55a26f351467..a6ed7a036b50 100644 --- a/fs/fuse/cuse.c +++ b/fs/fuse/cuse.c @@ -504,7 +504,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file) * Limit the cuse channel to requests that can * be represented in file->f_cred->user_ns. */ - fuse_conn_init(&cc->fc, file->f_cred->user_ns); + fuse_conn_init(&cc->fc, file->f_cred->user_ns, &fuse_dev_fiq_ops, NULL); fud = fuse_dev_alloc(&cc->fc); if (!fud) { diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 42fd3b576686..ef489beadf58 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -375,13 +375,33 @@ static unsigned int fuse_req_hash(u64 unique) return hash_long(unique & ~FUSE_INT_REQ_BIT, FUSE_PQ_HASH_BITS); } -static void queue_request(struct fuse_iqueue *fiq, struct fuse_req *req) +/** + * A new request is available, wake fiq->waitq + */ +static void fuse_dev_wake_and_unlock(struct fuse_iqueue *fiq) +__releases(fiq->waitq.lock) { - req->in.h.len = sizeof(struct fuse_in_header) + - fuse_len_args(req->in.numargs, (struct fuse_arg *) req->in.args); - list_add_tail(&req->list, &fiq->pending); wake_up_locked(&fiq->waitq); kill_fasync(&fiq->fasync, SIGIO, POLL_IN); + spin_unlock(&fiq->waitq.lock); +} + +const struct fuse_iqueue_ops fuse_dev_fiq_ops = { + .wake_forget_and_unlock = fuse_dev_wake_and_unlock, + .wake_interrupt_and_unlock = fuse_dev_wake_and_unlock, + .wake_pending_and_unlock = fuse_dev_wake_and_unlock, +}; +EXPORT_SYMBOL_GPL(fuse_dev_fiq_ops); + +static void queue_request_and_unlock(struct fuse_iqueue *fiq, + struct fuse_req *req) +__releases(fiq->waitq.lock) +{ + req->in.h.len = sizeof(struct fuse_in_header) + + fuse_len_args(req->in.numargs, + (struct fuse_arg *) req->in.args); + list_add_tail(&req->list, &fiq->pending); + fiq->ops->wake_pending_and_unlock(fiq); } void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget, @@ -396,12 +416,11 @@ void fuse_queue_forget(struct fuse_conn *fc, struct fuse_forget_link *forget, if (fiq->connected) { fiq->forget_list_tail->next = forget; fiq->forget_list_tail = forget; - wake_up_locked(&fiq->waitq); - kill_fasync(&fiq->fasync, SIGIO, POLL_IN); + fiq->ops->wake_forget_and_unlock(fiq); } else { kfree(forget); + spin_unlock(&fiq->waitq.lock); } - spin_unlock(&fiq->waitq.lock); } static void flush_bg_queue(struct fuse_conn *fc) @@ -417,8 +436,7 @@ static void flush_bg_queue(struct fuse_conn *fc) fc->active_background++; spin_lock(&fiq->waitq.lock); req->in.h.unique = fuse_get_unique(fiq); - queue_request(fiq, req); - spin_unlock(&fiq->waitq.lock); + queue_request_and_unlock(fiq, req); } } @@ -506,10 +524,10 @@ static int queue_interrupt(struct fuse_iqueue *fiq, struct fuse_req *req) spin_unlock(&fiq->waitq.lock); return 0; } - wake_up_locked(&fiq->waitq); - kill_fasync(&fiq->fasync, SIGIO, POLL_IN); + fiq->ops->wake_interrupt_and_unlock(fiq); + } else { + spin_unlock(&fiq->waitq.lock); } - spin_unlock(&fiq->waitq.lock); return 0; } @@ -569,11 +587,10 @@ static void __fuse_request_send(struct fuse_conn *fc, struct fuse_req *req) req->out.h.error = -ENOTCONN; } else { req->in.h.unique = fuse_get_unique(fiq); - queue_request(fiq, req); /* acquire extra reference, since request is still needed after fuse_request_end() */ __fuse_get_request(req); - spin_unlock(&fiq->waitq.lock); + queue_request_and_unlock(fiq, req); request_wait_answer(fc, req); /* Pairs with smp_wmb() in fuse_request_end() */ @@ -706,10 +723,11 @@ static int fuse_request_send_notify_reply(struct fuse_conn *fc, req->in.h.unique = unique; spin_lock(&fiq->waitq.lock); if (fiq->connected) { - queue_request(fiq, req); + queue_request_and_unlock(fiq, req); err = 0; + } else { + spin_unlock(&fiq->waitq.lock); } - spin_unlock(&fiq->waitq.lock); return err; } diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 84f094e4ac36..0b578e07156d 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -71,6 +71,12 @@ struct fuse_mount_data { unsigned max_read; unsigned blksize; + /* fuse input queue operations */ + const struct fuse_iqueue_ops *fiq_ops; + + /* device-specific state for fuse_iqueue */ + void *fiq_priv; + /* fuse_dev pointer to fill in, should contain NULL on entry */ void **fudptr; }; @@ -461,6 +467,39 @@ struct fuse_req { struct file *stolen_file; }; +struct fuse_iqueue; + +/** + * Input queue callbacks + * + * Input queue signalling is device-specific. For example, the /dev/fuse file + * uses fiq->waitq and fasync to wake processes that are waiting on queue + * readiness. These callbacks allow other device types to respond to input + * queue activity. + */ +struct fuse_iqueue_ops { + /** + * Signal that a forget has been queued + */ + void (*wake_forget_and_unlock)(struct fuse_iqueue *fiq) + __releases(fiq->waitq.lock); + + /** + * Signal that an INTERRUPT request has been queued + */ + void (*wake_interrupt_and_unlock)(struct fuse_iqueue *fiq) + __releases(fiq->waitq.lock); + + /** + * Signal that a request has been queued + */ + void (*wake_pending_and_unlock)(struct fuse_iqueue *fiq) + __releases(fiq->waitq.lock); +}; + +/** /dev/fuse input queue operations */ +extern const struct fuse_iqueue_ops fuse_dev_fiq_ops; + struct fuse_iqueue { /** Connection established */ unsigned connected; @@ -486,6 +525,12 @@ struct fuse_iqueue { /** O_ASYNC requests */ struct fasync_struct *fasync; + + /** Device-specific callbacks */ + const struct fuse_iqueue_ops *ops; + + /** Device-specific state */ + void *priv; }; #define FUSE_PQ_HASH_BITS 8 @@ -997,7 +1042,8 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc); /** * Initialize fuse_conn */ -void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns); +void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, + const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv); /** * Release reference to fuse_conn diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index baf2966a753a..126e77854dac 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -570,7 +570,9 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root) return 0; } -static void fuse_iqueue_init(struct fuse_iqueue *fiq) +static void fuse_iqueue_init(struct fuse_iqueue *fiq, + const struct fuse_iqueue_ops *ops, + void *priv) { memset(fiq, 0, sizeof(struct fuse_iqueue)); init_waitqueue_head(&fiq->waitq); @@ -578,6 +580,8 @@ static void fuse_iqueue_init(struct fuse_iqueue *fiq) INIT_LIST_HEAD(&fiq->interrupts); fiq->forget_list_tail = &fiq->forget_list_head; fiq->connected = 1; + fiq->ops = ops; + fiq->priv = priv; } static void fuse_pqueue_init(struct fuse_pqueue *fpq) @@ -591,7 +595,8 @@ static void fuse_pqueue_init(struct fuse_pqueue *fpq) fpq->connected = 1; } -void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns) +void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, + const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv) { memset(fc, 0, sizeof(*fc)); spin_lock_init(&fc->lock); @@ -601,7 +606,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns) atomic_set(&fc->dev_count, 1); init_waitqueue_head(&fc->blocked_waitq); init_waitqueue_head(&fc->reserved_req_waitq); - fuse_iqueue_init(&fc->iq); + fuse_iqueue_init(&fc->iq, fiq_ops, fiq_priv); INIT_LIST_HEAD(&fc->bg_queue); INIT_LIST_HEAD(&fc->entry); INIT_LIST_HEAD(&fc->devices); @@ -1113,7 +1118,8 @@ int fuse_fill_super_common(struct super_block *sb, if (!fc) goto err; - fuse_conn_init(fc, sb->s_user_ns); + fuse_conn_init(fc, sb->s_user_ns, mount_data->fiq_ops, + mount_data->fiq_priv); fc->release = fuse_free_conn; fud = fuse_dev_alloc(fc); @@ -1215,6 +1221,8 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) goto err_fput; __set_bit(FR_BACKGROUND, &init_req->flags); + d.fiq_ops = &fuse_dev_fiq_ops; + d.fiq_priv = NULL; d.fudptr = &file->private_data; err = fuse_fill_super_common(sb, &d); if (err < 0) From patchwork Wed May 15 19:26:55 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945235 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 206B392A for ; Wed, 15 May 2019 19:30:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 11B3528658 for ; Wed, 15 May 2019 19:30:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 066E6288F1; Wed, 15 May 2019 19:30:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 912D328658 for ; Wed, 15 May 2019 19:30:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727986AbfEOTak (ORCPT ); Wed, 15 May 2019 15:30:40 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34990 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727009AbfEOT1d (ORCPT ); Wed, 15 May 2019 15:27:33 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 56312C0624A1; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id C759519C71; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 9F35D22547E; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 10/30] fuse: Separate fuse device allocation and installation in fuse_conn Date: Wed, 15 May 2019 15:26:55 -0400 Message-Id: <20190515192715.18000-11-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP As of now fuse_dev_alloc() both allocates a fuse device and installs it in fuse_conn list. fuse_dev_alloc() can fail if fuse_device allocation fails. virtio-fs needs to initialize multiple fuse devices (one per virtio queue). It initializes one fuse device as part of call to fuse_fill_super_common() and rest of the devices are allocated and installed after that. But, we can't affort to fail after calling fuse_fill_super_common() as we don't have a way to undo all the actions done by fuse_fill_super_common(). So to avoid failures after the call to fuse_fill_super_common(), pre-allocate all fuse devices early and install them into fuse connection later. This patch provides two separate helpers for fuse device allocation and fuse device installation in fuse_conn. Signed-off-by: Vivek Goyal --- fs/fuse/cuse.c | 2 +- fs/fuse/dev.c | 2 +- fs/fuse/fuse_i.h | 4 +++- fs/fuse/inode.c | 25 ++++++++++++++++++++----- 4 files changed, 25 insertions(+), 8 deletions(-) diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c index a6ed7a036b50..a509747153a7 100644 --- a/fs/fuse/cuse.c +++ b/fs/fuse/cuse.c @@ -506,7 +506,7 @@ static int cuse_channel_open(struct inode *inode, struct file *file) */ fuse_conn_init(&cc->fc, file->f_cred->user_ns, &fuse_dev_fiq_ops, NULL); - fud = fuse_dev_alloc(&cc->fc); + fud = fuse_dev_alloc_install(&cc->fc); if (!fud) { kfree(cc); return -ENOMEM; diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index ef489beadf58..ee9dd38bc0f0 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -2318,7 +2318,7 @@ static int fuse_device_clone(struct fuse_conn *fc, struct file *new) if (new->private_data) return -EINVAL; - fud = fuse_dev_alloc(fc); + fud = fuse_dev_alloc_install(fc); if (!fud) return -ENOMEM; diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 0b578e07156d..4008ed65a48d 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -1050,7 +1050,9 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, */ void fuse_conn_put(struct fuse_conn *fc); -struct fuse_dev *fuse_dev_alloc(struct fuse_conn *fc); +struct fuse_dev *fuse_dev_alloc_install(struct fuse_conn *fc); +struct fuse_dev *fuse_dev_alloc(void); +void fuse_dev_install(struct fuse_dev *fud, struct fuse_conn *fc); void fuse_dev_free(struct fuse_dev *fud); void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req); diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 126e77854dac..9b0114437a14 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1027,8 +1027,7 @@ static int fuse_bdi_init(struct fuse_conn *fc, struct super_block *sb) return 0; } -struct fuse_dev *fuse_dev_alloc(struct fuse_conn *fc) -{ +struct fuse_dev *fuse_dev_alloc(void) { struct fuse_dev *fud; struct list_head *pq; @@ -1043,16 +1042,32 @@ struct fuse_dev *fuse_dev_alloc(struct fuse_conn *fc) } fud->pq.processing = pq; - fud->fc = fuse_conn_get(fc); fuse_pqueue_init(&fud->pq); + return fud; +} +EXPORT_SYMBOL_GPL(fuse_dev_alloc); + +void fuse_dev_install(struct fuse_dev *fud, struct fuse_conn *fc) { + fud->fc = fuse_conn_get(fc); spin_lock(&fc->lock); list_add_tail(&fud->entry, &fc->devices); spin_unlock(&fc->lock); +} +EXPORT_SYMBOL_GPL(fuse_dev_install); +struct fuse_dev *fuse_dev_alloc_install(struct fuse_conn *fc) +{ + struct fuse_dev *fud; + + fud = fuse_dev_alloc(); + if (!fud) + return NULL; + + fuse_dev_install(fud, fc); return fud; } -EXPORT_SYMBOL_GPL(fuse_dev_alloc); +EXPORT_SYMBOL_GPL(fuse_dev_alloc_install); void fuse_dev_free(struct fuse_dev *fud) { @@ -1122,7 +1137,7 @@ int fuse_fill_super_common(struct super_block *sb, mount_data->fiq_priv); fc->release = fuse_free_conn; - fud = fuse_dev_alloc(fc); + fud = fuse_dev_alloc_install(fc); if (!fud) goto err_put_conn; From patchwork Wed May 15 19:26:56 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945155 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E60913AD for ; Wed, 15 May 2019 19:28:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1A3E82842D for ; Wed, 15 May 2019 19:28:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0EA2128555; Wed, 15 May 2019 19:28:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 501EB2842D for ; Wed, 15 May 2019 19:28:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727899AbfEOT2J (ORCPT ); Wed, 15 May 2019 15:28:09 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41220 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727720AbfEOT1f (ORCPT ); Wed, 15 May 2019 15:27:35 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B2239882EF; Wed, 15 May 2019 19:27:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id E398D60928; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id A8BA222547F; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 11/30] virtio_fs: add skeleton virtio_fs.ko module Date: Wed, 15 May 2019 15:26:56 -0400 Message-Id: <20190515192715.18000-12-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 15 May 2019 19:27:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Add a basic file system module for virtio-fs. Signed-off-by: Stefan Hajnoczi Signed-off-by: Vivek Goyal --- fs/fuse/Kconfig | 11 + fs/fuse/Makefile | 1 + fs/fuse/fuse_i.h | 13 + fs/fuse/inode.c | 15 +- fs/fuse/virtio_fs.c | 956 ++++++++++++++++++++++++++++++++ include/uapi/linux/virtio_fs.h | 41 ++ include/uapi/linux/virtio_ids.h | 1 + 7 files changed, 1035 insertions(+), 3 deletions(-) create mode 100644 fs/fuse/virtio_fs.c create mode 100644 include/uapi/linux/virtio_fs.h diff --git a/fs/fuse/Kconfig b/fs/fuse/Kconfig index 76f09ce7e5b2..46e9a8ff9f7a 100644 --- a/fs/fuse/Kconfig +++ b/fs/fuse/Kconfig @@ -26,3 +26,14 @@ config CUSE If you want to develop or use a userspace character device based on CUSE, answer Y or M. + +config VIRTIO_FS + tristate "Virtio Filesystem" + depends on FUSE_FS + select VIRTIO + help + The Virtio Filesystem allows guests to mount file systems from the + host. + + If you want to share files between guests or with the host, answer Y + or M. diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile index f7b807bc1027..47b78fac5809 100644 --- a/fs/fuse/Makefile +++ b/fs/fuse/Makefile @@ -4,5 +4,6 @@ obj-$(CONFIG_FUSE_FS) += fuse.o obj-$(CONFIG_CUSE) += cuse.o +obj-$(CONFIG_VIRTIO_FS) += virtio_fs.o fuse-objs := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 4008ed65a48d..f5cb4d40b83f 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -59,15 +59,18 @@ extern unsigned max_user_congthresh; /** Mount options */ struct fuse_mount_data { int fd; + const char *tag; /* lifetime: .fill_super() data argument */ unsigned rootmode; kuid_t user_id; kgid_t group_id; unsigned fd_present:1; + unsigned tag_present:1; unsigned rootmode_present:1; unsigned user_id_present:1; unsigned group_id_present:1; unsigned default_permissions:1; unsigned allow_other:1; + unsigned destroy:1; unsigned max_read; unsigned blksize; @@ -465,6 +468,9 @@ struct fuse_req { /** Request is stolen from fuse_file->reserved_req */ struct file *stolen_file; + + /** virtio-fs's physically contiguous buffer for in and out args */ + void *argbuf; }; struct fuse_iqueue; @@ -1070,6 +1076,13 @@ int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, int fuse_fill_super_common(struct super_block *sb, struct fuse_mount_data *mount_data); +/** + * Disassociate fuse connection from superblock and kill the superblock + * + * Calls kill_anon_super(), use with do not use with bdev mounts. + */ +void fuse_kill_sb_anon(struct super_block *sb); + /** * Add connection to control filesystem */ diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 9b0114437a14..731a8a74d032 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -434,6 +434,7 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf) enum { OPT_FD, + OPT_TAG, OPT_ROOTMODE, OPT_USER_ID, OPT_GROUP_ID, @@ -446,6 +447,7 @@ enum { static const match_table_t tokens = { {OPT_FD, "fd=%u"}, + {OPT_TAG, "tag=%s"}, {OPT_ROOTMODE, "rootmode=%o"}, {OPT_USER_ID, "user_id=%u"}, {OPT_GROUP_ID, "group_id=%u"}, @@ -492,6 +494,11 @@ int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, d->fd_present = 1; break; + case OPT_TAG: + d->tag = args[0].from; + d->tag_present = 1; + break; + case OPT_ROOTMODE: if (match_octal(&args[0], &value)) return 0; @@ -1170,7 +1177,7 @@ int fuse_fill_super_common(struct super_block *sb, /* Root dentry doesn't have .d_revalidate */ sb->s_d_op = &fuse_dentry_operations; - if (is_bdev) { + if (mount_data->destroy) { fc->destroy_req = fuse_request_alloc(0); if (!fc->destroy_req) goto err_put_root; @@ -1216,7 +1223,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) err = -EINVAL; if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns)) goto err; - if (!d.fd_present) + if (!d.fd_present || d.tag_present) goto err; file = fget(d.fd); @@ -1239,6 +1246,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) d.fiq_ops = &fuse_dev_fiq_ops; d.fiq_priv = NULL; d.fudptr = &file->private_data; + d.destroy = is_bdev; err = fuse_fill_super_common(sb, &d); if (err < 0) goto err_free_init_req; @@ -1282,11 +1290,12 @@ static void fuse_sb_destroy(struct super_block *sb) } } -static void fuse_kill_sb_anon(struct super_block *sb) +void fuse_kill_sb_anon(struct super_block *sb) { fuse_sb_destroy(sb); kill_anon_super(sb); } +EXPORT_SYMBOL_GPL(fuse_kill_sb_anon); static struct file_system_type fuse_fs_type = { .owner = THIS_MODULE, diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c new file mode 100644 index 000000000000..e76e0f5dce40 --- /dev/null +++ b/fs/fuse/virtio_fs.c @@ -0,0 +1,956 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * virtio-fs: Virtio Filesystem + * Copyright (C) 2018 Red Hat, Inc. + */ + +#include +#include +#include +#include +#include +#include "fuse_i.h" + +/* List of virtio-fs device instances and a lock for the list */ +static DEFINE_MUTEX(virtio_fs_mutex); +static LIST_HEAD(virtio_fs_instances); + +enum { + VQ_HIPRIO, + VQ_REQUEST +}; + +/* Per-virtqueue state */ +struct virtio_fs_vq { + spinlock_t lock; + struct virtqueue *vq; /* protected by ->lock */ + struct work_struct done_work; + struct list_head queued_reqs; + struct delayed_work dispatch_work; + struct fuse_dev *fud; + char name[24]; +} ____cacheline_aligned_in_smp; + +/* A virtio-fs device instance */ +struct virtio_fs { + struct list_head list; /* on virtio_fs_instances */ + char *tag; + struct virtio_fs_vq *vqs; + unsigned nvqs; /* number of virtqueues */ + unsigned num_queues; /* number of request queues */ +}; + +struct virtio_fs_forget { + struct fuse_in_header ih; + struct fuse_forget_in arg; + /* This request can be temporarily queued on virt queue */ + struct list_head list; +}; + +static inline struct virtio_fs_vq *vq_to_fsvq(struct virtqueue *vq) +{ + struct virtio_fs *fs = vq->vdev->priv; + + return &fs->vqs[vq->index]; +} + +static inline struct fuse_pqueue *vq_to_fpq(struct virtqueue *vq) +{ + return &vq_to_fsvq(vq)->fud->pq; +} + +/* Add a new instance to the list or return -EEXIST if tag name exists*/ +static int virtio_fs_add_instance(struct virtio_fs *fs) +{ + struct virtio_fs *fs2; + bool duplicate = false; + + mutex_lock(&virtio_fs_mutex); + + list_for_each_entry(fs2, &virtio_fs_instances, list) { + if (strcmp(fs->tag, fs2->tag) == 0) + duplicate = true; + } + + if (!duplicate) + list_add_tail(&fs->list, &virtio_fs_instances); + + mutex_unlock(&virtio_fs_mutex); + + if (duplicate) + return -EEXIST; + return 0; +} + +/* Return the virtio_fs with a given tag, or NULL */ +static struct virtio_fs *virtio_fs_find_instance(const char *tag) +{ + struct virtio_fs *fs; + + mutex_lock(&virtio_fs_mutex); + + list_for_each_entry(fs, &virtio_fs_instances, list) { + if (strcmp(fs->tag, tag) == 0) + goto found; + } + + fs = NULL; /* not found */ + +found: + mutex_unlock(&virtio_fs_mutex); + + return fs; +} + +static void virtio_fs_free_devs(struct virtio_fs *fs) +{ + unsigned int i; + + /* TODO lock */ + + for (i = 0; i < fs->nvqs; i++) { + struct virtio_fs_vq *fsvq = &fs->vqs[i]; + + if (!fsvq->fud) + continue; + + flush_work(&fsvq->done_work); + flush_delayed_work(&fsvq->dispatch_work); + + fuse_dev_free(fsvq->fud); /* TODO need to quiesce/end_requests/decrement dev_count */ + fsvq->fud = NULL; + } +} + +/* Read filesystem name from virtio config into fs->tag (must kfree()). */ +static int virtio_fs_read_tag(struct virtio_device *vdev, struct virtio_fs *fs) +{ + char tag_buf[sizeof_field(struct virtio_fs_config, tag)]; + char *end; + size_t len; + + virtio_cread_bytes(vdev, offsetof(struct virtio_fs_config, tag), + &tag_buf, sizeof(tag_buf)); + end = memchr(tag_buf, '\0', sizeof(tag_buf)); + if (end == tag_buf) + return -EINVAL; /* empty tag */ + if (!end) + end = &tag_buf[sizeof(tag_buf)]; + + len = end - tag_buf; + fs->tag = devm_kmalloc(&vdev->dev, len + 1, GFP_KERNEL); + if (!fs->tag) + return -ENOMEM; + memcpy(fs->tag, tag_buf, len); + fs->tag[len] = '\0'; + return 0; +} + +/* Work function for hiprio completion */ +static void virtio_fs_hiprio_done_work(struct work_struct *work) +{ + struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq, + done_work); + struct virtqueue *vq = fsvq->vq; + + /* Free completed FUSE_FORGET requests */ + spin_lock(&fsvq->lock); + do { + unsigned len; + void *req; + + virtqueue_disable_cb(vq); + + while ((req = virtqueue_get_buf(vq, &len)) != NULL) + kfree(req); + } while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq))); + spin_unlock(&fsvq->lock); +} + +static void virtio_fs_dummy_dispatch_work(struct work_struct *work) +{ + return; +} + +static void virtio_fs_hiprio_dispatch_work(struct work_struct *work) +{ + struct virtio_fs_forget *forget; + struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq, + dispatch_work.work); + struct virtqueue *vq = fsvq->vq; + struct scatterlist sg; + struct scatterlist *sgs[] = {&sg}; + bool notify; + int ret; + + pr_debug("worker virtio_fs_hiprio_dispatch_work() called.\n"); + while(1) { + spin_lock(&fsvq->lock); + forget = list_first_entry_or_null(&fsvq->queued_reqs, + struct virtio_fs_forget, list); + if (!forget) { + spin_unlock(&fsvq->lock); + return; + } + + list_del(&forget->list); + sg_init_one(&sg, forget, sizeof(*forget)); + + /* Enqueue the request */ + dev_dbg(&vq->vdev->dev, "%s\n", __func__); + ret = virtqueue_add_sgs(vq, sgs, 1, 0, forget, GFP_ATOMIC); + if (ret < 0) { + if (ret == -ENOMEM || ret == -ENOSPC) { + pr_debug("virtio-fs: Could not queue FORGET:" + " err=%d. Will try later\n", ret); + list_add_tail(&forget->list, + &fsvq->queued_reqs); + schedule_delayed_work(&fsvq->dispatch_work, + msecs_to_jiffies(1)); + } else { + pr_debug("virtio-fs: Could not queue FORGET:" + " err=%d. Dropping it.\n", ret); + kfree(forget); + } + spin_unlock(&fsvq->lock); + return; + } + + notify = virtqueue_kick_prepare(vq); + spin_unlock(&fsvq->lock); + + if (notify) + virtqueue_notify(vq); + pr_debug("worker virtio_fs_hiprio_dispatch_work() dispatched one forget request.\n"); + } +} + +/* Allocate and copy args into req->argbuf */ +static int copy_args_to_argbuf(struct fuse_req *req) +{ + unsigned offset = 0; + unsigned num_in; + unsigned num_out; + unsigned len; + unsigned i; + + num_in = req->in.numargs - req->in.argpages; + num_out = req->out.numargs - req->out.argpages; + len = fuse_len_args(num_in, (struct fuse_arg *)req->in.args) + + fuse_len_args(num_out, req->out.args); + + req->argbuf = kmalloc(len, GFP_ATOMIC); + if (!req->argbuf) + return -ENOMEM; + + for (i = 0; i < num_in; i++) { + memcpy(req->argbuf + offset, + req->in.args[i].value, + req->in.args[i].size); + offset += req->in.args[i].size; + } + + return 0; +} + +/* Copy args out of and free req->argbuf */ +static void copy_args_from_argbuf(struct fuse_req *req) +{ + unsigned remaining; + unsigned offset; + unsigned num_in; + unsigned num_out; + unsigned i; + + remaining = req->out.h.len - sizeof(req->out.h); + num_in = req->in.numargs - req->in.argpages; + num_out = req->out.numargs - req->out.argpages; + offset = fuse_len_args(num_in, (struct fuse_arg *)req->in.args); + + for (i = 0; i < num_out; i++) { + unsigned argsize = req->out.args[i].size; + + if (req->out.argvar && + i == req->out.numargs - 1 && + argsize > remaining) { + argsize = remaining; + } + + memcpy(req->out.args[i].value, req->argbuf + offset, argsize); + offset += argsize; + + if (i != req->out.numargs - 1) + remaining -= argsize; + } + + /* Store the actual size of the variable-length arg */ + if (req->out.argvar) + req->out.args[req->out.numargs - 1].size = remaining; + + kfree(req->argbuf); + req->argbuf = NULL; +} + +/* Work function for request completion */ +static void virtio_fs_requests_done_work(struct work_struct *work) +{ + struct virtio_fs_vq *fsvq = container_of(work, struct virtio_fs_vq, + done_work); + struct fuse_pqueue *fpq = &fsvq->fud->pq; + struct fuse_conn *fc = fsvq->fud->fc; + struct virtqueue *vq = fsvq->vq; + struct fuse_req *req; + struct fuse_req *next; + LIST_HEAD(reqs); + + /* Collect completed requests off the virtqueue */ + spin_lock(&fsvq->lock); + do { + unsigned len; + + virtqueue_disable_cb(vq); + + while ((req = virtqueue_get_buf(vq, &len)) != NULL) { + spin_lock(&fpq->lock); + list_move_tail(&req->list, &reqs); + spin_unlock(&fpq->lock); + } + } while (!virtqueue_enable_cb(vq) && likely(!virtqueue_is_broken(vq))); + spin_unlock(&fsvq->lock); + + /* End requests */ + list_for_each_entry_safe(req, next, &reqs, list) { + /* TODO check unique */ + /* TODO fuse_len_args(out) against oh.len */ + + copy_args_from_argbuf(req); + + /* TODO zeroing? */ + + spin_lock(&fpq->lock); + clear_bit(FR_SENT, &req->flags); + list_del_init(&req->list); + spin_unlock(&fpq->lock); + + fuse_request_end(fc, req); + } +} + +/* Virtqueue interrupt handler */ +static void virtio_fs_vq_done(struct virtqueue *vq) +{ + struct virtio_fs_vq *fsvq = vq_to_fsvq(vq); + + dev_dbg(&vq->vdev->dev, "%s %s\n", __func__, fsvq->name); + + schedule_work(&fsvq->done_work); +} + +/* Initialize virtqueues */ +static int virtio_fs_setup_vqs(struct virtio_device *vdev, + struct virtio_fs *fs) +{ + struct virtqueue **vqs; + vq_callback_t **callbacks; + const char **names; + unsigned i; + int ret; + + virtio_cread(vdev, struct virtio_fs_config, num_queues, + &fs->num_queues); + if (fs->num_queues == 0) + return -EINVAL; + + fs->nvqs = 1 + fs->num_queues; + + fs->vqs = devm_kcalloc(&vdev->dev, fs->nvqs, + sizeof(fs->vqs[VQ_HIPRIO]), GFP_KERNEL); + if (!fs->vqs) + return -ENOMEM; + + vqs = kmalloc_array(fs->nvqs, sizeof(vqs[VQ_HIPRIO]), GFP_KERNEL); + callbacks = kmalloc_array(fs->nvqs, sizeof(callbacks[VQ_HIPRIO]), + GFP_KERNEL); + names = kmalloc_array(fs->nvqs, sizeof(names[VQ_HIPRIO]), GFP_KERNEL); + if (!vqs || !callbacks || !names) { + ret = -ENOMEM; + goto out; + } + + callbacks[VQ_HIPRIO] = virtio_fs_vq_done; + snprintf(fs->vqs[VQ_HIPRIO].name, sizeof(fs->vqs[VQ_HIPRIO].name), + "hiprio"); + names[VQ_HIPRIO] = fs->vqs[VQ_HIPRIO].name; + INIT_WORK(&fs->vqs[VQ_HIPRIO].done_work, virtio_fs_hiprio_done_work); + INIT_LIST_HEAD(&fs->vqs[VQ_HIPRIO].queued_reqs); + INIT_DELAYED_WORK(&fs->vqs[VQ_HIPRIO].dispatch_work, + virtio_fs_hiprio_dispatch_work); + spin_lock_init(&fs->vqs[VQ_HIPRIO].lock); + + /* Initialize the requests virtqueues */ + for (i = VQ_REQUEST; i < fs->nvqs; i++) { + spin_lock_init(&fs->vqs[i].lock); + INIT_WORK(&fs->vqs[i].done_work, virtio_fs_requests_done_work); + INIT_DELAYED_WORK(&fs->vqs[i].dispatch_work, + virtio_fs_dummy_dispatch_work); + INIT_LIST_HEAD(&fs->vqs[i].queued_reqs); + snprintf(fs->vqs[i].name, sizeof(fs->vqs[i].name), + "requests.%u", i - VQ_REQUEST); + callbacks[i] = virtio_fs_vq_done; + names[i] = fs->vqs[i].name; + } + + ret = virtio_find_vqs(vdev, fs->nvqs, vqs, callbacks, names, NULL); + if (ret < 0) + goto out; + + for (i = 0; i < fs->nvqs; i++) + fs->vqs[i].vq = vqs[i]; + +out: + kfree(names); + kfree(callbacks); + kfree(vqs); + return ret; +} + +/* Free virtqueues (device must already be reset) */ +static void virtio_fs_cleanup_vqs(struct virtio_device *vdev, + struct virtio_fs *fs) +{ + vdev->config->del_vqs(vdev); +} + +static int virtio_fs_probe(struct virtio_device *vdev) +{ + struct virtio_fs *fs; + int ret; + + fs = devm_kzalloc(&vdev->dev, sizeof(*fs), GFP_KERNEL); + if (!fs) + return -ENOMEM; + vdev->priv = fs; + + ret = virtio_fs_read_tag(vdev, fs); + if (ret < 0) + goto out; + + ret = virtio_fs_setup_vqs(vdev, fs); + if (ret < 0) + goto out; + + /* TODO vq affinity */ + /* TODO populate notifications vq */ + + /* Bring the device online in case the filesystem is mounted and + * requests need to be sent before we return. + */ + virtio_device_ready(vdev); + + ret = virtio_fs_add_instance(fs); + if (ret < 0) + goto out_vqs; + + return 0; + +out_vqs: + vdev->config->reset(vdev); + virtio_fs_cleanup_vqs(vdev, fs); + +out: + vdev->priv = NULL; + return ret; +} + +static void virtio_fs_remove(struct virtio_device *vdev) +{ + struct virtio_fs *fs = vdev->priv; + + virtio_fs_free_devs(fs); + + vdev->config->reset(vdev); + virtio_fs_cleanup_vqs(vdev, fs); + + mutex_lock(&virtio_fs_mutex); + list_del(&fs->list); + mutex_unlock(&virtio_fs_mutex); + + vdev->priv = NULL; +} + +#ifdef CONFIG_PM +static int virtio_fs_freeze(struct virtio_device *vdev) +{ + return 0; /* TODO */ +} + +static int virtio_fs_restore(struct virtio_device *vdev) +{ + return 0; /* TODO */ +} +#endif /* CONFIG_PM */ + +const static struct virtio_device_id id_table[] = { + { VIRTIO_ID_FS, VIRTIO_DEV_ANY_ID }, + {}, +}; + +const static unsigned int feature_table[] = {}; + +static struct virtio_driver virtio_fs_driver = { + .driver.name = KBUILD_MODNAME, + .driver.owner = THIS_MODULE, + .id_table = id_table, + .feature_table = feature_table, + .feature_table_size = ARRAY_SIZE(feature_table), + /* TODO validate config_get != NULL */ + .probe = virtio_fs_probe, + .remove = virtio_fs_remove, +#ifdef CONFIG_PM_SLEEP + .freeze = virtio_fs_freeze, + .restore = virtio_fs_restore, +#endif +}; + +static void virtio_fs_wake_forget_and_unlock(struct fuse_iqueue *fiq) +__releases(fiq->waitq.lock) +{ + struct fuse_forget_link *link; + struct virtio_fs_forget *forget; + struct scatterlist sg; + struct scatterlist *sgs[] = {&sg}; + struct virtio_fs *fs; + struct virtqueue *vq; + struct virtio_fs_vq *fsvq; + bool notify; + u64 unique; + int ret; + + BUG_ON(!fiq->forget_list_head.next); + link = fiq->forget_list_head.next; + BUG_ON(link->next); + fiq->forget_list_head.next = NULL; + fiq->forget_list_tail = &fiq->forget_list_head; + + unique = fuse_get_unique(fiq); + + fs = fiq->priv; + fsvq = &fs->vqs[VQ_HIPRIO]; + spin_unlock(&fiq->waitq.lock); + + /* Allocate a buffer for the request */ + forget = kmalloc(sizeof(*forget), GFP_ATOMIC); + if (!forget) { + pr_err("virtio-fs: dropped FORGET: kmalloc failed\n"); + goto out; /* TODO avoid dropping it? */ + } + + forget->ih = (struct fuse_in_header){ + .opcode = FUSE_FORGET, + .nodeid = link->forget_one.nodeid, + .unique = unique, + .len = sizeof(*forget), + }; + forget->arg = (struct fuse_forget_in){ + .nlookup = link->forget_one.nlookup, + }; + + sg_init_one(&sg, forget, sizeof(*forget)); + + /* Enqueue the request */ + vq = fsvq->vq; + dev_dbg(&vq->vdev->dev, "%s\n", __func__); + spin_lock(&fsvq->lock); + + ret = virtqueue_add_sgs(vq, sgs, 1, 0, forget, GFP_ATOMIC); + if (ret < 0) { + if (ret == -ENOMEM || ret == -ENOSPC) { + pr_debug("virtio-fs: Could not queue FORGET: err=%d." + " Will try later.\n", ret); + list_add_tail(&forget->list, &fsvq->queued_reqs); + schedule_delayed_work(&fsvq->dispatch_work, + msecs_to_jiffies(1)); + } else { + pr_debug("virtio-fs: Could not queue FORGET: err=%d." + " Dropping it.\n", ret); + kfree(forget); + } + spin_unlock(&fsvq->lock); + goto out; + } + + notify = virtqueue_kick_prepare(vq); + + spin_unlock(&fsvq->lock); + + if (notify) + virtqueue_notify(vq); +out: + kfree(link); +} + +static void virtio_fs_wake_interrupt_and_unlock(struct fuse_iqueue *fiq) +__releases(fiq->waitq.lock) +{ + /* TODO */ + spin_unlock(&fiq->waitq.lock); +} + +/* Return the number of scatter-gather list elements required */ +static unsigned sg_count_fuse_req(struct fuse_req *req) +{ + unsigned total_sgs = 1 /* fuse_in_header */; + + if (req->in.numargs - req->in.argpages) + total_sgs += 1; + + if (req->in.argpages) + total_sgs += req->num_pages; + + if (!test_bit(FR_ISREPLY, &req->flags)) + return total_sgs; + + total_sgs += 1 /* fuse_out_header */; + + if (req->out.numargs - req->out.argpages) + total_sgs += 1; + + if (req->out.argpages) + total_sgs += req->num_pages; + + return total_sgs; +} + +/* Add pages to scatter-gather list and return number of elements used */ +static unsigned sg_init_fuse_pages(struct scatterlist *sg, + struct page **pages, + struct fuse_page_desc *page_descs, + unsigned num_pages) +{ + unsigned i; + + for (i = 0; i < num_pages; i++) { + sg_init_table(&sg[i], 1); + sg_set_page(&sg[i], pages[i], + page_descs[i].length, + page_descs[i].offset); + } + + return i; +} + +/* Add args to scatter-gather list and return number of elements used */ +static unsigned sg_init_fuse_args(struct scatterlist *sg, + struct fuse_req *req, + struct fuse_arg *args, + unsigned numargs, + bool argpages, + void *argbuf, + unsigned *len_used) +{ + unsigned total_sgs = 0; + unsigned len; + + len = fuse_len_args(numargs - argpages, args); + if (len) + sg_init_one(&sg[total_sgs++], argbuf, len); + + if (argpages) + total_sgs += sg_init_fuse_pages(&sg[total_sgs], + req->pages, + req->page_descs, + req->num_pages); + + if (len_used) + *len_used = len; + + return total_sgs; +} + +/* Add a request to a virtqueue and kick the device */ +static int virtio_fs_enqueue_req(struct virtqueue *vq, struct fuse_req *req) +{ + struct scatterlist *stack_sgs[6 /* requests need at least 4 elements */]; + struct scatterlist stack_sg[ARRAY_SIZE(stack_sgs)]; + struct scatterlist **sgs = stack_sgs; + struct scatterlist *sg = stack_sg; + struct virtio_fs_vq *fsvq; + unsigned argbuf_used = 0; + unsigned out_sgs = 0; + unsigned in_sgs = 0; + unsigned total_sgs; + unsigned i; + int ret; + bool notify; + + /* Does the sglist fit on the stack? */ + total_sgs = sg_count_fuse_req(req); + if (total_sgs > ARRAY_SIZE(stack_sgs)) { + sgs = kmalloc_array(total_sgs, sizeof(sgs[0]), GFP_ATOMIC); + sg = kmalloc_array(total_sgs, sizeof(sg[0]), GFP_ATOMIC); + if (!sgs || !sg) { + ret = -ENOMEM; + goto out; + } + } + + /* Use a bounce buffer since stack args cannot be mapped */ + ret = copy_args_to_argbuf(req); + if (ret < 0) + goto out; + + /* Request elements */ + sg_init_one(&sg[out_sgs++], &req->in.h, sizeof(req->in.h)); + out_sgs += sg_init_fuse_args(&sg[out_sgs], req, + (struct fuse_arg *)req->in.args, + req->in.numargs, req->in.argpages, + req->argbuf, &argbuf_used); + + /* Reply elements */ + if (test_bit(FR_ISREPLY, &req->flags)) { + sg_init_one(&sg[out_sgs + in_sgs++], + &req->out.h, sizeof(req->out.h)); + in_sgs += sg_init_fuse_args(&sg[out_sgs + in_sgs], req, + req->out.args, req->out.numargs, + req->out.argpages, + req->argbuf + argbuf_used, NULL); + } + + BUG_ON(out_sgs + in_sgs != total_sgs); + + for (i = 0; i < total_sgs; i++) + sgs[i] = &sg[i]; + + fsvq = vq_to_fsvq(vq); + spin_lock(&fsvq->lock); + + ret = virtqueue_add_sgs(vq, sgs, out_sgs, in_sgs, req, GFP_ATOMIC); + if (ret < 0) { + /* TODO handle full virtqueue */ + spin_unlock(&fsvq->lock); + goto out; + } + + notify = virtqueue_kick_prepare(vq); + + spin_unlock(&fsvq->lock); + + if (notify) + virtqueue_notify(vq); + +out: + if (ret < 0 && req->argbuf) { + kfree(req->argbuf); + req->argbuf = NULL; + } + if (sgs != stack_sgs) { + kfree(sgs); + kfree(sg); + } + + return ret; +} + +static void virtio_fs_wake_pending_and_unlock(struct fuse_iqueue *fiq) +__releases(fiq->waitq.lock) +{ + unsigned queue_id = VQ_REQUEST; /* TODO multiqueue */ + struct virtio_fs *fs; + struct fuse_conn *fc; + struct fuse_req *req; + struct fuse_pqueue *fpq; + int ret; + + BUG_ON(list_empty(&fiq->pending)); + req = list_last_entry(&fiq->pending, struct fuse_req, list); + clear_bit(FR_PENDING, &req->flags); + list_del_init(&req->list); + BUG_ON(!list_empty(&fiq->pending)); + spin_unlock(&fiq->waitq.lock); + + fs = fiq->priv; + fc = fs->vqs[queue_id].fud->fc; + + dev_dbg(&fs->vqs[queue_id].vq->vdev->dev, + "%s: opcode %u unique %#llx nodeid %#llx in.len %u out.len %u\n", + __func__, req->in.h.opcode, req->in.h.unique, req->in.h.nodeid, + req->in.h.len, fuse_len_args(req->out.numargs, req->out.args)); + + fpq = &fs->vqs[queue_id].fud->pq; + spin_lock(&fpq->lock); + if (!fpq->connected) { + spin_unlock(&fpq->lock); + req->out.h.error = -ENODEV; + printk(KERN_ERR "%s: disconnected\n", __func__); + fuse_request_end(fc, req); + return; + } + list_add_tail(&req->list, fpq->processing); + spin_unlock(&fpq->lock); + set_bit(FR_SENT, &req->flags); + /* matches barrier in request_wait_answer() */ + smp_mb__after_atomic(); + /* TODO check for FR_INTERRUPTED? */ + +retry: + ret = virtio_fs_enqueue_req(fs->vqs[queue_id].vq, req); + if (ret < 0) { + if (ret == -ENOMEM || ret == -ENOSPC) { + /* Virtqueue full. Retry submission */ + usleep_range(20, 30); + goto retry; + } + req->out.h.error = ret; + printk(KERN_ERR "%s: virtio_fs_enqueue_req failed %d\n", + __func__, ret); + fuse_request_end(fc, req); + return; + } +} + +const static struct fuse_iqueue_ops virtio_fs_fiq_ops = { + .wake_forget_and_unlock = virtio_fs_wake_forget_and_unlock, + .wake_interrupt_and_unlock = virtio_fs_wake_interrupt_and_unlock, + .wake_pending_and_unlock = virtio_fs_wake_pending_and_unlock, +}; + +static int virtio_fs_fill_super(struct super_block *sb, void *data, + int silent) +{ + struct fuse_mount_data d; + struct fuse_conn *fc; + struct virtio_fs *fs; + int is_bdev = sb->s_bdev != NULL; + unsigned int i; + int err; + struct fuse_req *init_req; + + err = -EINVAL; + if (!parse_fuse_opt(data, &d, is_bdev, sb->s_user_ns)) + goto err; + if (d.fd_present) { + printk(KERN_ERR "virtio-fs: fd option cannot be used\n"); + goto err; + } + if (!d.tag_present) { + printk(KERN_ERR "virtio-fs: missing tag option\n"); + goto err; + } + + fs = virtio_fs_find_instance(d.tag); + if (!fs) { + printk(KERN_ERR "virtio-fs: tag not found\n"); + err = -ENOENT; + goto err; + } + + /* TODO lock */ + if (fs->vqs[VQ_REQUEST].fud) { + printk(KERN_ERR "virtio-fs: device already in use\n"); + err = -EBUSY; + goto err; + } + + err = -ENOMEM; + /* Allocate fuse_dev for hiprio and notification queues */ + for (i = 0; i < VQ_REQUEST; i++) { + struct virtio_fs_vq *fsvq = &fs->vqs[i]; + + fsvq->fud = fuse_dev_alloc(); + if (!fsvq->fud) + goto err_free_fuse_devs; + } + + init_req = fuse_request_alloc(0); + if (!init_req) + goto err_free_fuse_devs; + __set_bit(FR_BACKGROUND, &init_req->flags); + + d.fiq_ops = &virtio_fs_fiq_ops; + d.fiq_priv = fs; + d.fudptr = (void **)&fs->vqs[VQ_REQUEST].fud; + d.destroy = true; /* Send destroy request on unmount */ + err = fuse_fill_super_common(sb, &d); + if (err < 0) + goto err_free_init_req; + + fc = fs->vqs[VQ_REQUEST].fud->fc; + + /* TODO take fuse_mutex around this loop? */ + for (i = 0; i < fs->nvqs; i++) { + struct virtio_fs_vq *fsvq = &fs->vqs[i]; + + if (i == VQ_REQUEST) + continue; /* already initialized */ + fuse_dev_install(fsvq->fud, fc); + atomic_inc(&fc->dev_count); + } + + fuse_send_init(fc, init_req); + return 0; + +err_free_init_req: + fuse_request_free(init_req); +err_free_fuse_devs: + for (i = 0; i < fs->nvqs; i++) { + struct virtio_fs_vq *fsvq = &fs->vqs[i]; + fuse_dev_free(fsvq->fud); + } +err: + return err; +} + +static void virtio_kill_sb(struct super_block *sb) +{ + struct fuse_conn *fc = get_fuse_conn_super(sb); + fuse_kill_sb_anon(sb); + if (fc) { + struct virtio_fs *vfs = fc->iq.priv; + virtio_fs_free_devs(vfs); + } +} + +static struct dentry *virtio_fs_mount(struct file_system_type *fs_type, + int flags, const char *dev_name, + void *raw_data) +{ + return mount_nodev(fs_type, flags, raw_data, virtio_fs_fill_super); +} + +static struct file_system_type virtio_fs_type = { + .owner = THIS_MODULE, + .name = KBUILD_MODNAME, + .mount = virtio_fs_mount, + .kill_sb = virtio_kill_sb, +}; + +static int __init virtio_fs_init(void) +{ + int ret; + + ret = register_virtio_driver(&virtio_fs_driver); + if (ret < 0) + return ret; + + ret = register_filesystem(&virtio_fs_type); + if (ret < 0) { + unregister_virtio_driver(&virtio_fs_driver); + return ret; + } + + return 0; +} +module_init(virtio_fs_init); + +static void __exit virtio_fs_exit(void) +{ + unregister_filesystem(&virtio_fs_type); + unregister_virtio_driver(&virtio_fs_driver); +} +module_exit(virtio_fs_exit); + +MODULE_AUTHOR("Stefan Hajnoczi "); +MODULE_DESCRIPTION("Virtio Filesystem"); +MODULE_LICENSE("GPL"); +MODULE_ALIAS_FS(KBUILD_MODNAME); +MODULE_DEVICE_TABLE(virtio, id_table); diff --git a/include/uapi/linux/virtio_fs.h b/include/uapi/linux/virtio_fs.h new file mode 100644 index 000000000000..48f3590dcfbe --- /dev/null +++ b/include/uapi/linux/virtio_fs.h @@ -0,0 +1,41 @@ +#ifndef _UAPI_LINUX_VIRTIO_FS_H +#define _UAPI_LINUX_VIRTIO_FS_H +/* This header is BSD licensed so anyone can use the definitions to implement + * compatible drivers/servers. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * 3. Neither the name of IBM nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. */ +#include +#include +#include +#include + +struct virtio_fs_config { + /* Filesystem name (UTF-8, not NUL-terminated, padded with NULs) */ + __u8 tag[36]; + + /* Number of request queues */ + __u32 num_queues; +} __attribute__((packed)); + +#endif /* _UAPI_LINUX_VIRTIO_FS_H */ diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h index 6d5c3b2d4f4d..884b0e2734bb 100644 --- a/include/uapi/linux/virtio_ids.h +++ b/include/uapi/linux/virtio_ids.h @@ -43,5 +43,6 @@ #define VIRTIO_ID_INPUT 18 /* virtio input */ #define VIRTIO_ID_VSOCK 19 /* virtio vsock transport */ #define VIRTIO_ID_CRYPTO 20 /* virtio crypto */ +#define VIRTIO_ID_FS 26 /* virtio filesystem */ #endif /* _LINUX_VIRTIO_IDS_H */ From patchwork Wed May 15 19:26:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945157 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4486013AD for ; Wed, 15 May 2019 19:28:22 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 343B72842D for ; Wed, 15 May 2019 19:28:22 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2858A28555; Wed, 15 May 2019 19:28:22 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CBC502842D for ; Wed, 15 May 2019 19:28:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727879AbfEOT2I (ORCPT ); Wed, 15 May 2019 15:28:08 -0400 Received: from mx1.redhat.com ([209.132.183.28]:49680 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727725AbfEOT1g (ORCPT ); Wed, 15 May 2019 15:27:36 -0400 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D49807E9F9; Wed, 15 May 2019 19:27:35 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id EC45B9CCA; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id B0D40225480; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 12/30] dax: remove block device dependencies Date: Wed, 15 May 2019 15:26:57 -0400 Message-Id: <20190515192715.18000-13-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Wed, 15 May 2019 19:27:35 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Although struct dax_device itself is not tied to a block device, some DAX code assumes there is a block device. Make block devices optional by allowing bdev to be NULL in commonly used DAX APIs. When there is no block device: * Skip the partition offset calculation in bdev_dax_pgoff() * Skip the blkdev_issue_zeroout() optimization Note that more block device assumptions remain but I haven't reach those code paths yet. Signed-off-by: Stefan Hajnoczi --- drivers/dax/super.c | 3 ++- fs/dax.c | 7 ++++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/dax/super.c b/drivers/dax/super.c index 0a339b85133e..cb44ec663991 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -53,7 +53,8 @@ EXPORT_SYMBOL_GPL(dax_read_unlock); int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size, pgoff_t *pgoff) { - phys_addr_t phys_off = (get_start_sect(bdev) + sector) * 512; + sector_t start_sect = bdev ? get_start_sect(bdev) : 0; + phys_addr_t phys_off = (start_sect + sector) * 512; if (pgoff) *pgoff = PHYS_PFN(phys_off); diff --git a/fs/dax.c b/fs/dax.c index e5e54da1715f..815bc32fd967 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1042,7 +1042,12 @@ static vm_fault_t dax_load_hole(struct xa_state *xas, static bool dax_range_is_aligned(struct block_device *bdev, unsigned int offset, unsigned int length) { - unsigned short sector_size = bdev_logical_block_size(bdev); + unsigned short sector_size; + + if (!bdev) + return false; + + sector_size = bdev_logical_block_size(bdev); if (!IS_ALIGNED(offset, sector_size)) return false; From patchwork Wed May 15 19:26:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945247 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CA87B933 for ; Wed, 15 May 2019 19:30:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B9F0B288EE for ; Wed, 15 May 2019 19:30:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AE4EE2891A; Wed, 15 May 2019 19:30:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 41FD3288EE for ; Wed, 15 May 2019 19:30:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728384AbfEOTa6 (ORCPT ); Wed, 15 May 2019 15:30:58 -0400 Received: from mx1.redhat.com ([209.132.183.28]:49836 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726124AbfEOT1d (ORCPT ); Wed, 15 May 2019 15:27:33 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 1999230001DA; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id EA71B1001DE1; Wed, 15 May 2019 19:27:32 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id B7996225481; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 13/30] dax: Pass dax_dev to dax_writeback_mapping_range() Date: Wed, 15 May 2019 15:26:58 -0400 Message-Id: <20190515192715.18000-14-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.46]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Right now dax_writeback_mapping_range() is passed a bdev and dax_dev is searched from that bdev name. virtio-fs does not have a bdev. So pass in dax_dev also to dax_writeback_mapping_range(). If dax_dev is passed in, bdev is not used otherwise dax_dev is searched using bdev. Signed-off-by: Vivek Goyal --- fs/dax.c | 16 ++++++++++------ fs/ext2/inode.c | 2 +- fs/ext4/inode.c | 2 +- fs/xfs/xfs_aops.c | 2 +- include/linux/dax.h | 6 ++++-- 5 files changed, 17 insertions(+), 11 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 815bc32fd967..c944c1efc78f 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -932,12 +932,12 @@ static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev, * on persistent storage prior to completion of the operation. */ int dax_writeback_mapping_range(struct address_space *mapping, - struct block_device *bdev, struct writeback_control *wbc) + struct block_device *bdev, struct dax_device *dax_dev, + struct writeback_control *wbc) { XA_STATE(xas, &mapping->i_pages, wbc->range_start >> PAGE_SHIFT); struct inode *inode = mapping->host; pgoff_t end_index = wbc->range_end >> PAGE_SHIFT; - struct dax_device *dax_dev; void *entry; int ret = 0; unsigned int scanned = 0; @@ -948,9 +948,12 @@ int dax_writeback_mapping_range(struct address_space *mapping, if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL) return 0; - dax_dev = dax_get_by_host(bdev->bd_disk->disk_name); - if (!dax_dev) - return -EIO; + if (bdev) { + WARN_ON(dax_dev); + dax_dev = dax_get_by_host(bdev->bd_disk->disk_name); + if (!dax_dev) + return -EIO; + } trace_dax_writeback_range(inode, xas.xa_index, end_index); @@ -972,7 +975,8 @@ int dax_writeback_mapping_range(struct address_space *mapping, xas_lock_irq(&xas); } xas_unlock_irq(&xas); - put_dax(dax_dev); + if (bdev) + put_dax(dax_dev); trace_dax_writeback_range_done(inode, xas.xa_index, end_index); return ret; } diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index c27c27300d95..9b0131c53429 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -956,7 +956,7 @@ static int ext2_dax_writepages(struct address_space *mapping, struct writeback_control *wbc) { return dax_writeback_mapping_range(mapping, - mapping->host->i_sb->s_bdev, wbc); + mapping->host->i_sb->s_bdev, NULL, wbc); } const struct address_space_operations ext2_aops = { diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index b32a57bc5d5d..cb8cf5eddd9b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2972,7 +2972,7 @@ static int ext4_dax_writepages(struct address_space *mapping, percpu_down_read(&sbi->s_journal_flag_rwsem); trace_ext4_writepages(inode, wbc); - ret = dax_writeback_mapping_range(mapping, inode->i_sb->s_bdev, wbc); + ret = dax_writeback_mapping_range(mapping, inode->i_sb->s_bdev, NULL, wbc); trace_ext4_writepages_result(inode, wbc, ret, nr_to_write - wbc->nr_to_write); percpu_up_read(&sbi->s_journal_flag_rwsem); diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 3619e9e8d359..27f71ff55096 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -994,7 +994,7 @@ xfs_dax_writepages( { xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED); return dax_writeback_mapping_range(mapping, - xfs_find_bdev_for_inode(mapping->host), wbc); + xfs_find_bdev_for_inode(mapping->host), NULL, wbc); } STATIC int diff --git a/include/linux/dax.h b/include/linux/dax.h index 0dd316a74a29..bf3b00b5f0bf 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -87,7 +87,8 @@ static inline void fs_put_dax(struct dax_device *dax_dev) struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev); int dax_writeback_mapping_range(struct address_space *mapping, - struct block_device *bdev, struct writeback_control *wbc); + struct block_device *bdev, struct dax_device *dax_dev, + struct writeback_control *wbc); struct page *dax_layout_busy_page(struct address_space *mapping); dax_entry_t dax_lock_page(struct page *page); @@ -119,7 +120,8 @@ static inline struct page *dax_layout_busy_page(struct address_space *mapping) } static inline int dax_writeback_mapping_range(struct address_space *mapping, - struct block_device *bdev, struct writeback_control *wbc) + struct block_device *bdev, struct dax_device *dax_dev, + struct writeback_control *wbc) { return -EOPNOTSUPP; } From patchwork Wed May 15 19:26:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945251 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 40EF5933 for ; Wed, 15 May 2019 19:31:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3124C28541 for ; Wed, 15 May 2019 19:31:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2F2AB28569; Wed, 15 May 2019 19:31:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D700F28541 for ; Wed, 15 May 2019 19:31:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728042AbfEOTa5 (ORCPT ); Wed, 15 May 2019 15:30:57 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42870 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727046AbfEOT1d (ORCPT ); Wed, 15 May 2019 15:27:33 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 4A40F90906; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id 257DA5D729; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id BE116225482; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 14/30] virtio: Add get_shm_region method Date: Wed, 15 May 2019 15:26:59 -0400 Message-Id: <20190515192715.18000-15-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Sebastien Boeuf Virtio defines 'shared memory regions' that provide a continuously shared region between the host and guest. Provide a method to find a particular region on a device. Signed-off-by: Sebastien Boeuf Signed-off-by: Dr. David Alan Gilbert --- include/linux/virtio_config.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index bb4cc4910750..c859f000a751 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -10,6 +10,11 @@ struct irq_affinity; +struct virtio_shm_region { + u64 addr; + u64 len; +}; + /** * virtio_config_ops - operations for configuring a virtio device * Note: Do not assume that a transport implements all of the operations @@ -65,6 +70,7 @@ struct irq_affinity; * the caller can then copy. * @set_vq_affinity: set the affinity for a virtqueue (optional). * @get_vq_affinity: get the affinity for a virtqueue (optional). + * @get_shm_region: get a shared memory region based on the index. */ typedef void vq_callback_t(struct virtqueue *); struct virtio_config_ops { @@ -88,6 +94,8 @@ struct virtio_config_ops { const struct cpumask *cpu_mask); const struct cpumask *(*get_vq_affinity)(struct virtio_device *vdev, int index); + bool (*get_shm_region)(struct virtio_device *vdev, + struct virtio_shm_region *region, u8 id); }; /* If driver didn't advertise the feature, it will never appear. */ @@ -250,6 +258,15 @@ int virtqueue_set_affinity(struct virtqueue *vq, const struct cpumask *cpu_mask) return 0; } +static inline +bool virtio_get_shm_region(struct virtio_device *vdev, + struct virtio_shm_region *region, u8 id) +{ + if (!vdev->config->get_shm_region) + return false; + return vdev->config->get_shm_region(vdev, region, id); +} + static inline bool virtio_is_little_endian(struct virtio_device *vdev) { return virtio_has_feature(vdev, VIRTIO_F_VERSION_1) || From patchwork Wed May 15 19:27:00 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945175 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F154713AD for ; Wed, 15 May 2019 19:28:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DF63C2842D for ; Wed, 15 May 2019 19:28:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D3A2A28541; Wed, 15 May 2019 19:28:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6B3642842D for ; Wed, 15 May 2019 19:28:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727975AbfEOT2r (ORCPT ); Wed, 15 May 2019 15:28:47 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42882 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727266AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 7D84333027E; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5491B1001DE1; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id C56EC225483; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 15/30] virtio: Implement get_shm_region for PCI transport Date: Wed, 15 May 2019 15:27:00 -0400 Message-Id: <20190515192715.18000-16-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Sebastien Boeuf On PCI the shm regions are found using capability entries; find a region by searching for the capability. Signed-off-by: Sebastien Boeuf Signed-off-by: Dr. David Alan Gilbert --- drivers/virtio/virtio_pci_modern.c | 108 +++++++++++++++++++++++++++++ include/uapi/linux/virtio_pci.h | 10 +++ 2 files changed, 118 insertions(+) diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 07571daccfec..51c9e6eca5ac 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -446,6 +446,112 @@ static void del_vq(struct virtio_pci_vq_info *info) vring_del_virtqueue(vq); } +static int virtio_pci_find_shm_cap(struct pci_dev *dev, + u8 required_id, + u8 *bar, u64 *offset, u64 *len) +{ + int pos; + + for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR); + pos > 0; + pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) { + u8 type, cap_len, id; + u32 tmp32; + u64 res_offset, res_length; + + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap, + cfg_type), + &type); + if (type != VIRTIO_PCI_CAP_SHARED_MEMORY_CFG) + continue; + + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap, + cap_len), + &cap_len); + if (cap_len != sizeof(struct virtio_pci_shm_cap)) { + printk(KERN_ERR "%s: shm cap with bad size offset: %d size: %d\n", + __func__, pos, cap_len); + continue; + }; + + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_shm_cap, + id), + &id); + if (id != required_id) + continue; + + /* Type, and ID match, looks good */ + pci_read_config_byte(dev, pos + offsetof(struct virtio_pci_cap, + bar), + bar); + + /* Read the lower 32bit of length and offset */ + pci_read_config_dword(dev, pos + offsetof(struct virtio_pci_cap, offset), + &tmp32); + res_offset = tmp32; + pci_read_config_dword(dev, pos + offsetof(struct virtio_pci_cap, length), + &tmp32); + res_length = tmp32; + + /* and now the top half */ + pci_read_config_dword(dev, + pos + offsetof(struct virtio_pci_shm_cap, + offset_hi), + &tmp32); + res_offset |= ((u64)tmp32) << 32; + pci_read_config_dword(dev, + pos + offsetof(struct virtio_pci_shm_cap, + length_hi), + &tmp32); + res_length |= ((u64)tmp32) << 32; + + *offset = res_offset; + *len = res_length; + + return pos; + } + return 0; +} + +static bool vp_get_shm_region(struct virtio_device *vdev, + struct virtio_shm_region *region, u8 id) +{ + struct virtio_pci_device *vp_dev = to_vp_device(vdev); + struct pci_dev *pci_dev = vp_dev->pci_dev; + u8 bar; + u64 offset, len; + phys_addr_t phys_addr; + size_t bar_len; + char *bar_name; + int ret; + + if (!virtio_pci_find_shm_cap(pci_dev, id, &bar, &offset, &len)) { + return false; + } + + ret = pci_request_region(pci_dev, bar, "virtio-pci-shm"); + if (ret < 0) { + dev_err(&pci_dev->dev, "%s: failed to request BAR\n", + __func__); + return false; + } + + phys_addr = pci_resource_start(pci_dev, bar); + bar_len = pci_resource_len(pci_dev, bar); + + if (offset + len > bar_len) { + dev_err(&pci_dev->dev, + "%s: bar shorter than cap offset+len\n", + __func__); + return false; + } + + region->len = len; + region->addr = (u64) phys_addr + offset; + + return true; +} + static const struct virtio_config_ops virtio_pci_config_nodev_ops = { .get = NULL, .set = NULL, @@ -460,6 +566,7 @@ static const struct virtio_config_ops virtio_pci_config_nodev_ops = { .bus_name = vp_bus_name, .set_vq_affinity = vp_set_vq_affinity, .get_vq_affinity = vp_get_vq_affinity, + .get_shm_region = vp_get_shm_region, }; static const struct virtio_config_ops virtio_pci_config_ops = { @@ -476,6 +583,7 @@ static const struct virtio_config_ops virtio_pci_config_ops = { .bus_name = vp_bus_name, .set_vq_affinity = vp_set_vq_affinity, .get_vq_affinity = vp_get_vq_affinity, + .get_shm_region = vp_get_shm_region, }; /** diff --git a/include/uapi/linux/virtio_pci.h b/include/uapi/linux/virtio_pci.h index 90007a1abcab..31841a60a4ad 100644 --- a/include/uapi/linux/virtio_pci.h +++ b/include/uapi/linux/virtio_pci.h @@ -113,6 +113,8 @@ #define VIRTIO_PCI_CAP_DEVICE_CFG 4 /* PCI configuration access */ #define VIRTIO_PCI_CAP_PCI_CFG 5 +/* Additional shared memory capability */ +#define VIRTIO_PCI_CAP_SHARED_MEMORY_CFG 8 /* This is the PCI capability header: */ struct virtio_pci_cap { @@ -163,6 +165,14 @@ struct virtio_pci_cfg_cap { __u8 pci_cfg_data[4]; /* Data for BAR access. */ }; +/* Fields in VIRTIO_PCI_CAP_SHARED_MEMORY_CFG */ +struct virtio_pci_shm_cap { + struct virtio_pci_cap cap; + __le32 offset_hi; /* Most sig 32 bits of offset */ + __le32 length_hi; /* Most sig 32 bits of length */ + __u8 id; /* To distinguish shm chunks */ +}; + /* Macro versions of offsets for the Old Timers! */ #define VIRTIO_PCI_CAP_VNDR 0 #define VIRTIO_PCI_CAP_NEXT 1 From patchwork Wed May 15 19:27:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945239 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E9DDE1515 for ; Wed, 15 May 2019 19:30:52 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D099128630 for ; Wed, 15 May 2019 19:30:52 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C4AA2288E6; Wed, 15 May 2019 19:30:52 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5EFD9288E1 for ; Wed, 15 May 2019 19:30:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727912AbfEOTaj (ORCPT ); Wed, 15 May 2019 15:30:39 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35000 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727297AbfEOT1d (ORCPT ); Wed, 15 May 2019 15:27:33 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 80E0BC09AD11; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5C0D15D9CC; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id CBD3F225484; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 16/30] virtio: Implement get_shm_region for MMIO transport Date: Wed, 15 May 2019 15:27:01 -0400 Message-Id: <20190515192715.18000-17-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Sebastien Boeuf On MMIO a new set of registers is defined for finding SHM regions. Add their definitions and use them to find the region. Signed-off-by: Sebastien Boeuf --- drivers/virtio/virtio_mmio.c | 32 ++++++++++++++++++++++++++++++++ include/uapi/linux/virtio_mmio.h | 11 +++++++++++ 2 files changed, 43 insertions(+) diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c index d9dd0f789279..ac42520abd86 100644 --- a/drivers/virtio/virtio_mmio.c +++ b/drivers/virtio/virtio_mmio.c @@ -499,6 +499,37 @@ static const char *vm_bus_name(struct virtio_device *vdev) return vm_dev->pdev->name; } +static bool vm_get_shm_region(struct virtio_device *vdev, + struct virtio_shm_region *region, u8 id) +{ + struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vdev); + u64 len, addr; + + /* Select the region we're interested in */ + writel(id, vm_dev->base + VIRTIO_MMIO_SHM_SEL); + + /* Read the region size */ + len = (u64) readl(vm_dev->base + VIRTIO_MMIO_SHM_LEN_LOW); + len |= (u64) readl(vm_dev->base + VIRTIO_MMIO_SHM_LEN_HIGH) << 32; + + region->len = len; + + /* Check if region length is -1. If that's the case, the shared memory + * region does not exist and there is no need to proceed further. + */ + if (len == ~(u64)0) { + return false; + } + + /* Read the region base address */ + addr = (u64) readl(vm_dev->base + VIRTIO_MMIO_SHM_BASE_LOW); + addr |= (u64) readl(vm_dev->base + VIRTIO_MMIO_SHM_BASE_HIGH) << 32; + + region->addr = addr; + + return true; +} + static const struct virtio_config_ops virtio_mmio_config_ops = { .get = vm_get, .set = vm_set, @@ -511,6 +542,7 @@ static const struct virtio_config_ops virtio_mmio_config_ops = { .get_features = vm_get_features, .finalize_features = vm_finalize_features, .bus_name = vm_bus_name, + .get_shm_region = vm_get_shm_region, }; diff --git a/include/uapi/linux/virtio_mmio.h b/include/uapi/linux/virtio_mmio.h index c4b09689ab64..0650f91bea6c 100644 --- a/include/uapi/linux/virtio_mmio.h +++ b/include/uapi/linux/virtio_mmio.h @@ -122,6 +122,17 @@ #define VIRTIO_MMIO_QUEUE_USED_LOW 0x0a0 #define VIRTIO_MMIO_QUEUE_USED_HIGH 0x0a4 +/* Shared memory region id */ +#define VIRTIO_MMIO_SHM_SEL 0x0ac + +/* Shared memory region length, 64 bits in two halves */ +#define VIRTIO_MMIO_SHM_LEN_LOW 0x0b0 +#define VIRTIO_MMIO_SHM_LEN_HIGH 0x0b4 + +/* Shared memory region base address, 64 bits in two halves */ +#define VIRTIO_MMIO_SHM_BASE_LOW 0x0b8 +#define VIRTIO_MMIO_SHM_BASE_HIGH 0x0bc + /* Configuration atomicity value */ #define VIRTIO_MMIO_CONFIG_GENERATION 0x0fc From patchwork Wed May 15 19:27:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945191 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DE03E933 for ; Wed, 15 May 2019 19:29:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CE4FE2842D for ; Wed, 15 May 2019 19:29:20 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C28FA28541; Wed, 15 May 2019 19:29:20 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5E4242842D for ; Wed, 15 May 2019 19:29:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728153AbfEOT3T (ORCPT ); Wed, 15 May 2019 15:29:19 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53886 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727560AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B3C34308ED53; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8B80B5D728; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id D1897225485; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 17/30] fuse, dax: add fuse_conn->dax_dev field Date: Wed, 15 May 2019 15:27:02 -0400 Message-Id: <20190515192715.18000-18-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.44]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi A struct dax_device instance is a prerequisite for the DAX filesystem APIs. Let virtio_fs associate a dax_device with a fuse_conn. Classic FUSE and CUSE set the pointer to NULL, disabling DAX. Signed-off-by: Stefan Hajnoczi --- fs/fuse/cuse.c | 3 ++- fs/fuse/fuse_i.h | 9 ++++++++- fs/fuse/inode.c | 9 ++++++--- fs/fuse/virtio_fs.c | 1 + 4 files changed, 17 insertions(+), 5 deletions(-) diff --git a/fs/fuse/cuse.c b/fs/fuse/cuse.c index a509747153a7..417448f11f9f 100644 --- a/fs/fuse/cuse.c +++ b/fs/fuse/cuse.c @@ -504,7 +504,8 @@ static int cuse_channel_open(struct inode *inode, struct file *file) * Limit the cuse channel to requests that can * be represented in file->f_cred->user_ns. */ - fuse_conn_init(&cc->fc, file->f_cred->user_ns, &fuse_dev_fiq_ops, NULL); + fuse_conn_init(&cc->fc, file->f_cred->user_ns, NULL, &fuse_dev_fiq_ops, + NULL); fud = fuse_dev_alloc_install(&cc->fc); if (!fud) { diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index f5cb4d40b83f..46fc1a454084 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -74,6 +74,9 @@ struct fuse_mount_data { unsigned max_read; unsigned blksize; + /* DAX device, may be NULL */ + struct dax_device *dax_dev; + /* fuse input queue operations */ const struct fuse_iqueue_ops *fiq_ops; @@ -822,6 +825,9 @@ struct fuse_conn { /** List of device instances belonging to this connection */ struct list_head devices; + + /** DAX device, non-NULL if DAX is supported */ + struct dax_device *dax_dev; }; static inline struct fuse_conn *get_fuse_conn_super(struct super_block *sb) @@ -1049,7 +1055,8 @@ struct fuse_conn *fuse_conn_get(struct fuse_conn *fc); * Initialize fuse_conn */ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, - const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv); + struct dax_device *dax_dev, + const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv); /** * Release reference to fuse_conn diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 731a8a74d032..42f3ac5b7521 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -603,7 +603,8 @@ static void fuse_pqueue_init(struct fuse_pqueue *fpq) } void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, - const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv) + struct dax_device *dax_dev, + const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv) { memset(fc, 0, sizeof(*fc)); spin_lock_init(&fc->lock); @@ -628,6 +629,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, atomic64_set(&fc->attr_version, 1); get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key)); fc->pid_ns = get_pid_ns(task_active_pid_ns(current)); + fc->dax_dev = dax_dev; fc->user_ns = get_user_ns(user_ns); fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ; } @@ -1140,8 +1142,8 @@ int fuse_fill_super_common(struct super_block *sb, if (!fc) goto err; - fuse_conn_init(fc, sb->s_user_ns, mount_data->fiq_ops, - mount_data->fiq_priv); + fuse_conn_init(fc, sb->s_user_ns, mount_data->dax_dev, + mount_data->fiq_ops, mount_data->fiq_priv); fc->release = fuse_free_conn; fud = fuse_dev_alloc_install(fc); @@ -1243,6 +1245,7 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent) goto err_fput; __set_bit(FR_BACKGROUND, &init_req->flags); + d.dax_dev = NULL; d.fiq_ops = &fuse_dev_fiq_ops; d.fiq_priv = NULL; d.fudptr = &file->private_data; diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index e76e0f5dce40..a23a1fb67217 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -866,6 +866,7 @@ static int virtio_fs_fill_super(struct super_block *sb, void *data, goto err_free_fuse_devs; __set_bit(FR_BACKGROUND, &init_req->flags); + d.dax_dev = NULL; d.fiq_ops = &virtio_fs_fiq_ops; d.fiq_priv = fs; d.fudptr = (void **)&fs->vqs[VQ_REQUEST].fud; From patchwork Wed May 15 19:27:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945185 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D11BE933 for ; Wed, 15 May 2019 19:29:13 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C09062842D for ; Wed, 15 May 2019 19:29:13 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AF49428541; Wed, 15 May 2019 19:29:13 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id F3D382842D for ; Wed, 15 May 2019 19:29:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728093AbfEOT3D (ORCPT ); Wed, 15 May 2019 15:29:03 -0400 Received: from mx1.redhat.com ([209.132.183.28]:49862 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727576AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id CB1953001561; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id 92F671972A; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id D722C225486; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 18/30] virtio_fs, dax: Set up virtio_fs dax_device Date: Wed, 15 May 2019 15:27:03 -0400 Message-Id: <20190515192715.18000-19-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.46]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Setup a dax device. Use the shm capability to find the cache entry and map it. The DAX window is accessed by the fs/dax.c infrastructure and must have struct pages (at least on x86). Use devm_memremap_pages() to map the DAX window PCI BAR and allocate struct page. Signed-off-by: Stefan Hajnoczi Signed-off-by: Dr. David Alan Gilbert Signed-off-by: Vivek Goyal Signed-off-by: Sebastien Boeuf Signed-off-by: Liu Bo --- fs/fuse/fuse_i.h | 1 + fs/fuse/inode.c | 8 ++ fs/fuse/virtio_fs.c | 173 ++++++++++++++++++++++++++++++++- include/uapi/linux/virtio_fs.h | 3 + 4 files changed, 183 insertions(+), 2 deletions(-) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 46fc1a454084..840c88af711c 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -70,6 +70,7 @@ struct fuse_mount_data { unsigned group_id_present:1; unsigned default_permissions:1; unsigned allow_other:1; + unsigned dax:1; unsigned destroy:1; unsigned max_read; unsigned blksize; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 42f3ac5b7521..97d218a7daa8 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -442,6 +442,7 @@ enum { OPT_ALLOW_OTHER, OPT_MAX_READ, OPT_BLKSIZE, + OPT_DAX, OPT_ERR }; @@ -455,6 +456,7 @@ static const match_table_t tokens = { {OPT_ALLOW_OTHER, "allow_other"}, {OPT_MAX_READ, "max_read=%u"}, {OPT_BLKSIZE, "blksize=%u"}, + {OPT_DAX, "dax"}, {OPT_ERR, NULL} }; @@ -546,6 +548,10 @@ int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev, d->blksize = value; break; + case OPT_DAX: + d->dax = 1; + break; + default: return 0; } @@ -574,6 +580,8 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root) seq_printf(m, ",max_read=%u", fc->max_read); if (sb->s_bdev && sb->s_blocksize != FUSE_DEFAULT_BLKSIZE) seq_printf(m, ",blksize=%lu", sb->s_blocksize); + if (fc->dax_dev) + seq_printf(m, ",dax"); return 0; } diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index a23a1fb67217..2b790865dc21 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -5,6 +5,9 @@ */ #include +#include +#include +#include #include #include #include @@ -31,6 +34,18 @@ struct virtio_fs_vq { char name[24]; } ____cacheline_aligned_in_smp; +/* State needed for devm_memremap_pages(). This API is called on the + * underlying pci_dev instead of struct virtio_fs (layering violation). Since + * the memremap release function only gets called when the pci_dev is released, + * keep the associated state separate from struct virtio_fs (it has a different + * lifecycle from pci_dev). + */ +struct virtio_fs_memremap_info { + struct dev_pagemap pgmap; + struct percpu_ref ref; + struct completion completion; +}; + /* A virtio-fs device instance */ struct virtio_fs { struct list_head list; /* on virtio_fs_instances */ @@ -38,6 +53,12 @@ struct virtio_fs { struct virtio_fs_vq *vqs; unsigned nvqs; /* number of virtqueues */ unsigned num_queues; /* number of request queues */ + struct dax_device *dax_dev; + + /* DAX memory window where file contents are mapped */ + void *window_kaddr; + phys_addr_t window_phys_addr; + size_t window_len; }; struct virtio_fs_forget { @@ -421,6 +442,151 @@ static void virtio_fs_cleanup_vqs(struct virtio_device *vdev, vdev->config->del_vqs(vdev); } +/* Map a window offset to a page frame number. The window offset will have + * been produced by .iomap_begin(), which maps a file offset to a window + * offset. + */ +static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, + long nr_pages, void **kaddr, pfn_t *pfn) +{ + struct virtio_fs *fs = dax_get_private(dax_dev); + phys_addr_t offset = PFN_PHYS(pgoff); + size_t max_nr_pages = fs->window_len/PAGE_SIZE - pgoff; + + if (kaddr) + *kaddr = fs->window_kaddr + offset; + if (pfn) + *pfn = phys_to_pfn_t(fs->window_phys_addr + offset, + PFN_DEV | PFN_MAP); + return nr_pages > max_nr_pages ? max_nr_pages : nr_pages; +} + +static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev, + pgoff_t pgoff, void *addr, + size_t bytes, struct iov_iter *i) +{ + return copy_from_iter(addr, bytes, i); +} + +static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev, + pgoff_t pgoff, void *addr, + size_t bytes, struct iov_iter *i) +{ + return copy_to_iter(addr, bytes, i); +} + +static const struct dax_operations virtio_fs_dax_ops = { + .direct_access = virtio_fs_direct_access, + .copy_from_iter = virtio_fs_copy_from_iter, + .copy_to_iter = virtio_fs_copy_to_iter, +}; + +static void virtio_fs_percpu_release(struct percpu_ref *ref) +{ + struct virtio_fs_memremap_info *mi = + container_of(ref, struct virtio_fs_memremap_info, ref); + + complete(&mi->completion); +} + +static void virtio_fs_percpu_exit(void *data) +{ + struct virtio_fs_memremap_info *mi = data; + + wait_for_completion(&mi->completion); + percpu_ref_exit(&mi->ref); +} + +static void virtio_fs_percpu_kill(struct percpu_ref *ref) +{ + percpu_ref_kill(ref); +} + +static void virtio_fs_cleanup_dax(void *data) +{ + struct virtio_fs *fs = data; + + kill_dax(fs->dax_dev); + put_dax(fs->dax_dev); +} + +static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) +{ + struct virtio_shm_region cache_reg; + struct virtio_fs_memremap_info *mi; + struct dev_pagemap *pgmap; + bool have_cache; + int ret; + + if (!IS_ENABLED(CONFIG_DAX_DRIVER)) + return 0; + + /* Get cache region */ + have_cache = virtio_get_shm_region(vdev, + &cache_reg, + (u8)VIRTIO_FS_SHMCAP_ID_CACHE); + if (!have_cache) { + dev_err(&vdev->dev, "%s: No cache capability\n", __func__); + return -ENXIO; + } else { + dev_notice(&vdev->dev, "Cache len: 0x%llx @ 0x%llx\n", + cache_reg.len, cache_reg.addr); + } + + mi = devm_kzalloc(&vdev->dev, sizeof(*mi), GFP_KERNEL); + if (!mi) + return -ENOMEM; + + init_completion(&mi->completion); + ret = percpu_ref_init(&mi->ref, virtio_fs_percpu_release, 0, + GFP_KERNEL); + if (ret < 0) { + dev_err(&vdev->dev, "%s: percpu_ref_init failed (%d)\n", + __func__, ret); + return ret; + } + + ret = devm_add_action(&vdev->dev, virtio_fs_percpu_exit, mi); + if (ret < 0) { + percpu_ref_exit(&mi->ref); + return ret; + } + + pgmap = &mi->pgmap; + pgmap->altmap_valid = false; + pgmap->ref = &mi->ref; + pgmap->kill = virtio_fs_percpu_kill; + pgmap->type = MEMORY_DEVICE_FS_DAX; + + /* Ideally we would directly use the PCI BAR resource but + * devm_memremap_pages() wants its own copy in pgmap. So + * initialize a struct resource from scratch (only the start + * and end fields will be used). + */ + pgmap->res = (struct resource){ + .name = "virtio-fs dax window", + .start = (phys_addr_t) cache_reg.addr, + .end = (phys_addr_t) cache_reg.addr + cache_reg.len - 1, + }; + + fs->window_kaddr = devm_memremap_pages(&vdev->dev, pgmap); + if (IS_ERR(fs->window_kaddr)) + return PTR_ERR(fs->window_kaddr); + + fs->window_phys_addr = (phys_addr_t) cache_reg.addr; + fs->window_len = (phys_addr_t) cache_reg.len; + + dev_dbg(&vdev->dev, "%s: window kaddr 0x%px phys_addr 0x%llx" + " len 0x%llx\n", __func__, fs->window_kaddr, cache_reg.addr, + cache_reg.len); + + fs->dax_dev = alloc_dax(fs, NULL, &virtio_fs_dax_ops); + if (!fs->dax_dev) + return -ENOMEM; + + return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax, fs); +} + static int virtio_fs_probe(struct virtio_device *vdev) { struct virtio_fs *fs; @@ -442,6 +608,10 @@ static int virtio_fs_probe(struct virtio_device *vdev) /* TODO vq affinity */ /* TODO populate notifications vq */ + ret = virtio_fs_setup_dax(vdev, fs); + if (ret < 0) + goto out_vqs; + /* Bring the device online in case the filesystem is mounted and * requests need to be sent before we return. */ @@ -456,7 +626,6 @@ static int virtio_fs_probe(struct virtio_device *vdev) out_vqs: vdev->config->reset(vdev); virtio_fs_cleanup_vqs(vdev, fs); - out: vdev->priv = NULL; return ret; @@ -866,7 +1035,7 @@ static int virtio_fs_fill_super(struct super_block *sb, void *data, goto err_free_fuse_devs; __set_bit(FR_BACKGROUND, &init_req->flags); - d.dax_dev = NULL; + d.dax_dev = d.dax ? fs->dax_dev : NULL; d.fiq_ops = &virtio_fs_fiq_ops; d.fiq_priv = fs; d.fudptr = (void **)&fs->vqs[VQ_REQUEST].fud; diff --git a/include/uapi/linux/virtio_fs.h b/include/uapi/linux/virtio_fs.h index 48f3590dcfbe..d4bb549568eb 100644 --- a/include/uapi/linux/virtio_fs.h +++ b/include/uapi/linux/virtio_fs.h @@ -38,4 +38,7 @@ struct virtio_fs_config { __u32 num_queues; } __attribute__((packed)); +/* For the id field in virtio_pci_shm_cap */ +#define VIRTIO_FS_SHMCAP_ID_CACHE 0 + #endif /* _UAPI_LINUX_VIRTIO_FS_H */ From patchwork Wed May 15 19:27:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945163 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BBB4813AD for ; Wed, 15 May 2019 19:28:25 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AAEAD2842D for ; Wed, 15 May 2019 19:28:25 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9ED5E28541; Wed, 15 May 2019 19:28:25 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1BE3E2842D for ; Wed, 15 May 2019 19:28:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727694AbfEOT1f (ORCPT ); Wed, 15 May 2019 15:27:35 -0400 Received: from mx1.redhat.com ([209.132.183.28]:49638 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727526AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id AAA2285A04; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id 859F05D723; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id DC60A225487; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 19/30] fuse: Keep a list of free dax memory ranges Date: Wed, 15 May 2019 15:27:04 -0400 Message-Id: <20190515192715.18000-20-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Divide the dax memory range into fixed size ranges (2MB for now) and put them in a list. This will track free ranges. Once an inode requires a free range, we will take one from here and put it in interval-tree of ranges assigned to inode. Signed-off-by: Vivek Goyal Signed-off-by: Peng Tao --- fs/fuse/fuse_i.h | 23 ++++++++++++ fs/fuse/inode.c | 86 +++++++++++++++++++++++++++++++++++++++++++++ fs/fuse/virtio_fs.c | 2 ++ 3 files changed, 111 insertions(+) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 840c88af711c..5439e4628362 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -46,6 +46,10 @@ /** Number of page pointers embedded in fuse_req */ #define FUSE_REQ_INLINE_PAGES 1 +/* Default memory range size, 2MB */ +#define FUSE_DAX_MEM_RANGE_SZ (2*1024*1024) +#define FUSE_DAX_MEM_RANGE_PAGES (FUSE_DAX_MEM_RANGE_SZ/PAGE_SIZE) + /** List of active connections */ extern struct list_head fuse_conn_list; @@ -94,6 +98,18 @@ struct fuse_forget_link { struct fuse_forget_link *next; }; +/** Translation information for file offsets to DAX window offsets */ +struct fuse_dax_mapping { + /* Will connect in fc->free_ranges to keep track of free memory */ + struct list_head list; + + /** Position in DAX window */ + u64 window_offset; + + /** Length of mapping, in bytes */ + loff_t length; +}; + /** FUSE inode */ struct fuse_inode { /** Inode data */ @@ -829,6 +845,13 @@ struct fuse_conn { /** DAX device, non-NULL if DAX is supported */ struct dax_device *dax_dev; + + /* + * DAX Window Free Ranges. TODO: This might not be best place to store + * this free list + */ + long nr_free_ranges; + struct list_head free_ranges; }; static inline struct fuse_conn *get_fuse_conn_super(struct super_block *sb) diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 97d218a7daa8..8a3dd72f9843 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -22,6 +22,8 @@ #include #include #include +#include +#include MODULE_AUTHOR("Miklos Szeredi "); MODULE_DESCRIPTION("Filesystem in Userspace"); @@ -610,6 +612,76 @@ static void fuse_pqueue_init(struct fuse_pqueue *fpq) fpq->connected = 1; } +static void fuse_free_dax_mem_ranges(struct list_head *mem_list) +{ + struct fuse_dax_mapping *range, *temp; + + /* Free All allocated elements */ + list_for_each_entry_safe(range, temp, mem_list, list) { + list_del(&range->list); + kfree(range); + } +} + +#ifdef CONFIG_FS_DAX +static int fuse_dax_mem_range_init(struct fuse_conn *fc, + struct dax_device *dax_dev) +{ + long nr_pages, nr_ranges; + void *kaddr; + pfn_t pfn; + struct fuse_dax_mapping *range; + LIST_HEAD(mem_ranges); + phys_addr_t phys_addr; + int ret = 0, id; + size_t dax_size = -1; + unsigned long i; + + id = dax_read_lock(); + nr_pages = dax_direct_access(dax_dev, 0, PHYS_PFN(dax_size), &kaddr, + &pfn); + dax_read_unlock(id); + if (nr_pages < 0) { + pr_debug("dax_direct_access() returned %ld\n", nr_pages); + return nr_pages; + } + + phys_addr = pfn_t_to_phys(pfn); + nr_ranges = nr_pages/FUSE_DAX_MEM_RANGE_PAGES; + printk("fuse_dax_mem_range_init(): dax mapped %ld pages. nr_ranges=%ld\n", nr_pages, nr_ranges); + + for (i = 0; i < nr_ranges; i++) { + range = kzalloc(sizeof(struct fuse_dax_mapping), GFP_KERNEL); + if (!range) { + pr_debug("memory allocation for mem_range failed.\n"); + ret = -ENOMEM; + goto out_err; + } + /* TODO: This offset only works if virtio-fs driver is not + * having some memory hidden at the beginning. This needs + * better handling + */ + range->window_offset = i * FUSE_DAX_MEM_RANGE_SZ; + range->length = FUSE_DAX_MEM_RANGE_SZ; + list_add_tail(&range->list, &mem_ranges); + } + + list_replace_init(&mem_ranges, &fc->free_ranges); + fc->nr_free_ranges = nr_ranges; + return 0; +out_err: + /* Free All allocated elements */ + fuse_free_dax_mem_ranges(&mem_ranges); + return ret; +} +#else /* !CONFIG_FS_DAX */ +static inline int fuse_dax_mem_range_init(struct fuse_conn *fc, + struct dax_device *dax_dev) +{ + return 0; +} +#endif /* CONFIG_FS_DAX */ + void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, struct dax_device *dax_dev, const struct fuse_iqueue_ops *fiq_ops, void *fiq_priv) @@ -640,6 +712,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, fc->dax_dev = dax_dev; fc->user_ns = get_user_ns(user_ns); fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ; + INIT_LIST_HEAD(&fc->free_ranges); } EXPORT_SYMBOL_GPL(fuse_conn_init); @@ -648,6 +721,8 @@ void fuse_conn_put(struct fuse_conn *fc) if (refcount_dec_and_test(&fc->count)) { if (fc->destroy_req) fuse_request_free(fc->destroy_req); + if (fc->dax_dev) + fuse_free_dax_mem_ranges(&fc->free_ranges); put_pid_ns(fc->pid_ns); put_user_ns(fc->user_ns); fc->release(fc); @@ -1154,6 +1229,14 @@ int fuse_fill_super_common(struct super_block *sb, mount_data->fiq_ops, mount_data->fiq_priv); fc->release = fuse_free_conn; + if (mount_data->dax_dev) { + err = fuse_dax_mem_range_init(fc, mount_data->dax_dev); + if (err) { + pr_debug("fuse_dax_mem_range_init() returned %d\n", err); + goto err_free_ranges; + } + } + fud = fuse_dev_alloc_install(fc); if (!fud) goto err_put_conn; @@ -1214,6 +1297,9 @@ int fuse_fill_super_common(struct super_block *sb, dput(root_dentry); err_dev_free: fuse_dev_free(fud); + err_free_ranges: + if (mount_data->dax_dev) + fuse_free_dax_mem_ranges(&fc->free_ranges); err_put_conn: fuse_conn_put(fc); sb->s_fs_info = NULL; diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 2b790865dc21..76c46edcc8ac 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -453,6 +453,8 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, phys_addr_t offset = PFN_PHYS(pgoff); size_t max_nr_pages = fs->window_len/PAGE_SIZE - pgoff; + pr_debug("virtio_fs_direct_access(): called. nr_pages=%ld max_nr_pages=%zu\n", nr_pages, max_nr_pages); + if (kaddr) *kaddr = fs->window_kaddr + offset; if (pfn) From patchwork Wed May 15 19:27:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945231 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 478EC933 for ; Wed, 15 May 2019 19:30:34 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3996028516 for ; Wed, 15 May 2019 19:30:34 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2E0B228609; Wed, 15 May 2019 19:30:34 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CF94E28516 for ; Wed, 15 May 2019 19:30:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727726AbfEOTa2 (ORCPT ); Wed, 15 May 2019 15:30:28 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58872 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727550AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id B327E3001A77; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id 8F3DF19C71; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id E5707225488; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 20/30] fuse: Introduce setupmapping/removemapping commands Date: Wed, 15 May 2019 15:27:05 -0400 Message-Id: <20190515192715.18000-21-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Introduce two new fuse commands to setup/remove memory mappings. This will be used to setup/tear down file mapping in dax window. Signed-off-by: Vivek Goyal --- include/uapi/linux/fuse.h | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 2ac598614a8f..9eb313220549 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -399,6 +399,8 @@ enum fuse_opcode { FUSE_RENAME2 = 45, FUSE_LSEEK = 46, FUSE_COPY_FILE_RANGE = 47, + FUSE_SETUPMAPPING = 48, + FUSE_REMOVEMAPPING = 49, /* CUSE specific operations */ CUSE_INIT = 4096, @@ -822,4 +824,35 @@ struct fuse_copy_file_range_in { uint64_t flags; }; +#define FUSE_SETUPMAPPING_ENTRIES 8 +#define FUSE_SETUPMAPPING_FLAG_WRITE (1ull << 0) +struct fuse_setupmapping_in { + /* An already open handle */ + uint64_t fh; + /* Offset into the file to start the mapping */ + uint64_t foffset; + /* Length of mapping required */ + uint64_t len; + /* Flags, FUSE_SETUPMAPPING_FLAG_* */ + uint64_t flags; + /* Offset in Memory Window */ + uint64_t moffset; +}; + +struct fuse_setupmapping_out { + /* Offsets into the cache of mappings */ + uint64_t coffset[FUSE_SETUPMAPPING_ENTRIES]; + /* Lengths of each mapping */ + uint64_t len[FUSE_SETUPMAPPING_ENTRIES]; +}; + +struct fuse_removemapping_in { + /* An already open handle */ + uint64_t fh; + /* Offset into the dax window start the unmapping */ + uint64_t moffset; + /* Length of mapping required */ + uint64_t len; +}; + #endif /* _LINUX_FUSE_H */ From patchwork Wed May 15 19:27:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945209 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BC2DF1515 for ; Wed, 15 May 2019 19:30:09 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AD1DE285ED for ; Wed, 15 May 2019 19:30:09 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A0AC528569; Wed, 15 May 2019 19:30:09 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7E3E928541 for ; Wed, 15 May 2019 19:30:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728215AbfEOT3i (ORCPT ); Wed, 15 May 2019 15:29:38 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58888 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727631AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0EE74300C76F; Wed, 15 May 2019 19:27:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id B855219733; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id E8F06225489; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 21/30] fuse, dax: Implement dax read/write operations Date: Wed, 15 May 2019 15:27:06 -0400 Message-Id: <20190515192715.18000-22-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Wed, 15 May 2019 19:27:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This patch implements basic DAX support. mmap() is not implemented yet and will come in later patches. This patch looks into implemeting read/write. We make use of interval tree to keep track of per inode dax mappings. Do not use dax for file extending writes, instead just send WRITE message to daemon (like we do for direct I/O path). This will keep write and i_size change atomic w.r.t crash. Signed-off-by: Stefan Hajnoczi Signed-off-by: Dr. David Alan Gilbert Signed-off-by: Vivek Goyal Signed-off-by: Miklos Szeredi Signed-off-by: Liu Bo --- fs/fuse/file.c | 454 +++++++++++++++++++++++++++++++++++++- fs/fuse/fuse_i.h | 21 ++ fs/fuse/inode.c | 6 + include/uapi/linux/fuse.h | 1 + 4 files changed, 476 insertions(+), 6 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index e9a7aa97c539..edbb11ca735e 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -18,6 +18,12 @@ #include #include #include +#include +#include +#include + +INTERVAL_TREE_DEFINE(struct fuse_dax_mapping, rb, __u64, __subtree_last, + START, LAST, static inline, fuse_dax_interval_tree); static int fuse_send_open(struct fuse_conn *fc, u64 nodeid, struct file *file, int opcode, struct fuse_open_out *outargp) @@ -171,6 +177,173 @@ static void fuse_link_write_file(struct file *file) spin_unlock(&fi->lock); } +static struct fuse_dax_mapping *alloc_dax_mapping(struct fuse_conn *fc) +{ + struct fuse_dax_mapping *dmap = NULL; + + spin_lock(&fc->lock); + + /* TODO: Add logic to try to free up memory if wait is allowed */ + if (fc->nr_free_ranges <= 0) { + spin_unlock(&fc->lock); + return NULL; + } + + WARN_ON(list_empty(&fc->free_ranges)); + + /* Take a free range */ + dmap = list_first_entry(&fc->free_ranges, struct fuse_dax_mapping, + list); + list_del_init(&dmap->list); + fc->nr_free_ranges--; + spin_unlock(&fc->lock); + return dmap; +} + +/* This assumes fc->lock is held */ +static void __free_dax_mapping(struct fuse_conn *fc, + struct fuse_dax_mapping *dmap) +{ + list_add_tail(&dmap->list, &fc->free_ranges); + fc->nr_free_ranges++; +} + +static void free_dax_mapping(struct fuse_conn *fc, + struct fuse_dax_mapping *dmap) +{ + /* Return fuse_dax_mapping to free list */ + spin_lock(&fc->lock); + __free_dax_mapping(fc, dmap); + spin_unlock(&fc->lock); +} + +/* offset passed in should be aligned to FUSE_DAX_MEM_RANGE_SZ */ +static int fuse_setup_one_mapping(struct inode *inode, + struct file *file, loff_t offset, + struct fuse_dax_mapping *dmap) +{ + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_file *ff = NULL; + struct fuse_setupmapping_in inarg; + FUSE_ARGS(args); + ssize_t err; + + if (file) + ff = file->private_data; + + WARN_ON(offset % FUSE_DAX_MEM_RANGE_SZ); + WARN_ON(fc->nr_free_ranges < 0); + + /* Ask fuse daemon to setup mapping */ + memset(&inarg, 0, sizeof(inarg)); + inarg.foffset = offset; + if (ff) + inarg.fh = ff->fh; + else + inarg.fh = -1; + inarg.moffset = dmap->window_offset; + inarg.len = FUSE_DAX_MEM_RANGE_SZ; + if (file) { + inarg.flags |= (file->f_mode & FMODE_WRITE) ? + FUSE_SETUPMAPPING_FLAG_WRITE : 0; + inarg.flags |= (file->f_mode & FMODE_READ) ? + FUSE_SETUPMAPPING_FLAG_READ : 0; + } else { + inarg.flags |= FUSE_SETUPMAPPING_FLAG_READ; + inarg.flags |= FUSE_SETUPMAPPING_FLAG_WRITE; + } + args.in.h.opcode = FUSE_SETUPMAPPING; + args.in.h.nodeid = fi->nodeid; + args.in.numargs = 1; + args.in.args[0].size = sizeof(inarg); + args.in.args[0].value = &inarg; + err = fuse_simple_request(fc, &args); + if (err < 0) { + printk(KERN_ERR "%s request failed at mem_offset=0x%llx %zd\n", + __func__, dmap->window_offset, err); + return err; + } + + pr_debug("fuse_setup_one_mapping() succeeded. offset=0x%llx err=%zd\n", offset, err); + + /* TODO: What locking is required here. For now, using fc->lock */ + dmap->start = offset; + dmap->end = offset + FUSE_DAX_MEM_RANGE_SZ - 1; + /* Protected by fi->i_dmap_sem */ + fuse_dax_interval_tree_insert(dmap, &fi->dmap_tree); + fi->nr_dmaps++; + return 0; +} + +static int fuse_removemapping_one(struct inode *inode, + struct fuse_dax_mapping *dmap) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_removemapping_in inarg; + FUSE_ARGS(args); + + memset(&inarg, 0, sizeof(inarg)); + inarg.moffset = dmap->window_offset; + inarg.len = dmap->length; + args.in.h.opcode = FUSE_REMOVEMAPPING; + args.in.h.nodeid = fi->nodeid; + args.in.numargs = 1; + args.in.args[0].size = sizeof(inarg); + args.in.args[0].value = &inarg; + return fuse_simple_request(fc, &args); +} + +/* + * It is called from evict_inode() and by that time inode is going away. So + * this function does not take any locks like fi->i_dmap_sem for traversing + * that fuse inode interval tree. If that lock is taken then lock validator + * complains of deadlock situation w.r.t fs_reclaim lock. + */ +void fuse_removemapping(struct inode *inode) +{ + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_inode *fi = get_fuse_inode(inode); + ssize_t err; + struct fuse_dax_mapping *dmap; + + /* Clear the mappings list */ + while (true) { + WARN_ON(fi->nr_dmaps < 0); + + dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, 0, + -1); + if (dmap) { + fuse_dax_interval_tree_remove(dmap, &fi->dmap_tree); + fi->nr_dmaps--; + } + + if (!dmap) + break; + + /* + * During umount/shutdown, fuse connection is dropped first + * and later evict_inode() is called later. That means any + * removemapping messages are going to fail. Send messages + * only if connection is up. Otherwise fuse daemon is + * responsible for cleaning up any leftover references and + * mappings. + */ + if (fc->connected) { + err = fuse_removemapping_one(inode, dmap); + if (err) { + pr_warn("Failed to removemapping. offset=0x%llx" + " len=0x%llx\n", dmap->window_offset, + dmap->length); + } + } + + /* Add it back to free ranges list */ + free_dax_mapping(fc, dmap); + } +} + void fuse_finish_open(struct inode *inode, struct file *file) { struct fuse_file *ff = file->private_data; @@ -1476,32 +1649,290 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from) return res; } +static ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to); static ssize_t fuse_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct file *file = iocb->ki_filp; struct fuse_file *ff = file->private_data; + struct inode *inode = file->f_mapping->host; if (is_bad_inode(file_inode(file))) return -EIO; - if (!(ff->open_flags & FOPEN_DIRECT_IO)) - return fuse_cache_read_iter(iocb, to); - else + if (IS_DAX(inode)) + return fuse_dax_read_iter(iocb, to); + + if (ff->open_flags & FOPEN_DIRECT_IO) return fuse_direct_read_iter(iocb, to); + + return fuse_cache_read_iter(iocb, to); } +static ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from); static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; struct fuse_file *ff = file->private_data; + struct inode *inode = file->f_mapping->host; if (is_bad_inode(file_inode(file))) return -EIO; - if (!(ff->open_flags & FOPEN_DIRECT_IO)) - return fuse_cache_write_iter(iocb, from); - else + if (IS_DAX(inode)) + return fuse_dax_write_iter(iocb, from); + + if (ff->open_flags & FOPEN_DIRECT_IO) return fuse_direct_write_iter(iocb, from); + + return fuse_cache_write_iter(iocb, from); +} + +static void fuse_fill_iomap_hole(struct iomap *iomap, loff_t length) +{ + iomap->addr = IOMAP_NULL_ADDR; + iomap->length = length; + iomap->type = IOMAP_HOLE; +} + +static void fuse_fill_iomap(struct inode *inode, loff_t pos, loff_t length, + struct iomap *iomap, struct fuse_dax_mapping *dmap, + unsigned flags) +{ + loff_t offset, len; + loff_t i_size = i_size_read(inode); + + offset = pos - dmap->start; + len = min(length, dmap->length - offset); + + /* If length is beyond end of file, truncate further */ + if (pos + len > i_size) + len = i_size - pos; + + if (len > 0) { + iomap->addr = dmap->window_offset + offset; + iomap->length = len; + if (flags & IOMAP_FAULT) + iomap->length = ALIGN(len, PAGE_SIZE); + iomap->type = IOMAP_MAPPED; + pr_debug("%s: returns iomap: addr 0x%llx offset 0x%llx" + " length 0x%llx\n", __func__, iomap->addr, + iomap->offset, iomap->length); + } else { + /* Mapping beyond end of file is hole */ + fuse_fill_iomap_hole(iomap, length); + pr_debug("%s: returns iomap: addr 0x%llx offset 0x%llx" + "length 0x%llx\n", __func__, iomap->addr, + iomap->offset, iomap->length); + } +} + +/* This is just for DAX and the mapping is ephemeral, do not use it for other + * purposes since there is no block device with a permanent mapping. + */ +static int fuse_iomap_begin(struct inode *inode, loff_t pos, loff_t length, + unsigned flags, struct iomap *iomap) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_conn *fc = get_fuse_conn(inode); + struct fuse_dax_mapping *dmap, *alloc_dmap = NULL; + int ret; + + /* We don't support FIEMAP */ + BUG_ON(flags & IOMAP_REPORT); + + pr_debug("fuse_iomap_begin() called. pos=0x%llx length=0x%llx\n", + pos, length); + + /* + * Writes beyond end of file are not handled using dax path. Instead + * a fuse write message is sent to daemon + */ + if (flags & IOMAP_WRITE && pos >= i_size_read(inode)) + return -EIO; + + iomap->offset = pos; + iomap->flags = 0; + iomap->bdev = NULL; + iomap->dax_dev = fc->dax_dev; + + /* + * Both read/write and mmap path can race here. So we need something + * to make sure if we are setting up mapping, then other path waits + * + * For now, use a semaphore for this. It probably needs to be + * optimized later. + */ + down_read(&fi->i_dmap_sem); + dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, pos, pos); + + if (dmap) { + fuse_fill_iomap(inode, pos, length, iomap, dmap, flags); + up_read(&fi->i_dmap_sem); + return 0; + } else { + up_read(&fi->i_dmap_sem); + pr_debug("%s: no mapping at offset 0x%llx length 0x%llx\n", + __func__, pos, length); + if (pos >= i_size_read(inode)) + goto iomap_hole; + + alloc_dmap = alloc_dax_mapping(fc); + if (!alloc_dmap) + return -EBUSY; + + /* + * Drop read lock and take write lock so that only one + * caller can try to setup mapping and other waits + */ + down_write(&fi->i_dmap_sem); + /* + * We dropped lock. Check again if somebody else setup + * mapping already. + */ + dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, pos, + pos); + if (dmap) { + fuse_fill_iomap(inode, pos, length, iomap, dmap, flags); + free_dax_mapping(fc, alloc_dmap); + up_write(&fi->i_dmap_sem); + return 0; + } + + /* Setup one mapping */ + ret = fuse_setup_one_mapping(inode, NULL, + ALIGN_DOWN(pos, FUSE_DAX_MEM_RANGE_SZ), + alloc_dmap); + if (ret < 0) { + printk("fuse_setup_one_mapping() failed. err=%d" + " pos=0x%llx\n", ret, pos); + free_dax_mapping(fc, alloc_dmap); + up_write(&fi->i_dmap_sem); + return ret; + } + fuse_fill_iomap(inode, pos, length, iomap, alloc_dmap, flags); + up_write(&fi->i_dmap_sem); + return 0; + } + + /* + * If read beyond end of file happnes, fs code seems to return + * it as hole + */ +iomap_hole: + fuse_fill_iomap_hole(iomap, length); + pr_debug("fuse_iomap_begin() returning hole mapping. pos=0x%llx length_asked=0x%llx length_returned=0x%llx\n", pos, length, iomap->length); + return 0; +} + +static int fuse_iomap_end(struct inode *inode, loff_t pos, loff_t length, + ssize_t written, unsigned flags, + struct iomap *iomap) +{ + /* DAX writes beyond end-of-file aren't handled using iomap, so the + * file size is unchanged and there is nothing to do here. + */ + return 0; +} + +static const struct iomap_ops fuse_iomap_ops = { + .iomap_begin = fuse_iomap_begin, + .iomap_end = fuse_iomap_end, +}; + +static ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + struct inode *inode = file_inode(iocb->ki_filp); + ssize_t ret; + + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!inode_trylock_shared(inode)) + return -EAGAIN; + } else { + inode_lock_shared(inode); + } + + ret = dax_iomap_rw(iocb, to, &fuse_iomap_ops); + inode_unlock_shared(inode); + + /* TODO file_accessed(iocb->f_filp) */ + + return ret; +} + +static bool file_extending_write(struct kiocb *iocb, struct iov_iter *from) +{ + struct inode *inode = file_inode(iocb->ki_filp); + + return (iov_iter_rw(from) == WRITE && + ((iocb->ki_pos) >= i_size_read(inode))); +} + +static ssize_t fuse_dax_direct_write(struct kiocb *iocb, struct iov_iter *from) +{ + struct inode *inode = file_inode(iocb->ki_filp); + struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb); + ssize_t ret; + + ret = fuse_direct_io(&io, from, &iocb->ki_pos, FUSE_DIO_WRITE); + if (ret < 0) + return ret; + + fuse_invalidate_attr(inode); + fuse_write_update_size(inode, iocb->ki_pos); + return ret; +} + +static ssize_t fuse_dax_write_iter(struct kiocb *iocb, struct iov_iter *from) +{ + struct inode *inode = file_inode(iocb->ki_filp); + ssize_t ret, count; + + if (iocb->ki_flags & IOCB_NOWAIT) { + if (!inode_trylock(inode)) + return -EAGAIN; + } else { + inode_lock(inode); + } + + ret = generic_write_checks(iocb, from); + if (ret <= 0) + goto out; + + ret = file_remove_privs(iocb->ki_filp); + if (ret) + goto out; + /* TODO file_update_time() but we don't want metadata I/O */ + + /* Do not use dax for file extending writes as its an mmap and + * trying to write beyong end of existing page will generate + * SIGBUS. + */ + if (file_extending_write(iocb, from)) { + ret = fuse_dax_direct_write(iocb, from); + goto out; + } + + ret = dax_iomap_rw(iocb, from, &fuse_iomap_ops); + if (ret < 0) + goto out; + + /* + * If part of the write was file extending, fuse dax path will not + * take care of that. Do direct write instead. + */ + if (iov_iter_count(from) && file_extending_write(iocb, from)) { + count = fuse_dax_direct_write(iocb, from); + if (count < 0) + goto out; + ret += count; + } + +out: + inode_unlock(inode); + + if (ret > 0) + ret = generic_write_sync(iocb, ret); + return ret; } static void fuse_writepage_free(struct fuse_conn *fc, struct fuse_req *req) @@ -2180,6 +2611,11 @@ static ssize_t fuse_file_splice_read(struct file *in, loff_t *ppos, } +static int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma) +{ + return -EINVAL; /* TODO */ +} + static int convert_fuse_file_lock(struct fuse_conn *fc, const struct fuse_file_lock *ffl, struct file_lock *fl) @@ -3212,6 +3648,7 @@ static const struct address_space_operations fuse_file_aops = { void fuse_init_file_inode(struct inode *inode) { struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_conn *fc = get_fuse_conn(inode); inode->i_fop = &fuse_file_operations; inode->i_data.a_ops = &fuse_file_aops; @@ -3221,4 +3658,9 @@ void fuse_init_file_inode(struct inode *inode) fi->writectr = 0; init_waitqueue_head(&fi->page_waitq); INIT_LIST_HEAD(&fi->writepages); + fi->dmap_tree = RB_ROOT_CACHED; + + if (fc->dax_dev) { + inode->i_flags |= S_DAX; + } } diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index 5439e4628362..f1ae549eff98 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -98,11 +98,22 @@ struct fuse_forget_link { struct fuse_forget_link *next; }; +#define START(node) ((node)->start) +#define LAST(node) ((node)->end) + /** Translation information for file offsets to DAX window offsets */ struct fuse_dax_mapping { /* Will connect in fc->free_ranges to keep track of free memory */ struct list_head list; + /* For interval tree in file/inode */ + struct rb_node rb; + /** Start Position in file */ + __u64 start; + /** End Position in file */ + __u64 end; + __u64 __subtree_last; + /** Position in DAX window */ u64 window_offset; @@ -195,6 +206,15 @@ struct fuse_inode { /** Lock to protect write related fields */ spinlock_t lock; + + /* + * Semaphore to protect modifications to dmap_tree + */ + struct rw_semaphore i_dmap_sem; + + /** Sorted rb tree of struct fuse_dax_mapping elements */ + struct rb_root_cached dmap_tree; + unsigned long nr_dmaps; }; /** FUSE inode state bits */ @@ -1226,5 +1246,6 @@ unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args); * Get the next unique ID for a request */ u64 fuse_get_unique(struct fuse_iqueue *fiq); +void fuse_removemapping(struct inode *inode); #endif /* _FS_FUSE_I_H */ diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 8a3dd72f9843..ad66a353554b 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -83,7 +83,9 @@ static struct inode *fuse_alloc_inode(struct super_block *sb) fi->attr_version = 0; fi->orig_ino = 0; fi->state = 0; + fi->nr_dmaps = 0; mutex_init(&fi->mutex); + init_rwsem(&fi->i_dmap_sem); spin_lock_init(&fi->lock); fi->forget = fuse_alloc_forget(); if (!fi->forget) { @@ -119,6 +121,10 @@ static void fuse_evict_inode(struct inode *inode) if (inode->i_sb->s_flags & SB_ACTIVE) { struct fuse_conn *fc = get_fuse_conn(inode); struct fuse_inode *fi = get_fuse_inode(inode); + if (IS_DAX(inode)) { + fuse_removemapping(inode); + WARN_ON(fi->nr_dmaps); + } fuse_queue_forget(fc, fi->forget, fi->nodeid, fi->nlookup); fi->forget = NULL; } diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h index 9eb313220549..5042e227e8a8 100644 --- a/include/uapi/linux/fuse.h +++ b/include/uapi/linux/fuse.h @@ -826,6 +826,7 @@ struct fuse_copy_file_range_in { #define FUSE_SETUPMAPPING_ENTRIES 8 #define FUSE_SETUPMAPPING_FLAG_WRITE (1ull << 0) +#define FUSE_SETUPMAPPING_FLAG_READ (1ull << 1) struct fuse_setupmapping_in { /* An already open handle */ uint64_t fh; From patchwork Wed May 15 19:27:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945203 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0749792A for ; Wed, 15 May 2019 19:30:04 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id EAA7D28541 for ; Wed, 15 May 2019 19:30:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id DF04328560; Wed, 15 May 2019 19:30:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8AACC28541 for ; Wed, 15 May 2019 19:30:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728246AbfEOT3j (ORCPT ); Wed, 15 May 2019 15:29:39 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58882 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727586AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E0A353001A96; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id BCF8B1001E84; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id EFC0522548A; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 22/30] fuse, dax: add DAX mmap support Date: Wed, 15 May 2019 15:27:07 -0400 Message-Id: <20190515192715.18000-23-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.45]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Stefan Hajnoczi Add DAX mmap() support. Signed-off-by: Stefan Hajnoczi --- fs/fuse/file.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index edbb11ca735e..a053bcb9498d 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2576,10 +2576,15 @@ static const struct vm_operations_struct fuse_file_vm_ops = { .page_mkwrite = fuse_page_mkwrite, }; +static int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma); static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma) { struct fuse_file *ff = file->private_data; + /* DAX mmap is superior to direct_io mmap */ + if (IS_DAX(file_inode(file))) + return fuse_dax_mmap(file, vma); + if (ff->open_flags & FOPEN_DIRECT_IO) { /* Can't provide the coherency needed for MAP_SHARED */ if (vma->vm_flags & VM_MAYSHARE) @@ -2611,9 +2616,65 @@ static ssize_t fuse_file_splice_read(struct file *in, loff_t *ppos, } +static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size, + bool write) +{ + vm_fault_t ret; + struct inode *inode = file_inode(vmf->vma->vm_file); + struct super_block *sb = inode->i_sb; + pfn_t pfn; + + if (write) + sb_start_pagefault(sb); + + /* TODO inode semaphore to protect faults vs truncate */ + + ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &fuse_iomap_ops); + + if (ret & VM_FAULT_NEEDDSYNC) + ret = dax_finish_sync_fault(vmf, pe_size, pfn); + + if (write) + sb_end_pagefault(sb); + + return ret; +} + +static vm_fault_t fuse_dax_fault(struct vm_fault *vmf) +{ + return __fuse_dax_fault(vmf, PE_SIZE_PTE, + vmf->flags & FAULT_FLAG_WRITE); +} + +static vm_fault_t fuse_dax_huge_fault(struct vm_fault *vmf, + enum page_entry_size pe_size) +{ + return __fuse_dax_fault(vmf, pe_size, vmf->flags & FAULT_FLAG_WRITE); +} + +static vm_fault_t fuse_dax_page_mkwrite(struct vm_fault *vmf) +{ + return __fuse_dax_fault(vmf, PE_SIZE_PTE, true); +} + +static vm_fault_t fuse_dax_pfn_mkwrite(struct vm_fault *vmf) +{ + return __fuse_dax_fault(vmf, PE_SIZE_PTE, true); +} + +static const struct vm_operations_struct fuse_dax_vm_ops = { + .fault = fuse_dax_fault, + .huge_fault = fuse_dax_huge_fault, + .page_mkwrite = fuse_dax_page_mkwrite, + .pfn_mkwrite = fuse_dax_pfn_mkwrite, +}; + static int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma) { - return -EINVAL; /* TODO */ + file_accessed(file); + vma->vm_ops = &fuse_dax_vm_ops; + vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE; + return 0; } static int convert_fuse_file_lock(struct fuse_conn *fc, @@ -3622,6 +3683,7 @@ static const struct file_operations fuse_file_operations = { .release = fuse_release, .fsync = fuse_fsync, .lock = fuse_file_lock, + .get_unmapped_area = thp_get_unmapped_area, .flock = fuse_file_flock, .splice_read = fuse_file_splice_read, .splice_write = iter_file_splice_write, From patchwork Wed May 15 19:27:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945225 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8B636933 for ; Wed, 15 May 2019 19:30:24 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7A6F92862D for ; Wed, 15 May 2019 19:30:24 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6B48B28609; Wed, 15 May 2019 19:30:24 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1ACFF28609 for ; Wed, 15 May 2019 19:30:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727671AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53894 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727578AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D84653004412; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id B45E51001DEF; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 00FA422548B; Wed, 15 May 2019 15:27:29 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 23/30] fuse: Define dax address space operations Date: Wed, 15 May 2019 15:27:08 -0400 Message-Id: <20190515192715.18000-24-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.44]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This is done along the lines of ext4 and xfs. I primarily wanted ->writepages hook at this time so that I could call into dax_writeback_mapping_range(). This in turn will decide which pfns need to be written back. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index a053bcb9498d..2777355bc245 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2402,6 +2402,17 @@ static int fuse_writepages_fill(struct page *page, return err; } +static int fuse_dax_writepages(struct address_space *mapping, + struct writeback_control *wbc) +{ + + struct inode *inode = mapping->host; + struct fuse_conn *fc = get_fuse_conn(inode); + + return dax_writeback_mapping_range(mapping, + NULL, fc->dax_dev, wbc); +} + static int fuse_writepages(struct address_space *mapping, struct writeback_control *wbc) { @@ -3707,6 +3718,13 @@ static const struct address_space_operations fuse_file_aops = { .write_end = fuse_write_end, }; +static const struct address_space_operations fuse_dax_file_aops = { + .writepages = fuse_dax_writepages, + .direct_IO = noop_direct_IO, + .set_page_dirty = noop_set_page_dirty, + .invalidatepage = noop_invalidatepage, +}; + void fuse_init_file_inode(struct inode *inode) { struct fuse_inode *fi = get_fuse_inode(inode); @@ -3724,5 +3742,6 @@ void fuse_init_file_inode(struct inode *inode) if (fc->dax_dev) { inode->i_flags |= S_DAX; + inode->i_data.a_ops = &fuse_dax_file_aops; } } From patchwork Wed May 15 19:27:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945211 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C72AA933 for ; Wed, 15 May 2019 19:30:12 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B759028555 for ; Wed, 15 May 2019 19:30:12 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id ABCB8285ED; Wed, 15 May 2019 19:30:12 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2FFE128555 for ; Wed, 15 May 2019 19:30:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728202AbfEOT3i (ORCPT ); Wed, 15 May 2019 15:29:38 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35010 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727591AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E1BDAC09AD12; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id BCF9760C2A; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 0694C22548C; Wed, 15 May 2019 15:27:30 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 24/30] fuse, dax: Take ->i_mmap_sem lock during dax page fault Date: Wed, 15 May 2019 15:27:09 -0400 Message-Id: <20190515192715.18000-25-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Wed, 15 May 2019 19:27:33 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We need some kind of locking mechanism here. Normal file systems like ext4 and xfs seems to take their own semaphore to protect agains truncate while fault is going on. We have additional requirement to protect against fuse dax memory range reclaim. When a range has been selected for reclaim, we need to make sure no other read/write/fault can try to access that memory range while reclaim is in progress. Once reclaim is complete, lock will be released and read/write/fault will trigger allocation of fresh dax range. Taking inode_lock() is not an option in fault path as lockdep complains about circular dependencies. So define a new fuse_inode->i_mmap_sem. Signed-off-by: Vivek Goyal --- fs/fuse/dir.c | 2 ++ fs/fuse/file.c | 17 +++++++++++++---- fs/fuse/fuse_i.h | 7 +++++++ fs/fuse/inode.c | 1 + 4 files changed, 23 insertions(+), 4 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index fd8636e67ae9..84c0b638affb 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1559,8 +1559,10 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, */ if ((is_truncate || !is_wb) && S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) { + down_write(&fi->i_mmap_sem); truncate_pagecache(inode, outarg.attr.size); invalidate_inode_pages2(inode->i_mapping); + up_write(&fi->i_mmap_sem); } clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 2777355bc245..e536a04aaa06 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2638,13 +2638,20 @@ static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size, if (write) sb_start_pagefault(sb); - /* TODO inode semaphore to protect faults vs truncate */ - + /* + * We need to serialize against not only truncate but also against + * fuse dax memory range reclaim. While a range is being reclaimed, + * we do not want any read/write/mmap to make progress and try + * to populate page cache or access memory we are trying to free. + */ + down_read(&get_fuse_inode(inode)->i_mmap_sem); ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &fuse_iomap_ops); if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); + up_read(&get_fuse_inode(inode)->i_mmap_sem); + if (write) sb_end_pagefault(sb); @@ -3593,9 +3600,11 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, file_update_time(file); } - if (mode & FALLOC_FL_PUNCH_HOLE) + if (mode & FALLOC_FL_PUNCH_HOLE) { + down_write(&fi->i_mmap_sem); truncate_pagecache_range(inode, offset, offset + length - 1); - + up_write(&fi->i_mmap_sem); + } fuse_invalidate_attr(inode); out: diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index f1ae549eff98..a234cf30538d 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -212,6 +212,13 @@ struct fuse_inode { */ struct rw_semaphore i_dmap_sem; + /** + * Can't take inode lock in fault path (leads to circular dependency). + * So take this in fuse dax fault path to make sure truncate and + * punch hole etc. can't make progress in parallel. + */ + struct rw_semaphore i_mmap_sem; + /** Sorted rb tree of struct fuse_dax_mapping elements */ struct rb_root_cached dmap_tree; unsigned long nr_dmaps; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index ad66a353554b..713c5f32ab35 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -85,6 +85,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb) fi->state = 0; fi->nr_dmaps = 0; mutex_init(&fi->mutex); + init_rwsem(&fi->i_mmap_sem); init_rwsem(&fi->i_dmap_sem); spin_lock_init(&fi->lock); fi->forget = fuse_alloc_forget(); From patchwork Wed May 15 19:27:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945177 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 591F1933 for ; Wed, 15 May 2019 19:28:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4887F2842D for ; Wed, 15 May 2019 19:28:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3CF6628541; Wed, 15 May 2019 19:28:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DFB9E2842D for ; Wed, 15 May 2019 19:28:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728039AbfEOT2s (ORCPT ); Wed, 15 May 2019 15:28:48 -0400 Received: from mx1.redhat.com ([209.132.183.28]:36196 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727628AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 168B130C1AC2; Wed, 15 May 2019 19:27:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id E5DF65D9CC; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 0C9CF22548D; Wed, 15 May 2019 15:27:30 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 25/30] fuse: Maintain a list of busy elements Date: Wed, 15 May 2019 15:27:10 -0400 Message-Id: <20190515192715.18000-26-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Wed, 15 May 2019 19:27:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP This list will be used selecting fuse_dax_mapping to free when number of free mappings drops below a threshold. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 8 ++++++++ fs/fuse/fuse_i.h | 7 +++++++ fs/fuse/inode.c | 4 ++++ 3 files changed, 19 insertions(+) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index e536a04aaa06..3f0f7a387341 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -273,6 +273,10 @@ static int fuse_setup_one_mapping(struct inode *inode, /* Protected by fi->i_dmap_sem */ fuse_dax_interval_tree_insert(dmap, &fi->dmap_tree); fi->nr_dmaps++; + spin_lock(&fc->lock); + list_add_tail(&dmap->busy_list, &fc->busy_ranges); + fc->nr_busy_ranges++; + spin_unlock(&fc->lock); return 0; } @@ -317,6 +321,10 @@ void fuse_removemapping(struct inode *inode) if (dmap) { fuse_dax_interval_tree_remove(dmap, &fi->dmap_tree); fi->nr_dmaps--; + spin_lock(&fc->lock); + list_del_init(&dmap->busy_list); + fc->nr_busy_ranges--; + spin_unlock(&fc->lock); } if (!dmap) diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index a234cf30538d..c93e9155b723 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -114,6 +114,9 @@ struct fuse_dax_mapping { __u64 end; __u64 __subtree_last; + /* Will connect in fc->busy_ranges to keep track busy memory */ + struct list_head busy_list; + /** Position in DAX window */ u64 window_offset; @@ -873,6 +876,10 @@ struct fuse_conn { /** DAX device, non-NULL if DAX is supported */ struct dax_device *dax_dev; + /* List of memory ranges which are busy */ + unsigned long nr_busy_ranges; + struct list_head busy_ranges; + /* * DAX Window Free Ranges. TODO: This might not be best place to store * this free list diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 713c5f32ab35..f57f7ce02acc 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -626,6 +626,8 @@ static void fuse_free_dax_mem_ranges(struct list_head *mem_list) /* Free All allocated elements */ list_for_each_entry_safe(range, temp, mem_list, list) { list_del(&range->list); + if (!list_empty(&range->busy_list)) + list_del(&range->busy_list); kfree(range); } } @@ -670,6 +672,7 @@ static int fuse_dax_mem_range_init(struct fuse_conn *fc, */ range->window_offset = i * FUSE_DAX_MEM_RANGE_SZ; range->length = FUSE_DAX_MEM_RANGE_SZ; + INIT_LIST_HEAD(&range->busy_list); list_add_tail(&range->list, &mem_ranges); } @@ -720,6 +723,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, fc->user_ns = get_user_ns(user_ns); fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ; INIT_LIST_HEAD(&fc->free_ranges); + INIT_LIST_HEAD(&fc->busy_ranges); } EXPORT_SYMBOL_GPL(fuse_conn_init); From patchwork Wed May 15 19:27:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945171 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C69D1933 for ; Wed, 15 May 2019 19:28:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B4A9B2842D for ; Wed, 15 May 2019 19:28:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A864728541; Wed, 15 May 2019 19:28:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6BBE12842D for ; Wed, 15 May 2019 19:28:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727931AbfEOT2Y (ORCPT ); Wed, 15 May 2019 15:28:24 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41198 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727658AbfEOT1f (ORCPT ); Wed, 15 May 2019 15:27:35 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 49C078110C; Wed, 15 May 2019 19:27:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id EEBC71972D; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 173E022548E; Wed, 15 May 2019 15:27:30 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 26/30] fuse: Add logic to free up a memory range Date: Wed, 15 May 2019 15:27:11 -0400 Message-Id: <20190515192715.18000-27-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 15 May 2019 19:27:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add logic to free up a busy memory range. Freed memory range will be returned to free pool. Add a worker which can be started to select and free some busy memory ranges. In certain cases (write path), process can steal one of its busy dax ranges if free range is not available. If free range is not available and nothing can't be stolen from same inode, caller waits on a waitq for free range to become available. For reclaiming a range, as of now we need to hold following locks in specified order. inode_trylock(inode); down_write(&fi->i_mmap_sem); down_write(&fi->i_dmap_sem); This means, one can not wait for a range to become free when in fault path because it can lead to deadlock in following two situations. - Worker thread to free memory might block on fuse_inode->i_mmap_sem as well. - This inode is holding all the memory and more memory can't be freed. In both the cases, deadlock will ensue. So return -ENOSPC from iomap_begin() in fault path if memory can't be allocated. Drop fuse_inode->i_mmap_sem, and wait for a free range to become available and retry. read path can't do direct reclaim as well because it holds shared inode lock while reclaim assumes that inode lock is held exclusively. Due to shared lock, it might happen that one reader is still reading from range and another reader reclaims that range leading to problems. So read path also returns -ENOSPC and higher layers retry (like fault path). a different story. We hold inode lock and lock ordering allows to grab fuse_inode->immap_sem, if needed. That means we can do direct reclaim in that path. But if there is no memory allocated to this inode, then direct reclaim will not work and we need to wait for a memory range to become free. So try following order. A. Try to get a free range. B. If not, try direct reclaim. C. If not, wait for a memory range to become free Here sleeping with locks held should be fine because in step B, we made sure this inode is not holding any ranges. That means other inodes are holding ranges and somebody should be able to free memory. Also, worker thread does a trylock() on inode lock. That means worker tread will not wait on this inode and move onto next memory range. Hence above sequence should be deadlock free. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 357 +++++++++++++++++++++++++++++++++++++++++++++-- fs/fuse/fuse_i.h | 22 +++ fs/fuse/inode.c | 4 + 3 files changed, 374 insertions(+), 9 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 3f0f7a387341..87fc2b5e0a3a 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -25,6 +25,8 @@ INTERVAL_TREE_DEFINE(struct fuse_dax_mapping, rb, __u64, __subtree_last, START, LAST, static inline, fuse_dax_interval_tree); +static struct fuse_dax_mapping *alloc_dax_mapping_reclaim(struct fuse_conn *fc, + struct inode *inode); static int fuse_send_open(struct fuse_conn *fc, u64 nodeid, struct file *file, int opcode, struct fuse_open_out *outargp) { @@ -179,6 +181,7 @@ static void fuse_link_write_file(struct file *file) static struct fuse_dax_mapping *alloc_dax_mapping(struct fuse_conn *fc) { + unsigned long free_threshold; struct fuse_dax_mapping *dmap = NULL; spin_lock(&fc->lock); @@ -186,7 +189,7 @@ static struct fuse_dax_mapping *alloc_dax_mapping(struct fuse_conn *fc) /* TODO: Add logic to try to free up memory if wait is allowed */ if (fc->nr_free_ranges <= 0) { spin_unlock(&fc->lock); - return NULL; + goto out_kick; } WARN_ON(list_empty(&fc->free_ranges)); @@ -197,15 +200,43 @@ static struct fuse_dax_mapping *alloc_dax_mapping(struct fuse_conn *fc) list_del_init(&dmap->list); fc->nr_free_ranges--; spin_unlock(&fc->lock); + +out_kick: + /* If number of free ranges are below threshold, start reclaim */ + free_threshold = max((fc->nr_ranges * FUSE_DAX_RECLAIM_THRESHOLD)/100, + (unsigned long)1); + if (fc->nr_free_ranges < free_threshold) { + pr_debug("fuse: Kicking dax memory reclaim worker. nr_free_ranges=0x%ld nr_total_ranges=%ld\n", fc->nr_free_ranges, fc->nr_ranges); + queue_delayed_work(system_long_wq, &fc->dax_free_work, 0); + } return dmap; } +/* This assumes fc->lock is held */ +static void __dmap_remove_busy_list(struct fuse_conn *fc, + struct fuse_dax_mapping *dmap) +{ + list_del_init(&dmap->busy_list); + WARN_ON(fc->nr_busy_ranges == 0); + fc->nr_busy_ranges--; +} + +static void dmap_remove_busy_list(struct fuse_conn *fc, + struct fuse_dax_mapping *dmap) +{ + spin_lock(&fc->lock); + __dmap_remove_busy_list(fc, dmap); + spin_unlock(&fc->lock); +} + /* This assumes fc->lock is held */ static void __free_dax_mapping(struct fuse_conn *fc, struct fuse_dax_mapping *dmap) { list_add_tail(&dmap->list, &fc->free_ranges); fc->nr_free_ranges++; + /* TODO: Wake up only when needed */ + wake_up(&fc->dax_range_waitq); } static void free_dax_mapping(struct fuse_conn *fc, @@ -267,7 +298,15 @@ static int fuse_setup_one_mapping(struct inode *inode, pr_debug("fuse_setup_one_mapping() succeeded. offset=0x%llx err=%zd\n", offset, err); - /* TODO: What locking is required here. For now, using fc->lock */ + /* + * We don't take a refernce on inode. inode is valid right now and + * when inode is going away, cleanup logic should first cleanup + * dmap entries. + * + * TODO: Do we need to ensure that we are holding inode lock + * as well. + */ + dmap->inode = inode; dmap->start = offset; dmap->end = offset + FUSE_DAX_MEM_RANGE_SZ - 1; /* Protected by fi->i_dmap_sem */ @@ -321,10 +360,7 @@ void fuse_removemapping(struct inode *inode) if (dmap) { fuse_dax_interval_tree_remove(dmap, &fi->dmap_tree); fi->nr_dmaps--; - spin_lock(&fc->lock); - list_del_init(&dmap->busy_list); - fc->nr_busy_ranges--; - spin_unlock(&fc->lock); + dmap_remove_busy_list(fc, dmap); } if (!dmap) @@ -347,6 +383,8 @@ void fuse_removemapping(struct inode *inode) } } + dmap->inode = NULL; + /* Add it back to free ranges list */ free_dax_mapping(fc, dmap); } @@ -1784,8 +1822,23 @@ static int fuse_iomap_begin(struct inode *inode, loff_t pos, loff_t length, if (pos >= i_size_read(inode)) goto iomap_hole; - alloc_dmap = alloc_dax_mapping(fc); - if (!alloc_dmap) + /* Can't do reclaim in fault path yet due to lock ordering. + * Read path takes shared inode lock and that's not sufficient + * for inline range reclaim. Caller needs to drop lock, wait + * and retry. + */ + if (flags & IOMAP_FAULT || !(flags & IOMAP_WRITE)) { + alloc_dmap = alloc_dax_mapping(fc); + if (!alloc_dmap) + return -ENOSPC; + } else { + alloc_dmap = alloc_dax_mapping_reclaim(fc, inode); + if (IS_ERR(alloc_dmap)) + return PTR_ERR(alloc_dmap); + } + + /* If we are here, we should have memory allocated */ + if (WARN_ON(!alloc_dmap)) return -EBUSY; /* @@ -1850,7 +1903,18 @@ static const struct iomap_ops fuse_iomap_ops = { static ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) { struct inode *inode = file_inode(iocb->ki_filp); + struct fuse_conn *fc = get_fuse_conn(inode); ssize_t ret; + bool retry = false; + +retry: + if (retry && !(fc->nr_free_ranges > 0)) { + ret = -EINTR; + if (wait_event_killable_exclusive(fc->dax_range_waitq, + (fc->nr_free_ranges > 0))) { + goto out; + } + } if (iocb->ki_flags & IOCB_NOWAIT) { if (!inode_trylock_shared(inode)) @@ -1862,8 +1926,19 @@ static ssize_t fuse_dax_read_iter(struct kiocb *iocb, struct iov_iter *to) ret = dax_iomap_rw(iocb, to, &fuse_iomap_ops); inode_unlock_shared(inode); + /* If a dax range could not be allocated and it can't be reclaimed + * inline, then drop inode lock and retry. Range reclaim logic + * requires exclusive access to inode lock. + * + * TODO: What if -ENOSPC needs to be returned to user space. Fix it. + */ + if (ret == -ENOSPC) { + retry = true; + goto retry; + } /* TODO file_accessed(iocb->f_filp) */ +out: return ret; } @@ -2642,10 +2717,21 @@ static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size, struct inode *inode = file_inode(vmf->vma->vm_file); struct super_block *sb = inode->i_sb; pfn_t pfn; + int error = 0; + struct fuse_conn *fc = get_fuse_conn(inode); + bool retry = false; if (write) sb_start_pagefault(sb); +retry: + if (retry && !(fc->nr_free_ranges > 0)) { + ret = -EINTR; + if (wait_event_killable_exclusive(fc->dax_range_waitq, + (fc->nr_free_ranges > 0))) + goto out; + } + /* * We need to serialize against not only truncate but also against * fuse dax memory range reclaim. While a range is being reclaimed, @@ -2653,13 +2739,20 @@ static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size, * to populate page cache or access memory we are trying to free. */ down_read(&get_fuse_inode(inode)->i_mmap_sem); - ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &fuse_iomap_ops); + ret = dax_iomap_fault(vmf, pe_size, &pfn, &error, &fuse_iomap_ops); + if ((ret & VM_FAULT_ERROR) && error == -ENOSPC) { + error = 0; + retry = true; + up_read(&get_fuse_inode(inode)->i_mmap_sem); + goto retry; + } if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); up_read(&get_fuse_inode(inode)->i_mmap_sem); +out: if (write) sb_end_pagefault(sb); @@ -3762,3 +3855,249 @@ void fuse_init_file_inode(struct inode *inode) inode->i_data.a_ops = &fuse_dax_file_aops; } } + +int fuse_dax_reclaim_dmap_locked(struct fuse_conn *fc, struct inode *inode, + struct fuse_dax_mapping *dmap) +{ + int ret; + struct fuse_inode *fi = get_fuse_inode(inode); + + ret = filemap_fdatawrite_range(inode->i_mapping, dmap->start, + dmap->end); + if (ret) { + printk("filemap_fdatawrite_range() failed. err=%d start=0x%llx," + " end=0x%llx\n", ret, dmap->start, dmap->end); + return ret; + } + + ret = invalidate_inode_pages2_range(inode->i_mapping, + dmap->start >> PAGE_SHIFT, + dmap->end >> PAGE_SHIFT); + /* TODO: What to do if above fails? For now, + * leave the range in place. + */ + if (ret) { + printk("invalidate_inode_pages2_range() failed err=%d\n", ret); + return ret; + } + + /* Remove dax mapping from inode interval tree now */ + fuse_dax_interval_tree_remove(dmap, &fi->dmap_tree); + fi->nr_dmaps--; + + ret = fuse_removemapping_one(inode, dmap); + if (ret) { + pr_warn("Failed to remove mapping. offset=0x%llx len=0x%llx\n", + dmap->window_offset, dmap->length); + } + + return 0; +} + +/* First first mapping in the tree and free it. */ +struct fuse_dax_mapping *fuse_dax_reclaim_first_mapping_locked( + struct fuse_conn *fc, struct inode *inode) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_dax_mapping *dmap; + int ret; + + /* Find fuse dax mapping at file offset inode. */ + dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, 0, -1); + if (!dmap) + return NULL; + + ret = fuse_dax_reclaim_dmap_locked(fc, inode, dmap); + if (ret < 0) + return ERR_PTR(ret); + + /* Clean up dmap. Do not add back to free list */ + dmap_remove_busy_list(fc, dmap); + dmap->inode = NULL; + dmap->start = dmap->end = 0; + + pr_debug("fuse: reclaimed memory range window_offset=0x%llx," + " length=0x%llx\n", dmap->window_offset, + dmap->length); + return dmap; +} + +/* + * First first mapping in the tree and free it and return it. Do not add + * it back to free pool. + * + * This is called with inode lock held. + */ +struct fuse_dax_mapping *fuse_dax_reclaim_first_mapping(struct fuse_conn *fc, + struct inode *inode) +{ + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_dax_mapping *dmap; + + down_write(&fi->i_mmap_sem); + down_write(&fi->i_dmap_sem); + dmap = fuse_dax_reclaim_first_mapping_locked(fc, inode); + up_write(&fi->i_dmap_sem); + up_write(&fi->i_mmap_sem); + return dmap; +} + +static struct fuse_dax_mapping *alloc_dax_mapping_reclaim(struct fuse_conn *fc, + struct inode *inode) +{ + struct fuse_dax_mapping *dmap; + struct fuse_inode *fi = get_fuse_inode(inode); + + while(1) { + dmap = alloc_dax_mapping(fc); + if (dmap) + return dmap; + + if (fi->nr_dmaps) + return fuse_dax_reclaim_first_mapping(fc, inode); + /* + * There are no mappings which can be reclaimed. + * Wait for one. + */ + if (!(fc->nr_free_ranges > 0)) { + if (wait_event_killable_exclusive(fc->dax_range_waitq, + (fc->nr_free_ranges > 0))) + return ERR_PTR(-EINTR); + } + } +} + +int fuse_dax_free_one_mapping_locked(struct fuse_conn *fc, struct inode *inode, + u64 dmap_start) +{ + int ret; + struct fuse_inode *fi = get_fuse_inode(inode); + struct fuse_dax_mapping *dmap; + + WARN_ON(!inode_is_locked(inode)); + + /* Find fuse dax mapping at file offset inode. */ + dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, dmap_start, + dmap_start); + + /* Range already got cleaned up by somebody else */ + if (!dmap) + return 0; + + ret = fuse_dax_reclaim_dmap_locked(fc, inode, dmap); + if (ret < 0) + return ret; + + /* Cleanup dmap entry and add back to free list */ + spin_lock(&fc->lock); + __dmap_remove_busy_list(fc, dmap); + dmap->inode = NULL; + dmap->start = dmap->end = 0; + __free_dax_mapping(fc, dmap); + spin_unlock(&fc->lock); + + pr_debug("fuse: freed memory range window_offset=0x%llx," + " length=0x%llx\n", dmap->window_offset, + dmap->length); + return ret; +} + +/* + * Free a range of memory. + * Locking. + * 1. Take inode->i_rwsem to prever further read/write. + * 2. Take fuse_inode->i_mmap_sem to block dax faults. + * 3. Take fuse_inode->i_dmap_sem to protect interval tree. It might not + * be strictly necessary as lock 1 and 2 seem sufficient. + */ +int fuse_dax_free_one_mapping(struct fuse_conn *fc, struct inode *inode, + u64 dmap_start) +{ + int ret; + struct fuse_inode *fi = get_fuse_inode(inode); + + /* + * If process is blocked waiting for memory while holding inode + * lock, we will deadlock. So continue to free next range. + */ + if (!inode_trylock(inode)) + return -EAGAIN; + down_write(&fi->i_mmap_sem); + down_write(&fi->i_dmap_sem); + ret = fuse_dax_free_one_mapping_locked(fc, inode, dmap_start); + up_write(&fi->i_dmap_sem); + up_write(&fi->i_mmap_sem); + inode_unlock(inode); + return ret; +} + +int fuse_dax_free_memory(struct fuse_conn *fc, unsigned long nr_to_free) +{ + struct fuse_dax_mapping *dmap, *pos, *temp; + int ret, nr_freed = 0; + u64 dmap_start = 0, window_offset = 0; + struct inode *inode = NULL; + + /* Pick first busy range and free it for now*/ + while(1) { + if (nr_freed >= nr_to_free) + break; + + dmap = NULL; + spin_lock(&fc->lock); + + list_for_each_entry_safe(pos, temp, &fc->busy_ranges, + busy_list) { + inode = igrab(pos->inode); + /* + * This inode is going away. That will free + * up all the ranges anyway, continue to + * next range. + */ + if (!inode) + continue; + /* + * Take this element off list and add it tail. If + * inode lock can't be obtained, this will help with + * selecting new element + */ + dmap = pos; + list_move_tail(&dmap->busy_list, &fc->busy_ranges); + dmap_start = dmap->start; + window_offset = dmap->window_offset; + break; + } + spin_unlock(&fc->lock); + if (!dmap) + return 0; + + ret = fuse_dax_free_one_mapping(fc, inode, dmap_start); + iput(inode); + if (ret && ret != -EAGAIN) { + printk("%s(window_offset=0x%llx) failed. err=%d\n", + __func__, window_offset, ret); + return ret; + } + + /* Could not get inode lock. Try next element */ + if (ret == -EAGAIN) + continue; + nr_freed++; + } + return 0; +} + +/* TODO: This probably should go in inode.c */ +void fuse_dax_free_mem_worker(struct work_struct *work) +{ + int ret; + struct fuse_conn *fc = container_of(work, struct fuse_conn, + dax_free_work.work); + pr_debug("fuse: Worker to free memory called.\n"); + pr_debug("fuse: Worker to free memory called. nr_free_ranges=%lu" + " nr_busy_ranges=%lu\n", fc->nr_free_ranges, + fc->nr_busy_ranges); + ret = fuse_dax_free_memory(fc, FUSE_DAX_RECLAIM_CHUNK); + if (ret) + pr_debug("fuse: fuse_dax_free_memory() failed with err=%d\n", ret); +} diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index c93e9155b723..b4a5728444bb 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -50,6 +50,16 @@ #define FUSE_DAX_MEM_RANGE_SZ (2*1024*1024) #define FUSE_DAX_MEM_RANGE_PAGES (FUSE_DAX_MEM_RANGE_SZ/PAGE_SIZE) +/* Number of ranges reclaimer will try to free in one invocation */ +#define FUSE_DAX_RECLAIM_CHUNK (10) + +/* + * Dax memory reclaim threshold in percetage of total ranges. When free + * number of free ranges drops below this threshold, reclaim can trigger + * Default is 20% + * */ +#define FUSE_DAX_RECLAIM_THRESHOLD (20) + /** List of active connections */ extern struct list_head fuse_conn_list; @@ -103,6 +113,9 @@ struct fuse_forget_link { /** Translation information for file offsets to DAX window offsets */ struct fuse_dax_mapping { + /* Pointer to inode where this memory range is mapped */ + struct inode *inode; + /* Will connect in fc->free_ranges to keep track of free memory */ struct list_head list; @@ -880,12 +893,20 @@ struct fuse_conn { unsigned long nr_busy_ranges; struct list_head busy_ranges; + /* Worker to free up memory ranges */ + struct delayed_work dax_free_work; + + /* Wait queue for a dax range to become free */ + wait_queue_head_t dax_range_waitq; + /* * DAX Window Free Ranges. TODO: This might not be best place to store * this free list */ long nr_free_ranges; struct list_head free_ranges; + + unsigned long nr_ranges; }; static inline struct fuse_conn *get_fuse_conn_super(struct super_block *sb) @@ -1260,6 +1281,7 @@ unsigned fuse_len_args(unsigned numargs, struct fuse_arg *args); * Get the next unique ID for a request */ u64 fuse_get_unique(struct fuse_iqueue *fiq); +void fuse_dax_free_mem_worker(struct work_struct *work); void fuse_removemapping(struct inode *inode); #endif /* _FS_FUSE_I_H */ diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index f57f7ce02acc..8af7f31c6e19 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -678,6 +678,7 @@ static int fuse_dax_mem_range_init(struct fuse_conn *fc, list_replace_init(&mem_ranges, &fc->free_ranges); fc->nr_free_ranges = nr_ranges; + fc->nr_ranges = nr_ranges; return 0; out_err: /* Free All allocated elements */ @@ -704,6 +705,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, atomic_set(&fc->dev_count, 1); init_waitqueue_head(&fc->blocked_waitq); init_waitqueue_head(&fc->reserved_req_waitq); + init_waitqueue_head(&fc->dax_range_waitq); fuse_iqueue_init(&fc->iq, fiq_ops, fiq_priv); INIT_LIST_HEAD(&fc->bg_queue); INIT_LIST_HEAD(&fc->entry); @@ -724,6 +726,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct user_namespace *user_ns, fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ; INIT_LIST_HEAD(&fc->free_ranges); INIT_LIST_HEAD(&fc->busy_ranges); + INIT_DELAYED_WORK(&fc->dax_free_work, fuse_dax_free_mem_worker); } EXPORT_SYMBOL_GPL(fuse_conn_init); @@ -732,6 +735,7 @@ void fuse_conn_put(struct fuse_conn *fc) if (refcount_dec_and_test(&fc->count)) { if (fc->destroy_req) fuse_request_free(fc->destroy_req); + flush_delayed_work(&fc->dax_free_work); if (fc->dax_dev) fuse_free_dax_mem_ranges(&fc->free_ranges); put_pid_ns(fc->pid_ns); From patchwork Wed May 15 19:27:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945219 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 00392933 for ; Wed, 15 May 2019 19:30:20 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E54B5285ED for ; Wed, 15 May 2019 19:30:19 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D8B6328560; Wed, 15 May 2019 19:30:19 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8A6FE285ED for ; Wed, 15 May 2019 19:30:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727742AbfEOT3h (ORCPT ); Wed, 15 May 2019 15:29:37 -0400 Received: from mx1.redhat.com ([209.132.183.28]:46604 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727651AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 17A6144FB1; Wed, 15 May 2019 19:27:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id E58975D723; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 1EF9022548F; Wed, 15 May 2019 15:27:30 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 27/30] fuse: Release file in process context Date: Wed, 15 May 2019 15:27:12 -0400 Message-Id: <20190515192715.18000-28-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Wed, 15 May 2019 19:27:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP fuse_file_put(sync) can be called with sync=true/false. If sync=true, it waits for release request response and then calls iput() in the caller's context. If sync=false, it does not wait for release request response, frees the fuse_file struct immediately and req->end function does the iput(). iput() can be a problem with DAX if called in req->end context. If this is last reference to inode (VFS has let go its reference already), then iput() will clean DAX mappings as well and send REMOVEMAPPING requests and wait for completion. (All the the worker thread context which is processing fuse replies from daemon on the host). That means it blocks worker thread and it stops processing further replies and system deadlocks. So for now, force sync release of file in case of DAX inodes. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 87fc2b5e0a3a..b0293a308b5e 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -475,6 +475,7 @@ void fuse_release_common(struct file *file, bool isdir) struct fuse_file *ff = file->private_data; struct fuse_req *req = ff->reserved_req; int opcode = isdir ? FUSE_RELEASEDIR : FUSE_RELEASE; + bool sync = false; fuse_prepare_release(fi, ff, file->f_flags, opcode); @@ -495,8 +496,20 @@ void fuse_release_common(struct file *file, bool isdir) * Make the release synchronous if this is a fuseblk mount, * synchronous RELEASE is allowed (and desirable) in this case * because the server can be trusted not to screw up. + * + * For DAX, fuse server is trusted. So it should be fine to + * do a sync file put. Doing async file put is creating + * problems right now because when request finish, iput() + * can lead to freeing of inode. That means it tears down + * mappings backing DAX memory and sends REMOVEMAPPING message + * to server and blocks for completion. Currently, waiting + * in req->end context deadlocks the system as same worker thread + * can't process REMOVEMAPPING reply it is waiting for. */ - fuse_file_put(ff, ff->fc->destroy_req != NULL, isdir); + if (IS_DAX(req->misc.release.inode) || ff->fc->destroy_req != NULL) + sync = true; + + fuse_file_put(ff, sync, isdir); } static int fuse_open(struct inode *inode, struct file *file) From patchwork Wed May 15 19:27:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945195 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 65D841515 for ; Wed, 15 May 2019 19:29:36 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 54C912842D for ; Wed, 15 May 2019 19:29:36 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 45A6628541; Wed, 15 May 2019 19:29:36 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 65F3D2842D for ; Wed, 15 May 2019 19:29:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728107AbfEOT3T (ORCPT ); Wed, 15 May 2019 15:29:19 -0400 Received: from mx1.redhat.com ([209.132.183.28]:36204 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727655AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 1FF7330C1AE6; Wed, 15 May 2019 19:27:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id EE2AC5D728; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 253D4225490; Wed, 15 May 2019 15:27:30 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 28/30] fuse: Reschedule dax free work if too many EAGAIN attempts Date: Wed, 15 May 2019 15:27:13 -0400 Message-Id: <20190515192715.18000-29-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.40]); Wed, 15 May 2019 19:27:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP fuse_dax_free_memory() can be very cpu intensive in corner cases. For example, if one inode has consumed all the memory and a setupmapping request is pending, that means inode lock is held by request and worker thread will not get lock for a while. And given there is only one inode consuming all the dax ranges, all the attempts to acquire lock will fail. So if there are too many inode lock failures (-EAGAIN), reschedule the worker with a 10ms delay. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index b0293a308b5e..9b82d9b4ebc3 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -4047,7 +4047,7 @@ int fuse_dax_free_one_mapping(struct fuse_conn *fc, struct inode *inode, int fuse_dax_free_memory(struct fuse_conn *fc, unsigned long nr_to_free) { struct fuse_dax_mapping *dmap, *pos, *temp; - int ret, nr_freed = 0; + int ret, nr_freed = 0, nr_eagain = 0; u64 dmap_start = 0, window_offset = 0; struct inode *inode = NULL; @@ -4056,6 +4056,12 @@ int fuse_dax_free_memory(struct fuse_conn *fc, unsigned long nr_to_free) if (nr_freed >= nr_to_free) break; + if (nr_eagain > 20) { + queue_delayed_work(system_long_wq, &fc->dax_free_work, + msecs_to_jiffies(10)); + return 0; + } + dmap = NULL; spin_lock(&fc->lock); @@ -4093,8 +4099,10 @@ int fuse_dax_free_memory(struct fuse_conn *fc, unsigned long nr_to_free) } /* Could not get inode lock. Try next element */ - if (ret == -EAGAIN) + if (ret == -EAGAIN) { + nr_eagain++; continue; + } nr_freed++; } return 0; From patchwork Wed May 15 19:27:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945213 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BDCE5933 for ; Wed, 15 May 2019 19:30:14 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id AF44128560 for ; Wed, 15 May 2019 19:30:14 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A260B285ED; Wed, 15 May 2019 19:30:14 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4D61E28560 for ; Wed, 15 May 2019 19:30:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727808AbfEOT3i (ORCPT ); Wed, 15 May 2019 15:29:38 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53908 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727634AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 20248308ED54; Wed, 15 May 2019 19:27:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id EE3501972A; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 2B3EB225491; Wed, 15 May 2019 15:27:30 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 29/30] fuse: Take inode lock for dax inode truncation Date: Wed, 15 May 2019 15:27:14 -0400 Message-Id: <20190515192715.18000-30-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.44]); Wed, 15 May 2019 19:27:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When a file is opened with O_TRUNC, we need to make sure that any other DAX operation is not in progress. DAX expects i_size to be stable. In fuse_iomap_begin() we check for i_size at multiple places and we expect i_size to not change. Another problem is, if we setup a mapping in fuse_iomap_begin(), and file gets truncated and dax read/write happens, KVM currently hangs. It tries to fault in a page which does not exist on host (file got truncated). It probably requries fixing in KVM. So for now, take inode lock. Once KVM is fixed, we might have to have a look at it again. Signed-off-by: Vivek Goyal --- fs/fuse/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 9b82d9b4ebc3..d0979dc32f08 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -420,7 +420,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir) int err; bool lock_inode = (file->f_flags & O_TRUNC) && fc->atomic_o_trunc && - fc->writeback_cache; + (fc->writeback_cache || IS_DAX(inode)); err = generic_file_open(inode, file); if (err) From patchwork Wed May 15 19:27:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vivek Goyal X-Patchwork-Id: 10945187 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 008EA13AD for ; Wed, 15 May 2019 19:29:17 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E406B28516 for ; Wed, 15 May 2019 19:29:16 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D822228555; Wed, 15 May 2019 19:29:16 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 83BC028516 for ; Wed, 15 May 2019 19:29:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728082AbfEOT3C (ORCPT ); Wed, 15 May 2019 15:29:02 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53902 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727636AbfEOT1e (ORCPT ); Wed, 15 May 2019 15:27:34 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 1DE5F3003C77; Wed, 15 May 2019 19:27:34 +0000 (UTC) Received: from horse.redhat.com (unknown [10.18.25.29]) by smtp.corp.redhat.com (Postfix) with ESMTP id EE03919C71; Wed, 15 May 2019 19:27:33 +0000 (UTC) Received: by horse.redhat.com (Postfix, from userid 10451) id 31BA8225492; Wed, 15 May 2019 15:27:30 -0400 (EDT) From: Vivek Goyal To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 30/30] virtio-fs: Do not provide abort interface in fusectl Date: Wed, 15 May 2019 15:27:15 -0400 Message-Id: <20190515192715.18000-31-vgoyal@redhat.com> In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> References: <20190515192715.18000-1-vgoyal@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.44]); Wed, 15 May 2019 19:27:34 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP virtio-fs does not support aborting requests which are being processed. That is requests which have been sent to fuse daemon on host. So do not provide "abort" interface for virtio-fs in fusectl. Signed-off-by: Vivek Goyal --- fs/fuse/control.c | 4 ++-- fs/fuse/fuse_i.h | 4 ++++ fs/fuse/inode.c | 1 + fs/fuse/virtio_fs.c | 1 + 4 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/fuse/control.c b/fs/fuse/control.c index fe80bea4ad89..c1423f2ebc5e 100644 --- a/fs/fuse/control.c +++ b/fs/fuse/control.c @@ -278,8 +278,8 @@ int fuse_ctl_add_conn(struct fuse_conn *fc) if (!fuse_ctl_add_dentry(parent, fc, "waiting", S_IFREG | 0400, 1, NULL, &fuse_ctl_waiting_ops) || - !fuse_ctl_add_dentry(parent, fc, "abort", S_IFREG | 0200, 1, - NULL, &fuse_ctl_abort_ops) || + (!fc->no_abort && !fuse_ctl_add_dentry(parent, fc, "abort", + S_IFREG | 0200, 1, NULL, &fuse_ctl_abort_ops)) || !fuse_ctl_add_dentry(parent, fc, "max_background", S_IFREG | 0600, 1, NULL, &fuse_conn_max_background_ops) || !fuse_ctl_add_dentry(parent, fc, "congestion_threshold", diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index b4a5728444bb..7ac7f9a0b81b 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -86,6 +86,7 @@ struct fuse_mount_data { unsigned allow_other:1; unsigned dax:1; unsigned destroy:1; + unsigned no_abort:1; unsigned max_read; unsigned blksize; @@ -847,6 +848,9 @@ struct fuse_conn { /** Does the filesystem support copy_file_range? */ unsigned no_copy_file_range:1; + /** Do not create abort file in fuse control fs */ + unsigned no_abort:1; + /** The number of requests waiting for completion */ atomic_t num_waiting; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 8af7f31c6e19..302f7e04b645 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -1272,6 +1272,7 @@ int fuse_fill_super_common(struct super_block *sb, fc->user_id = mount_data->user_id; fc->group_id = mount_data->group_id; fc->max_read = max_t(unsigned, 4096, mount_data->max_read); + fc->no_abort = mount_data->no_abort; /* Used by get_root_inode() */ sb->s_fs_info = fc; diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c index 76c46edcc8ac..18fc0dca0abc 100644 --- a/fs/fuse/virtio_fs.c +++ b/fs/fuse/virtio_fs.c @@ -1042,6 +1042,7 @@ static int virtio_fs_fill_super(struct super_block *sb, void *data, d.fiq_priv = fs; d.fudptr = (void **)&fs->vqs[VQ_REQUEST].fud; d.destroy = true; /* Send destroy request on unmount */ + d.no_abort = 1; err = fuse_fill_super_common(sb, &d); if (err < 0) goto err_free_init_req;