From patchwork Tue Dec 17 14:39:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 11297751 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EF3B56C1 for ; Tue, 17 Dec 2019 14:39:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C25AD24682 for ; Tue, 17 Dec 2019 14:39:54 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="Rz0PTf0W" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728581AbfLQOjy (ORCPT ); Tue, 17 Dec 2019 09:39:54 -0500 Received: from mail-il1-f193.google.com ([209.85.166.193]:34962 "EHLO mail-il1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727926AbfLQOjx (ORCPT ); Tue, 17 Dec 2019 09:39:53 -0500 Received: by mail-il1-f193.google.com with SMTP id g12so8566980ild.2 for ; Tue, 17 Dec 2019 06:39:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=qEQyHfYbXoJej+s6/OMXT2BLZEPX6SiZ+5HLaKOeShc=; b=Rz0PTf0WwZEmsAUdF5wBZZkfVKCAltgfrzi74jjEA3lLneDy8dSauI/jbG3xoO7X7R e/VLmB+AiDGb4AjO2B/a6EyP+ygtn5D4ibbttq9rQLt5QiJNuzGA8PKC/CE9cdH4s9f+ dNaoH3JbLMpqNSpla+Y1wRrLG2jXJyiYRfyNa/cTX2EX/ow4JZkh0XhafeJtifcvPcPw /mA6Xil4l219I3wA5uND8mxzy81yPgODPqEFgFMTmd/OtU9Ds3F3kx/oD5fjJd118/Sd EO3NABK5jRgz+ogHrcK6ROc0cGWS6zuF8O/S6gQWgDglVQaB14e5ws/hq5XF5GmnU0g9 mBug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to 
:references:mime-version:content-transfer-encoding; bh=qEQyHfYbXoJej+s6/OMXT2BLZEPX6SiZ+5HLaKOeShc=; b=aFVwdnm+uQzbIYdIcpmjz+M0rNvKFJzUyRUkwl7YhRCSkF8BLBWaEEDyRJgxIepdWW lGSEAkAUsKP0IbRD1CMuSGdfvcRydrK8zijRjItylJhEYSFxscTEFiNCtlXPYtFuAwm3 3sSbpJnSp1piKPinFZBeZ0NjquDMSnV4SRIJNBHbLHRLhZLEnWC/9kz4SnlRQoJaXfhD K1upoltuhNqoYSIvI4mOdX7g5ph5hkjXWKEziFs6NtRRaDwI/uA4byXhdRLhlDwdb2N/ FeungSiduy59CXbBq29jnU+mFXFEDmN++f/evi6wpgtPqM3NAKZuowwYa8F2xqptn8IZ kwng== X-Gm-Message-State: APjAAAVTMoL766I48+Z7kVB+hv4CoaSfWhE5+sqqJopUcCt04zUJrmH5 kER4oW32a1WJuPum1AZuAfuNyQ== X-Google-Smtp-Source: APXvYqxNs1UwvqiKd0kuf9aHlqRHi8PBLH1NSYmwr4Xu4fISsW1/lVDKFz8aN9GKiiU+f+n2z5OIYg== X-Received: by 2002:a92:d809:: with SMTP id y9mr18025008ilm.261.1576593592098; Tue, 17 Dec 2019 06:39:52 -0800 (PST) Received: from x1.thefacebook.com ([65.144.74.34]) by smtp.gmail.com with ESMTPSA id w21sm5285255ioc.34.2019.12.17.06.39.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Dec 2019 06:39:51 -0800 (PST) From: Jens Axboe To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe Subject: [PATCH 1/6] fs: add read support for RWF_UNCACHED Date: Tue, 17 Dec 2019 07:39:43 -0700 Message-Id: <20191217143948.26380-2-axboe@kernel.dk> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk> References: <20191217143948.26380-1-axboe@kernel.dk> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org If RWF_UNCACHED is set for io_uring (or preadv2(2)), we'll use private pages for the buffered reads. These pages will never be inserted into the page cache, and they are simply dropped when we have done the copy at the end of IO. If pages in the read range are already in the page cache, then use those for just copying the data instead of starting IO on private pages.
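For illustration only (not part of this patch): a minimal userspace sketch of how a caller might request such a read through preadv2(2). It assumes a kernel with this series applied. RWF_UNCACHED and its value are copied from the uapi hunk of this patch and defined locally, since released headers do not carry the flag. The helper names read_uncached() and read_head_uncached() are hypothetical. If the kernel rejects the flag, the sketch falls back to a plain pread(2).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* glibc only declares preadv2() with this */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Assumption: value taken from the include/uapi/linux/fs.h hunk in this
 * patch. Kernels without this series do not know the flag under this name,
 * and the same bit may mean something else there.
 */
#ifndef RWF_UNCACHED
#define RWF_UNCACHED	0x00000040
#endif

/*
 * Hypothetical helper: read @len bytes at offset @off, asking the kernel
 * not to leave the data in the page cache. Falls back to an ordinary
 * cached read if the flag (or preadv2 itself) is not accepted.
 */
static ssize_t read_uncached(int fd, void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	ssize_t ret;

	ret = preadv2(fd, &iov, 1, off, RWF_UNCACHED);
	if (ret < 0 && (errno == EOPNOTSUPP || errno == EINVAL ||
			errno == ENOSYS))
		ret = pread(fd, buf, len, off);	/* flag rejected */
	return ret;
}

/* Hypothetical helper: open @path and read its first @len bytes. */
static ssize_t read_head_uncached(const char *path, void *buf, size_t len)
{
	ssize_t ret;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	ret = read_uncached(fd, buf, len, 0);
	close(fd);
	return ret;
}
```

The fd stays a normal buffered fd: unlike O_DIRECT there are no alignment requirements on the buffer, length, or offset, and the uncached behaviour is chosen per call.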
A previous solution used the page cache even for non-cached ranges, but the cost of doing so was too high. Removing nodes at the end is expensive, even with LRU bypass. On top of that, repeatedly instantiating new xarray nodes is very costly, as it needs to memset 576 bytes of data, and freeing said nodes involves an RCU call per node as well. All that adds up, making uncached somewhat slower than O_DIRECT. With the current solution, we're basically at O_DIRECT levels of performance for RWF_UNCACHED IO. Protect against truncate the same way O_DIRECT does, by calling inode_dio_begin() to elevate the inode->i_dio_count. Signed-off-by: Jens Axboe --- include/linux/fs.h | 3 +++ include/uapi/linux/fs.h | 5 ++++- mm/filemap.c | 38 ++++++++++++++++++++++++++++++++------ 3 files changed, 39 insertions(+), 7 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 98e0349adb52..092ea2a4319b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -314,6 +314,7 @@ enum rw_hint { #define IOCB_SYNC (1 << 5) #define IOCB_WRITE (1 << 6) #define IOCB_NOWAIT (1 << 7) +#define IOCB_UNCACHED (1 << 8) struct kiocb { struct file *ki_filp; @@ -3418,6 +3419,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags) ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC); if (flags & RWF_APPEND) ki->ki_flags |= IOCB_APPEND; + if (flags & RWF_UNCACHED) + ki->ki_flags |= IOCB_UNCACHED; return 0; } diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index 379a612f8f1d..357ebb0e0c5d 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -299,8 +299,11 @@ typedef int __bitwise __kernel_rwf_t; /* per-IO O_APPEND */ #define RWF_APPEND ((__force __kernel_rwf_t)0x00000010) +/* drop cache after reading or writing data */ +#define RWF_UNCACHED ((__force __kernel_rwf_t)0x00000040) + /* mask of flags supported by the kernel */ #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\ - RWF_APPEND) + RWF_APPEND | RWF_UNCACHED) #endif /* 
_UAPI_LINUX_FS_H */ diff --git a/mm/filemap.c b/mm/filemap.c index bf6aa30be58d..7ddc4d8386cf 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1990,6 +1990,13 @@ static void shrink_readahead_size_eio(struct file *filp, ra->ra_pages /= 4; } +static void buffered_put_page(struct page *page, bool clear_mapping) +{ + if (clear_mapping) + page->mapping = NULL; + put_page(page); +} + /** * generic_file_buffered_read - generic file read routine * @iocb: the iocb to read @@ -2013,6 +2020,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, struct address_space *mapping = filp->f_mapping; struct inode *inode = mapping->host; struct file_ra_state *ra = &filp->f_ra; + bool did_dio_begin = false; loff_t *ppos = &iocb->ki_pos; pgoff_t index; pgoff_t last_index; @@ -2032,6 +2040,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, offset = *ppos & ~PAGE_MASK; for (;;) { + bool clear_mapping = false; struct page *page; pgoff_t end_index; loff_t isize; @@ -2048,6 +2057,13 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, if (!page) { if (iocb->ki_flags & IOCB_NOWAIT) goto would_block; + /* UNCACHED implies no read-ahead */ + if (iocb->ki_flags & IOCB_UNCACHED) { + did_dio_begin = true; + /* block truncate for UNCACHED reads */ + inode_dio_begin(inode); + goto no_cached_page; + } page_cache_sync_readahead(mapping, ra, filp, index, last_index - index); @@ -2106,7 +2122,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, isize = i_size_read(inode); end_index = (isize - 1) >> PAGE_SHIFT; if (unlikely(!isize || index > end_index)) { - put_page(page); + buffered_put_page(page, clear_mapping); goto out; } @@ -2115,7 +2131,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, if (index == end_index) { nr = ((isize - 1) & ~PAGE_MASK) + 1; if (nr <= offset) { - put_page(page); + buffered_put_page(page, clear_mapping); goto out; } } @@ -2147,7 +2163,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, offset 
&= ~PAGE_MASK; prev_offset = offset; - put_page(page); + buffered_put_page(page, clear_mapping); written += ret; if (!iov_iter_count(iter)) goto out; @@ -2189,7 +2205,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, if (unlikely(error)) { if (error == AOP_TRUNCATED_PAGE) { - put_page(page); + buffered_put_page(page, clear_mapping); error = 0; goto find_page; } @@ -2206,7 +2222,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, * invalidate_mapping_pages got it */ unlock_page(page); - put_page(page); + buffered_put_page(page, clear_mapping); goto find_page; } unlock_page(page); @@ -2221,7 +2237,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, readpage_error: /* UHHUH! A synchronous read error occurred. Report it */ - put_page(page); + buffered_put_page(page, clear_mapping); goto out; no_cached_page: @@ -2234,6 +2250,14 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, error = -ENOMEM; goto out; } + if (iocb->ki_flags & IOCB_UNCACHED) { + __SetPageLocked(page); + page->mapping = mapping; + page->index = index; + clear_mapping = true; + goto readpage; + } + error = add_to_page_cache_lru(page, mapping, index, mapping_gfp_constraint(mapping, GFP_KERNEL)); if (error) { @@ -2250,6 +2274,8 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, would_block: error = -EAGAIN; out: + if (did_dio_begin) + inode_dio_end(inode); ra->prev_pos = prev_index; ra->prev_pos <<= PAGE_SHIFT; ra->prev_pos |= prev_offset; From patchwork Tue Dec 17 14:39:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 11297757 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7AF1B186D for ; Tue, 17 Dec 2019 14:39:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org 
(Postfix) with ESMTP id 575D424672 for ; Tue, 17 Dec 2019 14:39:56 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="2T6/KvYQ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728493AbfLQOjz (ORCPT ); Tue, 17 Dec 2019 09:39:55 -0500 Received: from mail-io1-f67.google.com ([209.85.166.67]:34576 "EHLO mail-io1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728558AbfLQOjy (ORCPT ); Tue, 17 Dec 2019 09:39:54 -0500 Received: by mail-io1-f67.google.com with SMTP id z193so7707406iof.1 for ; Tue, 17 Dec 2019 06:39:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=eKvsGoetsIcjvq9qtw7wq8EOlHDh0dLtS5dcS2crnAM=; b=2T6/KvYQuQLJXZ7z69OfzRHfUkIi00yPsJDIgAwSi0vvuhijjnSfxE68no83S8niWu 4jzOvLSgX/c7gaxyBryaYuILa6HvnP+6BHist5GL9yIZAkFEkCd8+cSblazxNaVhiqcv lwZZcg+Ks/yuF1Gm4gOtk+hZKVJcM0Qz6+lpMD1mMWZsqYYqPfStQ6Vj0kmPNZVJrnnv 5T7Px6Qvf4gmwW81lJw98OofW81TjYZFDOT1up8o63YqZ8IBEohTcpk56PkgqTPwO/c8 C06udPw6lIKChQqvmExmfG604lXCIcMoo3fcfguwOup6upWiqBrUA3BH+twqfs/XpiHj 5btQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=eKvsGoetsIcjvq9qtw7wq8EOlHDh0dLtS5dcS2crnAM=; b=gvKnzcXabayWAHhilJd7+8ADCSCGI7y3f8J8en5IizvibD45Lop5pakdIO+oIXEoAD 5Xuo+sJEImPF5AibafZ3e+iDwt4Skyo2VtP2GEYdfVv84hjhy+BG0thIviWdg/auTIrR 4YEjbvsBoEEuLMFYxUDpcOa4zuuuSkcaWE/W4NXaf5DMF6DFGJqgltkINMvWEg0APQOu UggqBFRqjllXLhS+0rzmUfrUNHUveUXAfgJ6LPAbEn8DblHA9WbtSE6QIae9K/UDOnCc KHQiMTp+DwKlSXTVBthDXotGLAjknldZ7YixCvFFohe0sW5T+ZUmZPqMVN7qIolwX+bf sejw== X-Gm-Message-State: APjAAAVnAKJ82EG9ZzSEE1RGNM/E5K95x3fsmxhd6ZXTaummII8G+PN1 
tXRqH1n12cSriKSLDBKOOIzJkA== X-Google-Smtp-Source: APXvYqxMdQvhXZ9Q18XgGJtSH0yvOWY1lHHyKek4U1CcHUB2OSSyfo2zR9Bt8k43JQ6faNwcUtoHvg== X-Received: by 2002:a6b:ed15:: with SMTP id n21mr1525183iog.128.1576593593105; Tue, 17 Dec 2019 06:39:53 -0800 (PST) Received: from x1.thefacebook.com ([65.144.74.34]) by smtp.gmail.com with ESMTPSA id w21sm5285255ioc.34.2019.12.17.06.39.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Dec 2019 06:39:52 -0800 (PST) From: Jens Axboe To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe Subject: [PATCH 2/6] mm: make generic_perform_write() take a struct kiocb Date: Tue, 17 Dec 2019 07:39:44 -0700 Message-Id: <20191217143948.26380-3-axboe@kernel.dk> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk> References: <20191217143948.26380-1-axboe@kernel.dk> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org Right now all callers pass in iocb->ki_pos, so just pass in the iocb instead. This is in preparation for using the iocb flags in generic_perform_write(). Signed-off-by: Jens Axboe --- fs/ceph/file.c | 2 +- fs/ext4/file.c | 2 +- fs/nfs/file.c | 2 +- include/linux/fs.h | 3 ++- mm/filemap.c | 8 +++++--- 5 files changed, 10 insertions(+), 7 deletions(-) diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 11929d2bb594..096c009f188f 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1538,7 +1538,7 @@ static ssize_t ceph_write_iter(struct kiocb *iocb, struct iov_iter *from) * are pending vmtruncate. 
So write and vmtruncate * can not run at the same time */ - written = generic_perform_write(file, from, pos); + written = generic_perform_write(file, from, iocb); if (likely(written >= 0)) iocb->ki_pos = pos + written; ceph_end_io_write(inode); diff --git a/fs/ext4/file.c b/fs/ext4/file.c index 6a7293a5cda2..9ffb857765d5 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -249,7 +249,7 @@ static ssize_t ext4_buffered_write_iter(struct kiocb *iocb, goto out; current->backing_dev_info = inode_to_bdi(inode); - ret = generic_perform_write(iocb->ki_filp, from, iocb->ki_pos); + ret = generic_perform_write(iocb->ki_filp, from, iocb); current->backing_dev_info = NULL; out: diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 8eb731d9be3e..d8f51a702a4e 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -624,7 +624,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from) result = generic_write_checks(iocb, from); if (result > 0) { current->backing_dev_info = inode_to_bdi(inode); - result = generic_perform_write(file, from, iocb->ki_pos); + result = generic_perform_write(file, from, iocb); current->backing_dev_info = NULL; } nfs_end_io_write(inode); diff --git a/include/linux/fs.h b/include/linux/fs.h index 092ea2a4319b..bf58db1bc032 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3103,7 +3103,8 @@ extern ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *); extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *); extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *); extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *); -extern ssize_t generic_perform_write(struct file *, struct iov_iter *, loff_t); +extern ssize_t generic_perform_write(struct file *, struct iov_iter *, + struct kiocb *); ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos, rwf_t flags); diff --git a/mm/filemap.c b/mm/filemap.c index 7ddc4d8386cf..522152ed86d8 100644 --- a/mm/filemap.c 
+++ b/mm/filemap.c @@ -3292,10 +3292,11 @@ struct page *grab_cache_page_write_begin(struct address_space *mapping, EXPORT_SYMBOL(grab_cache_page_write_begin); ssize_t generic_perform_write(struct file *file, - struct iov_iter *i, loff_t pos) + struct iov_iter *i, struct kiocb *iocb) { struct address_space *mapping = file->f_mapping; const struct address_space_operations *a_ops = mapping->a_ops; + loff_t pos = iocb->ki_pos; long status = 0; ssize_t written = 0; unsigned int flags = 0; @@ -3429,7 +3430,8 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) if (written < 0 || !iov_iter_count(from) || IS_DAX(inode)) goto out; - status = generic_perform_write(file, from, pos = iocb->ki_pos); + pos = iocb->ki_pos; + status = generic_perform_write(file, from, iocb); /* * If generic_perform_write() returned a synchronous error * then we want to return the number of bytes which were @@ -3461,7 +3463,7 @@ ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) */ } } else { - written = generic_perform_write(file, from, iocb->ki_pos); + written = generic_perform_write(file, from, iocb); if (likely(written > 0)) iocb->ki_pos += written; } From patchwork Tue Dec 17 14:39:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 11297765 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A76D96C1 for ; Tue, 17 Dec 2019 14:39:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 775552467B for ; Tue, 17 Dec 2019 14:39:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="tLwP2zT0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via 
listexpand id S1728667AbfLQOj5 (ORCPT ); Tue, 17 Dec 2019 09:39:57 -0500 Received: from mail-il1-f196.google.com ([209.85.166.196]:38835 "EHLO mail-il1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727926AbfLQOjy (ORCPT ); Tue, 17 Dec 2019 09:39:54 -0500 Received: by mail-il1-f196.google.com with SMTP id f5so8547061ilq.5 for ; Tue, 17 Dec 2019 06:39:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=EOt7OEDxASzJpz4aW2OtR1NM6hhZ1aYaA0R6ITxPWMg=; b=tLwP2zT0MtI2kOTe48uSomwZIjJUT0pySzMCDXzO1sNdjdjOCFI1XHkUHEOozqFc8c nD8w1OUooAyprmV1auAjotvgf/lwQQgtJmjU0OOuifVAcFA6tNABcFtRb6VTMGQNvXW8 R+UIQQhagkuaIdQ7XtSKY2NXtNhP26jjoETMW/zxAtTmWwRpQW/LEGudNmZGscJC8QcP 5bfFqpV+WXfMvXGjvCr86gr0YU9WkuoxK8tJyuGnk+U8na/TmrO8XsxhHpezKrsvh+V8 wbxIlP/OlD3s6ghZTqeNhQeYNghwcPJnzfXH2JT64KUq2IgoyaXWFWMfddWHwG8HGDQK 3y0A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=EOt7OEDxASzJpz4aW2OtR1NM6hhZ1aYaA0R6ITxPWMg=; b=nbaNJTkRN7BW0xETGnOC83IwWCJyh3S1kXnEvtqwZ3wG+N9J/Eevz0I93Fd0NNuVqI 5tGfRgMU7cJ5ElJAE118qTQHT2IxPlXEvNoJc6A9TT8mlgJz4dxO3/g2HQbb0+A9ISkC lbElVNF1jemZqltC3sCP4lhGodo9m933WBQ6LCB2MIJKXBlIxtMDaso08JYvI68pjgLA TIJ1SIqX7J+/MhtuXB+Q24RrgmMQo9lZnMU1oTrMYIpP97oc7lnOgqEDQWwxBnDSNv07 zZEBRGjPiQK/pscSysLLQNnuKAvViKjSJw5Cjif73m9JS5+Eya3ciFFCaSkfvkRjpijE UpXQ== X-Gm-Message-State: APjAAAXDOw5iMthm/d9ppj96r7rDWnGgr0Jc7BdZTXQ1HbOkka5KpP0F Ai05WiNnHxV+O+MtZkwkqbLwZwwKYEa5FA== X-Google-Smtp-Source: APXvYqy373igjl5RaNDqw4GXVKR/B2FPaCQeTRyE7cK02n7sdyI5xVepqdHZGKv7py+DgJxlDb4Q6Q== X-Received: by 2002:a92:cb11:: with SMTP id s17mr10211920ilo.114.1576593593985; Tue, 17 Dec 2019 06:39:53 -0800 (PST) Received: from x1.thefacebook.com ([65.144.74.34]) by 
smtp.gmail.com with ESMTPSA id w21sm5285255ioc.34.2019.12.17.06.39.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Dec 2019 06:39:53 -0800 (PST) From: Jens Axboe To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe Subject: [PATCH 3/6] mm: make buffered writes work with RWF_UNCACHED Date: Tue, 17 Dec 2019 07:39:45 -0700 Message-Id: <20191217143948.26380-4-axboe@kernel.dk> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk> References: <20191217143948.26380-1-axboe@kernel.dk> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org If RWF_UNCACHED is set for io_uring (or pwritev2(2)), we'll drop the cache instantiated for buffered writes. If new pages aren't instantiated, we leave them alone. This provides similar semantics to reads with RWF_UNCACHED set. Signed-off-by: Jens Axboe --- include/linux/fs.h | 1 + mm/filemap.c | 41 +++++++++++++++++++++++++++++++++++++++-- 2 files changed, 40 insertions(+), 2 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index bf58db1bc032..5ea5fc167524 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -285,6 +285,7 @@ enum positive_aop_returns { #define AOP_FLAG_NOFS 0x0002 /* used by filesystem to direct * helper code (eg buffer layer) * to clear GFP_FS from alloc */ +#define AOP_FLAG_UNCACHED 0x0004 /* * oh the beauties of C type declarations. 
diff --git a/mm/filemap.c b/mm/filemap.c index 522152ed86d8..0b5f29b30c34 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3277,10 +3277,12 @@ struct page *grab_cache_page_write_begin(struct address_space *mapping, pgoff_t index, unsigned flags) { struct page *page; - int fgp_flags = FGP_LOCK|FGP_WRITE|FGP_CREAT; + int fgp_flags = FGP_LOCK|FGP_WRITE; if (flags & AOP_FLAG_NOFS) fgp_flags |= FGP_NOFS; + if (!(flags & AOP_FLAG_UNCACHED)) + fgp_flags |= FGP_CREAT; page = pagecache_get_page(mapping, index, fgp_flags, mapping_gfp_mask(mapping)); @@ -3301,6 +3303,9 @@ ssize_t generic_perform_write(struct file *file, ssize_t written = 0; unsigned int flags = 0; + if (iocb->ki_flags & IOCB_UNCACHED) + flags |= AOP_FLAG_UNCACHED; + do { struct page *page; unsigned long offset; /* Offset into pagecache page */ @@ -3333,10 +3338,16 @@ ssize_t generic_perform_write(struct file *file, break; } +retry: status = a_ops->write_begin(file, mapping, pos, bytes, flags, &page, &fsdata); - if (unlikely(status < 0)) + if (unlikely(status < 0)) { + if (status == -ENOMEM && (flags & AOP_FLAG_UNCACHED)) { + flags &= ~AOP_FLAG_UNCACHED; + goto retry; + } break; + } if (mapping_writably_mapped(mapping)) flush_dcache_page(page); @@ -3372,6 +3383,32 @@ ssize_t generic_perform_write(struct file *file, balance_dirty_pages_ratelimited(mapping); } while (iov_iter_count(i)); + if (written && (iocb->ki_flags & IOCB_UNCACHED)) { + loff_t end; + + pos = iocb->ki_pos; + end = pos + written; + + status = filemap_write_and_wait_range(mapping, pos, end); + if (status) + goto out; + + /* + * No pages were created for this range, we're done + */ + if (flags & AOP_FLAG_UNCACHED) + goto out; + + /* + * Try to invalidate cache pages for the range we just wrote. + * We don't care if invalidation fails as the write has still + * worked and leaving clean uptodate pages in the page cache + * isn't a corruption vector for uncached IO. 
+ */ + invalidate_inode_pages2_range(mapping, + pos >> PAGE_SHIFT, end >> PAGE_SHIFT); + } +out: return written ? written : status; } EXPORT_SYMBOL(generic_perform_write); From patchwork Tue Dec 17 14:39:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 11297775 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6DC6A112B for ; Tue, 17 Dec 2019 14:40:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 3877224680 for ; Tue, 17 Dec 2019 14:40:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20150623.gappssmtp.com header.i=@kernel-dk.20150623.gappssmtp.com header.b="HFMnVbnX" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728665AbfLQOj5 (ORCPT ); Tue, 17 Dec 2019 09:39:57 -0500 Received: from mail-io1-f66.google.com ([209.85.166.66]:38178 "EHLO mail-io1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728558AbfLQOj4 (ORCPT ); Tue, 17 Dec 2019 09:39:56 -0500 Received: by mail-io1-f66.google.com with SMTP id v3so11276162ioj.5 for ; Tue, 17 Dec 2019 06:39:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=W8/Oau/0zx1upVgXD58HD2IdXBLquun38bfgvolAqVE=; b=HFMnVbnXL5nozNIYcQ6hd0G1F6cDCr6MWodjZKlkiKMVgERb3yFNCXBVvLbTvUeiPm X4dVlRaSHT8QhoF6Ela8U1soaTakFVvY6c5YW17xeltyddTSof8pRiiQtx3O6PGEFvlM Sa0uUKXQIHHgmfOOrwUdoVqn0hCmO785dqgmann+Uxf2y6TmNsWseofnSfPT1IMYJnUE QAT6oQ0TqmVlR0eG/xarhwSDBmEl7HwC8aIqUgQlzb2RT/ImBcuAhQBUoRLxbL3PdxJD yxJ/j6dInL9jkYTNH2nwwcv86h/OXvoBCKGcSDvlDm98iW2rfyPC/4nUFbDX8jf3VdUu C5Bw== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=W8/Oau/0zx1upVgXD58HD2IdXBLquun38bfgvolAqVE=; b=MBvh2XlxEXBS4N6HjF6oM/qtSJ1JvR9e+1JohIv+6BUiTvgi3RXsxXYhU5YUYS4LrI BjD0eUp8/dvyowPnqfpISWpnV/52lhtiQ+0ma5kJf0MPaxa/51ecYLu9VWoca+zRlD3Y GpXq5OBUUXcAoDjTCkCxsd655ZE6yMV796jbrjvE6MTnvcDQKadrI0m+Bnr5MjzQANwV MtZEPb7baEyVooA+KQ+QHlnCJB+HDinbFP9JMdrXdw8242YFiB6iqvz2DqRkDtxl+R8J 7r1mL1pbGwi9KRG21nyC9ahJ2r1Hwmv++9vwxRZWMFBS2qHZxiDCz7RKJvdaMcWO2vyZ YzYg== X-Gm-Message-State: APjAAAXs5bHeWhKd32boAyOQtRCAZ+JaePcowece8/lByHjmF7ocdLm8 1eZ4oPwlzT5xHoZ3e8KafX2vqg== X-Google-Smtp-Source: APXvYqyrdRNqE7YVIsRIri3H/jmjcdDKJ/YOrL184+dus7WHLg/xK9HpNsJVhN8e+QKeYiZCaVyzjw== X-Received: by 2002:a5d:8d95:: with SMTP id b21mr3885410ioj.303.1576593594982; Tue, 17 Dec 2019 06:39:54 -0800 (PST) Received: from x1.thefacebook.com ([65.144.74.34]) by smtp.gmail.com with ESMTPSA id w21sm5285255ioc.34.2019.12.17.06.39.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Dec 2019 06:39:54 -0800 (PST) From: Jens Axboe To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe Subject: [PATCH 4/6] iomap: add struct iomap_ctx Date: Tue, 17 Dec 2019 07:39:46 -0700 Message-Id: <20191217143948.26380-5-axboe@kernel.dk> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk> References: <20191217143948.26380-1-axboe@kernel.dk> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org We pass a lot of arguments to iomap_apply(), and subsequently to the actors that it calls. In preparation for adding one more argument, switch them to using a struct iomap_ctx instead. 
The actors get a const version of that; they are not supposed to change anything in it. Signed-off-by: Jens Axboe --- fs/dax.c | 25 +++-- fs/iomap/apply.c | 26 +++--- fs/iomap/buffered-io.c | 202 +++++++++++++++++++++++++---------------- fs/iomap/direct-io.c | 57 +++++++----- fs/iomap/fiemap.c | 48 ++++++---- fs/iomap/seek.c | 64 ++++++++----- fs/iomap/swapfile.c | 27 +++--- include/linux/iomap.h | 15 ++- 8 files changed, 278 insertions(+), 186 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 1f1f0201cad1..2637afec30b2 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1090,13 +1090,16 @@ int __dax_zero_page_range(struct block_device *bdev, EXPORT_SYMBOL_GPL(__dax_zero_page_range); static loff_t -dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap, struct iomap *srcmap) +dax_iomap_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { struct block_device *bdev = iomap->bdev; struct dax_device *dax_dev = iomap->dax_dev; - struct iov_iter *iter = data; + struct iov_iter *iter = data->priv; + loff_t pos = data->pos; + loff_t length = data->len; loff_t end = pos + length, done = 0; + struct inode *inode = data->inode; ssize_t ret = 0; size_t xfer; int id; @@ -1197,22 +1200,26 @@ dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, { struct address_space *mapping = iocb->ki_filp->f_mapping; struct inode *inode = mapping->host; - loff_t pos = iocb->ki_pos, ret = 0, done = 0; - unsigned flags = 0; + loff_t ret = 0, done = 0; + struct iomap_ctx data = { + .inode = inode, + .pos = iocb->ki_pos, + .priv = iter, + }; if (iov_iter_rw(iter) == WRITE) { lockdep_assert_held_write(&inode->i_rwsem); - flags |= IOMAP_WRITE; + data.flags |= IOMAP_WRITE; } else { lockdep_assert_held(&inode->i_rwsem); } while (iov_iter_count(iter)) { - ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops, - iter, dax_iomap_actor); + data.len = iov_iter_count(iter); + ret = iomap_apply(&data, ops, dax_iomap_actor); if 
(ret <= 0) break; - pos += ret; + data.pos += ret; done += ret; } diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c index 76925b40b5fd..792079403a22 100644 --- a/fs/iomap/apply.c +++ b/fs/iomap/apply.c @@ -21,15 +21,16 @@ * iomap_end call. */ loff_t -iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, - const struct iomap_ops *ops, void *data, iomap_actor_t actor) +iomap_apply(struct iomap_ctx *data, const struct iomap_ops *ops, + iomap_actor_t actor) { struct iomap iomap = { .type = IOMAP_HOLE }; struct iomap srcmap = { .type = IOMAP_HOLE }; loff_t written = 0, ret; u64 end; - trace_iomap_apply(inode, pos, length, flags, ops, actor, _RET_IP_); + trace_iomap_apply(data->inode, data->pos, data->len, data->flags, ops, + actor, _RET_IP_); /* * Need to map a range from start position for length bytes. This can @@ -43,17 +44,18 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, * expose transient stale data. If the reserve fails, we can safely * back out at this point as there is nothing to undo. 
*/ - ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap); + ret = ops->iomap_begin(data->inode, data->pos, data->len, data->flags, + &iomap, &srcmap); if (ret) return ret; - if (WARN_ON(iomap.offset > pos)) + if (WARN_ON(iomap.offset > data->pos)) return -EIO; if (WARN_ON(iomap.length == 0)) return -EIO; - trace_iomap_apply_dstmap(inode, &iomap); + trace_iomap_apply_dstmap(data->inode, &iomap); if (srcmap.type != IOMAP_HOLE) - trace_iomap_apply_srcmap(inode, &srcmap); + trace_iomap_apply_srcmap(data->inode, &srcmap); /* * Cut down the length to the one actually provided by the filesystem, @@ -62,8 +64,8 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, end = iomap.offset + iomap.length; if (srcmap.type != IOMAP_HOLE) end = min(end, srcmap.offset + srcmap.length); - if (pos + length > end) - length = end - pos; + if (data->pos + data->len > end) + data->len = end - data->pos; /* * Now that we have guaranteed that the space allocation will succeed, @@ -77,7 +79,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, * iomap into the actors so that they don't need to have special * handling for the two cases. */ - written = actor(inode, pos, length, data, &iomap, + written = actor(data, &iomap, srcmap.type != IOMAP_HOLE ? &srcmap : &iomap); /* @@ -85,9 +87,9 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, * should not fail unless the filesystem has had a fatal error. */ if (ops->iomap_end) { - ret = ops->iomap_end(inode, pos, length, + ret = ops->iomap_end(data->inode, data->pos, data->len, written > 0 ? written : 0, - flags, &iomap); + data->flags, &iomap); } return written ? 
written : ret; diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 828444e14d09..7f8300bce767 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -248,14 +248,15 @@ static inline bool iomap_block_needs_zeroing(struct inode *inode, } static loff_t -iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap, struct iomap *srcmap) +iomap_readpage_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - struct iomap_readpage_ctx *ctx = data; + struct iomap_readpage_ctx *ctx = data->priv; + struct inode *inode = data->inode; struct page *page = ctx->cur_page; struct iomap_page *iop = iomap_page_create(inode, page); bool same_page = false, is_contig = false; - loff_t orig_pos = pos; + loff_t pos = data->pos, orig_pos = data->pos; unsigned poff, plen; sector_t sector; @@ -266,7 +267,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, } /* zero post-eof blocks as the page may be mapped */ - iomap_adjust_read_range(inode, iop, &pos, length, &poff, &plen); + iomap_adjust_read_range(inode, iop, &pos, data->len, &poff, &plen); if (plen == 0) goto done; @@ -302,7 +303,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, if (!ctx->bio || !is_contig || bio_full(ctx->bio, plen)) { gfp_t gfp = mapping_gfp_constraint(page->mapping, GFP_KERNEL); - int nr_vecs = (length + PAGE_SIZE - 1) >> PAGE_SHIFT; + int nr_vecs = (data->len + PAGE_SIZE - 1) >> PAGE_SHIFT; if (ctx->bio) submit_bio(ctx->bio); @@ -333,16 +334,20 @@ int iomap_readpage(struct page *page, const struct iomap_ops *ops) { struct iomap_readpage_ctx ctx = { .cur_page = page }; - struct inode *inode = page->mapping->host; + struct iomap_ctx data = { + .inode = page->mapping->host, + .priv = &ctx, + .flags = 0 + }; unsigned poff; loff_t ret; trace_iomap_readpage(page->mapping->host, 1); for (poff = 0; poff < PAGE_SIZE; poff += ret) { - ret = iomap_apply(inode, 
page_offset(page) + poff, - PAGE_SIZE - poff, 0, ops, &ctx, - iomap_readpage_actor); + data.pos = page_offset(page) + poff; + data.len = PAGE_SIZE - poff; + ret = iomap_apply(&data, ops, iomap_readpage_actor); if (ret <= 0) { WARN_ON_ONCE(ret == 0); SetPageError(page); @@ -396,28 +401,34 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos, } static loff_t -iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_readpages_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - struct iomap_readpage_ctx *ctx = data; + struct iomap_readpage_ctx *ctx = data->priv; loff_t done, ret; - for (done = 0; done < length; done += ret) { - if (ctx->cur_page && offset_in_page(pos + done) == 0) { + for (done = 0; done < data->len; done += ret) { + struct iomap_ctx rp_data = { + .inode = data->inode, + .pos = data->pos + done, + .len = data->len - done, + .priv = ctx, + }; + + if (ctx->cur_page && offset_in_page(rp_data.pos) == 0) { if (!ctx->cur_page_in_bio) unlock_page(ctx->cur_page); put_page(ctx->cur_page); ctx->cur_page = NULL; } if (!ctx->cur_page) { - ctx->cur_page = iomap_next_page(inode, ctx->pages, - pos, length, &done); + ctx->cur_page = iomap_next_page(data->inode, ctx->pages, + data->pos, data->len, &done); if (!ctx->cur_page) break; ctx->cur_page_in_bio = false; } - ret = iomap_readpage_actor(inode, pos + done, length - done, - ctx, iomap, srcmap); + ret = iomap_readpage_actor(&rp_data, iomap, srcmap); } return done; @@ -431,21 +442,27 @@ iomap_readpages(struct address_space *mapping, struct list_head *pages, .pages = pages, .is_readahead = true, }; - loff_t pos = page_offset(list_entry(pages->prev, struct page, lru)); + struct iomap_ctx data = { + .inode = mapping->host, + .priv = &ctx, + .flags = 0 + }; loff_t last = page_offset(list_entry(pages->next, struct page, lru)); - loff_t length = last - pos + PAGE_SIZE, ret = 0; + loff_t ret = 
0; + + data.pos = page_offset(list_entry(pages->prev, struct page, lru)); + data.len = last - data.pos + PAGE_SIZE; - trace_iomap_readpages(mapping->host, nr_pages); + trace_iomap_readpages(data.inode, nr_pages); - while (length > 0) { - ret = iomap_apply(mapping->host, pos, length, 0, ops, - &ctx, iomap_readpages_actor); + while (data.len > 0) { + ret = iomap_apply(&data, ops, iomap_readpages_actor); if (ret <= 0) { WARN_ON_ONCE(ret == 0); goto done; } - pos += ret; - length -= ret; + data.pos += ret; + data.len -= ret; } ret = 0; done: @@ -796,10 +813,13 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len, unsigned copied, } static loff_t -iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap, struct iomap *srcmap) +iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - struct iov_iter *i = data; + struct inode *inode = data->inode; + struct iov_iter *i = data->priv; + loff_t length = data->len; + loff_t pos = data->pos; long status = 0; ssize_t written = 0; @@ -879,15 +899,20 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops) { - struct inode *inode = iocb->ki_filp->f_mapping->host; - loff_t pos = iocb->ki_pos, ret = 0, written = 0; + struct iomap_ctx data = { + .inode = iocb->ki_filp->f_mapping->host, + .pos = iocb->ki_pos, + .priv = iter, + .flags = IOMAP_WRITE + }; + loff_t ret = 0, written = 0; while (iov_iter_count(iter)) { - ret = iomap_apply(inode, pos, iov_iter_count(iter), - IOMAP_WRITE, ops, iter, iomap_write_actor); + data.len = iov_iter_count(iter); + ret = iomap_apply(&data, ops, iomap_write_actor); if (ret <= 0) break; - pos += ret; + data.pos += ret; written += ret; } @@ -896,9 +921,11 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter, EXPORT_SYMBOL_GPL(iomap_file_buffered_write); static loff_t -iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void 
*data, - struct iomap *iomap, struct iomap *srcmap) +iomap_unshare_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { + loff_t pos = data->pos; + loff_t length = data->len; long status = 0; ssize_t written = 0; @@ -914,13 +941,13 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data, unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length); struct page *page; - status = iomap_write_begin(inode, pos, bytes, + status = iomap_write_begin(data->inode, pos, bytes, IOMAP_WRITE_F_UNSHARE, &page, iomap, srcmap); if (unlikely(status)) return status; - status = iomap_write_end(inode, pos, bytes, bytes, page, iomap, - srcmap); + status = iomap_write_end(data->inode, pos, bytes, bytes, page, + iomap, srcmap); if (unlikely(status <= 0)) { if (WARN_ON_ONCE(status == 0)) return -EIO; @@ -933,7 +960,7 @@ iomap_unshare_actor(struct inode *inode, loff_t pos, loff_t length, void *data, written += status; length -= status; - balance_dirty_pages_ratelimited(inode->i_mapping); + balance_dirty_pages_ratelimited(data->inode->i_mapping); } while (length); return written; @@ -943,15 +970,20 @@ int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, const struct iomap_ops *ops) { + struct iomap_ctx data = { + .inode = inode, + .pos = pos, + .len = len, + .flags = IOMAP_WRITE, + }; loff_t ret; - while (len) { - ret = iomap_apply(inode, pos, len, IOMAP_WRITE, ops, NULL, - iomap_unshare_actor); + while (data.len) { + ret = iomap_apply(&data, ops, iomap_unshare_actor); if (ret <= 0) return ret; - pos += ret; - len -= ret; + data.pos += ret; + data.len -= ret; } return 0; @@ -982,16 +1014,18 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes, } static loff_t -iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_zero_range_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - bool 
*did_zero = data; + bool *did_zero = data->priv; + loff_t count = data->len; + loff_t pos = data->pos; loff_t written = 0; int status; /* already zeroed? we're done. */ if (srcmap->type == IOMAP_HOLE || srcmap->type == IOMAP_UNWRITTEN) - return count; + return data->len; do { unsigned offset, bytes; @@ -999,11 +1033,11 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count, offset = offset_in_page(pos); bytes = min_t(loff_t, PAGE_SIZE - offset, count); - if (IS_DAX(inode)) + if (IS_DAX(data->inode)) status = iomap_dax_zero(pos, offset, bytes, iomap); else - status = iomap_zero(inode, pos, offset, bytes, iomap, - srcmap); + status = iomap_zero(data->inode, pos, offset, bytes, + iomap, srcmap); if (status < 0) return status; @@ -1021,16 +1055,22 @@ int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, const struct iomap_ops *ops) { + struct iomap_ctx data = { + .inode = inode, + .pos = pos, + .len = len, + .priv = did_zero, + .flags = IOMAP_ZERO + }; loff_t ret; - while (len > 0) { - ret = iomap_apply(inode, pos, len, IOMAP_ZERO, - ops, did_zero, iomap_zero_range_actor); + while (data.len > 0) { + ret = iomap_apply(&data, ops, iomap_zero_range_actor); if (ret <= 0) return ret; - pos += ret; - len -= ret; + data.pos += ret; + data.len -= ret; } return 0; @@ -1052,57 +1092,59 @@ iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, EXPORT_SYMBOL_GPL(iomap_truncate_page); static loff_t -iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_page_mkwrite_actor(const struct iomap_ctx *data, + struct iomap *iomap, struct iomap *srcmap) { - struct page *page = data; + struct page *page = data->priv; int ret; if (iomap->flags & IOMAP_F_BUFFER_HEAD) { - ret = __block_write_begin_int(page, pos, length, NULL, iomap); + ret = __block_write_begin_int(page, data->pos, data->len, NULL, + iomap); if (ret) return ret; - block_commit_write(page, 
0, length); + block_commit_write(page, 0, data->len); } else { WARN_ON_ONCE(!PageUptodate(page)); - iomap_page_create(inode, page); + iomap_page_create(data->inode, page); set_page_dirty(page); } - return length; + return data->len; } vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops) { struct page *page = vmf->page; - struct inode *inode = file_inode(vmf->vma->vm_file); - unsigned long length; - loff_t offset, size; + struct iomap_ctx data = { + .inode = file_inode(vmf->vma->vm_file), + .pos = page_offset(page), + .flags = IOMAP_WRITE | IOMAP_FAULT, + .priv = page, + }; ssize_t ret; + loff_t size; lock_page(page); - size = i_size_read(inode); - offset = page_offset(page); - if (page->mapping != inode->i_mapping || offset > size) { + size = i_size_read(data.inode); + if (page->mapping != data.inode->i_mapping || data.pos > size) { /* We overload EFAULT to mean page got truncated */ ret = -EFAULT; goto out_unlock; } /* page is wholly or partially inside EOF */ - if (offset > size - PAGE_SIZE) - length = offset_in_page(size); + if (data.pos > size - PAGE_SIZE) + data.len = offset_in_page(size); else - length = PAGE_SIZE; + data.len = PAGE_SIZE; - while (length > 0) { - ret = iomap_apply(inode, offset, length, - IOMAP_WRITE | IOMAP_FAULT, ops, page, - iomap_page_mkwrite_actor); + while (data.len > 0) { + ret = iomap_apply(&data, ops, iomap_page_mkwrite_actor); if (unlikely(ret <= 0)) goto out_unlock; - offset += ret; - length -= ret; + data.pos += ret; + data.len -= ret; } wait_for_stable_page(page); diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 23837926c0c5..7f1bffa262e1 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -364,24 +364,27 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length, } static loff_t -iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_dio_actor(const struct iomap_ctx *data, struct iomap 
*iomap, + struct iomap *srcmap) { - struct iomap_dio *dio = data; + struct iomap_dio *dio = data->priv; switch (iomap->type) { case IOMAP_HOLE: if (WARN_ON_ONCE(dio->flags & IOMAP_DIO_WRITE)) return -EIO; - return iomap_dio_hole_actor(length, dio); + return iomap_dio_hole_actor(data->len, dio); case IOMAP_UNWRITTEN: if (!(dio->flags & IOMAP_DIO_WRITE)) - return iomap_dio_hole_actor(length, dio); - return iomap_dio_bio_actor(inode, pos, length, dio, iomap); + return iomap_dio_hole_actor(data->len, dio); + return iomap_dio_bio_actor(data->inode, data->pos, data->len, + dio, iomap); case IOMAP_MAPPED: - return iomap_dio_bio_actor(inode, pos, length, dio, iomap); + return iomap_dio_bio_actor(data->inode, data->pos, data->len, + dio, iomap); case IOMAP_INLINE: - return iomap_dio_inline_actor(inode, pos, length, dio, iomap); + return iomap_dio_inline_actor(data->inode, data->pos, data->len, + dio, iomap); default: WARN_ON_ONCE(1); return -EIO; @@ -404,16 +407,19 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, { struct address_space *mapping = iocb->ki_filp->f_mapping; struct inode *inode = file_inode(iocb->ki_filp); - size_t count = iov_iter_count(iter); - loff_t pos = iocb->ki_pos; - loff_t end = iocb->ki_pos + count - 1, ret = 0; - unsigned int flags = IOMAP_DIRECT; + struct iomap_ctx data = { + .inode = inode, + .pos = iocb->ki_pos, + .len = iov_iter_count(iter), + .flags = IOMAP_DIRECT + }; + loff_t end = data.pos + data.len - 1, ret = 0; struct blk_plug plug; struct iomap_dio *dio; lockdep_assert_held(&inode->i_rwsem); - if (!count) + if (!data.len) return 0; if (WARN_ON(is_sync_kiocb(iocb) && !wait_for_completion)) @@ -436,14 +442,16 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, dio->submit.cookie = BLK_QC_T_NONE; dio->submit.last_queue = NULL; + data.priv = dio; + if (iov_iter_rw(iter) == READ) { - if (pos >= dio->i_size) + if (data.pos >= dio->i_size) goto out_free_dio; if (iter_is_iovec(iter)) dio->flags |= IOMAP_DIO_DIRTY; } else { - 
flags |= IOMAP_WRITE; + data.flags |= IOMAP_WRITE; dio->flags |= IOMAP_DIO_WRITE; /* for data sync or sync, we need sync completion processing */ @@ -461,14 +469,14 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, } if (iocb->ki_flags & IOCB_NOWAIT) { - if (filemap_range_has_page(mapping, pos, end)) { + if (filemap_range_has_page(mapping, data.pos, end)) { ret = -EAGAIN; goto out_free_dio; } - flags |= IOMAP_NOWAIT; + data.flags |= IOMAP_NOWAIT; } - ret = filemap_write_and_wait_range(mapping, pos, end); + ret = filemap_write_and_wait_range(mapping, data.pos, end); if (ret) goto out_free_dio; @@ -479,7 +487,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, * pretty crazy thing to do, so we don't support it 100%. */ ret = invalidate_inode_pages2_range(mapping, - pos >> PAGE_SHIFT, end >> PAGE_SHIFT); + data.pos >> PAGE_SHIFT, end >> PAGE_SHIFT); if (ret) dio_warn_stale_pagecache(iocb->ki_filp); ret = 0; @@ -495,8 +503,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, blk_start_plug(&plug); do { - ret = iomap_apply(inode, pos, count, flags, ops, dio, - iomap_dio_actor); + ret = iomap_apply(&data, ops, iomap_dio_actor); if (ret <= 0) { /* magic error code to fall back to buffered I/O */ if (ret == -ENOTBLK) { @@ -505,18 +512,18 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, } break; } - pos += ret; + data.pos += ret; - if (iov_iter_rw(iter) == READ && pos >= dio->i_size) { + if (iov_iter_rw(iter) == READ && data.pos >= dio->i_size) { /* * We only report that we've read data up to i_size. * Revert iter to a state corresponding to that as * some callers (such as splice code) rely on it. 
*/ - iov_iter_revert(iter, pos - dio->i_size); + iov_iter_revert(iter, data.pos - dio->i_size); break; } - } while ((count = iov_iter_count(iter)) > 0); + } while ((data.len = iov_iter_count(iter)) > 0); blk_finish_plug(&plug); if (ret < 0) diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c index bccf305ea9ce..61cd264a5d36 100644 --- a/fs/iomap/fiemap.c +++ b/fs/iomap/fiemap.c @@ -43,20 +43,20 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi, } static loff_t -iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap, struct iomap *srcmap) +iomap_fiemap_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - struct fiemap_ctx *ctx = data; - loff_t ret = length; + struct fiemap_ctx *ctx = data->priv; + loff_t ret = data->len; if (iomap->type == IOMAP_HOLE) - return length; + return data->len; ret = iomap_to_fiemap(ctx->fi, &ctx->prev, 0); ctx->prev = *iomap; switch (ret) { case 0: /* success */ - return length; + return data->len; case 1: /* extent array full */ return 0; default: @@ -68,6 +68,13 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi, loff_t start, loff_t len, const struct iomap_ops *ops) { struct fiemap_ctx ctx; + struct iomap_ctx data = { + .inode = inode, + .pos = start, + .len = len, + .flags = IOMAP_REPORT, + .priv = &ctx + }; loff_t ret; memset(&ctx, 0, sizeof(ctx)); @@ -84,9 +91,8 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi, return ret; } - while (len > 0) { - ret = iomap_apply(inode, start, len, IOMAP_REPORT, ops, &ctx, - iomap_fiemap_actor); + while (data.len > 0) { + ret = iomap_apply(&data, ops, iomap_fiemap_actor); /* inode with no (attribute) mapping will give ENOENT */ if (ret == -ENOENT) break; @@ -95,8 +101,8 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi, if (ret == 0) break; - start += ret; - len -= ret; + data.pos += ret; + data.len -= ret; } if (ctx.prev.type != IOMAP_HOLE) { @@ 
-110,13 +116,14 @@ int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fi, EXPORT_SYMBOL_GPL(iomap_fiemap); static loff_t -iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_bmap_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { - sector_t *bno = data, addr; + sector_t *bno = data->priv, addr; if (iomap->type == IOMAP_MAPPED) { - addr = (pos - iomap->offset + iomap->addr) >> inode->i_blkbits; + addr = (data->pos - iomap->offset + iomap->addr) >> + data->inode->i_blkbits; if (addr > INT_MAX) WARN(1, "would truncate bmap result\n"); else @@ -131,16 +138,19 @@ iomap_bmap(struct address_space *mapping, sector_t bno, const struct iomap_ops *ops) { struct inode *inode = mapping->host; - loff_t pos = bno << inode->i_blkbits; - unsigned blocksize = i_blocksize(inode); + struct iomap_ctx data = { + .inode = inode, + .pos = bno << inode->i_blkbits, + .len = i_blocksize(inode), + .priv = &bno + }; int ret; if (filemap_write_and_wait(mapping)) return 0; bno = 0; - ret = iomap_apply(inode, pos, blocksize, 0, ops, &bno, - iomap_bmap_actor); + ret = iomap_apply(&data, ops, iomap_bmap_actor); if (ret) return 0; return bno; diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c index 89f61d93c0bc..5501be02557a 100644 --- a/fs/iomap/seek.c +++ b/fs/iomap/seek.c @@ -118,21 +118,23 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length, static loff_t -iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_seek_hole_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { + loff_t offset = data->pos; + switch (iomap->type) { case IOMAP_UNWRITTEN: - offset = page_cache_seek_hole_data(inode, offset, length, - SEEK_HOLE); + offset = page_cache_seek_hole_data(data->inode, offset, + data->len, SEEK_HOLE); if (offset < 0) - return length; + return 
data->len; /* fall through */ case IOMAP_HOLE: - *(loff_t *)data = offset; + *(loff_t *)data->priv = offset; return 0; default: - return length; + return data->len; } } @@ -140,23 +142,28 @@ loff_t iomap_seek_hole(struct inode *inode, loff_t offset, const struct iomap_ops *ops) { loff_t size = i_size_read(inode); - loff_t length = size - offset; + struct iomap_ctx data = { + .inode = inode, + .len = size - offset, + .priv = &offset, + .flags = IOMAP_REPORT + }; loff_t ret; /* Nothing to be found before or beyond the end of the file. */ if (offset < 0 || offset >= size) return -ENXIO; - while (length > 0) { - ret = iomap_apply(inode, offset, length, IOMAP_REPORT, ops, - &offset, iomap_seek_hole_actor); + while (data.len > 0) { + data.pos = offset; + ret = iomap_apply(&data, ops, iomap_seek_hole_actor); if (ret < 0) return ret; if (ret == 0) break; offset += ret; - length -= ret; + data.len -= ret; } return offset; @@ -164,20 +171,22 @@ iomap_seek_hole(struct inode *inode, loff_t offset, const struct iomap_ops *ops) EXPORT_SYMBOL_GPL(iomap_seek_hole); static loff_t -iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length, - void *data, struct iomap *iomap, struct iomap *srcmap) +iomap_seek_data_actor(const struct iomap_ctx *data, struct iomap *iomap, + struct iomap *srcmap) { + loff_t offset = data->pos; + switch (iomap->type) { case IOMAP_HOLE: - return length; + return data->len; case IOMAP_UNWRITTEN: - offset = page_cache_seek_hole_data(inode, offset, length, - SEEK_DATA); + offset = page_cache_seek_hole_data(data->inode, offset, + data->len, SEEK_DATA); if (offset < 0) - return length; + return data->len; /*FALLTHRU*/ default: - *(loff_t *)data = offset; + *(loff_t *)data->priv = offset; return 0; } } @@ -186,26 +195,31 @@ loff_t iomap_seek_data(struct inode *inode, loff_t offset, const struct iomap_ops *ops) { loff_t size = i_size_read(inode); - loff_t length = size - offset; + struct iomap_ctx data = { + .inode = inode, + .len = size - offset, + 
.priv = &offset, + .flags = IOMAP_REPORT + }; loff_t ret; /* Nothing to be found before or beyond the end of the file. */ if (offset < 0 || offset >= size) return -ENXIO; - while (length > 0) { - ret = iomap_apply(inode, offset, length, IOMAP_REPORT, ops, - &offset, iomap_seek_data_actor); + while (data.len > 0) { + data.pos = offset; + ret = iomap_apply(&data, ops, iomap_seek_data_actor); if (ret < 0) return ret; if (ret == 0) break; offset += ret; - length -= ret; + data.len -= ret; } - if (length <= 0) + if (data.len <= 0) return -ENXIO; return offset; } diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c index a648dbf6991e..aae04b40a3b7 100644 --- a/fs/iomap/swapfile.c +++ b/fs/iomap/swapfile.c @@ -75,11 +75,10 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi) * swap only cares about contiguous page-aligned physical extents and makes no * distinction between written and unwritten extents. */ -static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos, - loff_t count, void *data, struct iomap *iomap, - struct iomap *srcmap) +static loff_t iomap_swapfile_activate_actor(const struct iomap_ctx *data, + struct iomap *iomap, struct iomap *srcmap) { - struct iomap_swapfile_info *isi = data; + struct iomap_swapfile_info *isi = data->priv; int error; switch (iomap->type) { @@ -125,7 +124,7 @@ static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos, return error; memcpy(&isi->iomap, iomap, sizeof(isi->iomap)); } - return count; + return data->len; } /* @@ -142,8 +141,13 @@ int iomap_swapfile_activate(struct swap_info_struct *sis, }; struct address_space *mapping = swap_file->f_mapping; struct inode *inode = mapping->host; - loff_t pos = 0; - loff_t len = ALIGN_DOWN(i_size_read(inode), PAGE_SIZE); + struct iomap_ctx data = { + .inode = inode, + .pos = 0, + .len = ALIGN_DOWN(i_size_read(inode), PAGE_SIZE), + .priv = &isi, + .flags = IOMAP_REPORT + }; loff_t ret; /* @@ -154,14 +158,13 @@ int 
iomap_swapfile_activate(struct swap_info_struct *sis, if (ret) return ret; - while (len > 0) { - ret = iomap_apply(inode, pos, len, IOMAP_REPORT, - ops, &isi, iomap_swapfile_activate_actor); + while (data.len > 0) { + ret = iomap_apply(&data, ops, iomap_swapfile_activate_actor); if (ret <= 0) return ret; - pos += ret; - len -= ret; + data.pos += ret; + data.len -= ret; } if (isi.iomap.length) { diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 8b09463dae0d..00e439aac8ea 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -145,11 +145,18 @@ struct iomap_ops { /* * Main iomap iterator function. */ -typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len, - void *data, struct iomap *iomap, struct iomap *srcmap); +struct iomap_ctx { + struct inode *inode; + loff_t pos; + loff_t len; + void *priv; + unsigned flags; +}; + +typedef loff_t (*iomap_actor_t)(const struct iomap_ctx *data, + struct iomap *iomap, struct iomap *srcmap); -loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length, - unsigned flags, const struct iomap_ops *ops, void *data, +loff_t iomap_apply(struct iomap_ctx *data, const struct iomap_ops *ops, iomap_actor_t actor); ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
From patchwork Tue Dec 17 14:39:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 11297777 From: Jens Axboe To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe Subject: [PATCH 5/6] iomap: support RWF_UNCACHED for buffered writes Date: Tue, 17 Dec 2019 07:39:47 -0700 Message-Id: <20191217143948.26380-6-axboe@kernel.dk> X-Mailer: git-send-email 2.24.1 In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk> References: <20191217143948.26380-1-axboe@kernel.dk> MIME-Version: 1.0 Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org
This adds support for RWF_UNCACHED for file systems using iomap to perform buffered writes. We use the generic infrastructure for this, by tracking pages we created and calling write_drop_cached_pages() to issue writeback and prune those pages. Signed-off-by: Jens Axboe --- fs/iomap/apply.c | 35 +++++++++++++++++++++++++++++++++++ fs/iomap/buffered-io.c | 28 ++++++++++++++++++++++++---- fs/iomap/trace.h | 4 +++- include/linux/iomap.h | 5 +++++ 4 files changed, 67 insertions(+), 5 deletions(-) diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c index 792079403a22..687e86945b27 100644 --- a/fs/iomap/apply.c +++ b/fs/iomap/apply.c @@ -92,5 +92,40 @@ iomap_apply(struct iomap_ctx *data, const struct iomap_ops *ops, data->flags, &iomap); } + if (written <= 0) + goto out; + + /* + * If this is an uncached write, then we need to write and sync this + * range of data. This is only true for a buffered write, not for + * O_DIRECT.
+ */ + if ((data->flags & (IOMAP_WRITE|IOMAP_DIRECT|IOMAP_UNCACHED)) == + (IOMAP_WRITE|IOMAP_UNCACHED)) { + struct address_space *mapping = data->inode->i_mapping; + + end = data->pos + written; + ret = filemap_write_and_wait_range(mapping, data->pos, end); + if (ret) + goto out; + + /* + * No pages were created for this range, we're done. We only + * invalidate the range if no pages were created for the + * entire range. + */ + if (!(iomap.flags & IOMAP_F_PAGE_CREATE)) + goto out; + + /* + * Try to invalidate cache pages for the range we just wrote. + * We don't care if invalidation fails as the write has still + * worked and leaving clean uptodate pages in the page cache + * isn't a corruption vector for uncached IO. + */ + invalidate_inode_pages2_range(mapping, + data->pos >> PAGE_SHIFT, end >> PAGE_SHIFT); + } +out: return written ? written : ret; } diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 7f8300bce767..328afeba950f 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -582,6 +582,7 @@ EXPORT_SYMBOL_GPL(iomap_migrate_page); enum { IOMAP_WRITE_F_UNSHARE = (1 << 0), + IOMAP_WRITE_F_UNCACHED = (1 << 1), }; static void @@ -659,6 +660,7 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, struct page **pagep, struct iomap *iomap, struct iomap *srcmap) { const struct iomap_page_ops *page_ops = iomap->page_ops; + unsigned aop_flags; struct page *page; int status = 0; @@ -675,8 +677,11 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, return status; } + aop_flags = AOP_FLAG_NOFS; + if (flags & IOMAP_WRITE_F_UNCACHED) + aop_flags |= AOP_FLAG_UNCACHED; page = grab_cache_page_write_begin(inode->i_mapping, pos >> PAGE_SHIFT, - AOP_FLAG_NOFS); + aop_flags); if (!page) { status = -ENOMEM; goto out_no_page; @@ -820,9 +825,13 @@ iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap, struct iov_iter *i = data->priv; loff_t length = data->len; loff_t pos = 
data->pos; + unsigned flags = 0; long status = 0; ssize_t written = 0; + if (data->flags & IOMAP_UNCACHED) + flags |= IOMAP_WRITE_F_UNCACHED; + do { struct page *page; unsigned long offset; /* Offset into pagecache page */ @@ -851,10 +860,18 @@ iomap_write_actor(const struct iomap_ctx *data, struct iomap *iomap, break; } - status = iomap_write_begin(inode, pos, bytes, 0, &page, iomap, - srcmap); - if (unlikely(status)) +retry: + status = iomap_write_begin(inode, pos, bytes, flags, + &page, iomap, srcmap); + if (unlikely(status)) { + if (status == -ENOMEM && + (flags & IOMAP_WRITE_F_UNCACHED)) { + iomap->flags |= IOMAP_F_PAGE_CREATE; + flags &= ~IOMAP_WRITE_F_UNCACHED; + goto retry; + } break; + } if (mapping_writably_mapped(inode->i_mapping)) flush_dcache_page(page); @@ -907,6 +924,9 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *iter, }; loff_t ret = 0, written = 0; + if (iocb->ki_flags & IOCB_UNCACHED) + data.flags |= IOMAP_UNCACHED; + while (iov_iter_count(iter)) { data.len = iov_iter_count(iter); ret = iomap_apply(&data, ops, iomap_write_actor); diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h index 6dc227b8c47e..63c771e3eef5 100644 --- a/fs/iomap/trace.h +++ b/fs/iomap/trace.h @@ -93,7 +93,8 @@ DEFINE_PAGE_EVENT(iomap_invalidatepage); { IOMAP_REPORT, "REPORT" }, \ { IOMAP_FAULT, "FAULT" }, \ { IOMAP_DIRECT, "DIRECT" }, \ - { IOMAP_NOWAIT, "NOWAIT" } + { IOMAP_NOWAIT, "NOWAIT" }, \ + { IOMAP_UNCACHED, "UNCACHED" } #define IOMAP_F_FLAGS_STRINGS \ { IOMAP_F_NEW, "NEW" }, \ @@ -101,6 +102,7 @@ DEFINE_PAGE_EVENT(iomap_invalidatepage); { IOMAP_F_SHARED, "SHARED" }, \ { IOMAP_F_MERGED, "MERGED" }, \ { IOMAP_F_BUFFER_HEAD, "BH" }, \ + { IOMAP_F_PAGE_CREATE, "PAGE_CREATE" }, \ { IOMAP_F_SIZE_CHANGED, "SIZE_CHANGED" } DECLARE_EVENT_CLASS(iomap_class, diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 00e439aac8ea..58311b6fdfdd 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -48,12 +48,16 @@ struct vm_fault; * * 
IOMAP_F_BUFFER_HEAD indicates that the file system requires the use of
  * buffer heads for this mapping.
+ *
+ * IOMAP_F_PAGE_CREATE indicates that pages had to be allocated to satisfy
+ * this operation.
  */
 #define IOMAP_F_NEW            0x01
 #define IOMAP_F_DIRTY          0x02
 #define IOMAP_F_SHARED         0x04
 #define IOMAP_F_MERGED         0x08
 #define IOMAP_F_BUFFER_HEAD    0x10
+#define IOMAP_F_PAGE_CREATE    0x20
 
 /*
  * Flags set by the core iomap code during operations:
@@ -121,6 +125,7 @@ struct iomap_page_ops {
 #define IOMAP_FAULT            (1 << 3) /* mapping for page fault */
 #define IOMAP_DIRECT           (1 << 4) /* direct I/O */
 #define IOMAP_NOWAIT           (1 << 5) /* do not block */
+#define IOMAP_UNCACHED         (1 << 6) /* uncached IO */
 
 struct iomap_ops {
        /*

From patchwork Tue Dec 17 14:39:48 2019
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 11297771
From: Jens Axboe
To: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org
Cc: willy@infradead.org, clm@fb.com, torvalds@linux-foundation.org, david@fromorbit.com, Jens Axboe
Subject: [PATCH 6/6] xfs: don't do delayed allocations for uncached buffered writes
Date: Tue, 17 Dec 2019 07:39:48 -0700
Message-Id: <20191217143948.26380-7-axboe@kernel.dk>
X-Mailer: git-send-email 2.24.1
In-Reply-To: <20191217143948.26380-1-axboe@kernel.dk>
References: <20191217143948.26380-1-axboe@kernel.dk>
X-Mailing-List: linux-fsdevel@vger.kernel.org

This data is going to be written immediately, so don't bother trying
to do delayed allocation for it.

Suggested-by: Dave Chinner
Signed-off-by: Jens Axboe
Reviewed-by: Darrick J. Wong
---
 fs/xfs/xfs_iomap.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 28e2d1f37267..d0cd4a05d59f 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -847,8 +847,11 @@ xfs_buffered_write_iomap_begin(
        int                     allocfork = XFS_DATA_FORK;
        int                     error = 0;
 
-       /* we can't use delayed allocations when using extent size hints */
-       if (xfs_get_extsz_hint(ip))
+       /*
+        * Don't do delayed allocations when using extent size hints, or
+        * if we were asked to do uncached buffered writes.
+        */
+       if (xfs_get_extsz_hint(ip) || (flags & IOMAP_UNCACHED))
                return xfs_direct_write_iomap_begin(inode, offset, count,
                                flags, iomap, srcmap);
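
For readers following the series outside a kernel tree, the condition added in the hunk above can be modelled in plain user-space C. This is only a sketch: skip_delalloc() and its extsz_hint parameter are hypothetical stand-ins for the check in xfs_buffered_write_iomap_begin() and the result of xfs_get_extsz_hint(ip); they are not kernel APIs. The two flag values are copied from the iomap.h hunk earlier in the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the iomap.h hunk of this series. */
#define IOMAP_NOWAIT   (1 << 5) /* do not block */
#define IOMAP_UNCACHED (1 << 6) /* uncached IO */

/*
 * Hypothetical model of the patched condition (not a kernel function):
 * returns true when a buffered write should skip delayed allocation and
 * fall through to the direct-write mapping path. extsz_hint stands in
 * for the value returned by xfs_get_extsz_hint(ip); non-zero means an
 * extent size hint is set.
 */
static bool skip_delalloc(uint32_t extsz_hint, unsigned int flags)
{
	return extsz_hint != 0 || (flags & IOMAP_UNCACHED) != 0;
}
```

Under this model an extent size hint or IOMAP_UNCACHED each bypass delayed allocation on their own, while an unrelated flag such as IOMAP_NOWAIT leaves the delayed-allocation path in place.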