From patchwork Thu Sep 5 15:06:36 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Goldwyn Rodrigues
X-Patchwork-Id: 11133309
Return-Path:
Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123])
	by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7D5C614ED
	for ; Thu, 5 Sep 2019 15:07:00 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id BBFA62082C
	for ; Thu, 5 Sep 2019 15:07:00 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2390253AbfIEPG7 (ORCPT ); Thu, 5 Sep 2019 11:06:59 -0400
Received: from mx2.suse.de ([195.135.220.15]:55902 "EHLO mx1.suse.de"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1732130AbfIEPG7 (ORCPT ); Thu, 5 Sep 2019 11:06:59 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254])
	by mx1.suse.de (Postfix) with ESMTP id D60CFB61B;
	Thu, 5 Sep 2019 15:06:56 +0000 (UTC)
From: Goldwyn Rodrigues
To: linux-fsdevel@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de,
	linux-xfs@vger.kernel.org, Goldwyn Rodrigues
Subject: [PATCH 01/15] iomap: Use a srcmap for a read-modify-write I/O
Date: Thu, 5 Sep 2019 10:06:36 -0500
Message-Id: <20190905150650.21089-2-rgoldwyn@suse.de>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de>
References: <20190905150650.21089-1-rgoldwyn@suse.de>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

From: Goldwyn Rodrigues

This is a preparation patch for copy-on-write (CoW). The srcmap
identifies where the read is to be performed from. It is passed to
->iomap_begin() of the respective filesystem, which is supposed to
fill in the details for reading before the copy for CoW is performed.
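
For illustration only, the calling convention described above can be
modelled in userspace as below. This is a minimal sketch, not kernel
code: struct demo_map, the demo_* functions and the addresses 4000 and
9000 are all hypothetical stand-ins invented for this example. The only
part taken from the patch is the shape of the change: the begin callback
and the actor both receive a second map, and the caller zero-initializes
both maps before calling begin. The actor reading from srcmap is an
assumption about how a later user might consume it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins, NOT the kernel's struct iomap. */
enum demo_type { DEMO_HOLE = 0, DEMO_MAPPED };

struct demo_map {
	enum demo_type type;
	long long addr;		/* hypothetical disk address */
	long long offset;	/* file offset covered by the mapping */
	long long length;	/* length of the mapping in bytes */
};

/* Both callback types gain a trailing srcmap argument, as in the patch. */
typedef int (*demo_begin_t)(long long pos, long long length,
			    struct demo_map *iomap, struct demo_map *srcmap);
typedef long long (*demo_actor_t)(long long pos, long long length, void *data,
				  struct demo_map *iomap,
				  struct demo_map *srcmap);

/* Same shape as iomap_apply() after the patch: zero both maps, pass both on. */
static long long demo_apply(long long pos, long long length,
			    demo_begin_t begin, void *data, demo_actor_t actor)
{
	struct demo_map iomap = { 0 };
	struct demo_map srcmap = { 0 };
	int ret;

	assert(begin != NULL && actor != NULL);
	ret = begin(pos, length, &iomap, &srcmap);
	if (ret)
		return ret;
	return actor(pos, length, data, &iomap, &srcmap);
}

/* Hypothetical CoW filesystem: write to new blocks, read from old ones. */
static int demo_cow_begin(long long pos, long long length,
			  struct demo_map *iomap, struct demo_map *srcmap)
{
	/* Destination of the write: freshly allocated blocks (made-up address). */
	iomap->type = DEMO_MAPPED;
	iomap->addr = 9000;
	iomap->offset = pos;
	iomap->length = length;

	/* Source of the read: the existing blocks (made-up address). */
	srcmap->type = DEMO_MAPPED;
	srcmap->addr = 4000;
	srcmap->offset = pos;
	srcmap->length = length;
	return 0;
}

struct demo_result {
	long long read_addr;
	long long write_addr;
};

/* Hypothetical actor: records where it would read from and write to. */
static long long demo_rmw_actor(long long pos, long long length, void *data,
				struct demo_map *iomap,
				struct demo_map *srcmap)
{
	struct demo_result *res = data;

	(void)pos;
	res->read_addr = srcmap->addr;
	res->write_addr = iomap->addr;
	return length;
}

/* Address the actor would read from for the given range. */
long long demo_read_address(long long pos, long long length)
{
	struct demo_result res = { 0 };

	demo_apply(pos, length, demo_cow_begin, &res, demo_rmw_actor);
	return res.read_addr;
}

/* Address the actor would write to for the given range. */
long long demo_write_address(long long pos, long long length)
{
	struct demo_result res = { 0 };

	demo_apply(pos, length, demo_cow_begin, &res, demo_rmw_actor);
	return res.write_addr;
}
```

In this model a filesystem that does not do CoW would simply leave
srcmap untouched, which is why the caller zero-initializes it.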
Signed-off-by: Goldwyn Rodrigues Reviewed-by: Darrick J. Wong --- fs/dax.c | 8 +++++--- fs/ext2/inode.c | 2 +- fs/ext4/inode.c | 2 +- fs/gfs2/bmap.c | 3 ++- fs/iomap/apply.c | 5 +++-- fs/iomap/buffered-io.c | 14 +++++++------- fs/iomap/direct-io.c | 2 +- fs/iomap/fiemap.c | 4 ++-- fs/iomap/seek.c | 4 ++-- fs/iomap/swapfile.c | 3 ++- fs/xfs/xfs_iomap.c | 9 ++++++--- include/linux/iomap.h | 5 +++-- 12 files changed, 35 insertions(+), 26 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 6bf81f931de3..e961d8dc23ef 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1090,7 +1090,7 @@ EXPORT_SYMBOL_GPL(__dax_zero_page_range); static loff_t dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap) + struct iomap *iomap, struct iomap *srcmap) { struct block_device *bdev = iomap->bdev; struct dax_device *dax_dev = iomap->dax_dev; @@ -1248,6 +1248,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, unsigned long vaddr = vmf->address; loff_t pos = (loff_t)vmf->pgoff << PAGE_SHIFT; struct iomap iomap = { 0 }; + struct iomap srcmap = { 0 }; unsigned flags = IOMAP_FAULT; int error, major = 0; bool write = vmf->flags & FAULT_FLAG_WRITE; @@ -1292,7 +1293,7 @@ static vm_fault_t dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp, * the file system block size to be equal the page size, which means * that we never have to deal with more than a single extent here. 
*/ - error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap); + error = ops->iomap_begin(inode, pos, PAGE_SIZE, flags, &iomap, &srcmap); if (iomap_errp) *iomap_errp = error; if (error) { @@ -1472,6 +1473,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, struct inode *inode = mapping->host; vm_fault_t result = VM_FAULT_FALLBACK; struct iomap iomap = { 0 }; + struct iomap srcmap = { 0 }; pgoff_t max_pgoff; void *entry; loff_t pos; @@ -1546,7 +1548,7 @@ static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp, * to look up our filesystem block. */ pos = (loff_t)xas.xa_index << PAGE_SHIFT; - error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap); + error = ops->iomap_begin(inode, pos, PMD_SIZE, iomap_flags, &iomap, &srcmap); if (error) goto unlock_entry; diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index 7004ce581a32..467c13ff6b40 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -801,7 +801,7 @@ int ext2_get_block(struct inode *inode, sector_t iblock, #ifdef CONFIG_FS_DAX static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length, - unsigned flags, struct iomap *iomap) + unsigned flags, struct iomap *iomap, struct iomap *srcmap) { unsigned int blkbits = inode->i_blkbits; unsigned long first_block = offset >> blkbits; diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 420fe3deed39..918f94eff799 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3453,7 +3453,7 @@ static bool ext4_inode_datasync_dirty(struct inode *inode) } static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, - unsigned flags, struct iomap *iomap) + unsigned flags, struct iomap *iomap, struct iomap *srcmap) { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); unsigned int blkbits = inode->i_blkbits; diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index 4f8b5fd6c81f..0189262989f2 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -1164,7 +1164,8 @@ static int gfs2_iomap_begin_write(struct 
inode *inode, loff_t pos, } static int gfs2_iomap_begin(struct inode *inode, loff_t pos, loff_t length, - unsigned flags, struct iomap *iomap) + unsigned flags, struct iomap *iomap, + struct iomap *srcmap) { struct gfs2_inode *ip = GFS2_I(inode); struct metapath mp = { .mp_aheight = 1, }; diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c index 54c02aecf3cd..6cdb362fff36 100644 --- a/fs/iomap/apply.c +++ b/fs/iomap/apply.c @@ -24,6 +24,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, const struct iomap_ops *ops, void *data, iomap_actor_t actor) { struct iomap iomap = { 0 }; + struct iomap srcmap = { 0 }; loff_t written = 0, ret; /* @@ -38,7 +39,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, * expose transient stale data. If the reserve fails, we can safely * back out at this point as there is nothing to undo. */ - ret = ops->iomap_begin(inode, pos, length, flags, &iomap); + ret = ops->iomap_begin(inode, pos, length, flags, &iomap, &srcmap); if (ret) return ret; if (WARN_ON(iomap.offset > pos)) @@ -58,7 +59,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, * we can do the copy-in page by page without having to worry about * failures exposing transient data. */ - written = actor(inode, pos, length, data, &iomap); + written = actor(inode, pos, length, data, &iomap, &srcmap); /* * Now the data has been copied, commit the range we've copied. 
This diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index e25901ae3ff4..f27756c0b31c 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -205,7 +205,7 @@ iomap_read_inline_data(struct inode *inode, struct page *page, static loff_t iomap_readpage_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap) + struct iomap *iomap, struct iomap *srcmap) { struct iomap_readpage_ctx *ctx = data; struct page *page = ctx->cur_page; @@ -351,7 +351,7 @@ iomap_next_page(struct inode *inode, struct list_head *pages, loff_t pos, static loff_t iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length, - void *data, struct iomap *iomap) + void *data, struct iomap *iomap, struct iomap *srcmap) { struct iomap_readpage_ctx *ctx = data; loff_t done, ret; @@ -371,7 +371,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, loff_t length, ctx->cur_page_in_bio = false; } ret = iomap_readpage_actor(inode, pos + done, length - done, - ctx, iomap); + ctx, iomap, srcmap); } return done; @@ -736,7 +736,7 @@ iomap_write_end(struct inode *inode, loff_t pos, unsigned len, static loff_t iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap) + struct iomap *iomap, struct iomap *srcmap) { struct iov_iter *i = data; long status = 0; @@ -853,7 +853,7 @@ __iomap_read_page(struct inode *inode, loff_t offset) static loff_t iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap) + struct iomap *iomap, struct iomap *srcmap) { long status = 0; ssize_t written = 0; @@ -942,7 +942,7 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, unsigned bytes, static loff_t iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count, - void *data, struct iomap *iomap) + void *data, struct iomap *iomap, struct iomap *srcmap) { bool *did_zero = data; loff_t written = 0; @@ -1011,7 +1011,7 @@ EXPORT_SYMBOL_GPL(iomap_truncate_page); static 
loff_t iomap_page_mkwrite_actor(struct inode *inode, loff_t pos, loff_t length, - void *data, struct iomap *iomap) + void *data, struct iomap *iomap, struct iomap *srcmap) { struct page *page = data; int ret; diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 1fc28c2da279..e3ccbf7daaae 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -358,7 +358,7 @@ iomap_dio_inline_actor(struct inode *inode, loff_t pos, loff_t length, static loff_t iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, - void *data, struct iomap *iomap) + void *data, struct iomap *iomap, struct iomap *srcmap) { struct iomap_dio *dio = data; diff --git a/fs/iomap/fiemap.c b/fs/iomap/fiemap.c index f26fdd36e383..690ef2d7c6c8 100644 --- a/fs/iomap/fiemap.c +++ b/fs/iomap/fiemap.c @@ -44,7 +44,7 @@ static int iomap_to_fiemap(struct fiemap_extent_info *fi, static loff_t iomap_fiemap_actor(struct inode *inode, loff_t pos, loff_t length, void *data, - struct iomap *iomap) + struct iomap *iomap, struct iomap *srcmap) { struct fiemap_ctx *ctx = data; loff_t ret = length; @@ -111,7 +111,7 @@ EXPORT_SYMBOL_GPL(iomap_fiemap); static loff_t iomap_bmap_actor(struct inode *inode, loff_t pos, loff_t length, - void *data, struct iomap *iomap) + void *data, struct iomap *iomap, struct iomap *srcmap) { sector_t *bno = data, addr; diff --git a/fs/iomap/seek.c b/fs/iomap/seek.c index c04bad4b2b43..89f61d93c0bc 100644 --- a/fs/iomap/seek.c +++ b/fs/iomap/seek.c @@ -119,7 +119,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length, static loff_t iomap_seek_hole_actor(struct inode *inode, loff_t offset, loff_t length, - void *data, struct iomap *iomap) + void *data, struct iomap *iomap, struct iomap *srcmap) { switch (iomap->type) { case IOMAP_UNWRITTEN: @@ -165,7 +165,7 @@ EXPORT_SYMBOL_GPL(iomap_seek_hole); static loff_t iomap_seek_data_actor(struct inode *inode, loff_t offset, loff_t length, - void *data, struct iomap *iomap) + void *data, struct iomap 
*iomap, struct iomap *srcmap) { switch (iomap->type) { case IOMAP_HOLE: diff --git a/fs/iomap/swapfile.c b/fs/iomap/swapfile.c index 152a230f668d..a648dbf6991e 100644 --- a/fs/iomap/swapfile.c +++ b/fs/iomap/swapfile.c @@ -76,7 +76,8 @@ static int iomap_swapfile_add_extent(struct iomap_swapfile_info *isi) * distinction between written and unwritten extents. */ static loff_t iomap_swapfile_activate_actor(struct inode *inode, loff_t pos, - loff_t count, void *data, struct iomap *iomap) + loff_t count, void *data, struct iomap *iomap, + struct iomap *srcmap) { struct iomap_swapfile_info *isi = data; int error; diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 3a4310d7cb59..8321733c16c3 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -922,7 +922,8 @@ xfs_file_iomap_begin( loff_t offset, loff_t length, unsigned flags, - struct iomap *iomap) + struct iomap *iomap, + struct iomap *srcmap) { struct xfs_inode *ip = XFS_I(inode); struct xfs_mount *mp = ip->i_mount; @@ -1145,7 +1146,8 @@ xfs_seek_iomap_begin( loff_t offset, loff_t length, unsigned flags, - struct iomap *iomap) + struct iomap *iomap, + struct iomap *srcmap) { struct xfs_inode *ip = XFS_I(inode); struct xfs_mount *mp = ip->i_mount; @@ -1231,7 +1233,8 @@ xfs_xattr_iomap_begin( loff_t offset, loff_t length, unsigned flags, - struct iomap *iomap) + struct iomap *iomap, + struct iomap *srcmap) { struct xfs_inode *ip = XFS_I(inode); struct xfs_mount *mp = ip->i_mount; diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 7aa5d6117936..9782a79dde59 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -110,7 +110,8 @@ struct iomap_ops { * The actual length is returned in iomap->length. */ int (*iomap_begin)(struct inode *inode, loff_t pos, loff_t length, - unsigned flags, struct iomap *iomap); + unsigned flags, struct iomap *iomap, + struct iomap *srcmap); /* * Commit and/or unreserve space previous allocated using iomap_begin. 
@@ -126,7 +127,7 @@ struct iomap_ops { * Main iomap iterator function. */ typedef loff_t (*iomap_actor_t)(struct inode *inode, loff_t pos, loff_t len, - void *data, struct iomap *iomap); + void *data, struct iomap *iomap, struct iomap *srcmap); loff_t iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags, const struct iomap_ops *ops, void *data, From patchwork Thu Sep 5 15:06:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133313 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 02B4F924 for ; Thu, 5 Sep 2019 15:07:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 415A62082C for ; Thu, 5 Sep 2019 15:07:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390258AbfIEPHC (ORCPT ); Thu, 5 Sep 2019 11:07:02 -0400 Received: from mx2.suse.de ([195.135.220.15]:55962 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2390251AbfIEPHB (ORCPT ); Thu, 5 Sep 2019 11:07:01 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 06792B114; Thu, 5 Sep 2019 15:06:59 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 02/15] iomap: Read page from srcmap if IOMAP_F_COW is set Date: Thu, 5 Sep 2019 10:06:37 -0500 Message-Id: <20190905150650.21089-3-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> References: <20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: 
bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues If IOMAP_F_COW is set, read the page from the srcmap before performing a write on the page. Signed-off-by: Goldwyn Rodrigues Reviewed-by: Darrick J. Wong --- fs/iomap/buffered-io.c | 30 +++++++++++++++++++++--------- include/linux/iomap.h | 3 +++ 2 files changed, 24 insertions(+), 9 deletions(-) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index f27756c0b31c..560459df75e4 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -581,7 +581,7 @@ __iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, static int iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, - struct page **pagep, struct iomap *iomap) + struct page **pagep, struct iomap *iomap, struct iomap *srcmap) { const struct iomap_page_ops *page_ops = iomap->page_ops; pgoff_t index = pos >> PAGE_SHIFT; @@ -605,12 +605,24 @@ iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, unsigned flags, goto out_no_page; } - if (iomap->type == IOMAP_INLINE) + if (iomap->type == IOMAP_INLINE) { iomap_read_inline_data(inode, page, iomap); - else if (iomap->flags & IOMAP_F_BUFFER_HEAD) + } else if (iomap->flags & IOMAP_F_COW) { + if (WARN_ON_ONCE(iomap->flags & IOMAP_F_BUFFER_HEAD)) { + status = -EIO; + goto out_no_page; + } + if (WARN_ON_ONCE(srcmap->type == IOMAP_HOLE && + srcmap->addr != IOMAP_NULL_ADDR)) { + status = -EIO; + goto out_no_page; + } + status = __iomap_write_begin(inode, pos, len, page, srcmap); + } else if (iomap->flags & IOMAP_F_BUFFER_HEAD) { status = __block_write_begin_int(page, pos, len, NULL, iomap); - else + } else { status = __iomap_write_begin(inode, pos, len, page, iomap); + } if (unlikely(status)) goto out_unlock; @@ -772,7 +784,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t length, void *data, } status = iomap_write_begin(inode, pos, bytes, flags, &page, - iomap); + iomap, srcmap); if (unlikely(status)) break; @@ 
-871,7 +883,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t length, void *data, return PTR_ERR(rpage); status = iomap_write_begin(inode, pos, bytes, - AOP_FLAG_NOFS, &page, iomap); + AOP_FLAG_NOFS, &page, iomap, srcmap); put_page(rpage); if (unlikely(status)) return status; @@ -917,13 +929,13 @@ iomap_file_dirty(struct inode *inode, loff_t pos, loff_t len, EXPORT_SYMBOL_GPL(iomap_file_dirty); static int iomap_zero(struct inode *inode, loff_t pos, unsigned offset, - unsigned bytes, struct iomap *iomap) + unsigned bytes, struct iomap *iomap, struct iomap *srcmap) { struct page *page; int status; status = iomap_write_begin(inode, pos, bytes, AOP_FLAG_NOFS, &page, - iomap); + iomap, srcmap); if (status) return status; @@ -961,7 +973,7 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, loff_t count, if (IS_DAX(inode)) status = iomap_dax_zero(pos, offset, bytes, iomap); else - status = iomap_zero(inode, pos, offset, bytes, iomap); + status = iomap_zero(inode, pos, offset, bytes, iomap, srcmap); if (status < 0) return status; diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 9782a79dde59..7fdb09925740 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -44,6 +44,9 @@ struct vm_fault; #define IOMAP_F_MERGED 0x10 /* contains multiple blocks/extents */ #define IOMAP_F_SHARED 0x20 /* block shared with another file */ +/* Flags for CoW */ +#define IOMAP_F_COW 0x100 /* copy from srcmap before write */ + /* * Flags from 0x1000 up are for file system specific usage: */ From patchwork Thu Sep 5 15:06:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133319 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DC7CC924 for ; Thu, 5 Sep 2019 15:07:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org 
[209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 1F58E2082C for ; Thu, 5 Sep 2019 15:07:05 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S2390266AbfIEPHD (ORCPT ); Thu, 5 Sep 2019 11:07:03 -0400
Received: from mx2.suse.de ([195.135.220.15]:56008 "EHLO mx1.suse.de"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S2390255AbfIEPHC (ORCPT ); Thu, 5 Sep 2019 11:07:02 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254])
	by mx1.suse.de (Postfix) with ESMTP id 27A25ABC7;
	Thu, 5 Sep 2019 15:07:01 +0000 (UTC)
From: Goldwyn Rodrigues
To: linux-fsdevel@vger.kernel.org
Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de,
	linux-xfs@vger.kernel.org, Goldwyn Rodrigues
Subject: [PATCH 03/15] btrfs: Eliminate PagePrivate for btrfs data pages
Date: Thu, 5 Sep 2019 10:06:38 -0500
Message-Id: <20190905150650.21089-4-rgoldwyn@suse.de>
X-Mailer: git-send-email 2.16.4
In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de>
References: <20190905150650.21089-1-rgoldwyn@suse.de>
Sender: linux-btrfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

From: Goldwyn Rodrigues

Most of the code keeps working after simply eliminating the page's
private field and the related code, but there is a problem when
cloning: the extent is assumed to be uptodate. Clear the
EXTENT_UPTODATE flag for the extent so that the next read of the file
is forced to go to disk instead of the page cache.

This patch is required to make sure we don't conflict with iomap's
usage of page->private.
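
As a rough illustration of the cloning problem described above, here is
a toy userspace model. It is a sketch under stated assumptions and not
btrfs code: struct range_state and the model_* functions are
hypothetical, a single integer stands in for the contents of an extent,
and the uptodate field stands in for EXTENT_UPTODATE. It only shows why
a stale "uptodate" state makes a read return old cached contents, and
how clearing the state forces the next read to go to disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-range model, NOT a btrfs data structure. */
struct range_state {
	bool uptodate;		/* stands in for EXTENT_UPTODATE */
	int cached;		/* contents held in memory */
	int on_disk;		/* contents currently on disk */
	int disk_reads;		/* number of reads that went to disk */
};

/* A read is served from memory whenever the range is marked uptodate. */
static int model_read(struct range_state *rs)
{
	assert(rs != NULL);
	if (!rs->uptodate) {
		rs->cached = rs->on_disk;
		rs->uptodate = true;
		rs->disk_reads++;
	}
	return rs->cached;
}

/* A clone changes what the range maps to without touching the cache. */
static void model_clone(struct range_state *rs, int new_data,
			bool clear_uptodate)
{
	assert(rs != NULL);
	rs->on_disk = new_data;
	if (clear_uptodate)
		rs->uptodate = false;	/* analogous to clearing the flag */
}

/* Read once, clone new data into the range, then read again. */
int model_read_after_clone(bool clear_uptodate)
{
	struct range_state rs = {
		.uptodate = false,
		.cached = 0,
		.on_disk = 1,
		.disk_reads = 0,
	};

	model_read(&rs);		/* caches the old contents (1) */
	model_clone(&rs, 2, clear_uptodate);
	return model_read(&rs);		/* 2 only if the state was cleared */
}
```

With clear_uptodate set to false the second read still returns the old
value, which is the stale-data case the commit message warns about.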
Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/compression.c | 1 - fs/btrfs/extent_io.c | 13 ------------- fs/btrfs/extent_io.h | 2 -- fs/btrfs/file.c | 1 - fs/btrfs/free-space-cache.c | 1 - fs/btrfs/inode.c | 15 +-------------- fs/btrfs/ioctl.c | 4 ++-- fs/btrfs/relocation.c | 2 -- 8 files changed, 3 insertions(+), 36 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 60c47b417a4b..fe41fa3d2999 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -481,7 +481,6 @@ static noinline int add_ra_bio_pages(struct inode *inode, * for these bytes in the file. But, we have to make * sure they map to this compressed extent on disk. */ - set_page_extent_mapped(page); lock_extent(tree, last_offset, end); read_lock(&em_tree->lock); em = lookup_extent_mapping(em_tree, last_offset, diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 1ff438fd5bc2..27233fb6660c 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3005,15 +3005,6 @@ static void attach_extent_buffer_page(struct extent_buffer *eb, } } -void set_page_extent_mapped(struct page *page) -{ - if (!PagePrivate(page)) { - SetPagePrivate(page); - get_page(page); - set_page_private(page, EXTENT_PAGE_PRIVATE); - } -} - static struct extent_map * __get_extent_map(struct inode *inode, struct page *page, size_t pg_offset, u64 start, u64 len, get_extent_t *get_extent, @@ -3074,8 +3065,6 @@ static int __do_readpage(struct extent_io_tree *tree, size_t blocksize = inode->i_sb->s_blocksize; unsigned long this_bio_flag = 0; - set_page_extent_mapped(page); - if (!PageUptodate(page)) { if (cleancache_get_page(page) == 0) { BUG_ON(blocksize != PAGE_SIZE); @@ -3589,8 +3578,6 @@ static int __extent_writepage(struct page *page, struct writeback_control *wbc, pg_offset = 0; - set_page_extent_mapped(page); - if (!epd->extent_locked) { ret = writepage_delalloc(inode, page, wbc, start, &nr_written); if (ret == 1) diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 
401423b16976..8082774371b5 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -416,8 +416,6 @@ int extent_readpages(struct address_space *mapping, struct list_head *pages, unsigned nr_pages); int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, __u64 start, __u64 len); -void set_page_extent_mapped(struct page *page); - struct extent_buffer *alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start); struct extent_buffer *__alloc_dummy_extent_buffer(struct btrfs_fs_info *fs_info, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 58a18ed11546..4466a09f2d98 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1539,7 +1539,6 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages, * delalloc bits and dirty the pages as required. */ for (i = 0; i < num_pages; i++) { - set_page_extent_mapped(pages[i]); WARN_ON(!PageLocked(pages[i])); } diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c index 062be9dde4c6..9a0c519bd6d4 100644 --- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -395,7 +395,6 @@ static int io_ctl_prepare_pages(struct btrfs_io_ctl *io_ctl, struct inode *inode for (i = 0; i < io_ctl->num_pages; i++) { clear_page_dirty_for_io(io_ctl->pages[i]); - set_page_extent_mapped(io_ctl->pages[i]); } return 0; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index ee582a36653d..258bacefdf5f 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4932,7 +4932,6 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, wait_on_page_writeback(page); lock_extent_bits(io_tree, block_start, block_end, &cached_state); - set_page_extent_mapped(page); ordered = btrfs_lookup_ordered_extent(inode, block_start); if (ordered) { @@ -8754,13 +8753,7 @@ btrfs_readpages(struct file *file, struct address_space *mapping, static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags) { - int ret = try_release_extent_mapping(page, gfp_flags); - if (ret == 1) { - 
ClearPagePrivate(page); - set_page_private(page, 0); - put_page(page); - } - return ret; + return try_release_extent_mapping(page, gfp_flags); } static int btrfs_releasepage(struct page *page, gfp_t gfp_flags) @@ -8878,11 +8871,6 @@ static void btrfs_invalidatepage(struct page *page, unsigned int offset, } ClearPageChecked(page); - if (PagePrivate(page)) { - ClearPagePrivate(page); - set_page_private(page, 0); - put_page(page); - } } /* @@ -8961,7 +8949,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) wait_on_page_writeback(page); lock_extent_bits(io_tree, page_start, page_end, &cached_state); - set_page_extent_mapped(page); /* * we can't set the delalloc bits if there are pending ordered diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 818f7ec8bb0e..861617e3d0c9 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1355,7 +1355,6 @@ static int cluster_pages_for_defrag(struct inode *inode, for (i = 0; i < i_done; i++) { clear_page_dirty_for_io(pages[i]); ClearPageChecked(pages[i]); - set_page_extent_mapped(pages[i]); set_page_dirty(pages[i]); unlock_page(pages[i]); put_page(pages[i]); @@ -3550,6 +3549,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode, int ret; const u64 len = olen_aligned; u64 last_dest_end = destoff; + struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; ret = -ENOMEM; buf = kvmalloc(fs_info->nodesize, GFP_KERNEL); @@ -3864,6 +3864,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode, destoff, olen, no_time_update); } + clear_extent_uptodate(tree, destoff, destoff+olen, NULL); out: btrfs_free_path(path); kvfree(buf); @@ -3935,7 +3936,6 @@ static noinline int btrfs_clone_files(struct file *file, struct file *file_src, truncate_inode_pages_range(&inode->i_data, round_down(destoff, PAGE_SIZE), round_up(destoff + len, PAGE_SIZE) - 1); - return ret; } diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 7f219851fa23..612988b7eb27 100644 --- a/fs/btrfs/relocation.c +++ 
b/fs/btrfs/relocation.c @@ -3300,8 +3300,6 @@ static int relocate_file_extent_cluster(struct inode *inode, lock_extent(&BTRFS_I(inode)->io_tree, page_start, page_end); - set_page_extent_mapped(page); - if (nr < cluster->nr && page_start + offset == cluster->boundary[nr]) { set_extent_bits(&BTRFS_I(inode)->io_tree, From patchwork Thu Sep 5 15:06:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133331 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0B90414ED for ; Thu, 5 Sep 2019 15:07:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 423852070C for ; Thu, 5 Sep 2019 15:07:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390273AbfIEPHH (ORCPT ); Thu, 5 Sep 2019 11:07:07 -0400 Received: from mx2.suse.de ([195.135.220.15]:56062 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2390264AbfIEPHF (ORCPT ); Thu, 5 Sep 2019 11:07:05 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 39678AD1A; Thu, 5 Sep 2019 15:07:03 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 04/15] btrfs: Add a simple buffered iomap write Date: Thu, 5 Sep 2019 10:06:39 -0500 Message-Id: <20190905150650.21089-5-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> References: <20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: 
Goldwyn Rodrigues This one is a long patch. Most of the code is "inspired" by fs/btrfs/file.c. To keep the size small, all removals are in following patches. Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/Kconfig | 1 + fs/btrfs/Makefile | 2 +- fs/btrfs/ctree.h | 1 + fs/btrfs/file.c | 4 +- fs/btrfs/iomap.c | 383 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 388 insertions(+), 3 deletions(-) create mode 100644 fs/btrfs/iomap.c diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig index 38651fae7f21..ba87d21885ca 100644 --- a/fs/btrfs/Kconfig +++ b/fs/btrfs/Kconfig @@ -14,6 +14,7 @@ config BTRFS_FS select RAID6_PQ select XOR_BLOCKS select SRCU + select FS_IOMAP help Btrfs is a general purpose copy-on-write filesystem with extents, diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 76a843198bcb..f88e696b0698 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -11,7 +11,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \ - block-rsv.o delalloc-space.o + block-rsv.o delalloc-space.o iomap.o btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 94660063a162..9c501c7826b4 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3243,6 +3243,7 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end); loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); +size_t btrfs_buffered_iomap_write(struct kiocb *iocb, struct iov_iter *from); /* tree-defrag.c */ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 4466a09f2d98..0707db04d3cc 100644 --- 
a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1829,7 +1829,7 @@ static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) return written; pos = iocb->ki_pos; - written_buffered = btrfs_buffered_write(iocb, from); + written_buffered = btrfs_buffered_iomap_write(iocb, from); if (written_buffered < 0) { err = written_buffered; goto out; @@ -1966,7 +1966,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, if (iocb->ki_flags & IOCB_DIRECT) { num_written = __btrfs_direct_write(iocb, from); } else { - num_written = btrfs_buffered_write(iocb, from); + num_written = btrfs_buffered_iomap_write(iocb, from); if (num_written > 0) iocb->ki_pos = pos + num_written; if (clean_page) diff --git a/fs/btrfs/iomap.c b/fs/btrfs/iomap.c new file mode 100644 index 000000000000..025ccbf471bf --- /dev/null +++ b/fs/btrfs/iomap.c @@ -0,0 +1,383 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * iomap support for BTRFS + * + * Copyright (c) 2019 SUSE Linux + * Author: Goldwyn Rodrigues + */ + +#include +#include "ctree.h" +#include "btrfs_inode.h" +#include "volumes.h" +#include "disk-io.h" +#include "delalloc-space.h" + +struct btrfs_iomap { + u64 start; + u64 end; + bool nocow; + int extents_locked; + ssize_t reserved_bytes; + struct extent_changeset *data_reserved; + struct extent_state *cached_state; +}; + + +/* + * This function locks the extent and properly waits for data=ordered extents + * to finish before allowing the pages to be modified if need. 
+ * + * The return value: + * 1 - the extent is locked + * 0 - the extent is not locked, and everything is OK + * -EAGAIN - need re-prepare the pages + * the other < 0 number - Something wrong happens + */ +static noinline int +lock_and_cleanup_extent(struct btrfs_inode *inode, loff_t pos, + size_t write_bytes, + u64 *lockstart, u64 *lockend, + struct extent_state **cached_state) +{ + struct btrfs_fs_info *fs_info = inode->root->fs_info; + u64 start_pos; + u64 last_pos; + int ret = 0; + + start_pos = round_down(pos, fs_info->sectorsize); + last_pos = start_pos + + round_up(pos + write_bytes - start_pos, + fs_info->sectorsize) - 1; + + if (start_pos < inode->vfs_inode.i_size) { + struct btrfs_ordered_extent *ordered; + + lock_extent_bits(&inode->io_tree, start_pos, last_pos, + cached_state); + ordered = btrfs_lookup_ordered_range(inode, start_pos, + last_pos - start_pos + 1); + if (ordered && + ordered->file_offset + ordered->len > start_pos && + ordered->file_offset <= last_pos) { + unlock_extent_cached(&inode->io_tree, start_pos, + last_pos, cached_state); + btrfs_start_ordered_extent(&inode->vfs_inode, + ordered, 1); + btrfs_put_ordered_extent(ordered); + return -EAGAIN; + } + if (ordered) + btrfs_put_ordered_extent(ordered); + + *lockstart = start_pos; + *lockend = last_pos; + ret = 1; + } + + return ret; +} + +static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos, + size_t *write_bytes) +{ + struct btrfs_fs_info *fs_info = inode->root->fs_info; + struct btrfs_root *root = inode->root; + struct btrfs_ordered_extent *ordered; + u64 lockstart, lockend; + u64 num_bytes; + int ret; + + ret = btrfs_start_write_no_snapshotting(root); + if (!ret) + return -ENOSPC; + + lockstart = round_down(pos, fs_info->sectorsize); + lockend = round_up(pos + *write_bytes, + fs_info->sectorsize) - 1; + + while (1) { + lock_extent(&inode->io_tree, lockstart, lockend); + ordered = btrfs_lookup_ordered_range(inode, lockstart, + lockend - lockstart + 1); + if 
(!ordered) { + break; + } + unlock_extent(&inode->io_tree, lockstart, lockend); + btrfs_start_ordered_extent(&inode->vfs_inode, ordered, 1); + btrfs_put_ordered_extent(ordered); + } + + num_bytes = lockend - lockstart + 1; + ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes, + NULL, NULL, NULL); + if (ret <= 0) { + ret = 0; + btrfs_end_write_no_snapshotting(root); + } else { + *write_bytes = min_t(size_t, *write_bytes , + num_bytes - pos + lockstart); + } + + unlock_extent(&inode->io_tree, lockstart, lockend); + + return ret; +} + +static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode, + const u64 start, + const u64 len, + struct extent_state **cached_state) +{ + u64 search_start = start; + const u64 end = start + len - 1; + + while (search_start < end) { + const u64 search_len = end - search_start + 1; + struct extent_map *em; + u64 em_len; + int ret = 0; + + em = btrfs_get_extent(inode, NULL, 0, search_start, + search_len, 0); + if (IS_ERR(em)) + return PTR_ERR(em); + + if (em->block_start != EXTENT_MAP_HOLE) + goto next; + + em_len = em->len; + if (em->start < search_start) + em_len -= search_start - em->start; + if (em_len > search_len) + em_len = search_len; + + ret = set_extent_bit(&inode->io_tree, search_start, + search_start + em_len - 1, + EXTENT_DELALLOC_NEW, + NULL, cached_state, GFP_NOFS); +next: + search_start = extent_map_end(em); + free_extent_map(em); + if (ret) + return ret; + } + return 0; +} + +static void btrfs_buffered_page_done(struct inode *inode, loff_t pos, + unsigned copied, struct page *page, + struct iomap *iomap) +{ + if (!page) + return; + SetPageUptodate(page); + ClearPageChecked(page); + set_page_dirty(page); + get_page(page); +} + + +static const struct iomap_page_ops btrfs_buffered_page_ops = { + .page_done = btrfs_buffered_page_done, +}; + + +static int btrfs_buffered_iomap_begin(struct inode *inode, loff_t pos, + loff_t length, unsigned flags, struct iomap *iomap, + struct iomap *srcmap) +{ + int ret; + 
size_t write_bytes = length; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + size_t sector_offset = pos & (fs_info->sectorsize - 1); + struct btrfs_iomap *bi; + + bi = kzalloc(sizeof(struct btrfs_iomap), GFP_NOFS); + if (!bi) + return -ENOMEM; + + bi->reserved_bytes = round_up(write_bytes + sector_offset, + fs_info->sectorsize); + + /* Reserve data space */ + ret = btrfs_check_data_free_space(inode, &bi->data_reserved, pos, + write_bytes); + if (ret < 0) { + /* + * Space allocation failed. Let's check if we can + * continue I/O without allocations + */ + if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | + BTRFS_INODE_PREALLOC)) && + check_can_nocow(BTRFS_I(inode), pos, + &write_bytes) > 0) { + bi->nocow = true; + /* + * our prealloc extent may be smaller than + * write_bytes, so scale down. + */ + bi->reserved_bytes = round_up(write_bytes + + sector_offset, + fs_info->sectorsize); + } else { + goto error; + } + } + + WARN_ON(bi->reserved_bytes == 0); + + /* We have the data space allocated, reserve the metadata now */ + ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), + bi->reserved_bytes); + if (ret) { + struct btrfs_root *root = BTRFS_I(inode)->root; + if (!bi->nocow) + btrfs_free_reserved_data_space(inode, + bi->data_reserved, pos, + write_bytes); + else + btrfs_end_write_no_snapshotting(root); + goto error; + } + + do { + ret = lock_and_cleanup_extent( + BTRFS_I(inode), pos, write_bytes, &bi->start, + &bi->end, &bi->cached_state); + } while (ret == -EAGAIN); + + if (ret < 0) { + btrfs_delalloc_release_extents(BTRFS_I(inode), + bi->reserved_bytes, true); + goto release; + } else { + bi->extents_locked = ret; + } + iomap->private = bi; + iomap->length = round_up(write_bytes, fs_info->sectorsize); + iomap->offset = round_down(pos, fs_info->sectorsize); + iomap->addr = IOMAP_NULL_ADDR; + iomap->type = IOMAP_DELALLOC; + iomap->bdev = fs_info->fs_devices->latest_bdev; + iomap->page_ops = &btrfs_buffered_page_ops; + return 0; +release: + if 
(bi->extents_locked) + unlock_extent_cached(&BTRFS_I(inode)->io_tree, bi->start, + bi->end, &bi->cached_state); + if (bi->nocow) { + struct btrfs_root *root = BTRFS_I(inode)->root; + btrfs_end_write_no_snapshotting(root); + btrfs_delalloc_release_metadata(BTRFS_I(inode), + bi->reserved_bytes, true); + } else { + btrfs_delalloc_release_space(inode, bi->data_reserved, + round_down(pos, fs_info->sectorsize), + bi->reserved_bytes, true); + } + extent_changeset_free(bi->data_reserved); + +error: + kfree(bi); + return ret; +} + +static int btrfs_buffered_iomap_end(struct inode *inode, loff_t pos, + loff_t length, ssize_t written, unsigned flags, + struct iomap *iomap) +{ + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_iomap *bi = iomap->private; + ssize_t release_bytes = round_down(bi->reserved_bytes - written, + 1 << fs_info->sb->s_blocksize_bits); + unsigned int extra_bits = 0; + u64 start_pos = pos & ~((u64) fs_info->sectorsize - 1); + u64 num_bytes = round_up(written + pos - start_pos, + fs_info->sectorsize); + u64 end_of_last_block = start_pos + num_bytes - 1; + int ret = 0; + + if (release_bytes > 0) { + if (bi->nocow) { + btrfs_delalloc_release_metadata(BTRFS_I(inode), + release_bytes, true); + } else { + u64 __pos = round_down(pos + written, fs_info->sectorsize); + btrfs_delalloc_release_space(inode, bi->data_reserved, + __pos, release_bytes, true); + } + } + + /* + * The pages may have already been dirty, clear out old accounting so + * we can set things up properly + */ + clear_extent_bit(&BTRFS_I(inode)->io_tree, start_pos, end_of_last_block, + EXTENT_DIRTY | EXTENT_DELALLOC | EXTENT_DO_ACCOUNTING | + EXTENT_DEFRAG, 0, 0, &bi->cached_state); + + if (!btrfs_is_free_space_inode(BTRFS_I(inode))) { + if (start_pos >= i_size_read(inode) && + !(BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC)) { + /* + * There can't be any extents following eof in this case + * so just set the delalloc new bit for the range + * directly. 
+ */ + extra_bits |= EXTENT_DELALLOC_NEW; + } else { + ret = btrfs_find_new_delalloc_bytes(BTRFS_I(inode), + start_pos, num_bytes, + &bi->cached_state); + if (ret) + goto unlock; + } + } + + ret = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block, + extra_bits, &bi->cached_state, 0); +unlock: + if (bi->extents_locked) + unlock_extent_cached(&BTRFS_I(inode)->io_tree, + bi->start, bi->end, &bi->cached_state); + + if (bi->nocow) { + struct btrfs_root *root = BTRFS_I(inode)->root; + btrfs_end_write_no_snapshotting(root); + if (written > 0) { + u64 start = round_down(pos, fs_info->sectorsize); + u64 end = round_up(pos + written, fs_info->sectorsize) - 1; + set_extent_bit(&BTRFS_I(inode)->io_tree, start, end, + EXTENT_NORESERVE, NULL, NULL, GFP_NOFS); + } + + } + btrfs_delalloc_release_extents(BTRFS_I(inode), bi->reserved_bytes, + true); + + if (written < fs_info->nodesize) + btrfs_btree_balance_dirty(fs_info); + + extent_changeset_free(bi->data_reserved); + kfree(bi); + return ret; +} + +static const struct iomap_ops btrfs_buffered_iomap_ops = { + .iomap_begin = btrfs_buffered_iomap_begin, + .iomap_end = btrfs_buffered_iomap_end, +}; + +size_t btrfs_buffered_iomap_write(struct kiocb *iocb, struct iov_iter *from) +{ + ssize_t written; + struct inode *inode = file_inode(iocb->ki_filp); + written = iomap_file_buffered_write(iocb, from, &btrfs_buffered_iomap_ops); + if (written > 0) + iocb->ki_pos += written; + if (iocb->ki_pos > i_size_read(inode)) + i_size_write(inode, iocb->ki_pos); + return written; +} + From patchwork Thu Sep 5 15:06:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133327 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A753714ED for ; Thu, 5 Sep 2019 15:07:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org 
[209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DC0C22082C for ; Thu, 5 Sep 2019 15:07:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390278AbfIEPHH (ORCPT ); Thu, 5 Sep 2019 11:07:07 -0400 Received: from mx2.suse.de ([195.135.220.15]:56106 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2390255AbfIEPHG (ORCPT ); Thu, 5 Sep 2019 11:07:06 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 44F68B114; Thu, 5 Sep 2019 15:07:05 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 05/15] btrfs: Add CoW in iomap based writes Date: Thu, 5 Sep 2019 10:06:40 -0500 Message-Id: <20190905150650.21089-6-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> References: <20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Set iomap->flags to IOMAP_F_COW and fill up the source map in case the I/O is not page aligned. 
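As a rough illustration of the alignment test described above, here is a userspace sketch rather than kernel code. EXAMPLE_PAGE_SIZE and needs_cow_srcmap() are hypothetical names assumed for this example, and a 4 KiB page is assumed; the kernel code uses PAGE_SIZE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel code uses PAGE_SIZE instead. */
#define EXAMPLE_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper mirroring the check added to
 * btrfs_buffered_iomap_begin(): the source map is needed when either
 * the start or the end of the write is not on a page boundary.
 */
static bool needs_cow_srcmap(uint64_t pos, uint64_t write_bytes)
{
	uint64_t end = pos + write_bytes;

	return (pos & (EXAMPLE_PAGE_SIZE - 1)) ||
	       (end & (EXAMPLE_PAGE_SIZE - 1));
}
```

Under these assumptions, a 4096-byte write at offset 0 would not need the source map, while a 100-byte write at offset 0 or a 4096-byte write at offset 512 would.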
Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/iomap.c | 42 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/iomap.c b/fs/btrfs/iomap.c index 025ccbf471bf..f8fa34105838 100644 --- a/fs/btrfs/iomap.c +++ b/fs/btrfs/iomap.c @@ -165,6 +165,35 @@ static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode, return 0; } +/* + * get_iomap: Get the block map and fill the iomap structure + * @pos: file position + * @length: I/O length + * @iomap: The iomap structure to fill + */ + +static int get_iomap(struct inode *inode, loff_t pos, loff_t length, + struct iomap *iomap) +{ + struct extent_map *em; + iomap->addr = IOMAP_NULL_ADDR; + em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, pos, length, 0); + if (IS_ERR(em)) + return PTR_ERR(em); + /* XXX Do we need to check for em->flags here? */ + if (em->block_start == EXTENT_MAP_HOLE) { + iomap->type = IOMAP_HOLE; + } else { + iomap->addr = em->block_start; + iomap->type = IOMAP_MAPPED; + } + iomap->offset = em->start; + iomap->bdev = em->bdev; + iomap->length = em->len; + free_extent_map(em); + return 0; +} + static void btrfs_buffered_page_done(struct inode *inode, loff_t pos, unsigned copied, struct page *page, struct iomap *iomap) @@ -190,6 +219,7 @@ static int btrfs_buffered_iomap_begin(struct inode *inode, loff_t pos, int ret; size_t write_bytes = length; struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + size_t end; size_t sector_offset = pos & (fs_info->sectorsize - 1); struct btrfs_iomap *bi; @@ -257,8 +287,18 @@ static int btrfs_buffered_iomap_begin(struct inode *inode, loff_t pos, iomap->private = bi; iomap->length = round_up(write_bytes, fs_info->sectorsize); iomap->offset = round_down(pos, fs_info->sectorsize); + end = pos + write_bytes; + /* Set IOMAP_F_COW if start/end is not page aligned */ + if (((pos & (PAGE_SIZE - 1)) || (end & (PAGE_SIZE - 1)))) { + iomap->flags = IOMAP_F_COW; + ret = get_iomap(inode, pos, length, srcmap); + if (ret < 
0) + goto release; + } else { + iomap->type = IOMAP_DELALLOC; + } + iomap->addr = IOMAP_NULL_ADDR; - iomap->type = IOMAP_DELALLOC; iomap->bdev = fs_info->fs_devices->latest_bdev; iomap->page_ops = &btrfs_buffered_page_ops; return 0; From patchwork Thu Sep 5 15:06:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133341 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 58DBC14ED for ; Thu, 5 Sep 2019 15:07:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 85A802082E for ; Thu, 5 Sep 2019 15:07:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390285AbfIEPHM (ORCPT ); Thu, 5 Sep 2019 11:07:12 -0400 Received: from mx2.suse.de ([195.135.220.15]:56356 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731451AbfIEPHK (ORCPT ); Thu, 5 Sep 2019 11:07:10 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 2A434B68F; Thu, 5 Sep 2019 15:07:08 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 06/15] btrfs: remove buffered write code made unnecessary Date: Thu, 5 Sep 2019 10:06:41 -0500 Message-Id: <20190905150650.21089-7-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> References: <20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Better done in a separate patch to keep the main patch 
short(er) Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/file.c | 463 -------------------------------------------------------- 1 file changed, 463 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 0707db04d3cc..f7087e28ac08 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -390,79 +390,6 @@ int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info) return 0; } -/* simple helper to fault in pages and copy. This should go away - * and be replaced with calls into generic code. - */ -static noinline int btrfs_copy_from_user(loff_t pos, size_t write_bytes, - struct page **prepared_pages, - struct iov_iter *i) -{ - size_t copied = 0; - size_t total_copied = 0; - int pg = 0; - int offset = offset_in_page(pos); - - while (write_bytes > 0) { - size_t count = min_t(size_t, - PAGE_SIZE - offset, write_bytes); - struct page *page = prepared_pages[pg]; - /* - * Copy data from userspace to the current page - */ - copied = iov_iter_copy_from_user_atomic(page, i, offset, count); - - /* Flush processor's dcache for this page */ - flush_dcache_page(page); - - /* - * if we get a partial write, we can end up with - * partially up to date pages. These add - * a lot of complexity, so make sure they don't - * happen by forcing this copy to be retried. - * - * The rest of the btrfs_file_write code will fall - * back to page at a time copies after we return 0. 
- */ - if (!PageUptodate(page) && copied < count) - copied = 0; - - iov_iter_advance(i, copied); - write_bytes -= copied; - total_copied += copied; - - /* Return to btrfs_file_write_iter to fault page */ - if (unlikely(copied == 0)) - break; - - if (copied < PAGE_SIZE - offset) { - offset += copied; - } else { - pg++; - offset = 0; - } - } - return total_copied; -} - -/* - * unlocks pages after btrfs_file_write is done with them - */ -static void btrfs_drop_pages(struct page **pages, size_t num_pages) -{ - size_t i; - for (i = 0; i < num_pages; i++) { - /* page checked is some magic around finding pages that - * have been modified without going through btrfs_set_page_dirty - * clear it here. There should be no need to mark the pages - * accessed as prepare_pages should have marked them accessed - * in prepare_pages via find_or_create_page() - */ - ClearPageChecked(pages[i]); - unlock_page(pages[i]); - put_page(pages[i]); - } -} - static int btrfs_find_new_delalloc_bytes(struct btrfs_inode *inode, const u64 start, const u64 len, @@ -1387,164 +1314,6 @@ int btrfs_mark_extent_written(struct btrfs_trans_handle *trans, return 0; } -/* - * on error we return an unlocked page and the error value - * on success we return a locked page and 0 - */ -static int prepare_uptodate_page(struct inode *inode, - struct page *page, u64 pos, - bool force_uptodate) -{ - int ret = 0; - - if (((pos & (PAGE_SIZE - 1)) || force_uptodate) && - !PageUptodate(page)) { - ret = btrfs_readpage(NULL, page); - if (ret) - return ret; - lock_page(page); - if (!PageUptodate(page)) { - unlock_page(page); - return -EIO; - } - if (page->mapping != inode->i_mapping) { - unlock_page(page); - return -EAGAIN; - } - } - return 0; -} - -/* - * this just gets pages into the page cache and locks them down. 
- */ -static noinline int prepare_pages(struct inode *inode, struct page **pages, - size_t num_pages, loff_t pos, - size_t write_bytes, bool force_uptodate) -{ - int i; - unsigned long index = pos >> PAGE_SHIFT; - gfp_t mask = btrfs_alloc_write_mask(inode->i_mapping); - int err = 0; - int faili; - - for (i = 0; i < num_pages; i++) { -again: - pages[i] = find_or_create_page(inode->i_mapping, index + i, - mask | __GFP_WRITE); - if (!pages[i]) { - faili = i - 1; - err = -ENOMEM; - goto fail; - } - - if (i == 0) - err = prepare_uptodate_page(inode, pages[i], pos, - force_uptodate); - if (!err && i == num_pages - 1) - err = prepare_uptodate_page(inode, pages[i], - pos + write_bytes, false); - if (err) { - put_page(pages[i]); - if (err == -EAGAIN) { - err = 0; - goto again; - } - faili = i - 1; - goto fail; - } - wait_on_page_writeback(pages[i]); - } - - return 0; -fail: - while (faili >= 0) { - unlock_page(pages[faili]); - put_page(pages[faili]); - faili--; - } - return err; - -} - -/* - * This function locks the extent and properly waits for data=ordered extents - * to finish before allowing the pages to be modified if need. 
- * - * The return value: - * 1 - the extent is locked - * 0 - the extent is not locked, and everything is OK - * -EAGAIN - need re-prepare the pages - * the other < 0 number - Something wrong happens - */ -static noinline int -lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages, - size_t num_pages, loff_t pos, - size_t write_bytes, - u64 *lockstart, u64 *lockend, - struct extent_state **cached_state) -{ - struct btrfs_fs_info *fs_info = inode->root->fs_info; - u64 start_pos; - u64 last_pos; - int i; - int ret = 0; - - start_pos = round_down(pos, fs_info->sectorsize); - last_pos = start_pos - + round_up(pos + write_bytes - start_pos, - fs_info->sectorsize) - 1; - - if (start_pos < inode->vfs_inode.i_size) { - struct btrfs_ordered_extent *ordered; - - lock_extent_bits(&inode->io_tree, start_pos, last_pos, - cached_state); - ordered = btrfs_lookup_ordered_range(inode, start_pos, - last_pos - start_pos + 1); - if (ordered && - ordered->file_offset + ordered->len > start_pos && - ordered->file_offset <= last_pos) { - unlock_extent_cached(&inode->io_tree, start_pos, - last_pos, cached_state); - for (i = 0; i < num_pages; i++) { - unlock_page(pages[i]); - put_page(pages[i]); - } - btrfs_start_ordered_extent(&inode->vfs_inode, - ordered, 1); - btrfs_put_ordered_extent(ordered); - return -EAGAIN; - } - if (ordered) - btrfs_put_ordered_extent(ordered); - - *lockstart = start_pos; - *lockend = last_pos; - ret = 1; - } - - /* - * It's possible the pages are dirty right now, but we don't want - * to clean them yet because copy_from_user may catch a page fault - * and we might have to fall back to one page at a time. If that - * happens, we'll unlock these pages and we'd have a window where - * reclaim could sneak in and drop the once-dirty page on the floor - * without writing it. - * - * We have the pages locked and the extent range locked, so there's - * no way someone can start IO on any dirty pages in this range. 
- * - * We'll call btrfs_dirty_pages() later on, and that will flip around - * delalloc bits and dirty the pages as required. - */ - for (i = 0; i < num_pages; i++) { - WARN_ON(!PageLocked(pages[i])); - } - - return ret; -} - static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos, size_t *write_bytes) { @@ -1581,238 +1350,6 @@ static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos, return ret; } -static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, - struct iov_iter *i) -{ - struct file *file = iocb->ki_filp; - loff_t pos = iocb->ki_pos; - struct inode *inode = file_inode(file); - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct btrfs_root *root = BTRFS_I(inode)->root; - struct page **pages = NULL; - struct extent_state *cached_state = NULL; - struct extent_changeset *data_reserved = NULL; - u64 release_bytes = 0; - u64 lockstart; - u64 lockend; - size_t num_written = 0; - int nrptrs; - int ret = 0; - bool only_release_metadata = false; - bool force_page_uptodate = false; - - nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE), - PAGE_SIZE / (sizeof(struct page *))); - nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied); - nrptrs = max(nrptrs, 8); - pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL); - if (!pages) - return -ENOMEM; - - while (iov_iter_count(i) > 0) { - size_t offset = offset_in_page(pos); - size_t sector_offset; - size_t write_bytes = min(iov_iter_count(i), - nrptrs * (size_t)PAGE_SIZE - - offset); - size_t num_pages = DIV_ROUND_UP(write_bytes + offset, - PAGE_SIZE); - size_t reserve_bytes; - size_t dirty_pages; - size_t copied; - size_t dirty_sectors; - size_t num_sectors; - int extents_locked; - - WARN_ON(num_pages > nrptrs); - - /* - * Fault pages before locking them in prepare_pages - * to avoid recursive lock - */ - if (unlikely(iov_iter_fault_in_readable(i, write_bytes))) { - ret = -EFAULT; - break; - } - - sector_offset = pos & 
(fs_info->sectorsize - 1); - reserve_bytes = round_up(write_bytes + sector_offset, - fs_info->sectorsize); - - extent_changeset_release(data_reserved); - ret = btrfs_check_data_free_space(inode, &data_reserved, pos, - write_bytes); - if (ret < 0) { - if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | - BTRFS_INODE_PREALLOC)) && - check_can_nocow(BTRFS_I(inode), pos, - &write_bytes) > 0) { - /* - * For nodata cow case, no need to reserve - * data space. - */ - only_release_metadata = true; - /* - * our prealloc extent may be smaller than - * write_bytes, so scale down. - */ - num_pages = DIV_ROUND_UP(write_bytes + offset, - PAGE_SIZE); - reserve_bytes = round_up(write_bytes + - sector_offset, - fs_info->sectorsize); - } else { - break; - } - } - - WARN_ON(reserve_bytes == 0); - ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), - reserve_bytes); - if (ret) { - if (!only_release_metadata) - btrfs_free_reserved_data_space(inode, - data_reserved, pos, - write_bytes); - else - btrfs_end_write_no_snapshotting(root); - break; - } - - release_bytes = reserve_bytes; -again: - /* - * This is going to setup the pages array with the number of - * pages we want, so we don't really need to worry about the - * contents of pages from loop to loop - */ - ret = prepare_pages(inode, pages, num_pages, - pos, write_bytes, - force_page_uptodate); - if (ret) { - btrfs_delalloc_release_extents(BTRFS_I(inode), - reserve_bytes, true); - break; - } - - extents_locked = lock_and_cleanup_extent_if_need( - BTRFS_I(inode), pages, - num_pages, pos, write_bytes, &lockstart, - &lockend, &cached_state); - if (extents_locked < 0) { - if (extents_locked == -EAGAIN) - goto again; - btrfs_delalloc_release_extents(BTRFS_I(inode), - reserve_bytes, true); - ret = extents_locked; - break; - } - - copied = btrfs_copy_from_user(pos, write_bytes, pages, i); - - num_sectors = BTRFS_BYTES_TO_BLKS(fs_info, reserve_bytes); - dirty_sectors = round_up(copied + sector_offset, - fs_info->sectorsize); - 
dirty_sectors = BTRFS_BYTES_TO_BLKS(fs_info, dirty_sectors); - - /* - * if we have trouble faulting in the pages, fall - * back to one page at a time - */ - if (copied < write_bytes) - nrptrs = 1; - - if (copied == 0) { - force_page_uptodate = true; - dirty_sectors = 0; - dirty_pages = 0; - } else { - force_page_uptodate = false; - dirty_pages = DIV_ROUND_UP(copied + offset, - PAGE_SIZE); - } - - if (num_sectors > dirty_sectors) { - /* release everything except the sectors we dirtied */ - release_bytes -= dirty_sectors << - fs_info->sb->s_blocksize_bits; - if (only_release_metadata) { - btrfs_delalloc_release_metadata(BTRFS_I(inode), - release_bytes, true); - } else { - u64 __pos; - - __pos = round_down(pos, - fs_info->sectorsize) + - (dirty_pages << PAGE_SHIFT); - btrfs_delalloc_release_space(inode, - data_reserved, __pos, - release_bytes, true); - } - } - - release_bytes = round_up(copied + sector_offset, - fs_info->sectorsize); - - if (copied > 0) - ret = btrfs_dirty_pages(inode, pages, dirty_pages, - pos, copied, &cached_state); - if (extents_locked) - unlock_extent_cached(&BTRFS_I(inode)->io_tree, - lockstart, lockend, &cached_state); - btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, - true); - if (ret) { - btrfs_drop_pages(pages, num_pages); - break; - } - - release_bytes = 0; - if (only_release_metadata) - btrfs_end_write_no_snapshotting(root); - - if (only_release_metadata && copied > 0) { - lockstart = round_down(pos, - fs_info->sectorsize); - lockend = round_up(pos + copied, - fs_info->sectorsize) - 1; - - set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, - lockend, EXTENT_NORESERVE, NULL, - NULL, GFP_NOFS); - only_release_metadata = false; - } - - btrfs_drop_pages(pages, num_pages); - - cond_resched(); - - balance_dirty_pages_ratelimited(inode->i_mapping); - if (dirty_pages < (fs_info->nodesize >> PAGE_SHIFT) + 1) - btrfs_btree_balance_dirty(fs_info); - - pos += copied; - num_written += copied; - } - - kfree(pages); - - if (release_bytes) 
{ - if (only_release_metadata) { - btrfs_end_write_no_snapshotting(root); - btrfs_delalloc_release_metadata(BTRFS_I(inode), - release_bytes, true); - } else { - btrfs_delalloc_release_space(inode, data_reserved, - round_down(pos, fs_info->sectorsize), - release_bytes, true); - } - } - - extent_changeset_free(data_reserved); - return num_written ? num_written : ret; -} - static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; From patchwork Thu Sep 5 15:06:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133345 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B1987924 for ; Thu, 5 Sep 2019 15:07:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id EFF3020CC7 for ; Thu, 5 Sep 2019 15:07:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390291AbfIEPHM (ORCPT ); Thu, 5 Sep 2019 11:07:12 -0400 Received: from mx2.suse.de ([195.135.220.15]:56408 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2390255AbfIEPHK (ORCPT ); Thu, 5 Sep 2019 11:07:10 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 7F560B635; Thu, 5 Sep 2019 15:07:09 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 07/15] fs: Export generic_file_buffered_read() Date: Thu, 5 Sep 2019 10:06:42 -0500 Message-Id: <20190905150650.21089-8-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> 
References: <20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Export generic_file_buffered_read() to be used to supplement incomplete direct reads. While we are at it, correct the comments and variable names. Signed-off-by: Goldwyn Rodrigues --- include/linux/fs.h | 2 ++ mm/filemap.c | 13 +++++++------ 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 997a530ff4e9..26d827434060 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3041,6 +3041,8 @@ extern int generic_file_rw_checks(struct file *file_in, struct file *file_out); extern int generic_copy_file_checks(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, size_t *count, unsigned int flags); +extern ssize_t generic_file_buffered_read(struct kiocb *iocb, + struct iov_iter *to, ssize_t already_read); extern ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *); extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *); extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *); diff --git a/mm/filemap.c b/mm/filemap.c index d0cf700bf201..2121ae01eae8 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -2014,7 +2014,7 @@ static void shrink_readahead_size_eio(struct file *filp, * generic_file_buffered_read - generic file read routine * @iocb: the iocb to read * @iter: data destination - * @written: already copied + * @copied: already copied * * This is a generic file read routine, and uses the * mapping->a_ops->readpage() function for the actual low-level stuff. @@ -2023,11 +2023,11 @@ static void shrink_readahead_size_eio(struct file *filp, * of the logic when it comes to error handling etc. 
 * * Return: - * * total number of bytes copied, including those the were already @written + * * total number of bytes copied, including those that were @copied * * negative error code if nothing was copied */ -static ssize_t generic_file_buffered_read(struct kiocb *iocb, - struct iov_iter *iter, ssize_t written) +ssize_t generic_file_buffered_read(struct kiocb *iocb, + struct iov_iter *iter, ssize_t copied) { struct file *filp = iocb->ki_filp; struct address_space *mapping = filp->f_mapping; @@ -2168,7 +2168,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, prev_offset = offset; put_page(page); - written += ret; + copied += ret; if (!iov_iter_count(iter)) goto out; if (ret < nr) { @@ -2276,8 +2276,9 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, *ppos = ((loff_t)index << PAGE_SHIFT) + offset; file_accessed(filp); - return written ? written : error; + return copied ? copied : error; } +EXPORT_SYMBOL_GPL(generic_file_buffered_read); /** * generic_file_read_iter - generic filesystem read routine From patchwork Thu Sep 5 15:06:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133351 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2A2D713BD for ; Thu, 5 Sep 2019 15:07:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 686F62082C for ; Thu, 5 Sep 2019 15:07:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390298AbfIEPHO (ORCPT ); Thu, 5 Sep 2019 11:07:14 -0400 Received: from mx2.suse.de ([195.135.220.15]:56486 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2390286AbfIEPHN (ORCPT ); Thu, 5 Sep 2019 11:07:13 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 9ED6AACC1; Thu, 5 Sep 2019 15:07:11 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 08/15] btrfs: basic direct read operation Date: Thu, 5 Sep 2019 10:06:43 -0500 Message-Id: <20190905150650.21089-9-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> References: <20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Add btrfs_dio_iomap_ops for the iomap_begin() function. In order to accommodate dio reads, add a new function btrfs_file_read_iter() which calls btrfs_dio_iomap_read() for DIO reads and falls back to generic_file_buffered_read() otherwise. Changed parameter written in generic_file_buffered_read() to already_read. 
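The dispatch described above can be pictured with the following userspace sketch. It is not kernel code: struct example_req, direct_read(), buffered_read() and example_read_iter() are hypothetical stand-ins assumed for illustration, and only the control flow follows btrfs_file_read_iter().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request description, assumed for this sketch only. */
struct example_req {
	bool is_direct;     /* stands in for the IOCB_DIRECT flag */
	long direct_result; /* bytes the direct path reads, or a negative error */
	size_t count;       /* total number of bytes requested */
};

/* Stand-in for btrfs_dio_iomap_read(): bytes read or a negative error. */
static long direct_read(const struct example_req *req)
{
	return req->direct_result;
}

/*
 * Stand-in for generic_file_buffered_read(): reads whatever the direct
 * path left over and returns the running total.
 */
static long buffered_read(const struct example_req *req, long already_read)
{
	if ((size_t)already_read >= req->count)
		return already_read;
	return already_read + (long)(req->count - (size_t)already_read);
}

/* Same control flow as btrfs_file_read_iter() in this patch. */
static long example_read_iter(const struct example_req *req)
{
	long ret = 0;

	if (req->is_direct)
		ret = direct_read(req);
	if (ret < 0)
		return ret; /* a direct read error is returned as-is */

	/* Buffered path supplements a short (or skipped) direct read. */
	return buffered_read(req, ret);
}
```

In this model a short direct read is completed by the buffered path, a direct read error is returned unchanged, and a non-direct request goes straight to the buffered path with nothing already read.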
Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/ctree.h | 2 ++ fs/btrfs/file.c | 13 ++++++++++++- fs/btrfs/iomap.c | 20 ++++++++++++++++++++ 3 files changed, 34 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 9c501c7826b4..5ca3a365e639 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3243,7 +3243,9 @@ int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end); loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); +/* iomap.c */ size_t btrfs_buffered_iomap_write(struct kiocb *iocb, struct iov_iter *from); +ssize_t btrfs_dio_iomap_read(struct kiocb *iocb, struct iov_iter *to); /* tree-defrag.c */ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index f7087e28ac08..e7f67d514ba8 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2839,9 +2839,20 @@ static int btrfs_file_open(struct inode *inode, struct file *filp) return generic_file_open(inode, filp); } +static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + ssize_t ret = 0; + if (iocb->ki_flags & IOCB_DIRECT) + ret = btrfs_dio_iomap_read(iocb, to); + if (ret < 0) + return ret; + + return generic_file_buffered_read(iocb, to, ret); +} + const struct file_operations btrfs_file_operations = { .llseek = btrfs_file_llseek, - .read_iter = generic_file_read_iter, + .read_iter = btrfs_file_read_iter, .splice_read = generic_file_splice_read, .write_iter = btrfs_file_write_iter, .mmap = btrfs_file_mmap, diff --git a/fs/btrfs/iomap.c b/fs/btrfs/iomap.c index f8fa34105838..6b633c483dba 100644 --- a/fs/btrfs/iomap.c +++ b/fs/btrfs/iomap.c @@ -421,3 +421,23 @@ size_t btrfs_buffered_iomap_write(struct kiocb *iocb, struct iov_iter *from) return written; } +static int btrfs_dio_iomap_begin(struct inode *inode, loff_t pos, + loff_t length, unsigned flags, struct iomap *iomap, + struct iomap *srcmap) +{ + 
return get_iomap(inode, pos, length, iomap); +} + +static const struct iomap_ops btrfs_dio_iomap_ops = { + .iomap_begin = btrfs_dio_iomap_begin, +}; + +ssize_t btrfs_dio_iomap_read(struct kiocb *iocb, struct iov_iter *to) +{ + struct inode *inode = file_inode(iocb->ki_filp); + ssize_t ret; + inode_lock_shared(inode); + ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops, NULL); + inode_unlock_shared(inode); + return ret; +} From patchwork Thu Sep 5 15:06:44 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133355 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0BA091813 for ; Thu, 5 Sep 2019 15:07:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 4B0302082C for ; Thu, 5 Sep 2019 15:07:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390302AbfIEPHP (ORCPT ); Thu, 5 Sep 2019 11:07:15 -0400 Received: from mx2.suse.de ([195.135.220.15]:56592 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2390297AbfIEPHP (ORCPT ); Thu, 5 Sep 2019 11:07:15 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id BCD29B647; Thu, 5 Sep 2019 15:07:13 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 09/15] btrfs: Carve out btrfs_get_extent_map_write() out of btrfs_get_blocks_write() Date: Thu, 5 Sep 2019 10:06:44 -0500 Message-Id: <20190905150650.21089-10-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> References: 
<20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues This makes btrfs_get_extent_map_write() independent of Direct I/O code. Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/ctree.h | 2 ++ fs/btrfs/inode.c | 40 +++++++++++++++++++++++++++------------- 2 files changed, 29 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 5ca3a365e639..c25dfd8e619b 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3168,6 +3168,8 @@ struct inode *btrfs_iget_path(struct super_block *s, struct btrfs_key *location, struct btrfs_path *path); struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location, struct btrfs_root *root, int *was_new); +int btrfs_get_extent_map_write(struct extent_map **map, struct buffer_head *bh, + struct inode *inode, u64 start, u64 len); struct extent_map *btrfs_get_extent(struct btrfs_inode *inode, struct page *page, size_t pg_offset, u64 start, u64 end, int create); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 258bacefdf5f..24895793fd91 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7592,11 +7592,10 @@ static int btrfs_get_blocks_direct_read(struct extent_map *em, return 0; } -static int btrfs_get_blocks_direct_write(struct extent_map **map, - struct buffer_head *bh_result, - struct inode *inode, - struct btrfs_dio_data *dio_data, - u64 start, u64 len) +int btrfs_get_extent_map_write(struct extent_map **map, + struct buffer_head *bh, + struct inode *inode, + u64 start, u64 len) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct extent_map *em = *map; @@ -7650,22 +7649,38 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map, */ btrfs_free_reserved_data_space_noquota(inode, start, len); - goto skip_cow; + /* skip COW */ + goto out; } } /* this will cow the extent */ - len = bh_result->b_size; + if (bh) + len = bh->b_size; 
free_extent_map(em); *map = em = btrfs_new_extent_direct(inode, start, len); - if (IS_ERR(em)) { - ret = PTR_ERR(em); - goto out; - } + if (IS_ERR(em)) + return PTR_ERR(em); +out: + return ret; +} +static int btrfs_get_blocks_direct_write(struct extent_map **map, + struct buffer_head *bh_result, + struct inode *inode, + struct btrfs_dio_data *dio_data, + u64 start, u64 len) +{ + int ret; + struct extent_map *em; + + ret = btrfs_get_extent_map_write(map, bh_result, inode, + start, len); + if (ret < 0) + return ret; + em = *map; len = min(len, em->len - (start - em->start)); -skip_cow: bh_result->b_blocknr = (em->block_start + (start - em->start)) >> inode->i_blkbits; bh_result->b_size = len; @@ -7686,7 +7701,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map, dio_data->reserve -= len; dio_data->unsubmitted_oe_range_end = start + len; current->journal_info = dio_data; -out: return ret; } From patchwork Thu Sep 5 15:06:45 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133365 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D93C114ED for ; Thu, 5 Sep 2019 15:07:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 233372082E for ; Thu, 5 Sep 2019 15:07:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390316AbfIEPHT (ORCPT ); Thu, 5 Sep 2019 11:07:19 -0400 Received: from mx2.suse.de ([195.135.220.15]:56670 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2390297AbfIEPHS (ORCPT ); Thu, 5 Sep 2019 11:07:18 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id E57CDB653; Thu, 5 Sep 2019 
15:07:15 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 10/15] btrfs: Rename __endio_write_update_ordered() to btrfs_update_ordered_extent() Date: Thu, 5 Sep 2019 10:06:45 -0500 Message-Id: <20190905150650.21089-11-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> References: <20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Since this function will be used in another part of the code, give it a better name and make it non-static. Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/ctree.h | 7 +++++-- fs/btrfs/inode.c | 14 +++++--------- 2 files changed, 10 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c25dfd8e619b..04c119ca229b 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3171,8 +3171,11 @@ struct inode *btrfs_iget(struct super_block *s, struct btrfs_key *location, int btrfs_get_extent_map_write(struct extent_map **map, struct buffer_head *bh, struct inode *inode, u64 start, u64 len); struct extent_map *btrfs_get_extent(struct btrfs_inode *inode, - struct page *page, size_t pg_offset, - u64 start, u64 end, int create); + struct page *page, size_t pg_offset, + u64 start, u64 end, int create); +void btrfs_update_ordered_extent(struct inode *inode, + const u64 offset, const u64 bytes, + const bool uptodate); int btrfs_update_inode(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct inode *inode); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 24895793fd91..d415534ce733 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -89,10 +89,6 @@ static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len, u64 ram_bytes, int compress_type, int type); -static void 
__endio_write_update_ordered(struct inode *inode, - const u64 offset, const u64 bytes, - const bool uptodate); - /* * Cleanup all submitted ordered extents in specified range to handle errors * from the btrfs_run_delalloc_range() callback. @@ -133,7 +129,7 @@ static inline void btrfs_cleanup_ordered_extents(struct inode *inode, bytes -= PAGE_SIZE; } - return __endio_write_update_ordered(inode, offset, bytes, false); + return btrfs_update_ordered_extent(inode, offset, bytes, false); } static int btrfs_dirty_inode(struct inode *inode); @@ -8176,7 +8172,7 @@ static void btrfs_endio_direct_read(struct bio *bio) bio_put(bio); } -static void __endio_write_update_ordered(struct inode *inode, +void btrfs_update_ordered_extent(struct inode *inode, const u64 offset, const u64 bytes, const bool uptodate) { @@ -8229,7 +8225,7 @@ static void btrfs_endio_direct_write(struct bio *bio) struct btrfs_dio_private *dip = bio->bi_private; struct bio *dio_bio = dip->dio_bio; - __endio_write_update_ordered(dip->inode, dip->logical_offset, + btrfs_update_ordered_extent(dip->inode, dip->logical_offset, dip->bytes, !bio->bi_status); kfree(dip); @@ -8546,7 +8542,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, bio = NULL; } else { if (write) - __endio_write_update_ordered(inode, + btrfs_update_ordered_extent(inode, file_offset, dio_bio->bi_iter.bi_size, false); @@ -8686,7 +8682,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) */ if (dio_data.unsubmitted_oe_range_start < dio_data.unsubmitted_oe_range_end) - __endio_write_update_ordered(inode, + btrfs_update_ordered_extent(inode, dio_data.unsubmitted_oe_range_start, dio_data.unsubmitted_oe_range_end - dio_data.unsubmitted_oe_range_start, From patchwork Thu Sep 5 15:06:46 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133369 Return-Path: Received: from mail.kernel.org 
(pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A134714ED for ; Thu, 5 Sep 2019 15:07:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DF3CC2082E for ; Thu, 5 Sep 2019 15:07:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390324AbfIEPHW (ORCPT ); Thu, 5 Sep 2019 11:07:22 -0400 Received: from mx2.suse.de ([195.135.220.15]:56798 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2390307AbfIEPHT (ORCPT ); Thu, 5 Sep 2019 11:07:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id EA729B678; Thu, 5 Sep 2019 15:07:17 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 11/15] iomap: use a function pointer for dio submits Date: Thu, 5 Sep 2019 10:06:46 -0500 Message-Id: <20190905150650.21089-12-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> References: <20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues This helps filesystems to perform tasks on the bio while submitting for I/O. This could be post-write operations such as data CRC or data replication for fs-handled RAID. 
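The hook described above amounts to an optional function pointer that replaces the default submission path. Here is a minimal user-space sketch of that dispatch, assuming stand-in types: `mock_bio`, `mock_dio`, `mock_dio_ops`, `mock_submit_bio()` and `mock_fs_submit_io()` are hypothetical names for illustration and are not kernel APIs. The real hook also receives the inode, which is omitted here for brevity.

```c
#include <stddef.h>

/* Records which path handled a bio in this model. */
enum { SUBMIT_NONE = 0, SUBMIT_DEFAULT = 1, SUBMIT_FS_HOOK = 2 };

/* Hypothetical stand-in for struct bio. */
struct mock_bio {
	long long file_offset;	/* offset the filesystem hook was told about */
	int submitted_by;
};

/* Hypothetical stand-in for struct iomap_dio_ops with only the new hook. */
struct mock_dio_ops {
	void (*submit_io)(struct mock_bio *bio, long long file_offset);
};

/* Hypothetical stand-in for struct iomap_dio. */
struct mock_dio {
	const struct mock_dio_ops *dops;
};

/* Stand-in for the block layer's default submission. */
static void mock_submit_bio(struct mock_bio *bio)
{
	bio->submitted_by = SUBMIT_DEFAULT;
}

/* Example filesystem hook: a real one could checksum or replicate data here. */
static void mock_fs_submit_io(struct mock_bio *bio, long long file_offset)
{
	bio->file_offset = file_offset;
	bio->submitted_by = SUBMIT_FS_HOOK;
}

/* Same decision as the patched iomap_dio_submit_bio(): hook first, else default. */
static void mock_dio_submit_bio(const struct mock_dio *dio,
				struct mock_bio *bio, long long pos)
{
	if (dio->dops && dio->dops->submit_io)
		dio->dops->submit_io(bio, pos);
	else
		mock_submit_bio(bio);
}

/* Convenience wrapper: submit one bio at pos and report which path ran. */
static int mock_submit_path(const struct mock_dio_ops *ops, long long pos)
{
	struct mock_dio dio = { .dops = ops };
	struct mock_bio bio = { .file_offset = -1, .submitted_by = SUBMIT_NONE };

	mock_dio_submit_bio(&dio, &bio, pos);
	return bio.submitted_by;
}
```

Both a missing ops table and an ops table without `submit_io` fall through to the default path, so filesystems that do not set the hook keep their current behaviour. The diff also moves `pos += n` to after the submit call in `iomap_dio_bio_actor()`, so the hook is given the file offset at which the bio starts rather than where it ends.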
Signed-off-by: Goldwyn Rodrigues --- fs/iomap/direct-io.c | 16 +++++++++++----- include/linux/iomap.h | 2 ++ 2 files changed, 13 insertions(+), 5 deletions(-) diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index e3ccbf7daaae..2923b02c1d57 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -59,7 +59,7 @@ int iomap_dio_iopoll(struct kiocb *kiocb, bool spin) EXPORT_SYMBOL_GPL(iomap_dio_iopoll); static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap, - struct bio *bio) + struct bio *bio, loff_t pos) { atomic_inc(&dio->ref); @@ -67,7 +67,13 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap, bio_set_polled(bio, dio->iocb); dio->submit.last_queue = bdev_get_queue(iomap->bdev); - dio->submit.cookie = submit_bio(bio); + if (dio->dops && dio->dops->submit_io) { + dio->dops->submit_io(bio, file_inode(dio->iocb->ki_filp), + pos); + dio->submit.cookie = BLK_QC_T_NONE; + } else { + dio->submit.cookie = submit_bio(bio); + } } static ssize_t iomap_dio_complete(struct iomap_dio *dio) @@ -191,7 +197,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos, get_page(page); __bio_add_page(bio, page, len, 0); bio_set_op_attrs(bio, REQ_OP_WRITE, flags); - iomap_dio_submit_bio(dio, iomap, bio); + iomap_dio_submit_bio(dio, iomap, bio, pos); } static loff_t @@ -297,11 +303,11 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, iov_iter_advance(dio->submit.iter, n); dio->size += n; - pos += n; copied += n; nr_pages = iov_iter_npages(&iter, BIO_MAX_PAGES); - iomap_dio_submit_bio(dio, iomap, bio); + iomap_dio_submit_bio(dio, iomap, bio, pos); + pos += n; } while (nr_pages); /* diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 7fdb09925740..1bcb2f14bd9a 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -196,6 +196,8 @@ sector_t iomap_bmap(struct address_space *mapping, sector_t bno, struct iomap_dio_ops { int (*end_io)(struct kiocb *iocb, ssize_t size, 
int error, unsigned flags); + void (*submit_io)(struct bio *bio, struct inode *inode, + loff_t file_offset); }; ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, From patchwork Thu Sep 5 15:06:47 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133375 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5A37913BD for ; Thu, 5 Sep 2019 15:07:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8F4F12070C for ; Thu, 5 Sep 2019 15:07:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390331AbfIEPHY (ORCPT ); Thu, 5 Sep 2019 11:07:24 -0400 Received: from mx2.suse.de ([195.135.220.15]:56884 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2390297AbfIEPHW (ORCPT ); Thu, 5 Sep 2019 11:07:22 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 0E0EAB67A; Thu, 5 Sep 2019 15:07:20 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 12/15] btrfs: Use iomap_dio_rw for performing direct I/O writes Date: Thu, 5 Sep 2019 10:06:47 -0500 Message-Id: <20190905150650.21089-13-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> References: <20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues btrfs_iomap_init() allocates and initializes the btrfs_iomap structure, which is used to pass information 
between iomap begin() and end(). All data reservations and allocations must be performed in this function. For reads, btrfs_iomap allocation is not required. We perform space allocation and reservation before the iomap_dio_rw() call, as opposed to in iomap_begin(). This is how the current direct I/O path behaves. The problem with doing this in iomap_begin() is that the transaction needs to be committed before a new allocation is performed. Performing a transaction for every direct sub-write will reduce performance. Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/ctree.h | 3 ++ fs/btrfs/file.c | 2 +- fs/btrfs/inode.c | 14 +++-- fs/btrfs/iomap.c | 158 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- 4 files changed, 164 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 04c119ca229b..7f84b7e47c8a 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3202,6 +3202,8 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end); void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start, u64 end, int uptodate); extern const struct dentry_operations btrfs_dentry_operations; +void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, + loff_t file_offset); /* ioctl.c */ long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg); @@ -3251,6 +3253,7 @@ loff_t btrfs_remap_file_range(struct file *file_in, loff_t pos_in, /* iomap.c */ size_t btrfs_buffered_iomap_write(struct kiocb *iocb, struct iov_iter *from); ssize_t btrfs_dio_iomap_read(struct kiocb *iocb, struct iov_iter *to); +ssize_t btrfs_dio_iomap_write(struct kiocb *iocb, struct iov_iter *from); /* tree-defrag.c */ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index e7f67d514ba8..5d4347e12cdc 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1501,7 +1501,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, atomic_inc(&BTRFS_I(inode)->sync_writers); if 
(iocb->ki_flags & IOCB_DIRECT) { - num_written = __btrfs_direct_write(iocb, from); + num_written = btrfs_dio_iomap_write(iocb, from); } else { num_written = btrfs_buffered_iomap_write(iocb, from); if (num_written > 0) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index d415534ce733..323d72858c9c 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8167,9 +8167,8 @@ static void btrfs_endio_direct_read(struct bio *bio) kfree(dip); dio_bio->bi_status = err; - dio_end_io(dio_bio); + bio_endio(dio_bio); btrfs_io_bio_free_csum(io_bio); - bio_put(bio); } void btrfs_update_ordered_extent(struct inode *inode, @@ -8231,8 +8230,7 @@ static void btrfs_endio_direct_write(struct bio *bio) kfree(dip); dio_bio->bi_status = bio->bi_status; - dio_end_io(dio_bio); - bio_put(bio); + bio_endio(dio_bio); } static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data, @@ -8464,8 +8462,8 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip) return 0; } -static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, - loff_t file_offset) +void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, + loff_t file_offset) { struct btrfs_dio_private *dip = NULL; struct bio *bio = NULL; @@ -8536,7 +8534,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, /* * The end io callbacks free our dip, do the final put on bio * and all the cleanup and final put for dio_bio (through - * dio_end_io()). + * end_io()). */ dip = NULL; bio = NULL; @@ -8555,7 +8553,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, * Releases and cleans up our dio_bio, no need to bio_put() * nor bio_endio()/bio_io_error() against dio_bio. 
*/ - dio_end_io(dio_bio); + bio_endio(dio_bio); } if (bio) bio_put(bio); diff --git a/fs/btrfs/iomap.c b/fs/btrfs/iomap.c index 6b633c483dba..faefcab509aa 100644 --- a/fs/btrfs/iomap.c +++ b/fs/btrfs/iomap.c @@ -7,6 +7,7 @@ */ #include +#include #include "ctree.h" #include "btrfs_inode.h" #include "volumes.h" @@ -421,15 +422,113 @@ size_t btrfs_buffered_iomap_write(struct kiocb *iocb, struct iov_iter *from) return written; } +static struct btrfs_iomap *btrfs_iomap_init(struct inode *inode, + struct extent_map **em, + loff_t pos, loff_t length) +{ + int ret = 0; + struct extent_map *map = *em; + struct btrfs_iomap *bi; + u64 num_bytes; + + bi = kzalloc(sizeof(struct btrfs_iomap), GFP_NOFS); + if (!bi) + return ERR_PTR(-ENOMEM); + + bi->start = round_down(pos, PAGE_SIZE); + bi->end = PAGE_ALIGN(pos + length) - 1; + num_bytes = bi->end - bi->start + 1; + + /* Wait for existing ordered extents in range to finish */ + btrfs_wait_ordered_range(inode, bi->start, num_bytes); + + lock_extent_bits(&BTRFS_I(inode)->io_tree, bi->start, bi->end, &bi->cached_state); + + if (ret) { + unlock_extent_cached(&BTRFS_I(inode)->io_tree, bi->start, bi->end, + &bi->cached_state); + kfree(bi); + return ERR_PTR(ret); + } + + refcount_inc(&map->refs); + ret = btrfs_get_extent_map_write(em, NULL, + inode, bi->start, num_bytes); + if (ret) { + unlock_extent_cached(&BTRFS_I(inode)->io_tree, bi->start, bi->end, + &bi->cached_state); + kfree(bi); + return ERR_PTR(ret); + } + free_extent_map(map); + return bi; +} + static int btrfs_dio_iomap_begin(struct inode *inode, loff_t pos, - loff_t length, unsigned flags, struct iomap *iomap, - struct iomap *srcmap) + loff_t length, unsigned flags, struct iomap *iomap, + struct iomap *srcmap) +{ + struct extent_map *em; + struct btrfs_iomap *bi = NULL; + + em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, pos, length, 0); + + if (flags & IOMAP_WRITE) { + srcmap->offset = em->start; + srcmap->length = em->len; + srcmap->bdev = em->bdev; + if (em->block_start == 
EXTENT_MAP_HOLE) { + srcmap->type = IOMAP_HOLE; + } else { + srcmap->type = IOMAP_MAPPED; + srcmap->addr = em->block_start; + } + bi = btrfs_iomap_init(inode, &em, pos, length); + if (IS_ERR(bi)) + return PTR_ERR(bi); + } + + iomap->offset = em->start; + iomap->length = em->len; + iomap->bdev = em->bdev; + + if (em->block_start == EXTENT_MAP_HOLE) { + iomap->type = IOMAP_HOLE; + } else { + iomap->type = IOMAP_MAPPED; + iomap->addr = em->block_start; + } + iomap->private = bi; + return 0; +} + +static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, + loff_t length, ssize_t written, unsigned flags, + struct iomap *iomap) { - return get_iomap(inode, pos, length, iomap); + struct btrfs_iomap *bi = iomap->private; + u64 wend; + loff_t release_bytes; + + if (!bi) + return 0; + + unlock_extent_cached(&BTRFS_I(inode)->io_tree, bi->start, bi->end, + &bi->cached_state); + + wend = PAGE_ALIGN(pos + written); + release_bytes = wend - bi->end - 1; + kfree(bi); + return 0; } static const struct iomap_ops btrfs_dio_iomap_ops = { .iomap_begin = btrfs_dio_iomap_begin, + .iomap_end = btrfs_dio_iomap_end, +}; + +static const struct iomap_dio_ops btrfs_dops = { + .submit_io = btrfs_submit_direct, }; ssize_t btrfs_dio_iomap_read(struct kiocb *iocb, struct iov_iter *to) @@ -437,7 +536,58 @@ ssize_t btrfs_dio_iomap_read(struct kiocb *iocb, struct iov_iter *to) struct inode *inode = file_inode(iocb->ki_filp); ssize_t ret; inode_lock_shared(inode); - ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops, NULL); + ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops, &btrfs_dops); inode_unlock_shared(inode); return ret; } + +ssize_t btrfs_dio_iomap_write(struct kiocb *iocb, struct iov_iter *from) +{ + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); + ssize_t written, written_buffered; + loff_t pos = iocb->ki_pos, endbyte; + size_t count = iov_iter_count(from); + struct extent_changeset *data_reserved = NULL; + int err; + + btrfs_delalloc_reserve_space(inode, 
&data_reserved, pos, count); + + written = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, &btrfs_dops); + if (written < count) { + ssize_t done = (written < 0) ? 0 : written; + btrfs_delalloc_release_space(inode, data_reserved, pos, count - done, + true); + } + btrfs_delalloc_release_extents(BTRFS_I(inode), count, false); + extent_changeset_free(data_reserved); + + if (written < 0 || !iov_iter_count(from)) + return written; + + pos = iocb->ki_pos; + written_buffered = btrfs_buffered_iomap_write(iocb, from); + if (written_buffered < 0) { + err = written_buffered; + goto out; + } + /* + * Ensure all data is persisted. We want the next direct IO read to be + * able to read what was just written. + */ + endbyte = pos + written_buffered - 1; + err = btrfs_fdatawrite_range(inode, pos, endbyte); + if (err) + goto out; + err = filemap_fdatawait_range(inode->i_mapping, pos, endbyte); + if (err) + goto out; + written += written_buffered; + iocb->ki_pos = pos + written_buffered; + invalidate_mapping_pages(file->f_mapping, pos >> PAGE_SHIFT, + endbyte >> PAGE_SHIFT); +out: + if (written > 0 && iocb->ki_pos > i_size_read(inode)) + i_size_write(inode, iocb->ki_pos); + return written ? 
written : err; +} From patchwork Thu Sep 5 15:06:48 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133391 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 36835924 for ; Thu, 5 Sep 2019 15:07:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 6BF392070C for ; Thu, 5 Sep 2019 15:07:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732858AbfIEPHb (ORCPT ); Thu, 5 Sep 2019 11:07:31 -0400 Received: from mx2.suse.de ([195.135.220.15]:57002 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2390307AbfIEPHX (ORCPT ); Thu, 5 Sep 2019 11:07:23 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 2AD96B68B; Thu, 5 Sep 2019 15:07:22 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 13/15] btrfs: Remove btrfs_dio_data and __btrfs_direct_write Date: Thu, 5 Sep 2019 10:06:48 -0500 Message-Id: <20190905150650.21089-14-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> References: <20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues btrfs_dio_data is unnecessary since we are now storing all information in btrfs_iomap. 
Advantage: We don't abuse current->journal_info anymore :) Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/file.c | 40 ---------------------------- fs/btrfs/inode.c | 81 ++------------------------------------------------------ 2 files changed, 2 insertions(+), 119 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 5d4347e12cdc..e6c1ffd74660 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1350,46 +1350,6 @@ static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos, return ret; } -static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) -{ - struct file *file = iocb->ki_filp; - struct inode *inode = file_inode(file); - loff_t pos; - ssize_t written; - ssize_t written_buffered; - loff_t endbyte; - int err; - - written = generic_file_direct_write(iocb, from); - - if (written < 0 || !iov_iter_count(from)) - return written; - - pos = iocb->ki_pos; - written_buffered = btrfs_buffered_iomap_write(iocb, from); - if (written_buffered < 0) { - err = written_buffered; - goto out; - } - /* - * Ensure all data is persisted. We want the next direct IO read to be - * able to read what was just written. - */ - endbyte = pos + written_buffered - 1; - err = btrfs_fdatawrite_range(inode, pos, endbyte); - if (err) - goto out; - err = filemap_fdatawait_range(inode->i_mapping, pos, endbyte); - if (err) - goto out; - written += written_buffered; - iocb->ki_pos = pos + written_buffered; - invalidate_mapping_pages(file->f_mapping, pos >> PAGE_SHIFT, - endbyte >> PAGE_SHIFT); -out: - return written ? 
written : err; -} - static void update_time_for_write(struct inode *inode) { struct timespec64 now; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 323d72858c9c..87fbe73ca2e4 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -54,13 +54,6 @@ struct btrfs_iget_args { struct btrfs_root *root; }; -struct btrfs_dio_data { - u64 reserve; - u64 unsubmitted_oe_range_start; - u64 unsubmitted_oe_range_end; - int overwrite; -}; - static const struct inode_operations btrfs_dir_inode_operations; static const struct inode_operations btrfs_symlink_inode_operations; static const struct inode_operations btrfs_dir_ro_inode_operations; @@ -7664,7 +7657,6 @@ int btrfs_get_extent_map_write(struct extent_map **map, static int btrfs_get_blocks_direct_write(struct extent_map **map, struct buffer_head *bh_result, struct inode *inode, - struct btrfs_dio_data *dio_data, u64 start, u64 len) { int ret; @@ -7686,17 +7678,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map, if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) set_buffer_new(bh_result); - /* - * Need to update the i_size under the extent lock so buffered - * readers will get the updated i_size when we unlock. 
- */ - if (!dio_data->overwrite && start + len > i_size_read(inode)) - i_size_write(inode, start + len); - - WARN_ON(dio_data->reserve < len); - dio_data->reserve -= len; - dio_data->unsubmitted_oe_range_end = start + len; - current->journal_info = dio_data; return ret; } @@ -7706,7 +7687,6 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct extent_map *em; struct extent_state *cached_state = NULL; - struct btrfs_dio_data *dio_data = NULL; u64 start = iblock << inode->i_blkbits; u64 lockstart, lockend; u64 len = bh_result->b_size; @@ -7721,16 +7701,6 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, lockstart = start; lockend = start + len - 1; - if (current->journal_info) { - /* - * Need to pull our outstanding extents and set journal_info to NULL so - * that anything that needs to check if there's a transaction doesn't get - * confused. - */ - dio_data = current->journal_info; - current->journal_info = NULL; - } - /* * If this errors out it's because we couldn't invalidate pagecache for * this range and we need to fallback to buffered. 
@@ -7770,7 +7740,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, if (create) { ret = btrfs_get_blocks_direct_write(&em, bh_result, inode, - dio_data, start, len); + start, len); if (ret < 0) goto unlock_err; @@ -7808,8 +7778,6 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, clear_extent_bit(&BTRFS_I(inode)->io_tree, lockstart, lockend, unlock_bits, 1, 0, &cached_state); err: - if (dio_data) - current->journal_info = dio_data; return ret; } @@ -8498,21 +8466,6 @@ void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, dip->subio_endio = btrfs_subio_endio_read; } - /* - * Reset the range for unsubmitted ordered extents (to a 0 length range) - * even if we fail to submit a bio, because in such case we do the - * corresponding error handling below and it must not be done a second - * time by btrfs_direct_IO(). - */ - if (write) { - struct btrfs_dio_data *dio_data = current->journal_info; - - dio_data->unsubmitted_oe_range_end = dip->logical_offset + - dip->bytes; - dio_data->unsubmitted_oe_range_start = - dio_data->unsubmitted_oe_range_end; - } - ret = btrfs_submit_direct_hook(dip); if (!ret) return; @@ -8598,7 +8551,6 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct btrfs_dio_data dio_data = { 0 }; struct extent_changeset *data_reserved = NULL; loff_t offset = iocb->ki_pos; size_t count = 0; @@ -8631,7 +8583,6 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) * not unlock the i_mutex at this case. 
*/ if (offset + count <= inode->i_size) { - dio_data.overwrite = 1; inode_unlock(inode); relock = true; } else if (iocb->ki_flags & IOCB_NOWAIT) { @@ -8643,16 +8594,6 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) if (ret) goto out; - /* - * We need to know how many extents we reserved so that we can - * do the accounting properly if we go over the number we - * originally calculated. Abuse current->journal_info for this. - */ - dio_data.reserve = round_up(count, - fs_info->sectorsize); - dio_data.unsubmitted_oe_range_start = (u64)offset; - dio_data.unsubmitted_oe_range_end = (u64)offset; - current->journal_info = &dio_data; down_read(&BTRFS_I(inode)->dio_sem); } else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK, &BTRFS_I(inode)->runtime_flags)) { @@ -8667,25 +8608,7 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) btrfs_submit_direct, flags); if (iov_iter_rw(iter) == WRITE) { up_read(&BTRFS_I(inode)->dio_sem); - current->journal_info = NULL; - if (ret < 0 && ret != -EIOCBQUEUED) { - if (dio_data.reserve) - btrfs_delalloc_release_space(inode, data_reserved, - offset, dio_data.reserve, true); - /* - * On error we might have left some ordered extents - * without submitting corresponding bios for them, so - * cleanup them up to avoid other tasks getting them - * and waiting for them to complete forever. 
- */ - if (dio_data.unsubmitted_oe_range_start < - dio_data.unsubmitted_oe_range_end) - btrfs_update_ordered_extent(inode, - dio_data.unsubmitted_oe_range_start, - dio_data.unsubmitted_oe_range_end - - dio_data.unsubmitted_oe_range_start, - false); - } else if (ret >= 0 && (size_t)ret < count) + if (ret >= 0 && (size_t)ret < count) btrfs_delalloc_release_space(inode, data_reserved, offset, count - (size_t)ret, true); btrfs_delalloc_release_extents(BTRFS_I(inode), count, false); From patchwork Thu Sep 5 15:06:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133383 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8E3A814ED for ; Thu, 5 Sep 2019 15:07:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CCC552082E for ; Thu, 5 Sep 2019 15:07:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390337AbfIEPH1 (ORCPT ); Thu, 5 Sep 2019 11:07:27 -0400 Received: from mx2.suse.de ([195.135.220.15]:57062 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2388380AbfIEPHZ (ORCPT ); Thu, 5 Sep 2019 11:07:25 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 5CB30B68D; Thu, 5 Sep 2019 15:07:24 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 14/15] btrfs: update inode size during bio completion Date: Thu, 5 Sep 2019 10:06:49 -0500 Message-Id: <20190905150650.21089-15-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> 
References: <20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Update the inode size for dio writes during bio completion. This makes the increase in inode size conditional on the success of the underlying block layer I/O, which matters especially for aio. Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/inode.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 87fbe73ca2e4..f87a9dd154a9 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8191,9 +8191,13 @@ static void btrfs_endio_direct_write(struct bio *bio) { struct btrfs_dio_private *dip = bio->bi_private; struct bio *dio_bio = dip->dio_bio; + struct inode *inode = dip->inode; - btrfs_update_ordered_extent(dip->inode, dip->logical_offset, + btrfs_update_ordered_extent(inode, dip->logical_offset, dip->bytes, !bio->bi_status); + if (!bio->bi_status && + i_size_read(inode) < dip->logical_offset + dip->bytes) + i_size_write(inode, dip->logical_offset + dip->bytes); kfree(dip); From patchwork Thu Sep 5 15:06:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11133387 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8F03114ED for ; Thu, 5 Sep 2019 15:07:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id CDBEE2070C for ; Thu, 5 Sep 2019 15:07:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2390345AbfIEPHa (ORCPT ); Thu, 5 Sep 2019 11:07:30 -0400 Received: from mx2.suse.de ([195.135.220.15]:57100 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2390336AbfIEPH1 (ORCPT ); Thu, 5
Sep 2019 11:07:27 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 6E64DB692; Thu, 5 Sep 2019 15:07:26 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, darrick.wong@oracle.com, hch@lst.de, linux-xfs@vger.kernel.org, Goldwyn Rodrigues Subject: [PATCH 15/15] xfs: Use the new iomap infrastructure for CoW Date: Thu, 5 Sep 2019 10:06:50 -0500 Message-Id: <20190905150650.21089-16-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190905150650.21089-1-rgoldwyn@suse.de> References: <20190905150650.21089-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Set the IOMAP_F_COW flag and create the srcmap based on current extents to read from. Signed-off-by: Goldwyn Rodrigues --- fs/xfs/xfs_iomap.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 8321733c16c3..13495d8a1ee2 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -1006,7 +1006,10 @@ xfs_file_iomap_begin( */ if (directio || imap.br_startblock == HOLESTARTBLOCK) imap = cmap; + else + xfs_bmbt_to_iomap(ip, srcmap, &cmap, false); + iomap->flags |= IOMAP_F_COW; end_fsb = imap.br_startoff + imap.br_blockcount; length = XFS_FSB_TO_B(mp, end_fsb) - offset; }
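Illustrative note on patch 14/15 (not part of the series): the completion-time size update in btrfs_endio_direct_write() can be modelled in user space as below. This is only a sketch under stated assumptions. The names sketch_inode, sketch_dio and sketch_endio_direct_write are hypothetical stand-ins invented for this note, a plain integer replaces i_size_read()/i_size_write(), and an integer status replaces bio->bi_status. No kernel locking or ordered-extent handling is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode; only the size field is modelled. */
struct sketch_inode {
	int64_t size;
};

/* Hypothetical stand-in for btrfs_dio_private: the range one bio covered. */
struct sketch_dio {
	struct sketch_inode *inode;
	int64_t logical_offset;
	int64_t bytes;
};

/*
 * Mirrors the condition added by the patch: grow the size only when the
 * I/O succeeded (status == 0 plays the role of !bio->bi_status) and the
 * written range ends beyond the current size. A failed or fully
 * in-bounds write leaves the size untouched.
 */
static void sketch_endio_direct_write(struct sketch_dio *dip, int status)
{
	int64_t end = dip->logical_offset + dip->bytes;

	if (status == 0 && dip->inode->size < end)
		dip->inode->size = end;
}
```

With a 4096-byte file, a successful 4096-byte write at offset 4096 grows the size to 8192, while the same write completing with an error leaves it at 4096. That is the behaviour the commit message describes: the size grows only if the block layer reports success.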