From patchwork Thu Dec 12 00:30:36 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11286625 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0163D6C1 for ; Thu, 12 Dec 2019 00:30:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id DE7212173E for ; Thu, 12 Dec 2019 00:30:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727388AbfLLAay (ORCPT ); Wed, 11 Dec 2019 19:30:54 -0500 Received: from mx2.suse.de ([195.135.220.15]:42192 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727185AbfLLAax (ORCPT ); Wed, 11 Dec 2019 19:30:53 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id E7A75ADF1; Thu, 12 Dec 2019 00:30:51 +0000 (UTC) From: Goldwyn Rodrigues To: linux-btrfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, hch@infradead.org, darrick.wong@oracle.com, fdmanana@kernel.org, dsterba@suse.cz, jthumshirn@suse.de, nborisov@suse.com, Goldwyn Rodrigues Subject: [PATCH 1/8] fs: Export generic_file_buffered_read() Date: Wed, 11 Dec 2019 18:30:36 -0600 Message-Id: <20191212003043.31093-2-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191212003043.31093-1-rgoldwyn@suse.de> References: <20191212003043.31093-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Export generic_file_buffered_read() to be used to supplement incomplete direct reads. While we are at it, correct the comments and variable names. Reviewed-by: Johannes Thumshirn Signed-off-by: Goldwyn Rodrigues Reviewed-by: Christoph Hellwig --- include/linux/fs.h | 2 ++ mm/filemap.c | 13 +++++++------ 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 98e0349adb52..3883adaa0e01 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3098,6 +3098,8 @@ extern int generic_file_rw_checks(struct file *file_in, struct file *file_out); extern int generic_copy_file_checks(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, size_t *count, unsigned int flags); +extern ssize_t generic_file_buffered_read(struct kiocb *iocb, + struct iov_iter *to, ssize_t already_read); extern ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *); extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *); extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *); diff --git a/mm/filemap.c b/mm/filemap.c index bf6aa30be58d..dc2c7e3472a0 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1994,7 +1994,7 @@ static void shrink_readahead_size_eio(struct file *filp, * generic_file_buffered_read - generic file read routine * @iocb: the iocb to read * @iter: data destination - * @written: already copied + * @copied: already copied * * This is a generic file read routine, and uses the * mapping->a_ops->readpage() function for the actual low-level stuff. @@ -2003,11 +2003,11 @@ static void shrink_readahead_size_eio(struct file *filp, * of the logic when it comes to error handling etc. * * Return: - * * total number of bytes copied, including those the were already @written + * * total number of bytes copied, including those that were @copied * * negative error code if nothing was copied */ -static ssize_t generic_file_buffered_read(struct kiocb *iocb, - struct iov_iter *iter, ssize_t written) +ssize_t generic_file_buffered_read(struct kiocb *iocb, + struct iov_iter *iter, ssize_t copied) { struct file *filp = iocb->ki_filp; struct address_space *mapping = filp->f_mapping; @@ -2148,7 +2148,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, prev_offset = offset; put_page(page); - written += ret; + copied += ret; if (!iov_iter_count(iter)) goto out; if (ret < nr) { @@ -2256,8 +2256,9 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb, *ppos = ((loff_t)index << PAGE_SHIFT) + offset; file_accessed(filp); - return written ? written : error; + return copied ? copied : error; } +EXPORT_SYMBOL_GPL(generic_file_buffered_read); /** * generic_file_read_iter - generic filesystem read routine From patchwork Thu Dec 12 00:30:37 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11286629 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1C0BE6C1 for ; Thu, 12 Dec 2019 00:30:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 051FB21655 for ; Thu, 12 Dec 2019 00:30:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727403AbfLLAa4 (ORCPT ); Wed, 11 Dec 2019 19:30:56 -0500 Received: from mx2.suse.de ([195.135.220.15]:42206 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727185AbfLLAaz (ORCPT ); Wed, 11 Dec 2019 19:30:55 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 0632AB1A5; Thu, 12 Dec 2019 00:30:54 +0000 (UTC) From: Goldwyn Rodrigues To: linux-btrfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, hch@infradead.org, darrick.wong@oracle.com, fdmanana@kernel.org, dsterba@suse.cz, jthumshirn@suse.de, nborisov@suse.com, Goldwyn Rodrigues Subject: [PATCH 2/8] iomap: add a filesystem hook for direct I/O bio submission Date: Wed, 11 Dec 2019 18:30:37 -0600 Message-Id: <20191212003043.31093-3-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191212003043.31093-1-rgoldwyn@suse.de> References: <20191212003043.31093-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues This helps filesystems to perform tasks on the bio while submitting for I/O. This could be post-write operations such as data CRC or data replication for fs-handled RAID. Reviewed-by: Johannes Thumshirn Reviewed-by: Nikolay Borisov Signed-off-by: Goldwyn Rodrigues Reviewed-by: Christoph Hellwig --- fs/iomap/direct-io.c | 14 +++++++++----- include/linux/iomap.h | 2 ++ 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 23837926c0c5..1a3bf3bd86fb 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -59,7 +59,7 @@ int iomap_dio_iopoll(struct kiocb *kiocb, bool spin) EXPORT_SYMBOL_GPL(iomap_dio_iopoll); static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap, - struct bio *bio) + struct bio *bio, loff_t pos) { atomic_inc(&dio->ref); @@ -67,7 +67,11 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap, bio_set_polled(bio, dio->iocb); dio->submit.last_queue = bdev_get_queue(iomap->bdev); - dio->submit.cookie = submit_bio(bio); + if (dio->dops && dio->dops->submit_io) + dio->submit.cookie = dio->dops->submit_io(bio, + dio->iocb->ki_filp, pos); + else + dio->submit.cookie = submit_bio(bio); } static ssize_t iomap_dio_complete(struct iomap_dio *dio) @@ -191,7 +195,7 @@ iomap_dio_zero(struct iomap_dio *dio, struct iomap *iomap, loff_t pos, get_page(page); __bio_add_page(bio, page, len, 0); bio_set_op_attrs(bio, REQ_OP_WRITE, flags); - iomap_dio_submit_bio(dio, iomap, bio); + iomap_dio_submit_bio(dio, iomap, bio, pos); } static loff_t @@ -299,11 +303,11 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length, } dio->size += n; - pos += n; copied += n; nr_pages = iov_iter_npages(dio->submit.iter, BIO_MAX_PAGES); - iomap_dio_submit_bio(dio, iomap, bio); + iomap_dio_submit_bio(dio, iomap, bio, pos); + pos += n; } while (nr_pages); /* diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 8b09463dae0d..2b093a23ef1c 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -252,6 +252,8 @@ int iomap_writepages(struct address_space *mapping, struct iomap_dio_ops { int (*end_io)(struct kiocb *iocb, ssize_t size, int error, unsigned flags); + blk_qc_t (*submit_io)(struct bio *bio, struct file *file, + loff_t file_offset); }; ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, From patchwork Thu Dec 12 00:30:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11286633 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E4174112B for ; Thu, 12 Dec 2019 00:31:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B994821655 for ; Thu, 12 Dec 2019 00:31:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727412AbfLLAbA (ORCPT ); Wed, 11 Dec 2019 19:31:00 -0500 Received: from mx2.suse.de ([195.135.220.15]:42228 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727185AbfLLAa7 (ORCPT ); Wed, 11 Dec 2019 19:30:59 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 287D3AE64; Thu, 12 Dec 2019 00:30:56 +0000 (UTC) From: Goldwyn Rodrigues To: linux-btrfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, hch@infradead.org, darrick.wong@oracle.com, fdmanana@kernel.org, dsterba@suse.cz, jthumshirn@suse.de, nborisov@suse.com, Goldwyn Rodrigues Subject: [PATCH 3/8] btrfs: Switch to iomap_dio_rw() for dio Date: Wed, 11 Dec 2019 18:30:38 -0600 Message-Id: <20191212003043.31093-4-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191212003043.31093-1-rgoldwyn@suse.de> References: <20191212003043.31093-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Switch from __blockdev_direct_IO() to iomap_dio_rw(). Rename btrfs_get_blocks_direct() to btrfs_dio_iomap_begin() and use it as iomap_begin() for iomap direct I/O functions. This function allocates and locks all the blocks required for the I/O. btrfs_submit_direct() is used as the submit_io() hook for direct I/O ops. Since we need direct I/O reads to go through iomap_dio_rw(), we change file_operations.read_iter() to a btrfs_file_read_iter() which calls btrfs_direct_IO() for direct reads and falls back to generic_file_buffered_read() for incomplete reads and buffered reads. We don't need address_space.direct_IO() anymore so set it to noop. Similarly, we don't need flags used in __blockdev_direct_IO(). iomap is capable of direct I/O reads from a hole, so we don't need to return -ENOENT. BTRFS direct I/O is now done under i_rwsem, shared in case of reads and exclusive in case of writes. This guards against simultaneous truncates. Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/ctree.h | 1 + fs/btrfs/file.c | 21 +++++- fs/btrfs/inode.c | 190 ++++++++++++++++++++++++++----------------------------- 3 files changed, 109 insertions(+), 103 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index b2e8fd8a8e59..113dcd1a11cd 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2904,6 +2904,7 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end); void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start, u64 end, int uptodate); extern const struct dentry_operations btrfs_dentry_operations; +ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter); /* ioctl.c */ long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg); diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 0cb43b682789..7010dd7beccc 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1832,7 +1832,7 @@ static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) loff_t endbyte; int err; - written = generic_file_direct_write(iocb, from); + written = btrfs_direct_IO(iocb, from); if (written < 0 || !iov_iter_count(from)) return written; @@ -3444,9 +3444,26 @@ static int btrfs_file_open(struct inode *inode, struct file *filp) return generic_file_open(inode, filp); } +static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to) +{ + ssize_t ret = 0; + + if (iocb->ki_flags & IOCB_DIRECT) { + struct inode *inode = file_inode(iocb->ki_filp); + + inode_lock_shared(inode); + ret = btrfs_direct_IO(iocb, to); + inode_unlock_shared(inode); + if (ret < 0) + return ret; + } + + return generic_file_buffered_read(iocb, to, ret); +} + const struct file_operations btrfs_file_operations = { .llseek = btrfs_file_llseek, - .read_iter = generic_file_read_iter, + .read_iter = btrfs_file_read_iter, .splice_read = generic_file_splice_read, .write_iter = btrfs_file_write_iter, .mmap = btrfs_file_mmap, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 56032c518b26..91b830022146 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include "misc.h" #include "ctree.h" @@ -7510,7 +7511,7 @@ noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len, } static int lock_extent_direct(struct inode *inode, u64 lockstart, u64 lockend, - struct extent_state **cached_state, int writing) + struct extent_state **cached_state, bool writing) { struct btrfs_ordered_extent *ordered; int ret = 0; @@ -7648,30 +7649,7 @@ static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len, } -static int btrfs_get_blocks_direct_read(struct extent_map *em, - struct buffer_head *bh_result, - struct inode *inode, - u64 start, u64 len) -{ - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - - if (em->block_start == EXTENT_MAP_HOLE || - test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) - return -ENOENT; - - len = min(len, em->len - (start - em->start)); - - bh_result->b_blocknr = (em->block_start + (start - em->start)) >> - inode->i_blkbits; - bh_result->b_size = len; - bh_result->b_bdev = fs_info->fs_devices->latest_bdev; - set_buffer_mapped(bh_result); - - return 0; -} - static int btrfs_get_blocks_direct_write(struct extent_map **map, - struct buffer_head *bh_result, struct inode *inode, struct btrfs_dio_data *dio_data, u64 start, u64 len) @@ -7733,7 +7711,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map, } /* this will cow the extent */ - len = bh_result->b_size; free_extent_map(em); *map = em = btrfs_new_extent_direct(inode, start, len); if (IS_ERR(em)) { @@ -7744,15 +7721,6 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map, len = min(len, em->len - (start - em->start)); skip_cow: - bh_result->b_blocknr = (em->block_start + (start - em->start)) >> - inode->i_blkbits; - bh_result->b_size = len; - bh_result->b_bdev = fs_info->fs_devices->latest_bdev; - set_buffer_mapped(bh_result); - - if (!test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) - set_buffer_new(bh_result); - /* * Need to update the i_size under the extent lock so buffered * readers will get the updated i_size when we unlock. @@ -7768,24 +7736,37 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map, return ret; } -static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, - struct buffer_head *bh_result, int create) +static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, + loff_t length, unsigned flags, struct iomap *iomap, + struct iomap *srcmap) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct extent_map *em; struct extent_state *cached_state = NULL; struct btrfs_dio_data *dio_data = NULL; - u64 start = iblock << inode->i_blkbits; u64 lockstart, lockend; - u64 len = bh_result->b_size; + bool write = !!(flags & IOMAP_WRITE); int ret = 0; + u64 len = length; + bool unlock_extents = false; - if (!create) + if (!write) len = min_t(u64, len, fs_info->sectorsize); lockstart = start; lockend = start + len - 1; + /* + * The generic stuff only does filemap_write_and_wait_range, which + * isn't enough if we've written compressed pages to this area, so + * we need to flush the dirty pages again to make absolutely sure + * that any outstanding dirty pages are on disk. + */ + if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, + &BTRFS_I(inode)->runtime_flags)) + ret = filemap_fdatawrite_range(inode->i_mapping, start, + start + length - 1); + if (current->journal_info) { /* * Need to pull our outstanding extents and set journal_info to NULL so @@ -7801,7 +7782,7 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, * this range and we need to fallback to buffered. */ if (lock_extent_direct(inode, lockstart, lockend, &cached_state, - create)) { + write)) { ret = -ENOTBLK; goto err; } @@ -7833,35 +7814,52 @@ static int btrfs_get_blocks_direct(struct inode *inode, sector_t iblock, goto unlock_err; } - if (create) { - ret = btrfs_get_blocks_direct_write(&em, bh_result, inode, + len = min(len, em->len - (start - em->start)); + if (write) { + ret = btrfs_get_blocks_direct_write(&em, inode, dio_data, start, len); if (ret < 0) goto unlock_err; - - unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart, - lockend, &cached_state); + unlock_extents = true; + /* Recalc len in case the new em is smaller than requested */ + len = min(len, em->len - (start - em->start)); + } else if (em->block_start == EXTENT_MAP_HOLE || + test_bit(EXTENT_FLAG_PREALLOC, &em->flags)) { + /* Unlock in case of direct reading from a hole */ + unlock_extents = true; } else { - ret = btrfs_get_blocks_direct_read(em, bh_result, inode, - start, len); - /* Can be negative only if we read from a hole */ - if (ret < 0) { - ret = 0; - free_extent_map(em); - goto unlock_err; - } /* * We need to unlock only the end area that we aren't using. * The rest is going to be unlocked by the endio routine. */ - lockstart = start + bh_result->b_size; - if (lockstart < lockend) { - unlock_extent_cached(&BTRFS_I(inode)->io_tree, - lockstart, lockend, &cached_state); - } else { - free_extent_state(cached_state); - } + lockstart = start + len; + if (lockstart < lockend) + unlock_extents = true; + } + + if (unlock_extents) + unlock_extent_cached(&BTRFS_I(inode)->io_tree, + lockstart, lockend, &cached_state); + else + free_extent_state(cached_state); + + /* + * Translate extent map information to iomap + * We trim the extents (and move the addr) even though + * iomap code does that, since we have locked only the parts + * we are performing I/O in. + */ + if ((em->block_start == EXTENT_MAP_HOLE) || + (test_bit(EXTENT_FLAG_PREALLOC, &em->flags) && !write)) { + iomap->addr = IOMAP_NULL_ADDR; + iomap->type = IOMAP_HOLE; + } else { + iomap->addr = em->block_start + (start - em->start); + iomap->type = IOMAP_MAPPED; } + iomap->offset = start; + iomap->bdev = fs_info->fs_devices->latest_bdev; + iomap->length = len; free_extent_map(em); @@ -8230,9 +8228,8 @@ static void btrfs_endio_direct_read(struct bio *bio) kfree(dip); dio_bio->bi_status = err; - dio_end_io(dio_bio); + bio_endio(dio_bio); btrfs_io_bio_free_csum(io_bio); - bio_put(bio); } static void __endio_write_update_ordered(struct inode *inode, @@ -8289,8 +8286,7 @@ static void btrfs_endio_direct_write(struct bio *bio) kfree(dip); dio_bio->bi_status = bio->bi_status; - dio_end_io(dio_bio); - bio_put(bio); + bio_endio(dio_bio); } static blk_status_t btrfs_submit_bio_start_direct_io(void *private_data, @@ -8522,9 +8518,10 @@ static int btrfs_submit_direct_hook(struct btrfs_dio_private *dip) return 0; } -static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, +static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file, loff_t file_offset) { + struct inode *inode = file_inode(file); struct btrfs_dio_private *dip = NULL; struct bio *bio = NULL; struct btrfs_io_bio *io_bio; @@ -8575,7 +8572,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, ret = btrfs_submit_direct_hook(dip); if (!ret) - return; + return BLK_QC_T_NONE; btrfs_io_bio_free_csum(io_bio); @@ -8594,7 +8591,7 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, /* * The end io callbacks free our dip, do the final put on bio * and all the cleanup and final put for dio_bio (through - * dio_end_io()). + * end_io()). */ dip = NULL; bio = NULL; @@ -8609,15 +8606,12 @@ static void btrfs_submit_direct(struct bio *dio_bio, struct inode *inode, file_offset + dio_bio->bi_iter.bi_size - 1); dio_bio->bi_status = BLK_STS_IOERR; - /* - * Releases and cleans up our dio_bio, no need to bio_put() - * nor bio_endio()/bio_io_error() against dio_bio. - */ - dio_end_io(dio_bio); + bio_endio(dio_bio); } if (bio) bio_put(bio); kfree(dip); + return BLK_QC_T_NONE; } static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info, @@ -8653,7 +8647,23 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info, return retval; } -static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) +static const struct iomap_ops btrfs_dio_iomap_ops = { + .iomap_begin = btrfs_dio_iomap_begin, +}; + +static const struct iomap_dio_ops btrfs_dops = { + .submit_io = btrfs_submit_direct, +}; + + +/* + * btrfs_direct_IO - perform direct I/O + * inode->i_rwsem must be locked before calling this function, shared or exclusive. + * @iocb - kernel iocb + * @iter - iter to/from data is copied + */ + +ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; @@ -8662,28 +8672,15 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) struct extent_changeset *data_reserved = NULL; loff_t offset = iocb->ki_pos; size_t count = 0; - int flags = 0; - bool wakeup = true; bool relock = false; ssize_t ret; + lockdep_assert_held(&inode->i_rwsem); + if (check_direct_IO(fs_info, iter, offset)) return 0; - inode_dio_begin(inode); - - /* - * The generic stuff only does filemap_write_and_wait_range, which - * isn't enough if we've written compressed pages to this area, so - * we need to flush the dirty pages again to make absolutely sure - * that any outstanding dirty pages are on disk. - */ count = iov_iter_count(iter); - if (test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, - &BTRFS_I(inode)->runtime_flags)) - filemap_fdatawrite_range(inode->i_mapping, offset, - offset + count - 1); - if (iov_iter_rw(iter) == WRITE) { /* * If the write DIO is beyond the EOF, we need update @@ -8714,17 +8711,11 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) dio_data.unsubmitted_oe_range_end = (u64)offset; current->journal_info = &dio_data; down_read(&BTRFS_I(inode)->dio_sem); - } else if (test_bit(BTRFS_INODE_READDIO_NEED_LOCK, - &BTRFS_I(inode)->runtime_flags)) { - inode_dio_end(inode); - flags = DIO_LOCKING | DIO_SKIP_HOLES; - wakeup = false; } - ret = __blockdev_direct_IO(iocb, inode, - fs_info->fs_devices->latest_bdev, - iter, btrfs_get_blocks_direct, NULL, - btrfs_submit_direct, flags); + ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, &btrfs_dops, + is_sync_kiocb(iocb)); + if (iov_iter_rw(iter) == WRITE) { up_read(&BTRFS_I(inode)->dio_sem); current->journal_info = NULL; @@ -8751,11 +8742,8 @@ static ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) btrfs_delalloc_release_extents(BTRFS_I(inode), count); } out: - if (wakeup) - inode_dio_end(inode); if (relock) inode_lock(inode); - extent_changeset_free(data_reserved); return ret; } @@ -11045,7 +11033,7 @@ static const struct address_space_operations btrfs_aops = { .writepage = btrfs_writepage, .writepages = btrfs_writepages, .readpages = btrfs_readpages, - .direct_IO = btrfs_direct_IO, + .direct_IO = noop_direct_IO, .invalidatepage = btrfs_invalidatepage, .releasepage = btrfs_releasepage, .set_page_dirty = btrfs_set_page_dirty, From patchwork Thu Dec 12 00:30:39 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11286639 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ACA29188B for ; Thu, 12 Dec 2019 00:31:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 962D722B48 for ; Thu, 12 Dec 2019 00:31:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727421AbfLLAbA (ORCPT ); Wed, 11 Dec 2019 19:31:00 -0500 Received: from mx2.suse.de ([195.135.220.15]:42244 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727356AbfLLAbA (ORCPT ); Wed, 11 Dec 2019 19:31:00 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 48BA1B246; Thu, 12 Dec 2019 00:30:58 +0000 (UTC) From: Goldwyn Rodrigues To: linux-btrfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, hch@infradead.org, darrick.wong@oracle.com, fdmanana@kernel.org, dsterba@suse.cz, jthumshirn@suse.de, nborisov@suse.com, Goldwyn Rodrigues Subject: [PATCH 4/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls Date: Wed, 11 Dec 2019 18:30:39 -0600 Message-Id: <20191212003043.31093-5-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191212003043.31093-1-rgoldwyn@suse.de> References: <20191212003043.31093-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Filesystems such as btrfs can perform direct I/O without holding the inode->i_rwsem in some of the cases like writing within i_size. So, remove the check for lockdep_assert_held() in iomap_dio_rw() and move it to where iomap_dio_rw() is called so the checks are in place. Signed-off-by: Goldwyn Rodrigues --- fs/gfs2/file.c | 4 ++++ fs/iomap/direct-io.c | 2 -- fs/xfs/xfs_file.c | 9 ++++++++- 3 files changed, 12 insertions(+), 3 deletions(-) diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index 9d58295ccf7a..d0517a78640b 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -771,6 +771,8 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to) if (ret) goto out_uninit; + lockdep_assert_held(&file_inode(file)->i_rwsem); + ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL, is_sync_kiocb(iocb)); @@ -807,6 +809,8 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from) if (offset + len > i_size_read(&ip->i_inode)) goto out; + lockdep_assert_held(&inode->i_rwsem); + ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL, is_sync_kiocb(iocb)); diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 1a3bf3bd86fb..41c1e7c20a1f 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -415,8 +415,6 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, struct blk_plug plug; struct iomap_dio *dio; - lockdep_assert_held(&inode->i_rwsem); - if (!count) return 0; diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index c93250108952..dfaccd4d6e6c 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -176,7 +176,8 @@ xfs_file_dio_aio_read( struct kiocb *iocb, struct iov_iter *to) { - struct xfs_inode *ip = XFS_I(file_inode(iocb->ki_filp)); + struct inode *inode = file_inode(iocb->ki_filp); + struct xfs_inode *ip = XFS_I(inode); size_t count = iov_iter_count(to); ssize_t ret; @@ -188,6 +189,9 @@ xfs_file_dio_aio_read( file_accessed(iocb->ki_filp); xfs_ilock(ip, XFS_IOLOCK_SHARED); + + lockdep_assert_held(&inode->i_rwsem); + ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL, is_sync_kiocb(iocb)); xfs_iunlock(ip, XFS_IOLOCK_SHARED); @@ -547,6 +551,9 @@ xfs_file_dio_aio_write( } trace_xfs_file_direct_write(ip, count, iocb->ki_pos); + + lockdep_assert_held(&inode->i_rwsem); + /* * If unaligned, this is the only IO in-flight. Wait on it before we * release the iolock to prevent subsequent overlapping IO. From patchwork Thu Dec 12 00:30:40 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11286641 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A7BE7112B for ; Thu, 12 Dec 2019 00:31:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 872FF21655 for ; Thu, 12 Dec 2019 00:31:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727434AbfLLAbE (ORCPT ); Wed, 11 Dec 2019 19:31:04 -0500 Received: from mx2.suse.de ([195.135.220.15]:42252 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727356AbfLLAbC (ORCPT ); Wed, 11 Dec 2019 19:31:02 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 6BA29ACF2; Thu, 12 Dec 2019 00:31:00 +0000 (UTC) From: Goldwyn Rodrigues To: linux-btrfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, hch@infradead.org, darrick.wong@oracle.com, fdmanana@kernel.org, dsterba@suse.cz, jthumshirn@suse.de, nborisov@suse.com, Goldwyn Rodrigues Subject: [PATCH 5/8] fs: Remove dio_end_io() Date: Wed, 11 Dec 2019 18:30:40 -0600 Message-Id: <20191212003043.31093-6-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191212003043.31093-1-rgoldwyn@suse.de> References: <20191212003043.31093-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Since we removed the last user of dio_end_io(), remove the helper function dio_end_io(). Reviewed-by: Nikolay Borisov Reviewed-by: Johannes Thumshirn Signed-off-by: Goldwyn Rodrigues Reviewed-by: Christoph Hellwig --- fs/direct-io.c | 19 ------------------- include/linux/fs.h | 2 -- 2 files changed, 21 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index 0ec4f270139f..fb19a77bec22 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -384,25 +384,6 @@ static void dio_bio_end_io(struct bio *bio) spin_unlock_irqrestore(&dio->bio_lock, flags); } -/** - * dio_end_io - handle the end io action for the given bio - * @bio: The direct io bio thats being completed - * - * This is meant to be called by any filesystem that uses their own dio_submit_t - * so that the DIO specific endio actions are dealt with after the filesystem - * has done it's completion work. - */ -void dio_end_io(struct bio *bio) -{ - struct dio *dio = bio->bi_private; - - if (dio->is_async) - dio_bio_end_aio(bio); - else - dio_bio_end_io(bio); -} -EXPORT_SYMBOL_GPL(dio_end_io); - static inline void dio_bio_alloc(struct dio *dio, struct dio_submit *sdio, struct block_device *bdev, diff --git a/include/linux/fs.h b/include/linux/fs.h index 3883adaa0e01..d708422f77e0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3157,8 +3157,6 @@ enum { DIO_SKIP_HOLES = 0x02, }; -void dio_end_io(struct bio *bio); - ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, struct block_device *bdev, struct iov_iter *iter, get_block_t get_block, From patchwork Thu Dec 12 00:30:41 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11286643 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 44D4515AB for ; Thu, 12 Dec 2019 00:31:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2E42B2173E for ; Thu, 12 Dec 2019 00:31:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727440AbfLLAbF (ORCPT ); Wed, 11 Dec 2019 19:31:05 -0500 Received: from mx2.suse.de ([195.135.220.15]:42268 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727359AbfLLAbF (ORCPT ); Wed, 11 Dec 2019 19:31:05 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 7F614ADF1; Thu, 12 Dec 2019 00:31:02 +0000 (UTC) From: Goldwyn Rodrigues To: linux-btrfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, hch@infradead.org, darrick.wong@oracle.com, fdmanana@kernel.org, dsterba@suse.cz, jthumshirn@suse.de, nborisov@suse.com, Goldwyn Rodrigues Subject: [PATCH 6/8] btrfs: Wait for extent bits to release page Date: Wed, 11 Dec 2019 18:30:41 -0600 Message-Id: <20191212003043.31093-7-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191212003043.31093-1-rgoldwyn@suse.de> References: <20191212003043.31093-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues While trying to release a page, the extent containing the page may be locked which would stop the page from being released. Wait for the extent lock to be cleared, if blocking is allowed and then clear the bits. While we are at it, clean the code of try_release_extent_state() to make it simpler. Signed-off-by: Goldwyn Rodrigues Reviewed-by: Johannes Thumshirn Reviewed-by: Nikolay Borisov --- fs/btrfs/extent_io.c | 37 ++++++++++++++++--------------------- fs/btrfs/extent_io.h | 2 +- fs/btrfs/inode.c | 4 ++-- 3 files changed, 19 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index eb8bd0258360..193785981183 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -4360,33 +4360,28 @@ int extent_invalidatepage(struct extent_io_tree *tree, * are locked or under IO and drops the related state bits if it is safe * to drop the page. */ -static int try_release_extent_state(struct extent_io_tree *tree, +static bool try_release_extent_state(struct extent_io_tree *tree, struct page *page, gfp_t mask) { u64 start = page_offset(page); u64 end = start + PAGE_SIZE - 1; - int ret = 1; if (test_range_bit(tree, start, end, EXTENT_LOCKED, 0, NULL)) { - ret = 0; - } else { - /* - * at this point we can safely clear everything except the - * locked bit and the nodatasum bit - */ - ret = __clear_extent_bit(tree, start, end, - ~(EXTENT_LOCKED | EXTENT_NODATASUM), - 0, 0, NULL, mask, NULL); - - /* if clear_extent_bit failed for enomem reasons, - * we can't allow the release to continue. - */ - if (ret < 0) - ret = 0; - else - ret = 1; + if (!gfpflags_allow_blocking(mask)) + return false; + wait_extent_bit(tree, start, end, EXTENT_LOCKED); } - return ret; + /* + * At this point we can safely clear everything except the locked and + * nodatasum bits. If clear_extent_bit failed due to -ENOMEM, + * don't allow release. + */ + if (__clear_extent_bit(tree, start, end, + ~(EXTENT_LOCKED | EXTENT_NODATASUM), 0, 0, + NULL, mask, NULL) < 0) + return false; + + return true; } /* @@ -4394,7 +4389,7 @@ static int try_release_extent_state(struct extent_io_tree *tree, * in the range corresponding to the page, both state records and extent * map records are removed */ -int try_release_extent_mapping(struct page *page, gfp_t mask) +bool try_release_extent_mapping(struct page *page, gfp_t mask) { struct extent_map *em; u64 start = page_offset(page); diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index a8551a1f56e2..89cc0cf8a7fd 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -188,7 +188,7 @@ typedef struct extent_map *(get_extent_t)(struct btrfs_inode *inode, u64 start, u64 len, int create); -int try_release_extent_mapping(struct page *page, gfp_t mask); +bool try_release_extent_mapping(struct page *page, gfp_t mask); int try_release_extent_buffer(struct page *page); int extent_read_full_page(struct extent_io_tree *tree, struct page *page, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 91b830022146..2e7452c7fe61 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8809,8 +8809,8 @@ btrfs_readpages(struct file *file, struct address_space *mapping, static int __btrfs_releasepage(struct page *page, gfp_t gfp_flags) { - int ret = try_release_extent_mapping(page, gfp_flags); - if (ret == 1) { + bool ret = try_release_extent_mapping(page, gfp_flags); + if (ret) { ClearPagePrivate(page); set_page_private(page, 0); put_page(page); From patchwork Thu Dec 12 00:30:42 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11286651 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4C347112B for ; Thu, 12 Dec 2019 00:31:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 353E522B48 for ; Thu, 12 Dec 2019 00:31:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727444AbfLLAbI (ORCPT ); Wed, 11 Dec 2019 19:31:08 -0500 Received: from mx2.suse.de ([195.135.220.15]:42304 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727185AbfLLAbH (ORCPT ); Wed, 11 Dec 2019 19:31:07 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id AF659B2AE; Thu, 12 Dec 2019 00:31:04 +0000 (UTC) From: Goldwyn Rodrigues To: linux-btrfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, hch@infradead.org, darrick.wong@oracle.com, fdmanana@kernel.org, dsterba@suse.cz, jthumshirn@suse.de, nborisov@suse.com, Goldwyn Rodrigues Subject: [PATCH 7/8] btrfs: Use iomap_end() instead of btrfs_dio_data Date: Wed, 11 Dec 2019 18:30:42 -0600 Message-Id: <20191212003043.31093-8-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191212003043.31093-1-rgoldwyn@suse.de> References: <20191212003043.31093-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Use iomap_end() to check for failed or incomplete writes and call __endio_write_update_ordered(). We don't need btrfs_dio_data anymore so remove that. The bonus is we don't abuse current->journal_info anymore. Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/inode.c | 89 ++++++++++---------------------------------------------- 1 file changed, 16 insertions(+), 73 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 2e7452c7fe61..c5c809ab84cf 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -56,13 +56,6 @@ struct btrfs_iget_args { struct btrfs_root *root; }; -struct btrfs_dio_data { - u64 reserve; - u64 unsubmitted_oe_range_start; - u64 unsubmitted_oe_range_end; - int overwrite; -}; - static const struct inode_operations btrfs_dir_inode_operations; static const struct inode_operations btrfs_symlink_inode_operations; static const struct inode_operations btrfs_dir_ro_inode_operations; @@ -7651,7 +7644,6 @@ static struct extent_map *create_io_em(struct inode *inode, u64 start, u64 len, static int btrfs_get_blocks_direct_write(struct extent_map **map, struct inode *inode, - struct btrfs_dio_data *dio_data, u64 start, u64 len) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); @@ -7725,13 +7717,9 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map, * Need to update the i_size under the extent lock so buffered * readers will get the updated i_size when we unlock. */ - if (!dio_data->overwrite && start + len > i_size_read(inode)) + if (start + len > i_size_read(inode)) i_size_write(inode, start + len); - WARN_ON(dio_data->reserve < len); - dio_data->reserve -= len; - dio_data->unsubmitted_oe_range_end = start + len; - current->journal_info = dio_data; out: return ret; } @@ -7743,7 +7731,6 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct extent_map *em; struct extent_state *cached_state = NULL; - struct btrfs_dio_data *dio_data = NULL; u64 lockstart, lockend; bool write = !!(flags & IOMAP_WRITE); int ret = 0; @@ -7767,16 +7754,6 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, ret = filemap_fdatawrite_range(inode->i_mapping, start, start + length - 1); - if (current->journal_info) { - /* - * Need to pull our outstanding extents and set journal_info to NULL so - * that anything that needs to check if there's a transaction doesn't get - * confused. - */ - dio_data = current->journal_info; - current->journal_info = NULL; - } - /* * If this errors out it's because we couldn't invalidate pagecache for * this range and we need to fallback to buffered. @@ -7817,7 +7794,7 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, len = min(len, em->len - (start - em->start)); if (write) { ret = btrfs_get_blocks_direct_write(&em, inode, - dio_data, start, len); + start, len); if (ret < 0) goto unlock_err; unlock_extents = true; @@ -7869,11 +7846,21 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart, lockend, &cached_state); err: - if (dio_data) - current->journal_info = dio_data; return ret; } +static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length, + ssize_t written, unsigned flags, struct iomap *iomap) +{ + if (!(flags & IOMAP_WRITE)) + return 0; + + if (written < length) + __endio_write_update_ordered(inode, pos + written, + length - written, false); + return 0; +} + static inline blk_status_t submit_dio_repair_bio(struct inode *inode, struct bio *bio, int mirror_num) @@ -8555,21 +8542,6 @@ static blk_qc_t btrfs_submit_direct(struct bio *dio_bio, struct file *file, dip->subio_endio = btrfs_subio_endio_read; } - /* - * Reset the range for unsubmitted ordered extents (to a 0 length range) - * even if we fail to submit a bio, because in such case we do the - * corresponding error handling below and it must not be done a second - * time by btrfs_direct_IO(). - */ - if (write) { - struct btrfs_dio_data *dio_data = current->journal_info; - - dio_data->unsubmitted_oe_range_end = dip->logical_offset + - dip->bytes; - dio_data->unsubmitted_oe_range_start = - dio_data->unsubmitted_oe_range_end; - } - ret = btrfs_submit_direct_hook(dip); if (!ret) return BLK_QC_T_NONE; @@ -8649,6 +8621,7 @@ static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info, static const struct iomap_ops btrfs_dio_iomap_ops = { .iomap_begin = btrfs_dio_iomap_begin, + .iomap_end = btrfs_dio_iomap_end, }; static const struct iomap_dio_ops btrfs_dops = { @@ -8668,7 +8641,6 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct btrfs_dio_data dio_data = { 0 }; struct extent_changeset *data_reserved = NULL; loff_t offset = iocb->ki_pos; size_t count = 0; @@ -8688,7 +8660,6 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) * not unlock the i_mutex at this case. */ if (offset + count <= inode->i_size) { - dio_data.overwrite = 1; inode_unlock(inode); relock = true; } else if (iocb->ki_flags & IOCB_NOWAIT) { @@ -8700,16 +8671,6 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) if (ret) goto out; - /* - * We need to know how many extents we reserved so that we can - * do the accounting properly if we go over the number we - * originally calculated. Abuse current->journal_info for this. - */ - dio_data.reserve = round_up(count, - fs_info->sectorsize); - dio_data.unsubmitted_oe_range_start = (u64)offset; - dio_data.unsubmitted_oe_range_end = (u64)offset; - current->journal_info = &dio_data; down_read(&BTRFS_I(inode)->dio_sem); } @@ -8718,25 +8679,7 @@ ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) if (iov_iter_rw(iter) == WRITE) { up_read(&BTRFS_I(inode)->dio_sem); - current->journal_info = NULL; - if (ret < 0 && ret != -EIOCBQUEUED) { - if (dio_data.reserve) - btrfs_delalloc_release_space(inode, data_reserved, - offset, dio_data.reserve, true); - /* - * On error we might have left some ordered extents - * without submitting corresponding bios for them, so - * cleanup them up to avoid other tasks getting them - * and waiting for them to complete forever. - */ - if (dio_data.unsubmitted_oe_range_start < - dio_data.unsubmitted_oe_range_end) - __endio_write_update_ordered(inode, - dio_data.unsubmitted_oe_range_start, - dio_data.unsubmitted_oe_range_end - - dio_data.unsubmitted_oe_range_start, - false); - } else if (ret >= 0 && (size_t)ret < count) + if (ret >= 0 && (size_t)ret < count) btrfs_delalloc_release_space(inode, data_reserved, offset, count - (size_t)ret, true); btrfs_delalloc_release_extents(BTRFS_I(inode), count); From patchwork Thu Dec 12 00:30:43 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11286653 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 92B41112B for ; Thu, 12 Dec 2019 00:31:10 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 72A9221655 for ; Thu, 12 Dec 2019 00:31:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727453AbfLLAbJ (ORCPT ); Wed, 11 Dec 2019 19:31:09 -0500 Received: from mx2.suse.de ([195.135.220.15]:42316 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727359AbfLLAbI (ORCPT ); Wed, 11 Dec 2019 19:31:08 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id CFF9DACF2; Thu, 12 Dec 2019 00:31:06 +0000 (UTC) From: Goldwyn Rodrigues To: linux-btrfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, hch@infradead.org, darrick.wong@oracle.com, fdmanana@kernel.org, dsterba@suse.cz, jthumshirn@suse.de, nborisov@suse.com, Goldwyn Rodrigues Subject: [PATCH 8/8] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK Date: Wed, 11 Dec 2019 18:30:43 -0600 Message-Id: <20191212003043.31093-9-rgoldwyn@suse.de> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20191212003043.31093-1-rgoldwyn@suse.de> References: <20191212003043.31093-1-rgoldwyn@suse.de> Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Since we now perform direct reads using i_rwsem, we can remove this inode flag used to co-ordinate unlocked reads. The truncate call takes i_rwsem. This means it is correctly synchronized with concurrent direct reads. Signed-off-by: Goldwyn Rodrigues Reviewed-by: Nikolay Borisov Reviewed-by: Johannes Thumshirn --- fs/btrfs/btrfs_inode.h | 18 ------------------ fs/btrfs/inode.c | 5 ----- 2 files changed, 23 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 4e12a477d32e..cd8f378ed8e7 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -27,7 +27,6 @@ enum { BTRFS_INODE_NEEDS_FULL_SYNC, BTRFS_INODE_COPY_EVERYTHING, BTRFS_INODE_IN_DELALLOC_LIST, - BTRFS_INODE_READDIO_NEED_LOCK, BTRFS_INODE_HAS_PROPS, BTRFS_INODE_SNAPSHOT_FLUSH, }; @@ -317,23 +316,6 @@ struct btrfs_dio_private { blk_status_t); }; -/* - * Disable DIO read nolock optimization, so new dio readers will be forced - * to grab i_mutex. It is used to avoid the endless truncate due to - * nonlocked dio read. - */ -static inline void btrfs_inode_block_unlocked_dio(struct btrfs_inode *inode) -{ - set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags); - smp_mb(); -} - -static inline void btrfs_inode_resume_unlocked_dio(struct btrfs_inode *inode) -{ - smp_mb__before_atomic(); - clear_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags); -} - /* Array of bytes with variable length, hexadecimal format 0x1234 */ #define CSUM_FMT "0x%*phN" #define CSUM_FMT_VALUE(size, bytes) size, bytes diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index c5c809ab84cf..501e2a8f5b93 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -5273,11 +5273,6 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr) truncate_setsize(inode, newsize); - /* Disable nonlocked read DIO to avoid the endless truncate */ - btrfs_inode_block_unlocked_dio(BTRFS_I(inode)); - inode_dio_wait(inode); - btrfs_inode_resume_unlocked_dio(BTRFS_I(inode)); - ret = btrfs_truncate(inode, newsize == oldsize); if (ret && inode->i_nlink) { int err;