From patchwork Mon Sep 21 14:43:39 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790185 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 31E1E112E for ; Mon, 21 Sep 2020 14:44:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1F881235F8 for ; Mon, 21 Sep 2020 14:44:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728283AbgIUOoK (ORCPT ); Mon, 21 Sep 2020 10:44:10 -0400 Received: from mx2.suse.de ([195.135.220.15]:55288 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727978AbgIUOoJ (ORCPT ); Mon, 21 Sep 2020 10:44:09 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id EC4B4AC4C; Mon, 21 Sep 2020 14:44:43 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues , Nikolay Borisov , Johannes Thumshirn Subject: [PATCH 01/15] fs: remove dio_end_io() Date: Mon, 21 Sep 2020 09:43:39 -0500 Message-Id: <20200921144353.31319-2-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Since we removed the last user of dio_end_io(), remove the helper function dio_end_io(). Reviewed-by: Nikolay Borisov Reviewed-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig Signed-off-by: Goldwyn Rodrigues Reviewed-by: Josef Bacik --- fs/direct-io.c | 19 ------------------- include/linux/fs.h | 2 -- 2 files changed, 21 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index 183299892465..abf535b036ab 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -386,25 +386,6 @@ static void dio_bio_end_io(struct bio *bio) spin_unlock_irqrestore(&dio->bio_lock, flags); } -/** - * dio_end_io - handle the end io action for the given bio - * @bio: The direct io bio thats being completed - * - * This is meant to be called by any filesystem that uses their own dio_submit_t - * so that the DIO specific endio actions are dealt with after the filesystem - * has done it's completion work. - */ -void dio_end_io(struct bio *bio) -{ - struct dio *dio = bio->bi_private; - - if (dio->is_async) - dio_bio_end_aio(bio); - else - dio_bio_end_io(bio); -} -EXPORT_SYMBOL_GPL(dio_end_io); - static inline void dio_bio_alloc(struct dio *dio, struct dio_submit *sdio, struct block_device *bdev, diff --git a/include/linux/fs.h b/include/linux/fs.h index 7519ae003a08..8e2842a9c0b3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3079,8 +3079,6 @@ enum { DIO_SKIP_HOLES = 0x02, }; -void dio_end_io(struct bio *bio); - ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode, struct block_device *bdev, struct iov_iter *iter, get_block_t get_block, From patchwork Mon Sep 21 14:43:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790189 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A4E5E59D for ; Mon, 21 Sep 2020 14:44:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 89FAE221EC for ; Mon, 21 Sep 2020 14:44:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728287AbgIUOoT (ORCPT ); Mon, 21 Sep 2020 10:44:19 -0400 Received: from mx2.suse.de ([195.135.220.15]:55414 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727109AbgIUOoM (ORCPT ); Mon, 21 Sep 2020 10:44:12 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id A43DBAD6B; Mon, 21 Sep 2020 14:44:46 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues , Nikolay Borisov , Johannes Thumshirn Subject: [PATCH 02/15] btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK Date: Mon, 21 Sep 2020 09:43:40 -0500 Message-Id: <20200921144353.31319-3-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Since we now perform direct reads using i_rwsem, we can remove this inode flag used to co-ordinate unlocked reads. The truncate call takes i_rwsem. This means it is correctly synchronized with concurrent direct reads. Reviewed-by: Nikolay Borisov Reviewed-by: Johannes Thumshirn Signed-off-by: Goldwyn Rodrigues Reviewed-by: Christoph Hellwig Reviewed-by: Josef Bacik --- fs/btrfs/btrfs_inode.h | 18 ------------------ fs/btrfs/inode.c | 3 --- 2 files changed, 21 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 6fdb46d58299..738009a22320 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -33,7 +33,6 @@ enum { BTRFS_INODE_NEEDS_FULL_SYNC, BTRFS_INODE_COPY_EVERYTHING, BTRFS_INODE_IN_DELALLOC_LIST, - BTRFS_INODE_READDIO_NEED_LOCK, BTRFS_INODE_HAS_PROPS, BTRFS_INODE_SNAPSHOT_FLUSH, }; @@ -334,23 +333,6 @@ struct btrfs_dio_private { u8 csums[]; }; -/* - * Disable DIO read nolock optimization, so new dio readers will be forced - * to grab i_mutex. It is used to avoid the endless truncate due to - * nonlocked dio read. - */ -static inline void btrfs_inode_block_unlocked_dio(struct btrfs_inode *inode) -{ - set_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags); - smp_mb(); -} - -static inline void btrfs_inode_resume_unlocked_dio(struct btrfs_inode *inode) -{ - smp_mb__before_atomic(); - clear_bit(BTRFS_INODE_READDIO_NEED_LOCK, &inode->runtime_flags); -} - /* Array of bytes with variable length, hexadecimal format 0x1234 */ #define CSUM_FMT "0x%*phN" #define CSUM_FMT_VALUE(size, bytes) size, bytes diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 18d171b2d544..3db4697ff1ba 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4855,10 +4855,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr) truncate_setsize(inode, newsize); - /* Disable nonlocked read DIO to avoid the endless truncate */ - btrfs_inode_block_unlocked_dio(BTRFS_I(inode)); inode_dio_wait(inode); - btrfs_inode_resume_unlocked_dio(BTRFS_I(inode)); ret = btrfs_truncate(inode, newsize == oldsize); if (ret && inode->i_nlink) { From patchwork Mon Sep 21 14:43:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790197 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C168359D for ; Mon, 21 Sep 2020 14:44:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B36022311D for ; Mon, 21 Sep 2020 14:44:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728034AbgIUOoS (ORCPT ); Mon, 21 Sep 2020 10:44:18 -0400 Received: from mx2.suse.de ([195.135.220.15]:55520 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727446AbgIUOoP (ORCPT ); Mon, 21 Sep 2020 10:44:15 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 8F450AD85; Mon, 21 Sep 2020 14:44:49 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 03/15] iomap: Allow filesystem to call iomap_dio_complete without i_rwsem Date: Mon, 21 Sep 2020 09:43:41 -0500 Message-Id: <20200921144353.31319-4-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Christoph Hellwig This is to avoid the deadlock caused in btrfs because of O_DIRECT | O_DSYNC. Filesystems such as btrfs require i_rwsem while performing sync on a file. iomap_dio_rw() is called under i_rw_sem. This leads to a deadlock because of: iomap_dio_complete() generic_write_sync() btrfs_sync_file() Separate out iomap_dio_complete() from iomap_dio_rw(), so filesystems can call iomap_dio_complete() after unlocking i_rwsem. Signed-off-by: Goldwyn Rodrigues Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik --- fs/iomap/direct-io.c | 34 +++++++++++++++++++++++++--------- include/linux/iomap.h | 5 +++++ 2 files changed, 30 insertions(+), 9 deletions(-) diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index c1aafb2ab990..d970c6bbbe11 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -76,7 +76,7 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap, dio->submit.cookie = submit_bio(bio); } -static ssize_t iomap_dio_complete(struct iomap_dio *dio) +ssize_t iomap_dio_complete(struct iomap_dio *dio) { const struct iomap_dio_ops *dops = dio->dops; struct kiocb *iocb = dio->iocb; @@ -130,6 +130,7 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio) return ret; } +EXPORT_SYMBOL_GPL(iomap_dio_complete); static void iomap_dio_complete_work(struct work_struct *work) { @@ -406,8 +407,8 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length, * Returns -ENOTBLK In case of a page invalidation invalidation failure for * writes. The callers needs to fall back to buffered I/O in this case. */ -ssize_t -iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, +struct iomap_dio * +__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops, const struct iomap_dio_ops *dops, bool wait_for_completion) { @@ -421,14 +422,14 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, struct iomap_dio *dio; if (!count) - return 0; + return NULL; if (WARN_ON(is_sync_kiocb(iocb) && !wait_for_completion)) - return -EIO; + return ERR_PTR(-EIO); dio = kmalloc(sizeof(*dio), GFP_KERNEL); if (!dio) - return -ENOMEM; + return ERR_PTR(-ENOMEM); dio->iocb = iocb; atomic_set(&dio->ref, 1); @@ -558,7 +559,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, dio->wait_for_completion = wait_for_completion; if (!atomic_dec_and_test(&dio->ref)) { if (!wait_for_completion) - return -EIOCBQUEUED; + return ERR_PTR(-EIOCBQUEUED); for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); @@ -574,10 +575,25 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, __set_current_state(TASK_RUNNING); } - return iomap_dio_complete(dio); + return dio; out_free_dio: kfree(dio); - return ret; + return ERR_PTR(ret); +} +EXPORT_SYMBOL_GPL(__iomap_dio_rw); + +ssize_t +iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, + const struct iomap_ops *ops, const struct iomap_dio_ops *dops, + bool wait_for_completion) +{ + struct iomap_dio *dio; + + dio = __iomap_dio_rw(iocb, iter, ops, dops, wait_for_completion); + if (IS_ERR_OR_NULL(dio)) + return PTR_ERR_OR_ZERO(dio); + return iomap_dio_complete(dio); } EXPORT_SYMBOL_GPL(iomap_dio_rw); + diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 4d1d3c3469e9..172b3397a1a3 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -13,6 +13,7 @@ struct address_space; struct fiemap_extent_info; struct inode; +struct iomap_dio; struct iomap_writepage_ctx; struct iov_iter; struct kiocb; @@ -258,6 +259,10 @@ struct iomap_dio_ops { ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops, const struct iomap_dio_ops *dops, bool wait_for_completion); +struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, + const struct iomap_ops *ops, const struct iomap_dio_ops *dops, + bool wait_for_completion); +ssize_t iomap_dio_complete(struct iomap_dio *dio); int iomap_dio_iopoll(struct kiocb *kiocb, bool spin); #ifdef CONFIG_SWAP From patchwork Mon Sep 21 14:43:42 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790195 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2DF7B112E for ; Mon, 21 Sep 2020 14:44:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1536F221EC for ; Mon, 21 Sep 2020 14:44:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728291AbgIUOoY (ORCPT ); Mon, 21 Sep 2020 10:44:24 -0400 Received: from mx2.suse.de ([195.135.220.15]:55620 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727610AbgIUOoT (ORCPT ); Mon, 21 Sep 2020 10:44:19 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 55DB4AC4C; Mon, 21 Sep 2020 14:44:52 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 04/15] iomap: Call inode_dio_end() before generic_write_sync() Date: Mon, 21 Sep 2020 09:43:42 -0500 Message-Id: <20200921144353.31319-5-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues iomap complete routine can deadlock with btrfs_fallocate because of the call to generic_write_sync(). P0 P1 inode_lock() fallocate(FALLOC_FL_ZERO_RANGE) __iomap_dio_rw() inode_lock() inode_unlock() inode_dio_wait() iomap_dio_complete() generic_write_sync() btrfs_file_fsync() inode_lock() inode_dio_end() is used to notify the end of DIO data in order to synchronize with truncate. Call inode_dio_end() before calling generic_write_sync(), so filesystems can lock i_rwsem during a sync. Reviewed-by: Christoph Hellwig Reviewed-by: Josef Bacik --- fs/iomap/direct-io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index d970c6bbbe11..e01f81e7b76f 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -118,6 +118,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio) dio_warn_stale_pagecache(iocb->ki_filp); } + inode_dio_end(file_inode(iocb->ki_filp)); /* * If this is a DSYNC write, make sure we push it to stable storage now * that we've written data. @@ -125,7 +126,6 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio) if (ret > 0 && (dio->flags & IOMAP_DIO_NEED_SYNC)) ret = generic_write_sync(iocb, ret); - inode_dio_end(file_inode(iocb->ki_filp)); kfree(dio); return ret; From patchwork Mon Sep 21 14:43:43 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790211 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8EC9359D for ; Mon, 21 Sep 2020 14:44:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 78EE22076E for ; Mon, 21 Sep 2020 14:44:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727337AbgIUOoe (ORCPT ); Mon, 21 Sep 2020 10:44:34 -0400 Received: from mx2.suse.de ([195.135.220.15]:55760 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728289AbgIUOoW (ORCPT ); Mon, 21 Sep 2020 10:44:22 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 4FE8BAD76; Mon, 21 Sep 2020 14:44:55 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 05/15] btrfs: split btrfs_direct_IO to read and write Date: Mon, 21 Sep 2020 09:43:43 -0500 Message-Id: <20200921144353.31319-6-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues The read and write DIO don't have anything in common except for the call to iomap_dio_rw. Extract the write call into a new function to get rid of conditional statements for direct write. Originally proposed by Christoph Hellwig Signed-off-by: Goldwyn Rodrigues Reviewed-by: Christoph Hellwig Reviewed-by: Josef Bacik --- fs/btrfs/ctree.h | 5 ++- fs/btrfs/file.c | 98 +++++++++++++++++++++++++++++++++++++++++++----- fs/btrfs/inode.c | 91 ++------------------------------------------ 3 files changed, 95 insertions(+), 99 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index cd644c755142..b47a8dcff028 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -28,6 +28,7 @@ #include #include #include +#include #include "extent-io-tree.h" #include "extent_io.h" #include "extent_map.h" @@ -3048,7 +3049,9 @@ int btrfs_writepage_cow_fixup(struct page *page, u64 start, u64 end); void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start, u64 end, int uptodate); extern const struct dentry_operations btrfs_dentry_operations; -ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter); +extern const struct iomap_ops btrfs_dio_iomap_ops; +extern const struct iomap_dio_ops btrfs_dio_ops; +extern const struct iomap_dio_ops btrfs_sync_dops; /* ioctl.c */ long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg); diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 038e0afaf3d0..910e2fd234a9 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1855,21 +1855,68 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, return num_written ? num_written : ret; } -static ssize_t __btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) +static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info, + const struct iov_iter *iter, loff_t offset) +{ + const unsigned int blocksize_mask = fs_info->sectorsize - 1; + + if (offset & blocksize_mask) + return -EINVAL; + + if (iov_iter_alignment(iter) & blocksize_mask) + return -EINVAL; + + return 0; +} + +static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) { struct file *file = iocb->ki_filp; struct inode *inode = file_inode(file); - loff_t pos; - ssize_t written; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + loff_t pos = iocb->ki_pos; + ssize_t written = 0; + bool relock = false; ssize_t written_buffered; loff_t endbyte; int err; - written = btrfs_direct_IO(iocb, from); + if (check_direct_IO(fs_info, from, pos)) + goto buffered; + + /* + * If the write DIO is beyond the EOF, we need update + * the isize, but it is protected by i_mutex. So we can + * not unlock the i_mutex at this case. + */ + if (pos + iov_iter_count(from) <= inode->i_size) { + inode_unlock(inode); + relock = true; + } + down_read(&BTRFS_I(inode)->dio_sem); + + /* + * We have are actually a sync iocb, so we need our fancy endio to know + * if we need to sync. + */ + if (current->journal_info) + written = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, + &btrfs_sync_dops, is_sync_kiocb(iocb)); + else + written = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, + &btrfs_dio_ops, is_sync_kiocb(iocb)); + + if (written == -ENOTBLK) + written = 0; + + up_read(&BTRFS_I(inode)->dio_sem); + if (relock) + inode_lock(inode); if (written < 0 || !iov_iter_count(from)) return written; +buffered: pos = iocb->ki_pos; written_buffered = btrfs_buffered_write(iocb, from); if (written_buffered < 0) { @@ -2043,7 +2090,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, iocb->ki_flags &= ~IOCB_DSYNC; current->journal_info = BTRFS_DIO_SYNC_STUB; } - num_written = __btrfs_direct_write(iocb, from); + num_written = btrfs_direct_write(iocb, from); /* * As stated above, we cleared journal_info, so we need to do @@ -3618,16 +3665,47 @@ static int btrfs_file_open(struct inode *inode, struct file *filp) return generic_file_open(inode, filp); } +static int check_direct_read(struct btrfs_fs_info *fs_info, + const struct iov_iter *iter, loff_t offset) +{ + int ret; + int i, seg; + + ret = check_direct_IO(fs_info, iter, offset); + if (ret < 0) + return ret; + + if (!iter_is_iovec(iter)) + return 0; + + for (seg = 0; seg < iter->nr_segs; seg++) + for (i = seg + 1; i < iter->nr_segs; i++) + if (iter->iov[seg].iov_base == iter->iov[i].iov_base) + return -EINVAL; + return 0; +} + +static ssize_t btrfs_direct_read(struct kiocb *iocb, struct iov_iter *to) +{ + struct inode *inode = file_inode(iocb->ki_filp); + ssize_t ret; + + if (check_direct_read(btrfs_sb(inode->i_sb), to, iocb->ki_pos)) + return 0; + + inode_lock_shared(inode); + ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops, &btrfs_dio_ops, + is_sync_kiocb(iocb)); + inode_unlock_shared(inode); + return ret; +} + static ssize_t btrfs_file_read_iter(struct kiocb *iocb, struct iov_iter *to) { ssize_t ret = 0; if (iocb->ki_flags & IOCB_DIRECT) { - struct inode *inode = file_inode(iocb->ki_filp); - - inode_lock_shared(inode); - ret = btrfs_direct_IO(iocb, to); - inode_unlock_shared(inode); + ret = btrfs_direct_read(iocb, to); if (ret < 0) return ret; } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 3db4697ff1ba..0730131b6590 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7898,39 +7898,6 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, return BLK_QC_T_NONE; } -static ssize_t check_direct_IO(struct btrfs_fs_info *fs_info, - const struct iov_iter *iter, loff_t offset) -{ - int seg; - int i; - unsigned int blocksize_mask = fs_info->sectorsize - 1; - ssize_t retval = -EINVAL; - - if (offset & blocksize_mask) - goto out; - - if (iov_iter_alignment(iter) & blocksize_mask) - goto out; - - /* If this is a write we don't need to check anymore */ - if (iov_iter_rw(iter) != READ || !iter_is_iovec(iter)) - return 0; - /* - * Check to make sure we don't have duplicate iov_base's in this - * iovec, if so return EINVAL, otherwise we'll get csum errors - * when reading back. - */ - for (seg = 0; seg < iter->nr_segs; seg++) { - for (i = seg + 1; i < iter->nr_segs; i++) { - if (iter->iov[seg].iov_base == iter->iov[i].iov_base) - goto out; - } - } - retval = 0; -out: - return retval; -} - static inline int btrfs_maybe_fsync_end_io(struct kiocb *iocb, ssize_t size, int error, unsigned flags) { @@ -7955,72 +7922,20 @@ static inline int btrfs_maybe_fsync_end_io(struct kiocb *iocb, ssize_t size, return 0; } -static const struct iomap_ops btrfs_dio_iomap_ops = { +const struct iomap_ops btrfs_dio_iomap_ops = { .iomap_begin = btrfs_dio_iomap_begin, .iomap_end = btrfs_dio_iomap_end, }; -static const struct iomap_dio_ops btrfs_dio_ops = { +const struct iomap_dio_ops btrfs_dio_ops = { .submit_io = btrfs_submit_direct, }; -static const struct iomap_dio_ops btrfs_sync_dops = { +const struct iomap_dio_ops btrfs_sync_dops = { .submit_io = btrfs_submit_direct, .end_io = btrfs_maybe_fsync_end_io, }; -ssize_t btrfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter) -{ - struct file *file = iocb->ki_filp; - struct inode *inode = file->f_mapping->host; - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct extent_changeset *data_reserved = NULL; - loff_t offset = iocb->ki_pos; - size_t count = 0; - bool relock = false; - ssize_t ret; - - if (check_direct_IO(fs_info, iter, offset)) - return 0; - - count = iov_iter_count(iter); - if (iov_iter_rw(iter) == WRITE) { - /* - * If the write DIO is beyond the EOF, we need update - * the isize, but it is protected by i_mutex. So we can - * not unlock the i_mutex at this case. - */ - if (offset + count <= inode->i_size) { - inode_unlock(inode); - relock = true; - } - down_read(&BTRFS_I(inode)->dio_sem); - } - - /* - * We have are actually a sync iocb, so we need our fancy endio to know - * if we need to sync. - */ - if (current->journal_info) - ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, - &btrfs_sync_dops, is_sync_kiocb(iocb)); - else - ret = iomap_dio_rw(iocb, iter, &btrfs_dio_iomap_ops, - &btrfs_dio_ops, is_sync_kiocb(iocb)); - - if (ret == -ENOTBLK) - ret = 0; - - if (iov_iter_rw(iter) == WRITE) - up_read(&BTRFS_I(inode)->dio_sem); - - if (relock) - inode_lock(inode); - - extent_changeset_free(data_reserved); - return ret; -} - static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len) { From patchwork Mon Sep 21 14:43:44 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790219 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A120459D for ; Mon, 21 Sep 2020 14:44:40 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8FB1C2311D for ; Mon, 21 Sep 2020 14:44:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728303AbgIUOod (ORCPT ); Mon, 21 Sep 2020 10:44:33 -0400 Received: from mx2.suse.de ([195.135.220.15]:55850 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727948AbgIUOoY (ORCPT ); Mon, 21 Sep 2020 10:44:24 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 5DB40AD5F; Mon, 21 Sep 2020 14:44:58 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 06/15] btrfs: Move pos increment and pagecache extension to btrfs_buffered_write() Date: Mon, 21 Sep 2020 09:43:44 -0500 Message-Id: <20200921144353.31319-7-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues While we do this, correct the call to pagecache_isize_extended(): - pagecache_isisze_extended needs to be called to the starting of the write as opposed to i_size - We don't need to check range before the call, this is done in the function Signed-off-by: Goldwyn Rodrigues Reviewed-by: Christoph Hellwig Reviewed-by: Josef Bacik --- fs/btrfs/file.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 910e2fd234a9..4c40a2742aab 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1632,6 +1632,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, int ret = 0; bool only_release_metadata = false; bool force_page_uptodate = false; + loff_t old_isize = i_size_read(inode); nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE), PAGE_SIZE / (sizeof(struct page *))); @@ -1852,6 +1853,10 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, } extent_changeset_free(data_reserved); + if (num_written > 0) { + pagecache_isize_extended(inode, old_isize, iocb->ki_pos); + iocb->ki_pos += num_written; + } return num_written ? num_written : ret; } @@ -1975,7 +1980,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, loff_t pos; size_t count; loff_t oldsize; - int clean_page = 0; if (!(iocb->ki_flags & IOCB_DIRECT) && (iocb->ki_flags & IOCB_NOWAIT)) @@ -2057,8 +2061,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, inode_unlock(inode); goto out; } - if (start_pos > round_up(oldsize, fs_info->sectorsize)) - clean_page = 1; } if (sync) @@ -2101,11 +2103,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, current->journal_info = NULL; } else { num_written = btrfs_buffered_write(iocb, from); - if (num_written > 0) - iocb->ki_pos = pos + num_written; - if (clean_page) - pagecache_isize_extended(inode, oldsize, - i_size_read(inode)); } inode_unlock(inode); From patchwork Mon Sep 21 14:43:45 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790201 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E250D59D for ; Mon, 21 Sep 2020 14:44:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C9AA5221EC for ; Mon, 21 Sep 2020 14:44:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728299AbgIUOod (ORCPT ); Mon, 21 Sep 2020 10:44:33 -0400 Received: from mx2.suse.de ([195.135.220.15]:55954 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727610AbgIUOo1 (ORCPT ); Mon, 21 Sep 2020 10:44:27 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 2721AAD8D; Mon, 21 Sep 2020 14:45:01 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 07/15] btrfs: Move FS error state bit early during write Date: Mon, 21 Sep 2020 09:43:45 -0500 Message-Id: <20200921144353.31319-8-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues fs_info->fs_state is a filesystem bit check as opposed to inode and can be performed before we begin with write checks. This eliminates inode lock/unlock in case of error bit is set. Signed-off-by: Goldwyn Rodrigues Reviewed-by: Josef Bacik --- fs/btrfs/file.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 4c40a2742aab..ca374cb5ffc9 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1981,6 +1981,15 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, size_t count; loff_t oldsize; + /* + * If BTRFS flips readonly due to some impossible error + * (fs_info->fs_state now has BTRFS_SUPER_FLAG_ERROR), + * although we have opened a file as writable, we have + * to stop this write operation to ensure FS consistency. + */ + if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) + return -EROFS; + if (!(iocb->ki_flags & IOCB_DIRECT) && (iocb->ki_flags & IOCB_NOWAIT)) return -EOPNOTSUPP; @@ -2030,18 +2039,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, goto out; } - /* - * If BTRFS flips readonly due to some impossible error - * (fs_info->fs_state now has BTRFS_SUPER_FLAG_ERROR), - * although we have opened a file as writable, we have - * to stop this write operation to ensure FS consistency. - */ - if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) { - inode_unlock(inode); - err = -EROFS; - goto out; - } - /* * We reserve space for updating the inode when we reserve space for the * extent we are going to write, so we will enospc out there. We don't From patchwork Mon Sep 21 14:43:46 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790245 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EE4046CA for ; Mon, 21 Sep 2020 14:45:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DD9E42311D for ; Mon, 21 Sep 2020 14:45:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728293AbgIUOoc (ORCPT ); Mon, 21 Sep 2020 10:44:32 -0400 Received: from mx2.suse.de ([195.135.220.15]:56006 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727760AbgIUOo3 (ORCPT ); Mon, 21 Sep 2020 10:44:29 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 3A0CAAD6B; Mon, 21 Sep 2020 14:45:04 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 08/15] btrfs: Introduce btrfs_write_check() Date: Mon, 21 Sep 2020 09:43:46 -0500 Message-Id: <20200921144353.31319-9-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues btrfs_write_check() checks for all parameters in one place before beginning a write. This does away with inode_unlock() after every check. In the later patches, it will help push inode_lock/unlock() in buffered and direct write functions respectively. Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/file.c | 159 ++++++++++++++++++++++++------------------------ 1 file changed, 81 insertions(+), 78 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index ca374cb5ffc9..0f961ce1fa98 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1615,6 +1615,85 @@ void btrfs_check_nocow_unlock(struct btrfs_inode *inode) btrfs_drew_write_unlock(&inode->root->snapshot_lock); } +static void update_time_for_write(struct inode *inode) +{ + struct timespec64 now; + + if (IS_NOCMTIME(inode)) + return; + + now = current_time(inode); + if (!timespec64_equal(&inode->i_mtime, &now)) + inode->i_mtime = now; + + if (!timespec64_equal(&inode->i_ctime, &now)) + inode->i_ctime = now; + + if (IS_I_VERSION(inode)) + inode_inc_iversion(inode); +} + +static size_t btrfs_write_check(struct kiocb *iocb, struct iov_iter *from) +{ + struct file *file = iocb->ki_filp; + struct inode *inode = file_inode(file); + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + loff_t pos = iocb->ki_pos; + size_t count = iov_iter_count(from); + int err; + loff_t oldsize; + loff_t start_pos; + + err = generic_write_checks(iocb, from); + if (err <= 0) + return err; + + if (iocb->ki_flags & IOCB_NOWAIT) { + size_t nocow_bytes = count; + + /* + * We will allocate space in case nodatacow is not set, + * so bail + */ + if (check_nocow_nolock(BTRFS_I(inode), pos, &nocow_bytes) + <= 0) + return -EAGAIN; + /* + * There are holes in the range or parts of the range that must + * be COWed (shared extents, RO block groups, etc), so just bail + * out. + */ + if (nocow_bytes < count) + return -EAGAIN; + } + + current->backing_dev_info = inode_to_bdi(inode); + err = file_remove_privs(file); + if (err) + return err; + + /* + * We reserve space for updating the inode when we reserve space for the + * extent we are going to write, so we will enospc out there. We don't + * need to start yet another transaction to update the inode as we will + * update the inode when we finish writing whatever data we write. + */ + update_time_for_write(inode); + + start_pos = round_down(pos, fs_info->sectorsize); + oldsize = i_size_read(inode); + if (start_pos > oldsize) { + /* Expand hole size to cover write data, preventing empty gap */ + loff_t end_pos = round_up(pos + count, + fs_info->sectorsize); + err = btrfs_cont_expand(inode, oldsize, end_pos); + if (err) + return err; + } + + return count; +} + static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i) { @@ -1947,24 +2026,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) return written ? written : err; } -static void update_time_for_write(struct inode *inode) -{ - struct timespec64 now; - - if (IS_NOCMTIME(inode)) - return; - - now = current_time(inode); - if (!timespec64_equal(&inode->i_mtime, &now)) - inode->i_mtime = now; - - if (!timespec64_equal(&inode->i_ctime, &now)) - inode->i_ctime = now; - - if (IS_I_VERSION(inode)) - inode_inc_iversion(inode); -} - static ssize_t btrfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) { @@ -1972,14 +2033,9 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, struct inode *inode = file_inode(file); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct btrfs_root *root = BTRFS_I(inode)->root; - u64 start_pos; - u64 end_pos; ssize_t num_written = 0; const bool sync = iocb->ki_flags & IOCB_DSYNC; ssize_t err; - loff_t pos; - size_t count; - loff_t oldsize; /* * If BTRFS flips readonly due to some impossible error @@ -2001,65 +2057,12 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, inode_lock(inode); } - err = generic_write_checks(iocb, from); + err = btrfs_write_check(iocb, from); if (err <= 0) { inode_unlock(inode); return err; } - pos = iocb->ki_pos; - count = iov_iter_count(from); - if (iocb->ki_flags & IOCB_NOWAIT) { - size_t nocow_bytes = count; - - /* - * We will allocate space in case nodatacow is not set, - * so bail - */ - if (check_nocow_nolock(BTRFS_I(inode), pos, &nocow_bytes) - <= 0) { - inode_unlock(inode); - return -EAGAIN; - } - /* - * There are holes in the range or parts of the range that must - * be COWed (shared extents, RO block groups, etc), so just bail - * out. - */ - if (nocow_bytes < count) { - inode_unlock(inode); - return -EAGAIN; - } - } - - current->backing_dev_info = inode_to_bdi(inode); - err = file_remove_privs(file); - if (err) { - inode_unlock(inode); - goto out; - } - - /* - * We reserve space for updating the inode when we reserve space for the - * extent we are going to write, so we will enospc out there. We don't - * need to start yet another transaction to update the inode as we will - * update the inode when we finish writing whatever data we write. - */ - update_time_for_write(inode); - - start_pos = round_down(pos, fs_info->sectorsize); - oldsize = i_size_read(inode); - if (start_pos > oldsize) { - /* Expand hole size to cover write data, preventing empty gap */ - end_pos = round_up(pos + count, - fs_info->sectorsize); - err = btrfs_cont_expand(inode, oldsize, end_pos); - if (err) { - inode_unlock(inode); - goto out; - } - } - if (sync) atomic_inc(&BTRFS_I(inode)->sync_writers); @@ -2117,7 +2120,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, if (sync) atomic_dec(&BTRFS_I(inode)->sync_writers); -out: + current->backing_dev_info = NULL; return num_written ? num_written : err; } From patchwork Mon Sep 21 14:43:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790209 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3C47B112E for ; Mon, 21 Sep 2020 14:44:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2CF50221EC for ; Mon, 21 Sep 2020 14:44:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728313AbgIUOof (ORCPT ); Mon, 21 Sep 2020 10:44:35 -0400 Received: from mx2.suse.de ([195.135.220.15]:56074 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727815AbgIUOod (ORCPT ); Mon, 21 Sep 2020 10:44:33 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 1C9BCAF3F; Mon, 21 Sep 2020 14:45:07 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 09/15] btrfs: Introduce btrfs_inode_lock()/unlock() Date: Mon, 21 Sep 2020 09:43:47 -0500 Message-Id: <20200921144353.31319-10-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Helper functions for locking/unlocking i_rwsem. btrfs_inode_lock/unlock() acquires the inode->i_rwsem depending on the flags passed. ilock_flags determines the type of lock to be taken: BTRFS_ILOCK_SHARED - for shared locks, for possible parallel DIO BTRFS_ILOCK_TRY - for the RWF_NOWAIT sequence Signed-off-by: Goldwyn Rodrigues Reviewed-by: Josef Bacik --- fs/btrfs/ctree.h | 7 +++++++ fs/btrfs/file.c | 31 ++++++++++++++++--------------- fs/btrfs/inode.c | 31 +++++++++++++++++++++++++++++++ 3 files changed, 54 insertions(+), 15 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index b47a8dcff028..ea15771bf3da 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3053,6 +3053,13 @@ extern const struct iomap_ops btrfs_dio_iomap_ops; extern const struct iomap_dio_ops btrfs_dio_ops; extern const struct iomap_dio_ops btrfs_sync_dops; +/* ilock flags definition */ +#define BTRFS_ILOCK_SHARED (1 << 0) +#define BTRFS_ILOCK_TRY (1 << 1) + +int btrfs_inode_lock(struct inode *inode, int ilock_flags); +void btrfs_inode_unlock(struct inode *inode, int ilock_flags); + /* ioctl.c */ long btrfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg); long btrfs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg); diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 0f961ce1fa98..7e18334e8121 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1974,7 +1974,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) * not unlock the i_mutex at this case. */ if (pos + iov_iter_count(from) <= inode->i_size) { - inode_unlock(inode); + btrfs_inode_unlock(inode, 0); relock = true; } down_read(&BTRFS_I(inode)->dio_sem); @@ -1995,7 +1995,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) up_read(&BTRFS_I(inode)->dio_sem); if (relock) - inode_lock(inode); + btrfs_inode_lock(inode, 0); if (written < 0 || !iov_iter_count(from)) return written; @@ -2036,6 +2036,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, ssize_t num_written = 0; const bool sync = iocb->ki_flags & IOCB_DSYNC; ssize_t err; + int ilock_flags = 0; /* * If BTRFS flips readonly due to some impossible error @@ -2050,16 +2051,16 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, (iocb->ki_flags & IOCB_NOWAIT)) return -EOPNOTSUPP; - if (iocb->ki_flags & IOCB_NOWAIT) { - if (!inode_trylock(inode)) - return -EAGAIN; - } else { - inode_lock(inode); - } + if (iocb->ki_flags & IOCB_NOWAIT) + ilock_flags |= BTRFS_ILOCK_TRY; + + err = btrfs_inode_lock(inode, ilock_flags); + if (err < 0) + return err; err = btrfs_write_check(iocb, from); if (err <= 0) { - inode_unlock(inode); + btrfs_inode_unlock(inode, ilock_flags); return err; } @@ -2105,7 +2106,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, num_written = btrfs_buffered_write(iocb, from); } - inode_unlock(inode); + btrfs_inode_unlock(inode, ilock_flags); /* * We also have to set last_sub_trans to the current log transid, @@ -3405,7 +3406,7 @@ static long btrfs_fallocate(struct file *file, int mode, return ret; } - inode_lock(inode); + btrfs_inode_lock(inode, 0); if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + len > inode->i_size) { ret = inode_newsize_ok(inode, offset + len); @@ -3644,9 +3645,9 @@ static loff_t btrfs_file_llseek(struct file *file, loff_t offset, int whence) return generic_file_llseek(file, offset, whence); case SEEK_DATA: case SEEK_HOLE: - inode_lock_shared(inode); + btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED); offset = find_desired_extent(inode, offset, whence); - inode_unlock_shared(inode); + btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); break; } @@ -3690,10 +3691,10 @@ static ssize_t btrfs_direct_read(struct kiocb *iocb, struct iov_iter *to) if (check_direct_read(btrfs_sb(inode->i_sb), to, iocb->ki_pos)) return 0; - inode_lock_shared(inode); + btrfs_inode_lock(inode, BTRFS_ILOCK_SHARED); ret = iomap_dio_rw(iocb, to, &btrfs_dio_iomap_ops, &btrfs_dio_ops, is_sync_kiocb(iocb)); - inode_unlock_shared(inode); + btrfs_inode_unlock(inode, BTRFS_ILOCK_SHARED); return ret; } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 0730131b6590..f305efac75ae 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -96,6 +96,37 @@ static void __endio_write_update_ordered(struct btrfs_inode *inode, const u64 offset, const u64 bytes, const bool uptodate); +int btrfs_inode_lock(struct inode *inode, int ilock_flags) +{ + if (ilock_flags & BTRFS_ILOCK_SHARED) { + if (ilock_flags & BTRFS_ILOCK_TRY) { + if (!inode_trylock_shared(inode)) + return -EAGAIN; + else + return 0; + } + inode_lock_shared(inode); + } else { + if (ilock_flags & BTRFS_ILOCK_TRY) { + if (!inode_trylock(inode)) + return -EAGAIN; + else + return 0; + } + inode_lock(inode); + } + return 0; +} + +void btrfs_inode_unlock(struct inode *inode, int ilock_flags) +{ + if (ilock_flags & BTRFS_ILOCK_SHARED) + inode_unlock_shared(inode); + else + inode_unlock(inode); + +} + /* * Cleanup all submitted ordered extents in specified range to handle errors * from the btrfs_run_delalloc_range() callback. From patchwork Mon Sep 21 14:43:48 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790213 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B0AC4618 for ; Mon, 21 Sep 2020 14:44:38 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A26492311D for ; Mon, 21 Sep 2020 14:44:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728317AbgIUOoh (ORCPT ); Mon, 21 Sep 2020 10:44:37 -0400 Received: from mx2.suse.de ([195.135.220.15]:56140 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728309AbgIUOog (ORCPT ); Mon, 21 Sep 2020 10:44:36 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id E9EA9AF76; Mon, 21 Sep 2020 14:45:09 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 10/15] btrfs: Push inode locking and unlocking into buffered/direct write Date: Mon, 21 Sep 2020 09:43:48 -0500 Message-Id: <20200921144353.31319-11-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Push inode locking and unlocking closer to where we perform the I/O. For this we need to move the write checks inside the respective functions as well. pos is assigned after write checks because generic_write_check can modify iocb->ki_pos. Signed-off-by: Goldwyn Rodrigues Reviewed-by: Josef Bacik --- fs/btrfs/file.c | 67 ++++++++++++++++++++++++++++++++----------------- 1 file changed, 44 insertions(+), 23 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 7e18334e8121..d9c3be19d7b3 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1698,7 +1698,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, struct iov_iter *i) { struct file *file = iocb->ki_filp; - loff_t pos = iocb->ki_pos; + loff_t pos; struct inode *inode = file_inode(file); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct page **pages = NULL; @@ -1712,14 +1712,29 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, bool only_release_metadata = false; bool force_page_uptodate = false; loff_t old_isize = i_size_read(inode); + int ilock_flags = 0; + + if (iocb->ki_flags & IOCB_NOWAIT) + ilock_flags |= BTRFS_ILOCK_TRY; + + ret = btrfs_inode_lock(inode, ilock_flags); + if (ret < 0) + return ret; + + ret = btrfs_write_check(iocb, i); + if (ret <= 0) + goto out; + pos = iocb->ki_pos; nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE), PAGE_SIZE / (sizeof(struct page *))); nrptrs = min(nrptrs, current->nr_dirtied_pause - current->nr_dirtied); nrptrs = max(nrptrs, 8); pages = kmalloc_array(nrptrs, sizeof(struct page *), GFP_KERNEL); - if (!pages) - return -ENOMEM; + if (!pages) { + ret = -ENOMEM; + goto out; + } while (iov_iter_count(i) > 0) { struct extent_state *cached_state = NULL; @@ -1936,6 +1951,8 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, pagecache_isize_extended(inode, old_isize, iocb->ki_pos); iocb->ki_pos += num_written; } +out: + btrfs_inode_unlock(inode, ilock_flags); return num_written ? num_written : ret; } @@ -1958,15 +1975,33 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) struct file *file = iocb->ki_filp; struct inode *inode = file_inode(file); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - loff_t pos = iocb->ki_pos; + loff_t pos; ssize_t written = 0; bool relock = false; ssize_t written_buffered; loff_t endbyte; int err; + int ilock_flags = 0; + + if (iocb->ki_flags & IOCB_NOWAIT) + ilock_flags |= BTRFS_ILOCK_TRY; + + err = btrfs_inode_lock(inode, ilock_flags); + if (err < 0) + return err; + + err = btrfs_write_check(iocb, from); + if (err <= 0) { + btrfs_inode_unlock(inode, ilock_flags); + goto out; + } + + pos = iocb->ki_pos; - if (check_direct_IO(fs_info, from, pos)) + if (check_direct_IO(fs_info, from, pos)) { + btrfs_inode_unlock(inode, ilock_flags); goto buffered; + } /* * If the write DIO is beyond the EOF, we need update @@ -1997,8 +2032,10 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) if (relock) btrfs_inode_lock(inode, 0); - if (written < 0 || !iov_iter_count(from)) - return written; + if (written < 0 || !iov_iter_count(from)) { + err = written; + goto out; + } buffered: pos = iocb->ki_pos; @@ -2036,7 +2073,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, ssize_t num_written = 0; const bool sync = iocb->ki_flags & IOCB_DSYNC; ssize_t err; - int ilock_flags = 0; /* * If BTRFS flips readonly due to some impossible error @@ -2051,19 +2087,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, (iocb->ki_flags & IOCB_NOWAIT)) return -EOPNOTSUPP; - if (iocb->ki_flags & IOCB_NOWAIT) - ilock_flags |= BTRFS_ILOCK_TRY; - - err = btrfs_inode_lock(inode, ilock_flags); - if (err < 0) - return err; - - err = btrfs_write_check(iocb, from); - if (err <= 0) { - btrfs_inode_unlock(inode, ilock_flags); - return err; - } - if (sync) atomic_inc(&BTRFS_I(inode)->sync_writers); @@ -2106,8 +2129,6 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, num_written = btrfs_buffered_write(iocb, from); } - btrfs_inode_unlock(inode, ilock_flags); - /* * We also have to set last_sub_trans to the current log transid, * otherwise subsequent syncs to a file that's been synced in this From patchwork Mon Sep 21 14:43:49 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790223 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AD21C59D for ; Mon, 21 Sep 2020 14:44:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 959932388B for ; Mon, 21 Sep 2020 14:44:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727860AbgIUOor (ORCPT ); Mon, 21 Sep 2020 10:44:47 -0400 Received: from mx2.suse.de ([195.135.220.15]:56232 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728314AbgIUOoi (ORCPT ); Mon, 21 Sep 2020 10:44:38 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id CAA5AACA1; Mon, 21 Sep 2020 14:45:12 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 11/15] btrfs: Use inode_lock_shared() for direct writes within EOF Date: Mon, 21 Sep 2020 09:43:49 -0500 Message-Id: <20200921144353.31319-12-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues Direct writes within EOF are safe to be performed with inode shared lock to improve parallelization with other direct writes or reads because EOF is not changed and there is no race with truncate(). Direct reads are already performed under shared inode lock. This patch is precursor to removing btrfs_inode->dio_sem. Signed-off-by: Goldwyn Rodrigues --- fs/btrfs/file.c | 33 +++++++++++++++++++++------------ 1 file changed, 21 insertions(+), 12 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index d9c3be19d7b3..50092d24eee2 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1977,7 +1977,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); loff_t pos; ssize_t written = 0; - bool relock = false; ssize_t written_buffered; loff_t endbyte; int err; @@ -1986,6 +1985,15 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) if (iocb->ki_flags & IOCB_NOWAIT) ilock_flags |= BTRFS_ILOCK_TRY; + /* + * If the write DIO within EOF, use a shared lock + */ + if (iocb->ki_pos + iov_iter_count(from) <= i_size_read(inode)) + ilock_flags |= BTRFS_ILOCK_SHARED; + else if (iocb->ki_flags & IOCB_NOWAIT) + return -EAGAIN; + +relock: err = btrfs_inode_lock(inode, ilock_flags); if (err < 0) return err; @@ -1997,21 +2005,23 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) } pos = iocb->ki_pos; + /* + * Re-check since file size may have changed + * just before taking the lock or pos may have changed + * because of O_APPEND in generic_write_check() + */ + if ((ilock_flags & BTRFS_ILOCK_SHARED) && + pos + iov_iter_count(from) > i_size_read(inode)) { + btrfs_inode_unlock(inode, ilock_flags); + ilock_flags &= ~BTRFS_ILOCK_SHARED; + goto relock; + } if (check_direct_IO(fs_info, from, pos)) { btrfs_inode_unlock(inode, ilock_flags); goto buffered; } - /* - * If the write DIO is beyond the EOF, we need update - * the isize, but it is protected by i_mutex. So we can - * not unlock the i_mutex at this case. - */ - if (pos + iov_iter_count(from) <= inode->i_size) { - btrfs_inode_unlock(inode, 0); - relock = true; - } down_read(&BTRFS_I(inode)->dio_sem); /* @@ -2029,8 +2039,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) written = 0; up_read(&BTRFS_I(inode)->dio_sem); - if (relock) - btrfs_inode_lock(inode, 0); + btrfs_inode_unlock(inode, ilock_flags); if (written < 0 || !iov_iter_count(from)) { err = written; From patchwork Mon Sep 21 14:43:50 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790227 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CAEF4112E for ; Mon, 21 Sep 2020 14:44:49 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BC5A5235FC for ; Mon, 21 Sep 2020 14:44:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728327AbgIUOoq (ORCPT ); Mon, 21 Sep 2020 10:44:46 -0400 Received: from mx2.suse.de ([195.135.220.15]:56298 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727131AbgIUOol (ORCPT ); Mon, 21 Sep 2020 10:44:41 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 4A3B3B040; Mon, 21 Sep 2020 14:45:15 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 12/15] btrfs: Remove dio_sem Date: Mon, 21 Sep 2020 09:43:50 -0500 Message-Id: <20200921144353.31319-13-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues dio_sem can be eliminated because all DIO synchronization is performed through inode->i_rwsem now. Signed-off-by: Goldwyn Rodrigues Reviewed-by: Josef Bacik --- fs/btrfs/btrfs_inode.h | 10 ---------- fs/btrfs/file.c | 13 ------------- fs/btrfs/inode.c | 1 - 3 files changed, 24 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 738009a22320..176abce59467 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -203,16 +203,6 @@ struct btrfs_inode { /* Hook into fs_info->delayed_iputs */ struct list_head delayed_iput; - /* - * To avoid races between lockless (i_mutex not held) direct IO writes - * and concurrent fsync requests. Direct IO writes must acquire read - * access on this semaphore for creating an extent map and its - * corresponding ordered extent. The fast fsync path must acquire write - * access on this semaphore before it collects ordered extents and - * extent maps. - */ - struct rw_semaphore dio_sem; - struct inode vfs_inode; }; diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 50092d24eee2..193af84f5405 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2022,8 +2022,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) goto buffered; } - down_read(&BTRFS_I(inode)->dio_sem); - /* * We have are actually a sync iocb, so we need our fancy endio to know * if we need to sync. @@ -2038,7 +2036,6 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) if (written == -ENOTBLK) written = 0; - up_read(&BTRFS_I(inode)->dio_sem); btrfs_inode_unlock(inode, ilock_flags); if (written < 0 || !iov_iter_count(from)) { @@ -2248,13 +2245,6 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) inode_lock(inode); - /* - * We take the dio_sem here because the tree log stuff can race with - * lockless dio writes and get an extent map logged for an extent we - * never waited on. We need it this high up for lockdep reasons. - */ - down_write(&BTRFS_I(inode)->dio_sem); - atomic_inc(&root->log_batch); /* @@ -2285,7 +2275,6 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) */ ret = start_ordered_ops(inode, start, end); if (ret) { - up_write(&BTRFS_I(inode)->dio_sem); inode_unlock(inode); goto out; } @@ -2382,7 +2371,6 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) * file again, but that will end up using the synchronization * inside btrfs_sync_log to keep things safe. */ - up_write(&BTRFS_I(inode)->dio_sem); inode_unlock(inode); if (ret != BTRFS_NO_LOG_SYNC) { @@ -2413,7 +2401,6 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync) out_release_extents: btrfs_release_log_ctx_extents(&ctx); - up_write(&BTRFS_I(inode)->dio_sem); inode_unlock(inode); goto out; } diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index f305efac75ae..96ff8e4a3b45 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8580,7 +8580,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) INIT_LIST_HEAD(&ei->delalloc_inodes); INIT_LIST_HEAD(&ei->delayed_iput); RB_CLEAR_NODE(&ei->rb_node); - init_rwsem(&ei->dio_sem); return inode; } From patchwork Mon Sep 21 14:43:51 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790231 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 59DE059D for ; Mon, 21 Sep 2020 14:44:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 495FB221EC for ; Mon, 21 Sep 2020 14:44:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728310AbgIUOop (ORCPT ); Mon, 21 Sep 2020 10:44:45 -0400 Received: from mx2.suse.de ([195.135.220.15]:56360 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727954AbgIUOon (ORCPT ); Mon, 21 Sep 2020 10:44:43 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 3C4DFAF90; Mon, 21 Sep 2020 14:45:18 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 13/15] btrfs: Call iomap_dio_complete() without inode_lock Date: Mon, 21 Sep 2020 09:43:51 -0500 Message-Id: <20200921144353.31319-14-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues If direct writes are called with O_DIRECT | O_DSYNC, it will result in a deadlock because iomap_dio_rw() is called under i_rwsem which calls iomap_dio_complete() generic_write_sync() btrfs_sync_file(). btrfs_sync_file() requires i_rwsem, so call __iomap_dio_rw() with the i_rwsem locked, and call iomap_dio_complete() after unlocking i_rwsem. Signed-off-by: Goldwyn Rodrigues Reviewed-by: Josef Bacik --- fs/btrfs/file.c | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 193af84f5405..9c7a2d4b4148 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1981,6 +1981,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) loff_t endbyte; int err; int ilock_flags = 0; + struct iomap_dio *dio = NULL; if (iocb->ki_flags & IOCB_NOWAIT) ilock_flags |= BTRFS_ILOCK_TRY; @@ -2022,22 +2023,19 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from) goto buffered; } - /* - * We have are actually a sync iocb, so we need our fancy endio to know - * if we need to sync. - */ - if (current->journal_info) - written = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, - &btrfs_sync_dops, is_sync_kiocb(iocb)); - else - written = iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, - &btrfs_dio_ops, is_sync_kiocb(iocb)); - - if (written == -ENOTBLK) - written = 0; + dio = __iomap_dio_rw(iocb, from, &btrfs_dio_iomap_ops, + &btrfs_dio_ops, is_sync_kiocb(iocb)); btrfs_inode_unlock(inode, ilock_flags); + if (IS_ERR_OR_NULL(dio)) { + err = PTR_ERR_OR_ZERO(dio); + if (err < 0 && err != -ENOTBLK) + goto out; + } else { + written = iomap_dio_complete(dio); + } + if (written < 0 || !iov_iter_count(from)) { err = written; goto out; From patchwork Mon Sep 21 14:43:52 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790237 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3E21859D for ; Mon, 21 Sep 2020 14:44:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2D3BD221EC for ; Mon, 21 Sep 2020 14:44:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727945AbgIUOoy (ORCPT ); Mon, 21 Sep 2020 10:44:54 -0400 Received: from mx2.suse.de ([195.135.220.15]:56440 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728323AbgIUOor (ORCPT ); Mon, 21 Sep 2020 10:44:47 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 1B37FB0D7; Mon, 21 Sep 2020 14:45:21 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 14/15] btrfs: Revert 09745ff88d93 ("btrfs: dio iomap DSYNC workaround") Date: Mon, 21 Sep 2020 09:43:52 -0500 Message-Id: <20200921144353.31319-15-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues iomap_dio_complete()->generic_write_sync() is not called under i_rwsem anymore, revert the workaround. Signed-off-by: Goldwyn Rodrigues Reviewed-by: Josef Bacik --- fs/btrfs/ctree.h | 1 - fs/btrfs/file.c | 38 ++------------------------------ fs/btrfs/inode.c | 50 ------------------------------------------ fs/btrfs/transaction.h | 1 - 4 files changed, 2 insertions(+), 88 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index ea15771bf3da..bc96c52021b2 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3051,7 +3051,6 @@ void btrfs_writepage_endio_finish_ordered(struct page *page, u64 start, extern const struct dentry_operations btrfs_dentry_operations; extern const struct iomap_ops btrfs_dio_iomap_ops; extern const struct iomap_dio_ops btrfs_dio_ops; -extern const struct iomap_dio_ops btrfs_sync_dops; /* ilock flags definition */ #define BTRFS_ILOCK_SHARED (1 << 0) diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 9c7a2d4b4148..1d3363e7b09d 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -2094,44 +2094,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, if (sync) atomic_inc(&BTRFS_I(inode)->sync_writers); - if (iocb->ki_flags & IOCB_DIRECT) { - /* - * 1. We must always clear IOCB_DSYNC in order to not deadlock - * in iomap, as it calls generic_write_sync() in this case. - * 2. If we are async, we can call iomap_dio_complete() either - * in - * - * 2.1. A worker thread from the last bio completed. In this - * case we need to mark the btrfs_dio_data that it is - * async in order to call generic_write_sync() properly. - * This is handled by setting BTRFS_DIO_SYNC_STUB in the - * current->journal_info. - * 2.2 The submitter context, because all IO completed - * before we exited iomap_dio_rw(). In this case we can - * just re-set the IOCB_DSYNC on the iocb and we'll do - * the sync below. If our ->end_io() gets called and - * current->journal_info is set, then we know we're in - * our current context and we will clear - * current->journal_info to indicate that we need to - * sync below. - */ - if (sync) { - ASSERT(current->journal_info == NULL); - iocb->ki_flags &= ~IOCB_DSYNC; - current->journal_info = BTRFS_DIO_SYNC_STUB; - } + if (iocb->ki_flags & IOCB_DIRECT) num_written = btrfs_direct_write(iocb, from); - - /* - * As stated above, we cleared journal_info, so we need to do - * the sync ourselves. - */ - if (sync && current->journal_info == NULL) - iocb->ki_flags |= IOCB_DSYNC; - current->journal_info = NULL; - } else { + else num_written = btrfs_buffered_write(iocb, from); - } /* * We also have to set last_sub_trans to the current log transid, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 96ff8e4a3b45..41e32ef6b6ef 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -62,7 +62,6 @@ struct btrfs_dio_data { loff_t length; ssize_t submitted; struct extent_changeset *data_reserved; - bool sync; }; static const struct inode_operations btrfs_dir_inode_operations; @@ -7362,17 +7361,6 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, int ret = 0; u64 len = length; bool unlock_extents = false; - bool sync = (current->journal_info == BTRFS_DIO_SYNC_STUB); - - /* - * We used current->journal_info here to see if we were sync, but - * there's a lot of tests in the enospc machinery to not do flushing if - * we have a journal_info set, so we need to clear this out and re-set - * it in iomap_end. - */ - ASSERT(current->journal_info == NULL || - current->journal_info == BTRFS_DIO_SYNC_STUB); - current->journal_info = NULL; if (!write) len = min_t(u64, len, fs_info->sectorsize); @@ -7398,7 +7386,6 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, if (!dio_data) return -ENOMEM; - dio_data->sync = sync; dio_data->length = length; if (write) { dio_data->reserve = round_up(length, fs_info->sectorsize); @@ -7546,14 +7533,6 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length, extent_changeset_free(dio_data->data_reserved); } out: - /* - * We're all done, we can re-set the current->journal_info now safely - * for our endio. - */ - if (dio_data->sync) { - ASSERT(current->journal_info == NULL); - current->journal_info = BTRFS_DIO_SYNC_STUB; - } kfree(dio_data); iomap->private = NULL; @@ -7929,30 +7908,6 @@ static blk_qc_t btrfs_submit_direct(struct inode *inode, struct iomap *iomap, return BLK_QC_T_NONE; } -static inline int btrfs_maybe_fsync_end_io(struct kiocb *iocb, ssize_t size, - int error, unsigned flags) -{ - /* - * Now if we're still in the context of our submitter we know we can't - * safely run generic_write_sync(), so clear our flag here so that the - * caller knows to follow up with a sync. - */ - if (current->journal_info == BTRFS_DIO_SYNC_STUB) { - current->journal_info = NULL; - return error; - } - - if (error) - return error; - - if (size) { - iocb->ki_flags |= IOCB_DSYNC; - return generic_write_sync(iocb, size); - } - - return 0; -} - const struct iomap_ops btrfs_dio_iomap_ops = { .iomap_begin = btrfs_dio_iomap_begin, .iomap_end = btrfs_dio_iomap_end, @@ -7962,11 +7917,6 @@ const struct iomap_dio_ops btrfs_dio_ops = { .submit_io = btrfs_submit_direct, }; -const struct iomap_dio_ops btrfs_sync_dops = { - .submit_io = btrfs_submit_direct, - .end_io = btrfs_maybe_fsync_end_io, -}; - static int btrfs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len) { diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h index 858d9153a1cd..8241c050ba71 100644 --- a/fs/btrfs/transaction.h +++ b/fs/btrfs/transaction.h @@ -112,7 +112,6 @@ struct btrfs_transaction { #define TRANS_EXTWRITERS (__TRANS_START | __TRANS_ATTACH) #define BTRFS_SEND_TRANS_STUB ((void *)1) -#define BTRFS_DIO_SYNC_STUB ((void *)2) struct btrfs_trans_handle { u64 transid; From patchwork Mon Sep 21 14:43:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Goldwyn Rodrigues X-Patchwork-Id: 11790239 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CA2A9112E for ; Mon, 21 Sep 2020 14:44:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B9B0B221EC for ; Mon, 21 Sep 2020 14:44:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728110AbgIUOow (ORCPT ); Mon, 21 Sep 2020 10:44:52 -0400 Received: from mx2.suse.de ([195.135.220.15]:56498 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727477AbgIUOot (ORCPT ); Mon, 21 Sep 2020 10:44:49 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id 309A9B218; Mon, 21 Sep 2020 14:45:24 +0000 (UTC) From: Goldwyn Rodrigues To: linux-fsdevel@vger.kernel.org Cc: linux-btrfs@vger.kernel.org, david@fromorbit.com, hch@lst.de, johannes.thumshirn@wdc.com, dsterba@suse.com, darrick.wong@oracle.com, josef@toxicpanda.com, Goldwyn Rodrigues Subject: [PATCH 15/15] iomap: Reinstate lockdep_assert_held in iomap_dio_rw() Date: Mon, 21 Sep 2020 09:43:53 -0500 Message-Id: <20200921144353.31319-16-rgoldwyn@suse.de> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20200921144353.31319-1-rgoldwyn@suse.de> References: <20200921144353.31319-1-rgoldwyn@suse.de> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Goldwyn Rodrigues btrfs holds inode_lock_shared() while performing DIO within EOF, so lockdep_assert_held() check can be re-instated. Revert 3ad99bec6e82 ("iomap: remove lockdep_assert_held()") Signed-off-by: Goldwyn Rodrigues --- fs/iomap/direct-io.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index e01f81e7b76f..b5e030971001 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -421,6 +421,8 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, struct blk_plug plug; struct iomap_dio *dio; + lockdep_assert_held(&inode->i_rwsem); + if (!count) return NULL;