From patchwork Tue Feb 6 20:40:31 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Filipe Manana X-Patchwork-Id: 10205013 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 41A0860247 for ; Wed, 7 Feb 2018 11:33:11 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 2C11C28DE7 for ; Wed, 7 Feb 2018 11:33:11 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2118C28E02; Wed, 7 Feb 2018 11:33:11 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.9 required=2.0 tests=BAYES_00, DATE_IN_PAST_12_24, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 605B028DE7 for ; Wed, 7 Feb 2018 11:33:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753815AbeBGLdI (ORCPT ); Wed, 7 Feb 2018 06:33:08 -0500 Received: from mail.kernel.org ([198.145.29.99]:32968 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753794AbeBGLdH (ORCPT ); Wed, 7 Feb 2018 06:33:07 -0500 Received: from debian3.lan (bl12-226-64.dsl.telepac.pt [85.245.226.64]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id A3FA621738 for ; Wed, 7 Feb 2018 11:33:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A3FA621738 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=fdmanana@kernel.org From: fdmanana@kernel.org To: linux-btrfs@vger.kernel.org Subject: [PATCH] Btrfs: skip writeback of last page when truncating file to same size Date: Tue, 6 Feb 2018 20:40:31 +0000 Message-Id: <20180206204031.9759-1-fdmanana@kernel.org> X-Mailer: git-send-email 2.11.0 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Filipe Manana When we truncate a file to the same size and that size is not aligned with the sector size, we end up triggering writeback (and wait for it to complete) of the last page. This is unncessary as we can not have delayed allocation beyond the inode's i_size and the goal of truncating a file to its own size is to discard prealloc extents (allocated via the fallocate(2) system call). Besides the unnecessary IO start and wait, it also breaks the oppurtunity for larger contiguous extents on disk, as before the last dirty page there might be other dirty pages. This scenario is probably not very common in general, however it is common for btrfs receive implementations because currently the send stream always issues a truncate operation for each processed inode as the last operation for that inode (this truncate operation is not always needed and the send implementation will be addressed to avoid them). So improve this by not starting and waiting for writeback of the inode's last page when we are truncating to exactly the same size. The following script was used to quickly measure the time a receive operation takes: $ cat test_send.sh #!/bin/bash SRC_DEV=/dev/sdc DST_DEV=/dev/sdd SRC_MNT=/mnt/sdc DST_MNT=/mnt/sdd mkfs.btrfs -f $SRC_DEV >/dev/null mkfs.btrfs -f $DST_DEV >/dev/null mount $SRC_DEV $SRC_MNT mount $DST_DEV $DST_MNT echo "Creating source filesystem" for ((t = 0; t < 10; t++)); do ( for ((i = 1; i <= 20000; i++)); do xfs_io -f -c "pwrite -S 0xab 0 5000" \ $SRC_MNT/file_$i > /dev/null done ) & worker_pids[$t]=$! done wait ${worker_pids[@]} echo "Creating and sending snapshot" btrfs subvolume snapshot -r $SRC_MNT $SRC_MNT/snap1 >/dev/null /usr/bin/time -f "send took %e seconds" \ btrfs send -f $SRC_MNT/send_file $SRC_MNT/snap1 /usr/bin/time -f "receive took %e seconds" \ btrfs receive -f $SRC_MNT/send_file $DST_MNT umount $SRC_MNT umount $DST_MNT The results for 5 runs were the following: * Without this change average receive time was 26.49 seconds standard deviation of 2.53 seconds * With this change average receive time was 12.51 seconds standard deviation of 0.32 seconds Reported-by: Robbie Ko Signed-off-by: Filipe Manana --- fs/btrfs/inode.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 2a19413a7868..dae631ab5cb2 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -101,7 +101,7 @@ static const unsigned char btrfs_type_by_mode[S_IFMT >> S_SHIFT] = { }; static int btrfs_setsize(struct inode *inode, struct iattr *attr); -static int btrfs_truncate(struct inode *inode); +static int btrfs_truncate(struct inode *inode, bool skip_writeback); static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent); static noinline int cow_file_range(struct inode *inode, struct page *locked_page, @@ -3625,7 +3625,7 @@ int btrfs_orphan_cleanup(struct btrfs_root *root) goto out; } - ret = btrfs_truncate(inode); + ret = btrfs_truncate(inode, false); if (ret) btrfs_orphan_del(NULL, BTRFS_I(inode)); } else { @@ -5109,7 +5109,7 @@ static int btrfs_setsize(struct inode *inode, struct iattr *attr) inode_dio_wait(inode); btrfs_inode_resume_unlocked_dio(BTRFS_I(inode)); - ret = btrfs_truncate(inode); + ret = btrfs_truncate(inode, newsize == oldsize); if (ret && inode->i_nlink) { int err; @@ -9087,7 +9087,7 @@ int btrfs_page_mkwrite(struct vm_fault *vmf) return ret; } -static int btrfs_truncate(struct inode *inode) +static int btrfs_truncate(struct inode *inode, bool skip_writeback) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct btrfs_root *root = BTRFS_I(inode)->root; @@ -9098,10 +9098,12 @@ static int btrfs_truncate(struct inode *inode) u64 mask = fs_info->sectorsize - 1; u64 min_size = btrfs_calc_trunc_metadata_size(fs_info, 1); - ret = btrfs_wait_ordered_range(inode, inode->i_size & (~mask), - (u64)-1); - if (ret) - return ret; + if (!skip_writeback) { + ret = btrfs_wait_ordered_range(inode, inode->i_size & (~mask), + (u64)-1); + if (ret) + return ret; + } /* * Yes ladies and gentlemen, this is indeed ugly. The fact is we have