From patchwork Wed Oct 21 00:59:31 2015
X-Patchwork-Submitter: Timofey Titovets
X-Patchwork-Id: 7453961
To: linux-btrfs@vger.kernel.org
From: Timofey Titovets
Subject: [RFC PATCH V2] btrfs/ioctl.c: extent_same - use the inode whose data
 is closest to the beginning of the disk as src
Message-ID: <5626E373.8050906@gmail.com>
Date: Wed, 21 Oct 2015 03:59:31 +0300
X-Mailing-List: linux-btrfs@vger.kernel.org

This is just a proof of concept; I hope to get feedback and ideas about it.

During deduplication, btrfs produces extent and file fragmentation. This can
be mitigated by computing which inode's data is placed closest to the
beginning of the disk and cloning from that inode. This allows us to:

1. Gain performance on HDDs (the beginning of a disk is faster than its end).
2. Leave only the tail of the filesystem sparse, which can later speed up
   balancing and resizing of the filesystem.

New function:

  static u64 btrfs_avg_disko(struct inode *inode,
			     const u64 off, const u64 olen_aligned);

It normalizes disk offsets against data lengths by representing the range as
a sequence of page-sized blocks, and returns the average disk offset of all
such blocks in the given range of the inode. The function is cloned from
btrfs_clone().

Changes from V1:
  Added the new function that computes the "normalized" average offset.

From 8719129c0fb4d5325f9f87ee34f591d301933f67 Mon Sep 17 00:00:00 2001
From: Timofey Titovets
Date: Wed, 21 Oct 2015 03:31:57 +0300
Subject: [RFC PATCH V2] btrfs/ioctl.c: extent_same - use the inode whose data
 is closest to the beginning of the disk as src

---
 fs/btrfs/ioctl.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 145 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3e3e613..a14b4e7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -86,6 +86,9 @@ struct btrfs_ioctl_received_subvol_args_32 {
 
 #endif
 
+static u64 btrfs_avg_disko(struct inode *inode,
+			   const u64 off, const u64 olen_aligned);
+
 static int btrfs_clone(struct inode *src, struct inode *inode,
 		       u64 off, u64 olen, u64 olen_aligned, u64 destoff,
 		       int no_time_update);
@@ -3074,8 +3077,20 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 	/*
 	 * pass original length for comparison so we stay within i_size
 	 */
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
-	if (ret == 0)
-		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
+	if (ret == 0) {
+		/* prefer the inode with the lowest disk offset as clone source */
+		u64 src_weight = btrfs_avg_disko(src, loff, olen);
+		u64 dst_weight = btrfs_avg_disko(dst, dst_loff, olen);
+
+		/* a weight of 0 means the scan failed: fall back to src */
+		if (src_weight == 0 || dst_weight == 0)
+			src_weight = 0;
+
+		if (src_weight > dst_weight)
+			ret = btrfs_clone(dst, src, dst_loff, olen, len, loff, 1);
+		else
+			ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
+	}
 
 	if (same_inode)
 		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
@@ -3329,6 +3344,134 @@ static void clone_update_extent_map(struct inode *inode,
 }
 
 /**
+ * btrfs_avg_disko() - return the average disk offset of an inode's data
+ *
+ * @inode:		Inode to scan
+ * @off:		Offset of the range to scan
+ * @olen_aligned:	Block-aligned length of the range
+ *
+ * Computes the average disk address of the data in the range, which lets
+ * us heuristically determine where on the disk most of the data fragments
+ * are placed. Returns 0 if the range could not be scanned.
+ */
+static u64 btrfs_avg_disko(struct inode *inode,
+			   const u64 off, const u64 olen_aligned)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_path *path;
+	struct extent_buffer *leaf;
+	struct btrfs_key key;
+	u32 nritems;
+	int slot;
+	int ret;
+	u64 sum = 0;
+	u64 counter = 0;
+	u64 avg = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return 0;
+
+	path->reada = 2;
+	key.objectid = btrfs_ino(inode);
+	key.type = BTRFS_EXTENT_DATA_KEY;
+	key.offset = off;
+
+	while (1) {
+		u64 next_key_min_offset = key.offset + 1;
+
+		/*
+		 * Note the key will change type as we walk through the
+		 * tree.
+		 */
+		path->leave_spinning = 1;
+		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+		if (ret < 0)
+			goto out;
+		/*
+		 * First search, if no extent item that starts at offset off was
+		 * found but the previous item is an extent item, it's possible
+		 * it might overlap our target range, therefore process it.
+		 */
+		if (key.offset == off && ret > 0 && path->slots[0] > 0) {
+			btrfs_item_key_to_cpu(path->nodes[0], &key,
+					      path->slots[0] - 1);
+			if (key.type == BTRFS_EXTENT_DATA_KEY)
+				path->slots[0]--;
+		}
+
+		nritems = btrfs_header_nritems(path->nodes[0]);
+process_slot:
+		if (path->slots[0] >= nritems) {
+			ret = btrfs_next_leaf(root, path);
+			if (ret < 0)
+				goto out;
+			if (ret > 0)
+				break;
+			nritems = btrfs_header_nritems(path->nodes[0]);
+		}
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+
+		btrfs_item_key_to_cpu(leaf, &key, slot);
+		if (key.type > BTRFS_EXTENT_DATA_KEY ||
+		    key.objectid != btrfs_ino(inode))
+			break;
+
+		if (key.type == BTRFS_EXTENT_DATA_KEY) {
+			struct btrfs_file_extent_item *extent;
+			int type;
+			u64 disko = 0;
+			u64 diskl = 0;
+			u64 datal = 0;
+			u64 i;
+
+			extent = btrfs_item_ptr(leaf, slot,
+						struct btrfs_file_extent_item);
+			type = btrfs_file_extent_type(leaf, extent);
+			if (type == BTRFS_FILE_EXTENT_REG ||
+			    type == BTRFS_FILE_EXTENT_PREALLOC) {
+				disko = btrfs_file_extent_disk_bytenr(leaf,
+								      extent);
+				diskl = btrfs_file_extent_disk_num_bytes(leaf,
+									 extent);
+				datal = btrfs_file_extent_num_bytes(leaf,
+								    extent);
+			}
+
+			/*
+			 * The first search might have left us at an extent
+			 * item that ends before our target range's start, can
+			 * happen if we have holes and NO_HOLES feature enabled.
+			 */
+			if (key.offset + datal <= off) {
+				path->slots[0]++;
+				goto process_slot;
+			} else if (key.offset >= off + olen_aligned) {
+				break;
+			}
+			next_key_min_offset = key.offset + datal;
+
+			/* account every page-sized block of the extent */
+			for (i = 0; i < diskl; i += PAGE_SIZE) {
+				sum += disko + i;
+				counter++;
+			}
+		}
+
+		btrfs_release_path(path);
+		path->leave_spinning = 0;
+		key.offset = next_key_min_offset;
+	}
+
+out:
+	if (counter)
+		avg = div64_u64(sum, counter);
+	btrfs_free_path(path);
+	return avg;
+}
+
+/**
  * btrfs_clone() - clone a range from inode file to another
  *
  * @src: Inode to clone from
-- 
2.6.1
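The selection heuristic above is easy to model outside the kernel. Below is a
minimal userspace sketch (the struct and function names are illustrative, not
kernel API, and a 4 KiB page size is assumed): every extent contributes one
sample per page, each sample being that page's disk byte offset, and the inode
with the smaller average wins the clone-source role.

```c
#include <stdint.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/* one file extent: where its data starts on disk and how long it is */
struct model_extent {
	uint64_t disko;	/* disk byte offset of the extent data */
	uint64_t diskl;	/* length of the extent on disk, in bytes */
};

/* Average disk offset of every page covered by the extents; 0 = no data. */
static uint64_t model_avg_disko(const struct model_extent *ext, size_t n)
{
	uint64_t sum = 0, pages = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t off;

		for (off = 0; off < ext[i].diskl; off += MODEL_PAGE_SIZE) {
			sum += ext[i].disko + off;	/* offset of this page */
			pages++;
		}
	}
	return pages ? sum / pages : 0;
}

/*
 * Returns 1 if dst should become the clone source (its data sits earlier
 * on the disk), 0 to keep src; a weight of 0 means "scan failed".
 */
static int model_swap_to_dst(uint64_t src_weight, uint64_t dst_weight)
{
	if (src_weight == 0 || dst_weight == 0)
		return 0;
	return src_weight > dst_weight;
}
```

For example, a single 8 KiB extent at disk offset 4096 has pages at offsets
4096 and 8192, giving a weight of 6144; an inode whose data lives further out
gets a larger weight and loses the source role.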