From patchwork Wed Oct 21 00:59:31 2015
X-Patchwork-Submitter: Timofey Titovets
X-Patchwork-Id: 7453961
To: linux-btrfs@vger.kernel.org
From: Timofey Titovets
Subject: [RFC PATCH V2] btrfs/ioctl.c: extent_same - use the inode whose data
 is closest to the beginning of the disk as src
Message-ID: <5626E373.8050906@gmail.com>
Date: Wed, 21 Oct 2015 03:59:31 +0300
X-Mailing-List: linux-btrfs@vger.kernel.org

This is just a proof of concept; I hope to get feedback and ideas about it.

During deduplication, btrfs produces extent and file fragmentation. This can
be mitigated by computing which inode's data is placed closest to the
beginning of the disk and cloning from that inode. This allows us to:

1. Gain performance on HDDs (the beginning of a disk is faster than its end).
2. Leave only the tail of the filesystem sparse, which can later speed up
   balancing and resizing of the filesystem.

New function:

  static u64 btrfs_avg_disko(struct inode *inode,
			     const u64 off, const u64 olen_aligned);

It normalizes disk offsets against data lengths by representing the range as
a sequence of page-sized blocks, and returns the average disk offset of all
such blocks in the given range of the inode. The function is cloned from
btrfs_clone().

Changes from V1:
  Added the new function that computes the "normalized" average offset.

From 8719129c0fb4d5325f9f87ee34f591d301933f67 Mon Sep 17 00:00:00 2001
From: Timofey Titovets
Date: Wed, 21 Oct 2015 03:31:57 +0300
Subject: [RFC PATCH V2] btrfs/ioctl.c: extent_same - use the inode whose data
 is closest to the beginning of the disk as src

---
 fs/btrfs/ioctl.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 145 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 3e3e613..a14b4e7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -86,6 +86,9 @@ struct btrfs_ioctl_received_subvol_args_32 {
 
 #endif
 
+static u64 btrfs_avg_disko(struct inode *inode,
+			   const u64 off, const u64 olen_aligned);
+
 static int btrfs_clone(struct inode *src, struct inode *inode,
 		       u64 off, u64 olen, u64 olen_aligned, u64 destoff,
 		       int no_time_update);
@@ -3074,8 +3077,20 @@ static int btrfs_extent_same(struct inode *src, u64 loff, u64 olen,
 	/*
 	 * pass original length for comparison so we stay within i_size
 	 */
 	ret = btrfs_cmp_data(src, loff, dst, dst_loff, olen, &cmp);
-	if (ret == 0)
-		ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
+	if (ret == 0) {
+		/* prefer the inode with the lowest disk offset as clone source */
+		u64 src_weight = btrfs_avg_disko(src, loff, olen);
+		u64 dst_weight = btrfs_avg_disko(dst, dst_loff, olen);
+
+		/* a weight of 0 means the scan failed: fall back to src */
+		if (src_weight == 0 || dst_weight == 0)
+			src_weight = 0;
+
+		if (src_weight > dst_weight)
+			ret = btrfs_clone(dst, src, dst_loff, olen, len, loff, 1);
+		else
+			ret = btrfs_clone(src, dst, loff, olen, len, dst_loff, 1);
+	}
 
 	if (same_inode)
 		unlock_extent(&BTRFS_I(src)->io_tree, same_lock_start,
@@ -3329,6 +3344,134 @@ static void clone_update_extent_map(struct inode *inode,
 }
 
 /**
+ * btrfs_avg_disko() - return the average disk offset of an inode's data
+ *
+ * @inode:		Inode to scan
+ * @off:		Offset of the range to scan
+ * @olen_aligned:	Block-aligned length of the range
+ *
+ * Computes the average disk address of the data in the range, which lets
+ * us heuristically determine where on the disk most of the data fragments
+ * are placed. Returns 0 if the range could not be scanned.
+ */
+static u64 btrfs_avg_disko(struct inode *inode,
+			   const u64 off, const u64 olen_aligned)
+{
+	struct btrfs_root *root = BTRFS_I(inode)->root;
+	struct btrfs_path *path;
+	struct extent_buffer *leaf;
+	struct btrfs_key key;
+	u32 nritems;
+	int slot;
+	int ret;
+	u64 sum = 0;
+	u64 counter = 0;
+	u64 avg = 0;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return 0;
+
+	path->reada = 2;
+	key.objectid = btrfs_ino(inode);
+	key.type = BTRFS_EXTENT_DATA_KEY;
+	key.offset = off;
+
+	while (1) {
+		u64 next_key_min_offset = key.offset + 1;
+
+		/*
+		 * Note the key will change type as we walk through the
+		 * tree.
+		 */
+		path->leave_spinning = 1;
+		ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+		if (ret < 0)
+			goto out;
+		/*
+		 * First search, if no extent item that starts at offset off was
+		 * found but the previous item is an extent item, it's possible
+		 * it might overlap our target range, therefore process it.
+		 */
+		if (key.offset == off && ret > 0 && path->slots[0] > 0) {
+			btrfs_item_key_to_cpu(path->nodes[0], &key,
+					      path->slots[0] - 1);
+			if (key.type == BTRFS_EXTENT_DATA_KEY)
+				path->slots[0]--;
+		}
+
+		nritems = btrfs_header_nritems(path->nodes[0]);
+process_slot:
+		if (path->slots[0] >= nritems) {
+			ret = btrfs_next_leaf(root, path);
+			if (ret < 0)
+				goto out;
+			if (ret > 0)
+				break;
+			nritems = btrfs_header_nritems(path->nodes[0]);
+		}
+		leaf = path->nodes[0];
+		slot = path->slots[0];
+
+		btrfs_item_key_to_cpu(leaf, &key, slot);
+		if (key.type > BTRFS_EXTENT_DATA_KEY ||
+		    key.objectid != btrfs_ino(inode))
+			break;
+
+		if (key.type == BTRFS_EXTENT_DATA_KEY) {
+			struct btrfs_file_extent_item *extent;
+			int type;
+			u64 disko = 0;
+			u64 diskl = 0;
+			u64 datal = 0;
+			u64 i;
+
+			extent = btrfs_item_ptr(leaf, slot,
+						struct btrfs_file_extent_item);
+			type = btrfs_file_extent_type(leaf, extent);
+			if (type == BTRFS_FILE_EXTENT_REG ||
+			    type == BTRFS_FILE_EXTENT_PREALLOC) {
+				disko = btrfs_file_extent_disk_bytenr(leaf,
+								      extent);
+				diskl = btrfs_file_extent_disk_num_bytes(leaf,
+									 extent);
+				datal = btrfs_file_extent_num_bytes(leaf,
+								    extent);
+			}
+
+			/*
+			 * The first search might have left us at an extent
+			 * item that ends before our target range's start, can
+			 * happen if we have holes and NO_HOLES feature enabled.
+			 */
+			if (key.offset + datal <= off) {
+				path->slots[0]++;
+				goto process_slot;
+			} else if (key.offset >= off + olen_aligned) {
+				break;
+			}
+			next_key_min_offset = key.offset + datal;
+
+			/* account every page-sized block of the extent */
+			for (i = 0; i < diskl; i += PAGE_SIZE) {
+				sum += disko + i;
+				counter++;
+			}
+		}
+
+		btrfs_release_path(path);
+		path->leave_spinning = 0;
+		key.offset = next_key_min_offset;
+	}
+
+out:
+	if (counter)
+		avg = div64_u64(sum, counter);
+	btrfs_free_path(path);
+	return avg;
+}
+
+/**
  * btrfs_clone() - clone a range from inode file to another
  *
  * @src: Inode to clone from
-- 
2.6.1
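The selection heuristic above is easy to model outside the kernel. Below is a
minimal userspace sketch (the struct and function names are illustrative, not
kernel API, and a 4 KiB page size is assumed): every extent contributes one
sample per page, each sample being that page's disk byte offset, and the inode
with the smaller average wins the clone-source role.

```c
#include <stdint.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/* one file extent: where its data starts on disk and how long it is */
struct model_extent {
	uint64_t disko;	/* disk byte offset of the extent data */
	uint64_t diskl;	/* length of the extent on disk, in bytes */
};

/* Average disk offset of every page covered by the extents; 0 = no data. */
static uint64_t model_avg_disko(const struct model_extent *ext, size_t n)
{
	uint64_t sum = 0, pages = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t off;

		for (off = 0; off < ext[i].diskl; off += MODEL_PAGE_SIZE) {
			sum += ext[i].disko + off;	/* offset of this page */
			pages++;
		}
	}
	return pages ? sum / pages : 0;
}

/*
 * Returns 1 if dst should become the clone source (its data sits earlier
 * on the disk), 0 to keep src; a weight of 0 means "scan failed".
 */
static int model_swap_to_dst(uint64_t src_weight, uint64_t dst_weight)
{
	if (src_weight == 0 || dst_weight == 0)
		return 0;
	return src_weight > dst_weight;
}
```

For example, a single 8 KiB extent at disk offset 4096 has pages at offsets
4096 and 8192, giving a weight of 6144; an inode whose data lives further out
gets a larger weight and loses the source role.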