diff mbox

[v3] Btrfs: add skeleton code for compression heuristic

Message ID 20170717135258.15865-1-nefelim4ag@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Timofey Titovets July 17, 2017, 1:52 p.m. UTC
For now the code just returns true.
More complex heuristic code will be added later.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 30 ++++++++++++++++++++++++++++++
 fs/btrfs/compression.h |  2 ++
 fs/btrfs/inode.c       | 10 +++++-----
 3 files changed, 37 insertions(+), 5 deletions(-)

--
2.13.3
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Sterba July 17, 2017, 6:30 p.m. UTC | #1
So it basically looks good; I could not resist and rewrote the changelog
and comments. There's one code fix:

On Mon, Jul 17, 2017 at 04:52:58PM +0300, Timofey Titovets wrote:
> -static inline int inode_need_compress(struct inode *inode)
> +static inline int inode_need_compress(struct inode *inode, u64 start, u64 end)
>  {
>  	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> 
>  	/* force compress */
>  	if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
> -		return 1;
> +		return btrfs_compress_heuristic(inode, start, end);

This must stay 'return 1', if force-compress is on, so the change is
reverted.

I'm adding the patch to for-next.

>  	/* bad compression ratios */
>  	if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS)
>  		return 0;
>  	if (btrfs_test_opt(fs_info, COMPRESS) ||
>  	    BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS ||
>  	    BTRFS_I(inode)->force_compress)
> -		return 1;
> +		return btrfs_compress_heuristic(inode, start, end);
>  	return 0;
>  }
Anand Jain July 21, 2017, 5 a.m. UTC | #2
On 07/18/2017 02:30 AM, David Sterba wrote:
> So it basically looks good, I could not resist and rewrote the changelog
> and comments. There's one code fix:
> 
> On Mon, Jul 17, 2017 at 04:52:58PM +0300, Timofey Titovets wrote:
>> -static inline int inode_need_compress(struct inode *inode)
>> +static inline int inode_need_compress(struct inode *inode, u64 start, u64 end)
>>   {
>>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>>
>>   	/* force compress */
>>   	if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
>> -		return 1;
>> +		return btrfs_compress_heuristic(inode, start, end);



> This must stay 'return 1', if force-compress is on, so the change is
> reverted.

  Initially I thought 'return 1' was correct, but looking deeper,
  it is not, as explained below.

  The biggest beneficiary of estimating the compression ratio
  in advance (the heuristic) is customers using -o compress-force.
  But 'return 1' here prevents them from using the heuristic,
  so something is definitely wrong.

  -o compress is about whether each compression-granular chunk
  (BTRFS_MAX_UNCOMPRESSED bytes) of the inode should be compressed,
  or whether to give up on the whole inode based on the compression
  ratio of the current chunk.
  This behavior can be overridden by -o compress-force, so with
  -o compress-force there will be a lot more effort spent _trying_
  to compress than with -o compress. We must use the heuristic for
  -o compress-force.

  btrfs_compress_heuristic is about figuring out whether the
  current compression-granular chunk is compressible.
  Currently we do this part by a trial method (which is more
  accurate than the heuristic approach) in the function
  btrfs_compress_pages(), and then we give up in the function
  compress_file_range(). So IMO btrfs_compress_heuristic() should
  be called from those functions.

  Further, the heuristic is just an estimate, which may go wrong
  depending on a future compression algorithm? (I am not sure.)

  IMO it would be a good idea to add an option to enable/disable
  the heuristic.

Thanks, Anand



> I'm adding the patch to for-next.
> 
>>   	/* bad compression ratios */
>>   	if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS)
>>   		return 0;
>>   	if (btrfs_test_opt(fs_info, COMPRESS) ||
>>   	    BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS ||
>>   	    BTRFS_I(inode)->force_compress)
>> -		return 1;
>> +		return btrfs_compress_heuristic(inode, start, end);
>>   	return 0;
>>   }
Roman Mamedov July 21, 2017, 6:37 p.m. UTC | #3
On Fri, 21 Jul 2017 13:00:56 +0800
Anand Jain <anand.jain@oracle.com> wrote:

> 
> 
> On 07/18/2017 02:30 AM, David Sterba wrote:
> > So it basically looks good, I could not resist and rewrote the changelog
> > and comments. There's one code fix:
> > 
> > On Mon, Jul 17, 2017 at 04:52:58PM +0300, Timofey Titovets wrote:
> >> -static inline int inode_need_compress(struct inode *inode)
> >> +static inline int inode_need_compress(struct inode *inode, u64 start, u64 end)
> >>   {
> >>   	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
> >>
> >>   	/* force compress */
> >>   	if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
> >> -		return 1;
> >> +		return btrfs_compress_heuristic(inode, start, end);
> 
> 
> 
> > This must stay 'return 1', if force-compress is on, so the change is
> > reverted.
> 
>   Initially I thought 'return 1' is correct, but looking in depth,
>   it is not correct as below..
> 
>   The biggest beneficiary of the estimating the compression ratio
>   in advance (heuristic) is when customers are using the
>   -o compress-force. But 'return 1' here is making them not to
>   use heuristic. So definitely something is wrong.

man mount says for btrfs:

    If compress-force is specified, all files will be compressed,  whether
    or  not they compress well.

So compress-force by definition should always compress all files no matter
what, and not use any heuristic. In fact it has no right to, as the user
has forced compression on. Returning 1 up there does seem right to me.

>   -o compress is about the whether each of the compression-granular bytes
>   (BTRFS_MAX_UNCOMPRESSED) of the inode should be tried to compress OR
>   just give up for the whole inode by looking at the compression ratio
>   of the current compression-granular.
>   This approach can be overridden by -o compress-force. So in
>   -o compress-force there will be a lot more efforts in _trying_
>   to compression than in -o compress. We must use heuristic for
>   -o compress-force.

The semantics and user expectation of compress-force dictate always
compressing without giving up, even if that turns out to be slower and
provide little benefit.
Adam Borowski July 21, 2017, 9 p.m. UTC | #4
On Fri, Jul 21, 2017 at 11:37:49PM +0500, Roman Mamedov wrote:
> On Fri, 21 Jul 2017 13:00:56 +0800
> Anand Jain <anand.jain@oracle.com> wrote:
> > On 07/18/2017 02:30 AM, David Sterba wrote:
> > > This must stay 'return 1', if force-compress is on, so the change is
> > > reverted.
> > 
> >   Initially I thought 'return 1' is correct, but looking in depth,
> >   it is not correct as below..
> > 
> >   The biggest beneficiary of the estimating the compression ratio
> >   in advance (heuristic) is when customers are using the
> >   -o compress-force. But 'return 1' here is making them not to
> >   use heuristic. So definitely something is wrong.
> 
> man mount says for btrfs:
> 
>     If compress-force is specified, all files will be compressed,  whether
>     or  not they compress well.
> 
> So compress-force by definition should always compress all files no matter
> what, and not use any heuristic. In fact it has no right to, as user forced
> compression to always on. Returning 1 up there does seem right to me.

Technically, for every compression algorithm other than identity (and its
bijections), some data will expand by at least one bit (thus byte, thus
page), therefore we need to be able to store with no compression even when
forced.  On the other hand, it sounds reasonable to take force to mean
"compression will always be attempted" -- ie, we forbid early return when
a small sample seems uncompressible.

> >   -o compress is about the whether each of the compression-granular bytes
> >   (BTRFS_MAX_UNCOMPRESSED) of the inode should be tried to compress OR
> >   just give up for the whole inode by looking at the compression ratio
> >   of the current compression-granular.
> >   This approach can be overridden by -o compress-force. So in
> >   -o compress-force there will be a lot more efforts in _trying_
> >   to compression than in -o compress. We must use heuristic for
> >   -o compress-force.
> 
> Semantic and the user expectation of compress-force dictates to always
> compress without giving up, even if it turns out to be slower and not providing
> much benefit.

Another question is, how would "compress-force" differ from "compress"
otherwise?  Always attempting the compression is its whole purpose!


Meow.
Anand Jain July 22, 2017, 12:52 a.m. UTC | #5
On 07/22/2017 05:00 AM, Adam Borowski wrote:
> On Fri, Jul 21, 2017 at 11:37:49PM +0500, Roman Mamedov wrote:
>> On Fri, 21 Jul 2017 13:00:56 +0800
>> Anand Jain <anand.jain@oracle.com> wrote:
>>> On 07/18/2017 02:30 AM, David Sterba wrote:
>>>> This must stay 'return 1', if force-compress is on, so the change is
>>>> reverted.
>>>
>>>    Initially I thought 'return 1' is correct, but looking in depth,
>>>    it is not correct as below..
>>>
>>>    The biggest beneficiary of the estimating the compression ratio
>>>    in advance (heuristic) is when customers are using the
>>>    -o compress-force. But 'return 1' here is making them not to
>>>    use heuristic. So definitely something is wrong.
>>
>> man mount says for btrfs:
>>
>>      If compress-force is specified, all files will be compressed,  whether
>>      or  not they compress well.
>>
>> So compress-force by definition should always compress all files no matter
>> what, and not use any heuristic. In fact it has no right to, as user forced
>> compression to always on. Returning 1 up there does seem right to me.

  Oh yes. I meant to say that since the try-and-compress happens in
  btrfs_compress_pages() and compress_file_range(), the heuristic
  should in fact not come into inode_need_compress() at all.

Thanks, Anand


> Technically, for every compression algorithm other than identity (and its
> bijections), some data will expand by at least one bit (thus byte, thus
> page), therefore we need to be able to store with no compression even when
> forced.  On the other hand, it sounds reasonable to take force to mean
> "compression will always be attempted" -- ie, we forbid early return when
> a small sample seems uncompressible.
> 
>>>    -o compress is about the whether each of the compression-granular bytes
>>>    (BTRFS_MAX_UNCOMPRESSED) of the inode should be tried to compress OR
>>>    just give up for the whole inode by looking at the compression ratio
>>>    of the current compression-granular.
>>>    This approach can be overridden by -o compress-force. So in
>>>    -o compress-force there will be a lot more efforts in _trying_
>>>    to compression than in -o compress. We must use heuristic for
>>>    -o compress-force.
>>
>> Semantic and the user expectation of compress-force dictates to always
>> compress without giving up, even if it turns out to be slower and not providing
>> much benefit.
> 
> Another question is, how would "compress-force" differ from "compress"
> otherwise?  Always attempting the compression is its whole purpose!




David Sterba July 24, 2017, 2:53 p.m. UTC | #6
On Fri, Jul 21, 2017 at 11:00:27PM +0200, Adam Borowski wrote:
> On Fri, Jul 21, 2017 at 11:37:49PM +0500, Roman Mamedov wrote:
> > On Fri, 21 Jul 2017 13:00:56 +0800
> > Anand Jain <anand.jain@oracle.com> wrote:
> > > On 07/18/2017 02:30 AM, David Sterba wrote:
> > > > This must stay 'return 1', if force-compress is on, so the change is
> > > > reverted.
> > > 
> > >   Initially I thought 'return 1' is correct, but looking in depth,
> > >   it is not correct as below..
> > > 
> > >   The biggest beneficiary of the estimating the compression ratio
> > >   in advance (heuristic) is when customers are using the
> > >   -o compress-force. But 'return 1' here is making them not to
> > >   use heuristic. So definitely something is wrong.
> > 
> > man mount says for btrfs:
> > 
> >     If compress-force is specified, all files will be compressed,  whether
> >     or  not they compress well.
> > 
> > So compress-force by definition should always compress all files no matter
> > what, and not use any heuristic. In fact it has no right to, as user forced
> > compression to always on. Returning 1 up there does seem right to me.
> 
> Technically, for every compression algorithm other than identity (and its
> bijections), some data will expand by at least one bit (thus byte, thus
> page), therefore we need to be able to store with no compression even when
> forced.  On the other hand, it sounds reasonable to take force to mean
> "compression will always be attempted" -- ie, we forbid early return when
> a small sample seems uncompressible.

The current wording is a bit confusing; it works as you describe. I'll
update the manual page.

> > >   -o compress is about the whether each of the compression-granular bytes
> > >   (BTRFS_MAX_UNCOMPRESSED) of the inode should be tried to compress OR
> > >   just give up for the whole inode by looking at the compression ratio
> > >   of the current compression-granular.
> > >   This approach can be overridden by -o compress-force. So in
> > >   -o compress-force there will be a lot more efforts in _trying_
> > >   to compression than in -o compress. We must use heuristic for
> > >   -o compress-force.
> > 
> > Semantic and the user expectation of compress-force dictates to always
> > compress without giving up, even if it turns out to be slower and not providing
> > much benefit.
> 
> Another question is, how would "compress-force" differ from "compress"
> otherwise?  Always attempting the compression is its whole purpose!

E.g. files that are already compressed would increase CPU consumption
with compress-force, while they'd hopefully be detected as
incompressible with 'compress' and clever heuristics. So the NOCOMPRESS
bit would better reflect the status of the file.
Anand Jain July 24, 2017, 3:40 p.m. UTC | #7
> Eg. files that are already compressed would increase the cpu consumption
> with compress-force, while they'd be hopefully detected as
> incompressible with 'compress' and clever heuristics. So the NOCOMPRESS
> bit would better reflect the status of the file.

  The current NOCOMPRESS is based on a trial-and-error method and is
  more accurate than a heuristic; also, the loss of CPU time is only
  one-time?

  Maybe the only opportunity for the heuristic is in the logic to
  monitor and reset NOCOMPRESS; as of now there is no such logic.

Thanks, Anand
David Sterba July 27, 2017, 3:36 p.m. UTC | #8
On Mon, Jul 24, 2017 at 11:40:17PM +0800, Anand Jain wrote:
> 
> > Eg. files that are already compressed would increase the cpu consumption
> > with compress-force, while they'd be hopefully detected as
> > incompressible with 'compress' and clever heuristics. So the NOCOMPRESS
> > bit would better reflect the status of the file.
> 
>   current NOCOMPRESS is based on trial and error method and is more
>   accurate than heuristic also loss of cpu power is only one time ?

Currently, force-compress beats everything, so even a file with
NOCOMPRESS will be compressed: all new writes will be passed to the
compression code and eventually stored uncompressed. Each time, the
compression code will run and fail, so it's not a one-time cost.

Although you can say it's more 'accurate', it's also more expensive.

>   May be the only opportunity that heuristic can facilitate is at the
>   logic to monitor and reset the NOCOMPRESS, as of now there is no
>   such a logic.

The heuristic can be made adaptive, and examine data even for NOCOMPRESS
files, but that's a few steps ahead of where we are now.
Anand Jain July 28, 2017, 2:04 p.m. UTC | #9
On 28/07/2017 00:36, David Sterba wrote:
> On Mon, Jul 24, 2017 at 11:40:17PM +0800, Anand Jain wrote:
>>
>>> Eg. files that are already compressed would increase the cpu consumption
>>> with compress-force, while they'd be hopefully detected as
>>> incompressible with 'compress' and clever heuristics. So the NOCOMPRESS
>>> bit would better reflect the status of the file.

  I thought 'compress' in the above was the compress option. Ah, you
  mean the compression algorithm... got it. Right, compress-force on
  incompressible data is very expensive.

  And it's also true that the compress option on incompressible data
  is not expensive at all, and it's only one-time.

>>    current NOCOMPRESS is based on trial and error method and is more
>>    accurate than heuristic also loss of cpu power is only one time ?


> Currently, force-compress beats everything, so even a file with
> NOCOMPRESS will be compressed, all new writes will be passed to the
> compression and stored uncompressed eventually.

  It makes sense to me if you replace NOCOMPRESS with
  incompressible data in the above statement. In my understanding,
  you will never have a file with the NOCOMPRESS flag if the
  compress-force option is used.


> Each time the
> compression code will run and fail, so it's not one time.
>
> Although you can say it's more 'accurate', it's also more expensive.

   Yes. Expensive only with compress-force.

>>    May be the only opportunity that heuristic can facilitate is at the
>>    logic to monitor and reset the NOCOMPRESS, as of now there is no
>>    such a logic.
>
> The heuristic can be made adaptive, and examine data even for NOCOMPRESS
> files, but that's a few steps ahead of where we are now.

   Nice.

Thanks, Anand

diff mbox

Patch

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index d2ef9ac2a630..27ba11a74eb2 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1047,3 +1047,33 @@  int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,

 	return 1;
 }
+
+/*
+ * Heuristic skeleton.
+ * For now this is just a naive and very optimistic 'return true'.
+ * The heuristic is supposed to detect the data type (compressible or
+ * incompressible) quickly, compared to doing the actual compression,
+ * to avoid wasting CPU time on compressing incompressible data.
+ * The following logic will be added later:
+ * 0. Get a sample of the input data
+ * 1. Detect mostly zeroed data
+ * 2. Detect data with a small "byte set" size (text, etc.)
+ * 3. Detect data with a low/high core "byte set"
+ */
+int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
+{
+	u64 index = start >> PAGE_SHIFT;
+	u64 end_index = end >> PAGE_SHIFT;
+	struct page *page;
+	int ret = 1;
+
+	while (index <= end_index) {
+		page = find_get_page(inode->i_mapping, index);
+		kmap(page);
+		kunmap(page);
+		put_page(page);
+		index++;
+	}
+
+	return ret;
+}
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 87f6d3332163..8508ba6b9aef 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -129,4 +129,6 @@  struct btrfs_compress_op {
 extern const struct btrfs_compress_op btrfs_zlib_compress;
 extern const struct btrfs_compress_op btrfs_lzo_compress;

+int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
+
 #endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 95c212037095..c23b7047fc39 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -392,20 +392,20 @@  static noinline int add_async_extent(struct async_cow *cow,
 	return 0;
 }

-static inline int inode_need_compress(struct inode *inode)
+static inline int inode_need_compress(struct inode *inode, u64 start, u64 end)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);

 	/* force compress */
 	if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
-		return 1;
+		return btrfs_compress_heuristic(inode, start, end);
 	/* bad compression ratios */
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS)
 		return 0;
 	if (btrfs_test_opt(fs_info, COMPRESS) ||
 	    BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS ||
 	    BTRFS_I(inode)->force_compress)
-		return 1;
+		return btrfs_compress_heuristic(inode, start, end);
 	return 0;
 }

@@ -503,7 +503,7 @@  static noinline void compress_file_range(struct inode *inode,
 	 * inode has not been flagged as nocompress.  This flag can
 	 * change at any time if we discover bad compression ratios.
 	 */
-	if (inode_need_compress(inode)) {
+	if (inode_need_compress(inode, start, end)) {
 		WARN_ON(pages);
 		pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
 		if (!pages) {
@@ -1576,7 +1576,7 @@  static int run_delalloc_range(void *private_data, struct page *locked_page,
 	} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) {
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 0, nr_written);
-	} else if (!inode_need_compress(inode)) {
+	} else if (!inode_need_compress(inode, start, end)) {
 		ret = cow_file_range(inode, locked_page, start, end, end,
 				      page_started, nr_written, 1, NULL);
 	} else {