Message ID | 20170717135258.15865-1-nefelim4ag@gmail.com (mailing list archive)
---|---
State | New, archived
So it basically looks good, I could not resist and rewrote the changelog and
comments. There's one code fix:

On Mon, Jul 17, 2017 at 04:52:58PM +0300, Timofey Titovets wrote:
> -static inline int inode_need_compress(struct inode *inode)
> +static inline int inode_need_compress(struct inode *inode, u64 start, u64 end)
> {
> 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>
> 	/* force compress */
> 	if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
> -		return 1;
> +		return btrfs_compress_heuristic(inode, start, end);

This must stay 'return 1' if force-compress is on, so the change is
reverted here.

I'm adding the patch to for-next.

> 	/* bad compression ratios */
> 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS)
> 		return 0;
> 	if (btrfs_test_opt(fs_info, COMPRESS) ||
> 	    BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS ||
> 	    BTRFS_I(inode)->force_compress)
> -		return 1;
> +		return btrfs_compress_heuristic(inode, start, end);
> 	return 0;
> }
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
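The decision order David is insisting on can be sketched outside the kernel. In the sketch below the struct, flag names, and return codes are hypothetical stand-ins for the real mount options and inode bits, not btrfs code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for btrfs mount options and inode flags. */
struct opts {
	bool force_compress;   /* like -o compress-force */
	bool compress;         /* like -o compress */
	bool inode_nocompress; /* like BTRFS_INODE_NOCOMPRESS */
};

/* 1 = compress unconditionally, 2 = let the heuristic decide, 0 = skip. */
static int need_compress(const struct opts *o)
{
	if (o->force_compress)
		return 1; /* must stay unconditional: force means force */
	if (o->inode_nocompress)
		return 0; /* bad compression ratios were seen earlier */
	if (o->compress)
		return 2; /* only here may a heuristic short-circuit */
	return 0;
}
```

The fix is visible in the first branch: with force-compress set, the answer is unconditional and the heuristic is never consulted, even if NOCOMPRESS is also set.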
On 07/18/2017 02:30 AM, David Sterba wrote:
> So it basically looks good, I could not resist and rewrote the changelog
> and comments. There's one code fix:
>
> On Mon, Jul 17, 2017 at 04:52:58PM +0300, Timofey Titovets wrote:
>> -static inline int inode_need_compress(struct inode *inode)
>> +static inline int inode_need_compress(struct inode *inode, u64 start, u64 end)
>> {
>> 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>>
>> 	/* force compress */
>> 	if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
>> -		return 1;
>> +		return btrfs_compress_heuristic(inode, start, end);
>
> This must stay 'return 1', if force-compress is on, so the change is
> reverted.

Initially I thought 'return 1' is correct, but looking in depth, it is
not correct, as explained below.

The biggest beneficiary of estimating the compression ratio in advance
(the heuristic) is customers using -o compress-force. But 'return 1'
here keeps them from using the heuristic, so something is definitely
wrong.

-o compress is about whether each compression-granular chunk
(BTRFS_MAX_UNCOMPRESSED bytes) of the inode should be tried for
compression, or whether to give up for the whole inode by looking at the
compression ratio of the current chunk. This approach can be overridden
by -o compress-force, so with -o compress-force there will be a lot more
effort spent in _trying_ to compress than with -o compress. We must use
the heuristic for -o compress-force.

btrfs_compress_heuristic() is about figuring out whether the current
compression-granular chunk is compressible. Currently we do this part by
trial (which is more accurate than the heuristic approach) in
btrfs_compress_pages(), and then give up in compress_file_range(). So
IMO btrfs_compress_heuristic() should be called from those functions.

Further, the heuristic is just an estimate, which may go wrong with some
future compression algorithm? (I am not sure.) IMO it's a good idea to
add an option to enable/disable the heuristic.

Thanks, Anand

> I'm adding the patch to for-next.
>
>> 	/* bad compression ratios */
>> 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS)
>> 		return 0;
>> 	if (btrfs_test_opt(fs_info, COMPRESS) ||
>> 	    BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS ||
>> 	    BTRFS_I(inode)->force_compress)
>> -		return 1;
>> +		return btrfs_compress_heuristic(inode, start, end);
>> 	return 0;
>> }
On Fri, 21 Jul 2017 13:00:56 +0800
Anand Jain <anand.jain@oracle.com> wrote:

> On 07/18/2017 02:30 AM, David Sterba wrote:
>> This must stay 'return 1', if force-compress is on, so the change is
>> reverted.
>
> Initially I thought 'return 1' is correct, but looking in depth,
> it is not correct, as explained below.
>
> The biggest beneficiary of estimating the compression ratio in advance
> (the heuristic) is customers using -o compress-force. But 'return 1'
> here keeps them from using the heuristic, so something is definitely
> wrong.

man mount says for btrfs:

    If compress-force is specified, all files will be compressed, whether
    or not they compress well.

So compress-force by definition should always compress all files no matter
what, and not use any heuristic. In fact it has no right to, as the user
forced compression to always on. Returning 1 up there does seem right to me.

> -o compress is about whether each compression-granular chunk
> (BTRFS_MAX_UNCOMPRESSED bytes) of the inode should be tried for
> compression, or whether to give up for the whole inode by looking at the
> compression ratio of the current chunk. This approach can be overridden
> by -o compress-force, so with -o compress-force there will be a lot more
> effort spent in _trying_ to compress than with -o compress. We must use
> the heuristic for -o compress-force.

The semantics and the user expectation of compress-force dictate always
compressing without giving up, even if it turns out to be slower and not
provide much benefit.
On Fri, Jul 21, 2017 at 11:37:49PM +0500, Roman Mamedov wrote:
> On Fri, 21 Jul 2017 13:00:56 +0800
> Anand Jain <anand.jain@oracle.com> wrote:
>> On 07/18/2017 02:30 AM, David Sterba wrote:
>>> This must stay 'return 1', if force-compress is on, so the change is
>>> reverted.
>>
>> Initially I thought 'return 1' is correct, but looking in depth,
>> it is not correct, as explained below.
>>
>> The biggest beneficiary of estimating the compression ratio in advance
>> (the heuristic) is customers using -o compress-force. But 'return 1'
>> here keeps them from using the heuristic, so something is definitely
>> wrong.
>
> man mount says for btrfs:
>
>     If compress-force is specified, all files will be compressed, whether
>     or not they compress well.
>
> So compress-force by definition should always compress all files no matter
> what, and not use any heuristic. In fact it has no right to, as the user
> forced compression to always on. Returning 1 up there does seem right to me.

Technically, for every compression algorithm other than identity (and its
bijections), some data will expand by at least one bit (thus byte, thus
page), therefore we need to be able to store with no compression even when
forced. On the other hand, it sounds reasonable to take force to mean
"compression will always be attempted" -- ie, we forbid an early return when
a small sample seems incompressible.

>> -o compress is about whether each compression-granular chunk
>> (BTRFS_MAX_UNCOMPRESSED bytes) of the inode should be tried for
>> compression, or whether to give up for the whole inode by looking at the
>> compression ratio of the current chunk. This approach can be overridden
>> by -o compress-force, so with -o compress-force there will be a lot more
>> effort spent in _trying_ to compress than with -o compress. We must use
>> the heuristic for -o compress-force.
>
> The semantics and the user expectation of compress-force dictate always
> compressing without giving up, even if it turns out to be slower and not
> provide much benefit.

Another question is: how would "compress-force" differ from "compress"
otherwise? Always attempting the compression is its whole purpose!

Meow.
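Adam's pigeonhole argument can be illustrated with a toy compressor. The run-length encoder below is purely illustrative (nothing btrfs uses): it turns each run into a (count, byte) pair, so any input with no runs doubles in size.

```c
#include <assert.h>
#include <stddef.h>

/* Toy RLE: every run of up to 255 equal bytes becomes a 2-byte
 * (count, byte) pair. Returns the encoded length only; this exists
 * just to show that some inputs must expand under any real codec. */
static size_t rle_encoded_len(const unsigned char *in, size_t n)
{
	size_t out = 0, i = 0;

	while (i < n) {
		size_t run = 1;

		while (i + run < n && in[i + run] == in[i] && run < 255)
			run++;
		out += 2;	/* one (count, byte) pair per run */
		i += run;
	}
	return out;
}
```

Run-free data ("abababab") doubles, while a run of identical bytes shrinks to a single pair; this is why even compress-force needs an escape hatch to store data uncompressed.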
On 07/22/2017 05:00 AM, Adam Borowski wrote:
> On Fri, Jul 21, 2017 at 11:37:49PM +0500, Roman Mamedov wrote:
>> On Fri, 21 Jul 2017 13:00:56 +0800
>> Anand Jain <anand.jain@oracle.com> wrote:
>>> On 07/18/2017 02:30 AM, David Sterba wrote:
>>>> This must stay 'return 1', if force-compress is on, so the change is
>>>> reverted.
>>>
>>> Initially I thought 'return 1' is correct, but looking in depth,
>>> it is not correct, as explained below.
>>>
>>> The biggest beneficiary of estimating the compression ratio in advance
>>> (the heuristic) is customers using -o compress-force. But 'return 1'
>>> here keeps them from using the heuristic, so something is definitely
>>> wrong.
>>
>> man mount says for btrfs:
>>
>>     If compress-force is specified, all files will be compressed, whether
>>     or not they compress well.
>>
>> So compress-force by definition should always compress all files no matter
>> what, and not use any heuristic. In fact it has no right to, as the user
>> forced compression to always on. Returning 1 up there does seem right to me.

Oh yes. What I meant to say is that the try-and-compress happens in
btrfs_compress_pages() and compress_file_range(), so in fact the
heuristic should not come into inode_need_compress() at all.

Thanks, Anand

> Technically, for every compression algorithm other than identity (and its
> bijections), some data will expand by at least one bit (thus byte, thus
> page), therefore we need to be able to store with no compression even when
> forced. On the other hand, it sounds reasonable to take force to mean
> "compression will always be attempted" -- ie, we forbid an early return when
> a small sample seems incompressible.
>
>>> -o compress is about whether each compression-granular chunk
>>> (BTRFS_MAX_UNCOMPRESSED bytes) of the inode should be tried for
>>> compression, or whether to give up for the whole inode by looking at the
>>> compression ratio of the current chunk. This approach can be overridden
>>> by -o compress-force, so with -o compress-force there will be a lot more
>>> effort spent in _trying_ to compress than with -o compress. We must use
>>> the heuristic for -o compress-force.
>>
>> The semantics and the user expectation of compress-force dictate always
>> compressing without giving up, even if it turns out to be slower and not
>> provide much benefit.
>
> Another question is: how would "compress-force" differ from "compress"
> otherwise? Always attempting the compression is its whole purpose!
On Fri, Jul 21, 2017 at 11:00:27PM +0200, Adam Borowski wrote:
> On Fri, Jul 21, 2017 at 11:37:49PM +0500, Roman Mamedov wrote:
>> On Fri, 21 Jul 2017 13:00:56 +0800
>> Anand Jain <anand.jain@oracle.com> wrote:
>>> On 07/18/2017 02:30 AM, David Sterba wrote:
>>>> This must stay 'return 1', if force-compress is on, so the change is
>>>> reverted.
>>>
>>> Initially I thought 'return 1' is correct, but looking in depth,
>>> it is not correct, as explained below.
>>>
>>> The biggest beneficiary of estimating the compression ratio in advance
>>> (the heuristic) is customers using -o compress-force. But 'return 1'
>>> here keeps them from using the heuristic, so something is definitely
>>> wrong.
>>
>> man mount says for btrfs:
>>
>>     If compress-force is specified, all files will be compressed, whether
>>     or not they compress well.
>>
>> So compress-force by definition should always compress all files no matter
>> what, and not use any heuristic. In fact it has no right to, as the user
>> forced compression to always on. Returning 1 up there does seem right to me.
>
> Technically, for every compression algorithm other than identity (and its
> bijections), some data will expand by at least one bit (thus byte, thus
> page), therefore we need to be able to store with no compression even when
> forced. On the other hand, it sounds reasonable to take force to mean
> "compression will always be attempted" -- ie, we forbid an early return when
> a small sample seems incompressible.

The current wording is a bit confusing; it works as you describe. I'll
update the manual page.

>>> -o compress is about whether each compression-granular chunk
>>> (BTRFS_MAX_UNCOMPRESSED bytes) of the inode should be tried for
>>> compression, or whether to give up for the whole inode by looking at the
>>> compression ratio of the current chunk. This approach can be overridden
>>> by -o compress-force, so with -o compress-force there will be a lot more
>>> effort spent in _trying_ to compress than with -o compress. We must use
>>> the heuristic for -o compress-force.
>>
>> The semantics and the user expectation of compress-force dictate always
>> compressing without giving up, even if it turns out to be slower and not
>> provide much benefit.
>
> Another question is: how would "compress-force" differ from "compress"
> otherwise? Always attempting the compression is its whole purpose!

E.g. files that are already compressed would increase the CPU consumption
with compress-force, while they'd hopefully be detected as incompressible
with 'compress' and clever heuristics. So the NOCOMPRESS bit would better
reflect the status of the file.
> E.g. files that are already compressed would increase the CPU consumption
> with compress-force, while they'd hopefully be detected as incompressible
> with 'compress' and clever heuristics. So the NOCOMPRESS bit would better
> reflect the status of the file.

The current NOCOMPRESS is based on a trial-and-error method and is more
accurate than a heuristic; also, the loss of CPU power is only one-time?

Maybe the only opportunity for the heuristic to help is in logic to
monitor and reset NOCOMPRESS; as of now there is no such logic.

Thanks, Anand
On Mon, Jul 24, 2017 at 11:40:17PM +0800, Anand Jain wrote:
>> E.g. files that are already compressed would increase the CPU consumption
>> with compress-force, while they'd hopefully be detected as incompressible
>> with 'compress' and clever heuristics. So the NOCOMPRESS bit would better
>> reflect the status of the file.
>
> The current NOCOMPRESS is based on a trial-and-error method and is more
> accurate than a heuristic; also, the loss of CPU power is only one-time?

Currently, force-compress beats everything, so even a file with
NOCOMPRESS will be compressed: all new writes will be passed to the
compression code and eventually stored uncompressed. Each time the
compression code will run and fail, so it's not one-time. Although you
can say it's more 'accurate', it's also more expensive.

> Maybe the only opportunity for the heuristic to help is in logic to
> monitor and reset NOCOMPRESS; as of now there is no such logic.

The heuristic can be made adaptive, and examine data even for NOCOMPRESS
files, but that's a few steps ahead of where we are now.
On 28/07/2017 00:36, David Sterba wrote:
> On Mon, Jul 24, 2017 at 11:40:17PM +0800, Anand Jain wrote:
>>> E.g. files that are already compressed would increase the CPU consumption
>>> with compress-force, while they'd hopefully be detected as incompressible
>>> with 'compress' and clever heuristics. So the NOCOMPRESS bit would better
>>> reflect the status of the file.

I thought 'compress' above meant the compress option. Ah, you mean the
compression algo.. got it. Right, compress-force on incompressible data
is very expensive. And it's also true that the compress option on
incompressible data is not at all expensive, and it's only one-time.

>> The current NOCOMPRESS is based on a trial-and-error method and is more
>> accurate than a heuristic; also, the loss of CPU power is only one-time?
>
> Currently, force-compress beats everything, so even a file with
> NOCOMPRESS will be compressed: all new writes will be passed to the
> compression code and eventually stored uncompressed.

It makes sense to me when you replace NOCOMPRESS with incompressible
data in the above statement. In my understanding, you will never have a
file with the NOCOMPRESS flag if the compress-force option is used.

> Each time the
> compression code will run and fail, so it's not one-time.
>
> Although you can say it's more 'accurate', it's also more expensive.

Yes. Expensive only with compress-force.

>> Maybe the only opportunity for the heuristic to help is in logic to
>> monitor and reset NOCOMPRESS; as of now there is no such logic.
>
> The heuristic can be made adaptive, and examine data even for NOCOMPRESS
> files, but that's a few steps ahead of where we are now.

Nice.

Thanks, Anand
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index d2ef9ac2a630..27ba11a74eb2 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -1047,3 +1047,33 @@ int btrfs_decompress_buf2page(const char *buf, unsigned long buf_start,
 
 	return 1;
 }
+
+/*
+ * Heuristic skeleton.
+ * For now it is just a naive and very optimistic 'return true'.
+ * The heuristic is proposed to quickly (compared to direct compression)
+ * detect the data type (compressible/incompressible), to avoid wasting
+ * CPU time on compressing incompressible data.
+ * The following logic will be added soon:
+ * 0. Get a sample of the input data
+ * 1. Detect mostly zeroed data
+ * 2. Detect data with a low "byte set" size (text etc.)
+ * 3. Detect data with a low/high core "byte set"
+ */
+int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end)
+{
+	u64 index = start >> PAGE_SHIFT;
+	u64 end_index = end >> PAGE_SHIFT;
+	struct page *page;
+	int ret = 1;
+
+	while (index <= end_index) {
+		page = find_get_page(inode->i_mapping, index);
+		kmap(page);
+		kunmap(page);
+		put_page(page);
+		index++;
+	}
+
+	return ret;
+}
diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h
index 87f6d3332163..8508ba6b9aef 100644
--- a/fs/btrfs/compression.h
+++ b/fs/btrfs/compression.h
@@ -129,4 +129,6 @@ struct btrfs_compress_op {
 extern const struct btrfs_compress_op btrfs_zlib_compress;
 extern const struct btrfs_compress_op btrfs_lzo_compress;
 
+int btrfs_compress_heuristic(struct inode *inode, u64 start, u64 end);
+
 #endif
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 95c212037095..c23b7047fc39 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -392,20 +392,20 @@ static noinline int add_async_extent(struct async_cow *cow,
 	return 0;
 }
 
-static inline int inode_need_compress(struct inode *inode)
+static inline int inode_need_compress(struct inode *inode, u64 start, u64 end)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 
 	/* force compress */
 	if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
-		return 1;
+		return btrfs_compress_heuristic(inode, start, end);
 	/* bad compression ratios */
 	if (BTRFS_I(inode)->flags & BTRFS_INODE_NOCOMPRESS)
 		return 0;
 	if (btrfs_test_opt(fs_info, COMPRESS) ||
 	    BTRFS_I(inode)->flags & BTRFS_INODE_COMPRESS ||
 	    BTRFS_I(inode)->force_compress)
-		return 1;
+		return btrfs_compress_heuristic(inode, start, end);
 	return 0;
 }
 
@@ -503,7 +503,7 @@ static noinline void compress_file_range(struct inode *inode,
 	 * inode has not been flagged as nocompress. This flag can
 	 * change at any time if we discover bad compression ratios.
 	 */
-	if (inode_need_compress(inode)) {
+	if (inode_need_compress(inode, start, end)) {
 		WARN_ON(pages);
 		pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
 		if (!pages) {
@@ -1576,7 +1576,7 @@ static int run_delalloc_range(void *private_data, struct page *locked_page,
 	} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC &&
 		   !force_cow) {
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 0, nr_written);
-	} else if (!inode_need_compress(inode)) {
+	} else if (!inode_need_compress(inode, start, end)) {
 		ret = cow_file_range(inode, locked_page, start, end, end,
 				     page_started, nr_written, 1, NULL);
 	} else {
For now that code just returns true. Later, more complex heuristic code
will be added.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/compression.c | 30 ++++++++++++++++++++++++++++++
 fs/btrfs/compression.h |  2 ++
 fs/btrfs/inode.c       | 10 +++++-----
 3 files changed, 37 insertions(+), 5 deletions(-)

--
2.13.3
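For reference, step 2 of the skeleton's planned logic ("detect data with a low byte set size") could look roughly like the user-space sketch below. The function name, the flat sample buffer, and the threshold of 64 are illustrative assumptions, not values from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space sketch of a "byte set" check: count how many distinct
 * byte values occur in a sample. A small set (e.g. plain text) hints
 * the data is compressible; a near-full set hints it is random or
 * already compressed. The threshold 64 is an arbitrary illustration. */
static bool sample_looks_compressible(const unsigned char *sample, size_t n)
{
	bool seen[256] = { false };
	size_t distinct = 0, i;

	for (i = 0; i < n; i++) {
		if (!seen[sample[i]]) {
			seen[sample[i]] = true;
			distinct++;
		}
	}
	return distinct <= 64;
}
```

The appeal of this kind of check is cost: one O(n) pass over a small sample with a 256-entry table, versus actually running the compressor over the whole chunk and throwing the result away.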