Message ID | 20170519133835.27843-3-nefelim4ag@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Hi,

Le 19/05/2017 à 15:38, Timofey Titovets a écrit :
> If data compression didn't free at least one PAGE_SIZE, it is useless
> to store that compressed extent
>
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
>  fs/btrfs/lzo.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
> index bd0b0938..637ef1b0 100644
> --- a/fs/btrfs/lzo.c
> +++ b/fs/btrfs/lzo.c
> @@ -207,7 +207,7 @@ static int lzo_compress_pages(struct list_head *ws,
>  	}
>
>  	/* we're making it bigger, give up */
> -	if (tot_in > 8192 && tot_in < tot_out) {
> +	if (tot_in > 8192 && tot_in < tot_out + PAGE_SIZE) {
>  		ret = -E2BIG;
>  		goto out;
>  	}

I'm not familiar with this code, but I was surprised by the test: you
would expect compression to be beneficial when it frees an actual page,
not when it merely reduces the data by a page size. So unless I
misunderstand the context, shouldn't it be something like:

if (tot_in > 8192 && ((tot_in % PAGE_SIZE) <= (tot_out % PAGE_SIZE)))

But looking at the code I see that this is in a while loop, and there's
another test just after the loop in the existing code:

if (tot_out > tot_in)
	goto out;

There are a couple of things I don't understand, but isn't this designed
to stream data in small chunks through compression before writing it at
the end? So isn't this later test the proper location to detect whether
compression was beneficial?

You might not save a page early on in the while loop, while it is still
working on a subset of the data to compress, but after enough data has
been processed you could save a page. It seems odd that your
modification could abort compression early on even though the same
condition would become true after enough loops.

Isn't what you want something like:

if (tot_out % PAGE_SIZE >= tot_in % PAGE_SIZE)
	goto out;

after the loop? The >= instead of > would avoid decompression in the
case where the compressed data is smaller but uses the same space on
disk.

Best regards,

Lionel
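To make the two test sites easier to see, here is a greatly simplified
userspace sketch of the control flow under discussion. It assumes a
PAGE_SIZE of 4096, and compress_chunk() is a hypothetical stand-in for
the LZO worker; this is not the real fs/btrfs/lzo.c code.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096UL /* assumed; the real value is per-arch */

/* Hypothetical stand-in for the LZO worker: copies the chunk
 * verbatim, i.e. models completely incompressible data. */
static size_t compress_chunk(const char *in, size_t in_len, char *out)
{
	memcpy(out, in, in_len);
	return in_len;
}

/* Greatly simplified sketch of lzo_compress_pages(): data is fed
 * through compression one page-sized chunk at a time. */
static int lzo_compress_sketch(const char *data, size_t len, char *out)
{
	size_t tot_in = 0, tot_out = 0;

	while (tot_in < len) {
		size_t in_len = (len - tot_in < PAGE_SIZE) ?
				 len - tot_in : PAGE_SIZE;

		tot_out += compress_chunk(data + tot_in, in_len,
					  out + tot_out);
		tot_in += in_len;

		/* in-loop test: runs after every chunk, i.e. on
		 * partial totals (shown with the patch applied) */
		if (tot_in > 8192 && tot_in < tot_out + PAGE_SIZE)
			return -1; /* -E2BIG in the kernel */
	}

	/* post-loop test: runs once, on the totals for the extent */
	if (tot_out > tot_in)
		return -1;

	return 0;
}
```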
Le 19/05/2017 à 16:17, Lionel Bouton a écrit :
> [...]
> Isn't what you want something like:
>
> if (tot_out % PAGE_SIZE >= tot_in % PAGE_SIZE)
> 	goto out;
>
> after the loop?

I was too focused on other problems, and having a fresh look at what I
wrote, I'm embarrassed by what I read.

Used pages for a given amount of data should be:

(amount / PAGE_SIZE) + ((amount % PAGE_SIZE) == 0 ? 0 : 1)

This seems enough of a common thing to compute that the kernel might
have a macro defined for it.
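There is indeed such a macro: DIV_ROUND_UP() in include/linux/kernel.h.
A minimal userspace copy of it, showing that it matches the expression
above:

```c
#include <stdio.h>

/* Userspace copy of the kernel's DIV_ROUND_UP() from
 * include/linux/kernel.h -- equivalent to the expression above. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	unsigned long page_size = 4096; /* assumed page size */

	printf("%lu\n", DIV_ROUND_UP(8192UL, page_size)); /* 2 pages */
	printf("%lu\n", DIV_ROUND_UP(8193UL, page_size)); /* 3 pages */
	return 0;
}
```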
2017-05-19 23:19 GMT+03:00 Lionel Bouton <lionel-subscription@bouton.name>:
> I was too focused on other problems and having a fresh look at what I
> wrote I'm embarrassed by what I read.
> Used pages for a given amount of data should be (amount / PAGE_SIZE) +
> ((amount % PAGE_SIZE) == 0 ? 0 : 1) this seems enough of a common thing
> to compute that the kernel might have a macro defined for this.

If I understand the code correctly, the logic of comparing the
input/output sizes in bytes is enough (IMHO) for skipping the
compression of incompressible data. But as btrfs uses PAGE_SIZE as its
data cluster size (and, if I understand correctly, its minimum IO size),
the logic can be improved for the case where the compressed data fall in
126978 < compressed_size < 131072. The easiest way to improve that case,
I think, is to treat the compressed size as PAGE_SIZE bigger in the
comparison.

JFYI: I once read on the list that btrfs doesn't compress data smaller
than PAGE_SIZE because it's useless (I didn't find that in the code, so
I think I don't fully understand the compression routine).

After some time I got the idea that if btrfs decides whether to store
data compressed by comparing input/output sizes (i.e. if the compressed
size is bigger than the input data, don't store the compressed version),
this logic can be improved by also checking that the compression gain is
more than PAGE_SIZE: if it's not, the compressed data will pass the
check but consume the same amount of space on disk, and as a result will
show no gain.
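The window described above can be illustrated with concrete
(hypothetical) numbers, assuming 4096-byte pages:

```c
#include <stdio.h>

#define PAGE_SIZE 4096UL /* assumed */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	unsigned long in = 131072;  /* 32 pages of raw data */
	unsigned long out = 127000; /* hypothetical compressed size */

	/* A plain byte comparison says compression won ... */
	printf("bytes saved: %lu\n", in - out); /* 4072 */

	/* ... but page-rounded, nothing was gained on disk. */
	printf("pages in:  %lu\n", DIV_ROUND_UP(in, PAGE_SIZE));  /* 32 */
	printf("pages out: %lu\n", DIV_ROUND_UP(out, PAGE_SIZE)); /* 32 */
	return 0;
}
```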
Le 19/05/2017 à 23:15, Timofey Titovets a écrit :
> 2017-05-19 23:19 GMT+03:00 Lionel Bouton
> <lionel-subscription@bouton.name>:
>> I was too focused on other problems and having a fresh look at what I
>> wrote I'm embarrassed by what I read. Used pages for a given amount
>> of data should be (amount / PAGE_SIZE) + ((amount % PAGE_SIZE) == 0 ?
>> 0 : 1) this seems enough of a common thing to compute that the kernel
>> might have a macro defined for this.
> If i understand the code correctly, the logic of comparing the size of
> input/output by bytes is enough (IMHO)

As I suspected, I missed context: the name of the function makes it
clear it is supposed to work on whole pages, so you are right about the
comparison.

What I'm still unsure about is whether the test is at the right spot.
The inner loop seems to work on at most

in_len = min(len, PAGE_SIZE)

chunks of data; for example, on anything with len >= 4 * PAGE_SIZE and
PAGE_SIZE = 4096, it seems to me there's a problem.

if (tot_in > 8192 && tot_in < tot_out + PAGE_SIZE)

tot_in > 8192 is true starting at the 3rd page being processed in my
example.

If the first 3 pages don't manage to free one full page (ie the
function only reaches at best a 2/3 compression ratio), the modified
second condition is true and the compression is aborted. This even
though continuing the compression could end up freeing one page
(tot_out is expected to rise more slowly than tot_in on compressible
data, so the difference could grow to a full PAGE_SIZE).

Am I still confused by something?

Best regards,

Lionel
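Lionel's scenario can be played out with concrete (assumed) numbers:
steady compression to 3/4 of the input, which is worse than the 2/3
break-even ratio at the 3-page mark:

```c
#include <stdio.h>

#define PAGE_SIZE 4096UL /* assumed */

int main(void)
{
	/* Assume each page compresses to 3/4 of its size. */
	unsigned long tot_in = 0, tot_out = 0;
	int page;

	for (page = 1; page <= 4; page++) {
		tot_in  += PAGE_SIZE;
		tot_out += PAGE_SIZE * 3 / 4;

		/* the modified in-loop test */
		if (tot_in > 8192 && tot_in < tot_out + PAGE_SIZE) {
			printf("aborted at page %d (in=%lu out=%lu)\n",
			       page, tot_in, tot_out);
			return 0;
		}
	}
	/* Never reached with these numbers: the abort fires at page 3
	 * (12288 < 9216 + 4096) even though by page 4 a full page
	 * would have been saved (16384 - 12288 = 4096). */
	printf("saved %lu bytes\n", tot_in - tot_out);
	return 0;
}
```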
2017-05-20 18:47 GMT+03:00 Lionel Bouton <lionel-subscription@bouton.name>:
> [...]
> If the first 3 pages don't manage to free one full page (ie the
> function only reaches at best a 2/3 compression ratio), the modified
> second condition is true and the compression is aborted. This even
> though continuing the compression could end up freeing one page
> (tot_out is expected to rise more slowly than tot_in on compressible
> data, so the difference could grow to a full PAGE_SIZE).
>
> Am I still confused by something?

You're right, the size must be checked after all the data have already
been compressed, so I will fix that and update the patch set. Thanks.
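For reference, one way the check could look once moved after the loop,
using the page-rounding macro mentioned earlier. This is a hypothetical
sketch, not the actual follow-up patch:

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL /* assumed */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical post-loop acceptance test: store the compressed extent
 * only if it occupies strictly fewer pages than the raw input. The
 * strict < also rejects Lionel's smaller-but-same-pages case. */
static bool compression_worth_it(size_t tot_in, size_t tot_out)
{
	return DIV_ROUND_UP(tot_out, PAGE_SIZE) <
	       DIV_ROUND_UP(tot_in, PAGE_SIZE);
}
```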
diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c
index bd0b0938..637ef1b0 100644
--- a/fs/btrfs/lzo.c
+++ b/fs/btrfs/lzo.c
@@ -207,7 +207,7 @@ static int lzo_compress_pages(struct list_head *ws,
 	}

 	/* we're making it bigger, give up */
-	if (tot_in > 8192 && tot_in < tot_out) {
+	if (tot_in > 8192 && tot_in < tot_out + PAGE_SIZE) {
 		ret = -E2BIG;
 		goto out;
 	}
If data compression didn't free at least one PAGE_SIZE, it is useless
to store that compressed extent

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/lzo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)