[0/3] Allow ZRAM to use any zpool-compatible backend

Message ID	20191010230414.647c29f34665ca26103879c4@gmail.com (mailing list archive)
Headers	show Return-Path: <SRS0=M5w7=YD=kvack.org=owner-linux-mm@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C314D21835 Date: Thu, 10 Oct 2019 23:04:14 +0300 From: Vitaly Wool <vitalywool@gmail.com> To: Linux-MM <linux-mm@kvack.org>, Andrew Morton <akpm@linux-foundation.org>, Dan Streetman <ddstreet@ieee.org>, Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>, LKML <linux-kernel@vger.kernel.org>, Vlastimil Babka <vbabka@suse.cz>, Shakeel Butt <shakeelb@google.com>, Henry Burns <henrywolfeburns@gmail.com>, Theodore Ts'o <tytso@thunk.org> Subject: [PATCH 0/3] Allow ZRAM to use any zpool-compatible backend Message-Id: <20191010230414.647c29f34665ca26103879c4@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	Allow ZRAM to use any zpool-compatible backend \| expand [0/3] Allow ZRAM to use any zpool-compatible backend [1/3] zpool: extend API to match zsmalloc [2/3] zsmalloc: add compaction and huge class callbacks [3/3] zram: use common zpool interface

Vitaly Wool Oct. 10, 2019, 8:04 p.m. UTC

The coming patchset is a new take on the old issue: ZRAM can currently be used only with zsmalloc even though this may not be the optimal combination for some configurations. The previous (unsuccessful) attempt dates back to 2015 [1] and is notable for the heated discussions it has caused.

The patchset in [1] had basically the only goal of enabling ZRAM/zbud combo which had a very narrow use case. Things have changed substantially since then, and now, with z3fold used widely as a zswap backend, I, as the z3fold maintainer, am getting requests to re-interate on making it possible to use ZRAM with any zpool-compatible backend, first of all z3fold.

The preliminary results for this work have been delivered at Linux Plumbers this year [2]. The talk at LPC, though having attracted limited interest, ended in a consensus to continue the work and pursue the goal of decoupling ZRAM from zsmalloc.

The current patchset has been stress tested on arm64 and x86_64 devices, including the Dell laptop I'm writing this message on now, not to mention several QEmu confugirations.

[1] https://lkml.org/lkml/2015/9/14/356
[2] https://linuxplumbersconf.org/event/4/contributions/551/

Sergey Senozhatsky Oct. 14, 2019, 10:33 a.m. UTC | #1

Hi,

On (10/10/19 23:04), Vitaly Wool wrote:
[..]
> The coming patchset is a new take on the old issue: ZRAM can
> currently be used only with zsmalloc even though this may not
> be the optimal combination for some configurations. The previous
> (unsuccessful) attempt dates back to 2015 [1] and is notable for
> the heated discussions it has caused.

Oh, right, I do recall it.

> The patchset in [1] had basically the only goal of enabling
> ZRAM/zbud combo which had a very narrow use case. Things have
> changed substantially since then, and now, with z3fold used
> widely as a zswap backend, I, as the z3fold maintainer, am
> getting requests to re-interate on making it possible to use
> ZRAM with any zpool-compatible backend, first of all z3fold.

A quick question, what are the technical reasons to prefer
allocator X over zsmalloc? Some data would help, I guess.

> The preliminary results for this work have been delivered at
> Linux Plumbers this year [2]. The talk at LPC, though having
> attracted limited interest, ended in a consensus to continue
> the work and pursue the goal of decoupling ZRAM from zsmalloc.

[..]

> [1] https://lkml.org/lkml/2015/9/14/356

I need to re-read it, thanks for the link. IIRC, but maybe
I'm wrong, one of the things Minchan was not happy with was
increased maintenance cost. So, perhaps, this also should
be discuss/addressed (and maybe even in the first place).

	-ss

Vitaly Wool Oct. 14, 2019, 11:49 a.m. UTC | #2

Hi Sergey,

On Mon, Oct 14, 2019 at 12:35 PM Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
>
> Hi,
>
> On (10/10/19 23:04), Vitaly Wool wrote:
> [..]
> > The coming patchset is a new take on the old issue: ZRAM can
> > currently be used only with zsmalloc even though this may not
> > be the optimal combination for some configurations. The previous
> > (unsuccessful) attempt dates back to 2015 [1] and is notable for
> > the heated discussions it has caused.
>
> Oh, right, I do recall it.
>
> > The patchset in [1] had basically the only goal of enabling
> > ZRAM/zbud combo which had a very narrow use case. Things have
> > changed substantially since then, and now, with z3fold used
> > widely as a zswap backend, I, as the z3fold maintainer, am
> > getting requests to re-interate on making it possible to use
> > ZRAM with any zpool-compatible backend, first of all z3fold.
>
> A quick question, what are the technical reasons to prefer
> allocator X over zsmalloc? Some data would help, I guess.

For z3fold, the data can be found here:
https://elinux.org/images/d/d3/Z3fold.pdf.

For zbud (which is also of interest), imagine a low-end platform with
a simplistic HW compressor that doesn't give really high ratio. We
still want to be able to use ZRAM (not necessarily as a swap
partition, but rather for /home and /var) but we absolutely don't need
zsmalloc's complexity. zbud is a perfect match here (provided that it
can cope with PAGE_SIZE pages, yes, but it's a small patch to make
that work) since it's unlikely that we squeeze more than 2 compressed
pages per page with that HW compressor anyway.

> > The preliminary results for this work have been delivered at
> > Linux Plumbers this year [2]. The talk at LPC, though having
> > attracted limited interest, ended in a consensus to continue
> > the work and pursue the goal of decoupling ZRAM from zsmalloc.
>
> [..]
>
> > [1] https://lkml.org/lkml/2015/9/14/356
>
> I need to re-read it, thanks for the link. IIRC, but maybe
> I'm wrong, one of the things Minchan was not happy with was
> increased maintenance cost. So, perhaps, this also should
> be discuss/addressed (and maybe even in the first place).

I have hard time seeing how maintenance cost is increased here :)

~Vitaly

Minchan Kim Oct. 14, 2019, 4:41 p.m. UTC | #3

On Thu, Oct 10, 2019 at 11:04:14PM +0300, Vitaly Wool wrote:
> The coming patchset is a new take on the old issue: ZRAM can currently be used only with zsmalloc even though this may not be the optimal combination for some configurations. The previous (unsuccessful) attempt dates back to 2015 [1] and is notable for the heated discussions it has caused.
> 
> The patchset in [1] had basically the only goal of enabling ZRAM/zbud combo which had a very narrow use case. Things have changed substantially since then, and now, with z3fold used widely as a zswap backend, I, as the z3fold maintainer, am getting requests to re-interate on making it possible to use ZRAM with any zpool-compatible backend, first of all z3fold.
> 
> The preliminary results for this work have been delivered at Linux Plumbers this year [2]. The talk at LPC, though having attracted limited interest, ended in a consensus to continue the work and pursue the goal of decoupling ZRAM from zsmalloc.
> 
> The current patchset has been stress tested on arm64 and x86_64 devices, including the Dell laptop I'm writing this message on now, not to mention several QEmu confugirations.
> 
> [1] https://lkml.org/lkml/2015/9/14/356
> [2] https://linuxplumbersconf.org/event/4/contributions/551/

Please describe what's the usecase in real world, what's the benefit zsmalloc
cannot fulfill by desgin and how it's significant.
I really don't want to make fragmentaion of allocator so we should really see
how zsmalloc cannot achieve things if you are claiming.
Please tell us how to test it so that we could investigate what's the root
cause.

Vitaly Wool Oct. 14, 2019, 4:50 p.m. UTC | #4

Den mån 14 okt. 2019 18:41Minchan Kim <minchan@kernel.org> skrev:

> On Thu, Oct 10, 2019 at 11:04:14PM +0300, Vitaly Wool wrote:
> > The coming patchset is a new take on the old issue: ZRAM can currently
> be used only with zsmalloc even though this may not be the optimal
> combination for some configurations. The previous (unsuccessful) attempt
> dates back to 2015 [1] and is notable for the heated discussions it has
> caused.
> >
> > The patchset in [1] had basically the only goal of enabling ZRAM/zbud
> combo which had a very narrow use case. Things have changed substantially
> since then, and now, with z3fold used widely as a zswap backend, I, as the
> z3fold maintainer, am getting requests to re-interate on making it possible
> to use ZRAM with any zpool-compatible backend, first of all z3fold.
> >
> > The preliminary results for this work have been delivered at Linux
> Plumbers this year [2]. The talk at LPC, though having attracted limited
> interest, ended in a consensus to continue the work and pursue the goal of
> decoupling ZRAM from zsmalloc.
> >
> > The current patchset has been stress tested on arm64 and x86_64 devices,
> including the Dell laptop I'm writing this message on now, not to mention
> several QEmu confugirations.
> >
> > [1] https://lkml.org/lkml/2015/9/14/356
> > [2] https://linuxplumbersconf.org/event/4/contributions/551/
>
> Please describe what's the usecase in real world, what's the benefit
> zsmalloc
> cannot fulfill by desgin and how it's significant.
> I really don't want to make fragmentaion of allocator so we should really
> see
> how zsmalloc cannot achieve things if you are claiming.
> Please tell us how to test it so that we could investigate what's the root
> cause.
>

Fragmentation? What do you mean by that?

~Vitaly

>

Vitaly Wool Oct. 15, 2019, 7:39 a.m. UTC | #5

Hi Minchan,

On Mon, Oct 14, 2019 at 6:41 PM Minchan Kim <minchan@kernel.org> wrote:
>
> On Thu, Oct 10, 2019 at 11:04:14PM +0300, Vitaly Wool wrote:
> > The coming patchset is a new take on the old issue: ZRAM can currently be used only with zsmalloc even though this may not be the optimal combination for some configurations. The previous (unsuccessful) attempt dates back to 2015 [1] and is notable for the heated discussions it has caused.
> >
> > The patchset in [1] had basically the only goal of enabling ZRAM/zbud combo which had a very narrow use case. Things have changed substantially since then, and now, with z3fold used widely as a zswap backend, I, as the z3fold maintainer, am getting requests to re-interate on making it possible to use ZRAM with any zpool-compatible backend, first of all z3fold.
> >
> > The preliminary results for this work have been delivered at Linux Plumbers this year [2]. The talk at LPC, though having attracted limited interest, ended in a consensus to continue the work and pursue the goal of decoupling ZRAM from zsmalloc.
> >
> > The current patchset has been stress tested on arm64 and x86_64 devices, including the Dell laptop I'm writing this message on now, not to mention several QEmu confugirations.
> >
> > [1] https://lkml.org/lkml/2015/9/14/356
> > [2] https://linuxplumbersconf.org/event/4/contributions/551/
>
> Please describe what's the usecase in real world, what's the benefit zsmalloc
> cannot fulfill by desgin and how it's significant.

I'm not entirely sure how to interpret the phrase "the benefit
zsmalloc cannot fulfill by design" but let me explain.
First, there are multi multi core systems where z3fold can provide
better throughput.
Then, there are low end systems with hardware
compression/decompression support which don't need zsmalloc
sophistication and would rather use zbud with ZRAM because the
compression ratio is relatively low.
Finally, there are MMU-less systems targeting IOT and still running
Linux and having a compressed RAM disk is something that would help
these systems operate in a better way (for the benefit of the overall
Linux ecosystem, if you care about that, of course; well, some people
do).

> I really don't want to make fragmentaion of allocator so we should really see
> how zsmalloc cannot achieve things if you are claiming.

I have to say that this point is completely bogus. We do not create
fragmentation by using a better defined and standardized API. In fact,
we aim to increase the number of use cases and test coverage for ZRAM.
With that said, I have hard time seeing how zsmalloc can operate on a
MMU-less system.

> Please tell us how to test it so that we could investigate what's the root
> cause.

I gather you haven't read neither the LPC documents nor my
conversation with Sergey re: these changes, because if you did you
wouldn't have had the type of questions you're asking. Please also see
above.

I feel a bit awkward explaining basic things to you but there may not
be other "root cause" than applicability issue. zsmalloc is a great
allocator but it's not universal and has its limitations. The
(potential) scope for ZRAM is wider than zsmalloc can provide. We are
*helping* _you_ to extend this scope "in real world" (c) and you come
up with bogus objections. Why?

Best regards,
   Vitaly

Minchan Kim Oct. 15, 2019, 8 p.m. UTC | #6

On Tue, Oct 15, 2019 at 09:39:35AM +0200, Vitaly Wool wrote:
> Hi Minchan,
> 
> On Mon, Oct 14, 2019 at 6:41 PM Minchan Kim <minchan@kernel.org> wrote:
> >
> > On Thu, Oct 10, 2019 at 11:04:14PM +0300, Vitaly Wool wrote:
> > > The coming patchset is a new take on the old issue: ZRAM can currently be used only with zsmalloc even though this may not be the optimal combination for some configurations. The previous (unsuccessful) attempt dates back to 2015 [1] and is notable for the heated discussions it has caused.
> > >
> > > The patchset in [1] had basically the only goal of enabling ZRAM/zbud combo which had a very narrow use case. Things have changed substantially since then, and now, with z3fold used widely as a zswap backend, I, as the z3fold maintainer, am getting requests to re-interate on making it possible to use ZRAM with any zpool-compatible backend, first of all z3fold.
> > >
> > > The preliminary results for this work have been delivered at Linux Plumbers this year [2]. The talk at LPC, though having attracted limited interest, ended in a consensus to continue the work and pursue the goal of decoupling ZRAM from zsmalloc.
> > >
> > > The current patchset has been stress tested on arm64 and x86_64 devices, including the Dell laptop I'm writing this message on now, not to mention several QEmu confugirations.
> > >
> > > [1] https://lkml.org/lkml/2015/9/14/356
> > > [2] https://linuxplumbersconf.org/event/4/contributions/551/
> >
> > Please describe what's the usecase in real world, what's the benefit zsmalloc
> > cannot fulfill by desgin and how it's significant.
> 
> I'm not entirely sure how to interpret the phrase "the benefit
> zsmalloc cannot fulfill by design" but let me explain.
> First, there are multi multi core systems where z3fold can provide
> better throughput.

Please include number in the description with workload.

> Then, there are low end systems with hardware
> compression/decompression support which don't need zsmalloc
> sophistication and would rather use zbud with ZRAM because the
> compression ratio is relatively low.

I couldn't imagine how it's bad with zsmalloc. Could you be more
specific?

> Finally, there are MMU-less systems targeting IOT and still running
> Linux and having a compressed RAM disk is something that would help
> these systems operate in a better way (for the benefit of the overall
> Linux ecosystem, if you care about that, of course; well, some people
> do).

Could you write down what's the problem to use zsmalloc for MMU-less
system? Maybe, it would be important point rather other performance
argument since other functions's overheads in the callpath are already
rather big.

> 
> > I really don't want to make fragmentaion of allocator so we should really see
> > how zsmalloc cannot achieve things if you are claiming.
> 
> I have to say that this point is completely bogus. We do not create
> fragmentation by using a better defined and standardized API. In fact,
> we aim to increase the number of use cases and test coverage for ZRAM.
> With that said, I have hard time seeing how zsmalloc can operate on a
> MMU-less system.
> 
> > Please tell us how to test it so that we could investigate what's the root
> > cause.
> 
> I gather you haven't read neither the LPC documents nor my
> conversation with Sergey re: these changes, because if you did you
> wouldn't have had the type of questions you're asking. Please also see
> above.

Please include your claims in the description rather than attaching
file. That's the usualy way how we work because it could make easier to
discuss by inline.

> 
> I feel a bit awkward explaining basic things to you but there may not
> be other "root cause" than applicability issue. zsmalloc is a great
> allocator but it's not universal and has its limitations. The
> (potential) scope for ZRAM is wider than zsmalloc can provide. We are
> *helping* _you_ to extend this scope "in real world" (c) and you come
> up with bogus objections. Why?

Please add more detail to convince so we need to think over why zsmalloc
cannot be improved for the usecase.

Vitaly Wool Oct. 21, 2019, 2:21 p.m. UTC | #7

On Tue, Oct 15, 2019 at 10:00 PM Minchan Kim <minchan@kernel.org> wrote:
>
> On Tue, Oct 15, 2019 at 09:39:35AM +0200, Vitaly Wool wrote:
> > Hi Minchan,
> >
> > On Mon, Oct 14, 2019 at 6:41 PM Minchan Kim <minchan@kernel.org> wrote:
> > >
> > > On Thu, Oct 10, 2019 at 11:04:14PM +0300, Vitaly Wool wrote:
> > > > The coming patchset is a new take on the old issue: ZRAM can currently be used only with zsmalloc even though this may not be the optimal combination for some configurations. The previous (unsuccessful) attempt dates back to 2015 [1] and is notable for the heated discussions it has caused.
> > > >
> > > > The patchset in [1] had basically the only goal of enabling ZRAM/zbud combo which had a very narrow use case. Things have changed substantially since then, and now, with z3fold used widely as a zswap backend, I, as the z3fold maintainer, am getting requests to re-interate on making it possible to use ZRAM with any zpool-compatible backend, first of all z3fold.
> > > >
> > > > The preliminary results for this work have been delivered at Linux Plumbers this year [2]. The talk at LPC, though having attracted limited interest, ended in a consensus to continue the work and pursue the goal of decoupling ZRAM from zsmalloc.
> > > >
> > > > The current patchset has been stress tested on arm64 and x86_64 devices, including the Dell laptop I'm writing this message on now, not to mention several QEmu confugirations.
> > > >
> > > > [1] https://lkml.org/lkml/2015/9/14/356
> > > > [2] https://linuxplumbersconf.org/event/4/contributions/551/
> > >
> > > Please describe what's the usecase in real world, what's the benefit zsmalloc
> > > cannot fulfill by desgin and how it's significant.
> >
> > I'm not entirely sure how to interpret the phrase "the benefit
> > zsmalloc cannot fulfill by design" but let me explain.
> > First, there are multi multi core systems where z3fold can provide
> > better throughput.
>
> Please include number in the description with workload.

Sure. So on an HMP 8-core ARM64 system with ZRAM, we run the following command:
fio --bs=4k --randrepeat=1 --randseed=100 --refill_buffers \
    --buffer_compress_percentage=50 --scramble_buffers=1 \
    --direct=1 --loops=15 --numjobs=4 --filename=/dev/block/zram0 \
     --name=seq-write --rw=write --stonewall --name=seq-read \
     --rw=read --stonewall --name=seq-readwrite --rw=rw --stonewall \
     --name=rand-readwrite --rw=randrw --stonewall

The results are the following:

zsmalloc:
Run status group 0 (all jobs):
  WRITE: io=61440MB, aggrb=1680.4MB/s, minb=430167KB/s,
maxb=440590KB/s, mint=35699msec, maxt=36564msec

Run status group 1 (all jobs):
   READ: io=61440MB, aggrb=1620.4MB/s, minb=414817KB/s,
maxb=414850KB/s, mint=37914msec, maxt=37917msec

Run status group 2 (all jobs):
  READ: io=30615MB, aggrb=897979KB/s, minb=224494KB/s,
maxb=228161KB/s, mint=34351msec, maxt=34912msec
  WRITE: io=30825MB, aggrb=904110KB/s, minb=226027KB/s,
maxb=229718KB/s, mint=34351msec, maxt=34912msec

Run status group 3 (all jobs):
   READ: io=30615MB, aggrb=772002KB/s, minb=193000KB/s,
maxb=193010KB/s, mint=40607msec, maxt=40609msec
  WRITE: io=30825MB, aggrb=777273KB/s, minb=194318KB/s,
maxb=194327KB/s, mint=40607msec, maxt=40609msec

z3fold:
Run status group 0 (all jobs):
  WRITE: io=61440MB, aggrb=1224.8MB/s, minb=313525KB/s,
maxb=329941KB/s, mint=47671msec, maxt=50167msec

Run status group 1 (all jobs):
   READ: io=61440MB, aggrb=3119.3MB/s, minb=798529KB/s,
maxb=862883KB/s, mint=18228msec, maxt=19697msec

Run status group 2 (all jobs):
   READ: io=30615MB, aggrb=937283KB/s, minb=234320KB/s,
maxb=234334KB/s, mint=33446msec, maxt=33448msec
  WRITE: io=30825MB, aggrb=943682KB/s, minb=235920KB/s,
maxb=235934KB/s, mint=33446msec, maxt=33448msec

Run status group 3 (all jobs):
   READ: io=30615MB, aggrb=829591KB/s, minb=207397KB/s,
maxb=210285KB/s, mint=37271msec, maxt=37790msec
  WRITE: io=30825MB, aggrb=835255KB/s, minb=208813KB/s,
maxb=211721KB/s, mint=37271msec, maxt=37790msec

So, z3fold is faster everywhere (including being *two* times faster on
read) except for sequential write which is the least important use
case in real world.

> > Then, there are low end systems with hardware
> > compression/decompression support which don't need zsmalloc
> > sophistication and would rather use zbud with ZRAM because the
> > compression ratio is relatively low.
>
> I couldn't imagine how it's bad with zsmalloc. Could you be more
> specific?


> > Finally, there are MMU-less systems targeting IOT and still running
> > Linux and having a compressed RAM disk is something that would help
> > these systems operate in a better way (for the benefit of the overall
> > Linux ecosystem, if you care about that, of course; well, some people
> > do).
>
> Could you write down what's the problem to use zsmalloc for MMU-less
> system? Maybe, it would be important point rather other performance
> argument since other functions's overheads in the callpath are already
> rather big.

Well, I assume you had the reasons to make zsmalloc depend on MMU in Kconfig:
...
config ZSMALLOC
    tristate "Memory allocator for compressed pages"
    depends on MMU
    help
...

But even disregarding that, let's compare ZRAM/zbud and ZRAM/zsmalloc
performance and memory these two consume on a relatively low end
2-core ARM.
Command:
fio --bs=4k --randrepeat=1 --randseed=100 --refill_buffers
--scramble_buffers=1 \
        --direct=1 --loops=15 --numjobs=2 --filename=/dev/block/zram0 \
        --name=seq-write --rw=write --stonewall --name=seq-read --rw=read \
        --stonewall --name=seq-readwrite --rw=rw --stonewall
--name=rand-readwrite \
        --rw=randrw --stonewall

zsmalloc:
Run status group 0 (all jobs):
  WRITE: io=30720MB, aggrb=374763KB/s, minb=187381KB/s,
maxb=188389KB/s, mint=83490msec, maxt=83939msec

Run status group 1 (all jobs):
   READ: io=30720MB, aggrb=964000KB/s, minb=482000KB/s,
maxb=482015KB/s, mint=32631msec, maxt=32632msec

Run status group 2 (all jobs):
   READ: io=15308MB, aggrb=431263KB/s, minb=215631KB/s,
maxb=215898KB/s, mint=36302msec, maxt=36347msec
  WRITE: io=15412MB, aggrb=434207KB/s, minb=217103KB/s,
maxb=217373KB/s, mint=36302msec, maxt=36347msec

Run status group 3 (all jobs):
   READ: io=15308MB, aggrb=327328KB/s, minb=163664KB/s,
maxb=163667KB/s, mint=47887msec, maxt=47888msec
  WRITE: io=15412MB, aggrb=329563KB/s, minb=164781KB/s,
maxb=164785KB/s, mint=47887msec, maxt=47888msec

zbud:
Run status group 0 (all jobs):
  WRITE: io=30720MB, aggrb=735980KB/s, minb=367990KB/s,
maxb=373079KB/s, mint=42159msec, maxt=42742msec

Run status group 1 (all jobs):
   READ: io=30720MB, aggrb=927915KB/s, minb=463957KB/s,
maxb=463999KB/s, mint=33898msec, maxt=33901msec

Run status group 2 (all jobs):
   READ: io=15308MB, aggrb=403467KB/s, minb=201733KB/s,
maxb=202051KB/s, mint=38790msec, maxt=38851msec
  WRITE: io=15412MB, aggrb=406222KB/s, minb=203111KB/s,
maxb=203430KB/s, mint=38790msec, maxt=38851msec

Run status group 3 (all jobs):
   READ: io=15308MB, aggrb=334967KB/s, minb=167483KB/s,
maxb=167487KB/s, mint=46795msec, maxt=46796msec
  WRITE: io=15412MB, aggrb=337254KB/s, minb=168627KB/s,
maxb=168630KB/s, mint=46795msec, maxt=46796msec

Pretty equal except for sequential write which is twice as good with zbud.

Now to the fun part.
zsmalloc:
  0 .text         00002908  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
zbud:
  0 .text         0000072c  0000000000000000  0000000000000000  00000040  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE

And this does not cover dynamic memory allocation overhead which is
higher for zsmalloc. So once again, given that the compression ratio
is low (e. g. a simple HW accelerator is used), what would most
unbiased people prefer to use in this case?

> > > I really don't want to make fragmentaion of allocator so we should really see
> > > how zsmalloc cannot achieve things if you are claiming.
> >
> > I have to say that this point is completely bogus. We do not create
> > fragmentation by using a better defined and standardized API. In fact,
> > we aim to increase the number of use cases and test coverage for ZRAM.
> > With that said, I have hard time seeing how zsmalloc can operate on a
> > MMU-less system.
> >
> > > Please tell us how to test it so that we could investigate what's the root
> > > cause.
> >
> > I gather you haven't read neither the LPC documents nor my
> > conversation with Sergey re: these changes, because if you did you
> > wouldn't have had the type of questions you're asking. Please also see
> > above.
>
> Please include your claims in the description rather than attaching
> file. That's the usualy way how we work because it could make easier to
> discuss by inline.

Did I attach something? I don't quite recall that. I posted links to
previous discussions and conference materials, each for a reason.

> >
> > I feel a bit awkward explaining basic things to you but there may not
> > be other "root cause" than applicability issue. zsmalloc is a great
> > allocator but it's not universal and has its limitations. The
> > (potential) scope for ZRAM is wider than zsmalloc can provide. We are
> > *helping* _you_ to extend this scope "in real world" (c) and you come
> > up with bogus objections. Why?
>
> Please add more detail to convince so we need to think over why zsmalloc
> cannot be improved for the usecase.

This approach is wrong. zsmalloc is good enough and covers a lot of
use cases but there are still some where it doesn't work that well by
design. E. g. on an XIP system we do care about the code size since
it's stored uncompressed but still want to use ZRAM. Why would we want
to waste almost 10K just on zsmalloc code if the counterpart (zbud in
that case) works better?

Best regards,
   Vitaly

Minchan Kim Oct. 30, 2019, 12:10 a.m. UTC | #8

On Mon, Oct 21, 2019 at 04:21:21PM +0200, Vitaly Wool wrote:
> On Tue, Oct 15, 2019 at 10:00 PM Minchan Kim <minchan@kernel.org> wrote:
> >
> > On Tue, Oct 15, 2019 at 09:39:35AM +0200, Vitaly Wool wrote:
> > > Hi Minchan,
> > >
> > > On Mon, Oct 14, 2019 at 6:41 PM Minchan Kim <minchan@kernel.org> wrote:
> > > >
> > > > On Thu, Oct 10, 2019 at 11:04:14PM +0300, Vitaly Wool wrote:
> > > > > The coming patchset is a new take on the old issue: ZRAM can currently be used only with zsmalloc even though this may not be the optimal combination for some configurations. The previous (unsuccessful) attempt dates back to 2015 [1] and is notable for the heated discussions it has caused.
> > > > >
> > > > > The patchset in [1] had basically the only goal of enabling ZRAM/zbud combo which had a very narrow use case. Things have changed substantially since then, and now, with z3fold used widely as a zswap backend, I, as the z3fold maintainer, am getting requests to re-interate on making it possible to use ZRAM with any zpool-compatible backend, first of all z3fold.
> > > > >
> > > > > The preliminary results for this work have been delivered at Linux Plumbers this year [2]. The talk at LPC, though having attracted limited interest, ended in a consensus to continue the work and pursue the goal of decoupling ZRAM from zsmalloc.
> > > > >
> > > > > The current patchset has been stress tested on arm64 and x86_64 devices, including the Dell laptop I'm writing this message on now, not to mention several QEmu confugirations.
> > > > >
> > > > > [1] https://lkml.org/lkml/2015/9/14/356
> > > > > [2] https://linuxplumbersconf.org/event/4/contributions/551/
> > > >
> > > > Please describe what's the usecase in real world, what's the benefit zsmalloc
> > > > cannot fulfill by desgin and how it's significant.
> > >
> > > I'm not entirely sure how to interpret the phrase "the benefit
> > > zsmalloc cannot fulfill by design" but let me explain.
> > > First, there are multi multi core systems where z3fold can provide
> > > better throughput.
> >
> > Please include number in the description with workload.
> 
> Sure. So on an HMP 8-core ARM64 system with ZRAM, we run the following command:
> fio --bs=4k --randrepeat=1 --randseed=100 --refill_buffers \
>     --buffer_compress_percentage=50 --scramble_buffers=1 \
>     --direct=1 --loops=15 --numjobs=4 --filename=/dev/block/zram0 \
>      --name=seq-write --rw=write --stonewall --name=seq-read \
>      --rw=read --stonewall --name=seq-readwrite --rw=rw --stonewall \
>      --name=rand-readwrite --rw=randrw --stonewall
> 
> The results are the following:
> 
> zsmalloc:
> Run status group 0 (all jobs):
>   WRITE: io=61440MB, aggrb=1680.4MB/s, minb=430167KB/s,
> maxb=440590KB/s, mint=35699msec, maxt=36564msec
> 
> Run status group 1 (all jobs):
>    READ: io=61440MB, aggrb=1620.4MB/s, minb=414817KB/s,
> maxb=414850KB/s, mint=37914msec, maxt=37917msec
> 
> Run status group 2 (all jobs):
>   READ: io=30615MB, aggrb=897979KB/s, minb=224494KB/s,
> maxb=228161KB/s, mint=34351msec, maxt=34912msec
>   WRITE: io=30825MB, aggrb=904110KB/s, minb=226027KB/s,
> maxb=229718KB/s, mint=34351msec, maxt=34912msec
> 
> Run status group 3 (all jobs):
>    READ: io=30615MB, aggrb=772002KB/s, minb=193000KB/s,
> maxb=193010KB/s, mint=40607msec, maxt=40609msec
>   WRITE: io=30825MB, aggrb=777273KB/s, minb=194318KB/s,
> maxb=194327KB/s, mint=40607msec, maxt=40609msec
> 
> z3fold:
> Run status group 0 (all jobs):
>   WRITE: io=61440MB, aggrb=1224.8MB/s, minb=313525KB/s,
> maxb=329941KB/s, mint=47671msec, maxt=50167msec
> 
> Run status group 1 (all jobs):
>    READ: io=61440MB, aggrb=3119.3MB/s, minb=798529KB/s,
> maxb=862883KB/s, mint=18228msec, maxt=19697msec
> 
> Run status group 2 (all jobs):
>    READ: io=30615MB, aggrb=937283KB/s, minb=234320KB/s,
> maxb=234334KB/s, mint=33446msec, maxt=33448msec
>   WRITE: io=30825MB, aggrb=943682KB/s, minb=235920KB/s,
> maxb=235934KB/s, mint=33446msec, maxt=33448msec
> 
> Run status group 3 (all jobs):
>    READ: io=30615MB, aggrb=829591KB/s, minb=207397KB/s,
> maxb=210285KB/s, mint=37271msec, maxt=37790msec
>   WRITE: io=30825MB, aggrb=835255KB/s, minb=208813KB/s,
> maxb=211721KB/s, mint=37271msec, maxt=37790msec
> 
> So, z3fold is faster everywhere (including being *two* times faster on
> read) except for sequential write which is the least important use
> case in real world.

No. write is also important because it affects reclaim speed.

I ran fio on x86 with various compression sizes.

left is zsmalloc. right is z3fold

The operation order is
	seq-write
	rand-write
	seq-read
	rand-read
	mixed-seq
	mixed-rand
	trim
	mem_used - byte unit

Last column mem_used is to indicate how many allocator used the memory
to store compressed page

1) compression ratio 75

     WRITE     2535          WRITE     1928
     WRITE     2425          WRITE     1886
      READ     6211           READ     5731
      READ     6339           READ     6182
      READ     1791           READ     1592
     WRITE     1790          WRITE     1591
      READ     1704           READ     1493
     WRITE     1699          WRITE     1489
     WRITE      984          WRITE      974
      TRIM      984           TRIM      974
  mem_used 29986816       mem_used 61239296

For every operation, zsmalloc is faster than z3fold.
Even, it used the 1/2 memory compared to z3fold.

2) compression ratio 66

     WRITE     2125          WRITE     1258
     WRITE     2107          WRITE     1233
      READ     5714           READ     5793
      READ     5948           READ     6065
      READ     1667           READ     1248
     WRITE     1666          WRITE     1247
      READ     1521           READ     1218
     WRITE     1517          WRITE     1215
     WRITE      943          WRITE      870
      TRIM      943           TRIM      870
  mem_used 38158336       mem_used 76779520

For only read operation, z3fold is a bit faster than zsmalloc about 2%.
However, look at other operations which zsmalloc is much faster.
Even, look at used memory.

3) compression ratio 50

     WRITE     2051          WRITE     1109
     WRITE     2029          WRITE     1087
      READ     5366           READ     6364
      READ     5575           READ     5785
      READ     1497           READ     1121
     WRITE     1496          WRITE     1121
      READ     1432           READ     1065
     WRITE     1428          WRITE     1062
     WRITE      930          WRITE      838
      TRIM      930           TRIM      838
  mem_used 59932672       mem_used 104873984

sequential read on z3fold is faster about 15%. However, look at other
operations and used memory. zsmalloc is better.

Why zsmalloc is slow for 50% compression ratio is it needs page copy
for every read operation since compressed objects cross over page boundary.
However, I don't think it's real workload because compressed ratio will
spread out into various sizes. Having said that, I could enhance zsmalloc
to avoid the copy operation. I will work on it.

4) compression ratio 33

     WRITE     1945          WRITE     1239
     WRITE     1869          WRITE     1222
      READ     5319           READ     6206
      READ     5416           READ     6645
      READ     1480           READ     1188
     WRITE     1479          WRITE     1188
      READ     1403           READ     1114
     WRITE     1399          WRITE     1110
     WRITE      930          WRITE      793
      TRIM      930           TRIM      793
  mem_used 78667776       mem_used 104873984

5) compression ratio 25

     WRITE     1862          WRITE     1080
     WRITE     1840          WRITE     1052
      READ     5260           READ     6240
      READ     5540           READ     6359
      READ     1445           READ     1040
     WRITE     1444          WRITE     1039
      READ     1354           READ     1006
     WRITE     1350          WRITE     1003
     WRITE      909          WRITE      775
      TRIM      909           TRIM      775
  mem_used 83902464       mem_used 104873984

If compress ratio is bad, zram read operation with zsmalloc could
be slower about 15% than z3fold because it needs additional memory
copy as I mentioned. However, it's still faster if compression ratio
ig greater than 50%, which is usual case(I believe that's why you
makes z3fold).

> 
> > > Then, there are low end systems with hardware
> > > compression/decompression support which don't need zsmalloc
> > > sophistication and would rather use zbud with ZRAM because the
> > > compression ratio is relatively low.
> >
> > I couldn't imagine how it's bad with zsmalloc. Could you be more
> > specific?
> 
> 
> > > Finally, there are MMU-less systems targeting IOT and still running
> > > Linux and having a compressed RAM disk is something that would help
> > > these systems operate in a better way (for the benefit of the overall
> > > Linux ecosystem, if you care about that, of course; well, some people
> > > do).
> >
> > Could you write down what's the problem to use zsmalloc for MMU-less
> > system? Maybe, it would be important point rather other performance
> > argument since other functions's overheads in the callpath are already
> > rather big.
> 
> Well, I assume you had the reasons to make zsmalloc depend on MMU in Kconfig:
> ...
> config ZSMALLOC
>     tristate "Memory allocator for compressed pages"
>     depends on MMU
>     help
> ...

It's old piece left since zsmalloc used mapping API so I think we could
remove the dependency now. However, I want to know it's the only problem
to use zram in MMU-less system. IOW, if we could remove the zsmalloc MMU
dependency, it's ready to use zram on MMU-less system now?

> 
> But even disregarding that, let's compare ZRAM/zbud and ZRAM/zsmalloc
> performance and memory these two consume on a relatively low end
> 2-core ARM.
> Command:
> fio --bs=4k --randrepeat=1 --randseed=100 --refill_buffers
> --scramble_buffers=1 \
>         --direct=1 --loops=15 --numjobs=2 --filename=/dev/block/zram0 \
>         --name=seq-write --rw=write --stonewall --name=seq-read --rw=read \
>         --stonewall --name=seq-readwrite --rw=rw --stonewall
> --name=rand-readwrite \
>         --rw=randrw --stonewall
> 
> zsmalloc:
> Run status group 0 (all jobs):
>   WRITE: io=30720MB, aggrb=374763KB/s, minb=187381KB/s,
> maxb=188389KB/s, mint=83490msec, maxt=83939msec
> 
> Run status group 1 (all jobs):
>    READ: io=30720MB, aggrb=964000KB/s, minb=482000KB/s,
> maxb=482015KB/s, mint=32631msec, maxt=32632msec
> 
> Run status group 2 (all jobs):
>    READ: io=15308MB, aggrb=431263KB/s, minb=215631KB/s,
> maxb=215898KB/s, mint=36302msec, maxt=36347msec
>   WRITE: io=15412MB, aggrb=434207KB/s, minb=217103KB/s,
> maxb=217373KB/s, mint=36302msec, maxt=36347msec
> 
> Run status group 3 (all jobs):
>    READ: io=15308MB, aggrb=327328KB/s, minb=163664KB/s,
> maxb=163667KB/s, mint=47887msec, maxt=47888msec
>   WRITE: io=15412MB, aggrb=329563KB/s, minb=164781KB/s,
> maxb=164785KB/s, mint=47887msec, maxt=47888msec
> 
> zbud:
> Run status group 0 (all jobs):
>   WRITE: io=30720MB, aggrb=735980KB/s, minb=367990KB/s,
> maxb=373079KB/s, mint=42159msec, maxt=42742msec
> 
> Run status group 1 (all jobs):
>    READ: io=30720MB, aggrb=927915KB/s, minb=463957KB/s,
> maxb=463999KB/s, mint=33898msec, maxt=33901msec
> 
> Run status group 2 (all jobs):
>    READ: io=15308MB, aggrb=403467KB/s, minb=201733KB/s,
> maxb=202051KB/s, mint=38790msec, maxt=38851msec
>   WRITE: io=15412MB, aggrb=406222KB/s, minb=203111KB/s,
> maxb=203430KB/s, mint=38790msec, maxt=38851msec
> 
> Run status group 3 (all jobs):
>    READ: io=15308MB, aggrb=334967KB/s, minb=167483KB/s,
> maxb=167487KB/s, mint=46795msec, maxt=46796msec
>   WRITE: io=15412MB, aggrb=337254KB/s, minb=168627KB/s,
> maxb=168630KB/s, mint=46795msec, maxt=46796msec
> 
> Pretty equal except for sequential write which is twice as good with zbud.

Thanks for the testing. I also tried to test zbud with zram but failed because fio
submit incompressible pages to zram even though it specifiy compress ratio 100%
However, zbud doesn't support 4K page allocation so zram couldn't work on it
at this moment. I tried various fio versions as well as old but everything failed.

How did you test it successfully? Let me know your fio version.
I want to investigate what's the performance bottleneck beside page copy
so that I will optimize it.

> 
> Now to the fun part.
> zsmalloc:
>   0 .text         00002908  0000000000000000  0000000000000000  00000040  2**2
>                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
> zbud:
>   0 .text         0000072c  0000000000000000  0000000000000000  00000040  2**2
>                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
> 
> And this does not cover dynamic memory allocation overhead which is
> higher for zsmalloc. So once again, given that the compression ratio
> is low (e. g. a simple HW accelerator is used), what would most
> unbiased people prefer to use in this case?

Zsmalloc has more features than zbud. That's why you see the code size
difference. It was intentional because at that time most of users were
mobile phones, TV and other smart devices. They needed those features.

We could make those feature turned off at build time, which will improve
performance and reduce code size a lot. It would be no problem if the
user wanted to use zbud which is alredy lacking of those features.

> 
> > > > I really don't want to make fragmentaion of allocator so we should really see
> > > > how zsmalloc cannot achieve things if you are claiming.
> > >
> > > I have to say that this point is completely bogus. We do not create
> > > fragmentation by using a better defined and standardized API. In fact,
> > > we aim to increase the number of use cases and test coverage for ZRAM.
> > > With that said, I have hard time seeing how zsmalloc can operate on a
> > > MMU-less system.
> > >
> > > > Please tell us how to test it so that we could investigate what's the root
> > > > cause.
> > >
> > > I gather you haven't read neither the LPC documents nor my
> > > conversation with Sergey re: these changes, because if you did you
> > > wouldn't have had the type of questions you're asking. Please also see
> > > above.
> >
> > Please include your claims in the description rather than attaching
> > file. That's the usualy way how we work because it could make easier to
> > discuss by inline.
> 
> Did I attach something? I don't quite recall that. I posted links to
> previous discussions and conference materials, each for a reason.
> 
> > >
> > > I feel a bit awkward explaining basic things to you but there may not
> > > be other "root cause" than applicability issue. zsmalloc is a great
> > > allocator but it's not universal and has its limitations. The
> > > (potential) scope for ZRAM is wider than zsmalloc can provide. We are
> > > *helping* _you_ to extend this scope "in real world" (c) and you come
> > > up with bogus objections. Why?
> >
> > Please add more detail to convince so we need to think over why zsmalloc
> > cannot be improved for the usecase.
> 
> This approach is wrong. zsmalloc is good enough and covers a lot of
> use cases but there are still some where it doesn't work that well by
> design. E. g. on an XIP system we do care about the code size since
> it's stored uncompressed but still want to use ZRAM. Why would we want
> to waste almost 10K just on zsmalloc code if the counterpart (zbud in
> that case) works better?

As I mentiond, we could improve zsmalloc to reduce code size as well as
performance. I will work on it.

Vitaly Wool Nov. 13, 2019, 3:54 p.m. UTC | #9

Hi Minchan,

On Wed, Oct 30, 2019 at 1:10 AM Minchan Kim <minchan@kernel.org> wrote:

<snip>
>
>
> I ran fio on x86 with various compression sizes.
>
> left is zsmalloc. right is z3fold
>
> The operation order is
>         seq-write
>         rand-write
>         seq-read
>         rand-read
>         mixed-seq
>         mixed-rand
>         trim
>         mem_used - byte unit
>
> Last column mem_used is to indicate how many allocator used the memory
> to store compressed page
>
> 1) compression ratio 75
>
>      WRITE     2535          WRITE     1928
>      WRITE     2425          WRITE     1886
>       READ     6211           READ     5731
>       READ     6339           READ     6182
>       READ     1791           READ     1592
>      WRITE     1790          WRITE     1591
>       READ     1704           READ     1493
>      WRITE     1699          WRITE     1489
>      WRITE      984          WRITE      974
>       TRIM      984           TRIM      974
>   mem_used 29986816       mem_used 61239296
>
> For every operation, zsmalloc is faster than z3fold.
> Even, it used the 1/2 memory compared to z3fold.
>
> 2) compression ratio 66
>
>      WRITE     2125          WRITE     1258
>      WRITE     2107          WRITE     1233
>       READ     5714           READ     5793
>       READ     5948           READ     6065
>       READ     1667           READ     1248
>      WRITE     1666          WRITE     1247
>       READ     1521           READ     1218
>      WRITE     1517          WRITE     1215
>      WRITE      943          WRITE      870
>       TRIM      943           TRIM      870
>   mem_used 38158336       mem_used 76779520
>
> For only read operation, z3fold is a bit faster than zsmalloc about 2%.
> However, look at other operations which zsmalloc is much faster.
> Even, look at used memory.
>
> 3) compression ratio 50
>
>      WRITE     2051          WRITE     1109
>      WRITE     2029          WRITE     1087
>       READ     5366           READ     6364
>       READ     5575           READ     5785
>       READ     1497           READ     1121
>      WRITE     1496          WRITE     1121
>       READ     1432           READ     1065
>      WRITE     1428          WRITE     1062
>      WRITE      930          WRITE      838
>       TRIM      930           TRIM      838
>   mem_used 59932672       mem_used 104873984
>
> sequential read on z3fold is faster about 15%. However, look at other
> operations and used memory. zsmalloc is better.

There are two things to this: the measurements you've taken as such
and how they are relevant to this discussion.
I'd be happy to discuss these measurements in a separate thread if you
specified more precisely what kind of x86 the measurements were taken
on.

However, my point was that there are rather common cases when people
want to use z3fold as a zRAM memory allocation backend. The fact that
there are other cases when people wouldn't want that is pretty natural
and doesn't need a proof.
That's why I propose to use ZRAM over zpool API for the sake of
flexibility. That would benefit various users of ZRAM and, at the end
of the day, the Linux kernel ecosystem.

<snip>
> Thanks for the testing. I also tried to test zbud with zram but failed because fio
> submit incompressible pages to zram even though it specifiy compress ratio 100%
> However, zbud doesn't support 4K page allocation so zram couldn't work on it
> at this moment. I tried various fio versions as well as old but everything failed.
>
> How did you test it successfully? Let me know your fio version.
> I want to investigate what's the performance bottleneck beside page copy
> so that I will optimize it.

You're very welcome. :) The patch to make zbud accept PAGE_SIZE pages
has been posted a while ago [1] and it was a part of our previous
(pre-z3fold) discussion on the same subject but you probably haven't
read it then.

> >
> > Now to the fun part.
> > zsmalloc:
> >   0 .text         00002908  0000000000000000  0000000000000000  00000040  2**2
> >                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
> > zbud:
> >   0 .text         0000072c  0000000000000000  0000000000000000  00000040  2**2
> >                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
> >
> > And this does not cover dynamic memory allocation overhead which is
> > higher for zsmalloc. So once again, given that the compression ratio
> > is low (e. g. a simple HW accelerator is used), what would most
> > unbiased people prefer to use in this case?
>
> Zsmalloc has more features than zbud. That's why you see the code size
> difference. It was intentional because at that time most of users were
> mobile phones, TV and other smart devices. They needed those features.
>
> We could make those feature turned off at build time, which will improve
> performance and reduce code size a lot. It would be no problem if the
> user wanted to use zbud which is alredy lacking of those features.

I do support this idea and would like to help as much as I can, but
why should the people who want to use ZRAM/zbud combo be left stranded
while we're working on reducing the zsmalloc code size by 4x?

With that said, let me also re-iterate that there may be more
allocators coming, and in some cases zsmalloc won't be a good
fit/alternative while there will be still a need for a compressed RAM
device. I hope you understand.

Best regards,
   Vitaly

[1] https://lore.kernel.org/patchwork/patch/598210/

[0/3] Allow ZRAM to use any zpool-compatible backend

Message

Comments