mbox series

[RFC,0/2] zram: objects charge to mem_cgroup

Message ID 20230707044613.1169103-1-hezhongkun.hzk@bytedance.com (mailing list archive)
Headers show
Series zram: objects charge to mem_cgroup | expand

Message

Zhongkun He July 7, 2023, 4:46 a.m. UTC
This is a new solution to charge ZRAM objects,more simple than
previous one[1],The compressed RAM is currently charged to
kernel,not to any memory cgroup.

As we know, zram can be used in two ways, direct and
indirect, this patchset can charge memory in both cases.
Direct zram usage by process within a cgroup will fail
to charge if there is no memory. Indirect zram usage by
process within a cgroup via swap in PF_MEMALLOC context,
will charge successfully.

[1]
https://lore.kernel.org/all/20230615034830.1361853-1-hezhongkun.hzk@bytedance.com/

Zhongkun He (2):
  memcg: Add support for zram object charge
  zram: charge the compressed RAM to the page's memcgroup

 drivers/block/zram/zram_drv.c | 43 +++++++++++++++++++++++++++++++++++
 drivers/block/zram/zram_drv.h |  1 +
 include/linux/memcontrol.h    | 10 ++++++++
 mm/memcontrol.c               | 23 +++++++++++++++++++
 4 files changed, 77 insertions(+)

Comments

Michal Hocko July 7, 2023, 7:57 a.m. UTC | #1
On Fri 07-07-23 12:46:13, Zhongkun He wrote:
> This is a new solution to charge ZRAM objects,more simple than
> previous one[1],The compressed RAM is currently charged to
> kernel,not to any memory cgroup.
>
> As we know, zram can be used in two ways, direct and
> indirect, this patchset can charge memory in both cases.
> Direct zram usage by process within a cgroup will fail
> to charge if there is no memory. Indirect zram usage by
> process within a cgroup via swap in PF_MEMALLOC context,
> will charge successfully.

Please state the objective you are trying to achieve by this patchset.
It is always good to summarize the previous discussion and mention what
is done differently or how previous review feedback has been addressed
but the overall idea/purpose should be always explicit. 

Please elaborate more about both.
 
> [1]
> https://lore.kernel.org/all/20230615034830.1361853-1-hezhongkun.hzk@bytedance.com/
> 
> Zhongkun He (2):
>   memcg: Add support for zram object charge
>   zram: charge the compressed RAM to the page's memcgroup
> 
>  drivers/block/zram/zram_drv.c | 43 +++++++++++++++++++++++++++++++++++
>  drivers/block/zram/zram_drv.h |  1 +
>  include/linux/memcontrol.h    | 10 ++++++++
>  mm/memcontrol.c               | 23 +++++++++++++++++++
>  4 files changed, 77 insertions(+)
> 
> -- 
> 2.25.1
Zhongkun He July 7, 2023, 2:25 p.m. UTC | #2
> Please state the objective you are trying to achieve by this patchset.
> It is always good to summarize the previous discussion and mention what
> is done differently or how previous review feedback has been addressed
> but the overall idea/purpose should be always explicit.
>
> Please elaborate more about both.
>

Thanks for your reply.
objective:
the compressed memory of zram charge to the cgroup of the user.

summarize the previous discussion:
[1] As I can see, Michal's concern is that the charges are going to fail
and swapout would fail.

The indirect use of zram is in the context of PF_MEMALLOC, so
the charge must be successful.

[2] David's concern is that if there is a page in the BIO that is not charged,
we can not charge the compressed page for the fs->zram and whether
the recompress case is charged.

In the new solution, the recompress case can be charged. But if there is
a BIO not charged, I'm not so sure, to be honest.it would be great if
someone with more FS->BIO experience could comment.
I have to review the relevant code to check it.

[3] Yosry's concern is that the previous patchsets are very complicated,
and do not have wirteback support in zram.

For the new patchset, charging becomes very simple, and zram supports
writing back to disk.At the same time, I am hoping the use of zswap
without a backing swapfile can be enabled.

To summarize the new patchset:
We charge compressed memory directly in the zram module instead of in
the zsmalloc module because zsmallc may be used by zswap, zswap objects
has been charged once in the zswap module, so zsmallc will double charge.
Michal Hocko July 7, 2023, 2:44 p.m. UTC | #3
On Fri 07-07-23 22:25:48, 贺中坤 wrote:
> > Please state the objective you are trying to achieve by this patchset.
> > It is always good to summarize the previous discussion and mention what
> > is done differently or how previous review feedback has been addressed
> > but the overall idea/purpose should be always explicit.
> >
> > Please elaborate more about both.
> >
> 
> Thanks for your reply.
> objective:
> the compressed memory of zram charge to the cgroup of the user.

Why do we want/need that?

> summarize the previous discussion:
> [1] As I can see, Michal's concern is that the charges are going to fail
> and swapout would fail.
> 
> The indirect use of zram is in the context of PF_MEMALLOC, so
> the charge must be successful.

No, this was not my concern. Please read through that more carefully. My
concern was that the hard limit reclaim would fail. PF_MEMALLOC will not
help in that case as this is not a global reclaim path.

Also let's assume you allow swapout charges to succeed similar to
PF_MEMALLOC. That would mean breaching the limit in an unbounded way,
no?
Zhongkun He July 10, 2023, 9:35 a.m. UTC | #4
On Fri, Jul 7, 2023 at 10:44 PM Michal Hocko <mhocko@suse.com> wrote:
>
> Why do we want/need that?

Applications can currently escape their cgroup memory containment when
zram is enabled regardless of indirect(swapfile) or direct usage(disk) .
This patch adds memcg accounting to fix it.

Zram and zswap have the same problem,please refer to the patch
corresponding to zswap[1].

[1] https://lore.kernel.org/all/20220510152847.230957-7-hannes@cmpxchg.org/

>
> > summarize the previous discussion:
> > [1] As I can see, Michal's concern is that the charges are going to fail
> > and swapout would fail.
> >
> > The indirect use of zram is in the context of PF_MEMALLOC, so
> > the charge must be successful.
>
> No, this was not my concern. Please read through that more carefully. My
> concern was that the hard limit reclaim would fail. PF_MEMALLOC will not
> help in that case as this is not a global reclaim path.
>

Sorry for my expression. I mean the hard limit reclaim would fail.
As i can see, the PF_MEMALLOC is not only used in global reclaim path
but the mem_cgroup reclaim.

try_charge_memcg
  try_to_free_mem_cgroup_pages
     noreclaim_flag = memalloc_noreclaim_save();
     nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
     memalloc_noreclaim_restore(noreclaim_flag);

> Also let's assume you allow swapout charges to succeed similar to
> PF_MEMALLOC. That would mean breaching the limit in an unbounded way,
> no?
> --

Chage compressed page once, mean a page will be freed. the size of compressed
page is less than or equal to the page to be freed. So not an unbounded way.

> Michal Hocko
> SUSE Labs
Michal Hocko July 10, 2023, 10:41 a.m. UTC | #5
On Mon 10-07-23 17:35:07, 贺中坤 wrote:
> On Fri, Jul 7, 2023 at 10:44 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > Why do we want/need that?
> 
> Applications can currently escape their cgroup memory containment when
> zram is enabled regardless of indirect(swapfile) or direct usage(disk) .
> This patch adds memcg accounting to fix it.
> 
> Zram and zswap have the same problem,please refer to the patch
> corresponding to zswap[1].
> 
> [1] https://lore.kernel.org/all/20220510152847.230957-7-hannes@cmpxchg.org/
> 
> >
> > > summarize the previous discussion:
> > > [1] As I can see, Michal's concern is that the charges are going to fail
> > > and swapout would fail.
> > >
> > > The indirect use of zram is in the context of PF_MEMALLOC, so
> > > the charge must be successful.
> >
> > No, this was not my concern. Please read through that more carefully. My
> > concern was that the hard limit reclaim would fail. PF_MEMALLOC will not
> > help in that case as this is not a global reclaim path.
> >
> 
> Sorry for my expression. I mean the hard limit reclaim would fail.
> As i can see, the PF_MEMALLOC is not only used in global reclaim path
> but the mem_cgroup reclaim.
> 
> try_charge_memcg
>   try_to_free_mem_cgroup_pages
>      noreclaim_flag = memalloc_noreclaim_save();
>      nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
>      memalloc_noreclaim_restore(noreclaim_flag);

My bad, I have overlooked this. I forgot about 89a2848381b5f. 

> > Also let's assume you allow swapout charges to succeed similar to
> > PF_MEMALLOC. That would mean breaching the limit in an unbounded way,
> > no?
> > --
> 
> Chage compressed page once, mean a page will be freed. the size of compressed
> page is less than or equal to the page to be freed. So not an unbounded way.

OK, this is an important detail to mention. Also have tried to get some
numbers of how much excess is happening for a mixed bag of compressible
memory?
Zhongkun He July 10, 2023, 1:16 p.m. UTC | #6
>
> OK, this is an important detail to mention. Also have tried to get some
> numbers of how much excess is happening for a mixed bag of compressible
> memory?

Yes, I've done the test many times.  The numbers of excess depend on the
compression ratio only. The maximum amount will not exceed 400KB,and will be
smaller than the hard limit finally.

> --
> Michal Hocko
> SUSE Labs
Michal Hocko July 10, 2023, 1:34 p.m. UTC | #7
On Mon 10-07-23 21:16:42, 贺中坤 wrote:
> >
> > OK, this is an important detail to mention. Also have tried to get some
> > numbers of how much excess is happening for a mixed bag of compressible
> > memory?
> 
> Yes, I've done the test many times.  The numbers of excess depend on the
> compression ratio only. The maximum amount will not exceed 400KB,and will be
> smaller than the hard limit finally.

Again, important data point for your changelog.
Zhongkun He July 10, 2023, 3:02 p.m. UTC | #8
On Mon, Jul 10, 2023 at 9:35 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 10-07-23 21:16:42, 贺中坤 wrote:
> > >
> > > OK, this is an important detail to mention. Also have tried to get some
> > > numbers of how much excess is happening for a mixed bag of compressible
> > > memory?
> >
> > Yes, I've done the test many times.  The numbers of excess depend on the
> > compression ratio only. The maximum amount will not exceed 400KB,and will be
> > smaller than the hard limit finally.
>
> Again, important data point for your changelog.

Thanks,  more data will be added in the next version.

>
> --
> Michal Hocko
> SUSE Labs