[v5,00/21] mm/zsmalloc: add zpdesc memory descriptor for zswap.zpool

Message ID 20240806022143.3924396-1-alexs@kernel.org (mailing list archive)

Message

alexs@kernel.org Aug. 6, 2024, 2:21 a.m. UTC
From: Alex Shi <alexs@kernel.org>

According to Matthew's plan, the page descriptor will eventually be
replaced by an 8-byte mem_desc:
https://lore.kernel.org/lkml/YvV1KTyzZ+Jrtj9x@casper.infradead.org/

This series is an implementation of that for zsmalloc: it replaces the
page descriptor with a 'zpdesc', which for now still overlays struct
page, but is a step toward that goal.

The struct is named zpdesc rather than zsdesc because there are still
three zpools under zswap: zbud, z3fold and zsmalloc (z3fold may be
removed soon), and the descriptor could easily be extended to the other
zswap.zpools if needed.

All zswap.zpools use single (order-0) pages, since they are typically
used under memory pressure. Converting via the folio helpers is
therefore better than using the page helpers, as it saves
compound_head() checks.

For now, all zpools use some struct page members: page->flags for
PG_private/PG_locked, the list_head lru, and page->mapping for page
migration.

This patchset does not increase the descriptor size nor introduce any
functional changes, and it shrinks zsmalloc.o by about 122KB.
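Since zpdesc is a pure overlay on struct page, the whole conversion
rests on one invariant: the sizes and field offsets of the two structs
must match. Below is a minimal userspace sketch of how such an overlay
can be checked at compile time; the field sets here are hypothetical
and much simpler than the real struct page and mm/zpdesc.h:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified layouts for illustration only; the real
 * struct page and zpdesc live in the kernel and carry more fields. */
struct mock_page {
	unsigned long flags;          /* PG_private / PG_locked live here */
	struct mock_page *lru_next;   /* stand-in for list_head lru */
	struct mock_page *lru_prev;
	void *mapping;                /* used by page migration */
	unsigned long private;
	int _refcount;
};

/* zpdesc overlays struct page field for field: same size, same
 * offsets, zpool-specific names. */
struct mock_zpdesc {
	unsigned long flags;
	struct mock_zpdesc *next;     /* chains subpages of a zspage */
	struct mock_zpdesc *reserved;
	void *mops;                   /* aliases page->mapping */
	unsigned long zspage;         /* back-pointer to struct zspage */
	int _refcount;
};

/* The overlay is only safe while neither side changes layout. */
_Static_assert(sizeof(struct mock_zpdesc) == sizeof(struct mock_page),
	       "zpdesc must not grow the descriptor");
_Static_assert(offsetof(struct mock_zpdesc, mops) ==
	       offsetof(struct mock_page, mapping),
	       "mops must alias page->mapping");
```

The kernel uses the same compile-time pattern elsewhere, e.g. the
FOLIO_MATCH assertions that pin struct folio to struct page.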

Thanks a lot for the comments and suggestions from Yosry, Yoo, Sergey,
Willy and Vishal!

Thanks
Alex

---
v4->v5:
- rebase on akpm/mm-unstable on Aug 4.
- add a helper and update code comments according to Sergey's comments
- fold patch 20/21 and remove 3 helper functions according to Vishal's
  comments

v3->v4:
- rebase on akpm/mm-unstable Jul 21
- fixed a build warning reported by LKP
- Add a comment update for struct page to zpdesc change

v2->v3:
- Fix LKP reported build issue
- Update the Usage of struct zpdesc fields.
- Rebase onto latest mm-unstable commit 2073cda629a4

v1->v2: 
- Take Yosry and Yoo's suggestion to add more members in zpdesc,
- Rebase on latest mm-unstable commit 31334cf98dbd
---

Alex Shi (10):
  mm/zsmalloc: add zpdesc memory descriptor for zswap.zpool
  mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()
  mm/zsmalloc: convert create_page_chain() and its users to use zpdesc
  mm/zsmalloc: rename reset_page to reset_zpdesc and use zpdesc in it
  mm/zsmalloc: convert SetZsPageMovable and remove unused funcs
  mm/zsmalloc: convert get/set_first_obj_offset() to take zpdesc
  mm/zsmalloc: introduce __zpdesc_clear_movable
  mm/zsmalloc: introduce __zpdesc_clear/set_zsmalloc()
  mm/zsmalloc: introduce zpdesc_clear_first() helper
  mm/zsmalloc: update comments for page->zpdesc changes

Hyeonggon Yoo (11):
  mm/zsmalloc: convert __zs_map_object/__zs_unmap_object to use zpdesc
  mm/zsmalloc: add and use pfn/zpdesc seeking funcs
  mm/zsmalloc: convert obj_malloc() to use zpdesc
  mm/zsmalloc: convert obj_allocated() and related helpers to use zpdesc
  mm/zsmalloc: convert init_zspage() to use zpdesc
  mm/zsmalloc: convert obj_to_page() and zs_free() to use zpdesc
  mm/zsmalloc: add zpdesc_is_isolated()/zpdesc_zone() helper for
    zs_page_migrate()
  mm/zsmalloc: convert __free_zspage() to use zpdesc
  mm/zsmalloc: convert location_to_obj() to take zpdesc
  mm/zsmalloc: convert migrate_zspage() to use zpdesc
  mm/zsmalloc: convert get_zspage() to take zpdesc

 mm/zpdesc.h   | 146 +++++++++++++++
 mm/zsmalloc.c | 482 +++++++++++++++++++++++++++-----------------------
 2 files changed, 403 insertions(+), 225 deletions(-)
 create mode 100644 mm/zpdesc.h

Comments

alexs@kernel.org Aug. 6, 2024, 2:22 a.m. UTC | #1
[cover letter repeated verbatim; see above]
Alex Shi Aug. 9, 2024, 2:32 a.m. UTC | #2
On 8/8/24 11:37 AM, Matthew Wilcox wrote:
> I've written about it here:
> https://kernelnewbies.org/MatthewWilcox/Memdescs
> https://kernelnewbies.org/MatthewWilcox/FolioAlloc
> https://kernelnewbies.org/MatthewWilcox/Memdescs/Path

Thanks for sharing!

> 
>> So I guess if we have something
>>
>> struct zspage {
>> 	..
>> 	struct zpdesc *first_desc;
>> 	..
>> }
>>
>> and we "chain" zpdesc-s to form a zspage, and make each of them point to
>> a corresponding struct page (memdesc -> *page), then it'll resemble current
>> zsmalloc and should work for everyone? I also assume for zspdesc-s zsmalloc
>> will need to maintain a dedicated kmem_cache?
> Right, we could do that.  Each memdesc has to be a multiple of 16 bytes,
> sp we'd be doing something like allocating 32 bytes for each page.
> Is there really 32 bytes of information that we want to store for
> each page?  Or could we store all of the information in (a somewhat
> larger) zspage?  Assuming we allocate 3 pages per zspage, if we allocate
> an extra 64 bytes in the zspage, we've saved 32 bytes per zspage.

Thanks for the suggestions! Yes, it's a good direction we could try after this
patchset.

Thanks to you all!
Sergey Senozhatsky Aug. 14, 2024, 6:03 a.m. UTC | #3
On (24/08/08 04:37), Matthew Wilcox wrote:
[..]
>
> I don't know if it's _your_ problem.  It's _our_ problem.  The arguments
> for (at least attempting) to shrink struct page seem quite compelling.
> We have a plan for most of the users of struct page, in greater or
> lesser detail.  I don't think we have a plan for zsmalloc.  Or at least
> if there is a plan, I don't know what it is.

Got you, thanks.  And sorry for a very delayed reply.

> > > Do you allocate a per-page struct zpdesc, and have each one pointing
> > > to a zspage?
> > 
> > I'm not very knowledgeable when it comes to memdesc, excuse my
> > ignorance, and please feel free to educate me.
> 
> I've written about it here:
> https://kernelnewbies.org/MatthewWilcox/Memdescs
> https://kernelnewbies.org/MatthewWilcox/FolioAlloc
> https://kernelnewbies.org/MatthewWilcox/Memdescs/Path

Thanks a lot!

> > So I guess if we have something
> > 
> > struct zspage {
> > 	..
> > 	struct zpdesc *first_desc;
> > 	..
> > }
> > 
> > and we "chain" zpdesc-s to form a zspage, and make each of them point to
> > a corresponding struct page (memdesc -> *page), then it'll resemble current
> > zsmalloc and should work for everyone? I also assume for zspdesc-s zsmalloc
> > will need to maintain a dedicated kmem_cache?
> 
> Right, we could do that.  Each memdesc has to be a multiple of 16 bytes,
> sp we'd be doing something like allocating 32 bytes for each page.
> Is there really 32 bytes of information that we want to store for
> each page?  Or could we store all of the information in (a somewhat
> larger) zspage?  Assuming we allocate 3 pages per zspage, if we allocate
> an extra 64 bytes in the zspage, we've saved 32 bytes per zspage.

I certainly like (and appreciate) the approach that saves us
some bytes here and there.  zsmalloc page can consist of 1 to
up to CONFIG_ZSMALLOC_CHAIN_SIZE (max 16) physical pages.  I'm
trying to understand (in pseudo-C code) what does a "somewhat larger
zspage" mean.  A fixed size array (given that we know the max number
of physical pages) per-zspage?
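One way to read "a somewhat larger zspage" is exactly that: a
fixed-size per-zspage array sized by the chain limit, replacing a
separate memdesc-sized allocation per physical page. A hypothetical
pseudo-C sketch (all names invented, not the kernel's):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZS_MAX_PAGES_PER_ZSPAGE 16   /* CONFIG_ZSMALLOC_CHAIN_SIZE max */

/* Hypothetical sketch: instead of a separate descriptor allocation per
 * physical page, keep the per-page state inline in one larger zspage. */
struct mock_zspage {
	unsigned int class_idx;
	unsigned int inuse;
	unsigned int freeobj;
	unsigned int nr_pages;
	struct {                          /* one slot per physical page */
		uintptr_t pfn;            /* which physical page */
		unsigned int first_obj_offset;
	} page_info[ZS_MAX_PAGES_PER_ZSPAGE];
};

/* Bytes actually needed for a zspage backed by nr_pages pages; a
 * 3-page zspage uses only 3 of the 16 slots. */
static size_t zspage_bytes_used(unsigned int nr_pages)
{
	struct mock_zspage zs;

	return offsetof(struct mock_zspage, page_info) +
	       nr_pages * sizeof(zs.page_info[0]);
}
```

The trade-off is some slack in the fixed array versus doing a single
allocation per zspage instead of one per physical page.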
Sergey Senozhatsky Aug. 15, 2024, 3:13 a.m. UTC | #4
On (24/08/09 10:32), Alex Shi wrote:
[..]
> >> and we "chain" zpdesc-s to form a zspage, and make each of them point to
> >> a corresponding struct page (memdesc -> *page), then it'll resemble current
> >> zsmalloc and should work for everyone? I also assume for zspdesc-s zsmalloc
> >> will need to maintain a dedicated kmem_cache?
> > Right, we could do that.  Each memdesc has to be a multiple of 16 bytes,
> > sp we'd be doing something like allocating 32 bytes for each page.
> > Is there really 32 bytes of information that we want to store for
> > each page?  Or could we store all of the information in (a somewhat
> > larger) zspage?  Assuming we allocate 3 pages per zspage, if we allocate
> > an extra 64 bytes in the zspage, we've saved 32 bytes per zspage.
> 
> Thanks for the suggestions! Yes, it's a good direction we could try after this
> patchset.

Alex, may I ask what exactly you will "try"?
Alex Shi Aug. 15, 2024, 3:50 a.m. UTC | #5
On 8/15/24 11:13 AM, Sergey Senozhatsky wrote:
> On (24/08/09 10:32), Alex Shi wrote:
> [..]
>>>> and we "chain" zpdesc-s to form a zspage, and make each of them point to
>>>> a corresponding struct page (memdesc -> *page), then it'll resemble current
>>>> zsmalloc and should work for everyone? I also assume for zspdesc-s zsmalloc
>>>> will need to maintain a dedicated kmem_cache?
>>> Right, we could do that.  Each memdesc has to be a multiple of 16 bytes,
>>> sp we'd be doing something like allocating 32 bytes for each page.
>>> Is there really 32 bytes of information that we want to store for
>>> each page?  Or could we store all of the information in (a somewhat
>>> larger) zspage?  Assuming we allocate 3 pages per zspage, if we allocate
>>> an extra 64 bytes in the zspage, we've saved 32 bytes per zspage.
>>
>> Thanks for the suggestions! Yes, it's a good direction we could try after this
>> patchset.
> 
> Alex, may I ask what exactly you will "try"?

Hi Sergey,

Thanks for the question. As a quick, amateur thought, the final result
may look like the following; please correct me if I am wrong.

1, there is a memdesc for each memory page.

2, we kmem_cache_alloc() zpdesc structs specifically for our needs,
   with fields like zpdesc.next and zpdesc.zspage/first_obj_offset that
   zsmalloc currently uses.

3, there is a gap between memdesc and zpdesc for fields like .flags,
   ._refcount and .mops; it is still unclear how to handle them well.

During the 2nd and 3rd steps, we may have a chance to move some members
from zpdesc into zspage, but that is also unclear.

Thanks
Alex
Vishal Moola Aug. 27, 2024, 11:19 p.m. UTC | #6
On Wed, Aug 14, 2024 at 03:03:54PM +0900, Sergey Senozhatsky wrote:
> On (24/08/08 04:37), Matthew Wilcox wrote:
> [..]
> > > So I guess if we have something
> > > 
> > > struct zspage {
> > > 	..
> > > 	struct zpdesc *first_desc;
> > > 	..
> > > }
> > > 
> > > and we "chain" zpdesc-s to form a zspage, and make each of them point to
> > > a corresponding struct page (memdesc -> *page), then it'll resemble current
> > > zsmalloc and should work for everyone? I also assume for zspdesc-s zsmalloc
> > > will need to maintain a dedicated kmem_cache?
> > 
> > Right, we could do that.  Each memdesc has to be a multiple of 16 bytes,
> > sp we'd be doing something like allocating 32 bytes for each page.
> > Is there really 32 bytes of information that we want to store for
> > each page?  Or could we store all of the information in (a somewhat
> > larger) zspage?  Assuming we allocate 3 pages per zspage, if we allocate
> > an extra 64 bytes in the zspage, we've saved 32 bytes per zspage.
> 
> I certainly like (and appreciate) the approach that saves us
> some bytes here and there.  zsmalloc page can consist of 1 to
> up to CONFIG_ZSMALLOC_CHAIN_SIZE (max 16) physical pages.  I'm
> trying to understand (in pseudo-C code) what does a "somewhat larger
> zspage" mean.  A fixed size array (given that we know the max number
> of physical pages) per-zspage?

I haven't had the opportunity to respond until now as I was on vacation.

With the current approach in a memdesc world, we would do the following:

1) kmem_cache_alloc() every single Zpdesc
2) Allocate a memdesc/page that points to its own Zpdesc
3) Access/Track Zpdescs directly
4) Use those Zpdescs to build a Zspage

An alternative approach would move more metadata storage from a Zpdesc
into a Zspage instead. That extreme would leave us with:

1) kmem_cache_alloc() once for a Zspage
2) Allocate a memdesc/page that points to the Zspage
3) Use the Zspage to access/track its own subpages (through some magic
we would have to figure out)
4) Zpdescs are just Zspages (since all the information would be in a Zspage)

IMO, we should introduce zpdescs first, then start to shift
metadata from "struct zpdesc" into "struct zspage" until we no longer
need "struct zpdesc". My big concern is whether or not this patchset works
towards those goals. Will it make consolidating the metadata easier? And are
these goals feasible (while maintaining the wins of zsmalloc)? Or should we
aim to leave zsmalloc as it is currently implemented?
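The two directions above can be sketched side by side; everything here
is illustrative pseudo-C with invented names, not the kernel's actual
types:

```c
#include <assert.h>
#include <stddef.h>

/* (a) Current direction: one kmem_cache-allocated zpdesc per physical
 *     page, chained together to form a zspage. */
struct chained_zpdesc {
	struct chained_zpdesc *next;      /* next subpage in the chain */
	struct zspage_chained *zspage;    /* back-pointer */
	unsigned int first_obj_offset;
};

struct zspage_chained {
	struct chained_zpdesc *first_desc;
	unsigned int inuse;
};

/* (b) Alternative extreme: every memdesc points straight at the
 *     zspage, which tracks its own subpages; no per-page descriptor
 *     exists at all. */
struct zspage_consolidated {
	unsigned int nr_pages;
	unsigned int inuse;
	unsigned long subpage_pfns[16];   /* subpages tracked inline */
};

/* Descriptor allocations per zspage under each scheme. */
static int descriptor_allocs(int nr_pages, int consolidated)
{
	return consolidated ? 1 : nr_pages; /* n small allocs vs. 1 larger */
}
```

Scheme (b) trades nr_pages small allocations for a single larger one,
which is the consolidation direction discussed above.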
Alex Shi Aug. 29, 2024, 9:42 a.m. UTC | #7
On 8/28/24 7:19 AM, Vishal Moola wrote:
> On Wed, Aug 14, 2024 at 03:03:54PM +0900, Sergey Senozhatsky wrote:
>> On (24/08/08 04:37), Matthew Wilcox wrote:
>> [..]
>>>> So I guess if we have something
>>>>
>>>> struct zspage {
>>>> 	..
>>>> 	struct zpdesc *first_desc;
>>>> 	..
>>>> }
>>>>
>>>> and we "chain" zpdesc-s to form a zspage, and make each of them point to
>>>> a corresponding struct page (memdesc -> *page), then it'll resemble current
>>>> zsmalloc and should work for everyone? I also assume for zspdesc-s zsmalloc
>>>> will need to maintain a dedicated kmem_cache?
>>>
>>> Right, we could do that.  Each memdesc has to be a multiple of 16 bytes,
>>> sp we'd be doing something like allocating 32 bytes for each page.
>>> Is there really 32 bytes of information that we want to store for
>>> each page?  Or could we store all of the information in (a somewhat
>>> larger) zspage?  Assuming we allocate 3 pages per zspage, if we allocate
>>> an extra 64 bytes in the zspage, we've saved 32 bytes per zspage.
>>
>> I certainly like (and appreciate) the approach that saves us
>> some bytes here and there.  zsmalloc page can consist of 1 to
>> up to CONFIG_ZSMALLOC_CHAIN_SIZE (max 16) physical pages.  I'm
>> trying to understand (in pseudo-C code) what does a "somewhat larger
>> zspage" mean.  A fixed size array (given that we know the max number
>> of physical pages) per-zspage?
> 
> I haven't had the opportunity to respond until now as I was on vacation.
> 
> With the current approach in a memdesc world, we would do the following:
> 
> 1) kmem_cache_alloc() every single Zpdesc
> 2) Allocate a memdesc/page that points to its own Zpdesc
> 3) Access/Track Zpdescs directly
> 4) Use those Zpdescs to build a Zspage
> 
> An alternative approach would move more metadata storage from a Zpdesc
> into a Zspage instead. That extreme would leave us with:
> 
> 1) kmem_cache_alloc() once for a Zspage
> 2) Allocate a memdesc/page that points to the Zspage
> 3) Use the Zspage to access/track its own subpages (through some magic
> we would have to figure out)
> 4) Zpdescs are just Zspages (since all the information would be in a Zspage)
> 
> IMO, we should introduce zpdescs first, then start to shift
> metadata from "struct zpdesc" into "struct zspage" until we no longer
> need "struct zpdesc". My big concern is whether or not this patchset works
> towards those goals. Will it make consolidating the metadata easier? And are
> these goals feasible (while maintaining the wins of zsmalloc)? Or should we
> aim to leave zsmalloc as it is currently implemented?

Uh, correct me if I am wrong.

IMHO, this patchset abstracts the memory descriptor usage for
zswap/zram. The descriptor still overlays struct page; nothing has
changed in that regard. What the patchset accomplishes is using folios
internally to save some code size, and introducing a new concept,
zpdesc. It is just an initial step; it does not bias toward either the
kmem_cache allocation approach or the larger-zspage modifications. In
fact, both approaches require this fundamental abstraction: zpdesc.

So I believe this patchset is needed.

Thanks
Alex
Sergey Senozhatsky Sept. 3, 2024, 3:20 a.m. UTC | #8
On (24/08/27 16:19), Vishal Moola wrote:
>
> Or should we aim to leave zsmalloc as it is currently implemented?
>

Is this really an option?
Vishal Moola Sept. 3, 2024, 5:35 p.m. UTC | #9
On Mon, Sep 2, 2024 at 8:20 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
>
> On (24/08/27 16:19), Vishal Moola wrote:
> >
> > Or should we aim to leave zsmalloc as it is currently implemented?
> >
>
> Is this really an option?

Yup. That would be similar to your initial suggestion: zpdescs would be a
wrapper around struct page and we "chain zpdesc-s together to form a zspage".

Although, we may as well aim for an improved implementation of zsmalloc (if at
all possible) since implementing the zpdesc wrapper requires us to modify a
good chunk of code anyways. That way zsmalloc gets direct wins for its users
on top of shrinking struct page as Matthew described before.

I believe you would be a better judge of whether or not any improvements
are feasible (and worth the effort).
Vishal Moola Sept. 4, 2024, 7:51 p.m. UTC | #10
On Thu, Aug 29, 2024 at 05:42:06PM +0800, Alex Shi wrote:
> 
> 
> On 8/28/24 7:19 AM, Vishal Moola wrote:
> > On Wed, Aug 14, 2024 at 03:03:54PM +0900, Sergey Senozhatsky wrote:
> >> On (24/08/08 04:37), Matthew Wilcox wrote:
> >> [..]
> >>>> So I guess if we have something
> >>>>
> >>>> struct zspage {
> >>>> 	..
> >>>> 	struct zpdesc *first_desc;
> >>>> 	..
> >>>> }
> >>>>
> >>>> and we "chain" zpdesc-s to form a zspage, and make each of them point to
> >>>> a corresponding struct page (memdesc -> *page), then it'll resemble current
> >>>> zsmalloc and should work for everyone? I also assume for zspdesc-s zsmalloc
> >>>> will need to maintain a dedicated kmem_cache?
> >>>
> >>> Right, we could do that.  Each memdesc has to be a multiple of 16 bytes,
> >>> sp we'd be doing something like allocating 32 bytes for each page.
> >>> Is there really 32 bytes of information that we want to store for
> >>> each page?  Or could we store all of the information in (a somewhat
> >>> larger) zspage?  Assuming we allocate 3 pages per zspage, if we allocate
> >>> an extra 64 bytes in the zspage, we've saved 32 bytes per zspage.
> >>
> >> I certainly like (and appreciate) the approach that saves us
> >> some bytes here and there.  zsmalloc page can consist of 1 to
> >> up to CONFIG_ZSMALLOC_CHAIN_SIZE (max 16) physical pages.  I'm
> >> trying to understand (in pseudo-C code) what does a "somewhat larger
> >> zspage" mean.  A fixed size array (given that we know the max number
> >> of physical pages) per-zspage?
> > 
> > I haven't had the opportunity to respond until now as I was on vacation.
> > 
> > With the current approach in a memdesc world, we would do the following:
> > 
> > 1) kmem_cache_alloc() every single Zpdesc
> > 2) Allocate a memdesc/page that points to its own Zpdesc
> > 3) Access/Track Zpdescs directly
> > 4) Use those Zpdescs to build a Zspage
> > 
> > An alternative approach would move more metadata storage from a Zpdesc
> > into a Zspage instead. That extreme would leave us with:
> > 
> > 1) kmem_cache_alloc() once for a Zspage
> > 2) Allocate a memdesc/page that points to the Zspage
> > 3) Use the Zspage to access/track its own subpages (through some magic
> > we would have to figure out)
> > 4) Zpdescs are just Zspages (since all the information would be in a Zspage)
> > 
> > IMO, we should introduce zpdescs first, then start to shift
> > metadata from "struct zpdesc" into "struct zspage" until we no longer
> > need "struct zpdesc". My big concern is whether or not this patchset works
> > towards those goals. Will it make consolidating the metadata easier? And are
> > these goals feasible (while maintaining the wins of zsmalloc)? Or should we
> > aim to leave zsmalloc as it is currently implemented?
> 
> Uh, correct me if I am wrong.
> 
> IMHO, regarding what this patchset does, it abstracts the memory descriptor usage
> for zswap/zram. 

Sorry, I misunderstood the patchset. I thought it was creating a
descriptor specifically for zsmalloc, when it seems like this is supposed to
be a generic descriptor for all zpool allocators. The code comments and commit
subjects are misleading and should be changed to reflect that.

I'm onboard for using zpdesc for zbud and z3fold as well (or we'd have to come
up with some other plan for them as well). Once we have a plan all the
maintainers agree on we can all be on our merry way :)

The questions for all the zpool allocator maintainers are:
1) Does your allocator need the space its using in struct page (aka
would it need a descriptor in a memdesc world)?

2) Is it feasible to store the information elsewhere (outside of struct
page)? And how much effort would that code conversion be?

Thoughts? Seth/Dan, Vitaly/Miaohe, and Sergey?

> The descriptor still overlays the struct page; nothing has changed
> in that regard. What this patchset accomplishes is the use of folios in the guts
> to save some code size, and the introduction of a new concept, zpdesc. 
> This patchset is just an initial step; it does not bias the potential changes to 
> kmem_alloc or larger zspage modifications. In fact, both approaches require this
> fundamental abstract concept: zpdesc.
Yosry Ahmed Sept. 4, 2024, 8:21 p.m. UTC | #11
On Wed, Sep 4, 2024 at 12:51 PM Vishal Moola <vishal.moola@gmail.com> wrote:
>
> On Thu, Aug 29, 2024 at 05:42:06PM +0800, Alex Shi wrote:
> >
> >
> > On 8/28/24 7:19 AM, Vishal Moola wrote:
> > > On Wed, Aug 14, 2024 at 03:03:54PM +0900, Sergey Senozhatsky wrote:
> > >> On (24/08/08 04:37), Matthew Wilcox wrote:
> > >> [..]
> > >>>> So I guess if we have something
> > >>>>
> > >>>> struct zspage {
> > >>>>  ..
> > >>>>  struct zpdesc *first_desc;
> > >>>>  ..
> > >>>> }
> > >>>>
> > >>>> and we "chain" zpdesc-s to form a zspage, and make each of them point to
> > >>>> a corresponding struct page (memdesc -> *page), then it'll resemble current
> > >>>> zsmalloc and should work for everyone? I also assume for zspdesc-s zsmalloc
> > >>>> will need to maintain a dedicated kmem_cache?
> > >>>
> > >>> Right, we could do that.  Each memdesc has to be a multiple of 16 bytes,
> > >>> sp we'd be doing something like allocating 32 bytes for each page.
> > >>> Is there really 32 bytes of information that we want to store for
> > >>> each page?  Or could we store all of the information in (a somewhat
> > >>> larger) zspage?  Assuming we allocate 3 pages per zspage, if we allocate
> > >>> an extra 64 bytes in the zspage, we've saved 32 bytes per zspage.
> > >>
> > >> I certainly like (and appreciate) the approach that saves us
> > >> some bytes here and there.  zsmalloc page can consist of 1 to
> > >> up to CONFIG_ZSMALLOC_CHAIN_SIZE (max 16) physical pages.  I'm
> > >> trying to understand (in pseudo-C code) what does a "somewhat larger
> > >> zspage" mean.  A fixed size array (given that we know the max number
> > >> of physical pages) per-zspage?
> > >
> > > I haven't had the opportunity to respond until now as I was on vacation.
> > >
> > > With the current approach in a memdesc world, we would do the following:
> > >
> > > 1) kmem_cache_alloc() every single Zpdesc
> > > 2) Allocate a memdesc/page that points to its own Zpdesc
> > > 3) Access/Track Zpdescs directly
> > > 4) Use those Zpdescs to build a Zspage
> > >
> > > An alternative approach would move more metadata storage from a Zpdesc
> > > into a Zspage instead. That extreme would leave us with:
> > >
> > > 1) kmem_cache_alloc() once for a Zspage
> > > 2) Allocate a memdesc/page that points to the Zspage
> > > 3) Use the Zspage to access/track its own subpages (through some magic
> > > we would have to figure out)
> > > 4) Zpdescs are just Zspages (since all the information would be in a Zspage)
> > >
> > > IMO, we should introduce zpdescs first, then start to shift
> > > metadata from "struct zpdesc" into "struct zspage" until we no longer
> > > need "struct zpdesc". My big concern is whether or not this patchset works
> > > towards those goals. Will it make consolidating the metadata easier? And are
> > > these goals feasible (while maintaining the wins of zsmalloc)? Or should we
> > > aim to leave zsmalloc as it is currently implemented?
> >
> > Uh, correct me if I am wrong.
> >
> > IMHO, regarding what this patchset does, it abstracts the memory descriptor usage
> > for zswap/zram.
>
> Sorry, I misunderstood the patchset. I thought it was creating a
> descriptor specifically for zsmalloc, when it seems like this is supposed to
> be a generic descriptor for all zpool allocators. The code comments and commit
> subjects are misleading and should be changed to reflect that.
>
> I'm onboard for using zpdesc for zbud and z3fold as well (or we'd have to come
> up with some other plan for them as well). Once we have a plan all the
> maintainers agree on we can all be on our merry way :)
>
> The questions for all the zpool allocator maintainers are:
> 1) Does your allocator need the space its using in struct page (aka
> would it need a descriptor in a memdesc world)?
>
> 2) Is it feasible to store the information elsewhere (outside of struct
> page)? And how much effort would that code conversion be?
>
> Thoughts? Seth/Dan, Vitaly/Miahoe, and Sergey?

I would advise against spending effort on z3fold and zbud tbh, we want
to deprecate them.