Message ID | 20240628031138.429622-2-alexs@kernel.org (mailing list archive)
---|---
State | New
Series | mm/zsmalloc: add zpdesc memory descriptor for zswap.zpool
On Fri, Jun 28, 2024 at 12:06 PM <alexs@kernel.org> wrote:
>
> From: Alex Shi <alexs@kernel.org>
>
> The 1st patch introduces a new memory descriptor, zpdesc, and renames
> zspage.first_page to zspage.first_zpdesc; no functional change.
>
> Originally-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> Signed-off-by: Alex Shi <alexs@kernel.org>
> ---
>  mm/zpdesc.h   | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/zsmalloc.c | 19 ++++++++---------
>  2 files changed, 66 insertions(+), 9 deletions(-)
>  create mode 100644 mm/zpdesc.h

Hi Alex, thanks for your effort in pushing this forward!

> diff --git a/mm/zpdesc.h b/mm/zpdesc.h
> new file mode 100644
> index 000000000000..a1ab5ebaa936
> --- /dev/null
> +++ b/mm/zpdesc.h
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* zpdesc.h: zswap.zpool memory descriptor
> + *
> + * Written by Alex Shi <alexs@kernel.org>
> + *	Hyeonggon Yoo <42.hyeyoo@gmail.com>
> + */
> +#ifndef __MM_ZPDESC_H__
> +#define __MM_ZPDESC_H__
> +
> +/*
> + * struct zpdesc -	Memory descriptor for zpool memory, now is for zsmalloc
> + * @flags:		Page flags, PG_private: identifies the first component page
> + * @lru:		Indirected used by page migration

maybe Indirected -> Indirectly?

> + * @next:		Next zpdesc in a zspage in zsmalloc zpool
> + * @handle:		For huge zspage in zsmalloc zpool
> + * @zspage:		Pointer to zspage in zsmalloc
> + *
> + * This struct overlays struct page for now. Do not modify without a good
> + * understanding of the issues.
> + */
> +struct zpdesc {
> +	unsigned long flags;
> +	struct list_head lru;
> +	unsigned long _zp_pad_1;

For understanding, I think it'd be better to replace _zp_pad_1 with movable
ops, because mops reuses this 'mapping' field.

> +	union {
> +		/* Next zpdescs in a zspage in zsmalloc zpool */
> +		struct zpdesc *next;
> +		/* For huge zspage in zsmalloc zpool */
> +		unsigned long handle;
> +	};
> +	struct zspage *zspage;

There was a discussion with Yosry on including memcg_data on zpdesc even if
it's not used at the moment.

Maybe you can look at:
https://lore.kernel.org/linux-mm/CAB=+i9Quz9iP2-Lq=oQfKVVnzPDtOaKMm=hUPbnRg5hRxH+qaA@mail.gmail.com/

> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index fec1a39e5bbe..67bb80b7413a 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -13,17 +13,17 @@
>
>  /*
>   * Following is how we use various fields and flags of underlying
> - * struct page(s) to form a zspage.
> + * struct zpdesc(page) to form a zspage.
>   *
> - * Usage of struct page fields:
> - *	page->private: points to zspage
> - *	page->index: links together all component pages of a zspage
> + * Usage of struct zpdesc fields:
> + *	zpdesc->zspage: points to zspage
> + *	zpdesc->next: links together all component pages of a zspage
>   *	For the huge page, this is always 0, so we use this field
>   *	to store handle.
>   *	page->page_type: PG_zsmalloc, lower 16 bit locate the first object
>   *	offset in a subpage of a zspage
>   *
> - * Usage of struct page flags:
> + * Usage of struct zpdesc(page) flags:
>   *	PG_private: identifies the first component page
>   *	PG_owner_priv_1: identifies the huge component page

The comment for PG_owner_priv_1 can safely be removed, as it's not used
after commit a41ec880aa7b ("zsmalloc: move huge compressed obj from page to
zspage").

> @@ -948,7 +949,7 @@ static void create_page_chain(struct size_class *class, struct zspage *zspage,
>  		set_page_private(page, (unsigned long)zspage);
>  		page->index = 0;
>  		if (i == 0) {
> -			zspage->first_page = page;
> +			zspage->first_zpdesc = page_zpdesc(page);
>  			SetPagePrivate(page);
>  			if (unlikely(class->objs_per_zspage == 1 &&
>  					class->pages_per_zspage == 1))
> @@ -1325,7 +1326,7 @@ static unsigned long obj_malloc(struct zs_pool *pool,
>  		link->handle = handle;
>  	else
>  		/* record handle to page->index */
> -		zspage->first_page->index = handle;
> +		zspage->first_zpdesc->handle = handle;

FYI this line seems to conflict with commit bcc6116e39f512 ("mm/zsmalloc:
move record_obj() into obj_malloc()") on mm-unstable.

Best,
Hyeonggon
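For context on the _zp_pad_1 comment above: in struct page, the word that
_zp_pad_1 covers is page->mapping, and migration keeps a pointer to a
non-LRU movable page's movable_operations in that field. A hedged sketch of
the layout Hyeonggon seems to be suggesting (the mops member name is an
assumption mirroring struct page, not something from the posted patch):

struct zpdesc {
	unsigned long flags;
	struct list_head lru;
	/* overlays page->mapping; migration stores movable_operations here */
	struct movable_operations *mops;
	union {
		struct zpdesc *next;	/* next zpdesc in a zspage */
		unsigned long handle;	/* handle of a huge zspage */
	};
	struct zspage *zspage;
};

Any such change would presumably also want a ZPDESC_MATCH(mapping, mops)
style assert, as the patch already does for the other overlaid fields.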
On 6/30/24 9:45 PM, Hyeonggon Yoo wrote:
> On Fri, Jun 28, 2024 at 12:06 PM <alexs@kernel.org> wrote:
>>
>> From: Alex Shi <alexs@kernel.org>
>>
>> The 1st patch introduces a new memory descriptor, zpdesc, and renames
>> zspage.first_page to zspage.first_zpdesc; no functional change.
>>
>> Originally-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
>> Signed-off-by: Alex Shi <alexs@kernel.org>
>> ---
>>  mm/zpdesc.h   | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  mm/zsmalloc.c | 19 ++++++++---------
>>  2 files changed, 66 insertions(+), 9 deletions(-)
>>  create mode 100644 mm/zpdesc.h
>
> Hi Alex, thanks for your effort in pushing this forward!
>
>> diff --git a/mm/zpdesc.h b/mm/zpdesc.h
>> new file mode 100644
>> index 000000000000..a1ab5ebaa936
>> --- /dev/null
>> +++ b/mm/zpdesc.h
>> @@ -0,0 +1,56 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* zpdesc.h: zswap.zpool memory descriptor
>> + *
>> + * Written by Alex Shi <alexs@kernel.org>
>> + *	Hyeonggon Yoo <42.hyeyoo@gmail.com>
>> + */
>> +#ifndef __MM_ZPDESC_H__
>> +#define __MM_ZPDESC_H__
>> +
>> +/*
>> + * struct zpdesc -	Memory descriptor for zpool memory, now is for zsmalloc
>> + * @flags:		Page flags, PG_private: identifies the first component page
>> + * @lru:		Indirected used by page migration
>
> maybe Indirected -> Indirectly?

Hi Yoo,

Thanks for the comments! Yes, "Indirectly" is better. I will update it in
the next version.

>
>> + * @next:		Next zpdesc in a zspage in zsmalloc zpool
>> + * @handle:		For huge zspage in zsmalloc zpool
>> + * @zspage:		Pointer to zspage in zsmalloc
>> + *
>> + * This struct overlays struct page for now. Do not modify without a good
>> + * understanding of the issues.
>> + */
>> +struct zpdesc {
>> +	unsigned long flags;
>> +	struct list_head lru;
>> +	unsigned long _zp_pad_1;
>
> For understanding, I think it'd be better to replace _zp_pad_1 with movable
> ops, because mops reuses this 'mapping' field.

Right, 'mops' looks a bit clearer.

>
>> +	union {
>> +		/* Next zpdescs in a zspage in zsmalloc zpool */
>> +		struct zpdesc *next;
>> +		/* For huge zspage in zsmalloc zpool */
>> +		unsigned long handle;
>> +	};
>> +	struct zspage *zspage;
>
> There was a discussion with Yosry on including memcg_data on zpdesc even if
> it's not used at the moment.
>
> Maybe you can look at:
> https://lore.kernel.org/linux-mm/CAB=+i9Quz9iP2-Lq=oQfKVVnzPDtOaKMm=hUPbnRg5hRxH+qaA@mail.gmail.com/

Thanks for the notice.
The memcg_data isn't used by zpdesc, and I am a bit confused, since Yosry
said: "I think to drop memcg_data we need to enlighten the code that ...".
So we actually don't need this unused member, is that right, Yosry?

>
>> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
>> index fec1a39e5bbe..67bb80b7413a 100644
>> --- a/mm/zsmalloc.c
>> +++ b/mm/zsmalloc.c
>> @@ -13,17 +13,17 @@
>>
>>  /*
>>   * Following is how we use various fields and flags of underlying
>> - * struct page(s) to form a zspage.
>> + * struct zpdesc(page) to form a zspage.
>>   *
>> - * Usage of struct page fields:
>> - *	page->private: points to zspage
>> - *	page->index: links together all component pages of a zspage
>> + * Usage of struct zpdesc fields:
>> + *	zpdesc->zspage: points to zspage
>> + *	zpdesc->next: links together all component pages of a zspage
>>   *	For the huge page, this is always 0, so we use this field
>>   *	to store handle.
>>   *	page->page_type: PG_zsmalloc, lower 16 bit locate the first object
>>   *	offset in a subpage of a zspage
>>   *
>> - * Usage of struct page flags:
>> + * Usage of struct zpdesc(page) flags:
>>   *	PG_private: identifies the first component page
>>   *	PG_owner_priv_1: identifies the huge component page
>
> The comment for PG_owner_priv_1 can safely be removed, as it's not used
> after commit a41ec880aa7b ("zsmalloc: move huge compressed obj from page to
> zspage").

Right, thanks for the info!

>
>> @@ -948,7 +949,7 @@ static void create_page_chain(struct size_class *class, struct zspage *zspage,
>>  		set_page_private(page, (unsigned long)zspage);
>>  		page->index = 0;
>>  		if (i == 0) {
>> -			zspage->first_page = page;
>> +			zspage->first_zpdesc = page_zpdesc(page);
>>  			SetPagePrivate(page);
>>  			if (unlikely(class->objs_per_zspage == 1 &&
>>  					class->pages_per_zspage == 1))
>> @@ -1325,7 +1326,7 @@ static unsigned long obj_malloc(struct zs_pool *pool,
>>  		link->handle = handle;
>>  	else
>>  		/* record handle to page->index */
>> -		zspage->first_page->index = handle;
>> +		zspage->first_zpdesc->handle = handle;
>
> FYI this line seems to conflict with commit bcc6116e39f512 ("mm/zsmalloc:
> move record_obj() into obj_malloc()") on mm-unstable.

Yes, a new commit caused this conflict. I will update it in the next
version.

Thanks
Alex

>
> Best,
> Hyeonggon
Oops, sorry for missing Yosry and Nhat. How stupid of me!

Here is the first patch on lore:
https://lore.kernel.org/all/20240628031138.429622-1-alexs@kernel.org/

Sorry for this!
Alex

On 7/1/24 11:03 AM, Alex Shi wrote:
> [..]
[..]
> >> +	union {
> >> +		/* Next zpdescs in a zspage in zsmalloc zpool */
> >> +		struct zpdesc *next;
> >> +		/* For huge zspage in zsmalloc zpool */
> >> +		unsigned long handle;
> >> +	};
> >> +	struct zspage *zspage;
> >
> > There was a discussion with Yosry on including memcg_data on zpdesc
> > even if it's not used at the moment.
> >
> > Maybe you can look at:
> > https://lore.kernel.org/linux-mm/CAB=+i9Quz9iP2-Lq=oQfKVVnzPDtOaKMm=hUPbnRg5hRxH+qaA@mail.gmail.com/
>
> Thanks for the notice.
> The memcg_data isn't used by zpdesc, and I am a bit confused, since Yosry
> said: "I think to drop memcg_data we need to enlighten the code that ...".
> So we actually don't need this unused member, is that right, Yosry?

I think we need to have it, at least for now. When the pages are freed,
put_page() -> folio_put() -> __folio_put() will call mem_cgroup_uncharge().
The latter will call folio_memcg() (which reads folio->memcg_data) to
figure out if uncharging needs to be done.

There are also other similar code paths that will check folio->memcg_data.
It is currently expected to be present for all folios. So until we have
custom code paths per-folio type for allocation/freeing/etc, we need to
keep folio->memcg_data present and properly initialized.
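A simplified sketch of the check Yosry describes may help. This is not the
literal kernel implementation (the real folio_memcg() in
include/linux/memcontrol.h also decodes objcg/kmem flag bits, and the name
below is made up), but it shows the shape of the free-path dependency:

/* Hedged sketch: how the uncharge path consults memcg_data. */
static inline struct mem_cgroup *folio_memcg_sketch(struct folio *folio)
{
	unsigned long memcg_data = READ_ONCE(folio->memcg_data);

	if (!memcg_data)	/* never charged: nothing to uncharge */
		return NULL;
	/* the low bits are flags; the rest is a memcg (or objcg) pointer */
	return (struct mem_cgroup *)(memcg_data & ~0x3UL);
}

If a zpdesc-backed page reached folio_put() with that word uninitialized,
the uncharge path would interpret whatever happens to be there as a memcg
pointer, which is why the field must stay present and zeroed.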
On 7/1/24 9:42 PM, Yosry Ahmed wrote:
> [..]
> I think we need to have it, at least for now. When the pages are freed,
> put_page() -> folio_put() -> __folio_put() will call mem_cgroup_uncharge().
> The latter will call folio_memcg() (which reads folio->memcg_data) to
> figure out if uncharging needs to be done.
>
> There are also other similar code paths that will check folio->memcg_data.
> It is currently expected to be present for all folios. So until we have
> custom code paths per-folio type for allocation/freeing/etc, we need to
> keep folio->memcg_data present and properly initialized.

Hi Yosry,

Many thanks for the detailed comments and requirements! I'm going to add
this member in the next version, maybe tomorrow, unless there are more
comments on this patchset. :)

Thanks a lot!
Alex
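For the record, a hedged sketch of what adding the member might look like.
The mops and _zp_pad_2 names are hypothetical (padding so that memcg_data
lands at the same offset as in struct page on 64-bit), and the patch's own
ZPDESC_MATCH() assert, shown in the diff below, would turn any offset
mismatch into a build error:

struct zpdesc {
	unsigned long flags;
	struct list_head lru;
	struct movable_operations *mops;
	union {
		struct zpdesc *next;
		unsigned long handle;
	};
	struct zspage *zspage;
	unsigned long _zp_pad_2;	/* overlays _mapcount/_refcount */
#ifdef CONFIG_MEMCG
	unsigned long memcg_data;	/* must stay properly initialized */
#endif
};

#ifdef CONFIG_MEMCG
ZPDESC_MATCH(memcg_data, memcg_data);
#endif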
diff --git a/mm/zpdesc.h b/mm/zpdesc.h
new file mode 100644
index 000000000000..a1ab5ebaa936
--- /dev/null
+++ b/mm/zpdesc.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* zpdesc.h: zswap.zpool memory descriptor
+ *
+ * Written by Alex Shi <alexs@kernel.org>
+ *	Hyeonggon Yoo <42.hyeyoo@gmail.com>
+ */
+#ifndef __MM_ZPDESC_H__
+#define __MM_ZPDESC_H__
+
+/*
+ * struct zpdesc -	Memory descriptor for zpool memory, now is for zsmalloc
+ * @flags:		Page flags, PG_private: identifies the first component page
+ * @lru:		Indirected used by page migration
+ * @next:		Next zpdesc in a zspage in zsmalloc zpool
+ * @handle:		For huge zspage in zsmalloc zpool
+ * @zspage:		Pointer to zspage in zsmalloc
+ *
+ * This struct overlays struct page for now. Do not modify without a good
+ * understanding of the issues.
+ */
+struct zpdesc {
+	unsigned long flags;
+	struct list_head lru;
+	unsigned long _zp_pad_1;
+	union {
+		/* Next zpdescs in a zspage in zsmalloc zpool */
+		struct zpdesc *next;
+		/* For huge zspage in zsmalloc zpool */
+		unsigned long handle;
+	};
+	struct zspage *zspage;
+};
+#define ZPDESC_MATCH(pg, zp) \
+	static_assert(offsetof(struct page, pg) == offsetof(struct zpdesc, zp))
+
+ZPDESC_MATCH(flags, flags);
+ZPDESC_MATCH(lru, lru);
+ZPDESC_MATCH(index, next);
+ZPDESC_MATCH(index, handle);
+ZPDESC_MATCH(private, zspage);
+#undef ZPDESC_MATCH
+static_assert(sizeof(struct zpdesc) <= sizeof(struct page));
+
+#define zpdesc_page(zp) (_Generic((zp), \
+	const struct zpdesc *: (const struct page *)(zp), \
+	struct zpdesc *: (struct page *)(zp)))
+
+#define zpdesc_folio(zp) (_Generic((zp), \
+	const struct zpdesc *: (const struct folio *)(zp), \
+	struct zpdesc *: (struct folio *)(zp)))
+
+#define page_zpdesc(p) (_Generic((p), \
+	const struct page *: (const struct zpdesc *)(p), \
+	struct page *: (struct zpdesc *)(p)))
+
+#endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index fec1a39e5bbe..67bb80b7413a 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -13,17 +13,17 @@
 
 /*
  * Following is how we use various fields and flags of underlying
- * struct page(s) to form a zspage.
+ * struct zpdesc(page) to form a zspage.
  *
- * Usage of struct page fields:
- *	page->private: points to zspage
- *	page->index: links together all component pages of a zspage
+ * Usage of struct zpdesc fields:
+ *	zpdesc->zspage: points to zspage
+ *	zpdesc->next: links together all component pages of a zspage
 *	For the huge page, this is always 0, so we use this field
 *	to store handle.
 *	page->page_type: PG_zsmalloc, lower 16 bit locate the first object
 *	offset in a subpage of a zspage
 *
- * Usage of struct page flags:
+ * Usage of struct zpdesc(page) flags:
 *	PG_private: identifies the first component page
 *	PG_owner_priv_1: identifies the huge component page
 *
@@ -64,6 +64,7 @@
 #include <linux/pagemap.h>
 #include <linux/fs.h>
 #include <linux/local_lock.h>
+#include "zpdesc.h"
 
 #define ZSPAGE_MAGIC	0x58
 
@@ -253,7 +254,7 @@ struct zspage {
 	};
 	unsigned int inuse;
 	unsigned int freeobj;
-	struct page *first_page;
+	struct zpdesc *first_zpdesc;
 	struct list_head list; /* fullness list */
 	struct zs_pool *pool;
 	rwlock_t lock;
@@ -448,7 +449,7 @@ static inline void mod_zspage_inuse(struct zspage *zspage, int val)
 
 static inline struct page *get_first_page(struct zspage *zspage)
 {
-	struct page *first_page = zspage->first_page;
+	struct page *first_page = zpdesc_page(zspage->first_zpdesc);
 
 	VM_BUG_ON_PAGE(!is_first_page(first_page), first_page);
 	return first_page;
@@ -948,7 +949,7 @@ static void create_page_chain(struct size_class *class, struct zspage *zspage,
 		set_page_private(page, (unsigned long)zspage);
 		page->index = 0;
 		if (i == 0) {
-			zspage->first_page = page;
+			zspage->first_zpdesc = page_zpdesc(page);
 			SetPagePrivate(page);
 			if (unlikely(class->objs_per_zspage == 1 &&
 					class->pages_per_zspage == 1))
@@ -1325,7 +1326,7 @@ static unsigned long obj_malloc(struct zs_pool *pool,
 		link->handle = handle;
 	else
 		/* record handle to page->index */
-		zspage->first_page->index = handle;
+		zspage->first_zpdesc->handle = handle;
 
 	kunmap_atomic(vaddr);
 	mod_zspage_inuse(zspage, 1);
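Finally, for readers new to the memory-descriptor work: the _Generic macros
in zpdesc.h are pure type-level casts, so struct page, struct folio and
struct zpdesc are three typed views of the same words of memory. A hedged
illustration of how the helpers compose (first_zpdesc_of() is a made-up
example, not part of the series):

static inline struct zpdesc *first_zpdesc_of(struct page *page)
{
	/* a typed view of the same memory; no copy, no allocation */
	struct zpdesc *zpdesc = page_zpdesc(page);

	/*
	 * The casts round-trip trivially; the ZPDESC_MATCH static_asserts
	 * are what guarantee the field offsets actually line up.
	 */
	VM_BUG_ON_PAGE(zpdesc_page(zpdesc) != page, page);
	return zpdesc;
}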