| Message ID | 20230619143506.45253-2-avromanov@sberdevices.ru (mailing list archive) |
|---|---|
| State | New |
| Series | Add obj allocated counter for subpages |
On (23/06/19 17:35), Alexey Romanov wrote:
>
> We use a variable of type unsigned int to store the offset
> of the first object at the subpage. In turn, the offset cannot
> exceed the size of PAGE_SIZE, which is usually 4096. Thus,
> 12 bits are enough to store the offset.

[..]

> If the page size is 4Kb

It's certainly not a given. PAGE_SIZE is architecture-specific.
PAGE_SIZE_16KB and PAGE_SIZE_64KB would be simple examples, but there
are, apparently, architectures that can even have PAGE_SIZE_256KB.
Hello Sergey! Thank you for the feedback.

On Tue, Jun 20, 2023 at 07:36:29PM +0900, Sergey Senozhatsky wrote:
> On (23/06/19 17:35), Alexey Romanov wrote:
> >
> > We use a variable of type unsigned int to store the offset
> > of the first object at the subpage. In turn, the offset cannot
> > exceed the size of PAGE_SIZE, which is usually 4096. Thus,
> > 12 bits are enough to store the offset.
>
> [..]
>
> > If the page size is 4Kb
>
> It's certainly not a given. PAGE_SIZE is architecture-specific.
> PAGE_SIZE_16KB and PAGE_SIZE_64KB would be simple examples, but there
> are, apparently, architectures that can even have PAGE_SIZE_256KB.

Sure. As far as I understand, the maximum page size at the moment
(according to the Kconfig info in the Linux sources) is 256Kb. In this
case we need at most 18 bits to store the offset, and 2^18 / 32 = 8192
objects at most (13 bits; I think u16 is OK for such a purpose) for the
allocated objects counter.

If sizeof(unsigned int) >= 32 bits, then this will be enough for us.
Of course, in rare cases this will not be the case. But it seems that
zram and the kernel already have similar places. For example, if the
page size is 256 Kb and sizeof(unsigned int) = 16 bits (2 bytes), zram
will not work on such a system, because we can't store the offset. But
such a case is very rare; most systems have an unsigned int of at least
32 bits.

Therefore, I think that my idea is still applicable, we just need to
change the counter type. What do you think?
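For reference, the bit budget described above can be checked with a minimal
userspace sketch. The 256Kb worst-case PAGE_SIZE and the 32-byte minimum
object size are assumptions taken from this discussion, not from any
particular kernel configuration:

#include <assert.h>
#include <stdio.h>

int main(void)
{
        unsigned long page_size = 256 * 1024;  /* worst case: PAGE_SIZE_256KB */
        unsigned long min_obj = 32;             /* assumed smallest object size */
        unsigned int offset_bits = 0, count_bits = 0;

        while ((1UL << offset_bits) < page_size)
                offset_bits++;                  /* 18 bits for a 256Kb page */

        /* +1 so the counter can also represent a completely full subpage */
        while ((1UL << count_bits) < page_size / min_obj + 1)
                count_bits++;                   /* 14 bits to hold 0..8192 */

        printf("offset: %u bits, counter: %u bits, total: %u bits\n",
               offset_bits, count_bits, offset_bits + count_bits);

        /* both fields fit in page_type only if unsigned int is >= 32 bits */
        assert(offset_bits + count_bits <= 32);
        return 0;
}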
On (23/06/20 11:16), Alexey Romanov wrote:
> If sizeof(unsigned int) >= 32 bits, then this will be enough for us.
> Of course, in rare cases this will not be the case. But it seems that
> zram and the kernel already have similar places. For example, if the
> page size is 256 Kb and sizeof(unsigned int) = 16 bits (2 bytes), zram
> will not work on such a system, because we can't store the offset. But
> such a case is very rare; most systems have an unsigned int of at least
> 32 bits.
>
> Therefore, I think that my idea is still applicable, we just need to
> change the counter type. What do you think?

My gut feeling is that we'd better avoid mixing architecture-specific
magic into generic code. It works fine until it doesn't. Maybe Minchan
will have a different opinion, though.

There can be other ways to avoid a linear scan of empty sub-pages. For
instance, something like the below probably covers fewer cases than
your patch 0002, but on the other hand it is rather generic, trivial and
doesn't contain any assumptions about architecture specifics.

(composed/edited in the mail client, so it is likely broken, but it
outlines the idea)

====================================================================

mm/zsmalloc: do not scan empty zspages

We already stop zspage migration when we detect that the target
zspage has no space left for any new objects. There is one more
thing we can do in order to avoid doing useless work: stop scanning
for allocated objects in sub-pages when we have migrated the last
inuse object from the zspage in question.

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 02f7f414aade..2875152e6497 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1263,6 +1263,11 @@ static bool zspage_full(struct size_class *class, struct zspage *zspage)
 	return get_zspage_inuse(zspage) == class->objs_per_zspage;
 }
 
+static bool zspage_empty(struct zspage *zspage)
+{
+	return get_zspage_inuse(zspage) == 0;
+}
+
 /**
  * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
  * that hold objects of the provided size.
@@ -1787,6 +1792,10 @@ static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
 		obj_idx++;
 		record_obj(handle, free_obj);
 		obj_free(class->size, used_obj, NULL);
+
+		/* Stop if there are no more objects to migrate */
+		if (zspage_empty(get_zspage(s_page)))
+			break;
 	}
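To illustrate the saving the zspage_empty() check buys, here is a loose
userspace model of the idea: once the last in-use object has been moved,
the remaining sub-pages would otherwise be scanned for nothing. Names and
structure below are simplified assumptions, not the real zsmalloc
migration code:

#include <stdbool.h>
#include <stdio.h>

#define SUBPAGES        4
#define OBJS_PER_PAGE   8

struct zspage_model {
        int inuse;                              /* objects still allocated */
        bool used[SUBPAGES][OBJS_PER_PAGE];     /* allocation bitmap */
};

static bool zspage_empty(struct zspage_model *z)
{
        return z->inuse == 0;
}

static void migrate(struct zspage_model *z)
{
        int scanned = 0;

        for (int p = 0; p < SUBPAGES; p++) {
                for (int i = 0; i < OBJS_PER_PAGE; i++) {
                        scanned++;
                        if (!z->used[p][i])
                                continue;
                        z->used[p][i] = false;  /* "migrate" the object */
                        z->inuse--;
                        /* Stop if there are no more objects to migrate */
                        if (zspage_empty(z))
                                goto out;
                }
        }
out:
        printf("slots scanned: %d\n", scanned);
}

int main(void)
{
        /* only the first sub-page has live objects */
        struct zspage_model z = { .inuse = 2 };

        z.used[0][1] = true;
        z.used[0][5] = true;

        migrate(&z);    /* breaks after 6 scans instead of visiting all 32 */
        return 0;
}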
Hi!

On Wed, Jun 21, 2023 at 10:17:16PM +0900, Sergey Senozhatsky wrote:
> On (23/06/20 11:16), Alexey Romanov wrote:
> > If sizeof(unsigned int) >= 32 bits, then this will be enough for us.
> > Of course, in rare cases this will not be the case. But it seems that
> > zram and the kernel already have similar places. For example, if the
> > page size is 256 Kb and sizeof(unsigned int) = 16 bits (2 bytes), zram
> > will not work on such a system, because we can't store the offset. But
> > such a case is very rare; most systems have an unsigned int of at least
> > 32 bits.
> >
> > Therefore, I think that my idea is still applicable, we just need to
> > change the counter type. What do you think?
>
> My gut feeling is that we'd better avoid mixing architecture-specific
> magic into generic code. It works fine until it doesn't. Maybe Minchan
> will have a different opinion, though.
>
> There can be other ways to avoid a linear scan of empty sub-pages. For
> instance, something like the below probably covers fewer cases than
> your patch 0002, but on the other hand it is rather generic, trivial and
> doesn't contain any assumptions about architecture specifics.
>
> (composed/edited in the mail client, so it is likely broken, but it
> outlines the idea)
>
> ====================================================================
>
> mm/zsmalloc: do not scan empty zspages
>
> We already stop zspage migration when we detect that the target
> zspage has no space left for any new objects. There is one more
> thing we can do in order to avoid doing useless work: stop scanning
> for allocated objects in sub-pages when we have migrated the last
> inuse object from the zspage in question.
>
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 02f7f414aade..2875152e6497 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1263,6 +1263,11 @@ static bool zspage_full(struct size_class *class, struct zspage *zspage)
>  	return get_zspage_inuse(zspage) == class->objs_per_zspage;
>  }
>
> +static bool zspage_empty(struct zspage *zspage)
> +{
> +	return get_zspage_inuse(zspage) == 0;
> +}
> +
>  /**
>   * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
>   * that hold objects of the provided size.
> @@ -1787,6 +1792,10 @@ static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
>  		obj_idx++;
>  		record_obj(handle, free_obj);
>  		obj_free(class->size, used_obj, NULL);
> +
> +		/* Stop if there are no more objects to migrate */
> +		if (zspage_empty(get_zspage(s_page)))
> +			break;
>  	}

Yes, it seems my version is not as good as I thought. It looks bad with
an architecture-dependent PAGE_SIZE. Your version sounds good. In general,
I can implement this option. I'll test it and send a patch this week.
On (23/06/21 13:41), Alexey Romanov wrote:
[..]
> > +static bool zspage_empty(struct zspage *zspage)
> > +{
> > +	return get_zspage_inuse(zspage) == 0;
> > +}
> > +
> >  /**
> >   * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
> >   * that hold objects of the provided size.
> > @@ -1787,6 +1792,10 @@ static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
> >  		obj_idx++;
> >  		record_obj(handle, free_obj);
> >  		obj_free(class->size, used_obj, NULL);
> > +
> > +		/* Stop if there are no more objects to migrate */
> > +		if (zspage_empty(get_zspage(s_page)))
> > +			break;
> >  	}
>
> Yes, it seems my version is not as good as I thought. It looks bad with
> an architecture-dependent PAGE_SIZE. [..]

Well, we are looking for a solution that is both reasonable (perf-wise)
and maintainable.

> I can implement this option. I'll test it and send a patch this week.

Either that or, if Suggested-by: Alexey Romanov <AVRomanov@sberdevices.ru>
is good enough for you, then I can send a series tonight or tomorrow (after
some testing). I have two more patches on top of that one.
On Wed, Jun 21, 2023 at 10:55:18PM +0900, Sergey Senozhatsky wrote:
> On (23/06/21 13:41), Alexey Romanov wrote:
> [..]
> > > +static bool zspage_empty(struct zspage *zspage)
> > > +{
> > > +	return get_zspage_inuse(zspage) == 0;
> > > +}
> > > +
> > >  /**
> > >   * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
> > >   * that hold objects of the provided size.
> > > @@ -1787,6 +1792,10 @@ static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
> > >  		obj_idx++;
> > >  		record_obj(handle, free_obj);
> > >  		obj_free(class->size, used_obj, NULL);
> > > +
> > > +		/* Stop if there are no more objects to migrate */
> > > +		if (zspage_empty(get_zspage(s_page)))
> > > +			break;
> > >  	}
> >
> > Yes, it seems my version is not as good as I thought. It looks bad with
> > an architecture-dependent PAGE_SIZE. [..]
>
> Well, we are looking for a solution that is both reasonable (perf-wise)
> and maintainable.
>
> > I can implement this option. I'll test it and send a patch this week.
>
> Either that or, if Suggested-by: Alexey Romanov <AVRomanov@sberdevices.ru>
> is good enough for you, then I can send a series tonight or tomorrow (after
> some testing). I have two more patches on top of that one.

Yeah, Suggested-by is OK. Let's send a patch. Thank you.
On (23/06/21 13:59), Alexey Romanov wrote:
> On Wed, Jun 21, 2023 at 10:55:18PM +0900, Sergey Senozhatsky wrote:
> > On (23/06/21 13:41), Alexey Romanov wrote:
> > [..]
> > > > +static bool zspage_empty(struct zspage *zspage)
> > > > +{
> > > > +	return get_zspage_inuse(zspage) == 0;
> > > > +}
> > > > +
> > > >  /**
> > > >   * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
> > > >   * that hold objects of the provided size.
> > > > @@ -1787,6 +1792,10 @@ static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
> > > >  		obj_idx++;
> > > >  		record_obj(handle, free_obj);
> > > >  		obj_free(class->size, used_obj, NULL);
> > > > +
> > > > +		/* Stop if there are no more objects to migrate */
> > > > +		if (zspage_empty(get_zspage(s_page)))
> > > > +			break;
> > > >  	}
> > >
> > > Yes, it seems my version is not as good as I thought. It looks bad with
> > > an architecture-dependent PAGE_SIZE. [..]
> >
> > Well, we are looking for a solution that is both reasonable (perf-wise)
> > and maintainable.
> >
> > > I can implement this option. I'll test it and send a patch this week.
> >
> > Either that or, if Suggested-by: Alexey Romanov <AVRomanov@sberdevices.ru>
> > is good enough for you, then I can send a series tonight or tomorrow (after
> > some testing). I have two more patches on top of that one.
>
> Yeah, Suggested-by is OK. Let's send a patch. Thank you.

Got it. Let's hear from Minchan first, just to make sure that he is OK
with that.
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index c0d433541636..dd6e2c3429e0 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -20,7 +20,10 @@
  * page->index: links together all component pages of a zspage
  *	For the huge page, this is always 0, so we use this field
  *	to store handle.
- * page->page_type: first object offset in a subpage of zspage
+ * page->page_type:
+ *	First byte: count of allocated objects (OBJ_ALLOCATED_TAG)
+ *	in a subpage of zspage.
+ *	Other bytes: first object offset in a subpage of zspage.
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
@@ -126,6 +129,9 @@
 #define OBJ_INDEX_BITS	(BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
 #define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
 
+#define OBJ_ALLOCATED_BITS	(sizeof(u8) * BITS_PER_BYTE)
+#define OBJ_ALLOCATED_MASK	((1UL << OBJ_ALLOCATED_BITS) - 1)
+
 #define HUGE_BITS	1
 #define FULLNESS_BITS	4
 #define CLASS_BITS	8
@@ -520,14 +526,37 @@ static inline struct page *get_first_page(struct zspage *zspage)
 	return first_page;
 }
 
+static inline u8 get_obj_allocated(struct page *page)
+{
+	return page->page_type & OBJ_ALLOCATED_MASK;
+}
+
+static inline void set_obj_allocated(struct page *page, u8 value)
+{
+	page->page_type = (page->page_type & ~OBJ_ALLOCATED_MASK) | value;
+}
+
+static inline void mod_obj_allocated(struct page *page, s8 value)
+{
+	u8 inuse = get_obj_allocated(page);
+	/*
+	 * Overflow is not possible:
+	 * 1. Maximum number of objects allocated on a subpage is 128.
+	 * 2. We use this function only with value = 1 or -1.
+	 */
+	inuse += value;
+	set_obj_allocated(page, inuse);
+}
+
 static inline unsigned int get_first_obj_offset(struct page *page)
 {
-	return page->page_type;
+	return page->page_type >> OBJ_ALLOCATED_BITS;
 }
 
 static inline void set_first_obj_offset(struct page *page, unsigned int offset)
 {
-	page->page_type = offset;
+	page->page_type = (page->page_type & OBJ_ALLOCATED_MASK) |
+			  (offset << OBJ_ALLOCATED_BITS);
 }
 
 static inline unsigned int get_freeobj(struct zspage *zspage)
@@ -1126,6 +1155,7 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 		}
 
 		inc_zone_page_state(page, NR_ZSPAGES);
+		set_obj_allocated(page, 0);
 		pages[i] = page;
 	}
 
@@ -1456,6 +1486,7 @@ static unsigned long obj_malloc(struct zs_pool *pool,
 	kunmap_atomic(vaddr);
 	mod_zspage_inuse(zspage, 1);
 
+	mod_obj_allocated(m_page, 1);
 	obj = location_to_obj(m_page, obj);
 
@@ -1576,6 +1607,7 @@ static void obj_free(int class_size, unsigned long obj, unsigned long *handle)
 	kunmap_atomic(vaddr);
 
 	mod_zspage_inuse(zspage, -1);
+	mod_obj_allocated(f_page, -1);
 }
 
 void zs_free(struct zs_pool *pool, unsigned long handle)
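As a standalone illustration of the packing scheme in the patch above, the
sketch below keeps the allocated-object counter in the low byte and the
first object offset in the remaining bits of a plain unsigned int that
stands in for page->page_type. It is a userspace sketch for illustration
only, not kernel code:

#include <assert.h>
#include <stdio.h>

#define OBJ_ALLOCATED_BITS      8u
#define OBJ_ALLOCATED_MASK      ((1u << OBJ_ALLOCATED_BITS) - 1)

static unsigned int page_type;  /* stands in for page->page_type */

static unsigned int get_obj_allocated(void)
{
        return page_type & OBJ_ALLOCATED_MASK;
}

static void set_obj_allocated(unsigned int value)
{
        page_type = (page_type & ~OBJ_ALLOCATED_MASK) | value;
}

static unsigned int get_first_obj_offset(void)
{
        return page_type >> OBJ_ALLOCATED_BITS;
}

static void set_first_obj_offset(unsigned int offset)
{
        page_type = (page_type & OBJ_ALLOCATED_MASK) |
                    (offset << OBJ_ALLOCATED_BITS);
}

int main(void)
{
        /* both fields survive round-trips regardless of update order */
        set_first_obj_offset(4000);
        set_obj_allocated(17);
        assert(get_first_obj_offset() == 4000);
        assert(get_obj_allocated() == 17);

        set_first_obj_offset(96);
        assert(get_obj_allocated() == 17);      /* counter untouched */
        printf("offset=%u allocated=%u\n",
               get_first_obj_offset(), get_obj_allocated());
        return 0;
}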
We use a variable of type unsigned int to store the offset
of the first object at the subpage. In turn, the offset cannot
exceed the size of PAGE_SIZE, which is usually 4096. Thus,
12 bits are enough to store the offset. We can use the remaining
bytes to store, for example, the count of allocated objects on
a subpage.

If the page size is 4Kb, then no more than 128 (4096 / 32) objects
can be allocated on the subpage, which means that one byte is enough
to store the counter of allocated objects.

This patch adds support for counting the number of allocated objects
on a subpage in the first byte of the page_type field. The offset of
the first object is now stored in the remaining bytes of this field.

The sum of the allocated counters of all subpages is zspage->inuse.
We only count objects that have been tagged (with OBJ_ALLOCATED_TAG)
on a subpage. So, for example, in the situation:

        subpage 1                       subpage 2
[obj1_s - obj1_e, obj2_s - ] -> [obj2_e, obj3_s - obj3_e, free space]

the allocated counter for subpage 1 will be 2, and 1 for subpage 2.

Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru>
---
 mm/zsmalloc.c | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)
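A tiny check of the example above, assuming an object is counted on the
sub-page where it starts (where its OBJ_ALLOCATED_TAG lives); the arrays
and numbers below are made up to mirror the diagram, not taken from the
patch:

#include <assert.h>

int main(void)
{
        /* which sub-page each object starts on: obj1, obj2 (spans 0->1), obj3 */
        int start_subpage[] = { 0, 0, 1 };
        int counter[2] = { 0, 0 };
        int inuse = 0;

        for (unsigned i = 0; i < sizeof(start_subpage) / sizeof(start_subpage[0]); i++) {
                counter[start_subpage[i]]++;
                inuse++;
        }

        assert(counter[0] == 2);        /* "subpage 1" in the example */
        assert(counter[1] == 1);        /* "subpage 2" in the example */
        assert(inuse == counter[0] + counter[1]);       /* sum equals zspage->inuse */
        return 0;
}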