
[v1,1/2] zsmalloc: add allocated objects counter for subpage

Message ID 20230619143506.45253-2-avromanov@sberdevices.ru (mailing list archive)
State New
Series Add obj allocated counter for subpages

Commit Message

Alexey Romanov June 19, 2023, 2:35 p.m. UTC
We use a variable of type unsigned int to store the offset
of the first object in a subpage. In turn, the offset cannot
exceed PAGE_SIZE, which is usually 4096. Thus, 12 bits are
enough to store the offset.

We can use the remaining bytes to store, for example, the
count of allocated objects on a subpage. If the page size is
4Kb, then no more than 128 (4096 / 32) objects can be allocated
on the subpage, which means that one byte is enough to store
the counter of allocated objects.

This patch adds support for counting the number of allocated
objects on a subpage in the first byte of the page_type field.
The offset of the first object is now stored in the remaining
bytes of this field.

The sum of the allocated counters over all subpages equals
zspage->inuse. We only count objects on a subpage that have
been tagged with OBJ_ALLOCATED_TAG.

So, for example, in the situation:

            subpage 1                   subpage 2
[obj1_s - obj1_e, obj2_s - ] -> [obj2_e, obj3_s - obj3_e, free space]

Allocated counter for subpage 1 will be 2, and 1 for subpage 2.

Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru>
---
 mm/zsmalloc.c | 38 +++++++++++++++++++++++++++++++++++---
 1 file changed, 35 insertions(+), 3 deletions(-)
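
For reference, a minimal stand-alone C model of the packing scheme described
above. The constants and helpers here are illustrative only (they assume a
32-bit page_type with an 8-bit counter in the low byte) and are not the ones
the patch adds to mm/zsmalloc.c:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_ALLOCATED_BITS	8
#define MODEL_ALLOCATED_MASK	((1u << MODEL_ALLOCATED_BITS) - 1)

/* Model of page->page_type: low byte = allocated counter, upper bits = offset. */
static uint32_t model_pack(uint32_t first_obj_offset, uint8_t allocated)
{
	return (first_obj_offset << MODEL_ALLOCATED_BITS) | allocated;
}

static uint32_t model_offset(uint32_t page_type)
{
	return page_type >> MODEL_ALLOCATED_BITS;
}

static uint8_t model_allocated(uint32_t page_type)
{
	return page_type & MODEL_ALLOCATED_MASK;
}

int main(void)
{
	/* e.g. first object at offset 96, two allocated objects counted */
	uint32_t pt = model_pack(96, 2);

	assert(model_offset(pt) == 96);
	assert(model_allocated(pt) == 2);
	printf("offset=%u allocated=%u\n",
	       (unsigned)model_offset(pt), (unsigned)model_allocated(pt));
	return 0;
}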

Comments

Sergey Senozhatsky June 20, 2023, 10:36 a.m. UTC | #1
On (23/06/19 17:35), Alexey Romanov wrote:
> 
> We use a variable of type unsigned int to store the offset
> of the first object in a subpage. In turn, the offset cannot
> exceed PAGE_SIZE, which is usually 4096. Thus, 12 bits are
> enough to store the offset.

[..]

> If the page size is 4Kb

It's certainly not a given. PAGE_SIZE is architecture-specific.
PAGE_SIZE_16KB and PAGE_SIZE_64KB would be simple examples, but there
are, apparently, architectures that can even have PAGE_SIZE_256KB.
Alexey Romanov June 20, 2023, 11:16 a.m. UTC | #2
Hello Sergey! Thank you for the feedback.

On Tue, Jun 20, 2023 at 07:36:29PM +0900, Sergey Senozhatsky wrote:
> On (23/06/19 17:35), Alexey Romanov wrote:
> > 
> > We use a variable of type unsigned int to store the offset
> > of the first object in a subpage. In turn, the offset cannot
> > exceed PAGE_SIZE, which is usually 4096. Thus, 12 bits are
> > enough to store the offset.
> 
> [..]
> 
> > If the page size is 4Kb
> 
> It's certainly not a given. PAGE_SIZE is architecture-specific.
> PAGE_SIZE_16KB and PAGE_SIZE_64KB would be simple examples, but there
> are, apparently, architectures that can even have PAGE_SIZE_256KB.

Sure. 

As far as I understand, at the moment the maximum page size (according
to the Kconfig info in the Linux sources) is 256Kb. In this case, we
need at most 18 bits to store the offset. And 2^18 / 32 = 8192 is the
maximum number of objects per subpage, so a u16 is enough for the
allocated objects counter.

If unsigned int is at least 32 bits wide, this will be enough for us.
Of course, in rare cases this will not hold. But it seems that
zram and the kernel already have similar constraints. For example, if
the page size is 256 Kb and unsigned int is 16 bits (2 bytes) wide,
zram will not work on such a system, because we can't store the offset.
But such a case is very rare; most systems have an unsigned int of at
least 32 bits.

Therefore, I think that my idea is still applicable; we just need to
change the counter type. What do you think?
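
To make the arithmetic concrete, a rough compile-time sketch of that bit
budget (the macros are illustrative values for a 256Kb page and a 32-byte
minimum object size, not the real zsmalloc/kernel ones):

#include <stdio.h>

/* Illustrative values, not the real zsmalloc/kernel macros. */
#define MODEL_PAGE_SHIFT	18				/* 256Kb pages */
#define MODEL_PAGE_SIZE		(1u << MODEL_PAGE_SHIFT)
#define MODEL_MIN_ALLOC_SIZE	32				/* smallest object */

#define MODEL_OFFSET_BITS	MODEL_PAGE_SHIFT		/* offsets 0..PAGE_SIZE-1 */
#define MODEL_MAX_OBJS		(MODEL_PAGE_SIZE / MODEL_MIN_ALLOC_SIZE)
#define MODEL_COUNTER_BITS	14				/* holds counts 0..8192 */

/* At most 2^18 / 32 = 8192 objects can start on one subpage. */
_Static_assert(MODEL_MAX_OBJS == 8192, "max objects per subpage");
/* A count of up to 8192 needs 14 bits (13 bits only cover 0..8191). */
_Static_assert((1u << MODEL_COUNTER_BITS) > MODEL_MAX_OBJS, "counter fits");
/* Offset and counter together exactly fill a 32-bit page_type. */
_Static_assert(MODEL_OFFSET_BITS + MODEL_COUNTER_BITS == 32, "fits in 32 bits");

int main(void)
{
	printf("offset bits %d + counter bits %d = %d\n",
	       MODEL_OFFSET_BITS, MODEL_COUNTER_BITS,
	       MODEL_OFFSET_BITS + MODEL_COUNTER_BITS);
	return 0;
}
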
Sergey Senozhatsky June 21, 2023, 1:17 p.m. UTC | #3
On (23/06/20 11:16), Alexey Romanov wrote:
> If unsigned int is at least 32 bits wide, this will be enough for us.
> Of course, in rare cases this will not hold. But it seems that
> zram and the kernel already have similar constraints. For example, if
> the page size is 256 Kb and unsigned int is 16 bits (2 bytes) wide,
> zram will not work on such a system, because we can't store the offset.
> But such a case is very rare; most systems have an unsigned int of at
> least 32 bits.
> 
> Therefore, I think that my idea is still applicable; we just need to
> change the counter type. What do you think?

My gut feeling is that we'd better avoid mixing architecture-specific
magic into generic code. It works fine until it doesn't. Maybe Minchan
will have a different opinion, though.

There are other ways to avoid a linear scan of empty sub-pages. For
instance, something like the patch below probably covers fewer cases than
your patch 0002, but on the other hand it is rather generic, trivial and
doesn't contain any assumptions about architecture specifics.

(composed/edited in a mail client, so it is likely broken, but it
 outlines the idea)

====================================================================

mm/zsmalloc: do not scan empty zspages

We already stop zspage migration when we detect that target
zspage has no space left for any new objects. There is
one more thing we can do in order to avoid doing useless
work: stop scanning for allocated objects in sub-pages when
we have migrated the last inuse object from the zspage in
question.

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 02f7f414aade..2875152e6497 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1263,6 +1263,11 @@ static bool zspage_full(struct size_class *class, struct zspage *zspage)
        return get_zspage_inuse(zspage) == class->objs_per_zspage;
 }
 
+static bool zspage_empty(struct zspage *zspage)
+{
+	return get_zspage_inuse(zspage) == 0;
+}
+
 /**
  * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
  * that hold objects of the provided size.
@@ -1787,6 +1792,10 @@ static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
 		obj_idx++;
 		record_obj(handle, free_obj);
 		obj_free(class->size, used_obj, NULL);
+
+		/* Stop if there are no more objects to migrate */
+		if (zspage_empty(get_zspage(s_page)))
+			break;
 	}
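
A stand-alone model of that early exit, just to show the intended effect on
the scan; the struct and helper names below are illustrative and are not the
real zsmalloc internals:

#include <stdbool.h>
#include <stdio.h>

#define MODEL_SLOTS 8

/* Model of one source zspage: which slots hold live objects, and how many. */
struct model_zspage {
	bool allocated[MODEL_SLOTS];
	int inuse;
};

static bool model_zspage_empty(const struct model_zspage *zspage)
{
	return zspage->inuse == 0;
}

/* Scan the slots, "migrating" live objects, and stop as soon as the
 * zspage is drained instead of walking the remaining free slots. */
static int model_migrate(struct model_zspage *src)
{
	int scanned = 0;

	for (int i = 0; i < MODEL_SLOTS; i++) {
		scanned++;
		if (!src->allocated[i])
			continue;
		src->allocated[i] = false;	/* moved to the target zspage */
		src->inuse--;
		if (model_zspage_empty(src))
			break;			/* the early exit from the patch */
	}
	return scanned;
}

int main(void)
{
	/* Two live objects at the front, six free slots after them. */
	struct model_zspage src = { .allocated = { true, true }, .inuse = 2 };

	/* With the early exit only 2 of the 8 slots are visited. */
	printf("slots scanned: %d\n", model_migrate(&src));
	return 0;
}
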
Alexey Romanov June 21, 2023, 1:41 p.m. UTC | #4
Hi!

On Wed, Jun 21, 2023 at 10:17:16PM +0900, Sergey Senozhatsky wrote:
> On (23/06/20 11:16), Alexey Romanov wrote:
> > If unsigned int is at least 32 bits wide, this will be enough for us.
> > Of course, in rare cases this will not hold. But it seems that
> > zram and the kernel already have similar constraints. For example, if
> > the page size is 256 Kb and unsigned int is 16 bits (2 bytes) wide,
> > zram will not work on such a system, because we can't store the offset.
> > But such a case is very rare; most systems have an unsigned int of at
> > least 32 bits.
> > 
> > Therefore, I think that my idea is still applicable; we just need to
> > change the counter type. What do you think?
> 
> My gut feeling is that we'd better avoid mixing architecture-specific
> magic into generic code. It works fine until it doesn't. Maybe Minchan
> will have a different opinion, though.
> 
> There are other ways to avoid a linear scan of empty sub-pages. For
> instance, something like the patch below probably covers fewer cases than
> your patch 0002, but on the other hand it is rather generic, trivial and
> doesn't contain any assumptions about architecture specifics.
> 
> (composed/edited in a mail client, so it is likely broken, but it
>  outlines the idea)
> 
> ====================================================================
> 
> mm/zsmalloc: do not scan empty zspages
> 
> We already stop zspage migration when we detect that target
> zspage has no space left for any new objects. There is
> one more thing we can do in order to avoid doing useless
> work: stop scanning for allocated objects in sub-pages when
> we have migrated the last inuse object from the zspage in
> question.
> 
> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> index 02f7f414aade..2875152e6497 100644
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1263,6 +1263,11 @@ static bool zspage_full(struct size_class *class, struct zspage *zspage)
>         return get_zspage_inuse(zspage) == class->objs_per_zspage;
>  }
>  
> +static bool zspage_empty(struct zspage *zspage)
> +{
> +	return get_zspage_inuse(zspage) == 0;
> +}
> +
>  /**
>   * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
>   * that hold objects of the provided size.
> @@ -1787,6 +1792,10 @@ static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
>  		obj_idx++;
>  		record_obj(handle, free_obj);
>  		obj_free(class->size, used_obj, NULL);
> +
> +		/* Stop if there are no more objects to migrate */
> +		if (zspage_empty(get_zspage(s_page)))
> +			break;
>  	}

Yes, it seems my version is not as good as I thought. It looks bad with
an architecture-dependent PAGE_SIZE.

Your version sounds good. In general, I can implement this option. I'll
test it and send a patch this week.
Sergey Senozhatsky June 21, 2023, 1:55 p.m. UTC | #5
On (23/06/21 13:41), Alexey Romanov wrote:
[..]
> > +static bool zspage_empty(struct zspage *zspage)
> > +{
> > +	return get_zspage_inuse(zspage) == 0;
> > +}
> > +
> >  /**
> >   * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
> >   * that hold objects of the provided size.
> > @@ -1787,6 +1792,10 @@ static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
> >  		obj_idx++;
> >  		record_obj(handle, free_obj);
> >  		obj_free(class->size, used_obj, NULL);
> > +
> > +		/* Stop if there are no more objects to migrate */
> > +		if (zspage_empty(get_zspage(s_page)))
> > +			break;
> >  	}
> 
> Yes, it seems my version is not as good as I thought. It looks bad with
> an architecture-dependent PAGE_SIZE. [..]

Well, we are looking for a solution that is both reasonable (perf-wise)
and maintainable.

> I can implement this option. I'll test it and send a patch this week.

Either that or, if Suggested-by: Alexey Romanov <AVRomanov@sberdevices.ru>
is good enough for you, then I can send a series tonight or tomorrow (after
some testing). I have two more patches on top of that one.
Alexey Romanov June 21, 2023, 1:59 p.m. UTC | #6
On Wed, Jun 21, 2023 at 10:55:18PM +0900, Sergey Senozhatsky wrote:
> On (23/06/21 13:41), Alexey Romanov wrote:
> [..]
> > > +static bool zspage_empty(struct zspage *zspage)
> > > +{
> > > +	return get_zspage_inuse(zspage) == 0;
> > > +}
> > > +
> > >  /**
> > >   * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
> > >   * that hold objects of the provided size.
> > > @@ -1787,6 +1792,10 @@ static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
> > >  		obj_idx++;
> > >  		record_obj(handle, free_obj);
> > >  		obj_free(class->size, used_obj, NULL);
> > > +
> > > +		/* Stop if there are no more objects to migrate */
> > > +		if (zspage_empty(get_zspage(s_page)))
> > > +			break;
> > >  	}
> > 
> > Yes, it seems my version is not as good as I thought. It looks bad with
> > an architecture-dependent PAGE_SIZE. [..]
> 
> Well, we are looking for a solution that is both reasonable (perf-wise)
> and maintainable.
> 
> > I can implement this option. I'll test it and send a patch this week.
> 
> Either that or, if Suggested-by: Alexey Romanov <AVRomanov@sberdevices.ru>
> is good enough for you, then I can send a series tonight or tomorrow (after
> some testing). I have two more patches on top of that one.

Yeah, Suggested-by is OK. Let's send a patch. Thank you.
Sergey Senozhatsky June 21, 2023, 2:18 p.m. UTC | #7
On (23/06/21 13:59), Alexey Romanov wrote:
> On Wed, Jun 21, 2023 at 10:55:18PM +0900, Sergey Senozhatsky wrote:
> > On (23/06/21 13:41), Alexey Romanov wrote:
> > [..]
> > > > +static bool zspage_empty(struct zspage *zspage)
> > > > +{
> > > > +	return get_zspage_inuse(zspage) == 0;
> > > > +}
> > > > +
> > > >  /**
> > > >   * zs_lookup_class_index() - Returns index of the zsmalloc &size_class
> > > >   * that hold objects of the provided size.
> > > > @@ -1787,6 +1792,10 @@ static void migrate_zspage(struct zs_pool *pool, struct size_class *class,
> > > >  		obj_idx++;
> > > >  		record_obj(handle, free_obj);
> > > >  		obj_free(class->size, used_obj, NULL);
> > > > +
> > > > +		/* Stop if there are no more objects to migrate */
> > > > +		if (zspage_empty(get_zspage(s_page)))
> > > > +			break;
> > > >  	}
> > > 
> > > Yes, it seems my version is not as good as I thought. It looks bad with
> > > an architecture-dependent PAGE_SIZE. [..]
> > 
> > Well, we are looking for a solution that is both reasonable (perf-wise)
> > and maintainable.
> > 
> > > I can implement this option. I'll test it and send a patch this week.
> > 
> > Either that or, if Suggested-by: Alexey Romanov <AVRomanov@sberdevices.ru>
> > is good enough for you, then I can send a series tonight or tomorrow (after
> > some testing). I have two more patches on top of that one.
> 
> Yeah, Suggested-by is OK. Let's send a patch. Thank you.

Got it.

Let's hear from Minchan first, just to make sure that Minchan is OK
with that.

Patch

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index c0d433541636..dd6e2c3429e0 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -20,7 +20,10 @@ 
  *	page->index: links together all component pages of a zspage
  *		For the huge page, this is always 0, so we use this field
  *		to store handle.
- *	page->page_type: first object offset in a subpage of zspage
+ *	page->page_type:
+ *		First byte: count of allocated objects (OBJ_ALLOCATED_TAG)
+ *		in a subpage of zspage.
+ *		Other bytes: first object offset in a subpage of zspage.
  *
  * Usage of struct page flags:
  *	PG_private: identifies the first component page
@@ -126,6 +129,9 @@ 
 #define OBJ_INDEX_BITS	(BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
 #define OBJ_INDEX_MASK	((_AC(1, UL) << OBJ_INDEX_BITS) - 1)
 
+#define OBJ_ALLOCATED_BITS	(sizeof(u8) * BITS_PER_BYTE)
+#define OBJ_ALLOCATED_MASK	((1UL << OBJ_ALLOCATED_BITS) - 1)
+
 #define HUGE_BITS	1
 #define FULLNESS_BITS	4
 #define CLASS_BITS	8
@@ -520,14 +526,37 @@  static inline struct page *get_first_page(struct zspage *zspage)
 	return first_page;
 }
 
+static inline u8 get_obj_allocated(struct page *page)
+{
+	return page->page_type & OBJ_ALLOCATED_MASK;
+}
+
+static inline void set_obj_allocated(struct page *page, u8 value)
+{
+	page->page_type = (page->page_type & ~OBJ_ALLOCATED_MASK) | value;
+}
+
+static inline void mod_obj_allocated(struct page *page, s8 value)
+{
+	u8 inuse = get_obj_allocated(page);
+	/*
+	 * Overflow is not possible:
+	 * 1. Maximum number of objects allocated on a subpage is 128.
+	 * 2. We use this function only with value = 1 or -1.
+	 */
+	inuse += value;
+	set_obj_allocated(page, inuse);
+}
+
 static inline unsigned int get_first_obj_offset(struct page *page)
 {
-	return page->page_type;
+	return page->page_type >> OBJ_ALLOCATED_BITS;
 }
 
 static inline void set_first_obj_offset(struct page *page, unsigned int offset)
 {
-	page->page_type = offset;
+	page->page_type = (page->page_type & OBJ_ALLOCATED_MASK) |
+			  (offset << OBJ_ALLOCATED_BITS);
 }
 
 static inline unsigned int get_freeobj(struct zspage *zspage)
@@ -1126,6 +1155,7 @@  static struct zspage *alloc_zspage(struct zs_pool *pool,
 		}
 
 		inc_zone_page_state(page, NR_ZSPAGES);
+		set_obj_allocated(page, 0);
 		pages[i] = page;
 	}
 
@@ -1456,6 +1486,7 @@  static unsigned long obj_malloc(struct zs_pool *pool,
 
 	kunmap_atomic(vaddr);
 	mod_zspage_inuse(zspage, 1);
+	mod_obj_allocated(m_page, 1);
 
 	obj = location_to_obj(m_page, obj);
 
@@ -1576,6 +1607,7 @@  static void obj_free(int class_size, unsigned long obj, unsigned long *handle)
 
 	kunmap_atomic(vaddr);
 	mod_zspage_inuse(zspage, -1);
+	mod_obj_allocated(f_page, -1);
 }
 
 void zs_free(struct zs_pool *pool, unsigned long handle)