
[RFC,v3] mm/slub: Optimize slub memory usage

Message ID 20230719102104.1954891-1-jaypatel@linux.ibm.com (mailing list archive)

Commit Message

Jay Patel July 19, 2023, 10:21 a.m. UTC
In the current implementation of the SLUB memory allocator, the slab
order selection process follows these criteria:

1) Determine the minimum order required to serve the minimum number of
objects (min_objects). This is derived from the formula
(order = min_objects * object_size / PAGE_SIZE).

2) If the minimum order is greater than the maximum allowed order
(slub_max_order), use slub_max_order as the order for this slab.

3) If the minimum order is less than slub_max_order, iterate from the
minimum order up to slub_max_order and check whether the condition
(rem <= slab_size / fract_leftover) holds. Here, slab_size is
(PAGE_SIZE << order), rem is (slab_size % object_size), and
fract_leftover takes the values 16, 8, or 4. If the condition holds,
select that order for the slab (see the sketch below).
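
For reference, the following stand-alone sketch illustrates the
per-order check described in step 3. It is a simplified model rather
than the kernel code: the names mirror calc_slab_order() in mm/slub.c,
but the slub_min_order handling and the order_objects() helper are
reduced to the essentials here.

#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)

/*
 * Smallest order whose slab both fits min_objects objects of the
 * given size and leaves at most slab_size / fract_leftover bytes
 * unused; returns max_order + 1 if no order qualifies.
 */
static unsigned int calc_slab_order(unsigned int size,
				    unsigned int min_objects,
				    unsigned int max_order,
				    unsigned int fract_leftover)
{
	unsigned int order;

	for (order = 0; order <= max_order; order++) {
		unsigned long slab_size = PAGE_SIZE << order;
		unsigned long rem = slab_size % size;

		if (slab_size >= (unsigned long)min_objects * size &&
		    rem <= slab_size / fract_leftover)
			break;
	}
	return order;
}

int main(void)
{
	/* A 700-byte object, at least 8 objects, waste capped at 1/16:
	 * order 0 (4096 bytes) cannot fit 8 objects, order 1 (8192
	 * bytes) fits 11 with 492 bytes left over, and 492 <= 8192/16,
	 * so order 1 is selected.
	 */
	printf("order = %u\n", calc_slab_order(700, 8, 3, 16));
	return 0;
}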

However, in step 3, the calculated acceptable leftover (slab_size /
fract_leftover) spans a wide range when compared to the remainder
(rem): at order 0 it ranges from 256 bytes to 1 KB on a 4K page size,
and from 4 KB to 16 KB on a 64K page size, growing further at higher
orders. Such a loose threshold can lead to the selection of an order
that results in more memory wastage. To mitigate this wastage, step 3
is modified to scale fract_leftover with the page size, while
retaining the current value as the default for a 4K page size (see
the worked example below).
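
To see the effect numerically, this stand-alone program reproduces the
patch's page_frac computation and prints the order-0 acceptable-waste
cap (slab_size / fraction) before and after the change; it is an
illustration only, with PAGE_SIZE_4K taken from the patch below.

#include <stdio.h>

#define PAGE_SIZE_4K	4096UL

/* Mirrors the page_frac computation added by this patch. */
static unsigned long page_frac(unsigned long page_size)
{
	return (page_size / PAGE_SIZE_4K == 1) ? 0 : page_size / PAGE_SIZE_4K;
}

int main(void)
{
	unsigned long page_sizes[] = { 4096UL, 65536UL };
	int i;

	for (i = 0; i < 2; i++) {
		unsigned long ps = page_sizes[i];
		unsigned long before = ps / 16;		  /* default fraction */
		unsigned long after  = ps / (16 + page_frac(ps));

		printf("%3luK page: order-0 waste cap %4lu -> %4lu bytes\n",
		       ps >> 10, before, after);
	}
	return 0;
}

On 4K pages page_frac is 0, so the default starting fraction of 16
(and the 256-byte cap) is unchanged; on 64K pages the starting
fraction becomes 32, tightening the order-0 cap from 4 KB to 2 KB.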

Test results are as follows:

1) On 160 CPUs with 64K Page size

+-----------------+----------------+----------------+
|          Total wastage in slub memory             |
+-----------------+----------------+----------------+
|                 | After Boot     | After Hackbench|
| Normal          | 932 KB         | 1812 KB        |
| With Patch      | 729 KB         | 1636 KB        |
| Wastage reduced | ~22%           | ~10%           |
+-----------------+----------------+----------------+

+-----------------+----------------+----------------+
|            Total slub memory                      |
+-----------------+----------------+----------------+
|                 | After Boot     | After Hackbench|
| Normal          | 1855296        | 2944576        |
| With Patch      | 1544576        | 2692032        |
| Memory reduced  | ~17%           | ~9%            |
+-----------------+----------------+----------------+

hackbench-process-sockets
+-------+-----+----------+----------+-----------+
|       | Grp | Normal   | Patched  | Change    |
+-------+-----+----------+----------+-----------+
| Amean | 1   | 1.2727   | 1.2450   | ( 2.22%)  |
| Amean | 4   | 1.6063   | 1.5810   | ( 1.60%)  |
| Amean | 7   | 2.4190   | 2.3983   | ( 0.86%)  |
| Amean | 12  | 3.9730   | 3.9347   | ( 0.97%)  |
| Amean | 21  | 6.9823   | 6.8957   | ( 1.26%)  |
| Amean | 30  | 10.1867  | 10.0600  | ( 1.26%)  |
| Amean | 48  | 16.7490  | 16.4853  | ( 1.60%)  |
| Amean | 79  | 28.1870  | 27.8673  | ( 1.15%)  |
| Amean | 110 | 39.8363  | 39.3793  | ( 1.16%)  |
| Amean | 141 | 51.5277  | 51.4907  | ( 0.07%)  |
| Amean | 172 | 62.9700  | 62.7300  | ( 0.38%)  |
| Amean | 203 | 74.5037  | 74.0630  | ( 0.59%)  |
| Amean | 234 | 85.6560  | 85.3587  | ( 0.35%)  |
| Amean | 265 | 96.9883  | 96.3770  | ( 0.63%)  |
| Amean | 296 | 108.6893 | 108.0870 | ( 0.56%)  |
+-------+-----+----------+----------+-----------+

2) On 16 CPUs with 64K Page size

+----------------+----------------+----------------+
|          Total wastage in slub memory            |
+----------------+----------------+----------------+
|                | After Boot     | After Hackbench|
| Normal         | 273 KB         | 544 KB         |
| With Patch     | 260 KB         | 500 KB         |
| Wastage reduced| ~5%            | ~9%            |
+----------------+----------------+----------------+

+-----------------+----------------+----------------+
|            Total slub memory                      |
+-----------------+----------------+----------------+
|                 | After Boot     | After Hackbench|
| Normal          | 275840         | 412480         |
| With Patch      | 272768         | 406208         |
| Memory reduced  | ~1%            | ~2%            |
+-----------------+----------------+----------------+

hackbench-process-sockets
+-------+----+---------+---------+-----------+
|       | Grp| Normal  | Patched | Change    |
+-------+----+---------+---------+-----------+
| Amean | 1  | 0.9513  | 0.9250  | ( 2.77%)  |
| Amean | 4  | 2.9630  | 2.9570  | ( 0.20%)  |
| Amean | 7  | 5.1780  | 5.1763  | ( 0.03%)  |
| Amean | 12 | 8.8833  | 8.8817  | ( 0.02%)  |
| Amean | 21 | 15.7577 | 15.6883 | ( 0.44%)  |
| Amean | 30 | 22.2063 | 22.2843 | ( -0.35%) |
| Amean | 48 | 36.0587 | 36.1390 | ( -0.22%) |
| Amean | 64 | 49.7803 | 49.3457 | ( 0.87%)  |
+-------+----+---------+---------+-----------+

Signed-off-by: Jay Patel <jaypatel@linux.ibm.com>
---
Changes from V2
1) Removed all page order selection logic for slab caches based on
wastage.
2) Increased the fraction size based on page size (keeping the current
value as the default for 4K pages).

Changes from V1
1) If min_objects * object_size > PAGE_ALLOC_COSTLY_ORDER, return
PAGE_ALLOC_COSTLY_ORDER.
2) Similarly, if min_objects * object_size < PAGE_SIZE, return
slub_min_order.
3) Additionally, changed slub_max_order to 2. There is no specific
reason for choosing the value 2, but it gave the best performance
results without any noticeable impact.

 arch/powerpc/include/asm/page.h |  2 ++
 mm/slub.c                       | 15 +++++----------
 2 files changed, 7 insertions(+), 10 deletions(-)

Patch

diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
index f2b6bf5687d0..0dc53692d0e1 100644
--- a/arch/powerpc/include/asm/page.h
+++ b/arch/powerpc/include/asm/page.h
@@ -22,6 +22,8 @@ 
  */
 #define PAGE_SHIFT		CONFIG_PPC_PAGE_SHIFT
 #define PAGE_SIZE		(ASM_CONST(1) << PAGE_SHIFT)
+#define PAGE_SHIFT_4K		12
+#define PAGE_SIZE_4K		(1 << PAGE_SHIFT_4K)
 
 #ifndef __ASSEMBLY__
 #ifndef CONFIG_HUGETLB_PAGE
diff --git a/mm/slub.c b/mm/slub.c
index c87628cd8a9a..058bcc235b63 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4117,6 +4117,7 @@  static inline int calculate_order(unsigned int size)
 	unsigned int min_objects;
 	unsigned int max_objects;
 	unsigned int nr_cpus;
+	unsigned int page_frac;
 
 	/*
 	 * Attempt to find best configuration for a slab. This
@@ -4145,10 +4146,12 @@  static inline int calculate_order(unsigned int size)
 	max_objects = order_objects(slub_max_order, size);
 	min_objects = min(min_objects, max_objects);
 
-	while (min_objects > 1) {
+	page_frac = ((PAGE_SIZE/PAGE_SIZE_4K) == 1) ? 0 : PAGE_SIZE/PAGE_SIZE_4K;
+
+	while (min_objects >= 1) {
 		unsigned int fraction;
 
-		fraction = 16;
+		fraction = 16 + page_frac;
 		while (fraction >= 4) {
 			order = calc_slab_order(size, min_objects,
 					slub_max_order, fraction);
@@ -4159,14 +4162,6 @@  static inline int calculate_order(unsigned int size)
 		min_objects--;
 	}
 
-	/*
-	 * We were unable to place multiple objects in a slab. Now
-	 * lets see if we can place a single object there.
-	 */
-	order = calc_slab_order(size, 1, slub_max_order, 1);
-	if (order <= slub_max_order)
-		return order;
-
 	/*
 	 * Doh this slab cannot be placed using slub_max_order.
 	 */