[RFC,v5,13/16] slub: Enable balancing slabs across nodes

Message ID	20190520054017.32299-14-tobin@kernel.org (mailing list archive)
State	New, archived
Series	Slab Movable Objects (SMO)
	[RFC,v5,00/16] Slab Movable Objects (SMO)
	[RFC,v5,01/16] slub: Add isolate() and migrate() methods
	[RFC,v5,02/16] tools/vm/slabinfo: Add support for -C and -M options
	[RFC,v5,03/16] slub: Sort slab cache list
	[RFC,v5,04/16] slub: Slab defrag core
	[RFC,v5,05/16] tools/vm/slabinfo: Add remote node defrag ratio output
	[RFC,v5,06/16] tools/vm/slabinfo: Add defrag_used_ratio output
	[RFC,v5,07/16] tools/testing/slab: Add object migration test module
	[RFC,v5,08/16] tools/testing/slab: Add object migration test suite
	[RFC,v5,09/16] lib: Separate radix_tree_node and xa_node slab cache
	[RFC,v5,10/16] xarray: Implement migration function for xa_node objects
	[RFC,v5,11/16] tools/testing/slab: Add XArray movable objects tests
	[RFC,v5,12/16] slub: Enable moving objects to/from specific nodes
	[RFC,v5,13/16] slub: Enable balancing slabs across nodes
	[RFC,v5,14/16] dcache: Provide a dentry constructor
	[RFC,v5,15/16] dcache: Implement partial shrink via Slab Movable Objects
	[RFC,v5,16/16] dcache: Add CONFIG_DCACHE_SMO


Headers

Received-SPF: softfail (google.com: domain of transitioning tobin@kernel.org
 does not designate 66.111.4.230 as permitted sender) client-ip=66.111.4.230;
From: "Tobin C. Harding" <tobin@kernel.org>
To: Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>
Cc: "Tobin C. Harding" <tobin@kernel.org>,
	Roman Gushchin <guro@fb.com>,
	Alexander Viro <viro@ftp.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	David Rientjes <rientjes@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Christopher Lameter <cl@linux.com>,
	Miklos Szeredi <mszeredi@redhat.com>,
	Andreas Dilger <adilger@dilger.ca>,
	Waiman Long <longman@redhat.com>,
	Tycho Andersen <tycho@tycho.ws>,
	Theodore Ts'o <tytso@mit.edu>,
	Andi Kleen <ak@linux.intel.com>,
	David Chinner <david@fromorbit.com>,
	Nick Piggin <npiggin@gmail.com>,
	Rik van Riel <riel@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [RFC PATCH v5 13/16] slub: Enable balancing slabs across nodes
Date: Mon, 20 May 2019 15:40:14 +1000
Message-Id: <20190520054017.32299-14-tobin@kernel.org>
In-Reply-To: <20190520054017.32299-1-tobin@kernel.org>
References: <20190520054017.32299-1-tobin@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk


Commit Message

Tobin C. Harding May 20, 2019, 5:40 a.m. UTC

We have just implemented Slab Movable Objects (SMO).  On NUMA systems
slabs can become unbalanced, i.e. many slabs on one node while other
nodes have few slabs.  Using SMO we can balance the slabs across all
the nodes.

The algorithm used is as follows:

 1. Move all objects to node 0 (this has the effect of defragmenting the
    cache).

 2. Calculate the desired number of slabs for each node (this is done
    using the approximation nr_slabs / nr_nodes).

 3. Loop over the remaining nodes, moving the desired number of slabs
    from node 0 to each one.
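The three steps can be modelled in userspace to see where the slabs end up. This is an illustrative sketch, not kernel code: `balance_model()` is a hypothetical helper, each node is reduced to a plain slab count, nodes are assumed to be numbered contiguously from 0 (the patch instead iterates over the N_NORMAL_MEMORY nodes), and step 1 is assumed not to change the total number of slabs, although in the kernel the defragmentation it performs can reduce it.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the three-step balancing algorithm.
 * slabs[nid] is the number of slabs held by node nid.  Objects are not
 * modelled: consolidating onto node 0 is assumed to keep the total
 * slab count unchanged.
 */
static void balance_model(long slabs[], int nr_nodes)
{
	long nr_slabs, desired;
	int nid;

	/* Step 1: move every slab to node 0. */
	for (nid = 1; nid < nr_nodes; nid++) {
		slabs[0] += slabs[nid];
		slabs[nid] = 0;
	}
	nr_slabs = slabs[0];

	/* Step 2: nr_slabs / nr_nodes; the remainder stays on node 0. */
	desired = nr_slabs / nr_nodes;

	/* Step 3: give every other node its share, taken from node 0. */
	for (nid = 1; nid < nr_nodes; nid++) {
		slabs[0] -= desired;
		slabs[nid] += desired;
	}

	/* Sanity check: no slabs were created or lost by the model. */
	for (nid = 0; nid < nr_nodes; nid++)
		nr_slabs -= slabs[nid];
	assert(nr_slabs == 0);
}
```

For example, starting from {2, 30, 5, 0} on four nodes gives 37 slabs and a desired count of 37 / 4 = 9, so nodes 1 to 3 end with 9 slabs each and node 0 keeps the remainder, 10.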

The feature is conditionally built in with CONFIG_SMO_NODE because we
need the full list (we enable SLUB_DEBUG to get this).  A future
version may separate the full list out of SLUB_DEBUG.

Expose this functionality to userspace via a new sysfs entry:

       /sys/kernel/slab/<cache>/balance

Writing '1' to this file triggers balancing; any write whose first
character is not '1' is rejected with -EINVAL.
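A userspace program could trigger balancing by writing to that file. The sketch below is an illustration, not part of the patch: `smo_trigger_balance()` is a hypothetical helper, and it assumes a kernel built with CONFIG_SMO_NODE and this series applied, a cache that has had mobility set up, and sufficient privileges (writing to sysfs normally requires root).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: trigger node balancing for one slab cache by
 * writing "1" to its balance file, e.g. "/sys/kernel/slab/<cache>/balance".
 * Returns 0 on success or a negative errno value (the kernel side
 * returns -EINVAL for anything not starting with '1').
 */
static int smo_trigger_balance(const char *path)
{
	ssize_t ret;
	int fd, err;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* balance_store() as posted only inspects the first byte. */
	ret = write(fd, "1", 1);
	err = errno;	/* save before close() can overwrite it */
	close(fd);

	if (ret < 0)
		return -err;
	return ret == 1 ? 0 : -EIO;
}
```

From a shell, `echo 1 > /sys/kernel/slab/<cache>/balance` should have the same effect: the trailing newline that echo appends is ignored because only the first byte is checked. For the same reason, a value such as "10" would also be accepted by the store handler as posted.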

This feature relies on SMO being enabled for the cache.  This is
done, after the isolate/migrate functions have been defined, with a call to:

	kmem_cache_setup_mobility(s, isolate, migrate)

Signed-off-by: Tobin C. Harding <tobin@kernel.org>
---
 mm/slub.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 120 insertions(+)

Comments

Roman Gushchin May 21, 2019, 1:04 a.m. UTC | #1

On Mon, May 20, 2019 at 03:40:14PM +1000, Tobin C. Harding wrote:
> We have just implemented Slab Movable Objects (SMO).  On NUMA systems
> slabs can become unbalanced i.e. many slabs on one node while other
> nodes have few slabs.  Using SMO we can balance the slabs across all
> the nodes.
> 
> The algorithm used is as follows:
> 
>  1. Move all objects to node 0 (this has the effect of defragmenting the
>     cache).

This already sounds dangerous (or costly). Can't it be done without
cross-node data moves?

> 
>  2. Calculate the desired number of slabs for each node (this is done
>     using the approximation nr_slabs / nr_nodes).

So that on this step only (actual data size - desired data size) has
to be moved?

Thanks!

Tobin Harding May 21, 2019, 1:44 a.m. UTC | #2

On Tue, May 21, 2019 at 01:04:10AM +0000, Roman Gushchin wrote:
> On Mon, May 20, 2019 at 03:40:14PM +1000, Tobin C. Harding wrote:
> > We have just implemented Slab Movable Objects (SMO).  On NUMA systems
> > slabs can become unbalanced i.e. many slabs on one node while other
> > nodes have few slabs.  Using SMO we can balance the slabs across all
> > the nodes.
> > 
> > The algorithm used is as follows:
> > 
> >  1. Move all objects to node 0 (this has the effect of defragmenting the
> >     cache).
> 
> This already sounds dangerous (or costly). Can't it be done without
> cross-node data moves?
>
> > 
> >  2. Calculate the desired number of slabs for each node (this is done
> >     using the approximation nr_slabs / nr_nodes).
> 
> So that on this step only (actual data size - desired data size) has
> to be moved?

This is just the most braindead algorithm I could come up with.  Surely
there are a bunch of things that could be improved.  Since I don't know
the exact use case it seemed best not to optimize for any one use case.

I'll review, comment on, and test any algorithm you come up with!

thanks,
Tobin.
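The alternative Roman raises, computing each node's share first and moving only the difference between a node's current and desired slab count, can be modelled in userspace as follows. This is a sketch of that idea, not code from the thread: `balance_surplus_model()` and `target_of()` are hypothetical helpers, slab counts stand in for real slabs, nodes are numbered 0..nr_nodes-1, and, as in the posted patch, the remainder of nr_slabs / nr_nodes is left on node 0.

```c
#include <assert.h>

/* Desired slab count for a node; node 0 also keeps the remainder. */
static long target_of(int nid, long share, long rem)
{
	return nid == 0 ? share + rem : share;
}

/*
 * Hypothetical model: move only surplus slabs from nodes above their
 * target directly to nodes below it, with no trip through node 0.
 * Returns the number of slab moves performed.
 */
static long balance_surplus_model(long slabs[], int nr_nodes)
{
	long total = 0, share, rem, moves = 0;
	int src = 0, dst = 0, nid;

	for (nid = 0; nid < nr_nodes; nid++)
		total += slabs[nid];
	share = total / nr_nodes;
	rem = total % nr_nodes;

	for (;;) {
		/* Next node holding more than its target. */
		while (src < nr_nodes &&
		       slabs[src] <= target_of(src, share, rem))
			src++;
		/* Next node holding fewer than its target. */
		while (dst < nr_nodes &&
		       slabs[dst] >= target_of(dst, share, rem))
			dst++;
		/* Surpluses and deficits sum to the same amount. */
		if (src == nr_nodes || dst == nr_nodes)
			break;

		long surplus = slabs[src] - target_of(src, share, rem);
		long deficit = target_of(dst, share, rem) - slabs[dst];
		long n = surplus < deficit ? surplus : deficit;

		slabs[src] -= n;
		slabs[dst] += n;
		moves += n;
	}
	return moves;
}
```

For a starting distribution of {2, 30, 5, 0} the model reaches the same final layout as the posted algorithm, {10, 9, 9, 9}, with 21 slab moves, whereas moving everything to node 0 first and then redistributing moves 35 + 27 = 62 slabs. The trade-off is that it gives up the defragmentation side effect of step 1.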

diff --git a/mm/slub.c b/mm/slub.c
index 9582f2fc97d2..25b6d1e408e3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4574,6 +4574,109 @@  static unsigned long kmem_cache_move_to_node(struct kmem_cache *s, int node)
 
 	return left;
 }
+
+/*
+ * kmem_cache_move_slabs() - Attempt to move @num slabs to @target_node.
+ * @s: The cache we are working on.
+ * @node: The node to move objects from.
+ * @target_node: The node to move objects to.
+ * @num: The number of slabs to move.
+ *
+ * Attempts to move @num slabs from @node to @target_node.  This is done
+ * by migrating objects from slabs on the full_list.
+ *
+ * Return: The number of slabs moved or error code.
+ */
+static long kmem_cache_move_slabs(struct kmem_cache *s,
+				  int node, int target_node, long num)
+{
+	struct kmem_cache_node *n = get_node(s, node);
+	LIST_HEAD(move_list);
+	struct page *page, *page2;
+	unsigned long flags;
+	void **scratch;
+	long done = 0;
+
+	if (node == target_node)
+		return -EINVAL;
+
+	scratch = alloc_scratch(s);
+	if (!scratch)
+		return -ENOMEM;
+
+	spin_lock_irqsave(&n->list_lock, flags);
+	list_for_each_entry_safe(page, page2, &n->full, lru) {
+		if (!slab_trylock(page))
+			/* Busy slab. Get out of the way */
+			continue;
+
+		list_move(&page->lru, &move_list);
+		page->frozen = 1;
+		slab_unlock(page);
+
+		if (++done >= num)
+			break;
+	}
+	spin_unlock_irqrestore(&n->list_lock, flags);
+
+	list_for_each_entry(page, &move_list, lru) {
+		if (page->inuse)
+			move_slab_page(page, scratch, target_node);
+	}
+	kfree(scratch);
+
+	/* Inspect results and dispose of pages */
+	spin_lock_irqsave(&n->list_lock, flags);
+	list_for_each_entry_safe(page, page2, &move_list, lru) {
+		list_del(&page->lru);
+		slab_lock(page);
+		page->frozen = 0;
+
+		if (page->inuse) {
+			/*
+			 * This is best effort only, if slab still has
+			 * objects just put it back on the partial list.
+			 */
+			n->nr_partial++;
+			list_add_tail(&page->lru, &n->partial);
+			slab_unlock(page);
+		} else {
+			slab_unlock(page);
+			discard_slab(s, page);
+		}
+	}
+	spin_unlock_irqrestore(&n->list_lock, flags);
+
+	return done;
+}
+
+/*
+ * kmem_cache_balance_nodes() - Balance slabs across nodes.
+ * @s: The cache we are working on.
+ */
+static void kmem_cache_balance_nodes(struct kmem_cache *s)
+{
+	struct kmem_cache_node *n = get_node(s, 0);
+	unsigned long desired_nr_slabs_per_node;
+	unsigned long nr_slabs;
+	int nr_nodes = 0;
+	int nid;
+
+	(void)kmem_cache_move_to_node(s, 0);
+
+	for_each_node_state(nid, N_NORMAL_MEMORY)
+		nr_nodes++;
+
+	nr_slabs = atomic_long_read(&n->nr_slabs);
+	desired_nr_slabs_per_node = nr_slabs / nr_nodes;
+
+	for_each_node_state(nid, N_NORMAL_MEMORY) {
+		if (nid == 0)
+			continue;
+
+		kmem_cache_move_slabs(s, 0, nid, desired_nr_slabs_per_node);
+	}
+}
 #endif
 
 /**
@@ -5838,6 +5941,22 @@  static ssize_t move_store(struct kmem_cache *s, const char *buf, size_t length)
 	return length;
 }
 SLAB_ATTR(move);
+
+static ssize_t balance_show(struct kmem_cache *s, char *buf)
+{
+	return 0;
+}
+
+static ssize_t balance_store(struct kmem_cache *s,
+			     const char *buf, size_t length)
+{
+	if (buf[0] == '1')
+		kmem_cache_balance_nodes(s);
+	else
+		return -EINVAL;
+	return length;
+}
+SLAB_ATTR(balance);
 #endif	/* CONFIG_SMO_NODE */
 
 #ifdef CONFIG_NUMA
@@ -5966,6 +6085,7 @@  static struct attribute *slab_attrs[] = {
 	&shrink_attr.attr,
 #ifdef CONFIG_SMO_NODE
 	&move_attr.attr,
+	&balance_attr.attr,
 #endif
 	&slabs_cpu_partial_attr.attr,
 #ifdef CONFIG_SLUB_DEBUG
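One detail in kmem_cache_move_slabs() as posted: `done` is incremented and compared with `num` only after a slab has already been moved onto `move_list`, so a call with `num == 0` still isolates one slab. kmem_cache_balance_nodes() makes such calls whenever nr_slabs < nr_nodes, because desired_nr_slabs_per_node then rounds down to 0. The userspace model below checks the limit before taking a slab instead. It is a sketch under stated assumptions: `isolate_full_slabs()` is a hypothetical helper, and the `busy[]` array stands in for slab_trylock() failing on a slab of the full list. In the kernel, the equivalent change might be an early `if (num <= 0) return 0;` or testing `done >= num` at the top of the loop.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the isolation loop in kmem_cache_move_slabs().
 * busy[i] means slab_trylock() would fail on the i-th full slab;
 * isolated[i] is set for slabs taken onto the private move list.
 * The limit is checked before a slab is taken, so num <= 0 isolates
 * nothing.  Returns the number of slabs isolated.
 */
static long isolate_full_slabs(const bool busy[], bool isolated[],
			       int nr_full, long num)
{
	long done = 0;
	int i;

	for (i = 0; i < nr_full; i++)
		isolated[i] = false;

	for (i = 0; i < nr_full && done < num; i++) {
		if (busy[i])
			continue;	/* busy slab: get out of the way */
		isolated[i] = true;
		done++;
	}
	return done;
}
```

Note that, like the posted function, this counts slabs isolated for moving, not slabs that were fully emptied; a slab whose objects could not all be migrated would still end up back on the partial list.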
