Message ID | 20181108011204.9491-1-richard.weiyang@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm/slub: skip node in case there is no slab to acquire |
On Thu, 8 Nov 2018 09:12:04 +0800 Wei Yang <richard.weiyang@gmail.com> wrote:

> for_each_zone_zonelist() iterates the zonelist one by one, which means
> it will iterate on zones on the same node. While get_partial_node()
> checks available slab on node base instead of zone.
>
> This patch skip a node in case get_partial_node() fails to acquire slab
> on that node.

This is rather hard to follow.

I *think* the patch is a performance optimization: prevent
get_any_partial() from checking a node which get_partial_node() has
already looked at?

Could we please have a more complete changelog?

> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1873,7 +1873,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
>   * Get a page from somewhere. Search in increasing NUMA distances.
>   */
>  static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
> -		struct kmem_cache_cpu *c)
> +		struct kmem_cache_cpu *c, int except)
>  {
>  #ifdef CONFIG_NUMA
>  	struct zonelist *zonelist;
> @@ -1882,6 +1882,9 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
>  	enum zone_type high_zoneidx = gfp_zone(flags);
>  	void *object;
>  	unsigned int cpuset_mems_cookie;
> +	nodemask_t nmask = node_states[N_MEMORY];
> +
> +	node_clear(except, nmask);

And please add a comment describing what's happening here and why it is
done. Adding a sentence to the block comment over get_any_partial()
would be suitable.
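For reference, a minimal sketch of how such an addition to the block comment could read, reusing the except/nmask names from the patch; the wording itself is assumed here, it was not posted in the thread:

```c
/*
 * Get a page from somewhere. Search in increasing NUMA distances.
 *
 * The node @except has already been searched (and found empty) by
 * get_partial_node(), so it is masked out of the zonelist walk up
 * front; any further node that comes up empty is dropped from @nmask
 * as the walk proceeds.
 */
static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
			     struct kmem_cache_cpu *c, int except)
{
#ifdef CONFIG_NUMA
	/* All nodes with memory, minus the node already searched. */
	nodemask_t nmask = node_states[N_MEMORY];

	node_clear(except, nmask);
	/* ... zonelist walk as in the hunks quoted above ... */
#endif
	return NULL;
}
```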
On Fri, Nov 09, 2018 at 12:48:06PM -0800, Andrew Morton wrote:
>On Thu, 8 Nov 2018 09:12:04 +0800 Wei Yang <richard.weiyang@gmail.com> wrote:
>
>> for_each_zone_zonelist() iterates the zonelist one by one, which means
>> it will iterate on zones on the same node. While get_partial_node()
>> checks available slab on node base instead of zone.
>>
>> This patch skip a node in case get_partial_node() fails to acquire slab
>> on that node.
>
>This is rather hard to follow.
>
>I *think* the patch is a performance optimization: prevent
>get_any_partial() from checking a node which get_partial_node() has
>already looked at?

You are right :-)

>
>Could we please have a more complete changelog?

Hmm... I would like to. But I am not sure which part makes you hard to
follow. If you would like to tell me the pain point, I am glad to think
about how to make it more obvious.

>
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1873,7 +1873,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
>>   * Get a page from somewhere. Search in increasing NUMA distances.
>>   */
>>  static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
>> -		struct kmem_cache_cpu *c)
>> +		struct kmem_cache_cpu *c, int except)
>>  {
>>  #ifdef CONFIG_NUMA
>>  	struct zonelist *zonelist;
>> @@ -1882,6 +1882,9 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
>>  	enum zone_type high_zoneidx = gfp_zone(flags);
>>  	void *object;
>>  	unsigned int cpuset_mems_cookie;
>> +	nodemask_t nmask = node_states[N_MEMORY];
>> +
>> +	node_clear(except, nmask);
>
>And please add a comment describing what's happening here and why it is
>done. Adding a sentence to the block comment over get_any_partial()
>would be suitable.
>

Sure, I would address this in next spin.
On Thu 08-11-18 09:12:04, Wei Yang wrote:
> for_each_zone_zonelist() iterates the zonelist one by one, which means
> it will iterate on zones on the same node. While get_partial_node()
> checks available slab on node base instead of zone.
>
> This patch skip a node in case get_partial_node() fails to acquire slab
> on that node.

If this is an optimization then it should be accompanied by some
numbers.

> @@ -1882,6 +1882,9 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
>  	enum zone_type high_zoneidx = gfp_zone(flags);
>  	void *object;
>  	unsigned int cpuset_mems_cookie;
> +	nodemask_t nmask = node_states[N_MEMORY];

This will allocate a large bitmask on the stack and that is no-go for
something that might be called from a potentially deep call stack
already. Also are you sure that the micro-optimization offsets the
copying overhead?
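For context on the stack-size objection: nodemask_t is sized for every possible node, not just the nodes actually present. The figures below assume CONFIG_NODES_SHIFT=10 and 64-bit longs; they are illustrative, not taken from the thread.

```c
/* Simplified from include/linux/numa.h and include/linux/nodemask.h. */
#define NODES_SHIFT	10			/* example: CONFIG_NODES_SHIFT=10 */
#define MAX_NUMNODES	(1 << NODES_SHIFT)	/* 1024 possible node ids */

typedef struct {
	unsigned long bits[MAX_NUMNODES / 64];	/* DECLARE_BITMAP(): 16 longs */
} nodemask_t;					/* 128 bytes in this config */

/*
 * So "nodemask_t nmask = node_states[N_MEMORY];" grows the frame of
 * get_any_partial() by 128 bytes and copies the whole system mask on
 * every call, which is the objection raised above.
 */
```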
On Tue, Nov 13, 2018 at 02:17:51PM +0100, Michal Hocko wrote:
>On Thu 08-11-18 09:12:04, Wei Yang wrote:
>> for_each_zone_zonelist() iterates the zonelist one by one, which means
>> it will iterate on zones on the same node. While get_partial_node()
>> checks available slab on node base instead of zone.
>>
>> This patch skip a node in case get_partial_node() fails to acquire slab
>> on that node.
>
>If this is an optimization then it should be accompanied by some
>numbers.

Let me try to get some test result.

Do you have some suggestion on the test suite? Is kernel build a proper
test?

>
>> @@ -1882,6 +1882,9 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
>>  	enum zone_type high_zoneidx = gfp_zone(flags);
>>  	void *object;
>>  	unsigned int cpuset_mems_cookie;
>> +	nodemask_t nmask = node_states[N_MEMORY];
>
>This will allocate a large bitmask on the stack and that is no-go for
>something that might be called from a potentially deep call stack
>already. Also are you sure that the micro-optimization offsets the
>copying overhead?
>

You are right. I didn't pay attention to this.

I got one other idea to achieve this effect, like the one in
get_page_from_freelist().

In get_page_from_freelist(), we use last_pgdat_dirty_limit to track the
last node out of dirty limit. I am willing to borrow this idea in
get_any_partial() to skip a node.

Well, let me do some tests to see whether this is visible.

>--
>Michal Hocko
>SUSE Labs
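A rough sketch of what that alternative might look like inside the existing zonelist loop, patterned on the way get_page_from_freelist() remembers last_pgdat_dirty_limit; the last_failed_nid variable and its placement are hypothetical, nothing like this was actually posted in the thread:

```c
	/*
	 * Hypothetical: remember only the most recent node on which
	 * get_partial_node() found nothing, instead of copying a whole
	 * nodemask_t onto the stack.  Zones of one node are adjacent in
	 * a node-ordered zonelist, so this already skips the repeats.
	 */
	int last_failed_nid = except;

	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
		int nid = zone_to_nid(zone);
		struct kmem_cache_node *n;

		if (nid == last_failed_nid)
			continue;		/* just searched this node */

		n = get_node(s, nid);
		if (n && cpuset_zone_allowed(zone, flags) &&
				n->nr_partial > s->min_partial) {
			object = get_partial_node(s, n, c, flags);
			if (object)
				return object;
			last_failed_nid = nid;	/* searched, nothing usable */
		}
	}
```

This keeps the extra stack cost to a single int, at the price of only catching back-to-back zones of the same node rather than every revisit.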
On Tue 13-11-18 13:26:24, Wei Yang wrote:
> On Tue, Nov 13, 2018 at 02:17:51PM +0100, Michal Hocko wrote:
> >On Thu 08-11-18 09:12:04, Wei Yang wrote:
> >> for_each_zone_zonelist() iterates the zonelist one by one, which means
> >> it will iterate on zones on the same node. While get_partial_node()
> >> checks available slab on node base instead of zone.
> >>
> >> This patch skip a node in case get_partial_node() fails to acquire slab
> >> on that node.
> >
> >If this is an optimization then it should be accompanied by some
> >numbers.
>
> Let me try to get some test result.
>
> Do you have some suggestion on the test suite? Is kernel build a proper
> test?

Make sure that the workload is hitting hard on this particular code
path that it matters. I am not aware of any such workload but others
might know better. In any case, if you are up to optimize something
then you should evaluate what kind of workload might benefit from it.
If there is no workload then it is likely not worth bothering. Some
changes might look like obvious improvements but then they might add a
maintenance burden or they might be even wrong for other reasons.
Recent patches you have posted show both issues. I would encourage you
to look at a practical issues instead. Throwing random patches by
reading the code without having a larger picture is usually not the
best way to go.

[...]

> In get_page_from_freelist(), we use last_pgdat_dirty_limit to track the
> last node out of dirty limit. I am willing to borrow this idea in
> get_any_partial() to skip a node.
>
> Well, let me do some tests to see whether this is visible.

See the above. Each and every change has its cost and patches make
sense only when both the future maintenance cost and the review cost
are payed off.
```diff
diff --git a/mm/slub.c b/mm/slub.c
index e3629cd7aff1..97a480b5dfb9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1873,7 +1873,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
  * Get a page from somewhere. Search in increasing NUMA distances.
  */
 static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
-		struct kmem_cache_cpu *c)
+		struct kmem_cache_cpu *c, int except)
 {
 #ifdef CONFIG_NUMA
 	struct zonelist *zonelist;
@@ -1882,6 +1882,9 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
 	enum zone_type high_zoneidx = gfp_zone(flags);
 	void *object;
 	unsigned int cpuset_mems_cookie;
+	nodemask_t nmask = node_states[N_MEMORY];
+
+	node_clear(except, nmask);
 
 	/*
 	 * The defrag ratio allows a configuration of the tradeoffs between
@@ -1908,7 +1911,8 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
 	do {
 		cpuset_mems_cookie = read_mems_allowed_begin();
 		zonelist = node_zonelist(mempolicy_slab_node(), flags);
-		for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
+		for_each_zone_zonelist_nodemask(zone, z, zonelist,
+						high_zoneidx, &nmask) {
 			struct kmem_cache_node *n;
 
 			n = get_node(s, zone_to_nid(zone));
@@ -1926,6 +1930,7 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
 					 */
 					return object;
 				}
+				node_clear(zone_to_nid(zone), nmask);
 			}
 		}
 	} while (read_mems_allowed_retry(cpuset_mems_cookie));
@@ -1951,7 +1956,7 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 	if (object || node != NUMA_NO_NODE)
 		return object;
 
-	return get_any_partial(s, flags, c);
+	return get_any_partial(s, flags, c, searchnode);
 }
 
 #ifdef CONFIG_PREEMPT
```
for_each_zone_zonelist() iterates the zonelist one by one, which means
it will iterate on zones on the same node. While get_partial_node()
checks available slab on node base instead of zone.

This patch skip a node in case get_partial_node() fails to acquire slab
on that node.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/slub.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
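To make the first paragraph of the changelog concrete, a sketch of the duplication being avoided; the two-node zone layout is an assumed example, the real layout depends on the machine:

```c
/*
 * Assumed two-node box.  Node 0's fallback zonelist might be:
 *
 *	node0/Normal, node0/DMA32, node0/DMA, node1/Normal
 *
 * Without the patch, the walk below visits the three node-0 zones in
 * a row, and each visit resolves to the same kmem_cache_node, so a
 * get_partial_node() that just failed on node 0 is retried anyway.
 */
for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
	struct kmem_cache_node *n = get_node(s, zone_to_nid(zone));

	/* n is the same object for every zone of a given node */
}
```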