
[RFC,09/26] mm, slub: move disabling/enabling irqs to ___slab_alloc()

Message ID: 20210524233946.20352-10-vbabka@suse.cz (mailing list archive)
State: New, archived
Series: SLUB: use local_lock for kmem_cache_cpu protection and reduce disabling irqs

Commit Message

Vlastimil Babka May 24, 2021, 11:39 p.m. UTC
Currently __slab_alloc() disables irqs around the whole of ___slab_alloc().  This
includes cases where that is not needed, such as when the allocation ends up in
the page allocator and has to awkwardly re-enable irqs based on the gfp flags.
Also, the whole of kmem_cache_alloc_bulk() is executed with irqs disabled even
when it hits the __slab_alloc() slow path, and long periods with interrupts
disabled are undesirable.

As a first step towards reducing the irq-disabled periods, move the irq handling
into ___slab_alloc(). Callers will instead prevent the s->cpu_slab percpu pointer
from becoming invalid via migrate_disable(). This does not protect against
concurrent access from a preempting context; for most of ___slab_alloc() that
protection is still provided by the disabled irqs. As a small immediate benefit,
the slab_out_of_memory() call from ___slab_alloc() is now done with irqs enabled.

kmem_cache_alloc_bulk() disables irqs for its fastpath and then re-enables them
before calling ___slab_alloc(), which then disables them at its discretion. The
whole kmem_cache_alloc_bulk() operation also disables cpu migration.

When ___slab_alloc() calls new_slab() to allocate a new page, re-enable migration
around the call, because new_slab() will re-enable interrupts in contexts that
allow blocking.

The patch itself will thus increase overhead a bit due to disabled migration
and increased disabling/enabling irqs in kmem_cache_alloc_bulk(), but that will
be gradually improved in the following patches.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)
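
For orientation before the discussion and the hunks below, the flow this patch
arrives at can be sketched as follows. This is a condensed sketch distilled from
the diff at the bottom of the page, not the verbatim patched mm/slub.c; the
freelist refill, partial-slab and debug paths are elided.

/* Condensed sketch of the post-patch flow -- not verbatim code. */

static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
			   unsigned long addr, struct kmem_cache_cpu *c)
{
	void *freelist = NULL;
	struct page *page;
	unsigned long flags;

	/* irqs are now disabled here rather than by the callers */
	local_irq_save(flags);

	/* ... try to refill from c->page / c->freelist with irqs disabled ... */

	/* allow migration again while calling into the page allocator */
	migrate_enable();
	page = new_slab(s, gfpflags, node);  /* may re-enable irqs if blocking is allowed */
	migrate_disable();
	c = this_cpu_ptr(s->cpu_slab);       /* we may be running on a different CPU now */

	if (!page) {
		local_irq_restore(flags);    /* OOM report now runs with irqs enabled */
		slab_out_of_memory(s, gfpflags, node);
		return NULL;
	}

	/* ... install the new page and take an object from it ... */
	local_irq_restore(flags);
	return freelist;
}

static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
			  unsigned long addr, struct kmem_cache_cpu *c)
{
	void *p;

	migrate_disable();                   /* keep s->cpu_slab from becoming invalid */
	c = this_cpu_ptr(s->cpu_slab);       /* reload (under CONFIG_PREEMPTION in the patch) */
	p = ___slab_alloc(s, gfpflags, node, addr, c);
	migrate_enable();
	return p;
}

kmem_cache_alloc_bulk() follows the same pattern: the whole loop runs under
migrate_disable()/migrate_enable(), irqs are disabled only around the fastpath
pieces, and they are re-enabled across the ___slab_alloc() call.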

Comments

Mel Gorman May 25, 2021, 12:35 p.m. UTC | #1
On Tue, May 25, 2021 at 01:39:29AM +0200, Vlastimil Babka wrote:
> [ commit message quoted in full -- trimmed ]

Why did you use migrate_disable instead of preempt_disable? There is a
fairly large comment in include/linux/preempt.h on why migrate_disable
is undesirable so new users are likely to be put under the microscope
once Thomas or Peter notice it.

I think you are using it so that an allocation request can be preempted by
a higher priority task but given that the code was disabling interrupts,
there was already some preemption latency. However, migrate_disable
is more expensive than preempt_disable (function call versus a simple
increment). On that basis, I'd recommend starting with preempt_disable
and only using migrate_disable if necessary.

Bonus points for adding a comment where ___slab_alloc disables IRQs to
clarify what is protected -- I assume it's protecting kmem_cache_cpu
from being modified from interrupt context. If so, it's potentially a
local_lock candidate.
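
To make the cost difference Mel refers to concrete, here is a rough comparison
of the two primitives. This is illustrative only; the real definitions live in
include/linux/preempt.h and kernel/sched/core.c and depend on the config, and
the names below (demo_counter, compare_primitives) are made up for the example.

#include <linux/percpu.h>
#include <linux/preempt.h>

static DEFINE_PER_CPU(int, demo_counter);

/* Illustrative sketch -- not the kernel's actual definitions. */
static void compare_primitives(void)
{
	/*
	 * preempt_disable(): an inlined preempt-count increment plus a
	 * compiler barrier.  No other task can run on this CPU until
	 * preempt_enable(), so per-CPU data is exclusive, but the section
	 * must not sleep.
	 */
	preempt_disable();
	this_cpu_inc(demo_counter);
	preempt_enable();

	/*
	 * migrate_disable(): on SMP kernels an out-of-line function call
	 * that pins the task to its current CPU.  The task can still be
	 * preempted (and may sleep), so per-CPU pointers stay valid but
	 * the data behind them still needs irq disabling or a lock.
	 */
	migrate_disable();
	this_cpu_inc(demo_counter);  /* pointer stable, access not exclusive */
	migrate_enable();
}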
Vlastimil Babka May 25, 2021, 12:47 p.m. UTC | #2
On 5/25/21 2:35 PM, Mel Gorman wrote:
> On Tue, May 25, 2021 at 01:39:29AM +0200, Vlastimil Babka wrote:
>> [ commit message quoted in full -- trimmed ]
> 
> Why did you use migrate_disable instead of preempt_disable? There is a
> fairly large comment in include/linux/preempt.h on why migrate_disable
> is undesirable so new users are likely to be put under the microscope
> once Thomas or Peter notice it.

I understood it as while undesirable, there's nothing better for now.

> I think you are using it so that an allocation request can be preempted by
> a higher priority task but given that the code was disabling interrupts,
> there was already some preemption latency.

Yes, and the disabled interrupts will get progressively "smaller" in the series.

> However, migrate_disable
> is more expensive than preempt_disable (function call versus a simple
> increment).

That's true, I think perhaps it could be reimplemented so that on !PREEMPT_RT
and with no lockdep/preempt/whatnot debugging it could just translate to an
inline migrate_disable?

> On that basis, I'd recommend starting with preempt_disable
> and only using migrate_disable if necessary.

That's certainly possible and you're right it would be a less disruptive step.
My thinking was that on !PREEMPT_RT it's actually just preempt_disable (however
with the call overhead currently), but PREEMPT_RT would welcome the lack of
preempt disable. I'd be interested to hear RT guys opinion here.

> Bonus points for adding a comment where ___slab_alloc disables IRQs to
> clarify what is protected -- I assume it's protecting kmem_cache_cpu
> from being modified from interrupt context. If so, it's potentially a
> local_lock candidate.

Yeah that gets cleared up later :)
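
For what it's worth, the series title already hints at where this ends up: later
patches replace the bare irq disabling with a local_lock embedded in the per-CPU
structure. A minimal, hypothetical sketch of that shape follows; it is not code
from this patch and the field layout is illustrative.

#include <linux/local_lock.h>

/* Hypothetical: a local_lock_t placed inside the per-CPU structure. */
struct kmem_cache_cpu {
	local_lock_t lock;       /* protects the fields below */
	void **freelist;
	unsigned long tid;
	struct page *page;
};

static void protected_cpu_slab_access(struct kmem_cache *s)
{
	unsigned long flags;

	/*
	 * On !PREEMPT_RT this maps to local_irq_save() plus lockdep
	 * annotations; on PREEMPT_RT it is a per-CPU spinlock that leaves
	 * irqs enabled, which is exactly why plain irq disabling has to go.
	 */
	local_lock_irqsave(&s->cpu_slab->lock, flags);
	/* ... manipulate this_cpu_ptr(s->cpu_slab) safely ... */
	local_unlock_irqrestore(&s->cpu_slab->lock, flags);
}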
Mel Gorman May 25, 2021, 3:10 p.m. UTC | #3
On Tue, May 25, 2021 at 02:47:10PM +0200, Vlastimil Babka wrote:
> On 5/25/21 2:35 PM, Mel Gorman wrote:
> > On Tue, May 25, 2021 at 01:39:29AM +0200, Vlastimil Babka wrote:
> >> [ commit message quoted in full -- trimmed ]
> > 
> > Why did you use migrate_disable instead of preempt_disable? There is a
> > fairly large comment in include/linux/preempt.h on why migrate_disable
> > is undesirable so new users are likely to be put under the microscope
> > once Thomas or Peter notice it.
> 
> I understood it as while undesirable, there's nothing better for now.
> 

I think the "better" option is to reduce preempt_disable sections as
much as possible but you probably have limited options there. It might
be easier to justify if the sections you were protecting need to go to
sleep like what mm/highmem.c needs but that does not appear to be the case.

> > I think you are using it so that an allocation request can be preempted by
> > a higher priority task but given that the code was disabling interrupts,
> > there was already some preemption latency.
> 
> Yes, and the disabled interrupts will get progressively "smaller" in the series.
> 
> > However, migrate_disable
> > is more expensive than preempt_disable (function call versus a simple
> > increment).
> 
> That's true, I think perhaps it could be reimplemented so that on !PREEMPT_RT
> and with no lockdep/preempt/whatnot debugging it could just translate to an
> inline migrate_disable?
> 

It might be a bit too large for that.

> > On that basis, I'd recommend starting with preempt_disable
> > and only using migrate_disable if necessary.
> 
> That's certainly possible and you're right it would be a less disruptive step.
> My thinking was that on !PREEMPT_RT it's actually just preempt_disable (however
> with the call overhead currently), but PREEMPT_RT would welcome the lack of
> preempt disable. I'd be interested to hear RT guys opinion here.
> 

It does more than preempt_disable even on !PREEMPT_RT. It's only on !SMP
that it becomes inline. While it might allow a higher priority task to
preempt, PREEMPT_RT is also not the common case and I think it's better
to use the lighter-weight option for the majority of configurations.

> > Bonus points for adding a comment where ___slab_alloc disables IRQs to
> > clarify what is protected -- I assume it's protecting kmem_cache_cpu
> > from being modified from interrupt context. If so, it's potentially a
> > local_lock candidate.
> 
> Yeah that gets cleared up later :)
> 

I saw that after glancing through the rest of the series. While I didn't
spot anything major, I'd also like to hear from Peter or Thomas on whether
migrate_disable or preempt_disable would be preferred for mm/slub.c. The
preempt-rt tree does not help answer the question given that the slub
changes there are mostly about deferring some work until IRQs are enabled.
Vlastimil Babka May 25, 2021, 5:24 p.m. UTC | #4
On 5/25/21 2:47 PM, Vlastimil Babka wrote:
> On 5/25/21 2:35 PM, Mel Gorman wrote:
>> 
>> Why did you use migrate_disable instead of preempt_disable? There is a
>> fairly large comment in include/linux/preempt.h on why migrate_disable
>> is undesirable so new users are likely to be put under the microscope
>> once Thomas or Peter notice it.
> 
> I understood it as while undesirable, there's nothing better for now.

Ah, I now recalled the more important reason. By my understanding of
Documentation/locking/locktypes.rst, it's not possible on PREEMPT_RT to do a
preempt_disable() and then take a spin_lock (or local_lock), which on RT is a
mutex and needs preemption enabled to be taken. And one of the goals is that
list_lock would no longer have to be a raw_spinlock on RT.
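
In code, the constraint being described is roughly this (an illustrative sketch,
not part of the patch; the lock name is only an example):

#include <linux/spinlock.h>
#include <linux/preempt.h>

static DEFINE_SPINLOCK(list_lock);  /* spinlock_t: an rtmutex-based sleeping lock on PREEMPT_RT */

static void invalid_on_preempt_rt(void)
{
	preempt_disable();
	/*
	 * On PREEMPT_RT spin_lock() may sleep, and sleeping with preemption
	 * disabled is not allowed, so this ordering cannot work there.
	 * migrate_disable() keeps per-CPU pointers stable while still
	 * permitting the lock to sleep, which is why it was chosen.
	 */
	spin_lock(&list_lock);
	/* ... */
	spin_unlock(&list_lock);
	preempt_enable();
}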

>> I think you are using it so that an allocation request can be preempted by
>> a higher priority task but given that the code was disabling interrupts,
>> there was already some preemption latency.
> 
> Yes, and the disabled interrupts will get progressively "smaller" in the series.
> 
>> However, migrate_disable
>> is more expensive than preempt_disable (function call versus a simple
>> increment).
> 
> That's true, I think perhaps it could be reimplemented so that on !PREEMPT_RT
> and with no lockdep/preempt/whatnot debugging it could just translate to an
> inline migrate_disable?

Correction: I meant "translate to an inline preempt_disable" which would then
not change anything for !PREEMPT_RT.

Patch

diff --git a/mm/slub.c b/mm/slub.c
index 06f30c9ad361..c5f4f9282496 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2631,7 +2631,7 @@  static inline void *get_freelist(struct kmem_cache *s, struct page *page)
  * we need to allocate a new slab. This is the slowest path since it involves
  * a call to the page allocator and the setup of a new slab.
  *
- * Version of __slab_alloc to use when we know that interrupts are
+ * Version of __slab_alloc to use when we know that preemption is
  * already disabled (which is the case for bulk allocation).
  */
 static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
@@ -2639,9 +2639,11 @@  static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 {
 	void *freelist;
 	struct page *page;
+	unsigned long flags;
 
 	stat(s, ALLOC_SLOWPATH);
 
+	local_irq_save(flags);
 	page = c->page;
 	if (!page) {
 		/*
@@ -2704,6 +2706,7 @@  static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	VM_BUG_ON(!c->page->frozen);
 	c->freelist = get_freepointer(s, freelist);
 	c->tid = next_tid(c->tid);
+	local_irq_restore(flags);
 	return freelist;
 
 new_slab:
@@ -2721,14 +2724,17 @@  static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		goto check_new_page;
 	}
 
+	migrate_enable();
 	page = new_slab(s, gfpflags, node);
+	migrate_disable();
+	c = this_cpu_ptr(s->cpu_slab);
 
 	if (unlikely(!page)) {
+		local_irq_restore(flags);
 		slab_out_of_memory(s, gfpflags, node);
 		return NULL;
 	}
 
-	c = raw_cpu_ptr(s->cpu_slab);
 	if (c->page)
 		flush_slab(s, c);
 
@@ -2768,6 +2774,7 @@  static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 return_single:
 
 	deactivate_slab(s, page, get_freepointer(s, freelist), c);
+	local_irq_restore(flags);
 	return freelist;
 }
 
@@ -2779,20 +2786,19 @@  static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 			  unsigned long addr, struct kmem_cache_cpu *c)
 {
 	void *p;
-	unsigned long flags;
 
-	local_irq_save(flags);
+	migrate_disable();
 #ifdef CONFIG_PREEMPTION
 	/*
 	 * We may have been preempted and rescheduled on a different
-	 * cpu before disabling interrupts. Need to reload cpu area
+	 * cpu before disabling preemption. Need to reload cpu area
 	 * pointer.
 	 */
 	c = this_cpu_ptr(s->cpu_slab);
 #endif
 
 	p = ___slab_alloc(s, gfpflags, node, addr, c);
-	local_irq_restore(flags);
+	migrate_enable();
 	return p;
 }
 
@@ -3312,8 +3318,9 @@  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	 * IRQs, which protects against PREEMPT and interrupts
 	 * handlers invoking normal fastpath.
 	 */
-	local_irq_disable();
+	migrate_disable();
 	c = this_cpu_ptr(s->cpu_slab);
+	local_irq_disable();
 
 	for (i = 0; i < size; i++) {
 		void *object = kfence_alloc(s, s->object_size, flags);
@@ -3334,6 +3341,8 @@  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			 */
 			c->tid = next_tid(c->tid);
 
+			local_irq_enable();
+
 			/*
 			 * Invoking slow path likely have side-effect
 			 * of re-populating per CPU c->freelist
@@ -3346,6 +3355,8 @@  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			c = this_cpu_ptr(s->cpu_slab);
 			maybe_wipe_obj_freeptr(s, p[i]);
 
+			local_irq_disable();
+
 			continue; /* goto for-loop */
 		}
 		c->freelist = get_freepointer(s, object);
@@ -3354,6 +3365,7 @@  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	}
 	c->tid = next_tid(c->tid);
 	local_irq_enable();
+	migrate_enable();
 
 	/*
 	 * memcg and kmem_cache debug support and memory initialization.