
[8/8] zsmalloc: replace get_cpu_var with local_lock

Message ID 20211110185433.1981097-9-minchan@kernel.org (mailing list archive)
State New
Series zsmalloc: remove bit_spin_lock

Commit Message

Minchan Kim Nov. 10, 2021, 6:54 p.m. UTC
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The usage of get_cpu_var() in zs_map_object() is problematic because
it disables preemption and makes it impossible to acquire any sleeping
lock on PREEMPT_RT, such as a spinlock_t.
Replace the get_cpu_var() usage with a local_lock_t which is embedded
in struct mapping_area. It ensures that access to the struct is
synchronized against all users on the same CPU.
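
Condensed, the zs_map_object()/zs_unmap_object() side of the change (full
diff below) goes from:

	/* before: implicit preempt_disable() for the whole mapping window */
	area = &get_cpu_var(zs_map_area);
	...
	put_cpu_var(zs_map_area);

to:

	/* after: still CPU-local, but a sleeping per-CPU lock on PREEMPT_RT */
	local_lock(&zs_map_area.lock);
	area = this_cpu_ptr(&zs_map_area);
	...
	local_unlock(&zs_map_area.lock);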

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[minchan: remove the bit_spin_lock part and change the title]
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/zsmalloc.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Sebastian Andrzej Siewior Nov. 11, 2021, 8:56 a.m. UTC | #1
On 2021-11-10 10:54:33 [-0800], Minchan Kim wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The From line used to be
   From: Mike Galbraith <umgwanakikbuti@gmail.com>

sorry if I dropped it earlier.

> The usage of get_cpu_var() in zs_map_object() is problematic because
> it disables preemption and makes it impossible to acquire any sleeping
> lock on PREEMPT_RT, such as a spinlock_t.
> Replace the get_cpu_var() usage with a local_lock_t which is embedded
> in struct mapping_area. It ensures that access to the struct is
> synchronized against all users on the same CPU.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> [minchan: remove the bit_spin_lock part and change the title]
> Signed-off-by: Minchan Kim <minchan@kernel.org>

So you removed the bit_spin_lock part here but replaced the bit spinlock
with an rwsem in an earlier patch in this series.
This should work.

Thank you.

Sebastian
Minchan Kim Nov. 11, 2021, 11:08 p.m. UTC | #2
On Thu, Nov 11, 2021 at 09:56:58AM +0100, Sebastian Andrzej Siewior wrote:
> On 2021-11-10 10:54:33 [-0800], Minchan Kim wrote:
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
> The From line used to be
>    From: Mike Galbraith <umgwanakikbuti@gmail.com>
> 
> sorry if I dropped it earlier.

Let me change it when I respin.
Davidlohr Bueso Nov. 15, 2021, 3:56 a.m. UTC | #3
On Wed, 10 Nov 2021, Minchan Kim wrote:

>From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
>The usage of get_cpu_var() in zs_map_object() is problematic because
>it disables preemption and makes it impossible to acquire any sleeping
>lock on PREEMPT_RT, such as a spinlock_t.
>Replace the get_cpu_var() usage with a local_lock_t which is embedded
>in struct mapping_area. It ensures that access to the struct is
>synchronized against all users on the same CPU.

On a similar note (and sorry for hijacking the thread) I've been looking
at the remaining users of get_cpu_light() in the hope of seeing them
upstreamed and the out-of-tree versions removed now that we have local locks.

There are six and, afaict, they can be addressed either by using local
locks (a rough sketch of that pattern follows the list):

1. netif_rx. We can add a local_lock_t to softnet_data, which is the pcpu
data structure used by enqueue_to_backlog(), then replace both preempt_disable()
and get_cpu() with local_lock(&softnet_data.lock).

2. blk-mq. Such scenarios rely on implicitly disabling preemption to guarantee
running __blk_mq_run_hw_queue() on the right CPU. But we can use a local lock
for synchronous hw queue runs, thus adding a local_lock_t to struct blk_mq_hw_ctx.

3. raid5. We can add a local_lock_t to struct raid5_percpu.

4. scsi/fcoe. There are three things to consider here: tx stats management,
fcoe_percpu_s and the exchange manager pool. For the first two, adding
a local_lock_t to fc_stats and fcoe_percpu_s should do it, but we would
have to do a migrate_disable() for the pool case in fc_exch_em_alloc(), which,
yes, is ugly... pool->lock is already there.

... or by flat-out migrate-disabling when the per-CPU data structure already
has a spinlock it acquires anyway, which will do the serialization:

5. vmalloc. Since we already have a vmap_block_queue.lock

6. sunrpc. Since we already have a pool->sp_lock.
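
For cases 1-4 the conversion would look roughly the same; a minimal sketch,
with made-up names (foo_pcpu/foo_enqueue are illustrative, not the actual
netif_rx/blk-mq/raid5/fcoe code):

#include <linux/local_lock.h>
#include <linux/percpu.h>

/* hypothetical per-CPU structure standing in for softnet_data and friends */
struct foo_pcpu {
	local_lock_t	lock;
	unsigned int	queued;
};

static DEFINE_PER_CPU(struct foo_pcpu, foo_pcpu) = {
	.lock	= INIT_LOCAL_LOCK(lock),
};

static void foo_enqueue(void)
{
	struct foo_pcpu *p;

	/*
	 * Replaces preempt_disable()/get_cpu(): serializes the CPU-local
	 * state, and on PREEMPT_RT it maps to a per-CPU spinlock_t, so
	 * sleeping locks can still be taken inside the critical section.
	 */
	local_lock(&foo_pcpu.lock);
	p = this_cpu_ptr(&foo_pcpu);
	p->queued++;
	local_unlock(&foo_pcpu.lock);
}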

I've got patches for these, but perhaps I'm missing a fundamental reason
why these are still out-of-tree and the get_cpu_light() family is still around.
For example, direct migrate_disable() calls are frowned upon and could well
be unacceptable - despite its recent user growth upstream.

Thoughts?

Thanks,
Davidlohr
Sebastian Andrzej Siewior Nov. 15, 2021, 7:27 a.m. UTC | #4
On 2021-11-14 19:56:47 [-0800], Davidlohr Bueso wrote:
> ... or by flat-out migrate-disabling when the per-CPU data structure already
> has a spinlock it acquires anyway, which will do the serialization:
> 
> 5. vmalloc. Since we already have a vmap_block_queue.lock
> 
> 6. sunrpc. Since we already have a pool->sp_lock.
> 
> I've got patches for these, but perhaps I'm missing a fundamental reason
> why these are still out-of-tree and the get_cpu_light() family is still around.
> For example, direct migrate_disable() calls are frowned upon and could well
> be unacceptable - despite its recent user growth upstream.
> 
> Thoughts?

I think tglx is looking into this to get it done differently. We had a
few more users of get_cpu_light() and we got rid of a few of them. We
also had more local_lock() users but got rid of all but two, I think,
before local_lock() was suggested upstream.
From RT's point of view, get_cpu() and get_cpu_var() almost always lead
to trouble. As for the vmalloc example above, I don't think there is any
need for get_cpu() or migrate_disable() or anything at all, because the
per-CPU data structure itself is locked. The window of possible CPU
migration is small, so even if it happens, the result is correct.
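
As a rough illustration of that last point, with made-up names (bar_pcpu is
illustrative, not the actual vmalloc code): the per-CPU lookup is only a
locality hint and the structure's own spinlock does the serialization, so
migrating right after the lookup is harmless:

#include <linux/percpu.h>
#include <linux/spinlock.h>

/* hypothetical per-CPU structure with its own lock, a la vmap_block_queue */
struct bar_pcpu {
	spinlock_t	lock;	/* assumed spin_lock_init()ed during init */
	unsigned long	count;
};

static DEFINE_PER_CPU(struct bar_pcpu, bar_pcpu);

static void bar_add(void)
{
	/*
	 * No get_cpu()/migrate_disable(): raw_cpu_ptr() just picks the
	 * current CPU's instance. If we migrate right after this we end
	 * up updating another CPU's instance instead, which the spinlock
	 * still makes safe.
	 */
	struct bar_pcpu *b = raw_cpu_ptr(&bar_pcpu);

	spin_lock(&b->lock);
	b->count++;
	spin_unlock(&b->lock);
}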

> Thanks,
> Davidlohr

Sebastian

Patch

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5d4c4d254679..7e03cc9363bb 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -65,6 +65,7 @@ 
 #include <linux/wait.h>
 #include <linux/pagemap.h>
 #include <linux/fs.h>
+#include <linux/local_lock.h>
 
 #define ZSPAGE_MAGIC	0x58
 
@@ -276,6 +277,7 @@  struct zspage {
 };
 
 struct mapping_area {
+	local_lock_t lock;
 	char *vm_buf; /* copy buffer for objects that span pages */
 	char *vm_addr; /* address of kmap_atomic()'ed pages */
 	enum zs_mapmode vm_mm; /* mapping mode */
@@ -451,7 +453,9 @@  MODULE_ALIAS("zpool-zsmalloc");
 #endif /* CONFIG_ZPOOL */
 
 /* per-cpu VM mapping areas for zspage accesses that cross page boundaries */
-static DEFINE_PER_CPU(struct mapping_area, zs_map_area);
+static DEFINE_PER_CPU(struct mapping_area, zs_map_area) = {
+	.lock	= INIT_LOCAL_LOCK(lock),
+};
 
 static __maybe_unused int is_first_page(struct page *page)
 {
@@ -1269,7 +1273,8 @@  void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	class = zspage_class(pool, zspage);
 	off = (class->size * obj_idx) & ~PAGE_MASK;
 
-	area = &get_cpu_var(zs_map_area);
+	local_lock(&zs_map_area.lock);
+	area = this_cpu_ptr(&zs_map_area);
 	area->vm_mm = mm;
 	if (off + class->size <= PAGE_SIZE) {
 		/* this object is contained entirely within a page */
@@ -1320,7 +1325,7 @@  void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 
 		__zs_unmap_object(area, pages, off, class->size);
 	}
-	put_cpu_var(zs_map_area);
+	local_unlock(&zs_map_area.lock);
 
 	migrate_read_unlock(zspage);
 }