| Message ID | 20211110185433.1981097-9-minchan@kernel.org (mailing list archive) |
|---|---|
| State | New |
| Series | zsmalloc: remove bit_spin_lock |
On 2021-11-10 10:54:33 [-0800], Minchan Kim wrote:
> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

The From line used to be

  From: Mike Galbraith <umgwanakikbuti@gmail.com>

sorry if I dropped it earlier.

> The usage of get_cpu_var() in zs_map_object() is problematic because
> it disables preemption and makes it impossible to acquire any sleeping
> lock on PREEMPT_RT such as a spinlock_t.
> Replace the get_cpu_var() usage with a local_lock_t which is embedded
> struct mapping_area. It ensures that the access the struct is
> synchronized against all users on the same CPU.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> [minchan: remove the bit_spin_lock part and change the title]
> Signed-off-by: Minchan Kim <minchan@kernel.org>

So you removed the bit_spin_lock part here, but replaced the bit_spin_lock
with an rwsem in an earlier patch in this series. This should work.

Thank you.

Sebastian
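For context, a minimal sketch of the pattern the quoted commit message describes: a local_lock_t embedded in a per-CPU structure instead of get_cpu_var(). The names here (my_area, map_something) are invented for illustration; the real conversion is the zsmalloc diff at the bottom of this page.

```c
/*
 * Illustrative sketch only -- invented names, not the actual zsmalloc
 * code (see the diff at the end of this page for that).
 */
#include <linux/local_lock.h>
#include <linux/percpu.h>

struct my_area {
	local_lock_t lock;	/* embedded in the per-CPU structure */
	char *vm_buf;
};

static DEFINE_PER_CPU(struct my_area, my_area) = {
	.lock = INIT_LOCAL_LOCK(lock),
};

static void map_something(void)
{
	struct my_area *area;

	/*
	 * Old pattern: area = &get_cpu_var(my_area); -- this disables
	 * preemption, so a spinlock_t (a sleeping lock on PREEMPT_RT)
	 * must not be acquired afterwards.
	 *
	 * New pattern: on !RT, local_lock() disables preemption; on
	 * PREEMPT_RT it acquires a per-CPU spinlock_t instead, leaving
	 * preemption enabled while still serializing all users of the
	 * structure on this CPU.
	 */
	local_lock(&my_area.lock);
	area = this_cpu_ptr(&my_area);
	area->vm_buf = NULL;		/* operate on the per-CPU data */
	local_unlock(&my_area.lock);
}
```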
On Thu, Nov 11, 2021 at 09:56:58AM +0100, Sebastian Andrzej Siewior wrote:
> On 2021-11-10 10:54:33 [-0800], Minchan Kim wrote:
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> The From line used to be
>
>   From: Mike Galbraith <umgwanakikbuti@gmail.com>
>
> sorry if I dropped it earlier.

Let me change it in the respin.
On Wed, 10 Nov 2021, Minchan Kim wrote:

> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> The usage of get_cpu_var() in zs_map_object() is problematic because
> it disables preemption and makes it impossible to acquire any sleeping
> lock on PREEMPT_RT such as a spinlock_t.
> Replace the get_cpu_var() usage with a local_lock_t which is embedded
> struct mapping_area. It ensures that the access the struct is
> synchronized against all users on the same CPU.

On a similar note (and sorry for hijacking the thread), I've been looking
at the remaining users of get_cpu_light() in the hope of seeing them
upstreamed and removed out-of-tree now that we have local locks. There are
six, and afaict they can be addressed either by using local locks:

1. netif_rx. We can add a local_lock_t to softnet_data, which is the pcpu
   data structure used by enqueue_to_backlog(), then replace both
   preempt_disable() and get_cpu() with local_lock(&softnet_data.lock).

2. blk-mq. Such scenarios rely on implicitly disabling preemption to
   guarantee running __blk_mq_run_hw_queue() on the right CPU. But we can
   use a local lock for synchronous hw queue runs, thus adding a
   local_lock_t to struct blk_mq_hw_ctx.

3. raid5. We can add a local_lock_t to struct raid5_percpu.

4. scsi/fcoe. There are three things here to consider: tx stats management,
   fcoe_percpu_s and the exchange manager pool. For the first two, adding a
   local_lock_t to fc_stats and fcoe_percpu_s should do it, but we would
   have to do a migrate_disable() for the pool case in fc_exch_em_alloc(),
   which yes is ugly... pool->lock is already there.

... or flat-out migrate_disabling when the per-CPU data structure already
has a spinlock it acquires anyway, which will do the serialization:

5. vmalloc. Since we already have a vmap_block_queue.lock.

6. sunrpc. Since we already have a pool->sp_lock.

I've got patches for these, but perhaps I'm missing a fundamental reason as
to why these are still out-of-tree and the get_cpu()-light family is still
around. For example, direct migrate_disable() calls are frowned upon and
could well be unacceptable - albeit its recent user growth upstream.

Thoughts?

Thanks,
Davidlohr
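A rough sketch of the second approach described above (cases 5 and 6, and the fc_exch_em_alloc() pool case): migrate_disable() only keeps the task on the CPU whose structure it picked, while the spinlock that the per-CPU structure already carries does the actual serialization. All names here (my_pool, my_pool_op) are invented; this is not the vmalloc, sunrpc or fcoe code.

```c
/* Invented example of "migrate_disable() + existing per-CPU spinlock". */
#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/spinlock.h>

struct my_pool {
	spinlock_t lock;
	unsigned long nr_used;
};

static DEFINE_PER_CPU(struct my_pool, my_pool);

static int __init my_pool_init(void)
{
	int cpu;

	for_each_possible_cpu(cpu)
		spin_lock_init(&per_cpu(my_pool, cpu).lock);
	return 0;
}
early_initcall(my_pool_init);

static void my_pool_op(void)
{
	struct my_pool *pool;

	migrate_disable();		/* stand-in for get_cpu_light() */
	pool = this_cpu_ptr(&my_pool);
	spin_lock(&pool->lock);		/* the lock does the serialization */
	pool->nr_used++;
	spin_unlock(&pool->lock);
	migrate_enable();
}
```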
On 2021-11-14 19:56:47 [-0800], Davidlohr Bueso wrote:
> ... or flat-out migrate_disabling when the per-CPU data structure already
> has a spinlock it acquires anyway, which will do the serialization:
>
> 5. vmalloc. Since we already have a vmap_block_queue.lock.
>
> 6. sunrpc. Since we already have a pool->sp_lock.
>
> I've got patches for these, but perhaps I'm missing a fundamental reason as
> to why these are still out-of-tree and the get_cpu()-light family is still
> around. For example, direct migrate_disable() calls are frowned upon and
> could well be unacceptable - albeit its recent user growth upstream.
>
> Thoughts?

I think tglx is looking into this to get it done differently.

We had a few more users of get_cpu_light() and we got rid of a few of them.
We also had more local_lock() users, but we got rid of all but two, I
think, before local_lock() was suggested upstream.

From RT's point of view, get_cpu() and get_cpu_var() almost always lead to
trouble. As for the vmalloc example above, I don't think there is any need
for get_cpu() or migrate_disable() or anything at all, because the per-CPU
data structure itself is locked. The window of possible CPU migration is
small, so even if it happens, the result is correct.

> Thanks,
> Davidlohr

Sebastian
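A sketch of the point made above for the vmalloc case: no migrate_disable() at all, because the per-CPU structure's own spinlock serializes access, and a rare migration between picking the structure and locking it only means occasionally touching another CPU's instance, which is still correct. The names are invented, not the real vmap_block_queue code; raw_cpu_ptr() is assumed here to avoid the preemptible-context debug check that smp_processor_id()-based accessors perform.

```c
/*
 * Invented example of relying only on the per-CPU structure's spinlock,
 * with no preemption or migration protection around picking it up.
 */
#include <linux/list.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

struct my_queue {
	spinlock_t lock;
	struct list_head free;	/* lock and list initialized at boot (not shown) */
};

static DEFINE_PER_CPU(struct my_queue, my_queue);

static void my_queue_add(struct list_head *entry)
{
	struct my_queue *q;

	/*
	 * raw_cpu_ptr(): no "preemptible context" debug check. If we
	 * migrate right after this line, we simply add to another CPU's
	 * queue; the spinlock keeps that correct.
	 */
	q = raw_cpu_ptr(&my_queue);

	spin_lock(&q->lock);
	list_add_tail(entry, &q->free);
	spin_unlock(&q->lock);
}
```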
```diff
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 5d4c4d254679..7e03cc9363bb 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -65,6 +65,7 @@
 #include <linux/wait.h>
 #include <linux/pagemap.h>
 #include <linux/fs.h>
+#include <linux/local_lock.h>
 
 #define ZSPAGE_MAGIC	0x58
 
@@ -276,6 +277,7 @@ struct zspage {
 };
 
 struct mapping_area {
+	local_lock_t lock;
	char *vm_buf; /* copy buffer for objects that span pages */
 	char *vm_addr; /* address of kmap_atomic()'ed pages */
 	enum zs_mapmode vm_mm; /* mapping mode */
@@ -451,7 +453,9 @@ MODULE_ALIAS("zpool-zsmalloc");
 #endif /* CONFIG_ZPOOL */
 
 /* per-cpu VM mapping areas for zspage accesses that cross page boundaries */
-static DEFINE_PER_CPU(struct mapping_area, zs_map_area);
+static DEFINE_PER_CPU(struct mapping_area, zs_map_area) = {
+	.lock	= INIT_LOCAL_LOCK(lock),
+};
 
 static __maybe_unused int is_first_page(struct page *page)
 {
@@ -1269,7 +1273,8 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle,
 	class = zspage_class(pool, zspage);
 	off = (class->size * obj_idx) & ~PAGE_MASK;
 
-	area = &get_cpu_var(zs_map_area);
+	local_lock(&zs_map_area.lock);
+	area = this_cpu_ptr(&zs_map_area);
 	area->vm_mm = mm;
 	if (off + class->size <= PAGE_SIZE) {
 		/* this object is contained entirely within a page */
@@ -1320,7 +1325,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long handle)
 		__zs_unmap_object(area, pages, off, class->size);
 	}
-	put_cpu_var(zs_map_area);
+	local_unlock(&zs_map_area.lock);
 
 	migrate_read_unlock(zspage);
 }
```