Message ID | fae85e4440a8ef6f13192476bd33a4826416fc58.camel@gmx.de (mailing list archive)
---|---
State | New, archived
Series | zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat
(mailer partially munged formatting? resend)
mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat
zsmalloc map/unmap methods use preemption disabling bit spinlocks. Take the
mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
in zswap_frontswap_store().
Signed-off-by: Mike Galbraith <efault@gmx.de>
Fixes: 1ec3b5fe6eec "mm/zswap: move to use crypto_acomp API for hardware acceleration"
---
mm/zswap.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1258,20 +1258,20 @@ static int zswap_frontswap_load(unsigned
 
 	/* decompress */
 	dlen = PAGE_SIZE;
+	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+	mutex_lock(acomp_ctx->mutex);
 	src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
 	if (zpool_evictable(entry->pool->zpool))
 		src += sizeof(struct zswap_header);
 
-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	mutex_lock(acomp_ctx->mutex);
 	sg_init_one(&input, src, entry->length);
 	sg_init_table(&output, 1);
 	sg_set_page(&output, page, PAGE_SIZE, 0);
 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
 	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
-	mutex_unlock(acomp_ctx->mutex);
 
 	zpool_unmap_handle(entry->pool->zpool, entry->handle);
+	mutex_unlock(acomp_ctx->mutex);
 	BUG_ON(ret);
 
 freeentry:
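
With the zsmalloc backend, zpool_map_handle() ends up in zs_map_object(), which pins the object and takes per-zspage locks, leaving preemption disabled until the matching zpool_unmap_handle(). mutex_lock() may sleep, so taking the acomp mutex while the handle is still mapped is what trips might_sleep() on a PREEMPT kernel. A minimal before/after sketch of the ordering (simplified locals, error handling omitted; not the literal mm/zswap.c code):

	/* before: mutex taken inside zsmalloc's atomic map/unmap window */
	src = zpool_map_handle(pool, handle, ZPOOL_MM_RO);	/* preemption now off */
	mutex_lock(acomp_ctx->mutex);				/* may sleep -> splat */
	/* ... decompress ... */
	mutex_unlock(acomp_ctx->mutex);
	zpool_unmap_handle(pool, handle);

	/* after: mutex brackets the whole mapped section, matching the store path */
	mutex_lock(acomp_ctx->mutex);
	src = zpool_map_handle(pool, handle, ZPOOL_MM_RO);
	/* ... decompress ... */
	zpool_unmap_handle(pool, handle);
	mutex_unlock(acomp_ctx->mutex);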
Hi Mike,

On Sat, Dec 19, 2020 at 11:12 AM Mike Galbraith <efault@gmx.de> wrote:
>
> (mailer partially munged formatting? resend)
>
> mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat
>
> zsmalloc map/unmap methods use preemption disabling bit spinlocks. Take the
> mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
> in zswap_frontswap_store().

oh wait... So is zsmalloc taking a spin lock in its map callback and
releasing it only in unmap? In this case, I would rather keep zswap as
is, mark zsmalloc as RT unsafe and have zsmalloc maintainer fix it.

Best regards,
   Vitaly

> Signed-off-by: Mike Galbraith <efault@gmx.de>
> Fixes: 1ec3b5fe6eec "mm/zswap: move to use crypto_acomp API for hardware acceleration"
> ---
>  mm/zswap.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1258,20 +1258,20 @@ static int zswap_frontswap_load(unsigned
>
>  /* decompress */
>  dlen = PAGE_SIZE;
> + acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> + mutex_lock(acomp_ctx->mutex);
>  src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
>  if (zpool_evictable(entry->pool->zpool))
>  src += sizeof(struct zswap_header);
>
> - acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> - mutex_lock(acomp_ctx->mutex);
>  sg_init_one(&input, src, entry->length);
>  sg_init_table(&output, 1);
>  sg_set_page(&output, page, PAGE_SIZE, 0);
>  acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
>  ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> - mutex_unlock(acomp_ctx->mutex);
>
>  zpool_unmap_handle(entry->pool->zpool, entry->handle);
> + mutex_unlock(acomp_ctx->mutex);
>  BUG_ON(ret);
>
> freeentry:
>
On Sat, 2020-12-19 at 11:20 +0100, Vitaly Wool wrote:
> Hi Mike,
>
> On Sat, Dec 19, 2020 at 11:12 AM Mike Galbraith <efault@gmx.de> wrote:
> >
> > (mailer partially munged formatting? resend)
> >
> > mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat
> >
> > zsmalloc map/unmap methods use preemption disabling bit spinlocks. Take the
> > mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
> > in zswap_frontswap_store().
>
> oh wait... So is zsmalloc taking a spin lock in its map callback and
> releasing it only in unmap? In this case, I would rather keep zswap as
> is, mark zsmalloc as RT unsafe and have zsmalloc maintainer fix it.

The kernel that generated that splat was NOT an RT kernel, it was plain
master.today with a PREEMPT config.

	-Mike
On Sat, 19 Dec 2020, 11:27 Mike Galbraith, <efault@gmx.de> wrote:
>
> On Sat, 2020-12-19 at 11:20 +0100, Vitaly Wool wrote:
> > Hi Mike,
> >
> > On Sat, Dec 19, 2020 at 11:12 AM Mike Galbraith <efault@gmx.de> wrote:
> > >
> > > (mailer partially munged formatting? resend)
> > >
> > > mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat
> > >
> > > zsmalloc map/unmap methods use preemption disabling bit spinlocks. Take the
> > > mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
> > > in zswap_frontswap_store().
> >
> > oh wait... So is zsmalloc taking a spin lock in its map callback and
> > releasing it only in unmap? In this case, I would rather keep zswap as
> > is, mark zsmalloc as RT unsafe and have zsmalloc maintainer fix it.
>
> The kernel that generated that splat was NOT an RT kernel, it was plain
> master.today with a PREEMPT config.

I see, thanks. I don't think it makes things better for zsmalloc
though. From what I can see, the offending code is this:

> /* From now on, migration cannot move the object */
> pin_tag(handle);

Bit spinlock is taken in pin_tag(). I find the comment above somewhat
misleading, why is it necessary to take a spinlock to prevent
migration? I would guess an atomic flag should normally be enough.

zswap is not broken here, it is zsmalloc that needs to be fixed.

Best regards,
   Vitaly
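
For reference, the pinning path in the 5.10-era zsmalloc looks roughly like the following (paraphrased, not a verbatim quote of mm/zsmalloc.c): pin_tag() is a bit spinlock on a flag bit of the handle, and bit_spin_lock() disables preemption before spinning, so everything between zs_map_object() and zs_unmap_object() runs in atomic context:

	/* mm/zsmalloc.c, approximately */
	static void pin_tag(unsigned long handle)
	{
		bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle);
	}

	/* include/linux/bit_spinlock.h, simplified: note the preempt_disable() */
	static inline void bit_spin_lock(int bitnum, unsigned long *addr)
	{
		preempt_disable();			/* atomic from here on */
		while (unlikely(test_and_set_bit_lock(bitnum, addr))) {
			preempt_enable();
			do {
				cpu_relax();
			} while (test_bit(bitnum, addr));
			preempt_disable();
		}
	}

The lock is only dropped by unpin_tag() in zs_unmap_object(), which is why the splat shows "Preemption disabled at: zs_map_object".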
On Sat, 2020-12-19 at 11:46 +0100, Vitaly Wool wrote:
> On Sat, 19 Dec 2020, 11:27 Mike Galbraith, <efault@gmx.de> wrote:
>
> > The kernel that generated that splat was NOT an RT kernel, it was plain
> > master.today with a PREEMPT config.
>
> I see, thanks. I don't think it makes things better for zsmalloc
> though. From what I can see, the offending code is this:
>
> > /* From now on, migration cannot move the object */
> > pin_tag(handle);
>
> Bit spinlock is taken in pin_tag(). I find the comment above somewhat
> misleading, why is it necessary to take a spinlock to prevent
> migration? I would guess an atomic flag should normally be enough.
>
> zswap is not broken here, it is zsmalloc that needs to be fixed.

Cool, those damn bit spinlocks going away would be a case of happiness
for RT as well :)

	-Mike
(CC zsmalloc maintainers)

On Sat, 2020-12-19 at 11:59 +0100, Mike Galbraith wrote:
> On Sat, 2020-12-19 at 11:46 +0100, Vitaly Wool wrote:
> > On Sat, 19 Dec 2020, 11:27 Mike Galbraith, <efault@gmx.de> wrote:
> >
> > > The kernel that generated that splat was NOT an RT kernel, it was plain
> > > master.today with a PREEMPT config.
> >
> > I see, thanks. I don't think it makes things better for zsmalloc
> > though. From what I can see, the offending code is this:
> >
> > > /* From now on, migration cannot move the object */
> > > pin_tag(handle);
> >
> > Bit spinlock is taken in pin_tag(). I find the comment above somewhat
> > misleading, why is it necessary to take a spinlock to prevent
> > migration? I would guess an atomic flag should normally be enough.
> >
> > zswap is not broken here, it is zsmalloc that needs to be fixed.
>
> Cool, those damn bit spinlocks going away would be a case of happiness
> for RT as well :)
>
> -Mike
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1258,20 +1258,20 @@ static int zswap_frontswap_load(unsigned
 
 	/* decompress */
 	dlen = PAGE_SIZE;
+	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
+	mutex_lock(acomp_ctx->mutex);
 	src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
 	if (zpool_evictable(entry->pool->zpool))
 		src += sizeof(struct zswap_header);
 
-	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
-	mutex_lock(acomp_ctx->mutex);
 	sg_init_one(&input, src, entry->length);
 	sg_init_table(&output, 1);
 	sg_set_page(&output, page, PAGE_SIZE, 0);
 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
 	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
-	mutex_unlock(acomp_ctx->mutex);
 
 	zpool_unmap_handle(entry->pool->zpool, entry->handle);
+	mutex_unlock(acomp_ctx->mutex);
 	BUG_ON(ret);
 
 freeentry:
Greetings,

Beating on zswap+zsmalloc leads to the splat below, possible patch
below that.

[ 139.794413] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
[ 139.794585] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 907, name: avahi-daemon
[ 139.794608] 2 locks held by avahi-daemon/907:
[ 139.794621]  #0: ffff8881010766d8 (&mm->mmap_lock#2){++++}-{3:3}, at: exc_page_fault+0x15b/0x6e0
[ 139.794659]  #1: ffff8881b6b13110 (&zspage->lock){.+.+}-{2:2}, at: zs_map_object+0x89/0x350
[ 139.794691] Preemption disabled at:
[ 139.794693] [<ffffffff812763a8>] zs_map_object+0x38/0x350
[ 139.794719] CPU: 0 PID: 907 Comm: avahi-daemon Kdump: loaded Tainted: G E 5.10.0.g3644e2d-preempt #4
[ 139.794746] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 139.794766] Call Trace:
[ 139.794775]  dump_stack+0x77/0x97
[ 139.794788]  ___might_sleep+0x17d/0x260
[ 139.794806]  __mutex_lock+0x47/0x9d0
[ 139.794823]  ? zswap_frontswap_load+0xcb/0x240
[ 139.794841]  ? zs_map_object+0x248/0x350
[ 139.794858]  ? zswap_frontswap_load+0xcb/0x240
[ 139.794871]  zswap_frontswap_load+0xcb/0x240
[ 139.794886]  ? lru_cache_add+0xe9/0x1b0
[ 139.794904]  __frontswap_load+0x6e/0xd0
[ 139.794921]  swap_readpage+0x70/0x240
[ 139.794939]  swap_cluster_readahead+0x1d9/0x280
[ 139.794964]  ? swapin_readahead+0x140/0x3e0
[ 139.794979]  swapin_readahead+0x140/0x3e0
[ 139.794997]  ? lookup_swap_cache+0x5c/0x190
[ 139.795016]  ? do_swap_page+0x2f7/0x900
[ 139.795030]  do_swap_page+0x2f7/0x900
[ 139.795046]  handle_mm_fault+0xa06/0x13b0
[ 139.795083]  exc_page_fault+0x2a3/0x6e0
[ 139.795110]  ? asm_exc_page_fault+0x1e/0x30
[ 139.795139]  ? asm_exc_page_fault+0x8/0x30
[ 139.795166]  asm_exc_page_fault+0x1e/0x30
[ 139.795191] RIP: 0033:0x560caa4074d0
[ 139.795215] Code: 89 05 e4 74 21 00 0f 84 28 06 00 00 e8 d9 13 00 00 e8 b4 12 00 00 44 8b 25 b9 75 21 00 45 85 e4 0f 85 ac 04 00 00 41 83 cf ff <48> 8b 3d b1 74 21 00 44 89 fe e8 d1 f6 ff ff 85 c0 89 c3 0f 88 d9
[ 139.795299] RSP: 002b:00007ffeb4ec5ce0 EFLAGS: 00010246
[ 139.795316] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007ff1616e4db4
[ 139.795336] RDX: 0000000000000001 RSI: 00007ffeb4ec5c5f RDI: 0000000000000006
[ 139.795356] RBP: 0000560cab63be60 R08: 0000560cab656220 R09: 0000560cab670750
[ 139.795376] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 139.795395] R13: 0000560cab63f550 R14: 0000560cab63c000 R15: 00000000ffffffff

(CC Sebastian because RT has the ~same problem at the same spot, curable
the same way, or by fixing the upstream buglet then take backports+fix,
and nuking the associated RT patch)

mm/zswap: fix zswap_frontswap_load() vs zsmalloc::map/unmap() might_sleep() splat

zsmalloc map/unmap methods use preemption disabling bit spinlocks. Take the
mutex outside of pool map/unmap methods in zswap_frontswap_load() as is done
in zswap_frontswap_store().

Signed-off-by: Mike Galbraith <efault@gmx.de>
Fixes: 1ec3b5fe6eec "mm/zswap: move to use crypto_acomp API for hardware acceleration"
---
 mm/zswap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
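
The store side that the description points to already keeps the map/unmap window inside the mutexed section; condensed, the relevant part of zswap_frontswap_store() after commit 1ec3b5fe6eec looks roughly like this (paraphrased, allocation and error handling omitted):

	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
	mutex_lock(acomp_ctx->mutex);
	/* ... crypto_acomp_compress() into acomp_ctx->dstmem ... */
	buf = zpool_map_handle(entry->pool->zpool, handle, ZPOOL_MM_RW);
	memcpy(buf, zhdr, hlen);
	memcpy(buf + hlen, dst, dlen);
	zpool_unmap_handle(entry->pool->zpool, handle);
	mutex_unlock(acomp_ctx->mutex);

The patch makes zswap_frontswap_load() follow the same pattern, so no sleeping lock is taken while a zsmalloc handle is mapped.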