Message ID | 20230906124234.134200-1-wangkefeng.wang@huawei.com (mailing list archive)
---|---
Series | mm: kasan: fix softlock when populate or depopulate pte
Hi All, any suggestions or comments? Many thanks.

On 2023/9/6 20:42, Kefeng Wang wrote:
> This is an RFC; even patch 3 is a hack to fix the soft lockup issue when
> populating or depopulating PTEs over a large region. Looking forward to
> your reply and advice, thanks.

Here are the full stacks. For the populate path:

[ C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [insmod:458]
[ C3] Modules linked in: test(OE+)
[ C3] irq event stamp: 320776
[ C3] hardirqs last enabled at (320775): [<ffff8000815a0c98>] _raw_spin_unlock_irqrestore+0x98/0xb8
[ C3] hardirqs last disabled at (320776): [<ffff8000815816e0>] el1_interrupt+0x38/0xa8
[ C3] softirqs last enabled at (318174): [<ffff800080040ba8>] __do_softirq+0x658/0x7ac
[ C3] softirqs last disabled at (318169): [<ffff800080047fd8>] ____do_softirq+0x18/0x30
[ C3] CPU: 3 PID: 458 Comm: insmod Tainted: G OE 6.5.0+ #595
[ C3] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ C3] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ C3] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
[ C3] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
[ C3] sp : ffff800093386d70
[ C3] x29: ffff800093386d70 x28: 0000000000000801 x27: ffff0007ffffa9c0
[ C3] x26: 0000000000000000 x25: 000000000000003f x24: fffffc0004353708
[ C3] x23: ffff0006d476bad8 x22: fffffc0004353748 x21: 0000000000000000
[ C3] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
[ C3] x17: ffff80008024e7fc x16: ffff80008055a8f0 x15: ffff80008024ec60
[ C3] x14: ffff80008024ead0 x13: ffff80008024e7fc x12: ffff6000fffff5f9
[ C3] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
[ C3] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
[ C3] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700012670d70
[ C3] x2 : 0000000000000001 x1 : c9a5dbfae610fa24 x0 : 000000000004e507
[ C3] Call trace:
[ C3]  _raw_spin_unlock_irqrestore+0x50/0xb8
[ C3]  rmqueue_bulk+0x434/0x6b8
[ C3]  get_page_from_freelist+0xdd4/0x1680
[ C3]  __alloc_pages+0x244/0x508
[ C3]  alloc_pages+0xf0/0x218
[ C3]  __get_free_pages+0x1c/0x50
[ C3]  kasan_populate_vmalloc_pte+0x30/0x188
[ C3]  __apply_to_page_range+0x3ec/0x650
[ C3]  apply_to_page_range+0x1c/0x30
[ C3]  kasan_populate_vmalloc+0x60/0x70
[ C3]  alloc_vmap_area.part.67+0x328/0xe50
[ C3]  alloc_vmap_area+0x4c/0x78
[ C3]  __get_vm_area_node.constprop.76+0x130/0x240
[ C3]  __vmalloc_node_range+0x12c/0x340
[ C3]  __vmalloc_node+0x8c/0xb0
[ C3]  vmalloc+0x2c/0x40
[ C3]  show_mem_init+0x1c/0xff8 [test]
[ C3]  do_one_initcall+0xe4/0x500
[ C3]  do_init_module+0x100/0x358
[ C3]  load_module+0x2e64/0x2fc8
[ C3]  init_module_from_file+0xec/0x148
[ C3]  idempotent_init_module+0x278/0x380
[ C3]  __arm64_sys_finit_module+0x88/0xf8
[ C3]  invoke_syscall+0x64/0x188
[ C3]  el0_svc_common.constprop.1+0xec/0x198
[ C3]  do_el0_svc+0x48/0xc8
[ C3]  el0_svc+0x3c/0xe8
[ C3]  el0t_64_sync_handler+0xa0/0xc8
[ C3]  el0t_64_sync+0x188/0x190

and for the depopulate path:

[ C6] watchdog: BUG: soft lockup - CPU#6 stuck for 48s! [kworker/6:1:59]
[ C6] Modules linked in: test(OE+)
[ C6] irq event stamp: 39458
[ C6] hardirqs last enabled at (39457): [<ffff8000815a0c98>] _raw_spin_unlock_irqrestore+0x98/0xb8
[ C6] hardirqs last disabled at (39458): [<ffff8000815816e0>] el1_interrupt+0x38/0xa8
[ C6] softirqs last enabled at (39420): [<ffff800080040ba8>] __do_softirq+0x658/0x7ac
[ C6] softirqs last disabled at (39415): [<ffff800080047fd8>] ____do_softirq+0x18/0x30
[ C6] CPU: 6 PID: 59 Comm: kworker/6:1 Tainted: G OEL 6.5.0+ #595
[ C6] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ C6] Workqueue: events drain_vmap_area_work
[ C6] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ C6] pc : _raw_spin_unlock_irqrestore+0x50/0xb8
[ C6] lr : _raw_spin_unlock_irqrestore+0x98/0xb8
[ C6] sp : ffff80008fe676b0
[ C6] x29: ffff80008fe676b0 x28: fffffc000601d310 x27: ffff000edf5dfa80
[ C6] x26: ffff000edf5dfad8 x25: 0000000000000000 x24: 0000000000000006
[ C6] x23: ffff000edf5dfad4 x22: 0000000000000000 x21: 0000000000000006
[ C6] x20: ffff0007ffffafc0 x19: 0000000000000000 x18: 0000000000000000
[ C6] x17: ffff8000805544b8 x16: ffff800080553d94 x15: ffff8000805c11b0
[ C6] x14: ffff8000805baeb0 x13: ffff800080047e10 x12: ffff6000fffff5f9
[ C6] x11: 1fffe000fffff5f8 x10: ffff6000fffff5f8 x9 : 1fffe000fffff5f8
[ C6] x8 : dfff800000000000 x7 : 00000000f2000000 x6 : dfff800000000000
[ C6] x5 : 00000000f2f2f200 x4 : dfff800000000000 x3 : ffff700011fcce98
[ C6] x2 : 0000000000000001 x1 : cf09d5450e2b4f7f x0 : 0000000000009a21
[ C6] Call trace:
[ C6]  _raw_spin_unlock_irqrestore+0x50/0xb8
[ C6]  free_pcppages_bulk+0x2bc/0x3e0
[ C6]  free_unref_page_commit+0x1fc/0x290
[ C6]  free_unref_page+0x184/0x250
[ C6]  __free_pages+0x154/0x1a0
[ C6]  free_pages+0x88/0xb0
[ C6]  kasan_depopulate_vmalloc_pte+0x58/0x80
[ C6]  __apply_to_page_range+0x3ec/0x650
[ C6]  apply_to_existing_page_range+0x1c/0x30
[ C6]  kasan_release_vmalloc+0xa4/0x118
[ C6]  __purge_vmap_area_lazy+0x4f4/0xe30
[ C6]  drain_vmap_area_work+0x60/0xc0
[ C6]  process_one_work+0x4cc/0xa38
[ C6]  worker_thread+0x240/0x638
[ C6]  kthread+0x1c8/0x1e0
[ C6]  ret_from_fork+0x10/0x20

>
> Kefeng Wang (3):
>   mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte()
>   mm: kasan: shadow: move free_page() out of page table lock
>   mm: kasan: shadow: HACK add cond_resched_lock() in
>     kasan_depopulate_vmalloc_pte()
>
>  include/linux/kasan.h |  9 ++++++---
>  mm/kasan/shadow.c     | 20 +++++++++++++-------
>  mm/vmalloc.c          |  7 ++++---
>  3 files changed, 23 insertions(+), 13 deletions(-)
>
The issue is easy to reproduce with a large vmalloc, kindly ping...

On 2023/9/15 8:58, Kefeng Wang wrote:
> Hi All, any suggestions or comments? Many thanks.
>
> On 2023/9/6 20:42, Kefeng Wang wrote:
>> This is an RFC; even patch 3 is a hack to fix the soft lockup issue when
>> populating or depopulating PTEs over a large region. Looking forward to
>> your reply and advice, thanks.
>
> [ quoted soft lockup reports and diffstat snipped, see the mail above ]
On Wed, 18 Oct 2023 at 16:16, 'Kefeng Wang' via kasan-dev <kasan-dev@googlegroups.com> wrote:
>
> The issue is easy to reproduce with a large vmalloc, kindly ping...
>
> [ quoted soft lockup reports snipped ]
>
>>> Kefeng Wang (3):
>>>   mm: kasan: shadow: add cond_resched() in kasan_populate_vmalloc_pte()
>>>   mm: kasan: shadow: move free_page() out of page table lock
>>>   mm: kasan: shadow: HACK add cond_resched_lock() in
>>>     kasan_depopulate_vmalloc_pte()

The first 2 patches look ok, but yeah, the last is a hack. I also don't have any better suggestions, only more questions.

Does this only happen on arm64?

Do you have a minimal reproducer you can share?
On 2023/10/19 0:37, Marco Elver wrote:
> On Wed, 18 Oct 2023 at 16:16, 'Kefeng Wang' via kasan-dev
> <kasan-dev@googlegroups.com> wrote:
>>
>> The issue is easy to reproduce with a large vmalloc, kindly ping...
>>
>> [ quoted soft lockup reports snipped ]
>
> The first 2 patches look ok, but yeah, the last is a hack. I also
> don't have any better suggestions, only more questions.

Thanks Marco. Maybe we could convert free_vmap_area_lock from a spinlock to a mutex, but only when KASAN is enabled?

> Does this only happen on arm64?

Our test case runs on an arm64 qemu guest (the host is x86), so it runs much slower than a real board.

> Do you have a minimal reproducer you can share?

Here is the code in the test driver:

	void *buf = vmalloc(40UL << 30);
	vfree(buf);
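For completeness, here is a self-contained sketch of such a reproducer module. It is hypothetical: only the vmalloc()/vfree() pair is given in this thread, and the init function name is merely borrowed from the show_mem_init frame in the trace above. With CONFIG_KASAN_VMALLOC=y, the 40 GiB request forces kasan_populate_vmalloc() and, after vfree(), kasan_release_vmalloc() to walk a very large shadow range:

	#include <linux/init.h>
	#include <linux/module.h>
	#include <linux/vmalloc.h>

	/* Hypothetical reconstruction of the "test" module used above. */
	static int __init show_mem_init(void)
	{
		void *buf = vmalloc(40UL << 30);	/* 40 GiB of vmalloc space */

		if (!buf)
			return -ENOMEM;

		vfree(buf);	/* exercises the lazy purge / depopulate path */
		return 0;
	}

	static void __exit show_mem_exit(void)
	{
	}

	module_init(show_mem_init);
	module_exit(show_mem_exit);
	MODULE_LICENSE("GPL");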
On Thu, Oct 19, 2023 at 09:40:10AM +0800, Kefeng Wang wrote:
>
> On 2023/10/19 0:37, Marco Elver wrote:
> > [ quoted soft lockup reports snipped ]
> >
> > The first 2 patches look ok, but yeah, the last is a hack. I also
> > don't have any better suggestions, only more questions.
>
> Thanks Marco. Maybe we could convert free_vmap_area_lock from a spinlock
> to a mutex, but only when KASAN is enabled?
>
I do not think it is a good suggestion. Could you please clarify the reason for such a conversion?

> > Does this only happen on arm64?
>
> Our test case runs on an arm64 qemu guest (the host is x86), so it runs
> much slower than a real board.
>
> > Do you have a minimal reproducer you can share?
>
> Here is the code in the test driver:
>
>	void *buf = vmalloc(40UL << 30);
>	vfree(buf);
>
What is the test driver? Why do you need 40G of memory, and for which purpose?

--
Uladzislau Rezki
On 2023/10/19 14:17, Uladzislau Rezki wrote:
> On Thu, Oct 19, 2023 at 09:40:10AM +0800, Kefeng Wang wrote:
>>
>> [ quoted soft lockup reports snipped ]
>>
>> Thanks Marco. Maybe we could convert free_vmap_area_lock from a spinlock
>> to a mutex, but only when KASAN is enabled?
>>
> I do not think it is a good suggestion. Could you please clarify the
> reason for such a conversion?

See the call traces of the soft lockups: when the vmalloc buffer is mapped/unmapped, KASAN has to populate and depopulate the vmalloc shadow PTEs, which takes much longer than on a kernel without KASAN. For the unmap path there is already a cond_resched_lock() in __purge_vmap_area_lazy(), but with KASAN far more time is spent under the free_vmap_area_lock spinlock, and we cannot add a cond_resched_lock() inside kasan_depopulate_vmalloc_pte(). If the spinlock were converted to a mutex, we could add a cond_resched() into the KASAN depopulate path; that is why I suggested the conversion when KASAN is enabled. The conversion may well not be correct, though. Any better solution?

>>> Does this only happen on arm64?
>>
>> Our test case runs on an arm64 qemu guest (the host is x86), so it runs
>> much slower than a real board.
>>
>>> Do you have a minimal reproducer you can share?
>>
>> Here is the code in the test driver:
>>
>>	void *buf = vmalloc(40UL << 30);
>>	vfree(buf);
>>
> What is the test driver? Why do you need 40G of memory, and for which purpose?

This is just to accelerate reproduction of the above issue; the root cause is simply too much time spent in kasan_populate_vmalloc() and kasan_release_vmalloc(). Thanks.

> --
> Uladzislau Rezki
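For readers following along, the existing resched point mentioned here sits in the unpatched __purge_vmap_area_lazy(). An abridged excerpt, reconstructed from the context lines of the diff Uladzislau posts below, looks roughly like this; note that it can only yield between vmap areas, never inside kasan_release_vmalloc() itself:

	/* Abridged excerpt of the unpatched __purge_vmap_area_lazy() loop. */
	spin_lock(&free_vmap_area_lock);
	list_for_each_entry_safe(va, n_va, &local_purge_list, list) {
		/* ... merge the lazily-freed area back into the free tree ... */

		if (is_vmalloc_or_module_addr((void *)orig_start))
			kasan_release_vmalloc(orig_start, orig_end,
					      va->va_start, va->va_end);

		/* Briefly drops free_vmap_area_lock if a reschedule is due. */
		if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
			cond_resched_lock(&free_vmap_area_lock);
	}
	spin_unlock(&free_vmap_area_lock);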
On Thu, Oct 19, 2023 at 03:26:48PM +0800, Kefeng Wang wrote:
>
> On 2023/10/19 14:17, Uladzislau Rezki wrote:
> > [ quoted context snipped ]
>
> [ quoted soft lockup report for the populate path snipped ]
>
This trace is stuck in rmqueue_bulk() because you request a huge allocation size. It has nothing to do with free_vmap_area_lock; it is about the bulk allocator, which gets stuck trying to satisfy such a demand.

> [ quoted soft lockup report for the depopulate path snipped ]
>
> If the spinlock were converted to a mutex, we could add a cond_resched()
> into the KASAN depopulate path; that is why I suggested the conversion
> when KASAN is enabled. The conversion may well not be correct, though.
> Any better solution?
>
I have at least the below thoughts:

a) Add a maximum allowed threshold that a user can request via a vmalloc() call. I do not think ~40G is a correct request.

b) This can fix the unmap path:

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ef8599d394fd..988735da5c5c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1723,7 +1723,6 @@ static void purge_fragmented_blocks_allcpus(void);
  */
 static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 {
-	unsigned long resched_threshold;
 	unsigned int num_purged_areas = 0;
 	struct list_head local_purge_list;
 	struct vmap_area *va, *n_va;
@@ -1747,36 +1746,32 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 			struct vmap_area, list)->va_end);
 
 	flush_tlb_kernel_range(start, end);
-	resched_threshold = lazy_max_pages() << 1;
 
-	spin_lock(&free_vmap_area_lock);
 	list_for_each_entry_safe(va, n_va, &local_purge_list, list) {
 		unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
 		unsigned long orig_start = va->va_start;
 		unsigned long orig_end = va->va_end;
 
+		if (is_vmalloc_or_module_addr((void *)orig_start))
+			kasan_release_vmalloc(orig_start, orig_end,
+					      va->va_start, va->va_end);
+
 		/*
 		 * Finally insert or merge lazily-freed area. It is
 		 * detached and there is no need to "unlink" it from
 		 * anything.
 		 */
+		spin_lock(&free_vmap_area_lock);
 		va = merge_or_add_vmap_area_augment(va, &free_vmap_area_root,
 				&free_vmap_area_list);
+		spin_unlock(&free_vmap_area_lock);
 
 		if (!va)
 			continue;
 
-		if (is_vmalloc_or_module_addr((void *)orig_start))
-			kasan_release_vmalloc(orig_start, orig_end,
-					      va->va_start, va->va_end);
-
 		atomic_long_sub(nr, &vmap_lazy_nr);
 		num_purged_areas++;
-
-		if (atomic_long_read(&vmap_lazy_nr) < resched_threshold)
-			cond_resched_lock(&free_vmap_area_lock);
 	}
-	spin_unlock(&free_vmap_area_lock);
 
 out:
 	trace_purge_vmap_area_lazy(start, end, num_purged_areas);
<snip>

c) The bulk path I have not checked, but at a high level kasan_populate_vmalloc() should take a breath between requests.

--
Uladzislau Rezki
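A side effect of the rework in (b), not spelled out in the diff itself: kasan_release_vmalloc() now runs with no spinlock held, so the purge worker could, in principle, yield with a plain cond_resched() rather than the patch-3 hack. A hedged sketch of that extra, purely hypothetical change on top of the diff:

	list_for_each_entry_safe(va, n_va, &local_purge_list, list) {
		unsigned long orig_start = va->va_start;
		unsigned long orig_end = va->va_end;

		if (is_vmalloc_or_module_addr((void *)orig_start))
			kasan_release_vmalloc(orig_start, orig_end,
					      va->va_start, va->va_end);

		/*
		 * Hypothetical addition: no lock is held here in the reworked
		 * loop, so the worker can reschedule between vmap areas even
		 * when KASAN makes each iteration expensive.
		 */
		cond_resched();

		/* ... then take free_vmap_area_lock and merge, as in (b) ... */
	}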
On 2023/10/19 16:53, Uladzislau Rezki wrote:
> On Thu, Oct 19, 2023 at 03:26:48PM +0800, Kefeng Wang wrote:
>>
>> [ quoted soft lockup reports snipped ]
>>
> This trace is stuck in rmqueue_bulk() because you request a huge
> allocation size. It has nothing to do with free_vmap_area_lock; it is
> about the bulk allocator, which gets stuck trying to satisfy such a demand.

Yes, this is not a spinlock issue: it simply spends too much time in kasan_populate_vmalloc(), since __apply_to_page_range() has to cover a large range. That part can be fixed by adding a cond_resched() in kasan_populate_vmalloc(); see patch 1.

> I have at least the below thoughts:
>
> a) Add a maximum allowed threshold that a user can request via a vmalloc()
>    call. I do not think ~40G is a correct request.

I don't know; maybe some driver could map a large range. We did hit this issue in qemu, although with very low probability.

> b) This can fix the unmap path:
>
> [ proposed mm/vmalloc.c diff snipped, see above ]

Thanks for your suggestion, but looking at kasan_release_vmalloc(), it seems it needs to run under free_vmap_area_lock, according to the comment [1]. Marco and all KASAN maintainers, please help check the above approach.

[1] https://elixir.bootlin.com/linux/v6.6-rc6/source/mm/kasan/shadow.c#L491

> c) The bulk path I have not checked, but at a high level
>    kasan_populate_vmalloc() should take a breath between requests.

> --
> Uladzislau Rezki
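To make the patch-1 idea referenced above concrete, here is a simplified sketch of adding a resched point to the shadow-population callback (illustrative only; this is neither the posted patch nor the exact mm/kasan/shadow.c code). The callback is invoked by apply_to_page_range() once per shadow PTE with no spinlock held, so it may sleep:

	/*
	 * Simplified sketch of the patch-1 approach: let the per-PTE shadow
	 * population callback yield the CPU so a huge region cannot trip
	 * the soft lockup watchdog.
	 */
	static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
					      void *unused)
	{
		unsigned long page;

		/* Called via apply_to_page_range() with no spinlock held. */
		cond_resched();

		if (!pte_none(ptep_get(ptep)))
			return 0;	/* shadow page already mapped */

		page = __get_free_page(GFP_KERNEL);
		if (!page)
			return -ENOMEM;

		memset((void *)page, KASAN_VMALLOC_INVALID, PAGE_SIZE);

		spin_lock(&init_mm.page_table_lock);
		if (pte_none(ptep_get(ptep))) {
			set_pte_at(&init_mm, addr, ptep,
				   pfn_pte(PFN_DOWN(__pa(page)), PAGE_KERNEL));
			page = 0;
		}
		spin_unlock(&init_mm.page_table_lock);

		if (page)
			free_page(page);	/* another CPU won the race */

		return 0;
	}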