Message ID | 20201105170249.387069-1-minchan@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Revert "mm/vunmap: add cond_resched() in vunmap_pmd_range" | expand |
On Thu, Nov 05, 2020 at 09:02:49AM -0800, Minchan Kim wrote: > This reverts commit e47110e90584a22e9980510b00d0dfad3a83354e. > > While I was doing zram testing, I found sometimes decompression failed > since the compression buffer was corrupted. With investigation, > I found below commit calls cond_resched unconditionally so it could > make a problem in atomic context if the task is reschedule. I don't think you're supposed to call unmap_kernel_range() from atomic context. At least vfree() punts to __vfree_deferred() if in_interrupt() is true. I forget the original reason for why that is.
On Thu, Nov 05, 2020 at 05:16:02PM +0000, Matthew Wilcox wrote: > On Thu, Nov 05, 2020 at 09:02:49AM -0800, Minchan Kim wrote: > > This reverts commit e47110e90584a22e9980510b00d0dfad3a83354e. > > > > While I was doing zram testing, I found sometimes decompression failed > > since the compression buffer was corrupted. With investigation, > > I found below commit calls cond_resched unconditionally so it could > > make a problem in atomic context if the task is reschedule. > > I don't think you're supposed to call unmap_kernel_range() from > atomic context. At least vfree() punts to __vfree_deferred() if > in_interrupt() is true. I forget the original reason for why that is. It would be desirable, However, the logic was there for several years and made regression from the commit in stable kernel for now. Can't we have graceful shutdown if we want to deprecate the API usecase in atomic context?
On Thu, 5 Nov 2020 09:02:49 -0800 Minchan Kim <minchan@kernel.org> wrote: > This reverts commit e47110e90584a22e9980510b00d0dfad3a83354e. > > While I was doing zram testing, I found sometimes decompression failed > since the compression buffer was corrupted. With investigation, > I found below commit calls cond_resched unconditionally so it could > make a problem in atomic context if the task is reschedule. > > Revert the original commit for now. > > [ 55.109012] BUG: sleeping function called from invalid context at mm/vmalloc.c:108 > [ 55.110774] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog > [ 55.111973] 3 locks held by memhog/946: > [ 55.112807] #0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160 > [ 55.114151] #1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160 > [ 55.115848] #2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0 > [ 55.118947] CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty #316 > [ 55.121265] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 > [ 55.122540] Call Trace: > [ 55.122974] dump_stack+0x8b/0xb8 > [ 55.123588] ___might_sleep.cold+0xb6/0xc6 > [ 55.124328] unmap_kernel_range_noflush+0x2eb/0x350 > [ 55.125198] unmap_kernel_range+0x14/0x30 > [ 55.125920] zs_unmap_object+0xd5/0xe0 > [ 55.126604] zram_bvec_rw.isra.0+0x38c/0x8e0 > [ 55.127462] zram_rw_page+0x90/0x101 > [ 55.128199] bdev_write_page+0x92/0xe0 > [ 55.128957] ? swap_slot_free_notify+0xb0/0xb0 > [ 55.129841] __swap_writepage+0x94/0x4a0 > [ 55.130636] ? do_raw_spin_unlock+0x4b/0xa0 > [ 55.131462] ? _raw_spin_unlock+0x1f/0x30 > [ 55.132261] ? page_swapcount+0x6c/0x90 > [ 55.133038] pageout+0xe3/0x3a0 > [ 55.133702] shrink_page_list+0xb94/0xd60 > [ 55.134626] shrink_inactive_list+0x158/0x460 > > ... > > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -102,8 +102,6 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, > if (pmd_none_or_clear_bad(pmd)) > continue; > vunmap_pte_range(pmd, addr, next, mask); > - > - cond_resched(); > } while (pmd++, addr = next, addr != end); > } If this is triggering a warning then why isn't the might_sleep() in remove_vm_area() also triggering? Sigh. I also cannot remember why these vfree() functions have to be so awkward. The mutex_trylock(&vmap_purge_lock) isn't permitted in interrupt context because mutex_trylock() is stupid, but what was the issue with non-interrupt atomic code?
Hi Andrew, On Fri, Nov 06, 2020 at 05:59:33PM -0800, Andrew Morton wrote: > On Thu, 5 Nov 2020 09:02:49 -0800 Minchan Kim <minchan@kernel.org> wrote: > > > This reverts commit e47110e90584a22e9980510b00d0dfad3a83354e. > > > > While I was doing zram testing, I found sometimes decompression failed > > since the compression buffer was corrupted. With investigation, > > I found below commit calls cond_resched unconditionally so it could > > make a problem in atomic context if the task is reschedule. > > > > Revert the original commit for now. > > > > [ 55.109012] BUG: sleeping function called from invalid context at mm/vmalloc.c:108 > > [ 55.110774] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog > > [ 55.111973] 3 locks held by memhog/946: > > [ 55.112807] #0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160 > > [ 55.114151] #1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160 > > [ 55.115848] #2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0 > > [ 55.118947] CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty #316 > > [ 55.121265] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 > > [ 55.122540] Call Trace: > > [ 55.122974] dump_stack+0x8b/0xb8 > > [ 55.123588] ___might_sleep.cold+0xb6/0xc6 > > [ 55.124328] unmap_kernel_range_noflush+0x2eb/0x350 > > [ 55.125198] unmap_kernel_range+0x14/0x30 > > [ 55.125920] zs_unmap_object+0xd5/0xe0 > > [ 55.126604] zram_bvec_rw.isra.0+0x38c/0x8e0 > > [ 55.127462] zram_rw_page+0x90/0x101 > > [ 55.128199] bdev_write_page+0x92/0xe0 > > [ 55.128957] ? swap_slot_free_notify+0xb0/0xb0 > > [ 55.129841] __swap_writepage+0x94/0x4a0 > > [ 55.130636] ? do_raw_spin_unlock+0x4b/0xa0 > > [ 55.131462] ? _raw_spin_unlock+0x1f/0x30 > > [ 55.132261] ? page_swapcount+0x6c/0x90 > > [ 55.133038] pageout+0xe3/0x3a0 > > [ 55.133702] shrink_page_list+0xb94/0xd60 > > [ 55.134626] shrink_inactive_list+0x158/0x460 > > > > ... > > > > --- a/mm/vmalloc.c > > +++ b/mm/vmalloc.c > > @@ -102,8 +102,6 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, > > if (pmd_none_or_clear_bad(pmd)) > > continue; > > vunmap_pte_range(pmd, addr, next, mask); > > - > > - cond_resched(); > > } while (pmd++, addr = next, addr != end); > > } > > If this is triggering a warning then why isn't the might_sleep() in > remove_vm_area() also triggering? I don't understand what specific callpath you are talking but if it's clearly called in atomic context, the reason would be config combination I met. CONFIG_PREEMPT_VOLUNTARY + no CONFIG_DEBUG_ATOMIC_SLEEP It makes preempt_count logic void so might_sleep warning will not work. > > Sigh. I also cannot remember why these vfree() functions have to be so > awkward. The mutex_trylock(&vmap_purge_lock) isn't permitted in > interrupt context because mutex_trylock() is stupid, but what was the > issue with non-interrupt atomic code? > Seems like a latency issue. commit f9e09977671b Author: Christoph Hellwig <hch@lst.de> Date: Mon Dec 12 16:44:23 2016 -0800 mm: turn vmap_purge_lock into a mutex The purge_lock spinlock causes high latencies with non RT kernel,
On Sat, Nov 07, 2020 at 12:39:39AM -0800, Minchan Kim wrote: > Hi Andrew, > > On Fri, Nov 06, 2020 at 05:59:33PM -0800, Andrew Morton wrote: > > On Thu, 5 Nov 2020 09:02:49 -0800 Minchan Kim <minchan@kernel.org> wrote: > > > > > This reverts commit e47110e90584a22e9980510b00d0dfad3a83354e. > > > > > > While I was doing zram testing, I found sometimes decompression failed > > > since the compression buffer was corrupted. With investigation, > > > I found below commit calls cond_resched unconditionally so it could > > > make a problem in atomic context if the task is reschedule. > > > > > > Revert the original commit for now. > > > > > > [ 55.109012] BUG: sleeping function called from invalid context at mm/vmalloc.c:108 > > > [ 55.110774] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog > > > [ 55.111973] 3 locks held by memhog/946: > > > [ 55.112807] #0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160 > > > [ 55.114151] #1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160 > > > [ 55.115848] #2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0 > > > [ 55.118947] CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty #316 > > > [ 55.121265] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 > > > [ 55.122540] Call Trace: > > > [ 55.122974] dump_stack+0x8b/0xb8 > > > [ 55.123588] ___might_sleep.cold+0xb6/0xc6 > > > [ 55.124328] unmap_kernel_range_noflush+0x2eb/0x350 > > > [ 55.125198] unmap_kernel_range+0x14/0x30 > > > [ 55.125920] zs_unmap_object+0xd5/0xe0 > > > [ 55.126604] zram_bvec_rw.isra.0+0x38c/0x8e0 > > > [ 55.127462] zram_rw_page+0x90/0x101 > > > [ 55.128199] bdev_write_page+0x92/0xe0 > > > [ 55.128957] ? swap_slot_free_notify+0xb0/0xb0 > > > [ 55.129841] __swap_writepage+0x94/0x4a0 > > > [ 55.130636] ? do_raw_spin_unlock+0x4b/0xa0 > > > [ 55.131462] ? _raw_spin_unlock+0x1f/0x30 > > > [ 55.132261] ? page_swapcount+0x6c/0x90 > > > [ 55.133038] pageout+0xe3/0x3a0 > > > [ 55.133702] shrink_page_list+0xb94/0xd60 > > > [ 55.134626] shrink_inactive_list+0x158/0x460 > > > > > > ... > > > > > > --- a/mm/vmalloc.c > > > +++ b/mm/vmalloc.c > > > @@ -102,8 +102,6 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, > > > if (pmd_none_or_clear_bad(pmd)) > > > continue; > > > vunmap_pte_range(pmd, addr, next, mask); > > > - > > > - cond_resched(); > > > } while (pmd++, addr = next, addr != end); > > > } > > > > If this is triggering a warning then why isn't the might_sleep() in > > remove_vm_area() also triggering? > > I don't understand what specific callpath you are talking but if > it's clearly called in atomic context, the reason would be config > combination I met. > > CONFIG_PREEMPT_VOLUNTARY + no CONFIG_DEBUG_ATOMIC_SLEEP > > It makes preempt_count logic void so might_sleep warning will not work. > > > > > Sigh. I also cannot remember why these vfree() functions have to be so > > awkward. The mutex_trylock(&vmap_purge_lock) isn't permitted in > > interrupt context because mutex_trylock() is stupid, but what was the > > issue with non-interrupt atomic code? > > > > Seems like a latency issue. > > commit f9e09977671b > Author: Christoph Hellwig <hch@lst.de> > Date: Mon Dec 12 16:44:23 2016 -0800 > > mm: turn vmap_purge_lock into a mutex > > The purge_lock spinlock causes high latencies with non RT kernel, > __purge_vmap_area_lazy() is our bottleneck. The list of areas to be drained can be quite big, so the process itself is time consuming. That is why it is protected by the mutex instead of spinlock. Latency issues. I proposed to optimize it here: https://lkml.org/lkml/2020/9/7/815 I will send out the patch soon, but it is another topic not related to this issue. -- Vlad Rezki
Hi Andrew, How should we proceed this problem? On Sat, Nov 07, 2020 at 12:39:39AM -0800, Minchan Kim wrote: > Hi Andrew, > > On Fri, Nov 06, 2020 at 05:59:33PM -0800, Andrew Morton wrote: > > On Thu, 5 Nov 2020 09:02:49 -0800 Minchan Kim <minchan@kernel.org> wrote: > > > > > This reverts commit e47110e90584a22e9980510b00d0dfad3a83354e. > > > > > > While I was doing zram testing, I found sometimes decompression failed > > > since the compression buffer was corrupted. With investigation, > > > I found below commit calls cond_resched unconditionally so it could > > > make a problem in atomic context if the task is reschedule. > > > > > > Revert the original commit for now. > > > > > > [ 55.109012] BUG: sleeping function called from invalid context at mm/vmalloc.c:108 > > > [ 55.110774] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog > > > [ 55.111973] 3 locks held by memhog/946: > > > [ 55.112807] #0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160 > > > [ 55.114151] #1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160 > > > [ 55.115848] #2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0 > > > [ 55.118947] CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty #316 > > > [ 55.121265] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 > > > [ 55.122540] Call Trace: > > > [ 55.122974] dump_stack+0x8b/0xb8 > > > [ 55.123588] ___might_sleep.cold+0xb6/0xc6 > > > [ 55.124328] unmap_kernel_range_noflush+0x2eb/0x350 > > > [ 55.125198] unmap_kernel_range+0x14/0x30 > > > [ 55.125920] zs_unmap_object+0xd5/0xe0 > > > [ 55.126604] zram_bvec_rw.isra.0+0x38c/0x8e0 > > > [ 55.127462] zram_rw_page+0x90/0x101 > > > [ 55.128199] bdev_write_page+0x92/0xe0 > > > [ 55.128957] ? swap_slot_free_notify+0xb0/0xb0 > > > [ 55.129841] __swap_writepage+0x94/0x4a0 > > > [ 55.130636] ? do_raw_spin_unlock+0x4b/0xa0 > > > [ 55.131462] ? _raw_spin_unlock+0x1f/0x30 > > > [ 55.132261] ? page_swapcount+0x6c/0x90 > > > [ 55.133038] pageout+0xe3/0x3a0 > > > [ 55.133702] shrink_page_list+0xb94/0xd60 > > > [ 55.134626] shrink_inactive_list+0x158/0x460 > > > > > > ... > > > > > > --- a/mm/vmalloc.c > > > +++ b/mm/vmalloc.c > > > @@ -102,8 +102,6 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, > > > if (pmd_none_or_clear_bad(pmd)) > > > continue; > > > vunmap_pte_range(pmd, addr, next, mask); > > > - > > > - cond_resched(); > > > } while (pmd++, addr = next, addr != end); > > > } > > > > If this is triggering a warning then why isn't the might_sleep() in > > remove_vm_area() also triggering? > > I don't understand what specific callpath you are talking but if > it's clearly called in atomic context, the reason would be config > combination I met. > > CONFIG_PREEMPT_VOLUNTARY + no CONFIG_DEBUG_ATOMIC_SLEEP > > It makes preempt_count logic void so might_sleep warning will not work. > > > > > Sigh. I also cannot remember why these vfree() functions have to be so > > awkward. The mutex_trylock(&vmap_purge_lock) isn't permitted in > > interrupt context because mutex_trylock() is stupid, but what was the > > issue with non-interrupt atomic code? > > > > Seems like a latency issue. > > commit f9e09977671b > Author: Christoph Hellwig <hch@lst.de> > Date: Mon Dec 12 16:44:23 2016 -0800 > > mm: turn vmap_purge_lock into a mutex > > The purge_lock spinlock causes high latencies with non RT kernel, >
On Thu, 12 Nov 2020 12:01:01 -0800 Minchan Kim <minchan@kernel.org> wrote: > > On Sat, Nov 07, 2020 at 12:39:39AM -0800, Minchan Kim wrote: > > Hi Andrew, > > > > On Fri, Nov 06, 2020 at 05:59:33PM -0800, Andrew Morton wrote: > > > On Thu, 5 Nov 2020 09:02:49 -0800 Minchan Kim <minchan@kernel.org> wrote: > > > > > > > This reverts commit e47110e90584a22e9980510b00d0dfad3a83354e. > > > > > > > > While I was doing zram testing, I found sometimes decompression failed > > > > since the compression buffer was corrupted. With investigation, > > > > I found below commit calls cond_resched unconditionally so it could > > > > make a problem in atomic context if the task is reschedule. > > > > > > > > Revert the original commit for now. > > How should we proceed this problem? > (top-posting repaired - please don't). Well, we don't want to reintroduce the softlockup reports which e47110e90584a2 fixed, and we certainly don't want to do that on behalf of code which is using the unmap_kernel_range() interface incorrectly. So I suggest either a) make zs_unmap_object() stop doing the unmapping from atomic context or b) figure out whether the vmalloc unmap code is *truly* unsafe from atomic context - perhaps it is only unsafe from interrupt context, in which case we can rework the vmalloc.c checks to reflect this, or c) make the vfree code callable from all contexts. Which is by far the preferred solution, but may be tough. Or maybe not so tough - if the only problem in the vmalloc code is the use of mutex_trylock() from irqs then it may be as simple as switching to old-style semaphores and using down_trylock(), which I think is irq-safe. However old-style semaphores are deprecated. A hackyish approach might be to use an rwsem always in down_write mode and use down_write_trylock(), which I think is also callable from interrrupt context. But I have a feeling that there are other reasons why vfree shouldn't be called from atomic context, apart from the mutex_trylock-in-irq issue.
On Thu, Nov 12, 2020 at 02:49:19PM -0800, Andrew Morton wrote: > On Thu, 12 Nov 2020 12:01:01 -0800 Minchan Kim <minchan@kernel.org> wrote: > > > > > On Sat, Nov 07, 2020 at 12:39:39AM -0800, Minchan Kim wrote: > > > Hi Andrew, > > > > > > On Fri, Nov 06, 2020 at 05:59:33PM -0800, Andrew Morton wrote: > > > > On Thu, 5 Nov 2020 09:02:49 -0800 Minchan Kim <minchan@kernel.org> wrote: > > > > > > > > > This reverts commit e47110e90584a22e9980510b00d0dfad3a83354e. > > > > > > > > > > While I was doing zram testing, I found sometimes decompression failed > > > > > since the compression buffer was corrupted. With investigation, > > > > > I found below commit calls cond_resched unconditionally so it could > > > > > make a problem in atomic context if the task is reschedule. > > > > > > > > > > Revert the original commit for now. > > > > How should we proceed this problem? > > > > (top-posting repaired - please don't). > > Well, we don't want to reintroduce the softlockup reports which > e47110e90584a2 fixed, and we certainly don't want to do that on behalf > of code which is using the unmap_kernel_range() interface incorrectly. > > So I suggest either > > a) make zs_unmap_object() stop doing the unmapping from atomic context or It's not easy since the path already hold several spin_locks as well as per-cpu context. I could pursuit the direction but it takes several steps to change entire locking scheme in the zsmalloc, which will take time(we couldn't leave zsmalloc broken until then) and hard to land on stable tree. > > b) figure out whether the vmalloc unmap code is *truly* unsafe from > atomic context - perhaps it is only unsafe from interrupt context, > in which case we can rework the vmalloc.c checks to reflect this, or I don't get the point. I assume your suggestion would be "let's make the vunmap code atomic context safe" but how could it solve this problem? The point from e47110e90584a2 was softlockup could be triggered if vunamp deal with large mapping so need *explict reschedule* point for CONFIG_PREEMPT_VOLUNTARY. However, CONFIG_PREEMPT_VOLUNTARY doesn't consider peempt count so even though we could make vunamp atomic safe to make a call under spin_lock: spin_lock(&A); vunmap vunmap_pmd_range cond_resched <- bang Below options would have same problem, too. Let me know if I misunderstand something. > > c) make the vfree code callable from all contexts. Which is by far > the preferred solution, but may be tough. > > > Or maybe not so tough - if the only problem in the vmalloc code is the > use of mutex_trylock() from irqs then it may be as simple as switching > to old-style semaphores and using down_trylock(), which I think is > irq-safe. > > However old-style semaphores are deprecated. A hackyish approach might > be to use an rwsem always in down_write mode and use > down_write_trylock(), which I think is also callable from interrrupt > context. > > But I have a feeling that there are other reasons why vfree shouldn't > be called from atomic context, apart from the mutex_trylock-in-irq > issue.
On Fri, Nov 13, 2020 at 08:25:29AM -0800, Minchan Kim wrote: > On Thu, Nov 12, 2020 at 02:49:19PM -0800, Andrew Morton wrote: > > On Thu, 12 Nov 2020 12:01:01 -0800 Minchan Kim <minchan@kernel.org> wrote: > > > > > > > > On Sat, Nov 07, 2020 at 12:39:39AM -0800, Minchan Kim wrote: > > > > Hi Andrew, > > > > > > > > On Fri, Nov 06, 2020 at 05:59:33PM -0800, Andrew Morton wrote: > > > > > On Thu, 5 Nov 2020 09:02:49 -0800 Minchan Kim <minchan@kernel.org> wrote: > > > > > > > > > > > This reverts commit e47110e90584a22e9980510b00d0dfad3a83354e. > > > > > > > > > > > > While I was doing zram testing, I found sometimes decompression failed > > > > > > since the compression buffer was corrupted. With investigation, > > > > > > I found below commit calls cond_resched unconditionally so it could > > > > > > make a problem in atomic context if the task is reschedule. > > > > > > > > > > > > Revert the original commit for now. > > > > > > How should we proceed this problem? > > > > > > > (top-posting repaired - please don't). > > > > Well, we don't want to reintroduce the softlockup reports which > > e47110e90584a2 fixed, and we certainly don't want to do that on behalf > > of code which is using the unmap_kernel_range() interface incorrectly. > > > > So I suggest either > > > > a) make zs_unmap_object() stop doing the unmapping from atomic context or > > It's not easy since the path already hold several spin_locks as well as > per-cpu context. I could pursuit the direction but it takes several > steps to change entire locking scheme in the zsmalloc, which will > take time(we couldn't leave zsmalloc broken until then) and hard to > land on stable tree. > > > > > b) figure out whether the vmalloc unmap code is *truly* unsafe from > > atomic context - perhaps it is only unsafe from interrupt context, > > in which case we can rework the vmalloc.c checks to reflect this, or > > I don't get the point. I assume your suggestion would be "let's make the > vunmap code atomic context safe" but how could it solve this problem? > > The point from e47110e90584a2 was softlockup could be triggered if > vunamp deal with large mapping so need *explict reschedule* point > for CONFIG_PREEMPT_VOLUNTARY. However, CONFIG_PREEMPT_VOLUNTARY > doesn't consider peempt count so even though we could make vunamp > atomic safe to make a call under spin_lock: > > spin_lock(&A); > vunmap > vunmap_pmd_range > cond_resched <- bang > > Below options would have same problem, too. > Let me know if I misunderstand something. > > > > > c) make the vfree code callable from all contexts. Which is by far > > the preferred solution, but may be tough. > > > > > > Or maybe not so tough - if the only problem in the vmalloc code is the > > use of mutex_trylock() from irqs then it may be as simple as switching > > to old-style semaphores and using down_trylock(), which I think is > > irq-safe. > > > > However old-style semaphores are deprecated. A hackyish approach might > > be to use an rwsem always in down_write mode and use > > down_write_trylock(), which I think is also callable from interrrupt > > context. > > > > But I have a feeling that there are other reasons why vfree shouldn't > > be called from atomic context, apart from the mutex_trylock-in-irq > > issue. How about this? From 0733bc468d0072147c2ecf998d7cc3af2234bc87 Mon Sep 17 00:00:00 2001 From: Minchan Kim <minchan@kernel.org> Date: Mon, 16 Nov 2020 09:38:40 -0800 Subject: [PATCH] mm: unmap_kernel_range_atomic unmap_kernel_range had been atomic operation and zsmalloc has used it in atomic context in zs_unmap_object. However, ("e47110e90584, mm/vunmap: add cond_resched() in vunmap_pmd_range") changed it into non-atomic operation via adding cond_resched. It causes zram decompresion failure by corrupting compressed buffer in atomic context. This patch introduces unmap_kernel_range_atomic which works for only range less than PMD_SIZE to prevent cond_resched call. Signed-off-by: Minchan Kim <minchan@kernel.org> --- include/linux/vmalloc.h | 2 ++ mm/vmalloc.c | 23 +++++++++++++++++++++-- mm/zsmalloc.c | 2 +- 3 files changed, 24 insertions(+), 3 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 938eaf9517e2..36b1ecc2d014 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -180,6 +180,7 @@ int map_kernel_range(unsigned long start, unsigned long size, pgprot_t prot, struct page **pages); extern void unmap_kernel_range_noflush(unsigned long addr, unsigned long size); extern void unmap_kernel_range(unsigned long addr, unsigned long size); +extern void unmap_kernel_range_atomic(unsigned long addr, unsigned long size); static inline void set_vm_flush_reset_perms(void *addr) { struct vm_struct *vm = find_vm_area(addr); @@ -200,6 +201,7 @@ unmap_kernel_range_noflush(unsigned long addr, unsigned long size) { } #define unmap_kernel_range unmap_kernel_range_noflush +#define unmap_kernel_range_atomic unmap_kernel_range_noflush static inline void set_vm_flush_reset_perms(void *addr) { } diff --git a/mm/vmalloc.c b/mm/vmalloc.c index d7075ad340aa..714e5425dc45 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -88,6 +88,7 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, pmd_t *pmd; unsigned long next; int cleared; + bool check_resched = (end - addr) > PMD_SIZE; pmd = pmd_offset(pud, addr); do { @@ -102,8 +103,8 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, if (pmd_none_or_clear_bad(pmd)) continue; vunmap_pte_range(pmd, addr, next, mask); - - cond_resched(); + if (check_resched) + cond_resched(); } while (pmd++, addr = next, addr != end); } @@ -2024,6 +2025,24 @@ void unmap_kernel_range(unsigned long addr, unsigned long size) flush_tlb_kernel_range(addr, end); } +/** + * unmap_kernel_range_atomic - unmap kernel VM area and flush cache and TLB + * @addr: start of the VM area to unmap + * @size: size of the VM area to unmap + * + * Similar to unmap_kernel_range_noflush() but it's atomic. @size should be + * less than PMD_SIZE. + */ +void unmap_kernel_range_atomic(unsigned long addr, unsigned long size) +{ + unsigned long end = addr + size; + + flush_cache_vunmap(addr, end); + WARN_ON(size > PMD_SIZE); + unmap_kernel_range_noflush(addr, size); + flush_tlb_kernel_range(addr, end); +} + static inline void setup_vmalloc_vm_locked(struct vm_struct *vm, struct vmap_area *va, unsigned long flags, const void *caller) { diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 662ee420706f..9decc7634852 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1154,7 +1154,7 @@ static inline void __zs_unmap_object(struct mapping_area *area, { unsigned long addr = (unsigned long)area->vm_addr; - unmap_kernel_range(addr, PAGE_SIZE * 2); + unmap_kernel_range_atomic(addr, PAGE_SIZE * 2); } #else /* CONFIG_ZSMALLOC_PGTABLE_MAPPING */
Let's cc Uladzislau on vmalloc things? > How about this? Well, lol, that's a simple approach to avoiding the problem ;) > unmap_kernel_range had been atomic operation and zsmalloc has used > it in atomic context in zs_unmap_object. > However, ("e47110e90584, mm/vunmap: add cond_resched() in vunmap_pmd_range") > changed it into non-atomic operation via adding cond_resched. > It causes zram decompresion failure by corrupting compressed buffer > in atomic context. > > This patch introduces unmap_kernel_range_atomic which works for > only range less than PMD_SIZE to prevent cond_resched call. > > ... > > --- a/include/linux/vmalloc.h > +++ b/include/linux/vmalloc.h > @@ -180,6 +180,7 @@ int map_kernel_range(unsigned long start, unsigned long size, pgprot_t prot, > struct page **pages); > extern void unmap_kernel_range_noflush(unsigned long addr, unsigned long size); > extern void unmap_kernel_range(unsigned long addr, unsigned long size); > +extern void unmap_kernel_range_atomic(unsigned long addr, unsigned long size); > static inline void set_vm_flush_reset_perms(void *addr) > { > struct vm_struct *vm = find_vm_area(addr); > @@ -200,6 +201,7 @@ unmap_kernel_range_noflush(unsigned long addr, unsigned long size) > { > } > #define unmap_kernel_range unmap_kernel_range_noflush > +#define unmap_kernel_range_atomic unmap_kernel_range_noflush > static inline void set_vm_flush_reset_perms(void *addr) > { > } > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index d7075ad340aa..714e5425dc45 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -88,6 +88,7 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, > pmd_t *pmd; > unsigned long next; > int cleared; > + bool check_resched = (end - addr) > PMD_SIZE; > > pmd = pmd_offset(pud, addr); > do { > @@ -102,8 +103,8 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, > if (pmd_none_or_clear_bad(pmd)) > continue; > vunmap_pte_range(pmd, addr, next, mask); > - > - cond_resched(); > + if (check_resched) > + cond_resched(); > } while (pmd++, addr = next, addr != end); > } > > @@ -2024,6 +2025,24 @@ void unmap_kernel_range(unsigned long addr, unsigned long size) > flush_tlb_kernel_range(addr, end); > } > > +/** > + * unmap_kernel_range_atomic - unmap kernel VM area and flush cache and TLB > + * @addr: start of the VM area to unmap > + * @size: size of the VM area to unmap > + * > + * Similar to unmap_kernel_range_noflush() but it's atomic. @size should be > + * less than PMD_SIZE. > + */ > +void unmap_kernel_range_atomic(unsigned long addr, unsigned long size) > +{ > + unsigned long end = addr + size; > + > + flush_cache_vunmap(addr, end); > + WARN_ON(size > PMD_SIZE); WARN_ON_ONCE() would be better here - no point in creating a million warnings where one would suffice. > + unmap_kernel_range_noflush(addr, size); > + flush_tlb_kernel_range(addr, end); > +} > + > static inline void setup_vmalloc_vm_locked(struct vm_struct *vm, > struct vmap_area *va, unsigned long flags, const void *caller) > { > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > index 662ee420706f..9decc7634852 100644 > --- a/mm/zsmalloc.c > +++ b/mm/zsmalloc.c > @@ -1154,7 +1154,7 @@ static inline void __zs_unmap_object(struct mapping_area *area, > { > unsigned long addr = (unsigned long)area->vm_addr; > > - unmap_kernel_range(addr, PAGE_SIZE * 2); > + unmap_kernel_range_atomic(addr, PAGE_SIZE * 2); > } I suppose we could live with it if no better solutions are forthcoming.
Btw, I remember that the whole vmalloc magic in zsmalloc was only giving a small benefit for a few niche use cases. Given that it generally has very strange interaction with the vmalloc core, including using various APIs not used by any driver I'm going to ask once again why we can't just drop the CONFIG_ZSMALLOC_PGTABLE_MAPPING case entirely?
> > Let's cc Uladzislau on vmalloc things? > > > How about this? > > Well, lol, that's a simple approach to avoiding the problem ;) > To me it looks like a specific workaround for a specific one user. > > unmap_kernel_range had been atomic operation and zsmalloc has used > > it in atomic context in zs_unmap_object. > > However, ("e47110e90584, mm/vunmap: add cond_resched() in vunmap_pmd_range") > > changed it into non-atomic operation via adding cond_resched. > > It causes zram decompresion failure by corrupting compressed buffer > > in atomic context. > > > > This patch introduces unmap_kernel_range_atomic which works for > > only range less than PMD_SIZE to prevent cond_resched call. > > > > ... > > > > --- a/include/linux/vmalloc.h > > +++ b/include/linux/vmalloc.h > > @@ -180,6 +180,7 @@ int map_kernel_range(unsigned long start, unsigned long size, pgprot_t prot, > > struct page **pages); > > extern void unmap_kernel_range_noflush(unsigned long addr, unsigned long size); > > extern void unmap_kernel_range(unsigned long addr, unsigned long size); > > +extern void unmap_kernel_range_atomic(unsigned long addr, unsigned long size); > > static inline void set_vm_flush_reset_perms(void *addr) > > { > > struct vm_struct *vm = find_vm_area(addr); > > @@ -200,6 +201,7 @@ unmap_kernel_range_noflush(unsigned long addr, unsigned long size) > > { > > } > > #define unmap_kernel_range unmap_kernel_range_noflush > > +#define unmap_kernel_range_atomic unmap_kernel_range_noflush > > static inline void set_vm_flush_reset_perms(void *addr) > > { > > } > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > > index d7075ad340aa..714e5425dc45 100644 > > --- a/mm/vmalloc.c > > +++ b/mm/vmalloc.c > > @@ -88,6 +88,7 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, > > pmd_t *pmd; > > unsigned long next; > > int cleared; > > + bool check_resched = (end - addr) > PMD_SIZE; > > > > pmd = pmd_offset(pud, addr); > > do { > > @@ -102,8 +103,8 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, > > if (pmd_none_or_clear_bad(pmd)) > > continue; > > vunmap_pte_range(pmd, addr, next, mask); > > - > > - cond_resched(); > > + if (check_resched) > > + cond_resched(); > > } while (pmd++, addr = next, addr != end); > > } > > > > @@ -2024,6 +2025,24 @@ void unmap_kernel_range(unsigned long addr, unsigned long size) > > flush_tlb_kernel_range(addr, end); > > } > > > > +/** > > + * unmap_kernel_range_atomic - unmap kernel VM area and flush cache and TLB > > + * @addr: start of the VM area to unmap > > + * @size: size of the VM area to unmap > > + * > > + * Similar to unmap_kernel_range_noflush() but it's atomic. @size should be > > + * less than PMD_SIZE. > > + */ > > +void unmap_kernel_range_atomic(unsigned long addr, unsigned long size) > > +{ > > + unsigned long end = addr + size; > > + > > + flush_cache_vunmap(addr, end); > > + WARN_ON(size > PMD_SIZE); > > WARN_ON_ONCE() would be better here - no point in creating a million > warnings where one would suffice. > > > + unmap_kernel_range_noflush(addr, size); > > + flush_tlb_kernel_range(addr, end); > > +} > > + > > static inline void setup_vmalloc_vm_locked(struct vm_struct *vm, > > struct vmap_area *va, unsigned long flags, const void *caller) > > { > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c > > index 662ee420706f..9decc7634852 100644 > > --- a/mm/zsmalloc.c > > +++ b/mm/zsmalloc.c > > @@ -1154,7 +1154,7 @@ static inline void __zs_unmap_object(struct mapping_area *area, > > { > > unsigned long addr = (unsigned long)area->vm_addr; > > > > - unmap_kernel_range(addr, PAGE_SIZE * 2); > > + unmap_kernel_range_atomic(addr, PAGE_SIZE * 2); > > } > > I suppose we could live with it if no better solutions are forthcoming. > Maybe solve it on zsmalloc side? For example to add __zs_unmap_object_deferred(), so it schedules the work that calls unmap_kernel_range() on a list of mapping_area objects. -- Vlad Rezki
On Tue, Nov 17, 2020 at 01:56:32PM +0000, Christoph Hellwig wrote: > Btw, I remember that the whole vmalloc magic in zsmalloc was only giving > a small benefit for a few niche use cases. Given that it generally has > very strange interaction with the vmalloc core, including using various > APIs not used by any driver I'm going to ask once again why we can't > just drop the CONFIG_ZSMALLOC_PGTABLE_MAPPING case entirely? Yub, I remeber the discussion. https://lore.kernel.org/linux-mm/20200416203736.GB50092@google.com/ I wanted to remove it but 30% gain made me think again before deciding to drop it. Since it continue to make problems and Linux is approaching to deprecate the 32bit machines, I think it would be better to drop it rather than inventing weird workaround. Ccing Tony since omap2plus have used it by default for just in case. From fc1b17a120991fd86b9e1153ab22d0b0bdadd8d0 Mon Sep 17 00:00:00 2001 From: Minchan Kim <minchan@kernel.org> Date: Tue, 17 Nov 2020 11:58:51 -0800 Subject: [PATCH] mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING Even though this option showed some amount improvement(e.g., 30%) in some arm32 platforms, it has been headache to maintain since it have abused APIs[1](e.g., unmap_kernel_range in atomic context). Since we are approaching to deprecate 32bit machines and already made the config option available for only builtin build since v5.8, lastly it has been not default option in zsmalloc, it's time to drop the option for better maintainance. [1] http://lore.kernel.org/linux-mm/20201105170249.387069-1-minchan@kernel.org Cc: Tony Lindgren <tony@atomide.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: <stable@vger.kernel.org> Fixes: e47110e90584 ("mm/vunmap: add cond_resched() in vunmap_pmd_range") Signed-off-by: Minchan Kim <minchan@kernel.org> --- arch/arm/configs/omap2plus_defconfig | 1 - include/linux/zsmalloc.h | 1 - mm/Kconfig | 13 ------- mm/zsmalloc.c | 54 ---------------------------- 4 files changed, 69 deletions(-) diff --git a/arch/arm/configs/omap2plus_defconfig b/arch/arm/configs/omap2plus_defconfig index 77716f500813..70ee84a314c8 100644 --- a/arch/arm/configs/omap2plus_defconfig +++ b/arch/arm/configs/omap2plus_defconfig @@ -81,7 +81,6 @@ CONFIG_PARTITION_ADVANCED=y CONFIG_BINFMT_MISC=y CONFIG_CMA=y CONFIG_ZSMALLOC=m -CONFIG_ZSMALLOC_PGTABLE_MAPPING=y CONFIG_NET=y CONFIG_PACKET=y CONFIG_UNIX=y diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h index 0fdbf653b173..4807ca4d52e0 100644 --- a/include/linux/zsmalloc.h +++ b/include/linux/zsmalloc.h @@ -20,7 +20,6 @@ * zsmalloc mapping modes * * NOTE: These only make a difference when a mapped object spans pages. - * They also have no effect when ZSMALLOC_PGTABLE_MAPPING is selected. */ enum zs_mapmode { ZS_MM_RW, /* normal read-write mapping */ diff --git a/mm/Kconfig b/mm/Kconfig index 29904dc16bfc..7af1a55b708e 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -707,19 +707,6 @@ config ZSMALLOC returned by an alloc(). This handle must be mapped in order to access the allocated space. -config ZSMALLOC_PGTABLE_MAPPING - bool "Use page table mapping to access object in zsmalloc" - depends on ZSMALLOC=y - help - By default, zsmalloc uses a copy-based object mapping method to - access allocations that span two pages. However, if a particular - architecture (ex, ARM) performs VM mapping faster than copying, - then you should select this. This causes zsmalloc to use page table - mapping rather than copying for object mapping. - - You can check speed with zsmalloc benchmark: - https://github.com/spartacus06/zsmapbench - config ZSMALLOC_STAT bool "Export zsmalloc statistics" depends on ZSMALLOC diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index b03bee2e1b5f..7289f502ffac 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -293,11 +293,7 @@ struct zspage { }; struct mapping_area { -#ifdef CONFIG_ZSMALLOC_PGTABLE_MAPPING - struct vm_struct *vm; /* vm area for mapping object that span pages */ -#else char *vm_buf; /* copy buffer for objects that span pages */ -#endif char *vm_addr; /* address of kmap_atomic()'ed pages */ enum zs_mapmode vm_mm; /* mapping mode */ }; @@ -1110,54 +1106,6 @@ static struct zspage *find_get_zspage(struct size_class *class) return zspage; } -#ifdef CONFIG_ZSMALLOC_PGTABLE_MAPPING -static inline int __zs_cpu_up(struct mapping_area *area) -{ - /* - * Make sure we don't leak memory if a cpu UP notification - * and zs_init() race and both call zs_cpu_up() on the same cpu - */ - if (area->vm) - return 0; - area->vm = get_vm_area(PAGE_SIZE * 2, 0); - if (!area->vm) - return -ENOMEM; - - /* - * Populate ptes in advance to avoid pte allocation with GFP_KERNEL - * in non-preemtible context of zs_map_object. - */ - return apply_to_page_range(&init_mm, (unsigned long)area->vm->addr, - PAGE_SIZE * 2, NULL, NULL); -} - -static inline void __zs_cpu_down(struct mapping_area *area) -{ - if (area->vm) - free_vm_area(area->vm); - area->vm = NULL; -} - -static inline void *__zs_map_object(struct mapping_area *area, - struct page *pages[2], int off, int size) -{ - unsigned long addr = (unsigned long)area->vm->addr; - - BUG_ON(map_kernel_range(addr, PAGE_SIZE * 2, PAGE_KERNEL, pages) < 0); - area->vm_addr = area->vm->addr; - return area->vm_addr + off; -} - -static inline void __zs_unmap_object(struct mapping_area *area, - struct page *pages[2], int off, int size) -{ - unsigned long addr = (unsigned long)area->vm_addr; - - unmap_kernel_range(addr, PAGE_SIZE * 2); -} - -#else /* CONFIG_ZSMALLOC_PGTABLE_MAPPING */ - static inline int __zs_cpu_up(struct mapping_area *area) { /* @@ -1238,8 +1186,6 @@ static void __zs_unmap_object(struct mapping_area *area, pagefault_enable(); } -#endif /* CONFIG_ZSMALLOC_PGTABLE_MAPPING */ - static int zs_cpu_prepare(unsigned int cpu) { struct mapping_area *area;
On (20/11/17 12:29), Minchan Kim wrote: > Yub, I remeber the discussion. > https://lore.kernel.org/linux-mm/20200416203736.GB50092@google.com/ > > I wanted to remove it but 30% gain made me think again before > deciding to drop it. > Since it continue to make problems and Linux is approaching to > deprecate the 32bit machines, I think it would be better to drop it > rather than inventing weird workaround. > > Ccing Tony since omap2plus have used it by default for just in case. > > From fc1b17a120991fd86b9e1153ab22d0b0bdadd8d0 Mon Sep 17 00:00:00 2001 > From: Minchan Kim <minchan@kernel.org> > Date: Tue, 17 Nov 2020 11:58:51 -0800 > Subject: [PATCH] mm/zsmalloc.c: drop ZSMALLOC_PGTABLE_MAPPING > > Even though this option showed some amount improvement(e.g., 30%) > in some arm32 platforms, it has been headache to maintain since it > have abused APIs[1](e.g., unmap_kernel_range in atomic context). > > Since we are approaching to deprecate 32bit machines and already made > the config option available for only builtin build since v5.8, lastly > it has been not default option in zsmalloc, it's time to drop the > option for better maintainance. > > [1] http://lore.kernel.org/linux-mm/20201105170249.387069-1-minchan@kernel.org > > Cc: Tony Lindgren <tony@atomide.com> > Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: <stable@vger.kernel.org> > Fixes: e47110e90584 ("mm/vunmap: add cond_resched() in vunmap_pmd_range") > Signed-off-by: Minchan Kim <minchan@kernel.org> Looks good to me. Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> -ss
* Sergey Senozhatsky <sergey.senozhatsky@gmail.com> [201118 02:07]: > On (20/11/17 12:29), Minchan Kim wrote: > > Yub, I remeber the discussion. > > https://lore.kernel.org/linux-mm/20200416203736.GB50092@google.com/ > > > > I wanted to remove it but 30% gain made me think again before > > deciding to drop it. > > Since it continue to make problems and Linux is approaching to > > deprecate the 32bit machines, I think it would be better to drop it > > rather than inventing weird workaround. > > > > Ccing Tony since omap2plus have used it by default for just in case. Looks like make omap2lus_defconfig is not selecting it anyways and I never even noticed earlier: Acked-by: Tony Lindgren <tony@atomide.com>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 6ae491a8b210..a4f1d39ce710 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -102,8 +102,6 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, if (pmd_none_or_clear_bad(pmd)) continue; vunmap_pte_range(pmd, addr, next, mask); - - cond_resched(); } while (pmd++, addr = next, addr != end); }
This reverts commit e47110e90584a22e9980510b00d0dfad3a83354e. While I was doing zram testing, I found sometimes decompression failed since the compression buffer was corrupted. With investigation, I found below commit calls cond_resched unconditionally so it could make a problem in atomic context if the task is reschedule. Revert the original commit for now. [ 55.109012] BUG: sleeping function called from invalid context at mm/vmalloc.c:108^M [ 55.110774] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 946, name: memhog^M [ 55.111973] 3 locks held by memhog/946:^M [ 55.112807] #0: ffff9d01d4b193e8 (&mm->mmap_lock#2){++++}-{4:4}, at: __mm_populate+0x103/0x160^M [ 55.114151] #1: ffffffffa3d53de0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0xa98/0x1160^M [ 55.115848] #2: ffff9d01d56b8110 (&zspage->lock){.+.+}-{3:3}, at: zs_map_object+0x8e/0x1f0^M [ 55.118947] CPU: 0 PID: 946 Comm: memhog Not tainted 5.9.3-00011-gc5bfc0287345-dirty #316^M [ 55.121265] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014^M [ 55.122540] Call Trace:^M [ 55.122974] dump_stack+0x8b/0xb8^M [ 55.123588] ___might_sleep.cold+0xb6/0xc6^M [ 55.124328] unmap_kernel_range_noflush+0x2eb/0x350^M [ 55.125198] unmap_kernel_range+0x14/0x30^M [ 55.125920] zs_unmap_object+0xd5/0xe0^M [ 55.126604] zram_bvec_rw.isra.0+0x38c/0x8e0^M [ 55.127462] zram_rw_page+0x90/0x101^M [ 55.128199] bdev_write_page+0x92/0xe0^M [ 55.128957] ? swap_slot_free_notify+0xb0/0xb0^M [ 55.129841] __swap_writepage+0x94/0x4a0^M [ 55.130636] ? do_raw_spin_unlock+0x4b/0xa0^M [ 55.131462] ? _raw_spin_unlock+0x1f/0x30^M [ 55.132261] ? page_swapcount+0x6c/0x90^M [ 55.133038] pageout+0xe3/0x3a0^M [ 55.133702] shrink_page_list+0xb94/0xd60^M [ 55.134626] shrink_inactive_list+0x158/0x460^M Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Harish Sriram <harish@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Minchan Kim <minchan@kernel.org> --- mm/vmalloc.c | 2 -- 1 file changed, 2 deletions(-)