Message ID | 20241021164837.2681358-1-roman.gushchin@linux.dev |
---|---|
State | New |
Series | mm: page_alloc: move mlocked flag clearance into free_pages_prepare() |
On 10/21/24 18:48, Roman Gushchin wrote:
> Syzbot reported [1] a bad page state problem caused by a page
> being freed using free_page() while still having the mlocked flag
> set at free_pages_prepare() stage:
>
> BUG: Bad page state in process syz.0.15 pfn:1137bb
> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff8881137bb870 pfn:0x1137bb
> flags: 0x400000000080000(mlocked|node=0|zone=1)
> raw: 0400000000080000 0000000000000000 dead000000000122 0000000000000000
> raw: ffff8881137bb870 0000000000000000 00000000ffffffff 0000000000000000
> page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> page_owner tracks the page as allocated
> page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 3005, tgid 3004 (syz.0.15), ts 61546608067, free_ts 61390082085
>  set_page_owner include/linux/page_owner.h:32 [inline]
>  post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
>  prep_new_page mm/page_alloc.c:1545 [inline]
>  get_page_from_freelist+0x3008/0x31f0 mm/page_alloc.c:3457
>  __alloc_pages_noprof+0x292/0x7b0 mm/page_alloc.c:4733
>  alloc_pages_mpol_noprof+0x3e8/0x630 mm/mempolicy.c:2265
>  kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99
>  kvm_create_vm virt/kvm/kvm_main.c:1235 [inline]
>  kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5500 [inline]
>  kvm_dev_ioctl+0x13bb/0x2320 virt/kvm/kvm_main.c:5542
>  vfs_ioctl fs/ioctl.c:51 [inline]
>  __do_sys_ioctl fs/ioctl.c:907 [inline]
>  __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0x69/0x110 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> page last free pid 951 tgid 951 stack trace:
>  reset_page_owner include/linux/page_owner.h:25 [inline]
>  free_pages_prepare mm/page_alloc.c:1108 [inline]
>  free_unref_page+0xcb1/0xf00 mm/page_alloc.c:2638
>  vfree+0x181/0x2e0 mm/vmalloc.c:3361
>  delayed_vfree_work+0x56/0x80 mm/vmalloc.c:3282
>  process_one_work kernel/workqueue.c:3229 [inline]
>  process_scheduled_works+0xa5c/0x17a0 kernel/workqueue.c:3310
>  worker_thread+0xa2b/0xf70 kernel/workqueue.c:3391
>  kthread+0x2df/0x370 kernel/kthread.c:389
>  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
>
> The problem was originally introduced by commit b109b87050df
> ("mm/munlock: replace clear_page_mlock() by final clearance"): its
> handling was focused on pagecache and anonymous memory and wasn't
> suitable for lower-level get_page()/free_page() APIs used for example
> by KVM, as with this reproducer.

Does that mean KVM is mlocking pages that are not pagecache nor anonymous,
thus not LRU? How and why (and since when) is that done?

> Fix it by moving the mlocked flag clearance down to
> free_pages_prepare().
>
> The bug itself is fairly old and harmless (aside from generating these
> warnings), so the stable backport is likely not justified.

But since there's a Cc: stable below, it will be backported :)

> Closes: https://syzkaller.appspot.com/x/report.txt?x=169a47d0580000
> Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance")
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: <stable@vger.kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> ---
>  mm/page_alloc.c |  9 +++++++++
>  mm/swap.c       | 14 --------------
>  2 files changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bc55d39eb372..24200651ad92 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1044,6 +1044,7 @@ __always_inline bool free_pages_prepare(struct page *page,
>          bool skip_kasan_poison = should_skip_kasan_poison(page);
>          bool init = want_init_on_free();
>          bool compound = PageCompound(page);
> +        struct folio *folio = page_folio(page);
>
>          VM_BUG_ON_PAGE(PageTail(page), page);
>
> @@ -1053,6 +1054,14 @@ __always_inline bool free_pages_prepare(struct page *page,
>          if (memcg_kmem_online() && PageMemcgKmem(page))
>                  __memcg_kmem_uncharge_page(page, order);
>
> +        if (unlikely(folio_test_mlocked(folio))) {
> +                long nr_pages = folio_nr_pages(folio);
> +
> +                __folio_clear_mlocked(folio);
> +                zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
> +                count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> +        }

Why drop the useful comment?

> +
>          if (unlikely(PageHWPoison(page)) && !order) {
>                  /* Do not let hwpoison pages hit pcplists/buddy */
>                  reset_page_owner(page, order);
> diff --git a/mm/swap.c b/mm/swap.c
> index 835bdf324b76..7cd0f4719423 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -78,20 +78,6 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp,
>                  lruvec_del_folio(*lruvecp, folio);
>                  __folio_clear_lru_flags(folio);
>          }
> -
> -        /*
> -         * In rare cases, when truncation or holepunching raced with
> -         * munlock after VM_LOCKED was cleared, Mlocked may still be
> -         * found set here. This does not indicate a problem, unless
> -         * "unevictable_pgs_cleared" appears worryingly large.
> -         */
> -        if (unlikely(folio_test_mlocked(folio))) {
> -                long nr_pages = folio_nr_pages(folio);
> -
> -                __folio_clear_mlocked(folio);
> -                zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
> -                count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> -        }
>  }
>
>  /*
On Mon, Oct 21, 2024 at 07:01:59PM +0200, Vlastimil Babka wrote:
> On 10/21/24 18:48, Roman Gushchin wrote:
> > Syzbot reported [1] a bad page state problem caused by a page
> > being freed using free_page() while still having the mlocked flag
> > set at free_pages_prepare() stage:
> >
> > BUG: Bad page state in process syz.0.15 pfn:1137bb
> > page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff8881137bb870 pfn:0x1137bb
> > flags: 0x400000000080000(mlocked|node=0|zone=1)
> > raw: 0400000000080000 0000000000000000 dead000000000122 0000000000000000
> > raw: ffff8881137bb870 0000000000000000 00000000ffffffff 0000000000000000
> > page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> > page_owner tracks the page as allocated
> > page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 3005, tgid 3004 (syz.0.15), ts 61546608067, free_ts 61390082085
> >  set_page_owner include/linux/page_owner.h:32 [inline]
> >  post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
> >  prep_new_page mm/page_alloc.c:1545 [inline]
> >  get_page_from_freelist+0x3008/0x31f0 mm/page_alloc.c:3457
> >  __alloc_pages_noprof+0x292/0x7b0 mm/page_alloc.c:4733
> >  alloc_pages_mpol_noprof+0x3e8/0x630 mm/mempolicy.c:2265
> >  kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99
> >  kvm_create_vm virt/kvm/kvm_main.c:1235 [inline]
> >  kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5500 [inline]
> >  kvm_dev_ioctl+0x13bb/0x2320 virt/kvm/kvm_main.c:5542
> >  vfs_ioctl fs/ioctl.c:51 [inline]
> >  __do_sys_ioctl fs/ioctl.c:907 [inline]
> >  __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
> >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >  do_syscall_64+0x69/0x110 arch/x86/entry/common.c:83
> >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > page last free pid 951 tgid 951 stack trace:
> >  reset_page_owner include/linux/page_owner.h:25 [inline]
> >  free_pages_prepare mm/page_alloc.c:1108 [inline]
> >  free_unref_page+0xcb1/0xf00 mm/page_alloc.c:2638
> >  vfree+0x181/0x2e0 mm/vmalloc.c:3361
> >  delayed_vfree_work+0x56/0x80 mm/vmalloc.c:3282
> >  process_one_work kernel/workqueue.c:3229 [inline]
> >  process_scheduled_works+0xa5c/0x17a0 kernel/workqueue.c:3310
> >  worker_thread+0xa2b/0xf70 kernel/workqueue.c:3391
> >  kthread+0x2df/0x370 kernel/kthread.c:389
> >  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
> >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
> >
> > The problem was originally introduced by commit b109b87050df
> > ("mm/munlock: replace clear_page_mlock() by final clearance"): its
> > handling was focused on pagecache and anonymous memory and wasn't
> > suitable for lower-level get_page()/free_page() APIs used for example
> > by KVM, as with this reproducer.
>
> Does that mean KVM is mlocking pages that are not pagecache nor anonymous,
> thus not LRU? How and why (and since when) is that done?

KVM allows userspace to mmap and mlock several pages that it allocates
directly. Please, take a look at the reproducer:
https://syzkaller.appspot.com/x/repro.c?x=1437939f980000

> > Fix it by moving the mlocked flag clearance down to
> > free_pages_prepare().
> >
> > The bug itself is fairly old and harmless (aside from generating these
> > warnings), so the stable backport is likely not justified.
>
> But since there's a Cc: stable below, it will be backported :)

My bad, I changed my mind at the last minute and added Cc: stable but
forgot to drop this sentence.

> > Closes: https://syzkaller.appspot.com/x/report.txt?x=169a47d0580000
> > Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance")
> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
> > Cc: <stable@vger.kernel.org>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > ---
> >  mm/page_alloc.c |  9 +++++++++
> >  mm/swap.c       | 14 --------------
> >  2 files changed, 9 insertions(+), 14 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index bc55d39eb372..24200651ad92 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1044,6 +1044,7 @@ __always_inline bool free_pages_prepare(struct page *page,
> >          bool skip_kasan_poison = should_skip_kasan_poison(page);
> >          bool init = want_init_on_free();
> >          bool compound = PageCompound(page);
> > +        struct folio *folio = page_folio(page);
> >
> >          VM_BUG_ON_PAGE(PageTail(page), page);
> >
> > @@ -1053,6 +1054,14 @@ __always_inline bool free_pages_prepare(struct page *page,
> >          if (memcg_kmem_online() && PageMemcgKmem(page))
> >                  __memcg_kmem_uncharge_page(page, order);
> >
> > +        if (unlikely(folio_test_mlocked(folio))) {
> > +                long nr_pages = folio_nr_pages(folio);
> > +
> > +                __folio_clear_mlocked(folio);
> > +                zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
> > +                count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
> > +        }
>
> Why drop the useful comment?

Agree. Sounds like I need to restore the comment, drop the
no-stable-backport recommendation and send v2.

Thank you for taking a look!
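To make the KVM angle concrete: the coalesced-MMIO ring page that shows up in the allocation stack above is allocated by KVM with alloc_page() and exposed to userspace via mmap() on the vCPU file descriptor, where it can then be mlock()ed. The sketch below is a minimal illustration of that shape; it is an assumption based on the linked syzkaller reproducer rather than a copy of it, it targets x86 (where KVM_COALESCED_MMIO_PAGE_OFFSET is defined), error handling is omitted, and it is not guaranteed to trigger the splat on its own.

```c
/*
 * Minimal illustration (assumed shape of the linked reproducer, not a
 * verbatim copy; error handling omitted): map a vCPU's coalesced-MMIO
 * ring page and mlock() it.  The page behind this mapping was allocated
 * by KVM directly with alloc_page(), so it is never handled by the LRU
 * release path, yet mlock() can set PG_mlocked on it.
 */
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        long psz = sysconf(_SC_PAGESIZE);
        int kvm = open("/dev/kvm", O_RDWR);
        int vm = ioctl(kvm, KVM_CREATE_VM, 0);
        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);

        /* The coalesced-MMIO ring is exported at a fixed page offset. */
        void *ring = mmap(NULL, psz, PROT_READ | PROT_WRITE, MAP_SHARED,
                          vcpu, KVM_COALESCED_MMIO_PAGE_OFFSET * psz);

        /* Pin the mapping: this marks a non-LRU page as mlocked. */
        mlock(ring, psz);

        /* Tearing the VM down eventually frees the page via free_page(). */
        munmap(ring, psz);
        close(vcpu);
        close(vm);
        close(kvm);
        return 0;
}
```

Whether this exact sequence leaves the flag set at free time depends on the munlock path taken on teardown; the reproducer linked in the reply above is the authoritative trigger.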
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bc55d39eb372..24200651ad92 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1044,6 +1044,7 @@ __always_inline bool free_pages_prepare(struct page *page,
         bool skip_kasan_poison = should_skip_kasan_poison(page);
         bool init = want_init_on_free();
         bool compound = PageCompound(page);
+        struct folio *folio = page_folio(page);

         VM_BUG_ON_PAGE(PageTail(page), page);

@@ -1053,6 +1054,14 @@ __always_inline bool free_pages_prepare(struct page *page,
         if (memcg_kmem_online() && PageMemcgKmem(page))
                 __memcg_kmem_uncharge_page(page, order);

+        if (unlikely(folio_test_mlocked(folio))) {
+                long nr_pages = folio_nr_pages(folio);
+
+                __folio_clear_mlocked(folio);
+                zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
+                count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
+        }
+
         if (unlikely(PageHWPoison(page)) && !order) {
                 /* Do not let hwpoison pages hit pcplists/buddy */
                 reset_page_owner(page, order);
diff --git a/mm/swap.c b/mm/swap.c
index 835bdf324b76..7cd0f4719423 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -78,20 +78,6 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp,
                 lruvec_del_folio(*lruvecp, folio);
                 __folio_clear_lru_flags(folio);
         }
-
-        /*
-         * In rare cases, when truncation or holepunching raced with
-         * munlock after VM_LOCKED was cleared, Mlocked may still be
-         * found set here. This does not indicate a problem, unless
-         * "unevictable_pgs_cleared" appears worryingly large.
-         */
-        if (unlikely(folio_test_mlocked(folio))) {
-                long nr_pages = folio_nr_pages(folio);
-
-                __folio_clear_mlocked(folio);
-                zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages);
-                count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages);
-        }
 }

 /*
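The mm/swap.c comment removed above (which the discussion agrees should be restored in v2) points operators at the "unevictable_pgs_cleared" counter. Both counters touched by the moved code are exported through /proc/vmstat, as nr_mlock (NR_MLOCK) and unevictable_pgs_cleared (UNEVICTABLE_PGCLEARED). A small stand-alone check, for reference (assumes a mounted /proc, which is standard on Linux):

```c
/*
 * Print the two /proc/vmstat counters the moved code updates:
 * "nr_mlock" (NR_MLOCK) and "unevictable_pgs_cleared"
 * (UNEVICTABLE_PGCLEARED).  Userspace sketch, not kernel code.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
        char line[256];
        FILE *f = fopen("/proc/vmstat", "r");

        if (!f) {
                perror("/proc/vmstat");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                if (!strncmp(line, "nr_mlock ", 9) ||
                    !strncmp(line, "unevictable_pgs_cleared ", 24))
                        fputs(line, stdout);
        }
        fclose(f);
        return 0;
}
```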
Syzbot reported [1] a bad page state problem caused by a page
being freed using free_page() while still having the mlocked flag
set at free_pages_prepare() stage:

BUG: Bad page state in process syz.0.15 pfn:1137bb
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff8881137bb870 pfn:0x1137bb
flags: 0x400000000080000(mlocked|node=0|zone=1)
raw: 0400000000080000 0000000000000000 dead000000000122 0000000000000000
raw: ffff8881137bb870 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 3005, tgid 3004 (syz.0.15), ts 61546608067, free_ts 61390082085
 set_page_owner include/linux/page_owner.h:32 [inline]
 post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537
 prep_new_page mm/page_alloc.c:1545 [inline]
 get_page_from_freelist+0x3008/0x31f0 mm/page_alloc.c:3457
 __alloc_pages_noprof+0x292/0x7b0 mm/page_alloc.c:4733
 alloc_pages_mpol_noprof+0x3e8/0x630 mm/mempolicy.c:2265
 kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99
 kvm_create_vm virt/kvm/kvm_main.c:1235 [inline]
 kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5500 [inline]
 kvm_dev_ioctl+0x13bb/0x2320 virt/kvm/kvm_main.c:5542
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:907 [inline]
 __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x69/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
page last free pid 951 tgid 951 stack trace:
 reset_page_owner include/linux/page_owner.h:25 [inline]
 free_pages_prepare mm/page_alloc.c:1108 [inline]
 free_unref_page+0xcb1/0xf00 mm/page_alloc.c:2638
 vfree+0x181/0x2e0 mm/vmalloc.c:3361
 delayed_vfree_work+0x56/0x80 mm/vmalloc.c:3282
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0xa5c/0x17a0 kernel/workqueue.c:3310
 worker_thread+0xa2b/0xf70 kernel/workqueue.c:3391
 kthread+0x2df/0x370 kernel/kthread.c:389
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

The problem was originally introduced by commit b109b87050df
("mm/munlock: replace clear_page_mlock() by final clearance"): its
handling was focused on pagecache and anonymous memory and wasn't
suitable for lower-level get_page()/free_page() APIs used for example
by KVM, as with this reproducer.

Fix it by moving the mlocked flag clearance down to
free_pages_prepare().

The bug itself is fairly old and harmless (aside from generating these
warnings), so the stable backport is likely not justified.

Closes: https://syzkaller.appspot.com/x/report.txt?x=169a47d0580000
Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance")
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: <stable@vger.kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
---
 mm/page_alloc.c |  9 +++++++++
 mm/swap.c       | 14 --------------
 2 files changed, 9 insertions(+), 14 deletions(-)
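For completeness, the reason a leftover mlocked flag turns into a "Bad page state" report: PG_mlocked is one of the bits in the PAGE_FLAGS_CHECK_AT_FREE mask that the free path checks from free_pages_prepare(), which is exactly what the "page dumped because" line above says. A simplified sketch of that relationship (paraphrased, not the exact mainline code; demo_page_ok_to_free() is an illustrative helper, not a kernel function):

```c
/*
 * Simplified sketch, not the exact mainline code: if any bit of
 * PAGE_FLAGS_CHECK_AT_FREE (PG_mlocked among them) is still set when a
 * page reaches free_pages_prepare(), the page is reported as
 * "Bad page state".  Clearing PG_mlocked before this check, as the patch
 * does, keeps directly freed pages off the bad-page path.
 */
static inline bool demo_page_ok_to_free(const struct page *page)
{
        return !(page->flags & PAGE_FLAGS_CHECK_AT_FREE);
}
```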