Message ID | 20220210073111.61199-1-aneesh.kumar@linux.ibm.com (mailing list archive) |
---|---|
State | New |
Series | mm/hugetlb: Fix kernel crash with hugetlb mremap |
On 2/9/22 23:31, Aneesh Kumar K.V wrote:
> This fixes the below crash:
>
> kernel BUG at include/linux/mm.h:2373!
> cpu 0x5d: Vector: 700 (Program Check) at [c00000003c6e76e0]
>     pc: c000000000581a54: pmd_to_page+0x54/0x80
>     lr: c00000000058d184: move_hugetlb_page_tables+0x4e4/0x5b0
>     sp: c00000003c6e7980
>    msr: 9000000000029033
>   current = 0xc00000003bd8d980
>   paca    = 0xc000200fff610100   irqmask: 0x03   irq_happened: 0x01
>     pid   = 9349, comm = hugepage-mremap
> kernel BUG at include/linux/mm.h:2373!
> [link register   ] c00000000058d184 move_hugetlb_page_tables+0x4e4/0x5b0
> [c00000003c6e7980] c00000000058cecc move_hugetlb_page_tables+0x22c/0x5b0 (unreliable)
> [c00000003c6e7a90] c00000000053b78c move_page_tables+0xdbc/0x1010
> [c00000003c6e7bd0] c00000000053bc34 move_vma+0x254/0x5f0
> [c00000003c6e7c90] c00000000053c790 sys_mremap+0x7c0/0x900
> [c00000003c6e7db0] c00000000002c450 system_call_exception+0x160/0x2c0
>
> the kernel can't use huge_pte_offset before it set the pte entry because a page table
> lookup check for huge PTE bit in the page table to differentiate between a
> huge pte entry and a pointer to pte page. A huge_pte_alloc won't mark the
> page table entry huge and hence kernel should not use huge_pte_offset after
> a huge_pte_alloc.

Thanks Aneesh!

Architectures that use the default version of huge_pte_offset (like X86)
'got away' with this because of the default return:

	pmd = pmd_offset(pud, addr);
	/* must be pmd huge, non-present or none */
	return (pte_t *)pmd;

>
> Cc: Mina Almasry <almasrymina@google.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Should we add a Fixes: tag and cc stable?

> ---
>  mm/hugetlb.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

On Thu, Feb 10, 2022 at 11:42 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 2/9/22 23:31, Aneesh Kumar K.V wrote:
> > This fixes the below crash:
> >
> > kernel BUG at include/linux/mm.h:2373!
> > cpu 0x5d: Vector: 700 (Program Check) at [c00000003c6e76e0]
> >     pc: c000000000581a54: pmd_to_page+0x54/0x80
> >     lr: c00000000058d184: move_hugetlb_page_tables+0x4e4/0x5b0
> >     sp: c00000003c6e7980
> >    msr: 9000000000029033
> >   current = 0xc00000003bd8d980
> >   paca    = 0xc000200fff610100   irqmask: 0x03   irq_happened: 0x01
> >     pid   = 9349, comm = hugepage-mremap
> > kernel BUG at include/linux/mm.h:2373!
> > [link register   ] c00000000058d184 move_hugetlb_page_tables+0x4e4/0x5b0
> > [c00000003c6e7980] c00000000058cecc move_hugetlb_page_tables+0x22c/0x5b0 (unreliable)
> > [c00000003c6e7a90] c00000000053b78c move_page_tables+0xdbc/0x1010
> > [c00000003c6e7bd0] c00000000053bc34 move_vma+0x254/0x5f0
> > [c00000003c6e7c90] c00000000053c790 sys_mremap+0x7c0/0x900
> > [c00000003c6e7db0] c00000000002c450 system_call_exception+0x160/0x2c0
> >
> > the kernel can't use huge_pte_offset before it set the pte entry because a page table
> > lookup check for huge PTE bit in the page table to differentiate between a
> > huge pte entry and a pointer to pte page. A huge_pte_alloc won't mark the
> > page table entry huge and hence kernel should not use huge_pte_offset after
> > a huge_pte_alloc.
>
> Thanks Aneesh!
>
> Architectures that use the default version of huge_pte_offset (like X86)
> 'got away' with this because of the default return:
>
>         pmd = pmd_offset(pud, addr);
>         /* must be pmd huge, non-present or none */
>         return (pte_t *)pmd;
>
> >
> > Cc: Mina Almasry <almasrymina@google.com>
> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Thanks Aneesh!

Reviewed-by: Mina Almasry <almasrymina@google.com>

>
> Should we add a Fixes: tag and cc stable?
>

Yes please if possible.

> > ---
> >  mm/hugetlb.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> --
> Mike Kravetz
>
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 61895cc01d09..e57650a9404f 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -4851,14 +4851,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> >  }
> >
> >  static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
> > -			  unsigned long new_addr, pte_t *src_pte)
> > +			  unsigned long new_addr, pte_t *src_pte, pte_t *dst_pte)
> >  {
> >  	struct hstate *h = hstate_vma(vma);
> >  	struct mm_struct *mm = vma->vm_mm;
> > -	pte_t *dst_pte, pte;
> >  	spinlock_t *src_ptl, *dst_ptl;
> > +	pte_t pte;
> >
> > -	dst_pte = huge_pte_offset(mm, new_addr, huge_page_size(h));
> >  	dst_ptl = huge_pte_lock(h, mm, dst_pte);
> >  	src_ptl = huge_pte_lockptr(h, mm, src_pte);
> >
> > @@ -4917,7 +4916,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
> >  		if (!dst_pte)
> >  			break;
> >
> > -		move_huge_pte(vma, old_addr, new_addr, src_pte);
> > +		move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte);
> >  	}
> >  	flush_tlb_range(vma, old_end - len, old_end);
> >  	mmu_notifier_invalidate_range_end(&range);
>

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 61895cc01d09..e57650a9404f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4851,14 +4851,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 }
 
 static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
-			  unsigned long new_addr, pte_t *src_pte)
+			  unsigned long new_addr, pte_t *src_pte, pte_t *dst_pte)
 {
 	struct hstate *h = hstate_vma(vma);
 	struct mm_struct *mm = vma->vm_mm;
-	pte_t *dst_pte, pte;
 	spinlock_t *src_ptl, *dst_ptl;
+	pte_t pte;
 
-	dst_pte = huge_pte_offset(mm, new_addr, huge_page_size(h));
 	dst_ptl = huge_pte_lock(h, mm, dst_pte);
 	src_ptl = huge_pte_lockptr(h, mm, src_pte);
 
@@ -4917,7 +4916,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 		if (!dst_pte)
 			break;
 
-		move_huge_pte(vma, old_addr, new_addr, src_pte);
+		move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte);
 	}
 	flush_tlb_range(vma, old_end - len, old_end);
 	mmu_notifier_invalidate_range_end(&range);

This fixes the below crash:

kernel BUG at include/linux/mm.h:2373!
cpu 0x5d: Vector: 700 (Program Check) at [c00000003c6e76e0]
    pc: c000000000581a54: pmd_to_page+0x54/0x80
    lr: c00000000058d184: move_hugetlb_page_tables+0x4e4/0x5b0
    sp: c00000003c6e7980
   msr: 9000000000029033
  current = 0xc00000003bd8d980
  paca    = 0xc000200fff610100   irqmask: 0x03   irq_happened: 0x01
    pid   = 9349, comm = hugepage-mremap
kernel BUG at include/linux/mm.h:2373!
[link register   ] c00000000058d184 move_hugetlb_page_tables+0x4e4/0x5b0
[c00000003c6e7980] c00000000058cecc move_hugetlb_page_tables+0x22c/0x5b0 (unreliable)
[c00000003c6e7a90] c00000000053b78c move_page_tables+0xdbc/0x1010
[c00000003c6e7bd0] c00000000053bc34 move_vma+0x254/0x5f0
[c00000003c6e7c90] c00000000053c790 sys_mremap+0x7c0/0x900
[c00000003c6e7db0] c00000000002c450 system_call_exception+0x160/0x2c0

The kernel can't use huge_pte_offset before it sets the pte entry, because a
page table lookup checks for the huge PTE bit in the page table to
differentiate between a huge pte entry and a pointer to a pte page.
huge_pte_alloc won't mark the page table entry huge, and hence the kernel
should not use huge_pte_offset after a huge_pte_alloc.

Cc: Mina Almasry <almasrymina@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/hugetlb.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
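
For readers who want to see the failure mode in isolation, here is a userspace toy model, not kernel code. Every name in it (toy_alloc, toy_lookup_strict, toy_lookup_default, toy_move, TOY_HUGE_BIT) is hypothetical and exists only for illustration. It assumes a lookup that trusts a slot only when a "huge" flag is set, which is a simplification of what the commit message describes: the toy returns NULL for an unmarked slot, whereas a real page table walk would presumably treat the entry as a pointer to a lower-level table and keep descending.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names are kernel APIs. */
typedef uint64_t toy_entry_t;

#define TOY_HUGE_BIT ((toy_entry_t)1 << 0) /* hypothetical "huge leaf" flag */
#define TOY_SLOTS    4

struct toy_table {
	toy_entry_t slot[TOY_SLOTS];
};

/*
 * Stands in for huge_pte_alloc(): returns the slot for idx but leaves it
 * empty, so the huge bit is not set yet.
 */
static toy_entry_t *toy_alloc(struct toy_table *t, size_t idx)
{
	return &t->slot[idx];
}

/*
 * Stands in for a huge_pte_offset() that inspects the entry: the slot is
 * accepted as a huge PTE only if the huge bit is set. Returning NULL for
 * an unmarked slot is a simplification made for this sketch.
 */
static toy_entry_t *toy_lookup_strict(struct toy_table *t, size_t idx)
{
	toy_entry_t *e = &t->slot[idx];

	return (*e & TOY_HUGE_BIT) ? e : NULL;
}

/*
 * Stands in for the generic default quoted in the review: the slot is
 * returned whether it is huge, non-present or none.
 */
static toy_entry_t *toy_lookup_default(struct toy_table *t, size_t idx)
{
	return &t->slot[idx];
}

/*
 * Mirrors the shape of the patched move_huge_pte(): the caller passes the
 * destination pointer it got from the allocation step, so no second
 * lookup of the still-empty slot is needed.
 */
static void toy_move(toy_entry_t *src, toy_entry_t *dst)
{
	assert(src != NULL && dst != NULL);
	*dst = *src; /* the huge bit travels with the entry */
	*src = 0;
}
```

In this model, a strict lookup of a freshly allocated slot fails while the default lookup happens to succeed, which matches the "got away with this" observation in the review. Passing the pointer returned by the allocation step into toy_move sidesteps the second lookup entirely, which is the same design choice the patch makes by adding the dst_pte parameter.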