Message ID | 1400163562-7481-1-git-send-email-msalter@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
> The following happens when trying to run a kvm guest on a kernel
> configured for 64k pages. This doesn't happen with 4k pages:
>
> BUG: failure at include/linux/mm.h:297/put_page_testzero()!
> Kernel panic - not syncing: BUG!
> CPU: 2 PID: 4228 Comm: qemu-system-aar Tainted: GF 3.13.0-0.rc7.31.sa2.k32v1.aarch64.debug #1
> Call trace:
> [<fffffe0000096034>] dump_backtrace+0x0/0x16c
> [<fffffe00000961b4>] show_stack+0x14/0x1c
> [<fffffe000066e648>] dump_stack+0x84/0xb0
> [<fffffe0000668678>] panic+0xf4/0x220
> [<fffffe000018ec78>] free_reserved_area+0x0/0x110
> [<fffffe000018edd8>] free_pages+0x50/0x88
> [<fffffe00000a759c>] kvm_free_stage2_pgd+0x30/0x40
> [<fffffe00000a5354>] kvm_arch_destroy_vm+0x18/0x44
> [<fffffe00000a1854>] kvm_put_kvm+0xf0/0x184
> [<fffffe00000a1938>] kvm_vm_release+0x10/0x1c
> [<fffffe00001edc1c>] __fput+0xb0/0x288
> [<fffffe00001ede4c>] ____fput+0xc/0x14
> [<fffffe00000d5a2c>] task_work_run+0xa8/0x11c
> [<fffffe0000095c14>] do_notify_resume+0x54/0x58
>
> In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
> on the stage2 pgd which leads to the BUG in put_page_testzero(). This
> happens because a pud_huge() test in unmap_range() returns true when it
> should always be false with the 2-level page tables used by 64k pages.
> This patch removes support for huge puds if 2-level pagetables are
> being used.

Hi Mark,
I'm still catching up with myself, sorry (was off sick for a couple
of days)...

I thought unmap_range was going to be changed?
Does the following help things?
https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009388.html

Cheers,
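For context on why pud_huge() misfires here: with a 64K granule, arm64 uses 2-level tables and the pmd is folded, so there is no real pud level at all. The sketch below is a simplified user-space model of the predicate (the names and the bit layout are illustrative only, not the kernel's actual definitions), showing the shape of the guard the patch adds:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of an arm64 page-table descriptor: bit 1 is the
 * "table" bit (set for a table entry, clear for a block mapping).
 * Illustrative only -- not the kernel's real definitions. */
#define TABLE_BIT (1UL << 1)

/* Buggy shape of the check: reports "huge" whenever the table bit is
 * clear, even on configurations where no pud level exists. */
static int pud_huge_buggy(uint64_t pud)
{
	return !(pud & TABLE_BIT);
}

/* Fixed shape: with a folded pmd (2-level tables) there is no real pud
 * level, so a huge pud can never exist and the answer is always 0. */
static int pud_huge_fixed(uint64_t pud, int pmd_folded)
{
	if (pmd_folded)
		return 0;
	return !(pud & TABLE_BIT);
}
```

In the kernel the "pmd_folded" decision is made at compile time via `#ifndef __PAGETABLE_PMD_FOLDED` rather than a runtime flag; the runtime parameter here is just for illustration.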
On Thu, 2014-05-15 at 15:44 +0100, Steve Capper wrote:
> On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
> > The following happens when trying to run a kvm guest on a kernel
> > configured for 64k pages. This doesn't happen with 4k pages:
[...]
> > In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
> > on the stage2 pgd which leads to the BUG in put_page_testzero(). This
> > happens because a pud_huge() test in unmap_range() returns true when it
> > should always be false with the 2-level page tables used by 64k pages.
> > This patch removes support for huge puds if 2-level pagetables are
> > being used.
>
> Hi Mark,
> I'm still catching up with myself, sorry (was off sick for a couple
> of days)...
>
> I thought unmap_range was going to be changed?
> Does the following help things?
> https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009388.html

No, I get the same BUG. Regardless, pud_huge() should always return
false for 2-level pagetables, right?
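The failure mode Mark describes — one put_page() too many on the stage2 pgd — can be modeled in miniature. This is a toy refcount, not the kernel's struct page machinery; the -1 return stands in for the BUG() that put_page_testzero() triggers on an over-put:

```c
#include <assert.h>

/* Toy model of the over-put failure: the kernel BUG()s if a page's
 * refcount is already zero when put_page() runs. Here the "BUG" is a
 * -1 return. Illustration only -- not kernel code. */
struct toy_page {
	int refcount;
};

/* Returns 1 if the count dropped to zero (caller should free the page),
 * 0 if the page is still referenced, and -1 for the over-put case that
 * the kernel turns into a panic via put_page_testzero(). */
static int toy_put_page(struct toy_page *p)
{
	if (p->refcount <= 0)
		return -1;	/* the extra put_page(): kernel would BUG() here */
	return --p->refcount == 0 ? 1 : 0;
}
```

The spurious pud_huge() match makes unmap_range() take one extra trip through this path for the pgd page, which is exactly the `refcount <= 0` case.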
On 15 May 2014 17:27, Mark Salter <msalter@redhat.com> wrote:
> On Thu, 2014-05-15 at 15:44 +0100, Steve Capper wrote:
>> On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
>> > The following happens when trying to run a kvm guest on a kernel
>> > configured for 64k pages. This doesn't happen with 4k pages:
[...]
>> I thought unmap_range was going to be changed?
>> Does the following help things?
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009388.html
>
> No, I get the same BUG. Regardless, pud_huge() should always return
> false for 2-level pagetables, right?

Okay, thanks for giving that a go.

Yeah, I agree: for a 64K granule it doesn't make sense to have a huge
pud. The patch looks sound now, but checking for a folded pmd may run
into problems if/when we get to 3 levels with 64K pages in future.

Perhaps checking for PAGE_SHIFT == 12 (or something similar) would be a
bit more robust?

Cheers,
On Thu, 2014-05-15 at 18:55 +0100, Steve Capper wrote:
> On 15 May 2014 17:27, Mark Salter <msalter@redhat.com> wrote:
[...]
> Yeah, I agree: for a 64K granule it doesn't make sense to have a huge
> pud. The patch looks sound now, but checking for a folded pmd may run
> into problems if/when we get to 3 levels with 64K pages in future.
>
> Perhaps checking for PAGE_SHIFT == 12 (or something similar) would be a
> bit more robust?

I don't think testing based on granule size is generally correct either.
Support for 3-level page tables with a 64k granule could be added as an
option someday, and that would break a pagesize-based test. With a
folded pmd, we know there is no pud, so pud_huge() should always be
false.
On Thu, May 15, 2014 at 07:39:17PM +0100, Mark Salter wrote:
> On Thu, 2014-05-15 at 18:55 +0100, Steve Capper wrote:
> > On 15 May 2014 17:27, Mark Salter <msalter@redhat.com> wrote:
> > > On Thu, 2014-05-15 at 15:44 +0100, Steve Capper wrote:
> > >> On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
> > >> > In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
> > >> > on the stage2 pgd which leads to the BUG in put_page_testzero(). This
> > >> > happens because a pud_huge() test in unmap_range() returns true when it
> > >> > should always be false with the 2-level page tables used by 64k pages.
> > >> > This patch removes support for huge puds if 2-level pagetables are
> > >> > being used.
[...]
> I don't think testing based on granule size is generally correct either.
> Support for 3-level page tables with a 64k granule could be added as an
> option someday, and that would break a pagesize-based test. With a
> folded pmd, we know there is no pud, so pud_huge() should always be
> false.

I agree: pud_huge() should be false in the same way we define
pud_present() to be 1 when __PAGETABLE_PMD_FOLDED. The *_huge() macros
aren't covered by the generic headers, unfortunately (some clean-up
would be useful at some point, but for now this patch is fine).
On Thu, May 15, 2014 at 03:19:22PM +0100, Mark Salter wrote:
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 5e9aec3..9bed38f 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -51,7 +51,11 @@ int pmd_huge(pmd_t pmd)
>
>  int pud_huge(pud_t pud)
>  {
> +#ifndef __PAGETABLE_PMD_FOLDED
>  	return !(pud_val(pud) & PUD_TABLE_BIT);
> +#else
> +	return 0;
> +#endif
>  }
>
>  int pmd_huge_support(void)
> @@ -64,8 +68,10 @@ static __init int setup_hugepagesz(char *opt)
>  	unsigned long ps = memparse(opt, &opt);
>  	if (ps == PMD_SIZE) {
>  		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
> +#ifndef __PAGETABLE_PMD_FOLDED
>  	} else if (ps == PUD_SIZE) {
>  		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
> +#endif

Since PMD_SIZE == PUD_SIZE when __PAGETABLE_PMD_FOLDED, do we need the
#ifndef here? Maybe the compiler is smart enough to remove it but it's
not on a critical path anyway, so I wouldn't bother.
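Catalin's point — that the second #ifndef guards code which is already dead — can be illustrated with a small user-space sketch. The constants below are made up for a folded-pmd configuration (they are not the kernel's real values); what matters is only that the two sizes coincide, which makes the second branch unreachable:

```c
#include <assert.h>

/* Illustrative constants for a folded-pmd configuration: with the pud
 * level collapsed onto the pmd, PMD_SIZE and PUD_SIZE coincide. The
 * shift value is invented for this sketch, not taken from the kernel. */
#define SKETCH_PMD_SIZE (1UL << 29)
#define SKETCH_PUD_SIZE (1UL << 29)

/* Which branch of a setup_hugepagesz()-style chain a size would hit:
 * 1 = PMD branch, 2 = PUD branch, 0 = unsupported. When the two sizes
 * are equal, the PUD branch is dead code the compiler can drop. */
static int hugepagesz_branch(unsigned long ps)
{
	if (ps == SKETCH_PMD_SIZE)
		return 1;
	else if (ps == SKETCH_PUD_SIZE)	/* unreachable when sizes are equal */
		return 2;
	return 0;
}
```

Because the first comparison always wins for the shared size, the `else if` can never fire, so the #ifndef around it changes nothing observable — which is why it was safe to drop on merge.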
On Fri, 2014-05-16 at 11:04 +0100, Catalin Marinas wrote:
> On Thu, May 15, 2014 at 03:19:22PM +0100, Mark Salter wrote:
> > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > index 5e9aec3..9bed38f 100644
[...]
> Since PMD_SIZE == PUD_SIZE when __PAGETABLE_PMD_FOLDED, do we need the
> #ifndef here? Maybe the compiler is smart enough to remove it but it's
> not on a critical path anyway, so I wouldn't bother.

Yes, I think it would remove it. In any case, one less ifdef would be
a good thing.
On Fri, May 16, 2014 at 04:54:11PM +0100, Mark Salter wrote:
> On Fri, 2014-05-16 at 11:04 +0100, Catalin Marinas wrote:
> > Since PMD_SIZE == PUD_SIZE when __PAGETABLE_PMD_FOLDED, do we need the
> > #ifndef here? Maybe the compiler is smart enough to remove it but it's
> > not on a critical path anyway, so I wouldn't bother.
>
> Yes, I think it would remove it. In any case, one less ifdef would be
> a good thing.

I merged this patch and dropped the last #ifndef. I still have doubts
about the kvm code calling put_page() more than necessary, especially
since pud == pmd and the loop continues after pud_huge() returns true,
but your patch looks harmless. Unless Steve has any objection, I'll
push it to mainline.

I also added:

Cc: stable # v3.11+

Thanks.
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 5e9aec3..9bed38f 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -51,7 +51,11 @@ int pmd_huge(pmd_t pmd)
 
 int pud_huge(pud_t pud)
 {
+#ifndef __PAGETABLE_PMD_FOLDED
 	return !(pud_val(pud) & PUD_TABLE_BIT);
+#else
+	return 0;
+#endif
 }
 
 int pmd_huge_support(void)
@@ -64,8 +68,10 @@ static __init int setup_hugepagesz(char *opt)
 	unsigned long ps = memparse(opt, &opt);
 	if (ps == PMD_SIZE) {
 		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
+#ifndef __PAGETABLE_PMD_FOLDED
 	} else if (ps == PUD_SIZE) {
 		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
+#endif
 	} else {
 		pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
 		return 0;
The following happens when trying to run a kvm guest on a kernel
configured for 64k pages. This doesn't happen with 4k pages:

BUG: failure at include/linux/mm.h:297/put_page_testzero()!
Kernel panic - not syncing: BUG!
CPU: 2 PID: 4228 Comm: qemu-system-aar Tainted: GF 3.13.0-0.rc7.31.sa2.k32v1.aarch64.debug #1
Call trace:
[<fffffe0000096034>] dump_backtrace+0x0/0x16c
[<fffffe00000961b4>] show_stack+0x14/0x1c
[<fffffe000066e648>] dump_stack+0x84/0xb0
[<fffffe0000668678>] panic+0xf4/0x220
[<fffffe000018ec78>] free_reserved_area+0x0/0x110
[<fffffe000018edd8>] free_pages+0x50/0x88
[<fffffe00000a759c>] kvm_free_stage2_pgd+0x30/0x40
[<fffffe00000a5354>] kvm_arch_destroy_vm+0x18/0x44
[<fffffe00000a1854>] kvm_put_kvm+0xf0/0x184
[<fffffe00000a1938>] kvm_vm_release+0x10/0x1c
[<fffffe00001edc1c>] __fput+0xb0/0x288
[<fffffe00001ede4c>] ____fput+0xc/0x14
[<fffffe00000d5a2c>] task_work_run+0xa8/0x11c
[<fffffe0000095c14>] do_notify_resume+0x54/0x58

In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
on the stage2 pgd which leads to the BUG in put_page_testzero(). This
happens because a pud_huge() test in unmap_range() returns true when it
should always be false with the 2-level page tables used by 64k pages.
This patch removes support for huge puds if 2-level pagetables are
being used.

Signed-off-by: Mark Salter <msalter@redhat.com>
---
 arch/arm64/mm/hugetlbpage.c | 6 ++++++
 1 file changed, 6 insertions(+)