
arm64: fix pud_huge() for 2-level pagetables

Message ID 1400163562-7481-1-git-send-email-msalter@redhat.com (mailing list archive)
State New, archived

Commit Message

Mark Salter May 15, 2014, 2:19 p.m. UTC
The following happens when trying to run a kvm guest on a kernel
configured for 64k pages. This doesn't happen with 4k pages:

  BUG: failure at include/linux/mm.h:297/put_page_testzero()!
  Kernel panic - not syncing: BUG!
  CPU: 2 PID: 4228 Comm: qemu-system-aar Tainted: GF            3.13.0-0.rc7.31.sa2.k32v1.aarch64.debug #1
  Call trace:
  [<fffffe0000096034>] dump_backtrace+0x0/0x16c
  [<fffffe00000961b4>] show_stack+0x14/0x1c
  [<fffffe000066e648>] dump_stack+0x84/0xb0
  [<fffffe0000668678>] panic+0xf4/0x220
  [<fffffe000018ec78>] free_reserved_area+0x0/0x110
  [<fffffe000018edd8>] free_pages+0x50/0x88
  [<fffffe00000a759c>] kvm_free_stage2_pgd+0x30/0x40
  [<fffffe00000a5354>] kvm_arch_destroy_vm+0x18/0x44
  [<fffffe00000a1854>] kvm_put_kvm+0xf0/0x184
  [<fffffe00000a1938>] kvm_vm_release+0x10/0x1c
  [<fffffe00001edc1c>] __fput+0xb0/0x288
  [<fffffe00001ede4c>] ____fput+0xc/0x14
  [<fffffe00000d5a2c>] task_work_run+0xa8/0x11c
  [<fffffe0000095c14>] do_notify_resume+0x54/0x58

In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
on the stage2 pgd, which leads to the BUG in put_page_testzero(). This
happens because a pud_huge() test in unmap_range() returns true when it
should always be false with the 2-level page tables used by 64k pages.
This patch removes support for huge puds if 2-level pagetables are
being used.

Signed-off-by: Mark Salter <msalter@redhat.com>
---
 arch/arm64/mm/hugetlbpage.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Steve Capper May 15, 2014, 2:44 p.m. UTC | #1
On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
> The following happens when trying to run a kvm guest on a kernel
> configured for 64k pages. This doesn't happen with 4k pages:
> 
> [...]
> 
> In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
> on the stage2 pgd which leads to the BUG in put_page_testzero(). This
> happens because a pud_huge() test in unmap_range() returns true when it
> should always be false with the 2-level page tables used by 64k pages.
> This patch removes support for huge puds if 2-level pagetables are
> being used.

Hi Mark,
I'm still catching up with myself, sorry  (was off sick for a couple
of days)...

I thought unmap_range was going to be changed?
Does the following help things?
https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009388.html

Cheers,
Mark Salter May 15, 2014, 4:27 p.m. UTC | #2
On Thu, 2014-05-15 at 15:44 +0100, Steve Capper wrote:
> On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
> > The following happens when trying to run a kvm guest on a kernel
> > configured for 64k pages. This doesn't happen with 4k pages:
> > 
> > [...]
> > 
> > In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
> > on the stage2 pgd which leads to the BUG in put_page_testzero(). This
> > happens because a pud_huge() test in unmap_range() returns true when it
> > should always be false with the 2-level page tables used by 64k pages.
> > This patch removes support for huge puds if 2-level pagetables are
> > being used.
> 
> Hi Mark,
> I'm still catching up with myself, sorry  (was off sick for a couple
> of days)...
> 
> I thought unmap_range was going to be changed?
> Does the following help things?
> https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009388.html

No, I get the same BUG. Regardless, pud_huge() should always return
false for 2-level pagetables, right?
Steve Capper May 15, 2014, 5:55 p.m. UTC | #3
On 15 May 2014 17:27, Mark Salter <msalter@redhat.com> wrote:
> On Thu, 2014-05-15 at 15:44 +0100, Steve Capper wrote:
>> On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
>> > The following happens when trying to run a kvm guest on a kernel
>> > configured for 64k pages. This doesn't happen with 4k pages:
>> >
>> > [...]
>> >
>> > In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
>> > on the stage2 pgd which leads to the BUG in put_page_testzero(). This
>> > happens because a pud_huge() test in unmap_range() returns true when it
>> > should always be false with the 2-level page tables used by 64k pages.
>> > This patch removes support for huge puds if 2-level pagetables are
>> > being used.
>>
>> Hi Mark,
>> I'm still catching up with myself, sorry  (was off sick for a couple
>> of days)...
>>
>> I thought unmap_range was going to be changed?
>> Does the following help things?
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009388.html
>
> No, I get the same BUG. Regardless, pud_huge() should always return
> false for 2-level pagetables, right?

Okay, thanks for giving that a go.

Yeah, I agree that for a 64K granule it doesn't make sense to have a
huge pud. The patch looks sound now, but checking for a folded pmd may
run into problems if/when we get 3-level page tables with 64K pages in
future.

Perhaps checking for PAGE_SHIFT==12 (or something similar) would be a
bit more robust?

Cheers,
Mark Salter May 15, 2014, 6:39 p.m. UTC | #4
On Thu, 2014-05-15 at 18:55 +0100, Steve Capper wrote:
> On 15 May 2014 17:27, Mark Salter <msalter@redhat.com> wrote:
> > On Thu, 2014-05-15 at 15:44 +0100, Steve Capper wrote:
> >> On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
> >> > The following happens when trying to run a kvm guest on a kernel
> >> > configured for 64k pages. This doesn't happen with 4k pages:
> >> >
> >> > [...]
> >> >
> >> > In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
> >> > on the stage2 pgd which leads to the BUG in put_page_testzero(). This
> >> > happens because a pud_huge() test in unmap_range() returns true when it
> >> > should always be false with the 2-level page tables used by 64k pages.
> >> > This patch removes support for huge puds if 2-level pagetables are
> >> > being used.
> >>
> >> Hi Mark,
> >> I'm still catching up with myself, sorry  (was off sick for a couple
> >> of days)...
> >>
> >> I thought unmap_range was going to be changed?
> >> Does the following help things?
> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009388.html
> >
> > No, I get the same BUG. Regardless, pud_huge() should always return
> > false for 2-level pagetables, right?
> 
> Okay, thanks for giving that a go.
> 
> Yeah I agree for 64K granule it doesn't make sense to have a huge_pud.
> The patch looks sound now, but checking for a folded pmd may run into
> problems if/when we get to 3-levels and 64K pages in future.
> 
> Perhaps checking for PAGE_SHIFT==12 (or something similar) would be a
> bit more robust?
> 

I don't think testing based on granule size is generally correct either.
Support for 3-level page tables with a 64k granule may get added as an
option, and that would break the page-size based test. With a folded pmd,
we know there is no pud, so pud_huge() should always be false.
Catalin Marinas May 16, 2014, 9:51 a.m. UTC | #5
On Thu, May 15, 2014 at 07:39:17PM +0100, Mark Salter wrote:
> On Thu, 2014-05-15 at 18:55 +0100, Steve Capper wrote:
> > On 15 May 2014 17:27, Mark Salter <msalter@redhat.com> wrote:
> > > On Thu, 2014-05-15 at 15:44 +0100, Steve Capper wrote:
> > >> On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
> > >> > In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
> > >> > on the stage2 pgd which leads to the BUG in put_page_testzero(). This
> > >> > happens because a pud_huge() test in unmap_range() returns true when it
> > >> > should always be false with 2-level pages tables used by 64k pages.
> > >> > This patch removes support for huge puds if 2-level pagetables are
> > >> > being used.
[...]
> > Yeah I agree for 64K granule it doesn't make sense to have a huge_pud.
> > The patch looks sound now, but checking for a folded pmd may run into
> > problems if/when we get to 3-levels and 64K pages in future.
> > 
> > Perhaps checking for PAGE_SHIFT==12 (or something similar) would be a
> > bit more robust?
> 
> I don't think testing based on granule size is generally correct either.
> Maybe support for 3-level page tables with 64k granule gets added as an
> option. That would break the pagesize based test. With a folded pmd, we
> know there is no pud, so pud_huge() should always be false.

I agree, pud_huge() should be false in the same way we define
pud_present() to be 1 when __PAGETABLE_PMD_FOLDED is defined. The
*_huge() macros aren't covered by the generic headers unfortunately
(some clean-up would be useful at some point, but for now this patch is
fine).
Catalin Marinas May 16, 2014, 10:04 a.m. UTC | #6
On Thu, May 15, 2014 at 03:19:22PM +0100, Mark Salter wrote:
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 5e9aec3..9bed38f 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -51,7 +51,11 @@ int pmd_huge(pmd_t pmd)
>  
>  int pud_huge(pud_t pud)
>  {
> +#ifndef __PAGETABLE_PMD_FOLDED
>  	return !(pud_val(pud) & PUD_TABLE_BIT);
> +#else
> +	return 0;
> +#endif
>  }
>  
>  int pmd_huge_support(void)
> @@ -64,8 +68,10 @@ static __init int setup_hugepagesz(char *opt)
>  	unsigned long ps = memparse(opt, &opt);
>  	if (ps == PMD_SIZE) {
>  		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
> +#ifndef __PAGETABLE_PMD_FOLDED
>  	} else if (ps == PUD_SIZE) {
>  		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
> +#endif

Since PMD_SIZE == PUD_SIZE when __PAGETABLE_PMD_FOLDED, do we need the
#ifndef here? Maybe the compiler is smart enough to remove it but it's
not on a critical path anyway, so I wouldn't bother.
Mark Salter May 16, 2014, 3:54 p.m. UTC | #7
On Fri, 2014-05-16 at 11:04 +0100, Catalin Marinas wrote:
> On Thu, May 15, 2014 at 03:19:22PM +0100, Mark Salter wrote:
> > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > index 5e9aec3..9bed38f 100644
> > --- a/arch/arm64/mm/hugetlbpage.c
> > +++ b/arch/arm64/mm/hugetlbpage.c
> > @@ -51,7 +51,11 @@ int pmd_huge(pmd_t pmd)
> >  
> >  int pud_huge(pud_t pud)
> >  {
> > +#ifndef __PAGETABLE_PMD_FOLDED
> >  	return !(pud_val(pud) & PUD_TABLE_BIT);
> > +#else
> > +	return 0;
> > +#endif
> >  }
> >  
> >  int pmd_huge_support(void)
> > @@ -64,8 +68,10 @@ static __init int setup_hugepagesz(char *opt)
> >  	unsigned long ps = memparse(opt, &opt);
> >  	if (ps == PMD_SIZE) {
> >  		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
> > +#ifndef __PAGETABLE_PMD_FOLDED
> >  	} else if (ps == PUD_SIZE) {
> >  		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
> > +#endif
> 
> Since PMD_SIZE == PUD_SIZE when __PAGETABLE_PMD_FOLDED, do we need the
> #ifndef here? Maybe the compiler is smart enough to remove it but it's
> not on a critical path anyway, so I wouldn't bother.
> 

Yes, I think it would remove it. In any case, one less ifdef would be
a good thing.
Catalin Marinas May 16, 2014, 4:20 p.m. UTC | #8
On Fri, May 16, 2014 at 04:54:11PM +0100, Mark Salter wrote:
> On Fri, 2014-05-16 at 11:04 +0100, Catalin Marinas wrote:
> > On Thu, May 15, 2014 at 03:19:22PM +0100, Mark Salter wrote:
> > > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > > index 5e9aec3..9bed38f 100644
> > > --- a/arch/arm64/mm/hugetlbpage.c
> > > +++ b/arch/arm64/mm/hugetlbpage.c
> > > @@ -51,7 +51,11 @@ int pmd_huge(pmd_t pmd)
> > >  
> > >  int pud_huge(pud_t pud)
> > >  {
> > > +#ifndef __PAGETABLE_PMD_FOLDED
> > >  	return !(pud_val(pud) & PUD_TABLE_BIT);
> > > +#else
> > > +	return 0;
> > > +#endif
> > >  }
> > >  
> > >  int pmd_huge_support(void)
> > > @@ -64,8 +68,10 @@ static __init int setup_hugepagesz(char *opt)
> > >  	unsigned long ps = memparse(opt, &opt);
> > >  	if (ps == PMD_SIZE) {
> > >  		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
> > > +#ifndef __PAGETABLE_PMD_FOLDED
> > >  	} else if (ps == PUD_SIZE) {
> > >  		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
> > > +#endif
> > 
> > Since PMD_SIZE == PUD_SIZE when __PAGETABLE_PMD_FOLDED, do we need the
> > #ifndef here? Maybe the compiler is smart enough to remove it but it's
> > not on a critical path anyway, so I wouldn't bother.
> 
> Yes, I think it would remove it. In any case, one less ifdef would be
> a good thing.

I merged this patch and dropped the last #ifndef.

I still have doubts about the kvm code calling put_page more than
necessary, especially since pud == pmd and the loop continues after
pud_huge() returns true, but your patch looks harmless.

Unless Steve has any objection, I'll push it to mainline. I also added
Cc: stable # v3.11+

Thanks.

Patch

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 5e9aec3..9bed38f 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -51,7 +51,11 @@ int pmd_huge(pmd_t pmd)
 
 int pud_huge(pud_t pud)
 {
+#ifndef __PAGETABLE_PMD_FOLDED
 	return !(pud_val(pud) & PUD_TABLE_BIT);
+#else
+	return 0;
+#endif
 }
 
 int pmd_huge_support(void)
@@ -64,8 +68,10 @@ static __init int setup_hugepagesz(char *opt)
 	unsigned long ps = memparse(opt, &opt);
 	if (ps == PMD_SIZE) {
 		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
+#ifndef __PAGETABLE_PMD_FOLDED
 	} else if (ps == PUD_SIZE) {
 		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
+#endif
 	} else {
 		pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
 		return 0;